193 lines
5.6 KiB
Markdown
193 lines
5.6 KiB
Markdown
|
\[ [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]
|
||
|
|
||
|
# Exercise 2.2
|
||
|
|
||
|
*Objectives:*
|
||
|
|
||
|
- Work with various containers
|
||
|
- List/Set/Dict Comprehensions
|
||
|
- Collections module
|
||
|
- Data analysis challenge
|
||
|
|
||
|
Most Python programmers are generally familiar with lists, dictionaries,
|
||
|
tuples, and other basic datatypes. In this exercise, we'll put that
|
||
|
knowledge to work to solve various data analysis problems.
|
||
|
|
||
|
## (a) Preliminaries
|
||
|
|
||
|
To get started, let's review some basics with a slightly simpler dataset--
|
||
|
a portfolio of stock holdings. Create a file `readport.py` and put this
|
||
|
code in it:
|
||
|
|
||
|
```python
|
||
|
# readport.py
|
||
|
|
||
|
import csv
|
||
|
|
||
|
# A function that reads a file into a list of dicts
|
||
|
def read_portfolio(filename):
|
||
|
portfolio = []
|
||
|
with open(filename) as f:
|
||
|
rows = csv.reader(f)
|
||
|
headers = next(rows)
|
||
|
for row in rows:
|
||
|
record = {
|
||
|
'name' : row[0],
|
||
|
'shares' : int(row[1]),
|
||
|
'price' : float(row[2])
|
||
|
}
|
||
|
portfolio.append(record)
|
||
|
return portfolio
|
||
|
```
|
||
|
|
||
|
This file reads some simple stock market data in the file `Data/portfolio.csv`. Use
|
||
|
the function to read the file and look at the results:
|
||
|
|
||
|
```python
|
||
|
>>> portfolio = read_portfolio('Data/portfolio.csv')
|
||
|
>>> from pprint import pprint
|
||
|
>>> pprint(portfolio)
|
||
|
[{'name': 'AA', 'price': 32.2, 'shares': 100},
|
||
|
{'name': 'IBM', 'price': 91.1, 'shares': 50},
|
||
|
{'name': 'CAT', 'price': 83.44, 'shares': 150},
|
||
|
{'name': 'MSFT', 'price': 51.23, 'shares': 200},
|
||
|
{'name': 'GE', 'price': 40.37, 'shares': 95},
|
||
|
{'name': 'MSFT', 'price': 65.1, 'shares': 50},
|
||
|
{'name': 'IBM', 'price': 70.44, 'shares': 100}]
|
||
|
>>>
|
||
|
```
|
||
|
|
||
|
In this data, each row consists of a stock name, a number of held
|
||
|
shares, and a purchase price. There are multiple entries for
|
||
|
certain stock names such as MSFT and IBM.
|
||
|
|
||
|
## (b) Comprehensions
|
||
|
|
||
|
List, set, and dictionary comprehensions can be a useful tool for manipulating
|
||
|
data. For example, try these operations:
|
||
|
|
||
|
```python
|
||
|
>>> # Find all holdings more than 100 shares
|
||
|
>>> [s for s in portfolio if s['shares'] > 100]
|
||
|
[{'name': 'CAT', 'shares': 150, 'price': 83.44},
|
||
|
{'name': 'MSFT', 'shares': 200, 'price': 51.23}]
|
||
|
|
||
|
>>> # Compute total cost (shares * price)
|
||
|
>>> sum([s['shares']*s['price'] for s in portfolio])
|
||
|
44671.15
|
||
|
>>>
|
||
|
|
||
|
>>> # Find all unique stock names (set)
|
||
|
>>> { s['name'] for s in portfolio }
|
||
|
{'MSFT', 'IBM', 'AA', 'GE', 'CAT'}
|
||
|
>>>
|
||
|
|
||
|
>>> # Count the total shares of each of stock
|
||
|
>>> totals = { s['name']: 0 for s in portfolio }
|
||
|
>>> for s in portfolio:
|
||
|
totals[s['name']] += s['shares']
|
||
|
|
||
|
>>> totals
|
||
|
{'AA': 100, 'IBM': 150, 'CAT': 150, 'MSFT': 250, 'GE': 95}
|
||
|
>>>
|
||
|
```
|
||
|
|
||
|
## (c) Collections
|
||
|
|
||
|
The `collections` module has a variety of classes for more specialized data
|
||
|
manipulation. For example, the last example could be solved with a `Counter` like this:
|
||
|
|
||
|
```python
|
||
|
>>> from collections import Counter
|
||
|
>>> totals = Counter()
|
||
|
>>> for s in portfolio:
|
||
|
totals[s['name']] += s['shares']
|
||
|
|
||
|
>>> totals
|
||
|
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
|
||
|
>>>
|
||
|
```
|
||
|
|
||
|
Counters are interesting in that they support other kinds of operations such as ranking
|
||
|
and mathematics. For example:
|
||
|
|
||
|
```python
|
||
|
>>> # Get the two most common holdings
|
||
|
>>> totals.most_common(2)
|
||
|
[('MSFT', 250), ('IBM', 150)]
|
||
|
>>>
|
||
|
|
||
|
>>> # Adding counters together
|
||
|
>>> more = Counter()
|
||
|
>>> more['IBM'] = 75
|
||
|
>>> more['AA'] = 200
|
||
|
>>> more['ACME'] = 30
|
||
|
>>> more
|
||
|
Counter({'AA': 200, 'IBM': 75, 'ACME': 30})
|
||
|
>>> totals
|
||
|
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
|
||
|
>>> totals + more
|
||
|
Counter({'AA': 300, 'MSFT': 250, 'IBM': 225, 'CAT': 150, 'GE': 95, 'ACME': 30})
|
||
|
>>>
|
||
|
```
|
||
|
|
||
|
The `defaultdict` object can be used to group data. For example, suppose
|
||
|
you want to make it easy to find all matching entries for a given name such as
|
||
|
IBM. Try this:
|
||
|
|
||
|
```python
|
||
|
>>> from collections import defaultdict
|
||
|
>>> byname = defaultdict(list)
|
||
|
>>> for s in portfolio:
|
||
|
byname[s['name']].append(s)
|
||
|
|
||
|
>>> byname['IBM']
|
||
|
[{'name': 'IBM', 'shares': 50, 'price': 91.1}, {'name': 'IBM', 'shares': 100, 'price': 70.44}]
|
||
|
>>> byname['AA']
|
||
|
[{'name': 'AA', 'shares': 100, 'price': 32.2}]
|
||
|
>>>
|
||
|
```
|
||
|
|
||
|
The key feature that makes this work is that a defaultdict
|
||
|
automatically initializes elements for you--allowing an insertion of a
|
||
|
new element and an `append()` operation to be combined together.
|
||
|
|
||
|
## (c) Data Analysis Challenge
|
||
|
|
||
|
In the last exercise you just wrote some code to read CSV-data related
|
||
|
to the Chicago Transit Authority. For example, you can grab the data
|
||
|
as dictionaries like this:
|
||
|
|
||
|
```python
|
||
|
>>> import readrides
|
||
|
>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')
|
||
|
>>>
|
||
|
```
|
||
|
|
||
|
It would be a shame to do all of that work and then do nothing with
|
||
|
the data.
|
||
|
|
||
|
In this exercise, you task is this: write a program to answer the
|
||
|
following three questions:
|
||
|
|
||
|
1. How many bus routes exist in Chicago?
|
||
|
|
||
|
2. How many people rode the number 22 bus on February 2, 2011? What about any route on any date of your choosing?
|
||
|
|
||
|
3. What is the total number of rides taken on each bus route?
|
||
|
|
||
|
4. What five bus routes had the greatest ten-year increase in ridership from 2001 to 2011?
|
||
|
|
||
|
You are free to use any technique whatsoever to answer the above
|
||
|
questions as long as it's part of the Python standard library (i.e.,
|
||
|
built-in datatypes, standard library modules, etc.).
|
||
|
|
||
|
\[ [Solution](soln2_2.md) | [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]
|
||
|
|
||
|
----
|
||
|
`>>>` Advanced Python Mastery
|
||
|
`...` A course by [dabeaz](https://www.dabeaz.com)
|
||
|
`...` Copyright 2007-2023
|
||
|
|
||
|
. This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)
|