\[ [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]

# Exercise 2.2

*Objectives:*

- Work with various containers
- List/Set/Dict Comprehensions
- Collections module
- Data analysis challenge

Most Python programmers are generally familiar with lists, dictionaries,
tuples, and other basic datatypes. In this exercise, we'll put that
knowledge to work to solve various data analysis problems.
## (a) Preliminaries

To get started, let's review some basics with a slightly simpler dataset:
a portfolio of stock holdings. Create a file `readport.py` and put this
code in it:

```python
# readport.py

import csv

# A function that reads a file into a list of dicts
def read_portfolio(filename):
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)          # Skip the header row
        for row in rows:
            record = {
                'name' : row[0],
                'shares' : int(row[1]),
                'price' : float(row[2])
            }
            portfolio.append(record)
    return portfolio
```
The `read_portfolio()` function reads simple stock market data from the file `Data/portfolio.csv`. Use
it to read the file and look at the results:

```python
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> from pprint import pprint
>>> pprint(portfolio)
[{'name': 'AA', 'price': 32.2, 'shares': 100},
 {'name': 'IBM', 'price': 91.1, 'shares': 50},
 {'name': 'CAT', 'price': 83.44, 'shares': 150},
 {'name': 'MSFT', 'price': 51.23, 'shares': 200},
 {'name': 'GE', 'price': 40.37, 'shares': 95},
 {'name': 'MSFT', 'price': 65.1, 'shares': 50},
 {'name': 'IBM', 'price': 70.44, 'shares': 100}]
>>>
```

In this data, each row consists of a stock name, the number of shares
held, and a purchase price. There are multiple entries for
certain stock names such as MSFT and IBM.
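As an aside, `read_portfolio()` could also be written with the standard library's `csv.DictReader`, which builds each dictionary from the header row. The sketch below assumes that the first line of `Data/portfolio.csv` is exactly `name,shares,price`, and `read_portfolio_dictreader` is a hypothetical name. Because `DictReader` returns every value as a string, the numeric conversions are still done by hand.

```python
import csv

def read_portfolio_dictreader(filename):
    # Hypothetical variant of read_portfolio(); ASSUMES the CSV header
    # row is exactly "name,shares,price"
    portfolio = []
    with open(filename, newline='') as f:
        for row in csv.DictReader(f):
            # DictReader yields dicts of strings keyed by the header names,
            # so the numeric fields still need explicit conversion
            portfolio.append({
                'name': row['name'],
                'shares': int(row['shares']),
                'price': float(row['price']),
            })
    return portfolio
```

If the header assumption holds, both versions should return equal lists of dictionaries.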
## (b) Comprehensions

List, set, and dictionary comprehensions are useful tools for manipulating
data. For example, try these operations:

```python
>>> # Find all holdings of more than 100 shares
>>> [s for s in portfolio if s['shares'] > 100]
[{'name': 'CAT', 'shares': 150, 'price': 83.44},
 {'name': 'MSFT', 'shares': 200, 'price': 51.23}]
>>>

>>> # Compute total cost (shares * price)
>>> sum([s['shares']*s['price'] for s in portfolio])
44671.15
>>>

>>> # Find all unique stock names (set)
>>> { s['name'] for s in portfolio }
{'MSFT', 'IBM', 'AA', 'GE', 'CAT'}
>>>

>>> # Count the total shares of each stock
>>> totals = { s['name']: 0 for s in portfolio }
>>> for s in portfolio:
...     totals[s['name']] += s['shares']
...
>>> totals
{'AA': 100, 'IBM': 150, 'CAT': 150, 'MSFT': 250, 'GE': 95}
>>>
```
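Two small variations on the comprehension examples above may be worth trying: passing a generator expression to `sum()` avoids building a temporary list, and `dict.get()` with a default of `0` accumulates the per-stock totals without first initializing every key. This sketch copies the part (a) data into a literal so it runs on its own; `total_cost` and `shares_by_name` are hypothetical helper names.

```python
# Portfolio data copied from the pprint() output in part (a)
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.1},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

def total_cost(portfolio):
    # Generator expression: sum() consumes values one at a time,
    # so no intermediate list is created
    return sum(s['shares'] * s['price'] for s in portfolio)

def shares_by_name(portfolio):
    totals = {}
    for s in portfolio:
        # get() returns 0 the first time a name is seen
        totals[s['name']] = totals.get(s['name'], 0) + s['shares']
    return totals

print(round(total_cost(portfolio), 2))   # 44671.15
print(shares_by_name(portfolio))
# {'AA': 100, 'IBM': 150, 'CAT': 150, 'MSFT': 250, 'GE': 95}
```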
## (c) Collections

The `collections` module has a variety of classes for more specialized data
manipulation. For example, the share totals computed at the end of part (b) could also be obtained with a `Counter` like this:

```python
>>> from collections import Counter
>>> totals = Counter()
>>> for s in portfolio:
...     totals[s['name']] += s['shares']
...
>>> totals
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>
```
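A `Counter` can also be built directly from an iterable, in which case it counts how many times each item appears rather than summing a field. The sketch below, which again copies the part (a) data into a literal so it runs standalone, uses this to count how many separate holdings each stock has and to list the names that repeat.

```python
from collections import Counter

# Portfolio data copied from the pprint() output in part (a)
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.1},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# Built from an iterable, a Counter tallies occurrences:
# each holding adds 1 to the count for its stock name
entries = Counter(s['name'] for s in portfolio)
print(entries['MSFT'], entries['AA'])   # 2 1

# Stock names that appear in more than one holding
repeated = sorted(name for name, count in entries.items() if count > 1)
print(repeated)                         # ['IBM', 'MSFT']
```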
Counters also support other kinds of operations, such as ranking
and arithmetic. For example:

```python
>>> # Get the two most common holdings
>>> totals.most_common(2)
[('MSFT', 250), ('IBM', 150)]
>>>

>>> # Adding counters together
>>> more = Counter()
>>> more['IBM'] = 75
>>> more['AA'] = 200
>>> more['ACME'] = 30
>>> more
Counter({'AA': 200, 'IBM': 75, 'ACME': 30})
>>> totals
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> totals + more
Counter({'AA': 300, 'MSFT': 250, 'IBM': 225, 'CAT': 150, 'GE': 95, 'ACME': 30})
>>>
```
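Subtraction is supported too, but the `-` operator keeps only counts that are strictly positive, so a name whose difference is zero or negative disappears from the result. `Counter.subtract()` instead updates a counter in place and keeps zero and negative counts. The sketch rebuilds the `totals` and `more` counters from the session above so it runs on its own.

```python
from collections import Counter

# The same two counters built in the interactive session above
totals = Counter({'AA': 100, 'IBM': 150, 'CAT': 150, 'MSFT': 250, 'GE': 95})
more = Counter({'IBM': 75, 'AA': 200, 'ACME': 30})

# '-' drops non-positive results: AA (100 - 200) and
# ACME (0 - 30) do not appear in the difference
difference = totals - more
print(difference)

# subtract() modifies the counter in place and keeps negative counts
remaining = totals.copy()
remaining.subtract(more)
print(remaining['AA'], remaining['ACME'])   # -100 -30
```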
The `defaultdict` object can be used to group data. For example, suppose
you want to make it easy to find all matching entries for a given name such as
IBM. Try this:

```python
>>> from collections import defaultdict
>>> byname = defaultdict(list)
>>> for s in portfolio:
...     byname[s['name']].append(s)
...
>>> byname['IBM']
[{'name': 'IBM', 'shares': 50, 'price': 91.1}, {'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>> byname['AA']
[{'name': 'AA', 'shares': 100, 'price': 32.2}]
>>>
```

The key feature that makes this work is that a `defaultdict`
automatically initializes missing elements for you (here, with an empty list),
which allows the insertion of a new element and an `append()` operation to be combined.
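For comparison, here is roughly what the same grouping looks like with a plain dictionary, where `dict.setdefault()` has to insert the empty list explicitly before each `append()`. This is a sketch using a trimmed copy of the part (a) data; `group_by_name` is a hypothetical helper name.

```python
def group_by_name(portfolio):
    # Plain-dict version of the defaultdict(list) grouping
    byname = {}
    for s in portfolio:
        # setdefault() stores [] only if the key is missing, and
        # returns the list under that key either way
        byname.setdefault(s['name'], []).append(s)
    return byname

# A trimmed copy of the part (a) data
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.2},
    {'name': 'IBM', 'shares': 50, 'price': 91.1},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

byname = group_by_name(portfolio)
print(len(byname['IBM']))   # 2
print('ACME' in byname)     # False
```

One practical difference: on the plain dictionary, looking up a missing name such as `byname['ACME']` raises `KeyError`, whereas on a `defaultdict` the same lookup quietly inserts and returns an empty list.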
## (d) Data Analysis Challenge

In the last exercise, you wrote some code to read CSV data related
to the Chicago Transit Authority. For example, you can grab the data
as dictionaries like this:

```python
>>> import readrides
>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')
>>>
```

It would be a shame to do all of that work and then do nothing with
the data.

In this exercise, your task is to write a program that answers the
following four questions:

1. How many bus routes exist in Chicago?

2. How many people rode the number 22 bus on February 2, 2011? What about any route on any date of your choosing?

3. What is the total number of rides taken on each bus route?

4. Which five bus routes had the greatest ten-year increase in ridership from 2001 to 2011?

You are free to use any technique whatsoever to answer the above
questions as long as it's part of the Python standard library (i.e.,
built-in datatypes, standard library modules, etc.).
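One possible starting point, sketched under assumptions that should be checked against your own `read_rides_as_dicts()` from Exercise 2.1: each row is taken to be a dictionary with a string `'route'`, a `'date'` string in `MM/DD/YYYY` form, and an integer `'rides'` count. The function names are hypothetical, and the sketch uses only a set comprehension, generator expressions, and `collections.Counter`.

```python
from collections import Counter

# ASSUMPTION: each row is a dict with at least 'route' (str),
# 'date' (str, formatted MM/DD/YYYY) and 'rides' (int)

def count_routes(rows):
    # Q1: a set comprehension keeps each route name only once
    return len({row['route'] for row in rows})

def rides_on(rows, route, date):
    # Q2: total rides for one route on one date ('MM/DD/YYYY')
    return sum(row['rides'] for row in rows
               if row['route'] == route and row['date'] == date)

def rides_per_route(rows):
    # Q3: a Counter keyed by route, summing the daily ride counts
    totals = Counter()
    for row in rows:
        totals[row['route']] += row['rides']
    return totals

def biggest_increases(rows, start='2001', end='2011', n=5):
    # Q4: yearly totals keyed by (route, year), then the change per route
    by_year = Counter()
    for row in rows:
        year = row['date'][-4:]        # last four characters of MM/DD/YYYY
        by_year[row['route'], year] += row['rides']
    routes = {route for route, _ in by_year}
    # Missing (route, year) keys read as 0 in a Counter
    change = Counter({r: by_year[r, end] - by_year[r, start] for r in routes})
    return change.most_common(n)
```

With the real data, a call such as `rides_on(rows, '22', '02/02/2011')` would address question 2, provided the route and date formats match the assumptions above.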
\[ [Solution](soln2_2.md) | [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]

----

`>>>` Advanced Python Mastery
`...` A course by [dabeaz](https://www.dabeaz.com)
`...` Copyright 2007-2023

This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)