\[ [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]

# Exercise 2.2

*Objectives:*

- Work with various containers
- List/Set/Dict Comprehensions
- Collections module
- Data analysis challenge

Most Python programmers are generally familiar with lists, dictionaries,
tuples, and other basic datatypes. In this exercise, we'll put that
knowledge to work to solve various data analysis problems.

## (a) Preliminaries

To get started, let's review some basics with a slightly simpler dataset--
a portfolio of stock holdings. Create a file `readport.py` and put this
code in it:

```python
# readport.py

import csv

# A function that reads a file into a list of dicts
def read_portfolio(filename):
    portfolio = []
    with open(filename) as f:
        rows = csv.reader(f)
        headers = next(rows)
        for row in rows:
            record = {
                'name' : row[0],
                'shares' : int(row[1]),
                'price' : float(row[2])
            }
            portfolio.append(record)
    return portfolio
```

This file reads some simple stock market data in the file `Data/portfolio.csv`.  Use
the function to read the file and look at the results:

```python
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> from pprint import pprint
>>> pprint(portfolio)
[{'name': 'AA', 'price': 32.2, 'shares': 100},
 {'name': 'IBM', 'price': 91.1, 'shares': 50},
 {'name': 'CAT', 'price': 83.44, 'shares': 150},
 {'name': 'MSFT', 'price': 51.23, 'shares': 200},
 {'name': 'GE', 'price': 40.37, 'shares': 95},
 {'name': 'MSFT', 'price': 65.1, 'shares': 50},
 {'name': 'IBM', 'price': 70.44, 'shares': 100}]
>>>
```

In this data, each row consists of a stock name, a number of held
shares, and a purchase price.   There are multiple entries for
certain stock names such as MSFT and IBM.

## (b) Comprehensions

List, set, and dictionary comprehensions can be a useful tool for manipulating
data.  For example, try these operations:

```python
>>> # Find all holdings more than 100 shares
>>> [s for s in portfolio if s['shares'] > 100]
[{'name': 'CAT', 'shares': 150, 'price': 83.44}, 
 {'name': 'MSFT', 'shares': 200, 'price': 51.23}]

>>> # Compute total cost (shares * price)
>>> sum([s['shares']*s['price'] for s in portfolio])
44671.15
>>>

>>> # Find all unique stock names (set)
>>> { s['name'] for s in portfolio }
{'MSFT', 'IBM', 'AA', 'GE', 'CAT'}
>>>

>>> # Count the total shares of each of stock
>>> totals = { s['name']: 0 for s in portfolio }
>>> for s in portfolio:
        totals[s['name']] += s['shares']

>>> totals
{'AA': 100, 'IBM': 150, 'CAT': 150, 'MSFT': 250, 'GE': 95}
>>> 
```

## (c) Collections

The `collections` module has a variety of classes for more specialized data
manipulation.  For example, the last example could be solved with a `Counter` like this:

```python
>>> from collections import Counter
>>> totals = Counter()
>>> for s in portfolio:
        totals[s['name']] += s['shares']

>>> totals
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>>
```

Counters are interesting in that they support other kinds of operations such as ranking
and mathematics.  For example:

```python
>>> # Get the two most common holdings
>>> totals.most_common(2)
[('MSFT', 250), ('IBM', 150)]
>>>

>>> # Adding counters together
>>> more = Counter()
>>> more['IBM'] = 75
>>> more['AA'] = 200
>>> more['ACME'] = 30
>>> more
Counter({'AA': 200, 'IBM': 75, 'ACME': 30})
>>> totals
Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
>>> totals + more
Counter({'AA': 300, 'MSFT': 250, 'IBM': 225, 'CAT': 150, 'GE': 95, 'ACME': 30})
>>> 
```

The `defaultdict` object can be used to group data.  For example, suppose
you want to make it easy to find all matching entries for a given name such as
IBM.  Try this:

```python
>>> from collections import defaultdict
>>> byname = defaultdict(list)
>>> for s in portfolio:
        byname[s['name']].append(s)

>>> byname['IBM']
[{'name': 'IBM', 'shares': 50, 'price': 91.1}, {'name': 'IBM', 'shares': 100, 'price': 70.44}]
>>> byname['AA']
[{'name': 'AA', 'shares': 100, 'price': 32.2}]
>>>
```

The key feature that makes this work is that a defaultdict
automatically initializes elements for you--allowing an insertion of a
new element and an `append()` operation to be combined together.

## (c) Data Analysis Challenge

In the last exercise you just wrote some code to read CSV-data related
to the Chicago Transit Authority.  For example, you can grab the data
as dictionaries like this:

```python
>>> import readrides
>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')
>>>
```

It would be a shame to do all of that work and then do nothing with
the data.

In this exercise, your task is this: write a program to answer the
following three questions:

1. How many bus routes exist in Chicago?

2. How many people rode the number 22 bus on February 2, 2011?  What about any route on any date of your choosing?

3. What is the total number of rides taken on each bus route?

4. What five bus routes had the greatest ten-year increase in ridership from 2001 to 2011?

You are free to use any technique whatsoever to answer the above
questions as long as it's part of the Python standard library (i.e.,
built-in datatypes, standard library modules, etc.). 

\[ [Solution](soln2_2.md) | [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]

----
`>>>` Advanced Python Mastery  
`...` A course by [dabeaz](https://www.dabeaz.com)  
`...` Copyright 2007-2023  

![](https://i.creativecommons.org/l/by-sa/4.0/88x31.png). This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)