Initial commit

2023-07-16 20:21:00 -05:00
parent 82e815fab2
commit 7d4b30154a
259 changed files with 600233 additions and 2 deletions
--- a/Exercises/ex2_2.md
+++ b/Exercises/ex2_2.md
@@ -0,0 +1,192 @@
+\[ [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]
+
+# Exercise 2.2
+
+*Objectives:*
+
+- Work with various containers
+- List/Set/Dict Comprehensions
+- Collections module
+- Data analysis challenge
+
+Most Python programmers are generally familiar with lists, dictionaries,
+tuples, and other basic datatypes. In this exercise, we'll put that
+knowledge to work to solve various data analysis problems.
+
+## (a) Preliminaries
+
+To get started, let's review some basics with a slightly simpler dataset--
+a portfolio of stock holdings. Create a file `readport.py` and put this
+code in it:
+
+```python
+# readport.py
+
+import csv
+
+# A function that reads a file into a list of dicts
+def read_portfolio(filename):
+    portfolio = []
+    with open(filename) as f:
+        rows = csv.reader(f)
+        headers = next(rows)
+        for row in rows:
+            record = {
+                'name' : row[0],
+                'shares' : int(row[1]),
+                'price' : float(row[2])
+                }
+            portfolio.append(record)
+    return portfolio
+```
+
+This function reads simple stock market data from a CSV file such as `Data/portfolio.csv`.  Use
+it to read the file and look at the results:
+
+```python
+>>> portfolio = read_portfolio('Data/portfolio.csv')
+>>> from pprint import pprint
+>>> pprint(portfolio)
+[{'name': 'AA', 'price': 32.2, 'shares': 100},
+ {'name': 'IBM', 'price': 91.1, 'shares': 50},
+ {'name': 'CAT', 'price': 83.44, 'shares': 150},
+ {'name': 'MSFT', 'price': 51.23, 'shares': 200},
+ {'name': 'GE', 'price': 40.37, 'shares': 95},
+ {'name': 'MSFT', 'price': 65.1, 'shares': 50},
+ {'name': 'IBM', 'price': 70.44, 'shares': 100}]
+>>>
+```
+
+In this data, each row consists of a stock name, the number of shares
+held, and a purchase price. Certain stock names, such as MSFT and IBM,
+appear in multiple entries.
+
+## (b) Comprehensions
+
+List, set, and dictionary comprehensions are useful tools for manipulating
+data.  For example, try these operations:
+
+```python
+>>> # Find all holdings more than 100 shares
+>>> [s for s in portfolio if s['shares'] > 100]
+[{'name': 'CAT', 'shares': 150, 'price': 83.44}, 
+ {'name': 'MSFT', 'shares': 200, 'price': 51.23}]
+
+>>> # Compute total cost (shares * price)
+>>> sum([s['shares']*s['price'] for s in portfolio])
+44671.15
+>>>
+
+>>> # Find all unique stock names (set)
+>>> { s['name'] for s in portfolio }
+{'MSFT', 'IBM', 'AA', 'GE', 'CAT'}
+>>>
+
+>>> # Count the total shares of each stock
+>>> totals = { s['name']: 0 for s in portfolio }
+>>> for s in portfolio:
+        totals[s['name']] += s['shares']
+
+>>> totals
+{'AA': 100, 'IBM': 150, 'CAT': 150, 'MSFT': 250, 'GE': 95}
+>>> 
+```
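+
+As a variation on the examples above, here is a minimal sketch of two related idioms: passing a generator expression directly to `sum()`, and building the per-stock totals with a single dictionary comprehension. The `portfolio` list is copied from the output shown in part (a) so that the snippet runs on its own; the names `total_cost` and `names` are chosen here for illustration only.
+
+```python
+# Sample data copied from the portfolio output shown in part (a)
+portfolio = [
+    {'name': 'AA', 'shares': 100, 'price': 32.2},
+    {'name': 'IBM', 'shares': 50, 'price': 91.1},
+    {'name': 'CAT', 'shares': 150, 'price': 83.44},
+    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
+    {'name': 'GE', 'shares': 95, 'price': 40.37},
+    {'name': 'MSFT', 'shares': 50, 'price': 65.1},
+    {'name': 'IBM', 'shares': 100, 'price': 70.44},
+]
+
+# Generator expression: sum() consumes the values one at a time,
+# so no temporary list is built
+total_cost = sum(s['shares'] * s['price'] for s in portfolio)
+
+# Set comprehension for the unique names, then a dict comprehension
+# that sums the shares of every entry matching each name
+names = {s['name'] for s in portfolio}
+totals = {
+    name: sum(s['shares'] for s in portfolio if s['name'] == name)
+    for name in names
+}
+```
+
+The dictionary comprehension rescans the whole portfolio once per name. That is fine for a handful of rows, but the explicit loop shown earlier makes only a single pass and is likely the better choice for large datasets.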
+
+## (c) Collections
+
+The `collections` module has a variety of classes for more specialized data
+manipulation.  For example, the last example could be solved with a `Counter` like this:
+
+```python
+>>> from collections import Counter
+>>> totals = Counter()
+>>> for s in portfolio:
+        totals[s['name']] += s['shares']
+
+>>> totals
+Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
+>>>
+```
+
+Counters are interesting in that they support other kinds of operations such as ranking
+and mathematics.  For example:
+
+```python
+>>> # Get the two most common holdings
+>>> totals.most_common(2)
+[('MSFT', 250), ('IBM', 150)]
+>>>
+
+>>> # Adding counters together
+>>> more = Counter()
+>>> more['IBM'] = 75
+>>> more['AA'] = 200
+>>> more['ACME'] = 30
+>>> more
+Counter({'AA': 200, 'IBM': 75, 'ACME': 30})
+>>> totals
+Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
+>>> totals + more
+Counter({'AA': 300, 'MSFT': 250, 'IBM': 225, 'CAT': 150, 'GE': 95, 'ACME': 30})
+>>> 
+```
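+
+Subtraction works as well, with one detail worth knowing. The sketch below reuses the `totals` and `more` values from the session above (rebuilt here so that it runs standalone); the names `diff` and `net` are illustrative. The `-` operator keeps only positive counts, whereas the in-place `subtract()` method keeps zero and negative counts.
+
+```python
+from collections import Counter
+
+# Same values as the totals and more counters shown above
+totals = Counter({'MSFT': 250, 'IBM': 150, 'CAT': 150, 'AA': 100, 'GE': 95})
+more = Counter({'AA': 200, 'IBM': 75, 'ACME': 30})
+
+# The - operator discards any result that is zero or negative,
+# so 'AA' (100 - 200) and 'ACME' (0 - 30) do not appear in diff
+diff = totals - more
+
+# subtract() modifies a counter in place and keeps negative counts
+net = Counter(totals)      # copy, so totals itself is left unchanged
+net.subtract(more)
+```
+
+Here `diff` contains only MSFT, CAT, GE, and IBM, while `net` also records `-100` for AA and `-30` for ACME.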
+
+The `defaultdict` object can be used to group data.  For example, suppose
+you want to make it easy to find all matching entries for a given name such as
+IBM.  Try this:
+
+```python
+>>> from collections import defaultdict
+>>> byname = defaultdict(list)
+>>> for s in portfolio:
+        byname[s['name']].append(s)
+
+>>> byname['IBM']
+[{'name': 'IBM', 'shares': 50, 'price': 91.1}, {'name': 'IBM', 'shares': 100, 'price': 70.44}]
+>>> byname['AA']
+[{'name': 'AA', 'shares': 100, 'price': 32.2}]
+>>>
+```
+
+The key feature that makes this work is that a defaultdict
+automatically initializes missing keys for you (here, with an empty
+list), so inserting a new element and calling `append()` on it
+can be combined into a single step.
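+
+To see what `defaultdict` saves you, here is a rough side-by-side sketch of the same grouping done with a plain dictionary. The `portfolio` list is a shortened copy of the data from part (a), and `byname_plain` is a name made up for this comparison.
+
+```python
+from collections import defaultdict
+
+# Shortened copy of the portfolio data from part (a)
+portfolio = [
+    {'name': 'AA', 'shares': 100, 'price': 32.2},
+    {'name': 'IBM', 'shares': 50, 'price': 91.1},
+    {'name': 'IBM', 'shares': 100, 'price': 70.44},
+]
+
+# Plain dict: the key has to exist before append() can be called
+byname_plain = {}
+for s in portfolio:
+    if s['name'] not in byname_plain:
+        byname_plain[s['name']] = []
+    byname_plain[s['name']].append(s)
+
+# defaultdict(list): a missing key is filled in by calling list()
+# the first time it is looked up with []
+byname = defaultdict(list)
+for s in portfolio:
+    byname[s['name']].append(s)
+
+# Both approaches produce the same grouping
+same = (byname == byname_plain)
+
+# Caution: a [] lookup of a missing key creates an entry as a side effect
+before = 'ACME' in byname     # membership test does not create the key
+byname['ACME']                # this lookup does
+after = 'ACME' in byname
+```
+
+Because of that side effect, use `in` or `get()` when you only want to check whether a name is present in a defaultdict.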
+
+## (d) Data Analysis Challenge
+
+In the last exercise, you wrote some code to read CSV data related
+to the Chicago Transit Authority.  For example, you can grab the data
+as dictionaries like this:
+
+```python
+>>> import readrides
+>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')
+>>>
+```
+
+It would be a shame to do all of that work and then do nothing with
+the data.
+
+In this exercise, your task is this: write a program to answer the
+following four questions:
+
+1. How many bus routes exist in Chicago?
+
+2. How many people rode the number 22 bus on February 2, 2011?  What about any route on any date of your choosing?
+
+3. What is the total number of rides taken on each bus route?
+
+4. What five bus routes had the greatest ten-year increase in ridership from 2001 to 2011?
+
+You are free to use any technique whatsoever to answer the above
+questions as long as it's part of the Python standard library (i.e.,
+built-in datatypes, standard library modules, etc.). 
+
+\[ [Solution](soln2_2.md) | [Index](index.md) | [Exercise 2.1](ex2_1.md) | [Exercise 2.3](ex2_3.md) \]
+
+----
+`>>>` Advanced Python Mastery  
+`...` A course by [dabeaz](https://www.dabeaz.com)  
+`...` Copyright 2007-2023  
+
+![](https://i.creativecommons.org/l/by-sa/4.0/88x31.png). This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)