Initial commit

2023-07-16 20:21:00 -05:00
parent 82e815fab2
commit 7d4b30154a
259 changed files with 600233 additions and 2 deletions
--- a/Exercises/ex2_3.md
+++ b/Exercises/ex2_3.md
@@ -0,0 +1,365 @@
+\[ [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \]
+
+# Exercise 2.3
+
+*Objectives:*
+
+- Iterate like a pro
+
+*Files Modified:* None.
+
+Iteration is an essential Python skill.  In this exercise, we look at
+a number of common iteration idioms.
+
+Start the exercise by grabbing some rows of data from a CSV file.
+
+```python
+>>> import csv
+>>> f = open('Data/portfolio.csv')
+>>> f_csv = csv.reader(f)
+>>> headers = next(f_csv)
+>>> headers
+['name', 'shares', 'price']
+>>> rows = list(f_csv)
+>>> from pprint import pprint
+>>> pprint(rows)
+[['AA', '100', '32.20'],
+ ['IBM', '50', '91.10'],
+ ['CAT', '150', '83.44'],
+ ['MSFT', '200', '51.23'],
+ ['GE', '95', '40.37'],
+ ['MSFT', '50', '65.10'],
+ ['IBM', '100', '70.44']]
+>>>
+```
+
+## (a) Basic Iteration and Unpacking
+
+The `for` statement iterates over any iterable, such as a list of rows. For example:
+
+```python
+>>> for row in rows:
+        print(row)
+
+['AA', '100', '32.20']
+['IBM', '50', '91.10']
+['CAT', '150', '83.44']
+['MSFT', '200', '51.23']
+['GE', '95', '40.37']
+['MSFT', '50', '65.10']
+['IBM', '100', '70.44']
+>>>
+```
+
+Unpack the values into separate variables if you need to:
+
+```python
+>>> for name, shares, price in rows:
+        print(name, shares, price)
+
+AA 100 32.20
+IBM 50 91.10
+CAT 150 83.44
+MSFT 200 51.23
+GE 95 40.37
+MSFT 50 65.10
+IBM 100 70.44
+>>>
+```
+
+It's somewhat common to use `_` or `__` as a throw-away variable if you don't care
+about one or more of the values.  For example:
+
+```python
+>>> for name, _, price in rows:
+        print(name, price)
+
+AA 32.20
+IBM 91.10
+CAT 83.44
+MSFT 51.23
+GE 40.37
+MSFT 65.10
+IBM 70.44
+>>>
+```
+
+If you don't know how many values are being unpacked, you can use `*` as a wildcard.
+Try this experiment in grouping the data by name:
+
+```python
+>>> from collections import defaultdict
+>>> byname = defaultdict(list)
+>>> for name, *data in rows:
+        byname[name].append(data)
+
+>>> byname['IBM']
+[['50', '91.10'], ['100', '70.44']]
+>>> byname['CAT']
+[['150', '83.44']]
+>>> for shares, price in byname['IBM']:
+        print(shares, price)
+
+50 91.10
+100 70.44
+>>>
+```
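
The `*` wildcard is not limited to the end of the target list. Here is a minimal sketch, written as a script rather than a REPL session, with the portfolio rows inlined so it runs on its own. The names `front` and `total_shares` are chosen here for illustration and are not part of the exercise.

```python
from collections import defaultdict

# Portfolio rows copied from the CSV data shown above
rows = [
    ['AA', '100', '32.20'],
    ['IBM', '50', '91.10'],
    ['CAT', '150', '83.44'],
    ['MSFT', '200', '51.23'],
    ['GE', '95', '40.37'],
    ['MSFT', '50', '65.10'],
    ['IBM', '100', '70.44'],
]

# The starred name always receives a list, wherever it appears
name, *data = rows[0]      # data collects everything after the name
*front, price = rows[0]    # front collects everything before the price

# Total the shares by name, ignoring the price column
total_shares = defaultdict(int)
for name, shares, _ in rows:
    total_shares[name] += int(shares)

print(data)                  # ['100', '32.20']
print(front)                 # ['AA', '100']
print(total_shares['IBM'])   # 150
```

Only one starred name is allowed per unpacking target, so Python always knows how to split the remaining items.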
+
+## (b) Counting with enumerate()
+
+`enumerate()` is a useful function if you ever need to keep a counter
+or index while iterating. For example, suppose you wanted an extra row
+number:
+
+```python
+>>> for rowno, row in enumerate(rows):
+        print(rowno, row)
+
+0 ['AA', '100', '32.20']
+1 ['IBM', '50', '91.10']
+2 ['CAT', '150', '83.44']
+3 ['MSFT', '200', '51.23']
+4 ['GE', '95', '40.37']
+5 ['MSFT', '50', '65.10']
+6 ['IBM', '100', '70.44']
+>>>
+```
+
+You can combine this with unpacking if you're careful about how you structure it:
+
+```python
+>>> for rowno, (name, shares, price) in enumerate(rows):
+        print(rowno, name, shares, price)
+
+0 AA 100 32.20
+1 IBM 50 91.10
+2 CAT 150 83.44
+3 MSFT 200 51.23
+4 GE 95 40.37
+5 MSFT 50 65.10
+6 IBM 100 70.44
+>>> 
+```
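
`enumerate()` also accepts a `start` argument, which helps when the counter should match line numbers in a file. The sketch below assumes a made-up bad record (`'BAD'`) purely for illustration; it is not in `Data/portfolio.csv`.

```python
# Rows as csv.reader would produce them; the last row is a
# hypothetical bad record added for this example
rows = [
    ['AA', '100', '32.20'],
    ['IBM', '50', '91.10'],
    ['BAD', 'N/A', '10.00'],
]

bad_lines = []
# start=2 because line 1 of the file holds the headers
for lineno, (name, shares, price) in enumerate(rows, start=2):
    try:
        int(shares)
    except ValueError:
        bad_lines.append(lineno)
        print(f'Line {lineno}: bad share count {shares!r}')
```

Note the parentheses around `(name, shares, price)`: `enumerate()` yields pairs, and the second item of each pair is the row being unpacked.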
+
+## (c) Using the zip() function
+
+The `zip()` function is most commonly used to pair data.  For example,
+recall that you created a `headers` variable:
+
+```python
+>>> headers
+['name', 'shares', 'price']
+>>>
+```
+
+This might be useful to combine with the other row data:
+
+```python
+>>> row = rows[0]
+>>> row
+['AA', '100', '32.20']
+>>> for col, val in zip(headers, row):
+        print(col, val)
+
+name AA
+shares 100
+price 32.20
+>>>
+```
+
+Or maybe you can use it to make a dictionary:
+
+```python
+>>> dict(zip(headers, row))
+{'name': 'AA', 'shares': '100', 'price': '32.20'}
+>>>
+```
+
+Or maybe a sequence of dictionaries:
+
+```python
+>>> for row in rows:
+        record = dict(zip(headers, row))
+        print(record)
+
+{'name': 'AA', 'shares': '100', 'price': '32.20'}
+{'name': 'IBM', 'shares': '50', 'price': '91.10'}
+{'name': 'CAT', 'shares': '150', 'price': '83.44'}
+{'name': 'MSFT', 'shares': '200', 'price': '51.23'}
+{'name': 'GE', 'shares': '95', 'price': '40.37'}
+{'name': 'MSFT', 'shares': '50', 'price': '65.10'}
+{'name': 'IBM', 'shares': '100', 'price': '70.44'}
+>>>
+```
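
`zip()` accepts more than two inputs, which makes it possible to convert values while building each record. This is a sketch only: the `types` list is an assumption about how each column should be converted and is not defined anywhere in the exercise.

```python
headers = ['name', 'shares', 'price']
types = [str, int, float]      # assumed conversion for each column
row = ['AA', '100', '32.20']

# Pair each header with its conversion function and raw value
record = {name: func(val) for name, func, val in zip(headers, types, row)}
print(record)   # {'name': 'AA', 'shares': 100, 'price': 32.2}

# zip() stops at the shortest input; extra items are silently dropped
short = list(zip(headers, ['AA', '100']))
print(short)    # [('name', 'AA'), ('shares', '100')]
```

The second part is worth remembering when a row has missing columns: no error is raised, and the resulting record simply has fewer keys.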
+
+## (d) Generator Expressions
+
+A generator expression is almost exactly the same as a list
+comprehension except that it does not create a list.  Instead, it
+creates an object that produces the results incrementally--typically
+for consumption by iteration. Try a simple example:
+
+```python
+>>> nums = [1,2,3,4,5]
+>>> squares = (x*x for x in nums)
+>>> squares
+<generator object <genexpr> at 0x37caa8>
+>>> for n in squares:
+        print(n)
+
+1
+4
+9
+16
+25
+>>>
+```
+
+You will notice that a generator expression can only be used once.
+Watch what happens if you do the for-loop again:
+
+```python
+>>> for n in squares:
+        print(n)
+
+>>>
+```
+
+You can manually get the results one-at-a-time if you use the
+`next()` function. Try this:
+
+```python
+>>> squares = (x*x for x in nums)
+>>> next(squares)
+1
+>>> next(squares)
+4
+>>> next(squares)
+9
+>>>
+```
+
+Keep typing `next()` to see what happens when there is no
+more data.
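
For reference after you have tried it yourself, the one-shot behavior can be summarized in script form. This is a small sketch; the variable names are chosen here for illustration.

```python
nums = [1, 2, 3]

squares = (x*x for x in nums)
first_pass = list(squares)     # consumes every value
second_pass = list(squares)    # nothing left to produce

squares = (x*x for x in nums)  # make a fresh generator to start over
values = [next(squares), next(squares), next(squares)]

# Asking for more data than exists raises StopIteration
try:
    next(squares)
    exhausted = False
except StopIteration:
    exhausted = True

# Supplying a default makes next() return it instead of raising
leftover = next(squares, None)

print(first_pass, second_pass)   # [1, 4, 9] []
print(exhausted, leftover)       # True None
```

A `for` loop handles `StopIteration` for you, which is why the second loop over an exhausted generator silently prints nothing.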
+
+If the task you are performing is more complicated, you can
+still take advantage of generators by writing a generator function 
+and using the `yield` statement instead.
+For example:
+
+```python
+>>> def squares(nums):
+        for x in nums:
+            yield x*x
+
+>>> for n in squares(nums):
+        print(n)
+
+1
+4
+9
+16
+25
+>>>
+```
+
+We'll return to generator functions a little later in the course--for now,
+just view such functions as having the interesting property of feeding
+values to the `for`-statement.
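
One property worth seeing early is that calling a generator function runs none of its body; the code only executes as values are requested. The sketch below adds a `log` list, which is an illustrative device and not part of the exercise, to record when the body runs.

```python
log = []   # records when the function body actually runs

def squares(nums):
    log.append('started')
    for x in nums:
        yield x*x
    log.append('finished')

gen = squares([1, 2, 3])
before = list(log)     # snapshot taken before any value is requested
first = next(gen)      # runs the body up to the first yield
rest = list(gen)       # runs the remainder of the body

print(before)   # []
print(first)    # 1
print(rest)     # [4, 9]
print(log)      # ['started', 'finished']
```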
+
+## (e) Generator Expressions and Reduction Functions
+
+Generator expressions are especially useful for feeding data into
+functions such as `sum()`, `min()`, `max()`,
+`any()`, etc.   Try some examples using the portfolio data from
+earlier.  Carefully observe that these examples are missing some
+extra square brackets ([]) that appeared when using list comprehensions.
+
+```python
+>>> from readport import read_portfolio
+>>> portfolio = read_portfolio('Data/portfolio.csv')
+>>> sum(s['shares']*s['price'] for s in portfolio)
+44671.15
+>>> min(s['shares'] for s in portfolio)
+50
+>>> any(s['name'] == 'IBM' for s in portfolio)
+True
+>>> all(s['name'] == 'IBM' for s in portfolio)
+False
+>>> sum(s['shares'] for s in portfolio if s['name'] == 'IBM')
+150
+>>>
+```
+
+Here is a subtle use of a generator expression in making
+comma-separated values:
+
+```python
+>>> s = ('GOOG',100,490.10)
+>>> ','.join(s)
+... observe that it fails ...
+>>> ','.join(str(x) for x in s)    # This works
+'GOOG,100,490.1'
+>>>
+```
+
+The syntax in the above examples takes some getting used to, but the
+critical point is that none of the operations ever creates a fully
+populated list of results.  This can save a lot of memory.  However,
+take care not to go overboard with the syntax.
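
Because results are produced incrementally, functions such as `any()` can stop as soon as the answer is known. The sketch below inlines the portfolio as a list of dictionaries, assuming that is the shape `read_portfolio()` returns; the `names()` helper and `checked` list are hypothetical and exist only to show which records were examined.

```python
# Assumed shape of the data returned by read_portfolio()
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
    {'name': 'MSFT', 'shares': 200, 'price': 51.23},
    {'name': 'GE', 'shares': 95, 'price': 40.37},
    {'name': 'MSFT', 'shares': 50, 'price': 65.10},
    {'name': 'IBM', 'shares': 100, 'price': 70.44},
]

# No intermediate list of products is ever built
total = sum(s['shares'] * s['price'] for s in portfolio)

checked = []   # names that any() actually looked at

def names(records):
    for s in records:
        checked.append(s['name'])
        yield s['name']

# any() stops at the first match, so later records are never touched
has_ibm = any(name == 'IBM' for name in names(portfolio))

print(round(total, 2))   # 44671.15
print(has_ibm)           # True
print(checked)           # ['AA', 'IBM']
```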
+
+## (f) Saving a lot of memory
+
+In [Exercise 2.1](ex2_1.md) you wrote a function
+`read_rides_as_dicts()` that read the CTA bus data into a list of
+dictionaries.  Using it requires a lot of memory. For example,
+let's find the day on which the route 22 bus had the greatest
+ridership:
+
+```python
+>>> import tracemalloc
+>>> tracemalloc.start()
+>>> import readrides
+>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')
+>>> rt22 = [row for row in rows if row['route'] == '22']
+>>> max(rt22, key=lambda row: row['rides'])
+{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896}
+>>> tracemalloc.get_traced_memory()
+... look at result. Should be around 220MB
+>>>
+```
+
+Now, let's try an example involving generators. Restart Python
+and try this:
+
+```python
+>>> # RESTART
+>>> import tracemalloc
+>>> tracemalloc.start()
+>>> import csv
+>>> f = open('Data/ctabus.csv')
+>>> f_csv = csv.reader(f)
+>>> headers = next(f_csv)
+>>> rows = (dict(zip(headers,row)) for row in f_csv)
+>>> rt22 = (row for row in rows if row['route'] == '22')
+>>> max(rt22, key=lambda row: int(row['rides']))
+{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': '26896'}
+>>> tracemalloc.get_traced_memory()
+... look at result. Should be a LOT smaller than before
+>>>
+```
+
+Keep in mind that you just processed the entire dataset as if it were
+stored as a sequence of dictionaries.  Yet, nowhere did you actually
+create and store a list of dictionaries.   Not all problems can be
+structured in this way, but if you can work with data in an
+iterative manner, generator expressions can save a huge amount of memory.
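
If you do not have the CTA data at hand, the same effect can be seen with synthetic records. This is a rough sketch under stated assumptions: `N`, `make_row()`, and `measure()` are invented for the example, and the exact byte counts will vary by Python version and platform.

```python
import tracemalloc

N = 50_000   # number of synthetic records, chosen arbitrarily

def make_row(n):
    # Stand-in for one record of bus data
    return {'route': '22', 'rides': n % 1000}

def measure(make_rows):
    # Trace allocations while the rows are created and consumed
    tracemalloc.start()
    best = max(make_rows(), key=lambda row: row['rides'])
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return best, peak

# List comprehension: every record exists in memory at once
best_list, peak_list = measure(lambda: [make_row(n) for n in range(N)])

# Generator expression: only a few records are alive at any moment
best_gen, peak_gen = measure(lambda: (make_row(n) for n in range(N)))

print(best_list == best_gen)   # True
print(peak_list, peak_gen)     # the second number should be far smaller
```

Both versions find the same record; only the peak memory differs.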
+
+\[ [Solution](soln2_3.md) | [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \]
+
+----
+`>>>` Advanced Python Mastery  
+`...` A course by [dabeaz](https://www.dabeaz.com)  
+`...` Copyright 2007-2023  
+
+![](https://i.creativecommons.org/l/by-sa/4.0/88x31.png). This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)