\[ [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \]

# Exercise 2.3

*Objectives:*

- Iterate like a pro

*Files Modified:* None.

Iteration is an essential Python skill. In this exercise, we look at
a number of common iteration idioms.

Start the exercise by grabbing some rows of data from a CSV file.

```python
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> headers
['name', 'shares', 'price']
>>> rows = list(f_csv)
>>> from pprint import pprint
>>> pprint(rows)
[['AA', '100', '32.20'],
 ['IBM', '50', '91.10'],
 ['CAT', '150', '83.44'],
 ['MSFT', '200', '51.23'],
 ['GE', '95', '40.37'],
 ['MSFT', '50', '65.10'],
 ['IBM', '100', '70.44']]
>>>
```
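
The interactive session above never closes `f`. Here is a minimal self-contained sketch of the same step that uses a `with` block, so the file is closed automatically. It assumes the CSV text is inlined through `io.StringIO` instead of being read from `Data/portfolio.csv`. The data is copied from the output shown above.

```python
import csv
import io

# Inline copy of the portfolio data (assumed here so the snippet runs on its own;
# in the exercise you would use open('Data/portfolio.csv') instead).
PORTFOLIO_CSV = """\
name,shares,price
AA,100,32.20
IBM,50,91.10
CAT,150,83.44
MSFT,200,51.23
GE,95,40.37
MSFT,50,65.10
IBM,100,70.44
"""

def read_rows(f):
    """Return (headers, rows) from an open CSV file object."""
    f_csv = csv.reader(f)
    headers = next(f_csv)   # first line holds the column names
    rows = list(f_csv)      # remaining lines, each a list of strings
    return headers, rows

with io.StringIO(PORTFOLIO_CSV) as f:
    headers, rows = read_rows(f)

print(headers)    # ['name', 'shares', 'price']
print(len(rows))  # 7
```

To use a real file, replace `io.StringIO(PORTFOLIO_CSV)` with `open('Data/portfolio.csv')`.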

## (a) Basic Iteration and Unpacking

The `for` statement iterates over any sequence of data. For example:

```python
>>> for row in rows:
        print(row)

['AA', '100', '32.20']
['IBM', '50', '91.10']
['CAT', '150', '83.44']
['MSFT', '200', '51.23']
['GE', '95', '40.37']
['MSFT', '50', '65.10']
['IBM', '100', '70.44']
>>>
```

Unpack the values into separate variables if you need to:

```python
>>> for name, shares, price in rows:
        print(name, shares, price)

AA 100 32.20
IBM 50 91.10
CAT 150 83.44
MSFT 200 51.23
GE 95 40.37
MSFT 50 65.10
IBM 100 70.44
>>>
```

It's somewhat common to use `_` or `__` as a throw-away variable if you don't care
about one or more of the values. For example:

```python
>>> for name, _, price in rows:
        print(name, price)

AA 32.20
IBM 91.10
CAT 83.44
MSFT 51.23
GE 40.37
MSFT 65.10
IBM 70.44
>>>
```
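
The unpacked variables are still strings, because `csv.reader` does no type conversion. The short sketch below computes the total cost of the portfolio by converting each field as it is unpacked. The `rows` list is inlined from the data above so the snippet stands alone.

```python
# Inline copy of the rows read earlier (assumed for self-containment).
rows = [
    ['AA', '100', '32.20'],
    ['IBM', '50', '91.10'],
    ['CAT', '150', '83.44'],
    ['MSFT', '200', '51.23'],
    ['GE', '95', '40.37'],
    ['MSFT', '50', '65.10'],
    ['IBM', '100', '70.44'],
]

total = 0.0
for name, shares, price in rows:
    # csv gives strings, so convert before doing arithmetic
    total += int(shares) * float(price)

print(round(total, 2))   # 44671.15
```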

If you don't know how many values are being unpacked, you can use `*` as a wildcard.
Try this experiment in grouping the data by name:

```python
>>> from collections import defaultdict
>>> byname = defaultdict(list)
>>> for name, *data in rows:
        byname[name].append(data)

>>> byname['IBM']
[['50', '91.10'], ['100', '70.44']]
>>> byname['CAT']
[['150', '83.44']]
>>> for shares, price in byname['IBM']:
        print(shares, price)

50 91.10
100 70.44
>>>
```
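
The starred name does not have to come last. It always collects its values into a list, which may be empty. Here is a small sketch using one row from the data above, plus a shorter two-item list to show the empty case:

```python
row = ['AA', '100', '32.20']

# Star at the end: everything after the first item
name, *data = row
print(name, data)           # AA ['100', '32.20']

# Star at the start: everything before the last item
*front, price = row
print(front, price)         # ['AA', '100'] 32.20

# Star in the middle: may capture nothing at all
first, *middle, last = ['GE', '40.37']
print(first, middle, last)  # GE [] 40.37
```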

## (b) Counting with enumerate()

`enumerate()` is a useful function if you ever need to keep a counter
or index while iterating. For example, suppose you wanted an extra row
number:

```python
>>> for rowno, row in enumerate(rows):
        print(rowno, row)

0 ['AA', '100', '32.20']
1 ['IBM', '50', '91.10']
2 ['CAT', '150', '83.44']
3 ['MSFT', '200', '51.23']
4 ['GE', '95', '40.37']
5 ['MSFT', '50', '65.10']
6 ['IBM', '100', '70.44']
>>>
```

You can combine this with unpacking, as long as you put parentheses around the
unpacked names so they match the structure of each `(index, row)` pair:

```python
>>> for rowno, (name, shares, price) in enumerate(rows):
        print(rowno, name, shares, price)

0 AA 100 32.20
1 IBM 50 91.10
2 CAT 150 83.44
3 MSFT 200 51.23
4 GE 95 40.37
5 MSFT 50 65.10
6 IBM 100 70.44
>>>
```
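
`enumerate()` also takes an optional `start` argument. One common use is reporting file line numbers when a row fails to convert. The sketch below adds a hypothetical bad row (`N/A` shares) that is not in the real data; it exists only to exercise the error path. Line 1 of the file is the header, so counting starts at 2.

```python
import csv
import io

# Hypothetical data: the 'N/A' row is invented to trigger a conversion error.
DATA = """\
name,shares,price
AA,100,32.20
IBM,N/A,91.10
CAT,150,83.44
"""

f_csv = csv.reader(io.StringIO(DATA))
next(f_csv)                      # skip the header line

records = []
errors = []
# start=2 because line 1 of the file is the header
for lineno, (name, shares, price) in enumerate(f_csv, start=2):
    try:
        records.append((name, int(shares), float(price)))
    except ValueError as e:
        errors.append(lineno)
        print(f'Line {lineno}: could not convert {shares!r}: {e}')

print(records)   # [('AA', 100, 32.2), ('CAT', 150, 83.44)]
```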

## (c) Using the zip() function

The `zip()` function is most commonly used to pair data. For example,
recall that you created a `headers` variable:

```python
>>> headers
['name', 'shares', 'price']
>>>
```

This might be useful to combine with the other row data:

```python
>>> row = rows[0]
>>> row
['AA', '100', '32.20']
>>> for col, val in zip(headers, row):
        print(col, val)

name AA
shares 100
price 32.20
>>>
```
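
`zip()` stops as soon as its shortest input runs out and silently drops any extra items. The sketch below uses a hypothetical short row that is missing its price to show the effect. On Python 3.10 and later, passing `strict=True` to `zip()` raises a `ValueError` when the lengths differ. The explicit length check shown here works on any version.

```python
headers = ['name', 'shares', 'price']
short_row = ['AA', '100']            # hypothetical row with a missing field

pairs = list(zip(headers, short_row))
print(pairs)   # [('name', 'AA'), ('shares', '100')]  ('price' is dropped)

# A simple guard if mismatched lengths should be treated as an error
if len(short_row) != len(headers):
    print('row length does not match headers')
```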

Or maybe you can use it to make a dictionary:

```python
>>> dict(zip(headers, row))
{'name': 'AA', 'shares': '100', 'price': '32.20'}
>>>
```
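
The values in that dictionary are still strings. One approach, sketched here and not the only way, is to zip a list of conversion functions against the row so that each column gets its own type. The `types` list is an assumption that matches the three portfolio columns.

```python
headers = ['name', 'shares', 'price']
row = ['AA', '100', '32.20']
types = [str, int, float]   # one converter per column, in header order

# Apply each converter to its matching value
converted = [func(val) for func, val in zip(types, row)]
record = dict(zip(headers, converted))
print(record)   # {'name': 'AA', 'shares': 100, 'price': 32.2}
```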

Or maybe a sequence of dictionaries:

```python
>>> for row in rows:
        record = dict(zip(headers, row))
        print(record)

{'name': 'AA', 'shares': '100', 'price': '32.20'}
{'name': 'IBM', 'shares': '50', 'price': '91.10'}
{'name': 'CAT', 'shares': '150', 'price': '83.44'}
{'name': 'MSFT', 'shares': '200', 'price': '51.23'}
{'name': 'GE', 'shares': '95', 'price': '40.37'}
{'name': 'MSFT', 'shares': '50', 'price': '65.10'}
{'name': 'IBM', 'shares': '100', 'price': '70.44'}
>>>
```
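
To keep the dictionaries instead of printing them, a list comprehension collects them into a list of records. This is the same kind of list-of-dictionaries structure that part (f) measures for memory use. The `rows` data is inlined here so the snippet runs on its own.

```python
headers = ['name', 'shares', 'price']
rows = [
    ['AA', '100', '32.20'],
    ['IBM', '50', '91.10'],
    ['CAT', '150', '83.44'],
    ['MSFT', '200', '51.23'],
    ['GE', '95', '40.37'],
    ['MSFT', '50', '65.10'],
    ['IBM', '100', '70.44'],
]

# One dict per row; every value is still a string
portfolio = [dict(zip(headers, row)) for row in rows]

print(len(portfolio))          # 7
print(portfolio[1]['name'])    # IBM
```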

## (d) Generator Expressions

A generator expression is almost exactly the same as a list
comprehension except that it does not create a list. Instead, it
creates an object that produces the results incrementally, typically
for consumption by iteration. Try a simple example:

```python
>>> nums = [1,2,3,4,5]
>>> squares = (x*x for x in nums)
>>> squares
<generator object <genexpr> at 0x37caa8>
>>> for n in squares:
        print(n)

1
4
9
16
25
>>>
```
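
One way to see that a generator expression does not build its results up front is to compare object sizes with `sys.getsizeof()`. Exact byte counts vary between Python versions and platforms, so this sketch only compares the two sizes and does not claim specific numbers.

```python
import sys

n = 1_000_000
as_list = [x*x for x in range(n)]   # builds all one million results now
as_gen = (x*x for x in range(n))    # builds nothing until iterated

# Size of the list object itself (its array of pointers), not counting the ints
print(sys.getsizeof(as_list))
# A small, fixed-size object whose size does not depend on n
print(sys.getsizeof(as_gen))
```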

You will notice that a generator expression can only be used once.
Watch what happens if you do the for-loop again:

```python
>>> for n in squares:
        print(n)

>>>
```
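
The second loop prints nothing because the generator is already exhausted, and it does not restart. If you need the values more than once, you have two options. You can create a new generator each time, for example by wrapping the expression in a function; `make_squares()` here is a hypothetical helper. Or you can store the results in a list. A sketch of both:

```python
nums = [1, 2, 3, 4, 5]

squares = (x*x for x in nums)
first_pass = list(squares)    # consumes the generator
second_pass = list(squares)   # already exhausted
print(first_pass, second_pass)   # [1, 4, 9, 16, 25] []

# Option 1: a function returns a fresh generator on every call
def make_squares():
    return (x*x for x in nums)

print(sum(make_squares()), max(make_squares()))   # 55 25

# Option 2: materialize once, then reuse the list freely
saved = [x*x for x in nums]
print(sum(saved), max(saved))   # 55 25
```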

You can manually get the results one at a time if you use the
`next()` function. Try this:

```python
>>> squares = (x*x for x in nums)
>>> next(squares)
1
>>> next(squares)
4
>>> next(squares)
9
>>>
```

Keep typing `next()` to see what happens when there is no
more data.
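
When the generator runs out, `next()` raises `StopIteration`. A `for` loop behaves roughly like calling `next()` repeatedly until that exception appears, as the simplified sketch below shows. The built-in `next()` also accepts a default value, which it returns instead of raising the exception.

```python
nums = [1, 2, 3, 4, 5]
squares = (x*x for x in nums)

# Roughly what `for n in squares: ...` does behind the scenes
it = iter(squares)        # a generator is its own iterator
collected = []
while True:
    try:
        n = next(it)
    except StopIteration:
        break             # no more data: the loop ends quietly
    collected.append(n)

print(collected)              # [1, 4, 9, 16, 25]
print(next(squares, None))    # None (the default, instead of StopIteration)
```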

If the task you are performing is more complicated, you can
still take advantage of generators by writing a generator function
and using the `yield` statement instead. For example:

```python
>>> def squares(nums):
        for x in nums:
            yield x*x

>>> for n in squares(nums):
        print(n)

1
4
9
16
25
>>>
```
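
A generator function can hold logic that would be awkward in a single expression, such as type conversion combined with filtering. The sketch below uses `parse_holdings()`, a hypothetical helper that is not part of the exercise files. Calling it does not run its body; the parsing only happens as the `for` loop pulls values out.

```python
import csv
import io

DATA = """\
name,shares,price
AA,100,32.20
IBM,50,91.10
CAT,150,83.44
"""

def parse_holdings(lines, min_shares=0):
    """Hypothetical generator: yield (name, shares, price) tuples with
    shares >= min_shares, converting types along the way."""
    rows = csv.reader(lines)
    next(rows)                        # skip the header line
    for name, shares, price in rows:
        shares = int(shares)
        if shares >= min_shares:
            yield name, shares, float(price)

gen = parse_holdings(io.StringIO(DATA), min_shares=100)  # nothing parsed yet
for holding in gen:
    print(holding)
# ('AA', 100, 32.2)
# ('CAT', 150, 83.44)
```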

We'll return to generator functions a little later in the course. For now,
just view such functions as having the interesting property of feeding
values to the `for` statement.

## (e) Generator Expressions and Reduction Functions

Generator expressions are especially useful for feeding data into
functions such as `sum()`, `min()`, `max()`, `any()`, and `all()`.
Try some examples using the portfolio data from earlier. Notice that
these examples omit the square brackets (`[]`) that a list comprehension
would need.

```python
>>> from readport import read_portfolio
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> sum(s['shares']*s['price'] for s in portfolio)
44671.15
>>> min(s['shares'] for s in portfolio)
50
>>> any(s['name'] == 'IBM' for s in portfolio)
True
>>> all(s['name'] == 'IBM' for s in portfolio)
False
>>> sum(s['shares'] for s in portfolio if s['name'] == 'IBM')
150
>>>
```
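
`min()` and `max()` also accept a `key` function, which lets them return a whole record instead of a single value. If the input might produce nothing, the `default` argument avoids a `ValueError`. This sketch uses a small inlined portfolio in place of `read_portfolio()` output; it assumes the same dictionary structure with numeric `shares` and `price`.

```python
# Inlined stand-in for read_portfolio() output (assumed structure)
portfolio = [
    {'name': 'AA', 'shares': 100, 'price': 32.20},
    {'name': 'IBM', 'shares': 50, 'price': 91.10},
    {'name': 'CAT', 'shares': 150, 'price': 83.44},
]

# Whole record with the largest position value
biggest = max(portfolio, key=lambda s: s['shares'] * s['price'])
print(biggest['name'])   # CAT

# Filtering can leave nothing; default avoids ValueError on an empty input
cheapest_xyz = min((s for s in portfolio if s['name'] == 'XYZ'),
                   key=lambda s: s['price'], default=None)
print(cheapest_xyz)      # None
```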

Here is a subtle use of a generator expression in making comma-separated
values:

```python
>>> s = ('GOOG',100,490.10)
>>> ','.join(s)
Traceback (most recent call last):
  ...
TypeError: sequence item 1: expected str instance, int found
>>> ','.join(str(x) for x in s)    # This works
'GOOG,100,490.1'
>>>
```
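
When you only need to apply a single function to each item, the built-in `map()` is a common alternative. It also produces its results lazily and gives the same string here:

```python
s = ('GOOG', 100, 490.10)

via_genexpr = ','.join(str(x) for x in s)
via_map = ','.join(map(str, s))    # map() is also lazy

print(via_map)                     # GOOG,100,490.1
print(via_genexpr == via_map)      # True
```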

The syntax in the above examples takes some getting used to, but the
critical point is that none of the operations ever create a fully
populated list of results. This can save a lot of memory. However,
don't go overboard with the syntax: a long or deeply nested generator
expression quickly becomes hard to read.

## (f) Saving a lot of memory

In [Exercise 2.1](ex2_1.md) you wrote a function
`read_rides_as_dicts()` that read the CTA bus data into a list of
dictionaries. Using it requires a lot of memory. For example,
let's find the day on which the route 22 bus had the greatest
ridership:

```python
>>> import tracemalloc
>>> tracemalloc.start()
>>> import readrides
>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')
>>> rt22 = [row for row in rows if row['route'] == '22']
>>> max(rt22, key=lambda row: row['rides'])
{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896}
>>> tracemalloc.get_traced_memory()
... look at result. Should be around 220MB
>>>
```

Now, let's try an example involving generators. Restart Python
and try this:

```python
>>> # RESTART
>>> import tracemalloc
>>> tracemalloc.start()
>>> import csv
>>> f = open('Data/ctabus.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> rows = (dict(zip(headers,row)) for row in f_csv)
>>> rt22 = (row for row in rows if row['route'] == '22')
>>> max(rt22, key=lambda row: int(row['rides']))
{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': '26896'}
>>> tracemalloc.get_traced_memory()
... look at result. Should be a LOT smaller than before
>>>
```

Keep in mind that you just processed the entire dataset as if it were
stored as a sequence of dictionaries. Yet, nowhere did you actually
create and store a list of dictionaries. Not all problems can be
structured in this way, but if you can work with data in an
iterative manner, generator expressions can save a huge amount of memory.
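
If the CTA data isn't at hand, you can reproduce the same effect with synthetic data. In this sketch, the `fake_rows()` generator and the size `N` are arbitrary assumptions. It measures peak traced memory for a generator-based pipeline and a list-based pipeline using `tracemalloc.get_traced_memory()`, which returns a `(current, peak)` tuple. The generator version runs first, so even an unreset peak could not inflate its number.

```python
import tracemalloc

N = 100_000  # arbitrary size, large enough to show a difference

def fake_rows(n):
    """Hypothetical stand-in for csv.reader rows: [route, rides] strings."""
    for i in range(n):
        yield [str(i % 50), str(i)]

def peak_memory(func):
    """Run func() under tracemalloc and return (result, peak bytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()   # also clears the collected traces
    return result, peak

def with_generators():
    rows = (dict(zip(['route', 'rides'], r)) for r in fake_rows(N))
    rt22 = (row for row in rows if row['route'] == '22')
    return max(rt22, key=lambda row: int(row['rides']))

def with_lists():
    rows = [dict(zip(['route', 'rides'], r)) for r in fake_rows(N)]
    rt22 = [row for row in rows if row['route'] == '22']
    return max(rt22, key=lambda row: int(row['rides']))

gen_result, gen_peak = peak_memory(with_generators)
list_result, list_peak = peak_memory(with_lists)

print(list_result == gen_result)   # True: same answer either way
print(gen_peak, list_peak)         # the list-based peak is far larger
```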

\[ [Solution](soln2_3.md) | [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \]

----
`>>>` Advanced Python Mastery
`...` A course by [dabeaz](https://www.dabeaz.com)
`...` Copyright 2007-2023

This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)