python-mastery/Exercises/ex2_3.md

\[ [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \]

# Exercise 2.3

*Objectives:*

- Iterate like a pro

*Files Modified:* None.

Iteration is an essential Python skill.  In this exercise, we look at
a number of common iteration idioms.

Start the exercise by grabbing some rows of data from a CSV file.

```python
>>> import csv
>>> f = open('Data/portfolio.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> headers
['name', 'shares', 'price']
>>> rows = list(f_csv)
>>> from pprint import pprint
>>> pprint(rows)
[['AA', '100', '32.20'],
 ['IBM', '50', '91.10'],
 ['CAT', '150', '83.44'],
 ['MSFT', '200', '51.23'],
 ['GE', '95', '40.37'],
 ['MSFT', '50', '65.10'],
 ['IBM', '100', '70.44']]
>>>
```

## (a) Basic Iteration and Unpacking

The `for` statement iterates over any sequence of data. For example:

```python
>>> for row in rows:
        print(row)

['AA', '100', '32.20']
['IBM', '50', '91.10']
['CAT', '150', '83.44']
['MSFT', '200', '51.23']
['GE', '95', '40.37']
['MSFT', '50', '65.10']
['IBM', '100', '70.44']
>>>
```

Unpack the values into separate variables if you need to:

```python
>>> for name, shares, price in rows:
        print(name, shares, price)

AA 100 32.20
IBM 50 91.10
CAT 150 83.44
MSFT 200 51.23
GE 95 40.37
MSFT 50 65.10
IBM 100 70.44
>>>
```

It's somewhat common to use `_` or `__` as a throw-away variable if you don't care
about one or more of the values.  For example:

```python
>>> for name, _, price in rows:
        print(name, price)

AA 32.20
IBM 91.10
CAT 83.44
MSFT 51.23
GE 40.37
MSFT 65.10
IBM 70.44
>>>
```

If you don't know how many values are being unpacked, you can use `*` as a wildcard.
Try this experiment in grouping the data by name:

```python
>>> from collections import defaultdict
>>> byname = defaultdict(list)
>>> for name, *data in rows:
        byname[name].append(data)

>>> byname['IBM']
[['50', '91.10'], ['100', '70.44']]
>>> byname['CAT']
[['150', '83.44']]
>>> for shares, price in byname['IBM']:
        print(shares, price)

50 91.10
100 70.44
>>>
```

## (b) Counting with enumerate()

`enumerate()` is a useful function if you ever need to keep a counter
or index while iterating. For example, suppose you wanted an extra row
number:

```python
>>> for rowno, row in enumerate(rows):
        print(rowno, row)

0 ['AA', '100', '32.20']
1 ['IBM', '50', '91.10']
2 ['CAT', '150', '83.44']
3 ['MSFT', '200', '51.23']
4 ['GE', '95', '40.37']
5 ['MSFT', '50', '65.10']
6 ['IBM', '100', '70.44']
>>>
```

You can combine this with unpacking if you're careful about how you structure it:

```python
>>> for rowno, (name, shares, price) in enumerate(rows):
        print(rowno, name, shares, price)

0 AA 100 32.20
1 IBM 50 91.10
2 CAT 150 83.44
3 MSFT 200 51.23
4 GE 95 40.37
5 MSFT 50 65.10
6 IBM 100 70.44
>>> 
```

## (c) Using the zip() function

The `zip()` function is most commonly used to pair data.  For example,
recall that you created a `headers` variable:

```python
>>> headers
['name', 'shares', 'price']
>>>
```

This might be useful to combine with the other row data:

```python
>>> row = rows[0]
>>> row
['AA', '100', '32.20']
>>> for col, val in zip(headers, row):
        print(col, val)

name AA
shares 100
price 32.20
>>>
```

Or maybe you can use it to make a dictionary:

```python
>>> dict(zip(headers, row))
{'name': 'AA', 'shares': '100', 'price': '32.20'}
>>>
```

Or maybe a sequence of dictionaries:

```python
>>> for row in rows:
        record = dict(zip(headers, row))
        print(record)

{'name': 'AA', 'shares': '100', 'price': '32.20'}
{'name': 'IBM', 'shares': '50', 'price': '91.10'}
{'name': 'CAT', 'shares': '150', 'price': '83.44'}
{'name': 'MSFT', 'shares': '200', 'price': '51.23'}
{'name': 'GE', 'shares': '95', 'price': '40.37'}
{'name': 'MSFT', 'shares': '50', 'price': '65.10'}
{'name': 'IBM', 'shares': '100', 'price': '70.44'}
>>>
```

## (d) Generator Expressions

A generator expression is almost exactly the same as a list
comprehension except that it does not create a list.  Instead, it
creates an object that produces the results incrementally--typically
for consumption by iteration. Try a simple example:

```python
>>> nums = [1,2,3,4,5]
>>> squares = (x*x for x in nums)
>>> squares
<generator object <genexpr> at 0x37caa8>
>>> for n in squares:
        print(n)

1
4
9
16
25
>>>
```

You will notice that a generator expression can only be used once.
Watch what happens if you do the for-loop again:

```python
>>> for n in squares:
         print(n)

>>>
```

You can manually get the results one-at-a-time if you use the
`next()` function. Try this:

```python
>>> squares = (x*x for x in nums)
>>> next(squares)
1
>>> next(squares)
4
>>> next(squares)
9
>>>
```

Keeping typing `next()` to see what happens when there is no
more data.

If the task you are performing is more complicated, you can
still take advantage of generators by writing a generator function 
and using the `yield` statement instead.
For example:

```python
>>> def squares(nums):
        for x in nums:
            yield x*x

>>> for n in squares(nums):
        print(n)

1
4
9
16
25
>>>
```

We'll return to generator functions a little later in the course--for now,
just view such functions as having the interesting property of feeding
values to the `for`-statement.

## (e) Generator Expressions and Reduction Functions

Generator expressions are especially useful for feeding data into
functions such as `sum()`, `min()`, `max()`,
`any()`, etc.   Try some examples using the portfolio data from
earlier.  Carefully observe that these examples are missing some
extra square brackets ([]) that appeared when using list comprehensions.

```python
>>> from readport import read_portfolio
>>> portfolio = read_portfolio('Data/portfolio.csv')
>>> sum(s['shares']*s['price'] for s in portfolio)
44671.15
>>> min(s['shares'] for s in portfolio)
50
>>> any(s['name'] == 'IBM' for s in portfolio)
True
>>> all(s['name'] == 'IBM' for s in portfolio)
False
>>> sum(s['shares'] for s in portfolio if s['name'] == 'IBM')
150
>>>
```

Here is an subtle use of a generator expression in making comma
separated values:

```python
>>> s = ('GOOG',100,490.10)
>>> ','.join(s)
... observe that it fails ...
>>> ','.join(str(x) for x in s)    # This works
'GOOG,100,490.1'
>>>
```

The syntax in the above examples takes some getting used to, but the
critical point is that none of the operations ever create a fully
populated list of results.  This gives you a big memory savings.  However,
you do need to make sure you don't go overboard with the syntax.

## (f) Saving a lot of memory

In [Exercise 2.1](ex2_1.md) you wrote a function
`read_rides_as_dicts()` that read the CTA bus data into a list of
dictionaries.  Using it requires a lot of memory. For example,
let's find the day on which the route 22 bus had the greatest
ridership:

```python
>>> import tracemalloc
>>> tracemalloc.start()
>>> import readrides
>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')
>>> rt22 = [row for row in rows if row['route'] == '22']
>>> max(rt22, key=lambda row: row['rides'])
{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896}
>>> tracemalloc.get_traced_memory()
... look at result. Should be around 220MB
>>>
```

Now, let's try an example involving generators. Restart Python
and try this:

```python
>>> # RESTART
>>> import tracemalloc
>>> tracemalloc.start()
>>> import csv
>>> f = open('Data/ctabus.csv')
>>> f_csv = csv.reader(f)
>>> headers = next(f_csv)
>>> rows = (dict(zip(headers,row)) for row in f_csv)
>>> rt22 = (row for row in rows if row['route'] == '22')
>>> max(rt22, key=lambda row: int(row['rides']))
{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896}
>>> tracemalloc.get_traced_memory()
... look at result. Should be a LOT smaller than before
>>>
```

Keep in mind that you just processed the entire dataset as if it was
stored as a sequence of dictionaries.  Yet, nowhere did you actually
create and store a list of dictionaries.   Not all problems can be
structured in this way, but if you can work with data in an
iterative manner, generator expressions can save a huge amount of memory.

\[ [Solution](soln2_3.md) | [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \]

----
`>>>` Advanced Python Mastery  
`...` A course by [dabeaz](https://www.dabeaz.com)  
`...` Copyright 2007-2023  

![](https://i.creativecommons.org/l/by-sa/4.0/88x31.png). This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)
Initial commit 2023-07-17 03:21:00 +02:00			`\[ [Index](index.md) \| [Exercise 2.2](ex2_2.md) \| [Exercise 2.4](ex2_4.md) \]`

			`# Exercise 2.3`

			`Objectives:`

			`- Iterate like a pro`

			`Files Modified: None.`

			`Iteration is an essential Python skill. In this exercise, we look at`
			`a number of common iteration idioms.`

			`Start the exercise by grabbing some rows of data from a CSV file.`

			```python
			`>>> import csv`
			`>>> f = open('Data/portfolio.csv')`
			`>>> f_csv = csv.reader(f)`
			`>>> headers = next(f_csv)`
			`>>> headers`
			`['name', 'shares', 'price']`
			`>>> rows = list(f_csv)`
			`>>> from pprint import pprint`
			`>>> pprint(rows)`
			`[['AA', '100', '32.20'],`
			`['IBM', '50', '91.10'],`
			`['CAT', '150', '83.44'],`
			`['MSFT', '200', '51.23'],`
			`['GE', '95', '40.37'],`
			`['MSFT', '50', '65.10'],`
			`['IBM', '100', '70.44']]`
			`>>>`
			```

			`## (a) Basic Iteration and Unpacking`

			The `for` statement iterates over any sequence of data. For example:

			```python
			`>>> for row in rows:`
			`print(row)`

			`['AA', '100', '32.20']`
			`['IBM', '50', '91.10']`
			`['CAT', '150', '83.44']`
			`['MSFT', '200', '51.23']`
			`['GE', '95', '40.37']`
			`['MSFT', '50', '65.10']`
			`['IBM', '100', '70.44']`
			`>>>`
			```

			`Unpack the values into separate variables if you need to:`

			```python
			`>>> for name, shares, price in rows:`
			`print(name, shares, price)`

			`AA 100 32.20`
			`IBM 50 91.10`
			`CAT 150 83.44`
			`MSFT 200 51.23`
			`GE 95 40.37`
			`MSFT 50 65.10`
			`IBM 100 70.44`
			`>>>`
			```

			It's somewhat common to use `_` or `__` as a throw-away variable if you don't care
			`about one or more of the values. For example:`

			```python
			`>>> for name, _, price in rows:`
			`print(name, price)`

			`AA 32.20`
			`IBM 91.10`
			`CAT 83.44`
			`MSFT 51.23`
			`GE 40.37`
			`MSFT 65.10`
			`IBM 70.44`
			`>>>`
			```

			If you don't know how many values are being unpacked, you can use `*` as a wildcard.
			`Try this experiment in grouping the data by name:`

			```python
			`>>> from collections import defaultdict`
			`>>> byname = defaultdict(list)`
			`>>> for name, *data in rows:`
			`byname[name].append(data)`

			`>>> byname['IBM']`
			`[['50', '91.10'], ['100', '70.44']]`
			`>>> byname['CAT']`
			`[['150', '83.44']]`
			`>>> for shares, price in byname['IBM']:`
			`print(shares, price)`

			`50 91.10`
			`100 70.44`
			`>>>`
			```

			`## (b) Counting with enumerate()`

			`enumerate()` is a useful function if you ever need to keep a counter
			`or index while iterating. For example, suppose you wanted an extra row`
			`number:`

			```python
			`>>> for rowno, row in enumerate(rows):`
			`print(rowno, row)`

			`0 ['AA', '100', '32.20']`
			`1 ['IBM', '50', '91.10']`
			`2 ['CAT', '150', '83.44']`
			`3 ['MSFT', '200', '51.23']`
			`4 ['GE', '95', '40.37']`
			`5 ['MSFT', '50', '65.10']`
			`6 ['IBM', '100', '70.44']`
			`>>>`
			```

			`You can combine this with unpacking if you're careful about how you structure it:`

			```python
			`>>> for rowno, (name, shares, price) in enumerate(rows):`
			`print(rowno, name, shares, price)`

			`0 AA 100 32.20`
			`1 IBM 50 91.10`
			`2 CAT 150 83.44`
			`3 MSFT 200 51.23`
			`4 GE 95 40.37`
			`5 MSFT 50 65.10`
			`6 IBM 100 70.44`
			`>>>`
			```

			`## (c) Using the zip() function`

			The `zip()` function is most commonly used to pair data. For example,
			recall that you created a `headers` variable:

			```python
			`>>> headers`
			`['name', 'shares', 'price']`
			`>>>`
			```

			`This might be useful to combine with the other row data:`

			```python
			`>>> row = rows[0]`
			`>>> row`
			`['AA', '100', '32.20']`
			`>>> for col, val in zip(headers, row):`
			`print(col, val)`

			`name AA`
			`shares 100`
			`price 32.20`
			`>>>`
			```

			`Or maybe you can use it to make a dictionary:`

			```python
			`>>> dict(zip(headers, row))`
			`{'name': 'AA', 'shares': '100', 'price': '32.20'}`
			`>>>`
			```

			`Or maybe a sequence of dictionaries:`

			```python
			`>>> for row in rows:`
			`record = dict(zip(headers, row))`
			`print(record)`

			`{'name': 'AA', 'shares': '100', 'price': '32.20'}`
			`{'name': 'IBM', 'shares': '50', 'price': '91.10'}`
			`{'name': 'CAT', 'shares': '150', 'price': '83.44'}`
			`{'name': 'MSFT', 'shares': '200', 'price': '51.23'}`
			`{'name': 'GE', 'shares': '95', 'price': '40.37'}`
			`{'name': 'MSFT', 'shares': '50', 'price': '65.10'}`
			`{'name': 'IBM', 'shares': '100', 'price': '70.44'}`
			`>>>`
			```

			`## (d) Generator Expressions`

			`A generator expression is almost exactly the same as a list`
			`comprehension except that it does not create a list. Instead, it`
			`creates an object that produces the results incrementally--typically`
			`for consumption by iteration. Try a simple example:`

			```python
			`>>> nums = [1,2,3,4,5]`
			`>>> squares = (x*x for x in nums)`
			`>>> squares`
			`<generator object <genexpr> at 0x37caa8>`
			`>>> for n in squares:`
			`print(n)`

			`1`
			`4`
			`9`
			`16`
			`25`
			`>>>`
			```

			`You will notice that a generator expression can only be used once.`
			`Watch what happens if you do the for-loop again:`

			```python
			`>>> for n in squares:`
			`print(n)`

			`>>>`
			```

			`You can manually get the results one-at-a-time if you use the`
			`next()` function. Try this:

			```python
			`>>> squares = (x*x for x in nums)`
			`>>> next(squares)`
			`1`
			`>>> next(squares)`
			`4`
			`>>> next(squares)`
			`9`
			`>>>`
			```

			Keeping typing `next()` to see what happens when there is no
			`more data.`

			`If the task you are performing is more complicated, you can`
			`still take advantage of generators by writing a generator function`
			and using the `yield` statement instead.
			`For example:`

			```python
			`>>> def squares(nums):`
			`for x in nums:`
			`yield x*x`

			`>>> for n in squares(nums):`
			`print(n)`

			`1`
			`4`
			`9`
			`16`
			`25`
			`>>>`
			```

			`We'll return to generator functions a little later in the course--for now,`
			`just view such functions as having the interesting property of feeding`
			values to the `for`-statement.

			`## (e) Generator Expressions and Reduction Functions`

			`Generator expressions are especially useful for feeding data into`
			functions such as `sum()`, `min()`, `max()`,
			`any()`, etc. Try some examples using the portfolio data from
			`earlier. Carefully observe that these examples are missing some`
			`extra square brackets ([]) that appeared when using list comprehensions.`

			```python
			`>>> from readport import read_portfolio`
			`>>> portfolio = read_portfolio('Data/portfolio.csv')`
			`>>> sum(s['shares']*s['price'] for s in portfolio)`
			`44671.15`
			`>>> min(s['shares'] for s in portfolio)`
			`50`
			`>>> any(s['name'] == 'IBM' for s in portfolio)`
			`True`
			`>>> all(s['name'] == 'IBM' for s in portfolio)`
			`False`
			`>>> sum(s['shares'] for s in portfolio if s['name'] == 'IBM')`
			`150`
			`>>>`
			```

			`Here is an subtle use of a generator expression in making comma`
			`separated values:`

			```python
			`>>> s = ('GOOG',100,490.10)`
			`>>> ','.join(s)`
			`... observe that it fails ...`
			`>>> ','.join(str(x) for x in s) # This works`
			`'GOOG,100,490.1'`
			`>>>`
			```

			`The syntax in the above examples takes some getting used to, but the`
			`critical point is that none of the operations ever create a fully`
			`populated list of results. This gives you a big memory savings. However,`
			`you do need to make sure you don't go overboard with the syntax.`

			`## (f) Saving a lot of memory`

Fix broken links 2023-07-17 17:41:16 +02:00			`In [Exercise 2.1](ex2_1.md) you wrote a function`
Initial commit 2023-07-17 03:21:00 +02:00			`read_rides_as_dicts()` that read the CTA bus data into a list of
			`dictionaries. Using it requires a lot of memory. For example,`
			`let's find the day on which the route 22 bus had the greatest`
			`ridership:`

			```python
			`>>> import tracemalloc`
			`>>> tracemalloc.start()`
			`>>> import readrides`
			`>>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv')`
			`>>> rt22 = [row for row in rows if row['route'] == '22']`
			`>>> max(rt22, key=lambda row: row['rides'])`
			`{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896}`
			`>>> tracemalloc.get_traced_memory()`
			`... look at result. Should be around 220MB`
			`>>>`
			```

			`Now, let's try an example involving generators. Restart Python`
			`and try this:`

			```python
			`>>> # RESTART`
			`>>> import tracemalloc`
			`>>> tracemalloc.start()`
			`>>> import csv`
			`>>> f = open('Data/ctabus.csv')`
			`>>> f_csv = csv.reader(f)`
			`>>> headers = next(f_csv)`
			`>>> rows = (dict(zip(headers,row)) for row in f_csv)`
			`>>> rt22 = (row for row in rows if row['route'] == '22')`
			`>>> max(rt22, key=lambda row: int(row['rides']))`
			`{'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896}`
			`>>> tracemalloc.get_traced_memory()`
			`... look at result. Should be a LOT smaller than before`
			`>>>`
			```

			`Keep in mind that you just processed the entire dataset as if it was`
			`stored as a sequence of dictionaries. Yet, nowhere did you actually`
			`create and store a list of dictionaries. Not all problems can be`
			`structured in this way, but if you can work with data in an`
			`iterative manner, generator expressions can save a huge amount of memory.`

			`\[ [Solution](soln2_3.md) \| [Index](index.md) \| [Exercise 2.2](ex2_2.md) \| [Exercise 2.4](ex2_4.md) \]`

			`----`
			`>>>` Advanced Python Mastery
			`...` A course by [dabeaz](https://www.dabeaz.com)
			`...` Copyright 2007-2023

			`![](https://i.creativecommons.org/l/by-sa/4.0/88x31.png). This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)`