\[ [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \] # Exercise 2.3 *Objectives:* - Iterate like a pro *Files Modified:* None. Iteration is an essential Python skill. In this exercise, we look at a number of common iteration idioms. Start the exercise by grabbing some rows of data from a CSV file. ```python >>> import csv >>> f = open('Data/portfolio.csv') >>> f_csv = csv.reader(f) >>> headers = next(f_csv) >>> headers ['name', 'shares', 'price'] >>> rows = list(f_csv) >>> from pprint import pprint >>> pprint(rows) [['AA', '100', '32.20'], ['IBM', '50', '91.10'], ['CAT', '150', '83.44'], ['MSFT', '200', '51.23'], ['GE', '95', '40.37'], ['MSFT', '50', '65.10'], ['IBM', '100', '70.44']] >>> ``` ## (a) Basic Iteration and Unpacking The `for` statement iterates over any sequence of data. For example: ```python >>> for row in rows: print(row) ['AA', '100', '32.20'] ['IBM', '50', '91.10'] ['CAT', '150', '83.44'] ['MSFT', '200', '51.23'] ['GE', '95', '40.37'] ['MSFT', '50', '65.10'] ['IBM', '100', '70.44'] >>> ``` Unpack the values into separate variables if you need to: ```python >>> for name, shares, price in rows: print(name, shares, price) AA 100 32.20 IBM 50 91.10 CAT 150 83.44 MSFT 200 51.23 GE 95 40.37 MSFT 50 65.10 IBM 100 70.44 >>> ``` It's somewhat common to use `_` or `__` as a throw-away variable if you don't care about one or more of the values. For example: ```python >>> for name, _, price in rows: print(name, price) AA 32.20 IBM 91.10 CAT 83.44 MSFT 51.23 GE 40.37 MSFT 65.10 IBM 70.44 >>> ``` If you don't know how many values are being unpacked, you can use `*` as a wildcard. Try this experiment in grouping the data by name: ```python >>> from collections import defaultdict >>> byname = defaultdict(list) >>> for name, *data in rows: byname[name].append(data) >>> byname['IBM'] [['50', '91.10'], ['100', '70.44']] >>> byname['CAT'] [['150', '83.44']] >>> for shares, price in byname['IBM']: print(shares, price) 50 91.10 100 70.44 >>> ``` ## (b) Counting with enumerate() `enumerate()` is a useful function if you ever need to keep a counter or index while iterating. For example, suppose you wanted an extra row number: ```python >>> for rowno, row in enumerate(rows): print(rowno, row) 0 ['AA', '100', '32.20'] 1 ['IBM', '50', '91.10'] 2 ['CAT', '150', '83.44'] 3 ['MSFT', '200', '51.23'] 4 ['GE', '95', '40.37'] 5 ['MSFT', '50', '65.10'] 6 ['IBM', '100', '70.44'] >>> ``` You can combine this with unpacking if you're careful about how you structure it: ```python >>> for rowno, (name, shares, price) in enumerate(rows): print(rowno, name, shares, price) 0 AA 100 32.20 1 IBM 50 91.10 2 CAT 150 83.44 3 MSFT 200 51.23 4 GE 95 40.37 5 MSFT 50 65.10 6 IBM 100 70.44 >>> ``` ## (c) Using the zip() function The `zip()` function is most commonly used to pair data. For example, recall that you created a `headers` variable: ```python >>> headers ['name', 'shares', 'price'] >>> ``` This might be useful to combine with the other row data: ```python >>> row = rows[0] >>> row ['AA', '100', '32.20'] >>> for col, val in zip(headers, row): print(col, val) name AA shares 100 price 32.20 >>> ``` Or maybe you can use it to make a dictionary: ```python >>> dict(zip(headers, row)) {'name': 'AA', 'shares': '100', 'price': '32.20'} >>> ``` Or maybe a sequence of dictionaries: ```python >>> for row in rows: record = dict(zip(headers, row)) print(record) {'name': 'AA', 'shares': '100', 'price': '32.20'} {'name': 'IBM', 'shares': '50', 'price': '91.10'} {'name': 'CAT', 'shares': '150', 'price': '83.44'} {'name': 'MSFT', 'shares': '200', 'price': '51.23'} {'name': 'GE', 'shares': '95', 'price': '40.37'} {'name': 'MSFT', 'shares': '50', 'price': '65.10'} {'name': 'IBM', 'shares': '100', 'price': '70.44'} >>> ``` ## (d) Generator Expressions A generator expression is almost exactly the same as a list comprehension except that it does not create a list. Instead, it creates an object that produces the results incrementally--typically for consumption by iteration. Try a simple example: ```python >>> nums = [1,2,3,4,5] >>> squares = (x*x for x in nums) >>> squares at 0x37caa8> >>> for n in squares: print(n) 1 4 9 16 25 >>> ``` You will notice that a generator expression can only be used once. Watch what happens if you do the for-loop again: ```python >>> for n in squares: print(n) >>> ``` You can manually get the results one-at-a-time if you use the `next()` function. Try this: ```python >>> squares = (x*x for x in nums) >>> next(squares) 1 >>> next(squares) 4 >>> next(squares) 9 >>> ``` Keeping typing `next()` to see what happens when there is no more data. If the task you are performing is more complicated, you can still take advantage of generators by writing a generator function and using the `yield` statement instead. For example: ```python >>> def squares(nums): for x in nums: yield x*x >>> for n in squares(nums): print(n) 1 4 9 16 25 >>> ``` We'll return to generator functions a little later in the course--for now, just view such functions as having the interesting property of feeding values to the `for`-statement. ## (e) Generator Expressions and Reduction Functions Generator expressions are especially useful for feeding data into functions such as `sum()`, `min()`, `max()`, `any()`, etc. Try some examples using the portfolio data from earlier. Carefully observe that these examples are missing some extra square brackets ([]) that appeared when using list comprehensions. ```python >>> from readport import read_portfolio >>> portfolio = read_portfolio('Data/portfolio.csv') >>> sum(s['shares']*s['price'] for s in portfolio) 44671.15 >>> min(s['shares'] for s in portfolio) 50 >>> any(s['name'] == 'IBM' for s in portfolio) True >>> all(s['name'] == 'IBM' for s in portfolio) False >>> sum(s['shares'] for s in portfolio if s['name'] == 'IBM') 150 >>> ``` Here is a subtle use of a generator expression in making comma separated values: ```python >>> s = ('GOOG',100,490.10) >>> ','.join(s) ... observe that it fails ... >>> ','.join(str(x) for x in s) # This works 'GOOG,100,490.1' >>> ``` The syntax in the above examples takes some getting used to, but the critical point is that none of the operations ever create a fully populated list of results. This gives you a big memory savings. However, you do need to make sure you don't go overboard with the syntax. ## (f) Saving a lot of memory In [Exercise 2.1](ex2_1.md) you wrote a function `read_rides_as_dicts()` that read the CTA bus data into a list of dictionaries. Using it requires a lot of memory. For example, let's find the day on which the route 22 bus had the greatest ridership: ```python >>> import tracemalloc >>> tracemalloc.start() >>> import readrides >>> rows = readrides.read_rides_as_dicts('Data/ctabus.csv') >>> rt22 = [row for row in rows if row['route'] == '22'] >>> max(rt22, key=lambda row: row['rides']) {'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896} >>> tracemalloc.get_traced_memory() ... look at result. Should be around 220MB >>> ``` Now, let's try an example involving generators. Restart Python and try this: ```python >>> # RESTART >>> import tracemalloc >>> tracemalloc.start() >>> import csv >>> f = open('Data/ctabus.csv') >>> f_csv = csv.reader(f) >>> headers = next(f_csv) >>> rows = (dict(zip(headers,row)) for row in f_csv) >>> rt22 = (row for row in rows if row['route'] == '22') >>> max(rt22, key=lambda row: int(row['rides'])) {'date': '06/11/2008', 'route': '22', 'daytype': 'W', 'rides': 26896} >>> tracemalloc.get_traced_memory() ... look at result. Should be a LOT smaller than before >>> ``` Keep in mind that you just processed the entire dataset as if it was stored as a sequence of dictionaries. Yet, nowhere did you actually create and store a list of dictionaries. Not all problems can be structured in this way, but if you can work with data in an iterative manner, generator expressions can save a huge amount of memory. \[ [Solution](soln2_3.md) | [Index](index.md) | [Exercise 2.2](ex2_2.md) | [Exercise 2.4](ex2_4.md) \] ---- `>>>` Advanced Python Mastery `...` A course by [dabeaz](https://www.dabeaz.com) `...` Copyright 2007-2023 ![](https://i.creativecommons.org/l/by-sa/4.0/88x31.png). This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/)