{ "cells": [ { "cell_type": "markdown", "id": "7d22d1c6-0a90-4410-ae48-1a16f094cbf2", "metadata": {}, "source": [ "
Peter Norvig
Jan 2023
\n", "\n", "# Docstring Fixpoint Theory\n", "\n", "In reading/writing/debugging/verifying a short Python function we would like there to be a clear and obvious connection between the problem **specification** and both the **docstring** and the **code**. This notebook makes a proposal:\n", "\n", "- One approach is to repeatedly alternate editing the **docstring** and the **body** of the function until they converge to a **[fixpoint](https://en.wikipedia.org/wiki/Fixed_point)** in which there is an obvious one-to-one correspondance between the two, and no changes can be made to improve either one.\n", "\n", "This approach follows [Tony Hoare](https://en.wikipedia.org/wiki/Tony_Hoare)'s first method: *\"There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.\"* \n", "\n", "Some caveats: \n", "- This approach is not always appropriate! For many functions the docstring is a high-level description and the code has more detail that is not in the docstring. Docstring fixpoint theory makes the most sense for short functions (just a few lines).\n", "- The edits will change the wording, but must maintain the meaning (the correspondance to the intended purpose; the specification)." ] }, { "cell_type": "markdown", "id": "a265710c-070d-455a-ba21-19c78441ca90", "metadata": {}, "source": [ "# Example: The Rainfall Problem\n", "\n", "The \"Rainfall Problem,\" initially posed by [Elliot Soloway](https://cdc.engin.umich.edu/elliot-soloway/), has been used to explore the ways that novices address a programming problem. We will use [Kathi Fisler](https://cs.brown.edu/~kfisler)'s [version](https://cs.brown.edu/~kfisler/Pubs/icer14-rainfall/) of the problem:\n", "\n", "\n", "- *Design a program called **rainfall** that consumes a list of numbers representing daily rainfall amounts as entered by a user. The list may contain the number -999 indicating the end of the data of interest. Produce the average of the non-negative values in the list up to the first -999 (if it shows up). There may be negative numbers other than -999 in the list.* \n", "\n", "The problem doesn't say what kind of numbers will be in the list, so I'll define `Number` to be either an integer or floating point number (It doesn't make sense to have a [complex](https://docs.python.org/3/library/cmath.html) amount of rainfall.)" ] }, { "cell_type": "code", "execution_count": 1, "id": "158c20d7-c4d4-463e-b228-248f39220cad", "metadata": {}, "outputs": [], "source": [ "from typing import *\n", "\n", "Number = Union[int, float]" ] }, { "cell_type": "markdown", "id": "829a95cc-0697-4a55-b01d-86a02322409f", "metadata": {}, "source": [ "Start by writing a function prototype containing the complete problem statement as the **docstring**:" ] }, { "cell_type": "code", "execution_count": 2, "id": "4ac87ea7-7ad9-48e2-a21f-fb772e540487", "metadata": {}, "outputs": [], "source": [ "def rainfall(numbers: List[Number]) -> Number:\n", " \"\"\"Design a program called rainfall that consumes a list of numbers \n", " representing daily rainfall amounts as entered by a user. \n", " The list may contain the number -999 indicating the end of the data of interest. \n", " Produce the average of the non-negative values in the list up to the first -999 \n", " (if it shows up). There may be negative numbers other than -999 in the list.\"\"\"\n", " ..." ] }, { "cell_type": "markdown", "id": "493ba182-2d75-4caf-9998-c46637fd949f", "metadata": {}, "source": [ "Now edit the **docstring** to remove extraneous parts, in the process abstracting the problem away from rainfall and focusing on the numbers involved:" ] }, { "cell_type": "code", "execution_count": 3, "id": "3000881a-f2b0-4e00-9bfa-9c358a94de2c", "metadata": {}, "outputs": [], "source": [ "def rainfall(numbers: List[Number]) -> Number:\n", " \"\"\"Produce the average of the non-negative values in a list of numbers,\n", " up to the first -999 (if it shows up).\"\"\"\n", " ..." ] }, { "cell_type": "markdown", "id": "be7f6cc2-cd41-4472-8abf-f052a30bb031", "metadata": {}, "source": [ "Next write a function **body** that mirrors the docstring as closely as possible. (It introduces helper functions.)" ] }, { "cell_type": "code", "execution_count": 4, "id": "15bbdbb7-b16d-4347-80c1-bdac375a95b4", "metadata": {}, "outputs": [], "source": [ "def rainfall(numbers: List[Number]) -> Number:\n", " \"\"\"Produce the average of the non-negative values in a list of numbers,\n", " up to the first -999 (if it shows up).\"\"\"\n", " return mean(non_negative(upto(-999, numbers)))" ] }, { "cell_type": "markdown", "id": "601a806a-f710-4877-851e-caff50657039", "metadata": {}, "source": [ "Lightly edit the **docstring** once more to bring it into even closer agreement with the code:" ] }, { "cell_type": "code", "execution_count": 5, "id": "701bb096-d90e-4a45-bfbb-58ea2106fdd5", "metadata": {}, "outputs": [], "source": [ "def rainfall(numbers: List[Number]) -> Number:\n", " \"\"\"Return the mean of the non-negative values in a list of numbers,\n", " up to the first -999 (if it shows up).\"\"\"\n", " return mean(non_negative(upto(-999, numbers)))" ] }, { "cell_type": "markdown", "id": "6a1a566c-2113-4d71-a341-4a683d97e64a", "metadata": {}, "source": [ "Fill in the missing bits, `mean`, `upto`, and `non_negative`:" ] }, { "cell_type": "code", "execution_count": 6, "id": "dc686d52-2c8d-4e2b-be41-d8c354ef903e", "metadata": {}, "outputs": [], "source": [ "from statistics import mean\n", "\n", "def upto(sentinel, items: list) -> list:\n", " \"\"\"Return the list of items that appear before `sentinel`,\n", " or all the items if `sentinel` does not appear in the list.\"\"\"\n", " return items[:items.index(sentinel)] if sentinel in items else items\n", "\n", "def non_negative(numbers: List[Number]) -> List[Number]: \n", " \"\"\"The `numbers` that are greater than or equal to 0.\"\"\"\n", " return [x for x in numbers if x >= 0] " ] }, { "cell_type": "markdown", "id": "dd04aa98-ef15-4288-bd70-e08d7d07d071", "metadata": {}, "source": [ "Write some tests and pass them, and we're done!" ] }, { "cell_type": "code", "execution_count": 7, "id": "9e6c81b8-1665-47cf-83f3-bf010bb6aa3d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def test_rainfall() -> bool:\n", " \"\"\"Unit tests for the `rainfall` problem.\"\"\"\n", " \n", " assert 0/2 == rainfall([0, 0]), \"no rain\"\n", " assert 5/1 == rainfall([5]), \"one day\"\n", " assert 6/3 == rainfall([1, 2, 3]), \"the mean of several days\"\n", " assert 6/4 == rainfall([0, 1, 2, 3]), \"the mean (which is a non-integer)\"\n", " assert 6/4 == rainfall([1, 0, 3, 2]), \"order doesn't matter\"\n", " assert 9/3 == rainfall([1, 2, -9, -100, 6]), \"negative values are ignored\"\n", " assert 7/5 == rainfall([1, 0, 2, 0, 4]), \"zero values are not ignored\"\n", " assert 8/3 == rainfall([1, 2, 5, -999, 404]), \"values after -999 are ignored\"\n", " \n", " numbers = [3, 1, 4, 1, 5, 9, -2, 6, -5, 3, -6, -999, 2, 7, 1, 8, 2, 8, 1, 8, 2, 8]\n", " assert upto(-999, numbers) == [3, 1, 4, 1, 5, 9, -2, 6, -5, 3, -6]\n", " assert non_negative(upto(-999, numbers)) == [3, 1, 4, 1, 5, 9, 6, 3]\n", " assert mean([3, 1, 4, 1, 5, 9, 6, 3]) == 32/8 == 4\n", " assert 32/8 == rainfall(numbers)\n", "\n", " return True\n", "\n", "test_rainfall()" ] }, { "cell_type": "markdown", "id": "74522f90-1a09-4fdf-b07c-7d0134efc3e3", "metadata": {}, "source": [ "# More Complex Example: The Segments Problem\n", "\n", "A [paper](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/ABEA634EB9763953CBCC8D2AC58FE710/S0956796821000216a.pdf/segments-an-alternative-rainfall-problem.pdf) by [Peter Achten](https://www.ru.nl/en/people/achten-p) poses another problem:\n", "\n", "- *Design a program called **segments** that consumes a list of numbers. Produce a list of all elements, without duplicates, and sorted in increasing order. Instead of containing all individual elements, these are organized as segments. A segment is either a single value x or a pair of two values a and b such that a, a+1, ...., b −1, b are in the list (neither a−1 nor b+1 are in the list). The segments must be shown as strings, formatted as x for singleton segments and a−b for the other segments.*\n", "\n", "I'll start by taking the problem description verbatim, and breaking it up into two **docstrings**, one for the function `segments` that was requested in the description, and one for the class `Segment` that I think will be useful." ] }, { "cell_type": "code", "execution_count": 8, "id": "ba65e450-bb0c-44bf-94a3-8aef055e6f01", "metadata": {}, "outputs": [], "source": [ "class Segment:\n", " \"\"\"A segment is either a single value x or a pair of two values a and b such that \n", " a, a+1, ...., b −1, b are in the list (neither a−1 nor b+1 are in the list). \n", " The segments must be shown as strings, formatted as x for singleton segments \n", " and a−b for the other segments.\"\"\"\n", " ...\n", " \n", "def segments(numbers: List[Number]) -> List[Segment]:\n", " \"\"\"Design a program called segments that consumes a list of numbers. \n", " Produce a list of all elements, without duplicates, and sorted in increasing order. \n", " Instead of containing all individual elements, these are organized as segments.\"\"\"\n", " ..." ] }, { "cell_type": "markdown", "id": "6e317bd3-dce8-4934-b90a-64a7b76d16d0", "metadata": {}, "source": [ "I'm not very happy with the docstring for `Segment`. First, the docstring is mixing up two things: what a \"segment\" is, and how it relates to the numbers \"in the list.\" Second, `start` and `end` are better names than `a` and `b` for a segment's attributes.\n", "\n", "Here's another try at the **docstring**:" ] }, { "cell_type": "code", "execution_count": 9, "id": "69712e85-906e-4e52-bbdc-c809b95cbfb2", "metadata": {}, "outputs": [], "source": [ "class Segment:\n", " \"\"\"`Segment(start, end)` is like `range(start, end + 1)`, but is mutable and has fewer methods.\n", " `repr(Segment(start, end))` is the string 'start-end', or just 'start' if start == end.\"\"\"\n", " ..." ] }, { "cell_type": "markdown", "id": "641aa3b9-f716-4a94-bca7-446a5218743e", "metadata": {}, "source": [ "Now I'll write the **body** for `Segment`.(A full `Segment` class should implement all methods of `range`, but they are not needed for this problem.)" ] }, { "cell_type": "code", "execution_count": 10, "id": "a946e971-c9a1-46ce-9c9c-b5cc6b28630a", "metadata": {}, "outputs": [], "source": [ "from dataclasses import dataclass\n", "\n", "@dataclass\n", "class Segment:\n", " \"\"\"`Segment(start, end)` is like `range(start, end + 1)`, but is mutable and has fewer methods.\n", " `repr(Segment(start, end))` is the string 'start-end', or just 'start' if start == end.\"\"\"\n", " start: int\n", " end: int\n", " \n", " def __repr__(self) -> str: \n", " return f'{self.start}-{self.end}' if self.start != self.end else f'{self.start}' " ] }, { "cell_type": "markdown", "id": "8350f623-c363-4fca-935c-b355e983bf1c", "metadata": {}, "source": [ "Let's make sure this works:" ] }, { "cell_type": "code", "execution_count": 11, "id": "7ed48c0d-360f-47fc-acb3-8dc59f548d78", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[7-11, 25]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[Segment(7, 11), Segment(25, 25)]" ] }, { "cell_type": "markdown", "id": "be9c2978-0609-4b39-aff4-aa77e37f6765", "metadata": {}, "source": [ "Now the **body** of the function `segments`: " ] }, { "cell_type": "code", "execution_count": 12, "id": "c39f92a8-03d2-4e80-8c24-0450ff8d989a", "metadata": {}, "outputs": [], "source": [ "def segments(numbers: List[Number]) -> List[Segment]:\n", " \"\"\"Design a program called `segments` that consumes a list of numbers. \n", " Produce a list of all elements, without duplicates, and sorted in increasing order. \n", " Instead of containing all individual elements, these are organized as segments.\"\"\"\n", " return organize_as_segments(sorted(without_duplicates(numbers)))" ] }, { "cell_type": "markdown", "id": "dfbfb16b-3779-4cdf-a9f1-97d1076e2c91", "metadata": {}, "source": [ "This is pretty good, but I think it can be improved. The docstring has extraneous text (\"Design a program\") and could be more closely aligned with the body. And it might be more modular to have a function to do one thing (add a *single* number to a list of segments), rather than a function that organizes *all* the numbers into a list of segments. Finally, I realize that converting every number to a singleton segment would (arguably) be consistent with the problem description, but it is clear that the intent was to make segments be as long as possible. I'll edit the **docstring** first to reflect this:" ] }, { "cell_type": "code", "execution_count": 13, "id": "f501c7aa-2077-4f1e-96fd-ea85ca1c20ed", "metadata": {}, "outputs": [], "source": [ "def segments(numbers: List[Number]) -> List[Segment]:\n", " \"\"\"Return a sorted, minimal, list of segments such that each of `numbers` is in one of the segments. \n", " Iterate through `numbers (in sorted order and without duplicates) adding each number to the \n", " evolving list of segments, making each segment as long as possible.\"\"\"\n", " ..." ] }, { "cell_type": "markdown", "id": "0b0280bc-933f-4376-aedf-0df7e55d2137", "metadata": {}, "source": [ "Now I'll edit the **body**:" ] }, { "cell_type": "code", "execution_count": 14, "id": "ebd7b8ff-1375-4c20-b72c-c72c6497f925", "metadata": {}, "outputs": [], "source": [ "def segments(numbers: List[Number]) -> List[Segment]:\n", " \"\"\"Return a sorted, minimal, list of segments such that each of `numbers` is in one of the segments. \n", " Iterate through `numbers (in sorted order and without duplicates) adding each number to the \n", " evolving list of segments, making each segment as long as possible.\"\"\"\n", " segment_list = []\n", " for n in sorted(without_duplicates(numbers)):\n", " add_to_segment_list(n, segment_list)\n", " return segment_list" ] }, { "cell_type": "markdown", "id": "46b60d8d-8ae8-4885-9fe7-2074cc888f33", "metadata": {}, "source": [ "There's not quite a one-to-one correspondance here between docstring and code. Maybe that means I'm following my advice of \"This approach is not always appropriate!\" Or maybe it means that the pattern of initializing a variable to the empty list is so well-known that I don't need to mention it in the docstring. \n", "\n", "In any case, here are the remaining missing pieces, `without_duplicates` and `add_to_segment_list`:" ] }, { "cell_type": "code", "execution_count": 15, "id": "fdd08113-3d84-46d9-88e6-727acb82502d", "metadata": {}, "outputs": [], "source": [ "without_duplicates = set # A set is a collection with no duplicates\n", "\n", "def add_to_segment_list(n: int, segment_list: List[Segment]) -> None:\n", " \"\"\"Mutate `segment_list` to cover `n`. If `n` is one more than the end of the last Segment, \n", " then add `n` to that last segment. Otherwise append a new Segment (for `n`) to `segment_list`.\"\"\"\n", " if segment_list and segment_list[-1].end == n - 1:\n", " segment_list[-1].end = n\n", " else:\n", " segment_list.append(Segment(n, n)) " ] }, { "cell_type": "markdown", "id": "684a5bc9-6f8f-4c71-88a4-2815120e3d1f", "metadata": {}, "source": [ "Again, write some tests and pass them, and we're done!" ] }, { "cell_type": "code", "execution_count": 16, "id": "d6540c49-0782-4245-9c7a-e1ad7f67be0e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def test_segments() -> bool:\n", " \"\"\"Unit tests for `segments` problem.\"\"\"\n", " assert segments([]) == [], \"empty numbers\"\n", " assert str(segments([42])) == '[42]', \"one number\"\n", " assert str(segments([42, 42])) == '[42]', \"one number duplicated\"\n", " assert str(segments([1, 2, 3])) == '[1-3]', \"one segment\"\n", " assert str(segments([3, 1, 2])) == '[1-3]', \"same segment from unsorted input\"\n", " numbers = [4, 0, 4, 8, 3, 1, 10, 2, 9, 7, 11, 24, 7]\n", " assert str(segments(numbers)) == '[0-4, 7-11, 24]', \"multiple segments\"\n", " for s in segments(numbers):\n", " assert type(s) == Segment and s.start <= s.end, \"result is a list of valid Segments\"\n", " assert without_duplicates([1, 2, 2, 1]) == {1, 2}, \"no duplicates\"\n", "\n", " s = []\n", " add_to_segment_list(1, s)\n", " assert s == [Segment(1, 1)]\n", " add_to_segment_list(2, s) \n", " assert s == [Segment(1, 2)]\n", " add_to_segment_list(4, s)\n", " assert s == [Segment(1, 2), Segment(4, 4)]\n", " add_to_segment_list(5, s)\n", " assert s == [Segment(1, 2), Segment(4, 5)]\n", " \n", " return True\n", "\n", "test_segments()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }