diff --git a/ipynb/DocstringFixpoint.ipynb b/ipynb/DocstringFixpoint.ipynb index 1238686..80d1f67 100644 --- a/ipynb/DocstringFixpoint.ipynb +++ b/ipynb/DocstringFixpoint.ipynb @@ -9,34 +9,20 @@ "\n", "# Docstring Fixpoint Theory\n", "\n", - "This notebook makes a proposal:\n", + "In reading/writing/debugging/verifying a short Python function we would like there to be a clear and obvious connection between the problem **specification** and both the **docstring** and the **code**. This notebook makes a proposal:\n", "\n", - "- One approach to writing a short function (in Python or other programming langauges) is to repeatedly alternate editing the **docstring** and the **body** of the function until they converge to a **[fixpoint](https://en.wikipedia.org/wiki/Fixed_point)** in which there is an obvious one-to-one correspondance between the two.\n", + "- One approach is to repeatedly alternate editing the **docstring** and the **body** of the function until they converge to a **[fixpoint](https://en.wikipedia.org/wiki/Fixed_point)** in which there is an obvious one-to-one correspondence between the two, and no changes can be made to improve either one.\n", "\n", - "This approach follows the first of [Tony Hoare](https://en.wikipedia.org/wiki/Tony_Hoare)'s two methods: *\"There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.\"* \n", + "This approach follows [Tony Hoare](https://en.wikipedia.org/wiki/Tony_Hoare)'s first method: *\"There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.\"* \n", "\n", "Some caveats: \n", - "- This approach is not always appropriate! For many functions the docstring is a high-level description and the code has more detail that is not in the docstring. 
Docstring fixpoint theory makes the most sense for very short functions.\n", - "- The edits will change the wording, but must maintain the meaning.\n", - "\n", - "Python programs are easier to understand with [type hints](https://docs.python.org/3/library/typing.html), and dataclasses are nice, so I'll start with this:" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "158c20d7-c4d4-463e-b228-248f39220cad", - "metadata": {}, - "outputs": [], - "source": [ - "from dataclasses import dataclass\n", - "from typing import *\n", - "Number = Union[int, float]" + "- This approach is not always appropriate! For many functions the docstring is a high-level description and the code has more detail that is not in the docstring. Docstring fixpoint theory makes the most sense for short functions (just a few lines).\n", + "- The edits will change the wording, but must maintain the meaning (the correspondence to the intended purpose; the specification)." ] }, { "cell_type": "markdown", - "id": "6b363922-2e61-425e-8b43-ede837481017", + "id": "a265710c-070d-455a-ba21-19c78441ca90", "metadata": {}, "source": [ "# Example: The Rainfall Problem\n", @@ -46,7 +32,27 @@ "\n", "- *Design a program called **rainfall** that consumes a list of numbers representing daily rainfall amounts as entered by a user. The list may contain the number -999 indicating the end of the data of interest. Produce the average of the non-negative values in the list up to the first -999 (if it shows up). 
There may be negative numbers other than -999 in the list.* \n", "\n", - "We start by writing a function prototype containing the complete problem statement as the **docstring**:" + "The problem doesn't say what kind of numbers will be in the list, so I'll define `Number` to be either an integer or floating-point number. (It doesn't make sense to have a [complex](https://docs.python.org/3/library/cmath.html) amount of rainfall.)" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "158c20d7-c4d4-463e-b228-248f39220cad", + "metadata": {}, + "outputs": [], + "source": [ + "from typing import *\n", + "\n", + "Number = Union[int, float]" + ] + }, + { + "cell_type": "markdown", + "id": "829a95cc-0697-4a55-b01d-86a02322409f", + "metadata": {}, + "source": [ + "Start by writing a function prototype containing the complete problem statement as the **docstring**:" ] }, { @@ -145,13 +151,13 @@ "source": [ "from statistics import mean\n", "\n", - "def upto(end, items: list) -> list:\n", - " \"\"\"Return the list of items, but if `end` occurs in the list,\n", - " only return the items that appear before `end`.\"\"\"\n", - " return items if end not in items else items[:items.index(end)]\n", + "def upto(sentinel, items: list) -> list:\n", + " \"\"\"Return the list of items that appear before `sentinel`,\n", + " or all the items if `sentinel` does not appear in the list.\"\"\"\n", + " return items[:items.index(sentinel)] if sentinel in items else items\n", "\n", "def non_negative(numbers: List[Number]) -> List[Number]: \n", - " \"\"\"The numbers that are greater than or equal to 0.\"\"\"\n", + " \"\"\"The `numbers` that are greater than or equal to 0.\"\"\"\n", " return [x for x in numbers if x >= 0] " ] }, @@ -181,7 +187,9 @@ } ], "source": [ - "def test_rainfall():\n", + "def test_rainfall() -> bool:\n", + " \"\"\"Unit tests for the `rainfall` problem.\"\"\"\n", + " \n", " assert 0/2 == rainfall([0, 0]), \"no rain\"\n", " assert 5/1 == rainfall([5]), \"one 
day\"\n", " assert 6/3 == rainfall([1, 2, 3]), \"the mean of several days\"\n", @@ -190,8 +198,15 @@ " assert 9/3 == rainfall([1, 2, -9, -100, 6]), \"negative values are ignored\"\n", " assert 7/5 == rainfall([1, 0, 2, 0, 4]), \"zero values are not ignored\"\n", " assert 8/3 == rainfall([1, 2, 5, -999, 404]), \"values after -999 are ignored\"\n", - " return True\n", " \n", + " numbers = [3, 1, 4, 1, 5, 9, -2, 6, -5, 3, -6, -999, 2, 7, 1, 8, 2, 8, 1, 8, 2, 8]\n", + " assert upto(-999, numbers) == [3, 1, 4, 1, 5, 9, -2, 6, -5, 3, -6]\n", + " assert non_negative(upto(-999, numbers)) == [3, 1, 4, 1, 5, 9, 6, 3]\n", + " assert mean([3, 1, 4, 1, 5, 9, 6, 3]) == 32/8 == 4\n", + " assert 32/8 == rainfall(numbers)\n", + "\n", + " return True\n", + "\n", "test_rainfall()" ] }, @@ -248,8 +263,8 @@ "outputs": [], "source": [ "class Segment:\n", - " \"\"\"`Segment(start, end)` represents the same sequence of integers as `range(start, end + 1)`,\n", - " but a Segment is mutable. The `repr` is the string 'start-end', or just 'start' if start == end.\"\"\"\n", + " \"\"\"`Segment(start, end)` is like `range(start, end + 1)`, but is mutable and has fewer methods.\n", + " `repr(Segment(start, end))` is the string 'start-end', or just 'start' if start == end.\"\"\"\n", " ..." ] }, @@ -268,12 +283,14 @@ "metadata": {}, "outputs": [], "source": [ + "from dataclasses import dataclass\n", + "\n", "@dataclass\n", "class Segment:\n", " \"\"\"`Segment(start, end)` is like `range(start, end + 1)`, but is mutable and has fewer methods.\n", " `repr(Segment(start, end))` is the string 'start-end', or just 'start' if start == end.\"\"\"\n", " start: int\n", - " end: int\n", + " end: int\n", " \n", " def __repr__(self) -> str: \n", " return f'{self.start}-{self.end}' if self.start != self.end else f'{self.start}' " @@ -335,7 +352,7 @@ "id": "dfbfb16b-3779-4cdf-a9f1-97d1076e2c91", "metadata": {}, "source": [ - "This is pretty good, but I think it can be improved. 
The docstring has extraneous text (\"Design a program\") and could be more closely aligned with the body. And it might be more modular to have a function to do one thing (add a *single* number to a list of segments), rather than a function that organizes *all* the numbers into a list of segments. I'll edit the **docstring** first to reflect this change:" + "This is pretty good, but I think it can be improved. The docstring has extraneous text (\"Design a program\") and could be more closely aligned with the body. And it might be more modular to have a function to do one thing (add a *single* number to a list of segments), rather than a function that organizes *all* the numbers into a list of segments. Finally, I realize that converting every number to a singleton segment would (arguably) be consistent with the problem description, but it is clear that the intent was to make segments be as long as possible. I'll edit the **docstring** first to reflect this:" ] }, { @@ -346,8 +363,9 @@ "outputs": [], "source": [ "def segments(numbers: List[Number]) -> List[Segment]:\n", - " \"\"\"Return a segment list that covers all the `numbers`. Iterate through `numbers`\n", - " (in sorted order and without duplicates) and add each number `n` to the evolving segment list.\"\"\"\n", + " \"\"\"Return a sorted, minimal list of segments such that each of `numbers` is in one of the segments. \n", + " Iterate through `numbers` (in sorted order and without duplicates) adding each number to the \n", + " evolving list of segments, making each segment as long as possible.\"\"\"\n", " ..." ] }, @@ -367,8 +385,9 @@ "outputs": [], "source": [ "def segments(numbers: List[Number]) -> List[Segment]:\n", - " \"\"\"Return a segment list that covers all the `numbers`. 
Iterate through `numbers`\n", - " (in sorted order and without duplicates) and add each number `n` to the evolving segment list.\"\"\"\n", + " \"\"\"Return a sorted, minimal list of segments such that each of `numbers` is in one of the segments. \n", + " Iterate through `numbers` (in sorted order and without duplicates) adding each number to the \n", + " evolving list of segments, making each segment as long as possible.\"\"\"\n", " segment_list = []\n", " for n in sorted(without_duplicates(numbers)):\n", " add_to_segment_list(n, segment_list)\n", @@ -397,7 +416,7 @@ "def add_to_segment_list(n: int, segment_list: List[Segment]) -> None:\n", " \"\"\"Mutate `segment_list` to cover `n`. If `n` is one more than the end of the last Segment, \n", " then add `n` to that last segment. Otherwise append a new Segment (for `n`) to `segment_list`.\"\"\"\n", - " if segment_list and segment_list[-1].end + 1 == n:\n", + " if segment_list and segment_list[-1].end == n - 1:\n", " segment_list[-1].end = n\n", " else:\n", " segment_list.append(Segment(n, n)) " @@ -429,7 +448,8 @@ } ], "source": [ - "def test_segments():\n", + "def test_segments() -> bool:\n", + " \"\"\"Unit tests for the `segments` problem.\"\"\"\n", " assert segments([]) == [], \"empty numbers\"\n", " assert str(segments([42])) == '[42]', \"one number\"\n", " assert str(segments([42, 42])) == '[42]', \"one number duplicated\"\n", @@ -439,7 +459,7 @@ " assert str(segments(numbers)) == '[0-4, 7-11, 24]', \"multiple segments\"\n", " for s in segments(numbers):\n", " assert type(s) == Segment and s.start <= s.end, \"result is a list of valid Segments\"\n", - " assert without_duplicates([1, 2, 2, 1]) == {1, 2}, \"no duplicates\"\n", + " assert without_duplicates([1, 2, 2, 1]) == {1, 2}, \"no duplicates\"\n", "\n", " s = []\n", " add_to_segment_list(1, s)\n", @@ -473,7 +493,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.8.15" + "version": "3.9.12" } }, "nbformat": 4,