Add files via upload

2024-04-11 01:02:57 -07:00 · 2024-04-11 01:02:05 -07:00
1 changed file with 343 additions and 51 deletions
--- a/ipynb/DocstringFixpoint.ipynb
+++ b/ipynb/DocstringFixpoint.ipynb
@ -2,46 +2,61 @@
 "cells": [
  {
   "cell_type": "markdown",
-   "id": "4c9cc4a3-6e84-46be-9462-3ac9d12d7a61",
+   "id": "7d22d1c6-0a90-4410-ae48-1a16f094cbf2",
   "metadata": {},
   "source": [
-    "<div style=\"text-align: right\" align=\"right\"><i>Peter Norvig<br>2023</i></div>\n",
+    "<div style=\"text-align: right\" align=\"right\"><i>Peter Norvig<br>Jan 2023</i></div>\n",
    "\n",
    "# Docstring Fixpoint Theory\n",
    "\n",
-    "This notebook makes the following proposal:\n",
+    "This notebook makes a proposal:\n",
    "\n",
-    "- One approach to writing the code for a function is to repeatedly edit the **docstring** and the function **code** until they converge to a **fixpoint** in which there is an obvious one-to-one correspondance between the two.\n",
+    "- One approach to writing a short function (in Python or other programming languages) is to alternate between editing the **docstring** and the **body** of the function until they converge to a **[fixpoint](https://en.wikipedia.org/wiki/Fixed_point)** in which there is an obvious one-to-one correspondence between the two.\n",
    "\n",
-    "This approach follows the first of [Tony Hoare](https://en.wikipedia.org/wiki/Tony_Hoare)'s two methods: *\"There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.\"* Some caveats: \n",
-    "- This approach is not always appropriate! For many functions the docstring is a high-level description and the code has more detail that is not in the docstring.\n",
-    "- The edits to the docstring must maintain the meaning (just change the expression).\n",
+    "This approach follows the first of [Tony Hoare](https://en.wikipedia.org/wiki/Tony_Hoare)'s two methods: *\"There are two methods in software design. One is to make the program so simple, there are obviously no errors. The other is to make it so complicated, there are no obvious errors.\"* \n",
    "\n",
+    "Some caveats: \n",
+    "- This approach is not always appropriate! For many functions, the docstring is a high-level description and the code has more detail that is not in the docstring. Docstring fixpoint theory makes the most sense for very short functions.\n",
+    "- The edits will change the wording, but must maintain the meaning.\n",
    "\n",
-    "\n",
-    "\n",
-    "\n",
-    "# Example: The Rainfall Problem\n",
-    "\n",
-    "The \"Rainfall Problem\" has been used to explore the ways that novices address a programming problem. We will use [Kathi Fisler](https://cs.brown.edu/~kfisler)'s [version](https://cs.brown.edu/~kfisler/Pubs/icer14-rainfall/) of the problem:\n",
-    "\n\n",
-    "- *Design a program called 'rainfall' that consumes a list ",
-    "of numbers representing daily rainfall amounts as entered by a user. The list may contain the number -999 ",
-    "indicating the end of the data of interest. Produce ",
-    "the average of the non-negative values in the list up to ",
-    "the first -999 (if it shows up). There may be negative numbers other than -999 in the list.* \n",
-    "\n",
-    "We start by writing a function prototype containing the complete problem statement as the docstring:"
+    "Python programs are easier to understand with [type hints](https://docs.python.org/3/library/typing.html), and dataclasses are nice, so I'll start with this:"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 11,
+   "execution_count": 1,
+   "id": "158c20d7-c4d4-463e-b228-248f39220cad",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from dataclasses import dataclass\n",
+    "from typing import List, Union\n",
+    "Number = Union[int, float]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6b363922-2e61-425e-8b43-ede837481017",
+   "metadata": {},
+   "source": [
+    "# Example: The Rainfall Problem\n",
+    "\n",
+    "The \"Rainfall Problem,\" initially posed by [Elliot Soloway](https://cdc.engin.umich.edu/elliot-soloway/), has been used to explore the ways that novices address a programming problem. We will use [Kathi Fisler](https://cs.brown.edu/~kfisler)'s [version](https://cs.brown.edu/~kfisler/Pubs/icer14-rainfall/) of the problem:\n",
+    "\n",
+    "\n",
+    "- *Design a program called **rainfall** that consumes a list of numbers representing daily rainfall amounts as entered by a user. The list may contain the number -999 indicating the end of the data of interest. Produce the average of the non-negative values in the list up to the first -999 (if it shows up). There may be negative numbers other than -999 in the list.* \n",
+    "\n",
+    "We start by writing a function prototype containing the complete problem statement as the **docstring**:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
   "id": "4ac87ea7-7ad9-48e2-a21f-fb772e540487",
   "metadata": {},
   "outputs": [],
   "source": [
-    "def rainfall(numbers: list):\n",
+    "def rainfall(numbers: List[Number]) -> Number:\n",
    "    \"\"\"Design a program called rainfall that consumes a list of numbers \n",
    "    representing daily rainfall amounts as entered by a user. \n",
    "    The list may contain the number -999 indicating the end of the data of interest. \n",
@ -55,17 +70,17 @@
   "id": "493ba182-2d75-4caf-9998-c46637fd949f",
   "metadata": {},
   "source": [
-    "We then edit the docstring to delete extraneous parts:"
+    "Now edit the **docstring** to remove extraneous parts, in the process abstracting the problem away from rainfall and focusing on the numbers involved:"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 3,
   "id": "3000881a-f2b0-4e00-9bfa-9c358a94de2c",
   "metadata": {},
   "outputs": [],
   "source": [
-    "def rainfall(numbers):\n",
+    "def rainfall(numbers: List[Number]) -> Number:\n",
    "    \"\"\"Produce the average of the non-negative values in a list of numbers,\n",
    "    up to the first -999 (if it shows up).\"\"\"\n",
    "    ..."
@ -76,20 +91,20 @@
   "id": "be7f6cc2-cd41-4472-8abf-f052a30bb031",
   "metadata": {},
   "source": [
-    "We then write code that mirrors the docstring as closely as possible:"
+    "Next write a function **body** that mirrors the docstring as closely as possible. (It introduces helper functions.)"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 4,
   "id": "15bbdbb7-b16d-4347-80c1-bdac375a95b4",
   "metadata": {},
   "outputs": [],
   "source": [
-    "def rainfall(numbers: list):\n",
+    "def rainfall(numbers: List[Number]) -> Number:\n",
    "    \"\"\"Produce the average of the non-negative values in a list of numbers,\n",
    "    up to the first -999 (if it shows up).\"\"\"\n",
-    "    return mean(non_negative(upto(numbers, -999)))"
+    "    return mean(non_negative(upto(-999, numbers)))"
   ]
  },
  {
@ -97,20 +112,20 @@
   "id": "601a806a-f710-4877-851e-caff50657039",
   "metadata": {},
   "source": [
-    "And lightly edit the docstring once more to bring it into closer compliance with the code:"
+    "Lightly edit the **docstring** once more to bring it into even closer agreement with the code:"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 41,
+   "execution_count": 5,
   "id": "701bb096-d90e-4a45-bfbb-58ea2106fdd5",
   "metadata": {},
   "outputs": [],
   "source": [
-    "def rainfall(numbers: list) -> float:\n",
+    "def rainfall(numbers: List[Number]) -> Number:\n",
    "    \"\"\"Return the mean of the non-negative values in a list of numbers,\n",
    "    up to the first -999 (if it shows up).\"\"\"\n",
-    "    return mean(non_negative(upto(numbers, -999)))"
+    "    return mean(non_negative(upto(-999, numbers)))"
   ]
  },
  {
@ -118,23 +133,26 @@
   "id": "6a1a566c-2113-4d71-a341-4a683d97e64a",
   "metadata": {},
   "source": [
-    "Now fill in the missing bits, `mean`, `upto`, and `non_negative`:"
+    "Fill in the missing bits, `mean`, `upto`, and `non_negative`:"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 42,
+   "execution_count": 6,
   "id": "dc686d52-2c8d-4e2b-be41-d8c354ef903e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from statistics import mean\n",
    "\n",
-    "def upto(items: list, end) -> list:\n",
-    "    \"\"\"The items before the first occurence of `end` (if it shows up).\"\"\"\n",
-    "    return items if (end not in items) else items[:items.index(end)]\n",
+    "def upto(end, items: list) -> list:\n",
+    "    \"\"\"Return the list of items, but if `end` occurs in the list,\n",
+    "    only return the items that appear before `end`.\"\"\"\n",
+    "    return items if end not in items else items[:items.index(end)]\n",
    "\n",
-    "def non_negative(numbers: list) -> list: return [x for x in numbers if x >= 0]  "
+    "def non_negative(numbers: List[Number]) -> List[Number]:\n",
+    "    \"\"\"The numbers that are greater than or equal to 0.\"\"\"\n",
+    "    return [x for x in numbers if x >= 0]"
   ]
  },
  {
@ -142,26 +160,300 @@
   "id": "dd04aa98-ef15-4288-bd70-e08d7d07d071",
   "metadata": {},
   "source": [
-    "Pass some tests, and we're done!"
+    "Write some tests and pass them, and we're done!"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 43,
+   "execution_count": 7,
   "id": "9e6c81b8-1665-47cf-83f3-bf010bb6aa3d",
   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def test_rainfall():\n",
+    "    assert  0/2 == rainfall([0, 0]),               \"no rain\"\n",
+    "    assert  5/1 == rainfall([5]),                  \"one day\"\n",
+    "    assert  6/3 == rainfall([1, 2, 3]),            \"the mean of several days\"\n",
+    "    assert  6/4 == rainfall([0, 1, 2, 3]),         \"the mean (which is a non-integer)\"\n",
+    "    assert  6/4 == rainfall([1, 0, 3, 2]),         \"order doesn't matter\"\n",
+    "    assert  9/3 == rainfall([1, 2, -9, -100, 6]),  \"negative values are ignored\"\n",
+    "    assert  7/5 == rainfall([1, 0, 2, 0, 4]),      \"zero values are not ignored\"\n",
+    "    assert  8/3 == rainfall([1, 2, 5, -999, 404]), \"values after -999 are ignored\"\n",
+    "    return True\n",
+    "    \n",
+    "test_rainfall()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "74522f90-1a09-4fdf-b07c-7d0134efc3e3",
+   "metadata": {},
+   "source": [
+    "# More Complex Example: The Segments Problem\n",
+    "\n",
+    "A [paper](https://www.cambridge.org/core/services/aop-cambridge-core/content/view/ABEA634EB9763953CBCC8D2AC58FE710/S0956796821000216a.pdf/segments-an-alternative-rainfall-problem.pdf) by [Peter Achten](https://www.ru.nl/en/people/achten-p) poses another problem:\n",
+    "\n",
+    "- *Design a program called **segments** that consumes a list of numbers. Produce a list of all elements, without duplicates, and sorted in increasing order. Instead of containing all individual elements, these are organized as segments. A segment is either a single value x or a pair of two values a and b such that a, a+1, ...., b −1, b are in the list (neither a−1 nor b+1 are in the list). The segments must be shown as strings, formatted as x for singleton segments and a−b for the other segments.*\n",
+    "\n",
+    "I'll start by taking the problem description verbatim, and breaking it up into two **docstrings**, one for the function `segments` that was requested in the description, and one for the class `Segment` that I think will be useful."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "ba65e450-bb0c-44bf-94a3-8aef055e6f01",
+   "metadata": {},
   "outputs": [],
   "source": [
-    "def test():\n",
-    "    assert rainfall([3]) == 3,                   \"one day\"\n",
-    "    assert rainfall([0, 0]) == 0,                \"no rain\"\n",
-    "    assert rainfall([1, 2, 3]) == 2,             \"just the mean\"\n",
-    "    assert rainfall([1, 2, 3, 4]) == 2.5,        \"just the mean (which is a non-integer)\"\n",
-    "    assert rainfall([1, 2, 3, 4, 0]) == 2,       \"zero values are counted\"\n",
-    "    assert rainfall([1, 2, 3, 4, -100, 0]) == 2, \"negative values are ignored\"\n",
-    "    assert rainfall([1, 2, 3, -999, 404]) == 2,  \"values after -999 are ignored\"\n",
+    "class Segment:\n",
+    "    \"\"\"A segment is either a single value x or a pair of two values a and b such that \n",
+    "    a, a+1, ...., b −1, b are in the list (neither a−1 nor b+1 are in the list). \n",
+    "    The segments must be shown as strings, formatted as x for singleton segments \n",
+    "    and a−b for the other segments.\"\"\"\n",
+    "    ...\n",
+    "        \n",
+    "def segments(numbers: List[Number]) -> List[Segment]:\n",
+    "    \"\"\"Design a program called segments that consumes a list of numbers. \n",
+    "    Produce a list of all elements, without duplicates, and sorted in increasing order. \n",
+    "    Instead of containing all individual elements, these are organized as segments.\"\"\"\n",
+    "    ..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6e317bd3-dce8-4934-b90a-64a7b76d16d0",
+   "metadata": {},
+   "source": [
+    "I'm not very happy with the docstring for `Segment`. First, the docstring is mixing up two things: what a \"segment\" is, and how it relates to the numbers \"in the list.\" Second, `start` and `end` are better names than `a` and `b` for a segment.\n",
+    "\n",
+    "Here's another try at the **docstring**:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "69712e85-906e-4e52-bbdc-c809b95cbfb2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "class Segment:\n",
+    "    \"\"\"`Segment(start, end)` represents the same sequence of integers as `range(start, end + 1)`,\n",
+    "    but a Segment is mutable. The `repr` is the string 'start-end', or just 'start' if start == end.\"\"\"\n",
+    "    ..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "641aa3b9-f716-4a94-bca7-446a5218743e",
+   "metadata": {},
+   "source": [
+    "Now I'll write the **body** for `Segment`. It will be a dataclass with member variables `start` and `end`, and a `__repr__` magic method. (A full `Segment` class should probably implement the same methods as `range`, but they are not needed for this problem.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "a946e971-c9a1-46ce-9c9c-b5cc6b28630a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@dataclass\n",
+    "class Segment:\n",
+    "    \"\"\"`Segment(start, end)` is like `range(start, end + 1)`, but is mutable and has fewer methods.\n",
+    "    `repr(Segment(start, end))` is the string 'start-end', or just 'start' if start == end.\"\"\"\n",
+    "    start: int\n",
+    "    end: int\n",
    "    \n",
-    "test()"
+    "    def __repr__(self) -> str:\n",
+    "        return f'{self.start}-{self.end}' if self.start != self.end else f'{self.start}'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8350f623-c363-4fca-935c-b355e983bf1c",
+   "metadata": {},
+   "source": [
+    "Let's make sure this works:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "7ed48c0d-360f-47fc-acb3-8dc59f548d78",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[7-11, 25]"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "[Segment(7, 11), Segment(25, 25)]"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "be9c2978-0609-4b39-aff4-aa77e37f6765",
+   "metadata": {},
+   "source": [
+    "Now the **body** of the function `segments`: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "c39f92a8-03d2-4e80-8c24-0450ff8d989a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def segments(numbers: List[Number]) -> List[Segment]:\n",
+    "    \"\"\"Design a program called `segments` that consumes a list of numbers. \n",
+    "    Produce a list of all elements, without duplicates, and sorted in increasing order. \n",
+    "    Instead of containing all individual elements, these are organized as segments.\"\"\"\n",
+    "    return organize_as_segments(sorted(without_duplicates(numbers)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "dfbfb16b-3779-4cdf-a9f1-97d1076e2c91",
+   "metadata": {},
+   "source": [
+    "This is pretty good, but I think it can be improved. The docstring has extraneous text (\"Design a program\") and could be more closely aligned with the body. It might also be more modular to have a function that does one thing (add a single number to a list of segments), rather than a function that organizes all the numbers into a list of segments. I'll edit the **docstring** first to reflect this change:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "f501c7aa-2077-4f1e-96fd-ea85ca1c20ed",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def segments(numbers: List[Number]) -> List[Segment]:\n",
+    "    \"\"\"Return a segment list that covers all the `numbers`. Iterate through `numbers`\n",
+    "    (in sorted order and without duplicates) and add each number `n` to the evolving segment list.\"\"\"\n",
+    "    ..."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0b0280bc-933f-4376-aedf-0df7e55d2137",
+   "metadata": {},
+   "source": [
+    "Now I'll edit the **body**:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "ebd7b8ff-1375-4c20-b72c-c72c6497f925",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def segments(numbers: List[Number]) -> List[Segment]:\n",
+    "    \"\"\"Return a segment list that covers all the `numbers`. Iterate through `numbers`\n",
+    "    (in sorted order and without duplicates) and add each number `n` to the evolving segment list.\"\"\"\n",
+    "    segment_list = []\n",
+    "    for n in sorted(without_duplicates(numbers)):\n",
+    "        add_to_segment_list(n, segment_list)\n",
+    "    return segment_list"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "46b60d8d-8ae8-4885-9fe7-2074cc888f33",
+   "metadata": {},
+   "source": [
+    "There's not quite a one-to-one correspondence here between docstring and code. Maybe that means I'm following my advice of \"This approach is not always appropriate!\" Or maybe it means that the pattern of initializing a variable to the empty list is so well-known that I don't need to mention it in the docstring. \n",
+    "\n",
+    "In any case, here are the remaining missing pieces, `without_duplicates` and `add_to_segment_list`:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "fdd08113-3d84-46d9-88e6-727acb82502d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "without_duplicates = set  # A set is a collection with no duplicates\n",
+    "\n",
+    "def add_to_segment_list(n: int, segment_list: List[Segment]) -> None:\n",
+    "    \"\"\"Mutate `segment_list` to cover `n`. If `n` is one more than the end of the last Segment, \n",
+    "    then add `n` to that last segment. Otherwise append a new Segment (covering `n`) to `segment_list`.\"\"\"\n",
+    "    if segment_list and segment_list[-1].end + 1 == n:\n",
+    "        segment_list[-1].end = n\n",
+    "    else:\n",
+    "        segment_list.append(Segment(n, n))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "684a5bc9-6f8f-4c71-88a4-2815120e3d1f",
+   "metadata": {},
+   "source": [
+    "Again, write some tests and pass them, and we're done!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "d6540c49-0782-4245-9c7a-e1ad7f67be0e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 16,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def test_segments():\n",
+    "    assert segments([]) == [],                          \"empty numbers\"\n",
+    "    assert str(segments([42])) == '[42]',               \"one number\"\n",
+    "    assert str(segments([42, 42])) == '[42]',           \"one number duplicated\"\n",
+    "    assert str(segments([1, 2, 3])) == '[1-3]',         \"one segment\"\n",
+    "    assert str(segments([3, 1, 2])) == '[1-3]',         \"same segment from unsorted input\"\n",
+    "    numbers = [4, 0, 4, 8, 3, 1, 10, 2, 9, 7, 11, 24, 7]\n",
+    "    assert str(segments(numbers)) == '[0-4, 7-11, 24]', \"multiple segments\"\n",
+    "    for s in segments(numbers):\n",
+    "        assert isinstance(s, Segment) and s.start <= s.end, \"result is a list of valid Segments\"\n",
+    "    assert without_duplicates([1, 2, 2, 1]) == {1, 2}, \"no duplicates\"\n",
+    "\n",
+    "    s = []\n",
+    "    add_to_segment_list(1, s)\n",
+    "    assert s == [Segment(1, 1)]\n",
+    "    add_to_segment_list(2, s) \n",
+    "    assert s == [Segment(1, 2)]\n",
+    "    add_to_segment_list(4, s)\n",
+    "    assert s == [Segment(1, 2), Segment(4, 4)]\n",
+    "    add_to_segment_list(5, s)\n",
+    "    assert s == [Segment(1, 2), Segment(4, 5)]\n",
+    "    \n",
+    "    return True\n",
+    "\n",
+    "test_segments()"
   ]
  }
 ],
Author	SHA1	Message	Date
Peter Norvig	0321f7280f	Add files via upload	2024-04-11 01:02:57 -07:00
Peter Norvig	ad820b67a1	Add files via upload	2024-04-11 01:02:05 -07:00