From cbb8228d4a5b3c09058ea918ae8d8e5b3ae66500 Mon Sep 17 00:00:00 2001 From: Peter Norvig Date: Sat, 13 Apr 2024 23:39:54 -0700 Subject: [PATCH] Add files via upload --- ipynb/Paint.ipynb | 1073 +++++++++++++++++++++++++-------------------- 1 file changed, 607 insertions(+), 466 deletions(-) diff --git a/ipynb/Paint.ipynb b/ipynb/Paint.ipynb index 43a3479..bfae465 100644 --- a/ipynb/Paint.ipynb +++ b/ipynb/Paint.ipynb @@ -5,7 +5,7 @@ "id": "d3bc52a7-6c7d-49b5-b6d4-b43a74e0ef34", "metadata": {}, "source": [ - "
Peter Norvig, April 2024
\n", + "
Peter Norvig
April 2024
\n", "\n", "# Counting Cluster Sizes\n", "\n", @@ -15,70 +15,114 @@ "\n", "![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F079ab505-66c7-427b-ad6a-a2cf6a1794a6_1600x384.png)\n", "\n", - "Altogether there are 20 squares and 9 clusters, so the average cluster size is about 2.22 squares. Under the assumption that every configuration of red and blue squares is equally likely, Zach poses two questions (and I'll add two more):\n", + "Altogether there are 20 squares and 9 clusters, so the average cluster size is 20/9 ≈ 2.22 squares. Under the assumption that every square is equally likely to be red or blue, Zach poses two questions (and I'll add three more):\n", "\n", "- What is the average cluster size for a grid consisting of a single infinitely long row?\n", "- What is the average cluster size for a grid consisting of two infinitely long rows?\n", "- What is the average cluster size for a grid of any given size *w* × *h*?\n", "- What if there are three or more colors rather than just two?\n", + "- If you pick a random square, what is the average size of its cluster?\n", "\n", "# Code to Make Grids and to Count Clusters\n", "\n", - "My strategy will be to try consider every possible grid of a given size, compute the average cluster size for each grid, and then average across all of them. I'll define two data types and five functions:\n", - "- `Square`: a pair of `(x, y)` coordinates, e.g. `(2, 1)`.\n", - "- `Grid`: data type for a grid; a dict of `{square: contents}`, where `contents` initially starts as a color (e.g. `'R'` or `'B'` for red or blue).\n", - "- `grids(width, height)`: returns a list of all possible grids of the given size. Each grid has a different combination of colors. 
With *c* colors, there are *c*(*w* × *h*) possible grids of size *w* × *h*.\n", - "- `one_grid(width, height, colorseq)`: returns one grid of given size, filling in squares with entries from colorseq.\n", - "- `mean_cluster_size(grid)`: returns the average size of the clusters in a grid.\n", - "- `cluster(grid)`: Mutates the grid so that each square's contents is changed from a color (a string) to a cluster number, where 1 is the number of the first cluster, 2 of the next cluster, and so on. Uses a [flood fill](https://en.wikipedia.org/wiki/Flood_fill) algorithm.\n", - "- `neighbors(square)`: the four squares surrounding the given square.\n" + "I can see three approaches to answering these questions:\n", + "1) Enumerate all possible grids up to a certain size, compute the average cluster size for each grid, and average the averages. This gives an exact answer for grids of a specific size, but it can't say anything about infinite-size grids. In fact it starts getting slow for grids with more than about 20 squares, because there are $2^n$ grids with *n* squares.\n", + "2) Randomly select some grids and average cluster sizes over them. This can handle grids with thousands of squares, but the averages will only be estimates.\n", + "3) Come up with a mathematical proof of the answer for grids of any width *w*.\n", + "\n", + "\n", + "I can easily write code to implement the first two approaches. I'll start with some imports and the definition of three data types:" ] }, { "cell_type": "code", "execution_count": 1, - "id": "d3991ae8-4c2c-4e4f-8884-95fd74aec5be", + "id": "4ddf868c-8902-49f3-8bdd-79d9377d1a11", "metadata": {}, "outputs": [], "source": [ "import itertools \n", + "import random\n", "from statistics import mean, stdev\n", "from typing import *\n", "\n", - "Square = Tuple[int, int]\n", - "Grid = Dict[Square, Union[str, int]] # e.g. 
{(1, 1): 1, (2, 1): 'B'}\n", - "\n", - "def grids(width: int, height: int, colors='RB') -> List[Grid]:\n", + "Square = Tuple[int, int] # A square is a pair of `(x, y)` coordinates, e.g. `(2, 1)`.\n", + "Grid = Dict # A dict of `{square: contents}`, e.g. {(1, 1): 1, (2, 1): 'B'}\n", + "Color = str # A color is represented by a one-character string, e.g. 'R' for red and 'B' for blue\n", + "COLORS = ('R', 'B')\n" + ] + }, + { + "cell_type": "markdown", + "id": "9a0980a5-2a66-4533-b16c-ad3ab810251a", + "metadata": {}, + "source": [ + "Now I'll define the function `all_grids` to make a list of all possible grids of a given size (all possible ways to color each square), and `random_grids` to generate `N` randomly colored grids of a given size. The helper function `one_grid` makes a single grid from a sequence of colors." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "3fa0280d-3e62-4645-8096-a9156b170ca1", + "metadata": {}, + "outputs": [], + "source": [ + "def all_grids(width: int, height: int, colors=COLORS) -> List[Grid]:\n", " \"\"\"All possible grids with given width, height, and color set.\"\"\"\n", " return [one_grid(width, height, colorseq) \n", " for colorseq in itertools.product(colors, repeat=(width * height))]\n", "\n", - "def one_grid(width: int, height: int, colorseq: Sequence[str]) -> Grid: \n", + "def random_grids(width: int, height: int, N: int, colors=COLORS) -> List[Grid]:\n", + " \"\"\"N grids of size width × height, filled with random colors.\"\"\"\n", + " return [random_grid(width, height, colors) for _ in range(N)]\n", + "\n", + "def random_grid(width: int, height: int, colors=COLORS) -> Grid:\n", + " \"\"\"A single random grid.\"\"\"\n", + " return one_grid(width, height, [random.choice(colors) for _ in range(width * height)])\n", + "\n", + "def one_grid(width: int, height: int, colorseq: Sequence[Color]) -> Grid: \n", " \"\"\"A grid of given size made from the sequence of colors.\"\"\"\n", " squares = [(x, y) for y in range(height) for 
x in range(width)]\n", - " return dict(zip(squares, colorseq))\n", - "\n", - "def mean_cluster_size(grid: Grid) -> float: \n", - " \"\"\"Mean size of clusters in a grid.\"\"\"\n", - " return len(grid) / max(cluster(grid).values())\n", - " \n", - "def cluster(grid: Grid) -> Grid:\n", - " \"\"\"Do a flood fill, replacing one cluster of adjacent characters with an integer,\n", - " then incrementing the integer for the next cluster and continuing.\"\"\"\n", + " return dict(zip(squares, colorseq))" + ] + }, + { + "cell_type": "markdown", + "id": "fcd29cb3-c7cf-4036-8659-6cd87c116c16", + "metadata": {}, + "source": [ + "Finally, the function `cluster` mutates the grid so that each square's contents is changed from a color to a cluster number, where 1 is the number of the first cluster, 2 of the next cluster, and so on. It uses a [flood fill](https://en.wikipedia.org/wiki/Flood_fill) algorithm." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "998ba6a6-8355-4474-9f78-92918f18656a", + "metadata": {}, + "outputs": [], + "source": [ + "def cluster(grid: Grid[Square, Color]) -> Grid[Square, int]:\n", + " \"\"\"Mutate grid, replacing colors with cluster numbers.\n", + " Do a flood fill, replacing one cluster of adjacent colors with an integer cluster number,\n", + " then incrementing the cluster number and continuing on to find the next cluster.\"\"\"\n", " cluster_number = 0 \n", " for square in grid:\n", " c = grid[square]\n", - " if isinstance(c, str):\n", + " if isinstance(c, Color):\n", " cluster_number += 1\n", - " # Assign cluster number to square and all its neighbors with the same contents\n", - " Q = [square] # queue of squares in cluster `i`\n", + " # Assign `cluster_number` to `square` and all its neighbors of the same color\n", + " Q = [square] # queue of squares in cluster `cluster_number`\n", " while Q: \n", " sq = Q.pop()\n", - " if grid.get(sq) == c: \n", + " if sq in grid and grid[sq] == c: \n", " grid[sq] = cluster_number\n", " 
Q.extend(neighbors(sq))\n", " return grid\n", "\n", + "def mean_cluster_size(grids: Collection[Grid]) -> float: \n", + " \"\"\"Mean, over all the grids, of each grid's mean cluster size. Mutates the grids (via `cluster`).\"\"\"\n", + " return mean(len(grid) / max(cluster(grid).values()) for grid in grids)\n", + " \n", "def neighbors(square: Square) -> List[Square]:\n", " \"\"\"The four neighbors of a square.\"\"\"\n", " (x, y) = square\n", @@ -92,21 +136,25 @@ "source": [ "# Answering the Questions\n", "\n", - "Here's one more function to help solve the questions:" + "Here's a function to help answer the questions:" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 4, "id": "816f854a-7c99-4022-b539-98f87bc20369", "metadata": {}, "outputs": [], "source": [ - "def do(n, h=1, colors='RB') -> dict: \n", - " \"\"\"A dict of {(w, h): mean-cluster-size} for grids with width w from 1 to n, and with height h.\"\"\"\n", - " for w in range(1, n + 1):\n", - " μ = mean(map(mean_cluster_size, grids(w, h, colors)))\n", - " print(f'{w:2} × {h} grids: {μ:6.4f} mean cluster size')" + "def do(W, h, N=30_000, colors='RB') -> None: \n", + " \"\"\"For each width w from 1 to W, print the mean cluster size of w × h grids.\n", + " If `N` is an integer, randomly sample `N` grids.\n", + " If `N` is `all`, exhaustively enumerate all possible grids.\"\"\"\n", + " which = \"all possible\" if N is all else f\"{N:,d} randomly sampled\"\n", + " print(f' Average cluster size over {which} grids of width 1–{W} and height {h}:')\n", + " for w in range(1, W + 1):\n", + " grids = all_grids(w, h, colors) if N is all else random_grids(w, h, N, colors)\n", + " print(f'{w:2} × {h} grids: {mean_cluster_size(grids):6.4f}')" ] }, { @@ -116,12 +164,12 @@ "source": [ "# One-Row Grids\n", "\n", - "I can't make an infinitely long row (a grid with *n* squares has 2*n* possible color arrangments, it will be slow to investigate grids with more than about 20 squares (a million arrangements)). 
However, I can look at successively wider rows and see if the average cluster size seems to be converging:" + "Let's see what we get. First an exact calculation over all possible grids up to size 14 × 1:" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 5, "id": "e004289f-0c81-46ab-8900-9312a0aa6f7c", "metadata": {}, "outputs": [ @@ -129,46 +177,97 @@ "name": "stdout", "output_type": "stream", "text": [ - " 1 × 1 grids: 1.0000 mean cluster size\n", - " 2 × 1 grids: 1.5000 mean cluster size\n", - " 3 × 1 grids: 1.7500 mean cluster size\n", - " 4 × 1 grids: 1.8750 mean cluster size\n", - " 5 × 1 grids: 1.9375 mean cluster size\n", - " 6 × 1 grids: 1.9688 mean cluster size\n", - " 7 × 1 grids: 1.9844 mean cluster size\n", - " 8 × 1 grids: 1.9922 mean cluster size\n", - " 9 × 1 grids: 1.9961 mean cluster size\n", - "10 × 1 grids: 1.9980 mean cluster size\n", - "11 × 1 grids: 1.9990 mean cluster size\n", - "12 × 1 grids: 1.9995 mean cluster size\n", - "13 × 1 grids: 1.9998 mean cluster size\n", - "14 × 1 grids: 1.9999 mean cluster size\n" + " Average cluster size over all possible grids of width 1–14 and height 1:\n", + " 1 × 1 grids: 1.0000\n", + " 2 × 1 grids: 1.5000\n", + " 3 × 1 grids: 1.7500\n", + " 4 × 1 grids: 1.8750\n", + " 5 × 1 grids: 1.9375\n", + " 6 × 1 grids: 1.9688\n", + " 7 × 1 grids: 1.9844\n", + " 8 × 1 grids: 1.9922\n", + " 9 × 1 grids: 1.9961\n", + "10 × 1 grids: 1.9980\n", + "11 × 1 grids: 1.9990\n", + "12 × 1 grids: 1.9995\n", + "13 × 1 grids: 1.9998\n", + "14 × 1 grids: 1.9999\n" ] } ], "source": [ - "do(14)" + "do(14, 1, all)" ] }, { "cell_type": "markdown", - "id": "13dd529d-519e-487e-9430-e782c8b23e0f", + "id": "dda7a929-b903-47e2-891a-d4544dfb86ec", "metadata": {}, "source": [ - "For rows that are at least 11 squares wide we're getting very close (within 0.001) to an average cluster size of 2.\n", - "\n", - "Now that I see the result, I can come up with a justification: Every cluster starts with one square. 
The next square in the row will be the same color half the time. Continuing, we would get three of the same color in a row a quarter of the time, four in a row an eigth of the time, and so on. So we get:\n", - "\n", - "mean cluster size   =   Σi∈{0,1,...∞} (1/2)n   =   2\n", - "\n", - "# Two-Row Grids\n", - "\n", - "Now consider grids that are two rows tall:" + "Let's compare that to the random sampling approach, which can handle wider grids:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 6, + "id": "306d2161-07aa-4b8e-9c54-8462e0a0d6ac", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Average cluster size over 30,000 randomly sampled grids of width 1–20 and height 1:\n", + " 1 × 1 grids: 1.0000\n", + " 2 × 1 grids: 1.5025\n", + " 3 × 1 grids: 1.7502\n", + " 4 × 1 grids: 1.8762\n", + " 5 × 1 grids: 1.9374\n", + " 6 × 1 grids: 1.9726\n", + " 7 × 1 grids: 1.9912\n", + " 8 × 1 grids: 1.9987\n", + " 9 × 1 grids: 1.9960\n", + "10 × 1 grids: 2.0011\n", + "11 × 1 grids: 1.9928\n", + "12 × 1 grids: 1.9976\n", + "13 × 1 grids: 2.0020\n", + "14 × 1 grids: 1.9991\n", + "15 × 1 grids: 2.0034\n", + "16 × 1 grids: 1.9952\n", + "17 × 1 grids: 1.9995\n", + "18 × 1 grids: 1.9984\n", + "19 × 1 grids: 1.9925\n", + "20 × 1 grids: 2.0022\n" + ] + } + ], + "source": [ + "do(20, 1, 30_000)" + ] + }, + { + "cell_type": "markdown", + "id": "138d8441-b233-457f-97d6-81946e8a07fa", + "metadata": {}, + "source": [ + "The answer seems to be converging on 2. The random sampling approach has pretty good agreement with the exhaustive approach, always agreeing to within 0.01, and sometimes to within 0.001.\n", + "\n", + "Now that I see these results, I can describe a mathematical justification of why the limit is 2. In fact, I have two justifications:\n", + "\n", + "**First**: what is the average number of new clusters introduced per column? 
Half the time column *k* will be different from the previous column *k* - 1, and thus half the time a column will introduce a new cluster. So with *W* columns there will be about *W*/2 clusters, and the average cluster size will be about *W*/(*W*/2) = 2.\n", + "\n", + "**Second**: what is the average length of a cluster that starts in a given column? Every cluster starts with one square. Half the time, the next square will be the same color, giving two in a row. Continuing, we would get three in a row a quarter of the time, four in a row an eighth of the time, and so on. So using the [formula for the sum of a geometric series](https://en.wikipedia.org/wiki/Geometric_series) we get:\n", + "\n", + "mean cluster size = $\\sum_{i=0}^{\\infty} (1/2)^i = 1 / (1 - 1/2) = 2$\n", + "\n", + "# Two-Row Grids\n", + "\n", + "Now consider grids that are two rows tall, using both approaches:" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 7, "id": "e2f0c3ca-acac-4dd8-9970-c846e3389ad5", "metadata": {}, "outputs": [ @@ -176,39 +275,97 @@ "name": "stdout", "output_type": "stream", "text": [ - " 1 × 2 grids: 1.5000 mean cluster size\n", - " 2 × 2 grids: 2.1250 mean cluster size\n", - " 3 × 2 grids: 2.4688 mean cluster size\n", - " 4 × 2 grids: 2.6682 mean cluster size\n", - " 5 × 2 grids: 2.7920 mean cluster size\n", - " 6 × 2 grids: 2.8737 mean cluster size\n", - " 7 × 2 grids: 2.9305 mean cluster size\n", - " 8 × 2 grids: 2.9717 mean cluster size\n", - " 9 × 2 grids: 3.0027 mean cluster size\n" + " Average cluster size over all possible grids of width 1–9 and height 2:\n", + " 1 × 2 grids: 1.5000\n", + " 2 × 2 grids: 2.1250\n", + " 3 × 2 grids: 2.4688\n", + " 4 × 2 grids: 2.6682\n", + " 5 × 2 grids: 2.7920\n", + " 6 × 2 grids: 2.8737\n", + " 7 × 2 grids: 2.9305\n", + " 8 × 2 grids: 2.9717\n", + " 9 × 2 grids: 3.0027\n" ] } ], "source": [ - "do(9, 2)" + "do(9, 2, all)" ] }, { "cell_type": "markdown", - "id": "8e1ac7a1-f947-4b76-8c5d-d1b26974d018", + "id": 
"de68a7e0-ced6-4c23-a806-3abacfc0927c", "metadata": {}, "source": [ - "It looks like the answer is converging on 3, but it is not as clear as with the first question.\n", - "\n", - "Here's a justification: consider a cluster that extends a certain length all along the top row, with possibly some additional squares below it in the bootom row. You can call this a π-shape, because there is a horizontal bar at the top, and ero or more vertical bars extending down. By the argument above, the average length of the bar is 2. Now for each column in the bar there is a 50% chance that the square below will be the same color and thus join the cluster. So the average cluster size in total is 2 + 2 × 1/2 = 3. By a similar argument, clusters that extend along the bottom row with possibly some squares above also have an average size of 3. But what about an Z-shape or S-shape cluster, which starts along the top (or bottom) row, then switches to the bottom (or top)? The trick is that you can take any Z- or S-shape and turn it into a π-shape by just swapping top and bottom squares when the cluster is on the bottom. This swap doesn't change any probabilities, so a Z-, S-, and π-shaped clusters all have a mean cluster size of 3.\n", - "\n", - "# Arbitrary *w* × *h* Grids\n", - "\n", - "Let's consider grids of size *n* × 3 and *n* × 4:" + "With widths up to 9, the mean is getting close to 3, but I don't think it is converging to exactly 3. 
" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 8, + "id": "c7b56724-5bb7-4395-87b5-0268abb30ff1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Average cluster size over 30,000 randomly sampled grids of width 1–25 and height 2:\n", + " 1 × 2 grids: 1.5005\n", + " 2 × 2 grids: 2.1190\n", + " 3 × 2 grids: 2.4684\n", + " 4 × 2 grids: 2.6611\n", + " 5 × 2 grids: 2.7899\n", + " 6 × 2 grids: 2.8772\n", + " 7 × 2 grids: 2.9399\n", + " 8 × 2 grids: 2.9784\n", + " 9 × 2 grids: 3.0066\n", + "10 × 2 grids: 3.0256\n", + "11 × 2 grids: 3.0587\n", + "12 × 2 grids: 3.0636\n", + "13 × 2 grids: 3.0728\n", + "14 × 2 grids: 3.0804\n", + "15 × 2 grids: 3.0969\n", + "16 × 2 grids: 3.1046\n", + "17 × 2 grids: 3.1094\n", + "18 × 2 grids: 3.1201\n", + "19 × 2 grids: 3.1187\n", + "20 × 2 grids: 3.1297\n", + "21 × 2 grids: 3.1326\n", + "22 × 2 grids: 3.1296\n", + "23 × 2 grids: 3.1403\n", + "24 × 2 grids: 3.1367\n", + "25 × 2 grids: 3.1383\n" + ] + } + ], + "source": [ + "do(25, 2)" + ] + }, + { + "cell_type": "markdown", + "id": "90a6dc1c-c570-4bfd-8fe9-210e80c6a455", + "metadata": {}, + "source": [ + "Now it is clear that the mean is well above 3. At this point I tried to come up with a mathematical analysis but made a mistake. Fortunately, Zach [provides the answer](https://thefiddler.substack.com/p/can-you-eclipse-via-ellipse) by breaking down the possibilities into cases. With the single-row grid, we only needed two cases: column *k* introduced a new cluster 1/2 the time. But with two rows there are 16 cases: each of the two squares in column *k* and the two in column *k* - 1 can be either red or blue. 
Overall, as Zach's diagram below shows, a total of 10 new clusters are introduced across the 16 cases, so the number of new clusters per column is 10/16, and the average cluster size over *W* columns is 2*W* / ((10/16)·*W*) = 16/5 = 3.2.\n", + "\n", + "![](https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb06d848c-37f6-4e9b-be38-a88bfe3e7316_1342x1154.png)" + ] + }, + { + "cell_type": "markdown", + "id": "9c0b08ba-7e6b-42ff-884a-5dd010bcfa3a", + "metadata": {}, + "source": [ + "# Three-Row Grids\n", + "\n", + "Let's next consider grids of size *w* × 3:" + ] + }, + { + "cell_type": "code", - "execution_count": 5, + "execution_count": 9, "id": "b29ddec0-5fd2-4f8e-96af-f4b380c6eb59", "metadata": {}, "outputs": [ @@ -216,39 +373,71 @@ "name": "stdout", "output_type": "stream", "text": [ - " 1 × 3 grids: 1.7500 mean cluster size\n", - " 2 × 3 grids: 2.4688 mean cluster size\n", - " 3 × 3 grids: 2.8986 mean cluster size\n", - " 4 × 3 grids: 3.1591 mean cluster size\n", - " 5 × 3 grids: 3.3266 mean cluster size\n", - " 6 × 3 grids: 3.4408 mean cluster size\n" + " Average cluster size over all possible grids of width 1–6 and height 3:\n", + " 1 × 3 grids: 1.7500\n", + " 2 × 3 grids: 2.4688\n", + " 3 × 3 grids: 2.8986\n", + " 4 × 3 grids: 3.1591\n", + " 5 × 3 grids: 3.3266\n", + " 6 × 3 grids: 3.4408\n" ] } ], "source": [ - "do(6, 3)" + "do(6, 3, all)" ] }, { "cell_type": "code", - "execution_count": 6, - "id": "cb493f9f-734e-4767-b245-24cdf906a06a", + "execution_count": 10, + "id": "7e231093-9e10-446d-8b32-25952b286e1e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - " 1 × 4 grids: 1.8750 mean cluster size\n", - " 2 × 4 grids: 2.6682 mean cluster size\n", - " 3 × 4 grids: 3.1591 mean cluster size\n", - " 4 × 4 grids: 3.4616 mean cluster size\n", - " 5 × 4 grids: 3.6608 mean cluster size\n" + " Average cluster size over 30,000 randomly sampled grids of width 
1–25 and height 3:\n", + " 1 × 3 grids: 1.7507\n", + " 2 × 3 grids: 2.4668\n", + " 3 × 3 grids: 2.8953\n", + " 4 × 3 grids: 3.1560\n", + " 5 × 3 grids: 3.3236\n", + " 6 × 3 grids: 3.4452\n", + " 7 × 3 grids: 3.5230\n", + " 8 × 3 grids: 3.5881\n", + " 9 × 3 grids: 3.6304\n", + "10 × 3 grids: 3.6695\n", + "11 × 3 grids: 3.6897\n", + "12 × 3 grids: 3.7266\n", + "13 × 3 grids: 3.7455\n", + "14 × 3 grids: 3.7636\n", + "15 × 3 grids: 3.7772\n", + "16 × 3 grids: 3.7995\n", + "17 × 3 grids: 3.8018\n", + "18 × 3 grids: 3.8090\n", + "19 × 3 grids: 3.8220\n", + "20 × 3 grids: 3.8290\n", + "21 × 3 grids: 3.8318\n", + "22 × 3 grids: 3.8514\n", + "23 × 3 grids: 3.8500\n", + "24 × 3 grids: 3.8580\n", + "25 × 3 grids: 3.8639\n" ] } ], "source": [ - "do(5, 4)" + "do(25, 3)" + ] + }, + { + "cell_type": "markdown", + "id": "bf56480e-4bb5-40f9-b1a9-c380eb698046", + "metadata": {}, + "source": [ + "The mathematical analysis here is much trickier. With two rows, we can tell how many new clusters are introduced in column *k* just by looking at column *k* - 1. But with three or more rows, that's no longer true. Suppose the top and bottom squares in a column are red and the middle square is blue. Is that one red cluster or two? We have to look a potentially *unbounded* number of columns away to see if they are connected (by some red path that passes through the middle row). So no local analysis can determine the number of new clusters per column; we'll need some new method of analysis.\n", + "\n", + "Interestingly, this is the same limitation that Minsky and Papert noticed in the book [Perceptrons](https://direct.mit.edu/books/monograph/3132/PerceptronsAn-Introduction-to-Computational). Their analysis led many researchers to conclude that neural networks were no good, when they should have concluded that single-layer neural networks are no good but multi-layer ones avoid the limitations." ] }, { @@ -258,12 +447,12 @@ "source": [ "# Adding Colors\n", "\n", - "What if we add a third color? 
That should make the average cluster size smaller (since there is a 2/3 rather than 1/2 chance that the next square will be a different color and start a new cluster). I'll start with a single row:" + "What if we add a third color? That should make the average cluster size smaller (since there are more chances for neighboring squares to be different colors). I'll start with a single row:" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 11, "id": "dc4dc1f5-f5f4-4d69-8bdf-c00d21d978cd", "metadata": {}, "outputs": [ @@ -271,21 +460,23 @@ "name": "stdout", "output_type": "stream", "text": [ - " 1 × 1 grids: 1.0000 mean cluster size\n", - " 2 × 1 grids: 1.3333 mean cluster size\n", - " 3 × 1 grids: 1.4444 mean cluster size\n", - " 4 × 1 grids: 1.4815 mean cluster size\n", - " 5 × 1 grids: 1.4938 mean cluster size\n", - " 6 × 1 grids: 1.4979 mean cluster size\n", - " 7 × 1 grids: 1.4993 mean cluster size\n", - " 8 × 1 grids: 1.4998 mean cluster size\n", - " 9 × 1 grids: 1.4999 mean cluster size\n", - "10 × 1 grids: 1.5000 mean cluster size\n" + " Average cluster size over all possible grids of width 1–11 and height 1:\n", + " 1 × 1 grids: 1.0000\n", + " 2 × 1 grids: 1.3333\n", + " 3 × 1 grids: 1.4444\n", + " 4 × 1 grids: 1.4815\n", + " 5 × 1 grids: 1.4938\n", + " 6 × 1 grids: 1.4979\n", + " 7 × 1 grids: 1.4993\n", + " 8 × 1 grids: 1.4998\n", + " 9 × 1 grids: 1.4999\n", + "10 × 1 grids: 1.5000\n", + "11 × 1 grids: 1.5000\n" ] } ], "source": [ - "do(10, 1, 'RGB')" + "do(11, 1, all, colors='RGB')" ] }, { @@ -293,16 +484,14 @@ "id": "7afd16c2-99bd-410b-b5a8-d20862b7b1f5", "metadata": {}, "source": [ - "This is straightforward: the mean cluster size converges to 3/2. 
We can update the equation to deal with a single row with *c* equiprobable colors:\n", + "This is straightforward: the mean cluster size converges to 3/2; the analysis says that this is true because each column starts a new cluster 2/3 of the time, on average.\n", "\n", - "mean cluster size   =   Σi∈{0,1,...∞} (1/*c*)n   =   *c* / (*c* - 1)\n", - "\n", - "Now for a two-row grid with three colors:" + "Now for a two-row grid with three colors. I'll use random sampling:" ] }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 12, "id": "37f4a825-d75c-4275-b2c4-6dd3de080b3f", "metadata": {}, "outputs": [ @@ -310,17 +499,37 @@ "name": "stdout", "output_type": "stream", "text": [ - " 1 × 2 grids: 1.3333 mean cluster size\n", - " 2 × 2 grids: 1.6543 mean cluster size\n", - " 3 × 2 grids: 1.7737 mean cluster size\n", - " 4 × 2 grids: 1.8261 mean cluster size\n", - " 5 × 2 grids: 1.8535 mean cluster size\n", - " 6 × 2 grids: 1.8698 mean cluster size\n" + " Average cluster size over 30,000 randomly sampled grids of width 1–25 and height 2:\n", + " 1 × 2 grids: 1.3320\n", + " 2 × 2 grids: 1.6609\n", + " 3 × 2 grids: 1.7763\n", + " 4 × 2 grids: 1.8300\n", + " 5 × 2 grids: 1.8569\n", + " 6 × 2 grids: 1.8685\n", + " 7 × 2 grids: 1.8816\n", + " 8 × 2 grids: 1.8874\n", + " 9 × 2 grids: 1.8936\n", + "10 × 2 grids: 1.9009\n", + "11 × 2 grids: 1.9026\n", + "12 × 2 grids: 1.9045\n", + "13 × 2 grids: 1.9031\n", + "14 × 2 grids: 1.9056\n", + "15 × 2 grids: 1.9151\n", + "16 × 2 grids: 1.9119\n", + "17 × 2 grids: 1.9110\n", + "18 × 2 grids: 1.9121\n", + "19 × 2 grids: 1.9106\n", + "20 × 2 grids: 1.9144\n", + "21 × 2 grids: 1.9152\n", + "22 × 2 grids: 1.9139\n", + "23 × 2 grids: 1.9156\n", + "24 × 2 grids: 1.9165\n", + "25 × 2 grids: 1.9156\n" ] } ], "source": [ - "do(6, 2, 'RGB')" + "do(25, 2, colors='RGB')" ] }, { @@ -328,7 +537,7 @@ "id": "3c18f723-447f-4cd6-ba27-59db37e87659", "metadata": {}, "source": [ - "By the same reasoning as we had last time, this should 
converge to 3/2 + 3/2 × 1/3 = 2. (But it would take a lot of computation time to consider the larger grids that would be needed to get closer to 2.)" + "This seems to converge to about 1.92. To analyze the two-row, two-color case we had to look at $2^4 = 16$ cases; with three colors there are $3^4 = 81$ cases. I haven't gone through them all to do the calculation. (But you can.)" ] }, { @@ -336,197 +545,72 @@ "id": "6a36b07b-8e5c-4048-8046-7619b34f6b26", "metadata": {}, "source": [ - "# Random Sampling for Larger Grids\n", + "# Larger Grids\n", "\n", - "I can't analyze an infinite two-dimensional grid.\n", - "\n", - "And I can't even enumerate all the, say, 100 × 100 grids, because there are 210,000 of them.\n", - "\n", - "But I can **randomly sample** a bunch of 100 × 100 grids, and that should give me a good estimate of the full distribution of grids.\n", - "\n", - "First, a function to make a random grid of a given size:" + "What about a grid that is unbounded in all directions? I can't fit that into a finite computer, but I can easily examine random grids of size 100 × 100 or more:" ] }, { "cell_type": "code", - "execution_count": 9, - "id": "51cc5dca-f3b4-4a2b-96af-479683216d4b", - "metadata": {}, - "outputs": [], - "source": [ - "import random\n", - "\n", - "def random_grid(width: int, height: int, colors='RB') -> Grid:\n", - " \"\"\"A width × height grid filled with random colors.\"\"\"\n", - " return one_grid(width, height, [random.choice(colors) for _ in range(width * height)])" - ] - }, - { - "cell_type": "markdown", - "id": "50ac4938-fd46-4e27-89d0-82bca289f37d", - "metadata": {}, - "source": [ - "Here's the mean cluster size of one random grid:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, + "execution_count": 13, "id": "6abd1ae5-852f-459c-ad6e-1e91d91566b0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "7.037297677691766" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - 
"mean_cluster_size(random_grid(100, 100))" - ] - }, - { - "cell_type": "markdown", - "id": "197a9c3e-58e4-4a4d-a37f-c793c98f36e8", - "metadata": {}, - "source": [ - "But that's just one grid. Better to sample *N* different grids (say, *N* = 200) and report on the mean cluster sizes for them:" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "39fd3321-907d-45eb-b06b-70d590427a09", - "metadata": {}, - "outputs": [], - "source": [ - "import matplotlib.pyplot as plt\n", - "\n", - "def report(numbers, bins=30) -> dict:\n", - " \"\"\"Plot a histogram and return statistics on these numbers.\"\"\"\n", - " plt.hist(numbers, bins=bins, rwidth=0.8, align='left')\n", - " return dict(mean=mean(numbers), max=max(numbers), min=min(numbers), \n", - " stdev=stdev(numbers), N=len(numbers))" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "083dc3ea-a395-4eae-b751-347e4384a373", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'mean': 7.239472697041692,\n", - " 'max': 7.9239302694136295,\n", - " 'min': 6.6711140760507,\n", - " 'stdev': 0.19030622926154506,\n", - " 'N': 200}" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": 
"\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "N = 200\n", - "sizes = [mean_cluster_size(random_grid(100, 100)) for _ in range(N)]\n", - "report(sizes)" - ] - }, - { - "cell_type": "markdown", - "id": "9f7b9a04-31ff-44ed-bbf4-19b56c910859", - "metadata": {}, - "source": [ - "A 100 x 100 grid isn't infinite, so let's look at larger grids:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "9e1f69b9-870f-4439-ad62-92d91684392d", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'mean': 7.4134560772843425,\n", - " 'max': 7.651109410864575,\n", - " 'min': 7.116171499733143,\n", - " 'stdev': 0.10453350361308941,\n", - " 'N': 200}" + "7.2028626172827535" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" } ], "source": [ - "report([mean_cluster_size(random_grid(200, 200)) for _ in range(N)])" + "mean_cluster_size(random_grids(100, 100, N=200))" ] }, { "cell_type": "code", "execution_count": 14, - "id": "b580bc66-9bc2-406c-8ce4-806b9c020dc7", + "id": "ac814364-c03d-4888-bbbf-b0d6ac6c6625", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "{'mean': 7.47612464655258,\n", - " 'max': 7.681147051292993,\n", - " 'min': 7.279786459597185,\n", - " 'stdev': 0.07889928048753433,\n", - " 'N': 200}" + "7.400513340948056" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" } ], "source": [ - "report([mean_cluster_size(random_grid(300, 300)) for _ in range(N)])" + "mean_cluster_size(random_grids(200, 200, N=200))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "8824b41d-6224-4156-94a8-6d0ab3e6a1a6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7.468919989722078" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mean_cluster_size(random_grids(300, 300, N=200))" ] }, { @@ -534,7 +618,7 @@ "id": "6d20f32a-8c7a-4456-aa41-d2989e33fd57", "metadata": {}, "source": [ - "I think that what's happening is that the clusters that are near the edge of the grid get arbitrarily cut off, and since the edge is a smaller percentage of the larger grid, the larger grid has a larger mean cluster size, one that is a better representative of what would happen on an infinite grid. But I can't say exactly what the mean converges to." + "I think that what's happening is that the clusters that are near the edge of the grid get arbitrarily cut off, and since the edge is a smaller percentage of larger grids, a larger grid has a larger mean cluster size, one that is a better representative of what would happen on an infinite grid. But I can't say exactly what the mean converges to; probably somewhere around 7.5." ] }, { @@ -542,37 +626,9 @@ "id": "f3f21c52-8baf-4e71-a33c-e6f4cefd0877", "metadata": {}, "source": [ - "# My First Random Cluster\n", + "# Examining Random Clusters\n", "\n", - "The great thing about an *n* × 1 grid is that the clusters can't interfere with each other. Going left-to-right, once one cluster ends, the next one starts, and there's no looping back. But in an arbitrary 2D array there is looping back. 
So I want to get a feel for what the \"first\" cluster on an infinite grid looks like: start by placing a color at the origin square, then recursively place random colors in neighboring squares, extending the cluster, but stop when the cluster is completely surrounded by squares of the other color. " - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "ee2a29b8-7ec6-4558-b86a-5933ddc716f0", - "metadata": {}, - "outputs": [], - "source": [ - "def random_cluster(colors='@.', start=(0, 0)) -> Grid:\n", - " \"\"\"Create just enough of a grid to create a single random cluster.\"\"\"\n", - " cluster = {}\n", - " Q = [start] # A queue of the frontier squares of the cluster\n", - " while Q:\n", - " sq = Q.pop()\n", - " if sq not in cluster:\n", - " cluster[sq] = random.choice(colors)\n", - " if cluster[sq] == cluster[start]:\n", - " Q.extend(neighbors(sq))\n", - " return cluster" - ] - }, - { - "cell_type": "markdown", - "id": "58e3e7bc-2576-4332-921c-04e02da6490b", - "metadata": {}, - "source": [ - "I'll define `show` to print a picture of the cluster:" + "On an *n* × 1 grid a cluster is a straight line. But what do clusters look like on an arbitrarily large grid? I'll define `show` to print a grid (and return the number of colored squares in the grid), and `cluster_at(grid, square)` to return just the part of `grid` that contains `square` and all the other members of square's cluster." 
] }, { @@ -582,100 +638,82 @@ "metadata": {}, "outputs": [], "source": [ - "def show(cluster: Grid) -> int:\n", - " \"\"\"Print a representation of the grid and return the size of the one cluster.\"\"\"\n", - " xs = sorted({x for (x, y) in cluster})\n", - " ys = sorted({y for (x, y) in cluster})\n", + "def show(grid: Grid) -> int:\n", + " \"\"\"Print a representation of the grid and return the number of squares in the grid.\"\"\"\n", + " xs = sorted({x for (x, y) in grid})\n", + " ys = sorted({y for (x, y) in grid})\n", " for y in ys:\n", - " print(*[cluster.get((x, y), ' ') for x in xs])\n", - " return cluster_size(cluster)\n", + " print(*[grid.get((x, y), ' ') for x in xs])\n", + " return len(grid)\n", "\n", - "def cluster_size(cluster) -> int: \n", - " color = cluster[0, 0]\n", - " return Counter(cluster.values())[color]" + "def cluster_at(grid: Grid[Square, int], square: Square, color='#') -> Grid[Square, Color]:\n", + " \"\"\"The cluster that square belongs to in grid. Grid must already be clustered.\"\"\"\n", + " cluster_number = grid[square]\n", + " assert isinstance(cluster_number, int), \"grid must already be clustered by cluster(grid)\"\n", + " return {sq: color for sq in grid if grid[sq] == cluster_number}" + ] + }, + { + "cell_type": "markdown", + "id": "b866bc53-a9b0-45ca-8f85-d796b42fe987", + "metadata": {}, + "source": [ + "I'll make a 300 x 300 grid and then show the cluster at four different squares, chosen more-or-less randomly:" ] }, { "cell_type": "code", "execution_count": 17, - "id": "1827747e-76ac-496e-ad3b-b7cbf8b5201a", + "id": "35daf49d-9ac8-4cea-b85e-1f95cea2d9d3", "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " . . \n", - ". @ @ .\n", - ". @ . \n", - " . 
\n" - ] - }, - { - "data": { - "text/plain": [ - "3" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "random.seed(37)\n", - "show(random_cluster())" + "random.seed(1234)\n", + "\n", + "grid300 = cluster(random_grid(300, 300))" ] }, { "cell_type": "code", "execution_count": 18, - "id": "fd429e22-b8bd-475e-9bb0-cbdac2d57c0e", + "id": "f23766c7-18b2-4c94-85bf-0346c70eaa21", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - " . \n", - " . @ . \n", - " . @ @ . \n", - " . . @ . \n", - " . @ . . @ . \n", - " . @ @ @ @ @ . \n", - " . . @ @ . @ . @ .\n", - " . . @ . @ @ @ @ @ @ @ .\n", - " . @ . @ @ @ @ @ @ . . . \n", - " . @ @ @ @ . . @ . \n", - " . . @ . @ . @ . \n", - " . . @ . . . @ @ @ @ . \n", - " . . @ . . @ @ @ @ @ . . . \n", - " . @ @ @ @ . . @ @ . @ @ @ . \n", - " . @ @ @ . . . @ @ . @ @ . . \n", - " . . @ @ @ @ @ @ . @ . @ . @ @ @ @ . \n", - " . @ @ @ . . @ . @ @ @ @ . @ @ . . . \n", - " . @ . @ . . . @ . @ @ @ @ @ . \n", - " . . @ . . @ @ @ . . . . . \n", - " . @ @ . @ @ . @ . \n", - " . @ . @ @ @ . . \n", - " . @ @ @ @ @ @ @ . \n", - " . @ @ @ . @ @ @ . \n", - " . . . . . @ @ @ . @ @ . \n", - " . @ @ @ . . @ @ @ @ @ @ . @ . \n", - " . @ @ @ . @ @ . . @ @ . . \n", - " . @ @ @ @ @ @ . @ . \n", - " . @ @ . . . . @ @ @ . \n", - " . . . @ . . @ . \n", - ". @ @ . @ @ . . \n", - " . @ @ @ . \n", - " . @ @ @ . \n", - " . . . 
\n" + " # # # # # \n", + " # # # # # # # \n", + " # # # # # # # # \n", + " # # # # # # \n", + " # # # # # # # # # # \n", + " # # # # # # # # # \n", + " # # # # # # # # # # # # \n", + " # # # # # # # # # # # #\n", + " # # # # # # # # # #\n", + " # # # # # # # # # # # # \n", + " # # # # # # # # # # # # # # # # # # \n", + " # # # # # # # # # # # # # # # # # # \n", + " # # # # # # # # # # # # # # # # # # # # \n", + " # # # # # # # # # # \n", + " # # # # # # # # # # # # \n", + "# # # # # # # # # # # # \n", + " # # # # # # # \n", + "# # # # # # # # # \n", + "# # # # # \n", + " # # # # # # # # # # # \n", + " # # # # # # # \n", + " # # # # # # \n", + " # # # # # # # # # \n", + " # # # # \n", + " # \n" ] }, { "data": { "text/plain": [ - "167" + "240" ] }, "execution_count": 18, @@ -684,31 +722,50 @@ } ], "source": [ - "show(random_cluster())" + "show(cluster_at(grid300, (150, 150)))" ] }, { "cell_type": "code", "execution_count": 19, - "id": "13ba443e-c5b9-4897-9075-0833f92081ae", + "id": "408ad7a8-9cd6-4d0a-a7e1-170581eb4c45", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - " . . \n", - ". @ @ . . \n", - " . @ @ @ .\n", - " . . @ .\n", - " . @ .\n", - " . 
\n" + " # # \n", + " # # # # \n", + " # \n", + " # \n", + " # # \n", + " # # \n", + " # \n", + " # # # # # # # # # # \n", + " # # # # # # # # # \n", + " # # # # # # # \n", + " # # # # # # # # \n", + " # # # # # # # # # # # # \n", + " # # # # # # # \n", + " # # # # # # # # # \n", + " # # # # # # # # # # \n", + " # # # # # # # # # # # # #\n", + " # # # # # # # # # # # # \n", + " # # # # # # # # # # # # # # # # \n", + "# # # # # # # # # # # \n", + "# # # # # # # # \n", + " # # # # # \n", + " # # # # # # # # \n", + " # # \n", + "# # # \n", + " # \n" ] }, { "data": { "text/plain": [ - "7" + "164" ] }, "execution_count": 19, @@ -717,7 +774,105 @@ } ], "source": [ - "show(random_cluster())" + "show(cluster_at(grid300, (150, 200)))" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "1827747e-76ac-496e-ad3b-b7cbf8b5201a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " # # # \n", + " # # # # # # \n", + " # # # # \n", + " # # # # # \n", + " # # # # # # # \n", + " # # # # # # # #\n", + " # # # # # # # # # \n", + " # # # # # # # # # # \n", + " # # # # # \n", + " # # \n", + " # # \n", + " # # # # # # \n", + " # # # # # \n", + " # # # # # # # # \n", + " # # # # # # # \n", + " # # # \n", + " # # # # \n", + " # # # # # # \n", + " # # # # # # # \n", + " # # # # # # # \n", + " # # # # # # # \n", + " # # # # # # # # # # \n", + " # # # # # # # # # # # # \n", + " # # # # # # \n", + " # # # # # # # \n", + " # # # # # # # # \n", + " # # # # # # # # # \n", + " # # # # # # # # \n", + " # # # # # # # \n", + " # # # # # # # # # # # \n", + " # # # # # # # # # # # # # # \n", + " # # # # # # # # # # # # # \n", + "# # # # # # # # # # # # \n", + " # # # # # \n", + " # # # # # \n", + " # # # # # # # # \n", + " # # # \n", + " # # # # # # # \n", + " # # # # # \n", + " # # # # # \n", + "# # # # # # # # # \n", + " # # # # # # # \n", + " # \n" + ] + }, + { + "data": { + "text/plain": [ + "293" + ] + }, + "execution_count": 20, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "show(cluster_at(grid300, (200, 150)))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "fd429e22-b8bd-475e-9bb0-cbdac2d57c0e", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "#\n" + ] + }, + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "show(cluster_at(grid300, (200, 200)))" ] }, { @@ -725,100 +880,83 @@ "id": "faf079b0-8534-4d92-aa8e-acd035dc5420", "metadata": {}, "source": [ - "We can see that there is a lot of variation in size and shape of the first clusters. We can also see that the shape will constrain other clusters: there are some small clusters of the second color that are trapped inside the cluster of the first color. Below I sample 10,000 first clusters and report on their sizes:" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "ab6ba895-aa51-41af-baf1-f09ee47e973f", - "metadata": {}, - "outputs": [], - "source": [ - "sizes = [cluster_size(random_cluster()) for _ in range(10_000)]" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "df7f6ae0-32aa-4002-8056-9de0dc2da07a", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'mean': 59.6728, 'max': 917, 'min': 1, 'stdev': 75.3054688022268, 'N': 10000}" - ] - }, - "execution_count": 21, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "report(sizes)" - ] - }, - { - "cell_type": "markdown", - "id": "7d2c2f1a-5b51-4d02-a35c-05d6f90e5a09", - "metadata": {}, - "source": [ - "There's quite a spread, with the standard deviation being larger than the mean, and the maximum being about 10 times the mean. It might help to zoom in on the smaller sizes:" + "We see that there is a lot of variation in size and shape of the clusters. To get a better idea of the variation I introduce `cluster_counts(S)` to do the following: make an S x S grid; cluster it; define `sizes` such that `sizes[cluster_number]` gives the cluster's size; define the set of squares in the `perimeter`; define the set of clusters that touch the perimeter (and thus have a size that might be smaller than it should be); and finally return a Counter that for each cluster size gives the number of squares that are in a cluster of that size:" ] }, { "cell_type": "code", "execution_count": 22, - "id": "3dcbff13-ddb1-4ba0-9650-3b6b6122f76a", + "id": "77c0aba0-700d-486e-ba9e-d70dbe11097e", + "metadata": {}, + "outputs": [], + "source": [ + "def cluster_counts(S: int) -> Counter:\n", + " \"\"\"For a random SxS grid, return a counter of {cluster_size: squares_with_that_size},\n", + " for all squares that are not part of a cluster that touches the perimeter.\"\"\"\n", + " grid = cluster(random_grid(S, S)) # Grid of {square: cluster_number}\n", + " sizes = Counter(grid.values()) # Counter of {cluster_number: cluster_size}\n", + " perimeter = cross([0, S - 1], range(S)) | cross(range(S), [0, S - 1]) # Squares on perimeter\n", + " perimeter_clusters = {grid[p] for p in perimeter} # Cluster numbers on perimeter squares\n", + " return Counter(sizes[grid[sq]] for sq in grid if grid[sq] not in perimeter_clusters)\n", + "\n", + "def cross(xs, ys) -> Set[Square]: return {(x, y) for x in xs for y in ys}" + ] + }, + { + "cell_type": "markdown", + "id": 
"a314bce0-fe3c-490c-a859-3650266542c2", + "metadata": {}, + "source": [ + "I'll then define `counter_stats` to return a dict of statistics about a counter, and use it to explore our cluster counts:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "id": "44f89808-d7a6-47ec-8945-d6f6c4affbb9", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "{'mean': 7.851026539809714,\n", - " 'max': 20,\n", - " 'min': 1,\n", - " 'stdev': 5.816812011424741,\n", - " 'N': 3994}" + "{'mean': 58.88548665248869,\n", + " 'N': 974751,\n", + " 'range': range(1, 734),\n", + " 'common': [(1, 62777),\n", + " (2, 31428),\n", + " (3, 28983),\n", + " (4, 25780),\n", + " (5, 23480),\n", + " (6, 21900),\n", + " (7, 20412),\n", + " (8, 18168),\n", + " (9, 17316),\n", + " (10, 16280)]}" ] }, - "execution_count": 22, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" - }, - { - "data": { - "image/png": "\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" } ], "source": [ - "smaller = [s for s in sizes if s <= 20]\n", - "report(smaller, bins=range(22))" + "def counter_stats(counter: Counter, common=10) -> dict:\n", + " \"\"\"Return a dict of some statistics for the values in this counter.\"\"\"\n", + " return dict(mean=mean(counter.elements()), N=sum(counter.values()), \n", + " range=range(min(counter), max(counter) + 1), common=counter.most_common(common))\n", + "\n", + "counter_stats(cluster_counts(1000))" ] }, { "cell_type": "markdown", - "id": "e93bc126-d2c9-415d-97cf-ee07bd36502d", + "id": "b0005025-4a9e-4fe9-ac04-0d1ce43812f8", "metadata": {}, "source": [ - "We see that there are over 600 clusters of size 1. That makes sense; we start with one square, and about once every 24 times all 4 neighbors will be the other color, and 10000 / 16 = 625." + "Wait a minute ... earler we saw that the average cluster size on a 300 x 300 grid was about 7.5. But the `counter_stats` here give a `mean` cluster size of somewhere around 60. How can that be? It is because the two numbers are answering two different questions. **7.5** is the answer to \"if we randomly pick a **cluster**, what is its expected size\"? **60** is the answer to \"if we randomly pick a **square**, what is the expected size of the square's cluster?\"\n", + "\n", + "In the output from `counter_stats`, the `common` entry gives a list of the ten most common sizes (which happen to be the integers from 1 to 10), paired with the counts of how many squares have that size cluster. On average, 1/16 of the clusters have size 1, (when all 4 neighbors of the start square get the opposite color). Half as many have size 2. 
A cluster with hundreds of squares is far less common, but certainly possible.\n", + "\n" ] }, { @@ -833,12 +971,12 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 24, "id": "da7a19ef-d47f-4c69-8fe5-cfaf79861dab", "metadata": {}, "outputs": [], "source": [ - "assert grids(1, 3) == [\n", + "assert all_grids(1, 3) == [\n", " {(0, 0): 'R', (0, 1): 'R', (0, 2): 'R'},\n", " {(0, 0): 'R', (0, 1): 'R', (0, 2): 'B'},\n", " {(0, 0): 'R', (0, 1): 'B', (0, 2): 'R'},\n", @@ -848,7 +986,7 @@ " {(0, 0): 'B', (0, 1): 'B', (0, 2): 'R'},\n", " {(0, 0): 'B', (0, 1): 'B', (0, 2): 'B'}]\n", "\n", - "assert grids(1, 2, 'RGB') == [\n", + "assert all_grids(1, 2, 'RGB') == [\n", " {(0, 0): 'R', (0, 1): 'R'},\n", " {(0, 0): 'R', (0, 1): 'G'},\n", " {(0, 0): 'R', (0, 1): 'B'},\n", @@ -862,7 +1000,7 @@ "grid6x1 = one_grid(6, 1, 'RRBBBR')\n", "assert grid6x1 == {(0, 0): 'R', (1, 0): 'R', (2, 0): 'B', (3, 0): 'B', (4, 0): 'B', (5, 0): 'R'}\n", "assert cluster(grid6x1) == {(0, 0): 1, (1, 0): 1, (2, 0): 2, (3, 0): 2, (4, 0): 2, (5, 0): 3}\n", - "assert mean_cluster_size(grid6x1) == 2\n", + "assert mean_cluster_size([grid6x1]) == 2\n", "\n", "grid5x3 = one_grid(5, 3, 'RR:RR'\n", " '.R:R.'\n", @@ -871,13 +1009,16 @@ " (0, 0): 1, (1, 0): 1, (2, 0): 2, (3, 0): 1, (4, 0): 1,\n", " (0, 1): 3, (1, 1): 1, (2, 1): 2, (3, 1): 1, (4, 1): 4,\n", " (0, 2): 3, (1, 2): 1, (2, 2): 1, (3, 2): 1, (4, 2): 4}\n", - "assert mean_cluster_size(grid5x3) == 3.75\n", + "assert mean_cluster_size([grid5x3]) == 3.75\n", "\n", "grid10x2 = one_grid(10, 2, 'RBRBRRRBBR' # Example from diagram at top of notebook\n", " 'RRRRBBRBRB') \n", - "assert mean_cluster_size(grid10x2) == 20/9\n", + "assert mean_cluster_size([grid10x2]) == 20/9\n", "\n", - "assert len(random_grid(3, 4)) == 12" + "grid4x3 = random_grid(3, 4)\n", + "assert len(grid4x3) == 12\n", + "\n", + "assert cross((10, 20, 30), (1, 2, 3)) == {(10, 1), (10, 2), (10, 3), (20, 1), (20, 2), (20, 3), (30, 1), (30, 2), (30, 3)}" ] } ],
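The markdown cell above explains the gap between about 7.5 (the expected size of a randomly chosen cluster) and about 60 (the expected cluster size seen from a randomly chosen square). A cluster of size *s* is hit *s* times as often when we sample squares, so the per-square mean is Σs²/Σs, while the per-cluster mean is Σs divided by the number of clusters. The sketch below checks this on two grids from the tests: the 6 × 1 grid `RRBBBR` and the 10 × 2 diagram example. It is a self-contained illustration. `cluster_sizes`, `per_cluster_mean`, `per_square_mean`, and `per_cluster_mean_from_counts` are hypothetical helpers written for this sketch, not the notebook's own functions. The flood fill assumes 4-connected neighbors, as in the notebook's examples.

```python
from collections import Counter


def cluster_sizes(rows):
    """Return the sizes of the 4-connected same-color clusters in a grid.
    `rows` is a list of equal-length strings, one color character per square.
    (Hypothetical helper for this sketch; not the notebook's `cluster` function.)"""
    grid = {(x, y): color for y, row in enumerate(rows) for x, color in enumerate(row)}
    seen = set()
    sizes = []
    for start in grid:
        if start in seen:
            continue
        # Flood fill from `start`, counting every same-color square reached.
        seen.add(start)
        stack, size = [start], 0
        while stack:
            x, y = stack.pop()
            size += 1
            for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nbr in grid and nbr not in seen and grid[nbr] == grid[start]:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)
    return sizes


def per_cluster_mean(sizes):
    """Expected size of a randomly chosen cluster: sum(s) / number of clusters."""
    return sum(sizes) / len(sizes)


def per_square_mean(sizes):
    """Expected cluster size seen from a randomly chosen square.
    A cluster of size s holds s squares, so it is weighted by s: sum(s*s) / sum(s)."""
    return sum(s * s for s in sizes) / sum(sizes)


def per_cluster_mean_from_counts(counts):
    """Recover the per-cluster mean from a Counter of {cluster_size: number_of_squares}.
    n squares in clusters of size s make up n / s clusters."""
    n_squares = sum(counts.values())
    n_clusters = sum(n / s for s, n in counts.items())
    return n_squares / n_clusters


def square_counts(sizes):
    """Build a {cluster_size: number_of_squares} Counter from a list of cluster sizes."""
    counts = Counter()
    for s in sizes:
        counts[s] += s
    return counts


sizes6x1 = cluster_sizes(['RRBBBR'])
print(sizes6x1)                    # [2, 3, 1]
print(per_cluster_mean(sizes6x1))  # 2.0
print(per_square_mean(sizes6x1))   # 14/6, about 2.33

sizes10x2 = cluster_sizes(['RBRBRRRBBR', 'RRRRBBRBRB'])
print(per_cluster_mean(sizes10x2))  # 20/9, about 2.22
print(per_square_mean(sizes10x2))   # 3.5
print(per_cluster_mean_from_counts(square_counts(sizes10x2)))  # 20/9 again
```

In the 10 × 2 example the per-cluster mean agrees with the notebook's `mean_cluster_size([grid10x2]) == 20/9` test. The per-square mean is larger, for the same reason 60 exceeds 7.5 on the big grid. One caveat applies if you convert `cluster_counts` output this way: it leaves out clusters that touch the perimeter, so the result would be a mean over interior clusters only.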