diff --git a/ipynb/SpellingBee.ipynb b/ipynb/SpellingBee.ipynb index 3dae3b7..7f42b25 100644 --- a/ipynb/SpellingBee.ipynb +++ b/ipynb/SpellingBee.ipynb @@ -8,7 +8,7 @@ "\n", "# Spelling Bee Puzzle\n", "\n", - "The [3 Jan. 2020 edition of the 538 Riddler](https://fivethirtyeight.com/features/can-you-solve-the-vexing-vexillology/) concerns the popular NYTimes [Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee) puzzle:\n", + "The [3 Jan. 2020 edition of the **538 Riddler**](https://fivethirtyeight.com/features/can-you-solve-the-vexing-vexillology/) concerns the popular NYTimes [**Spelling Bee**](https://www.nytimes.com/puzzles/spelling-bee) puzzle:\n", "\n", "> In this game, seven letters are arranged in a **honeycomb** lattice, with one letter in the center. Here’s the lattice from Dec. 24, 2019:\n", "> \n", @@ -19,29 +19,17 @@ "> 2. The word must include the central letter.\n", "> 3. The word cannot include any letter beyond the seven given letters.\n", ">\n", - ">Note that letters can be repeated. For example, the words GAME and AMALGAM are both acceptable words. Four-letter words are worth 1 point each, while five-letter words are worth 5 points, six-letter words are worth 6 points, seven-letter words are worth 7 points, etc. Words that use all of the seven letters in the honeycomb are known as **pangrams** and earn 7 bonus points (in addition to the points for the length of the word). So in the above example, MEGAPLEX is worth 15 points.\n", + ">Note that letters can be repeated. For example, the words GAME and AMALGAM are both acceptable words. Four-letter words are worth 1 point each, while five-letter words are worth 5 points, six-letter words are worth 6 points, etc. Words that use all seven letters in the honeycomb are known as **pangrams** and earn 7 bonus points (in addition to the points for the length of the word). 
So in the above example, MEGAPLEX is worth 8 + 7 = 15 points.\n", ">\n", - "> ***Which seven-letter honeycomb results in the highest possible game score?*** To be a valid choice of seven letters, no letter can be repeated, it must not contain the letter S (that would be too easy) and there must be at least one pangram.\n", + "> ***Which seven-letter honeycomb results in the highest possible score?*** To be a valid choice of seven letters, no letter can be repeated, it must not contain the letter S (that would be too easy) and there must be at least one pangram.\n", ">\n", "> For consistency, please use [this word list](https://norvig.com/ngrams/enable1.txt) to check your game score.\n", "\n", "\n", "\n", - "Since the referenced [word list](https://norvig.com/ngrams/enable1.txt) came from [*my* web site](https://norvig.com/ngrams/enable1.txt), I felt somewhat compelled to solve this one. (Note I didn't make up the word list; it is a standard Scrabble word list that I happen to host a copy of.) I'll show you how I address the problem, step by step:" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Step 1: Words, Word Scores, and Pangrams\n", + "Since the referenced [word list](https://norvig.com/ngrams/enable1.txt) came from [***my*** web site](https://norvig.com/ngrams), I felt somewhat compelled to solve this one. (Note it is a standard Scrabble® word list that I happen to host a copy of; I didn't curate it.) \n", "\n", - "Let's start by defining some basics:\n", - "\n", - "- A **valid word** is a string of at least 4 letters, with no 'S', and not more than 7 distinct letters.\n", - "- A **word list** is, well, a list of words.\n", - "- A **pangram** is a word with exactly 7 distinct letters; it scores a **pangram bonus** of 7 points.\n", - "- The **word score** is 1 for a four letter word, or the length of the word for longer words, plus any pangram bonus.\n" + "I'll show you how I address the problem. 
First some imports, then we'll work through 10 steps." ] }, { @@ -50,11 +38,24 @@ "metadata": {}, "outputs": [], "source": [ - "from typing import List, Set, Tuple, Dict\n", "from collections import Counter, defaultdict\n", "from dataclasses import dataclass\n", "from itertools import combinations\n", - "import matplotlib.pyplot as plt" + "from typing import List, Set, Dict, Tuple" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 1: Words, Word Scores, and Pangrams\n", + "\n", + "Let's start by defining some basic terms:\n", + "\n", + "- **valid word**: a string of at least 4 letters ('A' to 'Z' but not 'S'), and not more than 7 distinct letters.\n", + "- **word list**: a list of valid words.\n", + "- **pangram**: a word with exactly 7 distinct letters.\n", + "- **word score**: 1 for a four letter word, or the length of the word for longer words, plus 7 for a pangram." ] }, { @@ -63,23 +64,19 @@ "metadata": {}, "outputs": [], "source": [ - "Word = str # Type for a word\n", - "\n", - "def valid(word) -> bool:\n", - " \"\"\"Does word have at least 4 letters, no 'S', and no more than 7 distinct letters?\"\"\"\n", + "def is_valid(word) -> bool:\n", + " \"\"\"Is word 4 or more letters, no 'S', and no more than 7 distinct letters?\"\"\"\n", " return len(word) >= 4 and 'S' not in word and len(set(word)) <= 7\n", "\n", - "def valid_words(text, valid=valid) -> List[Word]: \n", - " \"\"\"All the valid words in text.\"\"\"\n", - " return [w for w in text.upper().split() if valid(w)]\n", + "def word_list(text) -> List[str]: \n", + " \"\"\"All the valid words in text (uppercased).\"\"\"\n", + " return [w for w in text.upper().split() if is_valid(w)]\n", "\n", - "def pangram_bonus(word) -> int: \n", - " \"\"\"Does a word get a bonus for having 7 distinct letters?\"\"\"\n", - " return 7 if len(set(word)) == 7 else 0\n", + "def is_pangram(word) -> bool: return len(set(word)) == 7\n", "\n", "def word_score(word) -> int: \n", " \"\"\"The points for this 
word, including bonus for pangram.\"\"\"\n", - " return 1 if len(word) == 4 else len(word) + pangram_bonus(word)" + " return 1 if len(word) == 4 else len(word) + 7 * is_pangram(word)" ] }, { @@ -97,7 +94,7 @@ { "data": { "text/plain": [ - "['GAME', 'AMALGAM', 'GLAM', 'MEGAPLEX', 'CACCIATORE', 'EROTICA']" + "['AMALGAM', 'CACCIATORE', 'EROTICA', 'GAME', 'GLAM', 'MEGAPLEX']" ] }, "execution_count": 3, @@ -106,7 +103,7 @@ } ], "source": [ - "mini = valid_words('game amalgam amalgamation glam gem gems em megaplex cacciatore erotica')\n", + "mini = word_list('amalgam amalgamation cacciatore erotica em game gem gems glam megaplex')\n", "mini" ] }, @@ -114,7 +111,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Note that `gem` and `em` are too short, `gems` has an `s` which is not allowed, and `amalgamation` has too many distinct letters (8). We're left with six valid words out of the ten candidate words. Here are examples of the other two functions in action:" + "Note that `em` and `gem` are too short, `gems` has an `s` which is not allowed, and `amalgamation` has too many distinct letters (8). We're left with six valid words out of the ten candidate words. Here are examples of the other two functions in action:" ] }, { @@ -134,7 +131,7 @@ } ], "source": [ - "{w for w in mini if pangram_bonus(w)}" + "{w for w in mini if is_pangram(w)}" ] }, { @@ -145,12 +142,12 @@ { "data": { "text/plain": [ - "{'GAME': 1,\n", - " 'AMALGAM': 7,\n", - " 'GLAM': 1,\n", - " 'MEGAPLEX': 15,\n", + "{'AMALGAM': 7,\n", " 'CACCIATORE': 17,\n", - " 'EROTICA': 14}" + " 'EROTICA': 14,\n", + " 'GAME': 1,\n", + " 'GLAM': 1,\n", + " 'MEGAPLEX': 15}" ] }, "execution_count": 5, @@ -166,20 +163,34 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Step 2: Honeycombs and Game Scores\n", + "# 2: Honeycombs and Lettersets\n", "\n", - "In a honeycomb the order of the letters doesn't matter; all that matters is:\n", - " 1. The seven distinct letters in the honeycomb.\n", - " 2. 
The one distinguished center letter.\n", - " \n", - "Thus, we can represent a honeycomb as follows:\n", - " " + "A honeycomb lattice consists of (1) a set of seven distinct letters and (2) the one distinguished center letter:\n" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, + "outputs": [], + "source": [ + "Letter = Letterset = str # Types\n", + "\n", + "@dataclass(frozen=True, order=True)\n", + "class Honeycomb:\n", + " \"\"\"A Honeycomb lattice, with 7 letters, 1 of which is the center.\"\"\"\n", + " letters: Letterset # 7 letters\n", + " center: Letter # 1 letter\n", + " \n", + "def letterset(word) -> Letterset:\n", + " \"\"\"The set of letters in a word, represented as a sorted str.\"\"\"\n", + " return ''.join(sorted(set(word)))" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, "outputs": [ { "data": { @@ -187,18 +198,13 @@ "Honeycomb(letters='AEGLMPX', center='G')" ] }, - "execution_count": 6, + "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "@dataclass(frozen=True, order=True)\n", - "class Honeycomb:\n", - " letters: str # 7 letters\n", - " center: str # 1 letter\n", - "\n", - "hc = Honeycomb('AEGLMPX', 'G')\n", + "hc = Honeycomb(letterset('MEGAPLEX'), 'G')\n", "hc" ] }, @@ -206,12 +212,38 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "The **game score** for a honeycomb is the sum of the word scores for all the words that the honeycomb can make. How do we know if a honeycomb can make a word? It can if (1) the word contains the honeycomb's center and (2) every letter in the word is in the honeycomb. " + "The type `Letter` is a `str` of 1 letter and `Letterset` is an unordered collection of letters, which I will represent as a sorted `str`. Why not a Python `set` or `frozenset`? Because a `str` takes up less space in memory, and its printed representation is easier to read when debugging. 
Compare:\n", + "- `frozenset({'A', 'E', 'G', 'L', 'M', 'P', 'X'})`\n", + "- `'AEGLMPX'`\n", + "\n", + "Why sorted? So that equal lettersets are equal:" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "assert letterset('EROTICA') == letterset('CACCIATORE') == 'ACEIORT'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# 3: Game Score\n", + "\n", + "The **game score** for a honeycomb is the sum of the word scores for all the words that the honeycomb **can make**. \n", + "\n", + "A honeycomb can make a word if\n", + "(1) the word contains the honeycomb's center, and\n", + "(2) every letter in the word is in the honeycomb. " + ] + }, + { + "cell_type": "code", + "execution_count": 9, "metadata": {}, "outputs": [], "source": [ @@ -227,36 +259,16 @@ }, { "cell_type": "code", - "execution_count": 8, + "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "24" + "{'AMALGAM': 7, 'GAME': 1, 'GLAM': 1, 'MEGAPLEX': 15}" ] }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "game_score(hc, mini)" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'GAME': 1, 'AMALGAM': 7, 'GLAM': 1, 'MEGAPLEX': 15}" - ] - }, - "execution_count": 9, + "execution_count": 10, "metadata": {}, "output_type": "execute_result" } @@ -265,62 +277,25 @@ "{w: word_score(w) for w in mini if can_make(hc, w)}" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Step 3: Best Honeycomb\n", - "\n", - "Now that we can compute the game score of a honeycomb, a strategy for finding the best honeycomb is:\n", - " - Compile a list of valid candidate honeycombs.\n", - " - For each one, compute the game score.\n", - " - Return the one with the highest game score." 
- ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [], - "source": [ - "def best_honeycomb(words) -> Honeycomb: \n", - " \"\"\"Return a honeycomb with highest game score on these words.\"\"\"\n", - " return max(valid_honeycombs(words), \n", - " key=lambda h: game_score(h, words))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "What are the possible candidate honeycombs? We can put any letter in the center, then any 6 letters around the outside (order doesn't matter); since the letter 'S' is not allowed, this gives a total of 25 × (24 choose 6) = 3,364,900 possible honeycombs. \n", - "\n", - "However, a key constraint of the game is that a valid honeycomb **must make at least one pangram**. That means that a valid honeycomb must ***be*** the set of seven letters in a pangram (with any of the seven letters as the center). There should be fewer than 3,364,900 of these." - ] - }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ - "def valid_honeycombs(words) -> List[Honeycomb]:\n", - " \"\"\"Valid Honeycombs are the pangram lettersets, with any center.\"\"\"\n", - " pangram_lettersets = {letterset(w) for w in words if pangram_bonus(w)}\n", - " return [Honeycomb(letters, center) \n", - " for letters in pangram_lettersets \n", - " for center in letters]" + "assert game_score(hc, mini) == 24 == sum(_.values())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "I will represent a **set of letters** as a sorted string of distinct letters. Why not a Python `set` (or `frozenset` to be hashable)? Because a string takes up less space in memory, and its printed representation is easier to read when debugging. 
Compare:\n", - "- `frozenset({'A', 'E', 'G', 'L', 'M', 'P', 'X'})`\n", - "- `'AEGLMPX'`\n", + "# 4: Top Honeycomb\n", "\n", - "I'll use the name `letterset` for the function that converts a word to a set of letters, and `Letterset` for the resulting type:" + "The strategy for finding the top (highest-scoring) honeycomb is:\n", + " - Compile a list of valid candidate honeycombs.\n", + " - For each honeycomb, compute the game score.\n", + " - Return a (score, honeycomb) tuple with the highest score." ] }, { @@ -329,43 +304,36 @@ "metadata": {}, "outputs": [], "source": [ - "Letterset = str # Type for a set of letters, like \"AGLM\"\n", - "\n", - "def letterset(word) -> Letterset:\n", - " \"\"\"The set of letters in a word, represented as a sorted str.\"\"\"\n", - " return ''.join(sorted(set(word)))" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'GAME': 'AEGM',\n", - " 'AMALGAM': 'AGLM',\n", - " 'GLAM': 'AGLM',\n", - " 'MEGAPLEX': 'AEGLMPX',\n", - " 'CACCIATORE': 'ACEIORT',\n", - " 'EROTICA': 'ACEIORT'}" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "{w: letterset(w) for w in mini}" + "def top_honeycomb(words) -> Tuple[int, Honeycomb]: \n", + " \"\"\"Return a (score, honeycomb) tuple with a highest-scoring honeycomb.\"\"\"\n", + " return max((game_score(h, words), h) \n", + " for h in candidate_honeycombs(words))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Note that 'AMALGAM' and 'GLAM' have the same letterset, as do 'CACCIATORE' and 'EROTICA'." + "What are the possible candidate honeycombs? We can put any letter (except 'S') in the center, then any 6 remaining letters around the outside; this gives 25 × (24 choose 6) = 3,364,900 possible honeycombs. 
It would take hours to apply `game_score` to all of these.\n", + "\n", + "Fortunately, we can use the constraint that a valid honeycomb **must make at least one pangram**. So the letters of any valid honeycomb must ***be*** the letterset of some pangram (and the center can be any of those letters):" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [], + "source": [ + "def candidate_honeycombs(words) -> List[Honeycomb]:\n", + " \"\"\"Valid honeycombs have pangram letters, with any center.\"\"\"\n", + " return [Honeycomb(letters, center) \n", + " for letters in pangram_lettersets(words)\n", + " for center in letters]\n", + "\n", + "def pangram_lettersets(words) -> Set[Letterset]:\n", + " \"\"\"All lettersets from the pangram words.\"\"\"\n", + " return {letterset(w) for w in words if is_pangram(w)}" ] }, { @@ -376,20 +344,7 @@ { "data": { "text/plain": [ - "[Honeycomb(letters='ACEIORT', center='A'),\n", - " Honeycomb(letters='ACEIORT', center='C'),\n", - " Honeycomb(letters='ACEIORT', center='E'),\n", - " Honeycomb(letters='ACEIORT', center='I'),\n", - " Honeycomb(letters='ACEIORT', center='O'),\n", - " Honeycomb(letters='ACEIORT', center='R'),\n", - " Honeycomb(letters='ACEIORT', center='T'),\n", - " Honeycomb(letters='AEGLMPX', center='A'),\n", - " Honeycomb(letters='AEGLMPX', center='E'),\n", - " Honeycomb(letters='AEGLMPX', center='G'),\n", - " Honeycomb(letters='AEGLMPX', center='L'),\n", - " Honeycomb(letters='AEGLMPX', center='M'),\n", - " Honeycomb(letters='AEGLMPX', center='P'),\n", - " Honeycomb(letters='AEGLMPX', center='X')]" + "{'ACEIORT', 'AEGLMPX'}" ] }, "execution_count": 14, @@ -398,7 +353,7 @@ } ], "source": [ - "valid_honeycombs(mini)" + "pangram_lettersets(mini)" ] }, { @@ -409,7 +364,20 @@ { "data": { "text/plain": [ - "Honeycomb(letters='ACEIORT', center='A')" + "[Honeycomb(letters='AEGLMPX', center='A'),\n", + " Honeycomb(letters='AEGLMPX', center='E'),\n", + " Honeycomb(letters='AEGLMPX', 
center='G'),\n", + " Honeycomb(letters='AEGLMPX', center='L'),\n", + " Honeycomb(letters='AEGLMPX', center='M'),\n", + " Honeycomb(letters='AEGLMPX', center='P'),\n", + " Honeycomb(letters='AEGLMPX', center='X'),\n", + " Honeycomb(letters='ACEIORT', center='A'),\n", + " Honeycomb(letters='ACEIORT', center='C'),\n", + " Honeycomb(letters='ACEIORT', center='E'),\n", + " Honeycomb(letters='ACEIORT', center='I'),\n", + " Honeycomb(letters='ACEIORT', center='O'),\n", + " Honeycomb(letters='ACEIORT', center='R'),\n", + " Honeycomb(letters='ACEIORT', center='T')]" ] }, "execution_count": 15, @@ -418,18 +386,14 @@ } ], "source": [ - "best_honeycomb(mini)" + "candidate_honeycombs(mini) # 2×7 of them" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "**We're done!** We know how to find the best honeycomb. But so far, we've only done it for the mini word list. \n", - "\n", - "# Step 4: The enable1 Word List\n", - "\n", - "Here's the real word list, `enable1.txt`, and some counts derived from it:" + "Now we're ready to find the highest-scoring honeycomb with the mini word list:" ] }, { @@ -438,16 +402,29 @@ "metadata": {}, "outputs": [ { - "name": "stdout", - "output_type": "stream", - "text": [ - " 172820 enable1.txt\r\n" - ] + "data": { + "text/plain": [ + "(31, Honeycomb(letters='ACEIORT', center='T'))" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "! [ -e enable1.txt ] || curl -O http://norvig.com/ngrams/enable1.txt\n", - "! wc -w enable1.txt" + "top_honeycomb(mini)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**It works.** But that's just the mini word list. 
\n", + "\n", + "# 5: The enable1 Word List\n", + "\n", + "Here's the real word list, `enable1.txt`, and some counts derived from it:" ] }, { @@ -456,19 +433,25 @@ "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "44585" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + "aa\n", + "aah\n", + "aahed\n", + "aahing\n", + "aahs\n", + "aal\n", + "aalii\n", + "aaliis\n", + "aals\n", + "aardvark\n" + ] } ], "source": [ - "enable1 = valid_words(open('enable1.txt').read())\n", - "len(enable1)" + "! [ -e enable1.txt ] || curl -O http://norvig.com/ngrams/enable1.txt\n", + "! head enable1.txt" ] }, { @@ -477,19 +460,15 @@ "metadata": {}, "outputs": [ { - "data": { - "text/plain": [ - "14741" - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" + "name": "stdout", + "output_type": "stream", + "text": [ + " 172820 enable1.txt\n" + ] } ], "source": [ - "pangrams = [w for w in enable1 if pangram_bonus(w)]\n", - "len(pangrams)" + "! 
wc -w enable1.txt" ] }, { @@ -500,7 +479,7 @@ { "data": { "text/plain": [ - "7986" + "44585" ] }, "execution_count": 19, @@ -509,7 +488,9 @@ } ], "source": [ - "len({letterset(w) for w in pangrams}) # pangram lettersets" + "enable1 = word_list(open('enable1.txt').read())\n", + "\n", + "len(enable1)" ] }, { @@ -520,7 +501,7 @@ { "data": { "text/plain": [ - "55902" + "14741" ] }, "execution_count": 20, @@ -529,7 +510,47 @@ } ], "source": [ - "len(valid_honeycombs(enable1))" + "len([w for w in enable1 if is_pangram(w)])" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7986" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(pangram_lettersets(enable1))" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "55902" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(candidate_honeycombs(enable1))" ] }, { @@ -542,22 +563,22 @@ "- 44,585 valid Spelling Bee words\n", "- 14,741 pangram words \n", "- 7,986 distinct pangram lettersets\n", - "- 55,902 (7 × 7,986) valid pangram-containing honeycombs\n", - "- 3,364,900 possible honeycombs (most of which can't make a panagram and thus are invalid)\n", + "- 55,902 (7 × 7,986) candidate pangram-containing honeycombs\n", + "- out of 3,364,900 theoretically possible honeycombs\n", "\n", - "How long will it take to run `best_honeycomb(enable1)`? Most of the computation time is in `game_score` (each call has to look at all 44,585 valid words), so let's estimate the total time by first checking how long it takes to compute the game score of a single honeycomb:" + "How long will it take to run `top_honeycomb(enable1)`? 
Most of the computation time is in `game_score` (each call has to look at all 44,585 valid words), so let's estimate the total time by first checking how long it takes to compute the game score of a single honeycomb:" ] }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "8.7 ms ± 96.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + "8.48 ms ± 29.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], @@ -569,12 +590,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Roughly 9 milliseconds on my computer (this may vary). How many seconds would it be to run `game_score` for all 55,902 valid honeycombs?" + "Roughly 9 milliseconds on my computer (this may vary). How many seconds to run `game_score` for all 55,902 valid honeycombs?" ] }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 24, "metadata": {}, "outputs": [ { @@ -583,7 +604,7 @@ "503.118" ] }, - "execution_count": 22, + "execution_count": 24, "metadata": {}, "output_type": "execute_result" } @@ -596,19 +617,20 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "About 500 seconds, or 8 minutes. I could run `best_honeycomb(enable1)`, go take a coffee break, and come back to see the solution and declare victory. But I think that a puzzle like this deserves a more elegant solution. I'd like to get the run time under a minute (as is suggested in [Project Euler](https://projecteuler.net/)), and I have an idea how to do it. (*Note:* I should point out that [William Shunn](https://www.shunn.net/about.html) has a list of [58,838 pangrams](https://www.shunn.net/bee/pangrams.html) that form 20,597 distinct pangram lettersets, so it would take more than twice as long with his dictionary. However, we don't know what dictionary the NY Times is using, so we don't know what words would be accepted.)\n", + "About 500 seconds, or 8 minutes. 
I could run `top_honeycomb(enable1)`, take a coffee break, come back, and declare victory. \n", "\n", - "# Step 5: Faster Algorithm: Points Table\n", + "But I think that a puzzle like this deserves a more elegant solution. I'd like to get the run time under a minute (as is suggested in [Project Euler](https://projecteuler.net/)), and I have an idea how to do it. \n", "\n", - "Here's my plan for a more efficient program:\n", + "# 6: Faster Algorithm: Points Table\n", + "\n", + "Here's my plan:\n", "\n", "1. Keep the same strategy of trying every pangram letterset, but do some precomputation that will make `game_score` much faster.\n", - "1. The precomputation is: compute the `letterset` and `word_score` for each word, and make a table of `{letterset: total_points}` giving the total number of word score points for all the words that correspond to each letterset. I call this a **points table**.\n", + "1. The precomputation is: compute the `letterset` and `word_score` for each word in the word list, and make a table of `{letterset: total_points}` giving the total number of word score points for all the words that correspond to each letterset. I call this a **points table**.\n", "3. These calculations are independent of the honeycomb, so they need be done only once, not 55,902 times. \n", - "4. `game_score2` takes a honeycomb and a points table as input. The idea is that every word that the honeycomb can make must have a letterset that is the same as a valid **letter subset** of the honeycomb. A valid letter subset must include the center letter, and it may or may not include each of the other 6 letters, so there are exactly $2^6 = 64$ valid letter subsets. The function `letter_subsets(honeycomb)` computes these.\n", - "The result of `game_score2` is the sum of the honeycomb's 64 letter subset entries in the points table.\n", - "\n", - "That means that in `game_score2` we no longer need to iterate over 44,585 words and check if each word is a subset of the honeycomb. 
Instead we iterate over the 64 subsets of the honeycomb and for each one check—in one table lookup—whether it is a word (or more than word) and how many total points those word(s) score. Since 64 is less than 44,585, that's a nice optimization!\n", + "4. Every word that a honeycomb can make is formed from a **letter subset** of the honeycomb's 7 letters. A valid letter subset must include the center letter, and may include any non-empty subset of the other 6 letters, so there are $2^6 - 1 = 63$ valid letter subsets. \n", + "4. `game_score2` considers each of the 63 letter subsets of a honeycomb, and sums the points table entry for each one. \n", + "5. Thus, `game_score2` iterates over just 63 letter subsets; a big optimization over `game_score`, which iterated over 44,585 words.\n", "\n", "\n", "Here's the code:" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "PointsTable = Dict[Letterset, int] # How many points does a letterset score?\n", "\n", - "def best_honeycomb(words) -> Honeycomb: \n", - " \"\"\"Return a honeycomb with highest game score on these words.\"\"\"\n", + "def top_honeycomb2(words) -> Tuple[int, Honeycomb]: \n", + " \"\"\"Return a (score, honeycomb) tuple with a highest-scoring honeycomb.\"\"\"\n", " points_table = tabulate_points(words)\n", - " honeycombs = (Honeycomb(letters, center) \n", - " for letters in points_table if len(letters) == 7 \n", - " for center in letters)\n", - " return max(honeycombs, key=lambda h: game_score2(h, points_table))\n", + " return max((game_score2(h, points_table), h) \n", + " for h in candidate_honeycombs(words))\n", "\n", "def tabulate_points(words) -> PointsTable:\n", " \"\"\"Return a Counter of {letterset: points} from words.\"\"\"\n", " table = Counter()\n", " for w in words:\n", " table[letterset(w)] += word_score(w)\n", " return table\n", "\n", "def letter_subsets(honeycomb) -> List[Letterset]:\n", - " \"\"\"The 64 subsets of the letters in the honeycomb, each including the center 
letter.\"\"\"\n", + " \"\"\"The 63 subsets of the letters in the honeycomb, each including the center letter.\"\"\"\n", " return [letters \n", - " for n in range(1, 8) \n", + " for n in range(2, 8) \n", " for letters in map(''.join, combinations(honeycomb.letters, n))\n", " if honeycomb.center in letters]\n", "\n", "def game_score2(honeycomb, points_table) -> int:\n", " \"\"\"The total score for this honeycomb, using a points table.\"\"\"\n", - " return sum(points_table[letterset] for letterset in letter_subsets(honeycomb))" + " return sum(points_table[s] for s in letter_subsets(honeycomb))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's get a feel for how this works. \n", "\n", - "First consider `letter_subsets`. A 4-letter honeycomb makes $2^3 = 8$ subsets; 7-letter honeycombs make $2^6 = 64$ subsets:" + "First consider `letter_subsets`. A 4-letter honeycomb makes $2^3 - 1 = 7$ subsets; 7-letter honeycombs make $2^6 - 1 = 63$ subsets:" ] }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "['G', 'GL', 'GA', 'GM', 'GLA', 'GLM', 'GAM', 'GLAM']" + "['AG', 'LG', 'MG', 'ALG', 'AMG', 'LMG', 'ALMG']" ] }, - "execution_count": 24, + "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "letter_subsets(Honeycomb('GLAM', 'G')) " + "letter_subsets(Honeycomb('ALMG', 'G')) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Now `tabulate_points`:" + "Now let's remind ourselves what `mini` is, and compute `tabulate_points(mini)`:" ] }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "mini = ['GAME', 'AMALGAM', 'GLAM', 'MEGAPLEX', 'CACCIATORE', 'EROTICA']\n" + "mini = ['AMALGAM', 'CACCIATORE', 'EROTICA', 'GAME', 'GLAM', 'MEGAPLEX']\n" ] }, { "data": { "text/plain": [ 
"Counter({'AGLM': 8, 'ACEIORT': 31, 'AEGM': 1, 'AEGLMPX': 15})" ] }, - "execution_count": 25, + "execution_count": 27, "metadata": {}, "output_type": "execute_result" } @@ -719,79 +739,30 @@ "source": [ "The letterset `'AGLM'` gets 8 points, 7 for AMALGAM and 1 for GLAM. `'ACEIORT'` gets 31 points, 17 for CACCIATORE and 14 for EROTICA. The other lettersets represent one word each. \n", "\n", - "Let's make sure we haven't broken the `best_honeycomb` function:" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "assert best_honeycomb(mini) == Honeycomb('ACEIORT', 'A')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Step 6: The Solution" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, the solution to the puzzle:" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "CPU times: user 1.65 s, sys: 2.19 ms, total: 1.65 s\n", - "Wall time: 1.65 s\n" - ] - } - ], - "source": [ - "%time best = best_honeycomb(enable1)" + "Let's make sure we haven't broken the `top_honeycomb` function:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(Honeycomb(letters='AEGINRT', center='R'), 3898)" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "best, game_score(best, enable1)" + "assert top_honeycomb(mini) == top_honeycomb2(mini)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "**Wow! 3898 is a high score!** \n", - "\n", - "And it took **less than 2 seconds** of computation to find the best honeycomb!\n", - "\n", - "Where does the time go? 
There's the initial time to do `tabulate_points` once, and then calls to `game_score2` for each candidate honeycomb: " + "# 7: The Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, the solution to the puzzle on the real word list:" ] }, { @@ -803,13 +774,32 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 66.1 ms, sys: 852 µs, total: 66.9 ms\n", - "Wall time: 66.2 ms\n" + "CPU times: user 1.65 s, sys: 3.78 ms, total: 1.65 s\n", + "Wall time: 1.65 s\n" ] + }, + { + "data": { + "text/plain": [ + "(3898, Honeycomb(letters='AEGINRT', center='R'))" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" } ], "source": [ - "%time points_table = tabulate_points(enable1)" + "%time top_honeycomb2(enable1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Wow! 3898 is a high score!** And the whole computation took **less than 2 seconds**!\n", + "\n", + "We can see that `game_score2` is about 300 times faster than `game_score`:" ] }, { @@ -821,23 +811,25 @@ "name": "stdout", "output_type": "stream", "text": [ - "25.6 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" + "26.4 µs ± 90.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" ] } ], "source": [ - "%timeit game_score2(Honeycomb('AEGINRT', 'R'), points_table)" + "points_table = tabulate_points(enable1)\n", + "\n", + "%timeit game_score2(hc, points_table)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Step 7: Even Faster Algorithm: Branch and Bound\n", + "# 8: Even Faster Algorithm: Branch and Bound\n", "\n", "A run time of less than 2 seconds is pretty good! But I'm not ready to stop now.\n", "\n", - "Consider the word 'EQUIVOKE'. It is a pangram, but what with the 'Q' and 'V' and 'K', it is not a high-scoring honeycomb, regardless of what center is used:" + "Consider the word **JUKEBOX**. 
It is a pangram, but what with the **J**, **K**, and **X**, it is a low-scoring honeycomb, regardless of what center is used:" ] }, { @@ -848,7 +840,13 @@ { "data": { "text/plain": [ - "{'E': 48, 'I': 32, 'K': 34, 'O': 36, 'Q': 29, 'U': 29, 'V': 35}" + "{Honeycomb(letters='BEJKOUX', center='J'): 26,\n", + " Honeycomb(letters='BEJKOUX', center='U'): 32,\n", + " Honeycomb(letters='BEJKOUX', center='K'): 26,\n", + " Honeycomb(letters='BEJKOUX', center='E'): 37,\n", + " Honeycomb(letters='BEJKOUX', center='B'): 49,\n", + " Honeycomb(letters='BEJKOUX', center='O'): 39,\n", + " Honeycomb(letters='BEJKOUX', center='X'): 15}" ] }, "execution_count": 31, @@ -857,23 +855,25 @@ } ], "source": [ - "letters = letterset('EQUIVOKE')\n", - "{C: game_score(Honeycomb(letters, C), enable1) for C in letters}" + "honeycombs = [Honeycomb(letterset('JUKEBOX'), C) for C in 'JUKEBOX']\n", + "\n", + "{h: game_score(h, enable1) for h in honeycombs}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "It would be great if we could eliminate all seven of these honeycombs at once, rather than trying each one in turn. So my idea is to:\n", - "- Keep track of the best honeycomb and best score found so far.\n", - "- For each new pangram letterset, ask \"if we weren't required to use the center letter, would this letterset score higher than the best honeycomb so far?\" \n", - "- If yes, then try it with all seven centers; if not then discard it immediately.\n", - "- This is called a [**branch and bound**](https://en.wikipedia.org/wiki/Branch_and_bound) algorithm: if an **upper bound** of the new letterset's score can't beat the best honeycomb so far, then we prune a whole **branch** of the search tree consisting of the seven honeycombs that have that letterset.\n", + "It would be great if we could determine that **JUKEBOX** is not a top honeycomb in one call to `game_score2`, rather than seven calls. 
My idea:\n", + "- Keep track of the top score found so far.\n", + "- For each pangram letterset, ask \"if we weren't required to use the center letter, what would this letterset score?\"\n", + "- Check if that score (which is an upper bound of the score using any one center letter) is higher than the top score so far.\n", + "- If yes, then try it with all seven centers; if not then discard it without trying any centers.\n", + "- This is called a [**branch and bound**](https://en.wikipedia.org/wiki/Branch_and_bound) algorithm: prune a whole **branch** (of 7 honeycombs) if an upper **bound** can't beat the top score.\n", "\n", - "What would the score of a letterset be if we weren't required to use the center letter? It turns out I can make a dummy Honeycomb and specify the empty string for the center, `Honeycomb(letters, '')`, and call `game_score2` on that. This works because of a quirk of Python: we ask if `honeycomb.center in letters`; normally in Python the expression `x in y` means \"is `x` a member of the collection `y`\", but when `y` is a string it means \"is `x` a substring of `y`\", and the empty string is a substring of every string. (If I had represented a letterset as a Python `set`, this wouldn't work.)\n", + "To compute the score of a honeycomb with no center, it turns out I can just call `game_score2` on `Honeycomb(letters, '')`. This works because of a quirk of Python: `game_score2` checks if `honeycomb.center in letters`; normally in Python the expression `x in y` means \"is `x` a member of the collection `y`\", but when `y` is a string it means \"is `x` a substring of `y`\", and the empty string is a substring of every string. 
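Both facts this relies on are easy to check directly. The `points_table` values below are invented, and `score` is a simplified stand-in for `game_score2`, just to show that the empty-center score really is an upper bound over all seven centers:

```python
from collections import Counter

# The string quirk: `in` on strings is a substring test, and the empty
# string is a substring of everything -- but not a member of a set.
assert '' in 'BEJKOUX'
assert '' not in set('BEJKOUX')   # with a set representation the trick fails

# Toy points per letterset (made-up numbers, not the real enable1 values).
points_table = Counter({'BEJKOUX': 15, 'BKO': 9, 'EJU': 5})

def score(letters: str, center: str) -> int:
    """Points of tabulated lettersets that fit in `letters` and contain
    `center`; passing center='' drops the center constraint entirely."""
    return sum(p for s, p in points_table.items()
               if center in s and set(s) <= set(letters))

bound = score('BEJKOUX', '')                        # one no-center call
best = max(score('BEJKOUX', c) for c in 'BEJKOUX')  # seven real honeycombs
assert best <= bound                                # safe to prune on the bound
```

Since every letterset counted for some center is also counted with no center, the bound can never underestimate, so pruning on it never discards a winner.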
(If I had represented a letterset as a Python `set`, this wouldn't work.)\n", "\n", - "Thus, I can rewrite `best_honeycomb` as follows:" + "Thus, I can rewrite `top_honeycomb` as follows:" ] }, { @@ -882,19 +882,19 @@ "metadata": {}, "outputs": [], "source": [ - "def best_honeycomb2(words) -> Honeycomb: \n", - " \"\"\"Return a honeycomb with highest game score on these words.\"\"\"\n", + "def top_honeycomb3(words) -> Tuple[int, Honeycomb]: \n", + " \"\"\"Return a (score, honeycomb) tuple with a highest-scoring honeycomb.\"\"\"\n", " points_table = tabulate_points(words)\n", - " best, best_score = None, 0\n", + " top_score, top = -1, None\n", " pangrams = (s for s in points_table if len(s) == 7)\n", " for p in pangrams:\n", - " if game_score2(Honeycomb(p, ''), points_table) > best_score:\n", + " if game_score2(Honeycomb(p, ''), points_table) > top_score:\n", " for center in p:\n", " honeycomb = Honeycomb(p, center)\n", " score = game_score2(honeycomb, points_table)\n", - " if score > best_score:\n", - " best, best_score = honeycomb, score\n", - " return best" + " if score > top_score:\n", + " top_score, top = score, honeycomb\n", + " return top_score, top" ] }, { @@ -906,14 +906,14 @@ "name": "stdout", "output_type": "stream", "text": [ - "CPU times: user 373 ms, sys: 1.32 ms, total: 374 ms\n", - "Wall time: 374 ms\n" + "CPU times: user 360 ms, sys: 2 ms, total: 362 ms\n", + "Wall time: 361 ms\n" ] }, { "data": { "text/plain": [ - "Honeycomb(letters='AEGINRT', center='R')" + "(3898, Honeycomb(letters='AEGINRT', center='R'))" ] }, "execution_count": 33, @@ -922,20 +922,16 @@ } ], "source": [ - "%time best_honeycomb2(enable1)" + "%time top_honeycomb3(enable1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Same honeycomb for the answer, but four times faster–less than 0.4 second.\n", + "Awesome! 
We get the same answer, and the computation is about 5 times faster; about **1/3 second**.\n", "\n", - "# Step 8: Curiosity\n", - "\n", - "I'm curious about a bunch of things.\n", - "\n", - "### What's the highest-scoring individual word?" + "For how many pangram lettersets did we have to check all 7 centers? We can find out by copy-pasting `top_honeycomb3` and annotating it to keep a `COUNT` of the number of pangrams that are checked, and to print a line of output when a honeycomb (either with or without a center letter) outscores the top score." ] }, { @@ -943,10 +939,45 @@ "execution_count": 34, "metadata": {}, "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "ADFLORW scores 333 with center ∅ (pangram 1 (1/7986))\n", + " scores 265 with center A \n", + "ABDENOR scores 1856 with center ∅ (pangram 2 (2/7986))\n", + " scores 1148 with center A \n", + " scores 1476 with center D \n", + " scores 1578 with center E \n", + "ABEIRTV scores 1585 with center ∅ (pangram 3 (4/7986))\n", + "ABDEORT scores 2434 with center ∅ (pangram 4 (28/7986))\n", + " scores 1679 with center A \n", + " scores 2134 with center E \n", + "ABCDERT scores 2254 with center ∅ (pangram 5 (35/7986))\n", + "ACELNRT scores 2529 with center ∅ (pangram 6 (46/7986))\n", + " scores 2158 with center A \n", + " scores 2225 with center E \n", + "ACDELRT scores 2746 with center ∅ (pangram 7 (47/7986))\n", + " scores 2273 with center A \n", + " scores 2608 with center E \n", + "ACENORT scores 2799 with center ∅ (pangram 8 (50/7986))\n", + "ACEIPRT scores 2653 with center ∅ (pangram 9 (57/7986))\n", + "ACDEIRT scores 3407 with center ∅ (pangram 10 (71/7986))\n", + " scores 3023 with center E \n", + "ACEINRT scores 3575 with center ∅ (pangram 11 (77/7986))\n", + "ADEOPRT scores 3031 with center ∅ (pangram 12 (157/7986))\n", + "AEGINRT scores 4688 with center ∅ (pangram 13 (178/7986))\n", + " scores 3372 with center A \n", + " scores 3769 with center E \n", + " scores 3782 with 
center N \n",
+    "         scores 3898 with center R \n",
+    "ADEINRT scores 4020 with center ∅ (pangram 14 (419/7986))\n"
+    ]
+   },
    {
     "data": {
      "text/plain": [
-       "'ANTITOTALITARIAN'"
+       "(3898, Honeycomb(letters='AEGINRT', center='R'))"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
-    "max(enable1, key=word_score)"
+    "def top_honeycomb3_annotated(words) -> Tuple[int, Honeycomb]: \n",
+    "    \"\"\"Return a (score, honeycomb) tuple with a highest-scoring honeycomb. Print stuff.\"\"\"\n",
+    "    points_table = tabulate_points(words)\n",
+    "    top_score, top = -1, None\n",
+    "    pangrams = [s for s in points_table if len(s) == 7]\n",
+    "    COUNT = 0\n",
+    "    for i, p in enumerate(pangrams, 1):\n",
+    "        if game_score2(Honeycomb(p, ''), points_table) > top_score:\n",
+    "            COUNT += 1\n",
+    "            print(f'{p} scores {game_score2(Honeycomb(p, \"\"), points_table):4}',\n",
+    "                  f'with center ∅ (pangram {COUNT:2} ({i}/{len(pangrams)}))')\n",
+    "            for center in p:\n",
+    "                honeycomb = Honeycomb(p, center)\n",
+    "                score = game_score2(honeycomb, points_table)\n",
+    "                if score > top_score:\n",
+    "                    top_score, top = score, honeycomb\n",
+    "                    print(f'{\" \":8}scores {top_score:4} with center {top.center} ')\n",
+    "    return top_score, top\n",
+    "\n",
+    "top_honeycomb3_annotated(enable1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "### What are some of the pangrams?"
+    "Only 14 pangram lettersets had to have all 7 centers checked. We were lucky that the 4th pangram out of 7986, ABDEORT, happened to be a good one, scoring 2134 points (with center E), setting a high score so that most of the remaining pangrams only needed to be checked with an empty center. 
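As a quick arithmetic check on those counts (the 7,986 pangram lettersets and the 14 full checks are taken from the run above):

```python
# Every pangram letterset gets one empty-center call; only 14 of them
# survived the bound and needed all 7 center letters tried as well.
pangram_count = 7986
full_checks = 14
calls_branch_and_bound = pangram_count + full_checks * 7
calls_exhaustive = pangram_count * 7
assert calls_branch_and_bound == 8084
assert calls_exhaustive == 55902
```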
The total number of calls to `game_score2` is:" ] }, { @@ -973,36 +1023,7 @@ { "data": { "text/plain": [ - "['AARDWOLF',\n", - " 'ANCIENTER',\n", - " 'BABBLEMENT',\n", - " 'BIVARIATE',\n", - " 'CABEZON',\n", - " 'CHEERFUL',\n", - " 'COLLOGUING',\n", - " 'CRANKLE',\n", - " 'DEMERGERING',\n", - " 'DWELLING',\n", - " 'ETYMOLOGY',\n", - " 'FLATTING',\n", - " 'GARROTTING',\n", - " 'HANDIER',\n", - " 'IDENTIFY',\n", - " 'INTERVIEWER',\n", - " 'LARVICIDAL',\n", - " 'MANDRAGORA',\n", - " 'MORTGAGEE',\n", - " 'NOTABLE',\n", - " 'OVERHELD',\n", - " 'PERONEAL',\n", - " 'PRAWNED',\n", - " 'QUILTER',\n", - " 'REINITIATED',\n", - " 'TABLEFUL',\n", - " 'TOWHEAD',\n", - " 'UNCHURCHLY',\n", - " 'UTOPIAN',\n", - " 'WINDAGE']" + "8084" ] }, "execution_count": 35, @@ -1011,219 +1032,46 @@ } ], "source": [ - "pangrams[::500] # Every five-hundreth pangram" + "len(pangram_lettersets(enable1)) + 14 * 7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "### What's the breakdown of reasons why words are invalid?" + "8,084 is a big improvement over 55,902.\n", + "\n", + "# 9: Fancy Report\n", + "\n", + "I'd like to see the actual words that each honeycomb can make, and I'm curious about how the words are divided up by letterset. Here's a function to provide such a report. I remembered that there is a `fill` function in Python (it is in the `textwrap` module) but this turned out to be a lot more complicated than I expected. I guess it is difficult to create a practical extraction and reporting tool. I feel you, [Larry Wall](http://www.wall.org/~larry/)." 
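The `fill` features the report below leans on are `width`, `initial_indent`, and `subsequent_indent`. A small standalone demonstration, using a few words from the AEGINRT report but a made-up width of 40 so the wrapping is visible:

```python
from textwrap import fill

entry = 'AERATING(15) ARGENTINE(16) GARTERING(16) GRANITE(14) TANGERINE(16)'
wrapped = fill(entry, width=40,
               initial_indent='AEGINRT  ',   # letterset label on the first line
               subsequent_indent=' ' * 9)    # continuation lines align under it
print(wrapped)
```

`initial_indent` carries the letterset label while `subsequent_indent` keeps the continuation lines in the same column, which is what makes the per-letterset bins readable.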
] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('too many distinct letters', 73611),\n", - " ('contains an S', 53556),\n", - " ('valid', 44585),\n", - " ('too short', 1068)]" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "def common(items): return Counter(items).most_common()\n", - "\n", - "common('too short' if len(w) < 4 else \n", - " 'too many distinct letters' if len(set(w)) > 7 else \n", - " 'contains an S' if 'S' in w else\n", - " 'valid'\n", - " for w in valid_words(open('enable1.txt').read(), lambda w: True))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "There are more words with an 'S' than there are valid words.\n", - "\n", - "### About the points table: How many different letter subsets are there? " - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "21661" - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "pts = tabulate_points(enable1)\n", - "len(pts)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "That means there's about two valid words for each letterset.\n", - "\n", - "### Which letter subsets score the most?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('AEGINRT', 832),\n", - " ('ADEGINR', 486),\n", - " ('ACILNOT', 470),\n", - " ('ACEINRT', 465),\n", - " ('CEINORT', 398),\n", - " ('AEGILNT', 392),\n", - " ('AGINORT', 380),\n", - " ('ADEINRT', 318),\n", - " ('CENORTU', 318),\n", - " ('ACDEIRT', 307)]" - ] - }, - "execution_count": 38, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "pts.most_common(10)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The best honeycomb, `'AEGINRT'`, is also the highest scoring letter subset on its own (although it only gets 832 of the 3,898 total points from using all seven letters)." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### How many honeycombs does `best_honeycomb2` consider?\n", - "\n", - "We know that `best_honeycomb` considers 7,986 × 7 = 55,902 honeycombs. How many does `best_honeycomb2` consider? We can answer that by wrapping `Honeycomb` with a decorator that counts calls:" - ] - }, - { - "cell_type": "code", - "execution_count": 39, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "8084" - ] - }, - "execution_count": 39, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "def call_counter(fn):\n", - " \"Return a function that calls fn, and increments a counter on each call.\"\n", - " def wrapped(*args, **kwds):\n", - " wrapped.call_counter += 1\n", - " return fn(*args, **kwds)\n", - " wrapped.call_counter = 0\n", - " return wrapped\n", - " \n", - "Honeycomb = call_counter(Honeycomb)\n", - "\n", - "best = best_honeycomb2(enable1)\n", - "Honeycomb.call_counter" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Only 8,084 honeycombs are considered. That means that most pangrams are only considered once; for only 14 pangrams do we consider all seven centers." 
- ] - }, - { - "cell_type": "code", - "execution_count": 40, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "14.0" - ] - }, - "execution_count": 40, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "(8084 - 7986) / 7" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Step 9: Fancy Report\n", - "\n", - "I'd like to see the actual words that each honeycomb can make, in addition to the total score, and I'm curious about how the words are divided up by letterset. Here's a function to provide such a report. I remembered that there is a `fill` function in Python (it is in the `textwrap` module) but this turned out to be a lot more complicated than I expected. I guess it is difficult to create a practical extraction and reporting tool. I feel you, [Larry Wall](http://www.wall.org/~larry/)." - ] - }, - { - "cell_type": "code", - "execution_count": 41, - "metadata": {}, "outputs": [], "source": [ "from textwrap import fill\n", "\n", "def report(honeycomb=None, words=enable1):\n", - " \"\"\"Print stats, words, and word scores for the given honeycomb (or the best\n", + " \"\"\"Print stats, words, and word scores for the given honeycomb (or the top\n", " honeycomb if no honeycomb is given) over the given word list.\"\"\"\n", - " bins = group_by(words, letterset)\n", - " adj = (\"Best \" if honeycomb is None else \"\")\n", - " honeycomb = honeycomb or best_honeycomb(words)\n", - " points = game_score(honeycomb, words)\n", + " bins = group_by(words, key=letterset)\n", + " if honeycomb is None:\n", + " adj = \"Top \"\n", + " score, honeycomb = top_honeycomb3(words)\n", + " else:\n", + " adj = \"\"\n", + " score = game_score(honeycomb, words)\n", " subsets = letter_subsets(honeycomb)\n", " nwords = sum(len(bins[s]) for s in subsets)\n", - " print(f'{adj}{honeycomb} scores {Ns(points, \"point\")} on {Ns(nwords, \"word\")}',\n", + " print(f'{adj}{honeycomb} scores {Ns(score, \"point\")} on {Ns(nwords, 
\"word\")}',\n", " f'from a {len(words)} word list:\\n')\n", " for s in sorted(subsets, key=lambda s: (-len(s), s)):\n", " if bins[s]:\n", " pts = sum(word_score(w) for w in bins[s])\n", - " wcount = Ns(len(bins[s]), \"pangram\" if len(s) == 7 else \"word\")\n", + " wcount = Ns(len(bins[s]), \"pangram\" if is_pangram(s) else \"word\")\n", " intro = f'{s:>7} {Ns(pts, \"point\"):>10} {wcount:>8} '\n", " words = [f'{w}({word_score(w)})' for w in sorted(bins[s])]\n", " print(fill(' '.join(words), width=110, \n", @@ -1244,7 +1092,7 @@ }, { "cell_type": "code", - "execution_count": 42, + "execution_count": 37, "metadata": {}, "outputs": [ { @@ -1260,21 +1108,19 @@ } ], "source": [ - "report(hc, mini)" + "report(Honeycomb('AEGLMPX', 'G'), mini)" ] }, { "cell_type": "code", - "execution_count": 43, - "metadata": { - "scrolled": false - }, + "execution_count": 38, + "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Best Honeycomb(letters='AEGINRT', center='R') scores 3898 points on 537 words from a 44585 word list:\n", + "Top Honeycomb(letters='AEGINRT', center='R') scores 3898 points on 537 words from a 44585 word list:\n", "\n", "AEGINRT 832 points 50 pangrams AERATING(15) AGGREGATING(18) ARGENTINE(16) ARGENTITE(16) ENTERTAINING(19)\n", " ENTRAINING(17) ENTREATING(17) GARNIERITE(17) GARTERING(16) GENERATING(17) GNATTIER(15) GRANITE(14)\n", @@ -1373,831 +1219,28 @@ } ], "source": [ - "report()" + "report(words=enable1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "# Step 10: What honeycombs have a high score without a lot of words?\n", + "# 10: 'S' Words\n", "\n", - "[Michael Braverman](https://www.linkedin.com/in/michael-braverman-2b32721/) said he dislikes puzzles with a lot of low-scoring four-letter words. Can we find succint puzzles with lots of points and fewer words? 
With two objectives there won't be a single best answer to this question; rather we can ask: what honeycombs are there such that there are no other honeycombs with both more points and fewer words? We say such honeycombs are [**Pareto optimal**](https://en.wikipedia.org/wiki/Pareto_efficiency) and are on the **Pareto frontier**. We can find them as follows:" + "What if we allowed honeycombs and words to have an 'S' in them? I'll make a new word list, and report on it:" ] }, { "cell_type": "code", - "execution_count": 44, - "metadata": {}, - "outputs": [], - "source": [ - "def pareto_honeycombs(words) -> list: \n", - " \"\"\"A table of {word_count: (points, honeycomb)} with highest scoring honeycomb.\"\"\"\n", - " points_table = tabulate_points(words)\n", - " wcount_table = Counter(map(letterset, words))\n", - " honeycombs = (Honeycomb(letters, center) \n", - " for letters in points_table if len(letters) == 7 \n", - " for center in letters)\n", - " # Build a table of {word_count: (points, honeycomb)}\n", - " table = defaultdict(lambda: (0, None)) \n", - " for h in honeycombs:\n", - " points = game_score2(h, points_table)\n", - " wcount = game_score2(h, wcount_table)\n", - " table[wcount] = max(table[wcount], (points, h))\n", - " return pareto_frontier(table)\n", - " \n", - "def pareto_frontier(table) -> list:\n", - " \"\"\"The pareto frontier that minimizes word counts while maximizing points.\n", - " Returns a list of (wcount, points, honeycomb, points/wcount) entries\n", - " such that there is no other entry that has fewer words and more points.\"\"\"\n", - " return sorted((w, p, h, round(p/w, 2))\n", - " for w, (p, h) in table.items()\n", - " if not any(h2 != h and w2 <= w and p2 >= p\n", - " for w2, (p2, h2) in table.items()))" - ] - }, - { - "cell_type": "code", - "execution_count": 45, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "108" - ] - }, - "execution_count": 45, - "metadata": {}, - "output_type": "execute_result" - } - ], - 
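On toy data, the dominance test inside `pareto_frontier` works like this (the word counts, points, and honeycomb names here are invented):

```python
# {word_count: (points, honeycomb_name)} -- invented values for illustration.
table = {10: (50, 'A'), 20: (90, 'B'), 30: (85, 'C'), 40: (120, 'D')}

# Keep an entry only if no *other* entry has no more words AND no fewer points.
frontier = sorted((w, p, name)
                  for w, (p, name) in table.items()
                  if not any(name2 != name and w2 <= w and p2 >= p
                             for w2, (p2, name2) in table.items()))
# 'C' (30 words, 85 points) is dominated by 'B' (20 words, 90 points),
# so only 'A', 'B', and 'D' survive on the frontier.
```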
"source": [ - "ph = pareto_honeycombs(enable1)\n", - "len(ph)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So there are 108 (out of 55,902) honeycombs on the Pareto frontier. Let's see what the frontier looks like by plotting word counts versus points scored:" - ] - }, - { - "cell_type": "code", - "execution_count": 46, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYsAAAEGCAYAAACUzrmNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3dfZTdVX3v8fdnZpKAgiY8TQMJBJpoBbwCGSEW2ztBisC1BhUFdFlUbOy9cBcs9Sr0ASmULuy1orYUmwoVWyCgQMllwaU8HanWBJjwlBCQMTAwJsIlOQFGhGRmvveP3z6TM5MzOZPDnMf5vNY665zf/u3f7+ydTOab/fDbWxGBmZnZzrTVuwBmZtb4HCzMzKwsBwszMyvLwcLMzMpysDAzs7I66l2Aathnn31i3rx5FV3761//mre+9a2TW6AG4zo2v1avH7iO9dDT0/NSROxb6lxLBot58+bx0EMPVXRtLpeju7t7cgvUYFzH5tfq9QPXsR4k9Y13rurdUJLaJT0s6bZ0fLCkVZKelnSDpOkpfUY67k3n5xXd44KU/pSkD1a7zGZmNlotxizOBdYVHX8duDwiFgB54KyUfhaQj4j5wOUpH5IOBU4HDgNOBP5BUnsNym1mZklVg4WkOcB/A76XjgUcB/woZbkGOCV9XpKOSec/kPIvAZZHxBsR8QzQCxxdzXKbmdlo1R6z+BbwFWDPdLw3sCUiBtNxP3BA+nwA8DxARAxKejnlPwBYWXTP4mtGSFoKLAXo7Owkl8tVVOCBgYGKr20WrmPza/X6gevYaKoWLCR9CHgxInokdReSS2SNMud2ds32hIhlwDKArq6uqHTQqNEGnKrBdWx+rV4/cB0bTTVbFscCH5Z0MrAb8DaylsZMSR2pdTEH2JDy9wNzgX5JHcDbgc1F6QXF15iZWQ1UbcwiIi6IiDkRMY9sgPreiPgUcB9wasp2JnBr+rwiHZPO3xvZkrgrgNPTbKmDgQXAA9Uqt5lZs+rpy3PFfb309OUn/d71eM7iq8BySX8FPAxcldKvAv5FUi9Zi+J0gIhYK+lG4AlgEDg7IoZqX2wzs8bV05fnU99bydbBYaZ3tHHt5xex8KBZk3b/mgSLiMgBufR5PSVmM0XE68DHx7n+UuDS6pXQzKy5rVy/ia2DwwwHbBscZuX6TZMaLLw2lJlZC1h0yN5M72ijXTCto41Fh+w9qfdvyeU+zMymmoUHzeLazy9i5fpNLDpk70ltVYCDhZlZy1h40KxJDxIF7oYyM2sw1ZzVVCm3LMzMGki1ZzVVyi0LM7MGUmpWUyNwsDAzayDVntVUKXdDmZk1kGrPaqqUg4WZWYOp5qymSrkbyszMynKwMDOzshwszMysLAcLMzMry8HCzMzKcrAwM7OyHCzMzKysqgULSbtJekDSo5LWSvrLlP59Sc9IeiS9jkjpkvQdSb2SHpN0VNG9zpT0dHqdOd53mplZdVTzobw3gOMiYkDSNOAnku5I5/5XRPxoTP6TyPbXXgAcA1wJHCNpL+BrQBcQQI+kFRHROMsxmpm1uKq1LCIzkA6npV
fs5JIlwA/SdSuBmZJmAx8E7oqIzSlA3AWcWK1ym5nZjqo6ZiGpXdIjwItkv/BXpVOXpq6myyXNSGkHAM8XXd6f0sZLNzOzGqnq2lARMQQcIWkmcIukw4ELgF8B04FlwFeBiwGVusVO0keRtBRYCtDZ2Ukul6uozAMDAxVf2yxcx+bX6vUD17HR1GQhwYjYIikHnBgR30jJb0j6Z+DL6bgfmFt02RxgQ0rvHpOeK/Edy8iCD11dXdHd3T02y4TkcjkqvbZZuI7Nr9XrB65jo6nmbKh9U4sCSbsDxwNPpnEIJAk4BViTLlkB/FGaFbUIeDkiNgJ3AidImiVpFnBCSjMzq0il25Y24nantVLNlsVs4BpJ7WRB6caIuE3SvZL2JeteegT4k5T/duBkoBd4DfgsQERslnQJ8GDKd3FEbK5iuc2shVW6bWmjbndaK1ULFhHxGHBkifTjxskfwNnjnLsauHpSC2hmU1KpbUsn8ku/0utahZ/gNrMppdJtSxt1u9Na8U55ZjalVLptaaNud1orDhZmNuVUum1pI253WivuhjIzs7IcLMzMrCwHCzMzK8vBwszMynKwMDOzshwszMysLAcLMzMry8HCzMzKcrAwM7OyHCzMzKwsBwsza1hTef+IRuO1ocysIfXmh/jGPVN3/4hG45aFmTWkJzcP7bB/hNWPg4WZNaTf2at9Su8f0WiquQf3bpIekPSopLWS/jKlHyxplaSnJd0gaXpKn5GOe9P5eUX3uiClPyXpg9Uqs5k1jvmz2rn284v44gnvdBdUA6jmmMUbwHERMSBpGvATSXcAXwQuj4jlkr4LnAVcmd7zETFf0unA14HTJB0KnA4cBuwP3C3pHRExVMWym1kDmMr7RzSaqrUsIjOQDqelVwDHAT9K6dcAp6TPS9Ix6fwHJCmlL4+INyLiGaAXOLpa5TYzsx1VdTaUpHagB5gPXAH8AtgSEYMpSz9wQPp8APA8QEQMSnoZ2Dulryy6bfE1xd+1FFgK0NnZSS6Xq6jMAwMDFV/bLFzH5tfq9QPXsdFUNVikrqIjJM0EbgHeVSpbetc458ZLH/tdy4BlAF1dXdHd3V1JkcnlclR6bbNwHZtfq9cPXMdGU5PZUBGxBcgBi4CZkgpBag6wIX3uB+YCpPNvBzYXp5e4xszMaqCas6H2TS0KJO0OHA+sA+4DTk3ZzgRuTZ9XpGPS+XsjIlL66Wm21MHAAuCBapXbzMx2VM1uqNnANWncog24MSJuk/QEsFzSXwEPA1el/FcB/yKpl6xFcTpARKyVdCPwBDAInO2ZUGZmtVW1YBERjwFHlkhfT4nZTBHxOvDxce51KXDpZJfRzMwmxk9wm5lZWQ4WZmZWloOFmZmV5WBhZmZlOViYmVlZDhZmZlaWg4WZmZXlYGFmZmU5WJiZWVkOFmZmVpaDhZmZlVXV/SzMzHZVT1+eles3MWPLEN31LoyNcLAws4bR05fnU99bydbBYToERx6V9x7cDcLdUGbWMFau38TWwWGGAwaHs2NrDA4WZjapevryXHFfLz19+V2+dtEhezO9o412QUdbdmyNwd1QZjZpiruRpne0ce3nF+1SN9LCg2Zx7ecXpTGLPndBNZBqbqs6V9J9ktZJWivp3JR+kaRfSnokvU4uuuYCSb2SnpL0waL0E1Nar6Tzq1VmM3tziruRtg0OV9SNtPCgWZy9eD7zZ7VXoYRWqWq2LAaBL0XEakl7Aj2S7krnLo+IbxRnlnQo2VaqhwH7A3dLekc6fQXwB0A/8KCkFRHxRBXLbmYVKHQjbRscZlpHm7uRWkg1t1XdCGxMn1+VtA44YCeXLAGWR8QbwDNpL+7C9qu9aTtWJC1PeR0szBpMcTfSokP2djdSC1FEVP9LpHnA/cDhwBeBzwCvAA+RtT7ykv4eWBkR/5quuQq4I93ixIj4fEr/NHBMRJwz5juWAksBOjs7Fy5fvryisg4MDLDHHntUdG2zcB2bX6vXD1zHeli8eH
FPRHSVOlf1AW5JewA3AedFxCuSrgQuASK9/y3wOUAlLg9Kj6vsEOEiYhmwDKCrqyu6u7srKm8ul6PSa5uF69j8Wr1+4Do2mqoGC0nTyALFtRFxM0BEvFB0/p+A29JhPzC36PI5wIb0ebx0MzOrgWrOhhJwFbAuIr5ZlD67KNtHgDXp8wrgdEkzJB0MLAAeAB4EFkg6WNJ0skHwFdUqt5mN7808Q2HNrZoti2OBTwOPS3okpf0pcIakI8i6kp4FvgAQEWsl3Ug2cD0InB0RQwCSzgHuBNqBqyNibRXLbWYlvNlnKKy5VXM21E8oPQ5x+06uuRS4tET67Tu7zsyqr9QzFA4WU4eX+zCzCSleisPPUEw9Xu7DzCbEz1BMbQ4WZlNQYc+IXf2lv/CgWQ4SU5SDhdkU44Fqq4THLMymmMlY7M+mHgcLsynGA9VWCXdDmU0xHqi2SjhYmE1BHqi2XeVuKLMW5yU6bDK4ZWHWwjzzySbLhFoWks6V9DZlrpK0WtIJ1S6cmb05nvlkk2Wi3VCfi4hXgBOAfYHPApdVrVRmNik888kmy0S7oQoLAp4M/HNEPJqWIDezBjL2yWzPfLLJMtFg0SPp34GDgQsk7QkMV69YZrarxhuf8MwnmwwT7YY6CzgfeG9EvAZMJ+uKMrMG4fEJq6aJBou7ImJ1RGwBiIhNwOXVK5aZ7SqPT1g17bQbStJuwFuAfSTNYvvYxduA/atcNjMbozAmMWPLEN1jznl8wqqp3JjFF4DzyAJDD9uDxSvAFTu7UNJc4AfAb5GNbyyLiG9L2gu4AZhHtq3qJyIinwbMv002iP4a8JmIWJ3udSbw5+nWfxUR1+xCHc2a0tjB6uIxiQ7BkUfldwgIHp+watlpsIiIbwPflvQ/I+LvdvHeg8CXImJ1GhDvkXQX8Bngnoi4TNL5ZGMhXwVOAhak1zHAlcAxKbh8Degi27e7R9KKiPDjqNaySg1WF49JDAbe1tRqakKzoSLi7yT9LllroKMo/Qc7uWYjsDF9flXSOuAAYAmMtKCvAXJkwWIJ8IOICGClpJmSZqe8d0XEZoAUcE4Erp9oJc2aQU9fnptW948038cOVhfGJLYNDtMuPCZhNTWhYCHpX4DfBh4BhlJykHUzTeT6ecCRwCqgMwUSImKjpP1StgOA54su609p46WP/Y6lwFKAzs5OcrncRIq2g4GBgYqvbRauY+PpzQ9x2QOvMxjZcbugTUBkn2ds6ePVZ/r58lHTeXLzEAfuvpVXn3mU3DN1LXZVNdvfYSWaqY4Tfc6iCzg0/a9/l0jaA7gJOC8iXtnJs3ylTsRO0kcnRCwDlgF0dXVFd3f3rhYVgFwuR6XXNgvXsfGsva+XoXhq5Hg44IyjD+SAmbuPGqzuTuebrX6VcB0by0SDxRqygeqNu3JzSdPIAsW1EXFzSn5B0uzUqpgNvJjS+4G5RZfPATak9O4x6bldKYdZo1t0yN5Maxdbh7L/B03raONjR83xmIQ1jIkGi32AJyQ9ALxRSIyID493QZrddBWwLiK+WXRqBXAm2dpSZwK3FqWfI2k52QD3yymg3An8dZq6C9n6VBdMsNxmTePUrrm89Oob7LvnDD7qQGENZqLB4qIK7n0s8GngcUmPpLQ/JQsSN0o6C3gO+Hg6dzvZtNlesqmznwWIiM2SLgEeTPkuLgx2mzW7wqD2j3r6GRzyMuLWuCY6G+rHu3rjiPgJpccbAD5QIn8AZ49zr6uBq3e1DGaN7LpVz3HhrWsYGo6RQbjCzCcHC2s05Z7g/klEvF/Sq4weVBbZ7/e3VbV0Zi2m8KDdrLdM58Jb1zA4vP2flfAyHda4yj2U9/70vmdtimPWmsZ2N7VJDBUFinbB6Ucf6LEKa1gT3lZV0nuA30uH90fEY9UpkllrKTyN/ca24e3N8wja28TwcNDWJi5ecjifPObAehbTbKcm+lDeucAfA4Xpr9dKWlbBEiBmLan46euxrYObV/ePChQCpk9r48
IPHUb+ta1e9M+awkRbFmcBx0TErwEkfR34GeBgYVNeT1+eM5b9bOQZiR/29HP9Hy8aWfzvhw89PxIoOtrFaV1z3d1kTWdXtlUdKjoeYvyZTmYtr3hF2JXrN7FtaPv4Q/GMppXrN40MYgv4RNdcLv3Iu+tUarPKTTRY/DOwStIt6fgUsgfuzKacsSvCXvihw3Z4+rowo6l48b/CU9lmzWiiz1l8U1IOeD/Zf5A+GxEPV7NgZo1q7Pal+de2cv3S95Ucs/CGRNYqJrJT3p8A84HHgX+IiMFaFMysERQ/F1EYjB7bWigEgfECgTckslZQrmVxDbAN+A+yzYneRbZznlnLGzvltU2MLMfh1oJNNeWCxaER8W4ASVcBD1S/SGaNodDdVBi6Lt6I6OzF8x0kbEppK3N+W+GDu59sqil0NxX+kbTJy3HY1FWuZfEeSa+kzwJ2T8deG8paWmGsovDgXPGYhVsUNhWVWxuqvVYFMWsUY6fGeslws/LdUGZTQk9fnivu6x1pURSmxm5NYxRmU92EFxI0a1VjWxKfed88CgvCDgfMesv0+hbQrAFUrWUh6WpJL0paU5R2kaRfSnokvU4uOneBpF5JT0n6YFH6iSmtV9L51SqvTV1jH7Jbu/GVkbVs2oD8a1vrWTyzhlDNbqjvAyeWSL88Io5Ir9sBJB0KnA4clq75B0ntktqBK8ie8TgUOCPlNZs0hVlP7Wm200mHz2bGtOx4+jTPfjKDKnZDRcT9kuZNMPsSYHlEvAE8I6kXODqd642I9QCSlqe8T0xycW0KK7Ukxzt/a08/dGdWRNnW11W6eRYsbouIw9PxRcBngFeAh4AvRURe0t8DKyPiX1O+q4A70m1OjIjPp/RPky2Vfk6J71oKLAXo7OxcuHz58orKPDAwwB577FHRtc3CdRytNz/Ek5uH+J292pk/qzkmAPrvsDU0Wh0XL17cExFdpc7VeoD7SuASsv28LwH+FvgcpZc7D0p3k5WMbhGxDFgG0NXVFd3d3RUVMJfLUem1zcJ1zIzd6nR6x1DTTJP132FraKY61jRYRMQLhc+S/gm4LR32A3OLss4BNqTP46Wb7ZLiRQHXbHiZH/X0s61oOY/ifSjMbLSaBgtJsyNiYzr8CFCYKbUCuE7SN4H9gQVk61AJWCDpYOCXZIPgn6xlma059eaHWHtf78iYw9hFAcXoJqrwUh5mO1O1YCHpeqAb2EdSP/A1oFvSEWT/Tp8FvgAQEWsl3Ug2cD0InB0RQ+k+5wB3Au3A1RGxtlplttbQ05fnbx58ncF4auQJ7LGLAhbvhz2tXXzcW52a7VQ1Z0OdUSJ53N31IuJS4NIS6bcDt09i0awFFG9rOvYX/Mr1m9g2nAWEQtdSYXrs1m3DDJMtCtjR5iBhNlF+gtuaTrm1mxYdsjfT2mAoGLU5UaGF4UUBzXadg4U1jUJrYsOW34x64nrsoPTCg2bxlffuxhszDxoVELxjnVnlHCysKRS3JjraREd7G0NDw+MOSs+f1U539/w6lNSsNTlYWFO4eXX/yEymoeHgtKPncsDM3d2VZFYjDhbW8Hr68vzwoedHZjC1t7fxMQ9Km9WU97Owhrdy/SYG05rhAk5d6EBhVmsOFtbwileFnTEta1WYWW25G8qawkePmoPSu1sVZrXnYGENo3jtpsJzEMCoZyo+6laFWV04WFhDGLt2U5tgekfW5bSzZyrMrDYcLKxuilsSd6zZOGrtpkJwCLKgsW1w/GcqzKz6HCysLsZbBbbw3pa2OP3YUXP42FFzvGudWZ05WFjN9fTl+dbdP99hFdg24NgF+3DS4bN3WLvJQcKsvhwsrKbGa1EUxijOO/4dDgxmDcjBwqqueDnx4n0ldtaSMLPG4mBhVXXdque48NY1DEcwvaONCz902KgBa7ckzJqDg4VVTU9fngtvXTOyVMfWwWHyr20d2VfCLQmz5lG15T4kXS3pRUlritL2knSXpKfT+6yULknfkdQr6TFJRx
Vdc2bK/7SkM6tVXpt8K9dvYmh4+07XbdJIgDh78XwHCrMmUs21ob4PnDgm7XzgnohYANyTjgFOAhak11LgSsiCC9ne3ccARwNfKwQYazw9fXmuuK+Xnr48kK3pNGNaG21kW5hevORwBwizJlXNPbjvlzRvTPISoDt9vgbIAV9N6T+IiABWSpopaXbKe1dEbAaQdBdZALq+WuW2yoy31am7nMxaQ63HLDojYiNARGyUtF9KPwB4vihff0obL30HkpaStUro7Owkl8tVVMCBgYGKr20Wk1nH3vwQT24eYtNvYmQ67NZtw1x/94O8+tvTAThM8Ooz/eSemZSvnJBW/3ts9fqB69hoGmWAWyXSYifpOyZGLAOWAXR1dUV3d3dFBcnlclR6bbOYrDr29OX5m7tXsm1wmPZ2Ma1j+1anZxz/3rq2JFr977HV6weuY6OpdbB4QdLs1KqYDbyY0vuBuUX55gAbUnr3mPRcDcppE3Dz6n62Dg4DMDgUnHDofrxn7kx3OZm1oFpvfrQCKMxoOhO4tSj9j9KsqEXAy6m76k7gBEmz0sD2CSnNGsDYJt4+e87wLCezFlW1loWk68laBftI6ieb1XQZcKOks4DngI+n7LcDJwO9wGvAZwEiYrOkS4AHU76LC4PdVntj95s4fP+3M71dbBsKprXLO9iZtbBqzoY6Y5xTHyiRN4Czx7nP1cDVk1g02wXFAeLi29busN/ERR8+3Et1mE0BjTLAbQ2oeDpsm8RwxA77TeRf28rZi+fXtZxmVn0OFjaum1f3j7QkiKCtTRDBMNv3m/BmRGZTg4OFjSheHRbghw89P9KS6Oho46I/PIz8a1tH7ZHtriezqcHBwoAdn8D+2FFzRhYAFHDqwjl88pgD61tIM6ubWk+dtQZV6HIau/d1u2DGtDbPdDKb4tyymIJ680Osva93pBuppy/PDUVdTu3t3vvazEZzsJhievry/M2DrzMYT40s+Hfz6n4Gh7Y/Ytf9jn2997WZjeJgMcWsXL+JbcPZ09fbBodZuX5TySexzcyKOVhMIT19eX655Te0t0HE6KmvP3roeT+JbWbjcrCYIkY9YAecfvSBfPSoOSPdTNcvfZ/HJ8xsXA4WLa7w7MSGLb9h62A22ymA/WfuPiooLDxoloOEmY3LwaJF9fTl+e6Pf8G9T75IRNCmbIkOyILFrLdMr2v5zKy5OFi0mJ6+PDet7ufGh54fNcMpikaxBeRf21r7wplZ03KwaCGFcYmR9ZyKtCl7fmJoaJh24TWdzGyXOFi0kFEL/xVpbxOXLDmcd/7Wnqxcv4kZW/o8PmFmu8TBogUUxifuWffC9oX/2sVx79yPffecMWrW08KDZpHL9devsGbWlOoSLCQ9C7wKDAGDEdElaS/gBmAe8CzwiYjISxLwbbKd9F4DPhMRq+tR7kZRvDrsU796lb/4t8cZGjMm8Ymuufz1R95dtzKaWWupZ8ticUS8VHR8PnBPRFwm6fx0/FXgJGBBeh0DXJnep6TrVj3HX9y6hqHhoKNdRDAqUEDW7eQH68xsMjVSN9QSsj27Aa4BcmTBYgnwg7T16kpJMyXNjoiNdSllnRS6mu5+YntX0+DYKEEWKC5ecrjHJMxsUilix184Vf9S6RkgTzbl/x8jYpmkLRExsyhPPiJmSboNuCwifpLS7wG+GhEPjbnnUmApQGdn58Lly5dXVLaBgQH22GOPiq6tht78ED/95Tbu/+XQDi0IyGY5RWRdT0fs187JB09j/qz2nd6z0epYDa1ex1avH7iO9bB48eKeiOgqda5eLYtjI2KDpP2AuyQ9uZO8KpG2w6/NiFgGLAPo6uqK7u7uigqWy+Wo9NrJ1tOX5xv3lJ4KC9kg9sUfPnyXd61rpDpWS6vXsdXrB65jo6lLsIiIDen9RUm3AEcDLxS6lyTNBl5M2fuBuUWXzwE21LTAddDTl+fi/7OW17cN73CuTXD8uzr5wn/9bXc3mVlN1DxYSHor0BYRr6
bPJwAXAyuAM4HL0vut6ZIVwDmSlpMNbL/c6uMVPX15Tlv2s1FjEuNNhTUzq4V6tCw6gVuyGbF0ANdFxP+V9CBwo6SzgOeAj6f8t5NNm+0lmzr72doXufqKp8N+/Y51OwxeeyqsmdVTzYNFRKwH3lMifRPwgRLpAZxdg6LVTHFgKGxrOrJ8uGBwTM+Tp8KaWb010tTZKaE4MBS2NV25ftPI8uHDJUayL/FUWDOrs7Z6F2CqKazfNBzbtzVddMjeTO9oo13QMeZv5E9+/xA+ecyB9SmsmVnilkUN9fTl+eFDz49Mg21vbxvpiiq0MApLeNyxZiMnHT7bgcLMGoKDRY309OX51t0/ZzD1Mwk4deHoBf6KPztImFkjcbCogetWPceFaT2nIHtOYnpHmwetzaxpOFhU2XWrnuPP/+3xkYFrAcfO34fzjn+HB63NrGk4WEyi4imxADet7ueGB58fNcOpvU0OFGbWdBwsJknxlNiONoHEtsHRazq1Ca8Ia2ZNycHiTerpy3PT6n4eeGbzyDpO24YCiFGBoiMtHe6BazNrRg4Wb0KpNZwA2ttFm8TQ0DDtbeLjXXO9npOZNTUHiwoVVoUttQHRJ7rm8rGj5oxa0sPMrJk5WFTgstvXsew/1pdcmqMwJbb4uQkzs2bnYDFBhbGJnz79En2bX9vhfBtw/KHeY8LMWpODxQRcdvs6vnv/+pLnBHzymAM9JmFmLc3Boozzlj/Mvz0y/sZ8X/j9Qzj/5HfVsERmZrXnYDGOnr48f37L46z71avj5jnliP0dKMxsSnCwGKM3P8QV3/1PHnw2P26e+fvtweeOPdjPTJjZlNE0wULSicC3gXbgexFx2WR/R09fnr9e9TrDvF7y/KGz9+SSU97tsQkzm3KaIlhIageuAP4A6AcelLQiIp6YzO+5eXU/w+OcO+WI/fnW6UdO5teZmTWNpggWwNFAb9q/G0nLgSXApAaLEo9NMGfmbvyPxQvc5WRmU1qzBIsDgOeLjvuBY4ozSFoKLAXo7Owkl8vt8pccwhDtCoZCCDhpXgef+J12+M16crnSU2eb0cDAQEV/Ps2k1evY6vUD17HRNEuwUIm0UQ2BiFgGLAPo6uqK7u7uXf6S7Ip7eGPmQS29TEcul6OSP59m0up1bPX6gevYaJolWPQDc4uO5wDjP/zwJsyf1U539/xq3NrMrGm11bsAE/QgsEDSwZKmA6cDK+pcJjOzKaMpWhYRMSjpHOBOsqmzV0fE2joXy8xsymiKYAEQEbcDt9e7HGZmU1GzdEOZmVkdOViYmVlZDhZmZlaWIko9t9zcJP0/oK/Cy/cBXprE4jQi17H5tXr9wHWsh4MiYt9SJ1oyWLwZkh6KiK56l6OaXMfm1+r1A9ex0bgbyszMynKwMDOzshwsdrSs3gWoAdex+bV6/cB1bCgeszAzs7LcsjAzs7IcLMzMrCwHiyKSTpT0lKReSefXuzyVknS1pBclrSlK20vSXZKeTu+zUrokfSfV+TFJR9Wv5BMjaa6k+yStk7RW0rkpvdXRQqEAAAXASURBVJXquJukByQ9mur4lyn9YEmrUh1vSKswI2lGOu5N5+fVs/wTJald0sOSbkvHrVa/ZyU9LukRSQ+ltKb8OXWwSIr2+T4JOBQ4Q9Kh9S1Vxb4PnDgm7XzgnohYANyTjiGr74L0WgpcWaMyvhmDwJci4l3AIuDs9HfVSnV8AzguIt4DHAGcKGkR8HXg8lTHPHBWyn8WkI+I+cDlKV8zOBdYV3TcavUDWBwRRxQ9T9GcP6cR4Vc2yP8+4M6i4wuAC+pdrjdRn3nAmqLjp4DZ6fNs4Kn0+R+BM0rla5YXcCvwB61aR+AtwGqyrYRfAjpS+sjPLNny/e9LnztSPtW77GXqNYfsl+VxwG1kO2K2TP1SWZ8F9hmT1pQ/p25ZbFdqn+8D6lSWauiMiI0A6X2/lN7U9U7dEUcCq2ixOqYumkeAF4G7gF8AWyJiMGUprsdIHdP5l4
G9a1viXfYt4CvAcDrem9aqH2TbP/+7pB5JS1NaU/6cNs1+FjVQdp/vFtW09Za0B3ATcF5EvCKVqkqWtURaw9cxIoaAIyTNBG4B3lUqW3pvqjpK+hDwYkT0SOouJJfI2pT1K3JsRGyQtB9wl6Qnd5K3oevolsV2Ndvnu05ekDQbIL2/mNKbst6SppEFimsj4uaU3FJ1LIiILUCObHxmpqTCf/KK6zFSx3T+7cDm2pZ0lxwLfFjSs8Bysq6ob9E69QMgIjak9xfJAv7RNOnPqYPFdq2+z/cK4Mz0+Uyyfv5C+h+lmRiLgJcLTeRGpawJcRWwLiK+WXSqleq4b2pRIGl34HiygeD7gFNTtrF1LNT9VODeSB3fjSgiLoiIORExj+zf2r0R8SlapH4Akt4qac/CZ+AEYA3N+nNa70GTRnoBJwM/J+sb/rN6l+dN1ON6YCOwjex/K2eR9e/eAzyd3vdKeUU2C+wXwONAV73LP4H6vZ+sef4Y8Eh6ndxidfwvwMOpjmuAC1P6IcADQC/wQ2BGSt8tHfem84fUuw67UNdu4LZWq1+qy6PptbbwO6VZf0693IeZmZXlbigzMyvLwcLMzMpysDAzs7IcLMzMrCwHCzMzK8vBwqY0SZdLOq/o+E5J3ys6/ltJX3wT979I0pffbDkr+N4jJJ1c6++11uVgYVPdfwK/CyCpDdgHOKzo/O8CP53IjdLKxY3iCLJnT8wmhYOFTXU/JQULsiCxBnhV0ixJM8jWY3o4PVX7vyWtSfsTnAYgqVvZ3hrXkT1IhaQ/U7Yvyt3AO0t9qaROSbco26/iUUmFgPXF9B1rCi0eSfM0em+SL0u6KH3OSfq6sr0vfi7p99IKBBcDp6V9FE6b9D81m3K8kKBNaZEt8jYo6UCyoPEzspU+30e2suljEbFV0sfI/rf+HrLWx4OS7k+3ORo4PCKekbSQbPmKI8n+fa0Gekp89XeAH0fER1KLZI907WfJliIXsErSj8n2ddiZjog4OnU7fS0ijpd0IdkTwOdU9idjNppbFmbbWxeFYPGzouP/THneD1wfEUMR8QLwY+C96dwDEfFM+vx7wC0R8VpEvML464sdR9rcJt3z5fQdt0TEryNiALg53a+cwkKKPWT7mJhNOgcLs+3jFu8m64ZaSdayKB6vGHf9c+DXY44rXUNnvO8YZPS/1d3GnH8jvQ/h3gKrEgcLsywgfAjYnP6XvxmYSRYwfpby3E82BtAuaV/g98kWtBvrfuAjknZPK47+4TjfeQ/w32Fkk6O3pWtPkfSWtErpR4D/AF4A9pO0dxpH+dAE6vQqsOcE8plNiIOFWTYwvQ9Zi6I47eWIeCkd30K2AuyjwL3AVyLiV2NvFBGrgRvIVsK9ieyXfSnnAoslPU7WfXRYuvb7ZEFoFfC9iHg4IraRDVivItt+dGcb6BTcBxzqAW6bLF511szMynLLwszMynKwMDOzshwszMysLAcLMzMry8HCzMzKcrAwM7OyHCzMzKys/w8zfByNM89YXQAAAABJRU5ErkJggg==\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "W, P, H, PPW = zip(*ph)\n", - "\n", - "def plot(xlabel, X, ylabel, Y): \n", - " plt.plot(X, Y, '.'); plt.xlabel(xlabel); plt.ylabel(ylabel); plt.grid(True)\n", - " \n", - "plot('Word count', W, 'Points', P)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "That was somewhat surprising to me; usually a Pareto frontier looks like a quarter-circle; here it looks like an almost straight line. Maybe we can get a better view by plotting word counts versus the number of points per word:" - ] - }, - { - "cell_type": "code", - "execution_count": 47, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAfeklEQVR4nO3da5Qc5X3n8e+/ZyRuwjBIIAsLRgwi2KAYrBEwNjiRbOwDXvAF7GPL7C6Lwcrm2Als1ktwvAcTXsWOHbxOSHwBG28ihA2CxdGGY24SEMwI1AJZozUXIRh5EEYgRoC4aC793xdV1eppdc9U90x1T1f9PufMma7qSz1PT8+vnn7qqafM3RERkezINbsAIiLSWAp+EZGMUfCLiGSMgl9EJGMU/CIiGdPe7ALEMWfOHF+wYEFdz33zzTc55JBDprZA00ja6weqY1qojo2Xz+dfcfcjy9e3RPAvWLCADRs21PXcdevWsXTp0qkt0DSS9vqB6pgWqmPjmVl/pfXq6hERyRgFv4hIxij4RUQyRsEvIpIxCn4RkYxR8IuIZEyqgz/fP8iaZ4fI9w82uygiItNGaoM/3z/IRTf0svqZYS66oVfhLyISSm3w927bxdBIAQeGRwr0btvV7CKJiEwLqQ3+nq7ZzGzPkQNmtOfo6Zrd7CKJiEwLqQ3+7s4OVl7WwwUnzGDlZT10d3Y0u0giItNCS8zVU6/uzg7eOH6mQl9EpERqW/wiIlKZgl9EJGMU/CIiGaPgFxHJGAW/iEjGKPhFRDJGwS8ikjEKfhGRjFHwi4hkjIJfRCRjFPwiIhmj4BcRyRgFv4hIxij4RUQyRsEvIpIxCn4RkYxR8IuIZExiwW9mPzGznWbWV+G+r5mZm9mcpLYvIiKVJdnivwk4p3ylmR0DfAzYnuC2RUSkisSC390fBF6tcNd1wJWAJ7VtERGpztyTy18zWwCscfdF4fIngY+6++Vm9jywxN1fqfLcFcAKgLlz53bfcsstdZVhz549zJo1q67ntoK01w9Ux7RQHRtv2bJleXdfst8d7p7YD7AA6AtvHwysBw4Ll58H5sR5ne7ubq/X2rVr635uK0h7/dxVx7RQHRsP2OAVMrWRo3qOB44DNoWt/fnARjN7dwPLICKSee2N2pC7bwaOipYn6uoREZFkJDmccxXwCHCimQ2Y2aVJbUtEROJLrMXv7ssnuH9BUtsWEZHqdOauiEjGKP
[… base64-encoded PNG output elided: matplotlib scatter plot of Word count vs. Points per word …]\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plot('Word count', W, 'Points per word', PPW)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So the highest points per word are for honeycombs with very few words. We can see all the Pareto optimal honeycombs that score more than, say, 7.6 points per word:" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[(1, 15, Honeycomb(letters='BIMNRUV', center='V'), 15.0),\n", - " (2, 26, Honeycomb(letters='DHNORTX', center='X'), 13.0),\n", - " (3, 31, Honeycomb(letters='CILMOQU', center='Q'), 10.33),\n", - " (4, 32, Honeycomb(letters='BGINOUX', center='X'), 8.0),\n", - " (5, 45, Honeycomb(letters='CEGIPTX', center='G'), 9.0),\n", - " (6, 50, Honeycomb(letters='DELNPUZ', center='Z'), 8.33),\n", - " (7, 62, Honeycomb(letters='BGILNOX', center='X'), 8.86),\n", - " (8, 67, Honeycomb(letters='DGINOXZ', center='X'), 8.38),\n", - " (9, 70, Honeycomb(letters='EFNQRTU', center='Q'), 7.78),\n", - " (10, 84, Honeycomb(letters='CENOQRU', center='Q'), 8.4),\n", - " (11, 86, Honeycomb(letters='GINOTUV', center='V'), 7.82),\n", - " (12, 100, Honeycomb(letters='GILMNUZ', center='Z'), 8.33),\n", - " (13, 108, Honeycomb(letters='GINOQTU', center='Q'), 8.31),\n", - " (14, 113, Honeycomb(letters='CINOTXY', center='X'), 8.07),\n", - " (15, 115, Honeycomb(letters='DGINOXZ', center='Z'), 7.67),\n", - " (19, 157, Honeycomb(letters='DEIORXZ', center='X'), 8.26),\n", - " (22, 172, Honeycomb(letters='DEGINPZ', center='Z'), 7.82),\n", - " (23, 184, Honeycomb(letters='ACELQRU', center='Q'), 8.0),\n", - " (26, 198, Honeycomb(letters='AILNOTZ', center='Z'), 7.62),\n", - " (28, 224, Honeycomb(letters='DEGINRZ', center='Z'), 8.0),\n", - " (45, 374, Honeycomb(letters='ACINOTV', center='V'), 8.31),\n", - " (403, 3095, Honeycomb(letters='AEGINRT', center='G'), 7.68),\n", - " (442, 
3406, Honeycomb(letters='AEGINRT', center='I'), 7.71)]" - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "[entry for entry in ph if entry[-1] > 7.6]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The last two of these represent our old optimal honeycomb, with two different centers." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here are reports on what I think are the most interesting high-score/few-words honeycombs:" - ] - }, - { - "cell_type": "code", - "execution_count": 49, + "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Honeycomb(letters='CEGIPTX', center='G') scores 45 points on 5 words from a 44585 word list:\n", - "\n", - "CEGIPTX 17 points 1 pangram EPEXEGETIC(17)\n", - " CEGITX 8 points 1 word EXEGETIC(8)\n", - " CEGIP 7 points 1 word EPIGEIC(7)\n", - " EGIP 6 points 1 word PIGGIE(6)\n", - " EGTX 7 points 1 word EXEGETE(7)\n" - ] - } - ], - "source": [ - "report(Honeycomb('CEGIPTX', 'G'))" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Honeycomb(letters='DEIORXZ', center='X') scores 157 points on 19 words from a 44585 word list:\n", - "\n", - "DEIORXZ 65 points 4 pangrams DEOXIDIZER(17) OXIDIZER(15) REOXIDIZE(16) REOXIDIZED(17)\n", - " DEIOXZ 34 points 4 words DEOXIDIZE(9) DEOXIDIZED(10) OXIDIZE(7) OXIDIZED(8)\n", - " DEIOX 23 points 4 words DIOXIDE(7) DOXIE(5) EXODOI(6) OXIDE(5)\n", - " DEORX 12 points 2 words REDOX(5) XEROXED(7)\n", - " DEIX 5 points 1 word DEXIE(5)\n", - " DIOX 13 points 3 words DIOXID(6) IXODID(6) OXID(1)\n", - " EORX 5 points 1 word XEROX(5)\n" - ] - } - ], - "source": [ - "report(Honeycomb('DEIORXZ', 'X'))" - ] - }, - { - "cell_type": "code", - "execution_count": 51, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": 
"stream", - "text": [ - "Honeycomb(letters='ACINOTV', center='V') scores 374 points on 45 words from a 44585 word list:\n", - "\n", - "ACINOTV 171 points 10 pangrams ACTIVATION(17) AVOCATION(16) CAVITATION(17) CONVOCATION(18) INACTIVATION(19)\n", - " INVOCATION(17) VACATION(15) VACCINATION(18) VATICINATION(19) VOCATION(15)\n", - " ACINOV 7 points 1 word AVIONIC(7)\n", - " ACINTV 8 points 1 word CAVATINA(8)\n", - " AINOTV 62 points 7 words AVIATION(8) INNOVATION(10) INVITATION(10) NOVATION(8) OVATION(7) TITIVATION(10)\n", - " VITIATION(9)\n", - " CINOTV 17 points 2 words CONVICT(7) CONVICTION(10)\n", - " ACINV 20 points 3 words VACCINA(7) VACCINIA(8) VINCA(5)\n", - " ACITV 24 points 4 words ATAVIC(6) VATIC(5) VIATIC(6) VIATICA(7)\n", - " ACNTV 6 points 1 word VACANT(6)\n", - " ACOTV 6 points 1 word OCTAVO(6)\n", - " AINOV 5 points 1 word AVION(5)\n", - " CINOV 11 points 2 words COVIN(5) OVONIC(6)\n", - " AINV 7 points 3 words AVIAN(5) VAIN(1) VINA(1)\n", - " AITV 6 points 2 words VITA(1) VITTA(5)\n", - " ANOV 1 point 1 word NOVA(1)\n", - " ANTV 5 points 1 word AVANT(5)\n", - " AOTV 6 points 1 word OTTAVA(6)\n", - " CINV 5 points 1 word VINIC(5)\n", - " INOV 1 point 1 word VINO(1)\n", - " AIV 1 point 1 word VIVA(1)\n", - " CIV 5 points 1 word CIVIC(5)\n" - ] - } - ], - "source": [ - "report(Honeycomb('ACINOTV', 'V'))" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Honeycomb(letters='ACINOTU', center='U') scores 385 points on 55 words from a 44585 word list:\n", - "\n", - "ACINOTU 162 points 10 pangrams ACTUATION(16) ANNUNCIATION(19) AUCTION(14) CAUTION(14) CONTINUA(15)\n", - " CONTINUANT(17) CONTINUATION(19) COUNTIAN(15) CUNCTATION(17) INCAUTION(16)\n", - " ACINTU 6 points 1 word TUNICA(6)\n", - " ACNOTU 31 points 4 words ACCOUNT(7) ACCOUNTANT(10) COCOANUT(8) TOUCAN(6)\n", - " AINOTU 17 points 2 words ANTIUNION(9) NUTATION(8)\n", - " CINOTU 24 points 3 
words CONTINUO(8) INUNCTION(9) UNCTION(7)\n", - " ACINU 5 points 1 word UNCIA(5)\n", - " ACOTU 6 points 1 word OUTACT(6)\n", - " AINTU 9 points 1 word ANNUITANT(9)\n", - " CINOU 13 points 2 words INCONNU(7) NUNCIO(6)\n", - " CINTU 10 points 2 words CUTIN(5) TUNIC(5)\n", - " CNOTU 20 points 3 words COCONUT(7) COUNT(5) OUTCOUNT(8)\n", - " INOTU 16 points 2 words INTUITION(9) TUITION(7)\n", - " AINU 1 point 1 word UNAI(1)\n", - " ANTU 13 points 4 words AUNT(1) NUTANT(6) TAUNT(5) TUNA(1)\n", - " AOTU 1 point 1 word AUTO(1)\n", - " CINU 7 points 2 words UNCI(1) UNCINI(6)\n", - " CNOU 1 point 1 word UNCO(1)\n", - " CNTU 6 points 2 words CUNT(1) UNCUT(5)\n", - " COTU 6 points 1 word CUTOUT(6)\n", - " INOU 13 points 2 words NONUNION(8) UNION(5)\n", - " INTU 7 points 2 words INTUIT(6) UNIT(1)\n", - " NOTU 1 point 1 word UNTO(1)\n", - " ANU 1 point 1 word UNAU(1)\n", - " ATU 1 point 1 word TAUT(1)\n", - " ITU 5 points 1 word TUTTI(5)\n", - " NOU 1 point 1 word NOUN(1)\n", - " OTU 1 point 1 word TOUT(1)\n", - " TU 1 point 1 word TUTU(1)\n" - ] - } - ], - "source": [ - "report(Honeycomb('ACINOTU', 'U'))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Step 11: NY Times Archives\n", - "\n", - "What do the official honeycombs in the NY Times look like? 
I looked for an archive of past puzzles and found a nice [github repository](https://github.com/philshem/scrape_bee) by [Philip Shemella](https://smalldata.dev/#about), from which I extracted the following honeycombs, where the first letter is the center:" - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "226" - ] - }, - "execution_count": 53, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "nyt_archive = '''\n", - "ABCGILO ABGNORZ ACDIMNY ACEGHNX ACEHIMN ACINTVY ACIPTVY ACLNTVY ADEGIKN ADFLMPU ADFPRTU ADHILNR ADMOPRU AEFHLTU \n", - "AEGHLNO AEHLPUV BACILMY BACLNOR BAELNOP BAGHINT BAILNRT BEFILNO BEFILNX BEGIKNP BILMOTY CABEILT CABJKOT CADGINR \n", - "CADHINP CAEGHIN CAGILNW CAHIMRT CAINOTZ CAIPTVY CDEKNTU CGHINOP CHIKMNU CINOTUY DAGINPT DAGINTU DAKORWY DAMNORT \n", - "DBILNOW DCENOVY DHILNOW DIMORTY EABCHLW EABGMTY EABKLNO EACGHLN EACNTUX EAGHIMT EAHILVY EBLOPTY ECDLMOY ECFHILY \n", - "ECFLNOU ECHNTUY ECLMOPX ECNOPTY EDFLOUY EFGLNUV EGHILNW FAEGLOP FAELMOT GDEILTU GFHLOTU HACEMNT HACGILR HACILNR \n", - "HACLNTU HADGILN HAELPTY HCDEIKL HDEGITW IABCFKL IABEMNT IADFLNW IADLNTW IADLPTU IAEMNTY IAFNRTY IAJMNRU IALOTVY \n", - "ICDEFNO ICENOTV ICFNORT ICFNOTU ICGHLOR ICGLORW ICHKOPT ICLMPTY ICLNORY IDGNOXZ IEGHLOP IEHLNOT KAEGINP KCEHINT \n", - "LABCKOW LABIMOX LACENOW LACFINU LACHINO LACIKOT LACIMOY LADFMPU LADGRUY LAEIJNV LAFMORU LAGINOZ LAHIRTY LAIRUXY \n", - "LBCENOV LBEFINX LBEFIXY LDHINOP LDIQTUY LEHMNOT LEIPTVX LEMNOTU LENOPTU LFGHOTU LGHMOTY LGIMOXY LHIOPRW MABDINR \n", - "MABEKNT MACILNT MADILOR MAHNOPT MAILNPT MAILORT MALNOWY MFLNORU MILPTUY NABDHKO NACDHIP NACIMTY NADFILW NADHIOT \n", - "NALMOPR NALOPRU NBEILMT NCFIMOR NDOPRUW NMOPRTY OACELTY OADEFHN OADEGLY OADELNW OAEGHMP OAEHMNP OAGHMNY OAGILNY \n", - "OAGLMRU OAKLMNW OBDGHUY OBDHLNU OBGHRTU OBHILRY OCDMNPU OCEHKLM OCFHINR OCFIMRU OCHIMNP OCIMPRY ODEIJNT ODFGHIT \n", - "ODGHNRU OEGHMNY OEGIKNV 
OFHIRTW OFIKLRT OFINPRT OGHRTUW OHILMNT PABEILM PACDEHI PACEFTY PACINTY PADINOR PAEGHIN \n", - "PAGHORT PAIRTUY PCELMOX PEGLOTY PELNOTY PHILORW PIMNORT RABCDKW RABCDKY RACINPT RAFHKOY RAGHOPT RCGHOUY RFLMNOU \n", - "TACDKPR TACDLOR TACHLOP TADILVY TBCEILO TBILMOY TCNORUY TEGHNOU TGNORUW UACNORT UADFLMP UAJLNOR UBGILTY UCGILNO \n", - "UCNORTY VABEGLT VAEGLUY WADHIRT WADKLRY WAEGIKN WCDELNO WDILORY WEGHILT YABLNOT YACEINT YADEHLT YADGHLR YADINOR \n", - "YAEGILT YAGLMOP'''.split()\n", - "\n", - "len(nyt_archive)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Spelling Bee has been online since mid-2018, so there should be around 1,000 puzzles, but the 226 in this archive is enough for my purposes. We can determine some characteristics of the past puzzles:" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('A', 130),\n", - " ('L', 126),\n", - " ('O', 126),\n", - " ('I', 125),\n", - " ('N', 122),\n", - " ('T', 101),\n", - " ('C', 84),\n", - " ('E', 84),\n", - " ('H', 75),\n", - " ('R', 68),\n", - " ('Y', 68),\n", - " ('D', 63),\n", - " ('G', 62),\n", - " ('M', 62),\n", - " ('P', 59),\n", - " ('U', 54),\n", - " ('B', 41),\n", - " ('F', 38),\n", - " ('W', 29),\n", - " ('K', 26),\n", - " ('V', 17),\n", - " ('X', 12),\n", - " ('J', 5),\n", - " ('Z', 4),\n", - " ('Q', 1)]" - ] - }, - "execution_count": 54, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "common(''.join(nyt_archive)) # Most common letters" - ] - }, - { - "cell_type": "code", - "execution_count": 55, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('O', 30),\n", - " ('L', 27),\n", - " ('I', 21),\n", - " ('E', 17),\n", - " ('A', 16),\n", - " ('C', 13),\n", - " ('P', 13),\n", - " ('N', 11),\n", - " ('M', 10),\n", - " ('B', 9),\n", - " ('T', 9),\n", - " ('D', 8),\n", - " ('H', 8),\n", - " ('R', 7),\n", - " ('Y', 7),\n", - " ('U', 6),\n", - " ('W', 
6),\n", - " ('F', 2),\n", - " ('G', 2),\n", - " ('K', 2),\n", - " ('V', 2)]" - ] - }, - "execution_count": 55, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "common(letters[0] for letters in nyt_archive) # Most common centers" - ] - }, - { - "cell_type": "code", - "execution_count": 56, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('AI', 21),\n", - " ('IO', 20),\n", - " ('AO', 13),\n", - " ('AEI', 12),\n", - " ('OU', 11),\n", - " ('AIO', 10),\n", - " ('AIY', 10),\n", - " ('AEO', 10),\n", - " ('EI', 10),\n", - " ('EIO', 8),\n", - " ('IOY', 8),\n", - " ('EOY', 7),\n", - " ('AE', 6),\n", - " ('AOU', 6),\n", - " ('AOY', 6),\n", - " ('EO', 6),\n", - " ('AU', 5),\n", - " ('AY', 4),\n", - " ('AIU', 4),\n", - " ('AEY', 4),\n", - " ('AEIY', 4),\n", - " ('EOU', 4),\n", - " ('AIOY', 4),\n", - " ('OUY', 4),\n", - " ('AEU', 3),\n", - " ('IOU', 3),\n", - " ('IUY', 3),\n", - " ('EU', 2),\n", - " ('EIY', 2),\n", - " ('AIUY', 2),\n", - " ('OY', 2),\n", - " ('AEOY', 2),\n", - " ('A', 2),\n", - " ('IU', 1),\n", - " ('IOUY', 1),\n", - " ('EUY', 1),\n", - " ('EOUY', 1),\n", - " ('EIU', 1),\n", - " ('IY', 1),\n", - " ('AUY', 1),\n", - " ('AEUY', 1)]" - ] - }, - "execution_count": 56, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "def vowels(letters): return letterset(v for v in 'AEIOUY'if v in letters)\n", - "\n", - "common(vowels(p) for p in nyt_archive) # Vowels used" - ] - }, - { - "cell_type": "code", - "execution_count": 57, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[(3, 107), (2, 102), (4, 15), (1, 2)]" - ] - }, - "execution_count": 57, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "common(len(vowels(p)) for p in nyt_archive) # Number of vowels used" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[(2, 141), (3, 76), (1, 9)]" - ] - }, - "execution_count": 
58, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "common(len(vowels(p).replace('Y', '')) for p in nyt_archive) # Number of vowels used, not counting 'Y'" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can find the game score for all honeycombs in the archive and report some statistics:" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'N': 226, 'min': 107, 'max': 837, 'mean': 301.8407079646018}" - ] - }, - "execution_count": 59, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "nyt_honeycombs = [Honeycomb(letterset(letters), letters[0]) for letters in nyt_archive]\n", - "\n", - "points_table = tabulate_points(enable1)\n", - "scores = [game_score2(h, points_table) for h in nyt_honeycombs]\n", - "\n", - "dict(N=len(scores), min=min(scores), max=max(scores), mean=sum(scores)/len(scores))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "So the scores of NY Times honeycombs range from 100 to 800 with a mean of 300; far less than the 3,898 of the optimal honeycomb. 
Here's a plot and a histogram of all the scores:" - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEGCAYAAACKB4k+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de3zcdZ3v8dcnaVNsqfRKqaQXUpDTAoJtgAIuIGVRWLVoRams4m0Le2C9nPWs3RXqUmQfsI9dFY8orYqihyAItQUWjlzEoi4NdCLQloi0pZMGalvqcFlZaJN8zh+/74SZySSdtnP5zcz7+XjMI7/fd36Z36fNZD753s3dERERAWiodAAiIhIfSgoiItJPSUFERPopKYiISD8lBRER6Tes0gEciAkTJvj06dMrHYaISFVJJBIvuvvEfM9VdVKYPn06a9eurXQYIiJVxcySgz2n5iMREemnpCAiIv2UFEREpJ+SgoiI9FNSEBGRfkoKIiLST0lBRKTKJJIpbnh4I4lkquivXdXzFERECpFIplizeRdzW8YzZ9rYSodzQBLJFBd9fw27e/poGtbALZ+dW9R/k5KCiNS0tvYulqxaT597ST5Ey23N5l3s7umjz2FPTx9rNu9SUhAR2ZtEMsWdHd3c9vhWevuizcR2l+BDtNzmtoynaVgDe3r6GD6sgbkt44v6+koKIlJz0rWD3j4nc2/JBrOif4iW25xpY7nls3NL1hympCAiNaWtvYsrVq6jL2en4WENxtL5x1Z1LSFtzrSxJft3KCmISCylm38M+NDs5oI+BBPJVOg/eLOs0eDCk6YW/Br1TklBRMqqkJFAiWSKhcsfZXdv9On+s0Q3t/7N3juI12ze1d9/ANBgcPX5x/Gxk6cW7x9Q45QURKRo9vaBX+hwyjWbd7Gn980P90JH2cxtGc+I4Q3s3tNHQ2guUkLYN0oKInLA0k09dyS66ekd/AO/0OGUc1vGM7zR+msKhY6yKXUnbD1QUhCpcwc6sSv91/8be/r6R/oM9oFf6HDKOdPGcuuiU/a5TyH9vUoG+09JQaSO5CaAYsyOTf/1n04IxuB/2e/LX/L6cK8MJQWROjBY804xZsdm/vXf2GBc0DplyL/s9WEfb0oKIlUk31/6hYzkGax5pxizY9WOX1uUFERiLPNDH8hq6lnyvmNYes+GgkbyDNa8U6wPdP31XzuUFERiKnchtwWzm7Oaeu5bv63gkTxDNe/oA10yKSmIxERurWDJqvX0ZCzk5pDV1HPusZN5fMufChrJo+YdKZSSgkgFpRPB2JFNWU1BC2Y358zMNRbMbmbB7OasD/ejDxutkTxSVEoKImUy1HDQBjP63PubghwGzMzNbO5J04e9FJuSgkiJFTIcFHcaGgzDGR5qCrm1ApFyUFIQKZF0MvjZ2q151/HJHQ665H3HkHptd1YSUDKQcitZUjCzKcCPgcOAPmC5u19vZuOA24DpwBbgI+6eMjMDrgfOA14DPunuHaWKT6SU0iOHenIW9S/FcFCRYiplTaEH+Ht37zCz0UDCzB4APgk85O7XmtliYDHwZeBc4KjwOBn4bvgqUlUSyRRXhl2/cr2j+RCWvP8YDQeV2Goo1Qu7+7b0X/ru/irQCRwOzAduDpfdDJwfjucDP/bIGmCMmU0uVXwixZJIprjh4Y0kkikAVnR0D0gIBjQ1WlZCEImjsvQpmNl04J1AOzDJ3bdBlDjM7NBw2eHA1oxv6w5l23JeaxGwCGDqVK2TLpUzWAdybv3gxOljOfPoQ9VEJFWh5EnBzA4G7gS+4O6vRF0H
[… base64-encoded PNG output truncated: plot and histogram of the NY Times honeycomb game scores …]
ZhGJqbInBXffDByfp3wXMK/c8ZRa7uxlbaQjInEWpyGpNSffwndnz5qkvgQRiS0lhRLK3W6zQUNQRSTmlBRKJJFM8WBn9kS1eTM1UU1E4q1S8xRqWlt7F9966A+qJYhI1VFSKKJEMsWNqzcNWMoCVEsQkeqgpFAkiWSKhd9bw+6egctgNzWaagkiUhWUFIpkRUd33oRwzqxJXHLGDNUSRKQqKCkUSe58hHGjmvjSOUdr+KmIVBWNPiqCfPMR3nvsYUoIIlJ1lBSKIN98BC1jISLVSEnhACWSqQGjjTTSSESqlZLCAdL2miJSS5QUDtDmnf+Vdf62MQepliAiVUtJ4QCNG9WUdX74mLdUKBIRkQOnpFBkY0Y27f0iEZGYUlI4APmGok4YPaJC0YiIHDglhQOgoagiUmuUFPbTtfd2cr+GoopIjVFS2A9t7V3c+MjmrLJGLY0tIjVASWE/3PTb5waUad9lEakFWhBvHySSKe7s6GbTjuy5CSdOH6t1jkSkJigpFGiw/RIagMXnzqxMUCIiRaakUKBlqzcNSAjDGoyl849Vs5GI1AwlhQIkkike7MweaXR88yEsef8xSggiUlPU0bwXiWSKL9/51ID5CEoIIlKLVFMYQlt7F1euXEevZ5drPoKI1CrVFAaRSKa4ctX6AQlB8xFEpJappjCIFR3d9PZlZ4TGBuNqdSyLSA1TUhjEs9tfzTpvHnMQ1y+crYQgIjVNzUd5tLV38fiW7NVPTz/6UCUEEal5Sgo50n0JmQ1HjVr9VETqhJJCjmWrN2X1JZhpXSMRqR/qUwgSyRQ3rt7EAznLYZ89c5LWNRKRuqGkQJQQPrr8UXpyxp9q+KmI1Ju6TgqJZIo1m3fx5NaXBiSEBjUbiUgdil1SMLP3AtcDjcD33f3aUtwnverpnp4+zHJjgK+df5yajUSk7sSqo9nMGoEbgHOBWcBCM5tV7PskkimW3r2B3T19ONDnUc0Aoiaja5QQRKROxa2mcBKw0d03A5jZT4H5wNPFukEimeKi76/h9T3Zy2DPmzmJE6aMYW7LeDUZiUjdiltSOBzYmnHeDZyceYGZLQIWAUyduu9/za/ZvGvAvghNjcalZ8xQMhCRuhe3pGB5yrJ6gN19ObAcoLW11fNcP6S5LeNpGtbAnp4+GhuMC1qn8KHZzUoIIiLELyl0A1MyzpuBF4p5gznTxnLLZ+eyZvMuNRWJiOSIW1J4HDjKzI4AngcuBD5W7JvMmTZWyUBEJI9YJQV37zGzy4FfEA1JvcndN1Q4LBGRuhGrpADg7vcC91Y6DhGRehSreQoiIlJZSgoiItJPSUFERPopKYiISD9z3+f5X7FhZjuB5D58ywTgxRKFU0yKs7iqIc5qiBEUZzFVMsZp7j4x3xNVnRT2lZmtdffWSsexN4qzuKohzmqIERRnMcU1RjUfiYhIPyUFERHpV29JYXmlAyiQ4iyuaoizGmIExVlMsYyxrvoURERkaPVWUxARkSEoKYiISL+aSgpmdpOZ7TCz9Rll48zsATN7NnwdG8rNzL5lZhvN7Ckzm13GOKeY2cNm1mlmG8zs83GL1cwOMrPHzOzJEONVofwIM2sPMd5mZk2hfEQ43xien17qGHPibTSz35nZPXGN08y2mNk6M3vCzNaGstj8zMN9x5jZHWb2+/D+PCWGMR4d/g/Tj1fM7AtxizPc+4vh92e9md0afq9i997M4u418wBOB2YD6zPK/hVYHI4XA9eF4/OA+4h2e5sLtJcxzsnA7HA8GvgDMCtOsYZ7HRyOhwPt4d63AxeG8huBvw3H/xO4MRxfCNxW5p/9/wLagHvCeeziBLYAE3LKYvMzD/e9GfhsOG4CxsQtxpx4G4E/AtPiFifR9sLPAW/JeE9+Mo7vzay4K3HTEv8gppOdFJ4BJofjycAz4XgZsDDfdRWIeRXwl3GNFRgJdBDtl/0iMCyUnwL8Ihz/AjglHA8L11mZ4msGHgLOAu4Jv/xxjHMLA5NC
bH7mwFvDh5jFNcY8MZ8D/DaOcfLmnvPjwnvtHuA9cXxvZj5qqvloEJPcfRtA+HpoKE//wNK6Q1lZhSriO4n+Eo9VrKFJ5glgB/AAsAl4yd178sTRH2N4/mVgfKljDL4J/APQF87HxzROB+43s4SZLQplcfqZtwA7gR+Gprjvm9momMWY60Lg1nAcqzjd/Xng34AuYBvRey1BPN+b/eohKQzG8pSVdXyumR0M3Al8wd1fGerSPGUlj9Xde939BKK/xE8CZg4RR0ViNLP3ATvcPZFZPEQslfy5n+bus4FzgcvM7PQhrq1EnMOIml+/6+7vBP5M1AwzmIr+DoW2+A8AP9vbpXnKyvHeHAvMB44A3gaMIvrZDxZLxT+ToD6SwnYzmwwQvu4I5d3AlIzrmoEXyhWUmQ0nSgi3uPuKOMfq7i8BvyJqjx1jZukd+zLj6I8xPH8I8KcyhHca8AEz2wL8lKgJ6ZsxjBN3fyF83QH8nCjRxuln3g10u3t7OL+DKEnEKcZM5wId7r49nMctzrOB59x9p7vvAVYApxLD92amekgKdwEXh+OLidrv0+WfCCMT5gIvp6uepWZmBvwA6HT3r8cxVjObaGZjwvFbiN7gncDDwIcHiTEd+4eBX3poHC0ld/9Hd2929+lETQm/dPeL4hanmY0ys9HpY6K28PXE6Gfu7n8EtprZ0aFoHvB0nGLMsZA3m47S8cQpzi5grpmNDL/z6f/PWL03Byh3J0YpH0RvkG3AHqKs+xmiNrmHgGfD13HhWgNuIGonXwe0ljHOdxFVC58CngiP8+IUK/AO4HchxvXAklDeAjwGbCSqto8I5QeF843h+ZYK/PzP5M3RR7GKM8TzZHhsAL4SymPzMw/3PQFYG37uK4GxcYsx3HsksAs4JKMsjnFeBfw+/A79BBgRt/dm7kPLXIiISL96aD4SEZECKSmIiEg/JQUREemnpCAiIv2UFEREpJ+SgtQ8M5tkZm1mtjksMfGomX2wjPcfaWa3WLRC6noz+02YzS4SO8P2folI9QqThlYCN7v7x0LZNKLlEcrl88B2dz8u3P9oork0+83Mhvmb6+eIFI1qClLrzgJ2u/uN6QJ3T7r7/4FoQUIz+7WZdYTHqaH8TDNbbWa3m9kfzOxaM7vIoj0m1pnZjHDdRDO708weD4/T8sQwGXg+4/7PuPsb4fs/Edb4f9LMfhLKppnZQ6H8ITObGsp/ZGZfN7OHgevCLOmbwn1/Z2bzS/NfKHWlEjPm9NCjXA/gc8A3hnh+JHBQOD4KWBuOzwReIvpAH1RraEAAAAIFSURBVEH0oX5VeO7zwDfDcRvwrnA8lWjpktx7nEC0Ds+jwNeAo0L5MUTLOE8I5+kZuHcDF4fjTwMrw/GPiJZfbgzn/wL8dTgeQ7Qvx6hK/5/rUd0PNR9JXTGzG4iWGdnt7icSbSD0bTM7AegF3p5x+eMe1sgxs03A/aF8HfDucHw2MCtqpQLgrWY22t1fTRe4+xNm1kK03tHZwONmdgpRLeYOd38xXJde/OwU4EPh+CdEm8ek/czde8PxOUSLAX4pnB9ESEz7+N8i0k9JQWrdBmBB+sTdLzOzCUTr+wB8EdgOHE/UnPp6xve+kXHcl3Hex5u/Ow1EG6P891BBuPt/Ea2SucLM+ojWutpDYUsjZ17z54xjAxa4+zMFvIZIQdSnILXul8BBZva3GWUjM44PAba5ex/wcaLtHffF/cDl6ZNQ48hiZqfZm/sFNxFtvZokWrTtI2Y2Pjw3LnzLfxKt+ApwEfCbQe79C+DvQmc6ZvbOfYxdZAAlBalp7u7A+cAZZvacmT1GtA/xl8Ml3wEuNrM1RE1Hf87/SoP6HNAaOoWfBi7Nc80MYLWZrSNaeXYtcKe7bwCuCc89CXw94zU/ZWZPESWqzw9y76uJmr+eMrP14VzkgGiVVBER6aeagoiI9FNSEBGRfkoKIiLST0lBRET6KSmIiEg/JQUREemnpCAiIv3+P6ezdzrz
nRoyAAAAAElFTkSuQmCC\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEGCAYAAABiq/5QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAATjklEQVR4nO3df7RlZX3f8ffHARwGfyAymqkIAy5KpW0EOhoINjGIiZEENaEWS800wdAmWjW2K442K40rSRd2pWLSJFUSrZSq4ZcBAs1CgmBqa4FBfosE1FERwoyJBEMMCHz7x34uXIb5cYacfe6587xfa511n/2cfc7+3nvO/dx9n73Ps1NVSJL68bSlLkCSNFsGvyR1xuCXpM4Y/JLUGYNfkjqzx1IXMIn999+/1q5du9RlSNKyct11132zqlZv3b8sgn/t2rVs3LhxqcuQpGUlyVe31e9QjyR1xuCXpM4Y/JLUGYNfkjpj8EtSZwx+SeqMwS9JnTH4JakzBr8kdWZZfHJ3nqzdcOnUnmvT6SdM7bkkaVLu8UtSZwx+SeqMwS9JnTH4JakzBr8kdcbgl6TOGPyS1BmDX5I6Y/BLUmcMfknqzOjBn2RFkuuTXNKWD05ydZI7kpyTZK+xa5AkPW4We/xvB25btPw+4IyqOhT4FnDqDGqQJDWjBn+SA4ATgN9vywGOA85vq5wFvG7MGiRJTzT2Hv8HgF8EHm3LzwXuq6qH2/JdwAu29cAkpyXZmGTjli1bRi5TkvoxWvAn+TFgc1Vdt7h7G6vWth5fVWdW1bqqWrd69epRapSkHo05H/+xwIlJXgOsBJ7F8B/Avkn2aHv9BwB3j1iDJGkro+3xV9W7q+qAqloLnAx8uqpOAa4ETmqrrQcuGqsGSdKTLcV5/O8C3pnkToYx/w8vQQ2S1K2ZXHqxqq4CrmrtLwMvm8V2JUlP5id3JakzBr8kdcbgl6TOGPyS1BmDX5I6Y/BLUmcMfknqjMEvSZ0x+CWpMwa/JHXG4Jekzhj8ktQZg1+SOmPwS1JnDH5J6ozBL0mdMfglqTMGvyR1xuCXpM4Y/JLUGYNfkjpj8EtSZwx+SeqMwS9JnTH4JakzBr8kdcbgl6TOGPyS1BmDX5I6Y/BLUmcMfknqjMEvSZ0x+CWpM3ssdQFjW7vh0qk8z6bTT5jK80jSUnOPX5I6Y/BLUmcMfknqjMEvSZ3Z7Q/u9mhaB7TBg9rS7mi0Pf4kK5Nck+TGJLcmeW/rPzjJ1UnuSHJOkr3GqkGS9GRjDvU8CBxXVS8BjgBeneRo4H3AGVV1KPAt4NQRa5AkbWW04K/BX7fFPdutgOOA81v/WcDrxqpBkvRkox7cTbIiyQ3AZuBy4EvAfVX1cFvlLuAF23nsaUk2Jtm4ZcuWMcuUpK6MGvxV9UhVHQEcALwMePG2VtvOY8+sqnVVtW716tVjlilJXZnJ6ZxVdR9wFXA0sG+ShbOJDgDunkUNkqTBmGf1rE6yb2vvDRwP3AZcCZzUVlsPXDRWDZKkJxvzPP41wFlJVjD8gTm3qi5J8gXgD5L8GnA98OERa5AkbWW04K+qm4Ajt9H/ZYbxfknSEnDKBknqjMEvSZ0x+CWpMwa/JHXG2Tk1ES9hKe0+3OOXpM5MFPxJ/tHYhUiSZmPSPf4Ptrn1f37h07iSpOVpouCvqpcDpwAvBDYm+XiSV41amSRpFBOP8VfVHcAvAe8CfhD4rSRfTPITYxUnSZq+Scf4vzfJGQyTrB0H/HhVvbi1zxixPknSlE16OudvA78HvKeqvrPQWVV3J/mlUSqTJI1i0uB/DfCdqnoEIMnTgJVV9TdVdfZo1UmSpm7SMf4/AfZetLyq9
UmSlplJg3/logun09qrxilJkjSmSYP/gSRHLSwk+SfAd3awviRpTk06xv8O4LwkC9fHXQP883FKkiSNaaLgr6prk/wD4DAgwBer6rujViZJGsWuzM75UmBte8yRSaiq/zFKVZKk0UwU/EnOBl4E3AA80roLMPglaZmZdI9/HXB4VdWYxUiSxjfpWT23AN8zZiGSpNmYdI9/f+ALSa4BHlzorKoTR6lKkjSaSYP/V8YsQpI0O5OezvmZJAcBh1bVnyRZBawYtzRJ0hgmnZb5Z4HzgQ+1rhcAF45VlCRpPJMe3H0LcCxwPzx2UZbnjVWUJGk8kwb/g1X10MJCkj0YzuOXJC0zkwb/Z5K8B9i7XWv3POCPxitLkjSWSYN/A7AFuBn418D/Yrj+riRpmZn0rJ5HGS69+HvjliNJGtukc/V8hW2M6VfVIVOvSJI0ql2Zq2fBSuCfAftNvxxJ0tgmGuOvqr9YdPtGVX0AOG7k2iRJI5h0qOeoRYtPY/gP4JmjVCRJGtWkQz3/ZVH7YWAT8IapVyNJGt2kZ/X80NiFSJJmY9Khnnfu6P6qev90ypEkjW1Xzup5KXBxW/5x4E+Br49RlCRpPLtyIZajqurbAEl+BTivqt68vQckeSHDNXm/B3gUOLOqfjPJfsA5DBdu3wS8oaq+9VS/AUnSrpl0yoYDgYcWLT/EENw78jDw76rqxcDRwFuSHM4w/cMVVXUocEVbliTNyKR7/GcD1yT5Q4ZP8L6eYW9+u6rqHuCe1v52ktsY5vF/LfCKttpZwFXAu3a1cEnSUzPpWT2/nuSPgX/aun66qq6fdCNJ1gJHAlcDz29/FKiqe5Jsc17/JKcBpwEceOCBk25qWVq74dKpPM+m00+YyvPMwrS+Z1he37c0DyYd6gFYBdxfVb8J3JXk4EkelOQZwAXAO6rq/kk3VlVnVtW6qlq3evXqXShTkrQjk1568T8yDMe8u3XtCfzPCR63J0Pof6yqPtm6702ypt2/Bti8q0VLkp66Sff4Xw+cCDwAUFV3s5MpG5IE+DBw21bn+V8MrG/t9cBFu1KwJOnvZtKDuw9VVSUpgCT7TPCYY4E3ATcnuaH1vQc4HTg3yanA1xhm+pQkzcikwX9ukg8B+yb5WeBn2MlFWarqs0C2c/crJy9RmkyPB8mlp2LSs3p+o11r937gMOCXq+ryUSuTJI1ip8GfZAVwWVUdDxj2krTM7fTgblU9AvxNkmfPoB5J0sgmHeP/W4aDtJfTzuwBqKq3jVKVJGk0kwb/pe0mSVrmdhj8SQ6sqq9V1VmzKkiSNK6djfFfuNBIcsHItUiSZmBnwb/4PPxDxixEkjQbOwv+2k5bkrRM7ezg7kuS3M+w5793a9OWq6qeNWp1kqSp22HwV9WKWRUiSZqNXZmPX5K0GzD4JakzBr8kdcbgl6TOGPyS1BmDX5I6Y/BLUmcMfknqjMEvSZ0x+CWpMwa/JHXG4Jekzhj8ktQZg1+SOmPwS1JnDH5J6ozBL0md2dmlF6Vurd1w6VSeZ9PpJ0zleaRpcY9fkjpj8EtSZwx+SeqMwS9JnTH4JakzBr8kdcbgl6TOGPyS1BmDX5I6Y/BLUmdGC/4kH0myOckti/r2S3J5kjva1+eMtX1J0raNucf/UeDVW/VtAK6oqkOBK9qyJGmGRgv+qvpT4C+36n4tcFZrnwW8bqztS5K2bdZj/M+vqnsA2tfnzXj7ktS9uT24m+S0JBuTbNyyZctSlyNJu41ZB/+9SdYAtK+bt7diVZ1ZVeuqat3q1atnVqAk7e5mHfwXA+tbez1w0Yy3L0ndG/N0zk8AnwMOS3JXklOB04FXJbkDeFVbliTN0GiXXqyqN27nrleOtU1J0s7N7cFdSdI4DH5J6ozBL0mdMfglqTMGvyR1xuCXpM4Y/JLUGYNfkjpj8EtSZwx+SeqMwS9JnTH4JakzBr8kdcbgl6TOjDYts6QnWrvh0qk916bTT5jac6k/7vFLUmcMfknqj
MEvSZ0x+CWpMx7clbSseJD87849fknqjMEvSZ0x+CWpMwa/JHXG4Jekzhj8ktQZg1+SOmPwS1Jn/ACXtIxN68NMY36QaTnU2Bv3+CWpMwa/JHXG4Jekzhj8ktQZD+5KeowzX/bBPX5J6ozBL0mdMfglqTMGvyR1xoO7kjRl8/5pZff4JakzSxL8SV6d5PYkdybZsBQ1SFKvZh78SVYAvwP8KHA48MYkh8+6Dknq1VLs8b8MuLOqvlxVDwF/ALx2CeqQpC6lqma7weQk4NVV9ea2/Cbg+6rqrVutdxpwWls8DLh9wk3sD3xzSuWOyTqnyzqnyzqna6nqPKiqVm/duRRn9WQbfU/661NVZwJn7vKTJxurat1TKWyWrHO6rHO6rHO65q3OpRjquQt44aLlA4C7l6AOSerSUgT/tcChSQ5OshdwMnDxEtQhSV2a+VBPVT2c5K3AZcAK4CNVdesUN7HLw0NLxDqnyzqnyzqna67qnPnBXUnS0vKTu5LUGYNfkjqz7II/yUeSbE5yy6K+/ZJcnuSO9vU5rT9JfqtNDXFTkqNmVOMLk1yZ5LYktyZ5+5zWuTLJNUlubHW+t/UfnOTqVuc57SA8SZ7elu9s96+dRZ2L6l2R5Pokl8xrnUk2Jbk5yQ1JNra+uXrd27b3TXJ+ki+29+kx81ZnksPaz3Hhdn+Sd8xbnW3bv9B+h25J8on2uzV378/HVNWyugE/ABwF3LKo7z8DG1p7A/C+1n4N8McMnx04Grh6RjWuAY5q7WcCf8YwPcW81RngGa29J3B12/65wMmt/4PAz7X2zwMfbO2TgXNm/Nq/E/g4cElbnrs6gU3A/lv1zdXr3rZ9FvDm1t4L2Hce61xU7wrgz4GD5q1O4AXAV4C9F70v/9U8vj8fq3nWG5zSD3otTwz+24E1rb0GuL21PwS8cVvrzbjei4BXzXOdwCrg88D3MXzCcI/WfwxwWWtfBhzT2nu09TKj+g4ArgCOAy5pv9zzWOcmnhz8c/W6A89qQZV5rnOr2n4Y+D/zWCdD8H8d2K+93y4BfmQe358Lt2U31LMdz6+qewDa1+e1/oUXZMFdrW9m2r9xRzLsTc9dnW345AZgM3A58CXgvqp6eBu1PFZnu/+vgOfOok7gA8AvAo+25efOaZ0FfCrJdRmmHYH5e90PAbYA/70Nnf1+kn3msM7FTgY+0dpzVWdVfQP4DeBrwD0M77frmM/3J7AMx/h30UTTQ4y28eQZwAXAO6rq/h2tuo2+mdRZVY9U1REMe9QvA168g1qWpM4kPwZsrqrrFnfvoJalfN2PraqjGGaffUuSH9jBuktV5x4Mw6X/raqOBB5gGDLZnqX+PdoLOBE4b2erbqNvFu/P5zBMNHkw8PeAfRhe/+3VsqQ/T9h9gv/eJGsA2tfNrX/JpodIsidD6H+sqj45r3UuqKr7gKsYxkb3TbLw4b7FtTxWZ7v/2cBfzqC8Y4ETk2ximM31OIb/AOatTqrq7vZ1M/CHDH9M5+11vwu4q6qubsvnM/whmLc6F/wo8Pmqurctz1udxwNfqaotVfVd4JPA9zOH788Fu0vwXwysb+31DGPqC/0/1Y72Hw381cK/iGNKEuDDwG1V9f45rnN1kn1be2+GN/BtwJXASdupc6H+k4BPVxuoHFNVvbuqDqiqtQz/8n+6qk6ZtzqT7JPkmQtthnHpW5iz172q/hz4epLDWtcrgS/MW52LvJHHh3kW6pmnOr8GHJ1kVfvdX/h5ztX78wlmeUBhSgdSPsEwjvZdhr+cpzKMj10B3NG+7tfWDcNFX74E3Aysm1GNL2f41+0m4IZ2e80c1vm9wPWtzluAX279hwDXAHcy/Hv99Na/si3f2e4/ZAle/1fw+Fk9c1Vnq+fGdrsV+A+tf65e97btI4CN7bW/EHjOnNa5CvgL4NmL+uaxzvcCX2y/R2cDT5+39+fim1M2SFJndpehHknShAx+SeqMwS9JnTH4JakzBr8kdcbg124jy
fOTfDzJl9uUCZ9L8voZbn9Vko9lmJ3zliSfbZ/elubKzC+9KI2hfXDmQuCsqvoXre8gho/6z8rbgXur6h+37R/G8HmTpyzJHvX4fC/SVLjHr93FccBDVfXBhY6q+mpV/VcYJstL8r+TfL7dvr/1vyLJZ5Kcm+TPkpye5JQM1ym4OcmL2nqrk1yQ5Np2O3YbNawBvrFo+7dX1YPt8T/V5oi/McnZre+gJFe0/iuSHNj6P5rk/UmuBN7XPhH8kbbd65O8dpwfobox60+MefM2xg14G3DGDu5fBaxs7UOBja39CuA+htB+OkNwv7fd93bgA639ceDlrX0gw3QcW2/jCIZ5Yz4H/BpwaOv/hwxTBO/flhc+afpHwPrW/hngwtb+KMPUviva8n8C/mVr78twfYd9lvpn7m353hzq0W4pye8wTJ3xUFW9lOFCM7+d5AjgEeDvL1r92mpzuiT5EvCp1n8z8EOtfTxw+DCiBMCzkjyzqr690FFVNyQ5hGGOnuOBa5Mcw/DfyPlV9c223sKEXMcAP9HaZzNcYGTBeVX1SGv/MMMkdf++La+k/fHZxR+LBDjGr93HrcBPLixU1VuS7M8wHw3ALwD3Ai9hGOL820WPfXBR+9FFy4/y+O/I0xgunvGdHRVRVX/NMDvjJ5M8yjBH03eZbNrdxes8sKgd4Cer6vYJnkPaKcf4tbv4NLAyyc8t6lu1qP1s4J6qehR4E8Ol/HbFp4C3Liy0/xyeIMmxefz6r3sxXG7zqwwTib0hyXPbffu1h/xfhtlGAU4BPrudbV8G/Nt2AJskR+5i7dITGPzaLVRVAa8DfjDJV5Jcw3Bd2Xe1VX4XWJ/k/zEM8zyw7WfarrcB69qB2C8A/2Yb67wI+EySmxlmPd0IXFBVtwK/3u67EXj/ouf86SQ3Mfwxevt2tv2rDENVNyW5pS1LT5mzc0pSZ9zjl6TOGPyS1BmDX5I6Y/BLUmcMfknqjMEvSZ0x+CWpM/8fHS1nE+nF9ZYAAAAASUVORK5CYII=\n", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "plt.plot(sorted(scores), range(len(scores)), '.')\n", - "plt.xlabel('Game Score'); plt.ylabel('Game Number');\n", - "\n", - "plt.show()\n", - "\n", - "plt.hist(scores, bins=15, rwidth=0.9)\n", - "plt.xlabel('Game Score'); plt.ylabel('Frequency');" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here's a report on the outlier, the only honeycomb to score over 800 points:" - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Honeycomb(letters='CEINOTV', center='I') scores 837 points on 125 words from a 44585 word list:\n", - "\n", - "CEINOTV 182 points 11 pangrams COINVENT(15) CONCOCTIVE(17) CONNECTIVE(17) CONNIVENT(16) CONVECTION(17)\n", - " CONVECTIVE(17) CONVENIENT(17) CONVENTION(17) EVECTION(15) EVICTION(15) INCONVENIENT(19)\n", - " CEINOT 140 points 16 words CONCEIT(7) CONNECTION(10) CONTENTION(10) CONTINENCE(10) CONTINENT(9) COONTIE(7)\n", - " INCONTINENCE(12) INCONTINENT(11) INNOCENT(8) NEOTENIC(8) NICOTINE(8) NOETIC(6) NONCONNECTION(13)\n", - " NOTICE(6) TECTONIC(8) TONETIC(7)\n", - " CEINOV 60 points 7 words CONCEIVE(8) CONNIVE(7) CONVENIENCE(11) CONVINCE(8) INCONVENIENCE(13) INVOICE(7)\n", - " NOVICE(6)\n", - " CEINTV 18 points 2 words INCENTIVE(9) INVECTIVE(9)\n", - " CINOTV 17 points 2 words CONVICT(7) CONVICTION(10)\n", - " EINOTV 9 points 1 word INVENTION(9)\n", - " CEINO 22 points 3 words CONIINE(7) CONINE(6) INNOCENCE(9)\n", - " CEINT 20 points 3 words ENCEINTE(8) ENTICE(6) INCITE(6)\n", - " CEINV 14 points 2 words EVINCE(6) EVINCIVE(8)\n", - " CEIOT 6 points 1 word COOTIE(6)\n", - " CEIOV 5 points 1 word VOICE(5)\n", - " CEITV 17 points 3 words CIVET(5) EVICT(5) EVICTEE(7)\n", - " CINOT 53 points 7 words COITION(7) CONCOCTION(10) INTINCTION(10) NICOTIN(7) NICOTINIC(9) ONTIC(5) TONIC(5)\n", - " CINOV 
11 points 2 words COVIN(5) OVONIC(6)\n", - " EINOT 22 points 3 words INTENTION(9) INTONE(6) TONTINE(7)\n", - " EINOV 10 points 2 words ENVOI(5) OVINE(5)\n", - " EINTV 28 points 4 words INVENT(6) INVENTIVE(9) INVITE(6) INVITEE(7)\n", - " EIOTV 6 points 1 word VOTIVE(6)\n", - " CEIN 7 points 3 words CINE(1) NICE(1) NIECE(5)\n", - " CEIT 9 points 3 words CITE(1) ETIC(1) TECTITE(7)\n", - " CEIV 6 points 2 words CIVIE(5) VICE(1)\n", - " CINO 33 points 9 words CION(1) COIN(1) CONI(1) CONIC(5) CONIN(5) ICON(1) ICONIC(6) IONIC(5) NONIONIC(8)\n", - " CINT 5 points 1 word TINCT(5)\n", - " CINV 5 points 1 word VINIC(5)\n", - " CIOT 13 points 3 words OTIC(1) OTITIC(6) TICTOC(6)\n", - " EINO 6 points 1 word IONONE(6)\n", - " EINT 28 points 6 words INTENT(6) INTINE(6) NINETEEN(8) NITE(1) TENTIE(6) TINE(1)\n", - " EINV 19 points 6 words NEVI(1) NIEVE(5) VEIN(1) VENIN(5) VENINE(6) VINE(1)\n", - " EITV 5 points 1 word EVITE(5)\n", - " INOT 12 points 3 words INTO(1) NITON(5) NOTION(6)\n", - " INOV 1 point 1 word VINO(1)\n", - " CIO 11 points 2 words COCCI(5) COCCIC(6)\n", - " CIT 5 points 1 word ICTIC(5)\n", - " CIV 5 points 1 word CIVIC(5)\n", - " EIN 1 point 1 word NINE(1)\n", - " EIT 6 points 1 word TITTIE(6)\n", - " EIV 1 point 1 word VIVE(1)\n", - " INO 15 points 3 words INION(5) NINON(5) ONION(5)\n", - " INT 2 points 2 words INTI(1) TINT(1)\n", - " IOT 1 point 1 word TOIT(1)\n", - " IT 1 point 1 word TITI(1)\n" - ] - } - ], - "source": [ - "report(max(nyt_honeycombs, key=lambda h: game_score2(h, points_table)))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "And here are the most common words, along with all the pangram words that appeared more than once:" - ] - }, - { - "cell_type": "code", - "execution_count": 62, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('LOLL', 28),\n", - " ('NOON', 23),\n", - " ('LALL', 21),\n", - " ('OLIO', 18),\n", - " ('AALII', 16),\n", - " ('ANNA', 16),\n", - " ('CACA', 16),\n", - " ('ILIA', 
16),\n", - " ('ILIAL', 16),\n", - " ('INION', 16),\n", - " ('NAAN', 16),\n", - " ('NANA', 16),\n", - " ('NINON', 16),\n", - " ('OLLA', 16),\n", - " ('ONION', 16),\n", - " ('OTTO', 16),\n", - " ('TOOT', 16),\n", - " ('COCCI', 15),\n", - " ('COCCIC', 15),\n", - " ('DODO', 15),\n", - " ('INIA', 15),\n", - " ('LOOM', 15),\n", - " ('LOON', 15),\n", - " ('MOLL', 15),\n", - " ('MOOL', 15),\n", - " ('NOLO', 15),\n", - " ('OLEO', 15),\n", - " ('TITI', 15),\n", - " ('ACACIA', 14),\n", - " ('TOIT', 14)]" - ] - }, - "execution_count": 62, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "nyt_words = common(w for w in enable1 for h in nyt_honeycombs if can_make(h, w))\n", - "nyt_words[:30]" - ] - }, - { - "cell_type": "code", - "execution_count": 63, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "[('CAPTIVITY', 2),\n", - " ('COMPLEX', 2),\n", - " ('COUNTRY', 2),\n", - " ('HANDICAP', 2),\n", - " ('IMMOBILITY', 2),\n", - " ('INFLEXIBLE', 2),\n", - " ('MOBILITY', 2),\n", - " ('MOURNFUL', 2),\n", - " ('NONCOUNTRY', 2),\n", - " ('PHOTOGRAPH', 2),\n", - " ('THOUGHTFUL', 2),\n", - " ('WHIPPOORWILL', 2),\n", - " ('WHIRLPOOL', 2),\n", - " ('WINDFALL', 2),\n", - " ('WINDFLAW', 2)]" - ] - }, - "execution_count": 63, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "[(w, c) for w, c in nyt_words if c > 1 and pangram_bonus(w)]" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Step 12: 'S' Words\n", - "\n", - "What if we allowed honeycombs and words to have an 'S' in them? We already saw that 53,556 words were rejected because they contain an 'S'; how much more could a honeycomb score if we allow 'S' words?" 
- ] - }, - { - "cell_type": "code", - "execution_count": 64, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Best Honeycomb(letters='AEINRST', center='E') scores 8681 points on 1179 words from a 98141 word list:\n", + "Top Honeycomb(letters='AEINRST', center='E') scores 8681 points on 1179 words from a 98141 word list:\n", "\n", "AEINRST 1381 points 86 pangrams ANESTRI(14) ANTISERA(15) ANTISTRESS(17) ANTSIER(14) ARENITES(15) ARSENITE(15)\n", " ARSENITES(16) ARTINESS(15) ARTINESSES(17) ATTAINERS(16) ENTERTAINERS(19) ENTERTAINS(17) ENTRAINERS(17)\n", @@ -2377,8 +1420,8 @@ } ], "source": [ - "enable1s = valid_words(open('enable1.txt').read(), \n", - " lambda w: len(w) >= 4 and len(set(w)) <= 7)\n", + "enable1s = [w for w in open('enable1.txt').read().upper().split()\n", + " if len(w) >= 4 and len(set(w)) <= 7]\n", "\n", "report(words=enable1s)" ] @@ -2393,7 +1436,9 @@ "\n", "\n", "
\n", - " 537 words; 3,898 points         1,179 words; 8,681 points\n", + " 537 words                         1,179 words \n", + "
50 pangrams                         86 pangrams\n", + "
3,898 points                         8,681 points\n", "
\n", "
" ] @@ -2406,19 +1451,20 @@ "\n", "This notebook showed how to find the highest-scoring honeycomb. Four ideas led to four approaches:\n", "\n", - "1. **Brute Force Enumeration**: Compute the game score for every possible honeycomb; return the best.\n", + "1. **Brute Force Enumeration**: Compute the game score for every possible honeycomb; return the highest-scoring.\n", "2. **Pangram Lettersets**: Compute the game score for just the honeycombs that are pangram lettersets (with all possible centers).\n", - "3. **Points Table**: Precompute the score for each letterset; then for each candidate honeycomb, sum the scores of the 64 letter subsets.\n", - "4. **Branch and Bound**: Try all 7 centers only for lettersets that score better than the best score so far.\n", + "3. **Points Table**: Precompute the score for each letterset; then for each candidate honeycomb, sum the scores of the 63 letter subsets.\n", + "4. **Branch and Bound**: Try all 7 centers only for lettersets that score better than the top score so far.\n", "\n", - "These ideas led to a substantial reduction in the number of honeycombs examined (a factor of 400), the run time of a call to `game_score` (a factor of 300), and the overall run time of `best_honeycomb` (a factor of 75,000, although I didn't actually run the first two cases, just estimated the time).\n", + "These ideas led to substantial improvements (gain) in number of honeycombs processed, `game_score` run time, and total run time:\n", "\n", - "|Approach|Honeycombs|`game_score` Time|Total Run Time|\n", - "|--------|----------|--------|----|\n", - "|**1. Brute Force Enumeration**|3,364,900|9000 microseconds|8.5 hours|\n", - "|**2. Pangram Lettersets**|55,902|9000 microseconds|500 seconds|\n", - "|**3. Points Table**|55,902|26 microseconds|1.6 seconds|\n", - "|**4. 
Branch and Bound**|8,084 |26 microseconds|0.4 seconds|\n", + "\n", + "|Approach|Honeycombs|Gain|`game_score` Time|Gain|Total Run Time|Gain|\n", + "|--------|----------|--------|----|---|---|---|\n", + "|**1. Brute Force Enumeration**|3,364,900|——|9000 microseconds|——|8.5 hours|——|\n", + "|**2. Pangram Lettersets**|55,902|60×|9000 microseconds|——|500 seconds|60×|\n", + "|**3. Points Table**|55,902|60×|25 microseconds|360×|1.6 seconds|20,000×|\n", + "|**4. Branch and Bound**|8,084|400×|25 microseconds|360×|0.36 seconds|80,000×|\n", "\n" ] }
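To make the Points Table idea (approach 3 above) concrete, here is a minimal Python sketch. It is an illustration only, not the notebook's actual definitions: the helper names and the tiny three-word list are assumptions for the example. The scoring rules themselves come straight from the puzzle statement (4-letter words score 1, longer words their length, pangrams +7).

```python
from itertools import combinations

def word_score(word):
    """4-letter words score 1 point; longer words score their length,
    plus a 7-point bonus for pangrams (words with 7 distinct letters)."""
    score = 1 if len(word) == 4 else len(word)
    return score + (7 if len(set(word)) == 7 else 0)

def points_table(words):
    """Map each letterset (the frozenset of a word's distinct letters)
    to the total score of all words sharing that letterset."""
    table = {}
    for w in words:
        key = frozenset(w)
        table[key] = table.get(key, 0) + word_score(w)
    return table

def game_score(letters, center, table):
    """Score a honeycomb by summing the table entries for every subset
    of its letters that contains the center letter."""
    others = set(letters) - {center}
    return sum(table.get(frozenset(combo) | {center}, 0)
               for r in range(len(others) + 1)
               for combo in combinations(others, r))

# Tiny demo with the three example words from the puzzle statement.
demo = points_table(['GAME', 'AMALGAM', 'MEGAPLEX'])
# For the honeycomb MEGAPLX with center G, these words score 1 + 7 + 15.
print(game_score('MEGAPLX', 'G', demo))  # -> 23
```

The point of the table is that it is built once per word list; after that, scoring a candidate honeycomb is a handful of dictionary lookups rather than a scan of all the words, and Branch and Bound (approach 4) adds pruning on top of the same table.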