Add files via upload
parent
35893c590e
commit
56b1aab373
@ -6,13 +6,13 @@
|
||||
"source": [
|
||||
"<div style=\"text-align: right\" align=\"right\"><i>Peter Norvig, 3 Jan 2020</i></div>\n",
|
||||
"\n",
|
||||
"# Spelling Bee\n",
|
||||
"# Spelling Bee Puzzle\n",
|
||||
"\n",
|
||||
"The [Jan. 3 2020 Riddler](https://fivethirtyeight.com/features/can-you-solve-the-vexing-vexillology/) concerns the popular NYTimes [Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee) puzzle:\n",
|
||||
"\n",
|
||||
"*In this game, seven letters are arranged in a honeycomb lattice, with one letter in the center. Here’s the lattice from December 24, 2019:*\n",
|
||||
"\n",
|
||||
"<img src=\"https://fivethirtyeight.com/wp-content/uploads/2020/01/Screen-Shot-2019-12-24-at-5.46.55-PM.png?w=1136\" width=150>\n",
|
||||
"<img src=\"https://fivethirtyeight.com/wp-content/uploads/2020/01/Screen-Shot-2019-12-24-at-5.46.55-PM.png?w=1136\" width=\"150\">\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"*The goal is to identify as many words that meet the following criteria:*\n",
|
||||
@ -26,19 +26,18 @@
|
||||
"\n",
|
||||
"*For consistency, please use [this word list](https://norvig.com/ngrams/enable1.txt) to check your game score.*\n",
|
||||
"\n",
|
||||
"# Approach to a Solution\n",
|
||||
"# My Approach\n",
|
||||
"\n",
|
||||
"Since the referenced [word list](https://norvig.com/ngrams/enable1.txt) was on my web site (it is a standard Scrabble word list that I happen to host a copy of), I felt somewhat compelled to submit an answer. I had worked on word puzzles before, like Scrabble and Boggle. But this puzzle is different because it deals with *unordered sets* of letters, not *ordered permutations* of letters. That makes things much easier. When I searched for an optimal 5×5 Boggle board, I couldn't exhaustively try all $26^{(5×5)} \\approx 10^{35}$ possibilities; I could only do hill climbing to find a local maximum. But for Spelling Bee, it is feasible to try every possibility and get a guaranteed highest-scoring honeycomb. Here's a sketch of my approach:\n",
|
||||
"Since the referenced [word list](https://norvig.com/ngrams/enable1.txt) came from *my* web site (it is a standard Scrabble word list that I host a copy of), I felt somewhat compelled to solve this. I had worked on word games before, like Scrabble and Boggle. This puzzle is different because it deals with *unordered sets* of letters, not *ordered permutations* of letters. That makes things much easier. When I searched for an optimal 5×5 Boggle board, I couldn't exhaustively try all $26^{(5×5)} \\approx 10^{35}$ possibilities; I could only do hill climbing to find a local maximum. But for Spelling Bee, it is feasible to try every possibility and get a guaranteed highest-scoring honeycomb. Here's a sketch of my approach:\n",
|
||||
" \n",
|
||||
"- Since order and repetition don't count, we can represent a word as a **set** of letters, which I will call a `letterset`. For simplicity I'll choose to implement that as a sorted string (not as a Python `set` or `frozenset`). For example:\n",
|
||||
" letterset(\"GLAM\") == letterset(\"AMALGAM\") == \"AGLM\"\n",
|
||||
"- A word is a **pangram** if and only if its letterset has exactly 7 letters.\n",
|
||||
"- A honeycomb can be represented by a `(letterset, center)` pair, for example `('AEGLMPX', 'G')` for the honeycomb above.\n",
|
||||
"- Since the rules say every valid honeycomb must contain a pangram, it must be the case that every valid honeycomb *is* a pangram. That means I can:\n",
|
||||
" * Consider all possible pangram lettersets and all possible centers for each pangram\n",
|
||||
" * Compute the game score for each one\n",
|
||||
" * Take the maximum; that's guaranteed to be the best possible honeycomb.\n",
|
||||
"- So it all comes down to having an efficient-enough `game_score` function. "
|
||||
"- Since the rules say every valid honeycomb must contain a pangram, it must be the case that every valid honeycomb *is* a pangram. That means:\n",
|
||||
" * The number of valid honeycombs is 7 times the number of pangram lettersets (because any of the 7 letters could be the center).\n",
|
||||
" * I will consider every valid honeycomb and compute the game score for each one.\n",
|
||||
" * The one with the highest game score is guaranteed to be the best possible honeycomb.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
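The letterset and pangram bullets in the hunk above can be made concrete with a short sketch. The bodies of `letterset` and `is_pangram` are not visible in this diff, so the versions below are assumptions, chosen only to agree with the stated example `letterset("GLAM") == letterset("AMALGAM") == "AGLM"`. The four-word `sample` list is hypothetical as well.

```python
def letterset(word) -> str:
    """The distinct letters of a word, as a sorted string (assumed implementation)."""
    return ''.join(sorted(set(word)))

def is_pangram(word) -> bool:
    """Does the word use exactly 7 distinct letters? (assumed implementation)"""
    return len(set(word)) == 7

def valid_honeycombs(words) -> list:
    """Every (letterset, center) pair built from a pangram letterset."""
    pangrams = {letterset(w) for w in words if is_pangram(w)}
    return [(pangram, center) for pangram in pangrams for center in pangram]

# Hypothetical word list, used only for illustration.
sample = ['MEGAPLEX', 'CACCIATORE', 'EROTICA', 'GLAM']
honeycombs = valid_honeycombs(sample)

print(letterset('AMALGAM'))  # AGLM
print(len(honeycombs))       # 14: two pangram lettersets, 7 centers each
```

CACCIATORE and EROTICA share the letterset `ACEIORT`, so collecting the pangrams into a set keeps the honeycomb count at 7 per distinct letterset rather than 7 per pangram word.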
@ -47,7 +46,7 @@
|
||||
"source": [
|
||||
"# Words, Word Scores, Pangrams, and Lettersets\n",
|
||||
"\n",
|
||||
"I'll start by loading some modules and defining four basic functions about words:"
|
||||
"I'll start by importing some utilities and defining four basic functions about words:"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -67,15 +66,14 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def Words(text) -> list:\n",
|
||||
" \"\"\"A list of all the valid space-separated words in a str.\"\"\"\n",
|
||||
" \"\"\"A list of all the valid space-separated words.\"\"\"\n",
|
||||
" return [w for w in text.upper().split() \n",
|
||||
" if len(w) >= 4 and 'S' not in w and len(set(w)) <= 7]\n",
|
||||
"\n",
|
||||
"def word_score(word) -> int: \n",
|
||||
" \"\"\"The points for this word, including bonus for pangram.\"\"\"\n",
|
||||
" N = len(word)\n",
|
||||
" bonus = (7 if is_pangram(word) else 0)\n",
|
||||
" return (1 if N == 4 else N + bonus)\n",
|
||||
" return (1 if len(word) == 4 else len(word) + bonus)\n",
|
||||
"\n",
|
||||
"def is_pangram(word) -> bool: \n",
|
||||
" \"\"\"Does a word use all 7 letters (some maybe more than once)?\"\"\"\n",
|
||||
@ -118,7 +116,7 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that `I`, `me` and `gem` are too short, `games` has an `s` which is not allowed, and `amalgamation` has too many distinct letters. We're left with six valid words out of the original eleven.\n",
|
||||
"Note that `I`, `me` and `gem` are too short, `games` has an `S` which is not allowed, and `amalgamation` has too many distinct letters. We're left with six valid words out of the original eleven.\n",
|
||||
"\n",
|
||||
"Here are examples of the functions in action:"
|
||||
]
|
||||
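To see the filtering and scoring rules from the two hunks above in one place, here is a minimal sketch. `Words` and `word_score` follow the new side of the diff; `is_pangram` is cut off in this view, so its body is an assumption. The eleven-word `text` is also hypothetical, picked to be consistent with the note about `I`, `me`, `gem`, `games` and `amalgamation`.

```python
def Words(text) -> list:
    """A list of all the valid space-separated words."""
    return [w for w in text.upper().split()
            if len(w) >= 4 and 'S' not in w and len(set(w)) <= 7]

def is_pangram(word) -> bool:
    """Assumed implementation: a pangram has exactly 7 distinct letters."""
    return len(set(word)) == 7

def word_score(word) -> int:
    """The points for this word, including bonus for pangram."""
    bonus = (7 if is_pangram(word) else 0)
    return (1 if len(word) == 4 else len(word) + bonus)

# Hypothetical input text: eleven words, six of which survive the filter.
text = 'amalgam amalgamation game games gem glam megaplex cacciatore erotica I me'
words = Words(text)
scores = {w: word_score(w) for w in words}

print(words)
print(scores)
```

A four-letter word can never have 7 distinct letters, so the 1-point case and the pangram bonus never interact.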
@ -204,15 +202,108 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# The enable1 Word List\n",
|
||||
"# Game Score and Best Honeycomb\n",
|
||||
"\n",
|
||||
"Now I will load in the enable1 word list and see what we have:"
|
||||
"The game score for a honeycomb is the sum of the word scores for all the words that the honeycomb can make. How do we know if a honeycomb can make a word? It can if (1) the word contains the honeycomb's center and (2) every letter in the word is in the honeycomb. Another way of saying (2) is that the letters in the word must be a **subset** of the letters in the honeycomb.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def game_score(honeycomb, words) -> int:\n",
|
||||
" \"\"\"The total score for this honeycomb.\"\"\"\n",
|
||||
" return sum(word_score(word) for word in words if can_make(honeycomb, word))\n",
|
||||
"\n",
|
||||
"def can_make(honeycomb, word) -> bool:\n",
|
||||
" \"\"\"Can the honeycomb make this word?\"\"\"\n",
|
||||
" (letters, center) = honeycomb\n",
|
||||
" return center in word and all(L in letters for L in word)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"24"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"honeycomb = ('AEGLMPX', 'G')\n",
|
||||
"\n",
|
||||
"game_score(honeycomb, words)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"I can find the highest-scoring honeycomb by considering all valid honeycombs: ones where the letters are a pangram letterset, and the center can be any of the 7 letters. Then I just need to pick out the honeycomb with the maximum game score:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def best_honeycomb(words) -> list: \n",
|
||||
" \"\"\"Return [score, honeycomb] for the honeycomb with highest score on these words.\"\"\"\n",
|
||||
" return max([game_score(h, words), h] for h in valid_honeycombs(words))\n",
|
||||
"\n",
|
||||
"def valid_honeycombs(words):\n",
|
||||
" \"The valid honeycombs are the pangram lettersets, each with all 7 centers.\"\n",
|
||||
" pangrams = {letterset(w) for w in words if is_pangram(w)}\n",
|
||||
" return ((pangram, center) for pangram in pangrams for center in pangram)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[31, ('ACEIORT', 'T')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"best_honeycomb(words)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**We're done!** We know how to find the best honeycomb. But so far, we've only done it for the tiny word list. \n",
|
||||
"\n",
|
||||
"# The enable1 Word List\n",
|
||||
"\n",
|
||||
"Here's the real word list:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
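Condition (2) in the markdown cell above is a subset test, so `can_make` can also be phrased with Python's set comparison operator. This is an alternative sketch, not the code in the commit. `letterset` and `is_pangram` are assumed implementations, and the six-word list is hypothetical, chosen so that the totals agree with the outputs `24` and `[31, ('ACEIORT', 'T')]` shown in the hunk.

```python
def is_pangram(word) -> bool:
    return len(set(word)) == 7            # assumed implementation

def letterset(word) -> str:
    return ''.join(sorted(set(word)))     # assumed implementation

def word_score(word) -> int:
    bonus = (7 if is_pangram(word) else 0)
    return (1 if len(word) == 4 else len(word) + bonus)

def can_make(honeycomb, word) -> bool:
    """The word must contain the center and use only honeycomb letters."""
    (letters, center) = honeycomb
    return center in word and set(word) <= set(letters)

def game_score(honeycomb, words) -> int:
    return sum(word_score(w) for w in words if can_make(honeycomb, w))

def best_honeycomb(words) -> list:
    pangrams = {letterset(w) for w in words if is_pangram(w)}
    return max([game_score((p, c), words), (p, c)]
               for p in pangrams for c in p)

# Hypothetical tiny word list.
words = ['AMALGAM', 'GAME', 'GLAM', 'MEGAPLEX', 'CACCIATORE', 'EROTICA']

print(game_score(('AEGLMPX', 'G'), words))  # 24
print(best_honeycomb(words))                # [31, ('ACEIORT', 'T')]
```

All seven centers of `ACEIORT` score 31 on this list, because both of its words are pangrams; `max` breaks the tie by comparing the honeycomb tuples, which is why `'T'` is reported.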
@ -229,7 +320,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -238,7 +329,7 @@
|
||||
"44585"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -250,7 +341,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -259,7 +350,7 @@
|
||||
"14741"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -269,40 +360,6 @@
|
||||
"len(pangrams)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{'AARDWOLF': 15,\n",
|
||||
" 'BABBLEMENT': 17,\n",
|
||||
" 'CABEZON': 14,\n",
|
||||
" 'COLLOGUING': 17,\n",
|
||||
" 'DEMERGERING': 18,\n",
|
||||
" 'ETYMOLOGY': 16,\n",
|
||||
" 'GARROTTING': 17,\n",
|
||||
" 'IDENTIFY': 15,\n",
|
||||
" 'LARVICIDAL': 17,\n",
|
||||
" 'MORTGAGEE': 16,\n",
|
||||
" 'OVERHELD': 15,\n",
|
||||
" 'PRAWNED': 14,\n",
|
||||
" 'REINITIATED': 18,\n",
|
||||
" 'TOWHEAD': 14,\n",
|
||||
" 'UTOPIAN': 14}"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"{w: word_score(w) for w in pangrams[::1000]} # Just sample some of them"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@ -314,7 +371,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -323,7 +380,7 @@
|
||||
"'ANTITOTALITARIAN'"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -332,6 +389,47 @@
|
||||
"max(enable1, key=word_score)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And what are some of the pangrams?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['AARDWOLF',\n",
|
||||
" 'BABBLEMENT',\n",
|
||||
" 'CABEZON',\n",
|
||||
" 'COLLOGUING',\n",
|
||||
" 'DEMERGERING',\n",
|
||||
" 'ETYMOLOGY',\n",
|
||||
" 'GARROTTING',\n",
|
||||
" 'IDENTIFY',\n",
|
||||
" 'LARVICIDAL',\n",
|
||||
" 'MORTGAGEE',\n",
|
||||
" 'OVERHELD',\n",
|
||||
" 'PRAWNED',\n",
|
||||
" 'REINITIATED',\n",
|
||||
" 'TOWHEAD',\n",
|
||||
" 'UTOPIAN']"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"pangrams[::1000] # Every thousandth one"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
@ -339,134 +437,46 @@
|
||||
"And what's the breakdown of reasons why words are invalid?\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"Counter({'short': 922, 'valid': 44585, 'S': 103913, 'long': 23400})"
|
||||
]
|
||||
},
|
||||
"execution_count": 12,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"Counter(('S' if 'S' in w else 'short' if len(w) < 4 else 'long' if len(set(w)) > 7 else 'valid')\n",
|
||||
" for w in open('enable1.txt').read().upper().split())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are more than twice as many words with an 'S' as there are valid words."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Game Score and Best Honeycomb\n",
|
||||
"\n",
|
||||
"The game score for a honeycomb is the sum of the word scores for all the words that the honeycomb can make. How do we know if a honeycomb can make a word? It can if (1) the word contains the honeycomb's center and (2) every letter in the word is in the honeycomb. Another way of saying (2) is that the letters in the word must be a **subset** of the letters in the honeycomb.\n",
|
||||
"\n",
|
||||
"I can find the highest-scoring honeycomb by considering all pangram lettersets, and for each one considering all 7 possible centers, and keeping the one with the maximum game score:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def game_score(honeycomb, words) -> int:\n",
|
||||
" \"\"\"The total score for this honeycomb.\"\"\"\n",
|
||||
" (letters, center) = honeycomb\n",
|
||||
" return sum(word_score(word) for word in words \n",
|
||||
" if center in word and is_subset(word, letters))\n",
|
||||
"\n",
|
||||
"def best_honeycomb(words) -> list: \n",
|
||||
" \"\"\"Return [score, honeycomb] for the honeycomb with highest score on these words.\"\"\"\n",
|
||||
" pangrams = {letterset(w) for w in words if is_pangram(w)}\n",
|
||||
" honeycombs = ((pangram, center) for pangram in pangrams for center in pangram)\n",
|
||||
" return max([game_score(h, words), h]\n",
|
||||
" for h in honeycombs)\n",
|
||||
"\n",
|
||||
"def is_subset(subset, superset) -> bool: return all(x in superset for x in subset)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's try it:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"24"
|
||||
]
|
||||
},
|
||||
"execution_count": 14,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"honeycomb = ('AEGLMPX', 'G')\n",
|
||||
"\n",
|
||||
"game_score(honeycomb, words)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[31, ('ACEIORT', 'T')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"best_honeycomb(words)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**We're done!** At least theoretically. But how long will it take to run the computation on the big `enable1` word list? Let's see how long it takes to compute the game score of a single honeycomb:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[('S', 103913), ('valid', 44585), ('long', 23400), ('short', 922)]"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"Counter(('S' if 'S' in w else 'short' if len(w) < 4 else 'long' if len(set(w)) > 7 else 'valid')\n",
|
||||
" for w in open('enable1.txt').read().upper().split()).most_common()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are more than twice as many words with an 'S' as there are valid words.\n",
|
||||
"But how long will it take to run the computation on the big `enable1` word list? Let's see how long it takes to compute the game score of a single honeycomb:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 7.38 ms, sys: 332 µs, total: 7.71 ms\n",
|
||||
"Wall time: 7.41 ms\n"
|
||||
"CPU times: user 11.9 ms, sys: 506 µs, total: 12.4 ms\n",
|
||||
"Wall time: 12 ms\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -475,7 +485,7 @@
|
||||
"153"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"execution_count": 17,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -488,49 +498,49 @@
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"About 8 milliseconds. How many minutes would that be for all 14,741 pangrams and all 7 possible center letters for each pangram?"
|
||||
"About 12 milliseconds. How many minutes would that be for all 14,741 × 7 valid honeycombs?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 17,
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"13.758266666666666"
|
||||
"20.6374"
|
||||
]
|
||||
},
|
||||
"execution_count": 17,
|
||||
"execution_count": 18,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
".008 * 14741 * 7 / 60"
|
||||
".012 * 14741 * 7 / 60"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"About 14 minutes. I could run `best_honeycomb(enable1)` right now and take a coffee break until it completes, but I'm predisposed to think that a puzzle like this deserves a more elegant solution. I'd like to get the run time under a minute (as in [Project Euler](https://projecteuler.net/)).\n",
|
||||
"About 20 minutes. I could run `best_honeycomb(enable1)` right now and take a coffee break until it completes, but I'm predisposed to think that a puzzle like this deserves a more elegant solution. I'd like to get the run time under a minute (as in [Project Euler](https://projecteuler.net/)).\n",
|
||||
"\n",
|
||||
"# Making it Faster\n",
|
||||
"\n",
|
||||
"Here's my plan for a more efficient program:\n",
|
||||
"\n",
|
||||
"1. Keep the same strategy of trying every pangram, but do some precomputation that will make `game_score` faster.\n",
|
||||
"1. Keep the same strategy of trying every pangram, but do some precomputation that will make `game_score` much faster.\n",
|
||||
"1. The precomputation is: compute the `letterset` and `word_score` for each word, and make a table of `{letterset: points}` giving the total number of points that can be made with each letterset. I call this a `points_table`.\n",
|
||||
"3. These calculations are independent of the honeycomb, so they only need to be done once per word, not 14,741 × 7 times per word. \n",
|
||||
"3. These calculations are independent of the honeycomb, so they only need to be done once, not 14,741 × 7 times. \n",
|
||||
"4. Within `game_score`, generate every valid **subset** of the letters in the honeycomb. A valid subset must include the center letter, and it may or may not include each of the other 6 letters, so there are exactly $2^6 = 64$ subsets. The function `letter_subsets(honeycomb)` returns these.\n",
|
||||
"5. To compute `game_score`, just take the sum of the 64 subset entries in the points table.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"That means that in `game_score` we no longer need to iterate over 44,585 words and check if each word is a subset of the honeycomb. Instead we iterate over the 64 subsets of the honeycomb and check if each one is the letterset of a word (or of more than one word) and how many total points those word(s) score. \n",
|
||||
"\n",
|
||||
"Since 64 < 44,585, that's a nice improvement!\n",
|
||||
"Since 64 < 44,585, that's a nice optimization!\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Here's the code. Notice we've changed the interface to `game_score`; it now takes a points table, not a word list."
|
||||
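The code cell for this plan is not shown in the diff, so the following is only a sketch of the five steps under stated assumptions. The names `points_table` and `letter_subsets` come from the plan; their bodies, the helper definitions, and the six-word list are my guesses, checked against the outputs that do appear later in the diff (`['C', 'AC', 'BC', 'CD', 'ABC', 'ACD', 'BCD', 'ABCD']` and the `Counter` with four lettersets).

```python
from collections import Counter
from itertools import combinations

def is_pangram(word) -> bool:
    return len(set(word)) == 7            # assumed implementation

def letterset(word) -> str:
    return ''.join(sorted(set(word)))     # assumed implementation

def word_score(word) -> int:
    bonus = (7 if is_pangram(word) else 0)
    return (1 if len(word) == 4 else len(word) + bonus)

def points_table(words) -> Counter:
    """A table of {letterset: total points} over all the words."""
    table = Counter()
    for w in words:
        table[letterset(w)] += word_score(w)
    return table

def letter_subsets(honeycomb) -> list:
    """All subsets of the honeycomb letters that include the center."""
    (letters, center) = honeycomb
    return [''.join(subset)
            for n in range(1, 8)
            for subset in combinations(letters, n)
            if center in subset]

def game_score(honeycomb, pts_table) -> int:
    """Sum the table entries for the subsets; missing keys count as 0."""
    return sum(pts_table[s] for s in letter_subsets(honeycomb))

# Hypothetical tiny word list.
words = ['AMALGAM', 'GAME', 'GLAM', 'MEGAPLEX', 'CACCIATORE', 'EROTICA']
table = points_table(words)

print(letter_subsets(('ABCD', 'C')))
print(table)
print(game_score(('AEGLMPX', 'G'), table))  # 24
```

Because the honeycomb letters are stored in sorted order, `combinations` emits each subset as a sorted string, which is the same form `letterset` uses for the table keys. Looking up a missing key in a `Counter` returns 0 without adding an entry, so subsets that match no word cost nothing.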
@ -538,7 +548,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 18,
|
||||
"execution_count": 19,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -579,7 +589,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 19,
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -588,7 +598,7 @@
|
||||
"['C', 'AC', 'BC', 'CD', 'ABC', 'ACD', 'BCD', 'ABCD']"
|
||||
]
|
||||
},
|
||||
"execution_count": 19,
|
||||
"execution_count": 20,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -606,7 +616,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 20,
|
||||
"execution_count": 21,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -622,7 +632,7 @@
|
||||
"Counter({'AGLM': 8, 'AEGM': 1, 'AEGLMPX': 15, 'ACEIORT': 31})"
|
||||
]
|
||||
},
|
||||
"execution_count": 20,
|
||||
"execution_count": 21,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -643,7 +653,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"execution_count": 22,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
@ -667,15 +677,15 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 22,
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 2.08 s, sys: 9.46 ms, total: 2.09 s\n",
|
||||
"Wall time: 2.12 s\n"
|
||||
"CPU times: user 2.05 s, sys: 5.2 ms, total: 2.05 s\n",
|
||||
"Wall time: 2.06 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -684,7 +694,7 @@
|
||||
"[3898, ('AEGINRT', 'R')]"
|
||||
]
|
||||
},
|
||||
"execution_count": 22,
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
@ -706,7 +716,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"execution_count": 24,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
@ -746,7 +756,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 24,
|
||||
"execution_count": 25,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -772,7 +782,7 @@
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 25,
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
@ -952,24 +962,24 @@
|
||||
"source": [
|
||||
"# S Words\n",
|
||||
"\n",
|
||||
"What if we did allow honeycombs (and words) with an S in them?"
|
||||
"What if we allowed honeycombs (and words) to have an S?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"execution_count": 27,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def S_words(text) -> list:\n",
|
||||
" \"\"\"A list of all the valid space-separated words in a str, including words with an S.\"\"\"\n",
|
||||
" \"\"\"A list of all the valid space-separated words, including words with an S.\"\"\"\n",
|
||||
" return [w for w in text.upper().split() \n",
|
||||
" if len(w) >= 4 and len(set(w)) <= 7]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"execution_count": 28,
|
||||
"metadata": {
|
||||
"scrolled": false
|
||||
},
|
||||
@ -1257,6 +1267,15 @@
|
||||
"source": [
|
||||
"report(S_words(open('enable1.txt').read()))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here are the highest-scoring honeycombs, with and without an S:\n",
|
||||
"\n",
|
||||
"<img src=\"http://norvig.com/honeycombs.png\" width=\"400\">"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||