diff --git a/ipynb/jotto.ipynb b/ipynb/jotto.ipynb
index dbf570c..b95d479 100644
--- a/ipynb/jotto.ipynb
+++ b/ipynb/jotto.ipynb
@@ -306,8 +306,8 @@
 "metadata": {},
 "outputs": [],
 "source": [
- "def evaluate(scores: Iterable[Score]) -> None:\n",
- "    \"\"\"Display statistics and a histogram for these scores.\"\"\"\n",
+ "def report(scores: Iterable[Score]) -> None:\n",
+ "    \"\"\"Report statistics and a histogram for these scores.\"\"\"\n",
 "    scores = list(scores)\n",
 "    ctr = Counter(scores)\n",
 "    bins = range(min(ctr), max(ctr) + 2)\n",
@@ -359,7 +359,7 @@
 }
 ],
 "source": [
- "evaluate(play(random_guesser, target, verbose=False) for target in wordlist)"
+ "report(play(random_guesser, target, verbose=False) for target in wordlist)"
 ]
 },
 {
@@ -1112,7 +1112,7 @@
 }
 ],
 "source": [
- "%time evaluate(tree_scores(minimizing_tree(max, wordlist, inconsistent=False)))"
+ "%time report(tree_scores(minimizing_tree(max, wordlist, inconsistent=False)))"
 ]
 },
 {
@@ -1144,7 +1144,7 @@
 }
 ],
 "source": [
- "%time evaluate(tree_scores(minimizing_tree(expectation, wordlist, inconsistent=False)))"
+ "%time report(tree_scores(minimizing_tree(expectation, wordlist, inconsistent=False)))"
 ]
 },
 {
@@ -1176,7 +1176,7 @@
 }
 ],
 "source": [
- "%time evaluate(tree_scores(minimizing_tree(neg_entropy, wordlist, inconsistent=False)))"
+ "%time report(tree_scores(minimizing_tree(neg_entropy, wordlist, inconsistent=False)))"
 ]
 },
 {
@@ -1215,7 +1215,7 @@
 }
 ],
 "source": [
- "%time evaluate(play(random_guesser, target, verbose=False) for target in wordlist)"
+ "%time report(play(random_guesser, target, verbose=False) for target in wordlist)"
 ]
 },
 {
@@ -1256,7 +1256,7 @@
 }
 ],
 "source": [
- "%time evaluate(tree_scores(minimizing_tree(max, wordlist, inconsistent=True)))"
+ "%time report(tree_scores(minimizing_tree(max, wordlist, inconsistent=True)))"
 ]
 },
 {
@@ -1288,7 +1288,7 @@
 }
 ],
 "source": [
- "%time evaluate(tree_scores(minimizing_tree(expectation, wordlist, inconsistent=True)))"
+ "%time report(tree_scores(minimizing_tree(expectation, wordlist, inconsistent=True)))"
 ]
 },
 {
@@ -1320,7 +1320,7 @@
 }
 ],
 "source": [
- "%time evaluate(tree_scores(minimizing_tree(neg_entropy, wordlist, inconsistent=True)))"
+ "%time report(tree_scores(minimizing_tree(neg_entropy, wordlist, inconsistent=True)))"
 ]
 },
 {
@@ -1333,8 +1333,8 @@
 "\n",
 "|Algorithm|Consistent<br>Only<br>Mean (Max)|Inconsistent<br>Allowed<br>Mean (Max)|\n",
 "|--|--|--|\n",
- "|baseline random guesser|7.33 (16)| |\n",
- "|minimize max_counts|7.15 (18)|7.05 (10)|\n",
+ "|random guesser|7.33 (16)| |\n",
+ "|minimize max|7.15 (18)|7.05 (10)|\n",
 "|minimize expectation|7.14 (17)|6.84 (10)|\n",
 "|minimize neg_entropy|7.09 (19)|6.82 (10)|\n",
 "\n",
@@ -1354,7 +1354,7 @@
 " - *Yellow* if the guess letter is in the word but in the wrong spot.\n",
 " - *Miss* if the letter is not in the word in any spot.\n",
 " \n",
- "Since repeated letters and anagrams are allowed, I can use all of `sgb_words` as my list of allowable Wordle words. (Presumably Wordle uses a different list, but this should be not too far off.)\n",
+ "Since repeated letters and anagrams are allowed, I can use all of `sgb_words` as my list of allowable Wordle words.\n",
 "\n",
 "There seems to be an ambiguity in the rules. Assume the guess is *etude* and the target is *poems*. I think the correct reply should be that one letter *e* is *yellow* and the other is a *miss*, although a strict reading of the rules would say they both should be *yellow*, because both instances of *e* are \"in the word but in the wrong spot.\" I decided that in cases like this I would report the first one as yellow and the second as a miss."
 ]
@@ -1600,7 +1600,7 @@
 }
 ],
 "source": [
- "%time evaluate(play(random_guesser, target, sgb_words, verbose=False) for target in sgb_words)"
+ "%time report(play(random_guesser, target, sgb_words, verbose=False) for target in sgb_words)"
 ]
 },
 {
@@ -1633,7 +1633,7 @@
 }
 ],
 "source": [
 "%time wtree = minimizing_tree(neg_entropy, sgb_words, sgb_words, inconsistent=False)\n",
- "evaluate(tree_scores(wtree))"
+ "report(tree_scores(wtree))"
 ]
 },
 {
@@ -1666,26 +1666,35 @@
 }
 ],
 "source": [
 "%time wtree = minimizing_tree(neg_entropy, sgb_words, sgb_words, inconsistent=True)\n",
- "evaluate(tree_scores(wtree))"
+ "report(tree_scores(wtree))"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
- "Pretty good! The Wordle web site challenges you to solve each puzzle in six guesses; we can now do that 99.9% of the time when inconsistent guesses are allowed (a big jump from the 95% without inconsistent guesses and the 92% with random consistent guesses).\n",
+ "Pretty good! The Wordle web site challenges you to solve each puzzle in six guesses; we can now do that 99.9% of the time when inconsistent guesses are allowed, a big jump from the 95% without inconsistent guesses and the 92% with random consistent guesses.\n",
 "\n",
+ "This is all on the `sgb-words.txt` file. I poked around in the Wordle javascript, and I think I found the word list that they use. If I interpreted it correctly, my algorithm gets these results on it:\n",
+ "\n",
+ "    median: 3 guesses, mean: 3.49 ± 0.60, worst: 6, scores: 2,315\n",
+ "    cumulative: ≤3:53%, ≤4:96%, ≤5:99.8%, ≤6:100%, ≤7:100%, ≤8:100%, ≤9:100%, ≤10:100%\n",
+ "\n",
+ "I won't post the word list here, because I don't have the author's permission.\n",
 "\n",
 "# Jotto and Wordle Evaluation Summary\n",
 "\n",
- "Here is a summary of the evaluations for both games:\n",
+ "Here is a summary (the first four columns on `sgb-words.txt`, the last on the Wordle word list):\n",
+ "\n",
+ "|Algorithm|JOTTO<br>Consistent<br>Only<br>Mean (Max)|JOTTO<br>Inconsistent<br>Allowed<br>Mean (Max)|WORDLE<br>Consistent<br>Only<br>Mean (Max)|WORDLE<br>Inconsistent<br>Allowed<br>Mean (Max)|WORDLE<br>Official<br>Wordlist<br>Mean (Max)|\n",
+ "|--|--|--|--|--|--|\n",
+ "|random guesser|7.33 (16)| ––––––– |4.64 (14) | ––––––– | 4.08 (8) |\n",
+ "|minimize max|7.15 (18)|7.05 (10)| ––––––– | ––––––– | ––––––– |\n",
+ "|minimize expectation|7.14 (17)|6.84 (10)| ––––––– | ––––––– | ––––––– |\n",
+ "|minimize neg_entropy|7.09 (19)|6.82 (10)| 4.09 (12) | 3.82 (7) | 3.49 (6) |\n",
+ "\n",
+ "\n",
- "|Algorithm|JOTTO<br>Consistent<br>Only<br>Mean (Max)|JOTTO<br>Inconsistent<br>Allowed<br>Mean (Max)|WORDLE<br>Consistent<br>Only<br>Mean (Max)|WORDLE<br>Inconsistent<br>Allowed<br>Mean (Max)|\n",
- "|--|--|--|--|--|\n",
- "|baseline random guesser|7.33 (16)| –––– |4.64 (14) | –––– |\n",
- "|minimize max_counts|7.15 (18)|7.05 (10)| –––– | –––– |\n",
- "|minimize expectation|7.14 (17)|6.84 (10)| –––– | –––– |\n",
- "|minimize neg_entropy|7.09 (19)|6.82 (10)| 4.09 (12) | 3.82 (7) |\n",
 "\n",
 "# Sample Wordle Games with Minimizing Guesser\n",
 "\n",
@@ -1856,21 +1865,33 @@
 "source": [
 "The best words use popular letters, especially \"e\", \"s\", \"a\", \"r\", \"l\", \"t\".\n",
 "\n",
- "The worst words have repeated unpopular letters.\n",
+ "The worst words have repeated unpopular letters, like \"zz\" and \"yukky\".\n",
 "\n",
 "# Next Steps\n",
 "\n",
 "There are many directions you could take this if you are interested:\n",
- "- Do the refactoring so that the code can neatly handle multiple different games with different replies, etc.\n",
- "- Run the computations to figure out the best strategy for Wordle.\n",
- "- Rerun the computations with the larger Wordle word lists. If necessary, optimize code first.\n",
- "- Consider game variant where each reply consists of two numbers: the number of letters in common with the target, and the number of letters that are in the exact correct position (as in Mastermind).\n",
- "- Implement Mastermind with 6 colors and 4 pegs, and with other combinations.\n",
- "- What's the best strategy for a chooser who is trying to make the guesser get a bad score. Is there a strategy equilibrium?\n",
- "- Our `minimizing_tree` function is **greedy** in that it guesses the word that minimizes some metric of the current situation without looking ahead to future branches in the tree. Can you get better performance by doing some **look-ahead**? Perhaps with a beam search?\n",
- "- As an alternative to look-ahead, can you improve a tree by editing it? Given a tree, look for interior nodes that end up with a worse-than-expected average score, and see if the node can be replaced with something better (covering the same target words). Correcting a few bad nodes might be faster than carefully searching for good nodes in the first place.\n",
- "- Research what other computer scientists have done with [Jotto](https://arxiv.org/abs/1107.3342) or [Mastermind](http://serkangur.freeservers.com/).\n",
- "- What else can you explore?"
+ "- **Other games:**\n",
+ "  - Consider a Jotto game variant where each reply consists of two numbers: the number of letters in common with the target, and the number of letters that are in the exact correct position (as in Mastermind).\n",
+ "  - Implement [Mastermind](https://en.wikipedia.org/wiki/Mastermind_%28board_game%29). The default version has 6 colors and 4 pegs. Can you go beyond that?\n",
+ "  - Research what other computer scientists have done with [Jotto](https://arxiv.org/abs/1107.3342) or [Mastermind](http://serkangur.freeservers.com/).\n",
+ "- **Better strategy:**\n",
+ "  - Our `minimizing_tree` function is **greedy** in that it guesses the word that minimizes some metric of the current situation without looking ahead to future branches in the tree. Can you get better performance by doing some **look-ahead**? Perhaps with a beam search?\n",
+ "  - As an alternative to look-ahead, can you improve a tree by editing it? Given a tree, look for interior nodes that end up with a worse-than-expected average score, and see if the node can be replaced with something better (covering the same target words). Correcting a few bad nodes might be faster than carefully searching for good nodes in the first place.\n",
+ "  - The metrics max, expectation, and negative entropy are all designed as proxies to what we really want to minimize: the average number of guesses. Can we estimate that directly? For example, we know a branch of size 1 has average 1; of size 2 has average 1.5; and of size 3 has average 5/3 if one of the words partitions the other two, otherwise an average of 2. Can we learn a function that takes a set of words as input and estimates the average number of guesses for the set?\n",
+ "  - Is it feasible to do a complete search and find the guaranteed optimal strategy? What optimizations to the code would be necessary? How long would the search take?\n",
+ "- **Code refactoring:**\n",
+ "  - Refactor the code so it can smoothly handle multiple different games with different replies, etc.\n",
+ "- **Chooser strategy:**\n",
+ "  - Analyze the game where the chooser is not random, but rather is an adversary to the guesser: the chooser tries to choose a word that will maximize the guesser's score. What's a good strategy for the chooser? Is there a strategy equilibrium?\n",
+ "\n",
+ "One thing I thought of is to choose a word for which one of the spots can be filled by many letters, such as:\n",
+ "\n",
+ "    bills cills dills fills gills hills jills kills lills mills nills pills rills sills tills vills wills yills zills\n",
+ "    aight bight dight eight fight hight kight light might night pight right sight tight wight\n",
+ "    backs cacks dacks hacks jacks kacks lacks macks packs racks sacks tacks wacks yacks zacks\n",
+ "    bangs cangs dangs fangs gangs hangs kangs mangs pangs rangs sangs tangs vangs wangs yangs\n",
+ "    bests fests gests hests jests kests lests nests pests rests tests vests wests yests zests\n",
+ "    bines cines dines fines kines lines mines nines pines rines sines tines vines wines zines\n"
 ]
 }
 ],