Merge pull request #2 from norvig/master

Update from master
Luis San Martin
2018-08-17 17:36:29 +02:00
committed by GitHub
26 changed files with 27871 additions and 18797 deletions

@@ -8,43 +8,59 @@ Some are in Jupyter (IPython) notebooks, some in `.py` files. You can view the f
# Index of Jupyter (IPython) Notebooks
|Logic and Number Puzzles|
|Programming Examples|
|---|
|[Advent of Code 2017](https://github.com/norvig/pytudes/blob/master/ipynb/Advent%202017.ipynb)<br>*Puzzle site with a coding puzzle each day for Advent 2017.*|
|[Advent of Code 2016](https://github.com/norvig/pytudes/blob/master/ipynb/Advent%20of%20Code.ipynb)<br>*Puzzle site with a coding puzzle each day for Advent 2016*.|
|[Project Euler Utilities](https://github.com/norvig/pytudes/blob/master/ipynb/Project%20Euler%20Utils.ipynb)<br>*My utility functions for the Project Euler problems, including `Primes` and `Factors`.*|
|[Translating English Sentences into Propositional Logic Statements](https://github.com/norvig/pytudes/blob/master/ipynb/PropositionalLogic.ipynb)<br>*Automatically converting informal English sentences into formal Propositional Logic.*|
|[The Puzzle of the Misanthropic Neighbors](https://github.com/norvig/pytudes/blob/master/ipynb/Mean%20Misanthrope%20Density.ipynb)<br>*How crowded will this neighborhood be, if nobody wants to live next door to anyone else?*|
|[Countdown to 2016](https://github.com/norvig/pytudes/blob/master/ipynb/Countdown.ipynb)<br>*Solving the equation 10 _ 9 _ 8 _ 7 _ 6 _ 5 _ 4 _ 3 _ 2 _ 1 = 2016. From an Alex Bellos puzzle.*|
|[Sicherman Dice](https://github.com/norvig/pytudes/blob/master/ipynb/Sicherman%20Dice.ipynb)<br>*Find a pair of dice that is like a regular pair of dice, only different.*|
|[Beal's Conjecture Revisited](https://github.com/norvig/pytudes/blob/master/ipynb/Beal.ipynb)<br>*A search for counterexamples to Beal's Conjecture*|
|[WWW: Who Will Win (NBA Title)?](https://github.com/norvig/pytudes/blob/master/ipynb/WWW.ipynb)<br>*Computing the probability of winning the NBA title, for my home town Warriors, or any other team.*|
|[Pickleball Tournament](https://github.com/norvig/pytudes/blob/master/ipynb/Pickleball.ipynb)<br>*Scheduling a doubles tournament fairly and efficiently.*|
|[Conway's Game of Life](https://github.com/norvig/pytudes/blob/master/ipynb/Life.ipynb)<br>*The cellular automata zero-player game.*|
|[A Chaos Game with Triangles](https://github.com/norvig/pytudes/blob/master/ipynb/Sierpinski.ipynb)<br>*A surprising appearance of the Sierpinski triangle in a random walk between vertexes.*|
|[Generating Mazes](https://github.com/norvig/pytudes/blob/master/ipynb/Maze.ipynb)<br>*Make a maze by generating a random tree superimposed on a grid.*|
|Logic and Number Puzzles|
|---|
|[When is Cheryl's Birthday?](https://github.com/norvig/pytudes/blob/master/ipynb/Cheryl.ipynb)<br>*Solving the "Cheryl's Birthday" logic puzzle.*|
|[When Cheryl Met Eve: A Birthday Story](https://github.com/norvig/pytudes/blob/master/ipynb/Cheryl-and-Eve.ipynb)<br>*Inventing new puzzles in the Style of Cheryl's Birthday.*|
|[The Devil and the Coin Flip Game](https://github.com/norvig/pytudes/blob/master/ipynb/Coin%20Flip.ipynb)<br>*How to beat the Devil at his own game.*|
|[The Puzzle of the Misanthropic Neighbors](https://github.com/norvig/pytudes/blob/master/ipynb/Mean%20Misanthrope%20Density.ipynb)<br>*How crowded will this neighborhood be, if nobody wants to live next door to anyone else?*|
|[Four 4s, Five 5s, and Countdown to 2016](https://github.com/norvig/pytudes/blob/master/ipynb/Countdown.ipynb)<br>*Solving the equation 10 _ 9 _ 8 _ 7 _ 6 _ 5 _ 4 _ 3 _ 2 _ 1 = 2016. From an Alex Bellos puzzle.*|
|[Sicherman Dice](https://github.com/norvig/pytudes/blob/master/ipynb/Sicherman%20Dice.ipynb)<br>*Find a pair of dice that is like a regular pair of dice, only different.*|
|[Sol Golomb's Rectangle Puzzle](https://github.com/norvig/pytudes/blob/master/ipynb/Golomb-Puzzle.ipynb)<br>*A Puzzle involving placing rectangles of different sizes inside a square. Bonus: cryptarithmetic.*|
|[WWW: Will Warriors Win?](https://github.com/norvig/pytudes/blob/master/ipynb/WWW.ipynb)<br>*Golden State Warriors probability of winning the 2016 NBA title.*|
|[The Riddler: Battle Royale](https://github.com/norvig/pytudes/blob/master/ipynb/Riddler%20Battle%20Royale.ipynb)<br>*A puzzle involving allocating your troops and going up against an opponent.*|
|Word Games|
|---|
|[xkcd 1970: Name Dominoes](https://github.com/norvig/pytudes/blob/master/ipynb/xkcd-Name-Dominoes.ipynb)<br>*Lay out dominoes legally; the dominoes have people names, not numbers.*|
|[Ghost](https://github.com/norvig/pytudes/blob/master/ipynb/Ghost.ipynb)<br>*The word game Ghost (add letters, try to avoid making a word).*|
|[World's Longest Palindrome](https://github.com/norvig/pytudes/blob/master/ipynb/pal3.ipynb)<br>*Searching for a long Panama-style palindrome, this time letter-by-letter.*|
|[Refactoring a Crossword Game Program](https://github.com/norvig/pytudes/blob/master/ipynb/Scrabble.ipynb)<br>*Refactoring the Scrabble / Word with Friends game from Udacity 212.*|
|[xkcd 1313: Regex Golf](https://github.com/norvig/pytudes/blob/master/ipynb/xkcd1313.ipynb)<br>*Find the smallest regular expression; inspired by Randall Munroe.*|
|[xkcd 1313: Regex Golf (Part 2: Infinite Problems)](https://github.com/norvig/pytudes/blob/master/ipynb/xkcd1313-part2.ipynb)<br>*Regex Golf: better, faster, funner. With Stefan Pochmann.*|
|[Let's Code About Bike Locks](https://github.com/norvig/pytudes/blob/master/ipynb/Fred%20Buns.ipynb)<br>*A tale of a bicycle combination lock that uses letters instead of digits. Inspired by Bike Snob NYC.*|
|[Gesture Typing](https://github.com/norvig/pytudes/blob/master/ipynb/Gesture%20Typing.ipynb)<br>*What word has the longest path on a gesture-typing smartphone keyboard? Inspired by Nicolas Schank and Shumin Zhai.*|
|[Gesture Typing](https://github.com/norvig/pytudes/blob/master/ipynb/Gesture%20Typing.ipynb)<br>*What word has the longest path on a gesture-typing smartphone keyboard?*|
|[How to Do Things with Words, or Statistical Natural Language Processing in Python](https://github.com/norvig/pytudes/blob/master/ipynb/How%20to%20Do%20Things%20with%20Words.ipynb)<br>*Spelling Correction, Secret Codes, Word Segmentation, and more: grab your bag of words.*|
|Computer Science Algorithms, Concepts, and Problems|
|Math Concepts|
|---|
|[A Chaos Game with Triangles](https://github.com/norvig/pytudes/blob/master/ipynb/Sierpinski.ipynb)<br>*A surprising appearance of the Sierpinski triangle in a random walk between vertexes.*|
|[BASIC Interpreter](https://github.com/norvig/pytudes/blob/master/ipynb/BASIC.ipynb)<br>*How to write an interpreter for the BASIC programming language.*|
|[Bad Grade, Good Experience](https://github.com/norvig/pytudes/blob/master/ipynb/Snobol.ipynb)<br>*As a student, did you ever get a bad grade on a programming assignment? (Snobol, Concordance)*|
|[Conway's Game of Life](https://github.com/norvig/pytudes/blob/master/ipynb/Life.ipynb)<br>*The cellular automata zero-player game.*|
|[A Concrete Introduction to Probability](https://github.com/norvig/pytudes/blob/master/ipynb/Probability.ipynb)<br>*Code and examples of the basic principles of Probability Theory.*|
|[Probability, Paradox, and the Reasonable Person Principle](https://github.com/norvig/pytudes/blob/master/ipynb/ProbabilityParadox.ipynb)<br>*Some classic paradoxes in Probability Theory, and how to think about disagreements.*|
|[Symbolic Algebra, Simplification, and Differentiation](https://github.com/norvig/pytudes/blob/master/ipynb/Differentiation.ipynb)<br>*A computer algebra system that manipulates expressions, including symbolic differentiation.*|
|[Economics Simulation](https://github.com/norvig/pytudes/blob/master/ipynb/Economics.ipynb)<br>*A simulation of a simple economic game.*|
|[How to Count Things](https://github.com/norvig/pytudes/blob/master/ipynb/How%20To%20Count%20Things.ipynb)<br>*Combinatorial math: how to count how many things there are, when there are a lot of them.*|
|[Euler's Sum of Powers Conjecture](https://github.com/norvig/pytudes/blob/master/ipynb/Euler's%20Conjecture.ipynb)<br>*Solving a 200-year-old puzzle by finding integers that satisfy a<sup>5</sup> + b<sup>5</sup> + c<sup>5</sup> + d<sup>5</sup> = e<sup>5</sup>.*|
|Computer Science Algorithms and Concepts|
|---|
|[BASIC Interpreter](https://github.com/norvig/pytudes/blob/master/ipynb/BASIC.ipynb)<br>*How to write an interpreter for the BASIC programming language.*|
|[Bad Grade, Good Experience](https://github.com/norvig/pytudes/blob/master/ipynb/Snobol.ipynb)<br>*As a student, did you ever get a bad grade on a programming assignment? (Snobol, Concordance)*|
|[The Convex Hull Problem](https://github.com/norvig/pytudes/blob/master/ipynb/Convex%20Hull.ipynb)<br>*A classic Computer Science Algorithm.*|
|[The Traveling Salesperson Problem](https://github.com/norvig/pytudes/blob/master/ipynb/TSP.ipynb)<br>*Another of the classics.*|
|[Economics Simulation](https://github.com/norvig/pytudes/blob/master/ipynb/Economics.ipynb)<br>*A simulation of a simple economic game.*|
|[Project Euler Utilities](https://github.com/norvig/pytudes/blob/master/ipynb/Project%20Euler%20Utils.ipynb)<br>*My utility functions for the Project Euler problems, including `Primes` and `Factors`.*|
# Index of Python Files
@@ -69,7 +85,9 @@ Some are in Jupyter (IPython) notebooks, some in `.py` files. You can view the f
# Etudes for Programmers
I got the idea for the "etudes" part of the name from this [1978 book by Charles Wetherell](https://books.google.com/books/about/Etudes_for_programmers.html?id=u89WAAAAMAAJ)
I got the idea for the "etudes" part of the name from
this [1978 book](https://books.google.com/books/about/Etudes_for_programmers.html?id=u89WAAAAMAAJ)
by [Charles Wetherell](http://demin.ws/blog/english/2012/08/25/interview-with-charles-wetherell/)
that was very influential to me when I was first learning to program.
![](https://images-na.ssl-images-amazon.com/images/I/51ZnZH29dvL._SX394_BO1,204,203,200_.jpg)

1475
data/latlong.htm Normal file

File diff suppressed because it is too large

1
data/ngrams/README.md Normal file

@@ -0,0 +1 @@

File diff suppressed because it is too large

@@ -11,7 +11,7 @@
}
},
"source": [
"<div style=\"text-align:right\"><b>Peter Norvig</b> 22 October 2015, revised 28 October 2015, 4 July 2017</div>\n",
"<div style=\"text-align:right\"><b>Peter Norvig</b> 2000; revised 2015&mdash;2018</div>\n",
"\n",
"# Beal's Conjecture Revisited\n",
"\n",
@@ -25,78 +25,52 @@
"made his conjecture in 1993:\n",
"\n",
"> If $A^x + B^y = C^z$, \n",
"> <br>where $A, B, C, x, y, z$ are positive integers and $x, y, z$ are all greater than $2$, \n",
"> <br>where $A, B, C, x, y, z$ are positive integers \n",
"> <br>and $x, y, z$ are all greater than $2$, \n",
"> <br>then $A, B$ and $C$ must have a common prime factor.\n",
"\n",
"[Andrew Wiles](https://en.wikipedia.org/wiki/Andrew_Wiles) proved Fermat's theorem in 1995, but Beal's conjecture remains unproved, and Beal has offered [$1,000,000](http://www.ams.org/profession/prizes-awards/ams-supported/beal-prize) for a proof or disproof. I don't have the mathematical skills of Wiles, so I could never find a proof, but I can write a program to search for counterexamples. I first wrote [that program in 2000](http://norvig.com/beal2000.html), and [my name got associated](https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=beal%20conjecture) with Beal's Conjecture, which means I get a lot of emails with purported proofs or counterexamples (many asking how they can collect their prize money). So far, all the emails have been wrong. This page catalogs some of the more common errors and updates my 2000 program."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"[Andrew Wiles](https://en.wikipedia.org/wiki/Andrew_Wiles) proved Fermat's theorem in 1995, but Beal's conjecture remains unproved, and Beal has offered [one million dollars](http://www.ams.org/profession/prizes-awards/ams-supported/beal-prize) for a proof or disproof. I don't have the mathematical skills of Wiles, so I could never find a proof, but I can write a program to search for counterexamples. I first wrote [that program in 2000](http://norvig.com/beal2000.html), and [my name got associated](https://www.google.com/webhp?#q=beal conjecture) with Beal's Conjecture, which means I get a lot of emails with purported proofs or counterexamples (many asking how they can collect their prize money). So far, all the emails have been wrong. This notebook catalogs some of the more common errors, updates my 2000 program, and introduces this tool for verifying counterexamples:\n",
" \n",
"\n",
"| [**Online Beal Counterexample Checker**](http://norvig.com/bealcheck.html) |\n",
"|:---:|\n",
"\n",
"\n",
"# How to Not Win A Million Dollars\n",
"\n",
"\n",
"* A proof must show that there are no examples that satisfy the conditions. A common error is to show how a certain pattern generates an infinite collection of numbers that satisfy $A^x + B^y = C^z$ and then show that in all of these, $A, B, C$ have a common factor. But that's not good enough, unless you can also prove that no other pattern exists.\n",
"* A proof must show that there are **no** examples that satisfy the conditions. A common error is to show how a certain pattern generates an infinite number of $(A, x, B, y, C, z)$ examples, and that the conjecture holds for this entire infinite collection. But that's not good enough, unless you can also prove that the conjecture holds for every other possible pattern.\n",
"\n",
"<p>\n",
"\n",
"* It is valid to use proof by contradiction: assume the conjecture is false, and show that this leads to a contradiction. It is not valid to use proof by circular reasoning: assume the conjecture is true, put in some irrelevant steps, and show that it follows that the conjecture is true.\n",
"\n",
"\n",
"<p>\n",
"* A valid counterexample needs to satisfy all four conditions&mdash;don't leave one out.\n",
"\n",
"* A valid counterexample needs to satisfy all four conditions&mdash;don't leave one out:\n",
"\n",
"> $A, B, C, x, y, z$ are positive integers <br> \n",
"$x, y, z > 2$ <br>\n",
"$A^x + B^y = C^z$ <br>\n",
"$A, B, C$ have no common prime factor.\n",
"* One correspondent claimed that $27^4 + 162 ^ 3 = 9 ^ 7$ was a solution, because the first three conditions hold, and the common factor is 9, which isn't a prime. But of course, if $A, B, C$ have 9 as a common factor, then they also have 3, and 3 is prime. \"No common prime factor\" means the same thing as \"no common factor greater than 1.\"\n",
"\n",
"(If you think you might have a valid counterexample, before you share it with Andrew Beal or anyone else, you can check it with my [Online Beal Counterexample Checker](http://norvig.com/bealcheck.html).)\n",
"\n",
"<p>\n",
"\n",
"* One correspondent claimed that $27^4 + 162 ^ 3 = 9 ^ 7$ was a solution, because the first three conditions hold, and the common factor is 9, which isn't a prime. But of course, if $A, B, C$ have 9 as a common factor, then they also have 3, and 3 is prime. The phrase \"no common prime factor\" means the same thing as \"no common factor greater than 1.\"\n",
"\n",
"<p>\n",
"\n",
"* Another claimed that $2^3+2^3=2^4$ was a counterexample, because all the bases are 2, which is prime, and prime numbers have no prime factors. But that's not true; a prime number has itself as a factor.\n",
"\n",
"<p>\n",
"\n",
"* A creative person offered $1359072^4 - 940896^4 = 137998080^3$, which fails both because $3^3 2^5 11^2$ is a common factor, and because it has a subtraction rather than an addition (although, as Julius Jacobsen pointed out, that can be rectified by adding $940896^4$ to both sides).\n",
"* A creative person offered $ 1359072^4 - 940896^4 = 137998080^3$, which fails both because $ 3^3 2^5 11^2 $ is a common factor, and because it has a subtraction rather than an addition (although, as Julius Jacobsen pointed out, it could be rewritten as $ 137998080^3 + 940896^4 = 1359072^4 $).\n",
"\n",
"<p>\n",
"\n",
"* Mustafa Pehlivan came up with an example involving 76-million-digit numbers, which took some work to prove wrong (by using modulo arithmetic). \n",
"* Mustafa Pehlivan came up with an example involving 76-million-digit numbers, which took some work to prove wrong (using modulo arithmetic). \n",
"\n",
"<p>\n",
"\n",
"* Another Beal fan started by saying \"Let $C = 43$ and $z = 3$. Since $43 = 21 + 22$, we have $43^3 = (21^3 + 22^3).$\" But of course $(a + b)^3 \\ne (a^3 + b^3)$. This fallacy is called [the freshman's dream](https://en.wikipedia.org/wiki/Freshman%27s_dream) (although I remember having different dreams as a freshman).\n",
"* Another Beal fan started by saying \"Let $C = 43$ and $z = 3$. Since $43 = 21 + 22$, we have $43^3 = (21^3 + 22^3)$.\" But of course $(a + b)^3 \\ne (a^3 + b^3)$. This fallacy is called [the freshman's dream](https://en.wikipedia.org/wiki/Freshman%27s_dream) (although I remember having different dreams as a freshman).\n",
"\n",
"<p>\n",
"\n",
"* Multiple people proposed answers similar to this one:"
"* Multiple people proposed counterexamples similar to this one:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
"collapsed": true
},
"outputs": [],
"source": [
@@ -108,6 +82,7 @@
"execution_count": 2,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -126,11 +101,10 @@
}
],
"source": [
"A, B, C = 60000000000000000000, 70000000000000000000, 82376613842809255677\n",
"\n",
"A, B, C = 60000000000000000001, 70000000000000000003, 82376613842809255677\n",
"x = y = z = 3.\n",
"\n",
"A ** x + B ** y == C ** z and gcd(gcd(A, B), C) == 1"
"A ** x + B ** y == C ** z and gcd(A, B) == gcd(B, C) == 1"
]
},
{
@@ -143,9 +117,9 @@
}
},
"source": [
"**WOW! The result is `True`!** Is this a real counterexample to Beal? And also a disproof of Fermat?\n",
"**WOW! The result is `True`!** The two sides of the equation are equal, and the greatest common divisor is 1. Is this a real counterexample to Beal? And also a disproof of Fermat's Last Theorem?\n",
"\n",
"Alas, it is not. Notice the decimal point in \"`3.`\", indicating a floating point number, with inexact, limited precision. Change the inexact \"`3.`\" to an exact \"`3`\" and the result changes to \"`False`\". Below we see that the two sides of the equation are the same for the first 18 digits, but differ starting with the 19th: "
"Alas, it is not. The decimal point in \"`x = y = z = 3`**`.`**\" indicates a floating point number, with inexact, limited precision. Change the inexact \"`3.`\" to an exact \"`3`\" and the two sides of the equation are no longer equal. Below we see they are the same for the first 19 digits, but differ starting with the 20th: "
]
},
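The floating-point pitfall described in this cell can be reproduced with a short sketch. It reuses the values of `A`, `B`, `C` from the cell above; the helper name `is_beal_counterexample` is hypothetical (it is not part of the notebook), and it simply checks the four conditions listed earlier using exact integer arithmetic.

```python
from math import gcd

# Values copied from the notebook cell above.
A, B, C = 60000000000000000001, 70000000000000000003, 82376613842809255677

def is_beal_counterexample(A, x, B, y, C, z):
    """Check all four conditions exactly (hypothetical helper, not from the notebook)."""
    values = (A, x, B, y, C, z)
    return (all(isinstance(v, int) and v > 0 for v in values)  # positive integers
            and min(x, y, z) > 2                               # exponents greater than 2
            and A ** x + B ** y == C ** z                      # exact integer equality
            and gcd(gcd(A, B), C) == 1)                        # no common prime factor

# With float exponents both sides are rounded to roughly 16 significant digits.
float_equal = (A ** 3. + B ** 3. == C ** 3.)
# With int exponents Python uses arbitrary-precision integers, so the test is exact.
exact_equal = (A ** 3 + B ** 3 == C ** 3)

print(float_equal, exact_equal)
print(is_beal_counterexample(A, 3, B, 3, C, 3))
print(is_beal_counterexample(27, 4, 162, 3, 9, 7))
```

Rejecting non-`int` arguments outright is one possible design choice: it turns an accidental `3.` into an immediate failure rather than a misleading `True`.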
{
@@ -153,6 +127,7 @@
"execution_count": 3,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -162,7 +137,7 @@
{
"data": {
"text/plain": [
"(559000000000000000000000000000000000000000000000000000000000,\n",
"(559000000000000000054900000000000000002070000000000000000028,\n",
" 559000000000000000063037470301555182935702892172500189973733)"
]
},
@@ -186,101 +161,128 @@
}
},
"source": [
"They say \"close\" only counts in horseshoes and hand grenades, and if you threw two horseshoes at a stake on the planet [Kapteyn-b](https://en.wikipedia.org/wiki/Kapteyn_b) (a possibly habitable and thus possibly horseshoe-playing exoplanet 12.8 light years from Earth) and the two paths differed in the 19th digit, the horseshoes would end up [less than an inch](https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=12.8%20light%20years%20*%201e-19%20in%20inches) apart. That's really, really close, but close doesn't count in number theory.\n",
"They say \"close\" only counts in horseshoes and hand grenades, and if you stood in your yard and threw a horseshoe at a stake on **[Kapteyn-b](https://en.wikipedia.org/wiki/Kapteyn_b)** (an exoplanet 12.8 light years from Earth that is deemed habitable and thus possibly horseshoe-playing) and the flight path differed from the perfect path in the 20th digit, then it would end up \n",
"[about a millimeter](https://www.google.com/search?q=12.8+light+years+%2F+10%5E20+in+inches&oq=12.8+light+years+%2F+10%5E19+in+inches) \n",
"from the target. That's really, really close, but close doesn't count in number theory.\n",
"\n",
"![Kapteyn-b and Homer Simpson](https://norvig.com/homer.png)\n",
"Left: [Kapteyn-b](https://www.space.com/26115-oldest-habitable-alien-planet-kapteyn-b.html). &nbsp; &nbsp; Right: [Homer Simpson](https://www.youtube.com/watch?time_continue=1&v=ReOQ300AcSU).\n",
"\n",
"\n",
"# *The Simpsons* and Fermat\n",
"\n",
"In two different [episodes of *The Simpsons*](http://www.npr.org/sections/krulwich/2014/05/08/310818693/did-homer-simpson-actually-solve-fermat-s-last-theorem-take-a-look), close counterexamples to Fermat's Last Theorem are shown: \n",
"$1782^{12} + 1841^{12} = 1922^{12}$ and $3987^{12} + 4365^{12} = 4472^{12}$. These were designed by *Simpsons* writer David X. Cohen to be correct up to the precision found in most handheld calculators. Cohen found the equations with a program that must have been something like this:"
"$3987^{12} + 4365^{12} = 4472^{12}$ and $1782^{12} + 1841^{12} = 1922^{12}$. These were designed by *Simpsons* writer David X. Cohen to be correct up to the precision of a typical handheld calculator; here we see the two sides of the first equation agree on the first ten digits, `6397665634`, and then differ:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
"collapsed": false
},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"(63976656349698612616236230953154487896987106,\n",
" 63976656348486725806862358322168575784124416)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from itertools import combinations\n",
"\n",
"def simpsons(bases, powers):\n",
" \"\"\"Find the integers (A, B, C, n) that come closest to solving \n",
" Fermat's equation, A ** n + B ** n == C ** n. \n",
" Let A, B range over all pairs of bases and n over all powers.\"\"\"\n",
" equations = ((A, B, iroot(A ** n + B ** n, n), n)\n",
" for A, B in combinations(bases, 2)\n",
" for n in powers)\n",
" return min(equations, key=relative_error)\n",
"\n",
"def iroot(i, n): \n",
" \"The integer closest to the nth root of i.\"\n",
" return int(round(i ** (1./n)))\n",
"\n",
"def relative_error(equation):\n",
" \"Error between LHS and RHS of equation, relative to RHS.\" \n",
" (A, B, C, n) = equation\n",
" LHS = A ** n + B ** n\n",
" RHS = C ** n\n",
" return abs(LHS - RHS) / RHS"
"3987 ** 12 + 4365 ** 12, 4472 ** 12"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Cohen must have found the equations with a program something like this (here `bases` is a sequence of integers to consider for the values of `A` and `B`; the variables `An` and `Bn` hold the `A**n` and `B**n` values; `lhs` is their sum (the left-hand-side of the equation); and the function `Cn` computes the `C**n` that is closest to that sum):"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(1782, 1841, 1922, 12)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"outputs": [],
"source": [
"simpsons(range(1000, 2000), [11, 12, 13])"
"from itertools import combinations\n",
"\n",
"def simpsons(bases, n):\n",
" \"\"\"Print the (A**n + B**n = C**n) equation that minimizes the relative error,\n",
" for a given n and A, B values from the sequence of integers `bases`.\"\"\"\n",
" def Cn(lhs): return iroot(sum(lhs), n) ** n\n",
" def err(lhs): return abs(sum(lhs) - Cn(lhs)) / sum(lhs)\n",
" def show(Xn): return '{} ** {}'.format(iroot(Xn, n), n)\n",
" powers = [b ** n for b in bases]\n",
" (An, Bn) = lhs = min(combinations(powers, 2), key=err)\n",
" print('{} + {} == {} (with error {:.0g})'\n",
" .format(show(An), show(Bn), show(Cn(lhs)), err(lhs)))\n",
"\n",
"def iroot(x, n): \"integer nth root\"; return int(round(x ** (1 / n)))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"(3987, 4365, 4472, 12)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
"name": "stdout",
"output_type": "stream",
"text": [
"1782 ** 12 + 1841 ** 12 == 1922 ** 12 (with error 3e-10)\n",
"3987 ** 12 + 4365 ** 12 == 4472 ** 12 (with error 2e-11)\n"
]
}
],
"source": [
"simpsons(range(3000, 5000), [12])"
"simpsons(range(1000, 2000), 12)\n",
"simpsons(range(2000, 5000), 12)"
]
},
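The relative errors reported above can be cross-checked without floating point. The following is a minimal sketch, not the notebook's code: `iroot_exact` and `relative_error` are hypothetical names, the root is found by integer binary search instead of `x ** (1 / n)`, and the error is kept as an exact `Fraction` until it is printed.

```python
from fractions import Fraction

def iroot_exact(x, n):
    """The non-negative integer r whose r**n is closest to x (x >= 0), integers only."""
    lo, hi = 0, 1
    while hi ** n <= x:          # grow hi until hi**n > x
        hi *= 2
    while hi - lo > 1:           # binary search, invariant: lo**n <= x < hi**n
        mid = (lo + hi) // 2
        if mid ** n <= x:
            lo = mid
        else:
            hi = mid
    # lo is the floor of the nth root; choose the nearer of lo**n and (lo + 1)**n.
    return lo if x - lo ** n <= (lo + 1) ** n - x else lo + 1

def relative_error(A, B, n):
    """Return (C, err): the best C and the exact error of A**n + B**n versus C**n."""
    lhs = A ** n + B ** n
    C = iroot_exact(lhs, n)
    return C, Fraction(abs(lhs - C ** n), lhs)

for A, B in [(1782, 1841), (3987, 4365)]:
    C, err = relative_error(A, B, 12)
    print('{} ** 12 + {} ** 12 ~ {} ** 12 (relative error {:.0e})'.format(A, B, C, float(err)))
```

Binary search is slower than a float root but cannot be thrown off by rounding, which matters once the powers exceed the 53 bits of precision in a Python float.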
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These are the same two equations that David X. Cohen found. \n",
"\n",
"Can we find other near-misses? I'll try each single-digit exponent. I want A, B, C to be 4 digits each, so I'll limit A and B to 9500 (not 9999), to try to keep C from overflowing to 5 digits. (This takes around 10 minutes to run.)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5856 ** 3 + 9036 ** 3 == 9791 ** 3 (with error 1e-12)\n",
"2396 ** 4 + 4551 ** 4 == 4636 ** 4 (with error 4e-11)\n",
"3993 ** 5 + 7767 ** 5 == 7822 ** 5 (with error 2e-11)\n",
"6107 ** 6 + 8919 ** 6 == 9066 ** 6 (with error 8e-13)\n",
"5592 ** 7 + 9079 ** 7 == 9122 ** 7 (with error 2e-11)\n",
"4749 ** 8 + 8952 ** 8 == 8959 ** 8 (with error 3e-11)\n",
"5433 ** 9 + 6725 ** 9 == 6828 ** 9 (with error 4e-11)\n"
]
}
],
"source": [
"for n in range(3, 10):\n",
" simpsons(range(1000, 9500), n)"
]
},
{
@@ -293,21 +295,24 @@
}
},
"source": [
"# Back to Beal\n",
"The equation for *n*=6 has the smallest error yet (8e-13, in the 13th decimal place).\n",
"\n",
"# Back to `beal`\n",
"\n",
"In October 2015 I looked back at my original [program from 2000](http://norvig.com/beal2000.html).\n",
"I ported it from Python 1.5 to 3.5 (by putting parens around the argument to `print` and adding `long = int`). The program runs 250 times faster today than it did in 2000, a tribute to both computer hardware engineers and the developers of the Python interpreter.\n",
"I ported it from Python 1.5 to 3.5 (`print` is now a function, `long` is `int`). It runs 250 times faster today, a tribute to both computer hardware engineers and the developers of the Python interpreter.\n",
"\n",
"I found that I was a bit confused about the definition of [the problem in 2000](https://web.archive.org/web/19991127081319/http://bealconjecture.com/). I thought then that, *by definition*, $A$ and $B$ could not have a common factor, but actually, the definition of the conjecture only rules out examples where all three of $A, B, C$ share a common factor. Mark Tiefenbruck (and later Edward P. Berlin and Shen Lixing) wrote to point out that my thought was actually correct, not by definition, but by derivation: if $A$ and $B$ have a commmon prime factor $p$, then the sum of $A^x + B^y$ must also have that factor $p$, and since $A^x + B^y = C^z$, then $C^z$ and hence $C$ must have the factor $p$. So I was wrong twice, and in this case two wrongs did make a right.\n",
"I found that I had [misstated the problem](https://web.archive.org/web/19991127081319/http://bealconjecture.com/) in 2000. I thought that, *by definition*, $A$ and $B$ could not have a common factor, but actually, \n",
"the conjecture only rules out examples where all three of $A, B, C$ share a common factor. But, as [Mark Tiefenbruck](mailto:mark @tiefenbruck.org) (as well as Edward P. Berlin and Shen Lixing) pointed out, my statement is correct, not by definition, but *by derivation:* if $A$ and $B$ have a common prime factor $p$, then the sum of $A^x + B^y$ must also have that factor $p$, and hence $C^z$, and $C$, must have the factor $p$.\n",
"\n",
"Mark Tiefenbruck also suggested an optimization: only consider exponents that are odd primes, or 4. The idea is that a number like 512 can be expressed as either $2^9$ or $8^3$, and my program doesn't need to consider both. In general, any time we have a composite exponent, such as $b^{qp}$, where $p$ is prime, we should ignore $b^{(qp)}$, and instead consider only $(b^q)^p$. There's one complication to this scheme: 2 is a prime, but 2 is not a valid exponent for a Beal counterexample. So we will allow 4 as an exponent, as well as all odd primes up to `max_x`.\n",
"Mark Tiefenbruck also suggested another optimization: only consider exponents that are odd primes, or 4. The idea is that a number like 512 can be expressed as either $2^9$ or $8^3$, and my program doesn't need to consider both. In general, any time we have a composite exponent, such as $b^{qp}$, where $p$ is prime, we should ignore $A=b, x=qp$, and instead consider only $A=b^q, x=p$. There's one complication to this scheme: 2 is a prime, but 2 is not a valid exponent for a Beal counterexample. So we will allow 4 as an exponent, as well as all odd primes up to `max_x`.\n",
"\n",
"Here is the complete, updated, refactored, optimized program:"
"Here is the complete, updated program:"
]
},
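As an aside on the exponent optimization described above, here is a minimal sketch of the idea. It is not the notebook's implementation: the names `exponents_upto` and `canonical` are hypothetical. The first function lists the exponents worth searching (4 and the odd primes up to `max_x`); the second rewrites a power $b^x$ with $x > 2$ into an equal power whose exponent is on that list.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents considered here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def exponents_upto(max_x):
    """Exponents to search: 4, plus every odd prime, up to max_x."""
    return [x for x in range(3, max_x + 1) if x == 4 or is_prime(x)]

def canonical(b, x):
    """Rewrite b**x (x > 2) as (base, exponent) with the same value,
    where exponent is an odd prime or 4."""
    for p in range(3, x + 1, 2):      # the smallest odd divisor > 1 is always prime
        if x % p == 0:
            return b ** (x // p), p
    return b ** (x // 4), 4           # no odd divisor: x is a power of 2, at least 4

print(exponents_upto(20))
print(canonical(2, 9))    # 2**9 rewritten with a prime exponent
print(canonical(3, 8))    # 3**8 rewritten with exponent 4
```

Every power with an exponent greater than 2 has exactly such a rewritten form, so restricting the search to these exponents skips duplicate representations like $2^9$ and $8^3$ without missing any value.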
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 8,
"metadata": {
"button": false,
"collapsed": true,
@@ -373,9 +378,10 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 9,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -386,8 +392,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 256 ms, sys: 1.44 ms, total: 257 ms\n",
"Wall time: 256 ms\n"
"CPU times: user 353 ms, sys: 4.84 ms, total: 358 ms\n",
"Wall time: 376 ms\n"
]
}
],
@@ -405,14 +411,15 @@
}
},
"source": [
"The execution time goes up roughly with the square of `max_A`, so the following should take about 100 times longer:"
"The execution time goes up roughly with the square of `max_A`, so the following should take about 25 times longer:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 10,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -423,13 +430,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 29.1 s, sys: 25.2 ms, total: 29.2 s\n",
"Wall time: 29.2 s\n"
"CPU times: user 8.97 s, sys: 56.6 ms, total: 9.03 s\n",
"Wall time: 9.12 s\n"
]
}
],
"source": [
"%time beal(1000, 100)"
"%time beal(500, 100)"
]
},
{
@@ -454,9 +461,10 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 11,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -474,7 +482,7 @@
" 6: [216, 1296, 7776, 279936]}"
]
},
"execution_count": 10,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -499,9 +507,10 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 12,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -534,7 +543,7 @@
" 279936: 6}"
]
},
"execution_count": 11,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -546,8 +555,10 @@
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
@@ -555,7 +566,7 @@
"3"
]
},
"execution_count": 12,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -584,9 +595,10 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 14,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -632,9 +644,10 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 15,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -647,7 +660,7 @@
"{True}"
]
},
"execution_count": 14,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
@@ -676,12 +689,12 @@
}
},
"source": [
"I get nervous having an incorrect version of `gcd` around; let's change it back, quick!"
"I get nervous having an incorrect version of `gcd` around: change it back, quick!"
]
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 16,
"metadata": {
"button": false,
"collapsed": true,
@@ -712,9 +725,10 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 17,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -727,7 +741,7 @@
"'tests pass'"
]
},
"execution_count": 16,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
@@ -832,7 +846,7 @@
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 18,
"metadata": {
"button": false,
"collapsed": true,
@@ -897,8 +911,10 @@
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
@@ -911,7 +927,7 @@
" 6: [(216, 3), (296, 4), (776, 5), (936, 7)]}"
]
},
"execution_count": 18,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@@ -937,9 +953,10 @@
},
{
"cell_type": "code",
"execution_count": 19,
"execution_count": 20,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -971,7 +988,7 @@
" 936: [(6, 7)]})"
]
},
"execution_count": 19,
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
@@ -995,9 +1012,10 @@
},
{
"cell_type": "code",
"execution_count": 20,
"execution_count": 21,
"metadata": {
"button": false,
"collapsed": false,
"new_sheet": false,
"run_control": {
"read_only": false
@@ -1008,8 +1026,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 35.5 s, sys: 44.1 ms, total: 35.5 s\n",
"Wall time: 35.5 s\n"
"CPU times: user 56 s, sys: 436 ms, total: 56.4 s\n",
"Wall time: 59.2 s\n"
]
}
],
@@ -1035,15 +1053,6 @@
"\n",
"This was fun, but I can't recommend anyone spend a serious amount of computer time looking for counterexamples to the Beal Conjecture&mdash;the money you would have to spend in computer time would be more than the expected value of your prize winnings. I suggest you work on a proof rather than a counterexample, or work on some other interesting problem instead!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -1062,7 +1071,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
"version": "3.6.0"
}
},
"nbformat": 4,
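
The scaling claim in the Beal notebook diff above (run time grows roughly with the square of `max_A`) can be checked with a little arithmetic. This is a minimal sketch, assuming the 376 ms timing was measured for `max_A = 100`; the diff does not show that call, but the "about 25 times longer" estimate for `beal(500, 100)` implies it. `predicted_slowdown` is a hypothetical helper, not part of the notebook.

```python
def predicted_slowdown(old_max_a, new_max_a, exponent=2):
    """Predicted run-time ratio if time grows as max_A ** exponent."""
    return (new_max_a / old_max_a) ** exponent

# Assumption: the 376 ms wall time was measured with max_A = 100.
predicted = predicted_slowdown(100, 500)   # (500 / 100) ** 2
observed = 9.12 / 0.376                    # wall times reported in the diff, in seconds

print(predicted)            # 25.0
print(round(observed, 1))   # 24.3
```

The observed ratio is close to the quadratic prediction, which is consistent with the notebook's "roughly" wording.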

File diff suppressed because one or more lines are too long


@@ -17,7 +17,7 @@
"\n",
"Peter Norvig, April 2015\n",
"\n",
"This logic puzzle has been [making the rounds](https://www.google.com/webhp?#q=cheryl%27s+birthday):\n",
"This logic puzzle has been [making the rounds](https://www.google.com/webhp?#q=cheryl%27s+birthday) (and [not always favorably](https://www.newyorker.com/cartoons/daily-cartoon/daily-cartoon-thursday-april-16th-cheryl-singapore-math)):\n",
"\n",
"\n",
"1. Albert and Bernard just became friends with Cheryl, and they want to know when her birthday is. Cheryl gave them a list of 10 possible dates:\n",

File diff suppressed because it is too large

File diff suppressed because it is too large


@@ -48,7 +48,7 @@
"N = 5000 # Default size of the population\n",
"MU = 100. # Default mean of the population\n",
"\n",
"population = [random.gauss(mu=MU, sigma=MU/5) for actor in range(N)]"
"population = [random.gauss(mu=MU, sigma=MU/5) for _ in range(N)]"
]
},
{
@@ -69,7 +69,7 @@
"<tr><td>Switzerland <td> 0.337\n",
"<tr><td>United States<td> 0.408\n",
"<tr><td>Chile <td> 0.521\n",
"<tr><td>South Africe <td> 0.631\n",
"<tr><td>South Africa <td> 0.631\n",
"</table>\n",
"\n",
"\n",
@@ -182,7 +182,7 @@
"outputs": [],
"source": [
"def random_split(A, B):\n",
" \"Take all the money uin the pot and divide it randomly between the two actors.\"\n",
" \"Take all the money in the pot and divide it randomly between the two actors.\"\n",
" pot = A + B\n",
" share = random.uniform(0, pot)\n",
" return share, pot - share"
@@ -326,7 +326,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# SImulation Visualization\n",
"# Simulation Visualization\n",
"\n",
"If we want to do larger simulations we'll need a better way to visualize the results.\n",
"The function `show` does that:"


@@ -0,0 +1,163 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Euler's Sum of Powers Conjecture\n",
"\n",
"In 1769, Leonhard Euler [conjectured that](https://en.wikipedia.org/wiki/Euler%27s_sum_of_powers_conjecture) for all integers *n* and *k* greater than 1, if the sum of *n* *k*th powers of positive integers is itself a *k*th power, then *n* is greater than or equal to *k*. For example, this would mean that no sum of a pair of cubes (a<sup>3</sup> + b<sup>3</sup>) can be equal to another cube (c<sup>3</sup>), but a sum of three cubes can, as in 3<sup>3</sup> + 4<sup>3</sup> + 5<sup>3</sup> = 6<sup>3</sup>. \n",
"\n",
"It took nearly 200 years to disprove the conjecture: in 1966, L. J. Lander and T. R. Parkin published a refreshingly short [article](https://projecteuclid.org/download/pdf_1/euclid.bams/1183528522) giving a counterexample of four fifth powers that summed to another fifth power. They found it via a program that did an exhaustive search. Can we duplicate their work and find integers greater than 1 such that \n",
"*a*<sup>5</sup> + *b*<sup>5</sup> + *c*<sup>5</sup> + *d*<sup>5</sup> = *e*<sup>5</sup> ?\n",
"\n",
"## Algorithm\n",
"\n",
"An exhaustive *O*(*m*<sup>4</sup>) algorithm would be to look at all values of *a, b, c, d* < *m* and check if *a*<sup>5</sup> + *b*<sup>5</sup> + *c*<sup>5</sup> + *d*<sup>5</sup> is a fifth power. But we can do better: a sum of four numbers is a sum of two pairs of numbers, so we\n",
"are looking for\n",
"\n",
"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*pair*<sub>1</sub> + *pair*<sub>2</sub> = *e*<sup>5</sup> &nbsp;&nbsp; **where** &nbsp; *pair*<sub>1</sub> = *a*<sup>5</sup> + *b*<sup>5</sup> **and** *pair*<sub>2</sub> = *c*<sup>5</sup> + *d*<sup>5</sup>.\n",
"\n",
"We will define *pairs* to be a dict of `{`*a*<sup>5</sup> + *b*<sup>5</sup>`: (`*a*<sup>5</sup>`, ` *b*<sup>5</sup>`)}` entries for all *a* ≤ *b* < *m*; for example, for *a*=2 and *b*=10, the entry is `{100032: (32, 100000)}`.\n",
"Then we can ask for each *pair*<sub>1</sub>, and for each *e*, whether there is a *pair*<sub>2</sub> in the `dict` that makes the equation work. There are *O*(*m*<sup>2</sup>) pairs and *O*(*m*) values of *e*, and `dict` lookup is *O*(1), so the whole algorithm is *O*(*m*<sup>3</sup>):"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import itertools\n",
"\n",
"def euler(m):\n",
" \"\"\"Yield tuples (a, b, c, d, e) such that a^5 + b^5 + c^5 + d^5 = e^5,\n",
" where all are integers, and 1 < a ≤ b ≤ c ≤ d < e < m.\"\"\"\n",
" powers = [e**5 for e in range(2, m)] \n",
" pairs = {sum(pair): pair \n",
" for pair in itertools.combinations_with_replacement(powers, 2)}\n",
" for pair1 in pairs:\n",
" for e5 in powers:\n",
" pair2 = e5 - pair1\n",
" if pair2 in pairs:\n",
" yield fifthroots(pairs[pair1] + pairs[pair2] + (e5,))\n",
" \n",
"def fifthroots(nums): \n",
" \"Sorted integer fifth roots of a collection of numbers.\" \n",
" return tuple(sorted(int(round(x ** (1/5))) for x in nums))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's look for a solution (arbitrarily choosing *m*=500):"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.07 s, sys: 21.4 ms, total: 1.09 s\n",
"Wall time: 1.11 s\n"
]
},
{
"data": {
"text/plain": [
"(27, 84, 110, 133, 144)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time next(euler(500))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That was easy, and it turns out this is the same answer that Lander and Parkin got: 27<sup>5</sup> + 84<sup>5</sup> + 110<sup>5</sup> + 133<sup>5</sup> = 144<sup>5</sup>.\n",
"\n",
"We can keep going, collecting all the solutions up to *m*=1000:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1min 53s, sys: 706 ms, total: 1min 54s\n",
"Wall time: 1min 57s\n"
]
},
{
"data": {
"text/plain": [
"{(27, 84, 110, 133, 144),\n",
" (54, 168, 220, 266, 288),\n",
" (81, 252, 330, 399, 432),\n",
" (108, 336, 440, 532, 576),\n",
" (135, 420, 550, 665, 720),\n",
" (162, 504, 660, 798, 864)}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time set(euler(1000))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All the answers are multiples of the first one (this is easiest to see in the middle column: 110, 220, 330, ...).\n",
"Since 1966 other mathematicians have found [other solutions](https://en.wikipedia.org/wiki/Euler%27s_sum_of_powers_conjecture), but all we need is one to disprove Euler's conjecture."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
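
The pair-sum search described in the Euler's Sum of Powers notebook above can also be sketched in a variant that never takes a floating-point fifth root. This is an illustration under stated assumptions, not the notebook's own code: the name `euler_counterexamples` is hypothetical, and the dict maps each pair sum to the roots `(a, b)` rather than to the powers, so the roots can be read back directly.

```python
from itertools import combinations_with_replacement

def euler_counterexamples(m):
    """Yield tuples (a, b, c, d, e) with a^5 + b^5 + c^5 + d^5 == e^5,
    where a <= b <= c <= d and every value is an integer in range(2, m)."""
    fifth = {n: n ** 5 for n in range(2, m)}
    # Map each pair sum a^5 + b^5 to the roots (a, b) that produced it.
    pair_sums = {fifth[a] + fifth[b]: (a, b)
                 for a, b in combinations_with_replacement(range(2, m), 2)}
    for sum1, (a, b) in pair_sums.items():
        for e in range(2, m):
            sum2 = fifth[e] - sum1       # what the second pair must add up to
            if sum2 in pair_sums:        # O(1) dict lookup
                c, d = pair_sums[sum2]
                yield tuple(sorted((a, b, c, d))) + (e,)

print(next(euler_counterexamples(150)))   # (27, 84, 110, 133, 144)
```

As in the notebook, there are *O*(*m*<sup>2</sup>) pair sums and *O*(*m*) candidates for *e*, so the search stays *O*(*m*<sup>3</sup>); keeping the roots in the dict only changes how the answer is reported.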

File diff suppressed because one or more lines are too long


@@ -13,7 +13,7 @@
"source": [
"This problem by Solomon Golomb was presented by Gary Antonik in his 14/4/14 New York Times [Numberplay column](http://wordplay.blogs.nytimes.com/2014/04/14/rectangle):\n",
"\n",
"><i>Say you're given the following challenge: create a set of five rectangles that have sides of length 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 units. You can combine sides in a variety of ways: for example, you could create a set of rectangles with dimensions 1 x 3, 2 x 4, 5 x 7, 6 x 8 and 9 x 10.\n",
">Say you're given the following challenge: create a set of five rectangles that have sides of length 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 units. You can combine sides in a variety of ways: for example, you could create a set of rectangles with dimensions 1 x 3, 2 x 4, 5 x 7, 6 x 8 and 9 x 10.\n",
">\n",
">1. How many different sets of five rectangles are possible?\n",
">\n",
@@ -21,7 +21,7 @@
">\n",
">3. What other values for the total areas of the five rectangles are possible?\n",
">\n",
">4. Which sets of rectangles may be assembled to form a square?</i>\n",
">4. Which sets of rectangles may be assembled to form a square?\n",
"\n",
"To me, these are interesting questions because, first, I have a (slight) personal connection to Solomon Golomb (my former colleague at USC) and to Nelson Blachman (the father of my colleague Nancy Blachman), who presented the problem to Antonik, and second, I find it interesting that the problems span the range from mathematical to computational. Let's answer them."
]
@@ -645,7 +645,7 @@
"\n",
"In Way 1, we could pre-sort the rectangles (say, biggest first). Then we try to put the biggest rectangle in all possible positions on the grid, and for each position that fits, try putting the second biggest rectangle in all remaining positions, and so on. As a rough estimate, assume there are on average about 10 ways to place a rectangle. Then this way will look at about 10<sup>5</sup> = 100,000 combinations.\n",
"\n",
"In Way 2, we consider the positions in some fixed order; say top-to-bottom, left-to right. Take the first empty position (say, the upper left corner). Try putting each of the rectangles there, and for each one that fits, try all possible rectangles in the next empty position, and so on. There are only 5! permutations of rectangles, and each rectangle can go either horizontally or vertically, so we would have to consider 5! &times; 2<sup>5</sup> = 3840 combinations. Since 3840 &lt; 100,000, I'll go with Way 2. Here is a more precise description:\n",
"In Way 2, we consider the positions in some fixed order; say top-to-bottom, left-to-right. Take the first empty position (say, the upper left corner). Try putting each of the rectangles there, and for each one that fits, try all possible rectangles in the next empty position, and so on. There are only 5! permutations of rectangles, and each rectangle can go either horizontally or vertically, so we would have to consider 5! &times; 2<sup>5</sup> = 3840 combinations. Since 3840 &lt; 100,000, I'll go with Way 2. Here is a more precise description:\n",
"\n",
"> Way 2: To `pack` a set of rectangles onto a grid, find the first empty cell on the grid. Try in turn all possible placements of any rectangle (in either orientation) at that position. For each one that fits, try to `pack` the remaining rectangles, and return the resulting grid if one of these packings succeeds. "
]
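
The two search-space estimates in the hunk above can be reproduced directly. This is a small sketch; the figure of about 10 placements per rectangle is the text's own rough assumption, and the variable names are illustrative only.

```python
from math import factorial

n_rectangles = 5

# Way 1: roughly 10 candidate positions for each of the 5 rectangles.
way1 = 10 ** n_rectangles

# Way 2: 5! orderings of the rectangles, times 2 orientations for each one.
way2 = factorial(n_rectangles) * 2 ** n_rectangles

print(way1, way2)   # 100000 3840
```

Both numbers are upper bounds on the combinations examined; the comparison 3840 < 100,000 is what motivates choosing Way 2.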
@@ -669,7 +669,7 @@
" return solution\n",
"\n",
"def rectangle_placements(rectangles, grid, pos):\n",
" \"Yield all (rectangles2, grid2) pairs that are the result of placing any rectangle at pos on grid.\"\n",
" \"Yield all (rect, grid) pairs that result from placing a rectangle at pos on grid.\"\n",
" for (w, h) in rectangles:\n",
" for rect in [(w, h), (h, w)]:\n",
" grid2 = place_rectangle_at(rect, grid, pos)\n",
@@ -728,7 +728,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"It would be nicer to have a graphical display of colored rectangles. I will define the function `show` which displays a grid as colored rectangles, by calling upon `html_table`, which formats any grid into HTML text."
"It would be nicer to have a graphical display of colored rectangles. I will define the function `show` which displays a grid as colored rectangles, by calling upon `html_table`, which formats any grid into HTML text. (*Note:* GitHub is conservative in the JavaScript and even CSS that it allows, so if you don't see colors in the grids below, look at this file on [nbviewer](https://nbviewer.jupyter.org/github/norvig/pytudes/blob/master/ipynb/Golomb-Puzzle.ipynb); same file, but the rendering will definitely show the colors.)"
]
},
{
@@ -745,7 +745,8 @@
" display(html_table(grid, colored_cell))\n",
" \n",
"def html_table(grid, cell_function='<td>{}'.format):\n",
" \"Return an HTML <table>, where each cell's contents comes from calling cell_function(grid[y][x])\"\n",
" \"\"\"Return an HTML <table>, where each cell's contents comes from calling \n",
" cell_function(grid[y][x])\"\"\"\n",
" return HTML('<table>{}</table>'\n",
" .format(cat('\\n<tr>' + cat(map(cell_function, row)) \n",
" for row in grid)))\n",
@@ -754,7 +755,8 @@
" x, y = sorted(rect)\n",
" return '<td style=\"background-color:{}\">{}{}'.format(colors[x], x%10, y%10)\n",
"\n",
"colors = 'lightgrey yellow plum chartreuse cyan coral red olive slateblue lightgrey wheat'.split()\n",
"colors = ('lightgrey yellow plum chartreuse cyan coral red olive slateblue lightgrey wheat'\n",
" .split())\n",
"\n",
"cat = ''.join"
]
@@ -931,17 +933,15 @@
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"def pack(rectangles, grid, animation=False): \n",
" \"\"\"Find a way to pack all rectangles onto grid and return the packed grid,\n",
" or return None if not possible. \n",
" Pause `animation` seconds between displaying each rectangle placement if `animation` is not false.\"\"\"\n",
" or return None if not possible. Pause `animation` seconds between \n",
" displaying each rectangle placement if `animation` is not false.\"\"\"\n",
" if animation: \n",
" clear_output()\n",
" show(grid)\n",
@@ -1073,7 +1073,7 @@
{
"data": {
"text/plain": [
"'857 + 349 == 1206'"
"'325 + 764 == 1089'"
]
},
"execution_count": 29,
@@ -1094,8 +1094,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 29 s, sys: 42.2 ms, total: 29.1 s\n",
"Wall time: 29.1 s\n"
"CPU times: user 30 s, sys: 56.5 ms, total: 30 s\n",
"Wall time: 30.1 s\n"
]
},
{
@@ -1136,9 +1136,7 @@
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"def compile_formula(formula, verbose=False):\n",
@@ -1173,13 +1171,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
"lambda Y,M,E,O,U: Y and M and ((100*Y+10*O+U) == (10*M+E)**2)\n"
"lambda M,U,E,O,Y: M and Y and ((100*Y+10*O+U) == (10*M+E)**2)\n"
]
},
{
"data": {
"text/plain": [
"(<function __main__.<lambda>>, 'YMEOU')"
"(<function __main__.<lambda>>, 'MUEOY')"
]
},
"execution_count": 32,
@@ -1200,13 +1198,13 @@
"name": "stdout",
"output_type": "stream",
"text": [
"lambda L,A,Y,P,M,E,R,U,B,N: P and B and N and ((100*N+10*U+M) + (100*B+10*E+R) == (1000*P+100*L+10*A+Y))\n"
"lambda U,A,R,E,L,Y,N,B,M,P: B and N and P and ((100*N+10*U+M) + (100*B+10*E+R) == (1000*P+100*L+10*A+Y))\n"
]
},
{
"data": {
"text/plain": [
"(<function __main__.<lambda>>, 'LAYPMERUBN')"
"(<function __main__.<lambda>>, 'UARELYNBMP')"
]
},
"execution_count": 33,
@@ -1221,15 +1219,14 @@
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"collapsed": true
},
"metadata": {},
"outputs": [],
"source": [
"def faster_solve_all(formula):\n",
" \"\"\"Given a formula like 'ODD + ODD == EVEN', fill in digits to solve it.\n",
" Input formula is a string; output is a digit-filled-in string or None.\n",
" Capital letters are variables. This version precompiles the formula; only one eval per formula.\"\"\"\n",
" Capital letters are variables. This version precompiles the formula; \n",
" only one eval per formula.\"\"\"\n",
" fn, letters = compile_formula(formula)\n",
" for digits in itertools.permutations((1,2,3,4,5,6,7,8,9,0), len(letters)):\n",
" try:\n",
@@ -1239,7 +1236,7 @@
" pass\n",
" \n",
"def replace_all(text, olds, news):\n",
" \"Replace each occurrence of each old in text with the corresponding new.\"\n",
" \"Replace each occurrence of each old in text with the corresponding new.\"\n",
" # E.g. replace_all('A + B', ['A', 'B'], [1, 2]) == '1 + 2'\n",
" for (old, new) in zip(olds, news):\n",
" text = text.replace(str(old), str(new))\n",
@@ -1254,7 +1251,7 @@
{
"data": {
"text/plain": [
"'857 + 349 = 1206'"
"'325 + 764 = 1089'"
]
},
"execution_count": 35,
@@ -1275,8 +1272,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.75 s, sys: 3.29 ms, total: 1.76 s\n",
"Wall time: 1.76 s\n"
"CPU times: user 1.77 s, sys: 4.05 ms, total: 1.77 s\n",
"Wall time: 1.77 s\n"
]
},
{

455
ipynb/Maze.ipynb Normal file

@@ -0,0 +1,455 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div style=\"text-align: right\">Peter Norvig<br>13 March 2018</div> \n",
"\n",
"# Maze Generation\n",
"\n",
"Let's make some mazes! I'm thinking of mazes like this one, which is a rectangular grid of squares, with walls on some of the sides of squares, and openings on other sides:\n",
"\n",
"![Wikipedia](https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Maze_simple.svg/475px-Maze_simple.svg.png)\n",
"\n",
"The main constraint is that there should be a path from entrance to exit, and it should be ***fun*** to solve the maze with pencil, paper, and brain power&mdash;not too easy, but also not impossible. \n",
"\n",
"As I think about how to model a maze on the computer, it seems like a **graph** is the right model: the nodes of\n",
"the graph are the squares of the grid, and the edges of the graph are the openings between adjacent squares. So what properties of a graph make a good maze?\n",
"- There must be a path from entrance to exit.\n",
"- There must not be too many such paths; maybe it is best if there is only one. \n",
"- Probably the graph should be *singly connected*&mdash;there shouldn't be islands of squares that are unreachable from the start. And maybe we want exactly one path between any two squares.\n",
"- The path should have many twists; it would be too easy if it was mostly straight.\n",
"\n",
"I know that a **tree** has all these properties except the last one. So my goal has become: *Superimpose a tree over the grid, covering every square, and make sure the paths are twisty.* Here's how I'll do it:\n",
"\n",
"- Start with a grid with no edges (every square is surrounded by walls on all sides). \n",
"- Add edges (that is, knock down walls) for the entrance at upper left and exit at lower right.\n",
"- Place the root of the tree in some square.\n",
"- Then repeat until the tree covers the whole grid:\n",
" * Select some node already in the tree.\n",
" * Randomly select a neighbor that hasn't been added to the tree yet.\n",
" * Add an edge (knock down the wall) from the node to the neighbor.\n",
" \n",
"In the example below, the root, `A`, has been placed in the upper-left corner, and two branches,\n",
"`A-B-C-D` and `A-b-c-d`, have been randomly chosen (well, not actually random; they are starting to create the same maze as in the diagram above):\n",
"\n",
" o o--o--o--o--o--o--o--o--o--o\n",
" | A b c| | | | | | | |\n",
" o o--o o--o--o--o--o--o--o--o\n",
" | B| | d| | | | | | | |\n",
" o o--o--o--o--o--o--o--o--o--o\n",
" | C D| | | | | | | | |\n",
" o--o--o--o--o--o--o--o--o--o--o\n",
" | | | | | | | | | | |\n",
" o--o--o--o--o--o--o--o--o--o--o\n",
" | | | | | | | | | | |\n",
" o--o--o--o--o--o--o--o--o--o o\n",
" \n",
"Next I select node `d` and extend it to `e` (at which point there are no available neighbors, so `e` will not be selected in the future), and then I select `D` and extend from there all the way to `N`, at each step selecting the node I just added:\n",
"\n",
" o o--o--o--o--o--o--o--o--o--o\n",
" | A b c| | | | | | | |\n",
" o o--o o--o--o--o--o--o--o--o\n",
" | B| e d| | N| | | | | |\n",
" o o--o--o--o o--o--o--o--o--o\n",
" | C D| | | M| | | | | |\n",
" o--o o--o--o o--o--o--o--o--o\n",
" | F E| | K L| | | | | |\n",
" o o--o--o o--o--o--o--o--o--o\n",
" | G H I J| | | | | | |\n",
" o--o--o--o--o--o--o--o--o--o o\n",
" \n",
"Continue like this until every square in the grid has been added to the tree. \n",
"\n",
"\n",
"## Implementing Random Trees\n",
"\n",
"I'll make the following implementation choices:\n",
"\n",
"- A tree will be represented as a list of edges.\n",
"- An `Edge` is a tuple of two nodes. I'll keep them sorted so that `Edge(A, B)` is the same as `Edge(B, A)`.\n",
"- A node in a tree can be anything: a number, a letter, a square, ...\n",
"- The algorithm for `random_tree(nodes, neighbors, pop)` is as follows:\n",
" * We will keep track of three collections:\n",
" - `tree`: a list of edges that constitutes the tree.\n",
" - `nodes`: the set of nodes that have not yet been added to the tree, but will be.\n",
" - `frontier`: a queue of nodes in the tree that are eligible to have an edge added.\n",
" * On each iteration:\n",
" - Use `pop` to pick a `node` from the frontier, and find the neighbors that are not already in the tree.\n",
" - If there are any neighbors, randomly pick one (`nbr`), add `Edge(node, nbr)` to `tree`, remove the\n",
" neighbor from `nodes`, and keep both the node and the neighbor on the frontier. If there are no neighbors,\n",
" drop the node from the frontier.\n",
" * When no `nodes` remain, return `tree`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import random\n",
"from collections import deque, namedtuple\n",
"\n",
"def Edge(node1, node2): return tuple(sorted([node1, node2]))\n",
"\n",
"def random_tree(nodes: set, neighbors: callable, pop: callable) -> [Edge]:\n",
" \"Repeat: pop a node and add Edge(node, nbr) until all nodes have been added to tree.\"\n",
" tree = []\n",
" root = nodes.pop()\n",
" frontier = deque([root])\n",
" while nodes:\n",
" node = pop(frontier)\n",
" nbrs = neighbors(node) & nodes\n",
" if nbrs:\n",
" nbr = random.choice(list(nbrs))\n",
" tree.append(Edge(node, nbr))\n",
" nodes.remove(nbr)\n",
" frontier.extend([node, nbr])\n",
" return tree"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Implementing Random Mazes\n",
"\n",
"Now let's use `random_tree` to implement `random_maze`. Some more choices:\n",
"\n",
"* A `Maze` is a named tuple with three fields: the `width` and `height` of the grid, and a list of `edges` between squares. \n",
"* A square is denoted by an `(x, y)` tuple of integer coordinates.\n",
"* The function `neighbors4` gives the four surrounding squares."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"Maze = namedtuple('Maze', 'width, height, edges')\n",
"\n",
"def neighbors4(square):\n",
" \"The 4 neighbors of an (x, y) square.\"\n",
" (x, y) = square\n",
" return {(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}\n",
"\n",
"def squares(width, height): \n",
" \"All squares in a grid of these dimensions.\"\n",
" return {(x, y) for x in range(width) for y in range(height)}\n",
"\n",
"def random_maze(width, height, pop=deque.pop):\n",
" \"Use random_tree to generate a random maze.\"\n",
" nodes = squares(width, height)\n",
" tree = random_tree(nodes, neighbors4, pop)\n",
" return Maze(width, height, tree)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"Maze(width=10, height=5, edges=[((6, 3), (7, 3)), ((6, 3), (6, 4)), ((6, 4), (7, 4)), ((7, 4), (8, 4)), ((8, 3), (8, 4)), ((8, 2), (8, 3)), ((7, 2), (8, 2)), ((7, 1), (7, 2)), ((7, 0), (7, 1)), ((7, 0), (8, 0)), ((8, 0), (8, 1)), ((8, 1), (9, 1)), ((9, 0), (9, 1)), ((9, 1), (9, 2)), ((9, 2), (9, 3)), ((9, 3), (9, 4)), ((6, 0), (7, 0)), ((5, 0), (6, 0)), ((5, 0), (5, 1)), ((5, 1), (6, 1)), ((6, 1), (6, 2)), ((5, 2), (6, 2)), ((4, 2), (5, 2)), ((3, 2), (4, 2)), ((3, 2), (3, 3)), ((2, 3), (3, 3)), ((2, 2), (2, 3)), ((2, 1), (2, 2)), ((2, 0), (2, 1)), ((1, 0), (2, 0)), ((0, 0), (1, 0)), ((0, 0), (0, 1)), ((0, 1), (1, 1)), ((1, 1), (1, 2)), ((0, 2), (1, 2)), ((0, 2), (0, 3)), ((0, 3), (1, 3)), ((1, 3), (1, 4)), ((0, 4), (1, 4)), ((1, 4), (2, 4)), ((2, 4), (3, 4)), ((3, 4), (4, 4)), ((4, 3), (4, 4)), ((4, 3), (5, 3)), ((5, 3), (5, 4)), ((2, 0), (3, 0)), ((3, 0), (4, 0)), ((4, 0), (4, 1)), ((3, 1), (4, 1))])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random_maze(10,5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's not very pretty to look at. I'm going to need a way to visualize a maze.\n",
"\n",
"# Printing a maze"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"o o--o--o--o--o--o--o--o--o--o\n",
"| | | |\n",
"o--o--o--o o--o o o o--o o\n",
"| | | | |\n",
"o o o--o--o o--o--o--o--o o\n",
"| | | | | | |\n",
"o o--o o o--o o o--o o--o\n",
"| | | | | | |\n",
"o o o--o--o--o--o o o--o o\n",
"| | |\n",
"o--o--o--o--o--o--o--o--o--o o\n"
]
}
],
"source": [
"def print_maze(maze, dot='o', lin='--', bar='|', sp=' '):\n",
" \"Print maze in ASCII.\"\n",
" exit = Edge((maze.width-1, maze.height-1), (maze.width-1, maze.height))\n",
" edges = set(maze.edges) | {exit}\n",
" print(dot + sp + lin.join(dot * maze.width)) # Top line, including entrance\n",
" def vert_wall(x, y): return (' ' if Edge((x, y), (x+1, y)) in edges else bar)\n",
" def horz_wall(x, y): return (sp if Edge((x, y), (x, y+1)) in edges else lin)\n",
" for y in range(maze.height):\n",
" print(bar + cat(sp + vert_wall(x, y) for x in range(maze.width)))\n",
" print(dot + cat(horz_wall(x, y) + dot for x in range(maze.width)))\n",
" \n",
"cat = ''.join\n",
" \n",
"print_maze(random_maze(10, 5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Much better!* But can I do better still?\n",
"\n",
"# Plotting a maze\n",
"\n",
"I'll use `matplotlib` to plot lines where the edges aren't:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"\n",
"def plot_maze(maze):\n",
" \"Plot a maze by drawing lines between adjacent squares, except for pairs in maze.edges\"\n",
" plt.figure(figsize=(8, 4))\n",
" plt.axis('off')\n",
" plt.gca().invert_yaxis()\n",
" w, h = maze.width, maze.height\n",
" exits = {Edge((0, 0), (0, -1)), Edge((w-1, h-1), (w-1, h))}\n",
" edges = set(maze.edges) | exits\n",
" for sq in squares(w, h):\n",
" for nbr in neighbors4(sq):\n",
" if Edge(sq, nbr) not in edges:\n",
" plot_wall(sq, nbr)\n",
" plt.show()\n",
"\n",
"def plot_wall(s1, s2):\n",
" \"Plot a thick black line between squares s1 and s2.\"\n",
" (x1, y1), (x2, y2) = s1, s2\n",
" if x1 == x2: # horizontal wall\n",
" y = max(y1, y2)\n",
" X, Y = [x1, x1+1], [y, y]\n",
" else: # vertical wall\n",
" x = max(x1, x2)\n",
" X, Y = [x, x], [y1, y1+1]\n",
" plt.plot(X, Y, 'k-', linewidth=4.0)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's compare the two visualization functions:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAd0AAAD8CAYAAAAyun5JAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAABNxJREFUeJzt2sFuIjEUAMF4xf//svcaJSsIAdr2UnXj9uSZUct6jDnn\nBwDwen9WDwAA70J0ASAiugAQEV0AiIguAEREFwAiogsAEdEFgIjoAkBEdAEgIroAEBFdAIiILgBE\nRBcAIqILABHRBYCI6AJA5LJ6gGvGGPPz7znnWDXLv3ydD4Cz1F1x0wWAiOgCQER0ASCy9U53d7vt\nmHlfu///AXax+r84broAEBFdAIiILgBERBcAIqILABHRBYCI6AJARHQBICK6ABARXQCIiC4AREQX\nACKiCwAR0QWAiOgCQER0ASAiugAQEV0AiIguAEREFwAiogsAEdEFgIjoAkBEdAEgIroAEBFdAIiI\nLgBERBcAIqILABHRBYCI6AJARHQBICK6ABARXQCIiC4ARC6rBzjZGGOunuFec86xeoavTjvHHc9w\nd6c9Y57Dt/Kdmy4AREQXACKiCwARO90n2nF/ccIubcdz++yEMzzN7s+c3/Gt3OamCwAR0QWAiOgC\nQER0ASAiugAQEV0AiIguAEREFwAiogsAEdEFgIjoAkBEdAEgIroAEBFdAIiILgBERBcAIqILABHR\nBYCI6AJARHQBICK6ABARXQCIiC4AREQXACKiCwAR0QWAiOgCQER0ASAiugAQEV0AiIguAEREFwAi\nogsAEdEFgMhl9QC0xhhz9Qy3zDnH6hl4Le/h75xwblznpgsAEdEFgIjoAkDETvfN7Lin4v14D5/D\nOZ7HTRcAIqILABHRBYCI6AJARHQBICK6ABARXQCIiC4AREQXACKiCwAR0QWAiOgCQER0ASAiugAQ\nEV0AiIguAEREFwAiogsAEdEFgIjoAkBEdAEgIroAEBFdAIiILgBERBcAIqILABHRBYCI6AJARHQB\nICK6ABARXQCIiC4AREQXACKiCwAR0QWAyGX1ADDGmKtnuMdp8/IznuvjnOFtbroAEBFdAIiILgBE\n7HQfMOccq2f4HzjHx9mlPZ/38nE7nuHqb8VNFwAiogsAEdEFgIjoAkBEdAEgIroAEBFdAIiILgBE\nRBcAIqILABHRBYCI6AJARHQBICK6ABARXQCIiC4AREQXACKiCwAR0QWAiOgCQER0ASAiugAQEV0A\niIguAEREFwAiogsAEdEFgIjoAkBEdAEgIroAEBFdAIiILgBERBcAIqILAJHL6gHuMcaYq2e4Zs45\nVs9wy+5neIITnjOP863wCm66ABARXQCIiC4ARI7a6dql3c+ZPe6E3d7uz3n3+T4+zpiR87npAkBE\ndAEgIroAEBFdAIiILgBERBcAIqILABHRBYCI6AJARHQBICK6ABARXQCIiC4AREQXACKiCwAR0QWA\niOgCQER0ASAiugAQEV0AiIguAEREFwAiogsAEdEFgIjoAkBEdAEgIroAEBFdAIiILgBERBcAIqIL\nABHRBYCI6AJARHQBICK6ABC5rB7gHmOMuXoG8B7Cz8w5x+oZduOmCwAR0QWAiOgCQGTMaT0FAAU3\nXQCIiC4AREQXACKiCwAR0QWAiOgCQER0ASAiugAQEV0AiIguAEREFwAiogsAEdEFgIjoAkBEdAEg\nIroAEBFdAIiILgBERBcAIqILABHRBYCI6AJARHQBICK6ABARXQCIiC4AREQXACKiCwAR0QWAiOgC\nQER0ASDyFyUScgM3r2rXAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x10dd4dd68>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"o o--o--o--o--o--o--o--o--o--o\n",
"| | |\n",
"o o--o o--o o o--o--o--o o\n",
"| | | | | | | |\n",
"o o o--o o o o--o o--o o\n",
"| | | | | | | |\n",
"o o o o--o--o o o--o o--o\n",
"| | | | | | |\n",
"o--o o--o--o o o o--o--o o\n",
"| | |\n",
"o--o--o--o--o--o--o--o--o--o o\n"
]
}
],
"source": [
"M = random_maze(10, 5)\n",
"plot_maze(M) \n",
"print_maze(M)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# `pop` strategies\n",
"\n",
"Now I want to compare how the maze varies based on theree different choices for the `pop` parameter. \n",
"\n",
"(1) The default is `deque.pop`\n",
"which means that the tree is created **depth-first**; we always select the `node` at the end of the `frontier`, so the tree follows a single branch along a randomly-twisted path until the path doubles back on itself and there are no more neighbors; at that point we select the most recent square for which there are neighbors:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe0AAAD8CAYAAABaSfxxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACnJJREFUeJzt3dFu4zYQBVCl6P//cvoUYLGwU1oiOXOlc94KdCVKlnJB\nzHj89f39fQAA/f1TvQAAYIzQBoAQQhsAQghtAAghtAEghNAGgBBCGwBCCG0ACCG0ASCE0AaAEEIb\nAEIIbQAIIbQBIITQBoAQQhsAQghtAAghtAEgxL/VC1jp6+vr+8///v7+/hr5/wBgxLtcWcVOGwBC\nCG0ACCG0ASDErWvao2bXJM7W0nfXRiqsuObu93H2+iqvt/u9rnSn9z5hjSN2/L3ZzU4bAEIIbQAI\nIbQBIITQBoAQQhsAQugeP853A1Z27Y6aPQVu11S5xK7P2V3hM1Q925X3uvvnPGrHRMfUz/ns34dP\n1tu1Y95OGwBCCG0ACCG0ASCEmvYLV2sZK2ohK2o2Z45fdbwV5+pWz6usy40es+peX7Hrc666xk/O\nW/XMjuq+vg7stAEghNAGgBBCGwBCCG0ACCG0ASCE7vEX7tQVPGrXb8SePc/O37Ct+i3hTlOeuj8P\nK87R7dsEZ4+/4nnt3tW94t537Ty30waAEEIbAEIIbQAIoaZ99KmvJEwRe6f71Kjj6F+XW3He7r/y\nlfA8VE2B69R/0K3ef/bedK1Tf8JOGwBCCG0ACCG0ASCE0AaAEEIbAELoHj8yur27TBLr1NH6Tpdv\nA6w6XuIUsYTn4U5T4HZZ3Z1d+Rx27TS30waAEEIbAEIIbQAIoaZ99Kt5vVI1AWh1Talr3ehPVb/y\nNWrF+nZN81qhahrbnd6VLu/lu3Ws+IxTeg3stAEghNAGgBBCGwBCCG0ACCG0ASCE7vGjT9fgzmk9\n3aZ8rT7eCp26vd954ufyt+5TAWe/y5Ud01XPx5O6ye20ASCE0AaAEEIbAEKoab+wq+bYqb6yawJW\n1bSqHefuMkXqN7PXmHDNVyX0LpxVWU+/ovJXvapr3XbaABBCaANACKENACGENgCEENoAEOJR3eNX\nu/7OdkpWdxuOuNpZvfre/Ha8s/d31+dyp8+/6nh/++R5mP3MXrX63sw4/ux7s/odrewm381OGwBC\nCG0ACCG0ASDEo2raVTWKjrWRqnr8jnvRfcpT8vNQdfyEvoDZk/QqpwfOlvC3N+EZOw47bQCIIbQB\nIITQBoAQQhsAQghtAAjxqO7xd6omau3oVqzqQO3YidllytPo8Sp/U3yXTlMGR49ZdQ+7fXav7Pq2\nScK9WMVOGwBCCG0ACCG0ASCEmvbRZ6LQznV0n3RW9QtMV87V5Tn6Tfdpce8kTNKbrXI9adPYuqxj\nBzttAAghtAEghNAGgBBCGwBCaEQ7+nxRf8U6ug9NmbG+quE4q8+z87nsMnim4zvQ5e/DCpXDbHZI\nXfdv7LQBIITQBoAQQhsAQqhpH8/6Yv5VowP8K+9p98+z+/p+c+fBM+/caXDQ7KEpnd77EZU9NLPY\naQNACKENACGENgCEENoAEEJoA0CIR3WPV3f9VejWzVn5Gaw+950mb81eS6dre6fbtL+z6+nYIT3a\nZd7p70O3v50/7LQBIITQBoAQQhsAQjyqpt2tRpFSQxmRMCkrbcpTZW1y1zV3udcJZtybhPd05nkT\neik+ZacNACGENgCEENoAEEJoA0AIoQ0AIR7VPf7O7A7bFR2L3bsgO03Qmv2bwaPneULHdVU3evfn\n/5XVv8M9Q9XncvVanvytAzttAAghtAEghNAGgBBq2i9crVXvqK90qxGenTZ29XivjtnxF4RGJNbl\nqu71imlxqe/UJ7q897ue9cR36v/YaQNACKENACGENgCEENoAEEJoA0AI3eMXdOxGXt2lueuaEyce\njd6bjs/Nnayezjf7mxJXv1Gx412pmjK4+t8l
stMGgBBCGwBCCG0ACKGm/YHV9eGOddvUmvgrs6/l\nTs/D6Lmq7uGV86ZO30p4VxIm341KqYvbaQNACKENACGENgCEENoAEEJoA0AI3eMfONtd2KUTc+e5\nKzvhZ09vmj2x6qorx6t6xjp25naZvnWnd2WX1X+LO7PTBoAQQhsAQghtAAihpn2cr3NcrWFW/hrP\njrXssqs2nXa8SgnT/v62a42d6vup0+J2H78TO20ACCG0ASCE0AaAEEIbAEIIbQAI8aju8TtNg7pL\np+vo8T9Zx9V7s3pSVtX6VrjTtdxp0tlVXabFXdVtPTPYaQNACKENACGENgCEeFRNO3n6T5dpWcl1\nurN21W1nT+b7RJcJZk98vhLMfgeS/xZXs9MGgBBCGwBCCG0ACCG0ASCE0AaAEI/qHn9n9m9Q75jC\n06Wb/Efi5KGra57dEbviHp5d4xOfr8r3eUSXdfxp9po6TWLr2nlupw0AIYQ2AIQQ2gAQQk37yK3z\ndeYefG72c1h57tmf/yfHWz19q6p3oeM7tfqZTZ3EtpKdNgCEENoAEEJoA0AIoQ0AIYQ2AIR4VPf4\n7M7XjhOKurh6r+98byuvrfs7sLM7ffXa7zQtbvTcs685+XlYxU4bAEIIbQAIIbQBIMSjatpX66er\n6zU7p/WsvuZuU6hWSJyQ5x14r9PndEXyddxpKuAqdtoAEEJoA0AIoQ0AIYQ2AIR4VCPaqNmNBpVf\n/O/WXHH2PJXNH93u4Q7d3oEr67l6LT7X7CE6Z8674tyz2GkDQAihDQAhhDYAhFDTPjKH3O8akrF6\n+EXXutEZrmWfJwxhWXGeLp9r8j2sZqcNACGENgCEENoAEEJoA0AIoQ0AIXSPH+snHlVOVFrdTX71\neKPn+e14q+9v1TSvq98QWCFhOljVGqumhq1ex05d7mFndtoAEEJoA0AIoQ0AIdS0X0ieotOtNl1h\ntBbcbSpT4mfSaS0/Eiccdjj+GWkTEyt/cXEWO20ACCG0ASCE0AaAEEIbAEIIbQAIoXv8A6OdkrMn\nYF055qjqjshOVk+imt1NXmn2GivvTernsmM9VVMBZx/vk3/XsVv/OOy0ASCG0AaAEEIbAEKoaR91\nk4yu1Hm61lv+T7d64CtVE7VWr6Ojymljq3sQuv06XKU7TwXczU4bAEIIbQAIIbQBIITQBoAQQhsA\nQugePzI6mkfN7ji9UwfrqO7PQ/f1vTK7k3rF9MBdnctVv+v+6txV502YPtf1b5idNgCEENoAEEJo\nA0AINe0XVkww++Q8O475xF+kOnu/u9W2PlnP2V+m63bNn6hae8I96zLt753Z0+fu9Fz/sNMGgBBC\nGwBCCG0ACCG0ASCE0AaAELrHN1jRWV3V/Z3QTb7r3KvP0/G3hbtcc8K3FUbNvpYV92b2e//EZ3sW\nO20ACCG0ASCE0AaAEGrax/y6ScLUnapa9womYF03ei13uua72vkZdX8ePllfQr/OcdhpA0AMoQ0A\nIYQ2AIQQ2gAQQmgDQIiv7++IITCnVHf5AZCp629022kDQAihDQAhhDYAhLh1TRsA7sROGwBCCG0A\nCCG0ASCE0AaAEEIbAEIIbQAIIbQBIITQBoAQQhsAQghtAAghtAEghNAGgBBCGwBCCG0ACCG0ASCE\n0AaAEEIbAEIIbQAIIbQBIITQBoAQQhsAQghtAAghtAEghNAGgBBCGwBCCG0ACCG0ASCE0AaAEEIb\nAEIIbQAIIbQBIMR/rtNmnUTdzqUAAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1176e5e80>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_maze(random_maze(40, 20, deque.pop))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The maze with `deque.pop` looks pretty good. Reminds me of those [cyber brain](https://www.vectorstock.com/royalty-free-vector/cyber-brain-vector-3071965) images.\n",
"\n",
"(2) An alternative is `queue.popleft`, which creates the maze roughly **breadth-first**&mdash;we start at some root square , add an edge to it, and from then on we always select first a parent edge before we select a child edge. The net result is a design that appears to radiate out in concentric layers from the root (which is chosen by `random_tree` and is not necessarily the top-left square; below it looks like the root is in the upper-left quadrant). "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe0AAAD8CAYAAABaSfxxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACSxJREFUeJzt3dluIzkMBVBnMP//y56nQTcacboWSeQtnfMWIHGpFvtC\nIEN/vd/vFwDQ3z/VCwAAjhHaABBCaANACKENACGENgCEENoAEEJoA0AIoQ0AIYQ2AIQQ2gAQQmgD\nQAihDQAhhDYAhBDaABBCaANACKENACGENgCE+Ld6ATN9fX29f//5/X5/Hfk9ADjiU67MYqcNACGE\nNgCEENoAEOLRNe2jVtckrjhan686btX6AFaq7oGy0waAEEIbAEIIbQAIIbQBIITQBoAQW3WPV3f9\n/c2ZjuvR51LVjQ7QUdf/gLHTBoAQQhsAQghtAAixVU37bo2icupX1/rK/46u7+jvmbAGrJTSb2On\nDQAhhDYAhBDaABBCaANACKENACG26h4/2h14tcM52ejO+pWedB+go0+fD13eezPW1+Xc/mSnDQAh\nhDYAhBDaABBiq5r23brtk6Zyja7vj/q7mccyZQ2Omf35cFfl+qpr3XbaABBCaANACKENACGENgCE\nENoAEGKr7vHqrr8K3SadJdyDhDXSX/cpYq9X3fTH0dem0zWdzU4bAEIIbQAIIbQBIMRWNW0TsH45\nei1GX7PEe5C4ZtbrPkXs9apbY8K1Oaq6fm6nDQAhhDYAhBDaABBCaANACKENACG26h6/28FY3TX4\nu6NrHP17/OLaZBj9bHebMnjHqklns4+7EzttAAghtAEghNAGgBBb1bRN9zpvh3O8yrXpafT0raq6\nb4InTTo7qroeb6cNACGENgCEENoAEEJoA0AIoQ0AIbbqHh/d9VfZRbiqG3P2OT5pEtuTzqXK3Ql+\nR/727LGvetJ9N+msDzttAAghtAEghNAGgBBb1bSfNJVnlao6X8K9etK5VJlRAzXp7D7P9mfVdXs7\nbQAIIbQBIITQBoAQQhsAQghtAAixVff4J9XdgJ0dnUR1dWLVyu7hq+dylOfos8ppY93+AyLxOUlc\n81PZaQNACKENACGENgCEUNP+xo5TflaZcW1Ntuqn8p5UHdsUsT1U1/fttAEghNAGgBBCGwBCCG0A\nCLFVI9rdBoLZDQg/NajMHnJyVNU1TGhgO2r00I0zz81o3QaX3HnN0ceubljayYxBNl0bBu20ASCE\n0AaAEEIbAEJsVdMeXaPoWvMYqWqwxMovEpmt8tp062n405POeYfPgy4qex+q2WkDQAihDQAhhDYA\nhBDaABBCaANAiK26xz+5OkUspdvwjtET1mZPbDtz7NF02p93de2V57zD+3620RMdd7ondtoAEEJo\nA0AIoQ0AIdS0bzABKUv3SVmVk7e6fePZCiadrVM1Se+MlLq4nTYAhBDaABBCaANACKENACGENgCE\n0D0+QUoX4kyjr8GZ17vbFTx7StfKbvIuz2LlOlZd76qJe2emiHWZCnhU5XPT5b3zJzttAAghtAEg\nhNAGgBBq2gvsMFHpSVO/njI5bdZrjjzuDB17C65I+Pa2Jz03R1XXuu20ASCE0AaAEEIbAEIIbQAI\nIbQBIITu8UJ3uhCrpjJ9cmYq009/98mZKU9VE7CqupEru8mrpoiN6DKeff+O6t7RPUPCGrt2sttp\nA0AIoQ0AIYQ2AIRQ025kRg2lW12mqm5YeezKyVvdp3lVPg9Hda85P2n6XKWEOvvrZacNADGENgCE\nENoAEEJoA0AIoQ0AIXSPhxjd2Xh1GlTCcbt1tF69d5Xd5KMldJNXTXc7KqW7+Ttd1p7Qxf43dtoA\nEEJoA0AIoQ0AIdS0Qz2hNjNK96lalRO1uj8nCVPgur9epe7n0qWWPpKdNgCEENoAEEJoA0AIoQ0A\nIYQ2AITQPf6N2dPHZqqadPa3daw8VrfvPz66
vtmT0yp17KzuMqku4f7ddfUcj75XEjrtR7HTBoAQ\nQhsAQghtAAihpv2N0fWRHeotT6rfVZ3Lk56ThLpvVe+D+3z/9XZmpw0AIYQ2AIQQ2gAQQmgDQAih\nDQAhdI+fMHtSWkJXaUKH7NE1VnW+3j3unefw7oSp0c/sivdAVZf/kzqhO06025WdNgCEENoAEEJo\nA0AINe0bdqjLJNTvnjLZasZxn1RXPSrhvnRn0llfdtoAEEJoA0AIoQ0AIYQ2AIQQ2gAQQvf4BImT\nzj7pci5nuk9HT/MaPcFsx+lSs6cJ/qTqvoy2ckLeUVePrZv8OjttAAghtAEghNAGgBBq2rRUOfWr\nqrbZvaZ6RmXdvuq+jLbjtUl4tqvZaQNACKENACGENgCEENoAEEIj2jcqB0HspmOzzeyhG5VDU6qe\n7dFDPCqvzeghJ1UDTiqPUdXAduY8un5u22kDQAihDQAhhDYAhFDT/sboekvCFzqsMmOASPdhKEfN\nqKFVDTnpWg/83VOuzYx+j+7PduVnavWzbacNACGENgCEENoAEEJoA0AIoQ0AIXSPn3C1a7C627DC\nyi7V2dOyune+rny+Ep/lqqlto/+u4/TAq2ZMMNuFnTYAhBDaABBCaANACDXtG0w6+2XHSVlHda+J\nv179p36dkTCZbKQZ0wP5rPozzE4bAEIIbQAIIbQBIITQBoAQQhsAQugen2BFd2H3TtXuHdhnjvXp\nXGavacS1Hj0trsv0ucrvW1/VdW4CI9+x0waAEEIbAEIIbQAIoaa9QPIUooTpUlU1vNHXJmHa2GgJ\n091G674+flb9/rPTBoAQQhsAQghtAAghtAEghNAGgBC6xxeo7ja8Y/YErMrjjp6A9aRu8tGvWfke\n6P7+674+erHTBoAQQhsAQghtAAihpr2AyUY9VH0jVVWte8ZrmubF7qp7EOy0ASCE0AaAEEIbAEII\nbQAIIbQBIITu8UIruhC7d/EmTPMy6Wzd6wE/s9MGgBBCGwBCCG0ACKGm3Uj3+vMIM+q5O046A2pU\n93HYaQNACKENACGENgCEENoAEEJoA0CIr/f7uQONqrv8AMj06b85/syV1f/1YacNACGENgCEENoA\nEOLRNW0AeBI7bQAIIbQBIITQBoAQQhsAQghtAAghtAEghNAGgBBCGwBCCG0ACCG0ASCE0AaAEEIb\nAEIIbQAIIbQBIITQBoAQQhsAQghtAAghtAEghNAGgBBCGwBCCG0ACCG0ASCE0AaAEEIbAEIIbQAI\nIbQBIITQBoAQQhsAQghtAAghtAEghNAGgBD/AfT29ufieXEqAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x1176fcf60>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plot_maze(random_maze(40, 20, deque.popleft))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `deque.popleft` maze is interesting as a design, but to me it doesn't work well as a maze.\n",
"\n",
"(3) We can select a cell at random by shuffling the frontier before popping an element off of it:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe0AAAD8CAYAAABaSfxxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAACmZJREFUeJzt3dmO4zYQBVB1kP//ZedpMpPA7tZCsupK57wN4NZC0b4g\nWFP6er1eGwDQ31/VFwAA7CO0ASCE0AaAEEIbAEIIbQAIIbQBIITQBoAQQhsAQghtAAghtAEghNAG\ngBBCGwBCCG0ACCG0ASCE0AaAEEIbAEIIbQAI8Xf1Bcz09fX1+vPfr9fra8/nAGCPT7kyi5U2AIQQ\n2gAQQmgDQIhb72mftXqPorOzdQFPGMPR9/zEMZyhahzvNB+q7qVyDFNqoKy0ASCE0AaAEEIbAEII\nbQAIIbQBIITq8Tf2Vgd2rS48o6rC9k6uzpvR5+lk9HdlxHytOvfZ846YX7Ors1dJ/A6MYqUNACGE\nNgCEENoAEMKe9jZ/Pze5i9jeaxr9uRm6dHn6ZMYYdukO9sne6zmyh3n2Hrs9v8oOXVVjeNad6ot+\nYqUNACGENgCEENoAEEJoA0AIoQ0AIR5VPd69knBlFWj3sah05y5PVd3+Rt/bimryq+dO+F8jfgfy\nWGkDQAihDQAhhDYAhHjUnnZqB6wZOnZfO6vquVR2OtvrTnN2ry4dzFY58kxGd2Mbfd7R82vG2FSz\n0gaAEEIbAEIIbQAIIbQBIITQBoAQj6oe/2R21WdCpzOd036r7N7VXeK9dO9wd/X6jlR6p83t0f/j\n5w6stAEghNAGgBBCGwBC2NN+Y9V+8QqrOhklSOjKtMeMPczKrm1Vqt621ektX91+H2Z35jvyXela\n52OlDQAhhDYAhBDaABBCaANACKENACFUjy9QXW34zuxKyb3Hu/q5K9fIb1WdrRIqdrt3VBv1d6OP\nMVP365vJShsAQghtAAghtAEghD3tbX43qBFdg6r2/vZee+Ue0526dO1xZKyrOp11f/tT5blGd/2a\n0WGt6tzeUvgzK20ACCG0ASCE0AaAEEIbAEIIbQAI8ajq8e6V1TMqYq/ec2VXppnH27bsLl17pVVx\nP6Gb114z7uOJ3d3uxkobAEIIbQAIIbQBIMTX63XfLYPRb5Sa0XnoaSrH0HO+zhhelzA2CddYpXps\nrLQBIITQBoAQQhsAQghtAAghtAEgxKM6oo2W0K2ne9evhDH8pMsYHjH6Gvf+XacxSLFizLz/+rOu\nFfNW2gAQQmgDQAihDQAh7Glv+/cuuu5xzJRwz9Udin5SeT2Vb5gb6cgz7j4f9qp869/o38SEedhp\nP/07VtoAEEJoA0AIoQ0AIYQ2AIR4VCHa1WKI2YUKI4prRhfhVN3zlfOOLnq5Ogazm7AcecbdvwMr\nVT2XTk15Rp9j1Xdq73nvUpT4JyttAAghtAEghNAGgBCP2tMevZ/xhH2U0dc+o8nC7PGuen4jznun\n5hd7VT3/qv3hSlXz5k61FUdZaQNACKENACGENgCEENoAEEJoA0CIR1WPd6uQXVFl3r2b18oq0LPj\nXVWpOmK+zu6QV9X168g979X9OVfqNm9GXUciK20ACCG0ASCE0AaAEI/a0161V92pk9ET7rny7WsV\n57nDvtwvM+757DFXvRGv0+/DJ3eaY39KGPufWGkDQAihDQAhhDYAhBDaABBCaANAiEdVj6+q7kx4\nn/bo6tAunbKO/O3oz1Xd85G/qxqb2fNtprQOa0fcpeNY1TysYKUNACGENgCEENoAEOJRe9od95ar\ndO++lfCsRo9Nwj0/0Z2eS9WcvUvXwm2r3xe30gaAEEIbAEIIbQAIIbQBIITQBoAQj6oe/+RsB7M7\ndXnqfs/fXV91NedRozusPdGM+TC6o11lV8C9rp5jdre4quvrzEobAEIIbQAIIbQBIIQ97QtGd+Hp\n2Hlp9F73jL3z2W8q6vacO86T2WbMh+77nTOe
8+z99G5zc8UYrmalDQAhhDYAhBDaABBCaANACKEN\nACEeVT0+ugvP6IrpEeeaffzundO27XonqrPXtKrS/srYrqoW7jJfZxyz6nOV7tKZ7A6dFa20ASCE\n0AaAEEIbAEI8ak+7W7eeK3Tp+q37XlRVV7kZx1xZx0Gdqt+X2XUnR3R9y56VNgCEENoAEEJoA0AI\noQ0AIYQ2AIR4VPX4J7OrAVdW0lZ1/ar63J6/rdapIraqU93K+VD1fZ79Hbh6fTOOWfVdO3vebr8N\nZ1hpA0AIoQ0AIYQ2AISwp73VdethjKo3j1Xp1H2u217nto0fn6p7mTGv79L57sm/vVbaABBCaANA\nCKENACGENgCEENoAEEL1+Hbv7joJ17jHjPvYe8xuY9ipO9hoI653dte9VWNaOefPHq/j/ya4Gytt\nAAghtAEghNAGgBD2tN9I7rbTbV/urIRn0H0Mt63/OCbM1y7XOOJZVt1L93m4bTn781baABBCaANA\nCKENACGENgCEENoAEEL1+AHVVYN7zO7ytKqL2IqxHl0dWjU/RrxP+ey1X60K7t6VbPW5vjPiHddV\n4z27MrtrpfcMVtoAEEJoA0AIoQ0AIexpX5DQ5eeTqmsfsS+Xdu7K7lKj77lqj3DGPY8+d0JHtNHn\n6j7Wyb/Rn1hpA0AIoQ0AIYQ2AIQQ2gAQQiHaG6saiPzfkaYIV3V/Dd2V865q+NFlbEYU28xuKFPZ\nbKfq+9ytYdERM+bYd8f/ZGVhWkojFittAAghtAEghNAGgBD2tLfz+yaV/3F/1bm772UdOeZsyY0c\nnthsZ69O3/urujfHqWwG1KWJzk+stAEghNAGgBBCGwBCCG0ACCG0ASCE6vEts9tYt45CVWOYYHa1\n6YjjV1fE/tLlOs5IvvbZuvw+3OEZWWkDQAihDQAhhDYAhLCn/UbV/unK8yZ0KLqL2WMzYt4k1wx0\nkTCGd+mINvq8R1T/hllpA0AIoQ0AIYQ2AIQQ2gAQQmgDQAjV429UveN3RlVi9w5me6/jyPXOOOYV\nncZm1XzoZPQ9V43h3vNWvou+2zw6MjYpXRittAEghNAGgBBCGwBC2NN+Q0e03xI6FI2W1jVKR7T/\n6vYGvJ+Ot3KPvOqe9+q2J96RlTYAhBDaABBCaANACKENACGENgCEeFT1+NXKxMrKxi7dm6q6jR3p\nZDRaVee0GX9X1R2sshNbVdevqsrqlb9TnZ7zHjM6K65mpQ0AIYQ2AIQQ2gAQ4lF72qP3mCq7SF09\nd/cOWCP2aUede9VYJXRES3yT1Sqz97pXjk2X5zzrPFdU73VbaQNACKENACGENgCEENoAEEJoA0CI\nR1WP36Uz0nfOdiiq+twIVdWcZ6t2KztWpR3/jKrK59XXcUZaJ8TkLpazWGkDQAihDQAhhDYAhHjU\nnnbHLkpPs2I/sNtzrnrb04pzr+rMd+Q8VeNdNe8qx2a00b8POqIBAGWENgCEENoAEEJoA0AIoQ0A\nIb5er9s1jPlXdZUfAJm6vtvdShsAQghtAAghtAEgxK33tAHgTqy0ASCE0AaAEEIbAEIIbQAIIbQB\nIITQBoAQQhsAQghtAAghtAEghNAGgBBCGwBCCG0ACCG0ASCE0AaAEEIbAEIIbQAIIbQBIITQBoAQ\nQhsAQghtAAghtAEghNAGgBBCGwBCCG0ACCG0ASCE0AaAEEIbAEIIbQAIIbQBIITQBoAQQhsAQvwD\nxnjoO1WPSQgAAAAASUVORK5CYII=\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x117710550>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"def poprandom(seq):\n",
" \"Shuffle a mutable sequence (deque or list) and then pop an element.\"\n",
" random.shuffle(seq)\n",
" return seq.pop()\n",
"\n",
"plot_maze(random_maze(40, 20, poprandom))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is an interesting compromise: it has some structure, but still works nicely as a maze, in my opinion.\n",
"\n",
"What other variations can you come up with to generate interesting mazes?"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
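
A note on the three `pop` strategies compared in the notebook above. The following is a minimal sketch under stated assumptions, not the notebook's own `random_tree` (which is defined in a part of the notebook not shown here). The names `sketch_tree` and `grid_neighbors` are hypothetical, and the frontier here holds squares rather than edges. It only illustrates how swapping `pop` changes which frontier square is expanded next: the last one (depth-first), the first one (roughly breadth-first), or a random one.

```python
import random
from collections import deque

def grid_neighbors(square, width, height):
    "Squares directly left, right, above, or below `square` that are on the grid."
    x, y = square
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return {(a, b) for (a, b) in steps if 0 <= a < width and 0 <= b < height}

def sketch_tree(width, height, pop=deque.pop, seed=None):
    """Grow a random spanning tree over a width x height grid (hypothetical helper).
    `pop` decides which frontier square is expanded next."""
    rng = random.Random(seed)
    unvisited = {(x, y) for x in range(width) for y in range(height)}
    root = rng.choice(sorted(unvisited))   # the root need not be the top-left square
    unvisited.remove(root)
    frontier = deque([root])
    edges = []
    while unvisited:
        node = pop(frontier)
        nbrs = grid_neighbors(node, width, height) & unvisited
        if nbrs:                           # link node to one unvisited neighbor
            nbr = rng.choice(sorted(nbrs))
            edges.append((node, nbr))
            unvisited.remove(nbr)
            frontier.extend([node, nbr])   # both may still have unvisited neighbors
        # a node with no unvisited neighbors is simply dropped from the frontier
    return edges

def poprandom(seq):
    "Shuffle a mutable sequence (deque or list) and then pop an element."
    random.shuffle(seq)
    return seq.pop()

# Each strategy yields a spanning tree; only the shape of the tree differs.
for pop in (deque.pop, deque.popleft, poprandom):
    edges = sketch_tree(40, 20, pop, seed=0)
    print(pop.__name__, len(edges))
```

Whatever `pop` is used, the result should be a spanning tree of the grid (one edge fewer than the number of squares), so the choice affects only how the maze looks, not whether every square is reachable.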

422
ipynb/Pickleball.ipynb Normal file
View File

@@ -0,0 +1,422 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Scheduling a Doubles Pickleball Tournament\n",
"\n",
"My friend Steve asked for help in creating a schedule for a round-robin doubles pickleball tournament with 8 or 9 players on 2 courts. ([Pickleball](https://en.wikipedia.org/wiki/Pickleball) is a paddle/ball/net game played on a court that is smaller than tennis but larger than ping-pong.) \n",
"\n",
"To generalize: given *P* players and *C* available courts, we would like to create a **schedule**: a table where each row is a time period (a round of play), each column is a court, and each cell contains a game, which consists of two players partnered together and pitted against two other players. The preferences for the schedule are:\n",
"\n",
"- Each player should partner with each other player exactly once (or as close to that as possible).\n",
"- Fewer rounds are better (in other words, try to fill all the courts each round).\n",
"- Each player should play against each other player twice, or as close to that as possible.\n",
"- A player should not be scheduled to play two games at the same time.\n",
"\n",
"For example, here's a perfect schedule for *P*=8 players on *C*=2 courts:\n",
"\n",
" [([[1, 6], [2, 4]], [[3, 5], [7, 0]]),\n",
" ([[1, 5], [3, 6]], [[2, 0], [4, 7]]),\n",
" ([[2, 3], [6, 0]], [[4, 5], [1, 7]]),\n",
" ([[4, 6], [3, 7]], [[1, 2], [5, 0]]),\n",
" ([[1, 0], [6, 7]], [[3, 4], [2, 5]]),\n",
" ([[2, 6], [5, 7]], [[1, 4], [3, 0]]),\n",
" ([[2, 7], [1, 3]], [[4, 0], [5, 6]])]\n",
" \n",
"This means that in the first round, players 1 and 6 partner against 2 and 4 on one court, while 3 and 5 partner against 7 and 0 on the other. There are 7 rounds.\n",
"\n",
"My strategy for finding a good schedule is to use **hillclimbing**: start with an initial schedule, then repeatedly alter the schedule by swapping partners in one game with partners in another. If the altered schedule is better, keep it; if not, discard it. Repeat. \n",
"\n",
"## Coding it up\n",
"\n",
"The strategy in more detail:\n",
"\n",
"- First form all pairs of players, using `all_pairs(P)`.\n",
"- Put pairs together to form a list of games using `initial_games`.\n",
"- Use `Schedule` to create a schedule; it calls `one_round` to create each round and `scorer` to evaluate the schedule.\n",
"- Use `hillclimb` to improve the initial schedule: call `alter` to randomly alter a schedule, `Schedule` to re-allocate the games to rounds and courts, and `scorer` to check if the altered schedule's score is better.\n",
"\n",
"\n",
"\n",
"(Note: with *P* players there are *P &times; (P - 1) / 2* pairs of partners; this is an even number when either *P* or *P - 1* is divisible by 4, so everything works out when, say, *P*=4 or *P*=9, but for, say, *P*=10 there are 45 pairs, and so `initial_games` chooses to create 22 games, meaning that one pair of players never play together, and thus play one fewer game than everyone else.)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import random\n",
"from itertools import combinations\n",
"from collections import Counter\n",
"\n",
"#### Types\n",
"\n",
"Player = int # A player is an int: `1`\n",
"Pair = list # A pair is a list of two players who are partners: `[1, 2]`\n",
"Game = list # A game is a list of two pairs: `[[1, 2], [3, 4]]`\n",
"Round = tuple # A round is a tuple of games: `([[1, 2], [3, 4]], [[5, 6], [7, 8]])`\n",
"\n",
"class Schedule(list):\n",
" \"\"\"A Schedule is a list of rounds (augmented with a score and court count).\"\"\"\n",
" def __init__(self, games, courts=2):\n",
" games = list(games)\n",
" while games: # Allocate games to courts, one round at a time\n",
" self.append(one_round(games, courts))\n",
" self.score = scorer(self)\n",
" self.courts = courts\n",
" \n",
"#### Functions\n",
" \n",
"def hillclimb(P, C=2, N=100000):\n",
" \"Schedule games for P players on C courts by randomly altering schedule N times.\"\n",
" sched = Schedule(initial_games(all_pairs(P)), C)\n",
" for _ in range(N):\n",
" sched = max(alter(sched), sched, key=lambda s: s.score)\n",
" return sched\n",
"\n",
"def all_pairs(P): return list(combinations(range(P), 2))\n",
"\n",
"def initial_games(pairs):\n",
" \"\"\"An initial list of games: [[[1, 2], [3, 4]], ...].\n",
" We try to have every pair play every other pair once, and\n",
" have each game have 4 different players, but that isn't always true.\"\"\"\n",
" random.shuffle(pairs)\n",
" games = []\n",
" while len(pairs) >= 2:\n",
" A = pairs.pop()\n",
" B = first(pair for pair in pairs if disjoint(pair, A)) or pairs[0]\n",
" games.append([A, B])\n",
" pairs.remove(B)\n",
" return games\n",
"\n",
"def disjoint(A, B): \n",
" \"Do A and B have disjoint players in them?\"\n",
" return not (players(A) & players(B))\n",
"\n",
"def one_round(games, courts):\n",
" \"\"\"Place up to `courts` games into `round`, all with disjoint players.\"\"\"\n",
" round = []\n",
" while True:\n",
" G = first(g for g in games if disjoint(round, g))\n",
" if not G or not games or len(round) == courts:\n",
" return Round(round)\n",
" round.append(G)\n",
" games.remove(G)\n",
"\n",
"def players(x): \n",
" \"All distinct players in a pair, game, or sequence of games.\"\n",
" return {x} if isinstance(x, Player) else set().union(*map(players, x))\n",
"\n",
"def first(items): return next(items, None)\n",
"\n",
"def pairing(p1, p2): return tuple(sorted([p1, p2]))\n",
" \n",
"def scorer(sched):\n",
" \"Score has penalties for a non-perfect schedule.\"\n",
" penalty = 50 * len(sched) # More rounds are worse (avoid empty courts)\n",
" penalty += 1000 * sum(len(players(game)) != 4 # A game should have 4 players!\n",
" for round in sched for game in round)\n",
" penalty += 1 * sum(abs(c - 2) ** 3 + 8 * (c == 0) # Try to play everyone twice\n",
" for c in opponents(sched).values())\n",
" return -penalty\n",
" \n",
"def opponents(sched):\n",
" \"A Counter of {(player, opponent): times_played}.\"\n",
" return Counter(pairing(p1, p2) \n",
" for round in sched for A, B in round for p1 in A for p2 in B)\n",
" \n",
"def alter(sched):\n",
" \"Modify a schedule by swapping two pairs.\"\n",
" games = [Game(game) for round in sched for game in round] \n",
" G = len(games)\n",
" i, j = random.sample(range(G), 2) # index into games\n",
" a, b = random.choice((0, 1)), random.choice((0, 1)) # index into each game\n",
" games[i][a], games[j][b] = games[j][b], games[i][a]\n",
" return Schedule(games, sched.courts)\n",
"\n",
"def report(sched):\n",
" \"Print information about this schedule.\"\n",
" for i, round in enumerate(sched, 1):\n",
" print('Round {}: {}'.format(i, '; '.join('{} vs {}'.format(*g) for g in round)))\n",
" games = sum(sched, ())\n",
" P = len(players(sched))\n",
" print('\\n{} games in {} rounds for {} players'.format(len(games), len(sched), P))\n",
" opp = opponents(sched)\n",
" fmt = ('{:2X}|' + P * ' {}' + ' {}').format\n",
" print('Number of times each player plays against each opponent:\\n')\n",
" print(' |', *map('{:X}'.format, range(P)), ' Total')\n",
" print('--+' + '--' * P + ' -----')\n",
" for row in range(P):\n",
" counts = [opp[pairing(row, col)] for col in range(P)]\n",
" print(fmt(row, *[c or '-' for c in counts], sum(counts) // 2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 8 Player Tournament\n",
"\n",
"I achieved (in a previous run) a perfect schedule for 8 players: the 14 games fit into 7 rounds, each player partners with each other once, and plays each individual opponent twice:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Round 1: [1, 6] vs [2, 4]; [3, 5] vs [7, 0]\n",
"Round 2: [1, 5] vs [3, 6]; [2, 0] vs [4, 7]\n",
"Round 3: [2, 3] vs [6, 0]; [4, 5] vs [1, 7]\n",
"Round 4: [4, 6] vs [3, 7]; [1, 2] vs [5, 0]\n",
"Round 5: [1, 0] vs [6, 7]; [3, 4] vs [2, 5]\n",
"Round 6: [2, 6] vs [5, 7]; [1, 4] vs [3, 0]\n",
"Round 7: [2, 7] vs [1, 3]; [4, 0] vs [5, 6]\n",
"\n",
"14 games in 7 rounds for 8 players\n",
"Number of times each player plays against each opponent:\n",
"\n",
" | 0 1 2 3 4 5 6 7 Total\n",
"--+---------------- -----\n",
" 0| - 2 2 2 2 2 2 2 7\n",
" 1| 2 - 2 2 2 2 2 2 7\n",
" 2| 2 2 - 2 2 2 2 2 7\n",
" 3| 2 2 2 - 2 2 2 2 7\n",
" 4| 2 2 2 2 - 2 2 2 7\n",
" 5| 2 2 2 2 2 - 2 2 7\n",
" 6| 2 2 2 2 2 2 - 2 7\n",
" 7| 2 2 2 2 2 2 2 - 7\n"
]
}
],
"source": [
"report([\n",
" ([[1, 6], [2, 4]], [[3, 5], [7, 0]]),\n",
" ([[1, 5], [3, 6]], [[2, 0], [4, 7]]),\n",
" ([[2, 3], [6, 0]], [[4, 5], [1, 7]]),\n",
" ([[4, 6], [3, 7]], [[1, 2], [5, 0]]),\n",
" ([[1, 0], [6, 7]], [[3, 4], [2, 5]]),\n",
" ([[2, 6], [5, 7]], [[1, 4], [3, 0]]),\n",
" ([[2, 7], [1, 3]], [[4, 0], [5, 6]]) ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 9 Player Tournament\n",
"\n",
"For 9 players, I can fit the 18 games into 9 rounds, but some players play each other 1 or 3 times:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Round 1: [1, 7] vs [4, 0]; [3, 5] vs [2, 6]\n",
"Round 2: [2, 7] vs [1, 3]; [4, 8] vs [6, 0]\n",
"Round 3: [5, 0] vs [1, 6]; [7, 8] vs [3, 4]\n",
"Round 4: [7, 0] vs [5, 8]; [1, 2] vs [4, 6]\n",
"Round 5: [3, 8] vs [1, 5]; [2, 0] vs [6, 7]\n",
"Round 6: [1, 4] vs [2, 5]; [3, 6] vs [8, 0]\n",
"Round 7: [5, 6] vs [4, 7]; [1, 8] vs [2, 3]\n",
"Round 8: [1, 0] vs [3, 7]; [2, 8] vs [4, 5]\n",
"Round 9: [3, 0] vs [2, 4]; [6, 8] vs [5, 7]\n",
"\n",
"18 games in 9 rounds for 9 players\n",
"Number of times each player plays against each opponent:\n",
"\n",
" | 0 1 2 3 4 5 6 7 8 Total\n",
"--+------------------ -----\n",
" 0| - 2 1 2 2 1 3 3 2 8\n",
" 1| 2 - 3 3 2 2 1 2 1 8\n",
" 2| 1 3 - 3 3 2 2 1 1 8\n",
" 3| 2 3 3 - 1 1 1 2 3 8\n",
" 4| 2 2 3 1 - 2 2 2 2 8\n",
" 5| 1 2 2 1 2 - 3 2 3 8\n",
" 6| 3 1 2 1 2 3 - 2 2 8\n",
" 7| 3 2 1 2 2 2 2 - 2 8\n",
" 8| 2 1 1 3 2 3 2 2 - 8\n"
]
}
],
"source": [
"report([\n",
" ([[1, 7], [4, 0]], [[3, 5], [2, 6]]),\n",
" ([[2, 7], [1, 3]], [[4, 8], [6, 0]]),\n",
" ([[5, 0], [1, 6]], [[7, 8], [3, 4]]),\n",
" ([[7, 0], [5, 8]], [[1, 2], [4, 6]]),\n",
" ([[3, 8], [1, 5]], [[2, 0], [6, 7]]),\n",
" ([[1, 4], [2, 5]], [[3, 6], [8, 0]]),\n",
" ([[5, 6], [4, 7]], [[1, 8], [2, 3]]),\n",
" ([[1, 0], [3, 7]], [[2, 8], [4, 5]]),\n",
" ([[3, 0], [2, 4]], [[6, 8], [5, 7]]) ])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10 Player Tournament\n",
"\n",
"With *P*=10 there is an odd number of pairings (45), so two players necessarily play one game less than the other players. Let's see what kind of schedule we can come up with:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Round 1: (6, 7) vs (0, 5); (3, 4) vs (2, 8)\n",
"Round 2: (1, 8) vs (0, 3); (7, 9) vs (4, 5)\n",
"Round 3: (3, 6) vs (1, 7); (0, 9) vs (2, 5)\n",
"Round 4: (2, 9) vs (6, 8); (1, 3) vs (4, 7)\n",
"Round 5: (0, 8) vs (5, 7); (4, 6) vs (2, 3)\n",
"Round 6: (2, 4) vs (3, 5); (1, 6) vs (8, 9)\n",
"Round 7: (6, 9) vs (3, 7); (1, 2) vs (5, 8)\n",
"Round 8: (1, 4) vs (5, 9); (0, 7) vs (3, 8)\n",
"Round 9: (1, 5) vs (2, 7); (3, 9) vs (0, 6)\n",
"Round 10: (7, 8) vs (4, 9); (0, 1) vs (2, 6)\n",
"Round 11: (4, 8) vs (5, 6); (0, 2) vs (1, 9)\n",
"\n",
"22 games in 11 rounds for 10 players\n",
"Number of times each player plays against each opponent:\n",
"\n",
" | 0 1 2 3 4 5 6 7 8 9 Total\n",
"--+-------------------- -----\n",
" 0| - 2 2 2 - 2 2 2 2 2 8\n",
" 1| 2 - 3 2 1 2 2 2 2 2 9\n",
" 2| 2 3 - 2 2 3 2 - 2 2 9\n",
" 3| 2 2 2 - 3 - 3 3 2 1 9\n",
" 4| - 1 2 3 - 3 1 2 2 2 8\n",
" 5| 2 2 3 - 3 - 1 3 2 2 9\n",
" 6| 2 2 2 3 1 1 - 2 2 3 9\n",
" 7| 2 2 - 3 2 3 2 - 2 2 9\n",
" 8| 2 2 2 2 2 2 2 2 - 2 9\n",
" 9| 2 2 2 1 2 2 3 2 2 - 9\n",
"CPU times: user 2min 39s, sys: 661 ms, total: 2min 40s\n",
"Wall time: 2min 43s\n"
]
}
],
"source": [
"%time report(hillclimb(P=10))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this schedule several players never play each other; it may be possible to improve on that (in another run that has better luck with random numbers).\n",
"\n",
"# 16 Player Tournament\n",
"\n",
"Let's jump to 16 players on 4 courts (this will take a while):"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Round 1: (0, 12) vs (9, 13); (5, 10) vs (11, 15); (6, 8) vs (1, 3); (2, 7) vs (4, 14)\n",
"Round 2: (5, 12) vs (0, 10); (6, 11) vs (3, 9); (8, 15) vs (2, 14)\n",
"Round 3: (12, 15) vs (4, 6); (10, 13) vs (1, 9); (2, 5) vs (8, 11)\n",
"Round 4: (11, 14) vs (0, 9); (3, 13) vs (7, 10); (2, 15) vs (4, 12)\n",
"Round 5: (10, 11) vs (0, 15); (12, 14) vs (5, 13); (1, 8) vs (6, 9); (3, 7) vs (2, 4)\n",
"Round 6: (3, 11) vs (8, 13); (7, 9) vs (5, 15); (1, 6) vs (4, 10); (2, 12) vs (0, 14)\n",
"Round 7: (3, 10) vs (7, 12); (1, 14) vs (5, 11); (6, 13) vs (4, 8)\n",
"Round 8: (4, 5) vs (0, 8); (6, 10) vs (2, 11); (1, 13) vs (9, 15)\n",
"Round 9: (3, 5) vs (2, 9); (10, 15) vs (1, 7); (0, 11) vs (6, 12); (8, 14) vs (4, 13)\n",
"Round 10: (1, 10) vs (3, 8); (6, 7) vs (5, 9); (11, 12) vs (4, 15)\n",
"Round 11: (4, 7) vs (1, 11); (9, 14) vs (10, 12); (0, 6) vs (2, 13)\n",
"Round 12: (10, 14) vs (5, 8); (9, 12) vs (2, 3); (4, 11) vs (7, 13)\n",
"Round 13: (7, 8) vs (0, 13); (3, 12) vs (1, 5); (14, 15) vs (4, 9)\n",
"Round 14: (0, 5) vs (1, 4); (13, 14) vs (3, 15); (9, 10) vs (2, 8)\n",
"Round 15: (0, 3) vs (1, 15); (2, 6) vs (5, 7)\n",
"Round 16: (7, 11) vs (8, 12); (3, 4) vs (5, 14); (6, 15) vs (0, 2)\n",
"Round 17: (3, 14) vs (9, 11); (8, 10) vs (0, 4); (5, 6) vs (7, 15); (1, 2) vs (12, 13)\n",
"Round 18: (0, 1) vs (7, 14); (13, 15) vs (3, 6)\n",
"Round 19: (11, 13) vs (2, 10); (0, 7) vs (8, 9); (6, 14) vs (1, 12)\n",
"\n",
"60 games in 19 rounds for 16 players\n",
"Number of times each player plays against each opponent:\n",
"\n",
" | 0 1 2 3 4 5 6 7 8 9 A B C D E F Total\n",
"--+-------------------------------- -----\n",
" 0| - 2 2 - 2 2 2 2 3 2 2 2 3 2 2 2 15\n",
" 1| 2 - - 3 2 2 3 2 2 2 3 1 2 2 2 2 15\n",
" 2| 2 - - 2 2 2 3 2 2 2 2 2 3 2 2 2 15\n",
" 3| - 3 2 - 1 2 2 2 2 3 2 2 2 3 2 2 15\n",
" 4| 2 2 2 1 - 2 2 3 3 - 1 2 2 2 3 3 15\n",
" 5| 2 2 2 2 2 - 2 3 2 2 2 2 2 - 3 2 15\n",
" 6| 2 3 3 2 2 2 - 2 2 2 1 2 2 2 - 3 15\n",
" 7| 2 2 2 2 3 3 2 - 2 2 2 2 1 2 1 2 15\n",
" 8| 3 2 2 2 3 2 2 2 - 2 3 2 - 3 2 - 15\n",
" 9| 2 2 2 3 - 2 2 2 2 - 2 2 2 2 3 2 15\n",
" A| 2 3 2 2 1 2 1 2 3 2 - 3 2 2 1 2 15\n",
" B| 2 1 2 2 2 2 2 2 2 2 3 - 2 2 2 2 15\n",
" C| 3 2 3 2 2 2 2 1 - 2 2 2 - 2 3 2 15\n",
" D| 2 2 2 3 2 - 2 2 3 2 2 2 2 - 2 2 15\n",
" E| 2 2 2 2 3 3 - 1 2 3 1 2 3 2 - 2 15\n",
" F| 2 2 2 2 3 2 3 2 - 2 2 2 2 2 2 - 15\n",
"CPU times: user 15min 6s, sys: 2.67 s, total: 15min 9s\n",
"Wall time: 15min 16s\n"
]
}
],
"source": [
"%time report(hillclimb(P=16, C=4))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We get a pretty good schedule, although it takes 19 rounds rather than the 15 it would take if every court was filled, and again there are some players who never face each other."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
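
A note on the Pickleball notebook above. The following is a small independent check, written as a sketch rather than as part of the notebook. It restates the 8-player schedule quoted there and tests the stated preferences directly: every pair partners once, every pair meets as opponents twice, and nobody is booked on two courts in the same round. The helper names `pair_and_game_counts`, `partner_counts`, `opponent_counts`, and `is_perfect` are hypothetical and do not appear in the notebook.

```python
from collections import Counter
from itertools import combinations

# The 8-player, 2-court schedule quoted in the notebook.
perfect8 = [
    ([[1, 6], [2, 4]], [[3, 5], [7, 0]]),
    ([[1, 5], [3, 6]], [[2, 0], [4, 7]]),
    ([[2, 3], [6, 0]], [[4, 5], [1, 7]]),
    ([[4, 6], [3, 7]], [[1, 2], [5, 0]]),
    ([[1, 0], [6, 7]], [[3, 4], [2, 5]]),
    ([[2, 6], [5, 7]], [[1, 4], [3, 0]]),
    ([[2, 7], [1, 3]], [[4, 0], [5, 6]]),
]

def pair_and_game_counts(P):
    "Number of partner pairs for P players, and how many two-pair games they fill."
    pairs = P * (P - 1) // 2
    return pairs, pairs // 2

def partner_counts(sched):
    "Counter of {(player, partner): times_partnered}."
    return Counter(tuple(sorted(pair)) for rnd in sched for game in rnd for pair in game)

def opponent_counts(sched):
    "Counter of {(player, opponent): times_played_against}."
    return Counter(tuple(sorted((p1, p2)))
                   for rnd in sched for A, B in rnd for p1 in A for p2 in B)

def is_perfect(sched, P):
    "True if every pair partners once and opposes twice, with no double-booking."
    everyone = list(combinations(range(P), 2))
    partners, opponents = partner_counts(sched), opponent_counts(sched)
    # In each round, the players across all games must be distinct (4 per game).
    no_clash = all(len({p for game in rnd for pair in game for p in pair}) == 4 * len(rnd)
                   for rnd in sched)
    return (no_clash
            and all(partners[pr] == 1 for pr in everyone)
            and all(opponents[pr] == 2 for pr in everyone))

print(is_perfect(perfect8, 8))
for P in (8, 9, 10, 16):
    print(P, pair_and_game_counts(P))
```

The pair counts line up with the reports in the notebook: 28 pairs give 14 games for 8 players, 36 give 18 for 9, 45 give 22 (with one pair left over) for 10, and 120 give 60 for 16.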

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@@ -8,7 +8,7 @@
"\n",
"# Bad Grade, Good Experience\n",
"\n",
"Recently I was asked a question I hadn't thought about before: \n",
"Recently I was asked a question I hadn't thought about in decades: \n",
"\n",
"> *As a student, did you ever get a bad grade on a programming assignment?* \n",
"\n",
@@ -20,20 +20,20 @@
"\n",
"After studying Snobol a bit, I realized that the expected solution was along these lines:\n",
"\n",
"1. Create an empty `dict` (Snobol calls these \"tables\") whose keys will be words and values will be lists of line numbers.\n",
"1. Create an empty hash table whose keys will be words and values will be lists of line numbers.\n",
"2. Read the lines of text (tracking the line numbers), split them into words, and build up the list of line numbers for each word.\n",
"3. Convert the table into a two-dimensional `array` where each row has the two columns `[word, line_numbers]`.\n",
"3. Convert the hash table into a two-dimensional array where each row has the two columns `[word, line_numbers]`.\n",
"4. Write a function to sort the array alphabetically (`sort` is not built-in to Snobol).\n",
"5. Write a function to print the array.\n",
"\n",
"That would be around 40 to 60 lines of code; an easy task. But I noticed three interesting things about Snobol:\n",
"\n",
"* There is an *indirection* operator, `$`, so if the variable `'X'` has the value `\"A\"`, then `'$X = i'` is the same as `'A = i'`.\n",
"* Uninitialized variables are treated as the empty string, so `'A += \"text\"'` works even if we haven't seen `'A'` before.\n",
"* When the program ends, the Snobol interpreter automatically\n",
"prints the values of every variable, sorted alphabetically, as a debugging aid.\n",
"* '`$`' is an *indirection* operator, so if the variable `'word'` has the value `\"A\"`, then `'$word = x'` is the same as `'A = x'`.\n",
"* Uninitialized variables are treated as the empty string, so `'A = A + \"text\"'` works even if we haven't seen `'A'` before.\n",
"* When the program ends, the Snobol interpreter \n",
"prints out each variable (in sorted order), with its value, as a debugging aid.\n",
"\n",
"That means I could do away with the `dict` and `array` data structures, eliminating steps 1, 3, 4, and 5, and just do step 2! \n",
"That means I could use `$` to do away with the hash table and array data structures, eliminating steps 1, 3, 4, and 5, and just do step 2! \n",
"\n",
"# The Concordance Solution\n",
"\n",
@@ -50,8 +50,8 @@
"source": [
"program = \"\"\"\n",
"for i, line in enumerate(input):\n",
" for word in re.findall(r\"\\w+\", line.upper()):\n",
" $word += str(i) + ', '\n",
" for word in re.findall(\"[A-Z]+\", line.upper()):\n",
" $word = $word + i + \", \"\n",
"\"\"\""
]
},
@@ -61,38 +61,43 @@
"source": [
"That's just 3 lines, not 40 to 60! \n",
"\n",
"To test the program, I'll write a mock Snobol/Python interpreter, which at heart is just a call to the Python interpreter, `exec(program)`, except that it handles the three things I noticed about the Snobol interpreter:\n",
"To test the program, I'll write a mock Snobol/Python interpreter, which at heart is just a call to the Python interpreter, `exec(program)`, except that it handles the three things I mentioned about the Snobol interpreter, plus one more:\n",
"\n",
"* `$word` gets translated as `_context[word]`.\n",
"* It calls `exec(program, _context)`, where `_context` is a `defaultdict(str)`, so variables default to `''`.\n",
"* After the `exec` completes, the user-defined variables (but not the built-in ones) are printed."
"1. `$word` gets translated as `_globals[word]`.\n",
"2. The interpreter calls `exec(program, _globals)`, where `_globals` is a `defaultdict` that makes variables default to the empty string.\n",
"3. After the `exec` completes, the user-defined variables (but not the built-in ones) are printed.\n",
"4. Concatenating a string with an integer coerces the `int` to `str` automatically. I'll handle that with a `Str` class.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
"import re\n",
"\n",
"def snobol(program, data=''):\n",
" \"\"\"A Python interpreter with three Snobol-ish features:\n",
" (1) $word indirection; (2) variables default to ''; (3) post-mortem dump.\"\"\"\n",
" program = re.sub(r'\\$(\\w+)', r'_context[\\1]', program) # (1) \n",
" _context = defaultdict(str, vars(__builtins__)) # (2) \n",
" _context.update(re=re, input=data.splitlines(), _context=_context)\n",
" builtins = set(_context)\n",
" \"\"\"A Python interpreter with four Snobol-ish features:\n",
" 1. $word indirection; 2. variables default to empty string; \n",
" 3. post-mortem dump; 4. automatic coercing to string\"\"\"\n",
" program = re.sub(r'\\$(\\w+)', r'_globals[\\1]', program) # 1. \n",
" _globals = defaultdict(Str, vars(__builtins__)) # 4., 2.\n",
" _globals.update(re=re, input=data.splitlines(), _globals=_globals)\n",
" builtins = set(_globals) | {'__builtins__'}\n",
" try:\n",
" exec(program, _context)\n",
" exec(program, _globals)\n",
" finally:\n",
" print('-' * 79) # (3)\n",
" for name in sorted(_context):\n",
" if not (name in builtins or name == '__builtins__'):\n",
" print('{:10} = {}'.format(name, _context[name]))"
" print('-' * 79) # 3. \n",
" for name in sorted(_globals):\n",
" if name not in builtins:\n",
" print('{:10} = {}'.format(name, _globals[name]))\n",
" \n",
"class Str(str):\n",
" \"String class with automatic coercion for +\"\n",
" def __add__(self, other): return Str(str(self) + str(other))\n",
" def __radd__(self, other): return Str(str(other) + str(self))\n"
]
},
{
@@ -105,9 +110,7 @@
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [],
"source": [
"data = \"\"\"\n",
@@ -115,17 +118,17 @@
"Singin' \"Do wah diddy diddy dum diddy do\"\n",
"Snappin' her fingers and shufflin' her feet, \n",
"Singin' \"Do wah diddy diddy dum diddy do\"\n",
"She looked good (looked good), she looked fine (looked fine)\n",
"She looked good, she looked fine and I nearly lost my mind\n",
"She looked good (looked good), \n",
"She looked fine (looked fine)\n",
"She looked good, she looked fine \n",
"And I nearly lost my mind\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -133,24 +136,24 @@
"text": [
"-------------------------------------------------------------------------------\n",
"A = 1, \n",
"AND = 3, 6, \n",
"AND = 3, 8, \n",
"DIDDY = 2, 2, 2, 4, 4, 4, \n",
"DO = 2, 2, 4, 4, \n",
"DOWN = 1, \n",
"DUM = 2, 4, \n",
"FEET = 3, \n",
"FINE = 5, 5, 6, \n",
"FINE = 6, 6, 7, \n",
"FINGERS = 3, \n",
"GOOD = 5, 5, 6, \n",
"GOOD = 5, 5, 7, \n",
"HER = 3, 3, \n",
"I = 6, \n",
"I = 8, \n",
"JUST = 1, \n",
"LOOKED = 5, 5, 5, 5, 6, 6, \n",
"LOST = 6, \n",
"MIND = 6, \n",
"MY = 6, \n",
"NEARLY = 6, \n",
"SHE = 1, 5, 5, 6, 6, \n",
"LOOKED = 5, 5, 6, 6, 7, 7, \n",
"LOST = 8, \n",
"MIND = 8, \n",
"MY = 8, \n",
"NEARLY = 8, \n",
"SHE = 1, 5, 6, 7, 7, \n",
"SHUFFLIN = 3, \n",
"SINGIN = 2, 4, \n",
"SNAPPIN = 3, \n",
@@ -160,8 +163,8 @@
"WAH = 2, 4, \n",
"WALKIN = 1, \n",
"WAS = 1, \n",
"i = 6\n",
"line = She looked good, she looked fine and I nearly lost my mind\n",
"i = 8\n",
"line = And I nearly lost my mind\n",
"word = MIND\n"
]
}
@@ -174,15 +177,13 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Oops! The post-mortem printout includes the variables `i`, `line`, and `word`. Reluctantly, I increased the program's line count by 33%:"
"**Oops!** The post-mortem printout includes the variables `i`, `line`, and `word`. Reluctantly, I'll increase the program's line count by 33%:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
@@ -190,24 +191,24 @@
"text": [
"-------------------------------------------------------------------------------\n",
"A = 1, \n",
"AND = 3, 6, \n",
"AND = 3, 8, \n",
"DIDDY = 2, 2, 2, 4, 4, 4, \n",
"DO = 2, 2, 4, 4, \n",
"DOWN = 1, \n",
"DUM = 2, 4, \n",
"FEET = 3, \n",
"FINE = 5, 5, 6, \n",
"FINE = 6, 6, 7, \n",
"FINGERS = 3, \n",
"GOOD = 5, 5, 6, \n",
"GOOD = 5, 5, 7, \n",
"HER = 3, 3, \n",
"I = 6, \n",
"I = 8, \n",
"JUST = 1, \n",
"LOOKED = 5, 5, 5, 5, 6, 6, \n",
"LOST = 6, \n",
"MIND = 6, \n",
"MY = 6, \n",
"NEARLY = 6, \n",
"SHE = 1, 5, 5, 6, 6, \n",
"LOOKED = 5, 5, 6, 6, 7, 7, \n",
"LOST = 8, \n",
"MIND = 8, \n",
"MY = 8, \n",
"NEARLY = 8, \n",
"SHE = 1, 5, 6, 7, 7, \n",
"SHUFFLIN = 3, \n",
"SINGIN = 2, 4, \n",
"SNAPPIN = 3, \n",
@@ -223,8 +224,8 @@
"source": [
"program = \"\"\"\n",
"for i, line in enumerate(input):\n",
" for word in re.findall(r\"\\w+\", line.upper()):\n",
" $word += str(i) + ', '\n",
" for word in re.findall(\"[A-Z]+\", line.upper()):\n",
" $word = $word + i + \", \"\n",
"del i, line, word\n",
"\"\"\"\n",
"\n",
@@ -235,7 +236,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks good to me! \n",
"## Looks good to me! \n",
"\n",
"But sadly, the grader for the course did not agree, complaining that my program was not extensible: what if I wanted to cover two or more files in one run? What if I wanted the output to have a slightly different format? I argued that [YAGNI](https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it), and if the requirements\n",
"changed, *then* I would write the necessary 40 or 60 lines, but there's no sense doing that until then. The grader was not impressed with my arguments and I got points taken off. \n",
@@ -268,7 +269,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
"version": "3.5.3"
}
},
"nbformat": 4,

File diff suppressed because one or more lines are too long

View File

@@ -1592,7 +1592,7 @@
}
},
"source": [
"An *exhaustive search* considers *every* possible choice of parts, and selects the best solution. On each iteration exhaustive search picks a part (just like greedy search), but then it considers *both* using the part and not using the part. You can see that exhaustive search is almost identical to greedy search, except that it has *two* recursive calls (on lines 7 and 8) instead of *one* (on line 7). (*If you are viewing this in a iPython notebook, not just a web page, you can toggle line numbers by pressing 'ctrl-M L' within a cell.*) How do we choose between the results of the two calls? We need a cost function that we are trying to minimize. (For regex golf the cost of a solution is the length of the string.)"
"An *exhaustive search* considers *every* possible choice of parts, and selects the best solution. On each iteration exhaustive search picks a part (just like greedy search), but then it considers *both* using the part and not using the part. You can see that exhaustive search is almost identical to greedy search, except that it has *two* recursive calls (on lines 7 and 8) instead of *one* (on line 7). (*If you are viewing this in a IPython notebook, not just a web page, you can toggle line numbers by pressing 'ctrl-M L' within a cell.*) How do we choose between the results of the two calls? We need a cost function that we are trying to minimize. (For regex golf the cost of a solution is the length of the string.)"
]
},
{

View File

@@ -1,428 +0,0 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def partition(covers):\n",
" # covers: {w: {r,...}}\n",
" # invcovers: {r: {w,...}}\n",
" pass\n",
"\n",
"def connected(w, covers, invcovers, result):\n",
" if w not in result:\n",
" result.add(w)\n",
" for r in covers[w]:\n",
" for w2 in invcovers[r]:\n",
" connected(w2, covers, invcovers, result)\n",
" return result\n",
"\n",
"for (W, L, legend) in ALL:\n",
" covers = eliminate_dominated(regex_covers(W, L))\n",
" invcovers = invert_multimap(covers)\n",
" start = list(covers)[2]\n",
" P = connected(start, covers, invcovers, set())\n",
" print legend, len(P), len(covers), len(covers)-len(P)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finding Shorter Regexes: Trying Multiple Times\n",
"----\n",
" \n",
"Why run just two versions of `findregex`? Why not run 1000 variations, and then pick the best solution? Of course, I don't want to write 1000 different functions by hand; I want an automated way of varying each run. I can think of three easy things to vary:\n",
" \n",
"* The number '4' in the `score` function. That is, vary the tradeoff between number of winners matched and number of characters.\n",
"* The tie-breaker. In case of a tie, Python's `max` function always picks the first one. Let's make it choose a different 'best' regex from among all those that tie.\n",
"* The greediness. Don't be so greedy (picking the best) every time. Occasionally pick a not-quite-best component, and see if that works out better.\n",
" \n",
"The first of these is easy; we just use the `random.choice` function to choose an integer, `K`, to serve as the tradeoff factor. \n",
"\n",
"The second is easy too. We could write an alternative to the `max` function, say `max_random_tiebreaker`. That would work, but an easier approach is to build the tiebreaker into the `score` function. In addition to awarding points for matching winners and the number of characters, we will have add in a tiebreaker: a random number between 0 and 1. Since all the scores are otherwise integers, this will not change the order of the scores, but it will break ties.\n",
"\n",
"The third we can accomplish by allowing the random factor to be larger than 1 (allowing us to pick a component that is not the shortest) or even larger than `K` (allowing us to pick a component that does not cover the most winners). \n",
" \n",
"I will factor out the function `greedy_search` to do a single computation oof a covering regex, while keeping the name `findregex` for the top level function that now calls `greedy_search` 1000 times and chooses the best (shortest length) result."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"def findregex(winners, losers, tries=1000):\n",
" \"Find a regex that matches all winners but no losers (sets of strings).\"\n",
" # Repeatedly call 'findregex1' the given number of tries; pick the shortest result\n",
" covers = regex_covers(winners, losers)\n",
" results = [greedy_search(winners, covers) for _ in range(tries)]\n",
" return min(results, key=len)\n",
"\n",
"def greedy_search(winners, covers):\n",
" # On each iteration, add the 'best' component in covers to 'result',\n",
" # remove winners covered by best, and remove from 'pool' any components\n",
" # that no longer match any remaining winners.\n",
" winners = set(winners) # Copy input so as not to modify it.\n",
" pool = set(covers)\n",
" result = []\n",
" \n",
" def matches(regex, strings): return {w for w in covers[regex] if w in strings}\n",
" \n",
" K = random.choice((2, 3, 4, 4, 5, 6))\n",
" T = random.choice((1., 1.5, 2., K+1., K+2.))\n",
" def score(c): \n",
" return K * len(matches(c, winners)) - len(c) + random.uniform(0., T)\n",
" \n",
" while winners:\n",
" best = max(pool, key=score)\n",
" result.append(best)\n",
" winners -= covers[best]\n",
" pool -= {c for c in pool if covers[c].isdisjoint(winners)}\n",
" return OR(result)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def factorial1(n):\n",
" if (n <= 1):\n",
" return 1\n",
" else:\n",
" return n * factorial1(n-1)\n",
"\n",
"def factorial2(n, partial_solution=1):\n",
" if (n <= 1):\n",
" return partial_solution\n",
" else:\n",
" return factorial2(n-1, n * partial_solution)\n",
" \n",
"assert factorial1(6) == factorial2(6) == 720"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def findregex(winners, losers, calls=100000):\n",
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
" covers = regex_covers(winners, losers)\n",
" best = '^(' + OR(winners) + ')$'\n",
" state = Struct(best=best, calls=calls)\n",
" return bb_search('', covers, state).best\n",
"\n",
"def bb_search(regex, covers, state):\n",
" \"\"\"Recursively build a shortest regex from the components in covers.\"\"\"\n",
" if state.calls > 0:\n",
" state.calls -= 1\n",
" regex, covers = simplify_covers(regex, covers)\n",
" if not covers:\n",
" state.best = min(regex, state.best, key=len)\n",
" elif len(OR2(regex, min(covers, key=len))) < len(state.best):\n",
" # Try with and without the greedy-best component\n",
" def score(c): return 4 * len(covers[c]) - len(c)\n",
" best = max(covers, key=score)\n",
" covered = covers[best]\n",
" covers.pop(best)\n",
" bb_search(OR2(regex, best), {c:covers[c]-covered for c in covers}, state)\n",
" bb_search(regex, covers, state)\n",
" return state\n",
"\n",
"class Struct(object):\n",
" \"A mutable structure with specified fields and values.\"\n",
" def __init__(self, **kwds): vars(self).update(kwds)\n",
" def __repr__(self): return '<%s>' % vars(self)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def findregex(winners, losers, calls=100000):\n",
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
" covers = regex_covers(winners, losers)\n",
" solution = '^(' + OR(winners) + ')$'\n",
" solution, calls = bb_search('', covers, solution, calls)\n",
" return solution\n",
"\n",
"def bb_search(regex, covers, solution, calls):\n",
" \"\"\"Recursively build a shortest regex from the components in covers.\"\"\"\n",
" if calls > 0:\n",
" calls -= 1\n",
" regex, covers = simplify_covers(regex, covers)\n",
" if not covers: # Solution is complete\n",
" solution = min(regex, solution, key=len)\n",
" elif len(OR2(regex, min(covers, key=len))) < len(solution):\n",
" # Try with and without the greedy-best component\n",
" def score(c): return 4 * len(covers[c]) - len(c)\n",
" r = max(covers, key=score) # Best component\n",
" covered = covers[r] # Set of winners covered by r\n",
" covers.pop(r)\n",
" solution, calls = bb_search(OR2(regex, r), \n",
" {c:covers[c]-covered for c in covers}, \n",
" solution, calls)\n",
" solution, calls = bb_search(regex, covers, solution, calls)\n",
" return solution, calls"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def findregex(winners, losers, calls=100000):\n",
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
" global SOLUTION, CALLS\n",
" SOLUTION = '^(' + OR(winners) + ')$'\n",
" CALLS = calls\n",
" return bb_search(None, regex_covers(winners, losers))\n",
"\n",
"def bb_search(regex, covers):\n",
" \"\"\"Recursively build a shortest regex from the components in covers.\"\"\"\n",
" global SOLUTION, CALLS\n",
" CALLS -= 1\n",
" regex, covers = simplify_covers(regex, covers)\n",
" if not covers: # Solution is complete\n",
" SOLUTION = min(regex, SOLUTION, key=len)\n",
" elif CALLS >= 0 and len(OR(regex, min(covers, key=len))) < len(SOLUTION):\n",
" # Try with and without the greedy-best component\n",
" def score(c): return 4 * len(covers[c]) - len(c)\n",
" r = max(covers, key=score) # Best component\n",
" covered = covers[r] # Set of winners covered by r\n",
" covers.pop(r)\n",
" bb_search(OR(regex, r), {c:covers[c]-covered for c in covers})\n",
" bb_search(regex, covers)\n",
" return SOLUTION\n",
" \n",
"def OR(*regexes):\n",
" \"OR together regexes. Ignore 'None' components.\"\n",
" return '|'.join(r for r in regexes if r is not None)\n",
"\n",
"\n",
"def invert_multimap(multimap):\n",
" result = collections.defaultdict(list)\n",
" for key in multimap:\n",
" for val in multimap[key]:\n",
" result[val].append(key)\n",
" return result"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"## For debugging\n",
"\n",
"def findregex(winners, losers, calls=100000):\n",
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
" solution = '^(' + OR(winners) + ')$'\n",
" covers = regex_covers(winners, losers)\n",
" b = BranchBound(solution, calls)\n",
" b.search(None, covers)\n",
" print b.calls, 'calls', len(b.solution), 'len'\n",
" return b.solution\n",
"\n",
"\n",
"def triage_covers(partial, covers):\n",
" \"Simplify covers by eliminating dominated regexes, and picking ones that uniquely cover a winner.\"\n",
" previous = None\n",
" while covers != previous:\n",
" previous = covers\n",
" # Eliminate regexes that are dominated by another regex\n",
" covers = eliminate_dominated(covers) # covers = {regex: {winner,...}}\n",
" coverers = invert_multimap(covers) # coverers = {winner: {regex,...}}\n",
" # For winners covered by only one component, move winner from covers to regex\n",
" singletons = {coverers[w][0] for w in coverers if len(coverers[w]) == 1}\n",
" if singletons:\n",
" partial = OR(partial, OR(singletons))\n",
" covered = {w for c in singletons for w in covers[c]}\n",
" covers = {c:covers[c]-covered for c in covers if c not in singletons}\n",
" return partial, covers\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
", and to , who suggested looking at [WFSTs](http://www.openfst.org/twiki/bin/view/FST/WebHome)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"def regex_covers(winners, losers):\n",
" \"\"\"Generate regex components and return a dict of {regex: {winner...}}.\n",
" Each regex matches at least one winner and no loser.\"\"\"\n",
" losers_str = '\\n'.join(losers)\n",
" wholes = {'^'+winner+'$' for winner in winners}\n",
" parts = {d for w in wholes for p in subparts(w) for d in dotify(p)}\n",
" chars = set(cat(winners))\n",
" pairs = {A+'.'+rep_char+B for A in chars for B in chars for rep_char in '+*?'}\n",
" reps = {r for p in parts for r in repetitions(p)}\n",
" pool = wholes | parts | pairs | reps \n",
" searchers = [re.compile(c, re.MULTILINE).search for c in pool]\n",
" covers = {r: set(filter(searcher, winners)) \n",
" for (r, searcher) in zip(pool, searchers)\n",
" if not searcher(losers_str)}\n",
" covers = eliminate_dominated(covers)\n",
" return covers\n",
" return add_character_class_components(covers)\n",
"\n",
"def add_character_class_components(covers):\n",
" for (B, Ms, E) in combine_splits(covers):\n",
" N = len(Ms)\n",
" or_size = N*len(B+'.'+E) + N-1 # N=3 => 'B1E|B2E|B3E'\n",
" class_size = len(B+'[]'+E) + N # N=3 => 'B[123]E'\n",
" winners = {w for m in Ms for w in Ms[m]}\n",
" if class_size < or_size:\n",
" covers[B + make_char_class(Ms) + E] = winners\n",
" return covers\n",
"\n",
"def split3(word):\n",
" \"Splits a word into 3 parts, all ways, with middle part having 0 or 1 character.\"\n",
" return [(word[:i], word[i:i+L], word[i+L:]) \n",
" for i in range(len(word)+1) for L in (0, 1)\n",
" if not word[i:i+L].startswith(('.', '+', '*', '?'))]\n",
"\n",
"def combine_splits(covers):\n",
" \"Convert covers = {BME: {w...}} into a list of [(B, {M...}, E, {w...}].\"\n",
" table = collections.defaultdict(dict) # table = {(B, E): {M: {w...}}}\n",
" for r in covers:\n",
" for (B, M, E) in split3(r):\n",
" table[B, E][M] = covers[r]\n",
" return [(B, Ms, E) for ((B, E), Ms) in table.items()\n",
" if len(Ms) > 1]\n",
"\n",
"def make_char_class(chars):\n",
" chars = set(chars)\n",
" return '[%s]%s' % (cat(chars), ('?' if '' in chars else ''))\n",
"\n",
"covers = regex_covers(boys, girls)\n",
"old = set(covers)\n",
"print len(covers)\n",
"covers = add_character_class_components(covers)\n",
"print len(covers)\n",
"print set(covers) - old\n",
"\n",
"print dict(combine_splits({'..a': {1,2,3}, '..b': {4,5,6}, '..c':{7}}))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Consider the two components `'..a'` and `'..b'`. If we wanted to cover all the winners that both of these match, we could use `'..a|..b'`, or we could share the common prefix and introduce a *character class* to get `'..[ab]'`. Since the former is 7 characters and the later is only 6, the later would be preferred. It would be an even bigger win to replace `'..az|..bz|..cz'` with `'..[abc]z'`; that reduces the count from 14 to 8. Similarly, replacing `'..az|..bz|..z'` with `'..[ab]?z'` saves 5 characters.\n",
"\n",
"There seems to be potential savings with character classes. But how do we know which characters from which components to combine into classes? To keep things from getting out of control, I'm going to only look at components that are left after we eliminate dominated. That is not an ideal approach&mdash;there may well be some components that are dominated on their own, but could be part of an optimal solution when combined with other components into a character class. But I'm going to keep it simple."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"Searching: Better Bounds\n",
"----\n",
"\n",
"Branch and bound prunes the search tree whenever it is on a branch that is guaranteed to result in a solution that is no better than the best solution found so far. Currently we estimate the best possible solution along the current branch by taking the length of the partial solution and adding the length of the shortest component in `covers`. We do that because we know for sure that we need at least one component, but we don't know for sure how many components we'll need (nor how long each of them will be. So our estimate is often severely underestimates the true answer, which means we don't cut off search some places where we could, if only we had a better estimate.\n",
" \n",
"Here's one way to get a better bound. We'll define the following quantities:\n",
"\n",
"+ *P* = the length of the partial solution, plus the \"|\", if needed. So if the partial solution is `None`, then *P* will be zero, otherwise *P* is the length plus 1.\n",
"+ *S* = the length of the shortest regex component in `covers`.\n",
"+ *W* = the number of winners still in `covers`.\n",
"+ *C* = the largest number of winners covered by any regex in `covers`.\n",
"\n",
"If we assume The current estimate is *P* + *S*. We can see that a better estimate is *P* + *S* &times; ceil(*W* / *C*)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import math\n",
"\n",
"class BranchBound(object):\n",
" \"Hold state information for a branch and bound search.\"\n",
" def __init__(self, solution, calls):\n",
" self.solution, self.calls = solution, calls\n",
" \n",
" def search(self, covers, partial=None):\n",
" \"Recursively extend partial regex until it matches all winners in covers.\"\n",
" if self.calls <= 0: \n",
" return self.solution\n",
" self.calls -= 1\n",
" covers, partial = simplify_covers(covers, partial)\n",
" if not covers: # Nothing left to cover; solution is complete\n",
" self.solution = min(partial, self.solution, key=len)\n",
" else:\n",
" P = 0 if not partial else len(partial) + 1\n",
" S = len(min(covers, key=len))\n",
" C = max(len(covers[r]) for r in covers)\n",
" W = len(set(w for r in covers for w in covers[r]))\n",
" if P + S * math.ceil(W / C) < len(self.solution):\n",
" # Try with and without the greedy-best component\n",
" def score(r): return 4 * len(covers[r]) - len(r)\n",
" r = max(covers, key=score) # Best component\n",
" covered = covers[r] # Set of winners covered by r\n",
" covers.pop(r)\n",
" self.search({c:covers[c]-covered for c in covers}, OR(partial, r))\n",
" self.search(covers, partial)\n",
" return self.solution"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
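
The two bounds discussed in the "Better Bounds" cell of the removed notebook can be compared on a small example. This is a sketch under stated assumptions: `old_bound` and `better_bound` are hypothetical helper names, `covers` is assumed to be a non-empty dict of `{regex: set_of_winners}` with no empty sets, and the code targets Python 3, where `/` is true division.

```python
import math

def old_bound(partial, covers):
    "P + S: the partial solution plus one shortest component."
    P = 0 if not partial else len(partial) + 1   # +1 for the '|' separator
    S = len(min(covers, key=len))                # shortest component
    return P + S

def better_bound(partial, covers):
    "P + S * ceil(W / C): at least ceil(W / C) components are still needed."
    P = 0 if not partial else len(partial) + 1
    S = len(min(covers, key=len))
    C = max(len(winners) for winners in covers.values())   # most winners one regex covers
    W = len(set().union(*covers.values()))                 # winners still uncovered
    return P + S * math.ceil(W / C)
```

With `covers = {'a.': {'x', 'y'}, 'b': {'z'}, 'cc': {'x'}}` and no partial solution, the old bound is 1 while the better bound is 2, so the tighter estimate can prune branches that the old one would still explore. Both are lower bounds that ignore the `'|'` separators between the remaining components.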

View File

@@ -57,7 +57,7 @@ def standard_env():
'>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq,
'abs': abs,
'append': op.add,
'apply': apply,
'apply': lambda proc, args: proc(*args),
'begin': lambda *x: x[-1],
'car': lambda x: x[0],
'cdr': lambda x: x[1:],
@@ -142,4 +142,4 @@ def eval(x, env=global_env):
else: # (proc arg...)
proc = eval(x[0], env)
args = [eval(exp, env) for exp in x[1:]]
return proc(*args)
return proc(*args)
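
The first hunk above replaces the builtin `apply`, which exists in Python 2 but was removed in Python 3, with a lambda that unpacks the argument list. A minimal check of that replacement, where `scheme_apply` is a hypothetical name used only for this illustration:

```python
import operator as op

# Same expression as the new 'apply' entry in standard_env
scheme_apply = lambda proc, args: proc(*args)

# (apply + (list 1 2)) in Scheme terms
total = scheme_apply(op.add, [1, 2])

# Works for any arity, since args is unpacked into positional arguments
largest = scheme_apply(max, [3, 9, 4])
```

Argument unpacking with `*args` is valid in both Python 2 and Python 3, so the lambda keeps the environment portable across both.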

258
py/ngrams.py Normal file
View File

@@ -0,0 +1,258 @@
"""
Code to accompany the chapter "Natural Language Corpus Data"
from the book "Beautiful Data" (Segaran and Hammerbacher, 2009)
http://oreilly.com/catalog/9780596157111/
Code copyright (c) 2008-2009 by Peter Norvig
You are free to use this code under the MIT license:
http://www.opensource.org/licenses/mit-license.php
"""
import re, string, random, glob, operator, heapq
from collections import defaultdict
from math import log10
def memo(f):
"Memoize function f."
table = {}
def fmemo(*args):
if args not in table:
table[args] = f(*args)
return table[args]
fmemo.memo = table
return fmemo
def test(verbose=None):
"""Run some tests, taken from the chapter.
Since the hillclimbing algorithm is randomized, some tests may fail."""
import doctest
print 'Running tests...'
doctest.testfile('ngrams-test.txt', verbose=verbose)
################ Word Segmentation (p. 223)
@memo
def segment(text):
"Return a list of words that is the best segmentation of text."
if not text: return []
candidates = ([first]+segment(rem) for first,rem in splits(text))
return max(candidates, key=Pwords)
def splits(text, L=20):
"Return a list of all possible (first, rem) pairs, len(first)<=L."
return [(text[:i+1], text[i+1:])
for i in range(min(len(text), L))]
def Pwords(words):
"The Naive Bayes probability of a sequence of words."
return product(Pw(w) for w in words)
#### Support functions (p. 224)
def product(nums):
"Return the product of a sequence of numbers."
return reduce(operator.mul, nums, 1)
class Pdist(dict):
"A probability distribution estimated from counts in datafile."
def __init__(self, data=[], N=None, missingfn=None):
for key,count in data:
self[key] = self.get(key, 0) + int(count)
self.N = float(N or sum(self.itervalues()))
self.missingfn = missingfn or (lambda k, N: 1./N)
def __call__(self, key):
if key in self: return self[key]/self.N
else: return self.missingfn(key, self.N)
def datafile(name, sep='\t'):
"Read key,value pairs from file."
for line in file(name):
yield line.split(sep)
def avoid_long_words(key, N):
"Estimate the probability of an unknown word."
return 10./(N * 10**len(key))
N = 1024908267229 ## Number of tokens
Pw = Pdist(datafile('count_1w.txt'), N, avoid_long_words)
#### segment2: second version, with bigram counts, (p. 226-227)
def cPw(word, prev):
"Conditional probability of word, given previous word."
try:
return P2w[prev + ' ' + word]/float(Pw[prev])
except KeyError:
return Pw(word)
P2w = Pdist(datafile('count_2w.txt'), N)
@memo
def segment2(text, prev='<S>'):
"Return (log P(words), words), where words is the best segmentation."
if not text: return 0.0, []
candidates = [combine(log10(cPw(first, prev)), first, segment2(rem, first))
for first,rem in splits(text)]
return max(candidates)
def combine(Pfirst, first, (Prem, rem)):
"Combine first and rem results into one (probability, words) pair."
return Pfirst+Prem, [first]+rem
################ Secret Codes (p. 228-230)
def encode(msg, key):
"Encode a message with a substitution cipher."
return msg.translate(string.maketrans(ul(alphabet), ul(key)))
def ul(text): return text.upper() + text.lower()
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def shift(msg, n=13):
"Encode a message with a shift (Caesar) cipher."
return encode(msg, alphabet[n:]+alphabet[:n])
def logPwords(words):
"The Naive Bayes probability of a string or sequence of words."
if isinstance(words, str): words = allwords(words)
return sum(log10(Pw(w)) for w in words)
def allwords(text):
"Return a list of alphabetic words in text, lowercase."
return re.findall('[a-z]+', text.lower())
def decode_shift(msg):
"Find the best decoding of a message encoded with a shift cipher."
candidates = [shift(msg, n) for n in range(len(alphabet))]
return max(candidates, key=logPwords)
def shift2(msg, n=13):
"Encode with a shift (Caesar) cipher, yielding only letters [a-z]."
return shift(just_letters(msg), n)
def just_letters(text):
"Lowercase text and remove all characters except [a-z]."
return re.sub('[^a-z]', '', text.lower())
def decode_shift2(msg):
"Decode a message encoded with a shift cipher, with no spaces."
candidates = [segment2(shift(msg, n)) for n in range(len(alphabet))]
p, words = max(candidates)
return ' '.join(words)
#### General substitution cipher (p. 231-233)
def logP3letters(text):
"The log-probability of text using a letter 3-gram model."
return sum(log10(P3l(g)) for g in ngrams(text, 3))
def ngrams(seq, n):
"List all the (overlapping) ngrams in a sequence."
return [seq[i:i+n] for i in range(1+len(seq)-n)]
P3l = Pdist(datafile('count_3l.txt'))
P2l = Pdist(datafile('count_2l.txt')) ## We'll need it later
def hillclimb(x, f, neighbors, steps=10000):
"Search for an x that maximizes f(x), considering neighbors(x)."
fx = f(x)
neighborhood = iter(neighbors(x))
for i in range(steps):
x2 = neighborhood.next()
fx2 = f(x2)
if fx2 > fx:
x, fx = x2, fx2
neighborhood = iter(neighbors(x))
if debugging: print 'hillclimb:', x, int(fx)
return x
debugging = False
def decode_subst(msg, steps=4000, restarts=90):
    "Decode a substitution cipher with random restart hillclimbing."
    msg = cat(allwords(msg))
    candidates = [hillclimb(encode(msg, key=cat(shuffled(alphabet))),
                            logP3letters, neighboring_msgs, steps)
                  for _ in range(restarts)]
    p, words = max(segment2(c) for c in candidates)
    return ' '.join(words)

def shuffled(seq):
    "Return a randomly shuffled copy of the input sequence."
    seq = list(seq)
    random.shuffle(seq)
    return seq

cat = ''.join

def neighboring_msgs(msg):
    "Generate nearby keys, hopefully better ones."
    def swap(a,b): return msg.translate(string.maketrans(a+b, b+a))
    for bigram in heapq.nsmallest(20, set(ngrams(msg, 2)), P2l):
        b1,b2 = bigram
        for c in alphabet:
            if b1==b2:
                if P2l(c+c) > P2l(bigram): yield swap(c,b1)
            else:
                if P2l(c+b2) > P2l(bigram): yield swap(c,b1)
                if P2l(b1+c) > P2l(bigram): yield swap(c,b2)
    while True:
        yield swap(random.choice(alphabet), random.choice(alphabet))

################ Spelling Correction (p. 236-)

def corrections(text):
    "Spell-correct all words in text."
    return re.sub('[a-zA-Z]+', lambda m: correct(m.group(0)), text)

def correct(w):
    "Return the word that is the most likely spell correction of w."
    candidates = edits(w).items()
    ## Each candidate is a (correction, edit) pair; index it rather than
    ## unpacking in the lambda signature, which Python 3 does not allow.
    c, edit = max(candidates, key=lambda ce: Pedit(ce[1]) * Pw(ce[0]))
    return c

def Pedit(edit):
    "The probability of an edit; can be '' or 'a|b' or 'a|b+c|d'."
    if edit == '': return (1. - p_spell_error)
    return p_spell_error*product(P1edit(e) for e in edit.split('+'))

p_spell_error = 1./20.

P1edit = Pdist(datafile('count_1edit.txt')) ## Probabilities of single edits

def edits(word, d=2):
    "Return a dict of {correct: edit} pairs within d edits of word."
    results = {}
    def editsR(hd, tl, d, edits):
        def ed(L,R): return edits+[R+'|'+L]
        C = hd+tl
        if C in Pw:
            e = '+'.join(edits)
            if C not in results: results[C] = e
            else: results[C] = max(results[C], e, key=Pedit)
        if d <= 0: return
        extensions = [hd+c for c in alphabet if hd+c in PREFIXES]
        p = (hd[-1] if hd else '<') ## previous character
        ## Insertion
        for h in extensions:
            editsR(h, tl, d-1, ed(p+h[-1], p))
        if not tl: return
        ## Deletion
        editsR(hd, tl[1:], d-1, ed(p, p+tl[0]))
        for h in extensions:
            if h[-1] == tl[0]: ## Match
                editsR(h, tl[1:], d, edits)
            else: ## Replacement
                editsR(h, tl[1:], d-1, ed(h[-1], tl[0]))
        ## Transpose
        if len(tl)>=2 and tl[0]!=tl[1] and hd+tl[1] in PREFIXES:
            editsR(hd+tl[1], tl[0]+tl[2:], d-1,
                   ed(tl[1]+tl[0], tl[0:2]))
    ## Body of edits:
    editsR('', word, d, [])
    return results

PREFIXES = set(w[:i] for w in Pw for i in range(len(w) + 1))


@@ -1 +1,2 @@
numpy
matplotlib