added jupyter notebooks
parent b495fc323c
commit c08ec2a796
3226
Advent of Code.ipynb
Normal file
File diff suppressed because it is too large
1096
Beal.ipynb
Normal file
File diff suppressed because it is too large
487
Cheryl's Birthday.ipynb
Normal file
@ -0,0 +1,487 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"When is Cheryl's Birthday?\n",
|
||||
"===\n",
|
||||
"\n",
|
||||
"[This puzzle](https://www.google.com/webhp?#q=cheryl%27s%20birthday) has been making the rounds:\n",
|
||||
"\n",
|
||||
"1. Albert and Bernard just became friends with Cheryl, and they want to know when her birthday is. Cheryl gave them a list of 10 possible dates:\n",
|
||||
"<pre>\n",
"    May 15       May 16       May 19\n",
"   June 17      June 18\n",
"   July 14      July 16\n",
" August 14    August 15    August 17\n",
|
||||
"</pre>\n",
|
||||
" \n",
"2. Cheryl then tells Albert and Bernard separately the month and the day of the birthday respectively.\n",
|
||||
" \n",
|
||||
"3. Albert: I don't know when Cheryl's birthday is, but I know that Bernard does not know too.\n",
|
||||
" \n",
|
||||
"4. Bernard: At first I don't know when Cheryl's birthday is, but I know now.\n",
|
||||
" \n",
|
||||
"5. Albert: Then I also know when Cheryl's birthday is.\n",
|
||||
" \n",
|
||||
"6. So when is Cheryl's birthday?\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"1. Cheryl gave them a list of 10 possible dates:\n",
|
||||
"---\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"DATES = ['May 15',    'May 16',    'May 19',\n",
"         'June 17',   'June 18',\n",
"         'July 14',   'July 16',\n",
"         'August 14', 'August 15', 'August 17']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We'll define accessor functions for the month and day of a date:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def Month(date): return date.split()[0]\n",
|
||||
"\n",
|
||||
"def Day(date): return date.split()[1]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'May'"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"Month('May 15')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'15'"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"Day('May 15')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"2. Cheryl then tells Albert and Bernard separately the month and the day of the birthday respectively.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"We can define the idea of **telling**, and while we're at it, the idea of **knowing** a birthdate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def tell(part, possible_dates=DATES):\n",
"    \"Cheryl tells a part of her birthdate to someone; return a new list of possible dates that match the part.\"\n",
"    return [date for date in possible_dates if part in date]\n",
"\n",
"def know(possible_dates):\n",
"    \"A person knows the birthdate if they have exactly one possible date.\"\n",
"    return len(possible_dates) == 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that we use a *list of dates* to represent someone's knowledge of the possible birthdates, and that someone *knows* the birthdate when they get down to only one possibility. For example: If Cheryl tells Albert that her birthday is in May, he would know there is a list of three possible birthdates:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['May 15', 'May 16', 'May 19']"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tell('May')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And if she tells Bernard that her birthday is on the 15th, he would know there are two possibilities:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['May 15', 'August 15']"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tell('15')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"With two possibilities, Bernard does not know the birthdate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"False"
|
||||
]
|
||||
},
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"know(tell('15'))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Overall Strategy\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"When Cheryl tells Albert `'May'` then *he* knows there are three possibilities, but *we* (the puzzle solvers) don't, because we don't know what Cheryl said. So what can we do? We will consider *all* of the possible dates, one at a time. For example, first consider `'May 15'`. Cheryl tells Albert `'May'` and Bernard `'15'`, giving them the lists of possible birthdates shown above. We can then check whether statements 3 through 5 are true in this scenario. If they are, then `'May 15'` is a solution to the puzzle. Repeat the process for each of the other possible dates. If all goes well, there should be exactly one solution. \n",
|
||||
"\n",
|
||||
"Here is the main function, `cheryls_birthday`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def cheryls_birthday(possible_dates=DATES):\n",
"    \"Return a list of the possible dates for which all three statements are true.\"\n",
"    return list(filter(all3, possible_dates))  # list(...) because filter is lazy in Python 3\n",
"\n",
"def all3(date): return statement3(date) and statement4(date) and statement5(date)\n",
"\n",
"## TO DO: define statement3, statement4, statement5"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
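"(*Python note:* `filter(predicate, items)` yields every item for which `predicate(item)` is true. In Python 3 it returns a lazy iterator rather than a list, so wrap it in `list(...)` wherever a list is needed, for example before calling `len`.)\n",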
|
||||
" \n",
|
||||
" 3. Albert: I don't know when Cheryl's birthday is, but I know that Bernard does not know too.\n",
|
||||
"---"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The function `statement3` takes as input a possible birthdate and returns true if Albert's statement is true for that birthdate. How do we go from Albert's English statement to a Python function? Let's paraphrase in a form that is closer to Python code:\n",
|
||||
"\n",
|
||||
"> **Albert**: After Cheryl told me the month of her birthdate, I didn't know her birthday. I also don't know what day Cheryl told Bernard, but for *any* of the possible dates, if Bernard was told the day of that date, he would not know Cheryl's birthday.\n",
|
||||
"\n",
|
||||
"That I can translate directly into code:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def statement3(date):\n",
"    \"Albert: I don't know when Cheryl's birthday is, but I know that Bernard does not know too.\"\n",
"    possible_dates = tell(Month(date))\n",
"    return (not know(possible_dates)\n",
"            and all(not know(tell(Day(d)))\n",
"                    for d in possible_dates))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We haven't solved the puzzle yet, but let's take a peek and see which dates satisfy statement 3:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['July 14', 'July 16', 'August 14', 'August 15', 'August 17']"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
"list(filter(statement3, DATES))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"4. Bernard: At first I don't know when Cheryl's birthday is, but I know now.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"Again, a paraphrase:\n",
|
||||
"\n",
|
||||
"> **Bernard:** At first Cheryl told me the day, and I didn't know. Then I considered just the dates for which Albert's statement 3 is true, and now I know."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 12,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def statement4(date):\n",
"    \"Bernard: At first I don't know when Cheryl's birthday is, but I know now.\"\n",
"    at_first = tell(Day(date))\n",
"    return (not know(at_first)\n",
"            and know(list(filter(statement3, at_first))))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's see which dates satisfy both statement 3 and statement 4:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['July 16', 'August 15', 'August 17']"
|
||||
]
|
||||
},
|
||||
"execution_count": 13,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
"list(filter(statement4, filter(statement3, DATES)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Wait a minute—I thought that Bernard **knew**?! Why are there three possible dates? Bernard does indeed know; it is just that we, the puzzle solvers, don't know. That's because Bernard knows something we don't know: the day. We won't know until after statement 5.\n",
|
||||
"\n",
|
||||
"5. Albert: Then I also know when Cheryl's birthday is.\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"Albert is saying that after hearing the month and Bernard's statement 4, he now knows Cheryl's birthday:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 14,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def statement5(date):\n",
"    \"Albert: Then I also know when Cheryl's birthday is.\"\n",
"    return know(list(filter(statement4, tell(Month(date)))))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"6. So when is Cheryl's birthday?\n",
|
||||
"---\n",
|
||||
"\n",
|
||||
"Let's see:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"['July 16']"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"cheryls_birthday()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Success!** We have deduced that Cheryl's birthday is **July 16**. It is now `True` that we know Cheryl's birthday:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 16,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"True"
|
||||
]
|
||||
},
|
||||
"execution_count": 16,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"know(cheryls_birthday())"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
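A self-contained Python 3 sketch of the whole Cheryl's Birthday solver, for readers who want to run it outside the notebook. Python 3's `filter` returns a lazy iterator with no `len`, so this version builds real lists through a small helper. The helper name `satisfying` is an assumption introduced here for illustration; every other name follows the notebook.

```python
DATES = ['May 15',    'May 16',    'May 19',
         'June 17',   'June 18',
         'July 14',   'July 16',
         'August 14', 'August 15', 'August 17']

def Month(date): return date.split()[0]

def Day(date): return date.split()[1]

def tell(part, possible_dates=DATES):
    "The possible dates that match the part (a month or a day) Cheryl revealed."
    return [date for date in possible_dates if part in date]

def know(possible_dates):
    "A person knows the birthdate if exactly one possible date remains."
    return len(possible_dates) == 1

def satisfying(predicate, dates):
    "Hypothetical helper (not in the notebook): a list-returning filter."
    return [date for date in dates if predicate(date)]

def statement3(date):
    "Albert does not know, and is sure that Bernard does not know either."
    possible_dates = tell(Month(date))
    return (not know(possible_dates)
            and all(not know(tell(Day(d))) for d in possible_dates))

def statement4(date):
    "Bernard did not know at first, but knows after hearing statement 3."
    at_first = tell(Day(date))
    return not know(at_first) and know(satisfying(statement3, at_first))

def statement5(date):
    "Albert knows after hearing statement 4."
    return know(satisfying(statement4, tell(Month(date))))

def cheryls_birthday(possible_dates=DATES):
    "All dates for which statements 3, 4 and 5 hold."
    return [date for date in possible_dates
            if statement3(date) and statement4(date) and statement5(date)]

print(cheryls_birthday())
```

Calling `satisfying(statement3, DATES)` reproduces the five July and August candidates shown in the notebook output, and `cheryls_birthday()` narrows them to a single date.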
1193
Cheryl-and-Eve.ipynb
Normal file
File diff suppressed because it is too large
2039
Convex Hull.ipynb
Normal file
File diff suppressed because one or more lines are too long
1919
Countdown.ipynb
Normal file
File diff suppressed because it is too large
1392
Differentiation.ipynb
Normal file
File diff suppressed because it is too large
1246
Economics.ipynb
Normal file
File diff suppressed because one or more lines are too long
1477
Fred Buns.ipynb
Normal file
File diff suppressed because one or more lines are too long
3785
Gesture Typing.ipynb
Normal file
File diff suppressed because one or more lines are too long
1703
Golomb-Puzzle.ipynb
Normal file
File diff suppressed because one or more lines are too long
732
How To Count Things.ipynb
Normal file
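A minimal sketch of the `(A, L)` summary idea used in the notebook whose diff follows, cross-checked against brute-force enumeration. The names `ok`, `next_summary` and `summary_for` mirror the notebook (with `summary_for` written as a loop here); `brute_force_summary` is a hypothetical helper added only for this check.

```python
import itertools
import re
from collections import defaultdict

def ok(record):
    "A record is ok unless it has three consecutive 'L's or two 'A's."
    return not re.search(r'LLL|A.*A', record)

def next_summary(prev_summary):
    "Map {(A, L): count} for length n to the summary for length n + 1."
    summary = defaultdict(int)
    for (a, l), count in prev_summary.items():
        if a < 1: summary[a + 1, 0] += count  # append 'A': trailing-L run resets
        if l < 2: summary[a, l + 1] += count  # append 'L': trailing-L run grows
        summary[a, 0] += count                # append 'P': trailing-L run resets
    return dict(summary)

def summary_for(n):
    "Build the summary bottom-up, starting from the empty string."
    summary = {(0, 0): 1}
    for _ in range(n):
        summary = next_summary(summary)
    return summary

def brute_force_summary(n):
    "Hypothetical helper: group every ok record of length n by (A count, trailing L run)."
    summary = defaultdict(int)
    for chars in itertools.product('LAP', repeat=n):
        record = ''.join(chars)
        if ok(record):
            trailing_ls = len(record) - len(record.rstrip('L'))
            summary[record.count('A'), trailing_ls] += 1
    return dict(summary)

for n in range(8):
    assert summary_for(n) == brute_force_summary(n)
print(summary_for(2))
```

For `N = 2` both functions agree on `(0, 0): 2`, `(0, 1): 1`, `(0, 2): 1`, `(1, 0): 3` and `(1, 1): 1`, eight ok records in total.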
@ -0,0 +1,732 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import re\n",
|
||||
"import itertools\n",
|
||||
"from collections import defaultdict"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# How to Count Things\n",
|
||||
"\n",
|
||||
"## Student Records: Late, Absent, Present\n",
|
||||
"\n",
|
||||
"Consider this problem:\n",
|
||||
"\n",
"> (1) Students at a school must meet with the guidance counselor if they have two absences, or three consecutive late days. Each student's attendance record consists of a string of 'A' for absent, 'L' for late, or 'P' for present. For example: \"LAPLPA\" requires a meeting (because there are two absences), and \"LAPLPL\" is OK (there are three late days, but they are not consecutive). Write a function that takes such a string as input and returns true if the student's record is OK. \n",
|
||||
"\n",
|
||||
"> (2) Write a function to calculate the number of attendance records of length N that are OK.\n",
|
||||
"\n",
|
||||
"For part (1), the simplest approach is to use `re.search`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def ok(record: str) -> bool: return not re.search(r'LLL|A.*A', record)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 57,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'ok'"
|
||||
]
|
||||
},
|
||||
"execution_count": 57,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
"def test_ok():\n",
"    assert ok(\"LAPLLP\")\n",
"    assert not ok(\"LAPLLL\")    # 3 Ls in a row\n",
"    assert not ok(\"LAPLLA\")    # 2 As overall\n",
"    assert ok(\"APLLPLLP\")\n",
"    assert not ok(\"APLLPLLL\")  # 3 Ls in a row\n",
"    assert not ok(\"APLLPLLA\")  # 2 As overall\n",
"    return 'ok'\n",
"\n",
"test_ok()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For part (2), I'll start with a simple (but slow) solution that enumerates all possible strings (using `itertools.product`) and checks each one. I use the `quantify` recipe from `itertools` to count them:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 29,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def total_ok_slow(N: int) -> int:\n",
"    \"How many strings over 'LAP' of length N are ok?\"\n",
"    return quantify(all_strings('LAP', N), ok)\n",
"\n",
"def quantify(iterable, pred=bool):\n",
"    \"Count how many times the predicate is true of items in iterable.\"\n",
"    return sum(map(pred, iterable))\n",
"\n",
"cat = ''.join\n",
"\n",
"def all_strings(alphabet, N): return map(cat, itertools.product(alphabet, repeat=N))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{0: 1,\n",
|
||||
" 1: 3,\n",
|
||||
" 2: 8,\n",
|
||||
" 3: 19,\n",
|
||||
" 4: 43,\n",
|
||||
" 5: 94,\n",
|
||||
" 6: 200,\n",
|
||||
" 7: 418,\n",
|
||||
" 8: 861,\n",
|
||||
" 9: 1753,\n",
|
||||
" 10: 3536}"
|
||||
]
|
||||
},
|
||||
"execution_count": 30,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"{N: total_ok_slow(N) for N in range(11)}"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"This looks good, but\n",
|
||||
"I will need a more efficient algorithm to handle large values of *N*. Here's how I think about it:\n",
|
||||
"\n",
|
||||
"* I can't enumerate all the strings; there are too many of them, 3<sup>N</sup>. \n",
|
||||
"* Even if I only enumerate the ok strings, there are still too many, O(2<sup>N</sup>).\n",
|
||||
"* Instead, I'll want to keep track of a *summary* of all the ok strings of length *N*, and use that to quickly compute a summary of the ok strings of length *N*+1. I recognize this as a *[dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming)* approach.\n",
|
||||
"\n",
|
||||
"* What is in the summary? A list of all ok strings is too much. A count of the number of ok strings is not enough. Instead, I will group together the strings that have the same number of `'A'` characters in them, and the same number of consecutive `'L'` characters at the end of the string, and count them. I don't need to count strings that have two or more `'A'` characters, or 3 consecutive `'L'` characters anywhere in the string. And I don't need to worry about runs of 1 or 2 `'L'` characters embedded in the middle of the string. So the summary is a mapping of the form `{(A, L): count, ...}`. \n",
|
||||
"\n",
|
||||
"* For *N* = 2, the summary looks like this:\n",
|
||||
"\n",
|
||||
"\n",
"        #(A, L): count\n",
"        {(0, 0): 2, # LP, PP\n",
"         (0, 1): 1, # PL\n",
"         (0, 2): 1, # LL\n",
"         (1, 0): 3, # AP, LA, PA\n",
"         (1, 1): 1} # AL\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Here is a function to create the summary for `N+1`, given the summary for `N`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 33,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def next_summary(prev_summary):\n",
"    \"Given a summary of the form {(A, L): count, ...}, return the summary for strings one character longer.\"\n",
"    summary = defaultdict(int)\n",
"    for (A, L), c in prev_summary.items():\n",
"        if A < 1: summary[A+1, 0] += c # transition with 'A'\n",
"        if L < 2: summary[A, L+1] += c # transition with 'L'\n",
"        summary[A, 0] += c             # transition with 'P'\n",
"    return summary"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"For `N = 0`, the summary is `{(0, 0): 1}`, because there is exactly one string of length zero, the empty string, and it contains neither an `'A'` nor an `'L'`. Each longer summary is computed from the one before it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Here's a \"bottom-up\" approach for `total_ok` that starts at `0` and works up to `N`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 34,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def total_ok(N):\n",
"    \"How many strings of length N are ok?\"\n",
"    summary = {(0, 0): 1}\n",
"    for _ in range(N):\n",
"        summary = next_summary(summary)\n",
"    return sum(summary.values())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can use this to go way beyond what we could do with `total_ok_slow`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 35,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 2.43 ms, sys: 50 µs, total: 2.48 ms\n",
|
||||
"Wall time: 2.48 ms\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"5261545087067582125179062608958232695543100705754634272071166414871321070487675367"
|
||||
]
|
||||
},
|
||||
"execution_count": 35,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%time total_ok(300)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"There are over 10<sup>80</sup> ok strings of length 300, more than the estimated number of atoms in the observable universe. Yet it took only a few milliseconds to count them.\n",
"\n",
"Dynamic programming can also be done top-down (where we start at `N` and work down to `0`):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 36,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def total_ok(N):\n",
"    \"How many strings of length N are ok?\"\n",
"    return sum(summary_for(N).values())\n",
"\n",
"def summary_for(N): return ({(0, 0): 1} if N == 0 else next_summary(summary_for(N - 1)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 2.33 ms, sys: 538 µs, total: 2.87 ms\n",
|
||||
"Wall time: 4.83 ms\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"5261545087067582125179062608958232695543100705754634272071166414871321070487675367"
|
||||
]
|
||||
},
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"%time total_ok(300)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's make sure we're getting the same results as before, and take a look at the summaries for the first few values of `N`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" 0 1 {(0, 0): 1}\n",
|
||||
" 1 3 {(0, 1): 1, (1, 0): 1, (0, 0): 1}\n",
|
||||
" 2 8 {(0, 1): 1, (1, 0): 3, (0, 0): 2, (0, 2): 1, (1, 1): 1}\n",
|
||||
" 3 19 {(0, 1): 2, (1, 2): 1, (0, 0): 4, (1, 0): 8, (0, 2): 1, (1, 1): 3}\n",
|
||||
" 4 43 {(0, 1): 4, (1, 2): 3, (0, 0): 7, (1, 0): 19, (0, 2): 2, (1, 1): 8}\n",
|
||||
" 5 94 {(0, 1): 7, (1, 2): 8, (0, 0): 13, (1, 0): 43, (0, 2): 4, (1, 1): 19}\n",
|
||||
" 6 200 {(0, 1): 13, (1, 2): 19, (0, 0): 24, (1, 0): 94, (0, 2): 7, (1, 1): 43}\n",
|
||||
" 7 418 {(0, 1): 24, (1, 2): 43, (0, 0): 44, (1, 0): 200, (0, 2): 13, (1, 1): 94}\n",
|
||||
" 8 861 {(0, 1): 44, (1, 2): 94, (0, 0): 81, (1, 0): 418, (0, 2): 24, (1, 1): 200}\n",
|
||||
" 9 1753 {(0, 1): 81, (1, 2): 200, (0, 0): 149, (1, 0): 861, (0, 2): 44, (1, 1): 418}\n",
|
||||
"10 3536 {(0, 1): 149, (1, 2): 418, (0, 0): 274, (1, 0): 1753, (0, 2): 81, (1, 1): 861}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
"for N in range(11):\n",
"    assert total_ok(N) == total_ok_slow(N)\n",
"    print('{:2} {:4} {}'.format(N, total_ok(N), dict(summary_for(N))))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"## Count Strings with Alphabetic First Occurrences\n",
|
||||
"\n",
|
||||
"> Given an alphabet of length k, how many strings of length k can be formed such that the first occurrences of each character in the string are a prefix of the alphabet?\n",
|
||||
"\n",
|
||||
"Let's first make sure we understand the problem. I will choose to represent a string as a list of integers, like `[0, 1, 2]` rather than as a `str` like `\"abc\"`, and the alphabet will always be `range(k)`. So, the string `[0, 1, 0, 2]` would be valid, because the first occurrences are `[0, 1, 2]`, but `[0, 1, 0, 3]` would not be valid, since `[0, 1, 3]` is not a prefix of `range(4)`. I'll define three key concepts:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def valid(s): return is_prefix(first_occurrences(s))\n",
"\n",
"def is_prefix(s): return s == list(range(len(s)))\n",
"\n",
"def first_occurrences(s):\n",
"    \"The unique elements of s, in the order they appear.\"\n",
"    firsts = []\n",
"    for x in s:\n",
"        if x not in firsts: firsts.append(x)\n",
"    return firsts"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'ok'"
|
||||
]
|
||||
},
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
"def test(): # columns: s, firsts(s), valid(s)\n",
"    assert test1([0, 1, 2], [0, 1, 2], True)\n",
"    assert test1([0, 0, 0], [0], True)\n",
"    assert test1([1], [1], False)\n",
"    assert test1([0, 1, 3], [0, 1, 3], False)\n",
"    assert test1([0, 1, 3, 2], [0, 1, 3, 2], False)\n",
"    assert test1([0, 1, 0, 1, 0, 2, 1], [0, 1, 2], True)\n",
"    assert test1([0, 1, 0, 2, 1, 3, 1, 2, 5, 4, 3], [0, 1, 2, 3, 5, 4], False)\n",
"    return 'ok'\n",
"\n",
"def test1(s, firsts, is_valid):\n",
"    \"\"\"Test whether first_occurrences(s) == firsts and valid(s) == is_valid\"\"\"\n",
"    return first_occurrences(s) == firsts and valid(s) == is_valid\n",
"\n",
"test()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"First, I will solve the problem in a slow but sure way: generate all possible strings, then count the number that are valid. The complexity of this algorithm is $O(k^{k+1})$, because there are $k^k$ strings, and to validate a string requires looking at all $k$ characters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 41,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[1, 1, 2, 5, 15, 52, 203]"
|
||||
]
|
||||
},
|
||||
"execution_count": 41,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
"import itertools\n",
"\n",
"all_strings = itertools.product\n",
"\n",
"def how_many_slow(k):\n",
"    \"\"\"Count the number of valid strings. (Try all possible strings.)\"\"\"\n",
"    return sum(valid(s) for s in all_strings(range(k), repeat=k))\n",
"\n",
"[how_many_slow(k) for k in range(7)]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
"Now let's think about how to speed that up. I don't want to have to consider every possible string, because there are too many ($k^k$) of them. Can I group together many strings and just count the number of them, without enumerating each one? For example, if I knew there were 52 valid strings of length $k-1$ (and didn't know anything else about them), can I tell how many valid strings of length $k$ there are? I don't see a way to do this, because the number of ways to extend a valid string is dependent on the number of distinct characters in the string. If a string has $m$ distinct characters, then I can extend it by repeating any of those $m$ characters, or by introducing a first occurrence of character number $m+1$.\n",
|
||||
"\n",
|
||||
"So I need to keep track of the number of valid strings of length $k$ that have exactly $m$ distinct characters (which, by definition, must be `range(m)`). I'll call that number `C(k, m)`. Then I can define `how_many(k)` as the sum over all values of `m` of `C(k, m)`:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"from functools import lru_cache\n",
"\n",
"@lru_cache()\n",
"def C(k, m):\n",
"    \"Count the number of valid strings of length k, that use m distinct characters.\"\n",
"    return (1 if k == 0 == m else\n",
"            0 if k == 0 != m else\n",
"            C(k-1, m) * m + C(k-1, m-1)) # m ways to add an old character; 1 way to add new\n",
"\n",
"def how_many(k): return sum(C(k, m) for m in range(k+1))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 43,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"47585391276764833658790768841387207826363669686825611466616334637559114497892442622672724044217756306953557882560751"
|
||||
]
|
||||
},
|
||||
"execution_count": 43,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"how_many(100)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 44,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"assert(all(how_many(k) == how_many_slow(k) for k in range(7)))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 68,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" 0 10^0 1\n",
|
||||
" 1 10^0 1\n",
|
||||
" 2 10^0 2\n",
|
||||
" 3 10^1 5\n",
|
||||
" 4 10^1 15\n",
|
||||
" 5 10^2 52\n",
|
||||
" 6 10^2 203\n",
|
||||
" 7 10^3 877\n",
|
||||
" 8 10^4 4140\n",
|
||||
" 9 10^4 21147\n",
|
||||
" 10 10^5 115975\n",
|
||||
" 20 10^14 51724158235372\n",
|
||||
" 30 10^24 846749014511809332450147\n",
|
||||
" 40 10^35 157450588391204931289324344702531067\n",
|
||||
" 50 10^47 185724268771078270438257767181908917499221852770\n",
|
||||
" 60 10^60 976939307467007552986994066961675455550246347757474482558637\n",
|
||||
" 70 10^73 18075003898340511237556784424498369141305841234468097908227993035088029195\n",
|
||||
" 80 10^87 991267988808424794443839434655920239360814764000951599022939879419136287216681744888844\n",
|
||||
" 90 10^101 141580318123392930464192819123202606981284563291786545804370223525364095085412667328027643050802912567\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import math\n",
|
||||
"\n",
|
||||
"for k in itertools.chain(range(10), range(10, 100, 10)):\n",
|
||||
" n = how_many(k)\n",
|
||||
" print('{:3} 10^{:<3} {:d}'.format(k, round(math.log10(n)), n))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 74,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"-11 -9 5\n",
|
||||
"-5 9 11\n",
|
||||
"-1 4 11\n",
|
||||
"done 20\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"N = 20\n",
|
||||
"for a, b, p in itertools.combinations(range(-N, N), 3):\n",
|
||||
" if (b + p) * (a + p) * (a + b) == 0: continue\n",
|
||||
" if a/(b + p) + b/(a + p) + p/(a + b) == 4:\n",
|
||||
" print(a, b, p)\n",
|
||||
"print('done', N)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"(a^3 + a^2 b + a^2 p + a b^2 + 3 a b p + a p^2 + b^3 + b^2 p + b p^2 + p^3)/((a + b) (a + p) (b + p)) = 4"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 104,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"CPU times: user 25 s, sys: 106 ms, total: 25.1 s\n",
|
||||
"Wall time: 25.3 s\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"945"
|
||||
]
|
||||
},
|
||||
"execution_count": 104,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sides = range(1, 11)\n",
|
||||
"from itertools import permutations\n",
|
||||
"\n",
|
||||
"def sortuple(items): return tuple(sorted(items))\n",
|
||||
"\n",
|
||||
"def sets_of_rectangles(sides): return set(map(set_of_rectangles, permutations(sides)))\n",
|
||||
"\n",
|
||||
"def set_of_rectangles(sides): return sortuple(sortuple(sides[i:i+2]) for i in range(0, len(sides), 2))\n",
|
||||
"\n",
|
||||
"%time len(sets_of_rectangles(sides))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 99,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))"
|
||||
]
|
||||
},
|
||||
"execution_count": 99,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from itertools import zip_longest\n",
|
||||
"\n",
|
||||
"def grouper(iterable, n, fillvalue=None):\n",
|
||||
" \"Collect data into fixed-length chunks or blocks\"\n",
|
||||
"    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx\n",
|
||||
" args = [iter(iterable)] * n\n",
|
||||
" return zip_longest(*args, fillvalue=fillvalue)\n",
|
||||
"\n",
|
||||
"# def grouper(iterable, n):   # unfinished redefinition, commented out so the cell runs\n",
|
||||
" \n",
|
||||
"\n",
|
||||
"tuple(grouper(sides, 2))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 93,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"{frozenset({(1, 3), (2, 5), (4, 6)}),\n",
|
||||
" frozenset({(1, 2), (3, 5), (4, 6)}),\n",
|
||||
" frozenset({(1, 3), (2, 6), (4, 5)}),\n",
|
||||
" frozenset({(1, 6), (2, 3), (4, 5)}),\n",
|
||||
" frozenset({(1, 4), (2, 3), (5, 6)}),\n",
|
||||
" frozenset({(1, 6), (2, 4), (3, 5)}),\n",
|
||||
" frozenset({(1, 6), (2, 5), (3, 4)}),\n",
|
||||
" frozenset({(1, 5), (2, 3), (4, 6)}),\n",
|
||||
" frozenset({(1, 5), (2, 6), (3, 4)}),\n",
|
||||
" frozenset({(1, 3), (2, 4), (5, 6)}),\n",
|
||||
" frozenset({(1, 4), (2, 6), (3, 5)}),\n",
|
||||
" frozenset({(1, 2), (3, 4), (5, 6)}),\n",
|
||||
" frozenset({(1, 5), (2, 4), (3, 6)}),\n",
|
||||
" frozenset({(1, 2), (3, 6), (4, 5)}),\n",
|
||||
" frozenset({(1, 4), (2, 5), (3, 6)})}"
|
||||
]
|
||||
},
|
||||
"execution_count": 93,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sets_of_rectangles(sides)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
2307
How to Do Things with Words.ipynb
Normal file
File diff suppressed because one or more lines are too long
1180
Mean Misanthrope Density.ipynb
Normal file
File diff suppressed because one or more lines are too long
534
Palindrome.ipynb
Normal file
@ -0,0 +1,534 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Panama Palindrome\n",
|
||||
"\n",
|
||||
"## Utilities"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import random, re, bisect, time\n",
|
||||
"\n",
|
||||
"def is_palindrome(s):\n",
|
||||
" \"Test if a string is a palindrome (only considering letters a-z).\"\n",
|
||||
" s1 = canonical(s)\n",
|
||||
" return s1 == reversestr(s1)\n",
|
||||
"\n",
|
||||
"def is_unique_palindrome(s):\n",
|
||||
" \"Test if string s is a palindrome where each comma-separated phrase is unique.\"\n",
|
||||
" return is_palindrome(s) and is_unique(phrases(s))\n",
|
||||
"\n",
|
||||
"def canonical(word, sub=re.compile('[^a-z]').sub):\n",
|
||||
" \"The canonical form for comparing: only lowercase a-z.\"\n",
|
||||
" return sub('', word.lower())\n",
|
||||
"\n",
|
||||
"def phrases(s):\n",
|
||||
" \"Break a string s into comma-separated phrases.\"\n",
|
||||
" return [phrase.strip() for phrase in s.split(',')]\n",
|
||||
"\n",
|
||||
"def reversestr(s):\n",
|
||||
" \"Reverse a string.\"\n",
|
||||
" return s[::-1]\n",
|
||||
"\n",
|
||||
"def is_unique(collection):\n",
|
||||
" \"Return true if collection has no duplicate elements.\"\n",
|
||||
" return len(collection) == len(set(collection))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 35,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'test_utils passes'"
|
||||
]
|
||||
},
|
||||
"execution_count": 35,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def test_utils():\n",
|
||||
" assert is_unique_palindrome('A man, a plan, a canal, Panama!')\n",
|
||||
" assert is_unique_palindrome('''A (man), a PLAN... a ``canal?'' -- Panama!''')\n",
|
||||
" assert not is_unique_palindrome('A man, a plan, a radar, a canal, Panama.')\n",
|
||||
" \n",
|
||||
" assert is_palindrome('A man, a plan, a canal, Panama.')\n",
|
||||
" assert is_palindrome('Radar. Radar? Radar!')\n",
|
||||
" assert not is_palindrome('radars')\n",
|
||||
"\n",
|
||||
" assert phrases('A man, a plan, Panama') == ['A man', 'a plan', 'Panama']\n",
|
||||
" assert canonical('A man, a plan, a canal, Panama') == 'amanaplanacanalpanama'\n",
|
||||
" assert reversestr('foo') == 'oof'\n",
|
||||
" assert is_unique([1, 2, 3])\n",
|
||||
" assert not is_unique([1, 2, 2])\n",
|
||||
" return 'test_utils passes'\n",
|
||||
"\n",
|
||||
"test_utils()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## The Dictionary"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"! [ -e npdict.txt ] || curl -O http://norvig.com/npdict.txt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" 126144 204928 1383045 npdict.txt\r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"! wc npdict.txt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 45,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"################ Reading in a dictionary\n",
|
||||
"\n",
|
||||
"class PalDict:\n",
|
||||
" \"\"\"A dictionary with the following fields:\n",
|
||||
" words: a sorted list of words: ['ant', 'bee', 'sea']\n",
|
||||
" rwords: a sorted list of reversed words: ['aes', 'eeb', 'tna']\n",
|
||||
" truename: a dict of {canonical:true} pairs, e.g. {'anelk': 'an elk', 'anneelk': 'Anne Elk'}\n",
|
||||
"      k: the maximum number of words returned by startswith and endswith\n",
|
||||
"      and the following methods: startswith, endswith, __contains__\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" def __init__(self, k=100, filename='npdict.txt'):\n",
|
||||
" words, rwords, truename = [], [], {'': '', 'panama': 'Panama!'}\n",
|
||||
" for tword in open(filename, 'r', encoding='ascii', errors='ignore').read().splitlines():\n",
|
||||
" word = canonical(tword)\n",
|
||||
" words.append(word)\n",
|
||||
" rwords.append(reversestr(word))\n",
|
||||
" truename[word] = tword\n",
|
||||
" words.sort()\n",
|
||||
" rwords.sort()\n",
|
||||
" self.k = k\n",
|
||||
" self.words = words\n",
|
||||
" self.rwords = rwords\n",
|
||||
" self.truename = truename\n",
|
||||
" self.rangek = range(k)\n",
|
||||
" self.tryharder = False\n",
|
||||
"\n",
|
||||
" def startswith(self, prefix):\n",
|
||||
" \"\"\"Return up to k canonical words that start with prefix.\n",
|
||||
" If there are more than k, choose from them at random.\"\"\"\n",
|
||||
" return self._k_startingwith(self.words, prefix)\n",
|
||||
"\n",
|
||||
" def endswith(self, rsuffix):\n",
|
||||
" \"\"\"Return up to k canonical words that end with the reversed suffix.\n",
|
||||
" If you want words ending in 'ing', ask for d.endswith('gni').\n",
|
||||
" If there are more than k, choose from them at random.\"\"\"\n",
|
||||
" return [reversestr(s) for s in self._k_startingwith(self.rwords, rsuffix)]\n",
|
||||
"\n",
|
||||
" def __contains__(self, word):\n",
|
||||
" return word in self.truename\n",
|
||||
"\n",
|
||||
" def _k_startingwith(self, words, prefix):\n",
|
||||
" start = bisect.bisect_left(words, prefix)\n",
|
||||
" end = bisect.bisect(words, prefix + 'zzzz')\n",
|
||||
" n = end - start\n",
|
||||
" if self.k >= n: # get all the words that start with prefix\n",
|
||||
" results = words[start:end]\n",
|
||||
" else: # sample from words starting with prefix \n",
|
||||
" indexes = random.sample(range(start, end), self.k)\n",
|
||||
" results = [words[i] for i in indexes]\n",
|
||||
" random.shuffle(results)\n",
|
||||
" ## Consider words that are prefixes of the prefix.\n",
|
||||
" ## This is very slow, so don't use it until late in the game.\n",
|
||||
" if self.tryharder:\n",
|
||||
" for i in range(3, len(prefix)):\n",
|
||||
" w = prefix[0:i]\n",
|
||||
" if ((words == self.words and w in self.truename) or\n",
|
||||
" (words == self.rwords and reversestr(w) in self.truename)):\n",
|
||||
" results.append(w)\n",
|
||||
" return results\n",
|
||||
"\n",
|
||||
"paldict = PalDict() \n",
|
||||
"\n",
|
||||
"def anpdictshort():\n",
|
||||
" \"Find the words that are valid when every phrase must start with 'a'\"\n",
|
||||
" def segment(word): return [s for s in word.split('a') if s]\n",
|
||||
" def valid(word): return all(reversestr(s) in segments for s in segment(word))\n",
|
||||
" words = [canonical(w) for w in open('anpdict.txt')]\n",
|
||||
" segments = set(s for w in words for s in segment(w))\n",
|
||||
" valid_words = [paldict.truename[w] for w in words if valid(w)]\n",
|
||||
"    with open('anpdict-short2.txt', 'w') as f: f.write('\\n'.join(valid_words))\n",
|
||||
"\n",
|
||||
"################ Search for a palindrome\n",
|
||||
"\n",
|
||||
"class Panama:\n",
|
||||
" def __init__(self, L='A man, a plan', R='a canal, Panama', dict=paldict):\n",
|
||||
" ## .left and .right hold lists of canonical words\n",
|
||||
" ## .diff holds the number of characters that are not matched,\n",
|
||||
" ## positive for words on left, negative for right.\n",
|
||||
"        ## .stack holds (action, side, substr, arg) tuples\n",
|
||||
" self.left = []\n",
|
||||
" self.right = []\n",
|
||||
" self.best = 0\n",
|
||||
" self.seen = {}\n",
|
||||
" self.diff = 0\n",
|
||||
" self.stack = []\n",
|
||||
" self.starttime = time.clock()\n",
|
||||
" self.dict = dict\n",
|
||||
" self.steps = 0\n",
|
||||
" for word in L.split(','):\n",
|
||||
" self.add('left', canonical(word))\n",
|
||||
" for rword in reversestr(R).split(','):\n",
|
||||
" self.add('right', canonical(reversestr(rword)))\n",
|
||||
" self.consider_candidates()\n",
|
||||
" \n",
|
||||
" def search(self, steps=10*1000*1000):\n",
|
||||
" \"Search for palindromes.\"\n",
|
||||
" for self.steps in range(steps):\n",
|
||||
" if not self.stack:\n",
|
||||
" return 'done'\n",
|
||||
" action, dir, substr, arg = self.stack[-1]\n",
|
||||
" if action == 'added': # undo the last word added\n",
|
||||
" self.remove(dir, arg)\n",
|
||||
" elif action == 'trying' and arg: # try the next word if there is one\n",
|
||||
" self.add(dir, arg.pop()) and self.consider_candidates()\n",
|
||||
" elif action == 'trying' and not arg: # otherwise backtrack\n",
|
||||
" self.stack.pop()\n",
|
||||
" else:\n",
|
||||
" raise ValueError(action)\n",
|
||||
" self.report()\n",
|
||||
" return self\n",
|
||||
"\n",
|
||||
" def add(self, dir, word):\n",
|
||||
" \"add a word\"\n",
|
||||
" if word in self.seen:\n",
|
||||
" return False\n",
|
||||
" else:\n",
|
||||
" getattr(self, dir).append(word)\n",
|
||||
" self.diff += factor[dir] * len(word)\n",
|
||||
" self.seen[word] = True\n",
|
||||
" self.stack.append(('added', dir, '?', word))\n",
|
||||
" return True\n",
|
||||
"\n",
|
||||
" def remove(self, dir, word):\n",
|
||||
" \"remove a word\"\n",
|
||||
" oldword = getattr(self, dir).pop()\n",
|
||||
" assert word == oldword\n",
|
||||
" self.diff -= factor[dir] * len(word)\n",
|
||||
" del self.seen[word]\n",
|
||||
" self.stack.pop()\n",
|
||||
" \n",
|
||||
" def consider_candidates(self):\n",
|
||||
" \"\"\"Push a new state with a set of candidate words onto stack.\"\"\"\n",
|
||||
" if self.diff > 0: # Left is longer, consider adding on right\n",
|
||||
" dir = 'right'\n",
|
||||
" substr = self.left[-1][-self.diff:]\n",
|
||||
" candidates = self.dict.endswith(substr)\n",
|
||||
" elif self.diff < 0: # Right is longer, consider adding on left\n",
|
||||
" dir = 'left'\n",
|
||||
" substr = reversestr(self.right[-1][0:-self.diff])\n",
|
||||
" candidates = self.dict.startswith(substr)\n",
|
||||
" else: # Both sides are same size\n",
|
||||
" dir = 'left'\n",
|
||||
" substr = ''\n",
|
||||
" candidates = self.dict.startswith('')\n",
|
||||
" if substr == reversestr(substr):\n",
|
||||
" self.report()\n",
|
||||
" self.stack.append(('trying', dir, substr, candidates))\n",
|
||||
" \n",
|
||||
" def report(self):\n",
|
||||
" \"Report a new palindrome to log file (if it is sufficiently big).\"\n",
|
||||
" N = len(self)\n",
|
||||
" if N > 13333:\n",
|
||||
" self.dict.tryharder = True\n",
|
||||
" if N > self.best and (N > 13000 or N > self.best+1000):\n",
|
||||
" self.best = len(self)\n",
|
||||
" self.bestphrase = str(self)\n",
|
||||
" print('%5d phrases (%5d words) in %3d seconds (%6d steps)' % (\n",
|
||||
" self.best, self.bestphrase.count(' ')+1, time.clock() - self.starttime,\n",
|
||||
" self.steps))\n",
|
||||
" assert is_unique_palindrome(self.bestphrase)\n",
|
||||
"\n",
|
||||
" def __len__(self):\n",
|
||||
" return len(self.left) + len(self.right)\n",
|
||||
"\n",
|
||||
" def __str__(self):\n",
|
||||
" truename = self.dict.truename\n",
|
||||
" lefts = [truename[w] for w in self.left]\n",
|
||||
" rights = [truename[w] for w in self.right]\n",
|
||||
" return ', '.join(lefts + rights[::-1])\n",
|
||||
" \n",
|
||||
" def __repr__(self):\n",
|
||||
" return '<Panama with {} phrases>'.format(len(self))\n",
|
||||
"\n",
|
||||
"factor = {'left': +1, 'right': -1}\n",
|
||||
"\n",
|
||||
"# Note that we only allow one truename per canonical name. Occasionally\n",
|
||||
"# this means we miss a good word (as in \"a node\" vs. \"an ode\"), but there\n",
|
||||
"# are only 665 of these truename collisions, and most of them are of the\n",
|
||||
"# form \"a mark-up\" vs. \"a markup\" so it seemed better to disallow them.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 30,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"126144"
|
||||
]
|
||||
},
|
||||
"execution_count": 30,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(paldict.words)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 46,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"all tests pass\n",
|
||||
" 1005 phrases ( 1239 words) in 0 seconds ( 18582 steps)\n",
|
||||
" 2012 phrases ( 2478 words) in 0 seconds ( 41886 steps)\n",
|
||||
" 3017 phrases ( 3710 words) in 0 seconds ( 64444 steps)\n",
|
||||
" 4020 phrases ( 4957 words) in 0 seconds ( 92989 steps)\n",
|
||||
" 5022 phrases ( 6184 words) in 1 seconds (128986 steps)\n",
|
||||
" 6024 phrases ( 7408 words) in 1 seconds (162634 steps)\n",
|
||||
" 7027 phrases ( 8607 words) in 1 seconds (204639 steps)\n",
|
||||
" 8036 phrases ( 9846 words) in 2 seconds (254992 steps)\n",
|
||||
" 9037 phrases (11050 words) in 2 seconds (320001 steps)\n",
|
||||
"10039 phrases (12257 words) in 2 seconds (417723 steps)\n",
|
||||
"11040 phrases (13481 words) in 3 seconds (565050 steps)\n",
|
||||
"12043 phrases (14711 words) in 4 seconds (887405 steps)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<Panama with 12764 phrases>"
|
||||
]
|
||||
},
|
||||
"execution_count": 46,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"################ Unit Tests\n",
|
||||
" \n",
|
||||
"def test2(p):\n",
|
||||
" d = p.dict\n",
|
||||
" def sameset(a, b): return set(a) == set(b)\n",
|
||||
" assert 'panama' in d\n",
|
||||
" assert d.words[0] in d\n",
|
||||
" assert d.words[-1] in d\n",
|
||||
" assert sameset(d.startswith('aword'), ['awording', 'awordbreak',\n",
|
||||
" 'awordiness', 'awordage', 'awordplay', 'awordlore', 'awordbook',\n",
|
||||
" 'awordlessness', 'aword', 'awordsmith'])\n",
|
||||
" assert sameset(d.endswith('ytisob'), ['aglobosity', 'averbosity',\n",
|
||||
" 'asubglobosity', 'anonverbosity', 'agibbosity'])\n",
|
||||
" d.tryharder = True\n",
|
||||
" assert sameset(d.startswith('oklahoma'), ['oklahoma', 'okla'])\n",
|
||||
" d.tryharder = False\n",
|
||||
" assert d.startswith('oklahoma') == ['oklahoma']\n",
|
||||
" assert d.startswith('fsfdsfdsfds') == []\n",
|
||||
" print('all tests pass')\n",
|
||||
" return p\n",
|
||||
"\n",
|
||||
"p = Panama()\n",
|
||||
"test2(p)\n",
|
||||
"p.search().report()\n",
|
||||
"p"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 21,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" % Total % Received % Xferd Average Speed Time Time Time Current\n",
|
||||
" Dload Upload Total Spent Left Speed\n",
|
||||
"100 847k 100 847k 0 0 1037k 0 --:--:-- --:--:-- --:--:-- 1037k\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"! [ -e anpdict.txt ] || curl -O http://norvig.com/anpdict.txt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 26,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"anpdictshort()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 27,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
" 4527 9055 39467 anpdict-short.txt\r\n",
|
||||
" 4527 9055 39467 anpdict-short2.txt\r\n",
|
||||
" 69241 138489 867706 anpdict.txt\r\n",
|
||||
" 126144 204928 1383045 npdict.txt\r\n",
|
||||
" 204439 361527 2329685 total\r\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"! wc *npd*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"source": [
|
||||
"# Letter-By-Letter Approach\n",
|
||||
"\n",
|
||||
"Can we go letter-by-letter instead of word-by-word? Advantages: \n",
|
||||
"\n",
|
||||
"* We can (if need be) be exhaustive at each decision point, trying all 26 possibilities.\n",
|
||||
"* We can try the most likely letters first.\n",
|
||||
"\n",
|
||||
"Process\n",
|
||||
"\n",
|
||||
"* Keep left and right partial phrase lists, and the current state:\n",
|
||||
"\n",
|
||||
"      {left: ['aman', 'aplan'], right: ['acanal', 'panama'],\n",
|
||||
" left_word: True, right_word: True, extra_chars: +3, palindrome: True}\n",
|
||||
" \n",
|
||||
"* Now consider all ways of extending:\n",
|
||||
"\n",
|
||||
" - Add the letter `'a'` to the left, either as a new word or a continuation of the old word (perhaps going for `'a planaria'`).\n",
|
||||
"  - Add a letter, any letter, to the right, either as a new word or a continuation of the old word.\n",
|
||||
" \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from collections import namedtuple\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def do(state, action, side, L): action(state, side, L)\n",
|
||||
"def add(state, side, L): getattr(state, side)[-1] += L\n",
|
||||
"def new(state, side, L): getattr(state, side).append(L)\n",
|
||||
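"def undo(state, action, side, L):\n",
"    \"Reverse do(state, action, side, L). Sketch: assumes state.left and state.right are lists of strings.\"\n",
"    words = getattr(state, side)\n",
"    if action == add:    # strip the letter(s) that were appended to the last word\n",
"        words[-1] = words[-1][:-len(L)]\n",
"    elif action == new:  # drop the word that was just started\n",
"        words.pop()\n",
"    else:\n",
"        raise ValueError(action)"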
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
3322
Probability.ipynb
Normal file
File diff suppressed because one or more lines are too long
2626
ProbabilityParadox.ipynb
Normal file
File diff suppressed because it is too large
1011
Project Euler Utils.ipynb
Normal file
File diff suppressed because it is too large
402
PropositionalLogic.ipynb
Normal file
@ -0,0 +1,402 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Translating English Sentences into Propositional Logic Statements\n",
|
||||
"\n",
|
||||
"In a Logic course, one exercise is to turn an English sentence like this:\n",
|
||||
"\n",
|
||||
"> *Sieglinde will survive, and either her son will gain the Ring and Wotan’s plan will be fulfilled or else Valhalla will be destroyed.*\n",
|
||||
"\n",
|
||||
"Into a formal Propositional Logic statement: \n",
|
||||
"\n",
|
||||
" P ⋀ ((Q ⋀ R) ∨ S)\n",
|
||||
" \n",
|
||||
"along with definitions of the propositions:\n",
|
||||
"\n",
|
||||
" P: Sieglinde will survive\n",
|
||||
" Q: Sieglinde’s son will gain the Ring\n",
|
||||
" R: Wotan’s plan will be fulfilled\n",
|
||||
" S: Valhalla will be destroyed\n",
|
||||
"\n",
|
||||
"For some sentences, it takes detailed knowledge to get a good translation. The following two sentences are ambiguous, with different preferred interpretations, and translating them correctly requires knowledge of eating habits:\n",
|
||||
"\n",
|
||||
" I will eat salad or I will eat bread and I will eat butter. P ∨ (Q ⋀ R)\n",
|
||||
" I will eat salad or I will eat soup and I will eat ice cream. (P ∨ Q) ⋀ R\n",
|
||||
"\n",
|
||||
"But for many sentences, the translation process is automatic, with no special knowledge required. I will develop a program to handle these easy sentences. The program is based on the idea of a series of translation rules of the form:\n",
|
||||
"\n",
|
||||
" Rule('{P} ⇒ {Q}', 'if {P} then {Q}', 'if {P}, {Q}')\n",
|
||||
" \n",
|
||||
"which means that the logic translation will have the form `'P ⇒ Q'`, whenever the English sentence has either the form `'if P then Q'` or `'if P, Q'`, where `P` and `Q` can match any non-empty subsequence of characters. Whatever matches `P` and `Q` will be recursively processed by the rules. The rules are tried in order (top to bottom, left to right), and the first rule that matches in that order will be accepted, no matter what, so be sure you order your rules carefully. One guideline I have adhered to is to put all the rules that start with a keyword (like `'if'` or `'neither'`) before the rules that start with a variable (like `'{P}'`); that way you avoid accidentally having a keyword swallowed up inside a `'{P}'`.\n",
|
||||
"\n",
|
||||
"Notice that given the sentence \"*Sieglinde will survive*\", the program should make up a new propositional symbol, `P`, and record the fact that `P` refers to \"*Sieglinde will survive*\". But the negative sentence \"*Sieglinde will not survive*\", should be translated as `~P`, where again `P` is \"*Sieglinde will survive*\". So to fully specify the translation process, we need to define both `rules` and `negations`. (We do that using [regular expressions](https://docs.python.org/3.5/library/re.html), which can sometimes be confusing.)\n",
|
||||
"\n",
|
||||
"First the function to define a rule (and some auxiliary functions):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import re\n",
|
||||
"\n",
|
||||
"def Rule(output, *patterns):\n",
|
||||
" \"A rule that produces `output` if the entire input matches any one of the `patterns`.\" \n",
|
||||
" return (output, [name_group(pat) + '$' for pat in patterns])\n",
|
||||
"\n",
|
||||
"def name_group(pat):\n",
|
||||
" \"Replace '{Q}' with '(?P<Q>.+?)', which means 'match 1 or more characters, and call it Q'\"\n",
|
||||
" return re.sub('{(.)}', r'(?P<\\1>.+?)', pat)\n",
|
||||
" \n",
|
||||
"def word(w):\n",
|
||||
" \"Return a regex that matches w as a complete word (not letters inside a word).\"\n",
|
||||
" return r'\\b' + w + r'\\b' # '\\b' matches at word boundary"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And now the actual rules. If you have a sentence that is not translated correctly by this program, you can augment these rules to handle your sentence."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"rules = [\n",
|
||||
" Rule('{P} ⇒ {Q}', 'if {P} then {Q}', 'if {P}, {Q}'),\n",
|
||||
" Rule('{P} ⋁ {Q}', 'either {P} or else {Q}', 'either {P} or {Q}'),\n",
|
||||
" Rule('{P} ⋀ {Q}', 'both {P} and {Q}'),\n",
|
||||
" Rule('~{P} ⋀ ~{Q}', 'neither {P} nor {Q}'),\n",
|
||||
" Rule('~{A}{P} ⋀ ~{A}{Q}', '{A} neither {P} nor {Q}'), # The Kaiser neither ...\n",
|
||||
" Rule('~{Q} ⇒ {P}', '{P} unless {Q}'),\n",
|
||||
" Rule('{P} ⇒ {Q}', '{Q} provided that {P}', '{Q} whenever {P}', '{P} implies {Q}',\n",
|
||||
" '{P} therefore {Q}', '{Q}, if {P}', '{Q} if {P}', '{P} only if {Q}'),\n",
|
||||
" Rule('{P} ⋀ {Q}', '{P} and {Q}', '{P} but {Q}'),\n",
|
||||
" Rule('{P} ⋁ {Q}', '{P} or else {Q}', '{P} or {Q}'),\n",
|
||||
" ]\n",
|
||||
"\n",
|
||||
"negations = [\n",
|
||||
" (word(\"not\"), \"\"),\n",
|
||||
" (word(\"cannot\"), \"can\"),\n",
|
||||
" (word(\"can't\"), \"can\"),\n",
|
||||
" (word(\"won't\"), \"will\"),\n",
|
||||
" (word(\"ain't\"), \"is\"),\n",
|
||||
" (\"n't\", \"\"), # matches as part of a word: didn't, couldn't, etc.\n",
|
||||
" ]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now the mechanism to process these rules. Note that `defs` is a dict of definitions of propositional symbols: `{P: 'english'}`. The three `match_*` functions return two values: the translation of a sentence, and a dict of definitions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
"def match_rules(sentence, rules, defs):\n",
"    \"\"\"Match sentence against all the rules, accepting the first match; or else make it an atomic proposition.\n",
"    Return two values: the Logic translation and a dict of {P: 'english'} definitions.\"\"\"\n",
"    sentence = clean(sentence)\n",
"    for rule in rules:\n",
"        result = match_rule(sentence, rule, defs)\n",
"        if result:\n",
"            return result\n",
"    return match_atomic_proposition(sentence, negations, defs)\n",
|
||||
" \n",
"def match_rule(sentence, rule, defs):\n",
"    \"Match a single rule, returning the logic translation and the dict of definitions if the match succeeds.\"\n",
"    output, patterns = rule\n",
"    for pat in patterns:\n",
"        match = re.match(pat, sentence, flags=re.I)\n",
"        if match:\n",
"            groups = match.groupdict()\n",
"            for P in sorted(groups): # Recursively apply rules to each of the matching groups\n",
"                groups[P] = match_rules(groups[P], rules, defs)[0]\n",
"            return '(' + output.format(**groups) + ')', defs\n",
|
||||
" \n",
"def match_atomic_proposition(sentence, negations, defs):\n",
"    \"No rule matched; sentence is an atom. Add new proposition to defs. Handle negation.\"\n",
"    polarity = ''\n",
"    for (neg, pos) in negations:\n",
"        (sentence, n) = re.subn(neg, pos, sentence, flags=re.I)\n",
"        polarity += n * '~'\n",
"    sentence = clean(sentence)\n",
"    P = proposition_name(sentence, defs)\n",
"    defs[P] = sentence\n",
"    return polarity + P, defs\n",
|
||||
" \n",
"def proposition_name(sentence, defs, names='PQRSTUVWXYZBCDEFGHJKLMN'):\n",
"    \"Return the old name for this sentence, if used before, or a new, unused name.\"\n",
"    inverted = {defs[P]: P for P in defs}\n",
"    if sentence in inverted:\n",
"        return inverted[sentence] # Find previously-used name\n",
"    else:\n",
"        return next(P for P in names if P not in defs) # Use a new unused name\n",
|
||||
" \n",
|
||||
"def clean(text): \n",
|
||||
" \"Remove redundant whitespace; handle curly apostrophe and trailing comma/period.\"\n",
|
||||
" return ' '.join(text.split()).replace(\"’\", \"'\").rstrip('.').rstrip(',')"
|
||||
]
|
||||
},
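To see how `match_atomic_proposition` arrives at a polarity like `~~` for a doubly negated sentence, here is a stand-alone sketch of just the negation step. `strip_negations` and `NEGATIONS` are names introduced for this illustration only; the patterns copy the `negations` list, with the `word(...)` entries written out by hand as `\b...\b`.

```python
import re

# Same (pattern, replacement) pairs as the notebook's `negations` list,
# with word("not") etc. spelled out as explicit word-boundary patterns.
NEGATIONS = [
    (r"\bnot\b", ""),
    (r"\bcannot\b", "can"),
    (r"\bcan't\b", "can"),
    (r"\bwon't\b", "will"),
    (r"\bain't\b", "is"),
    ("n't", ""),  # matches as part of a word: didn't, couldn't, etc.
]

def strip_negations(sentence):
    "Return (polarity, positive sentence): one '~' for every negation removed."
    polarity = ''
    for neg, pos in NEGATIONS:
        # re.subn returns the new string and the number of substitutions made
        sentence, n = re.subn(neg, pos, sentence, flags=re.I)
        polarity += n * '~'
    return polarity, ' '.join(sentence.split())  # squeeze leftover double spaces

print(strip_negations("I shouldn't not go"))       # ('~~', 'I should go')
print(strip_negations("it ain't got that swing"))  # ('~', 'it is got that swing')
```

Because each substitution adds one `~`, a double negative is kept as `~~P` rather than being simplified to `P`.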
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"And finally some test sentences and a top-level function to produce output:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"Polkadots and Moonbeams \n",
|
||||
"Logic: (P ⋀ Q)\n",
|
||||
"P: Polkadots\n",
|
||||
"Q: Moonbeams\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"If you liked it then you shoulda put a ring on it \n",
|
||||
"Logic: (P ⇒ Q)\n",
|
||||
"P: you liked it\n",
|
||||
"Q: you shoulda put a ring on it\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"If you build it, he will come \n",
|
||||
"Logic: (P ⇒ Q)\n",
|
||||
"P: you build it\n",
|
||||
"Q: he will come\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"It don't mean a thing, if it ain't got that swing \n",
|
||||
"Logic: (~P ⇒ ~Q)\n",
|
||||
"P: it is got that swing\n",
|
||||
"Q: It do mean a thing\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"If loving you is wrong, I don't want to be right \n",
|
||||
"Logic: (P ⇒ ~Q)\n",
|
||||
"P: loving you is wrong\n",
|
||||
"Q: I do want to be right\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"Should I stay or should I go \n",
|
||||
"Logic: (P ⋁ Q)\n",
|
||||
"P: Should I stay\n",
|
||||
"Q: should I go\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"I shouldn't go and I shouldn't not go \n",
|
||||
"Logic: (~P ⋀ ~~P)\n",
|
||||
"P: I should go\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"If I fell in love with you, would you promise to be true and help me understand \n",
|
||||
"Logic: (P ⇒ (Q ⋀ R))\n",
|
||||
"P: I fell in love with you\n",
|
||||
"Q: would you promise to be true\n",
|
||||
"R: help me understand\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"I could while away the hours conferrin' with the flowers, consulting with the rain and my head I'd be a\n",
|
||||
"scratchin' while my thoughts are busy hatchin' if I only had a brain \n",
|
||||
"Logic: (P ⇒ (Q ⋀ R))\n",
|
||||
"P: I only had a brain\n",
|
||||
"Q: I could while away the hours conferrin' with the flowers, consulting with the rain\n",
|
||||
"R: my head I'd be a scratchin' while my thoughts are busy hatchin'\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"There's a federal tax, and a state tax, and a city tax, and a street tax, and a sewer tax \n",
|
||||
"Logic: (P ⋀ (Q ⋀ (R ⋀ (S ⋀ T))))\n",
|
||||
"P: There's a federal tax\n",
|
||||
"Q: a state tax\n",
|
||||
"R: a city tax\n",
|
||||
"S: a street tax\n",
|
||||
"T: a sewer tax\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"A ham sandwich is better than nothing and nothing is better than eternal happiness therefore a ham\n",
|
||||
"sandwich is better than eternal happiness \n",
|
||||
"Logic: ((P ⋀ Q) ⇒ R)\n",
|
||||
"P: A ham sandwich is better than nothing\n",
|
||||
"Q: nothing is better than eternal happiness\n",
|
||||
"R: a ham sandwich is better than eternal happiness\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"If I were a carpenter and you were a lady, would you marry me anyway? and would you have my baby \n",
|
||||
"Logic: ((P ⋀ Q) ⇒ (R ⋀ S))\n",
|
||||
"P: I were a carpenter\n",
|
||||
"Q: you were a lady\n",
|
||||
"R: would you marry me anyway?\n",
|
||||
"S: would you have my baby\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"Either Danny didn't come to the party or Virgil didn't come to the party \n",
|
||||
"Logic: (~P ⋁ ~Q)\n",
|
||||
"P: Danny did come to the party\n",
|
||||
"Q: Virgil did come to the party\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"Either Wotan will triumph and Valhalla will be saved or else he won't and Alberic will have the final word \n",
|
||||
"Logic: ((P ⋀ Q) ⋁ (~R ⋀ S))\n",
|
||||
"P: Wotan will triumph\n",
|
||||
"Q: Valhalla will be saved\n",
|
||||
"R: he will\n",
|
||||
"S: Alberic will have the final word\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"Sieglinde will survive, and either her son will gain the Ring and Wotan's plan will be fulfilled or else\n",
|
||||
"Valhalla will be destroyed \n",
|
||||
"Logic: (P ⋀ ((Q ⋀ R) ⋁ S))\n",
|
||||
"P: Sieglinde will survive\n",
|
||||
"Q: her son will gain the Ring\n",
|
||||
"R: Wotan's plan will be fulfilled\n",
|
||||
"S: Valhalla will be destroyed\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"Wotan will intervene and cause Siegmund's death unless either Fricka relents or Brunnhilde has her way \n",
|
||||
"Logic: (~(R ⋁ S) ⇒ (P ⋀ Q))\n",
|
||||
"P: Wotan will intervene\n",
|
||||
"Q: cause Siegmund's death\n",
|
||||
"R: Fricka relents\n",
|
||||
"S: Brunnhilde has her way\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"Figaro and Susanna will wed provided that either Antonio or Figaro pays and Bartolo is satisfied or else\n",
|
||||
"Marcellina's contract is voided and the Countess does not act rashly \n",
|
||||
"Logic: ((((P ⋁ Q) ⋀ R) ⋁ (S ⋀ ~T)) ⇒ (U ⋀ V))\n",
|
||||
"P: Antonio\n",
|
||||
"Q: Figaro pays\n",
|
||||
"R: Bartolo is satisfied\n",
|
||||
"S: Marcellina's contract is voided\n",
|
||||
"T: the Countess does act rashly\n",
|
||||
"U: Figaro\n",
|
||||
"V: Susanna will wed\n",
|
||||
"__________________________________________________________________________________________________________ \n",
|
||||
"If the Kaiser neither prevents Bismarck from resigning nor supports the Liberals, then the military will\n",
|
||||
"be in control and either Moltke's plan will be executed or else the people will revolt and the Reich will\n",
|
||||
"not survive \n",
|
||||
"Logic: ((~PQ ⋀ ~PR) ⇒ (S ⋀ (T ⋁ (U ⋀ ~V))))\n",
|
||||
"P: the Kaiser\n",
|
||||
"Q: prevents Bismarck from resigning\n",
|
||||
"R: supports the Liberals\n",
|
||||
"S: the military will be in control\n",
|
||||
"T: Moltke's plan will be executed\n",
|
||||
"U: the people will revolt\n",
|
||||
"V: the Reich will survive\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sentences = '''\n",
|
||||
"Polkadots and Moonbeams.\n",
|
||||
"If you liked it then you shoulda put a ring on it.\n",
|
||||
"If you build it, he will come.\n",
|
||||
"It don't mean a thing, if it ain't got that swing.\n",
|
||||
"If loving you is wrong, I don't want to be right.\n",
|
||||
"Should I stay or should I go.\n",
|
||||
"I shouldn't go and I shouldn't not go.\n",
|
||||
"If I fell in love with you,\n",
|
||||
" would you promise to be true\n",
|
||||
" and help me understand.\n",
|
||||
"I could while away the hours\n",
|
||||
" conferrin' with the flowers,\n",
|
||||
" consulting with the rain\n",
|
||||
" and my head I'd be a scratchin'\n",
|
||||
" while my thoughts are busy hatchin'\n",
|
||||
" if I only had a brain.\n",
|
||||
"There's a federal tax, and a state tax, and a city tax, and a street tax, and a sewer tax.\n",
|
||||
"A ham sandwich is better than nothing \n",
|
||||
" and nothing is better than eternal happiness\n",
|
||||
" therefore a ham sandwich is better than eternal happiness.\n",
|
||||
"If I were a carpenter\n",
|
||||
" and you were a lady,\n",
|
||||
" would you marry me anyway?\n",
|
||||
" and would you have my baby.\n",
|
||||
"Either Danny didn't come to the party or Virgil didn't come to the party.\n",
|
||||
"Either Wotan will triumph and Valhalla will be saved or else he won't and Alberic will have the final word.\n",
|
||||
"Sieglinde will survive, and either her son will gain the Ring and Wotan’s plan will be fulfilled \n",
|
||||
" or else Valhalla will be destroyed.\n",
|
||||
"Wotan will intervene and cause Siegmund's death unless either Fricka relents or Brunnhilde has her way.\n",
|
||||
"Figaro and Susanna will wed provided that either Antonio or Figaro pays and Bartolo is satisfied \n",
|
||||
" or else Marcellina’s contract is voided and the Countess does not act rashly.\n",
|
||||
"If the Kaiser neither prevents Bismarck from resigning nor supports the Liberals, \n",
|
||||
" then the military will be in control and either Moltke's plan will be executed \n",
|
||||
" or else the people will revolt and the Reich will not survive'''.split('.')\n",
|
||||
"\n",
|
||||
"import textwrap\n",
|
||||
"\n",
"def logic(sentences, width=106): \n",
"    \"Match the rules against each sentence in text, and print each result.\"\n",
"    for s in map(clean, sentences):\n",
"        logic, defs = match_rules(s, rules, {})\n",
"        print(width*'_', '\\n' + textwrap.fill(s, width), '\\nLogic:', logic)\n",
"        for P in sorted(defs):\n",
"            print('{}: {}'.format(P, defs[P]))\n",
|
||||
"\n",
|
||||
"logic(sentences)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"That looks pretty good! But far from perfect. Here are some errors:\n",
|
||||
"\n",
"* `Should I stay` *etc.*:<br>questions are not propositional statements.\n",
|
||||
"\n",
|
||||
"* `If I were a carpenter`:<br>doesn't handle modal logic.\n",
|
||||
"\n",
|
||||
"* `nothing is better`:<br>doesn't handle quantifiers.\n",
|
||||
"\n",
"* `Either Wotan will triumph and Valhalla will be saved or else he won't`:<br>gets `'he will'` as one of the propositions, but it would be better if that referred back to `'Wotan will triumph'`.\n",
|
||||
"\n",
|
||||
"* `Wotan will intervene and cause Siegmund's death`:<br>gets `\"cause Siegmund's death\"` as a proposition, but better would be `\"Wotan will cause Siegmund's death\"`.\n",
|
||||
"\n",
|
||||
"* `Figaro and Susanna will wed`:<br>gets `\"Figaro\"` and `\"Susanna will wed\"` as two separate propositions; this should really be one proposition. \n",
|
||||
"\n",
|
||||
"* `\"either Antonio or Figaro pays\"`:<br>gets `\"Antonio\"` as a proposition, but it should be `\"Antonio pays\"`.\n",
|
||||
"\n",
|
||||
"* `If the Kaiser neither prevents`:<br>uses the somewhat bogus propositions `PQ` and `PR`. This should be done in a cleaner way. The problem is the same as the previous problem with Antonio: I don't have a good way to attach the subject of a verb phrase to the multiple parts of the verb/object, when there are multiple parts.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"I'm sure more test sentences would reveal many more types of errors.\n",
|
||||
"\n",
|
||||
"There's also [a version](proplogic.py) of this program that is in Python 2 and uses only ASCII characters; if you have a Mac or Linux system you can download this as [`proplogic.py`](proplogic.py) and run it with the command `python proplogic.py`. Or you can run it [online](https://www.pythonanywhere.com/user/pnorvig/files/home/pnorvig/proplogic.py?edit)."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
697
SET.ipynb
Normal file
@ -0,0 +1,697 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"# SET\n",
|
||||
"\n",
|
||||
"How many cards are there? There are four features (color, shape, shading, and number of figures) on each card, and each feature can take one of three values. So the number of cards is:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"81"
|
||||
]
|
||||
},
|
||||
"execution_count": 10,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"3 ** 4"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"How many sets are there?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1080.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 11,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"81 * 80 * 1 / 6"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"run_control": {}
|
||||
},
|
||||
"source": [
"How many sets does each card participate in? Multiply the number of sets (1080) by the number of cards in a set (3) to count every card-in-set membership, then divide by the number of cards (81):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 23,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"40.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 23,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"1080 * 3 / 81"
|
||||
]
|
||||
},
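Both counts (1080 sets in total, 40 sets per card) can be cross-checked by brute force, since there are only 85,320 possible triples of cards. The sketch below assumes cards are 4-tuples with feature values 1 to 3, and uses the rule that three cards form a set unless some feature shows exactly two distinct values. The names `all_sets` and `per_card` are introduced just for this check.

```python
from itertools import combinations, product

cards = list(product((1, 2, 3), repeat=4))  # 81 cards, one value per feature

def is_set(triple):
    "Three cards form a set unless some feature takes exactly 2 distinct values."
    return all(len({card[f] for card in triple}) != 2 for f in range(4))

all_sets = [t for t in combinations(cards, 3) if is_set(t)]

# Count how many sets each individual card belongs to.
per_card = {card: 0 for card in cards}
for triple in all_sets:
    for card in triple:
        per_card[card] += 1

print(len(cards))               # 81
print(len(all_sets))            # 1080
print(set(per_card.values()))   # {40}
```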
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"run_control": {}
|
||||
},
|
||||
"source": [
|
||||
"Note that each *pair* of cards participates in exactly one set.\n",
|
||||
"\n",
|
||||
"How many layouts of 12 cards are there? The answer is (81 choose 12):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 43,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"70724320184700.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 43,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from math import factorial as fact\n",
|
||||
"\n",
|
||||
"def C(n, k): \n",
|
||||
" \"Number of ways of choosing k things from n things.\"\n",
|
||||
" return fact(n) / fact(n-k) / fact(k)\n",
|
||||
"\n",
|
||||
"C(81, 12)"
|
||||
]
|
||||
},
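`C` uses true division, so it returns a float, which is where the trailing `.0` comes from. If an exact integer is wanted, one option is to switch to floor division, which is always exact here because a binomial coefficient is a whole number. This is a sketch under a hypothetical name, `C_exact`, so it does not replace the `C` used in the later cells.

```python
from math import factorial as fact

def C_exact(n, k):
    "Number of ways of choosing k things from n things, as an exact int."
    return fact(n) // (fact(n - k) * fact(k))

print(C_exact(81, 12))  # 70724320184700
print(C_exact(12, 3))   # 220
```

Dividing by a power of ten afterwards, as in `C_exact(81, 12) / 10 ** 12`, still gives a readable float for display.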
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"run_control": {}
|
||||
},
|
||||
"source": [
|
||||
"That's a lot of digits; hard to read. This should help:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"70.7243201847"
|
||||
]
|
||||
},
|
||||
"execution_count": 42,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"M = 10 ** 6 # Million\n",
|
||||
"T = 10 ** 12 # Trillion\n",
|
||||
"\n",
|
||||
"C(81, 12) / T"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"source": [
"70 trillion layouts. Of those, some will have 6 sets, some more, some fewer.\n",
|
||||
"\n",
|
||||
"In a layout of 12 cards, how many triples (that could potentially be a set) are there?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 44,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"220.0"
|
||||
]
|
||||
},
|
||||
"execution_count": 44,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"C(12, 3)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"run_control": {}
|
||||
},
|
||||
"source": [
|
||||
"So, what we're looking for is when exactly 6 of these are sets, and the other 214 are not."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 31,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"2173.5413336637"
|
||||
]
|
||||
},
|
||||
"execution_count": 31,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"C(1080, 6) / T"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"81"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from itertools import product\n",
|
||||
"\n",
|
||||
"feature = (1, 2, 3)\n",
|
||||
"\n",
|
||||
"Card = tuple\n",
|
||||
"\n",
|
||||
"cards = set(product(feature, feature, feature, feature))\n",
|
||||
"len(cards)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
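"def match(card1, card2):\n",
"    \"Return the third card that completes a set with card1 and card2.\"\n",
"    return Card(match_feature(card1[i], card2[i]) \n",
"                for i in (0, 1, 2, 3))\n",
"\n",
"def match_feature(f1, f2):\n",
"    \"If the two values are equal, return that value; otherwise return the third value (1 + 2 + 3 == 6).\"\n",
"    return f1 if f1 == f2 else 6 - f1 - f2"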
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(1, 3, 1, 3)"
|
||||
]
|
||||
},
|
||||
"execution_count": 15,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"match((1, 2, 3, 3),\n",
|
||||
" (1, 1, 2, 3))"
|
||||
]
|
||||
},
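The `6 - f1 - f2` trick works because the three feature values sum to 6, so subtracting two different values leaves the third. A quick way to gain confidence in `match`, and in the earlier remark that each pair of cards participates in exactly one set, is to run it over every pair. The sketch below re-declares the two functions so that it runs on its own; `is_set` and `pairs_checked` are helpers assumed for this check, not part of the cells above.

```python
from itertools import combinations, product

FEATURE = (1, 2, 3)
cards = set(product(FEATURE, repeat=4))

def match_feature(f1, f2):
    "Same value if the two agree; otherwise the one value that neither has."
    return f1 if f1 == f2 else 6 - f1 - f2

def match(card1, card2):
    "The card that completes a set with card1 and card2."
    return tuple(match_feature(a, b) for a, b in zip(card1, card2))

def is_set(triple):
    "Three cards form a set unless some feature takes exactly 2 distinct values."
    return all(len({card[f] for card in triple}) != 2 for f in range(4))

pairs_checked = 0
for c1, c2 in combinations(cards, 2):
    c3 = match(c1, c2)
    assert c3 in cards and c3 not in (c1, c2)  # a real, distinct third card
    assert is_set((c1, c2, c3))                # and the triple is a set
    pairs_checked += 1

print(pairs_checked)                      # 3240
print(match((1, 2, 3, 3), (1, 1, 2, 3)))  # (1, 3, 1, 3)
```

Since every feature of the third card is forced by the first two, no other card can complete the same pair, which is why the count of sets is 81 * 80 / 6.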
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": true,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"source": [
|
||||
"2173 trillion; even worse.\n",
|
||||
"\n",
|
||||
"How many layouts are there where:\n",
|
||||
"- The first six cards form no sets\n",
|
||||
"- The last six cards each form a set?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 38,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"246.7179"
|
||||
]
|
||||
},
|
||||
"execution_count": 38,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"81 * 80 * (79 - 1) * (78 - 3) * (77 - 6) * (76 - 10) / fact(6) / M"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 40,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1.091475"
|
||||
]
|
||||
},
|
||||
"execution_count": 40,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"15 * 21 * 28 * 36 * 45 * 55 / fact(6) / M"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"All tests pass.\n",
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for the instruction booklet\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 33 | 1 | 33:1\n",
|
||||
" 15 | 2,500 | 1 | 2500:1\n",
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for initial layout\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 96,844 | 3,156 | 31:1\n",
|
||||
" 15 | 99,963 | 37 | 2702:1\n",
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for game play\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 86,065 | 5,871 | 15:1\n",
|
||||
" 15 | 5,595 | 64 | 87:1\n",
|
||||
" 18 | 57 | 0 | inft:1\n",
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for initial layout, but no sets before dealing last 3 cards\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 26,426 | 3,242 | 8:1\n",
|
||||
" 15 | 3,207 | 35 | 92:1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import random\n",
|
||||
"import collections \n",
|
||||
"import itertools \n",
|
||||
"\n",
|
||||
"\"\"\"\n",
|
||||
"Game of Set (Peter Norvig 2010-2015)\n",
|
||||
"\n",
|
||||
"How often do sets appear when we deal an array of cards?\n",
|
||||
"How often in the course of playing out the game?\n",
|
||||
"\n",
|
||||
"Here are the data types we will use:\n",
|
||||
"\n",
|
||||
" card: A string, such as '3R=0', meaning \"three red striped ovals\".\n",
|
||||
" deck: A list of cards, initially of length 81.\n",
|
||||
" layout: A list of cards, initially of length 12.\n",
|
||||
" set: A tuple of 3 cards.\n",
" Tallies: A dict: {12: {True: 33, False: 1}} means a layout of size 12\n",
|
||||
" tallied 33 sets and 1 non-set.\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"#### Cards, dealing cards, and defining the notion of sets.\n",
|
||||
"\n",
|
||||
"CARDS = [number + color + shade + symbol \n",
|
||||
" for number in '123' \n",
|
||||
" for color in 'RGP' \n",
|
||||
" for shade in '@O=' \n",
|
||||
" for symbol in '0SD']\n",
|
||||
"\n",
|
||||
"def deal(n, deck): \n",
|
||||
" \"Deal n cards from the deck.\"\n",
|
||||
" return [deck.pop() for _ in range(n)]\n",
|
||||
"\n",
"def is_set(cards):\n",
"    \"Are these 3 cards a set? No if any feature has 2 values.\"\n",
"    for f in range(4):\n",
"        values = {card[f] for card in cards}\n",
"        if len(values) == 2:\n",
"            return False\n",
"    return True\n",
|
||||
"\n",
"def find_set(layout):\n",
"    \"Return a set found from this layout, if there is one.\"\n",
"    for cards in itertools.combinations(layout, 3):\n",
"        if is_set(cards):\n",
"            return cards\n",
"    return ()\n",
|
||||
"\n",
|
||||
"#### Tallying set:no-set ratio\n",
|
||||
"\n",
|
||||
"def Tallies(): \n",
" \"A data structure to keep track, for each size, of the number of sets and no-sets.\"\n",
|
||||
" return collections.defaultdict(lambda: {True: 0, False: 0})\n",
|
||||
"\n",
|
||||
"def tally(tallies, layout):\n",
|
||||
" \"Record that a set was found or not found in a layout of given size; return the set.\"\n",
|
||||
" s = find_set(layout)\n",
|
||||
" tallies[len(layout)][bool(s)] += 1\n",
|
||||
" return s\n",
|
||||
" \n",
|
||||
"#### Three experiments\n",
|
||||
"\n",
"def tally_initial_layout(N, sizes=(12, 15)):\n",
"    \"Record tallies for N initial deals.\"\n",
"    tallies = Tallies()\n",
"    deck = list(CARDS)\n",
"    for deal in range(N):\n",
"        random.shuffle(deck)\n",
"        for size in sizes:\n",
"            tally(tallies, deck[:size])\n",
"    return tallies\n",
|
||||
"\n",
"def tally_initial_layout_no_prior_sets(N, sizes=(12, 15)):\n",
"    \"\"\"Simulate N initial deals for each size, keeping tallies for Sets and NoSets,\n",
"    but only when there was no set with 3 fewer cards.\"\"\"\n",
"    tallies = Tallies()\n",
"    deck = list(CARDS)\n",
"    for deal in range(N):\n",
"        random.shuffle(deck)\n",
"        for size in sizes:\n",
"            if not find_set(deck[:size-3]):\n",
"                tally(tallies, deck[:size])\n",
"    return tallies\n",
|
||||
"\n",
"def tally_game_play(N):\n",
"    \"Record tallies for the play of N complete games.\"\n",
"    tallies = Tallies()\n",
"    for game in range(N):\n",
"        deck = list(CARDS)\n",
"        random.shuffle(deck)\n",
"        layout = deal(12, deck)\n",
"        while deck:\n",
"            s = tally(tallies, layout)\n",
"            # Pick up the cards in the set, if any\n",
"            for card in s: layout.remove(card)\n",
"            # Deal new cards\n",
"            if len(layout) < 12 or not s:\n",
"                layout += deal(3, deck)\n",
"    return tallies\n",
|
||||
"\n",
|
||||
"def experiments(N):\n",
|
||||
" show({12: [1, 33], 15: [1, 2500]}, \n",
|
||||
" 'the instruction booklet')\n",
|
||||
" show(tally_initial_layout(N), \n",
|
||||
" 'initial layout')\n",
|
||||
" show(tally_game_play(N // 25), \n",
|
||||
" 'game play')\n",
|
||||
" show(tally_initial_layout_no_prior_sets(N), \n",
|
||||
" 'initial layout, but no sets before dealing last 3 cards')\n",
|
||||
"\n",
|
||||
"\n",
"def show(tallies, label):\n",
"    \"Print out the counts.\"\n",
"    print()\n",
"    print('Size | Sets | NoSets | Set:NoSet ratio for', label)\n",
"    print('-----+--------+--------+----------------')\n",
"    for size in sorted(tallies):\n",
"        y, n = tallies[size][True], tallies[size][False]\n",
"        ratio = ('inft' if n==0 else int(round(float(y)/n)))\n",
"        print('{:4d} |{:7,d} |{:7,d} | {:4}:1'\n",
"              .format(size, y, n, ratio))\n",
|
||||
"\n",
|
||||
"def test():\n",
|
||||
" assert len(CARDS) == 81 == len(set(CARDS))\n",
" assert is_set(('3R=0', '2R=S', '1R=D'))\n",
|
||||
" assert not is_set(('3R=0', '2R=S', '1R@D'))\n",
|
||||
" assert find_set(['1PO0', '2G=D', '3R=0', '2R=S', '1R=D']) == ('3R=0', '2R=S', '1R=D')\n",
|
||||
" assert not find_set(['1PO0', '2G=D', '3R=0', '2R=S', '1R@D'])\n",
|
||||
" photo = '2P=0 3P=D 2R=0 3GO0 2POD 3R@D 2RO0 2ROS 1P@S 2P@0 3ROS 2GOD 2P@D 1GOD 3GOS'.split()\n",
|
||||
" assert not find_set(photo)\n",
|
||||
" assert set(itertools.combinations([1, 2, 3, 4], 3)) == {(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)}\n",
|
||||
" print('All tests pass.')\n",
|
||||
"\n",
|
||||
"test()\n",
|
||||
"experiments(100000)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {
|
||||
"button": false,
|
||||
"collapsed": false,
|
||||
"deletable": true,
|
||||
"new_sheet": false,
|
||||
"run_control": {
|
||||
"read_only": false
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for the instruction booklet\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 33 | 1 | 33:1\n",
|
||||
" 15 | 2,500 | 1 | 2500:1\n",
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for initial layout\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 9,696 | 304 | 32:1\n",
|
||||
" 15 | 9,995 | 5 | 1999:1\n",
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for game play\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 8,653 | 542 | 16:1\n",
|
||||
" 15 | 513 | 5 | 103:1\n",
|
||||
" 18 | 5 | 0 | inft:1\n",
|
||||
"\n",
|
||||
"Size | Sets | NoSets | Set:NoSet ratio for initial layout, but no sets before dealing last 3 cards\n",
|
||||
"-----+--------+--------+----------------\n",
|
||||
" 12 | 2,630 | 294 | 9:1\n",
|
||||
" 15 | 293 | 1 | 293:1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"experiments(10000)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true,
|
||||
"run_control": {}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
13278
Scrabble.ipynb
Normal file
File diff suppressed because one or more lines are too long
2087
Sicherman Dice.ipynb
Normal file
File diff suppressed because one or more lines are too long
1275
Sudoku IPython Notebook.ipynb
Normal file
File diff suppressed because it is too large
Load Diff
443
Untitled6.ipynb
Normal file
@ -0,0 +1,443 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Grid Domain\n",
|
||||
"\n",
|
||||
"Many games are played on a two-dimensional grid. We should represent two-dimensional points, and grids:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def Point(x, y): \"Two-dimensional (x, y) point.\"; return (x, y)\n",
|
||||
" \n",
|
||||
"def X(point): \"X coordinate of a point\"; return point[0]\n",
|
||||
"\n",
|
||||
"def Y(point): \"Y coordinate of a point\"; return point[1]\n",
|
||||
"\n",
|
||||
"class Grid(dict): \n",
|
||||
" \"A mapping of {point: contents}; also has width and height methods.\"\n",
|
||||
" def __init__(self, data):\n",
|
||||
" if isinstance(data, str):\n",
|
||||
" data = grid_from_picture(data)\n",
|
||||
" self.update(data)\n",
|
||||
" \n",
|
||||
" def width(self): return max(X(p) for p in self) + 1\n",
|
||||
" def height(self): return max(Y(p) for p in self) + 1\n",
|
||||
" \n",
|
||||
" def __str__(self):\n",
|
||||
" return '\\n'.join(''.join(self[Point(x, y)] \n",
|
||||
" for x in range(self.width()))\n",
|
||||
" for y in range(self.height()))\n",
|
||||
" __repr__ = __str__\n",
|
||||
" \n",
|
||||
"def grid_from_picture(text):\n",
|
||||
" lines = text.strip().splitlines()\n",
|
||||
" return {Point(x, y): ch\n",
|
||||
" for (y, line) in enumerate(lines)\n",
|
||||
" for (x, ch) in enumerate(line)}\n",
|
||||
"\n",
|
||||
"def add_sequence(grid, seq, marker='*'):\n",
|
||||
" for point in seq:\n",
|
||||
" grid[point] = marker\n",
|
||||
" return grid\n",
|
||||
"\n",
|
||||
"g = Grid(\"\"\"\n",
|
||||
"S...|.G\n",
|
||||
"....|.-\n",
|
||||
".---+..\n",
|
||||
".......\n",
|
||||
"\"\"\")\n",
|
||||
"\n",
|
||||
"g"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": true
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def depth_first_search(problem):\n",
|
||||
" return recursive_depth_first_search(problem, Node(problem.initial), set())\n",
|
||||
"\n",
|
||||
"def recursive_depth_first_search(problem, node, explored):\n",
|
||||
" if problem.is_goal(node.state):\n",
|
||||
" return node.action_sequence()\n",
|
||||
" else:\n",
|
||||
" for action in problem.actions(node.state):\n",
|
||||
" child = node.child(problem, action)\n",
|
||||
" if child.state not in explored:\n",
|
||||
" explored.add(child.state)\n",
|
||||
" result = recursive_depth_first_search(problem, child, explored)\n",
|
||||
" if result is not None:\n",
|
||||
" return result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# GPX - Coyote Century\n",
|
||||
"\n",
|
||||
"I have tracks from the 2012 Century Juliet and I did, but for some reason I didn't get time stamps (nor elevation), just lat/lon. I need timestamps to import into Strava or Garmin. \n",
|
||||
"\n",
|
||||
"So I'll take my average speed (12.3 mph) and start time (6/14/2012 8:12) and assign time stamps, assuming (incorrectly) a constant speed. Then I'll write the whole thing as a file in GPX format:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 72,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import time\n",
|
||||
"import re\n",
|
||||
"import math\n",
|
||||
"\n",
|
||||
"text = open('/Users/pnorvig/Downloads/vrp9cybkx_Coyote-Century-Juliet-and-Peter.gpx').read()\n",
|
||||
"points = re.findall('<trkpt lat=\"(.*?)\" lon=\"(.*?)\">', text)\n",
|
||||
"\n",
|
||||
"gpx_template = \"\"\"\n",
|
||||
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n",
|
||||
"<gpx xmlns=\"http://www.topografix.com/GPX/1/1\" \n",
|
||||
" creator=\"KML2GPX.COM\" version=\"1.1\" \n",
|
||||
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" \n",
|
||||
" xsi:schemaLocation=\"http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd\">\n",
|
||||
" <trk><name>Track 1</name><number>1</number>\n",
|
||||
" <name>Coyote Creek Century, Peter and Juliet</name>\n",
|
||||
" <trkseg>\n",
|
||||
" {}\n",
|
||||
" </trkseg>\n",
|
||||
" </trk>\n",
|
||||
"</gpx>\n",
|
||||
"</xml>\n",
|
||||
"\"\"\"\n",
|
||||
"\n",
|
||||
"trkpt_template = \"\"\"<trkpt lat=\"{}\" lon=\"{}\"><time>2012-06-14T{}Z</time></trkpt>\"\"\"\n",
|
||||
"\n",
|
||||
"speed = 12.3 # Overall speed of 12.3 MPH\n",
|
||||
"\n",
|
||||
"start = time.mktime((2012, 6, 14, 8, 12, 0, -1, -1, -1))\n",
|
||||
"\n",
|
||||
"def hms(t): return time.strftime('%H:%M:%S', time.localtime(t))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 28,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(2816, ('37.310427', '-121.843023'), ('37.310474', '-121.843086'))"
|
||||
]
|
||||
},
|
||||
"execution_count": 28,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"len(points), points[0], points[-1]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 42,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(1339686720.0, 'Thu Jun 14 08:12:00 2012', '08:12:00')"
|
||||
]
|
||||
},
|
||||
"execution_count": 42,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"(start, \n",
|
||||
" time.asctime(time.localtime(start)), \n",
|
||||
" hms(start))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 15,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from math import radians, cos, sin, asin, sqrt\n",
|
||||
"def haversine(lat1, lon1, lat2, lon2):\n",
|
||||
" \"\"\"\n",
|
||||
" Calculate the great circle distance between two points \n",
|
||||
" on the earth (specified in decimal degrees)\n",
|
||||
" \"\"\"\n",
|
||||
" # convert decimal degrees to radians \n",
|
||||
" lon1, lat1, lon2, lat2 = [radians(float(x)) for x in [lon1, lat1, lon2, lat2]]\n",
|
||||
" # haversine formula \n",
|
||||
" dlon = lon2 - lon1 \n",
|
||||
" dlat = lat2 - lat1 \n",
|
||||
" a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2\n",
|
||||
" c = 2 * asin(sqrt(a)) \n",
|
||||
" return 3963.1676 * c # Radius of Earth in miles\n",
|
||||
"\n",
|
||||
"def pairs(sequence):\n",
|
||||
" return [sequence[i:i+2] for i in range(len(sequence) - 1)]\n",
|
||||
"\n",
|
||||
"assert pairs('abcd') == ['ab', 'bc', 'cd']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 37,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"0.005787321233984945"
|
||||
]
|
||||
},
|
||||
"execution_count": 37,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"d = haversine(\"37.310427\", \"-121.843023\", \"37.31041\", \"-121.843126\")\n",
|
||||
"d"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 36,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"100.09224214050319"
|
||||
]
|
||||
},
|
||||
"execution_count": 36,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"sum(haversine(a, b, c, d) for ((a, b), (c, d)) in pairs(points))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 39,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"1.693850117263886"
|
||||
]
|
||||
},
|
||||
"execution_count": 39,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"d / speed * 60 * 60"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 44,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"'08:14:49'"
|
||||
]
|
||||
},
|
||||
"execution_count": 44,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"hms(start + d / speed * 60 * 60)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 54,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def insert_times(points):\n",
|
||||
" (lat, lon), time = points[0], start\n",
|
||||
" for (lat2, lon2) in points:\n",
|
||||
" d = haversine(lat, lon, lat2, lon2) \n",
|
||||
" time += d / speed * 60 * 60\n",
|
||||
" yield trkpt_template.format(lat2, lon2, hms(time))\n",
|
||||
" lat, lon = lat2, lon2"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 73,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n",
|
||||
"<gpx xmlns=\"http://www.topografix.com/GPX/1/1\" \n",
|
||||
" creator=\"KML2GPX.COM\" version=\"1.1\" \n",
|
||||
" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" \n",
|
||||
" xsi:schemaLocation=\"http://www.topografix.com/GPX/1/1 http://www.topografix.com/GPX/1/1/gpx.xsd\">\n",
|
||||
" <trk><name>Track 1</name><number>1</number>\n",
|
||||
" <name>Coyote Creek Century, Peter and Juliet</name>\n",
|
||||
" <trkseg>\n",
|
||||
" <trkpt lat=\"37.310427\" lon=\"-121.843023\"><time>2012-06-14T08:12:00Z</time></trkpt>\n",
|
||||
" <trkpt lat=\"37.31041\" lon=\"-121.843126\"><time>2012-06-14T08:12:01Z</time></trkpt>\n",
|
||||
" <trkpt lat=\"37.310461\" lon=\"-121.84309700000001\"><time>2012-06-14T08:12:02Z</time></trkpt>\n",
|
||||
" <trkpt lat=\"37.310313\" lon=\"-121.84318\"><time>2012-06-14T08:12:06Z</time></trkpt>\n",
|
||||
" <trkpt lat=\"37.310198\" lon=\"-121.843346\"><time>2012-06-14T08:12:09Z</time></trkpt>\n",
|
||||
" </trkseg>\n",
|
||||
" </trk>\n",
|
||||
"</gpx>\n",
|
||||
"</xml>\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def convert(points):\n",
|
||||
" records = '\\n '.join(insert_times(points))\n",
|
||||
" return gpx_template.format(records)\n",
|
||||
" \n",
|
||||
"print(convert(points[:5]))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 74,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"256252"
|
||||
]
|
||||
},
|
||||
"execution_count": 74,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"open('coyote_creek_century.gpx', 'w').write(convert(points))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "raw",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"REVISE amendment = re + sight\n",
|
||||
"RESEARCH analysis = re + find \n",
|
||||
"REFRACT bending light = re + portion\n",
|
||||
"RELIABLE dependable = re + susceptible\n",
|
||||
"RENEG do something again = re + plea\n",
|
||||
"REDO echo = re + action word\n",
|
||||
"RENOUNCE give up = re + endorse\n",
|
||||
"RESOURCE good = re + origin\n",
|
||||
"RECONSTITUTE make whole = re + health\n",
|
||||
"RELOCATE move = re + find\n",
|
||||
"REVEAL find again = re + hide\n",
|
||||
"REPAIR fix = re + two things\n",
|
||||
"RECREATION pastime = re + bringing into existence\n",
|
||||
"REFINE purify = re + exact\n",
|
||||
"REPERCUSSION ramification = re + impact\n",
|
||||
"RECYCLE salvage = re + periodicity\n",
|
||||
"REPURPOSE sell = re + end\n",
|
||||
"RESUBMIT state briefly = re + surrender\n",
|
||||
"RECOLLECT summon a memory = re + amass\n",
|
||||
"REMEMBER summon a memory = re + section\n",
|
||||
"RECALL summon a memory = re + shout\n",
|
||||
"REPRESENT symbolize = re + give\n",
|
||||
"RELATE tell = re + tardy"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
3706
xkcd1313-part2.ipynb
Normal file
File diff suppressed because one or more lines are too long
1220
xkcd1313.ipynb
Normal file
File diff suppressed because it is too large
Load Diff
428
xkxd-part3.ipynb
Normal file
@ -0,0 +1,428 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def partition(covers):\n",
|
||||
" # covers: {w: {r,...}}\n",
|
||||
" # invcovers: {r: {w,...}}\n",
|
||||
" pass\n",
|
||||
"\n",
|
||||
"def connected(w, covers, invcovers, result):\n",
|
||||
" if w not in result:\n",
|
||||
" result.add(w)\n",
|
||||
" for r in covers[w]:\n",
|
||||
" for w2 in invcovers[r]:\n",
|
||||
" connected(w2, covers, invcovers, result)\n",
|
||||
" return result\n",
|
||||
"\n",
|
||||
"for (W, L, legend) in ALL:\n",
|
||||
" covers = eliminate_dominated(regex_covers(W, L))\n",
|
||||
" invcovers = invert_multimap(covers)\n",
|
||||
" start = list(covers)[2]\n",
|
||||
" P = connected(start, covers, invcovers, set())\n",
|
||||
"    print(legend, len(P), len(covers), len(covers)-len(P))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finding Shorter Regexes: Trying Multiple Times\n",
|
||||
"----\n",
|
||||
" \n",
|
||||
"Why run just two versions of `findregex`? Why not run 1000 variations, and then pick the best solution? Of course, I don't want to write 1000 different functions by hand; I want an automated way of varying each run. I can think of three easy things to vary:\n",
|
||||
" \n",
|
||||
"* The number '4' in the `score` function. That is, vary the tradeoff between number of winners matched and number of characters.\n",
|
||||
"* The tie-breaker. In case of a tie, Python's `max` function always picks the first one. Let's make it choose a different 'best' regex from among all those that tie.\n",
|
||||
"* The greediness. Don't be so greedy (picking the best) every time. Occasionally pick a not-quite-best component, and see if that works out better.\n",
|
||||
" \n",
|
||||
"The first of these is easy; we just use the `random.choice` function to choose an integer, `K`, to serve as the tradeoff factor. \n",
|
||||
"\n",
|
||||
"The second is easy too. We could write an alternative to the `max` function, say `max_random_tiebreaker`. That would work, but an easier approach is to build the tiebreaker into the `score` function. In addition to awarding points for matching winners and deducting points for the number of characters, we will add in a tiebreaker: a random number between 0 and 1. Since all the scores are otherwise integers, this will not change the order of the scores, but it will break ties.\n",
|
||||
"\n",
|
||||
"The third we can accomplish by allowing the random factor to be larger than 1 (allowing us to pick a component that is not the shortest) or even larger than `K` (allowing us to pick a component that does not cover the most winners). \n",
|
||||
" \n",
|
||||
"I will factor out the function `greedy_search` to do a single computation of a covering regex, while keeping the name `findregex` for the top-level function that now calls `greedy_search` 1000 times and chooses the best (shortest length) result."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"def findregex(winners, losers, tries=1000):\n",
|
||||
" \"Find a regex that matches all winners but no losers (sets of strings).\"\n",
|
||||
"    # Call 'greedy_search' the given number of tries; pick the shortest result\n",
|
||||
" covers = regex_covers(winners, losers)\n",
|
||||
" results = [greedy_search(winners, covers) for _ in range(tries)]\n",
|
||||
" return min(results, key=len)\n",
|
||||
"\n",
|
||||
"def greedy_search(winners, covers):\n",
|
||||
" # On each iteration, add the 'best' component in covers to 'result',\n",
|
||||
" # remove winners covered by best, and remove from 'pool' any components\n",
|
||||
" # that no longer match any remaining winners.\n",
|
||||
" winners = set(winners) # Copy input so as not to modify it.\n",
|
||||
" pool = set(covers)\n",
|
||||
" result = []\n",
|
||||
" \n",
|
||||
" def matches(regex, strings): return {w for w in covers[regex] if w in strings}\n",
|
||||
" \n",
|
||||
" K = random.choice((2, 3, 4, 4, 5, 6))\n",
|
||||
" T = random.choice((1., 1.5, 2., K+1., K+2.))\n",
|
||||
" def score(c): \n",
|
||||
" return K * len(matches(c, winners)) - len(c) + random.uniform(0., T)\n",
|
||||
" \n",
|
||||
" while winners:\n",
|
||||
" best = max(pool, key=score)\n",
|
||||
" result.append(best)\n",
|
||||
" winners -= covers[best]\n",
|
||||
" pool -= {c for c in pool if covers[c].isdisjoint(winners)}\n",
|
||||
" return OR(result)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def factorial1(n):\n",
|
||||
" if (n <= 1):\n",
|
||||
" return 1\n",
|
||||
" else:\n",
|
||||
" return n * factorial1(n-1)\n",
|
||||
"\n",
|
||||
"def factorial2(n, partial_solution=1):\n",
|
||||
" if (n <= 1):\n",
|
||||
" return partial_solution\n",
|
||||
" else:\n",
|
||||
" return factorial2(n-1, n * partial_solution)\n",
|
||||
" \n",
|
||||
"assert factorial1(6) == factorial2(6) == 720"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def findregex(winners, losers, calls=100000):\n",
|
||||
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
|
||||
" covers = regex_covers(winners, losers)\n",
|
||||
" best = '^(' + OR(winners) + ')$'\n",
|
||||
" state = Struct(best=best, calls=calls)\n",
|
||||
" return bb_search('', covers, state).best\n",
|
||||
"\n",
|
||||
"def bb_search(regex, covers, state):\n",
|
||||
" \"\"\"Recursively build a shortest regex from the components in covers.\"\"\"\n",
|
||||
" if state.calls > 0:\n",
|
||||
" state.calls -= 1\n",
|
||||
" regex, covers = simplify_covers(regex, covers)\n",
|
||||
" if not covers:\n",
|
||||
" state.best = min(regex, state.best, key=len)\n",
|
||||
" elif len(OR2(regex, min(covers, key=len))) < len(state.best):\n",
|
||||
" # Try with and without the greedy-best component\n",
|
||||
" def score(c): return 4 * len(covers[c]) - len(c)\n",
|
||||
" best = max(covers, key=score)\n",
|
||||
" covered = covers[best]\n",
|
||||
" covers.pop(best)\n",
|
||||
" bb_search(OR2(regex, best), {c:covers[c]-covered for c in covers}, state)\n",
|
||||
" bb_search(regex, covers, state)\n",
|
||||
" return state\n",
|
||||
"\n",
|
||||
"class Struct(object):\n",
|
||||
" \"A mutable structure with specified fields and values.\"\n",
|
||||
" def __init__(self, **kwds): vars(self).update(kwds)\n",
|
||||
" def __repr__(self): return '<%s>' % vars(self)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def findregex(winners, losers, calls=100000):\n",
|
||||
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
|
||||
" covers = regex_covers(winners, losers)\n",
|
||||
" solution = '^(' + OR(winners) + ')$'\n",
|
||||
" solution, calls = bb_search('', covers, solution, calls)\n",
|
||||
" return solution\n",
|
||||
"\n",
|
||||
"def bb_search(regex, covers, solution, calls):\n",
|
||||
" \"\"\"Recursively build a shortest regex from the components in covers.\"\"\"\n",
|
||||
" if calls > 0:\n",
|
||||
" calls -= 1\n",
|
||||
" regex, covers = simplify_covers(regex, covers)\n",
|
||||
" if not covers: # Solution is complete\n",
|
||||
" solution = min(regex, solution, key=len)\n",
|
||||
" elif len(OR2(regex, min(covers, key=len))) < len(solution):\n",
|
||||
" # Try with and without the greedy-best component\n",
|
||||
" def score(c): return 4 * len(covers[c]) - len(c)\n",
|
||||
" r = max(covers, key=score) # Best component\n",
|
||||
" covered = covers[r] # Set of winners covered by r\n",
|
||||
" covers.pop(r)\n",
|
||||
" solution, calls = bb_search(OR2(regex, r), \n",
|
||||
" {c:covers[c]-covered for c in covers}, \n",
|
||||
" solution, calls)\n",
|
||||
" solution, calls = bb_search(regex, covers, solution, calls)\n",
|
||||
" return solution, calls"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def findregex(winners, losers, calls=100000):\n",
|
||||
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
|
||||
" global SOLUTION, CALLS\n",
|
||||
" SOLUTION = '^(' + OR(winners) + ')$'\n",
|
||||
" CALLS = calls\n",
|
||||
" return bb_search(None, regex_covers(winners, losers))\n",
|
||||
"\n",
|
||||
"def bb_search(regex, covers):\n",
|
||||
" \"\"\"Recursively build a shortest regex from the components in covers.\"\"\"\n",
|
||||
" global SOLUTION, CALLS\n",
|
||||
" CALLS -= 1\n",
|
||||
" regex, covers = simplify_covers(regex, covers)\n",
|
||||
" if not covers: # Solution is complete\n",
|
||||
" SOLUTION = min(regex, SOLUTION, key=len)\n",
|
||||
" elif CALLS >= 0 and len(OR(regex, min(covers, key=len))) < len(SOLUTION):\n",
|
||||
" # Try with and without the greedy-best component\n",
|
||||
" def score(c): return 4 * len(covers[c]) - len(c)\n",
|
||||
" r = max(covers, key=score) # Best component\n",
|
||||
" covered = covers[r] # Set of winners covered by r\n",
|
||||
" covers.pop(r)\n",
|
||||
" bb_search(OR(regex, r), {c:covers[c]-covered for c in covers})\n",
|
||||
" bb_search(regex, covers)\n",
|
||||
" return SOLUTION\n",
|
||||
" \n",
|
||||
"def OR(*regexes):\n",
|
||||
" \"OR together regexes. Ignore 'None' components.\"\n",
|
||||
" return '|'.join(r for r in regexes if r is not None)\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def invert_multimap(multimap):\n",
|
||||
" result = collections.defaultdict(list)\n",
|
||||
" for key in multimap:\n",
|
||||
" for val in multimap[key]:\n",
|
||||
" result[val].append(key)\n",
|
||||
" return result"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## For debugging\n",
|
||||
"\n",
|
||||
"def findregex(winners, losers, calls=100000):\n",
|
||||
" \"Find the shortest disjunction of regex components that covers winners but not losers.\"\n",
|
||||
" solution = '^(' + OR(winners) + ')$'\n",
|
||||
" covers = regex_covers(winners, losers)\n",
|
||||
" b = BranchBound(solution, calls)\n",
|
||||
" b.search(None, covers)\n",
|
||||
"    print(b.calls, 'calls', len(b.solution), 'len')\n",
|
||||
" return b.solution\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"def triage_covers(partial, covers):\n",
|
||||
" \"Simplify covers by eliminating dominated regexes, and picking ones that uniquely cover a winner.\"\n",
|
||||
" previous = None\n",
|
||||
" while covers != previous:\n",
|
||||
" previous = covers\n",
|
||||
" # Eliminate regexes that are dominated by another regex\n",
|
||||
" covers = eliminate_dominated(covers) # covers = {regex: {winner,...}}\n",
|
||||
" coverers = invert_multimap(covers) # coverers = {winner: {regex,...}}\n",
|
||||
" # For winners covered by only one component, move winner from covers to regex\n",
|
||||
" singletons = {coverers[w][0] for w in coverers if len(coverers[w]) == 1}\n",
|
||||
" if singletons:\n",
|
||||
" partial = OR(partial, OR(singletons))\n",
|
||||
" covered = {w for c in singletons for w in covers[c]}\n",
|
||||
" covers = {c:covers[c]-covered for c in covers if c not in singletons}\n",
|
||||
" return partial, covers\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
", and to , who suggested looking at [WFSTs](http://www.openfst.org/twiki/bin/view/FST/WebHome)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def regex_covers(winners, losers):\n",
|
||||
" \"\"\"Generate regex components and return a dict of {regex: {winner...}}.\n",
|
||||
" Each regex matches at least one winner and no loser.\"\"\"\n",
|
||||
" losers_str = '\\n'.join(losers)\n",
|
||||
" wholes = {'^'+winner+'$' for winner in winners}\n",
|
||||
" parts = {d for w in wholes for p in subparts(w) for d in dotify(p)}\n",
|
||||
" chars = set(cat(winners))\n",
|
||||
" pairs = {A+'.'+rep_char+B for A in chars for B in chars for rep_char in '+*?'}\n",
|
||||
" reps = {r for p in parts for r in repetitions(p)}\n",
|
||||
" pool = wholes | parts | pairs | reps \n",
|
||||
" searchers = [re.compile(c, re.MULTILINE).search for c in pool]\n",
|
||||
" covers = {r: set(filter(searcher, winners)) \n",
|
||||
" for (r, searcher) in zip(pool, searchers)\n",
|
||||
" if not searcher(losers_str)}\n",
|
||||
" covers = eliminate_dominated(covers)\n",
|
||||
" return covers\n",
|
||||
" return add_character_class_components(covers)\n",
|
||||
"\n",
|
||||
"def add_character_class_components(covers):\n",
|
||||
" for (B, Ms, E) in combine_splits(covers):\n",
|
||||
" N = len(Ms)\n",
|
||||
" or_size = N*len(B+'.'+E) + N-1 # N=3 => 'B1E|B2E|B3E'\n",
|
||||
" class_size = len(B+'[]'+E) + N # N=3 => 'B[123]E'\n",
|
||||
" winners = {w for m in Ms for w in Ms[m]}\n",
|
||||
" if class_size < or_size:\n",
|
||||
" covers[B + make_char_class(Ms) + E] = winners\n",
|
||||
" return covers\n",
|
||||
"\n",
|
||||
"def split3(word):\n",
|
||||
" \"Splits a word into 3 parts, all ways, with middle part having 0 or 1 character.\"\n",
|
||||
" return [(word[:i], word[i:i+L], word[i+L:]) \n",
|
||||
" for i in range(len(word)+1) for L in (0, 1)\n",
|
||||
" if not word[i:i+L].startswith(('.', '+', '*', '?'))]\n",
|
||||
"\n",
|
||||
"def combine_splits(covers):\n",
|
||||
"    \"Convert covers = {BME: {w...}} into a list of [(B, {M: {w...}}, E)].\"\n",
|
||||
" table = collections.defaultdict(dict) # table = {(B, E): {M: {w...}}}\n",
|
||||
" for r in covers:\n",
|
||||
" for (B, M, E) in split3(r):\n",
|
||||
" table[B, E][M] = covers[r]\n",
|
||||
" return [(B, Ms, E) for ((B, E), Ms) in table.items()\n",
|
||||
" if len(Ms) > 1]\n",
|
||||
"\n",
|
||||
"def make_char_class(chars):\n",
|
||||
" chars = set(chars)\n",
|
||||
" return '[%s]%s' % (cat(chars), ('?' if '' in chars else ''))\n",
|
||||
"\n",
|
||||
"covers = regex_covers(boys, girls)\n",
|
||||
"old = set(covers)\n",
|
||||
"print(len(covers))\n",
|
||||
"covers = add_character_class_components(covers)\n",
|
||||
"print(len(covers))\n",
|
||||
"print(set(covers) - old)\n",
|
||||
"\n",
|
||||
"print(combine_splits({'..a': {1,2,3}, '..b': {4,5,6}, '..c':{7}}))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Consider the two components `'..a'` and `'..b'`. If we wanted to cover all the winners that both of these match, we could use `'..a|..b'`, or we could share the common prefix and introduce a *character class* to get `'..[ab]'`. Since the former is 7 characters and the latter is only 6, the latter would be preferred. It would be an even bigger win to replace `'..az|..bz|..cz'` with `'..[abc]z'`; that reduces the count from 14 to 8. Similarly, replacing `'..az|..bz|..z'` with `'..[ab]?z'` saves 5 characters.\n",
|
||||
"\n",
|
||||
"There seems to be potential savings with character classes. But how do we know which characters from which components to combine into classes? To keep things from getting out of control, I'm going to only look at components that are left after we eliminate the dominated ones. That is not an ideal approach: there may well be some components that are dominated on their own, but could be part of an optimal solution when combined with other components into a character class. But I'm going to keep it simple."
|
||||
]
|
||||
},
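The character counts quoted above can be checked with a small sketch. The helper names `or_form` and `class_form` are hypothetical (they are not part of the notebook), and the sketch assumes each "middle" is either a single literal character or the empty string:

```python
def or_form(prefix, middles, suffix):
    "Spell out every alternative in full: 'B1E|B2E|B3E'."
    return '|'.join(prefix + m + suffix for m in middles)

def class_form(prefix, middles, suffix):
    "Share prefix and suffix via a character class: 'B[123]E'."
    chars = ''.join(m for m in middles if m)      # non-empty middles go inside [...]
    optional = '?' if '' in middles else ''       # an empty middle makes the class optional
    return prefix + '[' + chars + ']' + optional + suffix

# The three examples from the text: (prefix, middles, suffix)
examples = [('..', ['a', 'b'], ''),
            ('..', ['a', 'b', 'c'], 'z'),
            ('..', ['a', 'b', ''], 'z')]

for (B, Ms, E) in examples:
    long_form, short_form = or_form(B, Ms, E), class_form(B, Ms, E)
    print(long_form, len(long_form), '->', short_form, len(short_form))
```

The three lines printed show lengths 7 vs. 6, 14 vs. 8, and 13 vs. 8, matching the savings described in the text.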
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"Searching: Better Bounds\n",
|
||||
"----\n",
|
||||
"\n",
|
||||
"Branch and bound prunes the search tree whenever it is on a branch that is guaranteed to result in a solution that is no better than the best solution found so far. Currently we estimate the best possible solution along the current branch by taking the length of the partial solution and adding the length of the shortest component in `covers`. We do that because we know for sure that we need at least one component, but we don't know for sure how many components we'll need (nor how long each of them will be). So our estimate often severely underestimates the true answer, which means we don't cut off search in some places where we could, if only we had a better estimate.\n",
|
||||
" \n",
|
||||
"Here's one way to get a better bound. We'll define the following quantities:\n",
|
||||
"\n",
|
||||
"+ *P* = the length of the partial solution, plus the \"|\", if needed. So if the partial solution is `None`, then *P* will be zero, otherwise *P* is the length plus 1.\n",
|
||||
"+ *S* = the length of the shortest regex component in `covers`.\n",
|
||||
"+ *W* = the number of winners still in `covers`.\n",
|
||||
"+ *C* = the largest number of winners covered by any regex in `covers`.\n",
|
||||
"\n",
|
||||
"If we assume The current estimate is *P* + *S*. We can see that a better estimate is *P* + *S* × ceil(*W* / *C*)."
|
||||
]
|
||||
},
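To see how much tighter the new bound can be, here is a small sketch that computes both estimates for one set of covers. The function name `lower_bounds` and the example `covers` dict (with placeholder winner names) are assumptions made for illustration. The sketch assumes, as in the rest of the notebook, that `covers` maps each regex component to the non-empty set of winners it matches, and that Python 3 true division is in effect.

```python
import math

def lower_bounds(covers, partial=None):
    """Return (old, new) lower bounds on the length of any solution on this branch.
    Hypothetical helper; covers maps component -> non-empty set of winners."""
    P = 0 if not partial else len(partial) + 1    # partial solution, plus '|'
    S = min(len(r) for r in covers)               # shortest component
    W = len(set().union(*covers.values()))        # winners still to cover
    C = max(len(ws) for ws in covers.values())    # most winners any component covers
    return P + S, P + S * math.ceil(W / C)

# Made-up covers: six winners, no component covers more than two of them.
covers = {'ab':  {'w1', 'w2'},
          'cd.': {'w3', 'w4'},
          'e':   {'w5'},
          'fgh': {'w1', 'w6'}}

old, new = lower_bounds(covers, partial='xy')
print(old, new)
```

Here *P* = 3, *S* = 1, *W* = 6 and *C* = 2, so the old estimate is 4 and the new one is 3 + 1 × ceil(6 / 2) = 6. Any branch whose best known solution is 5 or 6 characters long would be pruned by the new bound but not by the old one.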
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"collapsed": false
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import math\n",
|
||||
"\n",
|
||||
"class BranchBound(object):\n",
|
||||
" \"Hold state information for a branch and bound search.\"\n",
|
||||
" def __init__(self, solution, calls):\n",
|
||||
" self.solution, self.calls = solution, calls\n",
|
||||
" \n",
|
||||
" def search(self, covers, partial=None):\n",
|
||||
" \"Recursively extend partial regex until it matches all winners in covers.\"\n",
|
||||
" if self.calls <= 0: \n",
|
||||
" return self.solution\n",
|
||||
" self.calls -= 1\n",
|
||||
" covers, partial = simplify_covers(covers, partial)\n",
|
||||
" if not covers: # Nothing left to cover; solution is complete\n",
|
||||
" self.solution = min(partial, self.solution, key=len)\n",
|
||||
" else:\n",
|
||||
" P = 0 if not partial else len(partial) + 1\n",
|
||||
" S = len(min(covers, key=len))\n",
|
||||
" C = max(len(covers[r]) for r in covers)\n",
|
||||
" W = len(set(w for r in covers for w in covers[r]))\n",
|
||||
" if P + S * math.ceil(W / C) < len(self.solution):\n",
|
||||
" # Try with and without the greedy-best component\n",
|
||||
" def score(r): return 4 * len(covers[r]) - len(r)\n",
|
||||
" r = max(covers, key=score) # Best component\n",
|
||||
" covered = covers[r] # Set of winners covered by r\n",
|
||||
" covers.pop(r)\n",
|
||||
" self.search({c:covers[c]-covered for c in covers}, OR(partial, r))\n",
|
||||
" self.search(covers, partial)\n",
|
||||
" return self.solution"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.5.0"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|