Add files via upload
This commit is contained in:
parent e52ad0503f
commit ba60638bb5

Snobol.ipynb (new file, 209 lines)

@@ -0,0 +1,209 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Bad Grade, Good Experience\n",
"\n",
"Recently I was asked a question I hadn't thought about before: \n",
"\n",
"> *As a student, did you ever get a bad grade on a programming assignment?* \n",
"\n",
"It has been a long time since college so I've forgotten many of my assignments, but there is one I do remember. It was something like this:\n",
"\n",
"## The Concordance Assignment\n",
"\n",
"> *Using the [`Snobol`](http://www.snobol4.org/) language, read lines of text from the standard input and print a **concordance**, which is an alphabetized list of words in the text, with the line number(s) where each word appears. Words with different capitalization (like \"A\" and \"a\") should be merged into one entry.*\n",
"\n",
"After studying Snobol a bit, I realized that the intended solution was along these lines:\n",
"\n",
"1. Create an empty `dict`; let's call it `entries`.\n",
"2. Read the lines of text (tracking the line numbers), split them into words, and build up the string of line numbers for each word with something like\n",
"`entries[word] += line_number + \", \"`\n",
"3. Convert the `dict` into a two-dimensional `array` where each row has the two columns `[word, line_numbers]`.\n",
"4. Write a function to sort the array alphabetically (`sort` was not built into Snobol).\n",
"5. Write a function to print the array.\n",
"\n",
"That would probably be about 40 lines of code (a rough sketch appears a few cells below); an easy task, the main point of which was just to get familiar with a new and different programming language. But I noticed three interesting things about Snobol:\n",
"\n",
"* There is an *indirection* operator, `$`, so the two statements `word = \"A\"; $word = \"1\"` are equivalent to `A = \"1\"`.\n",
"* Uninitialized variables are treated as the empty string, so `$word += \"1\"` works even if we haven't seen `$word` before.\n",
"* When the program ends, the Snobol interpreter automatically prints the values of every variable, sorted alphabetically, as a debugging aid.\n",
"\n",
"That means I can do away with the `dict` and the `array`, eliminating steps 1, 3, 4, and 5, and just do step 2! My entire program is as follows (to make it easier to understand I use Python syntax, except for Snobol's `$word` indirection):"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"program = \"\"\"\n",
"for num, line in enumerate(input):\n",
"    for word in re.findall(r\"\\w+\", line.upper()):\n",
"        $word += str(num) + \", \"\n",
"\"\"\""
]
},
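{
"cell_type": "markdown",
"metadata": {},
"source": [
"For comparison, here is roughly what the intended five-step solution sketched above might have looked like, written as a plain-Python sketch rather than actual Snobol (the function name `concordance` and the variables `entries` and `rows` are just illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import re\n",
"\n",
"def concordance(lines):\n",
"    \"\"\"A sketch of the intended five-step solution: build a dict of\n",
"    word -> line-number string, turn it into rows, sort them, print them.\"\"\"\n",
"    entries = {}                                        # step 1: an empty dict\n",
"    for num, line in enumerate(lines, 1):               # step 2: accumulate line numbers\n",
"        for word in re.findall(r\"\\w+\", line.upper()):\n",
"            entries[word] = entries.get(word, '') + str(num) + \", \"\n",
"    rows = [[word, entries[word]] for word in entries]  # step 3: two-column rows\n",
"    rows.sort()                                         # step 4: sort alphabetically\n",
"    for word, line_numbers in rows:                     # step 5: print\n",
"        print('{:10} = {}'.format(word, line_numbers))"
]
},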
{
"cell_type": "markdown",
"metadata": {},
"source": [
"My program is 3 lines, not 40, and it works! Actually, it wasn't quite perfect, because the post-mortem printout included the values of the variables `num`, `line`, and `word`. So, reluctantly, I had to increase the program's line count by 33%:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"program = \"\"\"\n",
"for num, line in enumerate(input):\n",
"    for word in re.findall(r\"\\w+\", line.upper()):\n",
"        $word += str(num) + \", \"\n",
"del num, line, word\n",
"\"\"\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To prove it works, I'll write a mock Snobol/Python interpreter, which is really just a call to Python's built-in `exec`, except that it does three Snobol-ish things: (1) the `context` in which the program is executed is a `defaultdict(str)`, which means that variables default to the empty string; the context is pre-loaded with builtins. (2) Any `$word` in the program is rewritten as `_context[word]`, which enables indirection. (3) After the `exec` completes, the user-defined variables (but not the built-in ones) and their values are printed."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from collections import defaultdict\n",
"import re\n",
"\n",
"def snobol(program, data=''):\n",
"    \"\"\"A Python interpreter with three Snobol-ish features:\n",
"    (1) variables default to '', (2) $word indirection, (3) post-mortem printout.\"\"\"\n",
"    context = defaultdict(str, vars(__builtins__))\n",
"    context.update(_context=context, re=re, input=data.splitlines())\n",
"    builtins = set(context)  # The builtin functions won't appear in the printout\n",
"    program = re.sub(r'\\$(\\w+)', r'_context[\\1]', program)  # \"$word\" => \"_context[word]\"\n",
"    exec(program, context)\n",
"    print('-' * 50)\n",
"    for name in sorted(context):\n",
"        if name not in builtins and not name.startswith('_'):\n",
"            print('{:10} = {}'.format(name, context[name]))"
]
},
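{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before running my program on real data, here is a tiny, hypothetical check of the two features described earlier, `$` indirection and the empty-string default, using the mock interpreter (the post-mortem printout should list `A = 1, 2, ` and `word = A`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"demo = \"\"\"\n",
"word = 'A'\n",
"$word += '1, '   # with $ indirection, this is A += '1, '; A defaults to ''\n",
"$word += '2, '\n",
"\"\"\"\n",
"\n",
"snobol(demo)"
]
},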
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can run my program on some data:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--------------------------------------------------\n",
"A          = 1, \n",
"AND        = 2, 4, \n",
"DIDDY      = 1, 1, 1, 2, 2, 2, \n",
"DO         = 1, 1, 2, 2, \n",
"DOWN       = 1, \n",
"DUM        = 1, 2, \n",
"FEET       = 2, \n",
"FINE       = 3, 3, 4, \n",
"FINGERS    = 2, \n",
"GOOD       = 3, 3, 4, \n",
"HER        = 2, 2, \n",
"I          = 4, \n",
"JUST       = 1, \n",
"LOOKED     = 3, 3, 3, 3, 4, 4, \n",
"LOST       = 4, \n",
"MIND       = 4, \n",
"MY         = 4, \n",
"NEARLY     = 4, \n",
"SHE        = 1, 3, 3, 4, 4, \n",
"SHUFFLIN   = 2, \n",
"SINGIN     = 1, 2, \n",
"SNAPPIN    = 2, \n",
"STREET     = 1, \n",
"THE        = 1, \n",
"THERE      = 1, \n",
"WAH        = 1, 2, \n",
"WALKIN     = 1, \n",
"WAS        = 1, \n"
]
}
],
"source": [
"data = \"\"\"\n",
"There she was just a-walkin' down the street, singin' \"Do wah diddy diddy dum diddy do\"\n",
"Snappin' her fingers and shufflin' her feet, singin' \"Do wah diddy diddy dum diddy do\"\n",
"She looked good (looked good), she looked fine (looked fine)\n",
"She looked good, she looked fine and I nearly lost my mind\n",
"\"\"\"\n",
"\n",
"snobol(program, data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks good to me! But sadly, the grader did not agree, complaining that my program was not extensible: what if I wanted to cover two or more files in one run? What if I wanted the output to have a slightly different format? I argued that [YAGNI](https://en.wikipedia.org/wiki/You_aren%27t_gonna_need_it), but the grader wasn't impressed so I got points taken off.\n",
"\n",
"# TFW you flunk AI\n",
"\n",
"Here's another example that I had completely forgotten about until 2016, when I was cleaning out some old files and came across my old college transcript. It turns out that *I flunked an AI course!* (Or at least, didn't complete it.) This course was offered by Prof. Richard Millward in the Cognitive Science program. I certainly remember a lot of influential material from this class: we read David Marr, we read Winston's just-published *Psychology of Computer Vision*, we read a chapter from Duda and Hart which was then only a few years old. The things I learned in that course have stuck with me for decades, but the course itself apparently didn't stick: my transcript says I never completed it! I'm not sure what happened. I did an independent study with Ulf Grenander that semester; maybe I had to drop the AI course to fit that in. I don't know. \n",
"\n",
"So in both the concordance program and the Cognitive Science AI class, I had a great experience and I learned a lot, even if it wasn't well reflected in official credit. The moral is: look for the good experiences, and don't worry about the official credit.\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}