{ "cells": [ { "cell_type": "markdown", "id": "828d21f7-0a3b-4024-b3d5-306ea56a3214", "metadata": {}, "source": [ "
Peter Norvig
\n", "\n", "# (How to Write a (Lisp) Interpreter (in Python))\n", "\n", "This notebook describes how (and why) to implement computer language interpreters in general, and in particular an interpreter for most of the [**Scheme**](https://www.scheme.org/) dialect of [**Lisp**](https://en.wikipedia.org/wiki/Lisp_(programming_language%29). I call my language and interpreter **Lispy** because it is Lisp implemented in Python. \n", "\n", "Why should interpreters and compilers matter to you? As [Steve Yegge said](https://steve-yegge.blogspot.com/2007/06/rich-programmer-food.html?), \"If you don't know how compilers work, then you don't know how computers work.\" Yegge describes 8 problems that can be solved with compilers (or equally well with interpreters, or with Yegge's typical heavy dosage of cynicism).\n", "\n", "## Syntax and Semantics of Programs\n", "\n", "The **syntax** of a language is the arrangement of characters to form correct statements or expressions. For example, in the language of mathematical expressions (and in many programming languages and handheld calculators), the syntax for computing one plus two is \"1 + 2\". The **semantics** of a language determines what it means: what computations it describes, and ultimately what answer(s) it computes. We say that \"1 + 2\" *evaluates* to 3, and write that as \"1 + 2\" ⇒ 3. \n", "\n", "If you are familiar with languages such as Python or Java, you may find Scheme syntax to be unusual. Consider:\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
**Java:**\n", "\n", "```java\n", "if (x.val() > 0) {\n",
    "    return fn(A[i] + 3 * i,\n",
    "              new String[] {\"one\", \"two\"});\n",
    "}\n", "```\n", "\n", "**Lisp:**\n", "\n", "```scheme\n", "(if (> (val x) 0)\n",
    "    (fn (+ (aref A i) (* 3 i))\n",
    "        (quote (one two))))\n", "```\n", "
\n", "\n", "Java has a wide variety of syntactic conventions (keywords, infix operators, four kinds of brackets, operator precedence, dot notation, quotes, commas, semicolons), but Scheme syntax is much simpler:\n", "\n", "- Scheme programs consist solely of expressions; there is no statement/expression distinction.\n", "- Numbers (e.g. `1`) and symbols (e.g. `A`) are called **atomic expressions**; they cannot be broken into pieces. These are similar to their Java counterparts, except that in Scheme, operators such as `+` and `>` are symbols too, and are treated the same way as `A` and `fn`.\n", "- Everything else is a **list expression**: a \"(\", followed by zero or more expressions, followed by a \")\". The first element of the list expression determines what it means:\n", "    - A list starting with a keyword, e.g. `(if ...)`, is a **special form**; the meaning depends on the keyword.\n", "    - A list starting with a non-keyword, e.g. `(max x y)`, is a function call: the function `max` is applied to the arguments `x` and `y` to compute a value.\n", "\n", "The beauty of Scheme is that the full language only needs 5 keywords and 8 syntactic forms. In comparison, Python has 35 keywords and 110 syntactic forms, and Java has 50 keywords and 133 syntactic forms. All those parentheses may seem intimidating, but Scheme syntax has the virtues of simplicity and consistency. (Some have joked that \"Lisp\" stands for \"**L**ots of **I**rritating **S**illy **P**arentheses\"; I think it stands for \"**L**isp **I**s **S**yntactically **P**ure\".)\n", "\n", "\n", "# Language 1: Lispy Calculator\n", "\n", "We won't tackle all of Scheme right away; instead we'll start with a subset of Scheme I call **Lispy Calculator**. Lispy Calculator lets you do any computation you could do on a typical calculator, as long as you are comfortable with prefix notation. And while many calculators let you store and retrieve a value from a fixed set of registers (e.g. 
A, B, or C), Lispy Calculator lets you define and use any number of variables with any names you choose. Here's an example program that computes the area of a circle of radius 10, using the formula π r²:\n", "\n", "    (begin\n", "      (define r 10)\n", "      (* pi (* r r)))\n", "\n", "\n", "Here is a table of all the allowable expressions in the Lispy Calculator language. In the Syntax column of this table, *symbol* must be a symbol, *number* must be an integer or floating point number, and the other italicized words can be any expression. The notation *exp...* means zero or more repetitions of *exp*.\n", "\n", "|Expression\t|Syntax\t|Example|Semantics|\n", "|-----------|-------|-------|-------------|\n", "|constant \t|*number*\t|`12` or `-3.45e+6`|A number evaluates to itself.|\n", "|variable |\t*symbol*\t|`r`|A symbol is interpreted as a variable name; its value is the variable's value.|\n", "|definition\t|`(define` *symbol exp*`)`\t|`(define r 10)`|Define a new variable and give it the value of the expression *exp*.|\n", "|procedure call\t|`(`*proc exp*...`)`\t|`(sqrt (* 2 8))` ⇒ 4.0|Proc's value (a function) is applied to the argument values.|\n", "\n", "\n", "Let's get some imports out of the way, and be explicit about how Scheme objects are represented in Python:" ] }, { "cell_type": "code", "execution_count": 1, "id": "e1c3cef0-d091-433e-a2c0-0959df3cee0d", "metadata": {}, "outputs": [], "source": [ "from numbers import Number\n", "import math\n", "import operator as op\n", "\n", "Symbol = str # A Scheme symbol is implemented as a Python str\n", "Atom = Symbol | Number # A Scheme atom is a Symbol or Number\n", "List = list # A Scheme list is implemented as a Python list\n", "Exp = Atom | List # A Scheme expression is an Atom or List\n", "Env = dict # A Scheme environment is a dictionary mapping of {variable: value}" ] }, { "cell_type": "markdown", "id": "b5c1ff1c-15e1-47d2-bfc8-a008ecc5ff13", "metadata": {}, "source": [ "## The core of Lisp: eval \n", "\n", "Here is 
the core of the interpreter, `eval`. It takes as input an expression, `exp`, and an **environment** that specifies the values of variables. It returns the value of the expression. These few lines are what [Alan Kay called](https://queue.acm.org/detail.cfm?id=1039523) \"the Maxwell's Equations of Software.\"\n" ] }, { "cell_type": "code", "execution_count": 2, "id": "3b081873-6ae7-4d73-830d-c441cd196cbc", "metadata": {}, "outputs": [], "source": [ "def eval(exp: Exp, env: Env) -> object:\n", "    \"\"\"Evaluate an expression in an environment.\"\"\"\n", "    match exp:\n", "        case Number(): # number evaluates to itself \n", "            return exp\n", "        case Symbol(): # variable evaluates to its value in environment\n", "            return env[exp]\n", "        case ('define', Symbol(name), val): # definition adds name to the environment\n", "            env[name] = eval(val, env)\n", "            return name\n", "        case (proc, *args): # procedure call\n", "            func = eval(proc, env)\n", "            vals = [eval(arg, env) for arg in args]\n", "            return func(*vals)" ] }, { "cell_type": "markdown", "id": "e9ae14da-9668-405a-8c5c-2a7186e98cb4", "metadata": {}, "source": [ "## Global Environment\n", "\n", "We mentioned that an **environment** is a mapping from variable names to their values. We will define a default global environment, `ENV`, containing values for the names of a bunch of standard functions like `sqrt` and `max`, as well as operators like `+` and `>`, which are implemented as procedures in Lisp. 
(Scheme's name for `print` is `display`; the function `unparse` will be defined later.)" ] }, { "cell_type": "code", "execution_count": 3, "id": "ce21a511-1089-4d70-9c9a-8825c3d63b17", "metadata": {}, "outputs": [], "source": [ "ENV = {\n", "    **vars(math), # sqrt, sin, cos, etc.\n", "    '+':op.add, '-':op.sub, '*':op.mul, '/':op.truediv, \n", "    '>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq, \n", "    'eq?': op.is_, 'equal?': op.eq,\n", "    'abs': abs,\n", "    'append': op.add, \n", "    'apply': lambda proc, args: proc(*args),\n", "    'begin': lambda *x: x[-1],\n", "    'cons': lambda x,y: [x] + y,\n", "    'display': lambda x: print(unparse(x)),\n", "    'expt': pow,\n", "    'first': lambda x: x[0],\n", "    'length': len, \n", "    'list': lambda *x: List(x), \n", "    'list?': lambda x: isinstance(x, list), \n", "    'map': lambda f, *args: list(map(f, *args)),\n", "    'max': max, \n", "    'min': min,\n", "    'not': op.not_,\n", "    'null?': lambda x: x == [], \n", "    'number?': lambda x: isinstance(x, Number), \n", "    'procedure?': callable,\n", "    'rest': lambda x: x[1:], \n", "    'round': round,\n", "    'symbol?': lambda x: isinstance(x, Symbol),\n", "    }" ] }, { "cell_type": "markdown", "id": "edafbbc3-ae0b-4705-859e-3fcf2f10fdad", "metadata": {}, "source": [ "(*Note:* because I am not implementing all the features of Scheme (such as [continuations](https://groups.csail.mit.edu/mac/projects/info/schemedocs/ref-manual/html/scheme_122.html)), I can get away with defining `begin` as a function rather than a special form.)\n", "\n", "## Parsing\n", "\n", "How do we get from a sequence of characters to the abstract syntax tree that `eval` expects? The function `parse` does the job, in two steps: \n", "1. **Lexical analysis**: the function `tokenize` breaks the characters into tokens (such as the keyword `\"if\"` or the number `\"10\"`).\n", "2. **Syntactic analysis**: the function `parse_tokens` converts the tokens into an expression."
] }, { "cell_type": "code", "execution_count": 4, "id": "209b7574-1cad-469e-b735-06f8e18d699d", "metadata": {}, "outputs": [], "source": [ "def parse(program: str) -> Exp:\n", "    \"\"\"Read a Scheme expression from a string.\n", "    First split the program into tokens, then read from the token list.\"\"\"\n", "    return parse_tokens(tokenize(program))" ] }, { "cell_type": "markdown", "id": "1b6a1fb3-4765-42c2-9278-a374fa30f5af", "metadata": {}, "source": [ "There are many tools for lexical analysis (such as Mike Lesk and Eric Schmidt's [lex](https://en.wikipedia.org/wiki/Lex_%28software%29)), most of which define tokens as a class containing a token kind and a token string. But Lisp is so simple that there are really only three types of tokens: left paren, right paren, and everything else. So `str.split` can do the job (with a little help):" ] }, { "cell_type": "code", "execution_count": 5, "id": "48875a7a-3c86-4322-9307-3592ab327924", "metadata": {}, "outputs": [], "source": [ "def tokenize(chars: str) -> list[str]:\n", "    \"\"\"Convert a string of characters into a list of tokens.\n", "    (Put spaces around parens, then split on whitespace.)\"\"\"\n", "    return chars.replace('(', ' ( ').replace(')', ' ) ').split()" ] }, { "cell_type": "markdown", "id": "64d2de32-6e7f-43ba-b6db-eb1eff397ff2", "metadata": {}, "source": [ "`parse_tokens` looks at the first token; if it is a `)`, that's a syntax error. If it is a `(`, then we start building up a list of sub-expressions until we hit a matching `)`. Any non-parenthesis token must be an atom: first try to interpret it as a number, and failing that, it must be a symbol.\n" ] }, { "cell_type": "code", "execution_count": 6, "id": "c2677c70-f08f-49ff-b167-0cb492ec124f", "metadata": {}, "outputs": [], "source": [ "def parse_tokens(tokens: list[str]) -> Exp:\n", "    \"\"\"Read an expression from a list of tokens, mutating the list.\"\"\"\n", "    if not tokens:\n", "        raise SyntaxError('unexpected end of expression')\n", "    match (token := tokens.pop(0)):\n", "        case ')':\n", "            raise SyntaxError('unexpected \")\"')\n", "        case '(':\n", "            result = []\n", "            while not tokens or tokens[0] != ')': # if tokens run out, the recursive call raises SyntaxError\n", "                result.append(parse_tokens(tokens))\n", "            tokens.pop(0) # pop off the closing ')'\n", "            return result\n", "        case _:\n", "            try:\n", "                n = float(token)\n", "                return int(n) if n.is_integer() else n\n", "            except ValueError:\n", "                return token # symbol" ] }, { "cell_type": "markdown", "id": "a8191faa-a8ad-41dc-a1fa-a3e805b24a28", "metadata": {}, "source": [ "## Homoiconicity\n", "\n", "One of the defining features of Lisp is [**homoiconicity**](https://en.wikipedia.org/wiki/Homoiconicity), a fancy word derived from the Greek for \"same representation\" that refers to a language in which the internal representation of a program is the same as the external representation. In other words, a Lisp program is just a list, so all the functions for reading, manipulating and printing lists apply equally to programs. 
We've already defined `parse` to convert a string into an internal representation; now we'll define `unparse` to reverse the process:" ] }, { "cell_type": "code", "execution_count": 7, "id": "62c5a9f4-5969-4646-ac64-fbc34027cadc", "metadata": {}, "outputs": [], "source": [ "def unparse(exp: Exp) -> str:\n", "    \"Convert a Python object back into a Scheme-readable string.\"\n", "    match exp:\n", "        case List(): return '(' + ' '.join(map(unparse, exp)) + ')'\n", "        case _: return str(exp)" ] }, { "cell_type": "markdown", "id": "69764e26-0c13-4b05-86aa-cec9605865b6", "metadata": {}, "source": [ "We can see that `parse` and `unparse` are inverses (up to spacing and number formatting, since a token such as `2.0` is read as the int 2):" ] }, { "cell_type": "code", "execution_count": 8, "id": "9c5d310b-832e-494e-a141-b8eeaa221206", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'(lambda (A i) (fn (+ (aref A i) (* 3 i))))'" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unparse(parse(\"(lambda (A i) (fn (+ (aref A i) (* 3 i))))\"))" ] }, { "cell_type": "markdown", "id": "cf6b5b61-f523-486a-b600-7a9aadc4c824", "metadata": {}, "source": [ "Python has a module, `ast` (for \"abstract syntax tree\"), that makes it possible to manipulate programs, but Python is not homoiconic. 
The internal representation of a program is quite different from the external representation:" ] }, { "cell_type": "code", "execution_count": 9, "id": "0617fa61-7f37-4623-b431-fa44a3edb5e4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Module(\n", "    body=[\n", "        Expr(\n", "            value=Lambda(\n", "                args=arguments(\n", "                    args=[\n", "                        arg(arg='A'),\n", "                        arg(arg='i')]),\n", "                body=Call(\n", "                    func=Name(id='fn', ctx=Load()),\n", "                    args=[\n", "                        BinOp(\n", "                            left=Subscript(\n", "                                value=Name(id='A', ctx=Load()),\n", "                                slice=Name(id='i', ctx=Load()),\n", "                                ctx=Load()),\n", "                            op=Add(),\n", "                            right=BinOp(\n", "                                left=Constant(value=3),\n", "                                op=Mult(),\n", "                                right=Name(id='i', ctx=Load())))])))])\n" ] } ], "source": [ "import ast\n", "print(ast.dump(ast.parse(\"lambda A, i: fn(A[i] + 3 * i)\"), indent=4))" ] }, { "cell_type": "markdown", "id": "c568f646-4f7e-4c13-806d-5bfc7fc98933", "metadata": {}, "source": [ "## Batch Processing\n", "\n", "In **batch processing** an entire program is read, parsed, and evaluated in one step, with no human in the loop."
] }, { "cell_type": "code", "execution_count": 10, "id": "3a96b511-30da-407b-9137-4b79bc40c367", "metadata": {}, "outputs": [], "source": [ "def batch(program: str) -> None:\n", "    \"\"\"Parse the program, evaluate it, and print the result.\"\"\"\n", "    print(unparse(eval(parse(program), ENV)))" ] }, { "cell_type": "markdown", "id": "8aa89ec1-7f62-4797-8e70-da97e74a7b88", "metadata": {}, "source": [ "Here is a sample program and the result of running it using `batch`:" ] }, { "cell_type": "code", "execution_count": 11, "id": "69509c3a-2c1e-435e-8563-9a9a82cfc75f", "metadata": {}, "outputs": [], "source": [ "program1 = \"\"\"\n", "(begin \n", "  (define r 10)\n", "  (* pi (* r r))) \"\"\"" ] }, { "cell_type": "code", "execution_count": 12, "id": "6447f9b9-a3f8-4465-a735-1be81217b94d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "314.1592653589793\n" ] } ], "source": [ "batch(program1)" ] }, { "cell_type": "markdown", "id": "3ec8b0e9-e384-48fa-8539-2794114f2275", "metadata": {}, "source": [ "We can also see the intermediate steps of tokenizing and parsing the program:" ] }, { "cell_type": "code", "execution_count": 13, "id": "75485bb1-9625-470e-bfa2-3e3c39cb43c0", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['(',\n", " 'begin',\n", " '(',\n", " 'define',\n", " 'r',\n", " '10',\n", " ')',\n", " '(',\n", " '*',\n", " 'pi',\n", " '(',\n", " '*',\n", " 'r',\n", " 'r',\n", " ')',\n", " ')',\n", " ')']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tokenize(program1)" ] }, { "cell_type": "code", "execution_count": 14, "id": "1fba56d9-e97b-4919-9a51-31e94c535361", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse(program1)" ] }, { "cell_type": "markdown", "id": "ffe3866c-76ee-4a6e-b0d2-efb18b544c1e", "metadata": {}, "source": [ "## Interactive Processing\n", "\n", "One of Lisp's great legacies is the notion of an interactive loop: a way for a programmer to enter an expression, see the results, and then think of something new to try. This facilitates exploratory programming: instead of having to design every aspect of a complete program ahead of time, the programmer can experiment, learning as they go, step by step. So let's define the function `repl` (which stands for read-eval-print loop):" ] }, { "cell_type": "code", "execution_count": 15, "id": "037afa47-c35e-416f-affb-aa219cfff935", "metadata": {}, "outputs": [], "source": [ "def repl(prompt='\\nlispy> ') -> None:\n", "    \"\"\"A read-eval-print loop.\"\"\"\n", "    print('lispy> read-eval-print loop – Type exit to exit')\n", "    while (expr := input(prompt)) != 'exit':\n", "        batch(expr)" ] }, { "cell_type": "markdown", "id": "28cccf9e-0cf3-426b-9f12-61ac165f1e31", "metadata": {}, "source": [ "Here is an example run of `repl()` (the `if` line assumes the Language 2 version of `eval`, defined below, since Lispy Calculator has no `if`):\n", "    \n", "    lispy> read-eval-print loop – Type exit to exit\n", "    lispy> (define r 10)\n", "    r\n", "    \n", "    lispy> (* pi (* r r))\n", "    314.1592653589793\n", "    \n", "    lispy> (if (> (* 11 11) 120) (* 7 6) oops)\n", "    42\n", "    \n", "    lispy> (list (+ 1 1) (+ 2 2) (* 2 3) (expt 2 3) (expt 2 (expt 2 2)))\n", "    (2 4 6 8 16)\n", "\n", "    lispy> exit\n", "\n", "You can experiment with `repl()` yourself by deleting the `#` and running the following cell. I left it commented out so that when you do the \"Run All Cells\" command on this notebook, it runs all the way through, without pausing to ask you to type in some Scheme expressions."
] }, { "cell_type": "code", "execution_count": 16, "id": "5286fcff-0152-4da7-8b70-d5ba1883ac4d", "metadata": {}, "outputs": [], "source": [ "#repl()" ] }, { "cell_type": "markdown", "id": "ed98aea1-201a-4532-9cf4-56aff0ba74e6", "metadata": {}, "source": [ "# Language 2: Full Lispy\n", "\n", "We will now extend our language with four new special forms, and one variant of the old `define` special form. This gives us a much more nearly complete Scheme subset:\n", "\n", "|Expression\t|Syntax| Example|\tSemantics|\n", "|-----------|------|--------|------------|\n", "|conditional\t|`(if` *test then_part else_part*`)`|`(if (< x 0) (- x) x)`|\tEvaluate *test*; if true, evaluate and return *then_part*; otherwise evaluate and return *else_part*.|\n", "|quotation\t|`(quote` *exp*`)`| `(quote (+ 1 2))` ⇒ `(+ 1 2)`|\tReturn the exp literally; do not evaluate it.|\n", "|assignment\t|`(set!` *symbol exp*`)`| `(set! r2 (* r r))`|\tEvaluate *exp* and assign that value to *symbol*.|\n", "|procedure\t|`(lambda (`*symbol...*`)` *exp*`)`|`(lambda (r) (* pi (* r r)))`|\tCreate an anonymous procedure.|\n", "|definition | `(define (`*symbol*...`)` *body*`)`|`(define (add1 x) (+ x 1))`| Define a named procedure.|\n", "\n", "- The `if` special form is similar to the `(x if test else y)` syntax in Python, although only `False` counts as false in Scheme.\n", "- The `quote` special form allows you to build a literal data structure.\n", "- The `set!` special form allows you to update the value of a previously defined variable. This is different from `define`, which introduces a new variable in the current environment (and sets its initial value).\n", "- The `lambda` special form (an obscure name from Alonzo Church's [lambda calculus](https://en.wikipedia.org/wiki/Lambda_calculus)) creates a procedure (without giving it a name).\n", "- The new option for `define` is just a shortcut for a regular `define` of a `lambda` expression.\n", "\n", "There are two equivalent ways of defining a procedure and giving it a name. 
Consider:\n", "\n", "    (define (circle-area r) (* pi (* r r)))\n", "\n", "    (define circle-area (lambda (r) (* pi (* r r))))\n", "\n", "Either way, `circle-area` is defined to take as its value a procedure that refers to the global variables `pi` and `*`, and takes a single parameter, `r`. Now we can call the procedure like this:\n", "\n", "    (circle-area (+ 5 5))\n", "\n", "We want this call to return the value of `(* pi (* r r))` with `r` set to 10. But it wouldn't do to set `r` to 10 in the global environment. What if we were using `r` for some other purpose? Instead, we want to arrange for there to be a **local variable** named `r` that is only used during this call to `circle-area`. The process for calling a procedure introduces these new local variable(s), binding each symbol in the parameter list of the function to the corresponding value in the argument list of the function call. In this case, the result of the call is 314.1592653589793.\n", " \n", "\n", "## Local Variables and Procedures\n", "\n", "To handle local variables, we will **nest** environments. Local variables are defined in an environment that is \"inside\" another environment. We will use the convention that `env['_outer']` refers to the outer environment of the nested environment `env`. When we evaluate `(circle-area (+ 5 5))`, we will fetch the procedure body, `(* pi (* r r))`, and evaluate it in an environment that has `r` as the sole local variable (with value 10), but also has the global environment as the `_outer` environment; it is there that we will find the values of `*` and `pi`. In the diagram, the outer (global) environment is shown first and the inner (local) environment second:\n", "\n", "
\n", "    outer (global) environment:\n", "        pi: 3.141592653589793\n", "        *: <built-in function mul>\n", "        ...\n", "\n", "    inner (local) environment, whose _outer is the global environment:\n", "        r: 10\n", "
\n", "\n", "When we look up a variable in such a nested environment, we look first at the innermost level, but if we don't find the variable name there, we move to the next outer level. \n", "\n", "Here is the definition of the `Procedure` class:" ] }, { "cell_type": "code", "execution_count": 17, "id": "1d9cc9fb-32df-41dc-9c6c-863c7a5ba5a6", "metadata": {}, "outputs": [], "source": [ "from dataclasses import dataclass\n", "\n", "@dataclass\n", "class Procedure(object):\n", "    \"\"\"A user-defined Scheme procedure.\"\"\"\n", "    parms: list[Symbol]\n", "    body: Exp\n", "    env: Env\n", "    def __call__(self, *args) -> object: \n", "        env = Env(zip(self.parms, args), _outer=self.env)\n", "        return eval(self.body, env)\n", "    def __repr__(self) -> str: \n", "        return f''" ] }, { "cell_type": "markdown", "id": "7734c900-7d9c-4770-987a-30830989175d", "metadata": {}, "source": [ "We see that a procedure has three components: a list of parameter names, a body expression, and an environment that tells us what other variables are accessible from the body. For a procedure defined at the top level this will be the global environment, but it is also possible for a procedure to refer to the local variables of the environment in which it was defined (**not** the environment in which it is called).\n", "\n", "The function `find` is used to find the right environment for a variable: starting with the inner one and going out, find the first environment that mentions the variable name."
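, "\n", "As a minimal standalone sketch of this lookup rule, the two hand-built dictionaries below mirror the diagram above: `global_env` and `local_env` are hypothetical names, and `find` repeats the one-line definition from the next cell so the block runs on its own. The sketch shows shadowing (the local `r` hides a global `r`), outward search (for `pi`), how `set!` only writes into the environment that `find` returns, and that an unbound name surfaces as a `KeyError` for `'_outer'` rather than for the name itself.\n", "\n", "
```python
import math

# Hypothetical environments: a global (outer) env, and the local env for one
# call of circle-area, linked to its parent through the '_outer' key.
global_env = {'pi': math.pi, 'r': 'a global r'}
local_env = {'r': 10, '_outer': global_env}


def find(var: str, env: dict) -> dict:
    # Walk outward via '_outer' until some environment contains var.
    return env if (var in env) else find(var, env['_outer'])


print(find('r', local_env)['r'])    # 10: the local r shadows the global r
print(find('pi', local_env)['pi'])  # 3.141592653589793, found in the outer env

# set! writes into whichever environment find returns (here, the local one).
find('r', local_env)['r'] = 20
print(local_env['r'], global_env['r'])  # 20 a global r

# An unbound variable runs off the global env, which has no '_outer' key.
try:
    find('x', local_env)
except KeyError as err:
    print('KeyError:', err)  # KeyError: '_outer'
```
"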
] }, { "cell_type": "code", "execution_count": 18, "id": "22fd9960-b288-4a01-94fe-faab8e8da8bf", "metadata": {}, "outputs": [], "source": [ "def find(var: Symbol, env: Env) -> Env:\n", "    \"\"\"Find the environment that contains the variable `var`.\"\"\"\n", "    return env if (var in env) else find(var, env['_outer'])" ] }, { "cell_type": "markdown", "id": "3d0599c6-3a35-4fd8-b179-c970e8a36c4d", "metadata": {}, "source": [ "To see how these all go together, here is the new definition of `eval`:" ] }, { "cell_type": "code", "execution_count": 19, "id": "afa69b35-bac3-459c-a0cc-b94656ab474c", "metadata": {}, "outputs": [], "source": [ "def eval(exp: Exp, env: Env) -> object:\n", "    \"\"\"Evaluate an expression in an environment.\"\"\"\n", "    match exp:\n", "        case Symbol(): # variable reference\n", "            return find(exp, env)[exp]\n", "        case Number(): # constant \n", "            return exp\n", "        case ('if', test, then_, else_): # conditional evaluates one branch or the other\n", "            branch = (else_ if eval(test, env) is False else then_)\n", "            return eval(branch, env)\n", "        case ('define', (name, *parms), body): # procedure definition\n", "            env[name] = eval(['lambda', parms, body], env)\n", "            return name\n", "        case ('define', Symbol(name), val): # regular definition\n", "            env[name] = eval(val, env)\n", "            return name\n", "        case ('quote', constant): # constant expression\n", "            return constant\n", "        case ('set!', symbol, val): # variable assignment\n", "            find(symbol, env)[symbol] = eval(val, env)\n", "            return None\n", "        case ('lambda', parms, body): # create an anonymous procedure\n", "            return Procedure(parms, body, env)\n", "        case (proc, *args): # procedure call\n", "            func = eval(proc, env)\n", "            vals = [eval(arg, env) for arg in args]\n", "            return func(*vals)" ] }, { "cell_type": "markdown", "id": "73750dec-a8e0-41fb-83ae-3c7740103659", "metadata": {}, "source": [ "For example:" ] }, { "cell_type": "code", "execution_count": 20, "id": "fb398ac0-cd7f-45e2-9a8c-1ea15bc7b4e6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "314.1592653589793\n" ] } ], "source": [ "batch(\"\"\"\n", "(begin \n", "  (define (circle-area r) (* pi (* r r)))\n", "  (circle-area (+ 5 5)))\"\"\")" ] }, { "cell_type": "markdown", "id": "1a2ccfcd-326b-44f0-b041-da657dc042a0", "metadata": {}, "source": [ "We now have a language with variables, conditionals, sequential execution, and procedures with recursive calls. That makes our language Turing-complete. If you are familiar with other languages, you might think that a `while` or `for` loop would be needed, but Scheme manages to do without these just fine. In Scheme you iterate by defining recursive functions. The Scheme report says \"Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language.\" \n", "\n", "## How Small/Complete/Good/Fast is Lispy?\n", " \n", "In which we judge Lispy on several criteria:\n", "- **Small**: Lispy is *very* small: about 120 lines or 4K of source code. (An earlier version was just 90 lines, but was perhaps a bit too terse.) The smallest version of my Scheme in Java, [Jscheme](http://norvig.com/jscheme.html), was 1664 lines and 57K of source. Jscheme was originally called SILK (Scheme in Fifty Kilobytes), but I only kept under that limit by counting bytecode rather than source code. Lispy does much better; I think it meets Alan Kay's 1972 [claim](http://gagne.homedns.org/~tgagne/contrib/EarlyHistoryST.html) that *you could define the \"most powerful language in the world\" in \"a page of code.\"* (if you use a small font). However, I think Alan would disagree, because he would count the Python compiler as part of the code, putting me well over a page.\n", "- **Complete**: Lispy is not very complete compared to the [Scheme standard](https://standards.scheme.org/). 
Some major shortcomings:\n", "    - **Syntax**: Missing comments, quote and quasiquote notation, `#` literals, the derived\n", "      expression types (such as `cond` and `let`), and dotted list notation.\n", "    - **Semantics**: Missing `call/cc` and tail recursion. \n", "    - **Data Types**: Missing strings, characters, booleans, ports,\n", "      vectors, exact/inexact numbers. A Scheme list should actually be a custom data class, not a Python list.\n", "    - **Procedures**: Missing over 100 primitive procedures.\n", "    - **Error recovery**: Lispy does not attempt to detect,\n", "      reasonably report, or recover from errors. Lispy expects the\n", "      programmer to be perfect. \n", "- **Good**: That's up to the readers to decide. I think that Lispy is good for my purpose of explaining Lisp interpreters. It is not a viable choice for serious software development.\n", "- **Fast**: Lispy computes `(factorial 100)` in less than a millisecond. That's fast enough for me.\n" ] }, { "cell_type": "code", "execution_count": 21, "id": "5bcc6a7f-261b-4156-92d3-c771ecfcd109", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000\n", "CPU times: user 395 μs, sys: 200 μs, total: 595 μs\n", "Wall time: 521 μs\n" ] } ], "source": [ "%%time\n", "batch(\"\"\"\n", "(begin \n", "  (define (factorial n) \n", "    (if (<= n 1) \n", "        1 \n", "        (* n (factorial (- n 1)))))\n", "  (factorial 100))\"\"\")" ] }, { "cell_type": "markdown", "id": "802789ab-2f1c-4fb0-a79b-c7621a851e81", "metadata": {}, "source": [ "## More Example Lisp Programs" ] }, { "cell_type": "code", "execution_count": 22, "id": "f8728be6-2423-487f-a44a-49a50411d363", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(count 3 4)\n" ] } ], "source": [ "batch(\"\"\"\n", "(list \n", "  (define (count item L) \n", "    (if (null? L)\n", "        0\n", "        (+ (equal? item (first L)) (count item (rest L)))))\n", "  (count 0 (list 0 1 2 3 0 0))\n", "  (count (quote the) (quote (the more the merrier the bigger the better))))\"\"\")" ] }, { "cell_type": "code", "execution_count": 23, "id": "af58a444-c684-481c-a74b-1ad6bb49459b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(square compose repeat 10 81 65536 2.0)\n" ] } ], "source": [ "batch(\"\"\"\n", "(list\n", "  (define (square x) (* x x))\n", "  (define (compose f g) (lambda (x) (f (g x))))\n", "  (define (repeat f) (compose f f))\n", "  ((compose round sqrt) 101)\n", "  ((repeat square) 3)\n", "  ((repeat (repeat square)) 2)\n", "  ((repeat (repeat sqrt)) (pow 2 16)))\"\"\")" ] }, { "cell_type": "code", "execution_count": 24, "id": "5dba7073-33d3-4f39-a192-5e0ccdc42f8a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765)\n" ] } ], "source": [ "batch(\"\"\"\n", "(begin\n", "  (define (fib n) \n", "    (if (< n 2) \n", "        1 \n", "        (+ (fib (- n 1)) (fib (- n 2)))))\n", "  (define (range start stop)\n", "    (if (= start stop) \n", "        (quote ()) \n", "        (cons start (range (+ start 1) stop))))\n", "  (map fib (range 0 20)))\"\"\")" ] }, { "cell_type": "markdown", "id": "5f7fcfb8-ef93-4dfb-af8b-43030ab74eaa", "metadata": {}, "source": [ "## True Story\n", "\n", "To back up the idea that it can be helpful to know how\n", "interpreters work, here's a story. Way back in 1984 I was writing a\n", "Ph.D. thesis. This was before LaTeX, before Microsoft Word for Windows; we used\n", "[troff](https://en.wikipedia.org/wiki/Troff). Unfortunately, troff had no facility for forward references\n", "to symbolic labels: I wanted to be able to write \"As we will see on\n", "page @theorem-x\" and then write something like \"@(set theorem-x \\n%)\" in\n", "the appropriate place (the troff register \\n% holds the page number). My\n", "fellow grad student Tony DeRose felt the same need, and together we\n", "sketched out a simple Lisp program that would handle this as a preprocessor. However,\n", "it turned out that the Lisp we had at the time was good at reading\n", "Lisp expressions, but slow at reading 100 KB of characters one character at a time.\n", "\n", "From there Tony and I went our separate ways. He reasoned that the hard part was\n", "the interpreter for expressions; he needed Lisp for that, but he knew\n", "how to write a tiny C routine\n", "for reading the characters one at a time, and how to link it into the Lisp\n", "program. I didn't know how to do that linking, but I reasoned that writing an\n", "interpreter for this trivial language (all it had was set variable,\n", "fetch variable, and string concatenate) was easy, so I wrote an\n", "interpreter in C. So, ironically, Tony wrote a Lisp program (with one small routine in C) because he was a\n", "C programmer, and I wrote a C program (that implements a hand-coded mini-interpreter) because I was a Lisp programmer.\n", "\n", "In the end, we both got our theses done (Tony, Peter).\n", "\n", "
## Further Reading
\n", "\n", "Years ago, I showed how to write a semi-practical, nearly complete Scheme interpreter (one in [Java](https://norvig.com/jscheme.html) and one in [Common Lisp](https://github.com/norvig/paip-lisp/blob/main/docs/chapter22.md)). I also have another page describing a more advanced version of Lispy.\n", " \n", "To learn more about Scheme, consult some of the fine books (by\n", " Friedman\n", " and Felleisen,\n", " Dybvig,\n", " Queinnec,\n", " Harvey and\n", " Wright or\n", " Sussman and Abelson),\n", " videos (by Abelson\n", " and Sussman),\n", " tutorials (by\n", " Dorai,\n", " PLT, or\n", " Neller),\n", " or the\n", " reference\n", " manual.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.9" } }, "nbformat": 4, "nbformat_minor": 5 }