<div align="right" style="text-align:right"><i>Peter Norvig<br>May 2020</i></div>

# Equilength Number Expressions

The internet [claims](https://www.reddit.com/r/Showerthoughts/comments/3h8wpx/four_is_the_only_number_that_has_the_same_amount/):

> ***Four is the only number that has the same amount of letters as its value.*** 

We'll call "*four*" an **equilength number**. Languages other than English have different equilength numbers. In Italian there's "tre", in German "vier", in Spanish and Portuguese "cinco", and in Hanyu Pinyin Chinese "èr" and "sān". French has no equilength number. 

There are also **equilength number expressions** such as "two plus nine": its value is 11 and there are 11 letters in the expression (spaces and hyphens don't count). What other integers besides 11 have equilength expressions? In English and in other languages? This notebook will partially answer these questions.  
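The equilength test just compares a value with a letter count. Here is a standalone sketch of the counting rule, which mirrors the `lettercount` helper defined below. A character counts only if `str.isalpha()` is true, so spaces, hyphens, and parentheses are skipped, while accented letters such as the "ā" in "sān" still count.

```python
def lettercount(exp: str) -> int:
    """Count only alphabetic characters; spaces, hyphens, parentheses are skipped."""
    return sum(ch.isalpha() for ch in exp)

for exp in ['four', 'two plus nine', 'twenty-one', 'sān']:
    print(f'{exp!r}: {lettercount(exp)} letters')
# 'four': 4 letters
# 'two plus nine': 11 letters   (value 2 + 9 = 11, so it is equilength)
# 'twenty-one': 9 letters
# 'sān': 3 letters
```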

# Defining the Language of Numbers

I start by defining the namedtuple class `Language` and the convenience function `language`, which will be used to succinctly declare the names of arithmetic operators and numbers. Use `language` like this:

    mini = language('Mini', {add: 'plus, and', mul: 'times'}, 
                    'zero, one, two, three, four')
    
This defines a language named `'Mini'` with two `operators`, addition (which can be spelled two ways, as `'plus'` or `'and'`) and multiplication (which can only be spelled `'times'`); and with five `integers`, denoted consecutively starting at zero. 

# The ExpTable Data Structure

Internally, the key data structure is something I'll call an expression table or `ExpTable`: a dict where each value is an *expression*: a string such as `"zero"` or `"(two plus nine)"`; and each key is a tuple of two integers: the numeric value of the expression and the number of letters in the expression. The integers for the mini language defined above form this table:

    {(0, 4): 'zero', (1, 3): 'one', (2, 3): 'two', (3, 5): 'three', (4, 4): 'four'}
    
The key `(0, 4)` means that the expression `'zero'` has the value `0` and has `4` letters. I arrange things this way because I want to eventually build up equilength expressions, ones where the key is `(i, i)` for as many integers `i` as possible, and to do so I only need one table entry for each `(value, letters)` combination.
Let's implement that:

In [1]:
from collections import namedtuple
from operator    import add, sub, mul, neg, truediv as div

In [2]:
ExpTable = dict # A mapping of {(value, count_of_letters): "expression"}

Language = namedtuple('Language', 'name, operators, integers')

def language(name, operators, integers) -> Language:
    """E.g., language('Mini', {add: 'plus, and', mul: 'times'}, 'zero, one, two')"""
    return Language(name, {op: split(operators[op]) for op in operators},
                    exptable(enumerate(split(integers))))
        
def exptable(items) -> ExpTable:
    """Convert an iterable of (value, "exp") pairs to {(value, letter_count): "exp"}"""
    return {(val, lettercount(exp)): exp for (val, exp) in items}

def lettercount(exp) -> int: return sum(ch.isalpha() for ch in exp)

def split(text, sep=',') -> list: 
    """Split text by `sep`, stripping whitespace."""
    return [word.strip() for word in text.split(sep)]
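Because the key is `(value, letter_count)`, two expressions with the same value and the same number of letters collapse into one entry, and the one seen last wins. This standalone sketch (copies of `lettercount` and `exptable` so it runs on its own) builds a table from three 11-letter expressions for 4 and one 9-letter expression:

```python
def lettercount(exp) -> int: return sum(ch.isalpha() for ch in exp)

def exptable(items) -> dict:
    """Same rule as above: key each expression by (value, letter_count)."""
    return {(val, lettercount(exp)): exp for (val, exp) in items}

table = exptable([(4, '(one and three)'),   # 11 letters
                  (4, '(three and one)'),   # 11 letters: same key, overwrites
                  (4, '(zero and four)'),   # 11 letters: same key, overwrites again
                  (4, '(two and two)')])    # 9 letters: a new key
print(table)  # → {(4, 11): '(zero and four)', (4, 9): '(two and two)'}
```

Losing the duplicates costs nothing here, since the search only needs some expression for each `(value, letters)` key.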

In [3]:
mini = language('Mini', {add: 'plus, and', mul: 'times'}, 
                'zero, one, two, three, four')

# Building Number Expressions

I want to follow an approach similar to what I did in the [Making Numbers: Four 4s, Five 5s](Countdown.ipynb) notebook: take two entries from a table (like `{(2, 3): 'two'}` and `{(3, 5): 'three'}`) along with an operator, and combine them to make a larger expression (`{(5, 12): '(two plus three)'}`). 

There is an infinite set of possible expressions, so I will focus on a finite subset of expressions that have a chance of being equilength (although I'm not afraid of making intermediate expressions that are not equilength).

I'll use the following strategy to build an `ExpTable` with `expressions(language, c)`:
  - Start with a copy of the table of integers from the language.
  - Repeat `c` times:
      - Add to the table all ways to `combine` an integer entry, an operator, and a table entry.
      - Add to the table all ways to `combine` a table entry, an operator, and an integer entry.
  - Finally, apply `equilength` to the table to pull out just the equilength expressions.
        
    
Note that this doesn't form all possible expressions: it gives me **branching** expression trees but not **bushy** ones. Consider the two trees below, each with eight leaves (integers) and seven internal nodes (operators):
- **Below left: a bushy tree** that could have been formed by `c=3` iterations of combining any two table entries, <br>e.g. `(((a+b)+(c+d))+((e+f)+(g+h)))`
- **Below right: a branching tree** formed by `c=7` iterations of combining an integer with a table entry, <br>e.g. 
`(h+((f+(e+(((a+b)+c)+d)))+g))`.


        /\        /\
       /  \        /\ 
      /\  /\      /\
     /\/\/\/\      /\
                    /\
                   /\
                  /\
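To get a feel for how much the branching restriction gives up, here is a small counting sketch (my own, not part of the search code). Binary trees of any shape with n leaves number the Catalan number C(n−1). Trees in which every operator node has at least one leaf child, the shape built by combining an integer with a table entry, satisfy a simple recurrence: the root joins one leaf, on the left or the right, to a smaller branching tree.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def all_shapes(n: int) -> int:
    """Number of binary expression trees (any shape) with n leaves: Catalan(n - 1)."""
    if n == 1:
        return 1
    # Split the n leaves between a left subtree (k leaves) and a right subtree.
    return sum(all_shapes(k) * all_shapes(n - k) for k in range(1, n))

def branching_shapes(n: int) -> int:
    """Number of trees with n leaves where every operator node has a leaf child."""
    if n <= 2:
        return 1
    # The root joins one leaf (on the left or the right) to a branching tree
    # with n - 1 leaves, so the count doubles with each leaf: 2 ** (n - 2).
    return 2 * branching_shapes(n - 1)

for n in range(2, 9):
    print(f'{n} leaves: {branching_shapes(n):3d} branching of {all_shapes(n):3d} shapes')
```

With eight leaves the last line reports 64 branching shapes out of 429, before even counting the choices of integers and operators at the nodes.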

Here is the code to create `expressions`:

In [4]:
def expressions(language, c=1) -> ExpTable:
    """Combine language integers with table entries c times."""
    _, ops, ints = language
    table = dict(ints) # Copy the language.integers table so we can modify it
    for i in range(c):
        table.update({**combine(ints,  ops, table), 
                      **combine(table, ops, ints)})
    return table

def combine(Ltable, operators, Rtable) -> ExpTable:
    """Return table like {(5, 12): "(two plus three)"} by combining table entries with ops."""
    return exptable((op(lv, rv), f'({Ltable[lv, ln]} {opname} {Rtable[rv, rn]})')
                    for (rv, rn) in Rtable
                    for op in operators
                    if not (rv == 0 and op == div) # Don't divide by zero
                    for (lv, ln) in Ltable
                    for opname in operators[op])

def equilength(table) -> dict:
    """Return only table expressions that evaluate to n and have n letters."""
    return {n: table[n, n] for (v, n) in sorted(table) if n == v}

In [5]:
expressions(mini, 0) # The table after zero iterations of `combine`

{(0, 4): 'zero', (1, 3): 'one', (2, 3): 'two', (3, 5): 'three', (4, 4): 'four'}

In [6]:
expressions(mini, 1) # The table after one iteration of `combine`

{(0, 4): 'zero',
 (1, 3): 'one',
 (2, 3): 'two',
 (3, 5): 'three',
 (4, 4): 'four',
 (0, 12): '(zero times two)',
 (0, 11): '(zero and zero)',
 (1, 11): '(one times one)',
 (1, 10): '(zero and one)',
 (2, 11): '(one times two)',
 (2, 10): '(zero and two)',
 (3, 13): '(one times three)',
 (3, 12): '(zero and three)',
 (4, 12): '(one times four)',
 (4, 11): '(zero and four)',
 (0, 13): '(zero times four)',
 (0, 14): '(zero times three)',
 (2, 9): '(one and one)',
 (3, 10): '(one plus two)',
 (3, 9): '(one and two)',
 (5, 11): '(one plus four)',
 (5, 10): '(one and four)',
 (4, 10): '(two plus two)',
 (4, 9): '(two and two)',
 (5, 12): '(two plus three)',
 (6, 11): '(two plus four)',
 (6, 10): '(two and four)',
 (6, 13): '(two times three)',
 (8, 12): '(two times four)',
 (6, 14): '(three plus three)',
 (7, 13): '(three plus four)',
 (7, 12): '(three and four)',
 (9, 15): '(three times three)',
 (12, 14): '(three times four)',
 (8, 11): '(four and four)',
 (16, 13): '(four times four)'}

In [7]:
equilength(expressions(mini, 3)) # The equilength expressions after 3 iterations of `combine`

{4: 'four',
 20: '((one plus four) times four)',
 24: '(((two and two) and two) times four)',
 25: '(((two and four) times four) and one)',
 26: '(((two and four) times four) plus two)',
 27: '(((one and four) and four) times three)',
 28: '(((zero and three) and four) times four)',
 29: '(((three plus four) times four) plus one)',
 30: '(((two times three) and four) times three)',
 31: '(((three plus four) times four) plus three)'}

The code above solves the problem, but I added the code below to make the solution look prettier:

In [8]:
def show(language, c):
    """Summarize and show the equilength expressions from this language."""
    table = expressions(language, c)
    equis = equilength(table)
    print(f'{language.name} has {len(equis)} equilengths: {describe(set(equis))}')
    print(f'     from {len(table):,d} table entries in {c} iterations') 
    return equis
    
def describe(numbers: set) -> str:
    """Describe a set of integers in a shorthand way.
    E.g. describe({1, 2, 3, 4, 5, 6, 86, 99}) => '1-6, 86, 99'."""
    formats = []
    M = max(numbers, default=0) + 2 # default=0 lets an empty set give ''
    while numbers:
        missing = next(i for i in range(min(numbers), M) if i not in numbers)
        g = set(range(min(numbers), missing)) # Group of consecutive numbers
        formats.append(f'{min(g)}' + (f'-{max(g)}' if len(g) > 1 else ''))
        numbers = numbers - g
    return ', '.join(formats)
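For comparison, here is an alternative sketch of `describe` using a common `itertools.groupby` idiom (my own variant, not the notebook's code). Within a run of consecutive integers, each value minus its position in sorted order is constant, so grouping on that difference splits the sorted numbers into runs. An empty set gives an empty string.

```python
from itertools import groupby

def describe_runs(numbers) -> str:
    """Alternative sketch: collapse runs of consecutive integers, e.g. '1-6, 86, 99'."""
    formats = []
    # Pair each sorted value with its position; value - position is constant within a run.
    for _, run in groupby(enumerate(sorted(numbers)), key=lambda pair: pair[1] - pair[0]):
        values = [v for _, v in run]
        lo, hi = values[0], values[-1]
        formats.append(f'{lo}' if lo == hi else f'{lo}-{hi}')
    return ', '.join(formats)

print(describe_runs({1, 2, 3, 4, 5, 6, 86, 99}))  # → 1-6, 86, 99
```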

# Mini Language Equilength Expressions

We'll see what we can do with just the mini language:

In [9]:
show(mini, 7)

Mini has 48 equilengths: 4, 20, 24-69
     from 32,517 table entries in 7 iterations


{4: 'four',
 20: '((one plus four) times four)',
 24: '(((two and two) and two) times four)',
 25: '(((two and four) times four) and one)',
 26: '(((two and four) times four) plus two)',
 27: '(((one and four) and four) times three)',
 28: '(((zero and three) and four) times four)',
 29: '(((three plus four) times four) plus one)',
 30: '(((two times three) and four) times three)',
 31: '(((three plus four) times four) plus three)',
 32: '((((one and one) and two) plus four) times four)',
 33: '((((one and two) and four) and four) times three)',
 34: '((((two and four) and four) times three) and four)',
 35: '((((two and two) plus four) times four) plus three)',
 36: '((((one times two) plus three) and four) times four)',
 37: '((((three and four) and four) times three) plus four)',
 38: '(((((two and two) and four) times four) and two) and four)',
 39: '(((((four and four) times four) and one) and two) and four)',
 40: '(((((zero and one) and one) and four) plus four) times four)',
 4

# Defining Multiple Languages

Now some real languages: English, Spanish, French, Italian, Pinyin Chinese and German, with integers from 0 to 30.

In [10]:
english = language('English', 
    {add: 'plus, and, added to', sub: 'minus, less, take away', mul: 'times, multiplied by',
    div: 'divided by, over', lambda x, y: y - x: 'subtracted from'},
    '''zero, one, two, three, four, five, six, seven, eight, nine, ten, eleven, 
    twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty,
    twenty-one, twenty-two, twenty-three, twenty-four, twenty-five, twenty-six,
    twenty-seven, twenty-eight, twenty-nine, thirty''')

spanish = language('Spanish', 
    {add: 'más', sub: 'menos', mul: 'por', div: 'dividido entre, dividido por'},
    '''cero, uno, dos, tres, cuatro, cinco, seis, siete, ocho, nueve, diez, once, doce, 
    trece, catorce, quince, dieciséis, diecisiete, dieciocho, diecinueve, veinte, veintiuno, 
    veintidós, veintitrés, veinticuatro, veinticinco, veintiséis, veintisiete, veintiocho,
    veintinueve, treinta''')

french = language('French',
    {add: 'plus, et', sub: 'moins', mul: 'multiplié par, fois', div: 'divisé par, sur'},
    '''zéro, un, deux, trois, quatre, cinq, six, sept, huit, neuf, dix, onze, douze, 
    treize, quatorze, quinze, seize, dix-sept, dix-huit, dix-neuf, vingt, vingt et un,
    vingt-deux, vingt-trois, vingt-quatre, vingt-cinq, vingt-six, vingt-sept, vingt-huit,
    vingt-neuf, trente''')

italian = language('Italian',
    {add: 'piu', sub: 'meno', mul: 'per', div: 'diviso'},
    '''zero, uno, due, tre, quattro, cinque, sei, sette, otto, nove, dieci, undici, dodici,
    tredici, quattrodici, quindici, sedici, dicisette, diciotto, dicinove, venti,
    ventiuno, ventidue, ventitre, ventiquattro, venticinque, ventisei, ventisette,
    ventotto, ventinove, trenta''')

chinese = language('Chinese',
    {add: 'jiā', sub: 'jiǎn', mul: 'chéng', div: 'chú, chú yǐ'},
    '''ling, yī, èr, sān, sì, wŭ, liù, qī, bā, jiŭ, shí, shí yī, shí èr, shí sān, 
    shí sì, shí wǔ, shí liù, shí qī, shí bā, shí jiŭ, èr shí, èr shí yī, èr shí èr, 
    èr shí sān, èr shí sì, èr shí wŭ, èr shí liù, èr shí qī, èr shí bā, èr shí jiŭ, sān shí''')

german = language('German',
    {add: 'und, plus', sub: 'weniger, minus', mul: 'mal, multipliziert', div: 'durch'},
    '''null, eins, zwei, drei, vier, fünf, sechs, sieben, acht, neun, zehn, elf, 
    zwölf, dreizehn, vierzehn, funfzehn, sechszehn, siebzehn, achtzehn, neunzehn, 
    zwanzig, einundzwanzig, zweiundzwanzig, dreiundzwanzig, vierundzwanzig,
    fünfundzwanzig, sechsundzwanzig, siebenundzwanzig, achtundzwanzig, 
    neunundzwanzig, dreiβig''')

languages = english, spanish, french, italian, chinese, german

# Equilength Numbers

Let's start with the equilength numbers for each language:

In [11]:
for L in languages:
    print(f'{L.name:7} {equilength(L.integers)}')

English {4: 'four'}
Spanish {5: 'cinco'}
French  {}
Italian {3: 'tre'}
Chinese {2: 'èr', 3: 'sān'}
German  {4: 'vier'}


I'm also interested in the average number of letters over the first 31 integers for each language:

In [12]:
for L in languages:
    avg = sum(n for (v, n) in L.integers) / len(L.integers)
    print(f'{L.name:7} {avg:.1f} letters/integer')

English 6.8 letters/integer
Spanish 7.0 letters/integer
French  6.2 letters/integer
Italian 6.7 letters/integer
Chinese 4.9 letters/integer
German  8.2 letters/integer


Now we'll look at each language in turn, seeing what equilength number expressions we can get in just a few seconds of computation, with `c=2` iterations of `combine`:

# English


In [13]:
show(english, 2)

English has 47 equilengths: 4, 10-54, 56
     from 418,515 table entries in 2 iterations


{4: 'four',
 10: '(zero and ten)',
 11: '(two plus nine)',
 12: '(one and eleven)',
 13: '(one plus twelve)',
 14: '(one and thirteen)',
 15: '(one times fifteen)',
 16: '(three and thirteen)',
 17: '(one times seventeen)',
 18: '((one and six) and eleven)',
 19: '((one less two) and twenty)',
 20: '((zero less ten) and thirty)',
 21: '((one minus ten) plus thirty)',
 22: '((one minus nine) plus thirty)',
 23: '((three minus ten) plus thirty)',
 24: '((ten less sixteen) plus thirty)',
 25: '((five times eleven) less thirty)',
 26: '((seven times eight) minus thirty)',
 27: '((nine divided by ten) times thirty)',
 28: '((two times twenty-nine) less thirty)',
 29: '((twenty-nine and thirty) less thirty)',
 30: '((ten less ten) subtracted from thirty)',
 31: '((nine less ten) subtracted from thirty)',
 32: '((eight less ten) subtracted from thirty)',
 33: '((seven minus ten) subtracted from thirty)',
 34: '((six take away ten) subtracted from thirty)',
 35: '((ten minus fifteen) subtracte

# Spanish

In [14]:
show(spanish, 2)

Spanish has 41 equilengths: 5, 10-12, 14-42, 44-46, 48-50, 52, 54
     from 220,630 table entries in 2 iterations


{5: 'cinco',
 10: '(uno por diez)',
 11: '(cero más once)',
 12: '(tres más nueve)',
 14: '(cero más catorce)',
 15: '(quince menos cero)',
 16: '(cero más dieciséis)',
 17: '(cero más diecisiete)',
 18: '((uno más dos) más quince)',
 19: '((uno más tres) más quince)',
 20: '((dos menos uno) por veinte)',
 21: '((uno por uno) por veintiuno)',
 22: '((dos menos diez) más treinta)',
 23: '((tres menos diez) más treinta)',
 24: '((seis por nueve) menos treinta)',
 25: '((diez menos quince) más treinta)',
 26: '((diez menos catorce) más treinta)',
 27: '((diez menos doce) más veintinueve)',
 28: '((doce menos trece) más veintinueve)',
 29: '((cuatro menos tres) por veintinueve)',
 30: '((veintiuno menos veinte) por treinta)',
 31: '((veinte menos diecinueve) más treinta)',
 32: '((treinta menos veintiocho) más treinta)',
 33: '((treinta menos veintisiete) más treinta)',
 34: '((veintiséis menos veintidós) más treinta)',
 35: '((veintiocho menos veintitrés) más treinta)',
 36: '((dieciocho 

# French

In [15]:
show(french, 2)

French has 40 equilengths: 7-42, 44-46, 48
     from 279,648 table entries in 2 iterations


{7: '(un et six)',
 8: '(un et sept)',
 9: '(neuf sur un)',
 10: '(un plus neuf)',
 11: '(trois et huit)',
 12: '(quatre et huit)',
 13: '((un et deux) et dix)',
 14: '(un fois quatorze)',
 15: '((un et deux) et douze)',
 16: '((un sur un) fois seize)',
 17: '((un fois un) plus seize)',
 18: '((un sur un) fois dix-huit)',
 19: '((dix moins onze) et vingt)',
 20: '((six sur neuf) fois trente)',
 21: '((deux moins onze) et trente)',
 22: '((huit moins seize) et trente)',
 23: '((dix moins dix-sept) et trente)',
 24: '((douze sur quinze) fois trente)',
 25: '((douze moins dix-sept) et trente)',
 26: '((quinze moins dix-neuf) et trente)',
 27: '((trois fois dix-neuf) moins trente)',
 28: '((vingt-huit fois trente) sur trente)',
 29: '((vingt moins vingt et un) plus trente)',
 30: '((trente moins vingt-neuf) fois trente)',
 31: '((vingt-neuf moins vingt-huit) et trente)',
 32: '((seize sur quinze) multiplié par trente)',
 33: '((vingt-deux divisé par vingt) fois trente)',
 34: '((dix-sept s

# Italian

In [16]:
show(italian, 2)

Italian has 32 equilengths: 3, 9-13, 15-40
     from 127,985 table entries in 2 iterations


{3: 'tre',
 9: '(tre piu sei)',
 10: '(uno piu nove)',
 11: '(uno piu dieci)',
 12: '(uno per dodici)',
 13: '(uno per tredici)',
 15: '(zero piu quindici)',
 16: '(quattro piu dodici)',
 17: '(tre piu quattrodici)',
 18: '((uno meno tre) piu venti)',
 19: '((zero meno uno) piu venti)',
 20: '((nove meno otto) per venti)',
 21: '((uno meno dieci) piu trenta)',
 22: '((tre meno undici) piu trenta)',
 23: '((nove meno sedici) piu trenta)',
 24: '((otto diviso dieci) per trenta)',
 25: '((cinque per undici) meno trenta)',
 26: '((nove meno dodici) piu ventinove)',
 27: '((dicisette meno venti) piu trenta)',
 28: '((tredici meno quindici) piu trenta)',
 29: '((ventidue meno ventitre) piu trenta)',
 30: '((ventinove meno ventotto) per trenta)',
 31: '((ventotto diviso ventotto) piu trenta)',
 32: '((ventinove meno ventisette) piu trenta)',
 33: '((ventiquattro meno ventiuno) piu trenta)',
 34: '((ventisette meno ventidue) piu ventinove)',
 35: '((ventotto diviso ventiquattro) per trenta)',


# Chinese

In [17]:
show(chinese, 2)

Chinese has 28 equilengths: 2-3, 7-32
     from 166,807 table entries in 2 iterations


{2: 'èr',
 3: 'sān',
 7: '(èr jiā wŭ)',
 8: '(èr jiā liù)',
 9: '(sān jiā liù)',
 10: '(yī chéng shí)',
 11: '(shí bā jiǎn qī)',
 12: '(yī chéng shí èr)',
 13: '(yī chéng shí sān)',
 14: '(èr shí sì jiǎn shí)',
 15: '(sān shí jiǎn shí wǔ)',
 16: '((sì jiǎn bā) jiā èr shí)',
 17: '((qī jiǎn shí) jiā èr shí)',
 18: '((yī jiǎn bā) jiā èr shí wŭ)',
 19: '((qī chéng qī) jiǎn sān shí)',
 20: '((liù chú jiŭ) chéng sān shí)',
 21: '((qī chú yǐ shí) chéng sān shí)',
 22: '((bā jiǎn shí wǔ) jiā èr shí jiŭ)',
 23: '((shí yī jiǎn shí bā) jiā sān shí)',
 24: '((shí èr chú shí wǔ) chéng sān shí)',
 25: '((èr shí jiǎn èr shí wŭ) jiā sān shí)',
 26: '((shí sān chú shí sì) chéng èr shí bā)',
 27: '((èr shí qī chéng sān shí) chú sān shí)',
 28: '((èr shí qī jiǎn èr shí jiŭ) jiā sān shí)',
 29: '((èr shí yī jiǎn èr shí) chéng èr shí jiŭ)',
 30: '((èr shí jiŭ chú èr shí jiŭ) chéng sān shí)',
 31: '(èr shí wŭ jiǎn (èr shí sān jiǎn èr shí jiŭ))',
 32: '(èr shí liù jiǎn (èr shí sān jiǎn èr shí jiŭ))'}

# German

In [18]:
show(german, 2)

German has 48 equilengths: 4, 11-13, 15-57, 63
     from 238,787 table entries in 2 iterations


{4: 'vier',
 11: '(null plus elf)',
 12: '(eins mal zwölf)',
 13: '(eins plus zwölf)',
 15: '(eins mal funfzehn)',
 16: '(eins mal sechszehn)',
 17: '(eins plus sechszehn)',
 18: '((zwei und fünf) plus elf)',
 19: '((zwei und fünf) und zwölf)',
 20: '((null mal elf) und zwanzig)',
 21: '(null plus einundzwanzig)',
 22: '((drei minus elf) und dreiβig)',
 23: '((vier minus elf) plus dreiβig)',
 24: '((sechs mal neun) minus dreiβig)',
 25: '((eins minus sechs) plus dreiβig)',
 26: '((elf minus funfzehn) und dreiβig)',
 27: '((drei mal neunzehn) minus dreiβig)',
 28: '((zwölf minus vierzehn) und dreiβig)',
 29: '((zwölf minus dreizehn) plus dreiβig)',
 30: '((zwanzig minus neunzehn) mal dreiβig)',
 31: '((zwanzig minus neunzehn) plus dreiβig)',
 32: '((sechszehn durch funfzehn) mal dreiβig)',
 33: '((zwanzig weniger siebzehn) plus dreiβig)',
 34: '((zwanzig weniger sechszehn) plus dreiβig)',
 35: '((vier mal sechszehn) minus neunundzwanzig)',
 36: '((dreiβig durch fünfundzwanzig) mal dreiβ

# Infinite Additions

I suspect that most languages have infinitely many equilength expressions. But how can we prove it? One way is to exhibit an infinite pattern. For example, in English we have "four (and six)<sup><i>k</i></sup>", meaning "four", "four and six", "four and six and six", ... Because "and six" adds 6 to the value and exactly 6 letters to the count, appending it to an equilength expression yields another equilength expression, so we can append it an arbitrary number of times, generating an infinite pattern.
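The argument can be checked mechanically for the first several members of the pattern. This is a standalone sketch with a hypothetical two-word evaluator that only understands "four", "six", and "and" as addition:

```python
# Values of the only two integer words used in the pattern.
VALUES = {'four': 4, 'six': 6}

def pattern(k: int) -> str:
    """'four' followed by k copies of 'and six'."""
    return 'four' + ' and six' * k

def value(exp: str) -> int:
    """Evaluate a flat sum in which 'and' is the only operator."""
    return sum(VALUES[word] for word in exp.split() if word != 'and')

def lettercount(exp: str) -> int:
    return sum(ch.isalpha() for ch in exp)

for k in range(4):
    exp = pattern(k)
    print(value(exp), lettercount(exp), exp)
# 4 4 four
# 10 10 four and six
# 16 16 four and six and six
# 22 22 four and six and six and six
```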

I'll search for more such patterns in all our languages:

In [19]:
def infinite_equilengths(language):
    return {v: f'{plus} {number} ...' for plus in language.operators[add] 
            for (v, n), number in language.integers.items() 
            if v == lettercount(plus + number)}

{L.name: infinite_equilengths(L) for L in languages}

{'English': {6: 'and six ...', 8: 'and eight ...', 10: 'added to ten ...'},
 'Spanish': {},
 'French': {8: 'plus huit ...'},
 'Italian': {6: 'piu sei ...', 14: 'piu quattrodici ...'},
 'Chinese': {5: 'jiā wŭ ...', 6: 'jiā liù ...'},
 'German': {8: 'plus acht ...'}}
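`infinite_equilengths` reports only the repeatable suffix; an infinite family also needs an equilength expression to start from. French has no equilength integer, but the `c=2` search above found `(un et six)`, with value 7 and 7 letters. Appending "plus huit" (value 8, 8 letters) keeps the two counts equal. A standalone sketch, in which the `grow` helper is hypothetical and the seed's value is supplied by hand:

```python
def lettercount(exp: str) -> int:
    return sum(ch.isalpha() for ch in exp)

def grow(seed: str, seed_value: int, suffix: str, suffix_value: int, k: int):
    """Append `suffix` k times to `seed`; return (value, letters, expression)."""
    exp = seed + f' {suffix}' * k
    return seed_value + k * suffix_value, lettercount(exp), exp

# Seed: the c=2 French result '(un et six)' (value 7); suffix 'plus huit' adds 8.
for k in range(4):
    print(grow('(un et six)', 7, 'plus huit', 8, k))
# (7, 7, '(un et six)')
# (15, 15, '(un et six) plus huit')
# (23, 23, '(un et six) plus huit plus huit')
# (31, 31, '(un et six) plus huit plus huit plus huit')
```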

# Extended Languages

To find more equilength expressions I will need to search a larger portion of the search space. I can think of three approaches:

1. Go beyond `c=2` to `c=3` or `c=4` iterations.
2. Instead of always combining an integer with a table entry, allow the combination of two table entries.
3. Continue to restrict combinations to an integer and a table entry, but seed the integers with more entries that are bigger in both value and number of letters.

My intuition is that (1) and (2) will give me "bushier" expressions that are unlikely to help much. They will have lots of different numerical values, but the new expressions will have roughly the same number of letters as the previous ones. 

Therefore, I'm going to try approach (3), which has the added advantage that I don't have to alter `expressions` or `combine`. I'm going to define an `extended` language where we add to the `language.integers` field. I could just add regular integers, perhaps going up to a hundred rather than just thirty. But the problem is that an integer like "ninety" has far fewer letters than its value (6 versus 90); I think there would be an imbalance. Instead I'll add *pseudo-integers:* for each integer (like `"two"`) in the base language, the extended language will have new expressions like `(two plus two plus two...)`. The idea is that these will be better building blocks when we `combine` an integer with a table entry.

In [20]:
def extended(language, repeats=6) -> Language:
    """Extend language by adding "repeated" integers, like "two plus two plus two"."""
    name, ops, ints = language
    new_ints = exptable((i * r, '(' + f' {opname} '.join([exp] * r) + ')')
                        for r in range(2, repeats + 1)      # number of copies
                        for (i, _), exp in ints.items()
                        for opname in ops[add])             # e.g. 'plus', 'and'
    return Language('Extended ' + name, ops, {**ints, **new_ints})

For example, here are the extended integers for the `mini` language, with up to 3 repeats:

In [21]:
extended(mini, 3).integers

{(0, 4): 'zero',
 (1, 3): 'one',
 (2, 3): 'two',
 (3, 5): 'three',
 (4, 4): 'four',
 (0, 12): '(zero plus zero)',
 (0, 11): '(zero and zero)',
 (2, 10): '(one plus one)',
 (2, 9): '(one and one)',
 (4, 10): '(two plus two)',
 (4, 9): '(two and two)',
 (6, 14): '(three plus three)',
 (6, 13): '(three and three)',
 (8, 12): '(four plus four)',
 (8, 11): '(four and four)',
 (0, 20): '(zero plus zero plus zero)',
 (0, 18): '(zero and zero and zero)',
 (3, 17): '(one plus one plus one)',
 (3, 15): '(one and one and one)',
 (6, 17): '(two plus two plus two)',
 (6, 15): '(two and two and two)',
 (9, 23): '(three plus three plus three)',
 (9, 21): '(three and three and three)',
 (12, 20): '(four plus four plus four)',
 (12, 18): '(four and four and four)'}

# Efficiency of `expressions`

Come to think of it, I will alter `expressions` after all, not to accommodate a new strategy, but to make it more efficient. The previous version of `expressions` tries to combine each integer on both the left and the right of every existing expression in `table`. That's inefficient for two reasons:
- On the first iteration, the table holds only the integers, so `combine(ints, ops, table)` and `combine(table, ops, ints)` produce exactly the same entries; one call is enough.
- On subsequent iterations, given, say, the integer "two" and the expression "(one plus two)", I want both "(two minus (one plus two))" and "((one plus two) minus two)", because their values differ (-1 and 1). But with addition (or multiplication) there is no point in combining them both ways: the operator is commutative, so the value is 5 (or 6) either way. So we will restrict the second call to `combine` to only the noncommutative operators.
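A rough way to see the saving is to count the operator spellings tried per (integer, table entry) pair on the iterations after the first. This is a standalone back-of-the-envelope count, not part of the search code. It uses the English operator names defined above, with a plain function standing in for "subtracted from", and it ignores both the divide-by-zero skip and the many results that collapse onto the same table key.

```python
from operator import add, sub, mul, truediv as div

def subtracted_from(x, y):
    """Stand-in for the English 'subtracted from' operator."""
    return y - x

english_ops = {add: ['plus', 'and', 'added to'],
               sub: ['minus', 'less', 'take away'],
               mul: ['times', 'multiplied by'],
               div: ['divided by', 'over'],
               subtracted_from: ['subtracted from']}

def count_names(ops: dict) -> int:
    """Total number of operator spellings."""
    return sum(len(names) for names in ops.values())

def noncommutative(ops: dict) -> dict:
    """Copy `ops` but omit the commutative operators add and mul."""
    return {op: ops[op] for op in ops if op not in (add, mul)}

both_ways = 2 * count_names(english_ops)   # integer-op-entry and entry-op-integer, all ops
trimmed = count_names(english_ops) + count_names(noncommutative(english_ops))
print(both_ways, trimmed, f'{1 - trimmed / both_ways:.0%} fewer')  # → 22 17 23% fewer
```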

In [22]:
def expressions(language, c=1) -> ExpTable:
    """Combine language integers with table c times."""
    _, ops, ints = language
    table = dict(ints) # Copy the language.integers table so we can modify it
    if c >= 1:
        table.update(combine(ints,  ops, table))
    for i in range(1, c):
        table.update({**combine(ints,  ops, table), 
                      **combine(table, noncommutative(ops), ints)})
    return table

def noncommutative(ops: dict) -> dict: 
    """Copy `ops` but omit the commutative operators."""
    return {op: ops[op] for op in ops if op not in (add, mul)}

Let's see how well the extended languages perform:

# Extended English

In [23]:
%time show(extended(english), 0)

Extended English has 5 equilengths: 4, 14, 21, 24, 32
     from 488 table entries in 0 iterations
CPU times: user 2.72 ms, sys: 89 µs, total: 2.81 ms
Wall time: 2.79 ms


{4: 'four',
 14: '(seven plus seven)',
 21: '(seven and seven and seven)',
 24: '(six plus six plus six plus six)',
 32: '(eight plus eight plus eight plus eight)'}
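The four non-trivial results above can be predicted by hand. Suppose an integer i has a name of ℓ letters and we join r copies of it with an addition word of p letters. The value is r·i and the letter count is r·ℓ + (r − 1)·p, so the expression is equilength exactly when r(i − ℓ) = (r − 1)p. For "seven plus seven": 2 × (7 − 5) = 1 × 4. The sketch below is standalone and the function name is mine. It checks that condition for the English names and addition words, with up to 6 repeats as in `extended`'s default:

```python
# The English integer names from this notebook, zero through thirty.
NAMES = '''zero one two three four five six seven eight nine ten eleven twelve
thirteen fourteen fifteen sixteen seventeen eighteen nineteen twenty twenty-one
twenty-two twenty-three twenty-four twenty-five twenty-six twenty-seven
twenty-eight twenty-nine thirty'''.split()

def lettercount(exp: str) -> int:
    return sum(ch.isalpha() for ch in exp)

def repeated_equilengths(names, add_names, repeats=6):
    """(value, expression) pairs where r copies of one integer name, joined by
    an addition word, have exactly as many letters as their value."""
    found = []
    for i, name in enumerate(names):
        for opname in add_names:
            for r in range(2, repeats + 1):
                # value r*i versus letters r*len(name) + (r-1)*len(opname)
                if r * i == r * lettercount(name) + (r - 1) * lettercount(opname):
                    found.append((r * i, '(' + f' {opname} '.join([name] * r) + ')'))
    return sorted(found)

for value, exp in repeated_equilengths(NAMES, ['plus', 'and', 'added to']):
    print(value, exp)
# 14 (seven plus seven)
# 21 (seven and seven and seven)
# 24 (six plus six plus six plus six)
# 32 (eight plus eight plus eight plus eight)
```

It finds exactly the four repeated expressions in the output above; "four" itself comes from the base integers.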

In [24]:
%time show(extended(english), 1)

Extended English has 150 equilengths: 4, 10-148, 150-154, 156, 158, 162-163, 168
     from 275,310 table entries in 1 iterations
CPU times: user 26.1 s, sys: 282 ms, total: 26.4 s
Wall time: 26.8 s


{4: 'four',
 10: '(zero and ten)',
 11: '(two plus nine)',
 12: '(one and eleven)',
 13: '(one plus twelve)',
 14: '(one and thirteen)',
 15: '(one times fifteen)',
 16: '(four and (six and six))',
 17: '(five plus (six and six))',
 18: '(zero and (nine and nine))',
 19: '(one plus (nine plus nine))',
 20: '(zero and (ten added to ten))',
 21: '(three plus (nine plus nine))',
 22: '(zero and (eleven and eleven))',
 23: '(one plus (eleven plus eleven))',
 24: '(six plus (six plus six plus six))',
 25: '(seven and (six plus six plus six))',
 26: '(eight plus (six plus six plus six))',
 27: '(zero and (nine plus nine plus nine))',
 28: '(one added to (nine and nine and nine))',
 29: '(five plus (six and six and six and six))',
 30: '(six and (six plus six plus six plus six))',
 31: '((two and two) plus (nine and nine and nine))',
 32: '(eight and (six plus six plus six plus six))',
 33: '(zero and (eleven plus eleven plus eleven))',
 34: '(four and (six and six and six and six and six))',

# Extended Spanish

In [25]:
%time show(extended(spanish), 1)

Extended Spanish has 77 equilengths: 5, 10-12, 14-22, 24-61, 63-73, 76-83, 85, 87-88, 92-93, 96-97
     from 77,695 table entries in 1 iterations
CPU times: user 1.49 s, sys: 16.9 ms, total: 1.5 s
Wall time: 1.51 s


{5: 'cinco',
 10: '(uno por diez)',
 11: '(cero más once)',
 12: '(tres más nueve)',
 14: '(cero más catorce)',
 15: '(quince menos cero)',
 16: '(doce más (dos más dos))',
 17: '(uno más (ocho más ocho))',
 18: '(seis más (seis más seis))',
 19: '(uno más (nueve más nueve))',
 20: '(cuatro más (ocho más ocho))',
 21: '(siete más (siete más siete))',
 22: '(cuatro más (nueve más nueve))',
 24: '(uno por (ocho más ocho más ocho))',
 25: '(veintinueve menos (dos más dos))',
 26: '(veintiséis menos (cero más cero))',
 27: '(uno por (nueve más nueve más nueve))',
 28: '((once más once más once) menos cinco)',
 29: '((once más once más once) menos cuatro)',
 30: '(quince más (cinco más cinco más cinco))',
 31: '((veintidós más veintidós) menos trece)',
 32: '(cero más (ocho más ocho más ocho más ocho))',
 33: '(nueve más (seis más seis más seis más seis))',
 34: '(veintiséis más (dos más dos más dos más dos))',
 35: '(siete por (uno más uno más uno más uno más uno))',
 36: '(cero más (nueve

# Extended French

In [26]:
%time show(extended(french), 1)

Extended French has 107 equilengths: 7-12, 14-94, 96-105, 107-111, 113-115, 119-120
     from 163,204 table entries in 1 iterations
CPU times: user 6.5 s, sys: 72.1 ms, total: 6.57 s
Wall time: 6.62 s


{7: '(un et six)',
 8: '(un et sept)',
 9: '(neuf sur un)',
 10: '(un plus neuf)',
 11: '(trois et huit)',
 12: '(quatre et huit)',
 14: '(deux et (six et six))',
 15: '(trois et (six et six))',
 16: '(un fois (huit et huit))',
 17: '(trois et (sept et sept))',
 18: '(quinze et (un et un et un))',
 19: '(un plus (six et six et six))',
 20: '((zéro et zéro) et (dix et dix))',
 21: '(six et (cinq et cinq et cinq))',
 22: '(un plus (sept et sept et sept))',
 23: '(dix-neuf et (un et un et un et un))',
 24: '(un fois (six et six et six et six))',
 25: '(vingt et (un et un et un et un et un))',
 26: '((un et un) et (six et six et six et six))',
 27: '(trois plus (six et six et six et six))',
 28: '(un fois (sept et sept et sept et sept))',
 29: '((dix et dix) et (trois et trois et trois))',
 30: '(cinq fois (un et un et un et un et un et un))',
 31: '(trois plus (sept et sept et sept et sept))',
 32: '(vingt-six et (un et un et un et un et un et un))',
 33: '(vingt-sept et (un et un et un e

# Extended Italian

In [27]:
%time show(extended(italian), 1)

Extended Italian has 91 equilengths: 3, 9-13, 15-28, 30-64, 66-78, 80-87, 89-92, 94-96, 99, 104-105, 108, 114, 116, 124, 128
     from 58,522 table entries in 1 iterations
CPU times: user 1.1 s, sys: 14 ms, total: 1.12 s
Wall time: 1.13 s


{3: 'tre',
 9: '(tre piu sei)',
 10: '(uno piu nove)',
 11: '(uno piu dieci)',
 12: '(uno per dodici)',
 13: '(uno per tredici)',
 15: '(tre piu (sei piu sei))',
 16: '(otto per (uno piu uno))',
 17: '(uno piu (otto piu otto))',
 18: '(zero piu (nove piu nove))',
 19: '(tredici piu (tre piu tre))',
 20: '(zero piu (dieci piu dieci))',
 21: '(tre piu (sei piu sei piu sei))',
 22: '(zero piu (undici piu undici))',
 23: '(venti piu (uno piu uno piu uno))',
 24: '(uno per (otto piu otto piu otto))',
 25: '(venticinque piu (zero piu zero))',
 26: '(ventitre piu (uno piu uno piu uno))',
 27: '(tre piu (sei piu sei piu sei piu sei))',
 28: '(quattro piu (otto piu otto piu otto))',
 30: '((tre piu tre) piu (otto piu otto piu otto))',
 31: '(tre piu (quattrodici piu quattrodici))',
 32: '(zero piu (otto piu otto piu otto piu otto))',
 33: '(tre piu (sei piu sei piu sei piu sei piu sei))',
 34: '((ventisei piu ventisei) meno (nove piu nove))',
 35: '(venti piu (tre piu tre piu tre piu tre piu tr

# Extended Chinese

Since the Chinese number names tend to be shorter, I'll allow more repetitions in `extended`:

In [28]:
%time show(extended(chinese, 9), 1)

Extended Chinese has 119 equilengths: 2-3, 7-106, 108-111, 114-116, 120, 123, 125-126, 128, 133, 136-138, 142
     from 156,099 table entries in 1 iterations
CPU times: user 4.12 s, sys: 49.3 ms, total: 4.17 s
Wall time: 4.21 s


{2: 'èr',
 3: 'sān',
 7: '(èr jiā wŭ)',
 8: '(èr jiā liù)',
 9: '(sān jiā liù)',
 10: '(yī chéng shí)',
 11: '(shí bā jiǎn qī)',
 12: '(èr jiā (wŭ jiā wŭ))',
 13: '(sān jiā (wŭ jiā wŭ))',
 14: '(yī chéng (qī jiā qī))',
 15: '(sān jiā (liù jiā liù))',
 16: '(èr shí jiǎn (èr jiā èr))',
 17: '(èr jiā (wŭ jiā wŭ jiā wŭ))',
 18: '(sān jiā (wŭ jiā wŭ jiā wŭ))',
 19: '(èr shí jiŭ jiǎn (wŭ jiā wŭ))',
 20: '(èr jiā (liù jiā liù jiā liù))',
 21: '(sān jiā (liù jiā liù jiā liù))',
 22: '(èr jiā (wŭ jiā wŭ jiā wŭ jiā wŭ))',
 23: '(sān jiā (wŭ jiā wŭ jiā wŭ jiā wŭ))',
 24: '((yī jiā yī) chéng (sì jiā sì jiā sì))',
 25: '(shí qī jiā (èr jiā èr jiā èr jiā èr))',
 26: '(èr jiā (liù jiā liù jiā liù jiā liù))',
 27: '(èr jiā (wŭ jiā wŭ jiā wŭ jiā wŭ jiā wŭ))',
 28: '(sān jiā (wŭ jiā wŭ jiā wŭ jiā wŭ jiā wŭ))',
 29: '(shí qī jiā (sān jiā sān jiā sān jiā sān))',
 30: '(sān chéng (èr jiā èr jiā èr jiā èr jiā èr))',
 31: '((shí yī jiā shí yī) jiā (sān jiā sān jiā sān))',
 32: '(bā jiā (sì jiā sì jiā sì jiā 

# Extended German

In [29]:
%time show(extended(german), 1)

Extended German has 139 equilengths: 4, 11-13, 15-138, 140, 143-146, 150-152, 156-157, 162
     from 157,788 table entries in 1 iterations
CPU times: user 7.31 s, sys: 79.2 ms, total: 7.39 s
Wall time: 7.45 s


{4: 'vier',
 11: '(null plus elf)',
 12: '(eins mal zwölf)',
 13: '(eins plus zwölf)',
 15: '(eins mal funfzehn)',
 16: '(eins mal sechszehn)',
 17: '(elf und (drei und drei))',
 18: '(eins mal (neun und neun))',
 19: '(eins und (neun plus neun))',
 20: '(null plus (zehn plus zehn))',
 21: '(neun und (sechs plus sechs))',
 22: '(acht und (sieben und sieben))',
 23: '(neun und (sieben plus sieben))',
 24: '((eins und eins) und (elf plus elf))',
 25: '(eins und (acht und acht und acht))',
 26: '(zwei plus (acht und acht und acht))',
 27: '(eins mal (neun plus neun plus neun))',
 28: '(eins plus (neun plus neun plus neun))',
 29: '(elf und (sechs plus sechs plus sechs))',
 30: '(zwölf plus (sechs und sechs und sechs))',
 31: '(elf und (fünf und fünf und fünf und fünf))',
 32: '(eins mal (acht und acht und acht und acht))',
 33: '(eins plus (acht und acht und acht und acht))',
 34: '((zwei plus zwei) plus (zehn und zehn und zehn))',
 35: '(drei und (acht plus acht plus acht plus acht))',
 

That looks pretty good. We're getting more equilength expressions, and they cover the integers pretty well. We could go up to `c=2`, but it would take longer. Perhaps you can figure out a way to make the search more efficient. I'll leave it to you to explore further.

# Summary

Here is a table of the number of equilength expressions found for each language, for `expressions` with `c=0` and `c=2`, for infinite additions (∞), and for `extended` languages with `c=1`. English has the most in the ∞ and extended columns, but I think that is largely because I gave it 11 operator names, while the other languages have only 4 to 7. German has the most with `c=2` and the second most in the extended column, which I believe is because German integer names are longer on average than those of the other languages (8.2 letters per integer, versus 4.9 to 7.0).

|Language|c=0|c=2|∞ (infinite additions)|extended, c=1|
|---|---|---|---|---|
|Mini|1|2|0|6|
|English|1|47|3|150|
|Spanish|1|41|0|77|
|French|0|40|1|107|
|Italian|1|32|2|91|
|Chinese|2|28|2|119|
|German|1|48|1|139|
