<div align="right"><i>Peter Norvig, June 2015<br>(Updated April 2024)</i></div>

# Let's Code About Bike Locks

The [June 15, 2015 post](http://bikesnobnyc.blogspot.com/2015/06/lets-get-this-show-on-road-once-we-all.html) on [*Bike Snob NYC*](http://bikesnobnyc.blogspot.com) leads with "*Let's talk about bike locks.*" Here's what I want to talk about: in my local bike shop, I saw a combination  lock called *WordLock&reg;*,
which replaces digits  with  letters. There are 4 discs in the lock, each of which has 10 distinct letters.  I classified this as a Fred lock,
"[Fred](http://bikesnobnyc.blogspot.com/2014/06/a-fred-too-far.html)" being the term for an amateurish cyclist with inappropriate equipment. I played around with it and got this:

![](http://norvig.com/ipython/fredbuns.jpg)


Naturally I set  the other locks on the rack to FRED BUNS as well. But I have questions ... 

# Questions

1. How many words can the WordLock&reg; make?
2. Can a lock with different letters on the discs make more words?
3. How many words can be made simultaneously? The photo above shows the words "FRED" and "BUNS," but "SOMN" is not a word.
4. Is it a coincidence that the phrase "FRED BUNS" appears, or was it planted there by WordLock&reg; designers?



# Preliminaries

First, set the stage with some (1) imports, (2) constants, and (3) type definitions:


In [1]:
from   collections import Counter
from   typing      import *
from   functools   import lru_cache
import itertools
import random 
import re
import textwrap

wordlock = ('SPHMTWDLFB', 'LEYHNRUOAI', 'ENMLRTAOSK', 'DSNMPYLKTE') # The lock in the photo above
ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' # The 26 letters
DISCS    = 4     # A lock has 4 discs.
LETTERS  = 10    # A disc has 10 letters.

Letter   = str   # A single letter from the ALPHABET.
Disc     = str   # A sequence of letters (joined into a string) forms a disc.
Lock     = tuple # A tuple of discs forms a lock.
Word     = str   # A 4-letter string that is in the list of valid words. (Some strings are non-words.)
Regex    = str   # A regular expression (used to find all the words that can be made by a lock).

# The Word List

I happen to have  a file of  four-letter words (no, not *[that](http://en.wikipedia.org/wiki/Four-letter_word)* kind of four-letter word). It is the union of an official Scrabble&reg; word list with a list of proper names. The following shell command tests if the file has already been downloaded to the local directory and if not, fetches it from the web:

In [2]:
! [ -e words4.txt ] || curl -O http://norvig.com/ngrams/words4.txt

I will use this word list two ways: `WORDSTR` is a big string of words separated by newlines.  `WORDS` is a list of the individual words in the file:

In [3]:
WORDSTR = open('words4.txt').read()
WORDS   = WORDSTR.split()

In [4]:
len(WORDS)

4360

Here is a random sampling of the words:

In [5]:
random.seed(1234) # for reproducibility

random.sample(WORDS, LETTERS)

['STOW',
 'DRUM',
 'AIDS',
 'CUSS',
 'BAYS',
 'COSH',
 'DEED',
 'PHON',
 'KARA',
 'ANOA']

# Question 1: How Many Words can the WordLock® make?

My approach:
- Given a lock, I need to compare it against a list of valid words to see which words can be made by the lock.
- I  think of this as a matching problem: what words match the pattern described by the lock.
- Python's `re` module is good at matching.
  - I'll define `regex` to create a regular expression that represents a lock's 10<sup>4</sup> possible combinations.
  - I'll define `words_from(lock)`  to match the regular expression against all the words in the word list.
  - I can do this in just one call to `re.findall`.
- I'll define `word_count` to count the number of words a lock makes, and cache the results (to save time on future calls).

In [6]:
def regex(lock: Lock) -> Regex: 
    """A regular expression describing all 10**4 combinations that this lock can make."""
    return cat(('[' + disc + ']') for disc in lock)

def words_from(lock: Lock, wordstr=WORDSTR) -> List[Word]: 
    """A list of all valid words that can be made by this lock."""
    return re.findall(regex(lock), wordstr)

@lru_cache(None)
def word_count(lock) -> int: return len(words_from(lock))

cat   = ''.join  # Function to concatenate strings with no space between them.
space = ' '.join # Function to join strings with a space between each one.

In [7]:
regex(wordlock)

'[SPHMTWDLFB][LEYHNRUOAI][ENMLRTAOSK][DSNMPYLKTE]'
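A toy example may help show why a single `re.findall` call over the newline-separated string is enough. The sketch below uses a hypothetical lock with two letters per disc and a made-up five-word string (neither comes from `words4.txt`). Because a newline is in none of the character classes, a match can never straddle two adjacent words.

```python
import re

toy_lock  = ('FB', 'RU', 'EN', 'DS')            # hypothetical lock: 2 letters per disc
toy_words = 'BRED\nBUNS\nFRED\nQUIT\nSOMN\n'    # made-up stand-in for WORDSTR

# One character class per disc, concatenated: the same idea as `regex(lock)`.
pattern = ''.join('[' + disc + ']' for disc in toy_lock)
matches = re.findall(pattern, toy_words)

print(pattern)   # [FB][RU][EN][DS]
print(matches)   # ['BRED', 'BUNS', 'FRED']
```

"QUIT" and "SOMN" are rejected at the first disc, and no match can begin in the middle of a word, since it would have to consume the newline.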

We can now answer question 1:

In [8]:
word_count(wordlock)

1118

That's our answer: WordLock® can make 1,118 words (about 1/4 of the word list).  

# Visualizing Results

I don't want to print a list with 1,118 lines, so I'll define `show` to print  in a prettier format:

In [9]:
def show(lock: Lock, width=110) -> None:
    """Show (print) a lock, the words it makes, and the word count."""
    words = words_from(lock)
    print(f'{len(words):,d} words can be made by the lock ({space(lock)}):\n')
    print(textwrap.fill(space(sorted(words)), width))

In [10]:
show(wordlock)

1,118 words can be made by the lock (SPHMTWDLFB LEYHNRUOAI ENMLRTAOSK DSNMPYLKTE):

BAAL BAAS BAKE BALD BALE BALK BALL BALM BALS BAMS BAND BANE BANK BANS BARD BARE BARK BARM BARN BARS BASE BASK
BASS BAST BATE BATS BATT BEAD BEAK BEAM BEAN BEAT BEEN BEEP BEES BEET BELL BELS BELT BEND BENE BENS BENT BERK
BERM BEST BETS BIAS BIKE BILE BILK BILL BIND BINE BINS BINT BIOS BIRD BIRK BIRL BISE BISK BITE BITS BITT BLAE
BLAM BLAT BLED BLET BLOT BOAS BOAT BOLD BOLE BOLL BOLT BOND BONE BONK BONY BOOK BOOM BOON BOOS BOOT BORE BORK
BORN BORT BOSK BOSS BOTS BOTT BRAD BRAE BRAN BRAS BRAT BRAY BRED BREE BREN BROS BULK BULL BUMP BUMS BUND BUNK
BUNN BUNS BUNT BUOY BURD BURL BURN BURP BURS BURY BUSK BUSS BUST BUSY BUTE BUTS BUTT BYES BYRE BYRL BYTE DAKS
DALE DALS DAME DAMN DAMP DAMS DANE DANK DANS DARE DARK DARN DART DATE DEAD DEAL DEAN DEED DEEM DEEP DEES DEET
DEKE DELE DELL DELS DELT DEME DEMY DENE DENS DENT DENY DEON DERE DERM DESK DHAK DHAL DIAL DIED DIEL DIES DIET
DIKE DILL DIME DIMS DINE DINK DINS D

# Aside: How Secure is WordLock?

The WordLock® makes 1,118 words.  You might say that an attacker  would find this lock to be only 11.18% as secure as a 4-digit lock with 10,000 combinations.  But in reality, every cable lock is [vulnerable](https://www.sfbike.org/news/video-how-to-lock-your-bike/) to an attacker who possesses  wire  cutters or a knowledge of lock-picking, so  security is equally terrible for WordLock&reg; and for an equivalent lock with digits instead of letters. You really should use a hardened steel U-lock instead.
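The 11.18% figure is simple arithmetic, sketched below. The `expected_guesses` helper is hypothetical and rests on an assumption not made in the text: that the owner picks the combination uniformly at random from the candidates, and that the attacker tries each candidate once.

```python
WORD_COMBOS = 1118       # words the WordLock can make (the word_count result above)
ALL_COMBOS  = 10 ** 4    # 4 discs with 10 symbols each

fraction = WORD_COMBOS / ALL_COMBOS
print(f'{fraction:.2%}')                 # 11.18%

def expected_guesses(n: int) -> float:
    """Average guesses to find one secret among n equally likely candidates,
    trying each candidate once (assumption: uniform choice by the owner)."""
    return (n + 1) / 2

print(expected_guesses(WORD_COMBOS))     # 559.5
print(expected_guesses(ALL_COMBOS))      # 5000.5
```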

# Question 2: Can a lock with different letters on the discs make more words?

To make a lock with more words than the original WordLock®, the simplest thing I could think of is a [greedy algorithm](https://en.wikipedia.org/wiki/Greedy_algorithm):
1) Consider each of the 4 discs, one at a time.
2) Fill each disc with the 10 most common letters (across all possible words) that appear at that position.
3) When all 4 discs have been filled, we have a lock.

How do we choose the 10 most common letters?  The `Counter.most_common` method can do most of the work:

In [11]:
def most_common(words, position, n=LETTERS) -> str:
    """The `n` most common letters in `position` of all the `words`."""
    counter = Counter(word[position] for word in words)
    return cat(letter for (letter, count) in counter.most_common(n))
    
most_common(WORDS, 0)

'SPTBDCLMAR'

In other words, the ten most common letters from the first position of all the words are SPTBDCLMAR, in that order.

The function `greedy_lock` creates a lock with this greedy disc-filling approach. I'm not sure if the best order of discs is left-to-right or right-to-left or something else, so I'll leave the order as a parameter. Order matters because when we choose the 10 letters for one disc, we count "across all possible words"; we eliminate any impossible words that don't match one of those 10 letters before we move on to the next disc.

In [12]:
def greedy_lock(words=WORDS, order=range(DISCS)) -> Lock:
    """Make a lock where we greedily choose the LETTERS best letters for each disc, in order."""
    lock = DISCS * [Disc()] # Initially a lock of 4 empty discs
    for i in order: 
        # Make lock[i] be a disc whose letters cover the most words, then update `words`
        lock[i] = most_common(words, i)
        words   = [w for w in words if w[i] in lock[i]]
    return Lock(lock)

In [13]:
greedy_lock()

('SPTBDCLMAR', 'OAIEURLHYN', 'RNALEOTISM', 'SETADNLKYP')

In [14]:
word_count(greedy_lock())

1177

That's an improvement! The original WordLock® makes 1,118 words, so 1,177 is 5% more.
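To see why it matters that the word list is filtered after each disc is chosen, here is a small self-contained sketch. The word list is made up, the discs hold only 2 letters, and `top_letters` and `covered` are hypothetical helpers for this illustration; the counts were chosen so that there are no ties.

```python
from collections import Counter

def top_letters(words, position, n=2):
    """The `n` most common letters at `position` among `words`."""
    counts = Counter(word[position] for word in words)
    return ''.join(letter for letter, _ in counts.most_common(n))

# A made-up word list, chosen so that no letter counts are tied.
toy_words = ['SAND', 'SALT', 'SEAT', 'SEAL', 'SOAP',
             'PAST', 'PEST', 'PACK', 'PALE',
             'TOLD', 'TOOK', 'TOOL',
             'COOK', 'COLD']

disc0     = top_letters(toy_words, 0)                   # 'SP'
survivors = [w for w in toy_words if w[0] in disc0]     # words still possible

disc1_filtered   = top_letters(survivors, 1)            # 'AE': counts only possible words
disc1_unfiltered = top_letters(toy_words, 1)            # 'OA': counts every word

def covered(d0, d1):
    """Words whose first two letters appear on discs d0 and d1."""
    return [w for w in toy_words if w[0] in d0 and w[1] in d1]

print(len(covered(disc0, disc1_filtered)))     # 8
print(len(covered(disc0, disc1_unfiltered)))   # 6
```

Counting over all words lets "O" onto the second disc on the strength of TOLD, TOOK, TOOL, COOK, and COLD, none of which can be spelled once the first disc is `SP`.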

# Greedier Algorithm

`greedy_lock` might be better if we consider the 4 discs in some other order. Let's try all possible orders and pick the best result:

In [15]:
def greedier_lock(words=WORDS) -> Lock:
    """Choose the best greedy lock, considering all possible orderings of discs."""
    locks = [greedy_lock(words, order) for order in itertools.permutations(range(DISCS))]
    return max(locks, key=word_count)

In [16]:
greedier_lock()

('BPTCMSDLGW', 'OAEIURLYHW', 'EARNLIOTSM', 'SETNDALKYP')

In [17]:
%time show(greedier_lock())

1,235 words can be made by the lock (BPTCMSDLGW OAEIURLYHW EARNLIOTSM SETNDALKYP):

BAAL BAAS BAIL BAIT BALD BALE BALK BALL BALS BAMS BAND BANE BANK BANS BARD BARE BARK BARN BARS BASE BASK BASS
BAST BATE BATS BATT BEAD BEAK BEAN BEAT BEEN BEEP BEES BEET BELL BELS BELT BEMA BEND BENE BENS BENT BERK BEST
BETA BETS BIAS BILE BILK BILL BIMA BIND BINE BINS BINT BIOS BIRD BIRK BIRL BISE BISK BITE BITS BITT BLAE BLAT
BLED BLET BLIN BLIP BLOT BOAS BOAT BOIL BOLA BOLD BOLE BOLL BOLT BOND BONE BONK BONY BOOK BOON BOOS BOOT BORA
BORE BORK BORN BORT BOSK BOSS BOTA BOTS BOTT BRAD BRAE BRAN BRAS BRAT BRAY BREA BRED BREE BREN BRIA BRIE BRIN
BRIS BRIT BROS BULK BULL BUMP BUMS BUNA BUND BUNK BUNN BUNS BUNT BUOY BURA BURD BURL BURN BURP BURS BURY BUSK
BUSS BUST BUSY BUTE BUTS BUTT BYES BYRE BYRL BYTE CAEL CAID CAIN CALE CALK CALL CAME CAMP CAMS CANE CANS CANT
CARA CARD CARE CARK CARL CARN CARP CARS CART CASA CASE CASK CAST CATE CATS CEES CEIL CELL CELS CELT CENT CERE
CESS CETE CHAD CHAP CHAT CHAY CHIA C

That's another 5% improvement! We've done well, with only two dozen lines of code and under 50 milliseconds of run time. What else can we do?

# Hillclimbing Algorithm

The problem with the greedy algorithm is that it does no exploration: at every step it tries one thing, and if it makes a suboptimal choice of letters on one disc, there is no way to undo the choice. I'd like the option to try multiple choices on each disc. But there are too many locks to try all of them: (26 choose 10)<sup>4</sup> ≈ 800 septillion. Even if we only chose from the 14 most common letters for each disc, rather than all 26, that would still be a trillion possible locks. So rather than systematically trying all possible locks, we're left with randomly sampling from possible locks. A process called **hillclimbing** makes random changes and keeps the changes that help:

   1. Start with some lock.
   2. Make a random change to some letter(s) in the lock.
   3. If the change yields a lock that makes more words, keep the change. Otherwise discard the change.
   4. Repeat multiple times.
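The search-space sizes quoted above can be checked with a few lines of arithmetic. This is only a sanity check using the standard library's `math.comb`; the variable names are made up for the illustration.

```python
import math

DISCS = 4

# Each disc is an unordered choice of 10 distinct letters.
all_locks        = math.comb(26, 10) ** DISCS   # choose from the whole alphabet
short_list_locks = math.comb(14, 10) ** DISCS   # choose from only 14 letters per disc

print(f'{all_locks:.2e}')        # 7.96e+26, about 800 septillion
print(f'{short_list_locks:,}')   # 1,004,006,004,001, about a trillion
```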

I'm not sure exactly how I want to make a random change, so for now I'll define `changed_lock` to change one random letter in one random disc, but I'll parameterize the function `hillclimb` to accept a different function to allow for different changes (and also to allow a different function for how to score the best lock):

In [18]:
def changed_lock(lock) -> Lock: 
    """Change one random letter in one random disc in the lock."""
    # Make a mutable copy of lock. Then change lock2[i]'s `old` letter to a `new` one.
    lock2 = list(lock) 
    i = random.randrange(DISCS)
    old: Letter = random.choice(lock[i])
    new: Letter = random.choice([L for L in ALPHABET if L not in lock[i]])
    lock2[i] = lock2[i].replace(old, new)
    return Lock(lock2)

def hillclimb(lock, changer=changed_lock, scorer=word_count, repeat=4000) -> Lock:
    """Starting with `lock`, apply `changer` to make a new lock, keeping it if
    `scorer` rates it at least as high as the previous best lock. Repeat."""
    best, best_score = lock, scorer(lock)
    for _ in range(repeat):
        candidate = changer(best)
        if scorer(candidate) >= best_score:
            best, best_score = candidate, scorer(candidate)
    return best

Let's see how hillclimbing does:

In [19]:
show(hillclimb(wordlock))

1,240 words can be made by the lock (SPWMTFDLCB LEYHPRUOAI ENCLRTAOSI DSNHAYLKTE):

BAAL BAAS BACH BACK BAIL BAIT BALD BALE BALK BALL BALS BAND BANE BANK BANS BARD BARE BARK BARN BARS BASE BASH
BASK BASS BAST BATE BATH BATS BATT BEAD BEAK BEAN BEAT BECK BEEN BEES BEET BELL BELS BELT BEND BENE BENS BENT
BERK BEST BETA BETH BETS BIAS BICE BILE BILK BILL BIND BINE BINS BINT BIOS BIRD BIRK BIRL BISE BISK BITE BITS
BITT BLAE BLAH BLAT BLED BLET BLIN BLOT BOAS BOAT BOCK BOIL BOLA BOLD BOLE BOLL BOLT BOND BONE BONK BONY BOOK
BOON BOOS BOOT BORA BORE BORK BORN BORT BOSH BOSK BOSS BOTA BOTH BOTS BOTT BRAD BRAE BRAN BRAS BRAT BRAY BREA
BRED BREE BREN BRIA BRIE BRIN BRIS BRIT BROS BUCK BULK BULL BUNA BUND BUNK BUNN BUNS BUNT BUOY BURA BURD BURL
BURN BURS BURY BUSH BUSK BUSS BUST BUSY BUTE BUTS BUTT BYES BYRE BYRL BYTE CACA CAEL CAID CAIN CALE CALK CALL
CANE CANS CANT CARA CARD CARE CARK CARL CARN CARS CART CASA CASE CASH CASK CAST CATE CATS CECA CEES CEIL CELL
CELS CELT CENT CERE CESS CETE CHAD C

We got up to 1240 words, another improvement!  But can we go beyond that?   

I'll create a list of 40 `locks`: the original wordlock, 15 random locks, and 24 greedy locks, one for each permutation of the orders.

In [20]:
def random_lock() -> Lock:
    """A lock with randomly-chosen letters."""
    return Lock(cat(random.sample(ALPHABET, LETTERS)) for dial in range(DISCS))

locks  = ([wordlock] + [random_lock() for _ in range(15)] +
          [greedy_lock(WORDS, order) for order in itertools.permutations(range(DISCS))])

I'll also define `lock_table`, a little function to summarize the locks and their word counts:

In [21]:
def lock_table(locks) -> dict: return {lock: word_count(lock) for lock in locks}

In [22]:
lock_table(locks)

{('SPHMTWDLFB', 'LEYHNRUOAI', 'ENMLRTAOSK', 'DSNMPYLKTE'): 1118,
 ('YAHIQUMEOG', 'KJCXLTRQAB', 'VPTECDBKXO', 'BUVTFXKGSW'): 80,
 ('XOTNFCBIVS', 'ARWTHKDEJU', 'CKYQDFVWLO', 'TEMADLHYRX'): 115,
 ('RFGLIXZAED', 'JEOWUVCYIM', 'OCTAVRYISU', 'YOMZCTEJPL'): 178,
 ('CTIVPWDJXZ', 'REKMFSCJAT', 'KJXHUODPNB', 'IWJUGRZBOC'): 35,
 ('ILOAHSFKZX', 'XIECBRHAMN', 'SYGRQUPJNZ', 'BMVRKSYCAG'): 142,
 ('BRVNFOWYLH', 'MWTQHZSFCY', 'TQPGYZXOUK', 'NVXZHWBFUY'): 0,
 ('KRDMTSBZIA', 'FKJAZYNTPW', 'TWXFCGAJPI', 'XLVOQFHGMN'): 73,
 ('IKTENRBPSV', 'WFKYQLPJBI', 'IYCFXQUOHW', 'QAKDYFOTMG'): 56,
 ('INOEVWXSLB', 'ZYXOIWQNSL', 'NUDFMAICPR', 'YNVIZALGWR'): 105,
 ('DSKRBQEYXZ', 'CPEHRBAXOM', 'QRUNSTMCYV', 'AYDPKVSBIL'): 224,
 ('RFICNAQEUL', 'YZQPASNLBO', 'QCDZWIONHT', 'TOFMNPKXYW'): 96,
 ('AKMQVSLOHP', 'VWESKTPMLD', 'INGSRTFPAL', 'WKRIAMFSOY'): 143,
 ('OBHSIZJAGW', 'BYRENOZUFV', 'DRUWEYOVHK', 'DPNLWIABTC'): 146,
 ('SIXTNVFQJD', 'TODHWGKILE', 'RKGLUNAJDM', 'CVNHLGIDQF'): 111,
 ('KHABFQXWRC', 'HVTEFKRUWS', 'HABDOQWGRK', 'V

Now I'll do hillclimbing from each lock and display the results:

In [23]:
%time hillclimbs = [hillclimb(lock) for lock in locks]

CPU times: user 30.9 s, sys: 92.6 ms, total: 31 s
Wall time: 31.3 s


In [24]:
lock_table(hillclimbs)

{('SPMWTCDLFB', 'LEYHWRUOAI', 'ENILRTAOSC', 'DSNYAHLKTE'): 1240,
 ('FPLBSMCDTW', 'IPHLEYURAO', 'ORTEALCISN', 'LYHTKENDSA'): 1240,
 ('PWSCTLBMFD', 'AROYLPEIHU', 'NOACITESLR', 'TEKADLHSYN'): 1240,
 ('PMWSTLFBCD', 'LEOHURYAIP', 'SCLATRNIEO', 'YHLAKTEDNS'): 1240,
 ('CTWLPGDBSM', 'REHAUILYOW', 'AOITREMSNL', 'NKLTDYSPAE'): 1235,
 ('LBFDMSCPWT', 'LIEPYRHAOU', 'SLERTIAONC', 'NHLDKSAYTE'): 1240,
 ('BPCWHMTDLS', 'HLPOEIARUY', 'TRSNCEAILP', 'LDATHYNSEK'): 1232,
 ('PWTMCSBDLF', 'IYEARUPOLH', 'TESROLACNI', 'YLHNSTKAED'): 1240,
 ('DLTFWPBMSC', 'RWHYUEOLAI', 'NRCLOEAIST', 'YEDASHNTKL'): 1240,
 ('SFMWDCLPTB', 'WYROIUAELH', 'LOTCEAISNR', 'DNHTKALYSE'): 1240,
 ('MSFDTBHLPC', 'LIEHRYAWOU', 'ORLNSTECAM', 'AYDTKLSNPE'): 1232,
 ('BMSPFWDTCL', 'YIEPAHRLUO', 'CEOANSIRLT', 'NKLHADSETY'): 1240,
 ('FDMLBSCTPW', 'RYEILWUHAO', 'IONARTCSEL', 'KDYLANTSEH'): 1240,
 ('FBWMCLPSDT', 'WRAEHOIUYL', 'ARLOESTICN', 'DSKLMNAETY'): 1236,
 ('SWLTCFMPBD', 'YOHURLWIAE', 'IECLONARST', 'AETYLKHDSN'): 1240,
 ('WPMTFDLSBC', 'HAUELYRO

Here are the counts of how many times each score occurred:

In [25]:
Counter(_.values()).most_common()

[(1240, 21), (1235, 14), (1232, 4), (1236, 1)]

The fact that there are 21 different locks at 1240 (and others that are close) suggested to me that there might be a lock with 1241 or more, and I should search longer. 

But a discussion with [Matt Chisholm](https://blog.glyphobet.net/faq) changed my thinking. Matt pointed out that some locks that look different are actually the same; they just have the letters within a disc in a different order. I'll define the function `canonical` to put each disc in canonical alphabetical order, and update `lock_table` to do three new things: (1) use the canonical form;  (2) sort the locks by word count; and (3) display along with the word count the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) to a 1240 lock:

In [26]:
def canonical(lock: Lock) -> Lock:
    "Canonicalize a lock by alphabetizing the letters in each disc."
    return Lock(cat(sorted(disc)) for disc in lock)
    
def distance(lock1, lock2) -> int:
    """The number of letters in `lock1` that are missing from the corresponding disc of `lock2`."""
    return sum(len(set(lock1[i]) - set(lock2[i])) for i in range(DISCS))
    
lock1240 = canonical(('SPTBDCLMWF', 'OAIEURLHYW', 'RNALEOTISC', 'SETADNLKYH'))
assert word_count(lock1240) == 1240

def lock_table(locks: Collection[Lock], target=lock1240) -> dict: 
    """A table of {canonical_lock: (word_count, distance_to_target)} in sorted order."""
    locks = sorted(locks, key=word_count, reverse=True)
    return {canonical(lock): (word_count(lock), distance(lock, target)) 
            for lock in locks}

In [27]:
lock_table(hillclimbs)

{('BCDFLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 0),
 ('BCDFLMPSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 1),
 ('BCDFLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEKLMNSTY'): (1236, 1),
 ('BCDGLMPSTW', 'AEHILORUWY', 'AEILMNORST', 'ADEKLNPSTY'): (1235, 3),
 ('BCDHLMPSTW', 'AEHILOPRUY', 'ACEILNPRST', 'ADEHKLNSTY'): (1232, 3),
 ('BCDFHLMPST', 'AEHILORUWY', 'ACELMNORST', 'ADEKLNPSTY'): (1232, 3)}

There are far fewer locks than it seemed at first! 

There are just  two locks (not 21) that score 1240, and they differ in just one letter (a `P` or a `W` in the second disc). 
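As a small illustration of how canonicalization collapses look-alike locks, the sketch below takes the first three locks from the earlier hillclimbing output and canonicalizes them. The helpers are restated so that the block runs on its own; `raw` and `distinct` are names made up for this example.

```python
def canonical(lock):
    """Alphabetize the letters within each disc."""
    return tuple(''.join(sorted(disc)) for disc in lock)

def distance(lock1, lock2):
    """Count the letters of lock1 that are missing from the matching disc of lock2."""
    return sum(len(set(d1) - set(d2)) for d1, d2 in zip(lock1, lock2))

# The first three locks from the hillclimbing output above; each scored 1240.
raw = [('SPMWTCDLFB', 'LEYHWRUOAI', 'ENILRTAOSC', 'DSNYAHLKTE'),
       ('FPLBSMCDTW', 'IPHLEYURAO', 'ORTEALCISN', 'LYHTKENDSA'),
       ('PWSCTLBMFD', 'AROYLPEIHU', 'NOACITESLR', 'TEKADLHSYN')]

distinct = sorted(set(map(canonical, raw)))
print(len(raw), 'raw locks ->', len(distinct), 'distinct locks')   # 3 raw locks -> 2 distinct locks
print(distance(*distinct))                                         # 1
```

The second and third raw locks are the same lock with the letters shuffled within each disc, and the two distinct locks differ only in the `P` or `W` on the second disc.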

This discovery changes my whole thinking about the geometry of the lock/score space.  Previously I imagined a spiky "porcupine-shaped" landscape, with many different peaks hitting a height of 1240.  But now I picture a "space needle" landscape: a single peak containing the two locks (one with a `P` and one with a `W`), surrounded by other lesser towers. Now I think it is less likely that there is a lock that scores over 1240.

# Searching More

Despite the revelation, I'm not quite ready to give up on finding a higher-scoring lock. An easy thing to try is to search for 8,000 steps rather than just 4,000:

In [28]:
%time lock_table(hillclimb(lock, repeat=8000) for lock in locks)

CPU times: user 31.9 s, sys: 61.3 ms, total: 32 s
Wall time: 32.1 s


{('BCDFLMPSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 1),
 ('BCDFLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 0),
 ('BCDGLMPSTW', 'AEHILORUWY', 'AEILMNORST', 'ADEKLNPSTY'): (1235, 3),
 ('BCDFHLMPST', 'AEHILORUWY', 'ACELMNORST', 'ADEKLNPSTY'): (1232, 3)}

That didn't help: still the same two 1240 locks. 

Maybe part of the problem is that once we've improved a lock a bunch, no single-letter change can improve it further. What if we allowed each change to change either one or two or three letters at a time? That would give us a better chance of escaping from a local maximum.

In [29]:
def lock_changes(lock, n=3) -> Lock: 
    """Make up to `n` random changes to lock, returning whichever lock is best."""
    locks = [lock]
    for _ in range(n):
        locks.append(changed_lock(locks[-1]))
    return max(locks, key=word_count)

%time lock_table(hillclimb(lock, lock_changes, repeat=3000) for lock in locks)

CPU times: user 1min 51s, sys: 131 ms, total: 1min 51s
Wall time: 1min 51s


{('BCDFLMPSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 1),
 ('BCDFLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 0),
 ('BCDLMPRSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1237, 1),
 ('BCDGLMPSTW', 'AEHILORUWY', 'AEILMNORST', 'ADEKLNPSTY'): (1235, 3),
 ('BCDHLMPSTW', 'AEHILOPRUY', 'ACEILNPRST', 'ADEHKLNSTY'): (1232, 3)}

Again, the same two 1240 locks; it just took longer to run.

Maybe we should keep more options open. A hillclimbing search always tracks the single best scoring lock, but a [beam search](https://en.wikipedia.org/wiki/Beam_search) tracks multiple possibilities at each step. The number to track is called the beam width. We can track 40 different locks, and try up to 3 changes for each one on each iteration:

In [30]:
def beam_search(locks, changer=lock_changes, scorer=word_count, beam_width=40, repeat=3000) -> Set[Lock]:
    """Keep up to `beam_width` locks, changing each one with `changer`, and keeping the best scoring ones
    according to `scorer`. Repeat."""
    locks = set(map(canonical, locks)) # Make a copy
    for _ in range(repeat):
        locks |= set(map(canonical, map(changer, locks)))
        locks = set(sorted(locks, key=scorer, reverse=True)[:beam_width])
    return locks

%time lock_table(beam_search(locks))

CPU times: user 1min 50s, sys: 118 ms, total: 1min 50s
Wall time: 1min 50s


{('BCDFLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 0),
 ('BCDFLMPSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEHKLNSTY'): (1240, 1),
 ('BCDHLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1237, 1),
 ('BCDLMPRSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEHKLNSTY'): (1237, 2),
 ('BCDLMPRSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1237, 1),
 ('BCDFLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEKLMNSTY'): (1236, 1),
 ('BCDHLMPSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEHKLNSTY'): (1236, 2),
 ('BCDGLMPSTW', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1236, 1),
 ('BCDFLMPSTW', 'AEHIKLORUY', 'ACEILNORST', 'ADEHKLNSTY'): (1236, 1),
 ('BCDFLMPSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEKLNRSTY'): (1236, 2),
 ('BCDGLMPSTW', 'AEHILORUWY', 'AEILMNORST', 'ADEKLNPSTY'): (1235, 3),
 ('BCDGLMPSTW', 'AEHILOPRUY', 'ACEILNORST', 'ADEHKLNSTY'): (1235, 2),
 ('BCDFLMPRST', 'AEHILORUWY', 'ACEILNORST', 'ADEHKLNSTY'): (1235, 1),
 ('BCDFLMPSTW', 'AEHILORTUY', 'ACEILNORST', 'ADEHKLNSTY'): (1235, 1),
 ('BCDFLMPSTW', 'AEH

I'm getting more convinced that 1240 is the top score a lock can reach.

On to the next question.



# Question 3: Simultaneous Words

Can we make a lock that spells 10 words simultaneously?  And could a lock with more than 10 letters on a disc spell more than 10 words simultaneously? A disc cannot have duplicates of any letter, so the upper bound is 26.

We could use hillclimbing with a `scorer` that counts simultaneous words.  My intuition is that this approach would work, eventually, but that progress would be very slow, because most random changes to a single letter would not increase the number of simultaneous words.

An alternative approach is to think of the lock not as a tuple of 4 discs (each with 10 letters), but rather as a list of 10 rows (each with 4 letters), where we want the rows to be words. Then we can just go through the list of valid words and greedily include words that don't duplicate a letter in any column:

In [31]:
def greedy_simultaneous_words(words=WORDS) -> List[Word]:
    """Greedily add words that have no duplicate letters with previous rows."""
    rows = []
    for word in words:
        if not has_duplicate_letters(word, rows):
            rows.append(word)
    return rows

def has_duplicate_letters(word: Word, rows: List[Word]) -> bool:
    """Is any letter in this `word` a duplicate of the corresponding letter in any of the `rows`?"""
    return any(word[i] == row[i] for i in range(DISCS) for row in rows)

We can run `greedy_simultaneous_words()` (with the help of a `report` function) and see if it gives us 10 (or more) words:

In [32]:
def report(items) -> None: print(len(items), 'words:', *items)
    
report(greedy_simultaneous_words())

13 words: AADI BEAD CHEF DIBS EBON FLIC GOGO HUCK ICKY KNUR LYLA ODYL PFFT


It does give us more than 10 words!
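To connect the two views of a lock (10 or more rows of words versus 4 discs of letters), the rows can be transposed back into discs. The sketch below does that for the 13 words just reported and checks that no disc repeats a letter; the names `rows` and `discs` are just for this illustration.

```python
# The 13 rows reported above.
rows = ['AADI', 'BEAD', 'CHEF', 'DIBS', 'EBON', 'FLIC', 'GOGO',
        'HUCK', 'ICKY', 'KNUR', 'LYLA', 'ODYL', 'PFFT']

# Transpose: disc i holds the i-th letter of every row, top to bottom.
discs = tuple(''.join(column) for column in zip(*rows))

for disc in discs:
    assert len(set(disc)) == len(disc) == len(rows)   # 13 distinct letters per disc

print(discs)
# ('ABCDEFGHIKLOP', 'AEHIBLOUCNYDF', 'DAEBOIGCKULYF', 'IDFSNCOKYRALT')
```

A letter may repeat within a row (as in GOGO or PFFT) because the repeats land on different discs; only repeats within a column are ruled out.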

The answer we get depends on the order in which we go through the words. So let's try a thousand different orders, shuffling the list of words before each try:

In [33]:
def shuffled(items: list) -> List: return random.sample(items, len(items))

def greedier_simultaneous_words(words=WORDS, n=1000) -> List[Word]:
    """Try `n` times and return the try with the most simultaneous words."""
    tries = (greedy_simultaneous_words(shuffled(words)) for _ in range(n))
    return max(tries, key=len)

report(greedier_simultaneous_words())

16 words: BUNN KAPH JOWL ADAM FLUX EPOS GEED RISK MYLA UNCI CRIB ICKY SKYE OTTO THRU PFFT


16 simultaneous words; I think that's pretty good.
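A tiny made-up example shows why the order of the word list changes the greedy result: an early word can block two later words that would have been compatible with each other. `greedy_rows` below is a compact restatement of the greedy pass so that the block runs on its own, and the four words are chosen only for illustration.

```python
def greedy_rows(words):
    """Keep each word that shares no letter, column by column, with the rows kept so far."""
    rows = []
    for word in words:
        if not any(word[i] == row[i] for row in rows for i in range(len(word))):
            rows.append(word)
    return rows

order_a = ['BOAT', 'BIKE', 'LOCK', 'RUNS']   # BOAT first: it blocks BIKE (B) and LOCK (O)
order_b = ['BIKE', 'LOCK', 'RUNS', 'BOAT']   # BOAT last: only BOAT itself is blocked

print(greedy_rows(order_a))   # ['BOAT', 'RUNS']
print(greedy_rows(order_b))   # ['BIKE', 'LOCK', 'RUNS']
```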

# Question 4: Coincidence?

There is still one unanswered question: did the designers of WordLock&reg; deliberately put "FRED BUNS" in, or was it a coincidence? Hacker News reader [emhart](https://news.ycombinator.com/user?id=emhart) (aka the competitive lockpicker Schuyler Towne) astutely commented that he had found the [patent](https://www.google.com/patents/US6621405) assigned to WordLock; it describes an algorithm similar to my `greedy_lock`.
After seeing that, I'm inclined to believe that "FRED BUNS" is the coincidental result of running the algorithm. On the other hand, there is a [followup patent](https://www.google.com/patents/US20080053167) that discusses a refinement
"wherein the letters on the wheels are configured to spell a first word displayed on a first row of letters and a second word displayed on a second row of letters." So the possibility of a two-word phrase was something that Wordlock LLc. was aware of.

# Patents

We see below that the procedure described in the [Wordlock Inc patent](https://www.google.com/patents/US6621405) is not quite as good as `greedy_lock`, because the patent states that at each disc position "*the entire word list is scanned*" to produce the letter frequencies, whereas `greedy_lock` scans only the *possible* words: the words that are consistent with the previously-filled discs. Because of that difference, the patented algorithm does worse than my `greedy_lock` (by 1,161 to 1,177) and  my `greedier_lock` (by 1,161 to 1,235).

In [34]:
def greedy_lock_patented(words=WORDS, order=range(DISCS)) -> Lock:
    """Make a lock where we greedily choose the LETTERS best letters for each disc, in order."""
    lock = DISCS * [Disc()] # Initially a lock of 4 empty discs
    for i in order: 
        # Make lock[i] be a disc of the most common letters at position i in the *entire* word list
        lock[i] = disc = most_common(words, i)
        #### Don't update words #### words = [w for w in words if w[i] in disc]
    return Lock(lock)

In [35]:
word_count(greedy_lock_patented())

1161

What does it say about our patent system that Wordlock Inc actually got a patent for an algorithm that is worse than the initial idea I came up with just to use as a baseline against the *real* hillclimbing algorithm?

# Tests

It is a good idea to have some tests. The following tests have poor coverage, because it is hard to test non-deterministic functions, and I didn't attempt that here.

In [36]:
def tests() -> bool:
    assert 'WORD' in WORDS
    assert 'FRED' in WORDS
    assert 'BUNS' in WORDS
    assert 'FIVE' in WORDS
    assert 'XYZZ' not in WORDS
    assert 'word' not in WORDS
    assert 'FIVER' not in WORDS
    assert len(WORDS) == 4360

    fredbuns = Lock(['FB', 'RU', 'EN', 'DS'])
    assert words_from(fredbuns) == ['BRED', 'BUND', 'BUNS', 'FRED', 'FUND', 'FUNS']
    assert word_count(fredbuns) == 6
    assert most_common(fredbuns, 0) == 'FRED'
    assert most_common(['stink', 'stank', 'stunk'], 2) == 'iau'

    assert greedy_lock() == ('SPTBDCLMAR', 'OAIEURLHYN', 'RNALEOTISM', 'SETADNLKYP')
    assert greedier_lock() == ('BPTCMSDLGW', 'OAEIURLYHW', 'EARNLIOTSM', 'SETNDALKYP')
    assert greedy_lock_patented() == ('SPTBDCLMAR', 'AOIEURLHYN', 'EARNLIOTSM', 'SETANDYLKO')
    assert word_count(greedier_lock()) >= word_count(greedy_lock())
    
    assert wordlock == Lock(('SPHMTWDLFB', 'LEYHNRUOAI', 'ENMLRTAOSK', 'DSNMPYLKTE'))
    assert regex(wordlock) == '[SPHMTWDLFB][LEYHNRUOAI][ENMLRTAOSK][DSNMPYLKTE]'
    assert word_count(wordlock) == 1118
    assert canonical(wordlock) == ('BDFHLMPSTW', 'AEHILNORUY', 'AEKLMNORST', 'DEKLMNPSTY')
    assert "FRED" in words_from(wordlock) 
    assert "BUNS" in words_from(wordlock)
    assert "QUIT" not in words_from(wordlock)

    assert distance(wordlock, lock1240) == 6
    assert distance(greedy_lock(), lock1240) == 5
    assert distance(greedy_lock(), greedy_lock_patented()) == 1

    assert has_duplicate_letters("WORD", ["WILD", "DOGS"])
    assert not has_duplicate_letters("WORD", ["FREE", "CATS"])

    assert greedy_simultaneous_words() == [
        'AADI', 'BEAD', 'CHEF', 'DIBS', 'EBON', 'FLIC', 'GOGO', 'HUCK', 'ICKY', 'KNUR', 'LYLA', 'ODYL', 'PFFT']
    
    assert lock_table([greedy_lock(), greedier_lock(), wordlock]) == {
     ('BCDGLMPSTW', 'AEHILORUWY', 'AEILMNORST', 'ADEKLNPSTY'): (1235, 3),
     ('ABCDLMPRST', 'AEHILNORUY', 'AEILMNORST', 'ADEKLNPSTY'): (1177, 5),
     ('BDFHLMPSTW', 'AEHILNORUY', 'AEKLMNORST', 'DEKLMNPSTY'): (1118, 6)}
    
    return True

tests()

True

# Addendum: 2018

New [research](https://www.theverge.com/2018/10/7/17940352/turing-test-one-word-minimal-human-ai-machine-poop) suggests that the one word that humans are most likely to think was generated by another human rather than by a machine is "*poop*". Is it a coincidence that the same week this research came out, I was in a bike shop and saw the following:

![](like.jpg)

This proves that I'm not the only one with a juvenile reaction to the *WordLock*®, but it raises some questions: Was the last visitor to the store a human asserting their individuality? Or a robot yearning to be free? Or [Triumph the insult comic dog](https://en.wikipedia.org/wiki/Triumph_the_Insult_Comic_Dog)? I guess we'll never know.

In [37]:
'POOP' in WORDS

True

# One More Question

I wonder if [@BIKESNOBNYC](https://twitter.com/bikesnobnyc) would appreciate this notebook?  On the one hand, he is the kind of guy who, in discussing the fact that bicycling is the seventh most popular recreational activity,  [wrote]() "*the number seven is itself a highly significant number. It is the lowest number that cannot be represented as the sum of the square of three integers*," so it seems he has some interest in mathematical oddities.  On the other hand, he followed that up by writing "*I have no idea what that means, but it's true*," so maybe not.

In [38]:
nums = range(11) 
sums = {A**2 + B**2 + C**2 for A in nums for B in nums for C in nums} # Sums of 3 squares
set(range(101)) - sums # Numbers up to 100 that are not the sum of 3 squares

{7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95}
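The pattern in that output is explained by Legendre's three-square theorem: a non-negative integer is a sum of three squares unless it has the form 4<sup>a</sup>(8b + 7). The sketch below cross-checks the brute-force result against that form; `is_excluded_form` is a helper name made up for this check.

```python
def is_excluded_form(n: int) -> bool:
    """True if n == 4**a * (8*b + 7) for some integers a, b >= 0."""
    while n > 0 and n % 4 == 0:   # strip factors of 4
        n //= 4
    return n % 8 == 7

nums = range(11)                  # 10**2 == 100, so 0..10 covers sums up to 100
sums = {A**2 + B**2 + C**2 for A in nums for B in nums for C in nums}

not_sums   = sorted(set(range(101)) - sums)                   # brute force
by_formula = [n for n in range(101) if is_excluded_form(n)]   # Legendre's form

print(not_sums == by_formula)     # True
```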