<div style="text-align: right" align="right"><i>Peter Norvig, Oct 2022</i></div>

# Efficiently Selecting Names from a Menu 

We've all been faced with the task of selecting an item from an online menu. Usually you have the following options:

- You can select an item by clicking on it.
- You can type a few letters to move the menu's cursor to the first item whose prefix matches those letters (case ignored). 
- You can move the cursor up or down with the arrow keys.

In this notebook we're going to ignore the "clicking on it" option and concentrate on finding the shortest sequence of keystrokes (letters or arrow keys) that will select an item in the menu.

# Country Menu

I live in the United States, but when faced with a country menu, I think of my home as "`un↓↓`". I know that typing "`un`" will get me to "United Arab Emirates" (roughly 185 items deep in the menu) and then two down arrows will take me past "United Kingdom" to "United States." 

But questions remain:
- Is that the shortest possible key sequence? 
- What are shortest sequences for other country names? 
- What's the average key sequence length? 
- What about other menus (with items other than countries)? 

Let's answer these questions. First some preliminary imports and declarations, and a list of country names:

In [1]:
from typing import Dict, List, Iterable
from collections import Counter
import numpy as np

up     = '↑' # up arrow key
down   = '↓' # down arrow key

Item   = str # Type for a menu item, e.g. 'United States'
Keyseq = str # Type for a sequence of keystrokes (a string of letters and arrows, e.g. 'Un↓↓')

In [2]:
countries = [
    'Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Antigua and Barbuda', 'Argentina', 'Armenia', 
    'Australia', 'Austria', 'Azerbaijan', 'Bahamas, The', 'Bahrain', 'Bangladesh', 'Barbados', 'Belarus', 
    'Belgium', 'Belize', 'Benin', 'Bhutan', 'Bolivia', 'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Brunei', 
    'Bulgaria', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia', 'Cameroon', 'Canada', 
    'Central African Republic', 'Chad', 'Chile', 'China', 'Colombia', 'Comoros', 'Congo, Democratic Republic of the', 
    'Congo, Republic of the', 'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Cyprus', 'Czech Republic', 'Denmark', 
    'Djibouti', 'Dominica', 'Dominican Republic', 'East Timor', 'Ecuador', 'Egypt', 'El Salvador', 
    'Equatorial Guinea', 'Eritrea', 'Estonia', 'Eswatini', 'Ethiopia', 'Fiji', 'Finland', 'France', 'Gabon', 
    'Gambia, The', 'Georgia', 'Germany', 'Ghana', 'Greece', 'Grenada', 'Guatemala', 'Guinea', 'Guinea-Bissau', 
    'Guyana', 'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 
    'Italy', 'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati', 'Korea, North', 'Korea, South', 'Kosovo', 
    'Kuwait', 'Kyrgyzstan', 'Laos', 'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Liechtenstein', 'Lithuania', 
    'Luxembourg', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives', 'Mali', 'Malta', 'Marshall Islands', 'Mauritania', 
    'Mauritius', 'Mexico', 'Micronesia, Federated States of', 'Moldova', 'Monaco', 'Mongolia', 'Montenegro', 'Morocco', 
    'Mozambique', 'Myanmar', 'Namibia', 'Nauru', 'Nepal', 'Netherlands', 'New Zealand', 'Nicaragua', 'Niger', 
    'Nigeria', 'North Macedonia', 'Norway', 'Oman', 'Pakistan', 'Palau', 'Panama', 'Papua New Guinea', 'Paraguay', 
    'Peru', 'Philippines', 'Poland', 'Portugal', 'Qatar', 'Romania', 'Russia', 'Rwanda', 'Saint Kitts and Nevis', 
    'Saint Lucia', 'Saint Vincent and the Grenadines', 'Samoa', 'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 
    'Senegal', 'Serbia', 'Seychelles', 'Sierra Leone', 'Singapore', 'Slovakia', 'Slovenia', 'Solomon Islands', 'Somalia', 
    'South Africa', 'Spain', 'Sri Lanka', 'Sudan', 'Sudan, South', 'Suriname', 'Sweden', 'Switzerland', 'Syria', 
    'Taiwan', 'Tajikistan', 'Tanzania', 'Thailand', 'Togo', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey', 
    'Turkmenistan', 'Tuvalu', 'Uganda', 'Ukraine', 'United Arab Emirates', 'United Kingdom', 'United States', 'Uruguay', 
    'Uzbekistan', 'Vanuatu', 'Vatican City', 'Venezuela', 'Vietnam', 'Yemen', 'Zambia', 'Zimbabwe']

# Shortest Key Sequence 

My strategy to find the **shortest key sequence for every item in a menu** is:
- Generate all "sensible" key sequences in shortest-first order. For each one:
  - Determine what menu item is selected by the key sequence. 
  - If that item has not been assigned a key sequence before, assign it. This is guaranteed to be a shortest possible key sequence for the item, because we're considering the sequences in shortest-first order.
- When all menu items have been assigned, we're done; return the assignments of shortest possible key sequences.

The function `shortest_keyseqs` implements this strategy:

In [3]:
def shortest_keyseqs(items) -> Dict[Item, Keyseq]:
    """Compute a dict of {'Item': 'Shortest Keyseq'}, e.g. {'Iceland': 'I', 'Japan': 'J↓'}."""
    assigned = {}
    for keyseq in all_sensible_keyseqs(items):
        item = select_from_menu(keyseq, items)
        if item not in assigned:
            assigned[item] = keyseq
            if len(assigned) == len(items):
                return assigned

A "sensible" key sequence consists of zero or more letters constituting a prefix of one of the menu item names, followed by zero or more arrows, all pointing in the same direction. 

It wouldn't make sense to generate a sequence of letters that is not a prefix of an item name, because the non-matching letter(s) wouldn't move the menu's cursor. It also wouldn't make sense to have both an up and a down arrow in a key sequence, because they cancel each other out.
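
To get a feel for how much the prefix rule prunes, the sketch below counts the distinct letter prefixes of each length in a five-country excerpt (chosen only for illustration) and compares them with the 26**n letter strings of length n you could type if case is ignored. The helper name `sensible_prefixes` is made up for this example.

```python
from typing import List, Set

def sensible_prefixes(items: List[str], n: int) -> Set[str]:
    """Distinct prefixes item[:n] of the items (the whole name if it is shorter than n).
    These are the only letter strings that move the menu's cursor."""
    return {item[:n] for item in items}

menu = ['Albania', 'Algeria', 'Andorra', 'Angola', 'Argentina']  # illustrative excerpt

for n in range(1, 4):
    print(f'n={n}: {len(sensible_prefixes(menu, n))} sensible prefixes '
          f'vs {26 ** n} arbitrary letter strings')
```

At length 3 the excerpt has 5 sensible prefixes out of 17,576 possible letter strings, and the ratio only gets more lopsided as n grows.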

In [4]:
def all_sensible_keyseqs(items) -> Iterable[Keyseq]:
    """All sensible key sequences, in shortest-first order.
    (For ties, use fewer arrows first; if still tied, use alphabetical ordering.)"""
    longest = max(map(len, items))
    for n in range(longest + 1):    # `n` is total length of key sequence; shortest first
        for a in range(n + 1):      # `a` is number of arrows in key sequence; fewer first
            keyseqs = {item[:n - a] + a * arrow 
                       for item in items for arrow in (up, down)}
            yield from sorted(keyseqs) 

For example, here are the first 30 sensible key sequences (note that no country starts with `'W'` or `'X'`):

In [5]:
keyseqs = all_sensible_keyseqs(countries)
print(*(repr(next(keyseqs)) for _ in range(30)))

'' 'A' 'B' 'C' 'D' 'E' 'F' 'G' 'H' 'I' 'J' 'K' 'L' 'M' 'N' 'O' 'P' 'Q' 'R' 'S' 'T' 'U' 'V' 'Y' 'Z' '↑' '↓' 'Af' 'Al' 'An'


Only one function remains to be defined: `select_from_menu(keyseq, items)` returns the menu item selected by `keyseq`:

In [6]:
def select_from_menu(keyseq: Keyseq, items: List[Item]) -> Item:
    """Select a menu item from `items` according to `keyseq`."""
    letters = keyseq.replace(up, '').replace(down, '') # The keyseq without the arrows
    for i, item in enumerate(items):
        if item.startswith(letters): 
            i = i - keyseq.count(up) + keyseq.count(down) # Arrow keys can move cursor `i` up or down,
            return items[np.clip(i, 0, len(items) - 1)]   # but not above or below the extent of the menu
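
A quick way to see the cursor arithmetic, including the clipping at the top and bottom of the menu, is to try a few key sequences on a tiny menu. The sketch below is a variant of `select_from_menu`, not the notebook's own function: it clamps with `min`/`max` instead of `np.clip`, lowercases both sides to mimic the "case ignored" behavior of real menus, and returns `None` explicitly when nothing matches. The name `select_from_menu_ci` and the four-item menu are made up for illustration.

```python
from typing import List, Optional

up, down = '↑', '↓'

def select_from_menu_ci(keyseq: str, items: List[str]) -> Optional[str]:
    """Case-insensitive variant of select_from_menu (hypothetical helper)."""
    letters = keyseq.replace(up, '').replace(down, '').lower()
    for i, item in enumerate(items):
        if item.lower().startswith(letters):
            i += keyseq.count(down) - keyseq.count(up)      # net arrow movement
            return items[min(max(i, 0), len(items) - 1)]   # stay inside the menu
    return None                                            # no item has this prefix

menu = ['Albania', 'Algeria', 'Andorra', 'Angola']
print(select_from_menu_ci('al↓', menu))    # Algeria
print(select_from_menu_ci('An↑↑↑', menu))  # Albania (clipped at the top)
print(select_from_menu_ci('↓↓↓↓↓', menu))  # Angola (clipped at the bottom)
print(select_from_menu_ci('Z', menu))      # None
```

The notebook's own function compares case-sensitively, which is what its generated key sequences expect. The two can disagree when items differ only in case: in the color menu below, `'DarkOr'` selects "DarkOrchid" case-sensitively but "Darkorange" when case is ignored.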

# Solving the Problem: You're Up!

We are now ready to find a shortest key sequence for every country:

In [7]:
%time results = shortest_keyseqs(countries)
results

CPU times: user 24.9 ms, sys: 294 µs, total: 25.2 ms
Wall time: 25.1 ms


{'Afghanistan': '',
 'Bahamas, The': 'B',
 'Cabo Verde': 'C',
 'Denmark': 'D',
 'East Timor': 'E',
 'Fiji': 'F',
 'Gabon': 'G',
 'Haiti': 'H',
 'Iceland': 'I',
 'Jamaica': 'J',
 'Kazakhstan': 'K',
 'Laos': 'L',
 'Madagascar': 'M',
 'Namibia': 'N',
 'Oman': 'O',
 'Pakistan': 'P',
 'Qatar': 'Q',
 'Romania': 'R',
 'Saint Kitts and Nevis': 'S',
 'Taiwan': 'T',
 'Uganda': 'U',
 'Vanuatu': 'V',
 'Yemen': 'Y',
 'Zambia': 'Z',
 'Albania': '↓',
 'Andorra': 'An',
 'Argentina': 'Ar',
 'Australia': 'Au',
 'Azerbaijan': 'Az',
 'Belarus': 'Be',
 'Bhutan': 'Bh',
 'Bolivia': 'Bo',
 'Brazil': 'Br',
 'Bulgaria': 'Bu',
 'Central African Republic': 'Ce',
 'Chad': 'Ch',
 'Colombia': 'Co',
 'Croatia': 'Cr',
 'Cuba': 'Cu',
 'Cyprus': 'Cy',
 'Czech Republic': 'Cz',
 'Djibouti': 'Dj',
 'Dominica': 'Do',
 'Ecuador': 'Ec',
 'Egypt': 'Eg',
 'El Salvador': 'El',
 'Equatorial Guinea': 'Eq',
 'Eritrea': 'Er',
 'Estonia': 'Es',
 'Ethiopia': 'Et',
 'France': 'Fr',
 'Georgia': 'Ge',
 'Ghana': 'Gh',
 'Greece': 'Gr',
 'G

*Nice!* The results seem sensible, and it only took a couple dozen milliseconds (even though I emphasized clarity over efficiency in the code). There is a surprise: instead of the expected four-character key sequence `'Un↓↓'` for "United States," there is a shorter key sequence:

In [8]:
results['United States']

'Ur↑'

The mnemonic "**you're up**" may help you remember this.
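
The trick works because "Uruguay" sits directly below "United States" in the menu, so landing on it and backing up one item beats landing on "United Arab Emirates" and stepping down two. A minimal check on an excerpt of the menu (the excerpt and the helper `land` are illustrative, not part of the notebook):

```python
# Excerpt of the country menu around the 'U' entries (same relative order as the full list).
menu = ['Ukraine', 'United Arab Emirates', 'United Kingdom',
        'United States', 'Uruguay', 'Uzbekistan']

def land(letters: str) -> int:
    """Index where typing `letters` puts the cursor: the first item with that prefix."""
    return next(i for i, item in enumerate(menu) if item.startswith(letters))

# (key sequence, letters typed, net arrow movement)
for keyseq, letters, moves in [('Un↓↓', 'Un', +2), ('Ur↑', 'Ur', -1)]:
    print(f'{keyseq}: {len(keyseq)} keystrokes -> {menu[land(letters) + moves]}')
```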

Here is a function to report on the results:

In [9]:
def report(results):
    """Report stats on the number of keystrokes needed to select each item."""
    N       = len(results)
    lengths = [len(results[item]) for item in results]
    counts  = dict(Counter(lengths))
    print(f'{N} items; Lengths: mean={sum(lengths)/N:3.2f}, max={max(lengths)}, counts={counts}')

In [10]:
report(results)

196 items; Lengths: mean=2.29, max=4, counts={0: 1, 1: 24, 2: 96, 3: 68, 4: 7}


We see that the average over the 196 countries is a bit over 2 keystrokes, and that only 7 countries require the maximum of 4 keystrokes.
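
As a sanity check on the summary line, the mean can be recomputed from the `counts` histogram alone: it is the average of the lengths, weighted by how many countries have each length. The numbers below are copied from the output above.

```python
counts = {0: 1, 1: 24, 2: 96, 3: 68, 4: 7}   # {keystrokes: number of countries}, from report(results)

N = sum(counts.values())                                   # 196 countries
total = sum(length * n for length, n in counts.items())    # 448 keystrokes in all
print(N, total, f'{total / N:.2f}')
```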

# State Menu

To show that the code generalizes to different menus, let's examine US states (and territories):

In [11]:
states = [
    'Alabama', 'Alaska', 'American Samoa', 'Arizona', 'Arkansas', 'California', 'Colorado', 'Connecticut', 'Delaware',
    'District of Columbia', 'Federated States of Micronesia', 'Florida', 'Georgia', 'Guam', 'Hawaii', 'Idaho', 'Illinois',
    'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana', 'Maine', 'Marshall Islands', 'Maryland', 'Massachusetts', 
    'Michigan', 'Minnesota', 'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire', 'New Jersey', 
    'New Mexico', 'New York', 'North Carolina', 'North Dakota', 'Northern Mariana Islands', 'Ohio', 'Oklahoma', 'Oregon', 
    'Palau', 'Pennsylvania', 'Puerto Rico', 'Rhode Island', 'South Carolina', 'South Dakota', 'Tennessee', 'Texas', 'Utah', 
    'Vermont', 'Virgin Island', 'Virginia', 'Washington', 'West Virginia', 'Wisconsin', 'Wyoming']

In [12]:
shortest_keyseqs(states)

{'Alabama': '',
 'California': 'C',
 'Delaware': 'D',
 'Federated States of Micronesia': 'F',
 'Georgia': 'G',
 'Hawaii': 'H',
 'Idaho': 'I',
 'Kansas': 'K',
 'Louisiana': 'L',
 'Maine': 'M',
 'Nebraska': 'N',
 'Ohio': 'O',
 'Palau': 'P',
 'Rhode Island': 'R',
 'South Carolina': 'S',
 'Tennessee': 'T',
 'Utah': 'U',
 'Vermont': 'V',
 'Washington': 'W',
 'Alaska': '↓',
 'American Samoa': 'Am',
 'Arizona': 'Ar',
 'Colorado': 'Co',
 'District of Columbia': 'Di',
 'Florida': 'Fl',
 'Guam': 'Gu',
 'Illinois': 'Il',
 'Indiana': 'In',
 'Iowa': 'Io',
 'Kentucky': 'Ke',
 'Michigan': 'Mi',
 'Montana': 'Mo',
 'North Carolina': 'No',
 'Oklahoma': 'Ok',
 'Oregon': 'Or',
 'Pennsylvania': 'Pe',
 'Puerto Rico': 'Pu',
 'Virgin Island': 'Vi',
 'West Virginia': 'We',
 'Wisconsin': 'Wi',
 'Wyoming': 'Wy',
 'Arkansas': 'C↑',
 'Connecticut': 'D↑',
 'Marshall Islands': 'M↓',
 'Nevada': 'N↓',
 'Northern Mariana Islands': 'O↑',
 'South Dakota': 'S↓',
 'Texas': 'T↓',
 'Virginia': 'W↑',
 'Massachusetts': 'Mas',


In [13]:
report(shortest_keyseqs(states))

59 items; Lengths: mean=1.85, max=4, counts={0: 1, 1: 19, 2: 29, 3: 8, 4: 2}


The average is under two keystrokes, and only two states, New Jersey and New Mexico, require 4 keystrokes (New Hampshire and New York only need 3).

# Color Menu

One more example: the 140 [Web color names](https://en.wikipedia.org/wiki/Web_colors) recognized by modern browsers:

In [14]:
colors = [
    'AliceBlue', 'AntiqueWhite', 'Aqua', 'Aquamarine', 'Azure', 'Beige', 'Bisque', 'Black', 'BlanchedAlmond', 'Blue', 'BlueViolet', 
    'Brown', 'BurlyWood', 'CadetBlue', 'Chartreuse', 'Chocolate', 'Coral', 'CornflowerBlue', 'Cornsilk', 'Crimson', 'Cyan', 
    'DarkBlue', 'DarkCyan', 'DarkGoldenRod', 'DarkGrey', 'DarkGreen', 'DarkKhaki', 'DarkMagenta', 'DarkOliveGreen', 'Darkorange', 
    'DarkOrchid', 'DarkRed', 'DarkSalmon', 'DarkSeaGreen', 'DarkSlateBlue', 'DarkSlateGrey', 'DarkTurquoise', 'DarkViolet', 
    'DeepPink', 'DeepSkyBlue', 'DimGray', 'DodgerBlue', 'FireBrick', 'FloralWhite', 'ForestGreen', 'Fuchsia', 'Gainsboro', 
    'GhostWhite', 'Gold', 'GoldenRod', 'Grey', 'Green', 'GreenYellow', 'HoneyDew', 'HotPink', 'IndianRed', 'Indigo', 'Ivory', 
    'Khaki', 'Lavender', 'LavenderBlush', 'LawnGreen', 'LemonChiffon', 'LightBlue', 'LightCoral', 'LightCyan', 'LightGoldenRodYellow',
    'LightGrey', 'LightGreen', 'LightPink', 'LightSalmon', 'LightSeaGreen', 'LightSkyBlue', 'LightSlateGrey', 'LightSteelBlue', 
    'LightYellow', 'Lime', 'LimeGreen', 'Linen', 'Magenta', 'Maroon', 'MediumAquaMarine', 'MediumBlue', 'MediumOrchid', 'MediumPurple',
    'MediumSeaGreen', 'MediumSlateBlue', 'MediumSpringGreen', 'MediumTurquoise', 'MediumVioletRed', 'MidnightBlue', 'MintCream', 
    'MistyRose', 'Moccasin', 'NavajoWhite', 'Navy', 'OldLace', 'Olive', 'OliveDrab', 'Orange', 'OrangeRed', 'Orchid', 'PaleGoldenRod',
    'PaleGreen', 'PaleTurquoise', 'PaleVioletRed', 'PapayaWhip', 'PeachPuff', 'Peru', 'Pink', 'Plum', 'PowderBlue', 'Purple', 'Red', 
    'RosyBrown', 'RoyalBlue', 'SaddleBrown', 'Salmon', 'SandyBrown', 'SeaGreen', 'SeaShell', 'Sienna', 'Silver', 'SkyBlue', 
    'SlateBlue', 'SlateGrey', 'Snow', 'SpringGreen', 'SteelBlue', 'Tan', 'Teal', 'Thistle', 'Tomato', 'Turquoise', 'Violet', 
    'Wheat', 'White', 'WhiteSmoke', 'Yellow', 'YellowGreen']

In [15]:
shortest_keyseqs(colors)

{'AliceBlue': '',
 'Beige': 'B',
 'CadetBlue': 'C',
 'DarkBlue': 'D',
 'FireBrick': 'F',
 'Gainsboro': 'G',
 'HoneyDew': 'H',
 'IndianRed': 'I',
 'Khaki': 'K',
 'Lavender': 'L',
 'Magenta': 'M',
 'NavajoWhite': 'N',
 'OldLace': 'O',
 'PaleGoldenRod': 'P',
 'Red': 'R',
 'SaddleBrown': 'S',
 'Tan': 'T',
 'Violet': 'V',
 'Wheat': 'W',
 'Yellow': 'Y',
 'AntiqueWhite': '↓',
 'Aqua': 'Aq',
 'Azure': 'Az',
 'Bisque': 'Bi',
 'Black': 'Bl',
 'Brown': 'Br',
 'BurlyWood': 'Bu',
 'Chartreuse': 'Ch',
 'Coral': 'Co',
 'Crimson': 'Cr',
 'Cyan': 'Cy',
 'DeepPink': 'De',
 'DimGray': 'Di',
 'DodgerBlue': 'Do',
 'FloralWhite': 'Fl',
 'ForestGreen': 'Fo',
 'Fuchsia': 'Fu',
 'GhostWhite': 'Gh',
 'Gold': 'Go',
 'Grey': 'Gr',
 'Ivory': 'Iv',
 'LemonChiffon': 'Le',
 'LightBlue': 'Li',
 'MediumAquaMarine': 'Me',
 'MidnightBlue': 'Mi',
 'Moccasin': 'Mo',
 'Orange': 'Or',
 'PeachPuff': 'Pe',
 'Pink': 'Pi',
 'Plum': 'Pl',
 'PowderBlue': 'Po',
 'Purple': 'Pu',
 'RosyBrown': 'Ro',
 'SeaGreen': 'Se',
 'Sienna': 'Si'

In [16]:
report(shortest_keyseqs(colors))

140 items; Lengths: mean=2.77, max=7, counts={0: 1, 1: 20, 2: 59, 3: 29, 4: 7, 5: 12, 6: 9, 7: 3}


This time the average length is close to 3 keystrokes. There are 12 color names that require 6 or 7 keystrokes, and another 12 that require 5. The difficulty is the large number of color names that start with one of the prefixes "Light", "Medium", or "Dark":

In [17]:
{shade: sum(name.startswith(shade) for name in colors) 
 for shade in ("Light", "Medium", "Dark")}

{'Light': 13, 'Medium': 9, 'Dark': 17}
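
One way to quantify that difficulty is to ask how many letters you would have to type, with no arrow keys at all, to land on each item: one more than the longest prefix the item shares with any item above it in the menu. The helper below, `letters_only_lengths`, is a hypothetical addition, not part of the notebook's solution; it uses `os.path.commonprefix`, which compares strings character by character. The five-item menu is a made-up excerpt.

```python
import os.path
from typing import Dict, List, Optional

def letters_only_lengths(items: List[str]) -> Dict[str, Optional[int]]:
    """Letters needed to select each item by typing alone (None if letters can't reach it)."""
    lengths = {}
    for i, item in enumerate(items):
        # Longest prefix shared with any item above this one (-1 for the first item,
        # which the empty key sequence already selects).
        shared = max((len(os.path.commonprefix([above, item])) for above in items[:i]),
                     default=-1)
        needed = shared + 1
        # If the whole name is a prefix of an earlier item, typing it lands on that item instead.
        lengths[item] = needed if needed <= len(item) else None
    return lengths

menu = ['DarkBlue', 'DarkCyan', 'LightBlue', 'LightCyan', 'Lime']
print(letters_only_lengths(menu))
```

Every "Light" item after the first needs at least six letters this way ("Light" plus one more), so for those items the arrow keys are often the cheaper route.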