{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Peter Norvig
March 2020
\n", "\n", "# Elemental Spelling\n", "\n", "Consider this problem: \n", "\n", "*Given a word, decide if it can be spelled using only the symbols in the **[periodic table](https://en.wikipedia.org/wiki/Periodic_table)** of elements. For example, the word \"bananas\" can be spelled with \"BaNaNaS\" (Barium-Sodium-Sodium-Sulfur). There may be multiple possible spellings for a word; \"bananas\" could also be \"BaNaNAs\" (Barium-Sodium-Nitrogen-Arsenic).*\n", "\n", "To start, here is the periodic table, which I've called `elements`:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "elements = dict(H='Hydrogen', He='Helium', Li='Lithium', Be='Beryllium', B='Boron', \n", "C='Carbon', N='Nitrogen', O='Oxygen', F='Fluorine', Ne='Neon', Na='Sodium', Mg='Magnesium', \n", "Al='Aluminium', Si='Silicon', P='Phosphorus', S='Sulfur', Cl='Chlorine', Ar='Argon', \n", "K='Potassium', Ca='Calcium', Sc='Scandium', Ti='Titanium', V='Vanadium', Cr='Chromium', \n", "Mn='Manganese', Fe='Iron', Co='Cobalt', Ni='Nickel', Cu='Copper', Zn='Zinc', Ga='Gallium', \n", "Ge='Germanium', As='Arsenic', Se='Selenium', Br='Bromine', Kr='Krypton', Rb='Rubidium', \n", "Sr='Strontium', Y='Yttrium', Zr='Zirconium', Nb='Niobium', Mo='Molybdenum', Tc='Technetium', \n", "Ru='Ruthenium', Rh='Rhodium', Pd='Palladium', Ag='Silver', Cd='Cadmium', In='Indium', Sn='Tin', \n", "Sb='Antimony', Te='Tellurium', I='Iodine', Xe='Xenon', Cs='Cesium', Ba='Barium', La='Lanthanum', \n", "Ce='Cerium', Pr='Praseodymium', Nd='Neodymium', Pm='Promethium', Sm='Samarium', Eu='Europium', \n", "Gd='Gadolinium', Tb='Terbium', Dy='Dysprosium', Ho='Holmium', Er='Erbium', Tm='Thulium', \n", "Yb='Ytterbium', Lu='Lutetium', Hf='Hafnium', Ta='Tantalum', W='Tungsten', Re='Rhenium', \n", "Os='Osmium', Ir='Iridium', Pt='Platinum', Au='Gold', Hg='Mercury', Tl='Thallium', Pb='Lead', \n", "Bi='Bismuth', Po='Polonium', At='Astatine', Rn='Radon', Fr='Francium', Ra='Radium', Ac='Actinium', \n", 
"Th='Thorium', Pa='Protactinium', U='Uranium', Np='Neptunium', Pu='Plutonium', Am='Americium', \n", "Cm='Curium', Bk='Berkelium', Cf='Californium', Es='Einsteinium', Fm='Fermium', Md='Mendelevium', \n", "No='Nobelium', Lr='Lawrencium', Rf='Rutherfordium', Db='Dubnium', Sg='Seaborgium', Bh='Bohrium', \n", "Hs='Hassium', Mt='Meitnerium', Ds='Darmstadtium', Rg='Roentgenium', Cn='Copernicium', Nh='Nihonium', \n", "Fl='Flerovium', Mc='Moscovium', Lv='Livermorium', Ts='Tennessine', Og='Oganesson')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "assert len(elements) == 118\n", "assert 'H' in elements and 'He' in elements and 'Fire' not in elements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is a recursive algorithm to solve the problem. A word is **spellable** if any of three cases hold: \n", "1) The word is the empty string.\n", "2) The first **one** character of the word (capitalized) forms an element symbol, and the rest of the word is spellable.\n", "3) The first **two** characters of the word (capitalized) forms an element symbol, and the rest of the word is spellable.\n", "\n", "Here is the code:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def spellable(word: str) -> bool:\n", " \"\"\"Can we spell `word` by concatenating symbols in `elements`?\"\"\"\n", " def case(k: int) -> bool: \n", " return word[:k].capitalize() in elements and spellable(word[k:])\n", " return word == '' or case(1) or case(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can test the function on two examples:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spellable('bananas')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, 
"execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spellable('yogurt')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That was easy! But maybe you'd like to see the actual spellings:`'BaNaNaS'` or `'BaNaNAs'`. The function `spellings` does that. The general idea is the same (same three cases). However, each case returns a **set** of possible spellings. It is important to distinguish between the spellings of an unspellable word (the empty set) and the spellings of the empty string (a set consisting of one spelling, the empty string)." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def spellings(word) -> set:\n", " \"\"\"All spellings of `word` formed by concatenating symbols in `elements`.\"\"\"\n", " def case(k: int) -> set:\n", " head, tail = word[:k].capitalize(), word[k:]\n", " if head in elements:\n", " return {head + rest for rest in spellings(tail)}\n", " else:\n", " return set()\n", " return {''} if word == '' else case(1) | case(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The two examples:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'BaNaNAs', 'BaNaNaS'}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spellings('bananas')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "set()" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spellings('yogurt') " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Testing\n", "\n", "Here I define `bad`, a list of words that are **not** spellable, and `good`, a list of words that **are**. 
Then I make some assertions:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "bad = 'hello world failure not an alternative'.split() # Unspellable words\n", "\n", "good = '''howdy orb nonsuccess is notan option \n", "bananas wonky nutso psychic attention functions officious hyperbolic \n", "vichyssois bobbysocks phony whippersnappers soupspoons buffoonish \n", "bilateralism capabilities alterabilities cioppino pincushion \n", "onionskins unprofessional biostatistical copernicus inconspicuous \n", "nonpoisonous floccinaucinihilipilification'''.split() # Spellable words\n", "\n", "for w in bad:\n", " assert not spellable(w) and not spellings(w)\n", "for w in good:\n", " assert spellable(w) and spellings(w)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And here are the actual spellings for the good words:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'HOWDy', 'HoWDy'},\n", " {'ORb'},\n", " {'NONSUCCEsS', 'NONSUCCeSS', 'NoNSUCCEsS', 'NoNSUCCeSS'},\n", " {'IS'},\n", " {'NOTaN', 'NoTaN'},\n", " {'OPTiON', 'OPtION'},\n", " {'BaNaNAs', 'BaNaNaS'},\n", " {'WONKY'},\n", " {'NUTsO'},\n", " {'PSYCHIC'},\n", " {'AtTeNTiON'},\n", " {'FUNCTiONS'},\n", " {'OFFICIOUS'},\n", " {'HYPErBOLiC'},\n", " {'VICHYSSOIS'},\n", " {'BOBBYSOCKS'},\n", " {'PHONY', 'PHoNY'},\n", " {'WHIPPErSNaPPErS'},\n", " {'SOUPSPOONS', 'SOUPSPoONS'},\n", " {'BUFFOONISH', 'BUFFOONiSH'},\n", " {'BILaTeRaLiSm', 'BiLaTeRaLiSm'},\n", " {'CaPaBILiTiEs', 'CaPaBiLiTiEs'},\n", " {'AlTeRaBILiTiEs', 'AlTeRaBiLiTiEs'},\n", " {'CIOPPINO', 'CIOPPINo', 'CIOPPInO'},\n", " {'PINCUSHION', 'PINCuSHION', 'PInCUSHION', 'PInCuSHION'},\n", " {'ONIONSKINS', 'ONIONSKInS', 'ONiONSKINS', 'ONiONSKInS'},\n", " {'UNPrOFEsSIONAl', 'UNPrOFEsSiONAl', 'UNPrOFeSSIONAl', 'UNPrOFeSSiONAl'},\n", " {'BIOSTaTiSTiCAl', 'BIOsTaTiSTiCAl', 'BiOSTaTiSTiCAl', 'BiOsTaTiSTiCAl'},\n", " {'COPErNICUS',\n", " 'COPErNICuS',\n", " 
'COPErNiCUS',\n", " 'COPErNiCuS',\n", " 'CoPErNICUS',\n", " 'CoPErNICuS',\n", " 'CoPErNiCUS',\n", " 'CoPErNiCuS'},\n", " {'INCONSPICUOUS',\n", " 'INCONSPICuOUS',\n", " 'INCoNSPICUOUS',\n", " 'INCoNSPICuOUS',\n", " 'InCONSPICUOUS',\n", " 'InCONSPICuOUS',\n", " 'InCoNSPICUOUS',\n", " 'InCoNSPICuOUS'},\n", " {'NONPOISONOUS',\n", " 'NONPOISONoUS',\n", " 'NONPoISONOUS',\n", " 'NONPoISONoUS',\n", " 'NONpOISONOUS',\n", " 'NONpOISONoUS',\n", " 'NoNPOISONOUS',\n", " 'NoNPOISONoUS',\n", " 'NoNPoISONOUS',\n", " 'NoNPoISONoUS',\n", " 'NoNpOISONOUS',\n", " 'NoNpOISONoUS'},\n", " {'FlOCCINAuCINIHILiPILiFICAtION',\n", " 'FlOCCINAuCINIHILiPILiFICaTiON',\n", " 'FlOCCINAuCINiHILiPILiFICAtION',\n", " 'FlOCCINAuCINiHILiPILiFICaTiON',\n", " 'FlOCCINAuCInIHILiPILiFICAtION',\n", " 'FlOCCINAuCInIHILiPILiFICaTiON',\n", " 'FlOCCINaUCINIHILiPILiFICAtION',\n", " 'FlOCCINaUCINIHILiPILiFICaTiON',\n", " 'FlOCCINaUCINiHILiPILiFICAtION',\n", " 'FlOCCINaUCINiHILiPILiFICaTiON',\n", " 'FlOCCINaUCInIHILiPILiFICAtION',\n", " 'FlOCCINaUCInIHILiPILiFICaTiON',\n", " 'FlOCCInAuCINIHILiPILiFICAtION',\n", " 'FlOCCInAuCINIHILiPILiFICaTiON',\n", " 'FlOCCInAuCINiHILiPILiFICAtION',\n", " 'FlOCCInAuCINiHILiPILiFICaTiON',\n", " 'FlOCCInAuCInIHILiPILiFICAtION',\n", " 'FlOCCInAuCInIHILiPILiFICaTiON'}]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[spellings(w) for w in good]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about spelling the actual names of the elements using the element symbols? We see below that only 15 out of 118 are spellable. \n", "\n", "`%time` tells us this took only about a millisecond for 133 top-level calls to `spellings` (one per element name in the filter, plus one more for each of the 15 spellable names)." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1.05 ms, sys: 7 µs, total: 1.05 ms\n", "Wall time: 1.07 ms\n" ] }, { "data": { "text/plain": [ "[{'CArBON', 'CaRbON'},\n", " {'NeON'},\n", " {'SILiCON', 'SILiCoN', 'SiLiCON', 'SiLiCoN'},\n", " {'PHOSPHORuS',\n", " 'PHOSPHoRuS',\n", " 'PHOsPHORuS',\n", " 'PHOsPHoRuS',\n", " 'PHoSPHORuS',\n", " 'PHoSPHoRuS'},\n", " {'IrON'},\n", " {'COPPEr', 'CoPPEr'},\n", " {'ArSeNIC', 'ArSeNiC'},\n", " {'KrYPtON'},\n", " {'SILvEr', 'SiLvEr'},\n", " {'TiN'},\n", " {'XeNON', 'XeNoN'},\n", " {'BISmUTh', 'BiSmUTh'},\n", " {'AsTaTiNe'},\n", " {'TeNNEsSINe', 'TeNNEsSiNe', 'TeNNeSSINe', 'TeNNeSSiNe'},\n", " {'OGaNEsSON', 'OGaNeSSON'}]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time [spellings(w) for w in elements.values() if spellings(w)]" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 4 }