Add files via upload

Peter Norvig 2024-12-23 13:35:13 -08:00 committed by GitHub
parent 61dc8eb584
commit e1bd865098

@ -8,62 +8,85 @@
"\n",
"# Elemental Spelling\n",
"\n",
"Here's a problem: \n",
"Consider this problem: \n",
"\n",
"> Given a word, decide if it can be spelled using only the symbols in the **[periodic table](https://en.wikipedia.org/wiki/Periodic_table)** of elements. For example, the word \"bananas\" can be spelled with \"BaNaNaS\" (Barium-Sodium-Sodium-Sulfur). Note that there can be multiple possible spellings for a word—\"coin\" could be \"CoIn\" (Cobalt-Indium) or \"COIN\" (Carbon-Oxygen-Iodine-Nitrogen). \n",
"*Given a word, decide if it can be spelled using only the symbols in the **[periodic table](https://en.wikipedia.org/wiki/Periodic_table)** of elements. For example, the word \"bananas\" can be spelled with \"BaNaNaS\" (Barium-Sodium-Sodium-Sulfur). There may be multiple possible spellings for a word\"bananas\" could also be \"BaNaNAs'\" (Barium-Sodium-Nitrogen-Arsenic).*\n",
"\n",
"Here is a sketch of a recursive algorithm to solve the problem. A word is **spellable** if any of the following are true:\n",
"- The word is the empty word.\n",
"- The first 2 letters of the word (capitalized) form an element symbol, and the rest of the word is spellable.\n",
"- The first 1 letter of the word (capitalized) forms an element symbol, and the rest of the word is spellable.\n",
"\n",
"The input to `spellable` should be a string and the output is a boolean. Here is the code:"
"To start, here is the periodic table, which I've called `elements`:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"elements = dict(H='Hydrogen', He='Helium', Li='Lithium', Be='Beryllium', B='Boron', \n",
"C='Carbon', N='Nitrogen', O='Oxygen', F='Fluorine', Ne='Neon', Na='Sodium', Mg='Magnesium', \n",
"Al='Aluminium', Si='Silicon', P='Phosphorus', S='Sulfur', Cl='Chlorine', Ar='Argon', \n",
"K='Potassium', Ca='Calcium', Sc='Scandium', Ti='Titanium', V='Vanadium', Cr='Chromium', \n",
"Mn='Manganese', Fe='Iron', Co='Cobalt', Ni='Nickel', Cu='Copper', Zn='Zinc', Ga='Gallium', \n",
"Ge='Germanium', As='Arsenic', Se='Selenium', Br='Bromine', Kr='Krypton', Rb='Rubidium', \n",
"Sr='Strontium', Y='Yttrium', Zr='Zirconium', Nb='Niobium', Mo='Molybdenum', Tc='Technetium', \n",
"Ru='Ruthenium', Rh='Rhodium', Pd='Palladium', Ag='Silver', Cd='Cadmium', In='Indium', Sn='Tin', \n",
"Sb='Antimony', Te='Tellurium', I='Iodine', Xe='Xenon', Cs='Cesium', Ba='Barium', La='Lanthanum', \n",
"Ce='Cerium', Pr='Praseodymium', Nd='Neodymium', Pm='Promethium', Sm='Samarium', Eu='Europium', \n",
"Gd='Gadolinium', Tb='Terbium', Dy='Dysprosium', Ho='Holmium', Er='Erbium', Tm='Thulium', \n",
"Yb='Ytterbium', Lu='Lutetium', Hf='Hafnium', Ta='Tantalum', W='Tungsten', Re='Rhenium', \n",
"Os='Osmium', Ir='Iridium', Pt='Platinum', Au='Gold', Hg='Mercury', Tl='Thallium', Pb='Lead', \n",
"Bi='Bismuth', Po='Polonium', At='Astatine', Rn='Radon', Fr='Francium', Ra='Radium', Ac='Actinium', \n",
"Th='Thorium', Pa='Protactinium', U='Uranium', Np='Neptunium', Pu='Plutonium', Am='Americium', \n",
"Cm='Curium', Bk='Berkelium', Cf='Californium', Es='Einsteinium', Fm='Fermium', Md='Mendelevium', \n",
"No='Nobelium', Lr='Lawrencium', Rf='Rutherfordium', Db='Dubnium', Sg='Seaborgium', Bh='Bohrium', \n",
"Hs='Hassium', Mt='Meitnerium', Ds='Darmstadtium', Rg='Roentgenium', Cn='Copernicium', Nh='Nihonium', \n",
"Fl='Flerovium', Mc='Moscovium', Lv='Livermorium', Ts='Tennessine', Og='Oganesson')"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"assert len(elements) == 118\n",
"assert 'H' in elements and 'He' in elements and 'Fire' not in elements"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is a recursive algorithm to solve the problem. A word is **spellable** if any of three cases hold: \n",
"1) The word is the empty string.\n",
"2) The first **one** character of the word (capitalized) forms an element symbol, and the rest of the word is spellable.\n",
"3) The first **two** characters of the word (capitalized) forms an element symbol, and the rest of the word is spellable.\n",
"\n",
"Here is the code:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def spellable(word: str) -> bool:\n",
" \"\"\"Can we spell `word` using the `symbols` of the elements?\"\"\"\n",
" return (word == ''\n",
" or word[:2].capitalize() in symbols and spellable(word[2:])\n",
" or word[:1].capitalize() in symbols and spellable(word[1:]))"
" \"\"\"Can we spell `word` by concatenating symbols in `elements`?\"\"\"\n",
" def case(k: int) -> bool: \n",
" return word[:k].capitalize() in elements and spellable(word[k:])\n",
" return word == '' or case(1) or case(2)"
]
},
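The plain recursion can evaluate the same suffix more than once when a word has several partial spellings. A minimal sketch of a memoized variant follows. The names `SYMBOLS` and `spellable_cached` are hypothetical (not part of the notebook), and the symbol list is assumed to be the keys of `elements` copied into a string.

```python
from functools import lru_cache

# Assumption: the 118 symbols, copied from the keys of the notebook's `elements` dict.
SYMBOLS = frozenset('''H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn
    Fe Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe
    Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi
    Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn
    Nh Fl Mc Lv Ts Og'''.split())

@lru_cache(maxsize=None)
def spellable_cached(word: str) -> bool:
    """Like `spellable`, but each distinct suffix is evaluated at most once."""
    if word == '':
        return True  # Case 1: the empty string is spellable
    # Cases 2 and 3: peel off a one- or two-character symbol, then recurse on the rest
    return any(word[:k].capitalize() in SYMBOLS and spellable_cached(word[k:])
               for k in (1, 2))
```

For short dictionary words the cache makes little practical difference; it mainly matters for long inputs with many overlapping partial spellings.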
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I felt a bit bad about repeating a line of code above—violating [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)—but using a subfunction or `any/for` would add complexity. Here are the 118 currently defined `symbols`. (Note that the symbols are all capitalized, so I capitalize `[word[:2]` and `word[:1]` in `spellable` to make sure they match.)"
"We can test the function on two examples:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"symbols = set( # Elements in the periodic table\n",
" 'Ac Al Am Sb Ar As At Ba Bk Be Bi Bh B Br Cd Ca Cf C Ce Cs Cl Cr Co Cn Cu Cm Ds Db '\n",
" 'Dy Es Er Eu Fm Fl F Fr Gd Ga Ge Au Hf Hs He Ho H In I Ir Fe Kr La Lr Pb Li Lv Lu '\n",
" 'Mg Mn Mt Md Hg Mo Mc Nd Ne Np Ni Nh Nb N No Og Os O Pd P Pt Pu Po K Pr Pm Pa Ra Rn '\n",
" 'Re Rh Rg Rb Ru Rf Sm Sc Sg Se Si Ag Na Sr S Ta Tc Te Ts Tb Tl Th Tm Sn Ti W U V Xe '\n",
" 'Yb Y Zn Zr'.split())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now| test the function (on `'Bananas'` and `'hello'`):"
]
},
{
"cell_type": "code",
"execution_count": 24,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@ -72,18 +95,18 @@
"True"
]
},
"execution_count": 24,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spellable('Bananas')"
"spellable('bananas')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 5,
"metadata": {},
"outputs": [
{
@ -92,77 +115,20 @@
"False"
]
},
"execution_count": 25,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spellable('hello')"
"spellable('yogurt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That was easy. \n",
"\n",
"But maybe you'd like to see the actual spelling:`'BaNaNaS'`. The function `spelling` does that. The general idea is the same, except:\n",
" - We use the subfunction `first_rest_spelling` rather than repeating code.\n",
" - Both `spelling` and `first_rest_spelling` return either a string (the spelling) or `None` if no spelling is possible.\n",
" - There might be multiple possible spellings; only one is returned.\n",
" - We use `lru_cache` to avoid repeated computation and thereby speed up the function."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"from functools import lru_cache\n",
"\n",
"@lru_cache()\n",
"def spelling(word):\n",
" \"The spelling for `word` using `symbols` of the elements; or None if fail.\"\n",
" return '' if word == '' else first_rest_spelling(word, 2) or first_rest_spelling(word, 1)\n",
"\n",
"def first_rest_spelling(word, k):\n",
" \"Resulting spelling from taking off first k characters of word; or None if fail.\"\n",
" first, rest = word[:k].capitalize(), word[k:]\n",
" if first in symbols and spelling(rest) is not None:\n",
" return first + spelling(rest)\n",
" else:\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'BaNaNaS'"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spelling('bananas')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Testing\n",
"\n",
"Here I define `bad`, a list of words that are **not** spellable, and `good`, a list of words that **are**, and make some assertions:"
"That was easy! But maybe you'd like to see the actual spellings:`'BaNaNaS'` or `'BaNaNAs'`. The function `spellings` does that. The general idea is the same (same three cases). However, each case returns a **set** of possible spellings. It is important to distinguish between the spellings of an unspellable word (the empty set) and the spellings of the empty string (a set consisting of one spelling, the empty string)."
]
},
{
@ -170,19 +136,93 @@
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def spellings(word) -> set:\n",
" \"\"\"All spellings of `word` formed by concatenating symbols in `elements`.\"\"\"\n",
" def case(k: int) -> set:\n",
" head, tail = word[:k].capitalize(), word[k:]\n",
" if head in elements:\n",
" return {head + rest for rest in spellings(tail)}\n",
" else:\n",
" return set()\n",
" return {''} if word == '' else case(1) | case(2)"
]
},
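To see why the base case matters, here is a small sketch that makes the result for the empty string a parameter. The function `spellings_with_base`, its `base` argument, and the five-symbol `SYMBOLS` subset are all hypothetical, introduced only for this illustration.

```python
# Assumption: a small subset of symbols, just enough to spell 'bananas'.
SYMBOLS = {'Ba', 'Na', 'N', 'As', 'S'}

def spellings_with_base(word: str, base: frozenset) -> set:
    """Same recursion as `spellings`, but the empty-string result is a parameter."""
    def case(k: int) -> set:
        head, tail = word[:k].capitalize(), word[k:]
        if head in SYMBOLS:
            # Every spelling of the tail is extended by the symbol just matched
            return {head + rest for rest in spellings_with_base(tail, base)}
        return set()
    return set(base) if word == '' else case(1) | case(2)

right = spellings_with_base('bananas', frozenset({''}))  # base case: one empty spelling
wrong = spellings_with_base('bananas', frozenset())      # base case: no spellings at all
```

With `{''}` as the base case, `right` contains both spellings of "bananas"; with the empty set, the comprehension in `case` has nothing to extend, so `wrong` is empty for every word.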
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two examples:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'BaNaNAs', 'BaNaNaS'}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spellings('bananas')"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"set()"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"spellings('yogurt') "
]
},
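A spelling such as `'BaNaNaS'` can be turned back into element names, because every symbol is one capital letter optionally followed by one lowercase letter. The sketch below assumes a five-entry excerpt of `elements` (called `ELEMENT_NAMES` here) and a hypothetical helper `element_names`; neither is in the notebook.

```python
import re

# Assumption: a five-entry excerpt of the notebook's `elements` dict.
ELEMENT_NAMES = dict(Ba='Barium', Na='Sodium', N='Nitrogen', As='Arsenic', S='Sulfur')

def element_names(spelling: str) -> str:
    """Expand a spelling like 'BaNaNaS' into hyphen-separated element names."""
    # Each symbol is an uppercase letter optionally followed by one lowercase letter
    symbols = re.findall(r'[A-Z][a-z]?', spelling)
    return '-'.join(ELEMENT_NAMES[s] for s in symbols)
```

Because capitalization marks the symbol boundaries, the split is unambiguous even though the lowercase word itself may have several spellings.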
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Testing\n",
"\n",
"Here I define `bad`, a list of words that are **not** spellable, and `good`, a list of words that **are**. Then I make some assertions:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"bad = 'hello world failure not an alternative'.split() # Unspellable words\n",
"\n",
"good = '''howdy sphere falure is notan option bananas \n",
" carbon iron silver silicon copper arsenic tin xenon bismuth\n",
" attention copernicus inconspicuous hyperbolic orbits functions\n",
" wonky nutso officious psychic unprofessional bilateralism \n",
" whippersnappers vichyssois bobbysocks alterabilities capabilities\n",
" biostatistical physics floccinaucinihilipilification'''.split() # Spellable words\n",
"good = '''howdy orb nonsuccess is notan option \n",
"bananas wonky nutso psychic attention functions officious hyperbolic \n",
"vichyssois bobbysocks phony whippersnappers soupspoons buffoonish \n",
"bilateralism capabilities alterabilities cioppino pincushion \n",
"onionskins unprofessional biostatistical copernicus inconspicuous \n",
"nonpoisonous floccinaucinihilipilification'''.split() # Spellable words\n",
"\n",
"assert len(symbols) == 118\n",
"assert not any(spellable(w) or spelling(w) for w in bad) \n",
"assert all(spellable(w) and spelling(w) for w in good)"
"for w in bad:\n",
" assert not spellable(w) and not spellings(w)\n",
"for w in good:\n",
" assert spellable(w) and spellings(w)"
]
},
{
@ -194,63 +234,157 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'AlTeRaBiLiTiEs',\n",
" 'ArSeNiC',\n",
" 'AtTeNTiON',\n",
" 'BOBBYSOCKS',\n",
" 'BaNaNaS',\n",
" 'BiLaTeRaLiSm',\n",
" 'BiOsTaTiSTiCAl',\n",
" 'BiSmUTh',\n",
" 'CaPaBiLiTiEs',\n",
" 'CaRbON',\n",
" 'CoPErNiCuS',\n",
" 'CoPPEr',\n",
" 'FAlURe',\n",
" 'FUNCTiONS',\n",
" 'FlOCCInAuCInIHILiPILiFICaTiON',\n",
" 'HYPErBOLiC',\n",
" 'HoWDy',\n",
" 'IS',\n",
" 'InCoNSPICuOUS',\n",
" 'IrON',\n",
" 'NUTsO',\n",
" 'NoTaN',\n",
" 'OFFICIOUS',\n",
" 'OPtION',\n",
" 'ORbITs',\n",
" 'PHYSiCs',\n",
" 'PSYCHIC',\n",
" 'SPHeRe',\n",
" 'SiLiCoN',\n",
" 'SiLvEr',\n",
" 'TiN',\n",
" 'UNPrOFeSSiONAl',\n",
" 'VICHYSSOIS',\n",
" 'WHIPPErSNaPPErS',\n",
" 'WONKY',\n",
" 'XeNoN'}"
"[{'HOWDy', 'HoWDy'},\n",
" {'ORb'},\n",
" {'NONSUCCEsS', 'NONSUCCeSS', 'NoNSUCCEsS', 'NoNSUCCeSS'},\n",
" {'IS'},\n",
" {'NOTaN', 'NoTaN'},\n",
" {'OPTiON', 'OPtION'},\n",
" {'BaNaNAs', 'BaNaNaS'},\n",
" {'WONKY'},\n",
" {'NUTsO'},\n",
" {'PSYCHIC'},\n",
" {'AtTeNTiON'},\n",
" {'FUNCTiONS'},\n",
" {'OFFICIOUS'},\n",
" {'HYPErBOLiC'},\n",
" {'VICHYSSOIS'},\n",
" {'BOBBYSOCKS'},\n",
" {'PHONY', 'PHoNY'},\n",
" {'WHIPPErSNaPPErS'},\n",
" {'SOUPSPOONS', 'SOUPSPoONS'},\n",
" {'BUFFOONISH', 'BUFFOONiSH'},\n",
" {'BILaTeRaLiSm', 'BiLaTeRaLiSm'},\n",
" {'CaPaBILiTiEs', 'CaPaBiLiTiEs'},\n",
" {'AlTeRaBILiTiEs', 'AlTeRaBiLiTiEs'},\n",
" {'CIOPPINO', 'CIOPPINo', 'CIOPPInO'},\n",
" {'PINCUSHION', 'PINCuSHION', 'PInCUSHION', 'PInCuSHION'},\n",
" {'ONIONSKINS', 'ONIONSKInS', 'ONiONSKINS', 'ONiONSKInS'},\n",
" {'UNPrOFEsSIONAl', 'UNPrOFEsSiONAl', 'UNPrOFeSSIONAl', 'UNPrOFeSSiONAl'},\n",
" {'BIOSTaTiSTiCAl', 'BIOsTaTiSTiCAl', 'BiOSTaTiSTiCAl', 'BiOsTaTiSTiCAl'},\n",
" {'COPErNICUS',\n",
" 'COPErNICuS',\n",
" 'COPErNiCUS',\n",
" 'COPErNiCuS',\n",
" 'CoPErNICUS',\n",
" 'CoPErNICuS',\n",
" 'CoPErNiCUS',\n",
" 'CoPErNiCuS'},\n",
" {'INCONSPICUOUS',\n",
" 'INCONSPICuOUS',\n",
" 'INCoNSPICUOUS',\n",
" 'INCoNSPICuOUS',\n",
" 'InCONSPICUOUS',\n",
" 'InCONSPICuOUS',\n",
" 'InCoNSPICUOUS',\n",
" 'InCoNSPICuOUS'},\n",
" {'NONPOISONOUS',\n",
" 'NONPOISONoUS',\n",
" 'NONPoISONOUS',\n",
" 'NONPoISONoUS',\n",
" 'NONpOISONOUS',\n",
" 'NONpOISONoUS',\n",
" 'NoNPOISONOUS',\n",
" 'NoNPOISONoUS',\n",
" 'NoNPoISONOUS',\n",
" 'NoNPoISONoUS',\n",
" 'NoNpOISONOUS',\n",
" 'NoNpOISONoUS'},\n",
" {'FlOCCINAuCINIHILiPILiFICAtION',\n",
" 'FlOCCINAuCINIHILiPILiFICaTiON',\n",
" 'FlOCCINAuCINiHILiPILiFICAtION',\n",
" 'FlOCCINAuCINiHILiPILiFICaTiON',\n",
" 'FlOCCINAuCInIHILiPILiFICAtION',\n",
" 'FlOCCINAuCInIHILiPILiFICaTiON',\n",
" 'FlOCCINaUCINIHILiPILiFICAtION',\n",
" 'FlOCCINaUCINIHILiPILiFICaTiON',\n",
" 'FlOCCINaUCINiHILiPILiFICAtION',\n",
" 'FlOCCINaUCINiHILiPILiFICaTiON',\n",
" 'FlOCCINaUCInIHILiPILiFICAtION',\n",
" 'FlOCCINaUCInIHILiPILiFICaTiON',\n",
" 'FlOCCInAuCINIHILiPILiFICAtION',\n",
" 'FlOCCInAuCINIHILiPILiFICaTiON',\n",
" 'FlOCCInAuCINiHILiPILiFICAtION',\n",
" 'FlOCCInAuCINiHILiPILiFICaTiON',\n",
" 'FlOCCInAuCInIHILiPILiFICAtION',\n",
" 'FlOCCInAuCInIHILiPILiFICaTiON'}]"
]
},
"execution_count": 7,
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"{spelling(w) for w in good}"
"[spellings(w) for w in good]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What about spelling the actual names of the elements using the element symbols? We see below that only 15 out of 118 are spellable. \n",
"\n",
"`%time` tells us this took only about a millisecond to do 236 calls to `spellings`."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.05 ms, sys: 7 µs, total: 1.05 ms\n",
"Wall time: 1.07 ms\n"
]
},
{
"data": {
"text/plain": [
"[{'CArBON', 'CaRbON'},\n",
" {'NeON'},\n",
" {'SILiCON', 'SILiCoN', 'SiLiCON', 'SiLiCoN'},\n",
" {'PHOSPHORuS',\n",
" 'PHOSPHoRuS',\n",
" 'PHOsPHORuS',\n",
" 'PHOsPHoRuS',\n",
" 'PHoSPHORuS',\n",
" 'PHoSPHoRuS'},\n",
" {'IrON'},\n",
" {'COPPEr', 'CoPPEr'},\n",
" {'ArSeNIC', 'ArSeNiC'},\n",
" {'KrYPtON'},\n",
" {'SILvEr', 'SiLvEr'},\n",
" {'TiN'},\n",
" {'XeNON', 'XeNoN'},\n",
" {'BISmUTh', 'BiSmUTh'},\n",
" {'AsTaTiNe'},\n",
" {'TeNNEsSINe', 'TeNNEsSiNe', 'TeNNeSSINe', 'TeNNeSSiNe'},\n",
" {'OGaNEsSON', 'OGaNeSSON'}]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time [spellings(w) for w in elements.values() if spellings(w)]"
]
}
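A comprehension of the form `[f(w) for w in ws if f(w)]` calls `f` once per item for the filter and again for each item that passes. A sketch of one way to avoid the second call, using an assignment expression (Python 3.8+), is shown below. The function `counted` is a hypothetical stand-in for `spellings` that only counts its invocations.

```python
call_count = 0

def counted(word: str) -> set:
    """Hypothetical stand-in for `spellings` that records how often it is called."""
    global call_count
    call_count += 1
    return {'TiN'} if word == 'tin' else set()

words = ['tin', 'gold', 'lead']

# Filter and result both call the function: 3 filter calls + 1 result call
twice = [counted(w) for w in words if counted(w)]
calls_twice = call_count

call_count = 0
# The walrus operator keeps the filter's value, so each word is evaluated once
once = [s for w in words if (s := counted(w))]
calls_once = call_count
```

Both comprehensions produce the same list; only the number of calls differs.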
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -264,7 +398,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
"version": "3.9.12"
}
},
"nbformat": 4,