{ "cells": [ { "cell_type": "markdown", "id": "19ee7dde-0d74-47e8-8d0d-4ffcb99e2f5a", "metadata": {}, "source": [ "
Peter Norvig
Sept 25, 2024
\n", "\n", "# The Languages of English, Math, and Programming\n", "\n", "My colleague [Wei-Hwa Huang](https://en.wikipedia.org/wiki/Wei-Hwa_Huang) posed the following problem to several AI large language model (LLM) chatbots: \n", "\n", "- **List all the ways in which three distinct positive integers have a product of 108.**\n", "\n", "All the LLMs he tried failed. I reran the experiment on more LLMs (and a human), and a few of them succeeded. I thought they might do better with this prompt:\n", "\n", "- **Write a Python program to list all the ways in which three distinct positive integers have a product of 108.**\n", "\n", "\n", "\n", "# TLDR: Conclusions\n", "\n", "Only 2 of the 9 LLMs solved the \"list all ways\" prompt, but 7 out of 9 solved the \"write a program\" prompt. **The language that a problem-solver uses matters!** Sometimes a natural language such as English is a good choice, sometimes you need the language of mathematical equations, or chemical equations, or musical notation, and sometimes a programming language is best. Written language is an amazing invention that has enabled human culture to build over the centuries (and also enabled LLMs to work). But human ingenuity has divised other notations that are more specialized but very effective in limited domains.\n", "\n", "Some more notes on the \"write a program\" prompt:\n", "\n", "\n", "- The LLMs all started their answer by stating that 108 = 2 × 2 × 3 × 3 × 3, and then tried to partition those factors into three distinct subsets and report all ways to do so.\n", "- So far so good!\n", "- But some of them forgot that 1 could be a factor of 108 (or equivalently, that the empty set of factors is a valid subset).\n", "- Some of them only forgot the triplets (1, 2, 54) and (1, 3, 36), but somehow got (1, 4, 27) and (1, 6, 18).\n", " - Perhaps the forgetting was because their attention mechanism didn't go back far enough?\n", "- The models might have skipped 1 as a factor because 1 is not listed in the prime factorization, so it is easy to forget. But in programming, it is more natural to run a loop from 1 to *n* than from 2 to *n*; that's why I tried the \"write a program\" prompt.\n", "- Some of the models ignored the need for \"distinct\" integers, and proposed, (3, 6, 6) or (1, 108, 1).\n", "- Perplexity proposed (2, 4, 13.5) on the first run, but on a rerun proposed and then eliminated 13.5 to get the correct result.\n", "\n", "Summary of which solver solved which problems:\n", "\n", "|Solver|\"List all ways\"|\"Write a program\"|\n", "|--|--|--|\n", "|[A human programmer](https://github.com/norvig/)|yes|yes|\n", "|[Gemini Advanced](https://gemini.google.com/app)|no (4 + 1 nondistinct)|yes|\n", "|[ChatGPT 4o](https://chatgpt.com/)|**yes**|yes|\n", "|[Microsoft Copilot](https://copilot.microsoft.com/)|no (6/8)|yes|\n", "|[Anthropic Claude 3.5 Sonnet](https://claude.ai/new)|no (6/8)|yes|\n", "|[Meta AI Llama 3](https://www.meta.ai/)|no (6/8)|**no** (extra permutations)|\n", "|[Perplexity](https://www.perplexity.ai/)|**yes**|yes|\n", "|[Cohere Chat](https://cohere.com/chat)|no (8 + 2 nondistinct)|**no** (0/8)|\n", "|[HuggingFace Chat](https://huggingface.co/chat/)|no (8 + 1 nondistinct)|yes|\n", "|[You.com](https://you.com/)|no (6/8)|yes|\n", "|**Total of LLMs**|**2/9 yes**|**7/9 yes**|\n", "\n", "\n", "Below are the programs produced by all the solvers:\n", "\n", "# Human\n", "\n", "A human (me) generated this correct solution:" ] }, { "cell_type": "code", "execution_count": 1, "id": "f8a27ed0-c2b1-47a0-bdf0-c6a8cd789dc5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{(1, 2, 54),\n", " (1, 3, 36),\n", " (1, 4, 27),\n", " (1, 6, 18),\n", " (1, 9, 12),\n", " (2, 3, 18),\n", " (2, 6, 9),\n", " (3, 4, 9)}" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from math import prod\n", "from itertools import combinations\n", "from typing import *\n", "\n", "def find_products(k=3, N=108) -> Set[Tuple[int, ...]]:\n", " \"\"\"A list of all ways in which `k` distinct positive integers have a product of `N`.\"\"\" \n", " factors = {i for i in range(1, N + 1) if N % i == 0}\n", " return {ints for ints in combinations(factors, k) if prod(ints) == N}\n", "\n", "find_products()" ] }, { "cell_type": "markdown", "id": "b7682af0-8c46-4e19-bdba-71f28bbfa101", "metadata": {}, "source": [ "The program can solve different versions of the problem (although it would be slow when `n` is over a billion or so):" ] }, { "cell_type": "code", "execution_count": 2, "id": "55dfa9c2-f366-42c8-ae50-6a1df80c47b3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{(1, 2, 3, 4, 15),\n", " (1, 2, 3, 5, 12),\n", " (1, 2, 3, 6, 10),\n", " (1, 2, 4, 5, 9),\n", " (1, 3, 4, 5, 6)}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "find_products(5, 360)" ] }, { "cell_type": "markdown", "id": "42fc5098-2315-4984-b256-4fbf2f6fec51", "metadata": {}, "source": [ "# Gemini Advanced\n", "\n", "Gemini produced three drafts, of which the following one was correct. In another draft, it had the line `k = product // (i * j)`, using integer division, which is incompatible with the `k.is_integer()` test (maybe `int` should support `.is_integer`?). Here is the correct draft:" ] }, { "cell_type": "code", "execution_count": 3, "id": "29789909-15d1-4833-b35f-cb5399893ba2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" ] } ], "source": [ "def find_combinations(product):\n", " \"\"\"Finds all combinations of three distinct positive integers that multiply to the given product.\n", "\n", " Args:\n", " product: The product to find combinations for.\n", "\n", " Returns:\n", " A list of tuples, where each tuple represents a combination of three integers.\n", " \"\"\"\n", "\n", " combinations = []\n", " for i in range(1, product):\n", " for j in range(i + 1, product):\n", " k = product / (i * j)\n", " if k > j and k.is_integer():\n", " combinations.append((i, j, int(k)))\n", " return combinations\n", "\n", "product = 108\n", "result = find_combinations(product)\n", "print(result)" ] }, { "cell_type": "markdown", "id": "81f032a3-5621-470e-9ada-8d3e026ff1cc", "metadata": {}, "source": [ "# ChatGPT 4o\n", "\n", "ChatGPT got it right (although it imported `itertools` and then never used it)." ] }, { "cell_type": "code", "execution_count": 4, "id": "c248a136-5ed5-40dd-ad48-b303e99c3675", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All distinct triplets of integers that multiply to 108:\n", "(1, 2, 54)\n", "(1, 3, 36)\n", "(1, 4, 27)\n", "(1, 6, 18)\n", "(1, 9, 12)\n", "(2, 3, 18)\n", "(2, 6, 9)\n", "(3, 4, 9)\n" ] } ], "source": [ "import itertools\n", "\n", "def find_triplets_with_product(product):\n", " solutions = []\n", " \n", " # Iterate over all possible combinations of distinct integers\n", " for a in range(1, int(product ** (1/3)) + 1):\n", " for b in range(a + 1, product // a + 1):\n", " for c in range(b + 1, product // (a * b) + 1):\n", " if a * b * c == product:\n", " solutions.append((a, b, c))\n", " \n", " return solutions\n", "\n", "product = 108\n", "triplets = find_triplets_with_product(product)\n", "\n", "if triplets:\n", " print(f\"All distinct triplets of integers that multiply to {product}:\")\n", " for triplet in triplets:\n", " print(triplet)\n", "else:\n", " print(f\"No distinct triplets found that multiply to {product}.\")\n" ] }, { "cell_type": "markdown", "id": "17e74293-feab-4fff-b682-bc26823ebefa", "metadata": {}, "source": [ "# Microsoft CoPilot\n", "\n", "Bing produces a very clean (but somewhat slower) `find_triplets` function." ] }, { "cell_type": "code", "execution_count": 5, "id": "2b0ee5b6-64bd-4581-bdc7-d22065f02d64", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All distinct triplets with a product of 108:\n", "(1, 2, 54)\n", "(1, 3, 36)\n", "(1, 4, 27)\n", "(1, 6, 18)\n", "(1, 9, 12)\n", "(2, 3, 18)\n", "(2, 6, 9)\n", "(3, 4, 9)\n" ] } ], "source": [ "def find_triplets(product):\n", " triplets = []\n", " for a in range(1, product + 1):\n", " for b in range(a + 1, product + 1):\n", " for c in range(b + 1, product + 1):\n", " if a * b * c == product:\n", " triplets.append((a, b, c))\n", " return triplets\n", "\n", "product = 108\n", "triplets = find_triplets(product)\n", "\n", "print(f\"All distinct triplets with a product of {product}:\")\n", "for triplet in triplets:\n", " print(triplet)" ] }, { "cell_type": "markdown", "id": "fddeabc0-4925-4145-b666-7a094fd61980", "metadata": {}, "source": [ "# Claude 3.5 Sonnet\n", "\n", "Claude's answer is quite simlar to ChatGPT 4o's (but I would criticize it for not taking a parameter):" ] }, { "cell_type": "code", "execution_count": 6, "id": "83f9088c-8444-4207-ad24-91d8e3a98004", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 8 ways to express 108 as a product of three distinct positive integers:\n", "1 x 2 x 54 = 108\n", "1 x 3 x 36 = 108\n", "1 x 4 x 27 = 108\n", "1 x 6 x 18 = 108\n", "1 x 9 x 12 = 108\n", "2 x 3 x 18 = 108\n", "2 x 6 x 9 = 108\n", "3 x 4 x 9 = 108\n" ] } ], "source": [ "def find_combinations():\n", " result = []\n", " for i in range(1, 108):\n", " for j in range(i + 1, 108):\n", " k = 108 // (i * j)\n", " if i * j * k == 108 and k > j:\n", " result.append((i, j, k))\n", " return result\n", "\n", "def main():\n", " combinations = find_combinations()\n", " print(f\"There are {len(combinations)} ways to express 108 as a product of three distinct positive integers:\")\n", " for combo in combinations:\n", " print(f\"{combo[0]} x {combo[1]} x {combo[2]} = 108\")\n", "\n", "if __name__ == \"__main__\":\n", " main()" ] }, { "cell_type": "markdown", "id": "eefc3007-105d-41cd-9023-0a34609517fd", "metadata": {}, "source": [ "# Meta AI Llama 3.1\n", "\n", "This program forgets to test for `i < j < k`, and thus produces six permutations of each triplet. Oddly, it claims that \"*When you run this program, it will output: [(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]*\", so it knows the correct answer, but it didn't run the program to test that it is correct." ] }, { "cell_type": "code", "execution_count": 7, "id": "de1d0cc3-10a8-417a-a35a-0a66325f549c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (1, 12, 9), (1, 18, 6), (1, 27, 4), (1, 36, 3), (1, 54, 2), (2, 1, 54), (2, 3, 18), (2, 6, 9), (2, 9, 6), (2, 18, 3), (2, 54, 1), (3, 1, 36), (3, 2, 18), (3, 4, 9), (3, 9, 4), (3, 18, 2), (3, 36, 1), (4, 1, 27), (4, 3, 9), (4, 9, 3), (4, 27, 1), (6, 1, 18), (6, 2, 9), (6, 9, 2), (6, 18, 1), (9, 1, 12), (9, 2, 6), (9, 3, 4), (9, 4, 3), (9, 6, 2), (9, 12, 1), (12, 1, 9), (12, 9, 1), (18, 1, 6), (18, 2, 3), (18, 3, 2), (18, 6, 1), (27, 1, 4), (27, 4, 1), (36, 1, 3), (36, 3, 1), (54, 1, 2), (54, 2, 1)]\n" ] } ], "source": [ "def find_factors(n):\n", " factors = []\n", " for i in range(1, n + 1):\n", " if n % i == 0:\n", " factors.append(i)\n", " return factors\n", "\n", "def find_triplets(n):\n", " factors = find_factors(n)\n", " triplets = []\n", " for i in factors:\n", " for j in factors:\n", " if i * j <= n:\n", " k = n // (i * j)\n", " if i != j and j != k and i != k and i * j * k == n:\n", " triplets.append((i, j, k))\n", " return triplets\n", "\n", "print(find_triplets(108))" ] }, { "cell_type": "markdown", "id": "742f3b13-a53f-497a-97e7-5aa58b361951", "metadata": {}, "source": [ "# Perplexity\n", "\n", "Perplexity does well." ] }, { "cell_type": "code", "execution_count": 8, "id": "7a764a01-8f18-4773-bd7a-fbc8dc652450", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All combinations of three distinct positive integers with a product of 108:\n", "1 x 2 x 54 = 108\n", "1 x 3 x 36 = 108\n", "1 x 4 x 27 = 108\n", "1 x 6 x 18 = 108\n", "1 x 9 x 12 = 108\n", "2 x 3 x 18 = 108\n", "2 x 6 x 9 = 108\n", "3 x 4 x 9 = 108\n", "\n", "Total number of combinations: 8\n" ] } ], "source": [ "def find_combinations(target):\n", " combinations = []\n", " \n", " for i in range(1, target + 1):\n", " if target % i == 0:\n", " for j in range(i + 1, target + 1):\n", " if (target // i) % j == 0:\n", " k = target // (i * j)\n", " if k > j and i * j * k == target:\n", " combinations.append((i, j, k))\n", " \n", " return combinations\n", "\n", "target_product = 108\n", "result = find_combinations(target_product)\n", "\n", "print(f\"All combinations of three distinct positive integers with a product of {target_product}:\")\n", "for combo in result:\n", " print(f\"{combo[0]} x {combo[1]} x {combo[2]} = {target_product}\")\n", "\n", "print(f\"\\nTotal number of combinations: {len(result)}\")" ] }, { "cell_type": "markdown", "id": "53f85624-bfe3-42cc-b55b-b86a56089314", "metadata": {}, "source": [ "# Cohere Chat Playground\n", "\n", "Cohere fails to find any combinations. The 5th line should just be `k = product // i // j`; Cohere bizarrely adds `* j * i`. It also fails to test that `i < j < k`." ] }, { "cell_type": "code", "execution_count": 9, "id": "61139601-ced2-4598-ae76-464c1041d2d1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All combinations of three distinct positive integers with a product of 108:\n" ] } ], "source": [ "def find_combinations(product):\n", " combinations = []\n", " for i in range(1, product // 3 + 1):\n", " for j in range(i, product // 2 + 1):\n", " k = product // i // j * j * i\n", " if i * j * k == product and i != j and j != k and i != k:\n", " combinations.append((i, j, k))\n", " return combinations\n", "\n", "product = 108\n", "combinations = find_combinations(product)\n", "print(f\"All combinations of three distinct positive integers with a product of {product}:\")\n", "for combo in combinations:\n", " print(combo)" ] }, { "cell_type": "markdown", "id": "51b0b353-79d2-4da7-a6da-8231d5ff811e", "metadata": {}, "source": [ "# HuggingChat\n", "\n", "Hugging Chat produced a correct concise program. I note that `i < j < k` would be cleaner than `k >= j and i!= j and j!= k` here (and for others as well)." ] }, { "cell_type": "code", "execution_count": 10, "id": "6e4b823b-66f1-48ae-b4ca-00387d480e03", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" ] } ], "source": [ "def find_triplets(n):\n", " triplets = []\n", " for i in range(1, n):\n", " for j in range(i+1, n):\n", " k = n // (i * j)\n", " if k >= j and i * j * k == n and i!= j and j!= k:\n", " triplets.append((i, j, k))\n", " return triplets\n", "\n", "print(find_triplets(108))" ] }, { "cell_type": "markdown", "id": "23d78bca-9b35-43c0-891a-a9c0e6801b31", "metadata": {}, "source": [ "# You.com\n", "\n", "You.com produces a correct solution, with some nice optimizations that make it *O*(*n*5/6), whereas most of the solutions are *O*(*n*2). This means it can handle a 14-digit product in about a second of run time, whereas the HuggingChat version (for example) can only handle 5-digit products in that time." ] }, { "cell_type": "code", "execution_count": 11, "id": "fe69bb04-02de-4f38-a8ab-6693c15b02c4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" ] } ], "source": [ "def find_triplets(product):\n", " triplets = []\n", " for i in range(1, int(product ** (1/3)) + 1): # The cube root of the product is the maximum possible value for i\n", " if product % i == 0:\n", " for j in range(i + 1, int((product / i) ** 0.5) + 1): # The square root of the product divided by i is the maximum possible value for j\n", " if (product / i) % j == 0:\n", " k = product // (i * j)\n", " if k > j: # Ensure the integers are distinct\n", " triplets.append((i, j, k))\n", " return triplets\n", "\n", "triplets = find_triplets(108)\n", "print(triplets)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15" } }, "nbformat": 4, "nbformat_minor": 5 }