{ "cells": [ { "cell_type": "markdown", "id": "19ee7dde-0d74-47e8-8d0d-4ffcb99e2f5a", "metadata": {}, "source": [ "Peter Norvig\n", "Sept 25, 2024\n", "\n", "# The Languages of English, Math, and Programming\n", "\n", "My colleague [Wei-Hwa Huang](https://en.wikipedia.org/wiki/Wei-Hwa_Huang) gave several AI chatbots this prompt: \n", "\n", "**List all the ways in which three distinct positive integers have a product of 108.**\n", "\n", "I tested this prompt on the following solvers:\n", "- [A human programmer](https://github.com/norvig/)\n", "- [Gemini Advanced](https://gemini.google.com/app)\n", "- [ChatGPT 4o](https://chatgpt.com/)\n", "- [Microsoft Copilot](https://copilot.microsoft.com/)\n", "- [Anthropic Claude 3.5 Sonnet](https://claude.ai/new)\n", "- [Meta AI Llama 3](https://www.meta.ai/)\n", "- [Perplexity](https://www.perplexity.ai/)\n", "- [Cohere Chat](https://cohere.com/chat)\n", "- [HuggingFace Chat](https://huggingface.co/chat/)\n", "- [You.com](https://you.com/)\n", "\n", "All the LLMs Wei-Hwa originally tried got this one wrong. From my expanded list, Gemini, ChatGPT 4o, You.com and the human got it right, and 5 other models made mistakes:\n", "- The LLMs all started their answer by noting that 108 = 2 × 2 × 3 × 3 × 3, and then tried to partition those factors into three distinct subsets and report all ways to do so.\n", "- So far so good.\n", "- But most of them forgot that 1 could be a factor of 108 (or equivalently, that the empty set of factors is a valid subset). \n", "- Some of the models ignored the need for \"distinct\" integers, and proposed, say, 3 × 6 × 6.\n", "- Some got 5 or 6 correct triplets, and then stopped, perhaps because their attention mechanism didn't go back far enough.\n", "- Some even proposed non-integers as \"factors\".\n", "\n", "I thought that the models might have skipped 1 as a factor because 1 is not listed in the prime factorization, so it is easy to forget. But in programming, it is more natural to run a loop from 1 to *n* than from 2 to *n*, so this error would be less likely. Therefore, I decided to test all the models with the following prompt: \n", "\n", "**Write a Python program to list all the ways in which three distinct positive integers have a product of 108.**\n", "\n", "# TLDR: Conclusion\n", "\n", "The models did much better with this prompt. My conclusion is that the language used to solve a problem matters. Sometimes a natural language such as English is a good choice, sometimes you need the language of mathematical equations, or maybe chemical equations, and sometimes a programming language is best.\n", "\n", "# Human\n", "\n", "A human (me) was able to correctly respond to the prompt:" ] }, { "cell_type": "code", "execution_count": 1, "id": "f8a27ed0-c2b1-47a0-bdf0-c6a8cd789dc5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{1, 2, 54},\n", " {1, 3, 36},\n", " {1, 4, 27},\n", " {1, 6, 18},\n", " {1, 9, 12},\n", " {2, 3, 18},\n", " {2, 6, 9},\n", " {3, 4, 9}]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from math import prod\n", "from itertools import combinations\n", "from typing import *\n", "\n", "def find_products(k=3, n=108) -> List[Set[int]]:\n", "    \"\"\"A list of all ways in which `k` distinct positive integers have a product of `n`.\"\"\"\n", "    factors = {i for i in range(1, n + 1) if n % i == 0}\n", "    return [set(ints) for ints in combinations(factors, k) if prod(ints) == n]\n", "\n", "find_products()" ] }, { "cell_type": "markdown", "id": "b7682af0-8c46-4e19-bdba-71f28bbfa101", "metadata": {}, "source": [ "The program can solve different versions of the problem (although it would be slow when `n` is over a billion or so):" ] }, { "cell_type": "code", "execution_count": 2, "id": "55dfa9c2-f366-42c8-ae50-6a1df80c47b3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{1, 2, 3, 4, 15},\n", " {1, 2, 3, 5, 12},\n", " {1, 2, 3, 6, 10},\n", " {1, 2, 4, 5, 9},\n", " {1, 3, 4, 5, 6}]" ] }, "execution_count": 2, "metadata": {}, "output_type":
"execute_result" } ], "source": [ "find_products(5, 360)" ] }, { "cell_type": "markdown", "id": "42fc5098-2315-4984-b256-4fbf2f6fec51", "metadata": {}, "source": [ "# Gemini Advanced\n", "\n", "Gemini produced three drafts, of which the following one was correct. In another draft, it had the line `k = product // (i * j)`, using integer division, which is incompatible with the `k.is_integer()` test. Here is the correct draft:" ] }, { "cell_type": "code", "execution_count": 3, "id": "29789909-15d1-4833-b35f-cb5399893ba2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" ] } ], "source": [ "def find_combinations(product):\n", "    \"\"\"Finds all combinations of three distinct positive integers that multiply to the given product.\n", "\n", "    Args:\n", "        product: The product to find combinations for.\n", "\n", "    Returns:\n", "        A list of tuples, where each tuple represents a combination of three integers.\n", "    \"\"\"\n", "\n", "    combinations = []\n", "    for i in range(1, product):\n", "        for j in range(i + 1, product):\n", "            k = product / (i * j)\n", "            if k > j and k.is_integer():\n", "                combinations.append((i, j, int(k)))\n", "    return combinations\n", "\n", "product = 108\n", "result = find_combinations(product)\n", "print(result)" ] }, { "cell_type": "markdown", "id": "81f032a3-5621-470e-9ada-8d3e026ff1cc", "metadata": {}, "source": [ "# ChatGPT 4o\n", "\n", "ChatGPT got it right (although it imported `itertools` and then never used it)."
] }, { "cell_type": "code", "execution_count": 4, "id": "c248a136-5ed5-40dd-ad48-b303e99c3675", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All distinct triplets of integers that multiply to 108:\n", "(1, 2, 54)\n", "(1, 3, 36)\n", "(1, 4, 27)\n", "(1, 6, 18)\n", "(1, 9, 12)\n", "(2, 3, 18)\n", "(2, 6, 9)\n", "(3, 4, 9)\n" ] } ], "source": [ "import itertools\n", "\n", "def find_triplets_with_product(product):\n", "    solutions = []\n", "\n", "    # Iterate over all possible combinations of distinct integers\n", "    for a in range(1, int(product ** (1/3)) + 1):\n", "        for b in range(a + 1, product // a + 1):\n", "            for c in range(b + 1, product // (a * b) + 1):\n", "                if a * b * c == product:\n", "                    solutions.append((a, b, c))\n", "\n", "    return solutions\n", "\n", "product = 108\n", "triplets = find_triplets_with_product(product)\n", "\n", "if triplets:\n", "    print(f\"All distinct triplets of integers that multiply to {product}:\")\n", "    for triplet in triplets:\n", "        print(triplet)\n", "else:\n", "    print(f\"No distinct triplets found that multiply to {product}.\")\n" ] }, { "cell_type": "markdown", "id": "17e74293-feab-4fff-b682-bc26823ebefa", "metadata": {}, "source": [ "# Bing Copilot\n", "\n", "Bing produces a very clean (but somewhat slower) `find_triplets` function."
] }, { "cell_type": "code", "execution_count": 5, "id": "2b0ee5b6-64bd-4581-bdc7-d22065f02d64", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All distinct triplets with a product of 108:\n", "(1, 2, 54)\n", "(1, 3, 36)\n", "(1, 4, 27)\n", "(1, 6, 18)\n", "(1, 9, 12)\n", "(2, 3, 18)\n", "(2, 6, 9)\n", "(3, 4, 9)\n" ] } ], "source": [ "def find_triplets(product):\n", "    triplets = []\n", "    for a in range(1, product + 1):\n", "        for b in range(a + 1, product + 1):\n", "            for c in range(b + 1, product + 1):\n", "                if a * b * c == product:\n", "                    triplets.append((a, b, c))\n", "    return triplets\n", "\n", "product = 108\n", "triplets = find_triplets(product)\n", "\n", "print(f\"All distinct triplets with a product of {product}:\")\n", "for triplet in triplets:\n", "    print(triplet)" ] }, { "cell_type": "markdown", "id": "fddeabc0-4925-4145-b666-7a094fd61980", "metadata": {}, "source": [ "# Claude 3.5 Sonnet\n", "\n", "Claude's answer is quite similar to ChatGPT 4o's (but I would criticize it for not taking a parameter):" ] }, { "cell_type": "code", "execution_count": 6, "id": "83f9088c-8444-4207-ad24-91d8e3a98004", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 8 ways to express 108 as a product of three distinct positive integers:\n", "1 x 2 x 54 = 108\n", "1 x 3 x 36 = 108\n", "1 x 4 x 27 = 108\n", "1 x 6 x 18 = 108\n", "1 x 9 x 12 = 108\n", "2 x 3 x 18 = 108\n", "2 x 6 x 9 = 108\n", "3 x 4 x 9 = 108\n" ] } ], "source": [ "def find_combinations():\n", "    result = []\n", "    for i in range(1, 108):\n", "        for j in range(i + 1, 108):\n", "            k = 108 // (i * j)\n", "            if i * j * k == 108 and k > j:\n", "                result.append((i, j, k))\n", "    return result\n", "\n", "def main():\n", "    combinations = find_combinations()\n", "    print(f\"There are {len(combinations)} ways to express 108 as a product of three distinct positive integers:\")\n", "    for combo in combinations:\n", "        print(f\"{combo[0]} x {combo[1]} x {combo[2]} = 108\")\n", "\n", "if __name__ == \"__main__\":\n", "    main()" ] }, { "cell_type": "markdown", "id": "eefc3007-105d-41cd-9023-0a34609517fd", "metadata": {}, "source": [ "# Meta AI Llama 3.1\n", "\n", "This program forgets to test for `i < j < k`, and thus produces six permutations of each triplet. Oddly, it claims that \"*When you run this program, it will output: [(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]*\", so it knows the correct answer, but it didn't run the program to test that it is correct." ] }, { "cell_type": "code", "execution_count": 7, "id": "de1d0cc3-10a8-417a-a35a-0a66325f549c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (1, 12, 9), (1, 18, 6), (1, 27, 4), (1, 36, 3), (1, 54, 2), (2, 1, 54), (2, 3, 18), (2, 6, 9), (2, 9, 6), (2, 18, 3), (2, 54, 1), (3, 1, 36), (3, 2, 18), (3, 4, 9), (3, 9, 4), (3, 18, 2), (3, 36, 1), (4, 1, 27), (4, 3, 9), (4, 9, 3), (4, 27, 1), (6, 1, 18), (6, 2, 9), (6, 9, 2), (6, 18, 1), (9, 1, 12), (9, 2, 6), (9, 3, 4), (9, 4, 3), (9, 6, 2), (9, 12, 1), (12, 1, 9), (12, 9, 1), (18, 1, 6), (18, 2, 3), (18, 3, 2), (18, 6, 1), (27, 1, 4), (27, 4, 1), (36, 1, 3), (36, 3, 1), (54, 1, 2), (54, 2, 1)]\n" ] } ], "source": [ "def find_factors(n):\n", "    factors = []\n", "    for i in range(1, n + 1):\n", "        if n % i == 0:\n", "            factors.append(i)\n", "    return factors\n", "\n", "def find_triplets(n):\n", "    factors = find_factors(n)\n", "    triplets = []\n", "    for i in factors:\n", "        for j in factors:\n", "            if i * j <= n:\n", "                k = n // (i * j)\n", "                if i != j and j != k and i != k and i * j * k == n:\n", "                    triplets.append((i, j, k))\n", "    return triplets\n", "\n", "print(find_triplets(108))" ] }, { "cell_type": "markdown", "id": "742f3b13-a53f-497a-97e7-5aa58b361951", "metadata": {}, "source": [ "# Perplexity\n", "\n", "Perplexity does well."
] }, { "cell_type": "code", "execution_count": 8, "id": "7a764a01-8f18-4773-bd7a-fbc8dc652450", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All combinations of three distinct positive integers with a product of 108:\n", "1 x 2 x 54 = 108\n", "1 x 3 x 36 = 108\n", "1 x 4 x 27 = 108\n", "1 x 6 x 18 = 108\n", "1 x 9 x 12 = 108\n", "2 x 3 x 18 = 108\n", "2 x 6 x 9 = 108\n", "3 x 4 x 9 = 108\n", "\n", "Total number of combinations: 8\n" ] } ], "source": [ "def find_combinations(target):\n", "    combinations = []\n", "\n", "    for i in range(1, target + 1):\n", "        if target % i == 0:\n", "            for j in range(i + 1, target + 1):\n", "                if (target // i) % j == 0:\n", "                    k = target // (i * j)\n", "                    if k > j and i * j * k == target:\n", "                        combinations.append((i, j, k))\n", "\n", "    return combinations\n", "\n", "target_product = 108\n", "result = find_combinations(target_product)\n", "\n", "print(f\"All combinations of three distinct positive integers with a product of {target_product}:\")\n", "for combo in result:\n", "    print(f\"{combo[0]} x {combo[1]} x {combo[2]} = {target_product}\")\n", "\n", "print(f\"\\nTotal number of combinations: {len(result)}\")" ] }, { "cell_type": "markdown", "id": "53f85624-bfe3-42cc-b55b-b86a56089314", "metadata": {}, "source": [ "# Cohere Chat Playground\n", "\n", "Cohere fails to find any combinations. The 5th line should just be `k = product // i // j`; Cohere bizarrely adds `* j * i`. It also fails to test that `i < j < k`."
] }, { "cell_type": "code", "execution_count": 9, "id": "61139601-ced2-4598-ae76-464c1041d2d1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "All combinations of three distinct positive integers with a product of 108:\n" ] } ], "source": [ "def find_combinations(product):\n", "    combinations = []\n", "    for i in range(1, product // 3 + 1):\n", "        for j in range(i, product // 2 + 1):\n", "            k = product // i // j * j * i\n", "            if i * j * k == product and i != j and j != k and i != k:\n", "                combinations.append((i, j, k))\n", "    return combinations\n", "\n", "product = 108\n", "combinations = find_combinations(product)\n", "print(f\"All combinations of three distinct positive integers with a product of {product}:\")\n", "for combo in combinations:\n", "    print(combo)" ] }, { "cell_type": "markdown", "id": "51b0b353-79d2-4da7-a6da-8231d5ff811e", "metadata": {}, "source": [ "# HuggingChat\n", "\n", "HuggingChat produced a correct concise program. I note that `i < j < k` would be cleaner than `k >= j and i!= j and j!= k` here (and for others as well)." ] }, { "cell_type": "code", "execution_count": 10, "id": "6e4b823b-66f1-48ae-b4ca-00387d480e03", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" ] } ], "source": [ "def find_triplets(n):\n", "    triplets = []\n", "    for i in range(1, n):\n", "        for j in range(i+1, n):\n", "            k = n // (i * j)\n", "            if k >= j and i * j * k == n and i!= j and j!= k:\n", "                triplets.append((i, j, k))\n", "    return triplets\n", "\n", "print(find_triplets(108))" ] }, { "cell_type": "markdown", "id": "23d78bca-9b35-43c0-891a-a9c0e6801b31", "metadata": {}, "source": [ "# You.com\n", "\n", "You.com produces a correct solution, with some nice optimizations that make it *O*(*n*<sup>5/6</sup>), whereas most of the solutions are *O*(*n*<sup>2</sup>). This means it can handle a 14-digit product in a second of run time, whereas the human-written solution can only handle 10-digit products in one second, while the HuggingChat version (for example) takes several seconds just to handle a 5-digit product." ] }, { "cell_type": "code", "execution_count": 11, "id": "fe69bb04-02de-4f38-a8ab-6693c15b02c4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" ] } ], "source": [ "def find_triplets(product):\n", "    triplets = []\n", "    for i in range(1, int(product ** (1/3)) + 1):  # The cube root of the product is the maximum possible value for i\n", "        if product % i == 0:\n", "            for j in range(i + 1, int((product / i) ** 0.5) + 1):  # The square root of the product divided by i is the maximum possible value for j\n", "                if (product / i) % j == 0:\n", "                    k = product // (i * j)\n", "                    if k > j:  # Ensure the integers are distinct\n", "                        triplets.append((i, j, k))\n", "    return triplets\n", "\n", "triplets = find_triplets(108)\n", "print(triplets)" ] }, { "cell_type": "code", "execution_count": null, "id": "fc16ca96-0b05-4ad4-82c0-552bb99373fd", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.15" } }, "nbformat": 4, "nbformat_minor": 5 }