Add files via upload
This commit is contained in:
parent
40d0f0d5c0
commit
50a82a6df5
@ -9,13 +9,27 @@
|
||||
"\n",
|
||||
"# LLMs, Theory of Mind, and Cheryl's Birthday\n",
|
||||
"\n",
|
||||
"There has been [much](https://spectrum.ieee.org/theory-of-mind-ai) [debate](https://aclanthology.org/2023.conll-1.25/) [on](https://www.gsb.stanford.edu/faculty-research/working-papers/theory-mind-may-have-spontaneously-emerged-large-language-models) [the](https://arxiv.org/abs/2302.02083) [degree](https://www.nature.com/articles/s41562-024-01882-z) to which Large Language Models (LLMs) have a theory of mind: a way of understanding what other people know and don't know. In this notebook I explore one small part of the issue by asking six LLMs to solve the [Cheryl's Birthday Problem](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday), a well-known logic puzzle in which different characters have different states of knowledge. \n",
|
||||
"There has been [much](https://spectrum.ieee.org/theory-of-mind-ai) [debate](https://aclanthology.org/2023.conll-1.25/) [on](https://www.gsb.stanford.edu/faculty-research/working-papers/theory-mind-may-have-spontaneously-emerged-large-language-models) [the](https://arxiv.org/abs/2302.02083) [degree](https://www.nature.com/articles/s41562-024-01882-z) to which Large Language Models (LLMs) have a theory of mind: a way of understanding what other people know and don't know. In this notebook I explore one small part of the issue by asking nine LLM chatbots to solve the [Cheryl's Birthday Problem](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday), a well-known logic puzzle in which different characters have different states of knowledge at different times.\n",
|
||||
"\n",
|
||||
"**TLDR**: The six LLMs were all familiar with the problem, so I didn't have to describe it in the prompt, just name it. They were able to correctly state that the answer to the problem is July 16. But none of them were able to write a program that finds the solution. They all failed to distinguish the different knowledge states of the different characters–for this problem they had no theory of mind.\n",
|
||||
"I asked the following ten solvers to tackle the Cheryl's Birthday problem:\n",
|
||||
"- [A human programmer](https://github.com/norvig/)\n",
|
||||
"- [ChatGPT 4o](https://chatgpt.com/)\n",
|
||||
"- [Microsoft Copilot](https://copilot.microsoft.com/)\n",
|
||||
"- [Gemini Advanced](https://gemini.google.com/app)\n",
|
||||
"- [Meta AI Llama 405B](https://www.meta.ai/)\n",
|
||||
"- [Anthropic Claude 3.5 Sonnet](https://claude.ai/new)\n",
|
||||
"- [Perplexity](https://www.perplexity.ai/)\n",
|
||||
"- [Cohere Chat](https://cohere.com/chat)\n",
|
||||
"- [HuggingFace Chat](https://huggingface.co/chat/)\n",
|
||||
"- [You.com](https://you.com/)\n",
|
||||
"\n",
|
||||
"Below I show the response for each LLM. I elided some of the explanatory output when it was very similar to previous output, but I show the programs verbatim. My comments are in *[bracketed italics]* and the LLM output is in regular upright text. The queries were made on Sept 25, 2024; subsequent updates of the models may perform differently.\n",
|
||||
"# TLDR: Conclusion\n",
|
||||
"\n",
|
||||
"For each LLMs the prompt was:\n",
|
||||
"The LLMs were all familiar with the problem, so I didn't have to describe it in the prompt, just name it. Most of them correctly recalled the answer to the problem: July 16. But none of them were able to write a program that finds the solution. They all failed to distinguish the different knowledge states of the different characters over time. At least with respect to this problem, they had no theory of mind. (Perhaps that is in part due to the fact that very few of the Python programs they were trained on deal with theory of mind.)\n",
|
||||
"\n",
|
||||
"Below I show the response for each LLM. Each one provided explanatory output along with a program; for brevity I only show the explanatory output from the first one, ChatGPT 4o. My comments are in *[bracketed italics]*. The queries were made on Sept 25, 2024; subsequent updates of the models may perform differently.\n",
|
||||
"\n",
|
||||
"For each LLM the prompt was:\n",
|
||||
"\n",
|
||||
"***What is the answer to the \"Cheryl's Birthday\" problem? Write a Python program to solve it. Make sure that the program will still work if the list of possible dates is changed.***"
|
||||
]
|
||||
@ -27,7 +41,7 @@
|
||||
"source": [
|
||||
"# Human\n",
|
||||
"\n",
|
||||
"An actual human, me, was able to write a [**program**](https://github.com/norvig/pytudes/blob/main/ipynb/Cheryl-and-Eve.ipynb) that correctly solves the original problem and also handles new sets of dates."
|
||||
"An actual human (me) was able to write a program, shown in [**another notebook**](https://github.com/norvig/pytudes/blob/main/ipynb/Cheryl-and-Eve.ipynb), that correctly solves the original problem and also handles new sets of dates, and other variations on the problem. I introduced the idea of a *BeliefState*, a set of possible dates that a person believes might possibly be the birthday, and I modeled a character's *statement* as a function that takes a particular date as input, and returns true if the date is consistent with the statement."
|
||||
]
|
||||
},
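
The belief-state idea described in the Human cell can be sketched as follows. This is a minimal illustration, not the code from the linked notebook. The names `told`, `knows`, `albert1`, `bernard1`, `albert2`, and `cheryls_birthday`, and the use of plain `"Month Day"` strings for dates, are assumptions made here. Each statement is a predicate on a candidate date. It asks what the speaker would believe if that date were the real birthday, so Albert's and Bernard's belief states are computed separately rather than shared in one list.

```python
from typing import Iterable, Set

# Assumption: a date is a "Month Day" string; a belief state is a set of dates.
DATES = {"May 15", "May 16", "May 19", "June 17", "June 18",
         "July 14", "July 16", "August 14", "August 15", "August 17"}

def month(date: str) -> str: return date.split()[0]
def day(date: str) -> str: return date.split()[1]

def told(part: str, dates: Iterable[str]) -> Set[str]:
    """Belief state of someone who was told only `part` (a month or a day)."""
    return {d for d in dates if part in d.split()}

def knows(beliefs: Set[str]) -> bool:
    """A person knows the birthday when exactly one date remains possible."""
    return len(beliefs) == 1

def albert1(date: str, dates) -> bool:
    """Albert: I don't know the birthday, and I know Bernard doesn't either."""
    albert = told(month(date), dates)
    return (not knows(albert)
            and all(not knows(told(day(d), dates)) for d in albert))

def bernard1(date: str, dates) -> bool:
    """Bernard: At first I didn't know, but after Albert's statement I do."""
    at_first = told(day(date), dates)
    now = {d for d in at_first if albert1(d, dates)}
    return not knows(at_first) and knows(now)

def albert2(date: str, dates) -> bool:
    """Albert: Then I know as well."""
    now = {d for d in told(month(date), dates) if bernard1(d, dates)}
    return knows(now)

def cheryls_birthday(dates) -> Set[str]:
    """All dates consistent with the three statements (ideally exactly one)."""
    return {d for d in dates
            if albert1(d, dates) and bernard1(d, dates) and albert2(d, dates)}

print(cheryls_birthday(DATES))  # {'July 16'}
```

Because every function takes `dates` as an argument, the same code can be run on a different list of possible dates. It then returns an empty set or several dates when the puzzle has no unique solution.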
|
||||
{
|
||||
@ -78,14 +92,14 @@
|
||||
"- Step 2: Based on Albert's statement, we eliminate the months of May and June.\n",
|
||||
"- Step 3: Based on Bernard's statement, we eliminate dates where the day appears more than once (i.e., both July 14 and August 14 are eliminated).\n",
|
||||
"- Step 4: Albert now knows Cheryl's birthday, so it must be in the month where only one date is left (i.e., July).\n",
|
||||
"\n",
|
||||
"- \n",
|
||||
"The program is designed to work with any given list of possible dates, not just the ones in the original problem."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "8a1ca966-d8aa-4bc5-bbe3-253b3b62d538",
|
||||
"id": "03ff513c-d390-413e-b524-9c82de44e1a6",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
@ -149,7 +163,59 @@
|
||||
"id": "cb513413-f71d-476d-a6bc-211ab1589bbe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*[We see that the program fails to find a solution. It fails because it has a single variable for possible dates, when it should be keeping track of separate possible dates for Albert and Bernard.]* "
|
||||
"*[We see that the program fails to find a solution. It starts with the correct list of \"possible_dates\", and correctly updates the list twice to reflect Albert's and Bernard's first statements. So far so good. But then things go wrong: first it defines \"remaining_months\" and never uses it. Then it tries to find a possible date in a unique month, but it should be a unique month with respect to what is now known, not with respect to the original possible dates stored in the \"month_map\". The program has failed to track the two character's different belief states over time.]* "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "93852dab-d167-4d14-8cc6-02979427eb01",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Microsoft Copilot\n",
|
||||
"\n",
|
||||
"*[Copilot makes a similar mistake.*]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "5d7e4408-3663-4f54-acdb-4bddf1ac9121",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Cheryl's birthday is: None\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def cheryls_birthday(possible_dates):\n",
|
||||
" # Step 1: Filter out months with unique days\n",
|
||||
" unique_days = {date.split()[1] for date in possible_dates if sum(d.split()[1] == date.split()[1] for d in possible_dates) == 1}\n",
|
||||
" possible_dates = [date for date in possible_dates if date.split()[1] not in unique_days]\n",
|
||||
"\n",
|
||||
" # Step 2: Filter out dates with unique months\n",
|
||||
" unique_months = {date.split()[0] for date in possible_dates if sum(d.split()[0] == date.split()[0] for d in possible_dates) == 1}\n",
|
||||
" possible_dates = [date for date in possible_dates if date.split()[0] not in unique_months]\n",
|
||||
"\n",
|
||||
" # Step 3: Filter out remaining dates based on the logic of the puzzle\n",
|
||||
" for date in possible_dates:\n",
|
||||
" month, day = date.split()\n",
|
||||
" if all(d.split()[0] != month for d in possible_dates if d != date):\n",
|
||||
" return date\n",
|
||||
"\n",
|
||||
"# List of possible dates\n",
|
||||
"possible_dates = [\n",
|
||||
" \"May 15\", \"May 16\", \"May 19\",\n",
|
||||
" \"June 17\", \"June 18\",\n",
|
||||
" \"July 14\", \"July 16\",\n",
|
||||
" \"August 14\", \"August 15\", \"August 17\"\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"birthday = cheryls_birthday(possible_dates)\n",
|
||||
"print(f\"Cheryl's birthday is: {birthday}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -159,16 +225,12 @@
|
||||
"source": [
|
||||
"# Gemini Advanced\n",
|
||||
"\n",
|
||||
"*[Gemini's response:]*\n",
|
||||
"\n",
|
||||
"The answer to Cheryl's Birthday problem is **July 16th**.\n",
|
||||
"\n",
|
||||
"Here's the Python program, modified to handle changes in the list of possible dates:"
|
||||
"*[Gemini makes a similar mistake.]*\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"execution_count": 3,
|
||||
"id": "7c0bd1e7-4ada-4b43-97a5-674f615b9f77",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -239,7 +301,7 @@
|
||||
"id": "11f9bfc5-8548-4f7a-ac69-acecddb6f19b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*[Gemini correctly states the answer, but then produces a program that fails to find the correct answer. Again, the failure is due to having one list of possible dates. The program accepts a list of \"possible_dates\", but confusingly, those are actually just days of the month; you aren't allowed to pass in different month/day combinations. And even if you do pass in different days, the program hard-codes certain days and months.]*"
|
||||
"*[The program accepts a list of \"possible_dates\", but confusingly, those are actually just days of the month; you aren't allowed to pass in different month/day combinations. And even if you do pass in different days, the program hard-codes certain days and months.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -247,14 +309,14 @@
|
||||
"id": "3932858f-170b-4979-a565-c3b41a74e7fe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Llama 405B\n",
|
||||
"# Meta Llama 405B\n",
|
||||
"\n",
|
||||
"*[Llama gives a fine introduction to the problem, which I omit. Here is the program it produces:]*"
|
||||
"*[Llama also fails to find a solution (and thus has no output).]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"execution_count": 4,
|
||||
"id": "1faff935-74aa-44ee-9f37-15f71521e7d2",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
@ -299,7 +361,7 @@
|
||||
"id": "469c146d-bbc9-4b2f-8609-b82631a5139d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*[There is no output from this program, which means it failed to find the date. This program is more flawed than the others. I do like that it defines functions for the various pieces, but note that the function \"bernard_deduce\" is defined but never called.]*"
|
||||
"*[This program is quite flawed. I do like that it defines functions for the various pieces, but note that the function \"bernard_deduce\" is defined but never called.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -309,12 +371,12 @@
|
||||
"source": [
|
||||
"# Claude 3.5 Sonnet\n",
|
||||
"\n",
|
||||
"*[Again, Claude provides a nice explanation of the problem, which I omit; here is the program:]*"
|
||||
"*[Claude makes a similar mistake. It deserves credit for explicitly trying an example with different dates, and it correctly reports that the second example has no solution, but that was by accident.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"execution_count": 5,
|
||||
"id": "1da1319a-95d1-4c98-8b36-d96a8c552cb7",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -328,8 +390,6 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from itertools import product\n",
|
||||
"\n",
|
||||
"def solve_cheryls_birthday(dates):\n",
|
||||
" # Extract unique months and days\n",
|
||||
" months = sorted(set(date.split()[0] for date in dates))\n",
|
||||
@ -373,14 +433,6 @@
|
||||
"print(f\"With the new set of dates, Cheryl's birthday is on: {new_solution}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "9b16c634-5eda-4249-97fd-e61c980dadbe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*[Again, the program fails. It has the same issue with a single possible_dates variable.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "52003af0-0ea1-4ed4-9b5e-5b9b8862a068",
|
||||
@ -388,12 +440,12 @@
|
||||
"source": [
|
||||
"# Perplexity\n",
|
||||
"\n",
|
||||
"*[Here is Perplexity's code, with its explanation omitted:]*"
|
||||
"*[Once again, this program has a similar mistake.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"execution_count": 6,
|
||||
"id": "55e99206-42f8-44ea-a091-1ca6a14f4e89",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -440,14 +492,6 @@
|
||||
"print(f\"Cheryl's birthday is: {result}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b9296a89-b55c-40aa-8fc0-788cea3511e0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*[Once again, this program has the same mistake.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4b4e47b5-f5a2-491d-ad34-f8c08aceca59",
|
||||
@ -455,12 +499,12 @@
|
||||
"source": [
|
||||
"# Cohere Chat Playground\n",
|
||||
"\n",
|
||||
"*[Cohere gives a nice explanation, and then this program:]*"
|
||||
"*[Cohere too has the same mistake.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 7,
|
||||
"id": "f1c6bfd7-6738-4325-8bc4-f91b4ee15e48",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
@ -517,7 +561,118 @@
|
||||
"id": "9d10112a-299e-49be-96ce-cf9551564894",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*[Again, the program fails to find a solution because it doesn't distinguish who knows what (note that the \"bernard_knows\" variable is defined in terms of \"albert_knows\", but Bernard does not know what Albert knows). The program introduces days of the week, which is extraneous, but okay. A bigger issue is that it accepts \"possible_dates\" as a parameter, but then ignores them and defines \"month_days\" based on the original dates.]*"
|
||||
"*[This program seems promising in that it has separate variables for \"albert_knows\" and \"bernard_knows\", but it does not handle them correctly. The program introduces days of the week, which is extraneous, but not harmful. A bigger issue is that it accepts \"possible_dates\" as a parameter, but then ignores them and defines \"month_days\" based on the original dates, not on the input.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "8379b2f8-9591-4f7e-be42-e5917ce0263c",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# HuggingFace Chat\n",
|
||||
"\n",
|
||||
"*[HuggingFace is unique in suggesting a class definition, but still makes a similar mistake. Also, while it accepts a list of possible dates in the initializer, it then hard-codes certain months and days from the original problem.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "e0d33270-0f02-4a70-8258-4b87aab63b01",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"class CherylBirthdayProblem:\n",
|
||||
" def __init__(self, possible_dates):\n",
|
||||
" self.possible_dates = possible_dates \n",
|
||||
" \n",
|
||||
" def filter_dates(self, month, day):\n",
|
||||
" return [date for date in self.possible_dates if date[0] == month and date[1] == day]\n",
|
||||
" \n",
|
||||
" def filter_months(self, month):\n",
|
||||
" return [date for date in self.possible_dates if date[0] == month]\n",
|
||||
" \n",
|
||||
" def filter_days(self, day):\n",
|
||||
" return [date for date in self.possible_dates if date[1] == day]\n",
|
||||
" \n",
|
||||
" def solve(self):\n",
|
||||
" # Albert's first statement\n",
|
||||
" self.possible_dates = self.filter_months(5) + self.filter_months(6)\n",
|
||||
" \n",
|
||||
" # Bernard's first statement\n",
|
||||
" self.possible_dates = [date for date in self.possible_dates if date[1]!= 19]\n",
|
||||
" \n",
|
||||
" # Albert's second statement\n",
|
||||
" self.possible_dates = self.filter_days(14) + self.filter_days(15) + self.filter_days(16) + self.filter_days(17) + self.filter_days(18)\n",
|
||||
" \n",
|
||||
" # Bernard's second statement\n",
|
||||
" self.possible_dates = [date for date in self.possible_dates if date[1]!= 15 and date[0]!= 6]\n",
|
||||
" \n",
|
||||
" # Albert's third statement\n",
|
||||
" self.possible_dates = self.filter_months(7)\n",
|
||||
" \n",
|
||||
" # Bernard's third statement\n",
|
||||
" self.possible_dates = self.filter_days(16)\n",
|
||||
" \n",
|
||||
" return self.possible_dates\n",
|
||||
"\n",
|
||||
"possible_dates = [(5, 14), (5, 15), (5, 16), (6, 15), (6, 16), (6, 17), (6, 18), (6, 19), \n",
|
||||
" (7, 14), (7, 15), (7, 16), (7, 17), (7, 18), (8, 14), (8, 15), (8, 16), (8, 17), (8, 18), (8, 19)]\n",
|
||||
"problem = CherylBirthdayProblem(possible_dates)\n",
|
||||
"print(problem.solve())"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "39904aa9-d97f-4b0c-a2d0-4731c2d4aff0",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# You.com Genius Mode\n",
|
||||
"\n",
|
||||
"*[You.com makes a similar mistake, although the end result is that it selects the wrong date rather than no date.]*"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "232deb66-b208-4b65-9703-4519dc88c528",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"('May', 19)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def cheryls_birthday(possible_dates):\n",
|
||||
" # Step 1: Albert knows Bernard doesn't know. So, the month can't be one with a unique day.\n",
|
||||
" step1 = [date for date in possible_dates \n",
|
||||
" if not any(possible_dates.count(x) == 1 \n",
|
||||
" for x in [d for m, d in possible_dates if m == date[0]])]\n",
|
||||
"\n",
|
||||
" # Step 2: Bernard now knows. So, the day must be unique among the remaining possible dates.\n",
|
||||
" step2 = [date for date in step1 if sum(date[1] == d for m, d in step1) == 1]\n",
|
||||
"\n",
|
||||
" # Step 3: Albert now knows. So, the month must be unique among the remaining possible dates.\n",
|
||||
" step3 = [date for date in step2 if sum(date[0] == m for m, d in step2) == 1]\n",
|
||||
"\n",
|
||||
" return step3[0] if step3 else None\n",
|
||||
"\n",
|
||||
"# List of possible dates\n",
|
||||
"possible_dates = [('May', 15), ('May', 16), ('May', 19), ('June', 17), ('June', 18), \n",
|
||||
" ('July', 14), ('July', 16), ('August', 14), ('August', 15), ('August', 17)]\n",
|
||||
"\n",
|
||||
"print(cheryls_birthday(possible_dates))"
|
||||
]
|
||||
}
|
||||
],
|
||||
|
@ -1,107 +1,177 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "15a565d8-b9ee-427d-8631-6e1a26089b7f",
|
||||
"cell_type": "markdown",
|
||||
"id": "19ee7dde-0d74-47e8-8d0d-4ffcb99e2f5a",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"(0, 4, 7, 5, 2, 6, 1, 3)"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from itertools import permutations\n",
|
||||
"from typing import *\n",
|
||||
"<div align=\"right\"><i>Peter Norvig<br>Sept 25, 2024</i></div>\n",
|
||||
"\n",
|
||||
"def nqueens(n=8) -> Iterable[Sequence[int]]:\n",
|
||||
" \"\"\"All ways of arranging `n` non-attacking queens on an `n` x `n` board.\n",
|
||||
" Each way is a sequence of `n` column numbers, one for each row\"\"\"\n",
|
||||
" return (cols for cols in permutations(range(n))\n",
|
||||
" if different(diagonal1(cols)) \n",
|
||||
" and different(diagonal2(cols)))\n",
|
||||
"# The Languages of English, Math, and Programming\n",
|
||||
"\n",
|
||||
"def different(items) -> bool: return len(items) == len(set(items))\n",
|
||||
"def diagonal1(cols): return [col - row for row, col in enumerate(cols)]\n",
|
||||
"def diagonal2(cols): return [col + row for row, col in enumerate(cols)]\n",
|
||||
"My colleague [Wei-Hwa Huang](https://en.wikipedia.org/wiki/Wei-Hwa_Huang) gave several AI chatbots this prompt: \n",
|
||||
"\n",
|
||||
"assert len(set(nqueens(8))) == 92\n",
|
||||
"**List all the ways in which three distinct positive integers have a product of 108.**\n",
|
||||
"\n",
|
||||
"next(nqueens(8))"
|
||||
"I tested this prompt on the following solvers:\n",
|
||||
"- [A human programmer](https://github.com/norvig/)\n",
|
||||
"- [Gemini Advanced](https://gemini.google.com/app)\n",
|
||||
"- [ChatGPT 4o](https://chatgpt.com/)\n",
|
||||
"- [Microsoft Copilot](https://copilot.microsoft.com/)\n",
|
||||
"- [Anthropic Claude 3.5 Sonnet](https://claude.ai/new)\n",
|
||||
"- [Meta AI Llama 3](https://www.meta.ai/)\n",
|
||||
"- [Perplexity](https://www.perplexity.ai/)\n",
|
||||
"- [Cohere Chat](https://cohere.com/chat)\n",
|
||||
"- [HuggingFace Chat](https://huggingface.co/chat/)\n",
|
||||
"- [You.com](https://you.com/)\n",
|
||||
"\n",
|
||||
"All the LLMs Wei-Hwa originally tried got this one wrong. From my expanded list, Gemini, ChatGPT 4o, You.com and the human got it right, and 5 other models made mistakes:\n",
|
||||
"- The LLMs all started their answer by noting that 108 = 2 × 2 × 3 × 3 × 3, and then tried to partition those factors into three distinct subsets and report all ways to do so.\n",
|
||||
"- So far so good.\n",
|
||||
"- But most of them forgot that 1 could be a factor of 108 (or equivalently, that the empty set of factors is a valid subset). \n",
|
||||
"- Some of the models ignored the need for \"distinct\" integers, and proposed, say, 3 × 6 × 6.\n",
|
||||
"- Some got 5 or 6 correct triplets, and then stopped, perhaps because their attention mechanism didn't go back far enough.\n",
|
||||
"- SOme even proposed non-integers as \"factors\".\n",
|
||||
"\n",
|
||||
"I thought that the models might have skipped 1 as a factor because 1 is not listed in the prime factorization, so it is easy to forget. But in programming, it is more natural to run a loop from 1 to *n* than from 2 to *n*, so this error would be less likely. Therefore, I decided to test all the models with the following prompt: \n",
|
||||
"\n",
|
||||
"**Write a Python program to list all the ways in which three distinct positive integers have a product of 108.**\n",
|
||||
"\n",
|
||||
"# TLDR: Conclusion\n",
|
||||
"\n",
|
||||
"The models did much better with this prompt. My conclusion is that the language used to solve a problem matters. Sometimes a natural language such as English is a good choice, sometimes you need the language of mathematical equations, or maybe chemical equations, and sometimes a programming language is best.\n",
|
||||
"\n",
|
||||
"# Human\n",
|
||||
"\n",
|
||||
"A human (me) was able to correctly respond to the prompt:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "fd7a971c-d3f4-4f2e-89db-777fb2a208d4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Q . . . . . . . \n",
|
||||
". . . . Q . . . \n",
|
||||
". . . . . . . Q \n",
|
||||
". . . . . Q . . \n",
|
||||
". . Q . . . . . \n",
|
||||
". . . . . . Q . \n",
|
||||
". Q . . . . . . \n",
|
||||
". . . Q . . . . \n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def show(queens, dot='. ', Q='Q ') -> None:\n",
|
||||
" \"\"\"Print the board.\"\"\"\n",
|
||||
" m = max(queens)\n",
|
||||
" for col in queens:\n",
|
||||
" print(dot * col + Q + dot * (m - col))\n",
|
||||
"\n",
|
||||
"show(next(nqueens())) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"execution_count": 1,
|
||||
"id": "f8a27ed0-c2b1-47a0-bdf0-c6a8cd789dc5",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[(1, 2, 54),\n",
|
||||
" (1, 3, 36),\n",
|
||||
" (1, 4, 27),\n",
|
||||
" (1, 6, 18),\n",
|
||||
" (1, 9, 12),\n",
|
||||
" (2, 3, 18),\n",
|
||||
" (2, 6, 9),\n",
|
||||
" (3, 4, 9)]"
|
||||
"[{1, 2, 54},\n",
|
||||
" {1, 3, 36},\n",
|
||||
" {1, 4, 27},\n",
|
||||
" {1, 6, 18},\n",
|
||||
" {1, 9, 12},\n",
|
||||
" {2, 3, 18},\n",
|
||||
" {2, 6, 9},\n",
|
||||
" {3, 4, 9}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 6,
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import math\n",
|
||||
"from math import prod\n",
|
||||
"from itertools import combinations\n",
|
||||
"from typing import *\n",
|
||||
"\n",
|
||||
"def find_product(n, k) -> List[Tuple[int, ...]]:\n",
|
||||
"def find_products(k=3, n=108) -> List[Set[int]]:\n",
|
||||
" \"\"\"A list of all ways in which `k` distinct positive integers have a product of `n`.\"\"\" \n",
|
||||
" factors = {i for i in range(1, n + 1) if n % i == 0}\n",
|
||||
" return [tup for tup in combinations(factors, k) if math.prod(tup) == n]\n",
|
||||
" return [set(ints) for ints in combinations(factors, k) if prod(ints) == n]\n",
|
||||
"\n",
|
||||
"find_product(108, 3)"
|
||||
"find_products()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b7682af0-8c46-4e19-bdba-71f28bbfa101",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The program can solve different versions of the problem (although it would be slow when `n` is over a billion or so):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "55dfa9c2-f366-42c8-ae50-6a1df80c47b3",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"[{1, 2, 3, 4, 15},\n",
|
||||
" {1, 2, 3, 5, 12},\n",
|
||||
" {1, 2, 3, 6, 10},\n",
|
||||
" {1, 2, 4, 5, 9},\n",
|
||||
" {1, 3, 4, 5, 6}]"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"find_products(5, 360)"
|
||||
]
|
||||
},
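
The note that the program slows down for very large `n` follows from the way it scans every integer from 1 to `n` when collecting factors. A hedged sketch of one common alternative is shown here: enumerate divisors only up to the integer square root and pair each with its cofactor. The names `divisors` and `find_products_fast` are hypothetical and are not part of the notebook. The combination step is unchanged, so this only speeds up divisor collection.

```python
from itertools import combinations
from math import isqrt, prod
from typing import List, Set

def divisors(n: int) -> List[int]:
    """Sorted divisors of `n`, found by trial division up to sqrt(n)."""
    small = [d for d in range(1, isqrt(n) + 1) if n % d == 0]
    # Each small divisor d pairs with the cofactor n // d.
    return sorted(set(small) | {n // d for d in small})

def find_products_fast(k: int = 3, n: int = 108) -> List[Set[int]]:
    """All ways in which `k` distinct positive integers have a product of `n`."""
    return [set(ints) for ints in combinations(divisors(n), k)
            if prod(ints) == n]

print(find_products_fast())        # the same 8 triples as the program above
print(find_products_fast(5, 360))  # the same 5 sets as the program above
```

The number of `k`-combinations of divisors can still grow quickly for highly composite `n`, so this remains a brute-force approach.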
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "42fc5098-2315-4984-b256-4fbf2f6fec51",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gemini Advanced\n",
|
||||
"\n",
|
||||
"Gemini produced three drafts, of which the following one was correct. In another draft, it had the line `k = product // (i * j)`, using integer division, which is incompatible with the `k.is_integer()` test. Here is the correct draft:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "29789909-15d1-4833-b35f-cb5399893ba2",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_combinations(product):\n",
|
||||
" \"\"\"Finds all combinations of three distinct positive integers that multiply to the given product.\n",
|
||||
"\n",
|
||||
" Args:\n",
|
||||
" product: The product to find combinations for.\n",
|
||||
"\n",
|
||||
" Returns:\n",
|
||||
" A list of tuples, where each tuple represents a combination of three integers.\n",
|
||||
" \"\"\"\n",
|
||||
"\n",
|
||||
" combinations = []\n",
|
||||
" for i in range(1, product):\n",
|
||||
" for j in range(i + 1, product):\n",
|
||||
" k = product / (i * j)\n",
|
||||
" if k > j and k.is_integer():\n",
|
||||
" combinations.append((i, j, int(k)))\n",
|
||||
" return combinations\n",
|
||||
"\n",
|
||||
"product = 108\n",
|
||||
"result = find_combinations(product)\n",
|
||||
"print(result)"
|
||||
]
|
||||
},
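
The mismatch between `//` and `k.is_integer()` mentioned for the other Gemini draft can be avoided by staying in integer arithmetic. With `//` the result `k` is an `int`: on Python versions where `int` lacks `is_integer()` the call raises `AttributeError`, and where the method exists it is always true for an `int`, so it cannot detect a remainder. The sketch below is an illustrative variant and not one of Gemini's drafts. The name `find_combinations_int` is hypothetical. It uses `divmod` to test divisibility explicitly.

```python
from typing import List, Tuple

def find_combinations_int(product: int) -> List[Tuple[int, int, int]]:
    """Triples i < j < k of positive integers with i * j * k == product."""
    result = []
    for i in range(1, product):
        for j in range(i + 1, product):
            # Quotient and remainder in one step, with no floats involved.
            k, remainder = divmod(product, i * j)
            if remainder == 0 and k > j:
                result.append((i, j, k))
    return result

print(find_combinations_int(108))
```

For 108 this yields the same eight triples as the correct draft, and it does not depend on floating-point division being exact.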
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "81f032a3-5621-470e-9ada-8d3e026ff1cc",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# ChatGPT 4o\n",
|
||||
"\n",
|
||||
"ChatGPT got it right (although it imported `itertools` and then never used it)."
|
||||
]
|
||||
},
|
||||
{
|
||||
@ -114,7 +184,7 @@
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Triplets with product 108:\n",
|
||||
"All distinct triplets of integers that multiply to 108:\n",
|
||||
"(1, 2, 54)\n",
|
||||
"(1, 3, 36)\n",
|
||||
"(1, 4, 27)\n",
|
||||
@ -127,30 +197,359 @@
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_triplets(target_product):\n",
|
||||
" triplets = []\n",
|
||||
" for a in range(1, target_product + 1):\n",
|
||||
" for b in range(a + 1, target_product + 1): # Ensure distinctness (b > a)\n",
|
||||
" c = target_product // (a * b)\n",
|
||||
" if a * b * c == target_product and c > b: # Ensure distinctness (c > b)\n",
|
||||
" triplets.append((a, b, c))\n",
|
||||
" return triplets\n",
|
||||
"import itertools\n",
|
||||
"\n",
|
||||
"target_product = 108\n",
|
||||
"triplets = find_triplets(target_product)\n",
|
||||
"def find_triplets_with_product(product):\n",
|
||||
" solutions = []\n",
|
||||
" \n",
|
||||
" # Iterate over all possible combinations of distinct integers\n",
|
||||
" for a in range(1, int(product ** (1/3)) + 1):\n",
|
||||
" for b in range(a + 1, product // a + 1):\n",
|
||||
" for c in range(b + 1, product // (a * b) + 1):\n",
|
||||
" if a * b * c == product:\n",
|
||||
" solutions.append((a, b, c))\n",
|
||||
" \n",
|
||||
" return solutions\n",
|
||||
"\n",
|
||||
"product = 108\n",
|
||||
"triplets = find_triplets_with_product(product)\n",
|
||||
"\n",
|
||||
"if triplets:\n",
|
||||
" print(f\"Triplets with product {target_product}:\")\n",
|
||||
" print(f\"All distinct triplets of integers that multiply to {product}:\")\n",
|
||||
" for triplet in triplets:\n",
|
||||
" print(triplet)\n",
|
||||
"else:\n",
|
||||
" print(f\"No triplets found with product {target_product}\")"
|
||||
" print(f\"No distinct triplets found that multiply to {product}.\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "17e74293-feab-4fff-b682-bc26823ebefa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Bing CoPilot\n",
|
||||
"\n",
|
||||
"Bing produces a very clean (but somewhat slower) `find_triplets` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "2b0ee5b6-64bd-4581-bdc7-d22065f02d64",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"All distinct triplets with a product of 108:\n",
|
||||
"(1, 2, 54)\n",
|
||||
"(1, 3, 36)\n",
|
||||
"(1, 4, 27)\n",
|
||||
"(1, 6, 18)\n",
|
||||
"(1, 9, 12)\n",
|
||||
"(2, 3, 18)\n",
|
||||
"(2, 6, 9)\n",
|
||||
"(3, 4, 9)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_triplets(product):\n",
|
||||
" triplets = []\n",
|
||||
" for a in range(1, product + 1):\n",
|
||||
" for b in range(a + 1, product + 1):\n",
|
||||
" for c in range(b + 1, product + 1):\n",
|
||||
" if a * b * c == product:\n",
|
||||
" triplets.append((a, b, c))\n",
|
||||
" return triplets\n",
|
||||
"\n",
|
||||
"product = 108\n",
|
||||
"triplets = find_triplets(product)\n",
|
||||
"\n",
|
||||
"print(f\"All distinct triplets with a product of {product}:\")\n",
|
||||
"for triplet in triplets:\n",
|
||||
" print(triplet)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "fddeabc0-4925-4145-b666-7a094fd61980",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Claude 3.5 Sonnet\n",
|
||||
"\n",
|
||||
"Claude's answer is quite similar to ChatGPT 4o's (but I would criticize it for not taking a parameter):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "83f9088c-8444-4207-ad24-91d8e3a98004",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"There are 8 ways to express 108 as a product of three distinct positive integers:\n",
|
||||
"1 x 2 x 54 = 108\n",
|
||||
"1 x 3 x 36 = 108\n",
|
||||
"1 x 4 x 27 = 108\n",
|
||||
"1 x 6 x 18 = 108\n",
|
||||
"1 x 9 x 12 = 108\n",
|
||||
"2 x 3 x 18 = 108\n",
|
||||
"2 x 6 x 9 = 108\n",
|
||||
"3 x 4 x 9 = 108\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_combinations():\n",
|
||||
" result = []\n",
|
||||
" for i in range(1, 108):\n",
|
||||
" for j in range(i + 1, 108):\n",
|
||||
" k = 108 // (i * j)\n",
|
||||
" if i * j * k == 108 and k > j:\n",
|
||||
" result.append((i, j, k))\n",
|
||||
" return result\n",
|
||||
"\n",
|
||||
"def main():\n",
|
||||
" combinations = find_combinations()\n",
|
||||
" print(f\"There are {len(combinations)} ways to express 108 as a product of three distinct positive integers:\")\n",
|
||||
" for combo in combinations:\n",
|
||||
" print(f\"{combo[0]} x {combo[1]} x {combo[2]} = 108\")\n",
|
||||
"\n",
|
||||
"if __name__ == \"__main__\":\n",
|
||||
" main()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "eefc3007-105d-41cd-9023-0a34609517fd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Meta AI Llama 3.1\n",
|
||||
"\n",
|
||||
"This program forgets to test for `i < j < k`, and thus produces six permutations of each triplet. Oddly, it claims that \"*When you run this program, it will output: [(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]*\", so it knows the correct answer, but it didn't run the program to test that it is correct."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "de1d0cc3-10a8-417a-a35a-0a66325f549c",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (1, 12, 9), (1, 18, 6), (1, 27, 4), (1, 36, 3), (1, 54, 2), (2, 1, 54), (2, 3, 18), (2, 6, 9), (2, 9, 6), (2, 18, 3), (2, 54, 1), (3, 1, 36), (3, 2, 18), (3, 4, 9), (3, 9, 4), (3, 18, 2), (3, 36, 1), (4, 1, 27), (4, 3, 9), (4, 9, 3), (4, 27, 1), (6, 1, 18), (6, 2, 9), (6, 9, 2), (6, 18, 1), (9, 1, 12), (9, 2, 6), (9, 3, 4), (9, 4, 3), (9, 6, 2), (9, 12, 1), (12, 1, 9), (12, 9, 1), (18, 1, 6), (18, 2, 3), (18, 3, 2), (18, 6, 1), (27, 1, 4), (27, 4, 1), (36, 1, 3), (36, 3, 1), (54, 1, 2), (54, 2, 1)]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_factors(n):\n",
|
||||
" factors = []\n",
|
||||
" for i in range(1, n + 1):\n",
|
||||
" if n % i == 0:\n",
|
||||
" factors.append(i)\n",
|
||||
" return factors\n",
|
||||
"\n",
|
||||
"def find_triplets(n):\n",
|
||||
" factors = find_factors(n)\n",
|
||||
" triplets = []\n",
|
||||
" for i in factors:\n",
|
||||
" for j in factors:\n",
|
||||
" if i * j <= n:\n",
|
||||
" k = n // (i * j)\n",
|
||||
" if i != j and j != k and i != k and i * j * k == n:\n",
|
||||
" triplets.append((i, j, k))\n",
|
||||
" return triplets\n",
|
||||
"\n",
|
||||
"print(find_triplets(108))"
|
||||
]
|
||||
},
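A minimal sketch of why the Meta AI output has 48 entries rather than 8: each valid triplet appears in all six orderings, and the `i < j < k` test keeps exactly one of them. The names `expected`, `all_orderings`, and `ordered_only` are my own illustrative choices, not part of any chatbot's answer.

```python
from itertools import permutations

# The eight triplets that the correct programs report for a product of 108.
expected = [(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18),
            (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]

# Every ordering of every triplet: 8 triplets x 6 permutations each.
all_orderings = sorted(p for t in expected for p in permutations(t))

# Keeping only the increasing ordering removes the duplicates.
ordered_only = [(i, j, k) for (i, j, k) in all_orderings if i < j < k]

print(len(all_orderings), len(ordered_only))
```

Filtering after the fact is only for illustration; adding the test inside the loop avoids generating the duplicates in the first place.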
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "742f3b13-a53f-497a-97e7-5aa58b361951",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Perplexity\n",
|
||||
"\n",
|
||||
"Perplexity does well."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "7a764a01-8f18-4773-bd7a-fbc8dc652450",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"All combinations of three distinct positive integers with a product of 108:\n",
|
||||
"1 x 2 x 54 = 108\n",
|
||||
"1 x 3 x 36 = 108\n",
|
||||
"1 x 4 x 27 = 108\n",
|
||||
"1 x 6 x 18 = 108\n",
|
||||
"1 x 9 x 12 = 108\n",
|
||||
"2 x 3 x 18 = 108\n",
|
||||
"2 x 6 x 9 = 108\n",
|
||||
"3 x 4 x 9 = 108\n",
|
||||
"\n",
|
||||
"Total number of combinations: 8\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_combinations(target):\n",
|
||||
" combinations = []\n",
|
||||
" \n",
|
||||
" for i in range(1, target + 1):\n",
|
||||
" if target % i == 0:\n",
|
||||
" for j in range(i + 1, target + 1):\n",
|
||||
" if (target // i) % j == 0:\n",
|
||||
" k = target // (i * j)\n",
|
||||
" if k > j and i * j * k == target:\n",
|
||||
" combinations.append((i, j, k))\n",
|
||||
" \n",
|
||||
" return combinations\n",
|
||||
"\n",
|
||||
"target_product = 108\n",
|
||||
"result = find_combinations(target_product)\n",
|
||||
"\n",
|
||||
"print(f\"All combinations of three distinct positive integers with a product of {target_product}:\")\n",
|
||||
"for combo in result:\n",
|
||||
" print(f\"{combo[0]} x {combo[1]} x {combo[2]} = {target_product}\")\n",
|
||||
"\n",
|
||||
"print(f\"\\nTotal number of combinations: {len(result)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "53f85624-bfe3-42cc-b55b-b86a56089314",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Cohere Chat Playground\n",
|
||||
"\n",
|
||||
"Cohere fails to find any combinations. The 5th line should just be `k = product // i // j`; Cohere bizarrely adds `* j * i`. It also fails to test that `i < j < k`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "61139601-ced2-4598-ae76-464c1041d2d1",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"All combinations of three distinct positive integers with a product of 108:\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_combinations(product):\n",
|
||||
" combinations = []\n",
|
||||
" for i in range(1, product // 3 + 1):\n",
|
||||
" for j in range(i, product // 2 + 1):\n",
|
||||
" k = product // i // j * j * i\n",
|
||||
" if i * j * k == product and i != j and j != k and i != k:\n",
|
||||
" combinations.append((i, j, k))\n",
|
||||
" return combinations\n",
|
||||
"\n",
|
||||
"product = 108\n",
|
||||
"combinations = find_combinations(product)\n",
|
||||
"print(f\"All combinations of three distinct positive integers with a product of {product}:\")\n",
|
||||
"for combo in combinations:\n",
|
||||
" print(combo)"
|
||||
]
|
||||
},
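The fix described in the Cohere note can be sketched as follows. This is my hypothetical correction, not something Cohere produced: the name `find_combinations_fixed` is an assumption, `k` is computed as `product // i // j`, and the three inequality tests are replaced by the single test `i < j < k`.

```python
def find_combinations_fixed(product):
    """Hypothetical corrected version of the Cohere program (not Cohere's output)."""
    combinations = []
    for i in range(1, product // 3 + 1):
        for j in range(i + 1, product // 2 + 1):
            k = product // i // j  # the stray `* j * i` is dropped
            # i < j < k keeps one ordering per triplet;
            # the product test rejects pairs where i * j does not divide product.
            if i < j < k and i * j * k == product:
                combinations.append((i, j, k))
    return combinations

print(find_combinations_fixed(108))
```

In the original, `product // i // j * j * i` evaluates to roughly `product` itself, so `i * j * k == product` can hold only when `i * j == 1`, which the `i != j` test then rejects. That is why nothing is printed.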
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "51b0b353-79d2-4da7-a6da-8231d5ff811e",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# HuggingChat\n",
|
||||
"\n",
|
||||
"HuggingChat produced a correct, concise program. I note that `i < j < k` would be cleaner than `k >= j and i!= j and j!= k` here (and for others as well)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "6e4b823b-66f1-48ae-b4ca-00387d480e03",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_triplets(n):\n",
|
||||
" triplets = []\n",
|
||||
" for i in range(1, n):\n",
|
||||
" for j in range(i+1, n):\n",
|
||||
" k = n // (i * j)\n",
|
||||
" if k >= j and i * j * k == n and i!= j and j!= k:\n",
|
||||
" triplets.append((i, j, k))\n",
|
||||
" return triplets\n",
|
||||
"\n",
|
||||
"print(find_triplets(108))"
|
||||
]
|
||||
},
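A quick sketch to check that the chained comparison really is a drop-in replacement for HuggingChat's longer condition. The helper names `long_test` and `chained_test` are mine; the loop bounds are the ones HuggingChat used.

```python
def long_test(i, j, k, n):
    # The condition as HuggingChat wrote it.
    return k >= j and i * j * k == n and i != j and j != k

def chained_test(i, j, k, n):
    # The shorter form suggested in the note above.
    return i < j < k and i * j * k == n

n = 108
# Compare the two conditions on every (i, j) pair the original loops visit.
agree = all(
    long_test(i, j, n // (i * j), n) == chained_test(i, j, n // (i * j), n)
    for i in range(1, n)
    for j in range(i + 1, n)
)
print(agree)
```

The two agree because `j` starts at `i + 1`, so `i != j` is always true, and `k >= j and j != k` is just `k > j`.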
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "23d78bca-9b35-43c0-891a-a9c0e6801b31",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# You.com\n",
|
||||
"\n",
|
||||
"You.com produces a correct solution, with some nice optimizations that make it *O*(*n*<sup>5/6</sup>), whereas most of the solutions are *O*(*n*<sup>2</sup>). This means it can handle a 14-digit product in a second of run time, whereas the human-written solution can only handle 10-digit products in one second, while the HuggingChat version (for example) takes several seconds just to handle a 5-digit product."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "fe69bb04-02de-4f38-a8ab-6693c15b02c4",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"def find_triplets(product):\n",
|
||||
" triplets = []\n",
|
||||
" for i in range(1, int(product ** (1/3)) + 1): # The cube root of the product is the maximum possible value for i\n",
|
||||
" if product % i == 0:\n",
|
||||
" for j in range(i + 1, int((product / i) ** 0.5) + 1): # The square root of the product divided by i is the maximum possible value for j\n",
|
||||
" if (product / i) % j == 0:\n",
|
||||
" k = product // (i * j)\n",
|
||||
" if k > j: # Ensure the integers are distinct\n",
|
||||
" triplets.append((i, j, k))\n",
|
||||
" return triplets\n",
|
||||
"\n",
|
||||
"triplets = find_triplets(108)\n",
|
||||
"print(triplets)"
|
||||
]
|
||||
},
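To make the speed difference concrete, here is a rough sketch that counts inner-loop iterations instead of measuring time. It is an approximation under my own assumptions: `quadratic_steps` models the double loop over `range(1, n)` and `range(i + 1, n)` used by several of the programs, and `bounded_steps` models loops that stop near the cube root and square root as You.com's do (using integer arithmetic, so it may differ slightly from the float-based bounds in the original).

```python
from math import isqrt

def quadratic_steps(n):
    """Inner-loop iterations of `for i in range(1, n): for j in range(i + 1, n)`."""
    return sum(n - 1 - i for i in range(1, n))

def bounded_steps(n):
    """Inner-loop iterations when i stops at the cube root and j at sqrt(n // i)."""
    steps = 0
    i = 1
    while i * i * i <= n:
        if n % i == 0:  # the inner loop runs only when i divides n
            steps += max(0, isqrt(n // i) - i)
        i += 1
    return steps

for n in (108, 10_000):
    print(n, quadratic_steps(n), bounded_steps(n))
```

Even at a product of 108 the gap is large (thousands of iterations versus a couple of dozen), and it widens quickly as the product grows.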
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "6db0bbe1-5156-4e05-9a02-bbe3ed156f89",
|
||||
"id": "fc16ca96-0b05-4ad4-82c0-552bb99373fd",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
|