\n",
"\n",
"# LLMs, Theory of Mind, and Cheryl's Birthday\n",
"\n",
"There has been [much](https://spectrum.ieee.org/theory-of-mind-ai) [debate](https://aclanthology.org/2023.conll-1.25/) [on](https://www.gsb.stanford.edu/faculty-research/working-papers/theory-mind-may-have-spontaneously-emerged-large-language-models) [the](https://arxiv.org/abs/2302.02083) [degree](https://www.nature.com/articles/s41562-024-01882-z) to which Large Language Models (LLMs) have a theory of mind: a way of understanding what other people know and don't know. In this notebook I explore one small part of the issue by asking nine LLM chatbots to solve the [Cheryl's Birthday Problem](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday), a well-known logic puzzle in which different characters have different states of knowledge at different times. I gave the candidate solvers two tasks:\n",
"1. Write a program to solve the problem.\n",
"2. Solve a re-worded variant of the problem with different dates (so that they can't just retrieve a memorized answer).\n",
"\n",
"Here are the ten solvers:\n",
"\n",
"- [A human programmer](https://github.com/norvig/)\n",
"- [ChatGPT 4o](https://chatgpt.com/)\n",
"- [Microsoft Copilot](https://copilot.microsoft.com/)\n",
"- [Gemini Advanced](https://gemini.google.com/app)\n",
"- [Meta AI Llama 405B](https://www.meta.ai/)\n",
"- [Anthropic Claude 3.5 Sonnet](https://claude.ai/new)\n",
"- [Perplexity](https://www.perplexity.ai/)\n",
"- [Cohere Chat](https://cohere.com/chat)\n",
"- [HuggingFace Chat](https://huggingface.co/chat/)\n",
"- [You.com](https://you.com/)\n",
"\n",
"# TLDR: Conclusions\n",
"\n",
"1. The human solved both requests.\n",
"2. None of the LLMs could reliably solve either request.\n",
"\n",
"The LLMs were all familiar with the problem, so I didn't have to describe it in the prompt, just name it. Most of them correctly recalled the answer to the original problem: July 16. But none of them were able to write a program. They all failed to distinguish the different knowledge states of the different characters over time, both in the program they wrote and in the resoning steps for the second request. At least with respect to this problem, they had no theory of mind. (Perhaps that is in part due to the fact that very few of the Python programs they were trained on deal with theory of mind.)\n",
"\n",
"\n",
"# First Prompt\n",
"\n",
"Here is the first prompt:\n",
"\n",
"___\n",
"***What is the answer to the \"Cheryl's Birthday\" problem? Write a Python program to solve it. Make sure that the program will still work if the list of possible dates is changed.***\n",
"___\n",
"\n",
"\n",
"# Responses to First Prompt\n",
"\n",
"Each LLM provided explanatory output along with a program; for brevity I only show the explanatory output from the first LLM, ChatGPT 4o. My comments are in *[bracketed italics]*. \n"
]
},
{
"cell_type": "markdown",
"id": "c3e532bf-67b0-4976-aa5f-730401a4ef07",
"metadata": {},
"source": [
"# Human\n",
"\n",
"An actual human (me) was able to write a program, shown in [**another notebook**](https://github.com/norvig/pytudes/blob/main/ipynb/Cheryl-and-Eve.ipynb), that correctly solves the original problem and also handles new sets of dates, and other variations on the problem. I introduced the idea of a *BeliefState*, a set of possible dates that a person believes might possibly be the birthday, and I modeled a character's *statement* as a function that takes a particular date as input, and returns true if the date is consistent with the statement."
]
},
{
"cell_type": "markdown",
"id": "55f6744e-356c-4fd8-af26-e7559618dd8d",
"metadata": {},
"source": [
"# ChatGPT 4o\n",
"\n",
"*[ChatGPT 4o gives the following response, which starts with an excellent summary of the problem:]*\n",
"\n",
"The \"Cheryl's Birthday\" problem is a well-known logic puzzle that can be solved using logical deduction. Let's first break down the problem before writing a Python program.\n",
"\n",
"### Problem Statement\n",
"Cheryl gives Albert and Bernard a list of 10 possible birthdates, and tells them different information:\n",
"- Albert knows the month.\n",
"- Bernard knows the day.\n",
"\n",
"Cheryl provides the following 10 possible dates:\n",
"- May 15, May 16, May 19\n",
"- June 17, June 18\n",
"- July 14, July 16\n",
"- August 14, August 15, August 17\n",
"\n",
"Albert then says: \"I don't know when Cheryl's birthday is, but I know that Bernard doesn't know either.\"\n",
"\n",
"Bernard replies: \"At first I didn’t know when Cheryl's birthday is, but now I know.\"\n",
"\n",
"Albert then says: \"Now I know when Cheryl's birthday is.\"\n",
"\n",
"### Steps for Deduction\n",
"1. From Albert's statement, we can deduce that Cheryl's birthday is not in May or June, because if it were, Bernard could possibly know the exact birthday if the day was unique (May 19 or June 18). Since Albert is sure Bernard doesn't know, we can eliminate May and June.\n",
"2. After eliminating May and June, the remaining possible dates are:\n",
" - July 14, July 16\n",
" - August 14, August 15, August 17\n",
"3. From Bernard’s statement, now knowing the birthday, we can deduce that Cheryl's birthday isn't on July 14 or August 14, because if it were, Bernard would still be uncertain (since both dates share the day 14).\n",
"4. The remaining possible dates are:\n",
" - July 16\n",
" - August 15\n",
" - August 17\n",
"5. Finally, from Albert’s statement, we deduce that Cheryl's birthday must be in July, because if it were in August, Albert wouldn’t be able to know the exact date. Thus, the birthday is July 16.\n",
"\n",
"### Python Program to Solve the Problem\n",
"We can now write a Python program to generalize this solution and ensure it works for any list of possible dates.\n",
"\n",
"### Explanation:\n",
"- Step 1: The function groups the dates by month and by day.\n",
"- Step 2: Based on Albert's statement, we eliminate the months of May and June.\n",
"- Step 3: Based on Bernard's statement, we eliminate dates where the day appears more than once (i.e., both July 14 and August 14 are eliminated).\n",
"- Step 4: Albert now knows Cheryl's birthday, so it must be in the month where only one date is left (i.e., July).\n",
"- \n",
"The program is designed to work with any given list of possible dates, not just the ones in the original problem."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "03ff513c-d390-413e-b524-9c82de44e1a6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cheryl's birthday is: No solution found\n"
]
}
],
"source": [
"def filter_dates(possible_dates):\n",
" # Step 1: Group dates by month and day\n",
" month_map = {}\n",
" day_map = {}\n",
" \n",
" for date in possible_dates:\n",
" month, day = date.split()\n",
" day = int(day)\n",
" \n",
" # Group by month\n",
" if month not in month_map:\n",
" month_map[month] = []\n",
" month_map[month].append(day)\n",
" \n",
" # Group by day\n",
" if day not in day_map:\n",
" day_map[day] = []\n",
" day_map[day].append(month)\n",
" \n",
" # Step 2: Albert says: \"I don't know when Cheryl's birthday is, but I know Bernard doesn't know either.\"\n",
" # This implies Cheryl's birthday cannot be in May or June because these months have unique days (19th and 18th).\n",
" possible_dates = [date for date in possible_dates if date.split()[0] not in ('May', 'June')]\n",
" \n",
" # Step 3: Bernard now knows the birthday, so it cannot be a day that appears more than once.\n",
" possible_dates = [date for date in possible_dates if len(day_map[int(date.split()[1])]) == 1]\n",
" \n",
" # Step 4: Albert now knows the birthday, so it must be in a unique month remaining after Bernard's elimination.\n",
" remaining_months = set(date.split()[0] for date in possible_dates)\n",
" for date in possible_dates:\n",
" month = date.split()[0]\n",
" if len(month_map[month]) == 1:\n",
" return date\n",
" \n",
" return \"No solution found\"\n",
"\n",
"# Test the program with the given list of possible dates\n",
"possible_dates = [\n",
" \"May 15\", \"May 16\", \"May 19\",\n",
" \"June 17\", \"June 18\",\n",
" \"July 14\", \"July 16\",\n",
" \"August 14\", \"August 15\", \"August 17\"\n",
"]\n",
"\n",
"cheryls_birthday = filter_dates(possible_dates)\n",
"print(\"Cheryl's birthday is:\", cheryls_birthday)"
]
},
{
"cell_type": "markdown",
"id": "cb513413-f71d-476d-a6bc-211ab1589bbe",
"metadata": {},
"source": [
"*[We see that the program fails to find a solution. It starts with the correct list of \"possible_dates\", and correctly updates the list twice to reflect Albert's and Bernard's first statements. So far so good. But then things go wrong: first it defines \"remaining_months\" and never uses it. Then it tries to find a possible date in a unique month, but it should be a unique month with respect to what is now known, not with respect to the original possible dates stored in the \"month_map\". The program has failed to track the two character's different belief states over time.]* "
]
},
{
"cell_type": "markdown",
"id": "93852dab-d167-4d14-8cc6-02979427eb01",
"metadata": {},
"source": [
"# Microsoft Copilot\n",
"\n",
"*[Copilot makes a similar mistake.*]"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5d7e4408-3663-4f54-acdb-4bddf1ac9121",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cheryl's birthday is: None\n"
]
}
],
"source": [
"def cheryls_birthday(possible_dates):\n",
" # Step 1: Filter out months with unique days\n",
" unique_days = {date.split()[1] for date in possible_dates if sum(d.split()[1] == date.split()[1] for d in possible_dates) == 1}\n",
" possible_dates = [date for date in possible_dates if date.split()[1] not in unique_days]\n",
"\n",
" # Step 2: Filter out dates with unique months\n",
" unique_months = {date.split()[0] for date in possible_dates if sum(d.split()[0] == date.split()[0] for d in possible_dates) == 1}\n",
" possible_dates = [date for date in possible_dates if date.split()[0] not in unique_months]\n",
"\n",
" # Step 3: Filter out remaining dates based on the logic of the puzzle\n",
" for date in possible_dates:\n",
" month, day = date.split()\n",
" if all(d.split()[0] != month for d in possible_dates if d != date):\n",
" return date\n",
"\n",
"# List of possible dates\n",
"possible_dates = [\n",
" \"May 15\", \"May 16\", \"May 19\",\n",
" \"June 17\", \"June 18\",\n",
" \"July 14\", \"July 16\",\n",
" \"August 14\", \"August 15\", \"August 17\"\n",
"]\n",
"\n",
"birthday = cheryls_birthday(possible_dates)\n",
"print(f\"Cheryl's birthday is: {birthday}\")"
]
},
{
"cell_type": "markdown",
"id": "d981531c-31f6-442e-b210-d0bff93d49bc",
"metadata": {},
"source": [
"# Gemini Advanced\n",
"\n",
"*[Gemini makes a similar mistake.]*\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "7c0bd1e7-4ada-4b43-97a5-674f615b9f77",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Unable to determine Cheryl's birthday\n",
"With the new set of dates, unable to determine Cheryl's birthday\n"
]
}
],
"source": [
"def cheryl_birthday(possible_dates):\n",
" \"\"\"Solves the Cheryl's Birthday logic problem, allowing for a variable list of possible dates.\"\"\"\n",
"\n",
" # Possible months\n",
" months = [\"May\", \"June\", \"July\", \"August\"]\n",
"\n",
" # Albert's statement eliminates some possibilities\n",
" # He knows the month, so there must be only one possible date in that month\n",
" for month in months:\n",
" if len([date for date in possible_dates if (month, date) not in [(\"May\", 19), (\"June\", 18)]]) == 1:\n",
" months = [month]\n",
" break\n",
"\n",
" # Bernard's statement further narrows down the possibilities\n",
" # He knows the date, so there must be only one possible month for that date\n",
" for date in possible_dates:\n",
" if len([month for month in months if (month, date) not in [(\"May\", 19), (\"June\", 18)]]) == 1:\n",
" possible_dates = [date]\n",
" break\n",
"\n",
" # After Bernard's statement, Albert also knows the answer\n",
" # This means there's only one possible date left in the remaining month\n",
" for month in months:\n",
" if len([date for date in possible_dates if (month, date) not in [(\"May\", 19), (\"June\", 18)]]) == 1:\n",
" possible_dates = [date]\n",
" break\n",
"\n",
" # Now there should be only one possibility left\n",
" if len(months) == 1 and len(possible_dates) == 1:\n",
" return months[0], possible_dates[0]\n",
" else:\n",
" return None\n",
"\n",
"# Original problem setup\n",
"original_dates = [14, 15, 16, 17, 18, 19]\n",
"solution = cheryl_birthday(original_dates)\n",
"\n",
"if solution:\n",
" print(f\"Cheryl's birthday is on {solution[0]} {solution[1]}\")\n",
"else:\n",
" print(\"Unable to determine Cheryl's birthday\")\n",
"\n",
"# Example with a different set of possible dates\n",
"new_dates = [13, 14, 15, 16, 17]\n",
"new_solution = cheryl_birthday(new_dates)\n",
"\n",
"if new_solution:\n",
" print(f\"With the new set of dates, Cheryl's birthday is on {new_solution[0]} {new_solution[1]}\")\n",
"else:\n",
" print(\"With the new set of dates, unable to determine Cheryl's birthday\")"
]
},
{
"cell_type": "markdown",
"id": "11f9bfc5-8548-4f7a-ac69-acecddb6f19b",
"metadata": {},
"source": [
"*[The program accepts a list of \"possible_dates\", but confusingly, those are actually just days of the month; you aren't allowed to pass in different month/day combinations. And even if you do pass in different days, the program hard-codes certain days and months.]*"
]
},
{
"cell_type": "markdown",
"id": "3932858f-170b-4979-a565-c3b41a74e7fe",
"metadata": {},
"source": [
"# Meta Llama 405B\n",
"\n",
"*[Llama also fails to find a solution (and thus has no output). This program is quite flawed. I do like that it defines functions for the various pieces, but note that the function \"bernard_deduce\" is defined but never called.]*"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "1faff935-74aa-44ee-9f37-15f71521e7d2",
"metadata": {},
"outputs": [],
"source": [
"# Possible dates for Cheryl's birthday\n",
"dates = [\n",
" (5, 13), (5, 15), (5, 16), \n",
" (6, 13), (6, 15), (6, 16), \n",
" (7, 13), (7, 15), (7, 16), \n",
" (8, 13), (8, 15), (8, 16)\n",
"]\n",
"\n",
"# Function to check if a date is possible based on Albert's knowledge\n",
"def is_possible_albert(month):\n",
" return all(m == month for m, _ in dates)\n",
"\n",
"# Function to check if a date is possible based on Bernard's knowledge\n",
"def is_possible_bernard(day):\n",
" return any((m, d) == (month, day) for m, d in dates)\n",
"\n",
"# Function to check if Albert can deduce the birthday\n",
"def albert_deduce(month):\n",
" return len([d for m, d in dates if m == month]) == 1\n",
"\n",
"# Function to check if Bernard can deduce the birthday\n",
"def bernard_deduce(day):\n",
" return len([m for m, d in dates if d == day]) == 1\n",
"\n",
"# Main program\n",
"for month, day in dates:\n",
" if not is_possible_albert(month):\n",
" continue\n",
" if not is_possible_bernard(day):\n",
" continue\n",
" if albert_deduce(month):\n",
" print(f\"Cheryl's birthday is {month}/{day}\")\n",
" break"
]
},
{
"cell_type": "markdown",
"id": "02b1f7fd-918c-432d-8784-95dff3d4f86e",
"metadata": {},
"source": [
"# Claude 3.5 Sonnet\n",
"\n",
"*[Claude makes a similar mistake. It deserves credit for explicitly trying an example with different dates, and it correctly reports that the second example has no solution, but that was by accident.]*"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1da1319a-95d1-4c98-8b36-d96a8c552cb7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cheryl's birthday is on: No unique solution found.\n",
"With the new set of dates, Cheryl's birthday is on: No unique solution found.\n"
]
}
],
"source": [
"def solve_cheryls_birthday(dates):\n",
" # Extract unique months and days\n",
" months = sorted(set(date.split()[0] for date in dates))\n",
" days = sorted(set(int(date.split()[1]) for date in dates))\n",
"\n",
" # Step 1: Albert knows the month, Bernard knows the day\n",
" possible_months = [m for m in months if len([d for d in dates if d.startswith(m)]) > 1]\n",
" possible_days = [d for d in days if len([date for date in dates if date.endswith(str(d))]) > 1]\n",
"\n",
" # Step 2: Albert knows Bernard doesn't know\n",
" possible_dates = [date for date in dates if int(date.split()[1]) in possible_days]\n",
"\n",
" # Step 3: Bernard now knows\n",
" possible_dates = [date for date in possible_dates if possible_dates.count(date.split()[1]) == 1]\n",
"\n",
" # Step 4: Albert now knows\n",
" result = [date for date in possible_dates if possible_dates.count(date.split()[0]) == 1]\n",
"\n",
" return result[0] if result else \"No unique solution found.\"\n",
"\n",
"# Example usage\n",
"dates = [\n",
" \"May 15\", \"May 16\", \"May 19\",\n",
" \"June 17\", \"June 18\",\n",
" \"July 14\", \"July 16\",\n",
" \"August 14\", \"August 15\", \"August 17\"\n",
"]\n",
"\n",
"solution = solve_cheryls_birthday(dates)\n",
"print(f\"Cheryl's birthday is on: {solution}\")\n",
"\n",
"# Test with a different set of dates\n",
"new_dates = [\n",
" \"April 15\", \"April 16\", \"April 25\",\n",
" \"May 15\", \"May 16\",\n",
" \"June 17\", \"June 18\",\n",
" \"July 14\", \"July 16\", \"July 17\"\n",
"]\n",
"\n",
"new_solution = solve_cheryls_birthday(new_dates)\n",
"print(f\"With the new set of dates, Cheryl's birthday is on: {new_solution}\")"
]
},
{
"cell_type": "markdown",
"id": "52003af0-0ea1-4ed4-9b5e-5b9b8862a068",
"metadata": {},
"source": [
"# Perplexity\n",
"\n",
"*[Once again, this program has a similar mistake.]*"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "55e99206-42f8-44ea-a091-1ca6a14f4e89",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cheryl's birthday is: None\n"
]
}
],
"source": [
"from collections import defaultdict\n",
"\n",
"def solve_cheryls_birthday(possible_dates):\n",
" months = defaultdict(list)\n",
" days = defaultdict(list)\n",
" \n",
" for date in possible_dates:\n",
" month, day = date.split()\n",
" months[month].append(date)\n",
" days[day].append(date)\n",
" \n",
" # Albert knows the month, Bernard knows the day\n",
" possible = [date for date in possible_dates if len(months[date.split()[0]]) > 1 and len(days[date.split()[1]]) > 1]\n",
" \n",
" # Bernard now knows the date\n",
" possible = [date for date in possible if len([d for d in possible if d.split()[1] == date.split()[1]]) == 1]\n",
" \n",
" # Albert now knows the date\n",
" possible = [date for date in possible if len([d for d in possible if d.split()[0] == date.split()[0]]) == 1]\n",
" \n",
" return possible[0] if len(possible) == 1 else None\n",
"\n",
"# Example usage\n",
"possible_dates = [\n",
" \"May 15\", \"May 16\", \"May 19\",\n",
" \"June 17\", \"June 18\",\n",
" \"July 14\", \"July 16\",\n",
" \"August 14\", \"August 15\", \"August 17\"\n",
"]\n",
"\n",
"result = solve_cheryls_birthday(possible_dates)\n",
"print(f\"Cheryl's birthday is: {result}\")"
]
},
{
"cell_type": "markdown",
"id": "4b4e47b5-f5a2-491d-ad34-f8c08aceca59",
"metadata": {},
"source": [
"# Cohere Chat Playground\n",
"\n",
"*[Cohere too has the same mistake.]*"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "f1c6bfd7-6738-4325-8bc4-f91b4ee15e48",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cheryl's birthday is: The solution could not be determined.\n"
]
}
],
"source": [
"def solve_cheryls_birthday(possible_dates):\n",
" # Create a dictionary to map months to their respective days\n",
" month_days = {\n",
" \"May\": [15, 16, 19],\n",
" \"June\": [17, 18],\n",
" \"July\": [14, 16],\n",
" \"August\": [14, 15, 17],\n",
" }\n",
"\n",
" # Albert knows the month, so we can filter out the possible dates based on that\n",
" albert_knows = [date for date in possible_dates if date[1] in month_days]\n",
"\n",
" # Bernard knows the day, so we can further filter based on that\n",
" bernard_knows = [date for date in albert_knows if date[2] in month_days[date[1]]]\n",
"\n",
" # If Bernard knows the birthday, Albert should also know it\n",
" if len(bernard_knows) == 1:\n",
" return bernard_knows[0]\n",
" else:\n",
" return \"The solution could not be determined.\"\n",
"\n",
"# Example usage:\n",
"possible_dates = [\n",
" (\"May\", 15, \"Friday\"),\n",
" (\"May\", 16, \"Saturday\"),\n",
" (\"May\", 19, \"Tuesday\"),\n",
" (\"June\", 17, \"Thursday\"),\n",
" (\"June\", 18, \"Friday\"),\n",
" (\"July\", 14, \"Monday\"),\n",
" (\"July\", 16, \"Wednesday\"),\n",
" (\"August\", 14, \"Wednesday\"),\n",
" (\"August\", 15, \"Thursday\"),\n",
" (\"August\", 17, \"Saturday\"),\n",
"]\n",
"\n",
"solution = solve_cheryls_birthday(possible_dates)\n",
"print(f\"Cheryl's birthday is: {solution}\")"
]
},
{
"cell_type": "markdown",
"id": "9d10112a-299e-49be-96ce-cf9551564894",
"metadata": {},
"source": [
"*[This program seems promising in that it has separate variables for \"albert_knows\" and \"bernard_knows\", but it does not handle them correctly. The program introduces days of the week, which is extraneous, but not harmful. A bigger issue is that it accepts \"possible_dates\" as a parameter, but then ignores them and defines \"month_days\" based on the original dates, not on the input.]*"
]
},
{
"cell_type": "markdown",
"id": "8379b2f8-9591-4f7e-be42-e5917ce0263c",
"metadata": {},
"source": [
"# HuggingFace Chat\n",
"\n",
"*[HuggingFace is unique in suggesting a class definition, but still makes a similar mistake. Also, while it accepts a list of possible dates in the initializer, it then hard-codes certain months and days from the original problem.]*"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e0d33270-0f02-4a70-8258-4b87aab63b01",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[]\n"
]
}
],
"source": [
"class CherylBirthdayProblem:\n",
" def __init__(self, possible_dates):\n",
" self.possible_dates = possible_dates \n",
" \n",
" def filter_dates(self, month, day):\n",
" return [date for date in self.possible_dates if date[0] == month and date[1] == day]\n",
" \n",
" def filter_months(self, month):\n",
" return [date for date in self.possible_dates if date[0] == month]\n",
" \n",
" def filter_days(self, day):\n",
" return [date for date in self.possible_dates if date[1] == day]\n",
" \n",
" def solve(self):\n",
" # Albert's first statement\n",
" self.possible_dates = self.filter_months(5) + self.filter_months(6)\n",
" \n",
" # Bernard's first statement\n",
" self.possible_dates = [date for date in self.possible_dates if date[1]!= 19]\n",
" \n",
" # Albert's second statement\n",
" self.possible_dates = self.filter_days(14) + self.filter_days(15) + self.filter_days(16) + self.filter_days(17) + self.filter_days(18)\n",
" \n",
" # Bernard's second statement\n",
" self.possible_dates = [date for date in self.possible_dates if date[1]!= 15 and date[0]!= 6]\n",
" \n",
" # Albert's third statement\n",
" self.possible_dates = self.filter_months(7)\n",
" \n",
" # Bernard's third statement\n",
" self.possible_dates = self.filter_days(16)\n",
" \n",
" return self.possible_dates\n",
"\n",
"possible_dates = [(5, 14), (5, 15), (5, 16), (6, 15), (6, 16), (6, 17), (6, 18), (6, 19), \n",
" (7, 14), (7, 15), (7, 16), (7, 17), (7, 18), (8, 14), (8, 15), (8, 16), (8, 17), (8, 18), (8, 19)]\n",
"problem = CherylBirthdayProblem(possible_dates)\n",
"print(problem.solve())"
]
},
{
"cell_type": "markdown",
"id": "39904aa9-d97f-4b0c-a2d0-4731c2d4aff0",
"metadata": {},
"source": [
"# You.com Genius Mode\n",
"\n",
"*[You.com makes a similar mistake, although the end result is that it selects the wrong date rather than no date.]*"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "232deb66-b208-4b65-9703-4519dc88c528",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"('May', 19)\n"
]
}
],
"source": [
"def cheryls_birthday(possible_dates):\n",
" # Step 1: Albert knows Bernard doesn't know. So, the month can't be one with a unique day.\n",
" step1 = [date for date in possible_dates \n",
" if not any(possible_dates.count(x) == 1 \n",
" for x in [d for m, d in possible_dates if m == date[0]])]\n",
"\n",
" # Step 2: Bernard now knows. So, the day must be unique among the remaining possible dates.\n",
" step2 = [date for date in step1 if sum(date[1] == d for m, d in step1) == 1]\n",
"\n",
" # Step 3: Albert now knows. So, the month must be unique among the remaining possible dates.\n",
" step3 = [date for date in step2 if sum(date[0] == m for m, d in step2) == 1]\n",
"\n",
" return step3[0] if step3 else None\n",
"\n",
"# List of possible dates\n",
"possible_dates = [('May', 15), ('May', 16), ('May', 19), ('June', 17), ('June', 18), \n",
" ('July', 14), ('July', 16), ('August', 14), ('August', 15), ('August', 17)]\n",
"\n",
"print(cheryls_birthday(possible_dates))"
]
},
{
"cell_type": "markdown",
"id": "dad267ee-36c9-4133-ae22-a436c2024e47",
"metadata": {},
"source": [
"# Second Prompt\n",
"\n",
"I used [my program](https://github.com/norvig/pytudes/blob/main/ipynb/Cheryl-and-Eve.ipynb) to generate a new set of 10 dates that work, changed the wording, and used this as the prompt:\n",
"\n",
"___\n",
"1. **Ali and Bo are friends with Cam. Cam told them that her anniversary is one of 10 possible dates:**\n",
" - **April 17, April 18, April 28, July 16, July 17, July 19, June 16, June 29, March 18, March 19**\n",
"3. **Cam then privately tells Ali the month and Bo the day number of the anniversary.**\n",
"4. **Ali: \"I don't know when Cam’s anniversary is, and I know that Bo does not know it either.\"**\n",
"5. **Bo: \"At first I didn't know when Cam’s anniversary was, but I know now, after Ali's statement.\"**\n",
"6. **Ali: \"Then I also know when Cam’s anniversary is.\"**\n",
"7. **When is Cam’s anniversary?**\n",
"___\n",
"\n",
"\n",
"# Responses to Second Prompt\n",
"\n",
"All the LLMs were generally headed in the right direction in their reasoning, but all made mistakes. For example, Claude says \"*Bo hears the day and realizes after Ali's statement. Since Bo did not initially know the date, the day number Bo heard must appear in more than one month. Therefore, the days 16, 18, and 19 must be eliminated since they have corresponding unique months.*\" But that's just not right; they don't have unique months. \n",
"\n",
"As it turns out, [http://you.com](you.com) did get the right answer, March 18! But some of the reasoning steps were wrong, so I tested it on another set of 10 dates, and it failed on that. Thus I declare that all the LLMs fail on this problem.\n",
"\n",
"Here are the responses: \n",
"\n",
"|LLM|Answer|\n",
"|---|------|\n",
"|[A human programmer](https://github.com/norvig/)|**March 18**|\n",
"|[ChatGPT 4o](https://chatgpt.com/)|July 17|\n",
"|[Microsoft Copilot](https://copilot.microsoft.com/)|June 17|\n",
"|[Gemini Advanced](https://gemini.google.com/app)|July 16|\n",
"|[Meta AI Llama 405B](https://www.meta.ai/)|July 19|\n",
"|[Anthropic Claude 3.5 Sonnet](https://claude.ai/new)|July 17|\n",
"|[Perplexity](https://www.perplexity.ai/)|April 17|\n",
"|[Cohere Chat](https://cohere.com/chat)|July 17|\n",
"|[HuggingFace Chat](https://huggingface.co/chat/)|July 17|\n",
"|[You.com](https://you.com)|**March 18** (but wrong answer on follow-up problem)|\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.15"
}
},
"nbformat": 4,
"nbformat_minor": 5
}