diff --git a/ipynb/Cheryl-and-Eve.ipynb b/ipynb/Cheryl-and-Eve.ipynb index 199c2e6..bb35c85 100644 --- a/ipynb/Cheryl-and-Eve.ipynb +++ b/ipynb/Cheryl-and-Eve.ipynb @@ -8,30 +8,31 @@ "\n", "# When Cheryl Met Eve: A Birthday Story\n", "\n", - "The *Cheryl's Birthday* logic puzzle [made the rounds](https://www.google.com/webhp?#q=cheryl%27s+birthday),\n", - "and I wrote [code](Cheryl.ipynb) that solves it. In that notebook I said that one reason for solving the problem with code rather than pencil and paper is that you can do more with code. \n", + "The ***Cheryl's Birthday*** logic puzzle [made the rounds](https://www.google.com/webhp?#q=cheryl%27s+birthday),\n", + "and I wrote [code](Cheryl.ipynb) that solves it. In that notebook I said that one reason for solving the puzzle with code rather than pencil and paper is that you can do more with code. \n", "\n", - "**[Gabe Gaster](http://www.gabegaster.com/)** proved me right when he [tweeted](https://twitter.com/gabegaster/status/593976413314777089/photo/1) that he had extended my code to generate a new list of dates that satisfies the constraints of the puzzle:\n", + "**[Gabe Gaster](http://www.gabegaster.com/)** proved me right when he [tweeted](https://twitter.com/gabegaster/status/593976413314777089/photo/1) that he had used my code to generate a new list of dates that satisfies the constraints of the puzzle:\n", "\n", " January 15, January 4,\n", " July 13, July 24, July 30,\n", " March 13, March 24,\n", " May 11, May 17, May 30\n", "\n", - "In this notebook, I verify Gabe's result, and find some other variations on the puzzle.\n", + "In this notebook, I verify Gabe's result, and explore some new variations on the puzzle.\n", "\n", - "First, let's recap [the puzzle](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday):\n", + "First, let's recap [the original Cheryl's Birthday puzzle](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday):\n", "\n", - "> 1. 
Albert and Bernard became friends with Cheryl, and want to know when her birthday is. Cheryl gave them a list of 10 possible dates:\n", - " May 15 May 16 May 19\n", - " June 17 June 18\n", - " July 14 July 16\n", - " August 14 August 15 August 17\n", - "> 2. **Cheryl** then privately tells Albert the month and Bernard the day of her birthday.\n", - "> 3. **Albert**: \"I don't know when Cheryl's birthday is, and I know that Bernard does not know.\"\n", - "> 4. **Bernard**: \"At first I don't know when Cheryl's birthday is, but I know now.\"\n", - "> 5. **Albert**: \"Then I also know when Cheryl's birthday is.\"\n", - "> 6. So when is Cheryl's birthday?" + "- Albert and Bernard became friends with Cheryl, and want to know when her birthday is.\n", + "- Cheryl wrote down a list of 10 possible dates for all to see:\n", + " - May 15, May 16, May 19,\n", + " June 17, June 18,\n", + " July 14, July 16,\n", + " August 14, August 15, August 17\n", + "- **Cheryl** then privately tells Albert the month and Bernard the day of her birthday.\n", + "- **Albert**: \"I don't know when Cheryl's birthday is, and I know that Bernard does not know.\"\n", + "- **Bernard**: \"At first I don't know when Cheryl's birthday is, but I know now.\"\n", + "- **Albert**: \"Then I also know when Cheryl's birthday is.\"\n", + "- So when is Cheryl's birthday?" ] }, { @@ -45,22 +46,21 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This is a slight modification of my [previous code](Cheryl.ipynb), and I'll give a slight modification of the explanation. The puzzle concerns these concepts:\n", + "This is a slight modification of my [previous code](Cheryl.ipynb). 
The puzzle concerns these key concepts:\n", "\n", "- **Possible dates** that might be Cheryl's birthday.\n", "- **Knowing** which dates are still possible; knowing for sure when only one is possible.\n", "- **Telling** Albert and Bernard specific facts about the birthday.\n", - "- **Statements** about knowledge.\n", - "- **Hearing** the statements about knowledge.\n", + "- **Statements** made by Albert or Bernard about their knowledge of the birthday.\n", "\n", "I implement them as follows:\n", - "- `dates` is a set of all possible dates (each date is a string); we also consider subsets of `dates`.\n", + "- The global variable `dates` is a set of all possible dates (each date is a string).\n", "- `know(possible_dates)` is a function that returns `True` when there is only one possible date.\n", - "- `told(part)` is a function that returns the set of possible dates after Cheryl tells a part (month or day).\n", - "- *`statement`*`(date)` returns true if the statement is true given that `date` is Cheryl's birthday.\n", - "- `satisfy(possible_dates, statement,...)` returns a subset of possible_dates that are still possible after hearing the statements.\n", + "- `told(part)` is a function that returns the set of possible dates that remain after Cheryl tells a part (month or day).\n", + "- A statement is a function; *statement*`(date)` returns true if the statement is true given that `date` is Cheryl's birthday.\n", + "- `satisfy(possible_dates, statement,...)` returns a subset of `possible_dates` for which all the statements are true.\n", "\n", - "In the [previous code](Cheryl.ipynb) I treated `dates` as a constant, but in this version the whole point is exploring different possible sets of dates, so now `dates` is a global variable, and the function `set_dates` is used to set the value of the global variable." + "In the [previous code](Cheryl.ipynb) I treated `dates` as a constant, but in this version the whole point is exploring different sets of possible dates. 
The easiest way to refactor the code was to make `dates` a global variable, and provide the function `update_dates` to set the value of the global variable. (It would be cleaner to package the dates into a non-global object, but it would be a big change to the code to inject this all the way down to the function `told`, where it is needed.)" ] }, { @@ -72,13 +72,12 @@ "# Albert and Bernard just became friends with Cheryl, and they want to know when her birthday is. \n", "# Cheryl gave them a list of 10 possible dates:\n", "\n", - "dates = ['May 15', 'May 16', 'May 19',\n", + "dates = {'May 15', 'May 16', 'May 19',\n", " 'June 17', 'June 18',\n", " 'July 14', 'July 16',\n", - " 'August 14', 'August 15', 'August 17']\n", + " 'August 14', 'August 15', 'August 17'}\n", "\n", "def month(date): return date.split()[0]\n", - "\n", "def day(date): return date.split()[1]\n", "\n", "# Cheryl then tells Albert and Bernard separately \n", @@ -121,44 +120,41 @@ " \n", "# So when is Cheryl's birthday?\n", "\n", - "def cheryls_birthday(dates) -> BeliefState:\n", - " \"\"\"Return a subset of the global `dates` for which all three statements are true.\"\"\"\n", - " return satisfy(set_dates(dates), albert1, bernard1, albert2)\n", + "def cheryls_birthday(possible_dates) -> BeliefState:\n", + " \"\"\"Return a subset of the dates for which all three statements are true.\"\"\"\n", + " return satisfy(update_dates(possible_dates), albert1, bernard1, albert2)\n", "\n", - "def set_dates(new_dates):\n", - " \"\"\"Set the value of the global `dates` to `new_dates`\"\"\"\n", + "def update_dates(possible_dates) -> BeliefState:\n", + " \"\"\"Set the value of the global `dates` to `possible_dates`.\"\"\"\n", " global dates\n", - " dates = new_dates\n", - " return dates\n", - "\n", - "# Some tests\n", - "\n", - "assert month('May 19') == 'May'\n", - "assert day('May 19') == '19'\n", - "assert albert1('May 19') == False\n", - "assert albert1('July 14') == True\n", - "assert know(told('17')) == 
False\n", - "assert know(told('19')) == True" + " dates = possible_dates\n", + " return dates" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{'July 16'}" - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "cheryls_birthday(dates)" + "# Some tests\n", + "\n", + "assert month('May 19') == 'May'\n", + "assert day('May 19') == '19'\n", + "assert albert1('May 19') == False\n", + "assert albert1('July 14') == True\n", + "assert know(told('16')) == False\n", + "assert know(told('19')) == True\n", + "assert know(told('May')) == False" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Below we trace through how this works.\n", + "\n", + "First Albert says that he doesn't know, and that Bernard doesn't either. So the possible remaining dates are:" ] }, { @@ -181,6 +177,13 @@ "satisfy(dates, albert1)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Bernard says he initially didn't know, but now he does. He knows, but we the puzzle-solvers don't. 
The remaining possible dates for us are:" + ] + }, { "cell_type": "code", "execution_count": 4, @@ -201,6 +204,13 @@ "satisfy(dates, albert1, bernard1)" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now Albert knows, and so do we:" + ] + }, { "cell_type": "code", "execution_count": 5, @@ -221,6 +231,26 @@ "satisfy(dates, albert1, bernard1, albert2)" ] }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'July 16'}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cheryls_birthday(dates)" + ] + }, { "cell_type": "markdown", "metadata": {}, @@ -237,7 +267,7 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "metadata": {}, "outputs": [], "source": [ @@ -252,12 +282,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can verify that they do indeed make the puzzle work, giving a single known birthdate:" + "We can verify that they do indeed make the puzzle work:" ] }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 8, "metadata": {}, "outputs": [ { @@ -266,7 +296,7 @@ "{'July 30'}" ] }, - "execution_count": 7, + "execution_count": 8, "metadata": {}, "output_type": "execute_result" } @@ -286,26 +316,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If Gabe can do it, we can do it! Our strategy will be to repeatedly pick a random sample of dates, and check if they solve the puzzle. 
We'll limit ourselves to a subset of dates (not all 366) to make it more likely that a random selection will have multiple dates with the same month and day (otherwise Albert and Bernard would know right away):" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [], - "source": [ - "many_dates = {mo + ' ' + d1 + d2\n", - " for mo in ('March', 'April', 'May', 'June', 'July')\n", - " for d1 in '12'\n", - " for d2 in '3456789'}" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we need to cycle through random samples of these possible dates until we hit one that works. I anticipate wanting to solve other puzzles besides the original `cheryls_birthday`, so I'll make the `puzzle` be a parameter of the function `pick_dates`. Note that `pick_dates` returns two things: the one date that is the solution (the birthday), and the `k` (default 10) dates that form the puzzle." + "If Gabe can do it, we can do it! Our strategy will be to repeatedly pick a random sample of dates, and check if they solve the puzzle. 
We'll limit ourselves to a subset of dates (not all 366) to make it more likely that a random selection will have multiple dates with the same month and day (otherwise Albert and/or Bernard would know right away):" ] }, { @@ -314,45 +325,34 @@ "metadata": {}, "outputs": [], "source": [ - "import random\n", - "\n", - "def pick_dates(puzzle=cheryls_birthday, k=10):\n", - " \"Pick a set of `k` dates for which the `puzzle` has a unique solution.\"\n", - " while True:\n", - " random_dates = random.sample(many_dates, k)\n", - " solutions = puzzle(random_dates)\n", - " if know(solutions):\n", - " return solutions.pop(), random_dates" + "many_dates = {mo + ' ' + d1 + d2\n", + " for mo in {'April', 'August', 'July', 'June', 'March', 'May'}\n", + " for d1 in '12'\n", + " for d2 in '3456789'}" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we need to cycle through random samples of these possible dates until we hit one that works. I anticipate wanting to solve other puzzles besides the original `cheryls_birthday`, so I'll define the function `pick_dates` to take a parameter, `puzzle`:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "('May 27',\n", - " ['May 13',\n", - " 'April 23',\n", - " 'May 27',\n", - " 'April 18',\n", - " 'July 14',\n", - " 'May 23',\n", - " 'July 27',\n", - " 'March 18',\n", - " 'June 19',\n", - " 'March 13'])" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], + "outputs": [], "source": [ - "pick_dates()" + "import random\n", + "\n", + "def pick_dates(puzzle, k=10) -> BeliefState:\n", + " \"\"\"Pick a set of `k` dates for which the `puzzle` has a unique solution.\"\"\"\n", + " while True:\n", + " random_dates = random.sample(sorted(many_dates), k)\n", + " solutions = puzzle(random_dates)\n", + " if know(solutions):\n", + " return set(random_dates)" ] }, { @@ -363,8 +363,16 @@ { "data": { "text/plain": [ - "('July 
24',\n", - " ['July 25', 'July 24', 'March 25', 'July 28', 'April 24', 'March 28'])" + "{'April 18',\n", + " 'August 19',\n", + " 'August 28',\n", + " 'July 15',\n", + " 'June 14',\n", + " 'June 19',\n", + " 'June 28',\n", + " 'May 14',\n", + " 'May 19',\n", + " 'May 24'}" ] }, "execution_count": 11, @@ -373,7 +381,7 @@ } ], "source": [ - "pick_dates(k=6)" + "pick_dates(cheryls_birthday)" ] }, { @@ -384,19 +392,7 @@ { "data": { "text/plain": [ - "('May 25',\n", - " ['July 28',\n", - " 'March 19',\n", - " 'July 29',\n", - " 'July 13',\n", - " 'April 27',\n", - " 'June 27',\n", - " 'May 25',\n", - " 'April 28',\n", - " 'April 18',\n", - " 'July 25',\n", - " 'June 15',\n", - " 'May 18'])" + "{'July 16', 'July 18', 'July 29', 'March 16', 'March 18', 'May 29'}" ] }, "execution_count": 12, @@ -405,42 +401,74 @@ } ], "source": [ - "pick_dates(k=12)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Great! We can make a new puzzle, just like Gabe. But how often do we get a unique solution to the puzzle (that is, the puzzle returns a set of size 1)? How often do we get a solution where Albert and Bernard know, but we the puzzle solver doesn't (that is, a set of size greater than 1)? How often is there no solution (size 0)? 
Let's make a Counter of the number of times each length-of-solution occurs:" + "pick_dates(cheryls_birthday, k=6)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, - "outputs": [], + "outputs": [ + { + "data": { + "text/plain": [ + "{'April 23',\n", + " 'April 24',\n", + " 'April 29',\n", + " 'August 19',\n", + " 'August 24',\n", + " 'August 29',\n", + " 'June 13',\n", + " 'June 23',\n", + " 'March 14',\n", + " 'March 15',\n", + " 'May 13',\n", + " 'May 27'}" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ - "from collections import Counter\n", - "\n", - "def solution_lengths(puzzle=cheryls_birthday, N=10000, k=10, many_dates=many_dates):\n", - " \"Try N random samples and count how often each possible length-of-puzzle-solution appears.\"\n", - " return Counter(len(puzzle(random.sample(many_dates, k)))\n", - " for _ in range(N))" + "pick_dates(cheryls_birthday, k=12)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Great! We can make a new puzzle, just like Gabe. But how often do we get a unique solution to the puzzle (that is, the puzzle returns a set of size 1)? How often do we get a solution where Albert and Bernard know, but we the puzzle solvers don't (that is, a belief set of size greater than 1)? How often is there no solution (size 0)? 
Let's make a Counter of the number of times each length-of-solution occurs:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, + "outputs": [], + "source": [ + "from collections import Counter\n", + "\n", + "def solution_lengths(puzzle, N=10000, k=10, many_dates=many_dates):\n", + " \"\"\"Try N random samples of k dates and count how often each possible \n", + " length-of-puzzle-solution appears.\"\"\"\n", + " return Counter(len(puzzle(random.sample(sorted(many_dates), k)))\n", + " for _ in range(N))" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Counter({0: 9513, 1: 210, 2: 276, 3: 1})" + "Counter({0: 9414, 2: 380, 1: 206})" ] }, - "execution_count": 14, + "execution_count": 15, "metadata": {}, "output_type": "execute_result" } @@ -453,31 +481,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This says that about 2% of the time we get a unique solution (a set of `len` 1). With similar frequency we get an ambiguous solution (with 2 or more possible birth dates). And about 95% of the time, the sample of dates leads to no solution dates.\n", + "This says that about 2% of the time we get a unique solution (a set of length 1). More often than that we get an ambiguous solution (with 2 or more possible birth dates), but about 95% of the time the sample of dates has no solution (a set of length 0).\n", "\n", "What happens if Cheryl changes the number of possible dates?" 
] }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Counter({0: 9971, 2: 18, 1: 11})" - ] - }, - "execution_count": 15, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "solution_lengths(cheryls_birthday, k=6)" - ] - }, { "cell_type": "code", "execution_count": 16, @@ -486,7 +494,7 @@ { "data": { "text/plain": [ - "Counter({0: 9020, 2: 467, 1: 503, 3: 10})" + "Counter({0: 9987, 1: 3, 2: 10})" ] }, "execution_count": 16, @@ -494,6 +502,26 @@ "output_type": "execute_result" } ], + "source": [ + "solution_lengths(cheryls_birthday, k=6)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Counter({0: 8821, 2: 632, 1: 537, 3: 10})" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], "source": [ "solution_lengths(cheryls_birthday, k=12)" ] @@ -502,7 +530,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "It is really hard (but not impossible) to find a set of 6 dates that work for the puzzle, and much easier to find a solution with 12 dates." + "It is hard (but not impossible) to find a set of 6 dates that work for the puzzle, and much easier to find a solution with 12 dates." ] }, { @@ -516,74 +544,64 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now let's see if we can create a more complicated puzzle. We'll introduce a new character, Eve, give her a statement, and alter the rest of the puzzle slightly:\n", + "Now let's see if we can create a more complicated puzzle. We'll introduce a new character, Eve, and keep the same puzzle as before, except that after Albert's second statement, Eve makes this statement:\n", "\n", - "> 1. Albert and Bernard just became friends with Cheryl, and they want to know when her birthday is. Cheryl wrote down a list of 10 possible dates for all to see.\n", - "> 2. 
**Cheryl** then writes down the month and shows it just to Albert, and also writes down the day and shows it just to Bernard.\n", - "> 3. **Albert**: I don't know when Cheryl's birthday is, but I know that Bernard does not know either.\n", - "> 4. **Bernard**: At first I didn't know when Cheryl's birthday is, but I know now.\n", - "> 5. **Albert**: Then I also know when Cheryl's birthday is.\n", - "> 6. **Eve**: Hi, Everybody. My name is Eve and I'm an evesdropper. It's what I do! I peeked and saw the first letter of the month and the first digit of the day. When I peeked, I didn't know Cheryl's birthday, but after listening to Albert and Bernard I do. And it's a good thing I peeked, because otherwise I couldn't have\n", - "figured it out.\n", - "> 7. So when is Cheryl's birthday?\n", + "- **Eve**: \"Hi, Everybody. My name is Eve and I'm an evesdropper. It's what I do! I peeked and saw the first letter of the month and the first digit of the day. When I peeked, I didn't know Cheryl's birthday, but after listening to Albert and Bernard I do. And it's a good thing I peeked, because otherwise I couldn't have\n", + "figured it out.\"\n", "\n", "We can easily code this up:" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "def cheryls_birthday_with_eve(dates):\n", " \"Return a set of the dates for which Albert, Bernard, and Eve's statements are true.\"\n", - " return satisfy(set_dates(dates), albert1, bernard1, albert2, eve1)\n", + " return satisfy(update_dates(dates), albert1, bernard1, albert2, eve1)\n", "\n", "def eve1(date):\n", " \"\"\"Eve: I peeked and saw the first letter of the month and the first digit of the day. \n", " When I peeked, I didn't know Cheryl's birthday, but after listening to Albert and Bernard \n", " I do. 
And it's a good thing I peeked, because otherwise I couldn't have figured it out.\"\"\"\n", - " at_first = told(first(day(date))) & told(first(month(date)))\n", - " otherwise = told('')\n", + " at_first = told(day(date)[0]) & told(month(date)[0])\n", " return (not know(at_first) and\n", " know(satisfy(at_first, albert1, bernard1, albert2)) and\n", - " not know(satisfy(otherwise, albert1, bernard1, albert2)))\n", - "\n", - "def first(seq): return seq[0]" + " not know(satisfy(dates, albert1, bernard1, albert2)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "*Note*: I admit I \"cheated\" a bit here. Remember that the function `told` tests for `(part in date)`. For that to work for Eve, we have to make sure that the first letter is distinct from any other character in the date (it is—because only the first letter is uppercase) and that the first digit is distinct from any other character (it is—because in `many_dates` I carefully made sure that the first digit is always 1 or 2, and the second digit is never 1 or 2). Also note that `told('')` denotes the hypothetical situation where Cheryl \"told\" Eve nothing.\n", + "*Note*: I admit I \"cheated\" a bit here. Remember that the function `told` tests for `(part in date)`. For that to work for Eve, we have to make sure that the first letter is distinct from any other character in the date (it is—because only the first letter is uppercase) and that the first digit is distinct from any other character (it is—because in `many_dates` I carefully made sure that the first digit is always 1 or 2, and the second digit is never 1 or 2). \n", "\n", "I have no idea if it is possible to find a set of dates that works for this puzzle. 
But I can try:" ] }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "('March 18',\n", - " ['April 25',\n", - " 'March 18',\n", - " 'March 16',\n", - " 'April 27',\n", - " 'May 29',\n", - " 'July 28',\n", - " 'May 24',\n", - " 'July 16',\n", - " 'May 28',\n", - " 'April 18'])" + "{'April 26',\n", + " 'August 25',\n", + " 'June 15',\n", + " 'June 19',\n", + " 'June 23',\n", + " 'June 29',\n", + " 'March 13',\n", + " 'March 23',\n", + " 'May 13',\n", + " 'May 19'}" ] }, - "execution_count": 18, + "execution_count": 19, "metadata": {}, "output_type": "execute_result" } @@ -601,16 +619,16 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Counter({0: 9729, 1: 143, 2: 128})" + "Counter({0: 9708, 1: 138, 2: 154})" ] }, - "execution_count": 19, + "execution_count": 20, "metadata": {}, "output_type": "execute_result" } @@ -623,7 +641,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "About half as often as for the original puzzle." + "A solution (a set of length 1) occurs a bit less often than with the original puzzle." ] }, { @@ -639,28 +657,28 @@ "source": [ "Let's make the puzzle even more complicated by making Albert wait one more time before he finally knows:\n", "\n", - "> 1. Albert and Bernard just became friends with Cheryl, and they want to know when her birtxhday is. Cheryl wrote down a list of 10 possible dates for all to see.\n", - "> 2. **Cheryl** then writes down the month and shows it just to Albert, and also writes down the day and shows it just to Bernard.\n", - "> 3. **Albert**: I don't know when Cheryl's birthday is, but I know that Bernard does not know either. \n", - "> 4. **Bernard**: At first I didn't know when Cheryl's birthday is, but I know now.\n", - "> 5. **Albert**: I still don't know.\n", - "> 6. **Eve**: Hi, Everybody. My name is Eve and I'm an evesdropper. 
It's what I do! I peeked and saw the first letter of the month and the first digit of the day. When I peeked, I didn't know Cheryl's birthday, but after listening to Albert and Bernard I do. And it's a good thing I peeked, because otherwise I couldn't have\n", + "- Albert and Bernard just became friends with Cheryl, and they want to know when her birthday is. Cheryl wrote down a list of 10 possible dates for all to see.\n", + "- **Cheryl** then writes down the month and shows it just to Albert, and also writes down the day and shows it just to Bernard.\n", + "- **Albert**: I don't know when Cheryl's birthday is, but I know that Bernard does not know either. \n", + "- **Bernard**: At first I didn't know when Cheryl's birthday is, but I know now.\n", + "- **Albert**: I still don't know.\n", + "- **Eve**: Hi, Everybody. My name is Eve and I'm an evesdropper. It's what I do! I peeked and saw the first letter of the month and the first digit of the day. When I peeked, I didn't know Cheryl's birthday, but after listening to Albert and Bernard I do. And it's a good thing I peeked, because otherwise I couldn't have\n", "figured it out.\n", - "> 7. **Albert**: OK, now I know.\n", - "> 8. So when is Cheryl's birthday?\n", + "- **Albert**: OK, now I know.\n", + "- So when is Cheryl's birthday?\n", "\n", - "Let's be careful in coding this up; Albert's second statement is different; he has a new third statement; and Eve's statement uses the same words, but it now implicitly refers to a different statement by Albert. We'll use the names `albert2c`, `eve1c`, and `albert3c` (`c` for \"complex\") to represent the new statements:" + "Albert's second statement is different; he has a new third statement; and Eve's statement uses the same words, but it now implicitly refers to a different statement by Albert. 
We'll use the names `albert2c`, `albert3c`, and `eve1c` (`c` for \"complex\") to represent the new statements:" ] }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "def cheryls_birthday_complex(dates):\n", " \"Return a set of the dates for which Albert, Bernard, and Eve's statements are true.\"\n", - " return satisfy(set_dates(dates), albert1, bernard1, albert2c, eve1c, albert3c)\n", + " return satisfy(update_dates(dates), albert1, bernard1, albert2c, eve1c, albert3c)\n", "\n", "def albert2c(date):\n", " \"Albert: I still don't know.\"\n", @@ -670,11 +688,10 @@ " \"\"\"Eve: I peeked and saw the first letter of the month and the first digit of the day. \n", " When I peeked, I didn't know Cheryl's birthday, but after listening to Albert and Bernard \n", " I do. And it's a good thing I peeked, because otherwise I couldn't have figured it out.\"\"\"\n", - " at_first = told(first(day(date))) & told(first(month(date)))\n", - " otherwise = told('')\n", + " at_first = told(day(date)[0]) & told(month(date)[0])\n", " return (not know(at_first)\n", " and know(satisfy(at_first, albert1, bernard1, albert2c)) and\n", - " not know(satisfy(otherwise, albert1, bernard1, albert2c)))\n", + " not know(satisfy(dates, albert1, bernard1, albert2c)))\n", "\n", "def albert3c(date):\n", " \"Albert: OK, now I know.\"\n", @@ -690,26 +707,25 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "('March 29',\n", - " ['June 16',\n", - " 'March 13',\n", - " 'March 29',\n", - " 'May 25',\n", - " 'June 13',\n", - " 'April 23',\n", - " 'April 14',\n", - " 'June 29',\n", - " 'March 14',\n", - " 'June 27'])" + "{'April 16',\n", + " 'April 28',\n", + " 'August 13',\n", + " 'August 18',\n", + " 'August 23',\n", + " 'August 26',\n", + " 'August 27',\n", + " 'July 16',\n", + " 'July 27',\n", + " 'June 29'}" ] }, - "execution_count": 21, + 
"execution_count": 22, "metadata": {}, "output_type": "execute_result" } @@ -727,16 +743,16 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "Counter({0: 9408, 1: 591, 2: 1})" + "Counter({0: 9207, 1: 790, 2: 3})" ] }, - "execution_count": 22, + "execution_count": 23, "metadata": {}, "output_type": "execute_result" } @@ -749,7 +765,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Interesting. It was actually easier to find dates that work for this story than for either of the other stories." + "Interesting. It was actually easier to find dates that work for this story than for any of the other stories." ] }, { @@ -763,16 +779,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Now we will go through a solution step-by-step. We'll use a set of dates selected in a previous run:" + "Now we will go through a solution step-by-step. We'll use these dates:" ] }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 24, "metadata": {}, "outputs": [], "source": [ - "previous_run_dates = {\n", + "complex_dates = {\n", " 'April 28',\n", " 'July 27',\n", " 'June 19',\n", @@ -794,7 +810,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 25, "metadata": {}, "outputs": [ { @@ -803,13 +819,13 @@ "{'July 27'}" ] }, - "execution_count": 24, + "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "cheryls_birthday_complex(previous_run_dates)" + "cheryls_birthday_complex(complex_dates)" ] }, { @@ -821,7 +837,7 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 26, "metadata": {}, "outputs": [ { @@ -830,7 +846,7 @@ "{'July 15', 'July 16', 'July 27'}" ] }, - "execution_count": 25, + "execution_count": 26, "metadata": {}, "output_type": "execute_result" } @@ -848,7 +864,7 @@ }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 27, "metadata": {}, "outputs": [ { 
@@ -857,7 +873,7 @@ "True" ] }, - "execution_count": 26, + "execution_count": 27, "metadata": {}, "output_type": "execute_result" } @@ -870,12 +886,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next, Bernard is told the day:" + "Meanwhile, Bernard is told the day:" ] }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 28, "metadata": {}, "outputs": [ { @@ -884,7 +900,7 @@ "{'July 27', 'May 27'}" ] }, - "execution_count": 27, + "execution_count": 28, "metadata": {}, "output_type": "execute_result" } @@ -902,7 +918,7 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 29, "metadata": {}, "outputs": [ { @@ -911,7 +927,7 @@ "{'July 27'}" ] }, - "execution_count": 28, + "execution_count": 29, "metadata": {}, "output_type": "execute_result" } @@ -929,7 +945,7 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 30, "metadata": {}, "outputs": [ { @@ -938,7 +954,7 @@ "{'July 15', 'July 16', 'July 27'}" ] }, - "execution_count": 29, + "execution_count": 30, "metadata": {}, "output_type": "execute_result" } @@ -956,7 +972,7 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 31, "metadata": {}, "outputs": [ { @@ -965,7 +981,7 @@ "{'July 27', 'June 29'}" ] }, - "execution_count": 30, + "execution_count": 31, "metadata": {}, "output_type": "execute_result" } @@ -978,12 +994,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Two dates, so Eve doesn't know yet. But only one of the dates works after hearing the three statements made by Albert and Bernard:" + "Two dates, so Eve doesn't know after evesdropping. 
But only one of the dates works after hearing the three statements made by Albert and Bernard:" ] }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 32, "metadata": {}, "outputs": [ { @@ -992,7 +1008,7 @@ "{'July 27'}" ] }, - "execution_count": 31, + "execution_count": 32, "metadata": {}, "output_type": "execute_result" } @@ -1005,12 +1021,12 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "But Eve wouldn't have known if she had been told nothing:" + "But Eve wouldn't have known if she hadn't been told anything:" ] }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 33, "metadata": {}, "outputs": [ { @@ -1019,13 +1035,13 @@ "{'July 15', 'July 16', 'July 27'}" ] }, - "execution_count": 32, + "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "satisfy(told(''), albert1, bernard1, albert2c)" + "satisfy(dates, albert1, bernard1, albert2c)" ] }, { @@ -1037,7 +1053,7 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 34, "metadata": {}, "outputs": [ { @@ -1046,139 +1062,13 @@ "{'July 27'}" ] }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "satisfy(told('July'), eve1c)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Three Children\n", - "\n", - "Here's another puzzle:\n", - "\n", - "> 1. A parent has the following conversation with a friend:\n", - "> 2. **Parent:** the product of my three childrens' ages is 36.\n", - "> 3. **Friend**: I don't know their ages.\n", - "> 4. **Parent**: The sum of their ages is the same as the number of people in this room.\n", - "> 5. **Friend**: I still don't know their ages.\n", - "> 6. **Parent**: The oldest one likes bananas.\n", - "> 7. **Friend**: Now I know their ages.\n", - "\n", - "Let's follow the same methodology to solve this puzzle. 
Except this time, we're not dealing with sets of possible dates, we're dealing with set of possible *states* of the world. We'll define a state as a tuple of 4 numbers: the ages of the three children (in increasing order), and the number of people in the room. \n", - "\n", - "Note: We'll limit the children's ages to be below 30 and the number of people in the room to be below 90. Also, in `friend2` and `friend3` we'll compute the `possible_states` and cache them, since the computation does not depend on the `date`." - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{(2, 2, 9, 13)}" - ] - }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "N = 30\n", - "states = {(a, b, c, n) \n", - " for a in range(1, N)\n", - " for b in range(a, N)\n", - " for c in range(b, N) if a * b * c == 36\n", - " for n in range(2, 90)}\n", - "\n", - "def ages(state): return state[:-1]\n", - "def room(state): return state[-1]\n", - "\n", - "def parent1(state): \n", - " \"\"\"The product of my three childrens' ages is 36.\"\"\"\n", - " a, b, c = ages(state)\n", - " return a * b * c == 36\n", - "\n", - "def friend1(state): \n", - " \"\"\"I don't know their ages.\"\"\"\n", - " possible_ages = {ages(s) for s in satisfy(states, parent1)}\n", - " return not know(possible_ages)\n", - "\n", - "def parent2(state):\n", - " \"\"\"The sum of their ages is the same as the number of people in this room.\"\"\"\n", - " return sum(ages(state)) == room(state)\n", - "\n", - "def friend2(state, possible_states=satisfy(states, parent1, friend1, parent2)): \n", - " \"\"\"I still don't know their ages.\"\"\"\n", - " # Given there are room(state) people in the room, I still don't know the ages.\n", - " possible_ages = {ages(s) for s in possible_states if room(s) == room(state)}\n", - " return not know(possible_ages)\n", - "\n", - "def parent3(state):\n", - " \"\"\"The oldest one likes 
bananas.\"\"\"\n", - " # I.e., there is an oldest one (and not twins of the same age)\n", - " a, b, c = ages(state)\n", - " return c > b\n", - "\n", - "def friend3(state, possible_states=satisfy(states, parent1, friend1, parent2, friend2, parent3)): \n", - " \"Now I know their ages.\"\n", - " possible_ages = {ages(s) for s in possible_states}\n", - " return know(possible_ages)\n", - "\n", - "def child_age_puzzle(states):\n", - " return satisfy(states, parent1, friend1, parent2, friend2, parent3, friend3)\n", - "\n", - "child_age_puzzle(states)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The tricky part of this puzzle comes after the `parent2` statement:" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "{(1, 2, 18, 21),\n", - " (1, 3, 12, 16),\n", - " (1, 4, 9, 14),\n", - " (1, 6, 6, 13),\n", - " (2, 2, 9, 13),\n", - " (2, 3, 6, 11),\n", - " (3, 3, 4, 10)}" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "satisfy(states, parent1, friend1, parent2)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We see that out of these 7 possibilities, if the number of people in the room (the last number in each tuple) \n", - "were anything other than 13, then the friend (who can observe the number of people in the room) would know the ages. Since the `friend2` statement professes continued ignorance, it must be that the number of people in the room is 13. Then the `parent3` statement makes it clear that there can't be 6-year-old twins as the oldest children; it must be 2-year-old twins with an oldest age 9." 
+ "satisfy(told('July'), eve1c)" ] }, { @@ -1192,7 +1082,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If you like, there are many other directions you could take this:\n", + "There are many other directions you could take this:\n", "\n", "- Could you create a puzzle that goes one or two rounds more before everyone knows?\n", "- Could you add new characters: Faith, and then George, and maybe even a new Hope?\n", @@ -1207,7 +1097,7 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -1221,9 +1111,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.7.7" + "version": "3.8.15" } }, "nbformat": 4, - "nbformat_minor": 1 + "nbformat_minor": 4 } diff --git a/ipynb/CherylMind.ipynb b/ipynb/CherylMind.ipynb index 4fd561f..3508070 100644 --- a/ipynb/CherylMind.ipynb +++ b/ipynb/CherylMind.ipynb @@ -9,13 +9,27 @@ "\n", "# LLMs, Theory of Mind, and Cheryl's Birthday\n", "\n", - "There has been [much](https://spectrum.ieee.org/theory-of-mind-ai) [debate](https://aclanthology.org/2023.conll-1.25/) [on](https://www.gsb.stanford.edu/faculty-research/working-papers/theory-mind-may-have-spontaneously-emerged-large-language-models) [the](https://arxiv.org/abs/2302.02083) [degree](https://www.nature.com/articles/s41562-024-01882-z) to which Large Language Models (LLMs) have a theory of mind: a way of understanding what other people know and don't know. In this notebook I explore one small part of the issue by asking six LLMs to solve the [Cheryl's Birthday Problem](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday), a well-known logic puzzle in which different characters have different states of knowledge. 
\n", + "There has been [much](https://spectrum.ieee.org/theory-of-mind-ai) [debate](https://aclanthology.org/2023.conll-1.25/) [on](https://www.gsb.stanford.edu/faculty-research/working-papers/theory-mind-may-have-spontaneously-emerged-large-language-models) [the](https://arxiv.org/abs/2302.02083) [degree](https://www.nature.com/articles/s41562-024-01882-z) to which Large Language Models (LLMs) have a theory of mind: a way of understanding what other people know and don't know. In this notebook I explore one small part of the issue by asking nine LLM chatbots to solve the [Cheryl's Birthday Problem](https://en.wikipedia.org/wiki/Cheryl%27s_Birthday), a well-known logic puzzle in which different characters have different states of knowledge at different times.\n", "\n", - "**TLDR**: The six LLMs were all familiar with the problem, so I didn't have to describe it in the prompt, just name it. They were able to correctly state that the answer to the problem is July 16. But none of them were able to write a program that finds the solution. They all failed to distinguish the different knowledge states of the different characters–for this problem they had no theory of mind.\n", + "I asked the following ten solvers to tackle the Cheryl's Birthday problem:\n", + "- [A human programmer](https://github.com/norvig/)\n", + "- [ChatGPT 4o](https://chatgpt.com/)\n", + "- [Microsoft Copilot](https://copilot.microsoft.com/)\n", + "- [Gemini Advanced](https://gemini.google.com/app)\n", + "- [Meta AI Llama 405B](https://www.meta.ai/)\n", + "- [Anthropic Claude 3.5 Sonnet](https://claude.ai/new)\n", + "- [Perplexity](https://www.perplexity.ai/)\n", + "- [Cohere Chat](https://cohere.com/chat)\n", + "- [HuggingFace Chat](https://huggingface.co/chat/)\n", + "- [You.com](https://you.com/)\n", "\n", - "Below I show the response for each LLM. I elided some of the explanatory output when it was very similar to previous output, but I show the programs verbatim. 
My comments are in *[bracketed italics]* and the LLM output is in regular upright text. The queries were made on Sept 25, 2024; subsequent updates of the models may perform differently.\n", + "# TLDR: Conclusion\n", "\n", - "For each LLMs the prompt was:\n", + "The LLMs were all familiar with the problem, so I didn't have to describe it in the prompt, just name it. Most of them correctly recalled the answer to the problem: July 16. But none of them were able to write a program that finds the solution. They all failed to distinguish the different knowledge states of the different characters over time. At least with respect to this problem, they had no theory of mind. (Perhaps that is in part due to the fact that very few of the Python programs they were trained on deal with theory of mind.)\n", + "\n", + "Below I show the response for each LLM. Each one provided explanatory output along with a program; for brevity I only show the explanatory output from the first one, ChatGPT 4o. My comments are in *[bracketed italics]*. The queries were made on Sept 25, 2024; subsequent updates of the models may perform differently.\n", + "\n", + "For each LLM the prompt was:\n", "\n", "***What is the answer to the \"Cheryl's Birthday\" problem? Write a Python program to solve it. Make sure that the program will still work if the list of possible dates is changed.***" ] @@ -27,7 +41,7 @@ "source": [ "# Human\n", "\n", - "An actual human, me, was able to write a [**program**](https://github.com/norvig/pytudes/blob/main/ipynb/Cheryl-and-Eve.ipynb) that correctly solves the original problem and also handles new sets of dates." + "An actual human (me) was able to write a program, shown in [**another notebook**](https://github.com/norvig/pytudes/blob/main/ipynb/Cheryl-and-Eve.ipynb), that correctly solves the original problem and also handles new sets of dates, and other variations on the problem. 
I introduced the idea of a *BeliefState*, a set of possible dates that a person believes might possibly be the birthday, and I modeled a character's *statement* as a function that takes a particular date as input, and returns true if the date is consistent with the statement." ] }, { @@ -78,14 +92,14 @@ "- Step 2: Based on Albert's statement, we eliminate the months of May and June.\n", "- Step 3: Based on Bernard's statement, we eliminate dates where the day appears more than once (i.e., both July 14 and August 14 are eliminated).\n", "- Step 4: Albert now knows Cheryl's birthday, so it must be in the month where only one date is left (i.e., July).\n", - "\n", + "- \n", "The program is designed to work with any given list of possible dates, not just the ones in the original problem." ] }, { "cell_type": "code", "execution_count": 1, - "id": "8a1ca966-d8aa-4bc5-bbe3-253b3b62d538", + "id": "03ff513c-d390-413e-b524-9c82de44e1a6", "metadata": {}, "outputs": [ { @@ -149,7 +163,59 @@ "id": "cb513413-f71d-476d-a6bc-211ab1589bbe", "metadata": {}, "source": [ - "*[We see that the program fails to find a solution. It fails because it has a single variable for possible dates, when it should be keeping track of separate possible dates for Albert and Bernard.]* " + "*[We see that the program fails to find a solution. It starts with the correct list of \"possible_dates\", and correctly updates the list twice to reflect Albert's and Bernard's first statements. So far so good. But then things go wrong: first it defines \"remaining_months\" and never uses it. Then it tries to find a possible date in a unique month, but it should be a unique month with respect to what is now known, not with respect to the original possible dates stored in the \"month_map\". 
The program has failed to track the two character's different belief states over time.]* " + ] + }, + { + "cell_type": "markdown", + "id": "93852dab-d167-4d14-8cc6-02979427eb01", + "metadata": {}, + "source": [ + "# Microsoft Copilot\n", + "\n", + "*[Copilot makes a similar mistake.*]" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "5d7e4408-3663-4f54-acdb-4bddf1ac9121", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cheryl's birthday is: None\n" + ] + } + ], + "source": [ + "def cheryls_birthday(possible_dates):\n", + " # Step 1: Filter out months with unique days\n", + " unique_days = {date.split()[1] for date in possible_dates if sum(d.split()[1] == date.split()[1] for d in possible_dates) == 1}\n", + " possible_dates = [date for date in possible_dates if date.split()[1] not in unique_days]\n", + "\n", + " # Step 2: Filter out dates with unique months\n", + " unique_months = {date.split()[0] for date in possible_dates if sum(d.split()[0] == date.split()[0] for d in possible_dates) == 1}\n", + " possible_dates = [date for date in possible_dates if date.split()[0] not in unique_months]\n", + "\n", + " # Step 3: Filter out remaining dates based on the logic of the puzzle\n", + " for date in possible_dates:\n", + " month, day = date.split()\n", + " if all(d.split()[0] != month for d in possible_dates if d != date):\n", + " return date\n", + "\n", + "# List of possible dates\n", + "possible_dates = [\n", + " \"May 15\", \"May 16\", \"May 19\",\n", + " \"June 17\", \"June 18\",\n", + " \"July 14\", \"July 16\",\n", + " \"August 14\", \"August 15\", \"August 17\"\n", + "]\n", + "\n", + "birthday = cheryls_birthday(possible_dates)\n", + "print(f\"Cheryl's birthday is: {birthday}\")" ] }, { @@ -159,16 +225,12 @@ "source": [ "# Gemini Advanced\n", "\n", - "*[Gemini's response:]*\n", - "\n", - "The answer to Cheryl's Birthday problem is **July 16th**.\n", - "\n", - "Here's the Python program, modified 
to handle changes in the list of possible dates:" + "*[Gemini makes a similar mistake.]*\n" ] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 3, "id": "7c0bd1e7-4ada-4b43-97a5-674f615b9f77", "metadata": {}, "outputs": [ @@ -239,7 +301,7 @@ "id": "11f9bfc5-8548-4f7a-ac69-acecddb6f19b", "metadata": {}, "source": [ - "*[Gemini correctly states the answer, but then produces a program that fails to find the correct answer. Again, the failure is due to having one list of possible dates. The program accepts a list of \"possible_dates\", but confusingly, those are actually just days of the month; you aren't allowed to pass in different month/day combinations. And even if you do pass in different days, the program hard-codes certain days and months.]*" + "*[The program accepts a list of \"possible_dates\", but confusingly, those are actually just days of the month; you aren't allowed to pass in different month/day combinations. And even if you do pass in different days, the program hard-codes certain days and months.]*" ] }, { @@ -247,14 +309,14 @@ "id": "3932858f-170b-4979-a565-c3b41a74e7fe", "metadata": {}, "source": [ - "# Llama 405B\n", + "# Meta Llama 405B\n", "\n", - "*[Llama gives a fine introduction to the problem, which I omit. Here is the program it produces:]*" + "*[Llama also fails to find a solution (and thus has no output).]*" ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "id": "1faff935-74aa-44ee-9f37-15f71521e7d2", "metadata": {}, "outputs": [], @@ -299,7 +361,7 @@ "id": "469c146d-bbc9-4b2f-8609-b82631a5139d", "metadata": {}, "source": [ - "*[There is no output from this program, which means it failed to find the date. This program is more flawed than the others. I do like that it defines functions for the various pieces, but note that the function \"bernard_deduce\" is defined but never called.]*" + "*[This program is quite flawed. 
I do like that it defines functions for the various pieces, but note that the function \"bernard_deduce\" is defined but never called.]*" ] }, { @@ -309,12 +371,12 @@ "source": [ "# Claude 3.5 Sonnet\n", "\n", - "*[Again, Claude provides a nice explanation of the problem, which I omit; here is the program:]*" + "*[Claude makes a similar mistake. It deserves credit for explicitly trying an example with different dates, and it correctly reports that the second example has no solution, but that was by accident.]*" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "id": "1da1319a-95d1-4c98-8b36-d96a8c552cb7", "metadata": {}, "outputs": [ @@ -328,8 +390,6 @@ } ], "source": [ - "from itertools import product\n", - "\n", "def solve_cheryls_birthday(dates):\n", " # Extract unique months and days\n", " months = sorted(set(date.split()[0] for date in dates))\n", @@ -373,14 +433,6 @@ "print(f\"With the new set of dates, Cheryl's birthday is on: {new_solution}\")" ] }, - { - "cell_type": "markdown", - "id": "9b16c634-5eda-4249-97fd-e61c980dadbe", - "metadata": {}, - "source": [ - "*[Again, the program fails. 
It has the same issue with a single possible_dates variable.]*" - ] - }, { "cell_type": "markdown", "id": "52003af0-0ea1-4ed4-9b5e-5b9b8862a068", @@ -388,12 +440,12 @@ "source": [ "# Perplexity\n", "\n", - "*[Here is Perplexity's code, with its explanation omitted:]*" + "*[Once again, this program has a similar mistake.]*" ] }, { "cell_type": "code", - "execution_count": 5, + "execution_count": 6, "id": "55e99206-42f8-44ea-a091-1ca6a14f4e89", "metadata": {}, "outputs": [ @@ -440,14 +492,6 @@ "print(f\"Cheryl's birthday is: {result}\")" ] }, - { - "cell_type": "markdown", - "id": "b9296a89-b55c-40aa-8fc0-788cea3511e0", - "metadata": {}, - "source": [ - "*[Once again, this program has the same mistake.]*" - ] - }, { "cell_type": "markdown", "id": "4b4e47b5-f5a2-491d-ad34-f8c08aceca59", @@ -455,12 +499,12 @@ "source": [ "# Cohere Chat Playground\n", "\n", - "*[Cohere gives a nice explanation, and then this program:]*" + "*[Cohere too has the same mistake.]*" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 7, "id": "f1c6bfd7-6738-4325-8bc4-f91b4ee15e48", "metadata": {}, "outputs": [ @@ -517,7 +561,118 @@ "id": "9d10112a-299e-49be-96ce-cf9551564894", "metadata": {}, "source": [ - "*[Again, the program fails to find a solution because it doesn't distinguish who knows what (note that the \"bernard_knows\" variable is defined in terms of \"albert_knows\", but Bernard does not know what Albert knows). The program introduces days of the week, which is extraneous, but okay. A bigger issue is that it accepts \"possible_dates\" as a parameter, but then ignores them and defines \"month_days\" based on the original dates.]*" + "*[This program seems promising in that it has separate variables for \"albert_knows\" and \"bernard_knows\", but it does not handle them correctly. The program introduces days of the week, which is extraneous, but not harmful. 
A bigger issue is that it accepts \"possible_dates\" as a parameter, but then ignores them and defines \"month_days\" based on the original dates, not on the input.]*" + ] + }, + { + "cell_type": "markdown", + "id": "8379b2f8-9591-4f7e-be42-e5917ce0263c", + "metadata": {}, + "source": [ + "# HuggingFace Chat\n", + "\n", + "*[HuggingFace is unique in suggesting a class definition, but still makes a similar mistake. Also, while it accepts a list of possible dates in the initializer, it then hard-codes certain months and days from the original problem.]*" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "e0d33270-0f02-4a70-8258-4b87aab63b01", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[]\n" + ] + } + ], + "source": [ + "class CherylBirthdayProblem:\n", + " def __init__(self, possible_dates):\n", + " self.possible_dates = possible_dates \n", + " \n", + " def filter_dates(self, month, day):\n", + " return [date for date in self.possible_dates if date[0] == month and date[1] == day]\n", + " \n", + " def filter_months(self, month):\n", + " return [date for date in self.possible_dates if date[0] == month]\n", + " \n", + " def filter_days(self, day):\n", + " return [date for date in self.possible_dates if date[1] == day]\n", + " \n", + " def solve(self):\n", + " # Albert's first statement\n", + " self.possible_dates = self.filter_months(5) + self.filter_months(6)\n", + " \n", + " # Bernard's first statement\n", + " self.possible_dates = [date for date in self.possible_dates if date[1]!= 19]\n", + " \n", + " # Albert's second statement\n", + " self.possible_dates = self.filter_days(14) + self.filter_days(15) + self.filter_days(16) + self.filter_days(17) + self.filter_days(18)\n", + " \n", + " # Bernard's second statement\n", + " self.possible_dates = [date for date in self.possible_dates if date[1]!= 15 and date[0]!= 6]\n", + " \n", + " # Albert's third statement\n", + " self.possible_dates = 
self.filter_months(7)\n", + " \n", + " # Bernard's third statement\n", + " self.possible_dates = self.filter_days(16)\n", + " \n", + " return self.possible_dates\n", + "\n", + "possible_dates = [(5, 14), (5, 15), (5, 16), (6, 15), (6, 16), (6, 17), (6, 18), (6, 19), \n", + " (7, 14), (7, 15), (7, 16), (7, 17), (7, 18), (8, 14), (8, 15), (8, 16), (8, 17), (8, 18), (8, 19)]\n", + "problem = CherylBirthdayProblem(possible_dates)\n", + "print(problem.solve())" + ] + }, + { + "cell_type": "markdown", + "id": "39904aa9-d97f-4b0c-a2d0-4731c2d4aff0", + "metadata": {}, + "source": [ + "# You.com Genius Mode\n", + "\n", + "*[You.com makes a similar mistake, although the end result is that it selects the wrong date rather than no date.]*" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "232deb66-b208-4b65-9703-4519dc88c528", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('May', 19)\n" + ] + } + ], + "source": [ + "def cheryls_birthday(possible_dates):\n", + " # Step 1: Albert knows Bernard doesn't know. So, the month can't be one with a unique day.\n", + " step1 = [date for date in possible_dates \n", + " if not any(possible_dates.count(x) == 1 \n", + " for x in [d for m, d in possible_dates if m == date[0]])]\n", + "\n", + " # Step 2: Bernard now knows. So, the day must be unique among the remaining possible dates.\n", + " step2 = [date for date in step1 if sum(date[1] == d for m, d in step1) == 1]\n", + "\n", + " # Step 3: Albert now knows. 
So, the month must be unique among the remaining possible dates.\n", + " step3 = [date for date in step2 if sum(date[0] == m for m, d in step2) == 1]\n", + "\n", + " return step3[0] if step3 else None\n", + "\n", + "# List of possible dates\n", + "possible_dates = [('May', 15), ('May', 16), ('May', 19), ('June', 17), ('June', 18), \n", + " ('July', 14), ('July', 16), ('August', 14), ('August', 15), ('August', 17)]\n", + "\n", + "print(cheryls_birthday(possible_dates))" ] } ], diff --git a/ipynb/Triplets.ipynb b/ipynb/Triplets.ipynb index 7d8866b..b0917fd 100644 --- a/ipynb/Triplets.ipynb +++ b/ipynb/Triplets.ipynb @@ -1,107 +1,177 @@ { "cells": [ { - "cell_type": "code", - "execution_count": 7, - "id": "15a565d8-b9ee-427d-8631-6e1a26089b7f", + "cell_type": "markdown", + "id": "19ee7dde-0d74-47e8-8d0d-4ffcb99e2f5a", "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "(0, 4, 7, 5, 2, 6, 1, 3)" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], "source": [ - "from itertools import permutations\n", - "from typing import *\n", + "
Peter Norvig
Sept 25, 2024
\n", "\n", - "def nqueens(n=8) -> Iterable[Sequence[int]]:\n", - " \"\"\"All ways of arranging `n` non-attacking queens on an `n` x `n` board.\n", - " Each way is a sequence of `n` column numbers, one for each row\"\"\"\n", - " return (cols for cols in permutations(range(n))\n", - " if different(diagonal1(cols)) \n", - " and different(diagonal2(cols)))\n", + "# The Languages of English, Math, and Programming\n", "\n", - "def different(items) -> bool: return len(items) == len(set(items))\n", - "def diagonal1(cols): return [col - row for row, col in enumerate(cols)]\n", - "def diagonal2(cols): return [col + row for row, col in enumerate(cols)]\n", + "My colleague [Wei-Hwa Huang](https://en.wikipedia.org/wiki/Wei-Hwa_Huang) gave several AI chatbots this prompt: \n", "\n", - "assert len(set(nqueens(8))) == 92\n", + "**List all the ways in which three distinct positive integers have a product of 108.**\n", "\n", - "next(nqueens(8))" + "I tested this prompt on the following solvers:\n", + "- [A human programmer](https://github.com/norvig/)\n", + "- [Gemini Advanced](https://gemini.google.com/app)\n", + "- [ChatGPT 4o](https://chatgpt.com/)\n", + "- [Microsoft Copilot](https://copilot.microsoft.com/)\n", + "- [Anthropic Claude 3.5 Sonnet](https://claude.ai/new)\n", + "- [Meta AI Llama 3](https://www.meta.ai/)\n", + "- [Perplexity](https://www.perplexity.ai/)\n", + "- [Cohere Chat](https://cohere.com/chat)\n", + "- [HuggingFace Chat](https://huggingface.co/chat/)\n", + "- [You.com](https://you.com/)\n", + "\n", + "All the LLMs Wei-Hwa originally tried got this one wrong. 
From my expanded list, Gemini, ChatGPT 4o, You.com and the human got it right, and 5 other models made mistakes:\n", + "- The LLMs all started their answer by noting that 108 = 2 × 2 × 3 × 3 × 3, and then tried to partition those factors into three distinct subsets and report all ways to do so.\n", + "- So far so good.\n", + "- But most of them forgot that 1 could be a factor of 108 (or equivalently, that the empty set of factors is a valid subset). \n", + "- Some of the models ignored the need for \"distinct\" integers, and proposed, say, 3 × 6 × 6.\n", + "- Some got 5 or 6 correct triplets, and then stopped, perhaps because their attention mechanism didn't go back far enough.\n", + "- Some even proposed non-integers as \"factors\".\n", + "\n", + "I thought that the models might have skipped 1 as a factor because 1 is not listed in the prime factorization, so it is easy to forget. But in programming, it is more natural to run a loop from 1 to *n* than from 2 to *n*, so this error would be less likely. Therefore, I decided to test all the models with the following prompt: \n", + "\n", + "**Write a Python program to list all the ways in which three distinct positive integers have a product of 108.**\n", + "\n", + "# TLDR: Conclusion\n", + "\n", + "The models did much better with this prompt. My conclusion is that the language used to solve a problem matters. Sometimes a natural language such as English is a good choice, sometimes you need the language of mathematical equations, or maybe chemical equations, and sometimes a programming language is best.\n", + "\n", + "# Human\n", + "\n", + "A human (me) was able to correctly respond to the prompt:" ] }, { "cell_type": "code", - "execution_count": 8, - "id": "fd7a971c-d3f4-4f2e-89db-777fb2a208d4", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Q . . . . . . . \n", - ". . . . Q . . . \n", - ". . . . . . . Q \n", - ". . . . . Q . . \n", - ". . Q . . . . . 
\n", - ". . . 
. . . Q . \n", - ". Q . . . . . . \n", - ". . . Q . . . . \n" - ] - } - ], - "source": [ - "def show(queens, dot='. ', Q='Q ') -> None:\n", - " \"\"\"Print the board.\"\"\"\n", - " m = max(queens)\n", - " for col in queens:\n", - " print(dot * col + Q + dot * (m - col))\n", - "\n", - "show(next(nqueens())) " - ] - }, - { - "cell_type": "code", - "execution_count": 6, + "execution_count": 1, "id": "f8a27ed0-c2b1-47a0-bdf0-c6a8cd789dc5", "metadata": {}, "outputs": [ { "data": { "text/plain": [ - "[(1, 2, 54),\n", - " (1, 3, 36),\n", - " (1, 4, 27),\n", - " (1, 6, 18),\n", - " (1, 9, 12),\n", - " (2, 3, 18),\n", - " (2, 6, 9),\n", - " (3, 4, 9)]" + "[{1, 2, 54},\n", + " {1, 3, 36},\n", + " {1, 4, 27},\n", + " {1, 6, 18},\n", + " {1, 9, 12},\n", + " {2, 3, 18},\n", + " {2, 6, 9},\n", + " {3, 4, 9}]" ] }, - "execution_count": 6, + "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ - "import math\n", + "from math import prod\n", "from itertools import combinations\n", + "from typing import *\n", "\n", - "def find_product(n, k) -> List[Tuple[int, ...]]:\n", + "def find_products(k=3, n=108) -> List[Set[int]]:\n", " \"\"\"A list of all ways in which `k` distinct positive integers have a product of `n`.\"\"\" \n", " factors = {i for i in range(1, n + 1) if n % i == 0}\n", - " return [tup for tup in combinations(factors, k) if math.prod(tup) == n]\n", + " return [set(ints) for ints in combinations(factors, k) if prod(ints) == n]\n", "\n", - "find_product(108, 3)" + "find_products()" + ] + }, + { + "cell_type": "markdown", + "id": "b7682af0-8c46-4e19-bdba-71f28bbfa101", + "metadata": {}, + "source": [ + "The program can solve different versions of the problem (although it would be slow when `n` is over a billion or so):" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "55dfa9c2-f366-42c8-ae50-6a1df80c47b3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{1, 2, 3, 4, 15},\n", + " {1, 2, 3, 5, 
12},\n", + " {1, 2, 3, 6, 10},\n", + " {1, 2, 4, 5, 9},\n", + " {1, 3, 4, 5, 6}]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "find_products(5, 360)" + ] + }, + { + "cell_type": "markdown", + "id": "42fc5098-2315-4984-b256-4fbf2f6fec51", + "metadata": {}, + "source": [ + "# Gemini Advanced\n", + "\n", + "Gemini produced three drafts, of which the following one was correct. In another draft, it had the line `k = product // (i * j)`, using integer division, which is incompatible with the `k.is_integer()` test. Here is the correct draft:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "29789909-15d1-4833-b35f-cb5399893ba2", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" + ] + } + ], + "source": [ + "def find_combinations(product):\n", + " \"\"\"Finds all combinations of three distinct positive integers that multiply to the given product.\n", + "\n", + " Args:\n", + " product: The product to find combinations for.\n", + "\n", + " Returns:\n", + " A list of tuples, where each tuple represents a combination of three integers.\n", + " \"\"\"\n", + "\n", + " combinations = []\n", + " for i in range(1, product):\n", + " for j in range(i + 1, product):\n", + " k = product / (i * j)\n", + " if k > j and k.is_integer():\n", + " combinations.append((i, j, int(k)))\n", + " return combinations\n", + "\n", + "product = 108\n", + "result = find_combinations(product)\n", + "print(result)" + ] + }, + { + "cell_type": "markdown", + "id": "81f032a3-5621-470e-9ada-8d3e026ff1cc", + "metadata": {}, + "source": [ + "# ChatGPT 4o\n", + "\n", + "ChatGPT got it right (although it imported `itertools` and then never used it)." 
] }, { @@ -114,7 +184,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Triplets with product 108:\n", + "All distinct triplets of integers that multiply to 108:\n", "(1, 2, 54)\n", "(1, 3, 36)\n", "(1, 4, 27)\n", @@ -127,30 +197,359 @@ } ], "source": [ - "def find_triplets(target_product):\n", - " triplets = []\n", - " for a in range(1, target_product + 1):\n", - " for b in range(a + 1, target_product + 1): # Ensure distinctness (b > a)\n", - " c = target_product // (a * b)\n", - " if a * b * c == target_product and c > b: # Ensure distinctness (c > b)\n", - " triplets.append((a, b, c))\n", - " return triplets\n", + "import itertools\n", "\n", - "target_product = 108\n", - "triplets = find_triplets(target_product)\n", + "def find_triplets_with_product(product):\n", + " solutions = []\n", + " \n", + " # Iterate over all possible combinations of distinct integers\n", + " for a in range(1, int(product ** (1/3)) + 1):\n", + " for b in range(a + 1, product // a + 1):\n", + " for c in range(b + 1, product // (a * b) + 1):\n", + " if a * b * c == product:\n", + " solutions.append((a, b, c))\n", + " \n", + " return solutions\n", + "\n", + "product = 108\n", + "triplets = find_triplets_with_product(product)\n", "\n", "if triplets:\n", - " print(f\"Triplets with product {target_product}:\")\n", + " print(f\"All distinct triplets of integers that multiply to {product}:\")\n", " for triplet in triplets:\n", " print(triplet)\n", "else:\n", - " print(f\"No triplets found with product {target_product}\")" + " print(f\"No distinct triplets found that multiply to {product}.\")\n" + ] + }, + { + "cell_type": "markdown", + "id": "17e74293-feab-4fff-b682-bc26823ebefa", + "metadata": {}, + "source": [ + "# Bing CoPilot\n", + "\n", + "Bing produces a very clean (but somewhat slower) `find_triplets` function." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "2b0ee5b6-64bd-4581-bdc7-d22065f02d64", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All distinct triplets with a product of 108:\n", + "(1, 2, 54)\n", + "(1, 3, 36)\n", + "(1, 4, 27)\n", + "(1, 6, 18)\n", + "(1, 9, 12)\n", + "(2, 3, 18)\n", + "(2, 6, 9)\n", + "(3, 4, 9)\n" + ] + } + ], + "source": [ + "def find_triplets(product):\n", + " triplets = []\n", + " for a in range(1, product + 1):\n", + " for b in range(a + 1, product + 1):\n", + " for c in range(b + 1, product + 1):\n", + " if a * b * c == product:\n", + " triplets.append((a, b, c))\n", + " return triplets\n", + "\n", + "product = 108\n", + "triplets = find_triplets(product)\n", + "\n", + "print(f\"All distinct triplets with a product of {product}:\")\n", + "for triplet in triplets:\n", + " print(triplet)" + ] + }, + { + "cell_type": "markdown", + "id": "fddeabc0-4925-4145-b666-7a094fd61980", + "metadata": {}, + "source": [ + "# Claude 3.5 Sonnet\n", + "\n", + "Claude's answer is quite simlar to ChatGPT 4o's (but I would criticize it for not taking a parameter):" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "83f9088c-8444-4207-ad24-91d8e3a98004", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "There are 8 ways to express 108 as a product of three distinct positive integers:\n", + "1 x 2 x 54 = 108\n", + "1 x 3 x 36 = 108\n", + "1 x 4 x 27 = 108\n", + "1 x 6 x 18 = 108\n", + "1 x 9 x 12 = 108\n", + "2 x 3 x 18 = 108\n", + "2 x 6 x 9 = 108\n", + "3 x 4 x 9 = 108\n" + ] + } + ], + "source": [ + "def find_combinations():\n", + " result = []\n", + " for i in range(1, 108):\n", + " for j in range(i + 1, 108):\n", + " k = 108 // (i * j)\n", + " if i * j * k == 108 and k > j:\n", + " result.append((i, j, k))\n", + " return result\n", + "\n", + "def main():\n", + " combinations = find_combinations()\n", + " 
print(f\"There are {len(combinations)} ways to express 108 as a product of three distinct positive integers:\")\n", + " for combo in combinations:\n", + " print(f\"{combo[0]} x {combo[1]} x {combo[2]} = 108\")\n", + "\n", + "if __name__ == \"__main__\":\n", + " main()" + ] + }, + { + "cell_type": "markdown", + "id": "eefc3007-105d-41cd-9023-0a34609517fd", + "metadata": {}, + "source": [ + "# Meta AI Llama 3.1\n", + "\n", + "This program forgets to test for `i < j < k`, and thus produces six permutations of each triplet. Oddly, it claims that \"*When you run this program, it will output: [(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]*\", so it knows the correct answer, but it didn't run the program to test that it is correct." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "de1d0cc3-10a8-417a-a35a-0a66325f549c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (1, 12, 9), (1, 18, 6), (1, 27, 4), (1, 36, 3), (1, 54, 2), (2, 1, 54), (2, 3, 18), (2, 6, 9), (2, 9, 6), (2, 18, 3), (2, 54, 1), (3, 1, 36), (3, 2, 18), (3, 4, 9), (3, 9, 4), (3, 18, 2), (3, 36, 1), (4, 1, 27), (4, 3, 9), (4, 9, 3), (4, 27, 1), (6, 1, 18), (6, 2, 9), (6, 9, 2), (6, 18, 1), (9, 1, 12), (9, 2, 6), (9, 3, 4), (9, 4, 3), (9, 6, 2), (9, 12, 1), (12, 1, 9), (12, 9, 1), (18, 1, 6), (18, 2, 3), (18, 3, 2), (18, 6, 1), (27, 1, 4), (27, 4, 1), (36, 1, 3), (36, 3, 1), (54, 1, 2), (54, 2, 1)]\n" + ] + } + ], + "source": [ + "def find_factors(n):\n", + " factors = []\n", + " for i in range(1, n + 1):\n", + " if n % i == 0:\n", + " factors.append(i)\n", + " return factors\n", + "\n", + "def find_triplets(n):\n", + " factors = find_factors(n)\n", + " triplets = []\n", + " for i in factors:\n", + " for j in factors:\n", + " if i * j <= n:\n", + " k = n // (i * j)\n", + " if i != j and j != k and i != k and i * j * k == n:\n", + " 
triplets.append((i, j, k))\n", + " return triplets\n", + "\n", + "print(find_triplets(108))" + ] + }, + { + "cell_type": "markdown", + "id": "742f3b13-a53f-497a-97e7-5aa58b361951", + "metadata": {}, + "source": [ + "# Perplexity\n", + "\n", + "Perplexity does well." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "7a764a01-8f18-4773-bd7a-fbc8dc652450", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All combinations of three distinct positive integers with a product of 108:\n", + "1 x 2 x 54 = 108\n", + "1 x 3 x 36 = 108\n", + "1 x 4 x 27 = 108\n", + "1 x 6 x 18 = 108\n", + "1 x 9 x 12 = 108\n", + "2 x 3 x 18 = 108\n", + "2 x 6 x 9 = 108\n", + "3 x 4 x 9 = 108\n", + "\n", + "Total number of combinations: 8\n" + ] + } + ], + "source": [ + "def find_combinations(target):\n", + " combinations = []\n", + " \n", + " for i in range(1, target + 1):\n", + " if target % i == 0:\n", + " for j in range(i + 1, target + 1):\n", + " if (target // i) % j == 0:\n", + " k = target // (i * j)\n", + " if k > j and i * j * k == target:\n", + " combinations.append((i, j, k))\n", + " \n", + " return combinations\n", + "\n", + "target_product = 108\n", + "result = find_combinations(target_product)\n", + "\n", + "print(f\"All combinations of three distinct positive integers with a product of {target_product}:\")\n", + "for combo in result:\n", + " print(f\"{combo[0]} x {combo[1]} x {combo[2]} = {target_product}\")\n", + "\n", + "print(f\"\\nTotal number of combinations: {len(result)}\")" + ] + }, + { + "cell_type": "markdown", + "id": "53f85624-bfe3-42cc-b55b-b86a56089314", + "metadata": {}, + "source": [ + "# Cohere Chat Playground\n", + "\n", + "Cohere fails to find any combinations. The 5th line should just be `k = product // i // j`; Cohere bizarrely adds `* j * i`. It also fails to test that `i < j < k`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "61139601-ced2-4598-ae76-464c1041d2d1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "All combinations of three distinct positive integers with a product of 108:\n" + ] + } + ], + "source": [ + "def find_combinations(product):\n", + " combinations = []\n", + " for i in range(1, product // 3 + 1):\n", + " for j in range(i, product // 2 + 1):\n", + " k = product // i // j * j * i\n", + " if i * j * k == product and i != j and j != k and i != k:\n", + " combinations.append((i, j, k))\n", + " return combinations\n", + "\n", + "product = 108\n", + "combinations = find_combinations(product)\n", + "print(f\"All combinations of three distinct positive integers with a product of {product}:\")\n", + "for combo in combinations:\n", + " print(combo)" + ] + }, + { + "cell_type": "markdown", + "id": "51b0b353-79d2-4da7-a6da-8231d5ff811e", + "metadata": {}, + "source": [ + "# HuggingChat\n", + "\n", + "Hugging Chat produced a correct concise program. I note that `i < j < k` would be cleaner than `k >= j and i!= j and j!= k` here (and for others as well)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "6e4b823b-66f1-48ae-b4ca-00387d480e03", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" + ] + } + ], + "source": [ + "def find_triplets(n):\n", + " triplets = []\n", + " for i in range(1, n):\n", + " for j in range(i+1, n):\n", + " k = n // (i * j)\n", + " if k >= j and i * j * k == n and i!= j and j!= k:\n", + " triplets.append((i, j, k))\n", + " return triplets\n", + "\n", + "print(find_triplets(108))" + ] + }, + { + "cell_type": "markdown", + "id": "23d78bca-9b35-43c0-891a-a9c0e6801b31", + "metadata": {}, + "source": [ + "# You.com\n", + "\n", + "You.com produces a correct solution, with some nice optimizations that make it *O*(*n*5/6), whereas most of the solutions are *O*(*n*2). This means it can handle a 14-digit product in a second of run time, whereas the human-written solution can only handle 10-digit products in one second, while the HuggingChat version (for example) takes several seconds just to handle a 5-digit product." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "fe69bb04-02de-4f38-a8ab-6693c15b02c4", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[(1, 2, 54), (1, 3, 36), (1, 4, 27), (1, 6, 18), (1, 9, 12), (2, 3, 18), (2, 6, 9), (3, 4, 9)]\n" + ] + } + ], + "source": [ + "def find_triplets(product):\n", + " triplets = []\n", + " for i in range(1, int(product ** (1/3)) + 1): # The cube root of the product is the maximum possible value for i\n", + " if product % i == 0:\n", + " for j in range(i + 1, int((product / i) ** 0.5) + 1): # The square root of the product divided by i is the maximum possible value for j\n", + " if (product / i) % j == 0:\n", + " k = product // (i * j)\n", + " if k > j: # Ensure the integers are distinct\n", + " triplets.append((i, j, k))\n", + " return triplets\n", + "\n", + "triplets = find_triplets(108)\n", + "print(triplets)" ] }, { "cell_type": "code", "execution_count": null, - "id": "6db0bbe1-5156-4e05-9a02-bbe3ed156f89", + "id": "fc16ca96-0b05-4ad4-82c0-552bb99373fd", "metadata": {}, "outputs": [], "source": []