Add files via upload

Peter Norvig
2025-12-13 22:55:06 -08:00
committed by GitHub
parent a522ee8225
commit 9cf68a49e8
2 changed files with 220 additions and 181 deletions

@@ -9,7 +9,7 @@
"\n",
"# Advent of Code 2025: The AI LLM Edition\n",
"\n",
"*In **this notebook** I show some solutions by Gemini, Claude, and ChatGPT, three AI Large Language Models (LLMs) for the\n",
"*This notebook shows some solutions by Gemini, Claude, and ChatGPT, three AI Large Language Models (LLMs), for the\n",
"2025 [**Advent of Code**](https://adventofcode.com/) (AoC) programming puzzles. In order to understand each puzzle, you'll have to look at the problem descriptions at [**Advent of Code**](https://adventofcode.com/2025) for each [**Day**](https://adventofcode.com/2025/day/1), and you can also look at [**my solutions**](Advent2025.ipynb), which I did before turning to the LLMs.*\n",
"\n",
"*All the code in this notebook is written by an LLM (except for the one line where I call the LLM's code for each puzzle). My comments (like this one) are in italics, and my prompts given to the LLMs are in **bold italics**. Sometimes I quote the LLM's responses; those are in* regular roman font.\n",
@@ -18,7 +18,7 @@
"\n",
"# Conclusions\n",
"\n",
"*Now that the 12 days are finished, I moved my conclusions up here:*\n",
"*Now that the 12 days are finished, here are my conclusions:*\n",
"\n",
"- *Overall, the LLMs did very well, producing code that gives the correct answer to every puzzle.*\n",
"- *The run time were reasonably fast, all under a second, except for 12.1, which took about 3 minutes.*\n",
@@ -35,7 +35,7 @@
" - *when to use sets versus lists*\n",
" - *handling a 2D grid of points with 4 or 8 directional neighbors*\n",
" - *accumulating sums in a defaultdict or Counter*\n",
" - *advanced data structures such as Union-Find*\n",
" - *advanced data structures such as Union-Find and dancing links*\n",
" - *computational geometry algorithms including scantiness, flood fill, and ray-casting*\n",
" - *recognizing an integer linear programming problem and knowing how to call a package*\n",
" - *depth-first search, and recognizing search properties such as commutativity of actions*\n",
@@ -58,11 +58,11 @@
" return total\n",
"\n",
"***Note:*** *For brevity, I have removed some of the LLM output, such as:*\n",
"- *Usage examples on how to run the program on the test input*.\n",
"- *Prose analysis of the problem, descriptions of the programs, and chain of thought thinking.*\n",
" - *In most cases these were accurate and thorough! But they were too long to include here.*\n",
"- *The \"`#!/usr/bin/env python3`\" at the top of a file*.\n",
"- *The \"`__main__`\" idiom for running code from the command line and reading from `sys.stdin`*\n",
"- *Usage examples on how to run the program on the test input*.\n",
"- *The \"`#!/usr/bin/env python3`\" and \"`__main__`\" idioms and `sys.stdin.readline()` for command line execution.*\n",
" - *In retrospect, I should have specified in the prompt that \"**You can get the input text as a single string with `get_text()`***\n",
"\n",
"# Day 0: Human\n",
"\n",
@@ -87,9 +87,11 @@
"id": "39657fac-2a12-43d5-ab79-024ebceeadc4",
"metadata": {},
"source": [
"# [Day 1](https://adventofcode.com/2025/day/1): Gemini 3 Pro\n",
"# Day 1: Gemini 3 Pro\n",
"\n",
"*For Day 1, **Part 1** I started with the Gemini 3 Pro Fast model, which produced this code:*"
"*The [Day 1 **Part 1**](https://adventofcode.com/2025/day/1) puzzle is about turning the dial on a safe and counting how many times the pointer ends up at 0.*\n",
"\n",
"*I started with the Gemini 3 Pro Fast model, which produced this code:*"
]
},
{
@@ -99,7 +101,7 @@
"metadata": {},
"outputs": [],
"source": [
"def solve_safe_password(rotations):\n",
"def solve_safe_password(input_str):\n",
" \"\"\"\n",
" Simulates the safe dial rotations and counts how many times the dial\n",
" lands on 0.\n",
@@ -110,6 +112,10 @@
" Returns:\n",
" The total number of times the dial points at 0 after a rotation.\n",
" \"\"\"\n",
" \n",
" # Split the single string into individual lines\n",
" rotations = input_str.strip().split('\\n')\n",
" \n",
" # The dial starts at 50\n",
" current_position = 50\n",
" zero_count = 0\n",
@@ -157,7 +163,7 @@
"id": "c5219309-4f2f-49b3-8208-8c7a3526e328",
"metadata": {},
"source": [
"*A bit verbose and over-commented, but pretty nice. Below I fetch the Day 1 data with `get_text(1)`, split it into lines (as expected by Gemini's `solve_safe_password` function), and verify that the answer is correct:*"
"*A bit verbose and over-commented (and under-type-annotated), but pretty nice. Below I fetch my Day 1 data with `get_text(1)`, pass it to `solve_safe_password`, and verify that Gemini's answer is correct:*"
]
},
{
@@ -169,7 +175,7 @@
{
"data": {
"text/plain": [
"Puzzle 1.1: .0012 seconds, answer 1182 correct"
"Puzzle 1.1: .0015 seconds, answer 1182 correct"
]
},
"execution_count": 3,
@@ -178,7 +184,7 @@
}
],
"source": [
"rotations = get_text(1).splitlines()\n",
"rotations = get_text(1)\n",
"\n",
"answer(puzzle=1.1, solution=1182, code=lambda: \n",
" solve_safe_password(rotations))"
@@ -189,7 +195,7 @@
"id": "47a04265-2dab-4a4c-9a24-e1c4c1b93c6b",
"metadata": {},
"source": [
"*For **Part 2** I gave Gemini Fast the updated instructions, and again it had a nice analysis and the following code:*"
"*For **Part 2** I gave Gemini Fast the updated instructions (that we should count a zero crossing in the middle of a rotation), and again it had a nice analysis and the following code:*"
]
},
{
@@ -201,7 +207,7 @@
"source": [
"import math\n",
"\n",
"def solve_new_safe_password(rotations):\n",
"def solve_new_safe_password(input_str):\n",
" \"\"\"\n",
" Simulates the safe dial rotations and counts every time the dial\n",
" points at 0, including during the rotation movement.\n",
@@ -212,6 +218,10 @@
" Returns:\n",
" The total number of times the dial points at 0.\n",
" \"\"\"\n",
" \n",
" # Split the single string into individual lines\n",
" rotations = input_str.strip().split('\\n')\n",
" \n",
" # The dial starts at 50\n",
" current_position = 50\n",
" total_zero_crossings = 0\n",
@@ -295,7 +305,7 @@
{
"data": {
"text/plain": [
"Puzzle 1.2: .0016 seconds, answer 7509 WRONG; EXPECTED ANSWER IS 6907"
"Puzzle 1.2: .0021 seconds, answer 7509 WRONG; EXPECTED ANSWER IS 6907"
]
},
"execution_count": 5,
@@ -328,13 +338,17 @@
"metadata": {},
"outputs": [],
"source": [
"def solve_safe_password_method_2(rotations):\n",
"def solve_safe_password_method_2(input_str):\n",
" \"\"\"\n",
" Simulates the safe dial rotations and counts every time the dial\n",
" points at 0 (Method 0x434C49434B).\n",
" \n",
" Correctly handles cases where the dial starts at 0.\n",
" \"\"\"\n",
" \n",
" # Split the single string into individual lines\n",
" rotations = input_str.strip().split('\\n')\n",
" \n",
" current_position = 50\n",
" total_zero_crossings = 0\n",
" MODULO = 100\n",
@@ -395,7 +409,7 @@
{
"data": {
"text/plain": [
"Puzzle 1.2: .0015 seconds, answer 6907 correct"
"Puzzle 1.2: .0018 seconds, answer 6907 correct"
]
},
"execution_count": 7,
@@ -420,7 +434,7 @@
},
{
"cell_type": "code",
"execution_count": 67,
"execution_count": 8,
"id": "8eac98f3-b884-4d95-b38b-ea4365ec3004",
"metadata": {},
"outputs": [],
@@ -461,24 +475,24 @@
"id": "2bd0db00-952b-47e5-b787-b3887b7539f1",
"metadata": {},
"source": [
"*This code is overall rather nice, but conspicously lacks comments and doc strings. It uses the more efficient \"enumerate over the first half of the digit string\" strategy, but is not precise in narrowing down the range it enumerates over. For example, for the range \"999000-109000\", this code will enumerate the range (100, 1000), when it could enumerate just the range (999, 1000).*\n",
"*This code is overall rather nice, but conspicously lacks comments and doc strings. (If you can't tell, the goal is to count the number of invalid IDs, which are numbers that consist of the same digit string repeated twice, like \"100100\", within some ID ranges.) It uses the more efficient \"enumerate over the first half of the digit string\" strategy, but is not precise in narrowing down the range it enumerates over. For example, for the range \"999000-109000\", this code will enumerate the range (100, 1000), when it could enumerate just the range (999, 1000).*\n",
"\n",
"*I verified that the code gives the correct answer:*"
]
},
{
"cell_type": "code",
"execution_count": 68,
"execution_count": 9,
"id": "a91845ec-ace7-482e-b0b5-8a620ef3461f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 2.1: .1298 seconds, answer 23560874270 correct"
"Puzzle 2.1: .1342 seconds, answer 23560874270 correct"
]
},
"execution_count": 68,
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
@@ -495,7 +509,7 @@
"id": "a31d006f-8cf2-4e4c-92d3-d7b7def22227",
"metadata": {},
"source": [
"*When given the **Part 2** instructions, Claude wrote the following code:*"
"*When given the **Part 2** instructions, Claude wrote the following code (where invalid IDs now can be any number of repetitions, like \"100100100\"):*"
]
},
{
@@ -552,17 +566,17 @@
},
{
"cell_type": "code",
"execution_count": 69,
"execution_count": 11,
"id": "9c0049e6-a992-4aa8-a2d7-3ea748e361a6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 2.2: .1254 seconds, answer 23560874270 WRONG; EXPECTED ANSWER IS 44143124633"
"Puzzle 2.2: .1546 seconds, answer 44143124633 correct"
]
},
"execution_count": 69,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -579,7 +593,9 @@
"source": [
"# [Day 3](https://adventofcode.com/2025/day/3) ChatGPT 5.1 Auto\n",
"\n",
"*For **Part 1** ChatGPT gave a very brief analysis of the problem and then wrote a program that was designed to be called from the command line, using the `\"__main__\"` idiom. I told it \"**I don't want to run it like that, I want a function that I can pass the input text and have it return an int**\" and it produced this code (conspicuously lacking comments or doc strings):*"
"*The puzzle today is to pick the biggest two-digit number from a string of digits, like \"87\" from \"8675305\". Return the sum over all digit strings.*\n",
"\n",
"*For **Part 1** ChatGPT gave a very brief analysis of the problem and produced this code (conspicuously lacking comments or doc strings):*"
]
},
{
@@ -623,17 +639,17 @@
},
{
"cell_type": "code",
"execution_count": 70,
"execution_count": 13,
"id": "09bf306b-8762-4346-aff9-bcff33639c71",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 3.1: .0072 seconds, answer 169408143086082 WRONG; EXPECTED ANSWER IS 17085"
"Puzzle 3.1: .0043 seconds, answer 17085 correct"
]
},
"execution_count": 70,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -650,7 +666,9 @@
"id": "4a07f37f-c5e3-4484-a7b1-2cae0ff5bd01",
"metadata": {},
"source": [
"*For **Part 2** ChatGPT did well (and for some reason included comments and doc strings, which were missing the first time):*"
"*For **Part 2** *we want a 12-digit number, not 2-digit.**\n",
"\n",
"*ChatGPT did well (and for some reason included comments and doc strings, which were missing the first time):*"
]
},
{
@@ -706,17 +724,17 @@
},
{
"cell_type": "code",
"execution_count": 71,
"execution_count": 15,
"id": "70bde9b9-beb1-4e9d-bef6-0f20fb958891",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 3.2: .0086 seconds, answer 169408143086082 correct"
"Puzzle 3.2: .0080 seconds, answer 169408143086082 correct"
]
},
"execution_count": 71,
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
@@ -733,7 +751,9 @@
"source": [
"# [Day 4](https://adventofcode.com/2025/day/4): Gemini 3 Pro\n",
"\n",
"*Gemini produced a solution to **Part 1** that is straightforward and efficient, although perhaps could use some abstraction (e.g. if they had a function to count neighbors, they wouldn't need the `continue`).*"
"*We are given a 2D map and asked how many squares have a \"@\" that is surrounded by fewer than 4 other \"@\" out of 8 neighbors.*\n",
"\n",
"*Gemini produced a solution to **Part 1** that is straightforward and efficient, although perhaps could use some abstraction (e.g. if they had a function to count neighbors, they wouldn't need the `continue` in the main loop).*"
]
},
{
@@ -794,24 +814,24 @@
"id": "1e12bc4c-8cc8-4c01-b4ad-5392b49642e6",
"metadata": {},
"source": [
"*I like that the 8 `directions` are laid out in a 2D pattern that corresponds to where the directions actually go.*\n",
"*I really like that the 8 `directions` are laid out in a 2D pattern that corresponds to where the directions actually go.*\n",
"\n",
"*Here I verify that the code does produce the correct answer:*"
]
},
{
"cell_type": "code",
"execution_count": 72,
"execution_count": 17,
"id": "5b54c152-ce26-4baf-8b51-d4a166c6c2e7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 4.1: .0183 seconds, answer 1569 correct"
"Puzzle 4.1: .0174 seconds, answer 1569 correct"
]
},
"execution_count": 72,
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
@@ -828,6 +848,8 @@
"id": "76845e18-6209-4995-a825-568782b79d0d",
"metadata": {},
"source": [
"*Now we're asked if we repeatedly remove \"@\" with fewer than 4 neighbors, how many are left in the end?*\n",
"\n",
"*For **Part 2**, Gemini chose the \"repeatedly remove acessible rolls\" approach rather than the more efficient (and arguable more elegant) \"remove a roll and check its neighbors\" approach.*"
]
},
@@ -894,17 +916,17 @@
},
{
"cell_type": "code",
"execution_count": 73,
"execution_count": 19,
"id": "b47c2e05-978a-4b22-aafc-e31ee1825387",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 4.2: .0675 seconds, answer 9280 correct"
"Puzzle 4.2: .4197 seconds, answer 9280 correct"
]
},
"execution_count": 73,
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
@@ -921,7 +943,7 @@
"source": [
"*I asked Gemini \"Can you do it more efficiently\" and it replied*\n",
"\n",
"The previous implementation scanned the entire grid repeatedly (`O(Rows * Cols * Iterations)`). I've replaced it with a Worklist/Queue algorithm that checks neighbors only when necessary. This reduces the complexity to be roughly proportional to the number of rolls, which is significantly faster for large grids.\n",
" The previous implementation scanned the entire grid repeatedly (`O(Rows * Cols * Iterations)`). I've replaced it with a Worklist/Queue algorithm that checks neighbors only when necessary. This reduces the complexity to be roughly proportional to the number of rolls, which is significantly faster for large grids.\n",
"\n",
"*This is the same approach that I used in my solution.*"
]
@@ -1003,17 +1025,17 @@
},
{
"cell_type": "code",
"execution_count": 74,
"execution_count": 21,
"id": "f370ee38-67af-42a6-9ad3-cdeec2019ff3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 4.2: .0631 seconds, answer 9280 correct"
"Puzzle 4.2: .1065 seconds, answer 9280 correct"
]
},
"execution_count": 74,
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
@@ -1030,6 +1052,8 @@
"source": [
"# [Day 5](https://adventofcode.com/2025/day/5): Claude Opus 4.5\n",
"\n",
"*We are asked how many ingredient IDs from a list of IDs are fresh, according to a list of fresh ID ranges.*\n",
"\n",
"*Claude produces a straightforward program that solves **Part 1** just fine and demonstrates good use of abstraction. This time it has nice doc strings; for Day 2 it had none. Go figure.*"
]
},
@@ -1104,7 +1128,7 @@
{
"data": {
"text/plain": [
"Puzzle 5.1: .0050 seconds, answer 635 correct"
"Puzzle 5.1: .0064 seconds, answer 635 correct"
]
},
"execution_count": 23,
@@ -1124,7 +1148,7 @@
"id": "dabe3fbc-0fa6-46d0-adfb-f6413f3a63d5",
"metadata": {},
"source": [
"*For **Part 2**, Claude chose to sort ranges, then merge them, then count the total of the sizes of the ranges. That's a reasonable approachit involves an extra step of merging ranges (instead of just counting on the fly) but it could be useful to have the merged ranges.*"
"*For **Part 2**, we are asked how many possible fresh IDs are there, whether they are in inventory or not. Claude chose to sort ranges (good choice), then merge them, then count the total of the sizes of the ranges. That's a reasonable approachit involves an extra step of merging ranges (instead of just counting on the fly) but it could be useful to have the merged ranges.*"
]
},
{
@@ -1223,6 +1247,8 @@
"source": [
"# [Day 6](https://adventofcode.com/2025/day/6): ChatGPT 5.1 Auto\n",
"\n",
"*We are asked to solve some math problems written in an unusal format (vertical instead of horizontal, with some special rules).*\n",
"\n",
"*For **Part 1** ChatGPT produced a program that is correct, but has poor abstraction, with one long 63-line function. (It also contains a pet peeve of mine: in lines 1720 the pattern \"`if some_boolean: True else: False`\" can always be replaced with \"`some_boolean`\".)*"
]
},
@@ -1319,7 +1345,7 @@
{
"data": {
"text/plain": [
"Puzzle 6.1: .0070 seconds, answer 5877594983578 correct"
"Puzzle 6.1: .0079 seconds, answer 5877594983578 correct"
]
},
"execution_count": 27,
@@ -1339,7 +1365,9 @@
"id": "28de0612-03f3-4587-88cb-18ef9b10ee00",
"metadata": {},
"source": [
"*For **Part 2** ChatGPT writes an even-longer function, with even more complications. For example in \"Determine the operator\"; the 14 lines could be replaced with `op = (add if '+' in segment else mul)`. I think ChatGPT ws trying to be robust against errors in the input.*"
"*For **Part 2** we learn that the details of the format are even more ideosyncratic: the exact placement of spaces is crucial. (Here, as in each day's problem, see [my other notebook](Advent-2025.ipynb) or [the official AoC site](https://adventofcode.com/2025/day/6) for details*.)\n",
"\n",
"*ChatGPT writes an even-longer function, with even more complications. For example in \"Determine the operator\"; the 14 lines could be replaced with `op = (add if '+' in segment else mul)`. I think ChatGPT ws trying to be robust against errors in the input.*"
]
},
{
@@ -1455,7 +1483,7 @@
{
"data": {
"text/plain": [
"Puzzle 6.2: .0057 seconds, answer 11159825706149 correct"
"Puzzle 6.2: .0065 seconds, answer 11159825706149 correct"
]
},
"execution_count": 29,
@@ -1475,6 +1503,8 @@
"source": [
"# [Day 7](https://adventofcode.com/2025/day/7): Gemini 3 Pro\n",
"\n",
"*We are given a 2D grid of characters where a beam enters at the top and moves downward, but is split to both sides by a \"`^`\" character. We need to compute the total number of split beams at the bottom.*\n",
"\n",
"*Gemini's code for **Part 1** is a bit verbose, but gets the job done.*"
]
},
@@ -1567,7 +1597,7 @@
{
"data": {
"text/plain": [
"Puzzle 7.1: .0008 seconds, answer 1681 correct"
"Puzzle 7.1: .0011 seconds, answer 1681 correct"
]
},
"execution_count": 31,
@@ -1587,7 +1617,9 @@
"id": "c56012a6-b9e7-4053-84c1-79fa457436b1",
"metadata": {},
"source": [
"*For **Part 2**, Gemini gets the key ideas (although it is a bit old-fashioned to use a `defaultdict(int)` instead of a `Counter`).*"
"*For **Part 2**, we are asked how many ways can the beams get to the bottom.*\n",
"\n",
"*Gemini gets the key ideas (although it is a bit old-fashioned to use a `defaultdict(int)` instead of a `Counter`).*"
]
},
{
@@ -1680,7 +1712,7 @@
{
"data": {
"text/plain": [
"Puzzle 7.2: .0018 seconds, answer 422102272495018 correct"
"Puzzle 7.2: .0019 seconds, answer 422102272495018 correct"
]
},
"execution_count": 33,
@@ -1726,7 +1758,7 @@
"id": "7a39ec7d-a27b-40ec-9d32-13f922cc764e",
"metadata": {},
"source": [
"*where this code sample is what I wrote for 9.2. Here is Gemini's reply:*\n",
"*where this code is what I wrote for 7.2. Here is Gemini's reply:*\n",
"\n",
"This is a great comparison. Your code is much more concise and \"Pythonic,\" using collections.Counter and in-place updates to reduce verbosity. It strips away a lot of the boilerplate that my solution included for robustness.\n",
"\n",
@@ -1734,7 +1766,7 @@
"- **My Code**: Scans the whole grid for 'S', which handles cases where the entry point isn't at the very top.\n",
"- **Your Code**: Relies on Python's robust dictionary handling to avoid index errors (if b-1 is negative, it just adds a key to the dict). This is clever, but effectively allows particles to wrap around or exist in \"negative space\" if not careful (though typically harmless for this specific puzzle input).\n",
"\n",
"*It seems to me that the instructions make it clear that the 'S' must be on the top line. (If there is no 'S' on the top line, my code will return 0, not crash.) Similarly, the puzzle implies that beams can't go outside the borders, but Gemini is right that if the input file **did** have a splitter in the first or last column then there could be a problem that I don't check for.*\n",
"***My rebuttal**: To me, the instructions make it clear that the 'S' must be on the top line. (If there is no 'S' on the top line, my code will return 0, not crash.) Similarly, the puzzle implies that beams can't go outside the borders, but Gemini is right that if the input file **did** have a splitter in the first or last column then there could be a problem that I don't check for.*\n",
"\n",
"*Next I asked Gemini: **Do you think your code could be improved with some more modularity/abstraction, such as having a separate function to get input?** It replied:*\n",
"\n",
@@ -1827,7 +1859,7 @@
"id": "23e17286-9534-4da1-8ec9-26a4333b8c70",
"metadata": {},
"source": [
"*I like this refactored version better. It produces the correct answer:*"
"*I like this refactored version better (although the \"`#1 #2 #3 #4`\" comments don't add much). This version produces the correct answer:*"
]
},
{
@@ -1839,7 +1871,7 @@
{
"data": {
"text/plain": [
"Puzzle 7.2: .0019 seconds, answer 422102272495018 correct"
"Puzzle 7.2: .0025 seconds, answer 422102272495018 correct"
]
},
"execution_count": 36,
@@ -1859,12 +1891,14 @@
"source": [
"# [Day 8](https://adventofcode.com/2025/day/8): Claude Opus 4.5\n",
"\n",
"*In this puzzle there is a collection of junction boxes with coordinates in 3D space. We are asked to connect them together, conneting the nerarest two first, and continuing for 1000 connections, and then report the product of the sizes of the three largest connected circuits.*\n",
"\n",
"*Here's Claude's code for **Part 1**:*"
]
},
{
"cell_type": "code",
"execution_count": 77,
"execution_count": 37,
"id": "fce446d7-b72f-4ade-b783-b40f4212b43b",
"metadata": {},
"outputs": [],
@@ -1966,22 +2000,22 @@
"id": "0a3f11c5-e193-41ad-b53c-4ca3aac11fc3",
"metadata": {},
"source": [
"*Here I parse the input and hand it to Claude's function, which computes the correct answer:*"
"*Claude's function computes the correct answer:*"
]
},
{
"cell_type": "code",
"execution_count": 78,
"execution_count": 38,
"id": "dd99ffbf-3304-49f5-8cfa-d5428edf0db7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 8.1: .8995 seconds, answer 24360 correct"
"Puzzle 8.1: 1.3418 seconds, answer 24360 correct"
]
},
"execution_count": 78,
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
@@ -1998,12 +2032,12 @@
"id": "2b1df990-73d9-4cb1-aeca-3bad52c86c8a",
"metadata": {},
"source": [
"*Here is Claude's **Part 2** code:*"
"*Here is Claude's **Part 2** code, in which we're asked about the X coordinates of the last two connections:*"
]
},
{
"cell_type": "code",
"execution_count": 79,
"execution_count": 39,
"id": "9ed8312b-4455-4f2d-bef6-67097898aefd",
"metadata": {},
"outputs": [],
@@ -2102,7 +2136,7 @@
},
{
"cell_type": "code",
"execution_count": 80,
"execution_count": 40,
"id": "64309157-e1d3-48fc-ba06-e5e66ba23026",
"metadata": {},
"outputs": [
@@ -2117,10 +2151,10 @@
{
"data": {
"text/plain": [
"Puzzle 8.2: .9127 seconds, answer 2185817796 correct"
"Puzzle 8.2: 1.0359 seconds, answer 2185817796 correct"
]
},
"execution_count": 80,
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
@@ -2137,12 +2171,14 @@
"source": [
"# [Day 9](https://adventofcode.com/2025/day/9): ChatGPT 5.1 Auto\n",
"\n",
"*We are given the (x, y) coordsinates of a collection of red tiles on the floor, and asked what is the largest rectangle with two red tiles as corners.*\n",
"\n",
"*For **Part 1**, I was getting tired of all the programs that have a `main` that reads from input and prints the answer, so I told ChatGPT: **Refactor to have a function that takes the points as input and returns the area** and got this:*"
]
},
{
"cell_type": "code",
"execution_count": 63,
"execution_count": 41,
"id": "aaf6b9df-b55d-4416-b806-167ad3b70fd5",
"metadata": {},
"outputs": [],
@@ -2191,17 +2227,17 @@
},
{
"cell_type": "code",
"execution_count": 81,
"execution_count": 42,
"id": "352cb801-67d7-4aaf-b0a1-1a59c1990c22",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 9.1: .0305 seconds, answer 4772103936 correct"
"Puzzle 9.1: .0333 seconds, answer 4772103936 correct"
]
},
"execution_count": 81,
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
@@ -2218,12 +2254,14 @@
"id": "5f8a09ab-0d49-4a6b-b8ea-33e4219af1eb",
"metadata": {},
"source": [
"*For **Part 2**, ChatGPT considered a [scanline algorithm](https://en.wikipedia.org/wiki/Scanline_rendering), and then went for a [flood fill](https://en.wikipedia.org/wiki/Flood_fill). It assumed \"I'm working with a grid that can have dimensions up to 2k x 2k, or 4 million cells, which seems manageable.\" ChatGPT is correct that it would be manageable under those asumptions, however the assumptions are wrong. I told it **The grid can have dimensions up to 100,000 x 100,000 and there are about 500 red tile points**. I also gave it my input file, so that it would have a chance of discovering the structure of thChatGPT then came up with a [ray casting](https://en.wikipedia.org/wiki/Ray_casting) approach using [grid compression](https://stackoverflow.com/questions/29528934/coordinate-compression): define a grid where each red tile is in a separate cell, where the x and y extent of that cell spans an integer range (which must be recorded elsewhere). Now the grid is only of size roughly 500<sup>2</sup> instead of 100,000<sup>2</sup>.*"
"*For **Part 2**, we're told to consider the sequence of tiles as the vertexes of a polygon, and find the largest rectangle inside the polygon.*\n",
"\n",
"*ChatGPT considered a [scanline algorithm](https://en.wikipedia.org/wiki/Scanline_rendering), and then went for a [flood fill](https://en.wikipedia.org/wiki/Flood_fill). It assumed \"I'm working with a grid that can have dimensions up to 2k x 2k, or 4 million cells, which seems manageable.\" ChatGPT is correct that it would be manageable under those asumptions, however the assumptions are wrong. I told it **The grid can have dimensions up to 100,000 x 100,000 and there are about 500 red tile points**. I also gave it my input file, so that it would have a chance of discovering the structure of thChatGPT then came up with a [ray casting](https://en.wikipedia.org/wiki/Ray_casting) approach using [grid compression](https://stackoverflow.com/questions/29528934/coordinate-compression): define a grid where each red tile is in a separate cell, where the x and y extent of that cell spans an integer range (which must be recorded elsewhere). Now the grid is only of size roughly 500<sup>2</sup> instead of 100,000<sup>2</sup>.*"
]
},
{
"cell_type": "code",
"execution_count": 83,
"execution_count": 43,
"id": "16aa5c15-e5d5-4652-83d8-d99da9196617",
"metadata": {},
"outputs": [],
@@ -2397,17 +2435,17 @@
},
{
"cell_type": "code",
"execution_count": 85,
"execution_count": 44,
"id": "3388880d-6df9-4453-8b54-43175e3077e2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 9.2: .8001 seconds, answer 1529675217 correct"
"Puzzle 9.2: .8590 seconds, answer 1529675217 correct"
]
},
"execution_count": 85,
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
@@ -2424,12 +2462,14 @@
"source": [
"# [Day 10](https://adventofcode.com/2025/day/10): Gemini 3 Pro\n",
"\n",
"*We are given some descriptions of machines; see [AoC Day 10](https://adventofcode.com/2025/day/10) or [my other notebook](Advent-2025.ipynb) for details.*\n",
"\n",
"*Gemini had no problem with **Part 1:***"
]
},
{
"cell_type": "code",
"execution_count": 87,
"execution_count": 45,
"id": "da67e5ff-d1f7-418c-8d55-7dacea3a51b6",
"metadata": {},
"outputs": [],
@@ -2547,7 +2587,7 @@
},
{
"cell_type": "code",
"execution_count": 88,
"execution_count": 46,
"id": "6175af72-5ab6-4ca3-9825-613dbf1ad4f4",
"metadata": {},
"outputs": [
@@ -2561,10 +2601,10 @@
{
"data": {
"text/plain": [
"Puzzle 10.1: .0041 seconds, answer 441 correct"
"Puzzle 10.1: .0044 seconds, answer 441 correct"
]
},
"execution_count": 88,
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
@@ -2586,7 +2626,7 @@
},
{
"cell_type": "code",
"execution_count": 89,
"execution_count": 47,
"id": "8326ffe4-c435-4823-a0c4-3c062c52a06f",
"metadata": {},
"outputs": [],
@@ -2781,17 +2821,17 @@
},
{
"cell_type": "code",
"execution_count": 90,
"execution_count": 48,
"id": "25b94f1e-98ad-4bda-9b75-716dd6a503d3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Puzzle 10.2: 11.2032 seconds, answer 18559 correct"
"Puzzle 10.2: 11.4070 seconds, answer 18559 correct"
]
},
"execution_count": 90,
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
@@ -2810,12 +2850,12 @@
"\n",
" Since SciPy version 1.9.0, there is a dedicated Mixed-Integer Linear Programming (MILP) solver called scipy.optimize.milp\n",
" \n",
"*which was something I had just figured out an hour or so earlier. The milp solution is faster:*"
"*which was something I had just figured out an hour or so earlier. The C++ milp solution is (unsurpringly) faster than the Python Gaussian elimination code:*"
]
},
{
"cell_type": "code",
"execution_count": 92,
"execution_count": 49,
"id": "3f666d8d-caa6-4ec4-a75d-d23f6bf2a21a",
"metadata": {},
"outputs": [],
@@ -2894,7 +2934,7 @@
},
{
"cell_type": "code",
"execution_count": 93,
"execution_count": 50,
"id": "8e9d177e-d7b2-4887-a1be-4f4c5247924f",
"metadata": {},
"outputs": [
@@ -2908,10 +2948,10 @@
{
"data": {
"text/plain": [
"Puzzle 10.2: .1225 seconds, answer 18559 correct"
"Puzzle 10.2: .1177 seconds, answer 18559 correct"
]
},
"execution_count": 93,
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
@@ -2926,11 +2966,11 @@
"id": "f0fcbdff-1b95-4147-87c7-c6b4f0c08c80",
"metadata": {},
"source": [
"*I was wondering if Gemini had any ideas that would make a search-based approach feasible, so I prompted **Can you solve it with search?** and Gemini gave a depth-first search function that correctly solves small machines. Gemini identified the idea that operators are commutative, and applied the ordering of \"always press a button that affects the first light that is not at the goal.\" Gemini confidently states*\n",
"*I had been unable to make a search-based approach feasible, and was wondering if Gemini had any good ideas, so I prompted **Can you solve it with search?** and Gemini gave a depth-first search function that correctly solves small machines. Gemini correctly identified the idea that operators are commutative, and applied the ordering of \"always press a button that affects the first light that is not at the goal.\" Gemini confidently states*\n",
"\n",
" This drastically cuts down the search tree, making it solvable in milliseconds for typical puzzle inputs.\n",
"\n",
"*and it is true that the idea cuts down the search tree, but it still takes seconds, not milliseconds for even moderate size machines, and would probably take hours for some of the machines in the input. So in the end neither I nor Gemini can solve this efficiently with search, but both of us can figure out how to call **milp** to solve it.*"
"*and it is true that the idea cuts down the search tree, but it still takes seconds, not milliseconds for even moderate size machines, and would probably take hours for some of the machines in the input. So in the end neither I nor Gemini can solve this efficiently with search, but both of us can figure out how to call **milp** to solve it (and Gemini's z3 code probably also works).*"
]
},
{
@@ -3060,6 +3100,8 @@
"source": [
"# [Day 11](https://adventofcode.com/2025/day/11): Claude Opus 4.5\n",
"\n",
"*We are given inputs like `qxn: mow khk`, whihc means that device `qxn` outputs to `mow` and `khk`, and are asked how many distinct output paths there are from the device named `you` to the device named `out`.*\n",
"\n",
"*Claude had no trouble solving **Part 1**. It even volunteered two possible implementations of `count_paths`. One thing was strange:*\n",
"\n",
" # Convert to frozenset for caching\n",
@@ -3189,7 +3231,7 @@
"id": "76301e25-9c6b-41dd-8390-3eac05f3da38",
"metadata": {},
"source": [
"*For **Part 2** Claude again gets it right:*"
"*For **Part 2** we are asked how many paths from `svr` to `out` visit both `dac` and `fft`, and Claude again gets it right:*"
]
},
{
@@ -3285,7 +3327,7 @@
{
"data": {
"text/plain": [
"Puzzle 11.2: .0015 seconds, answer 306594217920240 correct"
"Puzzle 11.2: .0016 seconds, answer 306594217920240 correct"
]
},
"execution_count": 55,
@@ -3305,9 +3347,11 @@
"source": [
"# [Day 12](https://adventofcode.com/2025/day/12): ChatGPT 5.2 Auto\n",
"\n",
"***Note:*** *Since 3 days ago, ChatGPT has a new version, 5.2. I don't notice a big difference, but I only did this one interaction.*\n",
"***Note:*** *ChatGPT has a new version now: 5.2. I don't notice a big difference from 5.1, but I only did this one interaction.*\n",
"\n",
"*In my prompt I included my actual input, because that is key to the shortcut for solving the problem (which I covered in [**my notebook**](Advent-2025.ipynb)). ChatGPT didn't detect the shortcut and wrote code to rotate the shapes and search through possible placements. It did have the check for `total_area > W * H`, so it is able to efficiently reject the regions with too many presents. But for the regions where there is a trivial fit into 3x3 squares, ChatGPT's code still tries to pack them in tightly rather than doing the simple layout.* "
"*We are given some 3x3 grids describing the shapes of some oddly-shaped Christmas presents, thena re given some regions with given width and length, and asked if a specified number of presents of each kind can fit in the region.*\n",
"\n",
"*In my prompt I included my actual input, because that is key to the shortcut for solving the problem (which I covered in [**my notebook**](Advent-2025.ipynb)). ChatGPT didn't detect the shortcut and wrote code to rotate the shapes and search through possible placements. ChatGPT did have the check for `total_area > W * H`, so it is able to instantly reject the regions with too many presents (about half of them). But for the regions where there is a trivial fit into 3x3 squares, ChatGPT's code still tries to pack them in tightly rather than doing the simple layout.* "
]
},
{
@@ -3574,7 +3618,7 @@
{
"data": {
"text/plain": [
"Puzzle 12.1: 195.3963 seconds, answer 454 correct"
"Puzzle 12.1: 211.1226 seconds, answer 454 correct"
]
},
"execution_count": 57,
@@ -3596,7 +3640,7 @@
"source": [
"# Summary of Run Times\n",
"\n",
"*All the models and the human eventually got all the correct answers. Here are the run times and some comments.*\n",
"*Here are the run times and some comments.*\n",
"\n",
"*The LLM lines-of-code count is 5 times the human count. The LLM run times are roughly double the human-written run times, if we throw out 12.1, where the human noticed the trick and the LLM didn't. But all the solutions run in under a second, so run time is not a big deal.*"
]
@@ -3632,7 +3676,7 @@
" 11.1\tClaude\t.023\t.001\t83\t11\tEasy; LLM has a bit of vestigial code\n",
" 11.2\tClaude\t.001\t.001\t77\t11\tEasy\n",
" 12.1\tChatGPT\t3min\t.002\t238\t20\tHuman saw shortcut to avoid search; LLM wrote search functions\n",
" TOTAL 3.29\t1.60 1672\t324\tHuman-written code is 5x briefer 2x faster"
" TOTAL 3.29\t1.60 1672\t324\tHuman-written code is 5x briefer 2x faster, even ignoring 12.1"
]
},
{
@@ -3645,36 +3689,36 @@
"name": "stdout",
"output_type": "stream",
"text": [
"Puzzle 1.1: .0012 seconds, answer 1182 correct\n",
"Puzzle 1.2: .0015 seconds, answer 6907 correct\n",
"Puzzle 2.1: .1193 seconds, answer 23560874270 correct\n",
"Puzzle 2.2: .1395 seconds, answer 44143124633 correct\n",
"Puzzle 3.1: .0041 seconds, answer 17085 correct\n",
"Puzzle 3.2: .0073 seconds, answer 169408143086082 correct\n",
"Puzzle 4.1: .0160 seconds, answer 1569 correct\n",
"Puzzle 4.2: .0884 seconds, answer 9280 correct\n",
"Puzzle 5.1: .0050 seconds, answer 635 correct\n",
"Puzzle 1.1: .0015 seconds, answer 1182 correct\n",
"Puzzle 1.2: .0018 seconds, answer 6907 correct\n",
"Puzzle 2.1: .1342 seconds, answer 23560874270 correct\n",
"Puzzle 2.2: .1546 seconds, answer 44143124633 correct\n",
"Puzzle 3.1: .0043 seconds, answer 17085 correct\n",
"Puzzle 3.2: .0080 seconds, answer 169408143086082 correct\n",
"Puzzle 4.1: .0174 seconds, answer 1569 correct\n",
"Puzzle 4.2: .1065 seconds, answer 9280 correct\n",
"Puzzle 5.1: .0064 seconds, answer 635 correct\n",
"Puzzle 5.2: .0002 seconds, answer 369761800782619 correct\n",
"Puzzle 6.1: .0070 seconds, answer 5877594983578 correct\n",
"Puzzle 6.2: .0057 seconds, answer 11159825706149 correct\n",
"Puzzle 7.1: .0008 seconds, answer 1681 correct\n",
"Puzzle 7.2: .0019 seconds, answer 422102272495018 correct\n",
"Puzzle 8.1: .8497 seconds, answer 24360 correct\n",
"Puzzle 8.2: .8570 seconds, answer 2185817796 correct\n",
"Puzzle 9.1: .0312 seconds, answer 4772103936 correct\n",
"Puzzle 9.2: .7880 seconds, answer 1529675217 correct\n",
"Puzzle 10.1: .0065 seconds, answer 441 correct\n",
"Puzzle 10.2: .1212 seconds, answer 18559 correct\n",
"Puzzle 6.1: .0079 seconds, answer 5877594983578 correct\n",
"Puzzle 6.2: .0065 seconds, answer 11159825706149 correct\n",
"Puzzle 7.1: .0011 seconds, answer 1681 correct\n",
"Puzzle 7.2: .0025 seconds, answer 422102272495018 correct\n",
"Puzzle 8.1: 1.3418 seconds, answer 24360 correct\n",
"Puzzle 8.2: 1.0359 seconds, answer 2185817796 correct\n",
"Puzzle 9.1: .0333 seconds, answer 4772103936 correct\n",
"Puzzle 9.2: .8590 seconds, answer 1529675217 correct\n",
"Puzzle 10.1: .0044 seconds, answer 441 correct\n",
"Puzzle 10.2: .1177 seconds, answer 18559 correct\n",
"Puzzle 11.1: .0007 seconds, answer 574 correct\n",
"Puzzle 11.2: .0015 seconds, answer 306594217920240 correct\n",
"Puzzle 12.1: 195.3963 seconds, answer 454 correct\n",
"Puzzle 11.2: .0016 seconds, answer 306594217920240 correct\n",
"Puzzle 12.1: 211.1226 seconds, answer 454 correct\n",
"\n",
"Time in seconds: sum = 198.450, mean = 8.628, median = .007, max = 195.396\n"
"Time in seconds: sum = 214.970, mean = 9.347, median = .008, max = 211.123\n"
]
}
],
"source": [
"summary(answers) # This is the most recent run; the chart above is from a previous run."
"summary(answers) # This is the most recent run of this notebook; the chart above is from a previous run."
]
}
],

View File

@@ -192,7 +192,7 @@
{
"data": {
"text/plain": [
"Puzzle 1.2: .1357 seconds, answer 6907 correct"
"Puzzle 1.2: .1418 seconds, answer 6907 correct"
]
},
"execution_count": 6,
@@ -242,7 +242,7 @@
{
"data": {
"text/plain": [
"Puzzle 1.2: .0010 seconds, answer 6907 correct"
"Puzzle 1.2: .0009 seconds, answer 6907 correct"
]
},
"execution_count": 8,
@@ -387,7 +387,7 @@
{
"data": {
"text/plain": [
"Puzzle 2.1: .0027 seconds, answer 23560874270 correct"
"Puzzle 2.1: .0028 seconds, answer 23560874270 correct"
]
},
"execution_count": 12,
@@ -471,7 +471,7 @@
{
"data": {
"text/plain": [
"Puzzle 2.1: .0027 seconds, answer 23560874270 correct"
"Puzzle 2.1: .0029 seconds, answer 23560874270 correct"
]
},
"execution_count": 15,
@@ -493,7 +493,7 @@
{
"data": {
"text/plain": [
"Puzzle 2.2: .0038 seconds, answer 44143124633 correct"
"Puzzle 2.2: .0037 seconds, answer 44143124633 correct"
]
},
"execution_count": 16,
@@ -679,7 +679,7 @@
{
"data": {
"text/plain": [
"Puzzle 3.1: .0007 seconds, answer 17085 correct"
"Puzzle 3.1: .0006 seconds, answer 17085 correct"
]
},
"execution_count": 23,
@@ -851,7 +851,7 @@
{
"data": {
"text/plain": [
"Puzzle 4.2: 1.2546 seconds, answer 9280 correct"
"Puzzle 4.2: 1.2620 seconds, answer 9280 correct"
]
},
"execution_count": 29,
@@ -900,7 +900,7 @@
{
"data": {
"text/plain": [
"Puzzle 4.2: .1461 seconds, answer 9280 correct"
"Puzzle 4.2: .1446 seconds, answer 9280 correct"
]
},
"execution_count": 31,
@@ -1040,7 +1040,7 @@
{
"data": {
"text/plain": [
"Puzzle 5.1: .0123 seconds, answer 635 correct"
"Puzzle 5.1: .0111 seconds, answer 635 correct"
]
},
"execution_count": 35,
@@ -1288,7 +1288,7 @@
{
"data": {
"text/plain": [
"Puzzle 6.2: .0064 seconds, answer 11159825706149 correct"
"Puzzle 6.2: .0065 seconds, answer 11159825706149 correct"
]
},
"execution_count": 43,
@@ -1500,7 +1500,7 @@
{
"data": {
"text/plain": [
"Puzzle 7.1: .0007 seconds, answer 1681 correct"
"Puzzle 7.1: .0011 seconds, answer 1681 correct"
]
},
"execution_count": 51,
@@ -1554,7 +1554,7 @@
{
"data": {
"text/plain": [
"Puzzle 7.2: .0014 seconds, answer 422102272495018 correct"
"Puzzle 7.2: .0020 seconds, answer 422102272495018 correct"
]
},
"execution_count": 53,
@@ -1678,7 +1678,7 @@
{
"data": {
"text/plain": [
"Puzzle 8.1: .6336 seconds, answer 24360 correct"
"Puzzle 8.1: .6055 seconds, answer 24360 correct"
]
},
"execution_count": 56,
@@ -1698,7 +1698,7 @@
"source": [
"### Part 2: What is the product of the X coordinates of the last two junction boxes connected?\n",
"\n",
"Now we are asked to connect all the boxes into one circuit, still closest first, and compute the product of the X coordinates of the final pair of boxes to be connected. Unfortunately I don't see a way to make `greedy_connect` do both parts; I'll have to copy and paste most of `greedy_connect` to form `last_connected`, which keeps connecting until all the boxes are connected into one circuit, and then returns the two boxes that made the final connection. "
"Now we are asked to connect all the boxes into one circuit, still closest first, and compute the product of the X coordinates of the final pair of boxes to be connected. Unfortunately I don't see a clean way to make the `greedy_connect` function do both parts of the puzle; I'll have to copy and paste most of `greedy_connect` to form `last_connected`, which keeps connecting until all the boxes are connected into one circuit, and then returns the two boxes that made the final connection. "
]
},
{
@@ -1718,7 +1718,8 @@
" if len(new_circuit) == len(boxes):\n",
" return (A, B)\n",
" for C in new_circuit:\n",
" circuits[C] = new_circuit"
" circuits[C] = new_circuit\n",
" raise ValueError('Should never get here.')"
]
},
{
@@ -1730,7 +1731,7 @@
{
"data": {
"text/plain": [
"Puzzle 8.2: .6273 seconds, answer 2185817796 correct"
"Puzzle 8.2: .6371 seconds, answer 2185817796 correct"
]
},
"execution_count": 58,
@@ -1748,7 +1749,7 @@
"id": "fa988909-1a8b-4e8c-aca6-c53af99bc0b6",
"metadata": {},
"source": [
"Today's puzzles had the slowest run times yet. I could perhaps make them faster by mutating sets rather than forming a new tuple for each new circuit, but I think gains from that would be small, and since the run time is still under a second, I'll leave the code as is."
"Today's puzzles had the slowest run times yet. I could perhaps make them faster by mutating circuits rather than forming a new tuple for each new circuit, or by using a Union-Find data structure, but I think gains from that would be small, and since the run time is still under a second, I'll leave the code as is."
]
},
{
@@ -1846,7 +1847,7 @@
{
"data": {
"text/plain": [
"Puzzle 9.1: .0272 seconds, answer 4772103936 correct"
"Puzzle 9.1: .0278 seconds, answer 4772103936 correct"
]
},
"execution_count": 61,
@@ -1996,7 +1997,7 @@
},
{
"cell_type": "code",
"execution_count": 95,
"execution_count": 65,
"id": "922d721e-5330-466b-8a54-8a273522c44d",
"metadata": {},
"outputs": [],
@@ -2095,7 +2096,7 @@
},
{
"cell_type": "code",
"execution_count": 94,
"execution_count": 68,
"id": "d3b44691-da52-4794-ab77-bc4326aa6ca2",
"metadata": {},
"outputs": [
@@ -2103,8 +2104,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 1.56 s, sys: 4.09 ms, total: 1.56 s\n",
"Wall time: 1.56 s\n"
"CPU times: user 2.33 s, sys: 166 ms, total: 2.5 s\n",
"Wall time: 1.6 s\n"
]
},
{
@@ -2113,7 +2114,7 @@
"True"
]
},
"execution_count": 94,
"execution_count": 68,
"metadata": {},
"output_type": "execute_result"
}
@@ -2237,7 +2238,7 @@
{
"data": {
"text/plain": [
"Puzzle 10.1: .0569 seconds, answer 441 correct"
"Puzzle 10.1: .0535 seconds, answer 441 correct"
]
},
"execution_count": 71,
@@ -2389,7 +2390,7 @@
{
"data": {
"text/plain": [
"Puzzle 10.2: .1128 seconds, answer 18559 correct"
"Puzzle 10.2: .1304 seconds, answer 18559 correct"
]
},
"execution_count": 75,
@@ -2651,7 +2652,7 @@
{
"data": {
"text/plain": [
"Puzzle 11.2: .0016 seconds, answer 306594217920240 correct"
"Puzzle 11.2: .0017 seconds, answer 306594217920240 correct"
]
},
"execution_count": 83,
@@ -2753,7 +2754,7 @@
"source": [
"### Part 1: How many of the regions can fit all of the presents listed?\n",
"\n",
"There have been Tetris-like puzzles in past AoC years. Is this another search problem? If so, will the searches be fast or slow? I want to get a feel for it. First, how many regions?"
"There have been Tetris-like puzzles in past AoC years. Is this another search problem? If so, will the searches be trivial or difficult? I want to get a feel for it. First, how many regions?"
]
},
{
@@ -2840,7 +2841,7 @@
"id": "af3fe830-aa75-469b-9046-3f36ff3a03e2",
"metadata": {},
"source": [
"Next I want to get a feel for the variation in how tight the packing is. Each present can definitely fit into a 3x3 square, so what's the ratio of the total quantity of presents to the number of 3x3 squares? I'll make a histogram of that ratio for each region:"
"Next I want to get a feel for the variation in how tight the packing is. Each present can definitely fit into a 3x3 square, so what's the ratio of the total quantity of presents to the number of 3x3 squares? I'll make a histogram of that ratio, which I'll call the occupancy ratio, for each region:"
]
},
{
@@ -2862,7 +2863,7 @@
],
"source": [
"def squares(width, length) -> int: \n",
" \"Number of full 3x3 squares in a region.\"; \n",
" \"Number of full 3x3 squares in a region.\"\n",
" return (width // 3) * (length // 3)\n",
" \n",
"occupancy_ratios = [sum(quantities) / squares(W, L) \n",
@@ -2876,9 +2877,9 @@
"id": "54f2eaa0-b8b6-4a60-bcde-28eeefa26e8c",
"metadata": {},
"source": [
"**Very interesting!** There's a real split. A lot of regions have an occupabncy ratio below 1.0 and thus are trivially easy to fit into the region, and the rest of the regions with occupancy ratios above 1.35 may well be impossible to fit. I say that because, just looking at the shapes, I estimate that the most you could overlap a shape onto another would be to save two \".\" squares; , so I could see getting to an occupancy rato of 1 + 2/9 = 1.22, but I don't think it is possible to get to 1.35. I can prove it is impossible to fit all the presents in a region if the total area of the solid parts of the presents (the '#' squares) is more than the area of the region (the width times length). \n",
"**Very interesting!** There's a real split. About half the regions have an occupancy ratio below 1.0 and thus are trivially easy to fit into the region: just plop onee present into each 3x3 square, without worrying about rotations. The rest of the regions with occupancy ratios above 1.35 may well be impossible to fit. I say that because, just looking at the shapes, I estimate that the average overlap of one shape with another would be to save two \".\" squares; so I could see getting to an occupancy rato of 1 + 2/9 = 1.22, but I don't think it is possible to get to 1.35. I can prove it is impossible to fit all the presents in a region if the total area of the solid parts of the presents (the '#' squares) is more than the total area of the region (the width times length). \n",
"\n",
"I can do triage on the regions to classify each one as a trivial fit, an impossible fit, or uncertain:"
"I can do **triage** on the regions to classify each one as a trivial fit, an impossible fit, or an uncertain fit (for which we would have to do a search):"
]
},
{
@@ -2888,14 +2889,14 @@
"metadata": {},
"outputs": [],
"source": [
"shape_area = [cat(shape).count('#') for shape in shapes] # Total number of '#' in each shape\n",
"shape_area = [cat(shape).count('#') for shape in shapes] # List of total number of '#' in each shape\n",
"\n",
"def triage(region) -> str:\n",
" \"\"\"Decide if a region's presents trivially fit, or are impossible to fit, or it is uncertain.\"\"\"\n",
" width, length, quantities = region\n",
" presents_area = sum(q * shape_area[i] for (i, q) in enumerate(quantities))\n",
" if sum(quantities) <= squares(width, length):\n",
" return 'fit' # The number of presents is no more than the number of 3x3 squares\n",
" return 'fit' # The total number of presents is no more than the number of 3x3 squares\n",
" elif presents_area > width * length:\n",
" return 'impossible' # The '#' area of all the presents is greater than the area of the region\n",
" else:\n",
@@ -2982,42 +2983,34 @@
"output_type": "stream",
"text": [
"Puzzle 1.1: .0005 seconds, answer 1182 correct\n",
"Puzzle 1.2: .0010 seconds, answer 6907 correct\n",
"Puzzle 2.1: .0027 seconds, answer 23560874270 correct\n",
"Puzzle 2.2: .0038 seconds, answer 44143124633 correct\n",
"Puzzle 3.1: .0007 seconds, answer 17085 correct\n",
"Puzzle 1.2: .0009 seconds, answer 6907 correct\n",
"Puzzle 2.1: .0029 seconds, answer 23560874270 correct\n",
"Puzzle 2.2: .0037 seconds, answer 44143124633 correct\n",
"Puzzle 3.1: .0006 seconds, answer 17085 correct\n",
"Puzzle 3.2: .0021 seconds, answer 169408143086082 correct\n",
"Puzzle 4.1: .0572 seconds, answer 1569 correct\n",
"Puzzle 4.2: .1461 seconds, answer 9280 correct\n",
"Puzzle 5.1: .0123 seconds, answer 635 correct\n",
"Puzzle 4.2: .1446 seconds, answer 9280 correct\n",
"Puzzle 5.1: .0111 seconds, answer 635 correct\n",
"Puzzle 5.2: .0002 seconds, answer 369761800782619 correct\n",
"Puzzle 6.1: .0025 seconds, answer 5877594983578 correct\n",
"Puzzle 6.2: .0064 seconds, answer 11159825706149 correct\n",
"Puzzle 7.1: .0007 seconds, answer 1681 correct\n",
"Puzzle 7.2: .0014 seconds, answer 422102272495018 correct\n",
"Puzzle 8.1: .6336 seconds, answer 24360 correct\n",
"Puzzle 8.2: .6273 seconds, answer 2185817796 correct\n",
"Puzzle 9.1: .0272 seconds, answer 4772103936 correct\n",
"Puzzle 6.2: .0065 seconds, answer 11159825706149 correct\n",
"Puzzle 7.1: .0011 seconds, answer 1681 correct\n",
"Puzzle 7.2: .0020 seconds, answer 422102272495018 correct\n",
"Puzzle 8.1: .6055 seconds, answer 24360 correct\n",
"Puzzle 8.2: .6371 seconds, answer 2185817796 correct\n",
"Puzzle 9.1: .0278 seconds, answer 4772103936 correct\n",
"Puzzle 9.2: .0156 seconds, answer 1529675217 correct\n",
"Puzzle 10.1: .0569 seconds, answer 441 correct\n",
"Puzzle 10.2: .1128 seconds, answer 18559 correct\n",
"Puzzle 10.1: .0535 seconds, answer 441 correct\n",
"Puzzle 10.2: .1304 seconds, answer 18559 correct\n",
"Puzzle 11.1: .0003 seconds, answer 574 correct\n",
"Puzzle 11.2: .0016 seconds, answer 306594217920240 correct\n",
"Puzzle 11.2: .0017 seconds, answer 306594217920240 correct\n",
"Puzzle 12.1: .0018 seconds, answer 454 correct\n",
"\n",
"Time in seconds: sum = 1.715, mean = .075, median = .003, max = .634\n"
"Time in seconds: sum = 1.710, mean = .074, median = .003, max = .637\n"
]
}
],
"source": [
"def summary(answers: dict):\n",
" \"\"\"Summary report on the answers.\"\"\"\n",
" for day in sorted(answers):\n",
" print(answers[day])\n",
" times = [answer.secs for answer in answers.values()]\n",
" def stat(fn, times): return f'{fn.__name__} = {fn(times):.3f}'\n",
" print('\\nTime in seconds:', ', '.join(_zap0(stat(fn, times)) for fn in (sum, mean, median, max)))\n",
" \n",
"summary(answers)"
]
},
@@ -3026,14 +3019,16 @@
"id": "1098c017-2746-403c-bfb0-1a08cacc835d",
"metadata": {},
"source": [
"I solved all the puzzles and they run in under 2 seconds of total run time. Happy Advent everyone, and thank you Eric for the interesting puzzles!\n",
"I solved all the puzzles and they run in under 2 seconds of total run time. \n",
"\n",
"Happy Advent everyone, and thank you [Eric Wastl](https://was.tl/) for the interesting puzzles!\n",
"\n",
"<p></p>\n",
"\n",
"<p></p>\n",
"\n",
"<img src=\"https://files.mastodon.social/media_attachments/files/115/708/315/279/158/735/original/77b18cf6b3fba015.jpg\" width=500 alt=\"Gary Grady cartoon\">\n",
"<a href=\"https://mastodon.social/@garygrady\">Gary Grady @GaryGrady</a>\n"
"And thank you <a href=\"https://mastodon.social/@garygrady\">Gary Grady @GaryGrady</a> for the drawings!\n"
]
}
],