"*I enjoy doing the [**Advent of Code**](https://adventofcode.com/) (AoC) programming puzzles, and writing down [**my solutions**](Advent2025.ipynb). In order to understand each puzzle, you'll have to look at the problem descriptions at [**Advent of Code**](https://adventofcode.com/2025) for each [**Day**](https://adventofcode.com/2025/day/1), and you can also look at [my solutions](Advent2025.ipynb).**\n",
"\n",
"*In **this notebook** I show some solutions by various AI Large Language Models (LLMs): Gemini, Claude, and ChatGPT. All the code here is written by an LLM. My comments (like this one) are in italics, and my prompts given to the AI LLMs are in **bold italics**. Sometimes I quote the LLM's non-code responses; those are in* regular roman font.\n",
"\n",
"*This notebook shows some solutions by Gemini, Claude, and ChatGPT, three AI Large Language Models (LLMs), for the\n",
"2025 [**Advent of Code**](https://adventofcode.com/) (AoC) programming puzzles. In order to understand each puzzle, you'll have to look at the problem descriptions at [**Advent of Code**](https://adventofcode.com/2025) for each [**Day**](https://adventofcode.com/2025/day/1), and you can also look at [**my solutions**](Advent2025.ipynb), which I did before turning to the LLMs.*\n",
"\n",
"*All the code in this notebook is written by an LLM (except for the one line where I call the LLM's code for each puzzle). My comments (like this one) are in italics, and my prompts given to the LLMs are in **bold italics**. Sometimes I quote the LLM's responses; those are in* regular roman font.\n",
"\n",
"*Each day I'll choose a model and give it the prompt* \"***Write code to solve the following problem:***\" *followed by the full text of the AoC Part 1 problem description. Then I'll prompt again with* \"***There is a change to the specification:***\" *followed by the AoC Part 2 description.*\n",
"\n",
"*For brevity, I have removed some of the LLM output, such as:*\n",
"- *Prose analysis of the problem and descriptions of the programs. (In most cases these were accurate and thorough!)*\n",
"- *The \"`#!/usr/bin/env python3`\" at the top of a file*\n",
"- *The \"`__main__`\" idiom for running code from the command line*\n",
"- *Usage examples on how to run the program on test input*\n",
"# Conclusions\n",
"\n",
"*Now that the 12 days are finished, here are my conclusions:*\n",
"\n",
"- *Overall, the LLMs did very well, producing code that gives the correct answer to every puzzle.*\n",
"- *The run time were reasonably fast, all under a second, except for 12.1, which took about 3 minutes.*\n",
"- *The three LLMS seemed to be roughly equal in quality.*\n",
"- *The LLMs knew the things you would want an experienced engineer to know, and applied them at the right time:*\n",
" - *How to see through the story about elves and christmas trees, etc. and getting to the real programming issues*\n",
" - *O*(*n*) *reasoning (such as knowing that enumerating up to √n is better than up to n)*\n",
" - *stacks and queues*\n",
" - *sorting and merging*\n",
" - *when to use sets versus lists*\n",
" - *handling a 2D grid of points with 4 or 8 directional neighbors*\n",
" - *accumulating sums in a defaultdict or Counter*\n",
" - *advanced data structures such as Union-Find and dancing links*\n",
" - *computational geometry algorithms including scantiness, flood fill, and ray-casting*\n",
" - *recognizing an integer linear programming problem and knowing how to call a package*\n",
" - *depth-first search, and recognizing search properties such as commutativity of actions*\n",
" - *data classes*\n",
" - *sometimes type annotations (but not always)*\n",
" - *sometimes good doc strings and comments (but not always, and sometimes too many comments).*\n",
"- *Problems 9.2 and 12.1 had a \"trick\" that allowed for a simpler, faster solution. For these I gave ChatGPT my input file, so it would have a chance of finding the trick. It didn't, but perhaps it was trying to solve the general problem over all possible inputs, whereas I understood that if AoC persents a trick input to me, they will present the same trick input to everyone.*\n",
"- *Much of the code (from all 3 LLMs) could be improved stylistically. In many cases the code was one long function that does the parsing of input, intermediate processing, and final results, clearly violating Robert Maartin's maxim of \"Functions should do one thing.\" But maybe if you're vibe coding and not even looking at the code produced by the LLM, this doesn't matter?*\n",
"- *The LLMs produced code that was a lot more verbose than mine; their lines-of-code count is about 5 times mine. There are a few reasons:*\n",
" - *I benefited from a few key utility functions to do things like \"return a tuple of all the integers in a text string.\" For some problems I could parse the input in 2 lines of code, while the LLM would take 20.*\n",
" - *The LLMs were being extra robust in doing error checking, while I recognized that within the bounds of AoC the input will always follow the prescribed format exactly.*\n",
" - *I use a functional style; the LLMs were imperative. So my code was `sum(int(joltage(bank)) for bank in banks)` whereas ChatGPT's was:*\n",
"\n",
"# Day 0\n",
" total = 0\n",
" for line in input_text.strip().splitlines():\n",
" line = line.strip()\n",
" if not line:\n",
" continue\n",
" total += max_joltage_for_bank(line)\n",
" return total\n",
"\n",
"***Note:*** *For brevity, I have removed some of the LLM output, such as:*\n",
"- *Usage examples on how to run the program on the test input*.\n",
"- *Prose analysis of the problem, descriptions of the programs, and chain of thought thinking.*\n",
" - *In most cases these were accurate and thorough! But they were too long to include here.*\n",
"- *The \"`#!/usr/bin/env python3`\" and \"`__main__`\" idioms and `sys.stdin.readline()` for command line execution.*\n",
" - *In retrospect, I should have specified in the prompt that \"**You can get the input text as a single string with `get_text()`***\n",
"\n",
"# Day 0: Human\n",
"\n",
"*On Day 0 I load my [**AdventUtils.ipynb**](AdventUtils.ipynb) and set the`current_year` so I can access my input files with `get_text(day_number)` and can use my `answer` function to verify whether the AI systems get the right answer.*"
"*On Day 0 I load my [**AdventUtils.ipynb**](AdventUtils.ipynb) so I can access two of my utilities:*\n",
"- `get_text(day)` returns the complete text of the input file for `current_year` and `day`\n",
"- `answer` verifies whether the LLM got the right answer to the problem, and records the run time.*"
"*For Day 1, **Part 1** I started with the Gemini 3 Pro Fast model, which produced this code:*"
"*The [Day 1 **Part 1**](https://adventofcode.com/2025/day/1) puzzle is about turning the dial on a safe and counting how many times the pointer ends up at 0.*\n",
"\n",
"*I started with the Gemini 3 Pro Fast model, which produced this code:*"
]
},
{
@@ -60,7 +101,7 @@
"metadata": {},
"outputs": [],
"source": [
"def solve_safe_password(rotations):\n",
"def solve_safe_password(input_str):\n",
" \"\"\"\n",
" Simulates the safe dial rotations and counts how many times the dial\n",
" lands on 0.\n",
@@ -71,6 +112,10 @@
" Returns:\n",
" The total number of times the dial points at 0 after a rotation.\n",
" \"\"\"\n",
" \n",
" # Split the single string into individual lines\n",
" rotations = input_str.strip().split('\\n')\n",
" \n",
" # The dial starts at 50\n",
" current_position = 50\n",
" zero_count = 0\n",
@@ -118,7 +163,7 @@
"id": "c5219309-4f2f-49b3-8208-8c7a3526e328",
"metadata": {},
"source": [
"*A bit verbose and over-commented, but pretty nice. Below I fetch the Day 1 data with `get_text(1)`, split it into lines (as expected by Gemini's `solve_safe_password` function), and verify that the answer is correct:*"
"*A bit verbose and over-commented (and under-type-annotated), but pretty nice. Below I fetch my Day 1 data with `get_text(1)`, pass it to `solve_safe_password`, and verify that Gemini's answer is correct:*"
"*For **Part 2** I gave Gemini Fast the updated instructions, and again it had a nice analysis and the following code:*"
"*For **Part 2** I gave Gemini Fast the updated instructions (that we should count a zero crossing in the middle of a rotation), and again it had a nice analysis and the following code:*"
]
},
{
@@ -162,7 +207,7 @@
"source": [
"import math\n",
"\n",
"def solve_new_safe_password(rotations):\n",
"def solve_new_safe_password(input_str):\n",
" \"\"\"\n",
" Simulates the safe dial rotations and counts every time the dial\n",
" points at 0, including during the rotation movement.\n",
@@ -173,6 +218,10 @@
" Returns:\n",
" The total number of times the dial points at 0.\n",
" \"\"\"\n",
" \n",
" # Split the single string into individual lines\n",
" Simulates the safe dial rotations and counts every time the dial\n",
" points at 0 (Method 0x434C49434B).\n",
" \n",
" Correctly handles cases where the dial starts at 0.\n",
" \"\"\"\n",
" \n",
" # Split the single string into individual lines\n",
" rotations = input_str.strip().split('\\n')\n",
" \n",
" current_position = 50\n",
" total_zero_crossings = 0\n",
" MODULO = 100\n",
@@ -356,7 +409,7 @@
{
"data": {
"text/plain": [
"Puzzle 1.2: .0015 seconds, answer 6907 correct"
"Puzzle 1.2: .0018 seconds, answer 6907 correct"
]
},
"execution_count": 7,
@@ -422,7 +475,7 @@
"id": "2bd0db00-952b-47e5-b787-b3887b7539f1",
"metadata": {},
"source": [
"*This code is overall rather nice, but conspicously lacks comments and doc strings. It uses the more efficient \"enumerate over the first half of the digit string\" strategy, but is not precise in narrowing down the range it enumerates over. For example, for the range \"999000-109000\", this code will enumerate the range (100, 1000), when it could enumerate just the range (999, 1000).*\n",
"*This code is overall rather nice, but conspicously lacks comments and doc strings. (If you can't tell, the goal is to count the number of invalid IDs, which are numbers that consist of the same digit string repeated twice, like \"100100\", within some ID ranges.) It uses the more efficient \"enumerate over the first half of the digit string\" strategy, but is not precise in narrowing down the range it enumerates over. For example, for the range \"999000-109000\", this code will enumerate the range (100, 1000), when it could enumerate just the range (999, 1000).*\n",
"\n",
"*I verified that the code gives the correct answer:*"
"*When given the **Part 2** instructions, Claude wrote the following code:*"
"*When given the **Part 2** instructions, Claude wrote the following code (where invalid IDs now can be any number of repetitions, like \"100100100\"):*"
"*For **Part 1** ChatGPT gave a very brief analysis of the problem and then wrote a program that was designed to be called from the command line, using the `\"__main__\"` idiom. I told it \"**I don't want to run it like that, I want a function that I can pass the input text and have it return an int**\" and it produced this code (conspicuously lacking comments or doc strings):*"
"*The puzzle today is to pick the biggest two-digit number from a string of digits, like \"87\" from \"8675305\". Return the sum over all digit strings.*\n",
"\n",
"*For **Part 1** ChatGPT gave a very brief analysis of the problem and produced this code (conspicuously lacking comments or doc strings):*"
]
},
{
@@ -579,7 +634,7 @@
"id": "6c2da1ae-d915-4f37-9494-bbe427ef87ad",
"metadata": {},
"source": [
"I verified this is correct:"
"*I wish it had made a separate function to parse the input, and I don't know why it made `max_joltage_for_bank` an inner function (remember: `import this` says flat is better than nested), but I verified the code is correct:*"
]
},
{
@@ -591,7 +646,7 @@
{
"data": {
"text/plain": [
"Puzzle 3.1: .0041 seconds, answer 17085 correct"
"Puzzle 3.1: .0043 seconds, answer 17085 correct"
]
},
"execution_count": 13,
@@ -600,10 +655,10 @@
}
],
"source": [
"banks = get_text(3)\n",
"input_text = get_text(3)\n",
"\n",
"answer(3.1, 17085, lambda:\n",
" total_output_joltage(banks))"
" total_output_joltage(input_text))"
]
},
{
@@ -611,7 +666,9 @@
"id": "4a07f37f-c5e3-4484-a7b1-2cae0ff5bd01",
"metadata": {},
"source": [
"*For **Part 2** ChatGPT did well (and for some reason included comments and doc strings this time):*"
"*For **Part 2** *we want a 12-digit number, not 2-digit.**\n",
"\n",
"*ChatGPT did well (and for some reason included comments and doc strings, which were missing the first time):*"
"*Gemini produced a solution to **Part 1** that is straightforward and efficient, although perhaps could use some abstraction (e.g. if they had a function to count neighbors, they wouldn't need the `continue`).*"
"*We are given a 2D map and asked how many squares have a \"@\" that is surrounded by fewer than 4 other \"@\" out of 8 neighbors.*\n",
"\n",
"*Gemini produced a solution to **Part 1** that is straightforward and efficient, although perhaps could use some abstraction (e.g. if they had a function to count neighbors, they wouldn't need the `continue` in the main loop).*"
]
},
{
@@ -755,7 +814,9 @@
"id": "1e12bc4c-8cc8-4c01-b4ad-5392b49642e6",
"metadata": {},
"source": [
"Here I verify that the code does produce the correct answer:"
"*I really like that the 8 `directions` are laid out in a 2D pattern that corresponds to where the directions actually go.*\n",
"\n",
"*Here I verify that the code does produce the correct answer:*"
]
},
{
@@ -767,7 +828,7 @@
{
"data": {
"text/plain": [
"Puzzle 4.1: .0160 seconds, answer 1569 correct"
"Puzzle 4.1: .0174 seconds, answer 1569 correct"
]
},
"execution_count": 17,
@@ -776,10 +837,10 @@
}
],
"source": [
"grid = get_text(4)\n",
"grid_str = get_text(4)\n",
"\n",
"answer(4.1, 1569, lambda:\n",
" count_accessible_rolls(grid))"
" count_accessible_rolls(grid_str))"
]
},
{
@@ -787,6 +848,8 @@
"id": "76845e18-6209-4995-a825-568782b79d0d",
"metadata": {},
"source": [
"*Now we're asked if we repeatedly remove \"@\" with fewer than 4 neighbors, how many are left in the end?*\n",
"\n",
"*For **Part 2**, Gemini chose the \"repeatedly remove acessible rolls\" approach rather than the more efficient (and arguable more elegant) \"remove a roll and check its neighbors\" approach.*"
]
},
@@ -860,7 +923,7 @@
{
"data": {
"text/plain": [
"Puzzle 4.2: .3818 seconds, answer 9280 correct"
"Puzzle 4.2: .4197 seconds, answer 9280 correct"
]
},
"execution_count": 19,
@@ -870,7 +933,7 @@
],
"source": [
"answer(4.2, 9280, lambda:\n",
" count_total_removable_rolls(grid))"
" count_total_removable_rolls(grid_str))"
]
},
{
@@ -880,7 +943,7 @@
"source": [
"*I asked Gemini \"Can you do it more efficiently\" and it replied*\n",
"\n",
"The previous implementation scanned the entire grid repeatedly (`O(Rows * Cols * Iterations)`). I've replaced it with a Worklist/Queue algorithm that checks neighbors only when necessary. This reduces the complexity to be roughly proportional to the number of rolls, which is significantly faster for large grids.\n",
"The previous implementation scanned the entire grid repeatedly (`O(Rows * Cols * Iterations)`). I've replaced it with a Worklist/Queue algorithm that checks neighbors only when necessary. This reduces the complexity to be roughly proportional to the number of rolls, which is significantly faster for large grids.\n",
"\n",
"*This is the same approach that I used in my solution.*"
]
@@ -969,7 +1032,7 @@
{
"data": {
"text/plain": [
"Puzzle 4.2: .0884 seconds, answer 9280 correct"
"Puzzle 4.2: .1065 seconds, answer 9280 correct"
]
},
"execution_count": 21,
@@ -979,7 +1042,7 @@
],
"source": [
"answer(4.2, 9280, lambda:\n",
" count_total_removable_rolls(grid))"
" count_total_removable_rolls(grid_str))"
]
},
{
@@ -989,6 +1052,8 @@
"source": [
"# [Day 5](https://adventofcode.com/2025/day/5): Claude Opus 4.5\n",
"\n",
"*We are asked how many ingredient IDs from a list of IDs are fresh, according to a list of fresh ID ranges.*\n",
"\n",
"*Claude produces a straightforward program that solves **Part 1** just fine and demonstrates good use of abstraction. This time it has nice doc strings; for Day 2 it had none. Go figure.*"
]
},
@@ -1063,7 +1128,7 @@
{
"data": {
"text/plain": [
"Puzzle 5.1: .0050 seconds, answer 635 correct"
"Puzzle 5.1: .0064 seconds, answer 635 correct"
]
},
"execution_count": 23,
@@ -1083,7 +1148,7 @@
"id": "dabe3fbc-0fa6-46d0-adfb-f6413f3a63d5",
"metadata": {},
"source": [
"*For **Part 2**, Claude chose to sort ranges, then merge them, then count the total of the sizes of the ranges. That's a reasonable approach–it involves an extra step of merging ranges (instead of just counting on the fly) but it could be useful to have the merged ranges.*"
"*For **Part 2**, we are asked how many possible fresh IDs are there, whether they are in inventory or not. Claude chose to sort ranges (good choice), then merge them, then count the total of the sizes of the ranges. That's a reasonable approach–it involves an extra step of merging ranges (instead of just counting on the fly) but it could be useful to have the merged ranges.*"
"*We are asked to solve some math problems written in an unusal format (vertical instead of horizontal, with some special rules).*\n",
"\n",
"*For **Part 1** ChatGPT produced a program that is correct, but has poor abstraction, with one long 63-line function. (It also contains a pet peeve of mine: in lines 17–20 the pattern \"`if some_boolean: True else: False`\" can always be replaced with \"`some_boolean`\".)*"
"*For **Part 2** we get an even-longer function, with even more complications. For example in \"Determine the operator\"; the 14 lines could be replaced with `op = (add if '+' in segment else mul)`. I think ChatGPT ws trying to be robust against errors in the input.*"
"*For **Part 2** we learn that the details of the format are even more ideosyncratic: the exact placement of spaces is crucial. (Here, as in each day's problem, see [my other notebook](Advent-2025.ipynb) or [the official AoC site](https://adventofcode.com/2025/day/6) for details*.)\n",
"\n",
"*ChatGPT writes an even-longer function, with even more complications. For example in \"Determine the operator\"; the 14 lines could be replaced with `op = (add if '+' in segment else mul)`. I think ChatGPT ws trying to be robust against errors in the input.*"
"*We are given a 2D grid of characters where a beam enters at the top and moves downward, but is split to both sides by a \"`^`\" character. We need to compute the total number of split beams at the bottom.*\n",
"\n",
"*Gemini's code for **Part 1** is a bit verbose, but gets the job done.*"
]
},
@@ -1526,7 +1597,7 @@
{
"data": {
"text/plain": [
"Puzzle 7.1: .0008 seconds, answer 1681 correct"
"Puzzle 7.1: .0011 seconds, answer 1681 correct"
]
},
"execution_count": 31,
@@ -1546,7 +1617,9 @@
"id": "c56012a6-b9e7-4053-84c1-79fa457436b1",
"metadata": {},
"source": [
"*For **Part 2**, Gemini gets the key ideas (although it is a bit old-fashioned to use a `defaultdict(int)` instead of a `Counter`).*"
"*For **Part 2**, we are asked how many ways can the beams get to the bottom.*\n",
"\n",
"*Gemini gets the key ideas (although it is a bit old-fashioned to use a `defaultdict(int)` instead of a `Counter`).*"
"*My friend Barry Hayes suggested I ask Gemini whose code it preferred, its or mine, So I asked Gemini: \"**How would you compare your code to this:\"***"
"*My friend Barry Hayes suggested I ask Gemini \"**How would you compare your code to this:\"***"
]
},
{
@@ -1685,7 +1758,7 @@
"id": "7a39ec7d-a27b-40ec-9d32-13f922cc764e",
"metadata": {},
"source": [
"*Here is Gemini's reply:*\n",
"*where this code is what I wrote for 7.2. Here is Gemini's reply:*\n",
"\n",
"This is a great comparison. Your code is much more concise and \"Pythonic,\" using collections.Counter and in-place updates to reduce verbosity. It strips away a lot of the boilerplate that my solution included for robustness.\n",
"\n",
@@ -1693,7 +1766,7 @@
"- **My Code**: Scans the whole grid for 'S', which handles cases where the entry point isn't at the very top.\n",
"- **Your Code**: Relies on Python's robust dictionary handling to avoid index errors (if b-1 is negative, it just adds a key to the dict). This is clever, but effectively allows particles to wrap around or exist in \"negative space\" if not careful (though typically harmless for this specific puzzle input).\n",
"\n",
"*It seems to me that the instructions make it clear that there must be an 'S' on the top line, although technically it isn't stated explicitly. (If there is no 'S' on the top line, my code will return 0, not crash.) Similarly, the puzzle implies that beams can't go outside the borders, but Gemini is right that if the input file did have a splitter in the first or last column then there could be a problem that I don't check for.*\n",
"***My rebuttal**: To me, the instructions make it clear that the 'S' must be on the top line. (If there is no 'S' on the top line, my code will return 0, not crash.) Similarly, the puzzle implies that beams can't go outside the borders, but Gemini is right that if the input file **did** have a splitter in the first or last column then there could be a problem that I don't check for.*\n",
"\n",
"*Next I asked Gemini: **Do you think your code could be improved with some more modularity/abstraction, such as having a separate function to get input?** It replied:*\n",
"\n",
@@ -1786,7 +1859,7 @@
"id": "23e17286-9534-4da1-8ec9-26a4333b8c70",
"metadata": {},
"source": [
"*I like this refactored version better. It produces the correct answer:*"
"*I like this refactored version better (although the \"`#1 #2 #3 #4`\" comments don't add much). This version produces the correct answer:*"
"# [Day 8](https://adventofcode.com/2025/day/8): Claude Opus 4.5\n",
"\n",
"*In this puzzle there is a collection of junction boxes with coordinates in 3D space. We are asked to connect them together, conneting the nerarest two first, and continuing for 1000 connections, and then report the product of the sizes of the three largest connected circuits.*\n",
"\n",
"*Here's Claude's code for **Part 1**:*"
]
},
@@ -1835,7 +1910,6 @@
"Then multiply together the sizes of the three largest circuits.\n",
"\"\"\"\n",
"\n",
"import sys\n",
"from collections import defaultdict\n",
"\n",
"def parse_input(text):\n",
@@ -1926,7 +2000,7 @@
"id": "0a3f11c5-e193-41ad-b53c-4ca3aac11fc3",
"metadata": {},
"source": [
"*Here I parse the input and hand it to Claude's function, which computes the correct answer:*"
"*Claude's function computes the correct answer:*"
"*For **Part 1**, ChatGPT wrote a `main` program that reads from input and prints the answer. I told it **Refactor to have a function that takes the points as input and returns the area** and got this:*"
"*We are given the (x, y) coordsinates of a collection of red tiles on the floor, and asked what is the largest rectangle with two red tiles as corners.*\n",
"\n",
"*For **Part 1**, I was getting tired of all the programs that have a `main` that reads from input and prints the answer, so I told ChatGPT: **Refactor to have a function that takes the points as input and returns the area** and got this:*"
"*For **Part 2**, ChatGPT considered a [scanline algorithm](https://en.wikipedia.org/wiki/Scanline_rendering), and then went for a [flood fill](https://en.wikipedia.org/wiki/Flood_fill). It assumed \"I'm working with a grid that can have dimensions up to 2k x 2k, or 4 million cells, which seems manageable.\" It would be manageable under those asumptions, but the assumptions are wrong. I told it **The grid can have dimensions up to 100,000 x 100,000 and there are about 500 red tile points**. ChatGPT then came up with [ray casting](https://en.wikipedia.org/wiki/Ray_casting) approach using [grid compression](https://stackoverflow.com/questions/29528934/coordinate-compression): define a grid where each red tile is in a separate cell, where the x and y extent of that cell spans an integer range (which must be recorded elsewhere). Now the grid is only of size roughly 500<sup>2</sup> instead of 100,000<sup>2</sup>.*"
"*For **Part 2**, we're told to consider the sequence of tiles as the vertexes of a polygon, and find the largest rectangle inside the polygon.*\n",
"\n",
"*ChatGPT considered a [scanline algorithm](https://en.wikipedia.org/wiki/Scanline_rendering), and then went for a [flood fill](https://en.wikipedia.org/wiki/Flood_fill). It assumed \"I'm working with a grid that can have dimensions up to 2k x 2k, or 4 million cells, which seems manageable.\" ChatGPT is correct that it would be manageable under those asumptions, however the assumptions are wrong. I told it **The grid can have dimensions up to 100,000 x 100,000 and there are about 500 red tile points**. I also gave it my input file, so that it would have a chance of discovering the structure of thChatGPT then came up with a [ray casting](https://en.wikipedia.org/wiki/Ray_casting) approach using [grid compression](https://stackoverflow.com/questions/29528934/coordinate-compression): define a grid where each red tile is in a separate cell, where the x and y extent of that cell spans an integer range (which must be recorded elsewhere). Now the grid is only of size roughly 500<sup>2</sup> instead of 100,000<sup>2</sup>.*"
"*We are given some descriptions of machines; see [AoC Day 10](https://adventofcode.com/2025/day/10) or [my other notebook](Advent-2025.ipynb) for details.*\n",
"\n",
"*Gemini had no problem with **Part 1:***"
]
},
{
@@ -2512,7 +2601,7 @@
{
"data": {
"text/plain": [
"Puzzle 10.1: .0065 seconds, answer 441 correct"
"Puzzle 10.1: .0044 seconds, answer 441 correct"
]
},
"execution_count": 46,
@@ -2521,8 +2610,10 @@
}
],
"source": [
"input_text = get_text(10)\n",
"\n",
"answer(10.1, 441, lambda:\n",
" solve(get_text(10)))"
" solve(input_text))"
]
},
{
@@ -2530,7 +2621,7 @@
"id": "f407a27f-f1ac-4c4a-bd46-649449c4dbf1",
"metadata": {},
"source": [
"*Part 2was trickier. Gemini's first solution used z3 (it even gave nice instructions for how to pip install z3), but I responded with the prompt **Can you do it without using z3?** to which Gemini wrote its own Gaussian elimination code:*"
"***Part 2*** *was trickier. Gemini's first solution used z3 (it even gave nice instructions for how to pip install z3), but I responded with the prompt **Can you do it without using z3?** to which Gemini wrote its own Gaussian elimination code:*"
]
},
{
@@ -2725,7 +2816,7 @@
"id": "89366a12-507d-4730-9be9-df757bb999c6",
"metadata": {},
"source": [
"*The part that says `if not free_cols: search(0, []) else: search(0, [])` is a bit unsettling, and I'm not a fan of `nonlocal` in this context, but the code works; the only downside is that it takes 10 seconds to run.*"
"*The part that says `if not free_cols: search(0, []) else: search(0, [])` is a bit unsettling, and I'm not a big fan of `nonlocal` in this context, but the code works; the only downside is that it takes about 10 seconds to run.*"
" Since SciPy version 1.9.0, there is a dedicated Mixed-Integer Linear Programming (MILP) solver called scipy.optimize.milp\n",
" \n",
"*which was something I had just figured out an hour or so earlier. The milp solution is faster:*"
"*which was something I had just figured out an hour or so earlier. The C++ milp solution is (unsurpringly) faster than the Python Gaussian elimination code:*"
"*I was wondering if Gemini had any ideas that would make a search-based approach feasible, so I prompted **Can you solve it with search?** and Gemini gave a depth-first search function that correctly solves small machines. Gemini identified the idea that operators are commutative, and applied the ordering of \"always press a button that affects the first light that is not at the goal.\" Gemini confidently states*\n",
"*I had been unable to make a search-based approach feasible, and was wondering if Gemini had any good ideas, so I prompted **Can you solve it with search?** and Gemini gave a depth-first search function that correctly solves small machines. Gemini correctly identified the idea that operators are commutative, and applied the ordering of \"always press a button that affects the first light that is not at the goal.\" Gemini confidently states*\n",
"\n",
" This drastically cuts down the search tree, making it solvable in milliseconds for typical puzzle inputs.\n",
"\n",
"*and it is true that the idea cuts down the search tree, but it still takes seconds, not milliseconds for even moderate size machines, and would probably take hours for some of the machines in the input. So in the end neither I nor Gemini can solve this efficiently with search, but both of us can figure out how to call **milp** to solve it.*"
"*and it is true that the idea cuts down the search tree, but it still takes seconds, not milliseconds for even moderate size machines, and would probably take hours for some of the machines in the input. So in the end neither I nor Gemini can solve this efficiently with search, but both of us can figure out how to call **milp** to solve it (and Gemini's z3 code probably also works).*"
]
},
{
@@ -3009,12 +3100,14 @@
"source": [
"# [Day 11](https://adventofcode.com/2025/day/11): Claude Opus 4.5\n",
"\n",
"*We are given inputs like `qxn: mow khk`, whihc means that device `qxn` outputs to `mow` and `khk`, and are asked how many distinct output paths there are from the device named `you` to the device named `out`.*\n",
"\n",
"*Claude had no trouble solving **Part 1**. It even volunteered two possible implementations of `count_paths`. One thing was strange:*\n",
"\n",
" # Convert to frozenset for caching\n",
" graph_tuple = {k: tuple(v) for k, v in graph.items()}\n",
"\n",
"*I can see where this is coming from: lists are not hashable, so you can't have them in a cache, but tuples are cachable. However, a dict is not hashable, so this doesn't help. In the end, Claude decided to use the inner function `dp`, so the graph is not in the cache anyway, and this step is not needed. Maybe Claude changed its mind and forgot to update this part of the code.*"
"*I can see where this is coming from: lists mutable and thus not hashable, so you can't have them in a cache, but frozensets and tuples are immutable and cachable. Maybe Claude was first going to convert `v` to a frozenset and then decided on a tuple? However, a dict is not hashable, so this doesn't help. In the end, Claude decided to use the inner function `dp`, so the graph is not in the cache anyway, and this step is not needed. Maybe Claude changed its mind and forgot to update this part of the code.*"
]
},
{
@@ -3138,7 +3231,7 @@
"id": "76301e25-9c6b-41dd-8390-3eac05f3da38",
"metadata": {},
"source": [
"*For **Part 2** Claude again gets it right:*"
"*For **Part 2** we are asked how many paths from `svr` to `out` visit both `dac` and `fft`, and Claude again gets it right:*"
]
},
{
@@ -3148,13 +3241,11 @@
"metadata": {},
"outputs": [],
"source": [
"#!/usr/bin/env python3\n",
"\"\"\"\n",
"Solution for the device path counting puzzle (Part 2).\n",
"Count all paths from 'svr' to 'out' that visit both 'dac' and 'fft'.\n",
"*In my prompt I included my actual input, because that is key to the shortcut for solving the problem (which I covered in [**my notebook**](Advent-2025.ipynb)). ChatGPT didn't detect the shortcut and wrote code to rotate the shapes and search through possible placements. It did have the check for `total_area > W * H`, so it is able to efficiently reject the regions with too many presents. But for the regions where there is a trivial fit into 3x3 squares, ChatGPT's code still tries to pack them in tightly rather than doing the simple layout.* "
"***Note:*** *ChatGPT has a new version now: 5.2. I don't notice a big difference from 5.1, but I only did this one interaction.*\n",
"\n",
"*We are given some 3x3 grids describing the shapes of some oddly-shaped Christmas presents, thena re given some regions with given width and length, and asked if a specified number of presents of each kind can fit in the region.*\n",
"\n",
"*In my prompt I included my actual input, because that is key to the shortcut for solving the problem (which I covered in [**my notebook**](Advent-2025.ipynb)). ChatGPT didn't detect the shortcut and wrote code to rotate the shapes and search through possible placements. ChatGPT did have the check for `total_area > W * H`, so it is able to instantly reject the regions with too many presents (about half of them). But for the regions where there is a trivial fit into 3x3 squares, ChatGPT's code still tries to pack them in tightly rather than doing the simple layout.* "
"# Summary, Comparison to Human Coder, Conclusions\n",
"\n",
"*Overall, the models did well, producing code that gives the correct answer in a reasonable run time. Some of the code could be improved stylistically. (But I guess if you're vibe coding and not looking at the code, maybe that doesn't matter.)*\n",
"| Day | LLM time (s) | Human time (s) | Model | Notes |\n",
"|-----|--------------|----------------|-------|-------|\n",
"| 8.2 | .835 | .618 | Claude | Easy, but LLM's Union-Find data type runs slower than simple approach |\n",
"| 9.1 | .027 | .037 | ChatGPT | Easy |\n",
"| 9.2 | .771 | .016 | ChatGPT | LLM code a bit complicated; human uses “2 point” trick for speedup |\n",
"| 10.1 | .005 | .001 | Gemini | Easy |\n",
"| 10.2 | .120 | .112 | Gemini | MILP solutions similar; LLM offers other solutions; neither does search efficiently |\n",
"| 11.1 | .023 | .001 | Claude | Easy; LLM has a bit of vestigial code |\n",
"| 11.2 | .001 | .001 | Claude | Easy |\n",
"| 12.1 | 195.396 | .002 | ChatGPT | Human saw shortcut to avoid search; LLM wrote search functions |\n",
"| TOTAL | 198.450 | 1.715 | all | Human-written code is a bit faster, and much faster on 12.1 |\n",
"*The LLM lines-of-code count is about 5 times the human count. The LLM run times are roughly double the human-written run times if we throw out 12.1, where the human noticed the trick and the LLM didn't. Aside from 12.1, all the solutions run in under a second, so run time is not a big deal.*"
]
},
{
"cell_type": "markdown",
"id": "49736354-51c0-4870-bc39-f7a2e955e196",
"metadata": {},
"source": [
" Day   Model     LLM     Human    LLM   Human\n",
"  #              Time    Time     LOC    LOC   Notes\n",
" ----  -------  ------  -------  -----  -----  -----\n",
" 1.1\tGemini\t.001\t.001\t51\t6\tStraightforward and easy for LLM and human.\n",
" 1.2\tGemini\t.002\t.001\t75\t11\tBoth LLM and human erred on the distance from 0 to 0.\n",
" 2.1\tClaude\t.121\t.003\t29\t17\tEasy\n",
" 2.2\tClaude\t.135\t.004\t35\t16\tBoth LLM and human found the more efficient half-digits approach\n",
" 3.1\tChatGPT\t.004\t.001\t22\t11\tEasy\n",
" 3.2\tChatGPT\t.007\t.002\t42\t14\tEasy\n",
" 4.1\tGemini\t.016\t.055\t44\t9\tEasy\n",
" 4.2\tGemini\t.379\t.139\t52\t8\tLLM chose the less efficient scan-whole-grid approach\n",
" 5.1\tClaude\t.005\t.012\t45\t11\tEasy\n",
" 5.2\tClaude\t.001\t.001\t58\t9\tEasy\n",
" 6.1\tChatGPT\t.008\t.002\t67\t7\tEasy; bad “if x: True else: False” idiom by LLM\n",
"### Part 2: What is the product of the X coordinates of the last two junction boxes connected?\n",
"\n",
"Now we are asked to connect all the boxes into one circuit, still closest first, and compute the product of the X coordinates of the final pair of boxes to be connected. Unfortunately I don't see a clean way to make the `greedy_connect` function do both parts of the puzzle; I'll have to copy and paste most of `greedy_connect` to form `last_connected`, which keeps connecting until all the boxes are connected into one circuit, and then returns the two boxes that made the final connection. "
"Today's puzzles had the slowest run times yet. I could perhaps make them faster by mutating circuits rather than forming a new tuple for each new circuit, or by using a Union-Find data structure, but I think gains from that would be small, and since the run time is still under a second, I'll leave the code as is."
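The Union-Find alternative mentioned above could look like this minimal sketch (my hypothetical version, not the code used here):

```python
class UnionFind:
    """Disjoint-set with path halving; an alternative to rebuilding circuit tuples."""
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        """Return the representative of x's circuit, compressing the path along the way."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the circuits containing a and b; return True if they were separate."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True
```

With this, a `last_connected` variant would union each closest pair in turn and report the pair whose union first leaves a single circuit.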
"### Part 1: How many of the regions can fit all of the presents listed?\n",
"\n",
"There have been Tetris-like puzzles in past AoC years. Is this another search problem? If so, will the searches be trivial or difficult? I want to get a feel for it. First, how many regions?"
]
},
{
@@ -2840,7 +2841,7 @@
"id": "af3fe830-aa75-469b-9046-3f36ff3a03e2",
"metadata": {},
"source": [
"Next I want to get a feel for the variation in how tight the packing is. Each present can definitely fit into a 3x3 square, so what's the ratio of the total quantity of presents to the number of 3x3 squares? I'll make a histogram of that ratio, which I'll call the occupancy ratio, for each region:"
]
},
{
@@ -2862,7 +2863,7 @@
],
"source": [
"def squares(width, length) -> int: \n",
"    \"Number of full 3x3 squares in a region.\"\n",
"    return (width // 3) * (length // 3)\n",
"**Very interesting!** There's a real split. About half the regions have an occupancy ratio below 1.0 and thus are trivially easy to fit into the region: just plop one present into each 3x3 square, without worrying about rotations. The rest of the regions, with occupancy ratios above 1.35, may well be impossible to fit. I say that because, just looking at the shapes, I estimate that the most the overlap of one shape with another could save is two \".\" squares, so I could see getting to an occupancy ratio of 1 + 2/9 = 1.22, but I don't think it is possible to get to 1.35. I can prove it is impossible to fit all the presents in a region if the total area of the solid parts of the presents (the '#' squares) is more than the total area of the region (the width times length). \n",
"\n",
"I can do **triage** on the regions to classify each one as a trivial fit, an impossible fit, or an uncertain fit (for which we would have to do a search):"
]
},
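The occupancy ratio can be computed directly (restating `squares` from above so this sketch is self-contained; the regions below are hypothetical, not puzzle input):

```python
def squares(width, length):
    """Number of full 3x3 squares in a width-by-length region."""
    return (width // 3) * (length // 3)

def occupancy_ratio(width, length, num_presents):
    """Presents per available 3x3 square; at most 1.0 means a trivial fit."""
    return num_presents / squares(width, length)
```

For example, a 9x9 region holds 9 full 3x3 squares, so 9 presents give a ratio of exactly 1.0, while 17 presents in a 12x9 region push the ratio past 1.35.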
{
@@ -2888,14 +2889,14 @@
"metadata": {},
"outputs": [],
"source": [
"shape_area = [cat(shape).count('#') for shape in shapes] # List of total number of '#' in each shape\n",
"\n",
"def triage(region) -> str:\n",
" \"\"\"Decide if a region's presents trivially fit, or are impossible to fit, or it is uncertain.\"\"\"\n",
" width, length, quantities = region\n",
" presents_area = sum(q * shape_area[i] for (i, q) in enumerate(quantities))\n",
" if sum(quantities) <= squares(width, length):\n",
" return 'fit' # The total number of presents is no more than the number of 3x3 squares\n",
" elif presents_area > width * length:\n",
" return 'impossible' # The '#' area of all the presents is greater than the area of the region\n",
" print('\\nTime in seconds:', ', '.join(_zap0(stat(fn, times)) for fn in (sum, mean, median, max)))\n",
" \n",
"summary(answers)"
]
},
@@ -3026,14 +3019,16 @@
"id": "1098c017-2746-403c-bfb0-1a08cacc835d",
"metadata": {},
"source": [
"I solved all the puzzles and they run in under 2 seconds of total run time. \n",
"\n",
"Happy Advent everyone, and thank you [Eric Wastl](https://was.tl/) for the interesting puzzles!\n",
"And thank you <a href=\"https://mastodon.social/@garygrady\">Gary Grady @GaryGrady</a> for the drawings!\n"
]
}
],