After solving AoC in golang first, now I try to rewrite it in zig.
While doing it I try to improve the speed of my solutions. There will be no asm tweaks here, but I try to do my best.
Here are a couple similar repository:
Benchmark times below measured on a Raspberry Pi 4 with 8GB RAM.
Zig version 0.9.1 with -Drelease-fast
build option.
We compare to array window (a
and b
) when a < b
we increment our counter.
a: idx-3 + idx-2 + idx-1
b: idx-2 + idx-1 + idx
As idx-2
and idx-1
are in a
and b
too this can be simplified to idx-3 < idx
.
This way it fits in u13
, using @intCast(u15)
is not necessary.
Gain: 271 μs -> 270 μs :-)
Nothing smart here. I could check only the first character of each command, but it does not really matter.
Using std.heap.FixedBufferAllocator
instead of
std.heap.GeneralPurposeAllocator
helps.
Gain: 4780 μs -> 2650 μs
Using std.ArrayList
instead of std.BufSet
and bitwise operators instead of
strings makes things faster. ~2200 μs with std.testing.allocator
and ~950 μs
with std.heap.FixedBufferAllocator
.
I have tried to be smart whith the data structure here: Using an AutoHashMap
to store the position of each number in the grid, so I do not have to loop
through the whole grid when searching for a number.
This turns out quite slow: ~2x15 ms. Takeaway: When the grid is small (in this case 5x5) looping through it is quite fast anyway, it will compensate the HashMap overhead.
I am using ArrayList
for holding the numbers and boards just like the first
version, but the Board
representation is vastly simplified. Items are stored
in a 5x5 array and I keep track of marked numbers in row
and col
which are
[5]u5
so it is easier to check if we reached a bingo, do not have to loop through
the board every time.
This version runs around 1200-1400 μs, so 10x faster than the first version.
The first version used AutoHashMap
to store vents data. With my input on a
1000x1000 grid there are 98435 points, so it does not make sense after all.
Runtime was 50 ms for the first part, so I just gave up that point.
Using an [1000][1000]u2
array for stroring vents. Nothing fancy. Runtime is
around 5ms each.
The main idea here is that we do not try to keep track of the fishes, just following how many fish is in a given age. This way the solution is simple and fast.
In first part we need the median of the numbers, in second part we need the mean of them. https://www.geeksforgeeks.org/program-for-mean-and-median-of-an-unsorted-array/
I get the median by sorting the array (which is O(n*log n) worst case), but there are methods to do it O(n) time.
The mean is computed while parsing the items. For fuel stepping I just use
Gauss' trick: 1+2+3+...+n = n(n+1)/2
. This saves me a loop.
Instead of brute forcing the result deduct the unknown numbers from the already
known ones. I am also using a u7
for representing the seven segments. This way
filling them up also takes care of sorting.
This day does not need any fancy idea. I just walk through the items to get the lowest in first part.
In second part I start from every low point and collect every neighbour (and
neighbour of neighbours) which is lowert than 9 recursively. Already visited
points are stored in a std.AutoHashMap(point, void)
. I collect the
basins to a std.PriorityQueue
, so I do not have to sort them at the end. It is
probably useless, but was fun to use this data type too.
TODO: Using a [100][100]bool
array for stroring the seen items might speed
things up.
In the first part I just store the opening pairs in a std.ArrayList(u8)
. I
quit parsing as soon as a pair mismatch occures and count the correspoinding
fail value.
In second part there is an other std.ArryList()
for storing the completion
scores. The score is computed reading the first array backward.
I just used a [10][10]u4
array for stroring the grid items. Zig has a neat
+|=
operator, so the grid items will not overflow because of the continous
flashing. I just flash all the neighbours in 8 direction recursively.
The second part is very similar. I just check if every item in the grid is 0
with fullFlash()
. I break out from the infinite loop when it returns true.
I used a std.StringHashMap(std.ArrayList([]const u8))
for storing the routes
from every cave to the adjacent ones and a std.StringHashMap(void)
for storing
the seen ones.
In first part I recursively check all the caves starting from start
cave,
trying all the paths increasing the counter on every end
reach. I only collect
small caves in seen HashMap.
In second part the seen is a std.StringHashMap(u2)
, so I can store how many
times a small cave is seen. (Using +|=
to avoid overflow).
These solutions are slow: 7ms for the first part and 272 ms for the second which is way too much.
Instead of a std.StringHashMap(void)
I have used a bitmap for representing
seen caves (this could be further simplified by only tracking lowercase caves.
In the second part I use an other bit for marking if double lowercase cave is
used or not.
The main optimization here is using memoization for storing the alredy generated states.
First part is 1.2 ms, second part is 2.5ms (more than 100x faster).
I use an std.AutoArrayHashMap
for storing the points on the map, it is a bit
faster to iterate through than std.AutoHashMap
. Nothing special.
I store how often a given character pair appears in a std.StringHashMap
. In
every step I just create a new map based on the previous one.
The second part takes ~15ms, this could be improved.
This one is solved by a Dijksra algorithm. I use a std.PriorityQueue
for
storing grid points to visit. I can not use queue.update()
because of some
limitations in the std lib implementation, so I just keep adding the grid
points. It turns out to be faster than updating with std.meta.eql()
. This
solution is ~240ms for the second part.
I convert from hex to binary as string with switch
. Not sure if this is good
or bad. The rest if slicing and parsing. Nothing fancy.
This first part is a simple calculation. In second part I just brute force the solutions with the following optimizations:
My go solution uses strings to solve this with regexes and so. I have tried to
replicate that in zig, but it needs a regex lib (which is not in the std
) and
turns out to be quite slow. The first part takes ~500ms on my
computer, so I have to find a better way.
Instead of strings I use the following node structure for storing items:
const SnailItem = union(enum) {
num: u6,
pair: *SnailNumber,
}
const SnailNumber = struct {
left: ?SnailItem,
right: ?SnailItem,
parent: ?SnailNumber,
}
The first step is parsing the string to this, every other manipulation is done
on the SnailNumber
. First part drops down to 10ms, second part is ~200ms.
Using a brute force solution here, looping through all the scanners and beacons. Comparing them to already known absolute beacon positions. At start any scanner can be the (0,0,0) and there beacons are the absolute ones.
We also have to rotate the coordinates (3*8 rotation) before subtracting the absolute ones. When 12 matching item found we add those beacons to the absolute ones.
We store the absolute beacon coordinates in an std.AutoArrayHashMap
, so it is
easier to loop through them.
This solution is quite slow, first and second part takes more than 5 seconds each.
TODO: Figure out how to avoid checking all scanner, beacon and rotation against the absolute ones. Can we rule out certain scanners based on distance or something else? (Sort of done, but I am not satisfied with the result, especially the hard coded magic numbers.)
The image enhancement algorithm string converted to a std.StaticBitSet
. I use
a u9
for generating the index with bitmasking. This works fine.
I keep the lit items in a std.AutoHashMap
, but generally the picture grid is
just an array of coordinates.
I have replaced std.AutoHashMap
with std.DynamicBitSet
as each pixes is
either lit or not. This makes the code much faster.
First part is ~3792 us, second part is 190 ms.
First part is very easy, not much to talk about. The second part is much harder.
It takes forever without memoization.
Using an std.AutoHashMap
for the cache. There are 16k+ items in the cache and
more than 255k hits for chache items with my input. Runs ~70ms on the rpi4.
The first part is fast, as we only have to check the items in -50..50
range.
The second part is much harder: We use coordinate compression: Each cube switches on and off a range of coordinates, but we do not check them one-by-one, just keeep track of the start and end of each range. This makes this solution much faster, but it is still not fast enough.
We use create subcube only when two cubes are intersecting and switch the cubes
on and off based on the new input. This way we keep the number of cubes low
(~7000 cubes vs. the 840840840 items in the previous version).
We keep the subcubes in std.AutoArrayHashMap
so iterating is fast. Running
time is ~95ms.
I use Dijksra to solve this one. The various state representation is the following:
const Amphipod = enum { A, B, C, D, };
const hallway_length = 7;
const siderooms = 4;
const sideroom_depth = 2; // 4 int the second part
const State = struct {
hallway: [hallway_length]?Amphipod,
rooms: [siderooms][sideroom_depth]?Amphipod,
};
I do not think that I am doing any particularly smart with the moves. But I move the Amphipod to it's room whenever that is possible to lower state count.
I store state and it's energy in a std.PriorityQueue
. I do not update queue
energy values as adding new ones is faster and I do not drain the queue, just
return when reached final state.
TODO:
std.AutoHashMap
which
strores the already visited states is the slow part.The hard part of this task figuring out which input lines matter (which I did not, so...) The solution is fast as it is.
I just use a [137][139]u8
for storing grid values and change them in place. To
change them in place I had to take special care for capturing all the the
row/col state before changing them, but it is bearable. Running time is ~250ms.
TODO:
std.StaticBitSet(grid_rows*grid_cols)
to store occupied or
free state of a given grid point.const Ocean = struct {
easts: []usize,
souths: []usize,
grid: std.StaticBitSet(grid_rows*grid_cols)
}
This way every coordinate is comes as Coord{ .x = value / grid_cols, .y = value % grid_cols}
. Might worth a try.
⋊> ~/c/aoc2021-zig on master > zig build bench -Drelease-fast
Day 1a result: 1400 time: 297us
Day 1b result: 1429 time: 297us
Day 2a result: 1893605 time: 212us
Day 2b result: 2120734350 time: 216us
Day 3a result: 3847100 time: 174us
Day 3b result: 4105235 time: 912us
Day 4a result: 11536 time: 1054us
Day 4b result: 1284 time: 1519us
Day 5a result: 8060 time: 6568us
Day 5b result: 21577 time: 7945us
Day 6a result: 350149 time: 37us
Day 6b result: 1590327954513 time: 35us
Day 7a result: 335271 time: 420us
Day 7b result: 95851339 time: 156us
Day 8a result: 381 time: 248us
Day 8b result: 1023686 time: 645us
Day 9a result: 550 time: 514us
Day 9b result: 1100682 time: 4798us
Day 10a result: 411471 time: 557us
Day 10b result: 3122628974 time: 923us
Day 11a result: 1700 time: 913us
Day 11b result: 273 time: 2214us
Day 12a result: 5212 time: 1065us
Day 12b result: 134862 time: 2265us
Day 13a result: 802 time: 2173us
Day 13b result: 103 time: 5834us
Day 14a result: 3411 time: 3225us
Day 14b result: 7477815755570 time: 13915us
Day 15a result: 366 time: 7948us
Day 15b result: 2829 time: 241226us
Day 16a result: 967 time: 1373us
Day 16b result: 12883091136209 time: 1429us
Day 17a result: 5778 time: 1us
Day 17b result: 2576 time: 28285us
Day 18a result: 3675 time: 10034us
Day 18b result: 4650 time: 190114us
Day 19a result: 342 time: 1063723us
Day 19b result: 9668 time: 1066491us
Day 20a result: 5503 time: 3756us
Day 20b result: 19156 time: 188746us
Day 21a result: 679329 time: 4us
Day 21b result: 433315766324816 time: 75514us
Day 22a result: 607573 time: 14265us
Day 22b result: 1267133912086024time: 95714us
Day 23a result: 16244 time: 636858us
Day 23b result: 43226 time: 1388284us
Day 24a result: 99299513899971 time: 122us
Day 24b result: 93185111127911 time: 118us
Day 25a result: 432 time: 233175us
Total running time: 5357ms
C:\Users\voroskoia\zig\aoc2021-zig>zig build bench -Drelease-fast
Day 1a result: 1400 time: 41us
Day 1b result: 1429 time: 43us
Day 2a result: 1893605 time: 44us
Day 2b result: 2120734350 time: 25us
Day 3a result: 3847100 time: 107us
Day 3b result: 4105235 time: 221us
Day 4a result: 11536 time: 136us
Day 4b result: 1284 time: 196us
Day 5a result: 8060 time: 846us
Day 5b result: 21577 time: 690us
Day 6a result: 350149 time: 1us
Day 6b result: 1590327954513 time: 1us
Day 7a result: 335271 time: 68us
Day 7b result: 95851339 time: 16us
Day 8a result: 381 time: 52us
Day 8b result: 1023686 time: 104us
Day 9a result: 550 time: 108us
Day 9b result: 1100682 time: 932us
Day 10a result: 411471 time: 96us
Day 10b result: 3122628974 time: 269us
Day 11a result: 1700 time: 109us
Day 11b result: 273 time: 244us
Day 12a result: 5212 time: 222us
Day 12b result: 134862 time: 499us
Day 13a result: 802 time: 235us
Day 13b result: 103 time: 958us
Day 14a result: 3411 time: 433us
Day 14b result: 7477815755570 time: 1514us
Day 15a result: 366 time: 1156us
Day 15b result: 2829 time: 28947us
Day 16a result: 967 time: 248us
Day 16b result: 12883091136209 time: 207us
Day 17a result: 5778 time: 0us
Day 17b result: 2576 time: 5552us
Day 18a result: 3675 time: 1661us
Day 18b result: 4650 time: 22754us
Day 19a result: 342 time: 133007us
Day 19b result: 9668 time: 146975us
Day 20a result: 5503 time: 1768us
Day 20b result: 19156 time: 64990us
Day 21a result: 679329 time: 0us
Day 21b result: 433315766324816 time: 9780us
Day 22a result: 607573 time: 2271us
Day 22b result: 1267133912086024time: 12200us
Day 23a result: 16244 time: 99033us
Day 23b result: 43226 time: 197525us
Day 24a result: 99299513899971 time: 46us
Day 24b result: 93185111127911 time: 42us
Day 25a result: 432 time: 44443us
Total running time: 731ms