Entropy always increases

Sunday, January 05, 2025

ICPC World Finals 2024 solutions

I recently got around to looking at the problems from the 2024 ICPC World Finals, and noticed that they don't have test data or solution sketches posted. Fortunately Kattis has the problems in its online judge (https://icpc.kattis.com — the "official" online judge link is broken), so I was at least able to submit solutions to check correctness.

So here are my solution sketches below. In the cases of a few problems I had some intuition about a solution but I don't actually have a proof of correctness.

A: Billboards

It's easy enough to compute the total area of each value function. We can then easily scale the functions so that each has a total area of n (and hence each section must have a value of at least 1).

Let's suppose we already know the order in which we'll present the billboards. Then we can compute the partition points greedily: make each section in turn the length such that the value to the sponsor is 1. The last section has to end at l so it might be longer. Computing this requires a little maths. For each sponsor, we can keep a prefix sum of the areas of the piecewise-linear segments, and binary searching that gives us the segment where the area crosses 1. To find the exact x value we'll need to solve a quadratic equation, and handle a few special cases.

How do we choose the order? We can actually do this greedily too. Just repeatedly pick the sponsor that will allow us to use the smallest section of billboard. Initially, the total of the value functions of all sponsors is n². We're then eliminating one sponsor (with total value n), and a section of billboard which the other n - 1 sponsors value at 1 or less. So the remaining total of value functions of the other sponsors is at least n² - n - (n - 1) = (n - 1)². We can thus keep repeating the process, and see that at end the remaining value will still be least 0.

B: Bingo for the Win!

For a player to be last, one of their numbers must be the last number picked. Consider each distinct number that appears on a particular player P's board. If a slower player also has that number, then this number being last won't make P finish last. Otherwise, if this number is drawn last, P will finish last, and if the number appears m times in total (out of all nk numbers), it has a m/nk chance of being picked last. So for each player, add this up over all numbers on that player's board.

C: Citizenship

This is a fairly straightforward implementation problem. You have to convert dates into a more useful representation (e.g. days since 0000-01-01), and just iterate through possible answers until you find one that satisfies the conditions. One optimisation is that you can pre-compute a prefix sum of days away to quickly determine the number of days resident during any 365-day period.

D: Doubles Horseback Wrestling

I don't have a solid proof that the following works, but it is accepted. Note that two riders i and j are compatible if (and only if) \(l_i + l_j \le s\) and \(u_i + u_j \ge s\).

Sort the riders by l. Iterate through them in decreasing order of l, trying to find a match for each one (unless already matched). We can maintain a set S of riders j which satisfy \(l_i + l_j \le s\) (which grows each time l decreases). From S, take the rider with the smallest \(u_j\) such that \(u_i + u_j \ge s\) (if any exists).

The intuition is that any match currently in S will always remain in S (unless matched) later on, so there is no benefit to matching with any rider other than the one whose u value will be the hardest to match.

E: Flipping Container

We will try to consider all sequences of flips that might lead to the desired answer. Since that's a large (actually infinite) space, we'll try to identify symmetries that allow the search space to be cut down to a more reasonable size.

With "orientation" defined as it is, there are 6 possible orientations. We can write them as abc, cba, cab, bac, bca, acb, according to which sides are aligned to the x, y and z axes respectively. There are four possible transitions from each orientation: two of them will go to the next orientation in that sequence (wrapping around), and two to the previous one.

Each transition can also be labelled according to the amount it to adds to x or y (possibly negative). It follows that to determine where the container ends up, it's sufficient to know how many times each transition occurs, and the order doesn't matter. Conversely, if we decide how many times to use each transitions, then we can reach the corresponding position provided that the conditions for an Eulerian cycle are satisfied (connected and in-degree equals out-degree at every vertex).

Transitions occur in pairs with the same start and end orientation, corresponding to opposite edges to flip over. For example, from abc one can transition to cba either with x+=a or x-=c. To reduce the number of possibilities to cover, we'll start by just deciding how many times each pair appears, then later decide how to split that count amongst the elements of the pair. I'll call such a pair an "edge".

Between each adjacent pair of orientations there is an edge in each direction. Let's call the difference between the frequencies of these edges the "balance" of the pair of orientations. The Eulerian cycle condition requires the balance of each adjacent pair to be the same, and in fact this corresponds to the number of times the Eulerian cycle "winds" around the centre of the cycle. It's not hard to check that the effect of winding one way around is the same as winding the other way, so without loss of generality we can assume that this winding number/balance is either 0 or 1.

We also have to consider the connectivity requirement for an Eulerian cycle. If the window number is 1 this is trivial (we clearly visit all vertices). Otherwise, we can iterate to decide how many vertices we proceed clockwise and anticlockwise from the starting orientation.

With those outer loops (to decide winding number of which vertices are reachable), we now just need to determine how many times each edge gets used. For each pair of adjacent orientations we need only pick one number, since we've already determined the balance. We'll also need to know how to split each edge into the two component transitions. We can cut down the search space further here by noting that half of the edges affect only x and the other half only y, so we can solve for them independently.

At this point we can use dynamic programming. Consider the edges between abc and cba for example. The forward transitions provide X deltas of +a and -c, and the reverse transitions provide -a and +c. A forward-and-back thus provides either -a-c, +0 or +a+c. (Note that while the +0 case is normally useless, one might require one of them to satisfy the graph connectivity requirements.) Similarly, other pairs of edges provide -c-b or +0 or +c+b and -b-a or +0 or +b+a. This is a slightly modified knapsack problem (due to the negative values) but the usual dynamic programming approach works with minimal change (the negative values just require iterating the update in the reverse direction). However, the possible range of inputs is far too large to handle with dynamic programming. When the sum is very large, one can note that it will always be optimal to make up "most" of the sum using the largest component (in particular, if the largest component is q, then there is no point using q of a smaller component p, since this can be replaced by p instances of q). So we can bound the DP to "small" values, and then for each small value with the right remainder modulo q, determine the number of q's to add.

F: Friendly Rivalry

If the separation is to be at least d, then any pair of chapters separated by less than d must be the in same team. So create a graph with an edge for each such pair, and identify the components. Then check whether we can pick a subset of the components to form a team of size n, which is a classic knapsack problem. This gives a test for whether the answer is at least d; a binary search completes the solution.

G: Kindergarten

This is another tough one where I just kept refining my solution until it passed, but I don't have a water-tight proof.

In general, a directed graph where every vertex has out-degree 1 has a known structure: each (weakly-)connected component consists of a set of trees whose roots are joined into a cycle. For this problem, the constraints on coolness ensure that there is only one such component in the jealousy graph, although it's not clear why that's part of the problem except perhaps to make the implementation slightly easier.

We'll build a directed graph of relationships of the form "A must go before B" which is sufficient to prevent any surprises. Provided we ensure this graph is a DAG, we can then find a topological order on it. This DAG will be mostly the jealousy graph (with edges reversed), but since that graph has a cycle we will need to modify it to use "like" relationships instead in some cases (I'll call these "crushes").

Start by finding the cycle of tree roots. We'll have to remove one of the edges from this DAG; we can just iterate to pick which one. Let's say we're going to remove the edge "P must come before Q" (meaning that Q is jealous of P). To prevent surprises, we'll instead need to add Q → R and R → P where Q's crush is R. In some cases this may be sufficient to produce a DAG, but if R is a descendant of P then we have a new problem, because this will produce a cycle involving R and P. The key observation is if we now pretend that P is jealous of R, this looks a lot like the original problem, just constrained to the tree rooted at P, and hence we can solve it recursively. There are a few tweaks needed in the recursive step though:

When choosing which edge of the cycle to cut, we cannot cut the edge R → P, because that's needed to prevent Q leaving a surprise for P.
Let R be jealous of S and have a crush on T. If we cut the edge S → R then we'll be adding the edge R → T, and if T = Q we'll have created a cycle. More generally, this is a problem if T is a predecessor of Q in the DAG we're constructing. With a single level of recursion, Q has no predecessors, but when we cut the S → R edge at consecutive levels of recursion, we will accumulate a chain of predecessors.

This recursive search will either produce a DAG which we can linearise, or fail to find a solution. I don't yet have a proof that there is no solution in the latter case.

If each recursive call takes time proportional to the size of the whole tree, this will be too slow (quadratic time). Instead, each call should do self-work that scales with the size of the cycle: the edges in this cycle are not processed again at lower levels of the recursion, so this would make the algorithm linear-time. In practice, there is likely to be an extra log factor. The trickiest part is determining whether R is a descendant of P. I enumerated the vertices in a depth-first walk, which makes the descendants of any node a contiguous range. I then maintain a data structure that records the current root for each vertex. This could be done easily with a segment tree, but I used an ordered map that stores contiguous ranges that have the same root.

H: Maxwell's Demon

There are two parts to this problem. The first part is to determine the times at which a single particle will reach the demon. Note that letting a particle through just mirrors its future path, so has no impact on when it will next reach the demon again. I'll just discuss a particle in the right-hand side, since a particle on the left-hand side can be handled symmetrically (and in fact my solution maps all particles to the right-hand side and just records whether they need to swap sides).

To avoid the complications of reflections, it's easier to conceptualise the chamber as having size 2w × 2h, with the particle wrapping around to the other side whenever it hits a wall but maintaining its velocity. The other 3 copies of the chamber are like images in the mirror if the chamber has mirrored sides.

It's pretty simple to determine when and where the particle will first hit the dividing wall. It's also clear that it will hit it every \(\frac{2w}{\lvert v_x\rvert}\) seconds, and we can determine the change in y over that time. To determine when the particle will hit the demon or its mirror image at \(y = 2h - d\), we just have to solve a linear equation modulo 2h, which requires just finding a modular inverse. This becomes a little simpler if all coordinates and time values are scaled up by \(\lvert v_x\rvert\), which makes all values integral. This will give both an initial hit time and a period; or the linear equation may have no solution.

There are a few corner cases:

If d = 0 or d = h, then the demon and its reflection are the same, and one may need to handle this to avoid counting encounters with the demon twice.
If the particle is moving purely vertically, it will never hit the demon (avoid dividing by zero).
If the particle is moving purely horizontally, it might or might not hit the demon, but the calculation may need to be handled differently. More generally, if successive hits on the dividing wall have the same y value, this needs to be handled carefully to avoid division by zero.

The second part of the problem is to deal with the interference caused by multiple particles hitting the demon simultaneously. One can simulate time by keeping a priority queue of hit events ordered by time. Each time there is a hit, identify all the particles that hit at that time. These particles either all swap or all stay where they are. You need to determine whether there is a set of these times whose XOR corresponds to the necessary set of particles swapping. That's just a linear algebra problem over Z₂.

The naïve approach would be to try to solve the linear equation from scratch over all events at each time. However, one can speed things up by keeping the result vectors from the previous Gaussian reduction. The new vector is then added as a new row, and reduced using all the previous rows. If it's reduced to all zeros, it can be ignored, as it is just a linear combination of existing options. Otherwise, it can be kept as a new row, and used to reduce the right hand side.

Finally, we need to decide whether to give up. Because the velocities are all integers, after 2w seconds the X coordinates will all be the same as at the start, and after 2h seconds the same applies to the Y coordinates. Hence, the whole system is cyclic with period 2wh (or a factor thereof), and so we can give up after 2wh seconds.

I: Steppe on it

This is a fairly standard tree DP. We'll solve the problem of determining how many fire-engines are needed for a given response time. The optimal response time for a given number of fire-engines is then found by binary search.

So assume the response time must be at most T. For each subtree, calculate:

The minimum number of fire-engines (placed within the subtree) needed to cover all the cities in the tree, and the minimum possible response time this gives for the root (and hence how much coverage that engine can give above the root).
Whether it is possible to save one fire-engine by having the root and possibly some descendants get their service from an external fire-engine, and the maximum response time that engine may have for the root.

To compute these (recursively), treat the current node as a leaf and fill in the values, then add an edge to each child tree in turn. When adding an edge, one considers the different possibilities for the two trees that are being joined.

J: The Silk Road… with Robots!

Let's first consider how to solve the problem if it doesn't have any updates. Firstly, note that there is never any benefit to having two robots cross over or even meet, since they will be duplicating work. Now consider the road segment between any two locations ("locations" meaning robot or shop locations), and what tracks robots take across it. There are 5 possibilities:

A robot crosses it left-to-right.
A robot crosses it right-to-left.
A robot crosses it left-to-right, then returns (right-to-left).
A robot crosses it right-to-left, then returns (left-to-right).
No robot crosses it.

One can now sweep the locations left-to-right, computing \(d_{i,s}\) which is the maximum profit over locations [0, i] where the segment to the right of location i has type s. To compute \(d_{i,s}\), one needs to consider each possible type t to the left of location i, check whether t and s are compatible (e.g. don't require a robot to materialise from this air) and compute a profit from \(d_{i-1,t}\), the tenges collected at i (if any) and the costs of robot movements due to s. I used a 5×5 table to track which states are compatible depending on whether there is a robot or shop at the location.

Running this after every update will be too slow, and we clearly need a way to do incremental updates. Instead of a left-to-right sweep, we can treat this as a divide-and-conquer DP. Let \(d_{i,j,s,t}\) be the maximum profit over [i, j) with type s to the left of i and type t to the left of j. We can combine this with \(d_{j,k,t,u}\) to produce a candidate for \(d_{i,k,s,u}\). Thus, if we can solve the problem recursively for the left and right halves of a range, we can combine those solutions to compute the solution for the full range. These partial solutions can be stored in a segment tree.

When we receive an update, we can update one leaf in the segment tree, then propagate the update up to the root in logarithmic time. Note that since we have the full set up of updates available offline, we can determine the locations in advance. A location that has not yet received any update behaves just like a shop with no tenges.

K: Tower of noiHa

This is another very tricky one where I fiddled with my solution until it passed, but for which I don't have a formal proof of correctness.

The first step is to determine the initial state. This is relatively easy: if the first (most significant) bit is 1, then the biggest disk has been moved to the right-hand stack, and the remaining bits describe moving the remaining disks from the middle to the right. Otherwise, the biggest disk hasn't yet been moved, and the remaining bits describe moving the remaining disks from the left to the middle. This decoding process can be applied recursively.

We can dispose of a simple case first: if the initial position is valid (no disk on top of a smaller disk), then a similar procedure to the above allows the state to be turned back into a binary representation of the number of moves remaining. To let's assume that there is a disk (call it d) that is on top of smaller disk(s). We'll divide the disks into three sets: A is the disks smaller than d and not sitting under d, B is the disks bigger than or equal to d, and C is the rest (smaller disks trapped under d).

Consider the movements made just with disks in B. Since these disks are properly stacked big-to-small relative to each other, we can move them in the standard way for Hanoi. This tells us where the disks in A need to be placed to allow the first move from B, which can be similarly computed. After this, and until we move d, each time we need to make a move from B we also have to shift all the disks in A, which means the total move costs \(2^{\lvert A\rvert}\). Once we move d, the whole game will have reached a valid state, and we can solve it in a standard way.

There is another special case: if all the disks from B are already on the right-hand stack, then the algorithm above will not try to move them at all, but this will leave C out of order. So in this case one needs to move all disks from A to one of the other two stacks, then move d to reach a valid state. It's unclear which stacks to use, so I just try both possibilities and take the cheaper one.

L: Where am I now?

I was a little surprised by the low number of solutions on this — but interactive problems are always a little more difficult to get right.

There is a fairly generous limit on the number of steps. It's not actually necessary to use the exact distance information given; it's sufficient to know whether there is a wall in the next cell. My approach was:

Do a depth-first exploration of the map to determine the shape of the connected component one is in. Call this a "fragment".
Consider all possible translations and rotations to determine how your fragment could be placed on the map. Call these "transformations".
Consider each empty cell within your fragment. If there is a cell which corresponds to the same map position under all transformations, then move to that cell and declare your position. Otherwise, the problem is impossible.

Saturday, December 02, 2023

IOI 2023: Overtaking

I found this to be the easiest of the IOI 2023 problems — it's the only one where I had a rough idea of how to solve it within a few minutes of thinking. It's still not easy though.

There are a few key observations that simplify the problem:

A faster bus has no impact on the arrival times of a slower bus (even indirectly).
This means that we can simply delete any buses faster than (or as fast as) the reserve bus, as it won't affect our answers.
After doing that, the arrival times of all the non-reserve buses is fixed independent of Y.
The arrival time of the reserve bus is an increasing function of Y.

We'll simulate the process in a spatial sweep along the road. At each sorting station, we'll answer the questions

At what times to the non-reserve buses arrive here?
For a given arrival time for the reserve bus, what is the range of Y values which would cause the reserve bus to arrive here at that time?

To answer the first question, we first need to know the order in which buses leave the previous sorting station, which is just a sort by departure time then by speed. A prefix maximum then gives the arrival times at the current station. To answer the second, we'll start by answering it specifically for times at which non-reserve buses arrive at this station. Consider a time t and slowest bus i to arrive at time t. If the reserve bus leaves the previous station between times t-W[i]d (exclusive) and t-Xd (inclusive) then it will arrive at time t. If it leaves any earlier it will be ahead of the traffic and arrive earlier, and if it leaves any later it will arrive later even if it doesn't encounter any traffic. To turn that range into a range of Y values, we rely on dynamic programming: we look up the interval of Y values corresponding to each endpoint and take their union.

What about answering the second question for times other than those when the non-reserve buses arrive? In those cases, the reserve bus did not encounter any traffic, so the answer for time t is simply the answer for the previous station at time t-Xd.

We'll need a data structure to represent all this. Conceptually it will be an increasing function mapping Y to an arrival time. It will need to support the following operations:

Initialise with the identity function.
Increment all values by Xd. This can be handled by maintaining an offset outside of the main data structure.
Set all values in the interval (u, v] to t.
Find the intercept for a given output value.

A suitable data structure is a balanced binary tree, where each node is an interval of Y values with a fixed output t. The intervals must be non-overlapping. For any Y value not contained in one of the intervals, the function is the identity. Setting an interval (u, v] to t requires

Deleting any existing intervals contained entirely in that range.
Trimming any intervals that overlap the range.

The deletion means each insertion could take linear time, but it is amortised since each inserted interval can only be deleted once.

This requires O(NM log(NM)) for initialisation and O(Q log(NM)) for the queries. I have one gripe with this problem, although I don't know if it is the fault of the IOI or just the IOI mirror on Codeforces: my solution was about 25% slower than the model solution, and exceeded the time limit. To make it pass, after initialisation I converted the std::map to a std::vector and used binary search on it to answer the queries, which is not a big-O change but was slightly more efficient and allowed it to get a full score.

Sunday, November 26, 2023

IOI 2023: Beech Tree

Here's my solution to Beech Tree, from IOI 2023 day 2. For a while I hated the problem, but after seeing the solution I think it's quite beautiful (excuse the pun). It's definitely a problem that requires a lot of thinking time before writing relatively little code.

The key is to identify the conditions under which a subtree is beautiful, in a way that can easily be checked. I'll talk just about whether a "tree" is beautiful, because I'll want to talk about subtrees of that tree. I'll also ignore the original labelling and just work with the permuted labels, which eliminates a level of indirection from all the indices. So for example, the parent of A must be f(A). I'll also talk freely about the colour of a node, by which I mean the colour of the edge between that node and its parent (C[i]).

We can make quite a few observations about a beautiful permutation; I'll focus on the relevant ones here. Some of them seem to come from nowhere: I found them by trying to work out a procedure for choosing a beautiful permutation, and then discovering conditions under which it would fail.

If A and B have the same colour, then A < B if and only if f(A) < f(B). This follows directly from the definition of f.
If A is a parent of B, then A < B. Proof: A must be f(B), and f(B) is the number of times C(B) appears in a sequence of length B - 1, so A = f(B) ≤ B - 1.
No node can have two children of the same colour. Proof: suppose A has two children X < Y with the same colour. Then A = f(X) < f(Y) = A, a contradiction.
Let S(X) be the set of colours of children of X. If A < B, then S(A) ⊇ S(B). Proof: Suppose colour D appears in S(B) but not S(A), i.e., B has a child X of colour D but A does not. X must be the B'th node with colour D (counting from 0). The A'th node with colour D should have A as its parent, a contradiction since A has no child of colour D.
If A < B, then |T(A)| ≥ |T(B)|. Proof: suppose this is not the case i.e., there exist A < B with |T(A)| < |T(B)|. We can choose the largest such B. Since S(A) ⊇ S(B), there must be some colour D such that A and B have children X and Y of that colour with |T(X)| < |T(Y)|. But since f(X) = A < B = f(Y), we get X < Y, which means the pair (X, Y) contradictions the maximality of B.
Let z(A, D) be the size of the subtree rooted at A's child of colour D, or 0 if A does not have a child of colour D. If A < B, then z(A, D) ≥ z(B, D) for all colours D. Proof: Suppose this is not true for some A, B, D. Observation 4 rules out z(A, D) = 0, z(B, D) > 0. Let the children of A and B with colour D be X and Y. Then f(X) = A < B = f(Y), so X < Y, but |T(X)| < |T(Y)|, contradicting observation 5.
If |T(A)| ≥ |T(B)|, then z(A, D) ≥ z(B, D) for all colours D. If A < B then this follows from (6), so assume A > B, giving z(A, D) ≤ z(B, D) for all D. But |T(A)| = 1 + Σ z(A, D), so if any of those inequalities are strict we would have |T(A)| < |T(B)|, a contradiction.

We now claim that if a tree satisfies conditions 3 and 7, it is a beautiful tree. Note that these conditions are independent of the labelling, and hence can easily be tested. The proof is constructive, in the form of an algorithm to assign the labels in increasing order. Maintain a queue of unlabelled nodes per colour; the head of the each queue must be the next node to be assigned a label. To start, consider the root to have colour 0, and place it in the queue for colour 0. All other queues are empty.

To advance, select the head-of-queue with the largest |T(X)|, and assign it the next available label. Then place its children at the tails of their respective queues. This process maintains the following invariants:

Within each queue, |T(X)| is decreasing.
The elements in the queues have (non-strictly) smaller |T(X)| than any labelled node.
The labels are assigned in decreasing order of |T(X)|.

The proof is left as an exercise for the reader. A key observation is that if some node A is labelled, and it does not have a child of colour D to append to the D queue, then no other nodes will later be appended to the D queue, because 0 = z(A, D) ≥ z(B, D) for any B > A. This ensures that every node will have the correct parent.

Now we are left with the problem of testing conditions 3 and 7 for every subtree. Testing condition 3 is quite easy. For condition 7 it helps to note that the condition is transitive, so that if |T(A)| ≥ |T(B)| ≥ T(C) and (A, B), (B, C) each satisfy the condition, it is not necessary to check (A, C) as well. We can thus easily check a tree of size K in O(KM) time, but this will be too slow to get a full score. There are two observations that help speed this up:

To test the condition for some A, B, we can just iterate over S(B), and for each colour D look up z(A, D). If we sort the children of each node by colour, this will take O(|S(B)| log M) time. A tree of size K has only O(K) edges, so checking a whole tree can be done in O(K log M) time.
We can re-use results from subtrees. If any child of A is not the root of a beautiful subtree, then nor is A. If they all are, we can use small-into-large merging to combine information about the subtrees. For each subtree, keep an ordered map from subtree size to a representative vertex of that size. Then we can merge subtrees by taking each element from the smaller map and inserting it into the larger map, checking that it has the appropriate relationship with its immediate neighbours. Each insertion requires O(log N) time to find the location in the map structure and amortised O(log M) to perform the verification, and each node will be inserted O(log N) times. Thus, the running time is O(N log N (log N + log M)).

This sounds complicated but requires less than 100 lines of code.

Sunday, November 19, 2023

IOI 2023: Day 1

I've finally gotten around to looking at the day 1 problems from IOI 2023, and they're really tough. Congratulations to those who solved any of them. It's been pointed out that writeups of solutions seem to be absent from the internet, so I'll try to remedy that, at least for day 1 (I haven't had time to read the problems for day 2 yet, and probably won't until early next year).

For now I'll leave out Closing Time, since there is a solution at the link above, and I don't have a good proof of efficiency for my own solution. I might come back to it.

Longest trip

Finding the longest path in general graphs is NP-hard, so we should expect that we'll need to exploit the density condition in some way. I'll only discuss D=1, since the other cases are strictly easier. Let's first consider how many components there can be. There cannot be 3 (or more), because then we could pick one vertex from each component and this triple would violate the density condition. If there are two components, then when picking any two vertices from one component and one vertex from the other, we see that the two vertices in the same component must be connected i.e. both components are complete graphs. In that case the longest path is easy to find: just walk the vertices of the larger component in any order.

What about when there is only a single component? It's certainly not required to be complete (the first example shows that), but perhaps we're still guaranteed to be able to find a Hamiltonian path?

Building a single path a step at a time will be difficult, but we can get close by building two paths at once. We can initialise the paths with the first two vertices, and then take each other vertex in turn and try to append it to one of the paths. If it doesn't have an edge to the end of either path, then the ends of the two paths must be connected to each other. In this case they can be joined into a single path, and we can start a new second path with the vertex we just tried to add.

Once we've processed all the vertices, we will have them organised in two paths, but we don't know if they're separate components or not. A single query can determine that. If so, we just return the longer of the paths. If not, we want to find a way to link the paths together. The following procedure will do the trick:

Test if either end of one path is connected to either end of the other. If so, then we can just chain them together trivially.
If not, then the density condition requires that the end of each path is connected to its beginning, so the paths are in fact cycles. Now find any edge that connects one cycle to the other. By cutting each cycles next to this edge, we obtain a Hamiltonian path.

This algorithm requires O(N) queries, but without careful implementation, the constant factor will be too high to keep q below 400. Let's see how we can improve things.

Let's start with the first part of the algorithm, where we build two paths. Let's call the current last elements of each path P and Q, and the element we're adding R. In some cases we will have information that P and Q are definitely not connected. If that's the case, and we learn that P and R are also not connected, then we know that Q and R are connected, without needing a query. So when we know P and Q are not connected, only one query is needed to make progress. When P and Q might be connected, we might need two queries, but if so we have a good chance of transitioning to a state where we know P' and Q' are not connected. The only time we won't is if

P is connected to Q; and
neither P nor Q is connected to R; and
both paths have more than one element.

The above conditions cannot apply twice in a row (because afterwards one of the paths has a single element), but this still allows for an average of 5/3 queries per element. However, we're also given the information that the grader is not adaptive, and simply randomising the order of elements seems to be sufficient to keep the actual number of queries down.

We can also optimise the final steps in attempting to merge the two paths together. To check whether it is possible to connect the two paths end-to-end, a single query can be used to determine whether either end of the one is connected to either end of the other; if this returns true, further queries can be used to isolate the specific connection. While that requires more queries in the true case, it reduces the queries for the false case (which is more expensive, because we then have to find some other edge between the cycles). Finding an edge between the cycles can be done in logarithmic time, by first binary searching on the one path, then the other.

The official model solution does some slightly more complicated things in the case that P and Q are connected; I suspect this might guarantee that the solution completes within 400 queries without depending on randomisation. A solution that can guarantee no more than 1.5 queries to add each R will have the following budget:

381 queries for the first phase (only 254 elements need to be added, since the first two are used to initialise the paths)
1 query to check whether there is any connection between the paths
1 query to check whether there is a connection between the edges of the paths
15 queries for the binary search (8 for the larger side, 7 for the smaller),

for a total of 398.

Soccer

Let's start by figuring out what a "regular" field looks like. Within each row, let's call all the cells that form part of the field a "span". A span has to be contiguous, since there is no way to kick a ball around a gap in a row in only two straight kicks. Now consider two spans (in different rows). Each span can be seen as an interval of column numbers, and if neither of them is a (non-strict) sub-interval of the other, then field is not regular: there will be a cell in each span whole column does not belong to the other span, and there is no way to kick a ball between these two cells with only two straight kicks.

Let's consider the longest span (ties can be broken arbitrarily), which by the observations above is also a super-interval for every other span. Moving away from this longest span either upwards or downwards, one must encounter progressively shorter spans (again, non-strictly) as otherwise there will be a column with non-contiguous cells in the field. It shouldn't be too hard to convince yourself that these conditions are also sufficient for a field to be regular.

We can think about constructing a maximal field by adding spans one at a time, starting with the longest span and then adding spans either above or below. This naturally leads to a recursive solution: given a partially constructed field (consisting of spans in contiguous rows), consider adding a new span either above or below the existing spans, whose extent must be limited to the intersection of all the existing spans. We need to start the recursion from the widest span and we don't know where that will be, so we can just use an outer loop that considers all possibilities.

This recursion will consider all possible regular fields, but since we only want to know about the biggest one, we can immediately do some pruning. Consider a span to be "maximal" if it cannot be expanded (under whatever conditions it is being selected) by adding more cells to the left or right. When recursing, we only need to consider maximal spans, since this adds more cells to the field and does not reduce our choices later in the recursion. Similarly, only maximal spans need be considered as the initial (widest) span.

This exponential-time solution can be reduced to polynomial time with memoisation in a pretty standard way. The only state needed in each recursive call is the range of rows for which we already have spans and the range of columns that is the intersection of all chosen spans.

This gives O(N⁴) states and an even larger number of edges between states, so we need to optimise this to solve the final subtask. Let's start with a few optimisations that won't obviously reduce the big-O. Immediately after adding a new span, see if we can add more spans (on both sides) without narrowing the column range at all. It will never be suboptimal to add these now, so we can do it immediately. We want this process to be efficient, which we can achieve by building lookup tables for the next tree upwards and downwards of any given position, and precomputing a range-max query structure along each row. We can also use lookup tables of next tree and next non-tree along each row to quickly find all the maximal spans in the new row we're adding.

While this will clearly help in some special cases, it's not clear why it should improve the big-O. Let a "maximal rectangle" be one that doesn't include a tree and cannot be made into a bigger rectangle by adding more cells to it (without including a tree). Then it is not difficult to show that each time we need to decide on the next span, the current row and column range define a maximal rectangle: our recent optimisation is to extend the rectangle vertically, and our choice to only include maximal spans mean it cannot be extended horizontally. So the number of memoisation states is bounded by the number of maximal rectangles.

What's the maximal number of maximal rectangles? It is O(N²). Consider fixing the bottom row, and fixing the tree that bounds the top edge (treat the boundary as surrounded by trees). Since this tree must be the lowest in its column above the bottom edge, there are only O(N) such trees for each choice of bottom row. Once those two parameters are fixed, it is easy to see that the rest of the maximal rectangle is fixed too (simply grow leftwards and rightwards of the chosen tree as far as possible). Note that not all such rectangles are maximal, because we haven't considered that the bottom could be extended. So in fact, the number of rectangles bounded by trees on 3 sides is O(N²).

This means that we have O(N²) states, but what about the number of state transitions? We'll consider only transitions that start by adding a new row on top (adding to the bottom is symmetrical). At first glance, it appears that this could be O(N³), because from each state there could be O(N) successor states. However, we can associate each transition with a unique rectangle bounded on top, left and right; and we showed earlier that there are only O(N²) of these. Specifically, for each transition, consider the intersection of the old and new state rectangles, and call it the "transition rectangle". This will differ from the new state only in the bottom edge, so it will be 3-side bounded as claimed. We can reconstruct the new state from the transition rectangle simply by extending the bottom edge as far as it can go; we can also reconstruct the old state as follows:

Remove one row at a time from the top until it is possible to extend the left or right edges.
Extend the left and right edges as much as possible.

Since we're able to reconstruct the old and new states from the transition rectangle, the transition rectangle must be unique to the transition, and hence there can only be O(N²) transitions.

The overall complexity depends on the implementation of the range max query and of the data structure used for memoisation. With practical implementations the algorithm will have O(N² log N) time and either O(N²) or O(N² log N) space, depending on the range max data structure used.

Incidentally, while the official model solution explicitly computes all the maximal rectangles, it's not actually necessary to do so. In my solution they are needed only to prove the efficiency.

Saturday, December 10, 2022

Solutions to ICPC world finals 2021

In case you're wondering why I'm only posting this in 2022: the "2021" world finals happened in November 2022, in Dhaka. You can see the problems here. I haven't seen a writeup of the solutions (not that I've looked very hard). So I'm writing up my solutions. These are my own solutions, except for K, where I watched the official solution and particularly liked it. You can find the official solution videos here. The proof of correctness for E was provided by Derek Kisman.

A: Crystal Crosswind

Consider a single wind direction, and two cells A at \((x, y)\) and B at \((x - w_x, y - w_y)\). If there is a boundary at A, things are simple: there is a molecule at A and there is no molecule at B. If there is no boundary at A we can't immediately determine where the molecules are (unless B lies outside the grid, in which case there is on molecule at A), but we know that if there is a molecule at A there is also one at B, and conversely if there is no molecule at B then there is no molecule at A. We can add these are edges to two different graphs.

Now every time we know there is a molecule in a particular location, we can follow edges in the first graph to find new molecules, and similarly when we know there is a hole somewhere, we can follow edges of the second graph to find new holes. We can just keep doing this until there is no new information to be found.

This may still leave us with unknown cells. The key observation is that leaving all these locations as holes will yield a valid configuration. Suppose we made a hole at \(P_1\), and that implied there had to be a hole at \(P_2\), which implied there had to a hole at \(P_3\) etc, until eventually it required a hole at some \(P_k\), which already contained a molecule. That cannot happen, because the molecule at \(P_k\) would already have forced a molecule at \(P_{k-1}\).

Similarly, we can safely fill all unknown cells with molecules to get the maximum crystal structure.

B: Dungeon Crawler

Assume we've decided to end the traversal at some node E. Any edge on the path from S to E will necessarily be traversed an odd number of times, and any other edge an even (and positive) number of times. Let r be the sum of all the edge lengths, and let d(P, Q) be the distance between nodes P and Q. If we didn't have to worry about the trap, then we'd be able to achieve \(2r - d(S, E)\), using each edge the minimum number of times.

Let's consider the tree to be rooted at K, and let L be the least common ancestor of S and T, and M be the least common ancestor of L and E. If L = T then the problem is impossible (can't get from S to K without going through T). Otherwise, every edge on the path from L to M has to be traversed 3 times: first to get to the key, then to get to the trap, and finally to get to the end. So the total cost will be \(2r - d(S, E) + 2d(L, M)\). We can now consider two cases.

E lies in the subtree of L, so L = M. Then the cost is just \(2r - d(S, E)\). So for this case, we want E to be as far from S as possible.
E does not lie in the subtree of L. In this case a bit of manipulation (using knowledge of which vertices are ancestors of which) shows that the cost is equivalent to \(2r - d(E, K) + d(L, K) - d(L, S)\). Since L, K and S are independent of E, it follows that we should choose E to be the vertex furthest from K.

For each case, we want to choose E as far from some vertex V (either S or K) as possible, while excluding a particular subtree. Let's consider the tree to be rooted at V, and index the vertices using a pre-order walk. The range of vertices to exclude are contiguous in this pre-order walk. Thus, we wish to consider only a prefix and suffix of the vertices. By pre-computing prefix and suffix maximums (of depth) for every possible root, we can choose E in each case in constant time (once we have L and its neighbour towards K).

The time complexity depends on the method used to find least common ancestors. With a simple structure where every node stores ancestors at power-of-two distances, it will require \(O(n^2\log n + q\log n)\); with more sophisticated structures (such as heavy-light trees) this can be reduced to \(O(n^2 + q\log n)\), and I believe it can (very) theoretically be reduced to \(O(n^2 + q)\) using the method of the Four Russians.

C: Fair Division

It's slightly easier to work with the fraction of loot that gets passed on, rather than the fraction each pirate takes. Let's call that g (so g = 1 - f). Let's work out how much loot each pirate gets. Let's say a pirate gets x in the first round. In the next round they get \(g^nx\), then \(g^{2n}x\) and so on. This is a geometric series, and in the limit they will get \(\frac{x}{1-g^n}\). For the ith pirate (counting from 0), x will be \(g^i(1-g)m\), so the total loot will be \(\frac{g^i(1-g)m}{1-g^n}\).

Let \(g = \frac{a}{b}\), where a and b are relatively prime. Multiplying top and bottom by \(b^n\), we see that the loot is \(\frac{a^i b^{n-1-i}(b - a)m}{b^n - a^n}\), and in particular \(b^n - a^n\) must divide into \(a^i b^{n-1-i}(b - a)m\). Suppose some prime p divides into b. Then it divides into \(b^n\) but not into \(a^n\) and hence does not divide into \(b^n - a^n\). Thus, the denominator is relatively prime to \(b^{n-1-i}\), and similarly to \(a^i\). It's thus required that \(b^n - a^n\) divides into \((b - a)m\).

We now see the reason behind the condition n ≥ 6: it strongly limits how big b can be. It's sufficient to test all values of b up to the 5th root of m to find those that satisfy the condition.

D: Guardians of the Gallery

This is a typical hard geometry problem: relatively simple conceptually, but with lots of corner cases and opportunities for floating-point instabilities.

I don't have a formal proof, but the guard's path can be split into two stages:

Optionally move to a vertex of the polygon, possibly via other vertices. We can find the minimum distance for this by considering a graph consisting of the polygon vertices and the guard, with edges where the line segment connecting vertices is unobstructed. The graph is small so Floyd-Warshall is sufficient.
Move to the nearest point on a segment originating at the art, passing through a vertex, and terminating where it is no longer possible to view the art. This might be an orthogonal projection onto the segment (interior to the polygon) or it might be the endpoint of the segment.

We thus need to know how to find the terminating point of a segment as contemplated in the second part. We can trace the polygon counter-clockwise, and check each edge against this ray. If it completely crosses the ray, that gives us a bound. But what if it just touches the ray? We can separately track edges that touch the ray from one side (obscuring one half of the art) or the other (obscuring the other half); the further of these two determines the cutoff point.

We also need to be able to check whether a line segment is obstructed by the polygon. It's complicated by the fact that either endpoint may lie on the polygon itself. It's not sufficient to determine whether the polygon intersects the interior of the line segment (i.e., excluding the endpoints), because the segment may lie entirely outside the polygon, or it may lie along an edge of the polygon or pass through a vertex. Solving this boils down to careful handling of polygon vertices that lie along the line segment, to ensure that the line segment does not go outside the polygon at any of those vertices.

E: Hand of the Free Marked

Firstly, it is quite easy to show that probabilistic strategies are of no benefit. Suppose the team use a probabilistic strategy that involves picking a die from a selection (with a number of sides appropriate to the situation) and rolling it. Then instead, they could just roll all the dice in advance, before the cards are selected. Each possible roll would have some probability of success, according to the chosen strategy. They might as well just fix the dice in the combination that gives the best probability, and consider that a deterministic strategy.

We can now see this as a bipartite matching problem. Let U be the set of all possible (unordered) sets of cards that could be drawn, and let V be the set of possible ordered sets of k - 1 cards plus a marking for the final card. The assistant maps each element of U to one element of V, and the magician maps each element of V to some element of U. The trick cannot work for two different elements of U that map to the same V (the magician will only be correct for one of them), so the elements for which the trick works form a bipartite matching.

Edges exist where the element of U and of V have consistent information. Let's refer to the different markings as "colours", and let the "spectrum" of a set of cards consist of the number of cards of each colour of the set. The spectrum is known for each element of V, so it follows that each spectrum constitutes a separate connected component of the bipartite graph (we haven't provided that each spectrum is connected, but it's simple; the important point is that there are no edges between spectra). So we can solve the problem independently for each spectrum then combine the results.

From this point on, we'll assume a fixed spectrum, and let U and V denote only the elements within this spectrum. We can hypothesize that the size of the maximum matching is simply the smaller of |U| and |V|. This is not true for general bipartite graphs, but these graphs have a somewhat regular structure that suggests it (and it is clearly not practical to use a generic bipartite matching algorithm). If you take it on faith, you'll be able to solve the problem, with some basic combinatorics to compute |U| and |V|. The number of possible spectra is small enough to be easily handled.

Let's prove the result though. We'll use Hall's Theorem. Elements of V can be classified according to the colour of the hidden card. Let's denote the subsets of V as \(V_1\), \(V_2\), \(V_s\) for some s. All elements in \(V_i\) have the same degree (let's call it \(d_i\)). What's more, every element in U has the same number of connections to \(V_i\), say \(e_i\). It follows that \(d_i|V_i| = e_i|U|\). Now we can consider the two cases:

|U| ≤ |V|. Consider some subset A of U, whose image in V in B. Let \(B_i\) be the intersection of B and \(V_i\). Let's consider the total number of edges between A and \(B_i\). On the one hand it must equal \(e_i|A|\). On the other hand, it cannot exceed the sum of degrees of \(B_i\), namely \(d_i|B_i|\). So \(d_i|B_i| \ge e_i|A| = d_i|A||V_i|/|U|\) and hence \(|B_i| \ge |A||V_i|/|U|\). Adding these inequalities gives \(|B| \ge |A||V|/|U| \ge |A|\). Thus, the Hall condition is satisfied.
|U| ≥ |V|. Consider some subset B of V, whose image in U is A. Define \(B_i\) as before. Between A and \(B_i\) there will be \(d_i|B_i|\) edges, and also at most \(e_i|A|\) edges, giving \(d_i|B_i| \le e_i|A|\). The argument proceeds as before, just with the directions of inequalities reversed.

F: Islands from the Sky

Firstly, congratulations to the judges for creating a geometry problem (and a 3D geometry problem no less!) that is actually relatively easy to solve.

Each island needs to be completely scanned by some plane. So we can take each island, each plane and check what angle needs to be used by that plane to scan that island. For each island we can take the minimum angle over all planes, and then we can take the largest angle over all islands.

Because each plane scans a convex region, it's sufficient to check the vertices of an island. For each vertex, project the vertex onto the (2D) flight path to determine the point at which it is scanned, then use linear interpolation to find the altitude of the plane at that point. Then simple trigometry (using the distance of the vertex from the projection) gives the minimum angle.

G: Mosaic Browsing

It's clear that a naïve implementation will be too slow. But considering every possible way to slide two signals over each other and compute some summary seems remarkably like convolution. We can in fact make it so with some tricks. First, assign each colour a unique point on the complex unit circle (ideally equally spaced, but that's only important for numeric precision). In the motif, use the conjugate values, and use 0 for uncoloured cells. When multiplying a value from the mosaic with a value from the motif, the result will be 1 if the colours match, or something with a real part less than 1 if they do not. Thus, the real part of the sum will equal the number of coloured cells of the motif if and only if there is a match.

There are a few details to take care of (around sign conventions and padding), but essentially the problem has been reduced to a 2D convolution. Using the Fast Fourier Transform, it can thus be implemented in \(O(RC \log(RC))\), where \(R = r_p + r_q\) and \(C = c_p + c_q\).

H: Prehistoric Programs

Firstly, if the total numbers of opening and closing parentheses are not equal, then clearly it is impossible. Otherwise, we can summarise each tablet with two numbers: its balance (number of opening minus the number of closing parens) and its depth (negative of the minimum balance of any prefix).

Note that it is never necessary to have a negative-balance tablet followed by a positive-balance tablet: if this occurs in a valid solution, they can be swapped around to give another valid solution. Similarly, given two adjacent positive-balance tablets, we can always place the one with the smaller depth first. Tablets with zero balance can be (arbitrarily) treated as positive-balance.

This leads to a greedy solution: from the left, try to place the positive-balance tablets in increasing order of depth, then the negative-balance tablets in decreasing order of depth. Either it gives a valid solution (easy to check) or there is no valid solution.

I: Spider Walk

The traversal rules are time-symmetric, so we can just as easily ask how many bridges to add to ensure that starting at strand S and walking inwards will reach the centre from a particular strand. Let us maintain some distance d and track the minimum cost (in new bridges) to reach the point on each strand at a distance D, starting from S. We start with D at infinity and incrementally reduce it by considering the bridges in decreasing distance from the centre.

Suppose there is a bridge linking strands P and Q at distance d. For D slightly larger than d, we have costs to reach P and Q of \(c_P\) and \(c_Q\) respectively. Once D shrinks below d, the cost to reach P is at most \(c_Q\), and the cost to reach Q is at most \(c_P\). However, the costs may actually be less if the costs to reach the neighbouring strands are \(c_P - 1\) or \(c_Q - 1\) respectively.

Once the new costs for P and Q have been computed, the costs to reach every other strand need to be updated using these as starting points (e.g. if the new cost for P is \(e_P\), then a strand 5 away from P can now be reached with cost \(e_P + 5\), which may be better than the previous value).

Performing this cost update efficiently will require a suitable data structure. It requires the following operations:

Set a single value
Update a range of values [L, R) to the minimum of the current value and \(v + i\), where v is given and i is the index.
As above, but use \(v - i\) rather than \(v + i\).

The \(v + i\) and \(v - i\) updates correspond to the two directions one could traverse the web. I found it simplest to use two separate segment trees for the two types of updates, with the final result being the smaller of the two. Each segment tree node holds a value of \(v\) that has been applied to its entire corresponding subtree. Setting a single value requires pushing these internal values down the tree.

Overall complexity is \(O(n\log n + m\log n + m\log m)\) (the final term is just to sort the bridges; it could probably be eliminated, but there is little need.

J: Splitstream

For each edge, we'll need to know the total number of values that are sent along it. This can be done with dynamic programming, or recursively with memoisation. Then to find the kth element along an edge, one needs only consider a few cases. If the edge start from a split node, it is simple to determine the corresponding index on the edge into the split node. For a merge node it is slightly more complex, depending on whether one or the other incoming edge will run out before the kth element is output.

K: Take On Meme

As a reminder, this is not my solution, but rather one presented during the livestream.

We can imagine all possible final memes as points in a plane, and we're trying to find the one furthest from the origin. This will lie on the convex hull of the possible memes. Let's construct that convex hull.

Suppose we wish to find the point which lies furthest in a particular direction (formally, has the largest dot product with a chosen vector). This is a reasonably straightforward tree DP, where for each node of the tree we find the meme with the largest and smallest dot product. Whatever point we find is guaranteed to be on the hull. By picking a few (e.g. 4) directions, we can find a points on the hull. We can then find the rest recursively: pick two adjacent candidates, and search for the furthest point in the direction orthogonal to the line joining them. This will either yield another point on the hull, or prove that the two candidates are neighbours on the hull.

This leaves the question of how many points might lie on the convex hull — after all, there are exponentially many ways to select winning memes. However, the points all have integer coordinates with magnitude limited to 10⁷, which strongly limits the number of points that can be placed on the hull: a hull with coordinates limited to X has at most \(O(X^{\frac{2}{3}})\) points.

L: Where Am I?

Take one step at a time, keeping track of sets of starting positions that are indistinguishable so far. When taking a new step, each set is partitioned into two (although possibly all into one side) based on whether the new step would reveal a marker or an empty space for given starting positions. Every time a set of size 1 is produced, you know how many steps are needed to identify the location for someone who starts in that location.

Update: apparently the judges intend this solution to time out; I can't tell whether mine does or not since there is no online judge, but my implementation (which isn't super-optimised) runs in 0.9s.

Saturday, July 23, 2022

More thoughts on Rust

I've been playing with Rust again recently (see my previous post on the topic) and have some more thoughts. It's a mix of wonder and horror.

First, the good stuff. As a reminder, Rust restricts the references you can make: at any time, you can either have a single mutable reference to a variable, OR an unlimited number of shared (immutable) references. When I first saw that my immediate thought was that it is great for concurrency, because that's exactly what you need to avoid data races, but it seemed like just an inconvenience for single-threaded code. However, I've since realised that there are several advantages:

Iterators. A common problem in several languages is "iterator invalidation", which occurs when the container being iterated is modified while you're iterating. In other languages, at best you're getting an exception, and possibly you're getting undefined behaviour. In Rust, the iterator holds a shared reference to the container, making it impossible to mutate it for the lifetime of the iterator. This does have downsides though: it's impossible to keep a handle to an element in a collection while still allowing new elements to be added, even if the collection doesn't require reallocation to do so (e.g. a linked list).
Aliasing and side effects. A function that takes a mutable output parameter is guaranteed that it won't alias any of the inputs, and can optimise accordingly. And a function A that calls a function B can be sure that B won't modify anything to which A holds a reference.

The bad stuff is that Rust has too much magic and it's not all specified in the documentation. The particular case I ran across is in my Stack Overflow question. In short, writing a function with a template parameter is not the same as writing it with the concrete type, because the compiler has special treatment for arguments that are lexically declared &mut. The reference manual doesn't mention this, and it says basically nothing about how template instantiation is done. What's more, the special behaviour that's triggered is itself undocumented, and pretty arcane (references are in some manner "reborrowed"). One of the side effects is that the identity function isn't actually a no-op.

The borrow checker also relies on trying to infer what the relationship between a functions inputs and outputs is, based on lifetime specifications, rather than letting you be explicit. Consider this code:

fn inc_ret<'a>(x: &'a mut i32) -> &'a i32 {
    *x += 1;
    &*x
}

fn main() {
    let mut x = 1;
    let y = inc_ret(&mut x);
    println!("{} {}", x, *y);
}

The inc_ret function takes a mutable reference to an integer, increments it, and return an immutable reference. This code should be perfectly safe, because y is an immutable reference to x. However, the borrow checker simply relates the output to the input via the common lifetime ('a) and can't tell that the mutability doesn't pass through to the output, so it refuses to compile this code.

Another piece of magic I'm not totally happy with is the dot operator. Unlike in C, there is no -> operator; instead the compiler will automatically dereference references for you. That might be okay if it wasn't that reference types can themselves have methods (via traits), and it can be ambiguous whether you want the method on the reference or on what the reference points to. The Rustonomicon has a horrifying example of this:

fn do_stuff<T: Clone>(value: &T) {
    let cloned = value.clone();
}

That's a function taking a reference to a cloneable value and cloning the value. But, if you forget to specify the : Clone, it will still compile, but clone the reference instead (even if T is cloneable). So restricting the types accepted by your function has actually changed the semantics!

Sunday, May 02, 2021

Disassembling Endo, part 2

If you haven't read Part 1 yet, read that first. This is Part 2.

As a reminder, here's how far we got in Part 1:

We have a bit of trouble with the VMU. It's time we started doing a bit of disassembly to see why. In general, static disassembly is impossible, because the machine is entirely built on self-modifying code, but fortunately most of the time one rule follows another in a sensible way. There are exceptions where some data is embedded directly into the code (rather than stored on the stack or in separate symbols), but these can be treated as special cases.

The first thing we see in the function vmu is IIICCPIICC, which emits the RNA CCPIICC. Why? It's not valid RNA. This was mentioned on help page 2181889, suggesting that these codes starting with a C indicate "a small part of the morphing process" is done. In fact, you'll find that all the functions start with such a piece of RNA, and they have unique codes. From this, it's possible to examine an RNA trace to determine the control flow at the function level. I ended up not really needing this, because I had some tools in my DNA interpreter for determining what code was running (since copying the code to the red zone just references the original code, it's possible to determine the origin of instructions), but it was quite useful when our team originally competed.

Incidentally, while RNA is technically part of the pattern-matching rules, it almost always occurs as the very beginning and so can be treated as if it is an instruction on its own, and I'll continue to treat it like that. The "next" instruction, then, is (![7512628]) -> (0)IIIIIIIIIIIIIIIIIIIIIIIPIIIIIIIIIIIIIIIIIIIIIIIP. If you do some calculations based on how much of the function is left in the red zone, you'll see that this is adding some data to the start of the stack. Those calculations rapidly get tedious, so it's worth writing a tool to match various patterns and print out higher-level information. In this case one can recognise it as a push and print that out.

Next is ![33](![213118]CCIICCIIIIIIIIIIIIIIIIIP) -> (0). This is a conditional forward jump: it moves to a particular address (which is vmuMode) and tries to match a number there (51). If it all matches, then the next 33 acids will vanish; otherwise nothing happens. The next 33 acids are exactly the length of the next instruction, which is ![218] ->, which is an unconditional jump. Taken together, these say to jump ahead if vmuMode is not 51.

Now we have ![2906](![4406243](![1805])![3101361]) -> (0)(1)IICCCIICICCCCCICCCIIICIPICICCICICCICIIIIIIIIIIIP. Once again, one needs to interpret the numbers: 2906 is the remainder of the red zone, 4406243 and 1805 are the address and size of the function ufo-with-smoke, and 3101361 takes us to the blue zone. We then erase the rest of the red zone, copy ufo-with-smoke to the red zone, and push two numbers on the stack. That sounds like a function call, and indeed you can confirm that the two numbers are the address of the following instruction, and the length of the remainder of the function, for returning to.

After this there is another jump (![2586] ->), and another conditional jump: ![33](![212772]CCCCCIIIIIIIIIIIIIIIIIIP) -> (0), this time comparing vmuMode to 31. Aha, we didn't know to try that value earlier!

Fortunately we've already followed the clues to find the registration code (Out_of_Band_II), so let's write that into vmuRegCode (remember to terminate the string with a value of 255). And now we get our caravan, but in the wrong place:

Wait a second, the caravan was encrypted, and we didn't decrypt it. But we know that the VMU registration code is also the encryption key for the caravan, so presumbly vmu is doing the decryption for us. Let's take a look at the next few instructions:

(![7511175])![48] -> (0)IICIICICICCIIIIICCCICICPCICIICCICCIICIIIIIIIIIIP

((![1200])![7509907]) -> (1)(0)

(![211586](![1152])) -> IIPIPIIIIIIICIICPIICIIPIPIIIICICCCCICCIICICIICCCPIICIPIIIIIIICIICPIICIPPCPIPPPIIC(0)(1)

The first writes some values into those stack variables we saw earlier: specifically, the address and size of caravan. Then it pushes more data to the stack. Weirdly, it's pushing the immediately following 1200 acids, which form code. It turns out that the compiler the organisers used to generate code makes function calls by first pushing a block of garbage to the stack, then overwriting it, rather than directly pushing arguments. So this is just making space on the stack for 1200 acids of arguments.

What's going on with the third instruction? It's writing a long bunch of stuff at the business end, prior to any substitutions. Code generating code! This turns out to be fairly common, and disassembly becomes much easier if you detect this, parse the embedded code and present it as a separate rule. Where it's important I'll prefix such follow-on instructions with a +. So let's try to parse that again:

(![211586](![1152])) -> (0)(1)
+ (![1152])(![7510992])![1152] -> (1)(0)

The first instruction copies 1152 acids from the green zone (vmuRegCode) and inserts them at the "start" of the DNA (although not really the start, because it's after the embedded instruction). Then the embedded rule vacuums it back up and writes it into the stack. This shows one advantage of the embedded rule: if it had been written as a separate rule, then the first rule would have inserted the data before the second rule, and things would have gone wrong.

You may be wondering why this needed to be done with this trick, instead of just a single rule that both collected and delivered the data to the stack. It would be possible, but if the data source occurred after the destination in the DNA, it would have been required a different sort of pattern. With this approach, the loading and storing are independent, which probably makes the compiler easier to write.

Carrying on we see just such usage:

(![7511999](![24])) -> (0)(1)
+(![24])(![7510823])![24] -> (1)(0)

(![7511878](![24])) -> (0)(1)
+(![24])(![7510654])![24] -> (1)(0)

This copies the address and size of caravan to that 1200-acid block as well. Here's more evidence that a compiler (and not a super-optimised one) is involved: a human would have just written the values directly to where they were needed, rather than writing them into a temporary and then copying the temporary.

Perhaps not surprisingly, at this point we call crypt. We've given it the decryption key, address and size of caravan, so it will do its thing. Note that, as documented, strings are always passed as 128 characters, even if they are actually cut short by a terminator.

The rest of the function isn't particularly exciting, so I'll jump ahead to near the end.

(![7509644])![48] -> (0)
RNA: CFPICFP
![75](![7509409])(![24])(![24]) -> IIPIP(1)IIPIP(2)IICIICIICIPPPIPPCPIIC(0)

The first pops 48 acids from the stack (the ones we pushed right at the start), which should leave the return address at the top of the stack. Then we see the RNA sequence that we're told indicates a return. The return statement itself uses more code-generating-code, but this time it's not just a fixed rule stuck on the front of the template: the rule is generated based on values in the pattern! This is how indirection is performed. To split off the leading rule, I had to invent some new syntax: <t:X> means part of a rule is defined by the template at the previous level. So rewriting the return using this syntax gives:

![75](![7509409])(![24])(![24]) -> (0)
+(![<t:(1)>](![<t:(2)>])) -> (0)(1)

In other words, the second rule will skip forward not by a constant difference, but by the number found in replacement (1) of the first rule i.e., the return address; then it will skip forward by (2) from the first rule i.e. the return size (remaining amount of code to execute from the calling function).

Parsing this requires the disassembler to do another impossible thing, because the parsing of the inner rule depends on the replacement values e.g. if (1) was empty then the following IIP would be the number zero on skip instead of an opening bracket. Once again, it's possible to make progress because the actual code seen is generated in fairly consistent ways and so heuristics work well e.g. if one sees a skip instruction immediately followed by a template parameter, it's a good chance that it will be substituted by a number, rather than some arbitrary code. And again, there are exceptions that need to be handled on a case-by-case basis.

There is also some trailing stuff at the end of the function which I haven't entirely understood. It seems to occur at the end of every function. I've split it across several lines since it appears to consist of distinct pieces:

CPPPPPP
IICIICIICIICIICIICIICIICIICIIC
CCFIICIIC
IFFCPICCFPICICFCPICIICIIC = ?[IFPICFPPCIFP] ->
IIII

Overall I think the idea is to terminate the machine if code is being executed in the wrong place (e.g. if you patched code in a broken way that meant rule decoding wasn't happening on the intended boundaries). I don't know what the first and last lines are for, although the first might might be intended to terminate numbers and to create a match item that won't match. Then there are a lot of IIC's, which should ensure that any open brackets get closed; excess IIC's will be consumed in pairs (forming empty rules). The middle line can be interpreted in two possible ways: if all the previous IIC's were consumed, then it is the rule IIC ->, which won't match so is ignored. If the last IIC on the previous line formed an (empty) pattern, then CCFIIC becomes the template IIC, which will combine with the remaining IIC to form another empty rule. So regardless of the parity, the end of this line is highly likely to be the end of a rule so that we're all synchronized for line 4, which searches for a marker that's right at the end of the DNA and skips to it, shutting everything down.

Let's see how addition works, because it comes up here and there. There is a function called addInts, which seems like a good place to start. It's almost never called, addition normally being done inline, so I suspect it's mostly provided for educational purposes. The core instructions look like this:

(![7510064](![24])) -> (1)
+(![7510040](![24])) -> (1)
+(![170])(![<t:<t:(0,1)>>]![<t:(0)>]) -> (0)|1|(1)

(?[P]) -> F(0)
+(![<t:|0|>])P -> (0)IIIIIIIIIIIIIIIIIIIIIIIP
+F(![23])?[P](![7509918])![24] -> (1)(0)P

Now we have 3 layers of rules in one, and nested <t:...> constructs. <t:<t:...>> means that the content is defined by the rule two lines above.

The first set of rules does the actual addition. The first two lines just grab the two values to add (from the blue zone). The third line does the real work. ![170] moves just past the following rule, and then we have two skips, with the distances determined by the two numbers we grabbed. We then write the length of this combined skip (i.e. the sum) just after the next rule.

So we're using the length-of template operation as a hack to implement addition. The paper by the organisers mentioned that this feature was specifically added to make arithmetic faster, although it's useful in other places too. It also explains the documentation for the init function: "Ensures happy, trouble-free arithmetic by growing the DNA to the right length." That is necessary because if the jumps go off the end of the DNA the pattern will be considered not to have matched.

The reason for the second rule is that the first will write the sum with the minimum number of bases, whereas the higher-level code is all designed to deal with 24-acid numbers (or sometimes 9 or 12). So we need to pad the number out to the right length. We want to be able to grab the bits of the number without the terminating P, so that we can add some I's after it. Unfortunately, (?[P]) includes the P, so we need to use a little trickery. Let's say the number is currently CICCP: 4 bits and P. The first step inserts an F in front of the number. The second reads 5 bits, namely FCICC, discards the original P, and appends a bunch of I's and a replacement P. The third discards the F, reads 23 bits (the original 4 plus some of the I's), skips ahead to the P, then places those 23 bits and a P. So we now have a 24-acid number in the right place.

Now that we've seen some function calls and returns, let's see if we can fix something in the picture. Note that there many, many ways to achieve each goal, and I'm just going to list one (or a few) of them.

We'll replace the apple trees with pear trees. To do that, we're replace the code in appletree to just call peartree instead of drawing an apple tree. He's the code we'll write at the front:

![14036](![5800492](![14123])) -> (0)(1)

This looks similar to the function call we saw earlier, but the return address is missing, and so is the jump to the start of the blue zone (which we no longer need since we're not pushing anything). So how does peartree know where to return to? The function that called appletree will have pushed a return address, and that's what peartree will pop and return to. This is what's known as a tail call, and it saves us the bother of writing our own return statement.

Let's take a look at what went wrong with the virus help page that caused the title to be upside down (and if you swapped out the wingding font for a normal one, you'd have noticed that all the text is upside down). It has a very deeply nested rule:

(![313]) -> (0)(0)

(![313])(?[P]) ->
+(![<t:(1)>]?[IIIPCCCCCP]) -> IIIPCCCCCP(0)
+(![<t:|0|>])![10] -> (0)IIIPIPIPIP
+![10](![<t:|0|>]) -> <t:<t:<t:(0,3)>>>|0|(0)
+(![313]) -> (0)(0)

This shows one way to implement a loop, particularly in standalone code that can't rely on the green zone. The first instruction reads the body of the loop and makes two copies of it. The last instruction of the loop does the same thing, so there is always one copy of the loop being consumed and one backup copy.

The loop itself reads a number embedded immediately after the (backup) loop cody, jumps that far ahead (using <t:(1)> to read the number loaded in the previous line), then searches for a string. It also adds a copy of that string to the front of the DNA. This is for a similar reason to addInts adding an F to the front: it allows one to skip to the start of the searched-for string, instead of the end (third line of the loop body), and replace it with a different string. The fourth line discards those 10 filler acids that were placed at the front, and makes a jump purely so that its length can be used in the replacement. The template replaces the backup copy of the loop (which was consumed in the first line) and puts the jump distance after it (replacing the number read in the first line).

What does all that mean? It's a search-and-replace. The number stored after the backup loop is the distance already searched, so that searches don't have to restart from the beginning. It jumps ahead to the position indicated by the number, then finds the next match and replaces it. One thing to keep in mind with nested rules is that if one rule in a set doesn't match, the subsequent ones aren't produced and hence are not run. Thus, once the last match is replaced, the loop stops, because the first step erases the backup copy of the loop (and the number) and the steps that replace it won't occur.

In this case, the loop replaces IIIPCCCCCP, the RNA for counter-clockwise turn, with IIIPIPIPIP (nonsense RNA). This followed by a similar loop that replaces IIIPFFFFFP with IIIPCCCCCP, then another to replace IIIPIPIPIP with IIIPFFFFFP (RNA for clockwise turn). Thus, it's swapped left with right, which explains the upside-down page.

We probably wouldn't have been warned about this for nothing. Maybe this loop is the virus that the page warns us about? We can search for it elsewhere. Some parts of it change depending on what swap is being been made, but the replacement template <t:<t:<t:(0,3)>>>|0|(0) seems pretty unusual, and is independent of the replacement code. That corresponds to IPCCPPPPFPFPPFPFPFP in the original DNA, which we can search for. There are 4 matches: the 3 we've already seen..., and one in surfaceTransform. Just by running that function, we can see that it's responsible for the hills, but also recall from part 1 that when we tried setting hillsEnabled, things went horribly wrong:

In that case the virus is replacing IIIPIIPICP (RNA for resetting the colour bucket) with IIIPIPIPIF (nonsense RNA). No wonder everything seems so green. Let's consider a general approach to getting rid of sections of code we don't want: using a forward jump. We overwrite the start of the virus with the rule ![n] ->, for some n. It's actually not quite trivial to determine n, because it needs to be the length of the virus minus the length of the replacement rule, and we don't know the length of the rule until we know how many acids we need to encode n. One approach is just to use a large fixed number of bits (with 0's in the high-order bits); this seems to be generally the approach taken by the organiser's compiler. However, we get a higher score if we use a shorter prefix, so there is some motivation to use shorter replacements. My approach was to use a function that would take an initial estimate of the position of the end of the rule, compile it, then update the estimate and repeat until convergence. In theory there are some corner cases where this could oscillate and a padding bit would be required, but I've not hit one yet.

A more efficient solution (in terms to changing fewer bases) is just to change the string that is searched for. But the tool above will come in handy elsewhere.

With that fixed, the hills now appear in the scene, although they're not yet in quite the right positions:

So how are those hills drawn anyway? It actually involves a fair amount of code. Here's what my disassembler spits out for the first hill:

0x0542 <blue+0x018> := 0
0x058e <blue+0x000> := 242
0x05da CALL moveTo
0x0693 PUSHS 120
0x06d3 <blue+0x018> := 8388580
0x071f <blue+0x000> := 3348
0x076b CALL functionParabola
0x0824 cbfArray+0x48[:72] := POP 72
0x08a1 PUSHS 144
0x08e2 <blue+0x030> := 408
0x092e <blue+0x018> := 7
0x097a <blue+0x000> := 4
0x09c6 CALL functionSine
0x0a7f cbfArray+0x90[:72] := POP 72
0x0afc PUSHS 216
0x0b3d <blue+0x048>[:72] = cbfArray+0x48[:72]
0x0bd4 <blue+0x000>[:72] = cbfArray+0x90[:72]
0x0c6b CALL functionAdd
0x0d24 cbfArray[:72] := POP 72
0x0da1 PUSHS 120
0x0de1 <blue+0x030>[:72] = cbfArray[:72]
0x0e78 <blue+0x018> := 250
0x0ec4 <blue+0x000> := 50
0x0f10 CALL drawFunctionBase

PUSHS X means make X acids of space on the stack. [:72] just indicates the size of a value. The disassembler doesn't know about negative numbers and two's complement, so 8388580 is really 8388580 - 2^23 = -28. In summary, it generates two closures, one from functionParabola and one from functionSine, and adds them, then passes the resulting closure to drawFunctionBase.

The second hill is similar, but with different parameters and moveTo location, while the third uses only a sine function. The first piece of text we found in the DNA mentioned that the organisers had sabotaged the DNA by swapping some parabolas around, so see what happens if you swap the parameters of the two parabolas. To do this you'll need to find the positions of the numbers in the DNA (they're quoted). I found it very useful to reuse some of my disassembly and assembly code to disassemble the desired instruction, replace the value (after checking that it matched what I expected it to be) and reassemble the instruction. That allowed me to put in a number of safety checks so that typing errors didn't send me off on long debugging sessions.

That looks worse, but actually it is progress. If you compare this to the target image, you'll find that the left two hills are merely at the wrong heights. It helps to use an image editor that lets you shift the images relative to each other while showing a difference or other blend of them (I used the GIMP, opening the images as two layers and dragging one across the other). On the first hill, change the second argument to moveTo from 242 to 218, and on the second hill, from 209 to 235.

The third hill is trickier. You can almost make it match by changing both the x and y positions, but there are always a few pixels that don't quite match. I don't know if there is a clever way to determine the solution (maybe analysing functionSine to determine the meaning of the parameters and then fitting them), but brute force (automatically trying multiple values) will get you there if you know that only one of the parameters is wrong. In fact, the second argument must change from 65 to 60.

While it's generally not this bad, the contest does have a fair amount of tedious modification of coordinates of things to put them into the right place. For example, the caravan will need to move; you can find the position being set at offset 0x3b66 in scenario if you want to try your hand.

For now let's keep things interesting by looking at some code. But first let's get the last few pages of the repair guide. If you disassemble main, you'll see various bits of code that compare the value at helpScreen to a constant and then call one of the help functions we've already seen. Even if you don't have a disassembler that can interpret all the rules, you should be able to extract these to determine the valid repair page numbers. There are two that weren't shown in Part 1, and we missed them because they're implemented inline in main rather than in a help- function we could call.

1024 (which you might have found by brute force):

180878:

That yellow shape in the middle looks like exactly what we need to put at the centre of the sun (although it's too large). But page 1024 tells us more about the compressor hinted at by page 123456. Now that we've seen the virus, the principle of a loop that keeps copying itself makes more sense. It says it's used for bitmaps, so let's look inside a page that has a bitmap: alien-lifeforms.

The compressor itself looks like this:

0x182b   (![19])(![944])(?[P]) -> (0)(2)(1)
0x186c   (![56]![128]![32]) -> (1)(0)(1)
0x18a5   (![672])(![178]) -> (1)(0)(1)

Or at least it appears to: this is one of those rare cases of self-modifying code that fools disassembly. One should be suspicious of the middle line because it appears to reference a capture group that doesn't exist. Here's the raw DNA for it:

IIPIPIIICCCPIPIIIII IICIIPIPIIIIICPIICIICIPPCPIPPPIPPCPIIC

The first rule grabs the next number from the green area, and pastes it into the middle of the next instruction (at the gap shown). It appears after IPIIIII, which turns it into a jump that's 32× bigger than the number (left shift by 5). So for example, if the number is 3, the rule becomes

(![56]![96])(![32]) -> (1)(0)(1)

The ![56] jumps over the 3rd instruction, and the ![96] jumps to the 3rd 32-bit table entry, which is then copied to the front and executed. The final instruction just restores the red part from the orange part.

What's in these 32-acid table entries? One could conceivably do quite a lot, but they aren't used for much. Most of them consist of IPIIIIII...IPIII<RNA> i.e. a jump of 0 (just for padding) and then one piece of RNA (all of which get welded onto the front of the next instruction). The last entry (20) is special: it simply jumps over the rest of the compressor to resume execution.

Now that we know what the decompressor looks like, let's go see where it gets used. We can take the raw DNA and just search for it in the downloaded DNA. It appears in the following functions: 'M-class-planet', 'alien-lifeforms', 'cargobox', 'fontCombinator', 'fuundoc1', 'fuundoc2', 'fuundoc3', 'grass1', 'grass2', 'grass3', 'grass4', 'help-steganography', 'most-wanted', 'printGeneTable', 'sticky', 'transmission-buffer'.

Most of these are unsurprising, since they contain some sort of image or complex shape. But what about printGeneTable? It's just text — at least on the pages we asked for. Maybe, like the repair guide, there are hidden pages? Let's ask for page 15:

Notice the image in the bottom-right corner? It's a match for the whale spout in the target picture, although upside down. We'll come back to it when we start assembling the final picture.

The cargo box also uses the compressor. Dump the RNA in the table:

0x0256   Entry 0: RNA: move
0x0276   Entry 1: RNA: cw
0x0296   Entry 2: RNA: line
0x02b6   Entry 3: RNA: mark
0x02d6   Entry 4: RNA: ccw
0x02f6   Entry 5: RNA: red
0x0316   Entry 6: RNA: black
0x0336   Entry 7: RNA: white
0x0356   Entry 8: RNA: magenta
0x0376   Entry 9: RNA: fill
0x0396   Entry 10: RNA: reset
0x03b6   Entry 11: RNA: cyan
0x03d6   Entry 12: RNA: green

The source image has a rather purple cargo box, which is probably built from the magenta in the table. What if we change that table entry to yellow?

Visually it looks right, but comparing pixel values to the target, the filling in the A is now slightly the wrong colour, with too much green and not enough blue. The only other table colour that uses different amounts of green and blue is the last one (green). So maybe we need to change that to blue to balance it out? This does indeed work.

It's about time we turned the weather on again so that we can get the clouds and work on the cow. Set weather to 2 and enableBioMorph to true:

There are a few elements in the scene we don't want, such as the lightning bolt and the rain. We've seen one way to disable code we don't want (replace a chunk of code by a jump), but I'll demonstrate another that can disable individual rules with just a one-acid change (which will help our score). If you disassemble sky-day-bodies, you'll see the call to lightningBolt at 0x29a, with DNA that starts with a jump:

IPIIIICICIIIICIIIIIIIIIIIIP = ![2128]

What happens if we change the last I to a P? It becomes

IPIIIICICIIIICIIIIIIIIIIIPP = ![2128]F

Suddenly the rule is expect to have an F in a particular place (the start of the green zone) or it won't work, and there isn't one there. So instead of making a call, nothing happens:

Note that this only works for calls to functions without arguments, since the arguments are normally pushed separately and without the function call, they won't get popped again.

While we have this tool handy, let's also zap the calls to lambda-id (at scenario+0x4322), crater (at scenario+0x8b30) and cloak-rain (at scenario+0xa370).

What's with all the red, yellow and black? Disassembling the various functions shows that a variable called cloudy seems to play a role. Patching its value in the original DNA doesn't seem to help. With some hacks on the DNA interpreter it's possible to see where it gets changed (or you could just disassemble all the things): when cloud is run, it sets it to true. The trick we used earlier doesn't have quite the same effect here: we end up replacing

(![831561])![1] -> (0)P

with

(![831561]F)![1] -> (0)P

and since cloudy starts off as F, the rule still matches, and ends up writing a P into the following acid. Fortunately it's one that doesn't matter, but there is an alternative way to disable the rule, which will be useful later. The ![1] encodes as IPCP, and we change that to PPCP, which decodes as FFIF, which is unlikely to match unless we're very unlucky (if we are though, things will go badly wrong because we'll be resizing the green zone).

In Part 1 I pointed out that we can see a faint shadow of the desired cow behind the endo-cow hybrid. Let's see if we can recover the cow. If you look at the start of bmu, you'll see RNA early on to put one opaque and 9 transparent into the bucket. What if we change all the transparent to opaque?

Not quite what we were hoping for. You'll notice that after the transparency RNA there is a fill; so the entire layer is now solid black. The reason only a cow-shaped region appears black is that the cow is used to clip this layer. What we actually want is to compose the cow onto the scene. There are a few ways to fix this. One is to change the clip RNA into compose, and instead of changing the transparent commands to opaque, change the one opaque command to transparent (if the opaque command is left in, the whole scene ends up a bit too dark).

We'd better get rid of the unwanted endocow, which we can do the same way we've dealt with other unwanted scene elements.

His tail is missing, because I forgot to decrypt it. In this case, the code actually runs an integrity check on the tail (as well as on cow-spot-middle) and skips it if the integrity check fails. Recall from Part 1 that it is encrypted with 9546.

At first glance it looks good, but the colour is wrong. It fact, it's completely opaque, whereas in the target image it is translucent, and you can see the grass through it. We're going to need to patch it to insert some opaque and transparent commands to get the colour right. We'll need to determine the correct number.

One complication is that the entire scene is overlaid by a fine grid of almost-transparent lines, which is produced by the function anticompressant (as the name suggests, the purpose is to make it more difficult to brute-force the problem by generating the image directly with RNA). While not strictly necessary, it'll be easier to solve colour problems if we don't have to worry about it. For the source picture we can use one of the techniques already discussed to disable the function, but what about the target picture? To fix that that, we'll start by finding the image that anticompressant overlays. We can do that by getting our RNA-to-image tool to spit out the first image (including the alpha channel) each time composition is done. It's easy to mistake the anticompressant pattern for a black image because it is so faint. With the levels adjusted in an image editor, the colour and alpha components look like this:

Now using the equation for composing images from the specification, you can reverse the process. There is a rounding step which loses information, but because the overlay is so faint, in most cases the colour value can be recovered exactly, and in other cases there are only two possibilities.

So returning to the problem of the tail: let's choose a pixel and check its colour when the tail is absent (e.g. it wasn't decrypted), when it's present and fully opaque, and in the target picture; in each case with the anti-compressant absent or reversed. I got the following at (454, 348):

(0, 100, 0) when absent
(119, 166, 219) when opaque (and indeed on all pixels of the tail)
(83, 145, 152) in the target

Let's say we now add some alpha RNA to the mix, so that the tail has an alpha value of a. The colour will then be (119a/255, 166a/255, 219a/255, a), where division rounds down. Just looking at the red component, this tells us that 83*255 ≤ 119a < 84*255, and similarly from the blue component, 152*255 ≤ 219a < 153*255. The only integer value for a that fails into those intervals is 178. 178/255 = 0.6980, which looks pretty close to 0.7, and indeed creating 70% opacity (by issuing opaque 7 times and transparent 3 times) will produce an alpha of 178.

That tells us what RNA to add, but how to we do it? There isn't room in the function to add them directly, but we can do some compression. For example, if one replaces some RNA with a rule like ([!30]) -> (0)(0)(0) it will repeat the following three pieces of RNA three times. So as long as we write a rule that is a multiple of 10 bases long, we can overwrite some of the existing RNA, and generate our new RNA plus the RNA we overwrite. I'll leave the details as an exercise for the reader (if you get stuck, there is a solution listed here).

Unfortunately when you do this, the tail disappears again! Now the integrity check is working against us: because we modified the function, it fails the integrity check. We can hack checkIntegrity to always return true by disabling the jump at offset 0x611, which then gives us the cow with the correct translucent tail:

What about the whale? When we simply called whale it looked alive, but in our picture it looks dead. So possibly it examines some state to determine whether it is alive or dead? Indeed, the code has an IF statement (following the pattern we've seen before), and we can disable it in the usual way (replacing the high bit of a jump with a P in the instruction at offset 0xcb1). Incidentally, it seems like the function takes two boolean arguments but only uses one of them; I don't know why.

While we're dealing with animals, what about the ducks? If you disassemble scenario you'll see that there is a call to motherDuckWithChicks — so where they? Look a little further and you'll realise that it's unreachable code: a boolean is set to true, and then if that same boolean is true, we jump over the code. Break the goto in the usual way, and the ducks appear — although one seems to become lost in the trees.

What's not immediately obvious, but far more serious, is that half the elements in the scene have shifted two pixels to the left! You might remember from part 1 that motherDuck failed the integrity check, so there is probably an issue somewhere there that causes the cursor to end up in the wrong position.

Page 8 of the field repair guide describes how polygons are encoded, and says that the last pair is the sum of all movements. What if it isn't? It will lead to exactly this sort of problem. Polygons are easy to recognise in the DNA with a regex (remember that they appear quoted), so we can find the polygons and check them, and indeed one of the polygons in motherDuck is 2 pixels off. Finding which offset is wrong takes a little more work. I just used brute force: try adjusting every X value by 2 pixels, render them all, and check which matches the target (it turns out to be the 63 numbers from the start of the polygon). Visually not much has changed, but everything is back in the right place:

Next let's sort out the text at the bottom. It's going to be a little trickier than just replacing the text in the DNA, because the replacement text is longer. However, when everything went German, the text changed to "Endo hat gemorpht" which is exactly the right length for our replacement. So perhaps we can use just that bit of the code instead. Here's the relevant area of code:

0x9444 <blue+0x4b0>[:9216] = fontTable_Cyperus[:9216]
0x94f0 <blue+0x498> := 285
0x953c <blue+0x480> := 570
0x9588 <blue+0x000>[:162] := "Endo hat gemorpht"
0x968d CALL drawString
0x9746 GOTO 0x9c16
0x9767 colorTable := ...
0x98a2 charColorCallback := useColorTable,6709
0x9908 PUSHS 10416
0x994f <blue+0x4b0>[:9216] = fontTable_Cyperus[:9216]
0x99fb <blue+0x498> := 285
0x9a47 <blue+0x480> := 570
0x9a93 <blue+0x000>[:108] := "Morph Endo!"
0x9b5d CALL drawString
0x9c16 RNA: compose

Immediately after the PUSHS at 0x9908, we'll make a (backwards) jump to 0x9444, which will print the longer piece of text (and which will then safely jump forwards to 0x9c16). What does a backwards jump look like? Unlike a forward jump, the target code doesn't exist in the red zone, so we have to restore it from the green zone. We copy the code from the jump target to just after the jump instruction. As for forward jumps, we can use two passes to solve the problem of not knowing where the jump instruction ends until after we've compiled it. In my implementation, it ends up looking like this:

(![5074344](![1358])) -> (0)(1)

Let's get the sun back into the sky. We saw in Part 1 that we could see it, in the right place but the wrong shade of yellow, by setting weather to 3. From the target picture (particularly with the anticompressant removed) one can see that the sun is the same shade as the flowers in the bottom-right. Disassembling flowerbed shows that colour to be colorSoftYellow. So we need to somehow call that before drawing the sun. And as luck would have it, there is some code we don't need or want in sky-day-bodies just before drawing the sun, which is to check the value of weather!

0x072c IF weather != 3 GOTO 0x0aae
0x07ac PUSHS 48
0x07eb <blue+0x018> := 480
0x0837 <blue+0x000> := 20
0x0883 CALL setOrigin
0x093c CALL sun
0x09f5 CALL resetOrigin
0x0aae RNA: compose

There is plenty of room in that weather check to overwrite it with a function call, but we can also keep the prefix length down by not calling the whole function and instead just copying the RNA (which saves having to jump to the blue zone and write a return address). The recipe for this looks exactly like the backwards jump from above: we're copying code from the green zone to the front of the red zone, without disturbing the remaining code of the current function. Immediately after that we have to insert a forward jump to 0x07ac to skip over the remnants of the original code.

Unfortunately on its own this won't fix the colour of the sun, because sun starts by setting the colour:

0x000a RNA: reset
0x0014 RNA: yellow

We can fix this either by modifying the call to sun to enter after the unwanted RNA, or just alter the RNA to become nonsense RNA that will be ignored. With that in place (and don't forget to replace the whole sun function with the XOR of flower and sunflower first):

Next let's sort out the spirograph at the centre of the sun. As we saw, the code for it is in main, but we need a way to call it. A powerful technique is to patch the return addresses of function calls: this allows you to resume execution anywhere, even in another function, rather than with the next instruction. We're going to want to be able to control the position, so it'll be useful to get a call to setOrigin before executing the desired code. As it happens, crater has a call to setOrigin very early on. So instead of disabling the call to crater, let's leave it enabled and hijack it to draw the spirograph.

Change the return address at crater+0xf5 to main+0x6f9a, which starts drawing the central spirograph. It's completed at main+0x866d, which has a compose RNA to balance the add RNA from the start of crater. We still need a resetOrigin to balance the setOrigin added by crater. So add a forward jump to main+0x8ae7 (which is a call to resetOrigin).

After that resetOrigin call, we don't want to run the rest of the code in main. We could add another forward jump from after it to the end, or change the return address, but there is another trick we can use that requires a smaller change: we can remove the return address entirely, turning this into a tail call so that the callee returns directly to the caller. This isn't completely trivial: if we just terminate the terminate early then the skips in the call instruction will go to the wrong place (because we'll have consumed less of the red zone than expected). So we have to remove the return address without changing the length of the rule. We can do that by replacing the first three bases of the return address with IPP and the last with P. This turns this part of the template into (n,0), where n is some very large number. The specification says that such substitutions are simply ignored.

That gives us this (don't forget to remove the code that disabled the crater):

Clearly both the size and position will need to be fixed. We'll sort out a whole bunch of positions later, but let's look at the size. The first argument to spirograph is called magnify, which is a bit of a clue. One option is just to change it from 4 to 1 each time spirograph is called. If you want to make a smaller patch, one can look inside spirograph. It multiplies the next two arguments (radiusSum and radiusMoving) by magnify, writing the results back in place. We'd like to disable that code. The writeback looks like this:

(![7519442])(![24]) -> (0)
+ (![7519707])![24] -> (0)<t:(1,1)>

The first line fetches and removes the return value left by mulInts, and the second line uses it to replace an existing value. We can't disable the whole rule (remember, the + indicates that the second line is really a rule emitted by the template on the first line), but if we disable the second line it will have the desired effect. We can do this in our usual way (writing a P just before the end of the number in a jump), except that this time everything is quoted by an extra level, so we write IC instead.

While we're calling into odd bits of code to draw missing picture elements, let's sort out the whale spout. We can again pick a function that we suppressed earlier and repurpose it. We have to be a little careful in our choice though, because it needs to be drawn before the whale (which overwrites part of it). I chose to use lambda-id. For this part I haven't tried to be optimal, just to get something working. Replace the start of lambda-id with a tail call to printGeneTable+0x16ef5 (which sets the location before drawing the spout). At printGeneTable+0x23c62 place a jump to printGeneTable+0x271dc (the return statement). The return statement contains a ![1] to pop the boolean argument from the stack; change it to ![0] (without changing the length). In the RNA compression table, swap the entries for clockwise and counter-clockwise rotation. Don't forget to re-enable the call to lambda-id.

Well that went badly wrong. I'm not sure exactly why, but when objects wrap around (particularly to negative positions) it seems to confuse the code that keeps track of the current position. It's probably time to fix a whole bunch of positions. This is tedious work: find the code that calls moveTo to setOrigin, check pixel positions in an image editor, figure out how much to adjust the position by, and make the patch. In some cases one needs to provide a negative position to setOrigin so that the actual drawn position is in range (encoded using 2's complement). I'll just provide a list of instruction offsets (for the first of two instructions that set x and y) and the correct values:

scenario+0x3ba5: (267, 210) (caravan)
scenario+0x862c: (410, 200) (whale)
scenario+0x3157: (171, 410) (chick)
crater+0x005d: (-133, 147)
printGeneTable+0x16f34: (385, -75) (spout)
flowerbed+0x0049: (34, 0) (flowers in bottom right)
flowerbed+0x04c8: (0, 0)
flowerbed+0x0951: (58, 12)
flowerbed+0x0dda: (17, 24)
clouds+0x0049: (20, 25)
clouds+0x03df: (180, 55)
clouds+0x077f: (340, 30)

We also need to fix the sizes of the clouds (in practice, you'd do this before trying to fix the positions):

clouds+0x56e: 10
clouds+0x90e: 20

That makes things look a lot better:

What do we still need to fix?

The speech bubble and the swimming pool are pretty clear, but there is also a little bit of red at the base of the windmill, because the chick is drawn before the windmill and so the top of its head is clipped. Let's sort that out first. We need to call the code that draws the chick at some point after the windmill. There are probably lots of ways to do this, but I chose one that also lets us eliminate the rain at the same time without a separate patch. At scenario+0xa2b7 (the instruction before calling cloak-rain), change the return address to scenario+0x32a8 (which is the start of the call to chick). Then after 0x346d, jump forward to 0xa429 (just after the call to cloak-rain). The chick now takes its position from the position set at 0xa21f, so the position needs to be patched there instead of where it was originally patched.

We also need to prevent the chick from being drawn earlier, since then the forward jump will skip right over the windmill (and a lot of other things besides). At 0x273c there is an instruction to set ducksShown to true, which is later checked to decide whether to draw the chick. We can disable that in the same way we've disabled other instructions that set a boolean to true.

Now let's change the λ to a μ. The speech bubble is drawn in the function balloon. And the relative bit of code is

0x16e4 PUSHS 10416
0x172b <blue+0x4b0>[:9216] = fontTable_Tempus-Bold-Huge[:9216]
0x17d7 <blue+0x498> := 98
0x1823 <blue+0x480> := 20
0x186f <blue+0x000>[:18] := "L"
0x18d5 CALL drawString

In Part 1 we swapped out the font on help page 10646 to see different fonts, but if you try this with Tempus-Bold-Huge it'll draw the lambda, then (in my implementation) crash out with an integer overflow. I'm left with this:

So there may be something wrong with this font. The symbol table includes charInfo_Tempus-Bold-Huge_, charInfo_Tempus-Bold-Huge_L and charInfo_Tempus-Bold-Huge_M; maybe one of them is broken. On extracting the DNA from the first two we see that there are no F's and that the P's are generally preceeded by either C or P, which suggests they are sequences of variable-length numbers. We were told that the RNA compressor was used for font tables, so this isn't surprising. However, the last one seems almost random, with a mix of all four bases.

Random... or encrypted? In Part 1 we found one password that we haven't found a use for yet: no1@Ax3. And indeed, if we decrypt charInfo_Tempus-Bold-Huge_M with it, the font table looks healthy again, and we have our μ.

Now we can change the "L" to an "M" where it is used:

Getting the shape if the speech balloon right is much harder. As far as I'm aware, the right shape isn't present in Endo's DNA.One approach is to reverse-engineer the polyline that forms it, and replace the existing polyline. I took a different approach: drawing the outline directly with RNA. However, RNA is rather unwieldy (requiring 10 bases per command), so some compression is in order. We'll start withit it though.

Firstly we'll extract the shape of the balloon. Start at some arbitrary point inside the balloon, and do a flood-fill. The inside isn't quite a uniform colour, so match anything that's close enough to gray e.g. difference between min and max channel is at most 50.

Next, identify just the border pixels: those that are inside the balloon, but share an edge with the outside.

These pixels can be arranged in a linear order with each adjacent (possibly diagonally) to the previous one. In most cases it's obvious which pixel to go to next, but a little care is needed at the bottom corner where the path loops back on itself. We can trace out this shape using RNA, using the commands to move forward and turn left and right. Before leaving a pixel, issue a mark command, and after arriving at the next pixel, issue a line command. It doesn't really matter where you start (we'll add a moveTo call before-hand to get us there), but it's important to end facing east (which is also the starting direction) and to end in the same place you started, as otherwise the higher-level code will be confused about the current location.

Once we've done some compression we'll be able to patch the balloon function in place, but for now we don't have enough room. We'll pick some other function we don't need and overwrite it (I used contest-2005). Overwrite it with a call to moveTo (use 0, 0 as coordinates for now), then the RNA, then a return that pops 24529 from the stack (look at the end of drawPolyline to see what that looks like). Now in balloon, replace the call to drawPolyline with a call to the replacement function. The result will depend on which point on the boundary you picked as a starting point; I used the left-most point on the top row.

Definitely not a complete success, but we can see the top-right of the balloon (if it looks a bit out-of-shape, that's just because you're seeing the sail of the windmill through the hole), and it tells us a bit about how it is being drawn. It's a little hard to see, but that white rectangle isn't a uniform colour — it has a gradient to it. The balloon shape is then used as mask on this gradient rectangle.

We can see this more clearly if we apply a transformation to the colours to make gradients more apparent. I wrote a small tool that multiplies R by 11, G by 17 and B by 29 (all wrapping modulo 256). The anticompressant really interferes with this, so here's the result for the image above but with the anticompressant disabled.

And the target, with the anticompressant reversed:

Since we can only see a small part of the gradient in the target it's not that easy to determine where the rectangle should move to. What's more, if you try to line them up you'll find that they seem to have different slopes. It's worth looking at how the gradient is generated. balloon calls drawGradientCornerNW. Before doing so, it sets a special flag colorReset to false. This prevents the colour callback (in this case, colorWhite) from resetting the bucket before adding a colour. So as drawGradientCornerNW proceeds, it will keep adding more white to the bucket, gradually changing the overall colour. As the colour bucket gets fuller, each new addition has less relative impact, which is why the gradient is strongest at the top-right and gets gentler towards the bottom left.

drawGradientCornerNW works by setting a mark at the current position (the NW corner), moves to the NE corner, then steps down the east edge, drawing a line to the mark at each pixel before repeating the process along the bottom edge. On the target we can get an estimate of where the NW corner is by extending the lines of colour in the image to their intersection, and we can estimate how wide the rectangle is by looking at the slopes of the lines and comparing them to the slopes in our generated image. Getting the height is trickier, because we only get a little information from the stem of the balloon, but we know it has to extend at least to the bottom of the balloon and can just try multiple options from there. It turns out that the rectangle is exactly the bounding box of the balloon, which is also something you might have guessed and tried.

So, we're going to make a few changes:

Change the coordinates of our call to moveTo to 67, 0, to move the bubble to the correct position relative to the box.
Change the coordinates of the setOrigin call at scenario+0x900b to 198, 324 to place the NW corner of the box correctly.
Change the arguments to drawGradientCornerNW in balloon to 101, 169 to set the size of the box.
Change the coordinates of the moveTo at balloon+0x13ca to 74, 37. This is the point from which the balloon is filled, and we need it to be somewhere inside the balloon. There are lots of choices, but this one requires changing only a single bit.

Nearly there! We just need to shift the μ to the right place. At balloon+0x17d7, change the coordinates to 56, 2.

That just leaves the swimming pool for the whale. Once again, we can repurpose a function that we previously suppressed. This time it needs to occur after the whale, so we'll use endocow. In Part 1 we say something similar to what we need, when we set the weather to 2 but before activating the VMU. Here it is again:

Notice that the cupola of the UFO is half-filled with water. It's also the shape we need, but we need it upside down. Let's see if we can get just that part into our scene, without trying to flip it over.

As usual, we'll rewrite a return address in endocow so that the return jumps into the code we actually want. endocow starts with a call to setOrigin, so we'll change the return address of that to jump to ufo+0x0a08, which calls water. For now we'll also change the coordinates in the setOrigin call in endocow to 55, 42, to match those that would have been used for the water if we'd run through ufo from the top.

The following code in ufo after needs a bit of explanation: it first draws the shape of the cup (inline) and uses it to clip the water to the right shape. It then draws the shape again (again, inline, rather than by calling ufo-cup) partially transparent, and composes it. That code ends at 0x1c30 with a jump to 0x1f53. We don't want to run any of the subsequent code, so replace 0x1f53 with a jump to the return statement (at 0x2c77).

The cup is visible (between the balloon and the windmill), but the cow has disappeared. This happened because by jumping into the middle of
ufo, we've messed up the image layers. If you step through the RNA a piece at a time, one can see the water being drawn on the same layer as the cow, and when it is clipped, the cow is clipped away. So we need to add an extra layer. Fortunately, endocow starts with a piece of junk RNA to identify the function, which we can change to an add RNA.

Now let's flip the cup upside down. We'll need to wedge in two turns between drawing the water and drawing the first instance of the cup. We can overwrite the CFPICFP marker RNA in the return of water. For the polyline used for clipping, the colour doesn't matter, so we can replace the white RNA at ufo+0x0b8e with a second turn. We also need to restore the original orientation afterwards. Since we added a jump at ufo+0x1f53, we can just insert two turn RNA's at the start of that jump.

That's flipped the cup upside-down. The water now appears to be filling the top instead of the bottom. That's because the water is just a polyline, and we're now seeing the bottom edge instead of the top. More seriously of course, we've messed up the position tracking so a number of elements of the scene are now in the wrong place. This is not unexpected when we flip the direction under the hood.

One way to fix this is to make sure that the movements done while flipped bring us back to where we started. The last movement before the flip is a call to moveTo in water; the last before we flip back is another call to moveTo in ufo. We don't need to change the coordinates of either: the one in water is called within a setOrigin/resetOrigin pair (one started in endocow), and we can bring them into alignment by adjusting those coordinates. Specifically, the coordinates that we changed to 55, 42 earlier instead need to become 48, 8 for the arithmetic to balance.

This has also shifted the water relative to the cup, and we'll need to adjust it further. But haven't we just pinned down the knob we have to adjust these relative positions? There is another knob: polylines have a starting position, encoded as the 2nd and 3rd numbers in the list (refer back to repair guide page 8). That is currently 56, 65; it'll take a little trial and error to get it right (you first need to get it close enough to see some reference points), but these need to be replaced by 62, 72. Don't forget to update both copies of the polyline.

There's just one more step! At bmu+0x45ee, we need to change the position from (160, 310) to (372, 257) to place the cup in the right position.

If you made it this far, thanks for reading, and if you've actually produced a perfect prefix yourself, congratulations!

I had originally planned to write a Part 3 which showed how to optimise the prefix, but this series has already become very long and I need a break. I've also incorporated a lot of tricks into Part 2. Jochen Hoenicke's page describes his incredibly short prefix, and you can probably learn a few things from it (I certainly did: several ideas presented here are originally his). If I do get back to it one day, the interesting parts would be

Compressing the RNA for the balloon. I've written a decompressor that maps each of I, C, F and P to a short sequence of integers, after which the standard RNA decompressor is used to decompress those into RNA. It works out to 773 bases for the decompressor and data (but requires a separate moveTo call first). If one could find the right tradeoff between complexity of the decompressor and length of the data it might be competitive with Jochen's polygon decompressor.
Writing a code reverser to turn duolc into cloud and an XORer to fix sun, in DNA.
Writing a patcher that takes a compact table of patches to apply and runs a loop to apply them.