Entropy always increases

ICPC World Finals 2024 solutions

2025-01-05T00:12:00.000-08:00

I recently got around to looking at the problems from the 2024 ICPC World Finals, and noticed that they don't have test data or solution sketches posted. Fortunately Kattis has the problems in its online judge (https://icpc.kattis.com — the "official" online judge link is broken), so I was at least able to submit solutions to check correctness.

So here are my solution sketches below. In the cases of a few problems I had some intuition about a solution but I don't actually have a proof of correctness.

A: Billboards

It's easy enough to compute the total area of each value function. We can then easily scale the functions so that each has a total area of n (and hence each section must have a value of at least 1).

Let's suppose we already know the order in which we'll present the billboards. Then we can compute the partition points greedily: make each section in turn the length such that the value to the sponsor is 1. The last section has to end at l so it might be longer. Computing this requires a little maths. For each sponsor, we can keep a prefix sum of the areas of the piecewise-linear segments, and binary searching that gives us the segment where the area crosses 1. To find the exact x value we'll need to solve a quadratic equation, and handle a few special cases.
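As a rough illustration of the "little maths" for a single piecewise-linear segment, here is a minimal sketch. The function name and argument layout are my own assumptions, not anything from the problem statement. It finds the x at which the area accumulated from the start of the segment reaches a target, using a rearranged quadratic formula that also covers the flat (zero slope) case.

```python
import math

def crossing_point(x0, y0, x1, y1, target):
    """x in [x0, x1] where the area under the segment, measured from x0, equals target.

    Assumes y0, y1 >= 0 and 0 <= target <= total area of the segment.
    """
    if target == 0:
        return x0
    slope = (y1 - y0) / (x1 - x0)
    # Solve y0*t + slope*t^2/2 = target for t >= 0.
    # The rationalised form avoids dividing by slope, so slope == 0 is fine.
    disc = y0 * y0 + 2 * slope * target
    t = 2 * target / (y0 + math.sqrt(disc))
    return x0 + t
```

In a full solution the binary search over prefix sums would pick the segment, and the target passed in would be whatever area is still needed once the earlier segments are accounted for.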

How do we choose the order? We can actually do this greedily too. Just repeatedly pick the sponsor that will allow us to use the smallest section of billboard. Initially, the total of the value functions of all sponsors is n². We're then eliminating one sponsor (with total value n), and a section of billboard which each of the other n - 1 sponsors values at 1 or less. So the remaining total of value functions of the other sponsors is at least n² - n - (n - 1) = (n - 1)². We can thus keep repeating the process, and see that at the end the remaining value will still be at least 0.

B: Bingo for the Win!

For a player to be last, one of their numbers must be the last number picked. Consider each distinct number that appears on a particular player P's board. If a slower player also has that number, then this number being last won't make P finish last. Otherwise, if this number is drawn last, P will finish last, and if the number appears m times in total (out of all nk numbers), it has an m/(nk) chance of being picked last. So for each player, add this up over all numbers on that player's board.
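The counting argument can be sketched as follows. This is only an illustration under assumptions of mine: boards are given as lists of numbers, players are listed from fastest to slowest, and the function name is hypothetical.

```python
from collections import Counter

def last_place_probabilities(boards):
    """boards[p] is player p's list of numbers; later players are assumed slower."""
    n, k = len(boards), len(boards[0])
    count = Counter(x for board in boards for x in board)
    # For every number, remember the slowest player holding it.
    slowest = {}
    for p, board in enumerate(boards):
        for x in board:
            slowest[x] = p
    probs = []
    for p, board in enumerate(boards):
        # Only numbers for which p is the slowest holder can make p finish last.
        total = sum(count[x] for x in set(board) if slowest[x] == p)
        probs.append(total / (n * k))
    return probs
```

Every number is credited to exactly one player, so the probabilities add up to 1.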

C: Citizenship

This is a fairly straightforward implementation problem. You have to convert dates into a more useful representation (e.g. days since 0000-01-01), and just iterate through possible answers until you find one that satisfies the conditions. One optimisation is that you can pre-compute a prefix sum of days away to quickly determine the number of days resident during any 365-day period.
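A minimal sketch of the date conversion and the prefix sum, assuming ISO-formatted dates and trips given as inclusive (start, end) pairs. The input format and all names here are my assumptions rather than the problem's.

```python
from datetime import date
from itertools import accumulate

def day_number(iso):
    """Convert 'YYYY-MM-DD' to an integer day count."""
    return date.fromisoformat(iso).toordinal()

def away_prefix(first_day, last_day, trips):
    """prefix[i] = number of days away among the first i days of [first_day, last_day]."""
    away = [0] * (last_day - first_day + 1)
    for start, end in trips:                 # inclusive day numbers
        for d in range(start, end + 1):
            away[d - first_day] = 1
    return [0] + list(accumulate(away))

def days_away(prefix, first_day, lo, hi):
    """Days away in the inclusive range [lo, hi], in O(1)."""
    return prefix[hi - first_day + 1] - prefix[lo - first_day]
```

The number of days resident in a window is then the window length minus `days_away` for that window.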

D: Doubles Horseback Wrestling

I don't have a solid proof that the following works, but it is accepted. Note that two riders i and j are compatible if (and only if) \(l_i + l_j \le s\) and \(u_i + u_j \ge s\).

Sort the riders by l. Iterate through them in decreasing order of l, trying to find a match for each one (unless already matched). We can maintain a set S of riders j which satisfy \(l_i + l_j \le s\) (which grows each time l decreases). From S, take the rider with the smallest \(u_j\) such that \(u_i + u_j \ge s\) (if any exists).

The intuition is that any match currently in S will always remain in S (unless matched) later on, so there is no benefit to matching with any rider other than the one whose u value will be the hardest to match.
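Here is a sketch of that greedy, with the same caveat that I have no proof for it. Riders are assumed to be (l, u) pairs and the names are hypothetical. A plain sorted list stands in for an ordered set, so removals are linear time; a real submission would want a balanced tree.

```python
import bisect

def greedy_pairs(riders, s):
    """riders[i] = (l_i, u_i). Returns a list of matched index pairs."""
    n = len(riders)
    by_l = sorted(range(n), key=lambda i: riders[i][0])
    used = [False] * n   # matched, or processed without finding a partner
    pool = []            # sorted (u_j, j) for unused riders with l_i + l_j <= s
    nxt = 0
    pairs = []
    for i in reversed(by_l):                     # decreasing l
        l_i, u_i = riders[i]
        while nxt < n and riders[by_l[nxt]][0] + l_i <= s:
            j = by_l[nxt]
            nxt += 1
            if not used[j]:
                bisect.insort(pool, (riders[j][1], j))
        if used[i]:
            continue
        used[i] = True
        # A rider cannot be matched with itself, so take it out of the pool.
        k = bisect.bisect_left(pool, (u_i, i))
        if k < len(pool) and pool[k] == (u_i, i):
            pool.pop(k)
        # Smallest u_j with u_i + u_j >= s.
        k = bisect.bisect_left(pool, (s - u_i, -1))
        if k < len(pool):
            _, j = pool.pop(k)
            used[j] = True
            pairs.append((i, j))
    return pairs
```

A rider that fails to find a partner is dropped for good: at that moment the pool already holds every unused rider that satisfies the l condition with it, so no later rider could pair with it either.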

E: Flipping Container

We will try to consider all sequences of flips that might lead to the desired answer. Since that's a large (actually infinite) space, we'll try to identify symmetries that allow the search space to be cut down to a more reasonable size.

With "orientation" defined as it is, there are 6 possible orientations. We can write them as abc, cba, cab, bac, bca, acb, according to which sides are aligned to the x, y and z axes respectively. There are four possible transitions from each orientation: two of them will go to the next orientation in that sequence (wrapping around), and two to the previous one.

Each transition can also be labelled according to the amount it adds to x or y (possibly negative). It follows that to determine where the container ends up, it's sufficient to know how many times each transition occurs, and the order doesn't matter. Conversely, if we decide how many times to use each transition, then we can reach the corresponding position provided that the conditions for an Eulerian cycle are satisfied (connected and in-degree equals out-degree at every vertex).

Transitions occur in pairs with the same start and end orientation, corresponding to opposite edges to flip over. For example, from abc one can transition to cba either with x+=a or x-=c. To reduce the number of possibilities to cover, we'll start by just deciding how many times each pair appears, then later decide how to split that count amongst the elements of the pair. I'll call such a pair an "edge".

Between each adjacent pair of orientations there is an edge in each direction. Let's call the difference between the frequencies of these edges the "balance" of the pair of orientations. The Eulerian cycle condition requires the balance of each adjacent pair to be the same, and in fact this corresponds to the number of times the Eulerian cycle "winds" around the centre of the cycle. It's not hard to check that the effect of winding one way around is the same as winding the other way, so without loss of generality we can assume that this winding number/balance is either 0 or 1.

We also have to consider the connectivity requirement for an Eulerian cycle. If the winding number is 1 this is trivial (we clearly visit all vertices). Otherwise, we can iterate to decide how many vertices we proceed clockwise and anticlockwise from the starting orientation.

With those outer loops (to decide the winding number and which vertices are reachable), we now just need to determine how many times each edge gets used. For each pair of adjacent orientations we need only pick one number, since we've already determined the balance. We'll also need to know how to split each edge into the two component transitions. We can cut down the search space further here by noting that half of the edges affect only x and the other half only y, so we can solve for them independently.

At this point we can use dynamic programming. Consider the edges between abc and cba for example. The forward transitions provide X deltas of +a and -c, and the reverse transitions provide -a and +c. A forward-and-back thus provides either -a-c, +0 or +a+c. (Note that while the +0 case is normally useless, one might require one of them to satisfy the graph connectivity requirements.) Similarly, other pairs of edges provide -c-b or +0 or +c+b and -b-a or +0 or +b+a. This is a slightly modified knapsack problem (due to the negative values) but the usual dynamic programming approach works with minimal change (the negative values just require iterating the update in the reverse direction). However, the possible range of inputs is far too large to handle with dynamic programming. When the sum is very large, one can note that it will always be optimal to make up "most" of the sum using the largest component (in particular, if the largest component is q, then there is no point using q of a smaller component p, since this can be replaced by p instances of q). So we can bound the DP to "small" values, and then for each small value with the right remainder modulo q, determine the number of q's to add.

F: Friendly Rivalry

If the separation is to be at least d, then any pair of chapters separated by less than d must be in the same team. So create a graph with an edge for each such pair, and identify the components. Then check whether we can pick a subset of the components to form a team of size n, which is a classic knapsack problem. This gives a test for whether the answer is at least d; a binary search completes the solution.
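A compact sketch of the test and the binary search, assuming the chapters are 2n points in the plane with integer coordinates and working with squared distances to stay in integers. The function names are mine, and the quadratic edge enumeration is only meant to show the idea.

```python
from itertools import combinations

def dist_sq(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def feasible(points, d_sq):
    """Can the points be split into two equal teams with separation^2 >= d_sq?"""
    m = len(points)
    parent = list(range(m))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(m), 2):
        if dist_sq(points[i], points[j]) < d_sq:
            parent[find(i)] = find(j)       # too close: must share a team
    sizes = {}
    for i in range(m):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    # Subset-sum knapsack as a bitset: bit k set means some subset has total size k.
    reachable = 1
    for size in sizes.values():
        reachable |= reachable << size
    return bool((reachable >> (m // 2)) & 1)

def best_separation_sq(points):
    """Largest squared pairwise distance that is still feasible as a separation."""
    cands = sorted({dist_sq(p, q) for p, q in combinations(points, 2)})
    lo, hi = 0, len(cands) - 1              # cands[0] is always feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if feasible(points, cands[mid]):
            lo = mid
        else:
            hi = mid - 1
    return cands[lo]
```

The search only needs to consider pairwise distances as candidates, because the optimal separation is always the distance between some pair of chapters.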

G: Kindergarten

This is another tough one where I just kept refining my solution until it passed, but I don't have a water-tight proof.

In general, a directed graph where every vertex has out-degree 1 has a known structure: each (weakly-)connected component consists of a set of trees whose roots are joined into a cycle. For this problem, the constraints on coolness ensure that there is only one such component in the jealousy graph, although it's not clear why that's part of the problem except perhaps to make the implementation slightly easier.

We'll build a directed graph of relationships of the form "A must go before B" which is sufficient to prevent any surprises. Provided we ensure this graph is a DAG, we can then find a topological order on it. This DAG will be mostly the jealousy graph (with edges reversed), but since that graph has a cycle we will need to modify it to use "like" relationships instead in some cases (I'll call these "crushes").

Start by finding the cycle of tree roots. We'll have to remove one of the edges from this DAG; we can just iterate to pick which one. Let's say we're going to remove the edge "P must come before Q" (meaning that Q is jealous of P). To prevent surprises, we'll instead need to add Q → R and R → P where Q's crush is R. In some cases this may be sufficient to produce a DAG, but if R is a descendant of P then we have a new problem, because this will produce a cycle involving R and P. The key observation is if we now pretend that P is jealous of R, this looks a lot like the original problem, just constrained to the tree rooted at P, and hence we can solve it recursively. There are a few tweaks needed in the recursive step though:

When choosing which edge of the cycle to cut, we cannot cut the edge R → P, because that's needed to prevent Q leaving a surprise for P.
Let R be jealous of S and have a crush on T. If we cut the edge S → R then we'll be adding the edge R → T, and if T = Q we'll have created a cycle. More generally, this is a problem if T is a predecessor of Q in the DAG we're constructing. With a single level of recursion, Q has no predecessors, but when we cut the S → R edge at consecutive levels of recursion, we will accumulate a chain of predecessors.

This recursive search will either produce a DAG which we can linearise, or fail to find a solution. I don't yet have a proof that there is no solution in the latter case.

If each recursive call takes time proportional to the size of the whole tree, this will be too slow (quadratic time). Instead, each call should do self-work that scales with the size of the cycle: the edges in this cycle are not processed again at lower levels of the recursion, so this would make the algorithm linear-time. In practice, there is likely to be an extra log factor. The trickiest part is determining whether R is a descendant of P. I enumerated the vertices in a depth-first walk, which makes the descendants of any node a contiguous range. I then maintain a data structure that records the current root for each vertex. This could be done easily with a segment tree, but I used an ordered map that stores contiguous ranges that have the same root.
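The depth-first numbering trick is standard, and a minimal sketch looks like this. The tree is assumed to be given as a dictionary from a vertex to its list of children, and the names are hypothetical. It is written iteratively to avoid recursion limits on deep trees.

```python
def euler_ranges(children, root):
    """Return (tin, tout): descendants of v are exactly the vertices w with
    tin[v] <= tin[w] < tout[v]."""
    tin, tout = {}, {}
    timer = 0
    stack = [(root, False)]
    while stack:
        v, finished = stack.pop()
        if finished:
            tout[v] = timer
            continue
        tin[v] = timer
        timer += 1
        stack.append((v, True))              # revisit v after its subtree
        for c in reversed(children.get(v, [])):
            stack.append((c, False))
    return tin, tout

def is_descendant(tin, tout, ancestor, v):
    """True if v lies in the subtree rooted at ancestor (a vertex counts as its own descendant)."""
    return tin[ancestor] <= tin[v] < tout[ancestor]
```

With this numbering, "which root currently owns vertex v" becomes a range-assignment and point-query problem over the tin values, which is what the segment tree or ordered map of ranges handles.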

H: Maxwell's Demon

There are two parts to this problem. The first part is to determine the times at which a single particle will reach the demon. Note that letting a particle through just mirrors its future path, so has no impact on when it will next reach the demon again. I'll just discuss a particle in the right-hand side, since a particle on the left-hand side can be handled symmetrically (and in fact my solution maps all particles to the right-hand side and just records whether they need to swap sides).

To avoid the complications of reflections, it's easier to conceptualise the chamber as having size 2w × 2h, with the particle wrapping around to the other side whenever it hits a wall but maintaining its velocity. The other 3 copies of the chamber are like images in the mirror if the chamber has mirrored sides.

It's pretty simple to determine when and where the particle will first hit the dividing wall. It's also clear that it will hit it every \(\frac{2w}{\lvert v_x\rvert}\) seconds, and we can determine the change in y over that time. To determine when the particle will hit the demon or its mirror image at \(y = 2h - d\), we just have to solve a linear equation modulo 2h, which requires just finding a modular inverse. This becomes a little simpler if all coordinates and time values are scaled up by \(\lvert v_x\rvert\), which makes all values integral. This will give both an initial hit time and a period; or the linear equation may have no solution.
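The modular step can be sketched generically as solving a·k ≡ b (mod m) for the number of wall hits k. The function below is an illustration with names of my choosing; mapping the particle data onto a, b and m (with m corresponding to 2h after scaling) is left out. It relies on Python 3.8+ for `pow(a, -1, m)`.

```python
from math import gcd

def solve_congruence(a, b, m):
    """Solve a*k == b (mod m). Returns (k0, period) meaning k = k0 + j*period
    for j = 0, 1, 2, ..., or None if there is no solution."""
    a %= m
    b %= m
    g = gcd(a, m)
    if b % g != 0:
        return None
    a, b, m = a // g, b // g, m // g
    if m == 1:
        return (0, 1)                 # every k works
    k0 = b * pow(a, -1, m) % m        # modular inverse exists since gcd(a, m) == 1
    return (k0, m)
```

The first return value gives the initial hit and the second the period; `None` corresponds to the case where the linear equation has no solution.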

There are a few corner cases:

If d = 0 or d = h, then the demon and its reflection are the same, and one may need to handle this to avoid counting encounters with the demon twice.
If the particle is moving purely vertically, it will never hit the demon (avoid dividing by zero).
If the particle is moving purely horizontally, it might or might not hit the demon, but the calculation may need to be handled differently. More generally, if successive hits on the dividing wall have the same y value, this needs to be handled carefully to avoid division by zero.

The second part of the problem is to deal with the interference caused by multiple particles hitting the demon simultaneously. One can simulate time by keeping a priority queue of hit events ordered by time. Each time there is a hit, identify all the particles that hit at that time. These particles either all swap or all stay where they are. You need to determine whether there is a set of these times whose XOR corresponds to the necessary set of particles swapping. That's just a linear algebra problem over Z₂.

The naïve approach would be to try to solve the linear equation from scratch over all events at each time. However, one can speed things up by keeping the result vectors from the previous Gaussian reduction. The new vector is then added as a new row, and reduced using all the previous rows. If it's reduced to all zeros, it can be ignored, as it is just a linear combination of existing options. Otherwise, it can be kept as a new row, and used to reduce the right hand side.
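The incremental reduction is essentially an XOR basis. Here is a small sketch that stores each event as an integer bitmask of the particles involved; the class and method names are hypothetical.

```python
class XorBasis:
    """Incrementally maintained row-reduced basis over Z_2, rows stored as bitmasks."""

    def __init__(self):
        self.rows = {}   # highest set bit -> row with that leading bit

    def reduce(self, v):
        """Reduce v by the existing rows; returns 0 if v is a combination of them."""
        while v:
            high = v.bit_length() - 1
            row = self.rows.get(high)
            if row is None:
                return v
            v ^= row
        return 0

    def add(self, v):
        """Add a new event vector. Returns False if it was redundant."""
        v = self.reduce(v)
        if v == 0:
            return False
        self.rows[v.bit_length() - 1] = v
        return True

    def can_make(self, target):
        return self.reduce(target) == 0
```

After each event time one would call `add` with that time's mask and then `can_make` with the mask of particles that need to swap, stopping at the first time it succeeds.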

Finally, we need to decide whether to give up. Because the velocities are all integers, after 2w seconds the X coordinates will all be the same as at the start, and after 2h seconds the same applies to the Y coordinates. Hence, the whole system is cyclic with period 2wh (or a factor thereof), and so we can give up after 2wh seconds.

I: Steppe on it

This is a fairly standard tree DP. We'll solve the problem of determining how many fire-engines are needed for a given response time. The optimal response time for a given number of fire-engines is then found by binary search.

So assume the response time must be at most T. For each subtree, calculate:

The minimum number of fire-engines (placed within the subtree) needed to cover all the cities in the tree, and the minimum possible response time this gives for the root (and hence how much coverage that engine can give above the root).
Whether it is possible to save one fire-engine by having the root and possibly some descendants get their service from an external fire-engine, and the maximum response time that engine may have for the root.

To compute these (recursively), treat the current node as a leaf and fill in the values, then add an edge to each child tree in turn. When adding an edge, one considers the different possibilities for the two trees that are being joined.

J: The Silk Road… with Robots!

Let's first consider how to solve the problem if it doesn't have any updates. Firstly, note that there is never any benefit to having two robots cross over or even meet, since they will be duplicating work. Now consider the road segment between any two locations ("locations" meaning robot or shop locations), and what tracks robots take across it. There are 5 possibilities:

A robot crosses it left-to-right.
A robot crosses it right-to-left.
A robot crosses it left-to-right, then returns (right-to-left).
A robot crosses it right-to-left, then returns (left-to-right).
No robot crosses it.

One can now sweep the locations left-to-right, computing \(d_{i,s}\) which is the maximum profit over locations [0, i] where the segment to the right of location i has type s. To compute \(d_{i,s}\), one needs to consider each possible type t to the left of location i, check whether t and s are compatible (e.g. don't require a robot to materialise from thin air) and compute a profit from \(d_{i-1,t}\), the tenges collected at i (if any) and the costs of robot movements due to s. I used a 5×5 table to track which states are compatible depending on whether there is a robot or shop at the location.

Running this after every update will be too slow, and we clearly need a way to do incremental updates. Instead of a left-to-right sweep, we can treat this as a divide-and-conquer DP. Let \(d_{i,j,s,t}\) be the maximum profit over [i, j) with type s to the left of i and type t to the left of j. We can combine this with \(d_{j,k,t,u}\) to produce a candidate for \(d_{i,k,s,u}\). Thus, if we can solve the problem recursively for the left and right halves of a range, we can combine those solutions to compute the solution for the full range. These partial solutions can be stored in a segment tree.
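The combination step amounts to a (max, +) matrix product over the boundary types. A minimal sketch, assuming each range is summarised by a k×k table (k = 5 here) with negative infinity marking incompatible combinations; the function name is mine.

```python
NEG = float("-inf")

def combine(left, right):
    """(max, +) product: result[s][u] = max over t of left[s][t] + right[t][u]."""
    k = len(left)
    return [
        [max(left[s][t] + right[t][u] for t in range(k)) for u in range(k)]
        for s in range(k)
    ]
```

Because this product is associative, the tables can sit in the nodes of a segment tree, with each internal node holding the `combine` of its two children.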

When we receive an update, we can update one leaf in the segment tree, then propagate the update up to the root in logarithmic time. Note that since we have the full set of updates available offline, we can determine the locations in advance. A location that has not yet received any update behaves just like a shop with no tenges.

K: Tower of noiHa

This is another very tricky one where I fiddled with my solution until it passed, but for which I don't have a formal proof of correctness.

The first step is to determine the initial state. This is relatively easy: if the first (most significant) bit is 1, then the biggest disk has been moved to the right-hand stack, and the remaining bits describe moving the remaining disks from the middle to the right. Otherwise, the biggest disk hasn't yet been moved, and the remaining bits describe moving the remaining disks from the left to the middle. This decoding process can be applied recursively.
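Written iteratively, the decoding might look like the sketch below. It assumes the standard Hanoi numbering of stacks (0 = left, 1 = middle, 2 = right), a bit string with the most significant bit first, and a hypothetical function name; details specific to the problem's input format are not covered.

```python
def decode_state(bits, src=0, aux=1, dst=2):
    """Return the stack of each disk, largest disk first, after the number of
    moves encoded by bits in the standard solution from src to dst."""
    stacks = []
    for b in bits:
        if b == "1":
            stacks.append(dst)        # this disk has already moved to dst
            src, aux = aux, src       # smaller disks are moving aux -> dst
        else:
            stacks.append(src)        # this disk has not moved yet
            aux, dst = dst, aux       # smaller disks are moving src -> aux
    return stacks
```

For two disks, one move ("01") leaves the large disk on the left and the small one in the middle, which matches the first move of the usual solution.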

We can dispose of a simple case first: if the initial position is valid (no disk on top of a smaller disk), then a similar procedure to the above allows the state to be turned back into a binary representation of the number of moves remaining. So let's assume that there is a disk (call it d) that is on top of smaller disk(s). We'll divide the disks into three sets: A is the disks smaller than d and not sitting under d, B is the disks bigger than or equal to d, and C is the rest (smaller disks trapped under d).

Consider the movements made just with disks in B. Since these disks are properly stacked big-to-small relative to each other, we can move them in the standard way for Hanoi. This tells us where the disks in A need to be placed to allow the first move from B, which can be similarly computed. After this, and until we move d, each time we need to make a move from B we also have to shift all the disks in A, which means the total move costs \(2^{\lvert A\rvert}\). Once we move d, the whole game will have reached a valid state, and we can solve it in a standard way.

There is another special case: if all the disks from B are already on the right-hand stack, then the algorithm above will not try to move them at all, but this will leave C out of order. So in this case one needs to move all disks from A to one of the other two stacks, then move d to reach a valid state. It's unclear which stacks to use, so I just try both possibilities and take the cheaper one.

L: Where am I now?

I was a little surprised by the low number of solutions on this — but interactive problems are always a little more difficult to get right.

There is a fairly generous limit on the number of steps. It's not actually necessary to use the exact distance information given; it's sufficient to know whether there is a wall in the next cell. My approach was:

Do a depth-first exploration of the map to determine the shape of the connected component one is in. Call this a "fragment".
Consider all possible translations and rotations to determine how your fragment could be placed on the map. Call these "transformations".
Consider each empty cell within your fragment. If there is a cell which corresponds to the same map position under all transformations, then move to that cell and declare your position. Otherwise, the problem is impossible.

IOI 2023: Overtaking

2023-12-02T22:13:00.000-08:00

I found this to be the easiest of the IOI 2023 problems — it's the only one where I had a rough idea of how to solve it within a few minutes of thinking. It's still not easy though.

There are a few key observations that simplify the problem:

A faster bus has no impact on the arrival times of a slower bus (even indirectly).
This means that we can simply delete any buses faster than (or as fast as) the reserve bus, as it won't affect our answers.
After doing that, the arrival times of all the non-reserve buses are fixed, independent of Y.
The arrival time of the reserve bus is an increasing function of Y.

We'll simulate the process in a spatial sweep along the road. At each sorting station, we'll answer the questions

At what times do the non-reserve buses arrive here?
For a given arrival time for the reserve bus, what is the range of Y values which would cause the reserve bus to arrive here at that time?

To answer the first question, we first need to know the order in which buses leave the previous sorting station, which is just a sort by departure time then by speed. A prefix maximum then gives the arrival times at the current station. To answer the second, we'll start by answering it specifically for times at which non-reserve buses arrive at this station. Consider a time t and slowest bus i to arrive at time t. If the reserve bus leaves the previous station between times t-W[i]d (exclusive) and t-Xd (inclusive) then it will arrive at time t. If it leaves any earlier it will be ahead of the traffic and arrive earlier, and if it leaves any later it will arrive later even if it doesn't encounter any traffic. To turn that range into a range of Y values, we rely on dynamic programming: we look up the interval of Y values corresponding to each endpoint and take their union.
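For the first question, the sort plus prefix maximum can be sketched like this. The names are mine: `depart` holds the times at the previous station, `pace` is the time per kilometre (W in the problem), and `d` is the distance between the two stations.

```python
def arrivals_at_next_station(depart, pace, d):
    """Arrival time of each non-reserve bus at the next sorting station."""
    # Sort by departure time, breaking ties with the faster bus first so that
    # buses leaving together never appear to delay one another.
    order = sorted(range(len(depart)), key=lambda i: (depart[i], pace[i]))
    arrive = [0] * len(depart)
    best = float("-inf")
    for i in order:
        best = max(best, depart[i] + pace[i] * d)   # unimpeded arrival time
        arrive[i] = best
    return arrive
```

On the first segment of the example from the problem statement (departures 20, 10, 40, 0 with paces 5, 20, 20, 30 over 1 km) this gives 30, 30, 60, 30.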

What about answering the second question for times other than those when the non-reserve buses arrive? In those cases, the reserve bus did not encounter any traffic, so the answer for time t is simply the answer for the previous station at time t-Xd.

We'll need a data structure to represent all this. Conceptually it will be an increasing function mapping Y to an arrival time. It will need to support the following operations:

Initialise with the identity function.
Increment all values by Xd. This can be handled by maintaining an offset outside of the main data structure.
Set all values in the interval (u, v] to t.
Find the intercept for a given output value.

A suitable data structure is a balanced binary tree, where each node is an interval of Y values with a fixed output t. The intervals must be non-overlapping. For any Y value not contained in one of the intervals, the function is the identity. Setting an interval (u, v] to t requires

Deleting any existing intervals contained entirely in that range.
Trimming any intervals that overlap the range.

The deletion means each insertion could take linear time, but it is amortised since each inserted interval can only be deleted once.

This requires O(NM log(NM)) for initialisation and O(Q log(NM)) for the queries. I have one gripe with this problem, although I don't know if it is the fault of the IOI or just the IOI mirror on Codeforces: my solution was about 25% slower than the model solution, and exceeded the time limit. To make it pass, after initialisation I converted the std::map to a std::vector and used binary search on it to answer the queries, which is not a big-O change but was slightly more efficient and allowed it to get a full score.

IOI 2023: Beech Tree

2023-11-26T01:41:00.000-08:00

Here's my solution to Beech Tree, from IOI 2023 day 2. For a while I hated the problem, but after seeing the solution I think it's quite beautiful (excuse the pun). It's definitely a problem that requires a lot of thinking time before writing relatively little code.

The key is to identify the conditions under which a subtree is beautiful, in a way that can easily be checked. I'll talk just about whether a "tree" is beautiful, because I'll want to talk about subtrees of that tree. I'll also ignore the original labelling and just work with the permuted labels, which eliminates a level of indirection from all the indices. So for example, the parent of A must be f(A). I'll also talk freely about the colour of a node, by which I mean the colour of the edge between that node and its parent (C[i]).

We can make quite a few observations about a beautiful permutation; I'll focus on the relevant ones here. Some of them seem to come from nowhere: I found them by trying to work out a procedure for choosing a beautiful permutation, and then discovering conditions under which it would fail.

If A and B have the same colour, then A < B if and only if f(A) < f(B). This follows directly from the definition of f.
If A is a parent of B, then A < B. Proof: A must be f(B), and f(B) is the number of times C(B) appears in a sequence of length B - 1, so A = f(B) ≤ B - 1.
No node can have two children of the same colour. Proof: suppose A has two children X < Y with the same colour. Then A = f(X) < f(Y) = A, a contradiction.
Let S(X) be the set of colours of children of X. If A < B, then S(A) ⊇ S(B). Proof: Suppose colour D appears in S(B) but not S(A), i.e., B has a child X of colour D but A does not. X must be the B'th node with colour D (counting from 0). The A'th node with colour D should have A as its parent, a contradiction since A has no child of colour D.
If A < B, then |T(A)| ≥ |T(B)|. Proof: suppose this is not the case i.e., there exist A < B with |T(A)| < |T(B)|. We can choose the largest such B. Since S(A) ⊇ S(B), there must be some colour D such that A and B have children X and Y of that colour with |T(X)| < |T(Y)|. But since f(X) = A < B = f(Y), we get X < Y, which means the pair (X, Y) contradicts the maximality of B.
Let z(A, D) be the size of the subtree rooted at A's child of colour D, or 0 if A does not have a child of colour D. If A < B, then z(A, D) ≥ z(B, D) for all colours D. Proof: Suppose this is not true for some A, B, D. Observation 4 rules out z(A, D) = 0, z(B, D) > 0. Let the children of A and B with colour D be X and Y. Then f(X) = A < B = f(Y), so X < Y, but |T(X)| < |T(Y)|, contradicting observation 5.
If |T(A)| ≥ |T(B)|, then z(A, D) ≥ z(B, D) for all colours D. If A < B then this follows from (6), so assume A > B, giving z(A, D) ≤ z(B, D) for all D. But |T(A)| = 1 + Σ z(A, D), so if any of those inequalities are strict we would have |T(A)| < |T(B)|, a contradiction.

We now claim that if a tree satisfies conditions 3 and 7, it is a beautiful tree. Note that these conditions are independent of the labelling, and hence can easily be tested. The proof is constructive, in the form of an algorithm to assign the labels in increasing order. Maintain a queue of unlabelled nodes per colour; the head of each queue must be the next node to be assigned a label. To start, consider the root to have colour 0, and place it in the queue for colour 0. All other queues are empty.

To advance, select the head-of-queue with the largest |T(X)|, and assign it the next available label. Then place its children at the tails of their respective queues. This process maintains the following invariants:

Within each queue, |T(X)| is decreasing.
The elements in the queues have (non-strictly) smaller |T(X)| than any labelled node.
The labels are assigned in decreasing order of |T(X)|.

The proof is left as an exercise for the reader. A key observation is that if some node A is labelled, and it does not have a child of colour D to append to the D queue, then no other nodes will later be appended to the D queue, because 0 = z(A, D) ≥ z(B, D) for any B > A. This ensures that every node will have the correct parent.

Now we are left with the problem of testing conditions 3 and 7 for every subtree. Testing condition 3 is quite easy. For condition 7 it helps to note that the condition is transitive, so that if |T(A)| ≥ |T(B)| ≥ |T(C)| and (A, B), (B, C) each satisfy the condition, it is not necessary to check (A, C) as well. We can thus easily check a tree of size K in O(KM) time, but this will be too slow to get a full score. There are two observations that help speed this up:

To test the condition for some A, B, we can just iterate over S(B), and for each colour D look up z(A, D). If we sort the children of each node by colour, this will take O(|S(B)| log M) time. A tree of size K has only O(K) edges, so checking a whole tree can be done in O(K log M) time.
We can re-use results from subtrees. If any child of A is not the root of a beautiful subtree, then nor is A. If they all are, we can use small-into-large merging to combine information about the subtrees. For each subtree, keep an ordered map from subtree size to a representative vertex of that size. Then we can merge subtrees by taking each element from the smaller map and inserting it into the larger map, checking that it has the appropriate relationship with its immediate neighbours. Each insertion requires O(log N) time to find the location in the map structure and amortised O(log M) to perform the verification, and each node will be inserted O(log N) times. Thus, the running time is O(N log N (log N + log M)).

This sounds complicated but requires less than 100 lines of code.

IOI 2023: Day 1

2023-11-19T02:07:00.000-08:00

I've finally gotten around to looking at the day 1 problems from IOI 2023, and they're really tough. Congratulations to those who solved any of them. It's been pointed out that writeups of solutions seem to be absent from the internet, so I'll try to remedy that, at least for day 1 (I haven't had time to read the problems for day 2 yet, and probably won't until early next year).

For now I'll leave out Closing Time, since there is a solution at the link above, and I don't have a good proof of efficiency for my own solution. I might come back to it.

Longest trip

Finding the longest path in general graphs is NP-hard, so we should expect that we'll need to exploit the density condition in some way. I'll only discuss D=1, since the other cases are strictly easier. Let's first consider how many components there can be. There cannot be 3 (or more), because then we could pick one vertex from each component and this triple would violate the density condition. If there are two components, then when picking any two vertices from one component and one vertex from the other, we see that the two vertices in the same component must be connected i.e. both components are complete graphs. In that case the longest path is easy to find: just walk the vertices of the larger component in any order.

What about when there is only a single component? It's certainly not required to be complete (the first example shows that), but perhaps we're still guaranteed to be able to find a Hamiltonian path?

Building a single path a step at a time will be difficult, but we can get close by building two paths at once. We can initialise the paths with the first two vertices, and then take each other vertex in turn and try to append it to one of the paths. If it doesn't have an edge to the end of either path, then the ends of the two paths must be connected to each other. In this case they can be joined into a single path, and we can start a new second path with the vertex we just tried to add.
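The two-path construction can be sketched as follows, with `connected(a, b)` standing in for a single-pair query to the grader (a hypothetical wrapper). As a small variation of my own, this version starts with one vertex and an empty second path rather than seeding both paths up front, and it does none of the query-saving tricks.

```python
def build_two_paths(n, connected):
    """Arrange vertices 0..n-1 into two paths, assuming every triple of
    vertices has at least one edge (density D = 1)."""
    p, q = [0], []
    for r in range(1, n):
        if not q:
            if connected(p[-1], r):
                p.append(r)
            else:
                q.append(r)
        elif connected(p[-1], r):
            p.append(r)
        elif connected(q[-1], r):
            q.append(r)
        else:
            # r touches neither end, so by density the two ends are adjacent:
            # join the paths end-to-end and start a new second path at r.
            p.extend(reversed(q))
            q = [r]
    return p, q
```

Every vertex ends up in exactly one of the two returned paths, and consecutive vertices within each path are adjacent.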

Once we've processed all the vertices, we will have them organised in two paths, but we don't know if they're separate components or not. A single query can determine that. If so, we just return the longer of the paths. If not, we want to find a way to link the paths together. The following procedure will do the trick:

Test if either end of one path is connected to either end of the other. If so, then we can just chain them together trivially.
If not, then the density condition requires that the end of each path is connected to its beginning, so the paths are in fact cycles. Now find any edge that connects one cycle to the other. By cutting each cycle next to this edge, we obtain a Hamiltonian path.
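The cycle-cutting step is just a pair of rotations. A small sketch, assuming each cycle is stored as a path whose last vertex is adjacent to its first, and that an edge (u, v) between the cycles has already been found (the function name is made up for illustration):

```python
def join_cycles(p, q, u, v):
    """Join two cycles into one path, given an edge between u (in p) and v (in q)."""
    i, j = p.index(u), q.index(v)
    # Rotate p so that it ends at u, and q so that it starts at v.
    return p[i + 1:] + p[:i + 1] + q[j:] + q[:j]
```

With `p = [0, 1, 2]`, `q = [3, 4, 5]` and the edge (1, 4), this gives `[2, 0, 1, 4, 5, 3]`; the steps 2-0 and 5-3 use the edges that close each cycle.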

This algorithm requires O(N) queries, but without careful implementation, the constant factor will be too high to keep q below 400. Let's see how we can improve things.

Let's start with the first part of the algorithm, where we build two paths. Let's call the current last elements of each path P and Q, and the element we're adding R. In some cases we will have information that P and Q are definitely not connected. If that's the case, and we learn that P and R are also not connected, then we know that Q and R are connected, without needing a query. So when we know P and Q are not connected, only one query is needed to make progress. When P and Q might be connected, we might need two queries, but if so we have a good chance of transitioning to a state where we know P' and Q' are not connected. The only time we won't is if

P is connected to Q; and
neither P nor Q is connected to R; and
both paths have more than one element.

The above conditions cannot apply twice in a row (because afterwards one of the paths has a single element), but this still allows for an average of 5/3 queries per element. However, we're also given the information that the grader is not adaptive, and simply randomising the order of elements seems to be sufficient to keep the actual number of queries down.

We can also optimise the final steps in attempting to merge the two paths together. To check whether it is possible to connect the two paths end-to-end, a single query can be used to determine whether either end of the one is connected to either end of the other; if this returns true, further queries can be used to isolate the specific connection. While that requires more queries in the true case, it reduces the queries for the false case (which is more expensive, because we then have to find some other edge between the cycles). Finding an edge between the cycles can be done in logarithmic time, by first binary searching on the one path, then the other.
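The binary search for an edge between the two cycles might look like the sketch below. Here `are_connected(xs, ys)` is an assumed oracle that reports whether any edge joins a vertex of `xs` to a vertex of `ys`; the caller must already know that such an edge exists.

```python
def find_cross_edge(a, b, are_connected):
    """Find (u, v) with u in a, v in b and an edge between them.

    Assumes at least one edge joins a to b. Uses about
    log2(len(a)) + log2(len(b)) group queries.
    """
    a, b = list(a), list(b)
    while len(a) > 1:
        mid = len(a) // 2
        # Keep whichever half of a still has an edge into b.
        a = a[:mid] if are_connected(a[:mid], b) else a[mid:]
    while len(b) > 1:
        mid = len(b) // 2
        # a is now a single vertex; narrow down its neighbour in b.
        b = b[:mid] if are_connected(a, b[:mid]) else b[mid:]
    return a[0], b[0]
```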

The official model solution does some slightly more complicated things in the case that P and Q are connected; I suspect this might guarantee that the solution completes within 400 queries without depending on randomisation. A solution that can guarantee no more than 1.5 queries to add each R will have the following budget:

381 queries for the first phase (only 254 elements need to be added, since the first two are used to initialise the paths)
1 query to check whether there is any connection between the paths
1 query to check whether there is a connection between the ends of the paths
15 queries for the binary search (8 for the larger side, 7 for the smaller),

for a total of 398.

Soccer

Let's start by figuring out what a "regular" field looks like. Within each row, let's call all the cells that form part of the field a "span". A span has to be contiguous, since there is no way to kick a ball around a gap in a row in only two straight kicks. Now consider two spans (in different rows). Each span can be seen as an interval of column numbers, and if neither of them is a (non-strict) sub-interval of the other, then the field is not regular: there will be a cell in each span whose column does not belong to the other span, and there is no way to kick a ball between these two cells with only two straight kicks.

Let's consider the longest span (ties can be broken arbitrarily), which by the observations above is also a super-interval for every other span. Moving away from this longest span either upwards or downwards, one must encounter progressively shorter spans (again, non-strictly) as otherwise there will be a column with non-contiguous cells in the field. It shouldn't be too hard to convince yourself that these conditions are also sufficient for a field to be regular.
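As a sanity check, the conditions above can be turned into a direct test of whether a given field is regular. This is only a sketch: the representation (a list of strings with `X` marking cells of the field) is made up for illustration, and the check is quadratic in the number of rows.

```python
def is_regular(field):
    """field is a list of equal-length strings; 'X' marks a cell of the field."""
    spans = []
    for r, row in enumerate(field):
        cols = [c for c, ch in enumerate(row) if ch == 'X']
        if cols:
            if cols[-1] - cols[0] + 1 != len(cols):
                return False  # a gap inside a row
            spans.append((r, cols[0], cols[-1]))
    if not spans:
        return True  # treat the empty field as vacuously regular
    if spans[-1][0] - spans[0][0] + 1 != len(spans):
        return False  # an empty row between two occupied rows
    # Every pair of spans must be nested.
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            (_, l1, r1), (_, l2, r2) = spans[i], spans[j]
            if not ((l1 <= l2 and r2 <= r1) or (l2 <= l1 and r1 <= r2)):
                return False
    # Widths must not grow when moving away from the widest span.
    widths = [r - l for _, l, r in spans]
    w = widths.index(max(widths))
    up, down = widths[:w + 1], widths[w:]
    return up == sorted(up) and down == sorted(down, reverse=True)
```

A plus shape passes, while an hourglass (wide, narrow, wide) fails the width check and a staircase fails the nesting check.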

We can think about constructing a maximal field by adding spans one at a time, starting with the longest span and then adding spans either above or below. This naturally leads to a recursive solution: given a partially constructed field (consisting of spans in contiguous rows), consider adding a new span either above or below the existing spans, whose extent must be limited to the intersection of all the existing spans. We need to start the recursion from the widest span and we don't know where that will be, so we can just use an outer loop that considers all possibilities.

This recursion will consider all possible regular fields, but since we only want to know about the biggest one, we can immediately do some pruning. Consider a span to be "maximal" if it cannot be expanded (under whatever conditions it is being selected) by adding more cells to the left or right. When recursing, we only need to consider maximal spans, since this adds more cells to the field and does not reduce our choices later in the recursion. Similarly, only maximal spans need be considered as the initial (widest) span.

This exponential-time solution can be reduced to polynomial time with memoisation in a pretty standard way. The only state needed in each recursive call is the range of rows for which we already have spans and the range of columns that is the intersection of all chosen spans.

This gives O(N⁴) states and an even larger number of edges between states, so we need to optimise this to solve the final subtask. Let's start with a few optimisations that won't obviously reduce the big-O. Immediately after adding a new span, see if we can add more spans (on both sides) without narrowing the column range at all. It will never be suboptimal to add these now, so we can do it immediately. We want this process to be efficient, which we can achieve by building lookup tables for the next tree upwards and downwards of any given position, and precomputing a range-max query structure along each row. We can also use lookup tables of next tree and next non-tree along each row to quickly find all the maximal spans in the new row we're adding.

While this will clearly help in some special cases, it's not clear why it should improve the big-O. Let a "maximal rectangle" be one that doesn't include a tree and cannot be made into a bigger rectangle by adding more cells to it (without including a tree). Then it is not difficult to show that each time we need to decide on the next span, the current row and column range define a maximal rectangle: our recent optimisation is to extend the rectangle vertically, and our choice to only include maximal spans means it cannot be extended horizontally. So the number of memoisation states is bounded by the number of maximal rectangles.

What's the maximal number of maximal rectangles? It is O(N²). Consider fixing the bottom row, and fixing the tree that bounds the top edge (treat the boundary as surrounded by trees). Since this tree must be the lowest in its column above the bottom edge, there are only O(N) such trees for each choice of bottom row. Once those two parameters are fixed, it is easy to see that the rest of the maximal rectangle is fixed too (simply grow leftwards and rightwards of the chosen tree as far as possible). Note that not all such rectangles are maximal, because we haven't considered that the bottom could be extended. So in fact, the number of rectangles bounded by trees on 3 sides is O(N²).

This means that we have O(N²) states, but what about the number of state transitions? We'll consider only transitions that start by adding a new row on top (adding to the bottom is symmetrical). At first glance, it appears that this could be O(N³), because from each state there could be O(N) successor states. However, we can associate each transition with a unique rectangle bounded on top, left and right; and we showed earlier that there are only O(N²) of these. Specifically, for each transition, consider the intersection of the old and new state rectangles, and call it the "transition rectangle". This will differ from the new state only in the bottom edge, so it will be 3-side bounded as claimed. We can reconstruct the new state from the transition rectangle simply by extending the bottom edge as far as it can go; we can also reconstruct the old state as follows:

Remove one row at a time from the top until it is possible to extend the left or right edges.
Extend the left and right edges as much as possible.

Since we're able to reconstruct the old and new states from the transition rectangle, the transition rectangle must be unique to the transition, and hence there can only be O(N²) transitions.

The overall complexity depends on the implementation of the range max query and of the data structure used for memoisation. With practical implementations the algorithm will have O(N² log N) time and either O(N²) or O(N² log N) space, depending on the range max data structure used.

Incidentally, while the official model solution explicitly computes all the maximal rectangles, it's not actually necessary to do so. In my solution they are needed only to prove the efficiency.

Solutions to ICPC world finals 2021

2022-12-10T03:03:00.010-08:00

In case you're wondering why I'm only posting this in 2022: the "2021" world finals happened in November 2022, in Dhaka. You can see the problems here. I haven't seen a writeup of the solutions (not that I've looked very hard). So I'm writing up my solutions. These are my own solutions, except for K, where I watched the official solution and particularly liked it. You can find the official solution videos here. The proof of correctness for E was provided by Derek Kisman.

A: Crystal Crosswind

Consider a single wind direction, and two cells A at \((x, y)\) and B at \((x - w_x, y - w_y)\). If there is a boundary at A, things are simple: there is a molecule at A and there is no molecule at B. If there is no boundary at A we can't immediately determine where the molecules are (unless B lies outside the grid, in which case there is no molecule at A), but we know that if there is a molecule at A there is also one at B, and conversely if there is no molecule at B then there is no molecule at A. We can add these as edges to two different graphs.

Now every time we know there is a molecule in a particular location, we can follow edges in the first graph to find new molecules, and similarly when we know there is a hole somewhere, we can follow edges of the second graph to find new holes. We can just keep doing this until there is no new information to be found.

This may still leave us with unknown cells. The key observation is that leaving all these locations as holes will yield a valid configuration. Suppose we made a hole at \(P_1\), and that implied there had to be a hole at \(P_2\), which implied there had to be a hole at \(P_3\) etc, until eventually it required a hole at some \(P_k\), which already contained a molecule. That cannot happen, because the molecule at \(P_k\) would already have forced a molecule at \(P_{k-1}\).

Similarly, we can safely fill all unknown cells with molecules to get the maximum crystal structure.

B: Dungeon Crawler

Assume we've decided to end the traversal at some node E. Any edge on the path from S to E will necessarily be traversed an odd number of times, and any other edge an even (and positive) number of times. Let r be the sum of all the edge lengths, and let d(P, Q) be the distance between nodes P and Q. If we didn't have to worry about the trap, then we'd be able to achieve \(2r - d(S, E)\), using each edge the minimum number of times.

Let's consider the tree to be rooted at K, and let L be the least common ancestor of S and T, and M be the least common ancestor of L and E. If L = T then the problem is impossible (can't get from S to K without going through T). Otherwise, every edge on the path from L to M has to be traversed 3 times: first to get to the key, then to get to the trap, and finally to get to the end. So the total cost will be \(2r - d(S, E) + 2d(L, M)\). We can now consider two cases.

E lies in the subtree of L, so L = M. Then the cost is just \(2r - d(S, E)\). So for this case, we want E to be as far from S as possible.
E does not lie in the subtree of L. In this case a bit of manipulation (using knowledge of which vertices are ancestors of which) shows that the cost is equivalent to \(2r - d(E, K) + d(L, K) - d(L, S)\). Since L, K and S are independent of E, it follows that we should choose E to be the vertex furthest from K.

For each case, we want to choose E as far from some vertex V (either S or K) as possible, while excluding a particular subtree. Let's consider the tree to be rooted at V, and index the vertices using a pre-order walk. The range of vertices to exclude is contiguous in this pre-order walk. Thus, we wish to consider only a prefix and suffix of the vertices. By pre-computing prefix and suffix maximums (of depth) for every possible root, we can choose E in each case in constant time (once we have L and its neighbour towards K).
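Here is a rough sketch of the prefix/suffix trick for a single root. The adjacency format and function names are assumptions for illustration, and it returns only the distance rather than the vertex. A real solution would build this once per root and would probably avoid recursion on deep trees.

```python
def build_exclusion_query(adj, root):
    """adj[v] is a list of (neighbour, edge_length) pairs describing a tree.

    Returns a function giving the greatest distance from `root` to any
    vertex outside the subtree of a given vertex, in O(1) per call.
    """
    start, end, depths = {}, {}, []

    def walk(v, parent, d):
        start[v] = len(depths)
        depths.append(d)
        for u, w in adj[v]:
            if u != parent:
                walk(u, v, d + w)
        end[v] = len(depths)  # subtree of v is depths[start[v]:end[v]]

    walk(root, None, 0)
    n = len(depths)
    neg = float('-inf')
    prefix = [neg] * (n + 1)  # prefix[i] = max(depths[:i])
    suffix = [neg] * (n + 1)  # suffix[i] = max(depths[i:])
    for i in range(n):
        prefix[i + 1] = max(prefix[i], depths[i])
    for i in range(n - 1, -1, -1):
        suffix[i] = max(suffix[i + 1], depths[i])

    def farthest_outside(banned):
        return max(prefix[start[banned]], suffix[end[banned]])

    return farthest_outside
```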

The time complexity depends on the method used to find least common ancestors. With a simple structure where every node stores ancestors at power-of-two distances, it will require \(O(n^2\log n + q\log n)\); with more sophisticated structures (such as heavy-light trees) this can be reduced to \(O(n^2 + q\log n)\), and I believe it can (very) theoretically be reduced to \(O(n^2 + q)\) using the method of the Four Russians.

C: Fair Division

It's slightly easier to work with the fraction of loot that gets passed on, rather than the fraction each pirate takes. Let's call that g (so g = 1 - f). Let's work out how much loot each pirate gets. Let's say a pirate gets x in the first round. In the next round they get \(g^nx\), then \(g^{2n}x\) and so on. This is a geometric series, and in the limit they will get \(\frac{x}{1-g^n}\). For the ith pirate (counting from 0), x will be \(g^i(1-g)m\), so the total loot will be \(\frac{g^i(1-g)m}{1-g^n}\).

Let \(g = \frac{a}{b}\), where a and b are relatively prime. Multiplying top and bottom by \(b^n\), we see that the loot is \(\frac{a^i b^{n-1-i}(b - a)m}{b^n - a^n}\), and in particular \(b^n - a^n\) must divide into \(a^i b^{n-1-i}(b - a)m\). Suppose some prime p divides into b. Then it divides into \(b^n\) but not into \(a^n\) and hence does not divide into \(b^n - a^n\). Thus, the denominator is relatively prime to \(b^{n-1-i}\), and similarly to \(a^i\). It's thus required that \(b^n - a^n\) divides into \((b - a)m\).

We now see the reason behind the condition n ≥ 6: it strongly limits how big b can be. It's sufficient to test all values of b up to the 5th root of m to find those that satisfy the condition.
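The formulas above are easy to check with exact rational arithmetic. The sketch below is not a full solution (it does no search over a and b); the function names are made up, and it only illustrates the loot formula and the derived divisibility test.

```python
from fractions import Fraction

def loot(n, m, g, i):
    """Total loot of pirate i (counting from 0) when a fraction g is passed on."""
    return g**i * (1 - g) * m / (1 - g**n)

def shares_are_integral(n, m, a, b):
    """Divisibility test for g = a/b in lowest terms: (b^n - a^n) | (b - a) m."""
    return ((b - a) * m) % (b**n - a**n) == 0
```

For n = 6, m = 63 and g = 1/2 the shares come out as 32, 16, 8, 4, 2, 1, which sum to 63, and the divisibility test agrees; with m = 64 the test fails because 64 is not a multiple of 2⁶ - 1 = 63.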

D: Guardians of the Gallery

This is a typical hard geometry problem: relatively simple conceptually, but with lots of corner cases and opportunities for floating-point instabilities.

I don't have a formal proof, but the guard's path can be split into two stages:

Optionally move to a vertex of the polygon, possibly via other vertices. We can find the minimum distance for this by considering a graph consisting of the polygon vertices and the guard, with edges where the line segment connecting vertices is unobstructed. The graph is small so Floyd-Warshall is sufficient.
Move to the nearest point on a segment originating at the art, passing through a vertex, and terminating where it is no longer possible to view the art. This might be an orthogonal projection onto the segment (interior to the polygon) or it might be the endpoint of the segment.

We thus need to know how to find the terminating point of a segment as contemplated in the second part. We can trace the polygon counter-clockwise, and check each edge against this ray. If it completely crosses the ray, that gives us a bound. But what if it just touches the ray? We can separately track edges that touch the ray from one side (obscuring one half of the art) or the other (obscuring the other half); the further of these two determines the cutoff point.

We also need to be able to check whether a line segment is obstructed by the polygon. It's complicated by the fact that either endpoint may lie on the polygon itself. It's not sufficient to determine whether the polygon intersects the interior of the line segment (i.e., excluding the endpoints), because the segment may lie entirely outside the polygon, or it may lie along an edge of the polygon or pass through a vertex. Solving this boils down to careful handling of polygon vertices that lie along the line segment, to ensure that the line segment does not go outside the polygon at any of those vertices.

E: Hand of the Free Marked

Firstly, it is quite easy to show that probabilistic strategies are of no benefit. Suppose the team use a probabilistic strategy that involves picking a die from a selection (with a number of sides appropriate to the situation) and rolling it. Then instead, they could just roll all the dice in advance, before the cards are selected. Each possible roll would have some probability of success, according to the chosen strategy. They might as well just fix the dice in the combination that gives the best probability, and consider that a deterministic strategy.

We can now see this as a bipartite matching problem. Let U be the set of all possible (unordered) sets of cards that could be drawn, and let V be the set of possible ordered sets of k - 1 cards plus a marking for the final card. The assistant maps each element of U to one element of V, and the magician maps each element of V to some element of U. The trick cannot work for two different elements of U that map to the same V (the magician will only be correct for one of them), so the elements for which the trick works form a bipartite matching.

Edges exist where the element of U and of V have consistent information. Let's refer to the different markings as "colours", and let the "spectrum" of a set of cards consist of the number of cards of each colour of the set. The spectrum is known for each element of V, so it follows that each spectrum constitutes a separate connected component of the bipartite graph (we haven't proved that each spectrum is connected, but it's simple; the important point is that there are no edges between spectra). So we can solve the problem independently for each spectrum then combine the results.

From this point on, we'll assume a fixed spectrum, and let U and V denote only the elements within this spectrum. We can hypothesize that the size of the maximum matching is simply the smaller of |U| and |V|. This is not true for general bipartite graphs, but these graphs have a somewhat regular structure that suggests it (and it is clearly not practical to use a generic bipartite matching algorithm). If you take it on faith, you'll be able to solve the problem, with some basic combinatorics to compute |U| and |V|. The number of possible spectra is small enough to be easily handled.

Let's prove the result though. We'll use Hall's Theorem. Elements of V can be classified according to the colour of the hidden card. Let's denote the subsets of V as \(V_1\), \(V_2\), ..., \(V_s\) for some s. All elements in \(V_i\) have the same degree (let's call it \(d_i\)). What's more, every element in U has the same number of connections to \(V_i\), say \(e_i\). It follows that \(d_i|V_i| = e_i|U|\). Now we can consider the two cases:

|U| ≤ |V|. Consider some subset A of U, whose image in V is B. Let \(B_i\) be the intersection of B and \(V_i\). Let's consider the total number of edges between A and \(B_i\). On the one hand it must equal \(e_i|A|\). On the other hand, it cannot exceed the sum of degrees of \(B_i\), namely \(d_i|B_i|\). So \(d_i|B_i| \ge e_i|A| = d_i|A||V_i|/|U|\) and hence \(|B_i| \ge |A||V_i|/|U|\). Adding these inequalities gives \(|B| \ge |A||V|/|U| \ge |A|\). Thus, the Hall condition is satisfied.
|U| ≥ |V|. Consider some subset B of V, whose image in U is A. Define \(B_i\) as before. Between A and \(B_i\) there will be \(d_i|B_i|\) edges, and also at most \(e_i|A|\) edges, giving \(d_i|B_i| \le e_i|A|\). The argument proceeds as before, just with the directions of inequalities reversed.

F: Islands from the Sky

Firstly, congratulations to the judges for creating a geometry problem (and a 3D geometry problem no less!) that is actually relatively easy to solve.

Each island needs to be completely scanned by some plane. So we can take each island, each plane and check what angle needs to be used by that plane to scan that island. For each island we can take the minimum angle over all planes, and then we can take the largest angle over all islands.

Because each plane scans a convex region, it's sufficient to check the vertices of an island. For each vertex, project the vertex onto the (2D) flight path to determine the point at which it is scanned, then use linear interpolation to find the altitude of the plane at that point. Then simple trigonometry (using the distance of the vertex from the projection) gives the minimum angle.
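The per-vertex computation might be sketched as below. The conventions are my assumptions rather than taken from the problem statement: a flight is given by its 3D endpoints, altitudes are positive, the angle is measured from the vertical, and a vertex whose projection falls outside the flight segment cannot be scanned by that flight at all.

```python
import math

def min_angle(start, end, vertex):
    """Smallest scan angle (degrees from vertical) at which the flight from
    `start` to `end` (3D points) sees `vertex` (a 2D point), or None."""
    (x1, y1, z1), (x2, y2, z2) = start, end
    px, py = vertex
    dx, dy = x2 - x1, y2 - y1
    # Parameter of the orthogonal projection onto the 2D flight path.
    t = ((px - x1) * dx + (py - y1) * dy) / (dx * dx + dy * dy)
    if not 0 <= t <= 1:
        return None
    altitude = z1 + t * (z2 - z1)  # linear interpolation along the flight
    dist = math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))
    return math.degrees(math.atan2(dist, altitude))
```

The angle needed for a whole island is then the maximum over its vertices (or "impossible" if any vertex gives `None`).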

G: Mosaic Browsing

It's clear that a naïve implementation will be too slow. But considering every possible way to slide two signals over each other and compute some summary seems remarkably like convolution. We can in fact make it so with some tricks. First, assign each colour a unique point on the complex unit circle (ideally equally spaced, but that's only important for numeric precision). In the motif, use the conjugate values, and use 0 for uncoloured cells. When multiplying a value from the mosaic with a value from the motif, the result will be 1 if the colours match, or something with a real part less than 1 if they do not. Thus, the real part of the sum will equal the number of coloured cells of the motif if and only if there is a match.

There are a few details to take care of (around sign conventions and padding), but essentially the problem has been reduced to a 2D convolution. Using the Fast Fourier Transform, it can thus be implemented in \(O(RC \log(RC))\), where \(R = r_p + r_q\) and \(C = c_p + c_q\).
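To see the encoding at work, here is a one-dimensional toy version that evaluates the sliding sums directly instead of using an FFT, so it shows only the matching criterion and not the fast algorithm. The names are hypothetical, and `None` stands for an uncoloured motif cell.

```python
import cmath
import math

def match_positions(text, motif, colours):
    """Offsets at which `motif` matches `text`; None in motif is a wildcard."""
    k = len(colours)
    code = {c: cmath.exp(2j * cmath.pi * i / k) for i, c in enumerate(colours)}
    t = [code[c] for c in text]
    m = [0 if c is None else code[c].conjugate() for c in motif]
    needed = sum(1 for c in motif if c is not None)
    # A single mismatch lowers the real part by at least this much.
    gap = 1 - math.cos(2 * math.pi / k)
    hits = []
    for shift in range(len(text) - len(motif) + 1):
        total = sum(t[shift + j] * m[j] for j in range(len(motif)))
        if total.real >= needed - gap / 2:  # tolerance for rounding error
            hits.append(shift)
    return hits
```

In the real solution the inner sums for all shifts are computed at once as a 2D convolution; the comparison against the number of coloured motif cells stays the same.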

H: Prehistoric Programs

Firstly, if the total numbers of opening and closing parentheses are not equal, then clearly it is impossible. Otherwise, we can summarise each tablet with two numbers: its balance (number of opening minus the number of closing parens) and its depth (negative of the minimum balance of any prefix).

Note that it is never necessary to have a negative-balance tablet followed by a positive-balance tablet: if this occurs in a valid solution, they can be swapped around to give another valid solution. Similarly, given two adjacent positive-balance tablets, we can always place the one with the smaller depth first. Tablets with zero balance can be (arbitrarily) treated as positive-balance.

This leads to a greedy solution: from the left, try to place the positive-balance tablets in increasing order of depth, then the negative-balance tablets in decreasing order of their mirrored depth (the depth as seen when reading from the right, i.e. balance plus depth). Either it gives a valid solution (easy to check) or there is no valid solution.
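A compact sketch of the greedy follows. For the negative-balance tablets it sorts by balance plus depth, which is the depth of the tablet when read from the right; that is my reading of the mirrored rule. The function names are made up.

```python
def summarise(tablet):
    """Return (balance, depth) of a string of parentheses."""
    balance = depth = 0
    for ch in tablet:
        balance += 1 if ch == '(' else -1
        depth = max(depth, -balance)
    return balance, depth

def order_tablets(tablets):
    """Return an order (list of indices) giving a balanced string, or None."""
    info = [summarise(t) for t in tablets]
    if sum(b for b, _ in info) != 0:
        return None
    idx = range(len(tablets))
    rising = sorted((i for i in idx if info[i][0] >= 0),
                    key=lambda i: info[i][1])
    falling = sorted((i for i in idx if info[i][0] < 0),
                     key=lambda i: -(info[i][0] + info[i][1]))
    order = rising + falling
    balance = 0
    for i in order:
        b, d = info[i]
        if balance < d:
            return None  # some prefix would go negative
        balance += b
    return order
```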

I: Spider Walk

The traversal rules are time-symmetric, so we can just as easily ask how many bridges to add to ensure that starting at strand S and walking inwards will reach the centre from a particular strand. Let us maintain some distance D and track the minimum cost (in new bridges) to reach the point on each strand at a distance D, starting from S. We start with D at infinity and incrementally reduce it by considering the bridges in decreasing distance from the centre.

Suppose there is a bridge linking strands P and Q at distance d. For D slightly larger than d, we have costs to reach P and Q of \(c_P\) and \(c_Q\) respectively. Once D shrinks below d, the cost to reach P is at most \(c_Q\), and the cost to reach Q is at most \(c_P\). However, the costs may actually be less if the costs to reach the neighbouring strands are \(c_P - 1\) or \(c_Q - 1\) respectively.

Once the new costs for P and Q have been computed, the costs to reach every other strand need to be updated using these as starting points (e.g. if the new cost for P is \(e_P\), then a strand 5 away from P can now be reached with cost \(e_P + 5\), which may be better than the previous value).

Performing this cost update efficiently will require a suitable data structure. It requires the following operations:

Set a single value
Update a range of values [L, R) to the minimum of the current value and \(v + i\), where v is given and i is the index.
As above, but use \(v - i\) rather than \(v + i\).

The \(v + i\) and \(v - i\) updates correspond to the two directions one could traverse the web. I found it simplest to use two separate segment trees for the two types of updates, with the final result being the smaller of the two. Each segment tree node holds a value of \(v\) that has been applied to its entire corresponding subtree. Setting a single value requires pushing these internal values down the tree.
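A minimal sketch of one of the two trees (the \(v + i\) variant) is shown below. The class and method names are invented for illustration, and it works on a plain array of indices; wrapping around the circular web would need the range updates to be split by the caller.

```python
INF = float('inf')

class RampMinTree:
    """Array a[0..n-1], initially infinite, supporting:
    update(lo, hi, v): a[i] = min(a[i], v + i) for lo <= i < hi
    set(i, value):     a[i] = value
    get(i):            current a[i]
    """
    def __init__(self, n):
        self.size = 1
        while self.size < n:
            self.size *= 2
        # tag[x] = smallest v applied to the whole range covered by node x
        self.tag = [INF] * (2 * self.size)

    def update(self, lo, hi, v):
        lo += self.size
        hi += self.size
        while lo < hi:
            if lo & 1:
                self.tag[lo] = min(self.tag[lo], v)
                lo += 1
            if hi & 1:
                hi -= 1
                self.tag[hi] = min(self.tag[hi], v)
            lo >>= 1
            hi >>= 1

    def get(self, i):
        x = i + self.size
        best = INF
        while x >= 1:
            best = min(best, self.tag[x])
            x >>= 1
        return best + i

    def set(self, i, value):
        leaf = i + self.size
        # Push tags down the root-to-leaf path so no ancestor still covers i.
        for shift in range(self.size.bit_length() - 1, 0, -1):
            node = leaf >> shift
            for child in (2 * node, 2 * node + 1):
                self.tag[child] = min(self.tag[child], self.tag[node])
            self.tag[node] = INF
        self.tag[leaf] = value - i
```

Each operation touches O(log n) nodes. The \(v - i\) tree is identical apart from the sign of `i` in `get` and `set`.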

Overall complexity is \(O(n\log n + m\log n + m\log m)\) (the final term is just to sort the bridges; it could probably be eliminated, but there is little need).

J: Splitstream

For each edge, we'll need to know the total number of values that are sent along it. This can be done with dynamic programming, or recursively with memoisation. Then to find the kth element along an edge, one needs only consider a few cases. If the edge starts from a split node, it is simple to determine the corresponding index on the edge into the split node. For a merge node it is slightly more complex, depending on whether one or the other incoming edge will run out before the kth element is output.
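The two index mappings could be sketched as below. The node semantics here are my assumptions: a split node sends its 1st, 3rd, 5th, ... inputs to its first output and the rest to its second, and a merge node alternates between its inputs, starting with the first, until one runs out. Indices are 1-based and the function names are hypothetical.

```python
def split_source(k, which):
    """Index on the incoming edge of the k-th value on output `which` (0 or 1)."""
    return 2 * k - 1 + which

def merge_source(k, c1, c2):
    """(input, index) that supplies the k-th output of a merge node whose
    inputs carry c1 and c2 values respectively."""
    short = min(c1, c2)
    if k <= 2 * short:
        # Still alternating: odd outputs come from input 0, even from input 1.
        return (0, (k + 1) // 2) if k % 2 else (1, k // 2)
    # The shorter input has run out; the rest comes from the longer one.
    return (0 if c1 > c2 else 1, k - short)
```

Following these mappings backwards from the queried edge until the source is reached gives the answer for one query.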

K: Take On Meme

As a reminder, this is not my solution, but rather one presented during the livestream.

We can imagine all possible final memes as points in a plane, and we're trying to find the one furthest from the origin. This will lie on the convex hull of the possible memes. Let's construct that convex hull.

Suppose we wish to find the point which lies furthest in a particular direction (formally, has the largest dot product with a chosen vector). This is a reasonably straightforward tree DP, where for each node of the tree we find the meme with the largest and smallest dot product. Whatever point we find is guaranteed to be on the hull. By picking a few (e.g. 4) directions, we can find a few points on the hull. We can then find the rest recursively: pick two adjacent candidates, and search for the furthest point in the direction orthogonal to the line joining them. This will either yield another point on the hull, or prove that the two candidates are neighbours on the hull.

This leaves the question of how many points might lie on the convex hull — after all, there are exponentially many ways to select winning memes. However, the points all have integer coordinates with magnitude limited to 10⁷, which strongly limits the number of points that can be placed on the hull: a hull with coordinates limited to X has at most \(O(X^{\frac{2}{3}})\) points.

L: Where Am I?

Take one step at a time, keeping track of sets of starting positions that are indistinguishable so far. When taking a new step, each set is partitioned into two (although possibly all into one side) based on whether the new step would reveal a marker or an empty space for given starting positions. Every time a set of size 1 is produced, you know how many steps are needed to identify the location for someone who starts in that location.

Update: apparently the judges intend this solution to time out; I can't tell whether mine does or not since there is no online judge, but my implementation (which isn't super-optimised) runs in 0.9s.

More thoughts on Rust

2022-07-23T00:10:00.002-07:00

I've been playing with Rust again recently (see my previous post on the topic) and have some more thoughts. It's a mix of wonder and horror.

First, the good stuff. As a reminder, Rust restricts the references you can make: at any time, you can either have a single mutable reference to a variable, OR an unlimited number of shared (immutable) references. When I first saw that my immediate thought was that it is great for concurrency, because that's exactly what you need to avoid data races, but it seemed like just an inconvenience for single-threaded code. However, I've since realised that there are several advantages:

Iterators. A common problem in several languages is "iterator invalidation", which occurs when the container being iterated is modified while you're iterating. In other languages, at best you're getting an exception, and possibly you're getting undefined behaviour. In Rust, the iterator holds a shared reference to the container, making it impossible to mutate it for the lifetime of the iterator. This does have downsides though: it's impossible to keep a handle to an element in a collection while still allowing new elements to be added, even if the collection doesn't require reallocation to do so (e.g. a linked list).
Aliasing and side effects. A function that takes a mutable output parameter is guaranteed that it won't alias any of the inputs, and can optimise accordingly. And a function A that calls a function B can be sure that B won't modify anything to which A holds a reference.

The bad stuff is that Rust has too much magic and it's not all specified in the documentation. The particular case I ran across is in my Stack Overflow question. In short, writing a function with a template parameter is not the same as writing it with the concrete type, because the compiler has special treatment for arguments that are lexically declared &mut. The reference manual doesn't mention this, and it says basically nothing about how template instantiation is done. What's more, the special behaviour that's triggered is itself undocumented, and pretty arcane (references are in some manner "reborrowed"). One of the side effects is that the identity function isn't actually a no-op.

The borrow checker also relies on trying to infer what the relationship between a function's inputs and outputs is, based on lifetime specifications, rather than letting you be explicit. Consider this code:

fn inc_ret<'a>(x: &'a mut i32) -> &'a i32 {
    *x += 1;
    &*x
}

fn main() {
    let mut x = 1;
    let y = inc_ret(&mut x);
    println!("{} {}", x, *y);
}

The inc_ret function takes a mutable reference to an integer, increments it, and returns an immutable reference. This code should be perfectly safe, because y is an immutable reference to x. However, the borrow checker simply relates the output to the input via the common lifetime ('a) and can't tell that the mutability doesn't pass through to the output, so it refuses to compile this code.

Another piece of magic I'm not totally happy with is the dot operator. Unlike in C, there is no -> operator; instead the compiler will automatically dereference references for you. That might be okay if it wasn't that reference types can themselves have methods (via traits), and it can be ambiguous whether you want the method on the reference or on what the reference points to. The Rustonomicon has a horrifying example of this:

fn do_stuff<T: Clone>(value: &T) {
    let cloned = value.clone();
}

That's a function taking a reference to a cloneable value and cloning the value. But, if you forget to specify the : Clone, it will still compile, but clone the reference instead (even if T is cloneable). So restricting the types accepted by your function has actually changed the semantics!

Disassembling Endo, part 2

2021-05-02T06:27:00.000-07:00

If you haven't read Part 1 yet, read that first. This is Part 2.

As a reminder, here's how far we got in Part 1:

We have a bit of trouble with the VMU. It's time we started doing a bit of disassembly to see why. In general, static disassembly is impossible, because the machine is entirely built on self-modifying code, but fortunately most of the time one rule follows another in a sensible way. There are exceptions where some data is embedded directly into the code (rather than stored on the stack or in separate symbols), but these can be treated as special cases.

The first thing we see in the function vmu is IIICCPIICC, which emits the RNA CCPIICC. Why? It's not valid RNA. This was mentioned on help page 2181889, suggesting that these codes starting with a C indicate "a small part of the morphing process" is done. In fact, you'll find that all the functions start with such a piece of RNA, and they have unique codes. From this, it's possible to examine an RNA trace to determine the control flow at the function level. I ended up not really needing this, because I had some tools in my DNA interpreter for determining what code was running (since copying the code to the red zone just references the original code, it's possible to determine the origin of instructions), but it was quite useful when our team originally competed.

Incidentally, while RNA is technically part of the pattern-matching rules, it almost always occurs at the very beginning and so can be treated as if it is an instruction on its own, and I'll continue to treat it like that. The "next" instruction, then, is (![7512628]) -> (0)IIIIIIIIIIIIIIIIIIIIIIIPIIIIIIIIIIIIIIIIIIIIIIIP. If you do some calculations based on how much of the function is left in the red zone, you'll see that this is adding some data to the start of the stack. Those calculations rapidly get tedious, so it's worth writing a tool to match various patterns and print out higher-level information. In this case one can recognise it as a push and print that out.

Next is ![33](![213118]CCIICCIIIIIIIIIIIIIIIIIP) -> (0). This is a conditional forward jump: it moves to a particular address (which is vmuMode) and tries to match a number there (51). If it all matches, then the next 33 acids will vanish; otherwise nothing happens. The next 33 acids are exactly the length of the next instruction, which is ![218] ->, which is an unconditional jump. Taken together, these say to jump ahead if vmuMode is not 51.

Now we have ![2906](![4406243](![1805])![3101361]) -> (0)(1)IICCCIICICCCCCICCCIIICIPICICCICICCICIIIIIIIIIIIP. Once again, one needs to interpret the numbers: 2906 is the remainder of the red zone, 4406243 and 1805 are the address and size of the function ufo-with-smoke, and 3101361 takes us to the blue zone. We then erase the rest of the red zone, copy ufo-with-smoke to the red zone, and push two numbers on the stack. That sounds like a function call, and indeed you can confirm that the two numbers are the address of the following instruction, and the length of the remainder of the function, for returning to.

After this there is another jump (![2586] ->), and another conditional jump: ![33](![212772]CCCCCIIIIIIIIIIIIIIIIIIP) -> (0), this time comparing vmuMode to 31. Aha, we didn't know to try that value earlier!
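Reading those comparison constants by eye gets old quickly. Here's a minimal Python sketch of decoding a number embedded in the DNA, assuming the usual encoding: least-significant bit first, C for a one bit, I (or F) for a zero bit, and P as the terminator. The function name decode_nat is just my label for illustration.

```python
def decode_nat(acids):
    """Decode a little-endian DNA number; returns (value, acids consumed)."""
    value = 0
    for i, base in enumerate(acids):
        if base == "P":
            # P terminates the number
            return value, i + 1
        if base == "C":
            # C is a one bit at position i; I and F are zero bits
            value |= 1 << i
    raise ValueError("number is not terminated by P")


# The two constants compared against vmuMode in the rules above
print(decode_nat("CCIICCIIIIIIIIIIIIIIIIIP")[0])
print(decode_nat("CCCCCIIIIIIIIIIIIIIIIIIP")[0])
```

Run on the two strings from the conditional jumps, this gives 51 and 31, matching the values quoted above.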

Fortunately we've already followed the clues to find the registration code (Out_of_Band_II), so let's write that into vmuRegCode (remember to terminate the string with a value of 255). And now we get our caravan, but in the wrong place:

Wait a second, the caravan was encrypted, and we didn't decrypt it. But we know that the VMU registration code is also the encryption key for the caravan, so presumably vmu is doing the decryption for us. Let's take a look at the next few instructions:

(![7511175])![48] -> (0)IICIICICICCIIIIICCCICICPCICIICCICCIICIIIIIIIIIIP

((![1200])![7509907]) -> (1)(0)

(![211586](![1152])) -> IIPIPIIIIIIICIICPIICIIPIPIIIICICCCCICCIICICIICCCPIICIPIIIIIIICIICPIICIPPCPIPPPIIC(0)(1)

The first writes some values into those stack variables we saw earlier: specifically, the address and size of caravan. Then it pushes more data to the stack. Weirdly, it's pushing the immediately following 1200 acids, which form code. It turns out that the compiler the organisers used to generate code makes function calls by first pushing a block of garbage to the stack, then overwriting it, rather than directly pushing arguments. So this is just making space on the stack for 1200 acids of arguments.

What's going on with the third instruction? It's writing a long bunch of stuff at the business end, prior to any substitutions. Code generating code! This turns out to be fairly common, and disassembly becomes much easier if you detect this, parse the embedded code and present it as a separate rule. Where it's important I'll prefix such follow-on instructions with a +. So let's try to parse that again:

(![211586](![1152])) -> (0)(1)
+ (![1152])(![7510992])![1152] -> (1)(0)

The first instruction copies 1152 acids from the green zone (vmuRegCode) and inserts them at the "start" of the DNA (although not really the start, because it's after the embedded instruction). Then the embedded rule vacuums it back up and writes it into the stack. This shows one advantage of the embedded rule: if it had been written as a separate rule, then the first rule would have inserted the data before the second rule, and things would have gone wrong.

You may be wondering why this needed to be done with this trick, instead of just a single rule that both collected and delivered the data to the stack. It would be possible, but if the data source occurred after the destination in the DNA, it would have required a different sort of pattern. With this approach, the loading and storing are independent, which probably makes the compiler easier to write.

Carrying on we see just such usage:

(![7511999](![24])) -> (0)(1)
+(![24])(![7510823])![24] -> (1)(0)

(![7511878](![24])) -> (0)(1)
+(![24])(![7510654])![24] -> (1)(0)

This copies the address and size of caravan to that 1200-acid block as well. Here's more evidence that a compiler (and not a super-optimised one) is involved: a human would have just written the values directly to where they were needed, rather than writing them into a temporary and then copying the temporary.

Perhaps not surprisingly, at this point we call crypt. We've given it the decryption key, address and size of caravan, so it will do its thing. Note that, as documented, strings are always passed as 128 characters, even if they are actually cut short by a terminator.

The rest of the function isn't particularly exciting, so I'll jump ahead to near the end.

(![7509644])![48] -> (0)
RNA: CFPICFP
![75](![7509409])(![24])(![24]) -> IIPIP(1)IIPIP(2)IICIICIICIPPPIPPCPIIC(0)

The first pops 48 acids from the stack (the ones we pushed right at the start), which should leave the return address at the top of the stack. Then we see the RNA sequence that we're told indicates a return. The return statement itself uses more code-generating-code, but this time it's not just a fixed rule stuck on the front of the template: the rule is generated based on values in the pattern! This is how indirection is performed. To split off the leading rule, I had to invent some new syntax: <t:X> means part of a rule is defined by the template at the previous level. So rewriting the return using this syntax gives:

![75](![7509409])(![24])(![24]) -> (0)
+(![<t:(1)>](![<t:(2)>])) -> (0)(1)

In other words, the second rule will skip forward not by a constant difference, but by the number found in replacement (1) of the first rule i.e., the return address; then it will skip forward by (2) from the first rule i.e. the return size (remaining amount of code to execute from the calling function).

Parsing this requires the disassembler to do another impossible thing, because the parsing of the inner rule depends on the replacement values e.g. if (1) was empty then the following IIP would be the number zero on skip instead of an opening bracket. Once again, it's possible to make progress because the actual code seen is generated in fairly consistent ways and so heuristics work well e.g. if one sees a skip instruction immediately followed by a template parameter, it's a good chance that it will be substituted by a number, rather than some arbitrary code. And again, there are exceptions that need to be handled on a case-by-case basis.

There is also some trailing stuff at the end of the function which I haven't entirely understood. It seems to occur at the end of every function. I've split it across several lines since it appears to consist of distinct pieces:

CPPPPPP
IICIICIICIICIICIICIICIICIICIIC
CCFIICIIC
IFFCPICCFPICICFCPICIICIIC = ?[IFPICFPPCIFP] ->
IIII

Overall I think the idea is to terminate the machine if code is being executed in the wrong place (e.g. if you patched code in a broken way that meant rule decoding wasn't happening on the intended boundaries). I don't know what the first and last lines are for, although the first might be intended to terminate numbers and to create a match item that won't match. Then there are a lot of IIC's, which should ensure that any open brackets get closed; excess IIC's will be consumed in pairs (forming empty rules). The middle line can be interpreted in two possible ways: if all the previous IIC's were consumed, then it is the rule IIC ->, which won't match so is ignored. If the last IIC on the previous line formed an (empty) pattern, then CCFIIC becomes the template IIC, which will combine with the remaining IIC to form another empty rule. So regardless of the parity, the end of this line is highly likely to be the end of a rule so that we're all synchronized for line 4, which searches for a marker that's right at the end of the DNA and skips to it, shutting everything down.

Let's see how addition works, because it comes up here and there. There is a function called addInts, which seems like a good place to start. It's almost never called, addition normally being done inline, so I suspect it's mostly provided for educational purposes. The core instructions look like this:

(![7510064](![24])) -> (1)
+(![7510040](![24])) -> (1)
+(![170])(![<t:<t:(0,1)>>]![<t:(0)>]) -> (0)|1|(1)

(?[P]) -> F(0)
+(![<t:|0|>])P -> (0)IIIIIIIIIIIIIIIIIIIIIIIP
+F(![23])?[P](![7509918])![24] -> (1)(0)P

Now we have 3 layers of rules in one, and nested <t:...> constructs. <t:<t:...>> means that the content is defined by the rule two lines above.

The first set of rules does the actual addition. The first two lines just grab the two values to add (from the blue zone). The third line does the real work. ![170] moves just past the following rule, and then we have two skips, with the distances determined by the two numbers we grabbed. We then write the length of this combined skip (i.e. the sum) just after the next rule.

So we're using the length-of template operation as a hack to implement addition. The paper by the organisers mentioned that this feature was specifically added to make arithmetic faster, although it's useful in other places too. It also explains the documentation for the init function: "Ensures happy, trouble-free arithmetic by growing the DNA to the right length." That is necessary because if the jumps go off the end of the DNA the pattern will be considered not to have matched.
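To make the trick concrete, here's a small Python model of it. It's only a sketch: remaining_dna is a hypothetical stand-in for how much DNA lies beyond the skip's starting point, and asnat is my name for the minimal-length number encoding (least-significant bit first, C for one, I for zero, P to terminate).

```python
def asnat(n):
    """Encode n with the minimum number of acids (little-endian, P-terminated)."""
    acids = ""
    while n > 0:
        acids += "C" if n & 1 else "I"
        n >>= 1
    return acids + "P"


def add_by_skipping(a, b, remaining_dna):
    """Model the two chained skips: return the encoded length of the combined
    skip, or None when the skips run off the end of the DNA (no match)."""
    if a + b > remaining_dna:
        return None
    return asnat(a + b)


print(add_by_skipping(3, 4, 100))  # the sum 7, minimally encoded
print(add_by_skipping(3, 4, 5))    # DNA too short: the pattern fails to match
```

The second call shows why init has to grow the DNA first: with too little DNA to skip over, the rule simply doesn't fire and no sum is written.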

The reason for the second rule is that the first will write the sum with the minimum number of bases, whereas the higher-level code is all designed to deal with 24-acid numbers (or sometimes 9 or 12). So we need to pad the number out to the right length. We want to be able to grab the bits of the number without the terminating P, so that we can add some I's after it. Unfortunately, (?[P]) includes the P, so we need to use a little trickery. Let's say the number is currently CICCP: 4 bits and P. The first step inserts an F in front of the number. The second reads 5 bits, namely FCICC, discards the original P, and appends a bunch of I's and a replacement P. The third discards the F, reads 23 bits (the original 4 plus some of the I's), skips ahead to the P, then places those 23 bits and a P. So we now have a 24-acid number in the right place.
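The three padding steps are easier to follow when traced on a string. This Python sketch mimics them on just the number itself, ignoring the surrounding DNA and the final move into the blue zone; pad_to_24 is a made-up name.

```python
def pad_to_24(number):
    """Trace the three padding steps on a minimally encoded number like 'CICCP'."""
    assert number.endswith("P") and "P" not in number[:-1]

    # Step 1: (?[P]) -> F(0) puts an F in front of the number.
    dna = "F" + number

    # Step 2: skip |0| acids (the length of the number including its P),
    # drop the P that follows, then append 23 I's and a fresh P.
    n = len(number)
    head, rest = dna[:n], dna[n:]
    assert rest == "P"
    dna = head + "I" * 23 + "P"

    # Step 3: discard the F, keep the next 23 acids, and terminate with P.
    assert dna[0] == "F"
    return dna[1:24] + "P"


padded = pad_to_24("CICCP")
print(padded, len(padded))
```

For CICCP the result is CICC followed by 19 I's and a P, 24 acids in total.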

Now that we've seen some function calls and returns, let's see if we can fix something in the picture. Note that there are many, many ways to achieve each goal, and I'm just going to list one (or a few) of them.

We'll replace the apple trees with pear trees. To do that, we'll replace the code in appletree to just call peartree instead of drawing an apple tree. Here's the code we'll write at the front:

![14036](![5800492](![14123])) -> (0)(1)

This looks similar to the function call we saw earlier, but the return address is missing, and so is the jump to the start of the blue zone (which we no longer need since we're not pushing anything). So how does peartree know where to return to? The function that called appletree will have pushed a return address, and that's what peartree will pop and return to. This is what's known as a tail call, and it saves us the bother of writing our own return statement.

Let's take a look at what went wrong with the virus help page that caused the title to be upside down (and if you swapped out the wingding font for a normal one, you'd have noticed that all the text is upside down). It has a very deeply nested rule:

(![313]) -> (0)(0)

(![313])(?[P]) ->
+(![<t:(1)>]?[IIIPCCCCCP]) -> IIIPCCCCCP(0)
+(![<t:|0|>])![10] -> (0)IIIPIPIPIP
+![10](![<t:|0|>]) -> <t:<t:<t:(0,3)>>>|0|(0)
+(![313]) -> (0)(0)

This shows one way to implement a loop, particularly in standalone code that can't rely on the green zone. The first instruction reads the body of the loop and makes two copies of it. The last instruction of the loop does the same thing, so there is always one copy of the loop being consumed and one backup copy.

The loop itself reads a number embedded immediately after the (backup) loop body, jumps that far ahead (using <t:(1)> to read the number loaded in the previous line), then searches for a string. It also adds a copy of that string to the front of the DNA. This is for a similar reason to addInts adding an F to the front: it allows one to skip to the start of the searched-for string, instead of the end (third line of the loop body), and replace it with a different string. The fourth line discards those 10 filler acids that were placed at the front, and makes a jump purely so that its length can be used in the replacement. The template replaces the backup copy of the loop (which was consumed in the first line) and puts the jump distance after it (replacing the number read in the first line).

What does all that mean? It's a search-and-replace. The number stored after the backup loop is the distance already searched, so that searches don't have to restart from the beginning. It jumps ahead to the position indicated by the number, then finds the next match and replaces it. One thing to keep in mind with nested rules is that if one rule in a set doesn't match, the subsequent ones aren't produced and hence are not run. Thus, once the last match is replaced, the loop stops, because the first step erases the backup copy of the loop (and the number) and the steps that replace it won't occur.

In this case, the loop replaces IIIPCCCCCP, the RNA for counter-clockwise turn, with IIIPIPIPIP (nonsense RNA). This is followed by a similar loop that replaces IIIPFFFFFP with IIIPCCCCCP, then another to replace IIIPIPIPIP with IIIPFFFFFP (RNA for clockwise turn). Thus, it's swapped left with right, which explains the upside-down page.
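In ordinary code the same three-pass swap looks like the Python sketch below. It isn't a DNA interpreter, just a model of the behaviour: replace_all keeps a resume position the way the loop keeps its "distance already searched" number, and the function names are mine.

```python
CCW = "IIIPCCCCCP"  # RNA for counter-clockwise turn
CW = "IIIPFFFFFP"   # RNA for clockwise turn
TMP = "IIIPIPIPIP"  # nonsense RNA used as a placeholder


def replace_all(dna, old, new):
    """Replace every occurrence of old, resuming each search where the last ended."""
    pos = 0
    while True:
        hit = dna.find(old, pos)
        if hit < 0:
            # No further match: this is where the virus loop dies out.
            return dna
        dna = dna[:hit] + new + dna[hit + len(old):]
        pos = hit + len(new)


def swap_turns(dna):
    """Swap the two turn commands via the placeholder, as the three loops do."""
    dna = replace_all(dna, CCW, TMP)
    dna = replace_all(dna, CW, CCW)
    return replace_all(dna, TMP, CW)


print(swap_turns(CW + CCW + CW) == CCW + CW + CCW)
```

The placeholder is what makes it a swap rather than a merge: without it, the second pass would also convert the turns that the first pass had just rewritten.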

We probably wouldn't have been warned about this for nothing. Maybe this loop is the virus that the page warns us about? We can search for it elsewhere. Some parts of it change depending on what swap is being made, but the replacement template <t:<t:<t:(0,3)>>>|0|(0) seems pretty unusual, and is independent of the replacement code. That corresponds to IPCCPPPPFPFPPFPFPFP in the original DNA, which we can search for. There are 4 matches: the 3 we've already seen, and one in surfaceTransform. Just by running that function, we can see that it's responsible for the hills, but also recall from part 1 that when we tried setting hillsEnabled, things went horribly wrong:

In that case the virus is replacing IIIPIIPICP (RNA for resetting the colour bucket) with IIIPIPIPIF (nonsense RNA). No wonder everything seems so green. Let's consider a general approach to getting rid of sections of code we don't want: using a forward jump. We overwrite the start of the virus with the rule ![n] ->, for some n. It's actually not quite trivial to determine n, because it needs to be the length of the virus minus the length of the replacement rule, and we don't know the length of the rule until we know how many acids we need to encode n. One approach is just to use a large fixed number of bits (with 0's in the high-order bits); this seems to be generally the approach taken by the organisers' compiler. However, we get a higher score if we use a shorter prefix, so there is some motivation to use shorter replacements. My approach was to use a function that would take an initial estimate of the position of the end of the rule, compile it, then update the estimate and repeat until convergence. In theory there are some corner cases where this could oscillate and a padding bit would be required, but I've not hit one yet.
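A minimal Python sketch of that estimate-and-repeat idea follows. It is not my actual tool: it assumes the rule ![n] -> is encoded as IP, then the minimally encoded number, then IIC to close the pattern and IIC to close the (empty) template, and the names are invented for the example.

```python
def asnat(n):
    """Encode n with the minimum number of acids (little-endian, P-terminated)."""
    acids = ""
    while n > 0:
        acids += "C" if n & 1 else "I"
        n >>= 1
    return acids + "P"


def jump_rule(n):
    """Assumed encoding of the rule  ![n] ->  (skip n, empty template)."""
    return "IP" + asnat(n) + "IIC" + "IIC"


def shortest_jump(total, max_rounds=20):
    """Find n and a rule such that len(rule) + n == total, by fixed-point iteration."""
    rule_len = 0  # initial estimate of the rule's own length
    for _ in range(max_rounds):
        n = total - rule_len
        if n < 0:
            raise ValueError("region too short to hold the jump rule")
        rule = jump_rule(n)
        if len(rule) == rule_len:
            return n, rule
        rule_len = len(rule)
    raise RuntimeError("estimate oscillates; a padding bit would be needed")


n, rule = shortest_jump(2000)
print(n, len(rule), rule)
```

For a 2000-acid region the estimate settles after two rounds; the iteration cap is there for the oscillating corner case mentioned above.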

A more efficient solution (in terms of changing fewer bases) is just to change the string that is searched for. But the tool above will come in handy elsewhere.

With that fixed, the hills now appear in the scene, although they're not yet in quite the right positions:

So how are those hills drawn anyway? It actually involves a fair amount of code. Here's what my disassembler spits out for the first hill:

0x0542 <blue+0x018> := 0
0x058e <blue+0x000> := 242
0x05da CALL moveTo
0x0693 PUSHS 120
0x06d3 <blue+0x018> := 8388580
0x071f <blue+0x000> := 3348
0x076b CALL functionParabola
0x0824 cbfArray+0x48[:72] := POP 72
0x08a1 PUSHS 144
0x08e2 <blue+0x030> := 408
0x092e <blue+0x018> := 7
0x097a <blue+0x000> := 4
0x09c6 CALL functionSine
0x0a7f cbfArray+0x90[:72] := POP 72
0x0afc PUSHS 216
0x0b3d <blue+0x048>[:72] = cbfArray+0x48[:72]
0x0bd4 <blue+0x000>[:72] = cbfArray+0x90[:72]
0x0c6b CALL functionAdd
0x0d24 cbfArray[:72] := POP 72
0x0da1 PUSHS 120
0x0de1 <blue+0x030>[:72] = cbfArray[:72]
0x0e78 <blue+0x018> := 250
0x0ec4 <blue+0x000> := 50
0x0f10 CALL drawFunctionBase

PUSHS X means make X acids of space on the stack. [:72] just indicates the size of a value. The disassembler doesn't know about negative numbers and two's complement, so 8388580 is really 8388580 - 2^23 = -28. In summary, it generates two closures, one from functionParabola and one from functionSine, and adds them, then passes the resulting closure to drawFunctionBase.
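If you want your own disassembler to print signed values, the conversion is a one-liner. This Python sketch assumes, as in the calculation above, that values are 23 bits wide; to_signed is a hypothetical helper name.

```python
def to_signed(value, bits=23):
    """Interpret an unsigned value as two's complement of the given width."""
    if value >= 1 << (bits - 1):
        # Top bit set: the value represents a negative number
        return value - (1 << bits)
    return value


print(to_signed(8388580))  # the argument passed to functionParabola above
print(to_signed(242))      # small positive values are unchanged
```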

The second hill is similar, but with different parameters and moveTo location, while the third uses only a sine function. The first piece of text we found in the DNA mentioned that the organisers had sabotaged the DNA by swapping some parabolas around, so see what happens if you swap the parameters of the two parabolas. To do this you'll need to find the positions of the numbers in the DNA (they're quoted). I found it very useful to reuse some of my disassembly and assembly code to disassemble the desired instruction, replace the value (after checking that it matched what I expected it to be) and reassemble the instruction. That allowed me to put in a number of safety checks so that typing errors didn't send me off on long debugging sessions.

That looks worse, but actually it is progress. If you compare this to the target image, you'll find that the left two hills are merely at the wrong heights. It helps to use an image editor that lets you shift the images relative to each other while showing a difference or other blend of them (I used the GIMP, opening the images as two layers and dragging one across the other). On the first hill, change the second argument to moveTo from 242 to 218, and on the second hill, from 209 to 235.

The third hill is trickier. You can almost make it match by changing both the x and y positions, but there are always a few pixels that don't quite match. I don't know if there is a clever way to determine the solution (maybe analysing functionSine to determine the meaning of the parameters and then fitting them), but brute force (automatically trying multiple values) will get you there if you know that only one of the parameters is wrong. In fact, the second argument must change from 65 to 60.

While it's generally not this bad, the contest does have a fair amount of tedious modification of coordinates of things to put them into the right place. For example, the caravan will need to move; you can find the position being set at offset 0x3b66 in scenario if you want to try your hand.

For now let's keep things interesting by looking at some code. But first let's get the last few pages of the repair guide. If you disassemble main, you'll see various bits of code that compare the value at helpScreen to a constant and then call one of the help functions we've already seen. Even if you don't have a disassembler that can interpret all the rules, you should be able to extract these to determine the valid repair page numbers. There are two that weren't shown in Part 1, and we missed them because they're implemented inline in main rather than in a help- function we could call.

1024 (which you might have found by brute force):

180878:

That yellow shape in the middle looks like exactly what we need to put at the centre of the sun (although it's too large). But page 1024 tells us more about the compressor hinted at by page 123456. Now that we've seen the virus, the principle of a loop that keeps copying itself makes more sense. It says it's used for bitmaps, so let's look inside a page that has a bitmap: alien-lifeforms.

The compressor itself looks like this:

0x182b   (![19])(![944])(?[P]) -> (0)(2)(1)
0x186c   (![56]![128]![32]) -> (1)(0)(1)
0x18a5   (![672])(![178]) -> (1)(0)(1)

Or at least it appears to: this is one of those rare cases of self-modifying code that fools disassembly. One should be suspicious of the middle line because it appears to reference a capture group that doesn't exist. Here's the raw DNA for it:

IIPIPIIICCCPIPIIIII IICIIPIPIIIIICPIICIICIPPCPIPPPIPPCPIIC

The first rule grabs the next number from the green area, and pastes it into the middle of the next instruction (at the gap shown). It appears after IPIIIII, which turns it into a jump that's 32× bigger than the number (left shift by 5). So for example, if the number is 3, the rule becomes

(![56]![96])(![32]) -> (1)(0)(1)

The ![56] jumps over the 3rd instruction, and the ![96] jumps to the 3rd 32-bit table entry, which is then copied to the front and executed. The final instruction just restores the red part from the orange part.
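The left shift falls out of the number encoding: numbers are stored least-significant bit first, so five I's in front of a number are five low-order zero bits. A short Python check of that, with helper names that are mine and the usual encoding assumed (C for one, I for zero, P to terminate):

```python
def asnat(n):
    """Encode n with the minimum number of acids (little-endian, P-terminated)."""
    acids = ""
    while n > 0:
        acids += "C" if n & 1 else "I"
        n >>= 1
    return acids + "P"


def decode_nat(acids):
    """Decode a little-endian, P-terminated number."""
    value = 0
    for i, base in enumerate(acids):
        if base == "P":
            return value
        if base == "C":
            value |= 1 << i
    raise ValueError("number is not terminated by P")


def table_jump(entry):
    """Jump distance produced when the entry number is spliced in after IIIII."""
    return decode_nat("IIIII" + asnat(entry))


print(table_jump(3))
print(all(table_jump(e) == 32 * e for e in range(21)))
```

So an entry number of 3 yields the ![96] seen in the rewritten rule.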

What's in these 32-acid table entries? One could conceivably do quite a lot, but they aren't used for much. Most of them consist of IPIIIIII...IPIII<RNA> i.e. a jump of 0 (just for padding) and then one piece of RNA (all of which get welded onto the front of the next instruction). The last entry (20) is special: it simply jumps over the rest of the compressor to resume execution.

Now that we know what the decompressor looks like, let's go see where it gets used. We can take the raw DNA and just search for it in the downloaded DNA. It appears in the following functions: 'M-class-planet', 'alien-lifeforms', 'cargobox', 'fontCombinator', 'fuundoc1', 'fuundoc2', 'fuundoc3', 'grass1', 'grass2', 'grass3', 'grass4', 'help-steganography', 'most-wanted', 'printGeneTable', 'sticky', 'transmission-buffer'.

Most of these are unsurprising, since they contain some sort of image or complex shape. But what about printGeneTable? It's just text — at least on the pages we asked for. Maybe, like the repair guide, there are hidden pages? Let's ask for page 15:

Notice the image in the bottom-right corner? It's a match for the whale spout in the target picture, although upside down. We'll come back to it when we start assembling the final picture.

The cargo box also uses the compressor. Dump the RNA in the table:

0x0256   Entry 0: RNA: move
0x0276   Entry 1: RNA: cw
0x0296   Entry 2: RNA: line
0x02b6   Entry 3: RNA: mark
0x02d6   Entry 4: RNA: ccw
0x02f6   Entry 5: RNA: red
0x0316   Entry 6: RNA: black
0x0336   Entry 7: RNA: white
0x0356   Entry 8: RNA: magenta
0x0376   Entry 9: RNA: fill
0x0396   Entry 10: RNA: reset
0x03b6   Entry 11: RNA: cyan
0x03d6   Entry 12: RNA: green

The source image has a rather purple cargo box, which is probably built from the magenta in the table. What if we change that table entry to yellow?

Visually it looks right, but comparing pixel values to the target, the filling in the A is now slightly the wrong colour, with too much green and not enough blue. The only other table colour that uses different amounts of green and blue is the last one (green). So maybe we need to change that to blue to balance it out? This does indeed work.

It's about time we turned the weather on again so that we can get the clouds and work on the cow. Set weather to 2 and enableBioMorph to true:

There are a few elements in the scene we don't want, such as the lightning bolt and the rain. We've seen one way to disable code we don't want (replace a chunk of code by a jump), but I'll demonstrate another that can disable individual rules with just a one-acid change (which will help our score). If you disassemble sky-day-bodies, you'll see the call to lightningBolt at 0x29a, with DNA that starts with a jump:

IPIIIICICIIIICIIIIIIIIIIIIP = ![2128]

What happens if we change the last I to a P? It becomes

IPIIIICICIIIICIIIIIIIIIIIPP = ![2128]F

Suddenly the rule expects to have an F in a particular place (the start of the green zone) or it won't work, and there isn't one there. So instead of making a call, nothing happens:

Note that this only works for calls to functions without arguments, since the arguments are normally pushed separately and without the function call, they won't get popped again.

While we have this tool handy, let's also zap the calls to lambda-id (at scenario+0x4322), crater (at scenario+0x8b30) and cloak-rain (at scenario+0xa370).

What's with all the red, yellow and black? Disassembling the various functions shows that a variable called cloudy seems to play a role. Patching its value in the original DNA doesn't seem to help. With some hacks on the DNA interpreter it's possible to see where it gets changed (or you could just disassemble all the things): when cloud is run, it sets it to true. The trick we used earlier doesn't have quite the same effect here: we end up replacing

(![831561])![1] -> (0)P

with

(![831561]F)![1] -> (0)P

and since cloudy starts off as F, the rule still matches, and ends up writing a P into the following acid. Fortunately it's one that doesn't matter, but there is an alternative way to disable the rule, which will be useful later. The ![1] encodes as IPCP, and we change that to PPCP, which decodes as FFIF, which is unlikely to match unless we're very unlucky (if we are though, things will go badly wrong because we'll be resizing the green zone).
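Both one-acid patches can be checked with a tiny pattern decoder. The Python sketch below handles only the pieces needed here (skips and literal bases, using the pattern encoding C for I, F for C, P for F and IC for P); everything else is deliberately left unsupported, and the function names are made up.

```python
def read_nat(dna, pos):
    """Read a little-endian, P-terminated number starting at pos."""
    value, bit = 0, 0
    while pos < len(dna):
        base = dna[pos]
        pos += 1
        if base == "P":
            return value, pos
        if base == "C":
            value |= 1 << bit
        bit += 1
    raise ValueError("number is not terminated by P")


def show_pattern(dna):
    """Render a pattern fragment made only of skips and literal bases."""
    literal = {"C": "I", "F": "C", "P": "F"}
    out, i = [], 0
    while i < len(dna):
        if dna.startswith("IP", i):
            n, i = read_nat(dna, i + 2)
            out.append(f"![{n}]")
        elif dna.startswith("IC", i):
            out.append("P")
            i += 2
        elif dna[i] in literal:
            out.append(literal[dna[i]])
            i += 1
        else:
            raise ValueError("construct not handled by this sketch")
    return "".join(out)


print(show_pattern("IPIIIICICIIIICIIIIIIIIIIIIP"))  # the original call
print(show_pattern("IPIIIICICIIIICIIIIIIIIIIIPP"))  # last I changed to P
print(show_pattern("IPCP"), show_pattern("PPCP"))
```

The patched call decodes to the same skip followed by a literal F, and the patched ![1] decodes to the literals FFIF.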

In Part 1 I pointed out that we can see a faint shadow of the desired cow behind the endo-cow hybrid. Let's see if we can recover the cow. If you look at the start of bmu, you'll see RNA early on to put one opaque and 9 transparent into the bucket. What if we change all the transparent to opaque?

Not quite what we were hoping for. You'll notice that after the transparency RNA there is a fill; so the entire layer is now solid black. The reason only a cow-shaped region appears black is that the cow is used to clip this layer. What we actually want is to compose the cow onto the scene. There are a few ways to fix this. One is to change the clip RNA into compose, and instead of changing the transparent commands to opaque, change the one opaque command to transparent (if the opaque command is left in, the whole scene ends up a bit too dark).

We'd better get rid of the unwanted endocow, which we can do the same way we've dealt with other unwanted scene elements.

His tail is missing, because I forgot to decrypt it. In this case, the code actually runs an integrity check on the tail (as well as on cow-spot-middle) and skips it if the integrity check fails. Recall from Part 1 that it is encrypted with 9546.

At first glance it looks good, but the colour is wrong. In fact, it's completely opaque, whereas in the target image it is translucent, and you can see the grass through it. We're going to need to patch it to insert some opaque and transparent commands to get the colour right. We'll need to determine the correct number.

One complication is that the entire scene is overlaid by a fine grid of almost-transparent lines, which is produced by the function anticompressant (as the name suggests, the purpose is to make it more difficult to brute-force the problem by generating the image directly with RNA). While not strictly necessary, it'll be easier to solve colour problems if we don't have to worry about it. For the source picture we can use one of the techniques already discussed to disable the function, but what about the target picture? To fix that, we'll start by finding the image that anticompressant overlays. We can do that by getting our RNA-to-image tool to spit out the first image (including the alpha channel) each time composition is done. It's easy to mistake the anticompressant pattern for a black image because it is so faint. With the levels adjusted in an image editor, the colour and alpha components look like this:

Now using the equation for composing images from the specification, you can reverse the process. There is a rounding step which loses information, but because the overlay is so faint, in most cases the colour value can be recovered exactly, and in other cases there are only two possibilities.
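Here's a per-channel Python sketch of the reversal. It assumes the composition rule has the form result = overlay + floor(under × (255 − overlay_alpha) / 255), so check that against the specification before relying on it. The overlay values in the example are invented, not taken from the real anticompressant image.

```python
def undo_compose(result, overlay, overlay_alpha):
    """Return every under-layer value that would compose to `result`.

    Assumes: result = overlay + under * (255 - overlay_alpha) // 255
    """
    return [
        under
        for under in range(256)
        if overlay + under * (255 - overlay_alpha) // 255 == result
    ]


# Made-up example: a very faint overlay (value 1, alpha 3)
print(undo_compose(198, 1, 3))  # a single candidate
print(undo_compose(1, 1, 3))    # rounding leaves two candidates
```

Brute force over 256 values is plenty fast per pixel, and it makes the ambiguous cases explicit rather than silently picking one.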

So returning to the problem of the tail: let's choose a pixel and check its colour when the tail is absent (e.g. it wasn't decrypted), when it's present and fully opaque, and in the target picture; in each case with the anti-compressant absent or reversed. I got the following at (454, 348):

(0, 100, 0) when absent
(119, 166, 219) when opaque (and indeed on all pixels of the tail)
(83, 145, 152) in the target

Let's say we now add some alpha RNA to the mix, so that the tail has an alpha value of a. The colour will then be (119a/255, 166a/255, 219a/255, a), where division rounds down. Just looking at the red component, this tells us that 83*255 ≤ 119a < 84*255, and similarly from the blue component, 152*255 ≤ 219a < 153*255. The only integer value for a that falls into those intervals is 178. 178/255 = 0.6980, which looks pretty close to 0.7, and indeed creating 70% opacity (by issuing opaque 7 times and transparent 3 times) will produce an alpha of 178.
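The same search is quick to do by brute force. This Python sketch uses the three colours measured above; it assumes the alpha bucket averages its entries with rounding down, and it uses the green channel (the only one where the grass shows through) as a cross-check, assuming the under layer is scaled by (255 − a)/255.

```python
OPAQUE_TAIL = (119, 166, 219)  # tail colour when fully opaque
BACKGROUND = (0, 100, 0)       # colour with the tail absent
TARGET = (83, 145, 152)        # colour in the target picture

# Red and blue have no background contribution, so they pin down a directly.
candidates = [
    a
    for a in range(256)
    if OPAQUE_TAIL[0] * a // 255 == TARGET[0]
    and OPAQUE_TAIL[2] * a // 255 == TARGET[2]
]
print(candidates)


def bucket_alpha(opaque, transparent):
    """Alpha from a bucket of opaque (255) and transparent (0) entries."""
    return 255 * opaque // (opaque + transparent)


print(bucket_alpha(7, 3))

# Cross-check: green is the tail's share plus what remains of the grass.
a = candidates[0]
green = OPAQUE_TAIL[1] * a // 255 + BACKGROUND[1] * (255 - a) // 255
print(green == TARGET[1])
```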

That tells us what RNA to add, but how do we do it? There isn't room in the function to add them directly, but we can do some compression. For example, if one replaces some RNA with a rule like (![30]) -> (0)(0)(0) it will repeat the following three pieces of RNA three times. So as long as we write a rule that is a multiple of 10 bases long, we can overwrite some of the existing RNA, and generate our new RNA plus the RNA we overwrite. I'll leave the details as an exercise for the reader (if you get stuck, there is a solution listed here).

Unfortunately when you do this, the tail disappears again! Now the integrity check is working against us: because we modified the function, it fails the integrity check. We can hack checkIntegrity to always return true by disabling the jump at offset 0x611, which then gives us the cow with the correct translucent tail:

What about the whale? When we simply called whale it looked alive, but in our picture it looks dead. So possibly it examines some state to determine whether it is alive or dead? Indeed, the code has an IF statement (following the pattern we've seen before), and we can disable it in the usual way (replacing the high bit of a jump with a P in the instruction at offset 0xcb1). Incidentally, it seems like the function takes two boolean arguments but only uses one of them; I don't know why.

While we're dealing with animals, what about the ducks? If you disassemble scenario you'll see that there is a call to motherDuckWithChicks, so where are they? Look a little further and you'll realise that it's unreachable code: a boolean is set to true, and then if that same boolean is true, we jump over the code. Break the goto in the usual way, and the ducks appear, although one seems to become lost in the trees.

What's not immediately obvious, but far more serious, is that half the elements in the scene have shifted two pixels to the left! You might remember from part 1 that motherDuck failed the integrity check, so there is probably an issue somewhere there that causes the cursor to end up in the wrong position.

Page 8 of the field repair guide describes how polygons are encoded, and says that the last pair is the sum of all movements. What if it isn't? It will lead to exactly this sort of problem. Polygons are easy to recognise in the DNA with a regex (remember that they appear quoted), so we can find the polygons and check them, and indeed one of the polygons in motherDuck is 2 pixels off. Finding which offset is wrong takes a little more work. I just used brute force: try adjusting every X value by 2 pixels, render them all, and check which matches the target (it turns out to be the 63 numbers from the start of the polygon). Visually not much has changed, but everything is back in the right place:

Next let's sort out the text at the bottom. It's going to be a little trickier than just replacing the text in the DNA, because the replacement text is longer. However, when everything went German, the text changed to "Endo hat gemorpht" which is exactly the right length for our replacement. So perhaps we can use just that bit of the code instead. Here's the relevant area of code:

0x9444 <blue+0x4b0>[:9216] = fontTable_Cyperus[:9216]
0x94f0 <blue+0x498> := 285
0x953c <blue+0x480> := 570
0x9588 <blue+0x000>[:162] := "Endo hat gemorpht"
0x968d CALL drawString
0x9746 GOTO 0x9c16
0x9767 colorTable := ...
0x98a2 charColorCallback := useColorTable,6709
0x9908 PUSHS 10416
0x994f <blue+0x4b0>[:9216] = fontTable_Cyperus[:9216]
0x99fb <blue+0x498> := 285
0x9a47 <blue+0x480> := 570
0x9a93 <blue+0x000>[:108] := "Morph Endo!"
0x9b5d CALL drawString
0x9c16 RNA: compose

Immediately after the PUSHS at 0x9908, we'll make a (backwards) jump to 0x9444, which will print the longer piece of text (and which will then safely jump forwards to 0x9c16). What does a backwards jump look like? Unlike a forward jump, the target code doesn't exist in the red zone, so we have to restore it from the green zone. We copy the code from the jump target to just after the jump instruction. As with forward jumps, we can use two passes to solve the problem of not knowing where the jump instruction ends until after we've compiled it. In my implementation, it ends up looking like this:

(![5074344](![1358])) -> (0)(1)

Let's get the sun back into the sky. We saw in Part 1 that we could see it, in the right place but the wrong shade of yellow, by setting weather to 3. From the target picture (particularly with the anticompressant removed) one can see that the sun is the same shade as the flowers in the bottom-right. Disassembling flowerbed shows that colour to be colorSoftYellow. So we need to somehow call that before drawing the sun. And as luck would have it, there is some code we don't need or want in sky-day-bodies just before drawing the sun, which is to check the value of weather!

0x072c IF weather != 3 GOTO 0x0aae
0x07ac PUSHS 48
0x07eb <blue+0x018> := 480
0x0837 <blue+0x000> := 20
0x0883 CALL setOrigin
0x093c CALL sun
0x09f5 CALL resetOrigin
0x0aae RNA: compose

There is plenty of room in that weather check to overwrite it with a function call, but we can also keep the prefix length down by not calling the whole function and instead just copying the RNA (which saves having to jump to the blue zone and write a return address). The recipe for this looks exactly like the backwards jump from above: we're copying code from the green zone to the front of the red zone, without disturbing the remaining code of the current function. Immediately after that we have to insert a forward jump to 0x07ac to skip over the remnants of the original code.

Unfortunately on its own this won't fix the colour of the sun, because sun starts by setting the colour:

0x000a RNA: reset
0x0014 RNA: yellow

We can fix this either by modifying the call to sun to enter after the unwanted RNA, or just alter the RNA to become nonsense RNA that will be ignored. With that in place (and don't forget to replace the whole sun function with the XOR of flower and sunflower first):

Next let's sort out the spirograph at the centre of the sun. As we saw, the code for it is in main, but we need a way to call it. A powerful technique is to patch the return addresses of function calls: this allows you to resume execution anywhere, even in another function, rather than with the next instruction. We're going to want to be able to control the position, so it'll be useful to get a call to setOrigin before executing the desired code. As it happens, crater has a call to setOrigin very early on. So instead of disabling the call to crater, let's leave it enabled and hijack it to draw the spirograph.

Change the return address at crater+0xf5 to main+0x6f9a, which starts drawing the central spirograph. It's completed at main+0x866d, which has a compose RNA to balance the add RNA from the start of crater. We still need a resetOrigin to balance the setOrigin added by crater. So add a forward jump to main+0x8ae7 (which is a call to resetOrigin).

After that resetOrigin call, we don't want to run the rest of the code in main. We could add another forward jump from after it to the end, or change the return address, but there is another trick we can use that requires a smaller change: we can remove the return address entirely, turning this into a tail call so that the callee returns directly to the caller. This isn't completely trivial: if we just terminate the template early then the skips in the call instruction will go to the wrong place (because we'll have consumed less of the red zone than expected). So we have to remove the return address without changing the length of the rule. We can do that by replacing the first three bases of the return address with IPP and the last with P. This turns this part of the template into (n,0), where n is some very large number. The specification says that such substitutions are simply ignored.

That gives us this (don't forget to remove the code that disabled the crater):

Clearly both the size and position will need to be fixed. We'll sort out a whole bunch of positions later, but let's look at the size. The first argument to spirograph is called magnify, which is a bit of a clue. One option is just to change it from 4 to 1 each time spirograph is called. If you want to make a smaller patch, one can look inside spirograph. It multiplies the next two arguments (radiusSum and radiusMoving) by magnify, writing the results back in place. We'd like to disable that code. The writeback looks like this:

(![7519442])(![24]) -> (0)
+ (![7519707])![24] -> (0)<t:(1,1)>

The first line fetches and removes the return value left by mulInts, and the second line uses it to replace an existing value. We can't disable the whole rule (remember, the + indicates that the second line is really a rule emitted by the template on the first line), but if we disable the second line it will have the desired effect. We can do this in our usual way (writing a P just before the end of the number in a jump), except that this time everything is quoted by an extra level, so we write IC instead.

While we're calling into odd bits of code to draw missing picture elements, let's sort out the whale spout. We can again pick a function that we suppressed earlier and repurpose it. We have to be a little careful in our choice though, because it needs to be drawn before the whale (which overwrites part of it). I chose to use lambda-id. For this part I haven't tried to be optimal, just to get something working. Replace the start of lambda-id with a tail call to printGeneTable+0x16ef5 (which sets the location before drawing the spout). At printGeneTable+0x23c62 place a jump to printGeneTable+0x271dc (the return statement). The return statement contains a ![1] to pop the boolean argument from the stack; change it to ![0] (without changing the length). In the RNA compression table, swap the entries for clockwise and counter-clockwise rotation. Don't forget to re-enable the call to lambda-id.

Well that went badly wrong. I'm not sure exactly why, but when objects wrap around (particularly to negative positions) it seems to confuse the code that keeps track of the current position. It's probably time to fix a whole bunch of positions. This is tedious work: find the code that calls moveTo or setOrigin, check pixel positions in an image editor, figure out how much to adjust the position by, and make the patch. In some cases one needs to provide a negative position to setOrigin so that the actual drawn position is in range (encoded using 2's complement). I'll just provide a list of instruction offsets (for the first of two instructions that set x and y) and the correct values:

scenario+0x3ba5: (267, 210) (caravan)
scenario+0x862c: (410, 200) (whale)
scenario+0x3157: (171, 410) (chick)
crater+0x005d: (-133, 147)
printGeneTable+0x16f34: (385, -75) (spout)
flowerbed+0x0049: (34, 0) (flowers in bottom right)
flowerbed+0x04c8: (0, 0)
flowerbed+0x0951: (58, 12)
flowerbed+0x0dda: (17, 24)
clouds+0x0049: (20, 25)
clouds+0x03df: (180, 55)
clouds+0x077f: (340, 30)
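For the negative coordinates in that list, the value to patch in is the 2's complement of the number. Here is a small Python sketch of the conversion; the function names are hypothetical, and the bit width is left as a parameter because it depends on how many bases the integer occupies in the instruction being patched (the 24 used in the usage example is only an illustrative assumption).

```python
def to_twos_complement(value, bits):
    """Return the non-negative integer that represents `value` in `bits` bits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1))
    if not lo <= value < hi:
        raise ValueError(f"{value} does not fit in {bits} bits")
    return value & ((1 << bits) - 1)


def from_twos_complement(raw, bits):
    """Inverse of to_twos_complement: recover the signed value."""
    if raw >> (bits - 1):          # top bit set means negative
        return raw - (1 << bits)
    return raw


if __name__ == "__main__":
    bits = 24                      # assumed width, adjust to the real field
    for coordinate in (-133, -75, 147):
        print(coordinate, to_twos_complement(coordinate, bits))
```

The resulting non-negative number is then written into the instruction in the same way as any other integer.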

We also need to fix the sizes of the clouds (in practice, you'd do this before trying to fix the positions):

clouds+0x56e: 10
clouds+0x90e: 20

That makes things look a lot better:

What do we still need to fix?

The speech bubble and the swimming pool are pretty clear, but there is also a little bit of red at the base of the windmill, because the chick is drawn before the windmill and so the top of its head is clipped. Let's sort that out first. We need to call the code that draws the chick at some point after the windmill. There are probably lots of ways to do this, but I chose one that also lets us eliminate the rain at the same time without a separate patch. At scenario+0xa2b7 (the instruction before calling cloak-rain), change the return address to scenario+0x32a8 (which is the start of the call to chick). Then after 0x346d, jump forward to 0xa429 (just after the call to cloak-rain). The chick now takes its position from the position set at 0xa21f, so the position needs to be patched there instead of where it was originally patched.

We also need to prevent the chick from being drawn earlier, since then the forward jump will skip right over the windmill (and a lot of other things besides). At 0x273c there is an instruction to set ducksShown to true, which is later checked to decide whether to draw the chick. We can disable that in the same way we've disabled other instructions that set a boolean to true.

Now let's change the λ to a μ. The speech bubble is drawn in the function balloon, and the relevant bit of code is

0x16e4 PUSHS 10416
0x172b <blue+0x4b0>[:9216] = fontTable_Tempus-Bold-Huge[:9216]
0x17d7 <blue+0x498> := 98
0x1823 <blue+0x480> := 20
0x186f <blue+0x000>[:18] := "L"
0x18d5 CALL drawString

In Part 1 we swapped out the font on help page 10646 to see different fonts, but if you try this with Tempus-Bold-Huge it'll draw the lambda, then (in my implementation) crash out with an integer overflow. I'm left with this:

So there may be something wrong with this font. The symbol table includes charInfo_Tempus-Bold-Huge_, charInfo_Tempus-Bold-Huge_L and charInfo_Tempus-Bold-Huge_M; maybe one of them is broken. On extracting the DNA from the first two we see that there are no F's and that the P's are generally preceded by either C or P, which suggests they are sequences of variable-length numbers. We were told that the RNA compressor was used for font tables, so this isn't surprising. However, the last one seems almost random, with a mix of all four bases.

Random... or encrypted? In Part 1 we found one password that we haven't found a use for yet: no1@Ax3. And indeed, if we decrypt charInfo_Tempus-Bold-Huge_M with it, the font table looks healthy again, and we have our μ.

Now we can change the "L" to an "M" where it is used:

Getting the shape of the speech balloon right is much harder. As far as I'm aware, the right shape isn't present in Endo's DNA. One approach is to reverse-engineer the polyline that forms it, and replace the existing polyline. I took a different approach: drawing the outline directly with RNA. However, RNA is rather unwieldy (requiring 10 bases per command), so some compression is in order. We'll start without it though.

Firstly we'll extract the shape of the balloon. Start at some arbitrary point inside the balloon, and do a flood-fill. The inside isn't quite a uniform colour, so match anything that's close enough to gray e.g. difference between min and max channel is at most 50.

Next, identify just the border pixels: those that are inside the balloon, but share an edge with the outside.
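These two steps can be sketched in a few lines of Python. This is an illustration under assumptions, not the code I actually used: the image is taken to be a list of rows of (r, g, b) tuples, the fill is 4-connected, the starting pixel is assumed to be inside the balloon, and the helper names are made up for the example.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))   # pixels sharing an edge


def is_greyish(pixel, tolerance=50):
    """Close enough to grey: min and max channel differ by at most `tolerance`."""
    r, g, b = pixel[:3]
    return max(r, g, b) - min(r, g, b) <= tolerance


def flood_fill(image, start, accept=is_greyish):
    """Return the set of (x, y) reachable from `start` through accepted pixels."""
    height, width = len(image), len(image[0])
    region = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in NEIGHBOURS:
            nx, ny = x + dx, y + dy
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in region and accept(image[ny][nx])):
                region.add((nx, ny))
                queue.append((nx, ny))
    return region


def border(region):
    """Pixels of the region that share an edge with something outside it."""
    return {
        (x, y)
        for (x, y) in region
        if any((x + dx, y + dy) not in region for dx, dy in NEIGHBOURS)
    }
```

Calling border(flood_fill(image, (x, y))) with any (x, y) inside the balloon gives the set of outline pixels; putting them into a linear order is a separate step.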

These pixels can be arranged in a linear order with each adjacent (possibly diagonally) to the previous one. In most cases it's obvious which pixel to go to next, but a little care is needed at the bottom corner where the path loops back on itself. We can trace out this shape using RNA, using the commands to move forward and turn left and right. Before leaving a pixel, issue a mark command, and after arriving at the next pixel, issue a line command. It doesn't really matter where you start (we'll add a moveTo call beforehand to get us there), but it's important to end facing east (which is also the starting direction) and to end in the same place you started, as otherwise the higher-level code will be confused about the current location.

Once we've done some compression we'll be able to patch the balloon function in place, but for now we don't have enough room. We'll pick some other function we don't need and overwrite it (I used contest-2005). Overwrite it with a call to moveTo (use 0, 0 as coordinates for now), then the RNA, then a return that pops 24529 from the stack (look at the end of drawPolyline to see what that looks like). Now in balloon, replace the call to drawPolyline with a call to the replacement function. The result will depend on which point on the boundary you picked as a starting point; I used the left-most point on the top row.

Definitely not a complete success, but we can see the top-right of the balloon (if it looks a bit out-of-shape, that's just because you're seeing the sail of the windmill through the hole), and it tells us a bit about how it is being drawn. It's a little hard to see, but that white rectangle isn't a uniform colour: it has a gradient to it. The balloon shape is then used as a mask on this gradient rectangle.

We can see this more clearly if we apply a transformation to the colours to make gradients more apparent. I wrote a small tool that multiplies R by 11, G by 17 and B by 29 (all wrapping modulo 256). The anticompressant really interferes with this, so here's the result for the image above but with the anticompressant disabled.
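The transformation itself is tiny. A sketch in Python, assuming pixels are (r, g, b) tuples of 0..255 integers and the image is a list of rows (the function names are just for illustration):

```python
def amplify(pixel, factors=(11, 17, 29)):
    """Multiply each channel by its factor, wrapping modulo 256."""
    return tuple((channel * factor) % 256 for channel, factor in zip(pixel, factors))


def amplify_image(image):
    """Apply amplify to every pixel of an image given as rows of (r, g, b)."""
    return [[amplify(pixel) for pixel in row] for row in image]
```

Because the factors are odd, the mapping on each channel is a bijection, so no two distinct colours collapse into one; a change of a single level in a channel turns into a jump of 11, 17 or 29 levels, which is what makes faint gradients visible.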

And the target, with the anticompressant reversed:

Since we can only see a small part of the gradient in the target it's not that easy to determine where the rectangle should move to. What's more, if you try to line them up you'll find that they seem to have different slopes. It's worth looking at how the gradient is generated. balloon calls drawGradientCornerNW. Before doing so, it sets a special flag colorReset to false. This prevents the colour callback (in this case, colorWhite) from resetting the bucket before adding a colour. So as drawGradientCornerNW proceeds, it will keep adding more white to the bucket, gradually changing the overall colour. As the colour bucket gets fuller, each new addition has less relative impact, which is why the gradient is strongest at the top-right and gets gentler towards the bottom left.

drawGradientCornerNW works by setting a mark at the current position (the NW corner), moving to the NE corner, then stepping down the east edge, drawing a line to the mark at each pixel, before repeating the process along the bottom edge. On the target we can get an estimate of where the NW corner is by extending the lines of colour in the image to their intersection, and we can estimate how wide the rectangle is by looking at the slopes of the lines and comparing them to the slopes in our generated image. Getting the height is trickier, because we only get a little information from the stem of the balloon, but we know it has to extend at least to the bottom of the balloon and can just try multiple options from there. It turns out that the rectangle is exactly the bounding box of the balloon, which is also something you might have guessed and tried.

So, we're going to make a few changes:

Change the coordinates of our call to moveTo to 67, 0, to move the bubble to the correct position relative to the box.
Change the coordinates of the setOrigin call at scenario+0x900b to 198, 324 to place the NW corner of the box correctly.
Change the arguments to drawGradientCornerNW in balloon to 101, 169 to set the size of the box.
Change the coordinates of the moveTo at balloon+0x13ca to 74, 37. This is the point from which the balloon is filled, and we need it to be somewhere inside the balloon. There are lots of choices, but this one requires changing only a single bit.

Nearly there! We just need to shift the μ to the right place. At balloon+0x17d7, change the coordinates to 56, 2.

That just leaves the swimming pool for the whale. Once again, we can repurpose a function that we previously suppressed. This time it needs to occur after the whale, so we'll use endocow. In Part 1 we saw something similar to what we need, when we set the weather to 2 but before activating the VMU. Here it is again:

Notice that the cupola of the UFO is half-filled with water. It's also the shape we need, but we need it upside down. Let's see if we can get just that part into our scene, without trying to flip it over.

As usual, we'll rewrite a return address in endocow so that the return jumps into the code we actually want. endocow starts with a call to setOrigin, so we'll change the return address of that to jump to ufo+0x0a08, which calls water. For now we'll also change the coordinates in the setOrigin call in endocow to 55, 42, to match those that would have been used for the water if we'd run through ufo from the top.

The code that follows in ufo needs a bit of explanation: it first draws the shape of the cup (inline) and uses it to clip the water to the right shape. It then draws the shape again (again, inline, rather than by calling ufo-cup) partially transparent, and composes it. That code ends at 0x1c30 with a jump to 0x1f53. We don't want to run any of the subsequent code, so replace 0x1f53 with a jump to the return statement (at 0x2c77).

The cup is visible (between the balloon and the windmill), but the cow has disappeared. This happened because by jumping into the middle of ufo, we've messed up the image layers. If you step through the RNA a piece at a time, one can see the water being drawn on the same layer as the cow, and when it is clipped, the cow is clipped away. So we need to add an extra layer. Fortunately, endocow starts with a piece of junk RNA to identify the function, which we can change to an add RNA.

Now let's flip the cup upside down. We'll need to wedge in two turns between drawing the water and drawing the first instance of the cup. We can overwrite the CFPICFP marker RNA in the return of water. For the polyline used for clipping, the colour doesn't matter, so we can replace the white RNA at ufo+0x0b8e with a second turn. We also need to restore the original orientation afterwards. Since we added a jump at ufo+0x1f53, we can just insert two turn RNAs at the start of that jump.

That's flipped the cup upside-down. The water now appears to be filling the top instead of the bottom. That's because the water is just a polyline, and we're now seeing the bottom edge instead of the top. More seriously of course, we've messed up the position tracking so a number of elements of the scene are now in the wrong place. This is not unexpected when we flip the direction under the hood.

One way to fix this is to make sure that the movements done while flipped bring us back to where we started. The last movement before the flip is a call to moveTo in water; the last before we flip back is another call to moveTo in ufo. We don't need to change the coordinates of either: the one in water is called within a setOrigin/resetOrigin pair (one started in endocow), and we can bring them into alignment by adjusting those coordinates. Specifically, the coordinates that we changed to 55, 42 earlier instead need to become 48, 8 for the arithmetic to balance.

This has also shifted the water relative to the cup, and we'll need to adjust it further. But haven't we just pinned down the knob we have to adjust these relative positions? There is another knob: polylines have a starting position, encoded as the 2nd and 3rd numbers in the list (refer back to repair guide page 8). That is currently 56, 65; it'll take a little trial and error to get it right (you first need to get it close enough to see some reference points), but these need to be replaced by 62, 72. Don't forget to update both copies of the polyline.

There's just one more step! At bmu+0x45ee, we need to change the position from (160, 310) to (372, 257) to place the cup in the right position.

If you made it this far, thanks for reading, and if you've actually produced a perfect prefix yourself, congratulations!

I had originally planned to write a Part 3 which showed how to optimise the prefix, but this series has already become very long and I need a break. I've also incorporated a lot of tricks into Part 2. Jochen Hoenicke's page describes his incredibly short prefix, and you can probably learn a few things from it (I certainly did: several ideas presented here are originally his). If I do get back to it one day, the interesting parts would be

Compressing the RNA for the balloon. I've written a decompressor that maps each of I, C, F and P to a short sequence of integers, after which the standard RNA decompressor is used to decompress those into RNA. It works out to 773 bases for the decompressor and data (but requires a separate moveTo call first). If one could find the right tradeoff between complexity of the decompressor and length of the data it might be competitive with Jochen's polygon decompressor.
Writing a code reverser to turn duolc into cloud and an XORer to fix sun, in DNA.
Writing a patcher that takes a compact table of patches to apply and runs a loop to apply them.

Disassembling Endo, part 1

2021-04-21T12:56:00.004-07:00

Since it was recently Easter, I decided to revisit what's probably my favourite easter egg hunt / programming contest ever: ICFP 2007. It's a problem where you are given a description of a strange virtual machine together with a large and slightly damaged program for it, and have to modify it to make it produce the right output, with reward for the most minimal modifications. The original contest was 72 hours, but this time around I'm spending a lot more time on it because I really wanted to investigate all the intricacies and solve all the puzzles.

I'm going to make a walkthrough on this blog since I find it interesting to look back at the chain of clues to see just how much depth the problem authors put into it. At the time our team wrote a blog post which described how far we got, and it has links to a lot of similar writeups, but I'm not aware of a complete walkthrough. The link above has a report from the authors that briefly describes a few of the puzzles, but not everything. Jochen Hoenicke also has a page listing a highly optimised solution which perfectly reproduces the target image, but it's not a walkthrough: without having tackled the problem yourself the explanation will not make much sense.

Like any walkthrough, this blog is going to be full of spoilers. If you want to try the problem yourself, stop reading now and have a go, and come back when you get stuck.

I plan to write this in 2-3 parts. Part 1 will find most of the hidden clues you need. Part 2 will dive deeper into the internals of the code to understand what it does and how we can write new code to achieve our goals, aiming to finish up by perfectly creating the target scene. If I still have the energy I'll write a Part 3 to look at optimising the prefix to improve Endo's chances of survival.

Ok, onwards! I assume you've read the problem statement. I'll be writing a lot of pattern matching rules, so I should explain my syntax. It's basically the syntax used in the problem statement, but tweaked to make it plain ASCII that can be parsed unambiguously:

*xxxxxxx* is RNA (IIIxxxxxxx in the DNA)
?[xxx] is search for xxx
![n] is skip n
() in patterns for bracketing subexpressions
(n,l) in templates for a reference to the nth element of the environment at protection level l (note: they appear in the opposite order in the DNA)
(n) is shorthand for (n,0)
|n| for the length of the nth element of the environment.

So clearly there are two things you'll need to implement: the DNA → RNA conversion, and the rendering of RNA. The first is by far the trickier of the two. Conceptually it's not too hard, but it needs to be implemented carefully or it'll take hours to run. They're not kidding when they say you need to be able to do large jumps in sub-linear time, as well as pasting large strings when expanding the template. You're going to be running the simulator a lot, so if it's taking more than 30 seconds then it's going to be worth doing some optimisation early on. Here are a few hints:

The search operator (?) is fairly rare, and the strings searched for tend to be quite short (less than 20 characters). Don't spend time implementing fancy string searches like Boyer-Moore. There are a few puzzles which will take much longer to run if this is horribly inefficient.
It's quite common that the template ends with an (unquoted) substitution of a subexpression that ends at the end of the pattern, for example, (![123456](![100])) -> (0)(1). In that case you don't need to remove it and add it back again.
The specification doesn't put an upper bound on integer size. You won't need more than 32 bits (although some have more than 32 bases, but with zeros in the high bits). I've actually found it quite useful to crash out when this is violated, since it indicates that I've done something wrong.
Don't worry if you get some RNA that doesn't match any of the valid combinations. It's normal.

I ended up implementing strings as a binary tree of string views, pointing at either the original DNA, or dynamically allocated strings (basically a rope). I also tried ensuring balance of the tree by using a treap, but it didn't make much difference except in a few special cases. A useful optimisation was to collapse trees with fewer than a certain number of characters, since it's cheaper to do the copy once than to walk through the tree each time.
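To make the idea concrete, here is a stripped-down rope in Python. It is only a sketch of the data structure described above, not my actual implementation: the class and function names and the collapse threshold of 64 are invented for the example, there is no balancing, and substr recurses, so a very lopsided tree could hit Python's recursion limit.

```python
COLLAPSE_BELOW = 64   # hypothetical threshold: copy small results instead of building a tree


class Leaf:
    """A view of `length` characters of `text`, starting at `start`."""
    __slots__ = ("text", "start", "length")

    def __init__(self, text, start=0, length=None):
        self.text = text
        self.start = start
        self.length = len(text) - start if length is None else length


class Node:
    """Concatenation of two ropes, with the total length cached."""
    __slots__ = ("left", "right", "length")

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = left.length + right.length


def flatten(rope):
    """Copy the rope out into an ordinary string (iterative, left to right)."""
    parts, stack = [], [rope]
    while stack:
        r = stack.pop()
        if isinstance(r, Leaf):
            parts.append(r.text[r.start:r.start + r.length])
        else:
            stack.append(r.right)
            stack.append(r.left)
    return "".join(parts)


def concat(a, b):
    if a.length == 0:
        return b
    if b.length == 0:
        return a
    if a.length + b.length < COLLAPSE_BELOW:
        return Leaf(flatten(a) + flatten(b))   # cheaper to copy once
    return Node(a, b)


def substr(rope, start, end):
    """Equivalent of rope[start:end], sharing storage with the original."""
    start, end = max(0, start), min(rope.length, end)
    if start >= end:
        return Leaf("")
    if start == 0 and end == rope.length:
        return rope
    if isinstance(rope, Leaf):
        return Leaf(rope.text, rope.start + start, end - start)
    mid = rope.left.length
    if end <= mid:
        return substr(rope.left, start, end)
    if start >= mid:
        return substr(rope.right, start - mid, end - mid)
    return concat(substr(rope.left, start, mid), substr(rope.right, 0, end - mid))


def char_at(rope, i):
    while isinstance(rope, Node):
        if i < rope.left.length:
            rope = rope.left
        else:
            i -= rope.left.length
            rope = rope.right
    return rope.text[rope.start + i]
```

Dropping a consumed prefix is then substr(dna, n, dna.length), and prepending the expanded template is a concat; neither touches the bulk of the original string.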

Ok, so assuming you've implemented your simulator correctly, you should see the starting output:

What now? The problem statement gives you a hint that you should try a particular sequence. I suggest also working out what each of these sequences does, as it'll give insight into the machine. You can do it by hand, but it's also useful to have some disassembler functionality in your simulator. In this case it translates to (?[IFPP])F -> (0)P. In other words, find the sequence IFPP, and if there is an F after it, change it to a P. And now we get:

Or of course, you might get some failures. Note that even if something says OK, it might still be broken. My initial implementation had messed up some of the alpha processing so the alpha composition didn't look right.

This is one of the points where it is quite easy to get stuck, just because you haven't yet reached a point where clues start branching out. Probably the easiest way to find the next clue is to slow down the drawing process so that you can see how the image is built up. For example, just before each compose or clip operation, dump the two bitmaps involved. You'll be able to see how each element of the image is first drawn on its own layer before compositing it onto the main image. But you'll also see something else right at the start:

Another way to find that clue was to notice that the DNA starts with a big chunk of RNA; if you decided to try that out while someone was implementing the DNA → RNA program, you would probably see the above. Type that in (carefully!) and let's see what happens. For reference, it decompiles as (?[IFPCFFP])I -> (0)C, so once again we're searching for some marker string and changing the character after it.

Ah, now we're making progress. A field repair guide sounds useful, but I'll show the result of the second clue first, just because it doesn't immediately lead to more clues. The prefix is (?[IFPFI])P -> (0)F, so again, replacing one base (or "acid", as the repair guide calls them) after a marker.

So as promised, it's rotated the earth to face the sun (although the sun itself isn't visible). This looks like good progress! We probably now have a lot more correct pixels than we started with. If you have ImageMagick installed, the compare program is useful to visually check how close one is getting:

Ok, let's get back to the field repair guide. This time the prefix is (?[IFPCFFP])II -> (0)IC. That's pretty similar to a previous one: the same search pattern, but now we're replacing II with IC. And it gives this:

Thankfully from now on we'll see more of that font, which is a lot more readable than the last one. It's also time to start writing our own prefix! While at this stage one can do things by hand, it's worth investing in some tooling as you're going to use it a lot. So I recommend writing something that will take human-readable pattern rules and turn them into DNA. It doesn't need to be super-efficient; I put something together with Python and the pyparsing package.

How do we get to other pages? We've already seen two repair guide pages, both of which skip ahead to a particular pattern and then replace some acids after it. If you open up the DNA in a text editor and find the pattern, you'll see the next few acids are IIIIIIIIIIIIIIIIIIIIIIIP. So according to this encoding description, that's zero, and we've been to pages 1 and 2. At this point one could just edit the DNA directly (either with a text editor or with code), but let's practice writing a pattern to get page 1337: (?[IFPCFFP])IIIIIIIIIII -> (0)CIICCCIICIC:

Now we're really getting somewhere! We can repeat for each of those page numbers (although it won't work for the encrypted ones):

1729 (a taxicab number):

42 (the answer to life, the universe and everything):

112 (I guess because it's the international emergency number?):

10646 (the joke is that ISO 10646 defines the Universal Character Set):

85:

2181889 (not sure what the reference is to):

4405829 (apparently the patent number for RSA encryption):

123456:

That opens up a whole lot of parallel threads to explore, so I'm just going to pick an order. I'm going to start by learning as much about the machine as possible before doing anything much about trying to match the target image. In practice one probably wants to work on both in parallel.

The red/green/blue zone explanation tells us something about how this crazy machine is programmed. Since the blue zone grows and shrinks at one end, it sounds like a stack, and it is. The green zone sounds more like a code or data segment: it can be edited, but you can't go shuffling things around without breaking code that expects to know where things are. And the red zone is "born" from the green zone in the sense that code is copied from the green zone to the red zone to be executed. That makes sense, because once a pattern rule is executed, it disappears from the DNA, so if you want to have reusable code you need to copy it before executing it.

We also have the first page of what looks like a symbol table, and it tells us how to find the green zone: it starts with IFPICFPPCFFPP. Incidentally, how can one be sure that such markers won't accidentally match in the wrong place? You might have noticed when implementing the simulator that in a pattern, IF can be followed by any character; in a template, IF is equivalent to IP; and in both, IIF is equivalent to IIC. It's thus possible to avoid using the sequence IFP in producing any rule (unless you want it in some RNA, but none of the defined RNA sequences need that), and it's a good idea to avoid it in your code too to avoid accidentally messing up patterns that search for such markers.

Can we get the rest of the symbol table? The first symbol (AAA_geneTablePageNr) is a bit of a clue. If we go find the green zone marker (it's 13615 acids into the download), go forward 0x510 acids, and then look at the next 0x18 acids, they're IIIIIIIIIIIIIIIIIIIIIIIP. We've seen before that this encodes 0, so what happens if we change the first acid to a C to encode 1? We'll use (?[IFPICFPPCFFPP]![1283])![1] -> (0)C (combined with the rule to show the gene table). Note that 1283 is not 0x510 (which is 1296): we have to subtract 13 to account for the ? operator taking us to the end of the green zone marker. If you've done it right, you'll get:

And we can continue in the same way (for larger numbers you'll obviously need to replace more bits):

It seems like some of the entries are damaged; we'll eventually want to fix that.

You'll want to start capturing some of this information so that you can pull symbols out to examine or automate modifying them, but don't bother typing in the whole table; later on we'll be able to extract it automatically. But it's worth reading over to look for clues. Some interesting symbols are hitWithTheClueStick, blueZoneStart (which tells you how big the green zone is), and of course all the help-* functions (which suggests there might be more help pages we haven't seen yet). You can also confirm that the value we changed to bring up the sun corresponds to night-or-day, and that the repair guide page number is helpScreen.

The character set page was less than totally helpful, being just a grid of dots. One of the symbols was called fontTable_Dots, so maybe it shows the characters but the font is just dots? We have no idea how font tables work yet, but they're all the same size, so maybe we can just replace it with another one? To do that we'll need to advance to the source font (say, fontTable_Messenger), advance over it (in parentheses, to capture it), advance to the target, and skip over it to consume it, then put back everything else we consumed plus the replacement:

(?[IFPICFPPCFFPP]![610254](![9216])![42728])![9216] -> (1)(0)

Don't forget to add this to the previous prefix that set the repair guide page number (incidentally, you can see this is called helpScreen in the gene list).

Voila! Now we can interpret any text we find. There is another way you could have figured this out: we're told the intergalactic character set has fallen into disuse on Earth, which if you know some history, describes EBCDIC. One look at the Wikipedia page should show you a pattern similar to the dot pattern, although everything is shifted up by 4 lines (and ICS text only goes up to 0xbf). But there are a lot of variations on EBCDIC, so it's helpful to confirm how the punctuation is laid out.

We've also been given a clue that one can go searching for text. We'll start simple by ignoring quoting, and just search for sequences of 9-acid (8-bit) numbers, ending with 255. Regular expressions are good for that, and here's what pops out:

bcdefghij klmnopqrs tuvwxyzA BCDEFGHIJ KLMNOPQRS TUVWXYZ0 123456789a Help! We are a group of computer scientists held prisoner on the remote planet of Utrecht in the Orion nebula by the evil Fuun. The Fuun have already conquered thousands of worlds, and it appears that Earth is next. Their modus operandi is always the same: they organize a fake `programming contest' on the victim world to repair a supposedly disabled Fuun. This enables them to identify that planet's best and brightest minds, who are then eliminated in advance of the actual invasion, leaving the planet defenseless against the Fuun's superior weaponry. You must not, under any circumstances, repair the Endo creature, as he - when reactivated - will surely destroy his `rescuers' and give the attack signal to the Fuun invasion force massing near Sullust. Do not give in to the lure of rewards, monetary or otherwise! * * * It is too late for us, but in the unlikely event that Earth manages to stave off the Fuun invasion, we would appreciate a monument of some sort to honour us (especially since we sabotaged the Fuun DNA by swapping some parabolas). We are: Alexey Rodriguez * Andres Loeh * Arie Middelkoop * Bastiaan Heeren * Chris Eidhof * Clara Loeh * Eelco Dolstra * Eelco Lempsink * Jeroen Leeuwestein * Johan Jeuring * John van Schie * Jurriaan Hage * Maaike Gerritsen * Mark Stobbe * Martijn van Steenbergen * Stefan Holdermans

Well, that's fun ☺. And eventually we'll use the clue about swapped parabolas.
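The raw-text search that produced the message above can be sketched roughly as follows. This is a minimal illustration, assuming each character is an 8-bit little-endian number (I = 0, C = 1) followed by a P, with 255 as the string terminator; the function names and the min_length filter are hypothetical. It returns character codes only, since turning them into letters depends on the character set page.

```python
import re

def decode_nat(acids):
    """Decode little-endian bits (I = 0, C = 1) followed by a terminating P."""
    value = 0
    for position, acid in enumerate(acids[:-1]):
        if acid == "C":
            value |= 1 << position
    return value

def find_strings(dna, min_length=4):
    """Return a list of character-code lists, one per 255-terminated string."""
    found = []
    # A run of back-to-back 9-acid numbers: eight I/C bits and a P each.
    for run in re.finditer(r"(?:[IC]{8}P)+", dna):
        text = run.group()
        current = []
        for start in range(0, len(text), 9):
            code = decode_nat(text[start:start + 9])
            if code == 255:
                if len(current) >= min_length:
                    found.append(current)
                current = []
            else:
                current.append(code)
    return found
```

The min_length filter is there because short accidental matches are common in data that merely happens to look like numbers.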

We can also start searching for text that's been quoted (I -> C, C -> F, P -> IC). That turns up a huge amount of stuff: what looks like some function documentation, some of the text we've already seen (including the gene table, plus we can see names for the damaged entries), a story about Major Imp, some history of previous ICFP contests, and some help pages we haven't seen yet. It's all a bit fragmented and out-of-order, and a lot of it we'll see in more readable form later anyway, so I'm not going to delve into it here. If you're attempting the problem yourself, however, I recommend going through it, as you may find clues that you otherwise struggle to unlock.

Now let's tackle the gibberish on the Fuun security features page. It looks encrypted, but there are clearly patterns that suggest it's not good encryption. What's the worst encryption around? ROT13 of course. Typing it all in to decrypt (or decrypting by hand) is pretty tedious, but having bits of the text available as raw text (albeit incomplete and out-of-order) will help. When you get it decrypted, you can see it describes an encryption algorithm. If you've studied encryption, you'll recognise that this is a stream cipher, the best known of which is RC4, and indeed the description matches RC4. There is unfortunately a bug on this page (one of the very few actual mistakes the problem authors made): Step 5B is missing a lookup of the sum into the table before using the result. This also gives us a hint that we might be able to crack some of the encrypted items by using the purchase codes we can see in the symbol table and the crackKey gene.
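For comparison with the decrypted description, here is standard RC4 as a short Python sketch. This is the textbook byte-oriented algorithm, not a transcription of the Fuun page: how acids are packed into key and data bytes is not covered here and would need to be worked out separately.

```python
def rc4(key, data):
    """Standard RC4: XOR data with the keystream derived from key (both bytes)."""
    state = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]
    i = j = 0
    output = bytearray()
    for byte in data:  # keystream generation
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        # The sum is looked up in the table before use; this is the lookup
        # that the help page is described above as omitting.
        output.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(output)

print(rc4(b"Key", b"Plaintext").hex())  # bbf316e8d940af0ad3
```

Because encryption is just an XOR with the keystream, applying rc4 twice with the same key returns the original data.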

But maybe we're not done with ROT13 just yet. We can see that help-activating-genes doesn't have a corresponding purchase code in the symbol table, so maybe it's encrypted with the simple encryption? Since there are only four characters in the DNA alphabet, it's going to be ROT2 (I → F, C → P, F → I, P → C). Trying to implement that transformation as a prefix will be challenging at this point, but we can just pull out the current value, transform it in a normal language, and then write a prefix that stuffs the replacement into the right place (or just do everything on the host system). That gives us this:

So this tells us a bit more about how to call "genes" (functions) and pass them "adaptations" (arguments). It also tells us something about how returns work. But recall that help page 85 also told us how to use the "adapter" to simplify making function calls.
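For reference, the ROT2 substitution used to decrypt that page is easy to do on the host system. A minimal sketch (the name rot2 is just for illustration):

```python
# I -> F, C -> P, F -> I, P -> C: each acid moves two places around ICFP.
ROT2 = str.maketrans("ICFP", "FPIC")

def rot2(dna):
    """Apply ROT2 to a DNA string; applying it twice gives back the input."""
    return dna.translate(ROT2)

print(rot2("ICFP"))  # FPIC
```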

Let's see if we can crack some of the other encryption. There is one other repair guide page we're told is encrypted (page 84, "How to fix corrupted DNA") and we can see a symbol called help-error-correcting-codes_purchase_code. Let's read it out, then follow the instructions for the adapter to run this pair of rules:

(?[IFPICFPPCFIPP]) -> (0)CPCPFPPCFCIIPFICCFCFIFCC
(?[IFPICFPPCCC](?[IFPICFPPCCC])) -> (0)CIICICCIIICICIICIICCICCPICCICIIIICCICP(1)

The first rule pushes the purchase code (which we've extracted externally) onto the stack, and the second calls the gene crackKeyAndPrint. This will run for quite a long time (minutes) and eventually produces:

Ok, but now how do we apply it? There is a crypt function in the symbol table (and RC4 is symmetric, so the same process will work for decryption), but we don't know what parameters it takes. We won't actually need the information this key unlocks until much later, so you could just wait, or you could take some guesses, or you could implement RC4 in your language of choice and do the decryption outside of the Fuun machine. You'll need to make a guess about how the 8-bit values are split into groups of 4 (little-endian, just like Fuun numbers). If you want to check that it's implemented correctly, just decrypt the purchase key: it should decrypt to III...IIIP i.e., zero. Now we can view page 84:

There isn't a whole lot we can do with this for now. We can try cracking other keys using the other purchase codes in the table, but only one of them is short enough to brute-force in practice (3 characters, which takes very little time in C++), and in fact we won't need to (one thing I felt was really well-designed in this contest is that a lot of clues can be solved in more than one way).

Now that we've seen how to call one function, maybe we should try to call some others to see what they do? But first, let's see if we can extract the symbol table, rather than having to type in addresses. We already saw that the symbol names all appear (quoted) in the raw DNA; maybe the addresses and sizes are too? Let's take some symbol (it might as well be crackKeyAndPrint), find it in the DNA, and dump the surrounding acids. Incidentally, by this point it is highly worth having a library of routines to do text encoding, quoting, unquoting, number parsing and so on. You'll see that the previous 50 acids are:

FCCFCFFCCCFCFCCFCCFFCFFIC
CFFCFCCCCFFCFCCCCCCCCCCIC

which when unquoted and interpreted as numbers are 0x6c9469 and 0x1616 — just what we needed. So we can modify our text-finding regular expression to hunt for symbols, consisting of two quoted 24-base numbers followed by a quoted string. Compared to the table we extracted earlier, this one also has the "damaged" entries!
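The unquoting and number parsing for those two lines can be checked with a few lines of Python. This is a minimal sketch with hypothetical helper names, assuming the quoting shown earlier (I -> C, C -> F, P -> IC, and by extension F -> P) and little-endian numbers terminated by P:

```python
import re

UNQUOTE = {"C": "I", "F": "C", "P": "F", "IC": "P"}

def unquote(acids):
    """Undo one level of quoting; IC must be matched before a lone C."""
    return re.sub(r"IC|[CFP]", lambda m: UNQUOTE[m.group()], acids)

def decode_nat(acids):
    """Decode little-endian bits (I = 0, C = 1) up to the first P."""
    value = 0
    for position, acid in enumerate(acids):
        if acid == "P":
            break
        if acid == "C":
            value |= 1 << position
    return value

address = decode_nat(unquote("FCCFCFFCCCFCFCCFCCFFCFFIC"))
size = decode_nat(unquote("CFFCFCCCCFFCFCCCCCCCCCCIC"))
print(hex(address), hex(size))  # 0x6c9469 0x1616
```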

At this point I wrote a script to just try calling every symbol that looks big enough to be a function (in separate invocations). In each case, follow the call with a call to terminate so that the output isn't overwritten by the original code. Most of them just come out blank, some crashed (by triggering my check for integer overflows), and some ran forever and I had to list them for exclusion. Some of the functions require arguments and we don't know yet what they are, so I just pushed 10 24-acid integers onto the stack first.

That still leaves a lot of useful help pages and clues:

There are some easter eggs, including the story of Major Imp. It also contains the Arecibo Message (which I wasted a lot of time trying to interpret during the contest). You'll also see a lot of the elements of the scene (bits of saucer etc).

Let's start with the prefix we've been given in the photo of the organisers, (?[IFPC])![27] -> (0)ICCICIICPCCCICIICPCICIIIICP. Look for that marker and match things to the symbol table, and we find that we're setting giveMeAPresent, and the value looks like text, which turns out to be OPE. If you decided to try cracking purchase codes earlier you might recognise this as the encryption key for vmu-code, and indeed this prefix gives us that page:

There is a symbol called vmuRegCode, so maybe we need to put it in there? Unfortunately that doesn't seem to have any effect, and if we also try setting vmuMode to some obvious small values we get an error page. We'll come back to this later.

By the way, "Out of Band II" is a reference to a spaceship in "A Fire Upon The Deep", a most excellent piece of science fiction.

The steganography page mentioned yellow dots. If you look at the contest history closely, some of the letters are in yellow. And all of them are either i, c, f or p. Put them all together (chronologically) and you get another prefix, which decodes as (?[IFPFCC])F -> (0)P, and matching that pattern to the symbol table tells us that it's setting hillsEnabled to 1. Unfortunately, while the hills appear, something has gone very wrong:

If you compare this to the target image, you'll also see that the hills aren't in quite the right places. There was the clue earlier about some parabolas being swapped, but we'll need to learn more about how to read Fuun code before we can try to fix that.

What does the steganography page mean about the one image being hidden in the other? One of the simplest forms of steganography is to put information in the least significant bits. If we take each pixel, keep only the LSB of each channel, and multiply it by 255, we get this, showing the hidden image:

So let's try the same thing on all the other images we have. You can easily miss it (I did this time around, and only rediscovered it when reading our team blog again), but the left side of ET contains the number 9546. It's not immediately clear what that's good for though; if you try it as a repair guide page number it won't work, nor is it the encryption code for help-beautiful-numbers.
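The least-significant-bit extraction is simple enough to sketch without any imaging library. The version below assumes the image has already been loaded into rows of (r, g, b) tuples with 8-bit channels; loading and saving the PNG is left out, and the function name is hypothetical.

```python
def lsb_image(pixels):
    """Keep only the lowest bit of each channel and scale it to 0 or 255."""
    return [
        [tuple((channel & 1) * 255 for channel in pixel) for pixel in row]
        for row in pixels
    ]

print(lsb_image([[(254, 1, 3), (0, 0, 128)]]))
# [[(0, 255, 255), (0, 0, 0)]]
```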

One of the symbols is called hitWithTheClueStick, which sounds like it should give us a clue. If you extract it, you'll notice that there are a lot of I's and C's, very few P's, and no F's. Numbers are encoded with I's and C's terminated by a P, so this looks like more data than code. The first chunk also seems to have some patterns in it. Maybe the Arecibo Message is a hint? It was a string of bits that was intended to be wrapped into a bitmap. In that case it was the product of two primes, making it easy to guess the dimensions, whereas we have 12800 bits up to the first P, but some trial and error shows that a 16×800 bitmap is readable:

Well, that's a pretty strong clue that the next chunk is going to be a PNG file. If we assume each 8 characters produces one byte (little endian), we get this:

Ok, so the next chunk is going to be an audio file. We don't know what type, but there are tools for identifying file types from content, which tell us it's an MP3. And it's a voice reading out I's, C's, F's and P's. It's a bit tedious to record (it helps to use some software to slow it down), but produces the prefix (?[IFPFP])![7] -> (0)CCICCCC, which gives us this image:

Maybe some beautiful numbers are repair guide pages? 6, 28 and 496 don't do anything special, but 8128 causes my simulator to crash, so it's definitely special but we don't yet know why.

We also now have documentation on a lot of the functions. One of these is printGeneTable, where we can see a boolean argument controlling integrity checking. We've already been given some hints that various parts of the DNA are damaged, and knowing which parts might help. While we're at it, let's see if we can fix the damaged entries: since they're visible in the source, they can't be that badly damaged. We saw that each symbol name is preceded by the address and size, but what else? In between the symbols are bits of code that look like this:

IPPIICCCPPP
IPPCICCCPICIC

The general pattern seems to be "IPP", a number (which increments by 1 each time), then P or IC, and another P or IC. If we assume the last two are boolean flags, then they're probably quoted (since F is false and P is true, and those become P and IC when quoted). Match them up to the symbols, and one can deduce that these appear after the name, the first boolean indicates whether the entry is damaged or not, and the second probably indicates whether it is a function or data.

So, let's just flip the damaged flag to fix the entries! Wait a second, how do we do that without changing the size? Fortunately the number immediately before is variable-size, so if we change IC to P, we can just insert an extra bit into the number (without changing the value) to make up the space. Here's my Python code for this:

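import re
# The trailing PIC is the number's terminator plus a quoted "true" damaged flag;
# IPP adds a zero high bit to the number and sets the flag to quoted "false".
code = re.sub(
    '([CF]{23}IC[CF]{23}IC(?:[CF]{8}IC){1,}FFFFFFFFICIPP[CI]*)PIC',
    r'\1IPP', code)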

Next, we push a P onto the stack (to enable the integrity checking) and call printGeneTable (followed by terminate), and:

So indeed, some of the functions are damaged. A few of them were encrypted, and we've decrypted them (help-activating-genes, help-error-correcting-codes, vmu-code), but that still leaves more. We also have a few potential passwords (no1@Ax3, 9546, 8128, and Out_of_Band_II), and we now have a tool that will tell us if we've decrypted with the correct password. Some trial and error will show that 9546 decrypts cow-tail and Out_of_Band_II decrypts caravan.

What about cow-spot-middle? There is the interestingly-named cow-spot-middle-ecc symbol, of the same size, and the help page on error-correcting codes suggested we should expect 4 parity bits per 4 data bits. There is also the correctErrors function in the symbol table. You'll find that calling that first (with the right arguments) will cause cow-spot-middle to pass. When calling functions with multiple arguments, keep in mind that the first argument gets pushed to the stack first, and hence is further from the top (front) of the stack; and that integers must be encoded with 24 acids.

What about that page on palindromes? The text fades away, but you can find all the useful information in the raw text. It strongly suggests that some code might have a copy stored backwards as a backup. That would be the same size, so let's see what happens if we group symbols by size. We know cloud is one of the damaged symbols, and there is a symbol of the same size called duolc - cloud spelled backwards! And indeed, copying and reversing duolc over cloud will fix it.

How about sun? There is a function called sunflower, which is exactly the same size, and there are no sunflowers in the picture. In the case of cloud-duolc, the alteration of the name was a clue — so maybe sunflower is somehow a combination of sun and flower? Again a little trial and error is needed to consider the possibilities, but it turns out that it is their XOR (again with I=0, C=1, F=2, P=3).
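Both repairs are simple string operations once the genes have been extracted. The sketch below uses hypothetical helper names and the acid values given above (I=0, C=1, F=2, P=3); since XOR is its own inverse, XORing sunflower with the other ingredient should give back sun, assuming that ingredient is itself intact.

```python
ACIDS = "ICFP"  # I=0, C=1, F=2, P=3

def reverse_dna(dna):
    """Reverse a gene, as when restoring cloud from duolc."""
    return dna[::-1]

def xor_dna(a, b):
    """Acid-wise XOR of two equal-length DNA strings."""
    if len(a) != len(b):
        raise ValueError("genes must be the same size")
    return "".join(ACIDS[ACIDS.index(x) ^ ACIDS.index(y)] for x, y in zip(a, b))

print(xor_dna("ICFP", "PPPP"))  # PFCI
```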

Let's take a break from examining functions and try just poking some of the variables we can see. I'll assume that the functions have been fixed as appropriate. Clearly we need to turn Endo into a cow, and the VMU help page hinted that there is a Biomorphological Unit that might help. What happens if we set enableBioMorph to P (true)?

Well, he's roughly the right shape now, but he's an OCaml. But there is a variable called ocamlrules. Maybe if we change that from P (true) to F (false)?

Clearly the contest organisers had a lot of fun. Well the BMU should be adapting Endo to the local conditions, and both camels and elephants are found in dry areas — maybe if we change the weather (it's a 24-acid number)?

weather=1:

weather=2:

weather=3:

(remember to fix clouds and sun before trying these). Other values of weather don't seem to do anything.

So we can get the clouds to appear (although not in the right places or with the right sizes), together with some unwanted rain and lightning, and a bunch of things suddenly become very German. We can also get the sun to appear, in the right place, but too yellow and without the pattern in the middle. We can get Endo to turn half-way into a cow.

Notice how when weather is 1 or 2, there is also a faint shadow in the outline of the desired cow shape. Maybe the real cow is there but just almost completely transparent? We'll come back to this later.

There is a help page about Lindenmayer systems, and if you've played with them, the weeds (brown sticks on the left horizon) might look related. The impdoc page on lsystem-weeds is also a strong hint that these are L-systems with a random component. You can try calling the function yourself with different depths, but none of the options match the target. Maybe the target uses different random numbers? There is a seed variable, so you can try poking different numbers in there, and indeed the weeds will change. But what is the right value? Well the help page on L-systems refers to The Algorithmic Beauty of Plants, and we know that perfect numbers are considered beautiful, so let's try some of those. And indeed the 4th perfect number (8128) is the right value:

What about the windmill? Windmills rotate, so if anything uses polar coordinates it'll probably be that. Let's write to polarAngleIncr (it also works to call setGlobalPolarRotation). You can use trial and error to home in on the correct angle, or you can recall that Fuuns use 256 angle steps in a circle and measure the angle adjustment you need to make (although you'll still need an experiment or two to determine the sign convention). You'll want to set it to 5:

We haven't yet used the clues about bioMul and super-adaptive genes. It looks suspiciously like functional programming, and it turns out that there is an entire functional language hidden away inside this machine! The Major Imp episodes hinted at this (Imp being short for Imperative). The terminology is a little confusing though. I'll try to explain it, but as I'm not very familiar with functional programming I'll probably make real functional programmers cry.

"Genes" are functions (the imperative functions we've seen so far are also called "genes", but they're not the same). "Adaptation Trees" are expressions. An Adaptation Tree consists of the address of a gene (function) followed by a sequence of arguments. Some of those arguments are themselves expected to be expressions; where an expression is expected, it is always represented by its length. One of the most common genes to use is activateAdaptationTree, which is followed by the address of an adaptation tree in the green zone. In this case, it first evaluates the expression in that adaptation tree (which resolves to a function, because this is a lambda calculus where everything is a function, including integers), which is then invoked with the arguments.

In the symbol table, symbols with the _adaptation suffix are adaptation trees. There are also some functional genes (like k and intBox) that don't have the suffix. We can use the information we've been given about the encoding to see what bioAdd looks like:

[ activateAdaptationTree caseVar1_adaptation
    [ activateAdaptationTree var2_adaptation ]
    [ activateAdaptationTree apply1_adaptation
        [ activateAdaptationTree bioSucc_adaptation ]
        [ activateAdaptationTree apply2_adaptation
            [ activateAdaptationTree bioAdd_adaptation ]
            [ activateAdaptationTree var1_adaptation ]
            [ activateAdaptationTree var2_adaptation ]
        ]
    ]
]

One subtlety I didn't appreciate at first is that the BioNat type isn't a function taking two arguments (although note that currying is prevalent), but a function taking a pair (which is a distinct data type). So working through this and reading the Fuun docs, we see that if var1 is 0, it returns var2, otherwise it returns the successor of something. apply2 will call var1 and var2 on the pair (getting the first and second elements), reassemble them into a pair, then pass it to bioAdd. That is actually somewhat redundant and the whole apply2 subtree could be replaced by using bioAdd directly, but no matter. This matches the description of bioAdd.

That should give us what we need to implement bioMul. Here's my implementation:

[ activateAdaptationTree caseVar1_adaptation
    [ activateAdaptationTree bioZero_adaptation ]
    [ activateAdaptationTree apply2_adaptation
        [ activateAdaptationTree bioAdd_adaptation ]
        [ activateAdaptationTree var2_adaptation ]
        [ activateAdaptationTree bioMul_adaptation ]
    ]
]

In other words, 0 * y = 0, and (x+1) * y = y + x*y. My original implementation had the last two lines swapped around, which makes no semantic difference, but the implementation of this functional language is pretty simplistic and tends to evaluate some expressions multiple times in a way that can cause a stack explosion. I spent a lot of frustrated time debugging integer overflow errors as a result.
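The same recursion, written over ordinary Python integers purely to show the shape of the two adaptation trees (this says nothing about how the Fuun machine represents or evaluates them):

```python
def bio_add(x, y):
    # caseVar1: if x is zero the result is y, otherwise succ(bioAdd(x - 1, y)).
    return y if x == 0 else 1 + bio_add(x - 1, y)

def bio_mul(x, y):
    # 0 * y = 0, and (x + 1) * y = y + x * y.
    return 0 if x == 0 else bio_add(y, bio_mul(x - 1, y))

print(bio_mul(3, 4))  # 12
```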

When you patch the adaptation tree in the green zone, remember that it must be preceded by the length and followed by the length of the rest of the green zone. Unfortunately making this change doesn't seem to have any effect. If you poke around the other adaptation trees for a while you might find biomorph_adaptation, which is a big complicated expression starting with this:

[ activateAdaptationTree enableBioMorph_adaptation
    [ activateAdaptationTree payloadBioMorph_adaptation
        [ activateAdaptationTree fromNat_adaptation
            [ activateAdaptationTree apply2_adaptation
                [ activateAdaptationTree bioMul_adaptation ]
                [ k
                    [ activateAdaptationTree mkSucc_adaptation
                        [ activateAdaptationTree mkZero_adaptation ]
                    ]
                ]
                [ k
                    [ activateAdaptationTree mkSucc_adaptation
                        [ activateAdaptationTree mkSucc_adaptation
                            [ activateAdaptationTree mkSucc_adaptation
                                [ activateAdaptationTree mkZero_adaptation ]
                            ]
                        ]
                    ]
                ]

So it looks like it's constructing some constants (1 and 3) and multiplying them (k takes a value and returns a constant function that always returns that value), and obviously it's depending on us to fix bioMul. But there is also that enableBioMorph_adaptation adaptation right at the top. What does that do?

[ false ]

Aha! Confusingly, it has nothing to do with enableBioMorph. If we change it to [true], then the clumps of grass move into the correct positions:

Looking through the _adaptation symbols, there seem to be a few related to the goldfish. In particular, take a look at goldenFish_adaptation:

[ activateAdaptationTree mkBeforeAbove_adaptation
    [ activateAdaptationTree mkEmp_adaptation
        [ intBox 20 ]
        [ intBox 20 ]
    ]
    [ activateAdaptationTree mkBeforeAbove_adaptation
        [ activateAdaptationTree mkAbove_adaptation
            [ activateAdaptationTree mkBeforeAbove_adaptation
                [ activateAdaptationTree mkGoldfishL_adaptation ]
                [ activateAdaptationTree mkBeforeAbove_adaptation
                    [ activateAdaptationTree mkEmp_adaptation
                        [ intBox 8388606 ]
                        [ intBox 8388607 ]
                    ]
                    [ activateAdaptationTree mkGoldfishR_adaptation ]
                ]
            ]
            [ activateAdaptationTree mkBeforeAbove_adaptation
                [ activateAdaptationTree mkEmp_adaptation
                    [ intBox 45 ]
                    [ intBox 24 ]
                ]
                [ activateAdaptationTree mkGoldfishL_adaptation ]
            ]
        ]
        [ activateAdaptationTree mkBeforeAbove_adaptation
            [ activateAdaptationTree mkEmp_adaptation
                [ intBox 0 ]
                [ intBox 24 ]
            ]
            [ activateAdaptationTree mkAbove_adaptation
                [ activateAdaptationTree mkGoldfishR_adaptation ]
                [ activateAdaptationTree mkBeforeAbove_adaptation
                    [ activateAdaptationTree mkEmp_adaptation
                        [ intBox 25 ]
                        [ intBox 29 ]
                    ]
                    [ activateAdaptationTree threeFish_adaptation
                        [ activateAdaptationTree mkGoldfishR_adaptation ]
                        [ activateAdaptationTree emptyBox_adaptation ]
                    ]
                ]
            ]
        ]
    ]
]

This is describing the layout of the goldfish in some way. You don't need to understand all the details. Just try changing the various numbers to see what happens. Also try changing some of the goldfish from L to R or vice versa. You can get things mostly correct this way, but there just aren't enough goldfish. But notice that emptyBox_adaptation right at the end: what if we change it to mkGoldfishR_adaptation?

The full solution is to swap the direction of all the fish, change the 0 to 18, and swap out the empty box as above. Now we have this:

That's it for the functional programming language in terms of fixing up the scene. While trying to track down bugs in my bioMul implementation I ended up learning a lot more about it, which I'll share just for interest:

True is a function taking two arguments and returning the first, while false takes two arguments and returns the second.
Zero is also a function taking two arguments and returning the first. So in fact, true, k and mkZero_adaptation have identical implementations (for whatever reason the last is an adaptation tree that simply calls an anonymous gene, which seems to be quite common).
Positive integers are functions that take two arguments and call the second with one less than that integer. If this sounds a lot like what caseVar1 does, it is. There is another (undocumented) adaptation called caseNat which takes a single integer (instead of a pair), and its implementation is an identity function.
A pair is a functor that takes a function and calls it with the elements of the pair i.e. λf.f(x)(y).
Apart from the "Nat" type (Peano integers), the language does allow "raw" 24-acid numbers to be used, but only where they're expected (using them in the wrong place causes them to be confused with lengths; you might have already seen this if you try to fully decompile biomorph_adaptation). intBox allows passing such a raw integer to the following expression. fromNat converts a Nat to a raw integer and passes it to the following expression. These are used to interface with imperative code via a special gene called wrapImp (which is in turn wrapped by adaptation trees like wrapSub_adaptation).
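The encodings in the list above translate fairly directly into curried Python lambdas. This is only a sketch of the data representations as described, with hypothetical names; it does not model lengths, raw integers or wrapImp.

```python
# Booleans select one of two (curried) arguments.
TRUE = lambda first: lambda second: first
FALSE = lambda first: lambda second: second

# Zero has the same implementation as true: it returns its first argument.
ZERO = TRUE

def succ(n):
    """A positive integer calls its second argument with its predecessor."""
    return lambda zero_case: lambda succ_case: succ_case(n)

def pair(x, y):
    """A pair takes a function and calls it with both elements."""
    return lambda f: f(x)(y)

def from_int(k):
    n = ZERO
    for _ in range(k):
        n = succ(n)
    return n

_SENTINEL = object()

def to_int(n):
    """Count how many predecessors can be peeled off before reaching zero."""
    count = 0
    while True:
        predecessor = n(_SENTINEL)(lambda p: p)
        if predecessor is _SENTINEL:
            return count
        count += 1
        n = predecessor

print(to_int(from_int(5)))    # 5
print(pair("a", "b")(FALSE))  # b
```

Passing TRUE or FALSE to a pair selects its first or second element, which is presumably how var1 and var2 get at the elements of their argument.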

We won't get a whole lot further by just poking at variables. We'll need to start taking apart some of the code to see how it works. For example, we'll discover that there is another repair guide page that we haven't spotted yet. But that's for Part 2. If you made it this far, I hope you're having fun!

Thoughts on Rust

2020-01-26T02:45:00.003-08:00

I've recently spent a few days learning to program in Rust, and thought I'd write down my thoughts so far. I used programming contest problems as a way to get practical experience, which probably biases things somewhat. For example, I didn't look into multi-threading, testing, or see very much of the standard library.

References

This is obviously the stand-out feature of Rust. It seems like a nice idea, and eliminating use-after-free, double-free, null pointer dereferences, and many forms of memory leak sounds great from a C/C++ perspective. While modern C++ makes it much easier to manage ownership safely, it doesn't really address borrowing.

The mutability rules also sound like a huge win for safe concurrency, but I haven't looked into any of the details.

The one thing I've found non-intuitive is that it seems sometimes references are implicitly dereferenced (e.g., you can add two references to integers) but in other places they aren't (e.g., adding in place to a mutable reference to integer).

I haven't really tried using references very heavily: my contest coding style tends not to use a lot of pointers anywhere, preferring standard library structures like vectors.

Types

I found it really annoying that there is no automatic coercion between integer types. What's worse, the typecast operator ("as") doesn't follow the same rules as arithmetic operations: arithmetic panics on overflow in debug builds, whereas "as" silently wraps or truncates. It's also annoying that indexing requires an unsigned type. If you have an index (of type usize) and an offset to it (of type i32, possibly negative) it is a real pain to add them to produce a new index.

The type inference also seems like spooky action at a distance: I'm all for omitting the type when declaring a variable and inferring it from the initialiser, but inferring it from usage elsewhere in the code takes some getting used to. It leads to weirdness where changing/removing some apparently unrelated line of code can cause a compilation failure, or even silently change the type of a variable. Given that the indexing requires unsigned, I worry that some variable that might need to store negative values could end up unsigned without one being aware of it just because of an inference chain from an index.

Overloading

Coming from C++, it's disappointing to have no function/method overloading, not even default arguments. It leads to having to invent names for lots of variants of basically the same function. And unfortunately I don't think it can be easily fixed, because of the aggressive type inference: overloading uses the types of arguments to select a function overload, but type inference uses the types of formal parameters to infer the types of arguments. So as soon as you add a new overload, you're reducing the power of type inference and potentially making old code no longer valid.

Traits and generics

While it took a bit of getting used to when coming from traditional OO languages like C++ and Python, I quite like the traits system. It's a lot better than Java interfaces, because you can have default implementations, and you can define new traits and bolt them on to existing classes. It's a much cleaner way to put type bounds on generics than SFINAE in C++, and seems like it probably has many of the same advantages as C++ concepts (not that I've looked at the latest incarnations of C++ concepts).

I also really like the way traits allow you to use a trait with either static dispatch (ala C++ templates) or dynamic dispatch (ala C++ virtual functions), rather than forcing the API designer to choose one or the other. I suspect there are also some performance advantages: in C++ when using a polymorphic class, you pay for a virtual function call even when the exact class is known to the programmer, unless the compiler can also determine that it is known or the programmer uses final classes/methods. With Rust there is no inheritance, so an object whose type is a concrete struct will have exactly that type.

The one thing I disliked is that methods defined in traits end up looking the same as normal methods on the class. If you're not aware that a particular method is actually provided by a trait (or you don't know which trait), it can mysteriously fail to exist if you forget to import the trait into your namespace.

The generics system still has some way to go before it catches up to C++ e.g. there are no non-type template parameters (although it's being worked on), no variadic templates, and from what I could see, no real specialisation to override more generic implementations.

Enums

I think this is one of the more under-rated features of Rust. Rust enums are really discriminated unions (ala boost::variant), with first-class language support. I particularly like the "?" operator: given an expression of type Result (which holds either an "ok" or an error), put a "?" after it, and if it is an error it will immediately return it from the function. This means that although Rust doesn't have exceptions in the same way as C++/Java/Python, one can propagate errors with very little boilerplate, and with the benefit that it's explicit where early returns might happen. It certainly looks nicer than what I've seen of Go.

Performance

I translated a few contest solutions from C++ to Rust, and was pleasantly surprised by the performance: generally faster than the C++ code (which might just be because Rust uses LLVM, which is pretty good and often better than GCC, particularly since Codeforces uses a 32-bit GCC). The one area where it was much worse is writing large outputs to stdout, because Rust always makes stdout line-buffered while GCC makes it fully-buffered when it is not a TTY (which is a known issue in Rust). After wrapping a buffer around stdout the performance was good again.

Summary

In general I like Rust, although I think it still needs a few years to mature before I'd consider abandoning C++ for it. While Go seems to be getting all the popularity, I think a high-performance language really needs to avoid garbage collection and provide a strong compile-time generics system.

COCI 2017/2018 r5 analysis

2018-01-20T10:12:00.004-08:00

For the problem texts, see here. I found this contest easier than usual for a COCI.

Olivander

Given two wands A < B and two boxes X > Y, there is never any point in putting A in X and B in Y; if they fit, then they also fit the other way around. So sort the boxes and wands, match them up in this order and check if it all fits.
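A minimal sketch of the sort-and-match check, assuming there are equally many wands and boxes and that a wand fits when its length is at most the box length (the function name is made up for illustration):

```python
def all_wands_fit(wands, boxes):
    """Return True if every wand can be given its own box."""
    # Pair the i-th shortest wand with the i-th shortest box.
    return all(w <= b for w, b in zip(sorted(wands), sorted(boxes)))
```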

Spirale

This is really just an implementation challenge. For each spiral, walk along it, filling in the grid where the spiral overlaps it. With more than one spiral, set each grid point to the minimum of the values induced by each spiral. Of course, it is not necessary to go to \(10^{100}\); about 10000 is enough to ensure that the grid is completely covered by the partial spiral.

Birokracija

The rule for determining which employee does the next task is a red herring. Each employee does exactly one task, and the order doesn't matter. The money earned by an employee is the sum of the distance from that employee to all underlings. This can be computed by a tree DP (computing both this sum and the size of each subtree).
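The tree DP might look something like the sketch below. It assumes 0-based numbering with the root at 0 and every boss numbered lower than their underlings (so a reverse scan sees children before parents); it computes only the distance sums described above, and any per-employee constant the problem adds is left out.

```python
def underling_distance_sums(parent):
    """parent[v] is the boss of v; parent[0] is ignored (0 is the root).

    Assumes parent[v] < v for all v > 0. Returns (size, dist_sum), where
    dist_sum[v] is the total distance from v to every node below it.
    """
    n = len(parent)
    size = [1] * n
    dist_sum = [0] * n
    for v in range(n - 1, 0, -1):
        p = parent[v]
        size[p] += size[v]
        # Every node in v's subtree is one step further from p than from v.
        dist_sum[p] += dist_sum[v] + size[v]
    return size, dist_sum
```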

Karte

Suppose there is a solution. It can be modified by a series of transformations into a canonical form. Firstly, moving a false claim below an adjacent true claim will not alter either of them; thus, we can push all the false claims to the bottom and true claims to the top. At this point, the true claims can be freely reordered amongst themselves. Similarly, if we have a false claim with small a above one with a larger a, we can swap them without making either true. So we can assume that the false claims are increasing from bottom to top in the deck. Finally, given a true claim with large a and a false claim with small a, we can swap them and they will both flip (so the positions of false claims stay the same).

After these transformations, the deck will, from bottom to top, contain the largest K cards in increasing order, followed by the rest in arbitrary order (let's say increasing). To solve the problem, we simply construct this canonical form, then check if it indeed satisfies the conditions.

Pictionary

When building roads on the day with factor F, we don't actually need to build roads between every pair of multiples of F: it is equivalent to connect every multiple of F to F. This gives O(N log M) roads. We can represent the connected components after each day with a standard union-find structure. For reasons we'll see later, we won't use path compression, but always putting the smaller component under the larger one in the tree is sufficient to ensure a shallow tree (in theory O(log N), but I found the maximum depth was 6). A slow solution would be to check every remaining query after each day to see whether the two mathematicians are in the same component yet.

To speed this up, we can record extra information in the tree: for each edge, we record the day on which it was added. If the largest label on a path from A to B is D, then D was the first day on which they were connected (this property would be broken by path compression, which is why we cannot use it). Thus, to answer a query, we need only walk up the tree from each side to the least common ancestor; and given the shallowness of the tree, this is cheap.
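Here is a sketch of such a structure. The class and function names are invented for illustration, and build_roads simply takes the list of factors in day order as a parameter rather than assuming how the problem numbers the days.

```python
class DatedUnionFind:
    """Union by size, no path compression; each edge remembers its day."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.day = [0] * n  # day on which the edge to the parent was added

    def find(self, v):
        while self.parent[v] != v:
            v = self.parent[v]
        return v

    def union(self, a, b, day):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a  # smaller component goes under the larger
        self.size[a] += self.size[b]
        self.day[b] = day

    def _depth(self, v):
        d = 0
        while self.parent[v] != v:
            v = self.parent[v]
            d += 1
        return d

    def first_connected(self, a, b):
        """First day on which a and b are connected (0 if a == b),
        or None if they are never connected."""
        if self.find(a) != self.find(b):
            return None
        da, db = self._depth(a), self._depth(b)
        best = 0
        while a != b:
            # Always step up from the deeper node; they meet at the LCA.
            if da >= db:
                best = max(best, self.day[a])
                a = self.parent[a]
                da -= 1
            else:
                best = max(best, self.day[b])
                b = self.parent[b]
                db -= 1
        return best


def build_roads(n, factors_by_day):
    """Connect every multiple of each day's factor F to F itself."""
    uf = DatedUnionFind(n + 1)  # index 0 is unused
    for day, f in enumerate(factors_by_day, start=1):
        for multiple in range(2 * f, n + 1, f):
            uf.union(f, multiple, day)
    return uf
```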

Planinarenje

I really liked this problem. Take a single starting peak P. Let A be the size of the maximum matching of the graph, and let B be the size of the maximum matching excluding P. Suppose A = B. Then Mirko can win as follows: take the latter matching, and whenever Slavko moves to a valley, Mirko moves to the matched peak. Slavko can never move to a valley without a match, because otherwise the journey would form an augmenting path that would give a matching for the graph of size B + 1.

Conversely, suppose A > B. Then take a whole-graph matching, which by the assumption must include a match for P. Slavko can win by always moving to the matched valley. By a similar argument, Mirko can never reach an unmatched peak, because otherwise toggling all the edges on their journey would give a maximum matching that excludes P.

To implement it, it may not be efficient enough to construct a new subgraph matching from every peak. Instead, one can start with a full-graph matching, remove P and its match from the graph (if any), then re-augment starting from that match. This should give an O(NM) algorithm (O(NM) for the initial matching, then O(M) per query).
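To make the A versus B criterion concrete, here is a deliberately naive sketch that recomputes the matching from scratch with and without P, using Kuhn's augmenting-path algorithm. It is the slow approach, not the re-augmenting one described above, and the function names and the adjacency-list input format are assumptions for illustration.

```python
def max_matching(adj, num_valleys, skip=None):
    """Size of a maximum peak-valley matching (Kuhn's algorithm).

    adj[p] lists the valleys adjacent to peak p; peak `skip` is left out.
    """
    match_of_valley = [None] * num_valleys

    def augment(p, seen):
        for v in adj[p]:
            if v in seen:
                continue
            seen.add(v)
            if match_of_valley[v] is None or augment(match_of_valley[v], seen):
                match_of_valley[v] = p
                return True
        return False

    return sum(1 for p in range(len(adj))
               if p != skip and augment(p, set()))


def mirko_wins_from(adj, num_valleys, p):
    """Per the argument above, Mirko wins from peak p exactly when A == B."""
    a = max_matching(adj, num_valleys)
    b = max_matching(adj, num_valleys, skip=p)
    return a == b
```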

Analysis of Croatian Open contest 2017r3

2017-11-26T12:26:00.004-08:00

This round of COCI was more challenging than usual, with some of the lower-scoring problems being particularly difficult.

Aron

This was quite straightforward: run through the letters, and each time a letter is different to the previous one (or is the first one), increment a counter.
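As a tiny sketch (the function name is made up, and any adjustment the problem makes to the final counter is left to the caller):

```python
def count_groups(letters):
    """Count maximal runs of equal consecutive letters."""
    groups = 0
    previous = None
    for letter in letters:
        if letter != previous:
            groups += 1
        previous = letter
    return groups
```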

Programiranje

Firstly, one word can be made by rearranging the letters of another if and only if they have the same letter frequencies. So if we have a fast way to find the letter frequencies in any substring, the problem is solved. Consider the sequence whose ith element is 1 if S[i] is 'a' and 0 otherwise: the number of a's in a substring is then just a contiguous sum of this sequence. By precomputing the prefix sum, we can find the sum of any interval in constant time. We thus just need to store 26 prefix sum tables, one for each letter.
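A short sketch of the 26 prefix tables, assuming lowercase input and using 0-based half-open ranges for convenience (the problem's own indexing may differ):

```python
def build_prefix_counts(s):
    """counts[c][i] = number of occurrences of letter c in s[:i]."""
    counts = [[0] * (len(s) + 1) for _ in range(26)]
    for i, ch in enumerate(s):
        for c in range(26):
            counts[c][i + 1] = counts[c][i]
        counts[ord(ch) - ord('a')][i + 1] += 1
    return counts


def same_letters(counts, a, b, c, d):
    """True if s[a:b] and s[c:d] have identical letter frequencies."""
    return all(row[b] - row[a] == row[d] - row[c] for row in counts)
```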

Retro

The first part (finding the length) is a reasonably standard although fiddly dynamic programming problem. At any point in the process, one has a position relative to the original grid and a nesting depth, and needs to know the longest suffix that will complete a valid bracket sequence. For each of the (up to) three possible moves, one can compute the state that would be reached, and find the longest possible sequence from previously computed values.

My attempt at the second part during the contest was overly-complex and algorithmically too slow, involving trying to rank all length-L strings represented by DP states, for each L in increasing order. This required a lot of sorting, making the whole thing O(N³ log N) (where N is O(R + C)).

A simpler and faster solution I wrote after the contest is to reconstruct the path non-deterministically, one bracket at a time. Instead of working from a single source state, keep a set of candidate states (in super-position). From those, run a breadth-first (or depth-first) search to find all states where one might encounter the next bracket. Any transition that is optimal (for length) in the original DP now becomes an edge in this search. Once the candidate states for the next bracket have been found, there might be a mix of opening and closing brackets. If so, discard those states corresponding to closing brackets, since they will not be lexicographically minimal.

Portal

I didn't attempt this problem, and am not sure how to solve it. I suspect it requires coming up with and proving a number of simplifying assumptions which allow the search space to be reduced.

Dojave

I really liked this problem. First of all, M = 1 is a bit tricky, because it's the only case where you can't swap two numbers without having an effect. It's easy enough to solve by hand and treat as a special case.

Now let's count all the cases where the player can't win. Let's say that the lit elements have an XOR of K. For M > 1, we can't have \(K = 2^M - 1\), since otherwise the player could swap either two lit or two unlit numbers and win. Let \(K' = K \oplus (2^M - 1)\), and consider a lit number A. To win, the player has to swap it with B such that \(A \oplus B = K'\). As the RHS can't be 0, the lit numbers must be grouped into pairs with XOR of K'. The XOR of all the lit numbers must thus be either 0 or K', depending on whether there are an even or odd number of such pairs. But this is K (by definition); and since K = K' is impossible, we must have K = 0.

So now we need to find intervals such that the XOR of all elements is 0, and for every pair of complementary elements A and B, they are either both inside or both outside the interval. The first part is reasonably easy to handle: we can take prefix sums (using \(\oplus\) instead of addition), and an interval with XOR of 0 is one where two prefix sums are equal. For the second part, we can use some sweeps: for each potential left endpoint, establish an upper bound on the right endpoint (considering only complementary pairs that lie on opposite sides of the left endpoint), and vice versa - doable with an ordered set or a priority queue to keep track of right elements of complementary pairs during the sweep. Now perform another left-to-right sweep, keeping track of possible left endpoints that are still "active" (upper bound not yet reached), and each time a right endpoint is encountered, query how many of the left endpoints with the same prefix sum are above the lower bound for the right endpoint. Actually getting the count in logarithmic time requires a bit of fancy footwork with data structures that I won't go into (e.g. a Fenwick tree, a segment tree or a balanced binary tree with node sizes in the internal nodes).
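The first condition on its own is easy to sketch: count pairs of equal prefix XORs. This deliberately ignores the complementary-pair condition and the sweeps, so it is only a building block, and the function name is made up.

```python
from collections import Counter


def count_zero_xor_intervals(values):
    """Count non-empty contiguous intervals whose XOR is 0."""
    seen = Counter({0: 1})  # the empty prefix
    prefix = 0
    total = 0
    for v in values:
        prefix ^= v
        # Every earlier equal prefix gives one interval ending here.
        total += seen[prefix]
        seen[prefix] += 1
    return total
```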

Sažetak

I somehow miscounted and didn't realise that this problem even existed until after the contest (maybe because I don't expect more than three harder problems in COCI). It's disappointing, because it only took me about 15 minutes to solve after the contest, yet was worth the most points.

For each K-summary, draw a line between each pair of adjacent segments of that summary. For example, with a 2-summary and 3-summary and N=9, you would get |01|2|3|45|67|8|9|. Clearly in any marked interval with more than one number, none of them can be determined. It is also not hard to see that in any interval with exactly one number, that number can be determined (hint: find the sum of all numbers except that one). Thus, we need the count of numbers X such that X is a multiple of some Ki and X+1 is a multiple of another Kj. Note that if any K=1 then the answer is simply N, which we'll assume is handled as a special case.

If one has chosen a single Ki and Kj, it is not difficult to determine the numbers that satisfy the property, using the Chinese Remainder Theorem (they will be in arithmetic progression, with period Ki×Kj, and so are easy to count). However, this can easily lead to double-counting, and we thus need to apply some form of inclusion-exclusion principle. That is, we can pick any two subsets S and T of the K's, with products P and Q, and count the number of X's such that P divides X and Q divides X + 1. Note that S and T must be disjoint, as otherwise the common element would need to divide X and X+1 (and we assumed Ki > 1). We also need to determine the appropriate coefficient to scale this term in the sum. Rather than trying to derive equations, I let the computer do the work: for each value of |S| and |T| I consider every possible subset of S and of T (actually, just each possible size), add up the coefficients they're already contributing, and set the coefficient for this value of |S| and |T| to ensure that the total is 1.
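Counting the X's for one pair of products P and Q is a small exercise in modular inverses. The sketch below assumes Q > 1 and Python 3.8+ (for the three-argument pow with a negative exponent); the name count_boundaries and the limit parameter are made up, and whether the range should end at N or N - 1 depends on how the positions are indexed.

```python
from math import gcd


def count_boundaries(p, q, limit):
    """Count X in [1, limit] with p dividing X and q dividing X + 1.

    Assumes q > 1.
    """
    if gcd(p, q) != 1:
        return 0  # a common factor > 1 cannot divide both X and X + 1
    # Write X = p * t; then p * t must be congruent to -1 modulo q.
    t = (-pow(p, -1, q)) % q
    first = p * t  # smallest positive solution
    period = p * q
    if first > limit:
        return 0
    return (limit - first) // period + 1
```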

Extra DCJ 2017 R2 analysis

2017-06-12T23:19:00.000-07:00

Flagpoles

My solution during the contest was essentially the same as the official analysis. Afterwards I realised a potential slight simplification: if one starts by computing the second-order differences (i.e., the differences of the differences), then one is looking for the longest run of zeros, rather than the longest run of the same value. That removes the need to communicate the value used in the runs at the start and end of each section.
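On a single node the idea looks like the sketch below (the distributed part, splitting into sections and merging runs across boundaries, is omitted; the function name is illustrative):

```python
def longest_arithmetic_run(heights):
    """Length of the longest contiguous arithmetic progression."""
    n = len(heights)
    if n <= 2:
        return n
    first = [b - a for a, b in zip(heights, heights[1:])]
    second = [b - a for a, b in zip(first, first[1:])]
    best = run = 0
    for d in second:
        run = run + 1 if d == 0 else 0
        best = max(best, run)
    # A run of r zero second-order differences spans r + 2 flagpoles.
    return best + 2
```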

Number Bases

I missed the trick of being able to uniquely determine the base from the first point at which X[i] + Y[i] ≠ Z[i]. Instead, at every point where X[i] + Y[i] ≠ Z[i], I determine two candidate bases (depending on whether there is a carry or not). Then I collect the candidates and test each of them. If more than three candidates are found, then the test case is impossible, since there must be two disjoint candidate pairs.

Broken Memory

My approach was slightly different. Each node binary searches for its broken value, using two other nodes to help (and simultaneously helping two other nodes). Let's say we know the broken value is in a particular interval. Split that interval in half, and compute hashes for each half on the node (h1 and h2) and on two other nodes (p1 and p2, q1 and q2). If h1 equals p1 or q1, then the broken value must be in interval 2, or vice versa. If neither applies, then nodes p and q both have broken values, in the opposite interval to that of the current node. We can tell which by checking whether p1 = q1 or p2 = q2.

This does rely on not having collisions in the hash function. In the contest I relied on the contest organisers not breaking my exact choice of hash function, but it is actually possible to write a solution that works on all test data. Let P be a prime greater than \(10^{18}\). To hash an interval, compute the sums \(\sum m_i\) and \(\sum i m_i\), both mod P, giving a 128-bit hash. Suppose two sequences p and q collide, but differ in at most two positions. The sums are the same, so they must differ in exactly two positions j and k, with \(p_j - q_j = q_k - p_k\) (all mod P). But then the second sums will differ by

\(jp_j + kp_k - jq_j - kq_k = (j - k)(p_j - q_j)\), and since P is prime and each factor is less than P, this will be non-zero.
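A sketch of that hash, using the Mersenne prime \(2^{61} - 1\) as one convenient prime above \(10^{18}\) (the choice of prime and the function name are mine for illustration):

```python
P = 2**61 - 1  # a Mersenne prime, larger than 10**18


def interval_hash(memory, lo, hi):
    """Hash memory[lo:hi] as (sum of m_i, sum of i * m_i), both mod P."""
    total = 0
    weighted = 0
    for i in range(lo, hi):
        total = (total + memory[i]) % P
        weighted = (weighted + i * memory[i]) % P
    return total, weighted
```

Two sequences that differ in two positions but have the same plain sum are then separated by the weighted sum.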

Alternative Code Jam 2017 R3 solution

2017-06-12T00:13:00.001-07:00

This last weekend was round 3 of Code Jam 2017 and round 2 of Distributed Code Jam 2017. I'm not going to describe how to solve all the problems since there are official analyses (here and here), but just mention some alternatives. For this post I'll just talk about one problem from Code Jam; some commentary on DCJ will follow later.

Slate Modern (Code Jam)

The idea I had for this seems simpler (in my opinion, without having tried implementing both) than the official solution, but unfortunately I had a bug that I couldn't find until about 10 minutes after the contest finished.

As noted in the official analysis, one can first check whether a solution is possible by comparing each pair of fixed cells: if the difference in value is greater than D times the Manhattan distance, then it is impossible; if no such pair exists, it is possible. The best solution is then found by setting each cell to the smallest upper bound imposed by any of the fixed cells.
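Written out directly, this gives a brute-force reference that is far too slow for the large input but handy for checking faster code. The input format (a non-empty list of (row, col, value) triples with 0-based coordinates) and the function name are assumptions, and any modulus the problem asks for is left out.

```python
def solve_brute_force(rows, cols, d, fixed):
    """Return the maximum grid sum, or None if the fixed cells conflict."""
    for r1, c1, v1 in fixed:
        for r2, c2, v2 in fixed:
            if abs(v1 - v2) > d * (abs(r1 - r2) + abs(c1 - c2)):
                return None
    total = 0
    for r in range(rows):
        for c in range(cols):
            # Each fixed cell caps this cell at value + d * distance.
            total += min(v + d * (abs(r - fr) + abs(c - fc))
                         for fr, fc, v in fixed)
    return total
```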

Let's try to reduce the complexity by a factor of C, by computing the sum of a single row quickly. If we look at the upper bound imposed by one fixed cell, it has the shape \(b + |x - c| D\), where \(x\) is the column and b, c are constants. When combining the upper bounds, we take the lowest of them. Each function will be the smallest for some contiguous (possibly empty) interval. By sweeping through the fixed cells in order of c, we can identify those that contribute, similar to finding a Pareto front. Then, by comparing adjacent functions one can find the range in which each function is smallest, and a bit of algebra gives the sum over that range.

This reduces the complexity to O(RN + N²) (the N² is to check whether a solution is possible, but also hides an O(N log N) to sort the fixed cells). That's obviously still too slow. The next insight is that most of the time, moving from one row to the next changes very little: each of the functions increases or decreases by one (depending on whether the corresponding fixed cell is above or below the row), and the range in which each function is smallest grows or shrinks slightly. It thus seems highly likely that the row sum will be a low-degree polynomial in the row number. From experimentation with the small dataset, I found that it is actually a quadratic (conceptually it shouldn't be too hard to prove, but I didn't want to get bogged down in the details).

Note that I said "most of the time". This will only be true piecewise, and we need to find the "interesting" rows where the coefficients change. It is fairly obvious that rows containing fixed cells will be interesting. Additionally, rows at which the range in which a function is smallest appears or disappears will be interesting. Consider just two fixed cells, and colour the whole grid according to which of the two fixed cells gives the lower upper bound. The boundary between the two colours can have diagonal, horizontal and vertical portions, and it is these horizontal portions that are (potentially) interesting. I took the conservative approach of adding all O(N²) such rows as interesting.

Now that we have partitioned the rows into homogeneous intervals, we need to compute the sum over each interval efficiently. Rather than determine the quadratic coefficients analytically, I just inferred them by taking the row sums of three consecutive rows (if there are fewer than three rows in the interval, just add up their row sums directly). A bit more algebra to find the sum of a quadratic series, and we're done! There are O(N²) intervals to sum and it requires O(N) to evaluate each of the row sums needed, giving a running time of O(N³). I suspect that this could probably be reduced to O(N² log N) by being smarter about how interesting rows are picked, but it is not necessary given the constraints on N.
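One way to do the "bit more algebra" without ever solving for the coefficients is to use forward differences, which keeps everything in integers. This is a sketch of that approach rather than necessarily the formula used in the contest; the function name is made up and math.comb needs Python 3.8+.

```python
from math import comb


def sum_quadratic(f0, f1, f2, count):
    """Sum f(0) + ... + f(count - 1) for a quadratic f, given f(0..2)."""
    d1 = f1 - f0            # first forward difference
    d2 = f2 - 2 * f1 + f0   # second forward difference (constant)
    # f(t) = f0 + C(t, 1) * d1 + C(t, 2) * d2, summed term by term.
    return count * f0 + comb(count, 2) * d1 + comb(count, 3) * d2
```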

An alternative DCJ 2016 solution

2016-11-22T22:45:00.001-08:00

During this year's Distributed Code Jam I had an idea for solving the hard part of Toothpick Sculptures, but had no time to implement it, or even to work out all the details. When the official solution was presented, I was surprised that it was very different to mine. Recently Pablo from Google has been able to help me check that my solution does indeed pass the system tests (thanks Pablo), so it seems like a good time to describe the solution.

There are going to be a lot of different trees involved, so let's give them names. A "minitree" is one of the original sculptures of N toothpicks, and its root is a "miniroot". The "maxitree" is the full tree of 1000N toothpicks. Let's assume that each minitree is built out of toothpicks of one colour. We can then consider a tree whose vertices are colours, called the "colour tree", which contains an edge between two colours if and only if the maxitree contains an edge between two vertices of those colours.

For each colour, we can find its children in the colour tree by walking the corresponding minitree looking for children which are miniroots. This can be done in parallel over the colours.

For a given vertex, we can use DP to solve the following two problems for the subtree: what is the minimum cost to stabilise the subtree if we stabilise the root, and what is the minimum cost if we do not stabilise the root? But this will not be so easy to parallelise. We can go one step further: if we fix some vertex C and cut off the subtree rooted at C, then we can answer the same two problems for each case of C being stabilised or not stabilised (four questions in total). If we later obtain the answers to the original two queries for C, then we can combine this with the four answers we have to answer the original queries for the full tree.

This is sufficient to solve the small problem, and more generally any test case where the colour tree does not branch. For each miniroot, compute the DP, where the miniroot corresponding to the single child (if any) in the colour tree is the cutoff. These DPs can be done in parallel, and the master can combine the results.

Let's consider another special case, where the colour tree is shallow. In this case, one can solve one layer at a time, bottom up, without needing to use the cutoff trick at all. The colours in each layer of the colour tree are independent and can be solved in parallel, so the latency is proportional to the depth of the tree. The results of each layer are fed into the calculations of the layer above.

So, we have a way to handle long paths, and a way to handle shallow trees. This should immediately suggest light-heavy decomposition. Let the "light-depth" of a node in the colour tree be the number of light edges between it and the root. The nature of light-heavy decomposition guarantees that light-depth is at most logarithmic in the tree size (which is 1000). We will process all nodes with the same light-depth in parallel. This means that a node in a tree may be processed at the same time as its children, but only along a heavy path. We thus handle each heavy path using the same technique as in the small problem. For other children in the colour tree, the subtree results were computed in a previous pass and are sent to the slave.
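Computing the light-depths themselves is straightforward. The sketch below assumes the colour tree is given as a parent array with the root at index 0 and every parent numbered lower than its children, which is a convenience for illustration rather than something the problem guarantees.

```python
def light_depths(parent):
    """parent[v] < v for v > 0; node 0 is the root.

    Returns, for each node, the number of light edges on its path to the root.
    """
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):
        size[parent[v]] += size[v]
    # The heavy child of a node is a child with the largest subtree.
    heavy = [None] * n
    for v in range(1, n):
        p = parent[v]
        if heavy[p] is None or size[v] > size[heavy[p]]:
            heavy[p] = v
    depth = [0] * n
    for v in range(1, n):
        p = parent[v]
        depth[v] = depth[p] + (0 if heavy[p] == v else 1)
    return depth
```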

TCO 2016 finals

2016-11-22T22:37:00.003-08:00

The final round of TCO 2016 did not go well for me. I implemented an over-complicated solution to the 500 that did not work (failed about 1 in 1000 of my random cases), and I started implementing the 400 before having properly solved it, leading to panic and rushed attempts to fix things in the code rather than solving the problem.

Easy

The fact that there are only four mealtimes is a clue that this will probably require some exponential time solution. I haven't worked out the exact complexity, but it has at least a factor of \(2^{\binom{n}{n/2}}\) for n mealtimes.

Let's restrict our attention to meal plans that contain no meals numbered higher than m and have t meals in the plan (t can be less than 4). We can further categorise these meals by which subsets of the mealtimes can accommodate the plan. This will form the state space for dynamic programming. To propagate forwards from this state, we iterate over the number of meals of type m+1 (0 to 4 of them). For each choice, we determine which subsets of mealtimes can accommodate this new meal plan (taking every subset that can accommodate the original plan and combining it with any disjoint subset that can accommodate the new meals of type m+1).

Medium

Let's start by working out whether a particular bracket sequence can be formed. Consider splitting the sequence into a prefix and suffix at some point. If the number of ('s in the prefix differs between the initial and target sequence, then the swaps that cross the split need to be used to increase or decrease the number of ('s. How many ('s can we possibly move to the left? We can answer that greedily, by running through all the operations and applying them if and only if they swap )( to (). Similarly, we can find the least possible number of ('s in the final prefix.
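The maximising greedy can be sketched as follows, assuming each operation is a pair of positions (a, b) with a < b that may be swapped or skipped, in the given order. That input format and the function names are assumptions for illustration.

```python
def push_open_brackets_left(s, swaps):
    """Apply each swap, in order, only if it turns ')' ... '(' into '(' ... ')'."""
    chars = list(s)
    for a, b in swaps:
        if chars[a] == ')' and chars[b] == '(':
            chars[a], chars[b] = '(', ')'
    return ''.join(chars)


def prefix_open_counts(s):
    """counts[i] = number of '(' in s[:i]."""
    counts = [0]
    for ch in s:
        counts.append(counts[-1] + (ch == '('))
    return counts
```

Comparing prefix_open_counts of the target against the result of the greedy gives the upper bounds; the lower bounds follow by symmetry.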

If our target sequence qualifies by not having too many or too few ('s in any prefix, can we definitely obtain it? It turns out that we can, although it is not obvious why, and I did not bother to prove it during the contest. Firstly, if there is any (proper) prefix which has exactly the right number of ('s, we can split the sequence at this point, choose not to apply any operations that cross the split, and recursively figure out how to satisfy each half (note that at this point we are not worrying whether the brackets are balanced — this argument applies to any sequence of brackets).

We cannot cross between having too few and too many ('s when growing the prefix without passing through having the right number, so at the bottom of our recursion we have sequences where every proper prefix has too few ('s (or too many, but that can be handled by symmetry). We can solve this using a greedy algorithm: for each operation, apply it if and only if it swaps )( to () and the number of ('s in the prefix is currently too low. We can compare the result after each operation to the maximising algorithm that tries to maximise the number of ('s in each prefix. It is not hard to see (and prove, using induction) that for any prefix and after each operation is considered, the number of ('s in the prefix is:

between the original value and the target value (inclusive); and
the smaller of the target value and the value found in the maximising algorithm.

Thus, after all operations have been considered, we will have reached the target.

So now we know how to determine whether a specific bracket sequence can be obtained. Counting the number of possible bracket sequences is straightforward dynamic programming: for each prefix length, we count the number of prefixes that are valid bracket sequence prefixes (no unmatched right brackets) for each possible nesting depth.

Hard

This is an exceptionally mean problem, and I very much doubt I would have solved it in the full 85 minutes; congratulations to Makoto for solving all three!

For convenience, let H be the complement of G2 (i.e. n copies of G1). A Hamiltonian path in G2 is simply a permutation of the vertices, such that no two consecutive vertices form an edge in H. We can count them using inclusion-exclusion, which requires us to count, for each m, the number of paths that pass through any m chosen edges. Once we pick a set of m edges and assign them directions (in such a way that we have a forest of paths), we can permute all the vertices that are not tails of edges, and the remaining vertices have their positions uniquely determined.

Let's start by solving the case n=1. Since k is very small, we can expect an exponential-time solution. For every subset S of the k vertices and every number m of edges, we can count the number of ways to pick m edges with orientation. We can start by solving this for only connected components, which means that m = |S| - 1 and the edges form a path. This is a standard exponential DP that is used in the Travelling Salesman problem, where one counts the number of paths within each subset ending at each vertex.

Now we need to generalise to m < |S| - 1. Let v be the first vertex in the set. We can consider every subset for the path containing v, and then use dynamic programming to solve for the remaining set of vertices. This is another exponential DP, with an \(O(3^k)\) term.

Now we need to generalise to n > 1. If a graph contains two parts that are not connected, then edges can be chosen independently, and so the counts for the graph are obtained by convolving the counts for the two parts. Unfortunately, a naïve implementation of this convolution would require something like O(n²k) time, which will be too slow. But wait, why are we returning the answer modulo 998244353? Where is our old friend 1000000007, and his little brother 1000000009? In fact, \(998244353 = 2^{23} \times 7 \times 17 + 1\), which strongly suggests that the Fast Fourier Transform will be involved. Indeed, we can perform a 1048576-element FFT in the field modulo 998244353, raise each element to the power of n, and then invert the FFT.
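For reference, here is a compact sketch of the transform-power-invert idea using a textbook iterative number-theoretic transform with primitive root 3. It is generic library-style code rather than the contest solution: it pads to the smallest power of two that holds the full result instead of using a fixed 1048576 elements, and it assumes a non-empty coefficient list and an exponent of at least 1.

```python
MOD = 998244353
ROOT = 3  # a primitive root modulo MOD


def ntt(a, invert=False):
    """In-place transform of list a, whose length must be a power of two."""
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length *= 2
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD


def poly_power(coeffs, exponent):
    """Coefficients of (polynomial ** exponent) mod MOD; exponent >= 1."""
    result_len = (len(coeffs) - 1) * exponent + 1
    size = 1
    while size < result_len:
        size *= 2
    a = list(coeffs) + [0] * (size - len(coeffs))
    ntt(a)
    a = [pow(x, exponent, MOD) for x in a]  # pointwise power
    ntt(a, invert=True)
    return a[:result_len]
```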

Despite the large number of steps above, the solution actually requires surprisingly little new code, as long as one has library code for the FFT and modulo arithmetic.

TCO 2016 semifinal 2 solutions

2016-11-21T03:59:00.003-08:00

Semifinal 2 was much tougher than semifinal 1, with only one contestant correctly finishing two problems. The first two problems each needed a key insight, after which very little coding is needed (assuming one has the appropriate library code available). The hard problem was more traditional, requiring a series of smaller steps to work towards the solution.

Easy

Consider the smallest special clique. What happens if we remove an element? Then we will still have a clique, but the AND of all the elements is non-zero. In particular, there is some bit position which is 1 in every element of the special clique except for one.

We can turn that around: it suffices to find a bit position B, a number T which is zero on bit B, and a special clique containing T whose other elements are all 1 on bit B. We can easily make a list of candidates to put in this special clique: any element which is 1 on bit B and which is connected to T. What's more, these candidates are all connected to each other (through bit B), so taking T plus all candidates forms a clique. What's more, if we can make a special clique by using only a subset of the candidates, then the clique formed using all candidates will be special too, so the latter is the only one we need to test.

Medium

The key insight is to treat this as a flow/matching problem. Rather than thinking of filling and emptying an unlabelled tree, we can just build a labelled tree with the properties that if A's parent is B, then B < A and A appears earlier in p than B does. A few minutes' thinking should convince you that any labelled tree satisfying these constraints can be used for the sorting.

We thus need to match each number to a parent (either another number or the root). We can set this up with a bipartite graph in the usual way: each number appears on the left, connected to a source (for network flow) with an edge of capacity 1. Each number also appears on the right, along with the root, all connected to the sink with N edges of capacity 1 (we'll see later why I don't say one edge of capacity N). Between the two sides, we add an edge A → B if B is a viable parent for A, again with capacity 1. Each valid assignment can be represented by a maxflow on this network (with flow N), where the edges across the middle are parent-child relationships. Note that normally using this approach to building a tree risks creating cycles, but the heap requirement on the tree makes that impossible.

That tells us which trees are viable, but we still need to incorporate the cost function. That is where the edges from the right numbers to the sink come in. A (non-root) vertex starts with a cost of 1, then adding more children increases the cost by 3, 5, 7, ... Thus, we set the costs of the edges from each number to the sink to this sequence. Similarly, the costs of the edges from the root to the sink have costs 1, 3, 5, 7, ... The min-cost max-flow will automatically ensure that the cheaper edges get used first, so the cost of the flow will match the cost of the tree (after accounting for the initial cost of 1 for each number, which can be handled separately).

Hard

This one requires a fairly solid grasp on linear algebra and vector spaces. The set with \(2^k\) elements is a vector subspace of \(\mathbb{Z}_2^N\) of dimension k, for some unknown N. It will have a basis of size k. To deal more easily with ordering, we will choose to consider a particular canonical basis. Given an arbitrary initial basis, one can use Gaussian reduction and back substitution to obtain a (unique) basis with the following property: the leading 1 bit in each basis element is a 0 bit in every other basis element. With this basis, it is not difficult to see that including any basis element in a sum will increase rather than decrease the sum. The combination of basis elements forming the ith sorted element is thus given exactly by the binary representation of i.
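The canonical basis and the ordering property can be checked on small examples. The following is only a sketch with invented function names, treating each vector as a Python integer whose bits are the coordinates over \(\mathbb{Z}_2\).

```python
def reduced_basis(vectors):
    """Return the canonical basis of the span, sorted in increasing order.

    Invariant: the leading bit of each basis element is 0 in all the others.
    """
    basis = []
    for v in vectors:
        for b in basis:
            # v ^ b is smaller than v exactly when v has b's leading bit set.
            v = min(v, v ^ b)
        if v:
            top = v.bit_length() - 1
            # Clear the new leading bit from the existing elements.
            basis = [b ^ v if (b >> top) & 1 else b for b in basis]
            basis.append(v)
    return sorted(basis)

def ith_sorted_element(basis, i):
    """XOR together the basis elements selected by the binary digits of i."""
    result = 0
    for j, b in enumerate(basis):
        if (i >> j) & 1:
            result ^= b
    return result
```

Starting from the vectors 6 and 5, the reduced basis is 3 and 5, and indices 0 to 3 map to 0, 3, 5, 6, which is the span in sorted order.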

We can now take the information we're given and recast it as a set of linear equations. Furthermore, we can perform Gaussian elimination on these linear equations (which we actually do in code, unlike the thought experiment Gaussian elimination above). This leaves some basis elements fixed (uniquely determined by the smaller basis elements) and others free. Because of the constraints of the basis elements, we also get some more information. If basis i appears in an equation, its leading bit must correspond to a 1 bit in the value (and to the first 1 bit, if i is the leading basis in the equation). Similarly, if basis i does not appear, then it must correspond to a 0 bit in the value. This allows us to build a mask of where the leading bit for each basis element can be.

We now switch to dynamic programming to complete the count. We count the number of ways to assign the first B basis elements using the first C value bits. Clearly dp[0][C] = 1. To add a new basis element, we can consider the possible positions of its leading bit (within the low C bits), and then count the degrees of freedom for the remaining bits. If the basis is fixed then there is 1 degree of freedom; otherwise, there is a degree of freedom for each bit that isn't the leading bit of a previous basis, and it is easy to count these.

There are a few details that remain to do with checking whether there are an infinite number of solutions (which can only happen if the largest basis is unconstrained), and distinguishing between a true 0 and a 0 that is actually a multiple of 1000000007, but those are left as exercises for the reader.

Topcoder Open 2016 Semifinal 1

2016-11-20T09:10:00.005-08:00

Here are my solutions to the first semifinal of the TCO 2016 algorithm contest.

Easy

Let's start with some easy cases. If s = t, then the answer is obviously zero. Otherwise, it is at least 1. If s and t share a common factor, then the answer is 1, otherwise it is at least 2.

We can eliminate some cases by noting that some vertices are isolated. Specifically, if either s or t is 1 or is a prime greater than n/2 (and s≠t) then there is no solution.

Assuming none of the cases above apply, can the answer be exactly 2? This would mean a path s → x → t, where x must have a common factor with s and a (different) common factor with t. The smallest x can be is thus the product of the smallest prime factors of s and t. If this is at most n, then we have a solution of length 2, otherwise the answer is at least 3.

If none of the above cases applies, then the answer is in fact 3. Let p be the smallest prime factor of s and q be the smallest prime factor of t. Then the path s → 2p → 2q → t works (note that 2p and 2q are at most n because we eliminated cases where s or t is a prime greater than n/2 above).
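The whole case analysis fits in a few lines. This is a sketch only: the function names are invented here, and returning -1 when there is no path is an assumption rather than the problem's actual output convention.

```python
from math import gcd

def smallest_prime_factor(x):
    """Smallest prime factor of x by trial division (returns x if x is 1 or prime)."""
    d = 2
    while d * d <= x:
        if x % d == 0:
            return d
        d += 1
    return x

def shortest_path_length(n, s, t):
    """Path length between s and t, where vertices 1..n are adjacent if they share a factor."""
    if s == t:
        return 0
    if gcd(s, t) > 1:
        return 1
    for v in (s, t):
        # 1 and primes above n/2 are isolated vertices.
        if v == 1 or (smallest_prime_factor(v) == v and 2 * v > n):
            return -1  # assumed marker for "no solution"
    p = smallest_prime_factor(s)
    q = smallest_prime_factor(t)
    # s -> p*q -> t if p*q fits, otherwise s -> 2p -> 2q -> t.
    return 2 if p * q <= n else 3
```

For instance, with n = 20 the vertices 5 and 7 need three steps (5 → 10 → 14 → 7), because their smallest common neighbour would be 35.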

Medium

We can create a graph with the desired properties recursively. If n is 1, then we need just two vertices, source and sink, with an edge from source to sink of arbitrary weight.

If n is even, we can start with a graph with n/2 min cuts and augment it. We add a new vertex X and edges source → X and X → sink, both of weight 1. For every min cut of the original graph, X can be placed on either side of the cut to make a min cut for the new graph.

If n is odd, we can start with a graph with n - 1 min cuts and augment it. Let the cost of the min cut of the old graph be C. We create a new source, and connect it to the original source with an edge of weight C. The cost of the min cut of this new graph is again C (this can easily be seen by considering the effect on the max flow rather than the min cut). There are two ways to achieve a cut of cost C: either cut the new edge only, or make a min cut on the original graph.

We just need to check that this will not produce too many vertices. In the binary representation of n, we need a vertex per bit and a vertex per 1 bit. n can have at most 10 bits and at most 9 one bits, so we need at most 19 vertices.
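The construction is easy to check by brute force for small n. In this sketch the function names and the edge-list representation `(from, to, weight)` are choices made for the example, and the base case uses weight 1 for the "arbitrary" edge.

```python
def build(n):
    """Return (vertex_count, edges, source, sink, min_cut_cost) with exactly n min cuts."""
    if n == 1:
        return 2, [(0, 1, 1)], 0, 1, 1
    if n % 2 == 0:
        count, edges, s, t, cost = build(n // 2)
        x = count  # new vertex that may sit on either side of any cut
        return count + 1, edges + [(s, x, 1), (x, t, 1)], s, t, cost + 1
    count, edges, s, t, cost = build(n - 1)
    new_source = count  # cutting the single new edge is one extra min cut
    return count + 1, edges + [(new_source, s, cost)], new_source, t, cost

def count_min_cuts(count, edges, s, t):
    """Brute force over every source-side vertex set; only viable for small graphs."""
    best, ways = None, 0
    for mask in range(1 << count):
        if not (mask >> s) & 1 or (mask >> t) & 1:
            continue
        cost = sum(w for a, b, w in edges
                   if (mask >> a) & 1 and not (mask >> b) & 1)
        if best is None or cost < best:
            best, ways = cost, 1
        elif cost == best:
            ways += 1
    return ways
```

Calling `count_min_cuts(*build(n)[:4])` for small n should return n, and `build(1000)[0]` stays within the 19-vertex bound.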

Hard

I made a complete mess of this during the contest, solving the unweighted version and then trying to retrofit it to handle a weighted version. That doesn't work.

Here's how to do it, which I worked out based on some hints from tourist and coded at 3am while failing to sleep due to jetlag. Firstly, we'll make a change that makes some of the later steps a little easier: for each pair of adjacent vertices, add an edge with weight ∞. This won't violate the nesting property, and guarantees that there is always a solution. Let the length of an edge be the difference in indices of the vertices (as opposed to the cost, which is an input). Call an edge right-maximal if it's the longest edge emanating from a particular vertex, and left-maximal if it is the longest edge terminating at a particular vertex. A useful property is that an edge is always either right-maximal, left-maximal, or both. To see this, imagine an edge that is neither: it must then be nested inside longer edges terminating at each end, but these longer edges would cross, violating the given properties.

We can now make the following observations:

1. If x → y is right-maximal and color[x] is used, then color[y] must be used too. This is because the path must pass through x, and after x cannot skip over y.
2. If x → y is the only edge leaving x, then the edge will appear in a path if and only if color[x] is used.
3. If x → y is right-maximal and x → z is the next-longest edge from x, then x → y is in a path if and only if color[x] is used and color[z] is not used.

These statements all apply similarly for left-maximal edges of course. We can consider vertices 0 and n to be of colour 0 in the above, which is a color that must be picked. What is less obvious is that condition 1 (together with its mirror) is sufficient to determine whether a set of colours is admissible. Suppose we have a set of colours that satisfies condition 1, and consider the subset S of vertices whose colours come from this set. We want to be sure that every pair of consecutive vertices in S are linked by an edge. Suppose there is a pair (u, v) which are not. Consider the right-maximal edge from u: by condition 1, it must terminate at some vertex in S, which must then be to the right of v. But the left-maximal edge to v must start to the left of u, and so these two edges cross.

We can now use this information to construct a new graph, whose minimum cut will give us the cost of the shortest path. It has a vertex per colour, a source vertex corresponding to the special colour 0, and a sink vertex not corresponding to a colour. The vertices on the source side of the cut correspond to the colours that are visited. Every time colour C implies colour D, add an edge C → D of weight ∞, to prevent C being picked while D is not. If an edge falls into case 2 above, add an edge from color[x] to the sink with weight equal to the edge cost. Otherwise it falls into case 3; add an edge from color[x] to color[y] with weight equal to the edge cost. Each edge in the original graph now corresponds to an edge in this new graph, and the original edge is traversed if and only if the new edge forms part of the cut.

Code Jam and Distributed Code Jam 2016 solutions

2016-06-13T01:44:00.003-07:00

Code Jam

Since the official analysis page still says "coming soon", I thought I'd post my solutions to both the Code Jam and Distributed Code Jam here.

Teaching Assistant

We can start by observing that when we take a new task, it might as well match the mood of the assistant: doing so will give us at least 5 points, not doing so will give us at most 5. Also, there is no point in getting to the end of the course with unsubmitted assignments: if we did, we should not have taken that assignment, instead submitting a previous assignment (a parity argument proves that there was a previous assignment). So what we're looking for is a way to pair up days of the course, obeying proper nesting, scoring 10 points if the paired days have the same mood and 5 points otherwise.

That's all we need to start on the small case, which can be done with a standard DP. For every even-length interval of the course, we compute the maximum score. We iterate to pick the day to match up to the first day of the interval, then use previous results to find the best scores for the two subintervals this generates.

That's O(N³), which is too slow for the large case. It turns out that a greedy approach works: every day, if the mood matches the top of the stack, submit, otherwise take a new assignment. The exception is that once the stack height matches the number of remaining days, one must submit every day to ensure that the stack is emptied.
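The greedy is short enough to sketch directly. The function name and the input format (a string with one mood character per day, of even length) are assumptions made for this example.

```python
def greedy_score(moods):
    """Stack-based greedy: submit on a mood match, or when forced to empty the stack."""
    stack = []
    score = 0
    for day, mood in enumerate(moods):
        remaining = len(moods) - day  # days left, including today
        if stack and (stack[-1] == mood or len(stack) == remaining):
            # Submit the most recent assignment: 10 if moods agree, else 5.
            score += 10 if stack.pop() == mood else 5
        else:
            # Take a new assignment matching today's mood.
            stack.append(mood)
    return score
```

On "CJJC" both pairs match for 20 points, while "CJCJ" only manages two mismatched pairs for 10.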

I haven't quite got my head around a proof, but since it worked on the test case I had for the small input I went ahead and submitted it.

Forest university

The very weak accuracy requirement, and the attention drawn to it, is a strong hint that the answer should be found by simulation rather than analytically. Thus, we need to find a way to uniformly sample a topological walk of the forest. This is not as simple as always uniformly picking one of the available courses. Consider one tree in the forest: it can be traversed in some number of ways (all of them starting with the root of the tree), and the rest of the forest can be traversed in some number of ways, and then these can be interleaved arbitrarily. If we consider all ways to interleave A items and B items, then A/(A+B) of them will start with an element of A. Thus, the probability that the root of a particular tree will be chosen is proportional to the size of that tree.

After picking the first element, the available courses again form a forest, and one can continue. After uniformly sampling a sequence of courses, simply check which of the cool words appear on the hat.
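A minimal sketch of the sampling step follows. The function name and the `parent` array convention (with -1 marking a course that has no prerequisite) are assumptions for this example; checking the cool words against the sampled sequence is left out.

```python
import random

def sample_order(parent, rng):
    """Sample a topological order of a forest, weighting each root by its subtree size."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p >= 0:
            children[p].append(v)
    # Breadth-first order from the roots, then accumulate subtree sizes bottom-up.
    order = [v for v in range(n) if parent[v] < 0]
    i = 0
    while i < len(order):
        order.extend(children[order[i]])
        i += 1
    size = [1] * n
    for v in reversed(order):
        if parent[v] >= 0:
            size[parent[v]] += size[v]
    available = [v for v in range(n) if parent[v] < 0]
    result = []
    while available:
        weights = [size[v] for v in available]
        v = rng.choices(available, weights=weights)[0]
        available.remove(v)
        available.extend(children[v])  # its children become roots
        result.append(v)
    return result
```

Passing a seeded `random.Random` instance keeps a run reproducible, which helps when comparing against a brute-force enumeration on tiny forests.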

Rebel against the empire

I consider this problem to be quite a bit harder than D. It's easier to see what to do, but it requires reams of tricky code.

For the small case, time is not an issue because the asteroids are static. Thus, one just needs to find the bottleneck distance to connect asteroids 0 and 1. I did this by adding edges from shortest to longest to a union-find structure until 0 and 1 were connected (à la Kruskal's algorithm), but it can also be done by priority-first search (Prim's algorithm) or by binary searching for the answer and then checking connectivity.
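The union-find variant for the static case might look like the sketch below. The function name and the input format (a list of coordinate tuples, with asteroids 0 and 1 first) are assumptions for this example.

```python
import itertools
import math

def bottleneck_distance(points):
    """Smallest jump length that connects points[0] to points[1]."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in itertools.combinations(range(len(points)), 2)
    )
    # Add edges from shortest to longest until 0 and 1 are connected.
    for length, a, b in edges:
        parent[find(a)] = find(b)
        if find(0) == find(1):
            return length
    return None  # fewer than two points
```

With asteroids at x = 0, 10 and 5 on a line, the stepping stone at 5 brings the answer down from 10 to 5.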

For the large case we'll take that last approach, binary searching over the answers and then checking whether it is possible. Naturally, we will need to know at which times it's possible to make jumps between each pair of asteroids. This is a bit of geometry/algebra, which gives a window of time for each pair (possibly empty, possibly infinite). Now consider a particular asteroid A. In any contiguous period of time during which at least one window is open, it's possible to remain on A, regardless of S, by simply jumping away and immediately back any time security are about to catch up. Also, if two of these intervals are separated in time by at most S, they can be treated as one interval, because one can sit tight on A during the gap between windows. On the other hand, any period longer than S with no open windows is of no use.

The key is to ask for the earliest time at which one can arrive at each interval. This can be done with a minor modification of Dijkstra's algorithm. The modification is that the outgoing edges from an interval are only those windows which have not already closed by the time of arrival at the interval.

Go++

I liked this problem, and I really wish the large had been worth fewer points, because I might then have taken it on and solved it.

The key is to realise that the good strings don't really matter. Apart from the trivial case of the bad string also being a good string, it is always possible to solve the problem by producing a pair of programs that can produce every string apart from the bad one.

For the small case, we can use 0?0?0?0? (N repetitions) for the first program, and 111 (N-1 repetitions) for the second (but be careful of the special case N=1!) Each of the 1's can be interleaved once into the first program to produce a 1 in the output, but because there are only N-1 1's, it is impossible to produce the bad string.

For the large case, the same idea works, but we need something a bit more cunning. The first program is basically the same: alternating digits and ?'s, with the digits forming the complement of the bad string. To allow all strings but the bad string to be output, we want the second program to be a string which does not contain the bad string as a subsequence, but which contains every subsequence of the bad string as a subsequence. This can be achieved by replacing every 1 in the bad string by 01 and every 0 by 10, concatenating them all together, then dropping the last character. Proving that this works is left as an exercise for the reader.
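The two properties claimed for the second program can at least be checked exhaustively for short bad strings. This sketch uses invented function names and covers only the second program, not the full pair.

```python
def second_program(bad):
    """Replace each 1 by 01 and each 0 by 10, then drop the last character."""
    doubled = "".join("01" if c == "1" else "10" for c in bad)
    return doubled[:-1]

def is_subsequence(needle, haystack):
    """True if needle can be obtained from haystack by deleting characters."""
    remaining = iter(haystack)
    # `c in remaining` consumes the iterator up to and including the match.
    return all(c in remaining for c in needle)
```

For the bad string "110" this produces "01011", which contains "11", "10" and every shorter subsequence, but not "110" itself.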

Distributed Code Jam

Again

The code appears to be summing up every product of an element in A with an element in B. However, closer examination shows that those where the sum of the indices is a multiple of M (being the number of nodes) are omitted. However, once we pick an index modulo M for each, we're either adding all or none. The sum of all products of pairs is the same as the product of the sums. So, for each remainder modulo M, we can add up all elements of A, and all elements of B. Then, for each pair of remainders, we either multiply these sums together and accumulate into the grand total, or not. There are some details regarding the modulo 1000000007, but they're not difficult.
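A serial sketch of the remainder trick is below. The function name is invented, and the exact shape of the original reference code is not reproduced here; this only implements the description above (skip index pairs whose sum is a multiple of M).

```python
MOD = 1000000007

def sum_of_products(a, b, m):
    """Sum of a[i] * b[j] over all pairs with (i + j) % m != 0, modulo MOD."""
    sum_a = [0] * m
    sum_b = [0] * m
    for i, x in enumerate(a):
        sum_a[i % m] += x
    for j, y in enumerate(b):
        sum_b[j % m] += y
    total = 0
    for r in range(m):
        for s in range(m):
            # All pairs in these two remainder classes are kept or dropped together.
            if (r + s) % m != 0:
                total += sum_a[r] * sum_b[s]
    return total % MOD
```

Python integers do not overflow, so the reduction can wait until the end; in a fixed-width language the sums would need reducing as they are accumulated.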

For the large case, we obviously need to distribute this, which can be done by computing each sum on a different node, then sending all the sums back to the master.

lisp_plus_plus

Here, a valid lisp program is just a standard bracket sequence (although it is required to be non-empty). It's well-known that a sequence is valid if and only if:

the nesting level (number of left brackets minus number of right brackets) of every prefix is non-negative; and
the nesting level of the whole sequence is zero.

If the nesting level ever goes negative in a left-to-right scan, then the point just before it went negative is the longest sequence that can be completed. This makes the small case trivial to implement.

To distribute this, we can do the usual thing of assigning each node an interval to work on. We can use a parallel prefix sum to find the nesting level at every point (there is lots of information on the internet about how to do this). Then, each node can find its first negative nesting level, and send this back to a master to find the globally first one. We also need to know the total nesting level by sending the sum for each interval to the master, but that's already done as part of the parallel prefix sum.
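The data flow can be simulated in a single process. In this sketch the function name is invented, the "nodes" are just loop iterations, and returning -1 for a valid program is an assumed output convention.

```python
def longest_completable_prefix(s, nodes):
    """Simulate the distributed scan; returns -1 if s is already valid."""
    n = len(s)
    bounds = [n * i // nodes for i in range(nodes + 1)]
    # Phase 1: every node sums the brackets in its own interval.
    totals = [
        sum(1 if c == "(" else -1 for c in s[bounds[i]:bounds[i + 1]])
        for i in range(nodes)
    ]
    # Phase 2: a prefix sum over the totals gives each node its starting level.
    offsets = [0]
    for total in totals:
        offsets.append(offsets[-1] + total)
    # Phase 3: every node looks for its first negative nesting level.
    firsts = []
    for i in range(nodes):
        level = offsets[i]
        for pos in range(bounds[i], bounds[i + 1]):
            level += 1 if s[pos] == "(" else -1
            if level < 0:
                firsts.append(pos)
                break
    # Phase 4: the master takes the globally first failure, if any.
    if firsts:
        return min(firsts)
    return -1 if offsets[-1] == 0 else n
```

The answer does not depend on how many nodes the string is split across, which makes a one-node run a convenient reference when testing.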

Asteroids

I liked this one better than the Code Jam asteroids problem, but it was also rather fiddly. The small case is a straightforward serial DP: going from bottom to top, compute the maximum sum possible when ending one's turn on each location.

At first I thought that this could be parallelised in the same manner as Rocks or Mutex from last year's problems. However, those had only bottom-up and left-to-right propagation of DP state. In this problem, we have bottom-up, left-to-right and right-to-left. On the other hand, the left-to-right and right-to-left propagation is slow: at most one unit for every unit upwards. We can use this!

Each node will be responsible for a range of columns. However, let's say it starts with knowledge of the prior state for B extra columns on either side. Then after one row of DP iteration, it will correctly know the state for B-1 extra columns, and after B iterations it will still have the correct information for the columns it's responsible for. At this stage it needs to exchange information with its neighbours: it will receive information about the B edge columns on either side, allowing it to continue on its way for another B rows.

There are a few issues involved in picking B. If it's too small (less than 60), then nodes will need to send more than 1000 messages. If it's too large, then nodes will do an excessive number of extra GetPosition queries. Also, B must not be larger than the number of columns a node is handling; in narrow cases we actually need to reduce the number of nodes being used.

Gas stations

The serial version of this problem is a known problem, so I'll just recap a standard solution quickly. At each km, you need to decide how much petrol to add. If there is a town within T km that is cheaper, you should add just enough petrol to reach it (the first one if there are multiple), since any extra you could have waited until you got there. Otherwise, you should fill up completely. A sweep algorithm can give the next cheaper town for every town in linear time.
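A serial sketch of that standard solution follows. The function name is invented, one station per km is assumed with the car starting empty, and the destination is modelled as a sentinel station cheaper than everything else so that the car never buys more than it needs near the end.

```python
def min_fuel_cost(prices, tank):
    """Cheapest way to drive len(prices) km, buying at most `tank` units at a time."""
    n = len(prices)
    ext = list(prices) + [float("-inf")]  # sentinel: the destination
    # Sweep right to left with a stack to find the next strictly cheaper station.
    next_cheaper = [n] * n
    stack = [n]
    for i in range(n - 1, -1, -1):
        while ext[stack[-1]] >= ext[i]:
            stack.pop()
        next_cheaper[i] = stack[-1]
        stack.append(i)
    fuel = 0
    cost = 0
    for i in range(n):
        dist = next_cheaper[i] - i
        # Buy just enough to reach a cheaper station in range, otherwise fill up.
        target = dist if dist <= tank else tank
        buy = max(0, target - fuel)
        cost += buy * prices[i]
        fuel += buy - 1  # drive one km
    return cost
```

With prices 3, 1, 2 and a tank of 2, the car buys one unit at the first station and the rest at the second, for a total of 5.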

What about the large case? The constraint is interesting: if we had enough memory, and if GetGasPrice was faster, there is enough time to do it serially! In fact, we can do it serially, just passing the baton from one node to the next as the car travels between the intervals assigned to each node.

What about computing the next cheaper station after each station? That would be difficult, but we don't actually need it. We only need to know whether there is a cheaper one within T km. We can start by checking the current node, using a node-local version of the next-cheapest array we had before. If that tells us that there is nothing within the current node, and T km away is beyond the end of the current node, then we will just find the cheapest within T km and check if that is cheaper than the current value. The range will completely overlap some nodes; for those nodes, we can use a precomputed array of the cheapest for each node. There will then be some prefix of another node. We'll just call GetGasPrice to fetch values on this node. Each iteration this prefix will grow by one, so we only need one call to GetGasPrice per iteration (plus an initial prefix). This means we're calling GetGasPrice at most three times per station (on average), which turns out to be fast enough. It's possible to reduce it to 2 by computing the minimum of the initial prefix on the node that already has the data, then sending it to the node that needs it.

There is one catch, which caused me to make a late resubmission. I was calling GetGasPrice to grow the prefix in the serial part of the code! You need to make all the calls up front in the parallel part of the code and cache the results.

Solutions to Facebook Hacker Cup 2016

2016-03-06T10:52:00.000-08:00

Since the official solutions aren't out yet, here are my solutions. Although I scored 0 in the contest, four of my solutions were basically correct, one I was able to solve fairly easily after another contestant pointed out that my approach was wrong, and I figured out Grundy Graphs on the way home.

Snake and Ladder

I'll describe two solutions to this. There are a few things one can start off with. Firstly, if there are rows at the top or bottom that are completely full of flowers, just cut them off — they make no difference. Then, handle N=1 specially: if K=1, the answer is 1, if K=0, it is 2. A number of people (including all the contest organisers!) messed up the first, and I messed up the second. Also, if one rung has two flowers on it, the answer is 0.

My contest solution used dynamic programming. If one cuts the ladder half-way between two rungs and considers the state of the bottom half, it can be divided into four cases:

Only the left-hand side has the snake on it.
Only the right-hand side has the snake on it.
Both sides have the snake on it, and the two are connected somewhere below.
Both sides have the snake on it, and the two do not connect.

It is reasonably straightforward (as long as one is careful) to determine the dynamic programming transitions to add a rung, for each possibility of a flower on the left, on the right, or no flower. Advancing one rung at a time will be too slow for the bound on N, but the transitions can be expressed as 4×4 matrices and large gaps between flowers can be crossed with fast matrix exponentiation. Finally, one must double the answer to account for the choice of which end is the head.

The alternative solution is to argue on a case-by-case basis. If there are no flowers, then one can pick a row for the head and a row for the tail, and pick a column for the head (the column for the tail is then forced by parity). Once these choices are made, there is only one way to complete the snake. Also, the head and tail can only occupy the same row if it is the top or bottom row, giving 2N(N-1)+4 ways.

If there is at least one flower, then the head must be above the top flower and the tail below the bottom flower (or vice versa). Again, one can choose a row for each, with the column being forced by parity. One must also consider special cases for a flower in the top/bottom row.

Boomerang Crew

I think this was possibly the easiest of the problems, provided you thought clearly about it (which I failed to do). Clearly any opponents weaker than (or equal to) your champion can be defeated just by putting them up against your champion. Also, if you're going to defeat P players, they might as well be the weakest P players of the opposition. For a given P, can this be done? There will be a certain number of strong players (stronger than our champion), which we will have to sacrifice our weaker players against to wear them down. What's more, once we've worn one down and beaten him/her, we have to use the winner of that game to wear down the next strong opponent. Thus, we should match up the S strong players against our best S players in some order, and use our remaining players as sacrifices. Ideally we'd like to wear down each opponent to exactly the skill level of the player we will use to win; anything more is wasted effort. We can apply this idea greedily: for each opponent, find the remaining strong player of ours who can win by the smallest margin.

With that method to test a particular P, it is a standard binary search to determine the number of players we can beat.

Grundy Graph

I think this was the hardest problem: my submission wasn't a real solution and was just for a laugh; several other people I spoke to had submitted but didn't believe in their solution. During the contest I suspected that the solution would be related to 2-SAT, but it wasn't until the flight home that I could figure out how to incorporate turn ordering.

Let us construct an auxiliary directed graph A whose vertices correspond to those of the original. An edge u->v in A means that if u is black, then v must be black as well for Alice to win. All the edges in the original graph are clearly present in A. However, for each u->v in the original, we also add v'->u', where x' is the vertex assigned a colour at the same time as x. This is because if v' is black but u' is white, then it implies that u is black and v is white. This is the same as the contrapositive edge in a 2-SAT graph. Let x=>y indicate that y is reachable from x in A.

There are a number of situations in which we can see that Bob can win:

If any strongly connected component contains both x and x', then Bob wins regardless of what Alice and Bob play.
If Bob controls vertices x and y and x=>y, then Bob wins by assigning x black and y white, regardless of Alice's play.
If Bob controls x, Alice controls y, x=>y, x' => y', and y is played before x, then Bob wins by assigning y the opposite colour to x.

We claim that if none of the above apply, then Alice wins. Her strategy is that on her turn, she must colour x black if there is any u such that u => x and either u has already been coloured black, u could later be coloured black by Bob, or u = x'. This ensures that it does not immediately allow Bob to win, nor allows Bob to use it to win later. The only way this could fail is if both x and x' are forced to be black (if neither is forced, Alice can pick freely).

Suppose u => x and v => x', where u and v are as described above. Then x => v', so u => v'. Firstly consider u = x'. We cannot have v = x (because otherwise x, x' are in the same SCC). Now x' => v', so v => v'. If Bob controls v, then we hit the second case above; if Alice controls v, then v => v' means she would never have chosen v. So we have a contradiction. Now, if u ≠ x' (and by symmetry, v ≠ x), then consider u => v' again. Alice cannot control v/v', since she would then have chosen v' rather than v. But similarly, v => u', so Alice couldn't have chosen u. It follows that Bob controls u and v, which can only be possible if u = v'. But u and u' can only both be relevant if Bob hasn't decided their colour yet, which is the third case above.

The implementation details are largely left as an exercise. Hint: construct the SCCs of A, and the corresponding DAG of the SCCs, then walk it in topological order to check the conditions. The whole thing takes linear time.

RNG

The concept here is a reasonably straightforward exponential DP, but needs a careful implementation to get all the formulae right, particularly in the face of potential numeric instability. The graph itself isn't particularly relevant; the only "interesting" vertices are the start and the gifts, and the only useful information is the distance from each interesting vertex to each other one (if reachable) and the distance from each gift to the nearest dead-end. The DP state is the set of gifts collected and the current position (one of the interesting vertices). When one is at a gift, there are two possible options:

Pick another gift, and make towards it along the shortest path. Based on the path length L, there is a probability P^L of reaching it in time DL, and a probability 1 - P^L of respawning, with an expected time computable from a formula.
Aim to respawn by heading for the nearest dead-end. Again, one can work out a formula for the expected time to respawn.

When one is at the start, one should of course pick a gift and head for it. Again, one can compute an expected time to reach it (one obtains a formula that includes its own result, but a little algebra sorts that out).

Maximinimax flow

I think I liked this problem the best, even though I screwed up the limits on the binary search. Firstly, what is the minimax flow? For a general graph it would be nasty to compute, but this graph has the same number of edges as vertices. This means that it has exactly one cycle with trees hanging off of it. It shouldn't be too hard to convince yourself that the minimax flow is the smaller of the smallest edge outside the cycle and the sum of the two smallest edges inside the cycle.
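For a static graph, that formula can be sketched as follows. The function name and the edge-list format `(u, v, weight)` are assumptions for this example, and the graph is assumed to be connected with exactly as many edges as vertices.

```python
def minimax_flow(n, edges):
    """Min of the smallest tree edge and the two smallest cycle edges combined."""
    adj = [[] for _ in range(n)]
    degree = [0] * n
    for idx, (u, v, _) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
        degree[u] += 1
        degree[v] += 1
    # Repeatedly peel leaves; whatever survives is the cycle.
    removed = [False] * len(edges)
    stack = [v for v in range(n) if degree[v] == 1]
    while stack:
        u = stack.pop()
        for v, idx in adj[u]:
            if not removed[idx]:
                removed[idx] = True
                degree[u] -= 1
                degree[v] -= 1
                if degree[v] == 1:
                    stack.append(v)
    tree = [w for (_, _, w), gone in zip(edges, removed) if gone]
    cycle = sorted(w for (_, _, w), gone in zip(edges, removed) if not gone)
    best = cycle[0] + cycle[1]
    if tree:
        best = min(best, min(tree))
    return best
```

A triangle with weights 5, 6 and 7 plus a pendant edge of weight 4 gives 4; raising the pendant edge to 20 makes the cycle the bottleneck at 5 + 6 = 11.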

Veterans will immediately reach for a binary search, testing whether it is possible to raise the minimax flow to a given level. For the edges outside the ring, this is a somewhat standard problem that can be solved with query structures e.g. Fenwick trees, indexed by edge value. To raise the minimum to a given level, one needs to know the number of edges below that level (one Fenwick tree, storing a 1 for each edge), and their current sum (a second tree, storing the edge weight for each edge).

For the edges inside the cycle it is only slightly more complex. If the required level is less than double the second-smallest edge, then one need only augment the smallest edge. Otherwise one raises all edges to half the target, with a bit of special-casing when the target is odd. The Fenwick tree can also be queried to find the two smallest edges, but I just added a std::multiset to look this up.

Rainbow strings

This seemed to be one of the most-solved problems, and is possibly the easiest if you have a suffix tree/array routine in your toolbox. A right-to-left sweep will yield the shortest and longest possible substring for each starting position (next green for the shortest, next red for the longest). The trick is to process queries in order of length. Each entry in the suffix array will become usable at some length, and unusable again at another length, and these events, along with the queries, can be sorted by length for processing. The only remaining work is to determine the Kth of the active suffixes, which can be done with a Fenwick tree or other query structure.

Finally, one needs the frequency table for each prefix to be able to quickly count the frequency in each substring.

Analysis of ICPC 2015 world finals problems

2015-05-20T13:58:00.001-07:00

I haven't solved all the problems yet, but here are solutions to those I have solved (I've spent quite a bit more than 5 hours on it though).

A: Amalgamated Artichokes

This is a trivial problem that basically everybody solved - I won't go into it.

B: Asteroids

This looks quite nasty, but it can actually be solved by assembling a number of fairly standard computational geometry tools, and the example input was very helpful. To simplify things, let's switch to the frame of reference of the first asteroid, and assume only the second asteroid moves (call them P and Q for short). The first thing to notice is that the area is a piecewise quadratic function of time, with changes when a vertex of one polygon crosses an edge of the other. This is because the derivative of area depends on the sections of boundary of Q inside P, and those vary linearly in those intervals. Finding the boundaries between intervals is a standard line-line intersection problem. To distinguish between zero and "never", touching is considered to be an intersection too.

We also need a way to compute the area of the intersection. Clipping a convex polygon to a half-space is a standard problem too - vertices are classified as inside or outside, and edges that go from inside to outside or vice versa introduce a new interpolated vertex, so repeated clipping will give the intersection of convex polygons. And finding the area of a polygon is also very standard.
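
The clipping and area steps could look roughly like the sketch below. It assumes both polygons are convex and listed counter-clockwise; the function names are illustrative only.

```python
def polygon_area(points):
    """Signed shoelace area (positive for counter-clockwise vertex order)."""
    total = 0.0
    for i, (x0, y0) in enumerate(points):
        x1, y1 = points[(i + 1) % len(points)]
        total += x0 * y1 - x1 * y0
    return total / 2


def clip_half_plane(points, a, b):
    """Keep the part of a convex polygon on or to the left of the line a -> b."""
    def side(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

    result = []
    for i, p in enumerate(points):
        q = points[(i + 1) % len(points)]
        sp, sq = side(p), side(q)
        if sp >= 0:
            result.append(p)                  # vertex is inside: keep it
        if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
            t = sp / (sp - sq)                # edge crosses the line: interpolate
            result.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return result


def convex_intersection_area(p, q):
    """Area of the intersection of two convex counter-clockwise polygons."""
    for i in range(len(q)):
        if not p:
            break
        p = clip_half_plane(p, q[i], q[(i + 1) % len(q)])
    return abs(polygon_area(p)) if len(p) >= 3 else 0.0
```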

Finally, we need to handle the case where the best area occurs midway through one of the quadratic pieces. Rather than try to figure out the quadratic coefficients geometrically, I just sampled it at three points (start, middle, end) and solved for the coefficients, and hence the maximum, with algebra.
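
The three-sample trick can be sketched as follows. The function name and the tie-breaking rule (prefer the earliest time) are my own assumptions for illustration.

```python
def quadratic_max(t0, t1, f0, fm, f1):
    """Maximum of the quadratic through (t0, f0), (midpoint, fm), (t1, f1) on [t0, t1].

    Returns (value, time). Internally uses s in [-1, 1] with f(s) = a*s^2 + b*s + fm.
    """
    a = (f0 + f1) / 2 - fm
    b = (f1 - f0) / 2
    candidates = [(f0, t0), (f1, t1)]
    if a < 0:                      # concave: an interior maximum is possible
        s = -b / (2 * a)
        if -1 < s < 1:
            value = a * s * s + b * s + fm
            time = (t0 + t1) / 2 + s * (t1 - t0) / 2
            candidates.append((value, time))
    # Highest value first; among equal values, the earliest time.
    return max(candidates, key=lambda c: (c[0], -c[1]))
```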

C: Catering

I'm sure I've seen something similar, but I'm not sure where. It can be set up as a min-cost max-flow problem. Split every site into two, an arrival area and a departure area. Connect the source to each departure area, with cap 1 for clients and cap K for the company. Connect each arrival area to the sink similarly. Connect every departure area to every later arrival area with cap 1, except for the company->company link with cap K. Edge costs match the table of costs where appropriate, zero elsewhere. The maximum flow is now K+N (K teams leave the company, and equipment arrives at and leaves every client), and the minimum cost is the cost of the operation.

D: Cutting cheese

This is basically just a binary search for each cut position, plus some mathematics to give the volume of a sphere that has been cut by a plane.
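
A minimal sketch of the two ingredients, using the standard spherical cap formula \(V = \pi h^2 (3r - h) / 3\). The function names and the generic `volume_below` callback are assumptions for illustration, not the exact structure of my solution.

```python
import math


def sphere_volume_below(r, zc, z):
    """Volume of a sphere (radius r, centre height zc) lying below the plane at height z."""
    h = min(max(z - (zc - r), 0.0), 2 * r)    # height of the cap under the plane
    return math.pi * h * h * (3 * r - h) / 3


def find_cut(volume_below, target, lo, hi, iterations=100):
    """Bisect for z in [lo, hi] with volume_below(z) == target.

    volume_below must be non-decreasing in z.
    """
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if volume_below(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For the cheese itself, `volume_below(z)` would be the volume of the solid block below z minus the sum of `sphere_volume_below` over all the holes.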

E: Parallel evolution

Start by sorting the strings by length. A simplistic view is that we need to go through these strings in order, assigning each to one path or the other. This can be done with fairly standard dynamic programming, but it will be far too slow.
Add an "edge" from each string to the following one if it is a subsequence of this following one. This will break the strings up into connected chains. A key observation is that when assigning the strings in a chain, it is never necessary to switch sides more than once: if two elements in a chain are on one path, one can always put all the intervening elements on the same path without invalidating a solution. Thus, one need only consider two cases for a chain: put all elements in one path (the opposite path to the last element of the previous chain), or start the chain on one path then switch to the other path part-way through. In the latter case, one should make the switch as early as possible (given previous chains), to make it easier to place subsequent strings. A useful feature is that in either case, we know the last element in both paths, which is all that matters for placing future chains. Dynamic programming takes care of deciding which option to use. Only a linear number of subsequence queries is required.
I really liked this problem, even though the implementation is messy. It depends only on the compatibility relationship being a partial order and there being a cheap way to find a topological sort.

F: Keyboarding

This can be done with a fairly standard BFS. The graph has a state for each cursor position and number of completed letters. There are up to 5 edges from each state, corresponding to the 5 buttons. I precomputed where each arrow key moves to from each grid position, but I don't know if that is needed. It's also irrelevant that keys form connected regions.

H: Qanat

This is a good one for a mathsy person to work out on paper while someone else is coding - the code itself is trivial. Firstly, the slope being less than one implies that dirt from the channel always goes out one of the two nearest shafts - whichever gets it to the surface quicker (it helps to think of there being a zero-height shaft at x=0). Between each pair of shafts, one can easily find the crossover point.

Consider just three consecutive shafts, and the cost to excavate them and the section of channel between them. If the outer shafts are fixed, the cost is a quadratic in the position of the middle shaft, from which one can derive a formula for its position in terms of its neighbours. This gives a relationship of the form \(x_{i+2} - g x_{i+1} + x_i = 0\). In theory, this can now be used to iteratively find the positions of all the shafts, up to a scale factor which is solved by the requirement for the final shaft (the mother well) to be at x=W.
I haven't tested it, but I think evaluating the recurrence could lead to catastrophic rounding problems, because errors introduced early on are exponentially scaled up, faster than the sequence itself grows. The alternative I used is to find a closed formula for the recurrence. This is a known result: let a and b be the roots of x² - gx + 1 = 0; then the ith term is r(aⁱ - bⁱ) for the scale factor r. Finding the formula for g shows that the roots will always be real (and distinct) rather than complex.
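
Evaluating the closed form might look like this sketch. It takes g as given (deriving g from the slope is not shown), assumes g > 2 so that the roots are real and distinct, and uses a hypothetical function name.

```python
import math


def shaft_positions(g, n, w):
    """Terms x_0..x_n of x_{i+2} - g*x_{i+1} + x_i = 0 with x_0 = 0 and x_n = w.

    Uses the closed form x_i = r * (a**i - b**i), where a and b are the
    roots of x^2 - g*x + 1 = 0. Requires g > 2 and n >= 1.
    """
    disc = math.sqrt(g * g - 4)
    a, b = (g + disc) / 2, (g - disc) / 2
    unscaled = [a ** i - b ** i for i in range(n + 1)]
    r = w / unscaled[n]            # scale so that the last shaft sits at w
    return [r * u for u in unscaled]
```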

I: Ship traffic

This mostly needed some careful implementation. For each ship, there is an interval during which the ferry cannot launch. Sort the start and end times of all these intervals and sweep through them, keeping track of how many ships will be hit for each interval between events. The longest such interval with zero collisions is the answer.

J: Tile cutting

Firstly, which sizes can be cut, and in how many ways? If the lengths from the corners of the parallelograms to the corners of the tile are a, b, c, d, then the area is ab + cd. So the number of ways to make a size is the number of ways it can be written as ab + cd, for a, b, c, d > 0. The number of ways to write a number as ab can be computed by brute force (iterate over all values of a and b for which ab <= 500000). The number of ways to write a number as ab + cd is the convolution of this function with itself (à la polynomial multiplication). There are well-known algorithms to do this in O(N log N) time (with an FFT) or in O(N^1.58) time (divide-and-conquer), and either is fast enough.
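
To make the counting concrete, here is a small sketch. It uses a direct quadratic convolution as a stand-in for the FFT or divide-and-conquer multiplication, so it is only suitable for checking small cases; the function names are my own.

```python
def product_counts(limit):
    """ways[n] = number of ordered pairs (a, b) of positive integers with a * b == n."""
    ways = [0] * (limit + 1)
    for a in range(1, limit + 1):
        for n in range(a, limit + 1, a):   # every multiple of a gives one pair (a, n // a)
            ways[n] += 1
    return ways


def tile_counts(limit):
    """counts[n] = number of ways to write n as a*b + c*d with a, b, c, d > 0."""
    ways = product_counts(limit)
    counts = [0] * (limit + 1)
    # Naive self-convolution of ways; a real solution would use a fast multiply.
    for i in range(1, limit + 1):
        for j in range(1, limit + 1 - i):
            counts[i + j] += ways[i] * ways[j]
    return counts
```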

K: Tours

I've got a solution that passes, but I don't think I can fully prove why.
I start by running a DFS to decompose the graph into a forest plus edges from a node to an ancestor in the forest. Each such "up" edge corresponds to a simple cycle. Clearly, the number of companies must divide into the length of any cycle, so we can take the GCD of these cycle lengths.
If the cycles were all disjoint, this would (I think) be sufficient, but overlapping cycles are a problem. Suppose two cycles overlap. That means there are two points A and B, with three independent paths between them, say X, Y, Z. If X has more than its share of routes from some company, then Y must have less than its share, to balance the tour X-Y. Similarly, Z must have less than its share. But then the tour Y-Z has too few routes from that company. It follows that X, Y and Z must all have equal numbers of routes from each company, and hence the number of companies must divide into each of their lengths.
And now for the bit I can't prove: it seems to be sufficient to consider only pairs of the simple cycles corresponding to using a single up edge. I just iterate over all pairs and compute the overlap. That is itself not completely trivial, requiring an acceleration structure to compute kth ancestors and least common ancestors.

L: Weather report

This is an interesting variation of a classic result in compression theory, Huffman codes. The standard way to construct an optimal prefix-free code from a frequency table is to keep a priority queue of coding trees, and repeatedly combine the two least frequent items by giving them a common parent. That can't be done directly here because there are trillions of possible strings to code, but it can be done indirectly by noting that permutations have identical frequency, and storing them as a group rather than individual items. The actions of the original algorithm then need to be simulated, pairing up large numbers of identical items in a single operation.
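
A sketch of the grouped simulation of the standard Huffman construction is below. The representation of a group as a `(weight, count)` pair and the function name are assumptions; the returned value is the sum of weight times code length, which is the expected length when the weights are probabilities.

```python
import heapq


def grouped_huffman_cost(groups):
    """Total weighted code length of a Huffman code over (weight, count) groups."""
    heap = [(w, c) for w, c in groups if c > 0]
    heapq.heapify(heap)
    cost = 0
    while heap:
        w, c = heapq.heappop(heap)
        if c > 1:
            # All c items are tied for lightest, so pair them with each other.
            pairs = c // 2
            cost += 2 * w * pairs
            heapq.heappush(heap, (2 * w, pairs))
            if c % 2:
                heapq.heappush(heap, (w, 1))   # odd one out waits for a partner
        elif heap:
            # A single lightest item joins one item from the next lightest group.
            w2, c2 = heapq.heappop(heap)
            cost += w + w2
            heapq.heappush(heap, (w + w2, 1))
            if c2 > 1:
                heapq.heappush(heap, (w2, c2 - 1))
    return cost
```

Each merge adds the weight of the new parent to the cost, which is the usual way of accumulating the weighted depth without building the tree.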

2014 NEERC Southern Subregional

2014-10-26T02:42:00.001-07:00

The ICPC NEERC South Subregional was mirrored on Codeforces. It was a very nice contest, with some approachable but also challenging problems. Here are my thoughts on the solutions (I solved everything except A, J and L during the contest).

A: Nasta Rabbara

This is quite a nasty one, and my initial attempt during the contest was all wrong. The first thing to be aware of is that a single query can be answered in O(L log L) time (or less) using a modification on the standard union-find data structure: each edge in the union-find structure is labelled to indicate whether end-points have the same or the opposite parity, and the find operation tracks whether the returned root has the same or opposite parity as the query point. That way, edges that create cycles can be found to be either odd- or even-length cycles. Of course, this won't be fast enough if all the queries are large.
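
The parity-labelled union-find could be sketched like this. The class name is hypothetical, and I use union by size without path compression, which keeps finds logarithmic while leaving each union easy to undo.

```python
class ParityDSU:
    """Union-find where each node stores its parity relative to its parent."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.parity = [0] * n
        self.size = [1] * n

    def find(self, v):
        """Return (root, parity of v relative to that root)."""
        p = 0
        while self.parent[v] != v:
            p ^= self.parity[v]
            v = self.parent[v]
        return v, p

    def add_edge(self, u, v):
        """Add an edge; return False if it closes an odd-length cycle."""
        ru, pu = self.find(u)
        rv, pv = self.find(v)
        if ru == rv:
            return pu != pv            # same parity at both ends: odd cycle
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.parity[rv] = pu ^ pv ^ 1  # forces u and v onto opposite parities
        self.size[ru] += self.size[rv]
        return True
```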

Ideally, one would like a data structure that allows both new edges to be added and existing edges to be removed. That would allow for a sliding-window approach, in which we identify the maximal left end-point for each right end-point. However, the 10 second time limit suggests that this is not the right approach.

Instead, there is a \(O(N+(M+Q)\sqrt{M}\log N)\) solution. Divide the series into blocks of length \(\sqrt{M}\). For each block, identify all the queries with a right end-point inside the block. Now build up a union-find structure going right-to-left, starting from the left edge of the block. Whenever you hit the left end-point of one of the identified queries, add the remaining episodes for the query (which will all come from inside the block) to answer the query, then undo the effect of these extra episodes before continuing. As long as you don't do path compression, each union operation can be unwound in O(1) time. This will miss queries that occur entirely inside a block, but these can be answered by the algorithm from the first paragraph as they are short.

B: Colored blankets

It turns out that it is always possible to find a solution. Firstly, blankets with no colour can be given an arbitrary colour on one side. It is fairly easy to see that we need only allocate blankets to kits such that each kit contains blankets of at most two colours. Repeat the following until completion:

If any colour has at most K/N blankets remaining and that colour has not been painted onto any kit, put those blankets into a new kit and paint it with that colour (this might involve putting zero blankets into that kit).
Otherwise, pick any colour that has not been painted on a kit. There must be more than K/N blankets of that colour. Use enough of them to fill up any non-full painted kit. There must be such a kit, otherwise there are more than K blankets in total.

Since each kit is painted with a unique colour, this generates at most N kits; and since each kit has K/N blankets in it, it must generate exactly N kits.

C: Component tree

The only difficulty with this problem is that the tree might be very deep, causing the naive algorithm to spend a lot of time walking up the tree. This can be solved with a heavy-light decomposition. On each heavy path, for each attribute that appears anywhere on the path, store a sorted list (by depth) of the nodes containing that attribute. When arriving on a heavy path during a walk, a binary search can tell where the nearest ancestor with that property occurs on the heavy path. I think this makes each query O(log N) time.

D: Data center

This is a reasonably straightforward sliding window problem. Sort the servers of each type, and start with the minimum number of low voltage servers (obviously taking biggest first). One might also be required to take all the low voltage servers plus some high voltage servers. Then remove the low voltage servers one at a time (smallest first), and after each removal, add high voltage servers (largest first) until the capacity is made up. Then compare this combination to the best solution so far.

E: Election

Firstly, ties can be treated as losses (both within a station and in the election), because we want the mayor to win. When merging two stations, there are two useful results that can occur: win+loss -> win, or loss+loss -> loss; in both cases the number of wins stays the same, while the number of losses goes down by one. So they are equally useful. We can determine the number of viable merges by DP: either the last two can be merged, and we solve for the other N-2; or the last one is untouched, and we solve for the other N-1.

F: Ilya Muromets

Note that the gap closing up after a cut is just a distraction: any set of heads we can cut, can also be cut as two independent cuts of the original set of heads.

We can determine the best cut for every prefix in linear time, by keeping a sliding window of sums (or using a prefix sum). Similarly, we can determine the best cut for every suffix. Any pair of cuts can be separated into a cut of a prefix and of the corresponding suffix, so we need only consider each split point in turn.
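
The prefix/suffix idea can be sketched as follows. The function name is mine, and the sketch assumes there are at least 2k heads (otherwise everything can be cut and the answer is just the total).

```python
def best_two_cuts(values, k):
    """Maximum total of two non-overlapping windows of k consecutive values.

    Assumes len(values) >= 2 * k.
    """
    n = len(values)
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)
    # window[i] is the sum of values[i : i + k]
    window = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]

    # best_left[i]: best window lying entirely inside values[:i]
    best_left = [float("-inf")] * (n + 1)
    for i in range(k, n + 1):
        best_left[i] = max(best_left[i - 1], window[i - k])

    # best_right[i]: best window lying entirely inside values[i:]
    best_right = [float("-inf")] * (n + 1)
    for i in range(n - k, -1, -1):
        best_right[i] = max(best_right[i + 1], window[i])

    # Try every split point between the two cuts.
    return max(best_left[i] + best_right[i] for i in range(k, n - k + 1))
```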

G: FacePalm

Let's consider each k contiguous days in turn, going left to right. If the current sum is non-negative, we need to reduce some of the values. We might as well reduce the right-most value we can, since that will affect as many future sums as possible (past sums have already been fixed to be negative, so they are not worth considering). So we reduce the last value as much as needed, or until it reaches the lower limit. If necessary, we then reduce the second-last value, and so on until the sum is negative.

The only catch is that we need a way to quickly skip over long sequences of the minimum value, to avoid quadratic running time. I kept a cache of previous non-minimum value (similar to path compression in union-find structures); a stack of the positions of non-minimum values should work too.

H: Minimal Agapov code

The first triangle will clearly consist of the minimum three labels. This divides the polygon into (up to) three sections. Within each section, the next triangle will always stand on the existing diagonal, with the third vertex being the lowest label in the section. This will recursively subdivide the polygon into two more sections with one diagonal, and so on. One catch is that ties need to be broken carefully: pick the one furthest from the base with the smaller label (this led to me getting a WA). Rather than considering all the tie-breaking cases for the first triangle, I started with a first diagonal with the two smallest labels, and found the first triangle through the recursion.

The main tool needed for this is an efficient range minimum query. There are a number of data structures for this, and any of them should work. I used two RMQ structures to cater for the two possible tie-breaking directions. The cyclic rather than linear nature of the queries complicates things a little, but it is just a matter of being careful.

I: Sale in GameStore

This was the trivial problem: sort, get your friends to buy the most expensive item, start filling up on cheap items until you can't buy any more or you've bought everything.

J: Getting Ready for the VIPC

I got a wrong answer to test case 53 on this, and I still don't know why. But I think my idea is sound.

The basic approach is dynamic programming, where one computes the minimum tiredness one can have after completing each contest, assuming it is possible to complete it (this also determines the resulting skill). To avoid some corner cases, I banned entering a contest with tiredness greater than \(h_i - l_i\). However, this is \(O(N^2)\), because for each contest, one must consider all possible previous contests.

The first optimisation one can do is that one can make a list of outcomes for each day, assuming one enters a contest that day: a list of (result skill, tiredness), one for each contest. If one contest results in both less skill and more tiredness than another, it can be pruned, so that one ends up with a list that increases in both skill and tiredness. Now one can compute the DP for a contest by considering only each previous day, and finding the minimum element in the list for which the skill is great enough to enter the current contest. The search can be done by binary search, so if there are few distinct days with lots of contests each day, this will be efficient; but if every contest is on a different day, we're no better off.

The second optimisation is to note the interesting decay function for tiredness. After about 20 days of inactivity, tiredness is guaranteed to reach 0. Thus, there is no need to look more than this distance into the past: beyond 20 days, we only care about the maximum skill that can be reached on that day, regardless of how tired one is. This reduces the cost to \(O(N\log N\log \text{maxT})\).

K: Treeland

Pick any vertex and take its nearest neighbour: this is guaranteed to be an edge; call the end-points A and B. An edge in a tree partitions the rest of the tree into two parts. For any vertex C, either d(A, C) < d(B, C) or vice versa, and this tells us which partition C belongs to. We can thus compute the partition, and then recursively solve the problem within each partition.

L: Useful roads

I didn't solve this during the contest, and I don't know exactly how to solve it yet.

M: Variable shadowing

This is another reasonably straightforward implementation problem. For each variable I keep a stack of declarations, with each declaration tagged with the source position and a "scope id" (each new left brace creates a new scope id). I also keep a stack of open scope ids. When a right brace arrives, I check the top-of-stack for each variable to see if it matches the just closed scope id, and if so, pop the stack.

IOI 2014 day 2 analysis

2014-07-18T01:54:00.002-07:00

I found day 2 much harder than day 1, and I still don't know how to solve all the problems (I am seriously impressed by those getting perfect scores). Here's what I've managed to figure out so far.

Update: I've now solved everything (in theory), and the solutions are below. The official solutions are now also available on the IOI website. I'll try coding the solutions at some point if I get time.

Gondola

This was the easiest of the three. Firstly, what makes a valid gondola sequence? In all the subtasks of this problem, there will be two cases. If you see any of the numbers 1 to n, that immediately locks in the phase, and tells you the original gondola for every position. Otherwise, the phase is unknown. So, the constraints are that

if the phase is known, every gondola up to n must appear in the correct spot if it appears;
no two gondolas can have the same number.

Now we can consider how to construct a replacement sequence (and also to count them), which also shows that these conditions are sufficient. If the phase is not locked, pick it arbitrarily. Now the "new gondola" column is simply the numbers from n+1 up to the largest gondola, so picking a replacement sequence is equivalent to deciding which gondola replaces each broken gondola. We can assign each gondola greater than n that we can't see to a position (one where the final gondola number is larger), and this will uniquely determine the replacement sequence. We'll call such gondolas hidden.

For the middle set of subtasks, the simplest thing is to assign all hidden gondolas to one position, the one with the highest-numbered gondola in the final state. For counting the number of possible replacement sequences, each hidden gondola can be assigned independently, so we just multiply together the number of options, and also remember to multiply by n if the phase is unknown. In the last subtask there are too many hidden gondolas to deal with one at a time, but they can be handled in batches (those between two visible gondolas), using fast exponentiation.

Friend

This is a weighted maximum independent set problem. On a general graph this is NP-hard, so we will need to exploit the curious way in which the graph is constructed. I haven't figured out how to solve the whole problem, but let's work through the subtasks:

This is small enough to use brute force (consider all subsets and check whether they are independent).
The graph will be empty, so the sample can consist of everyone.
The graph will be complete, so only one person can be picked in a sample. Pick the best one.
The graph will be a tree. There is a fairly standard tree DP to handle this case: for every subtree, compute the best answer, either with the root excluded or included. If the root is included, add up the root-excluded answers for every subtree; otherwise add up the best of the two for every subtree. This takes linear time.
In this case the graph is bipartite and the vertices are unweighted. This is a standard problem which can be solved by finding the maximum bipartite matching. The relatively simple flow-based algorithm for this is theoretically \(O(n^3)\), but it is one of those algorithms that tends to run much faster in most cases, so it may well be sufficient here.

The final test-case clearly requires a different approach, since n can be much larger. I only managed to figure this out after getting a big hint from the SA team leader, who had seen the official solution.

We will process the operations in reverse order. For each operation, we will transform the graph into one that omits the new person, but for which the optimal solution has the same score. Let's say that the last operation had A as the host and B as the invitee, and consider the different cases:

YourFriendsAreMyFriends: this is the simplest: any solution using B can also use A, and vice versa. So we can collapse the two vertices into one whose weight is the sum of the original weights, and use it to replace A.
WeAreYourFriends: this is almost the same, except now we can use at most one of A and B, and which one we take (if either) has no effect on the rest of the graph. So we can replace A with a single vertex having the larger of the two weights, and delete B.
IAmYourFriend: this is a bit trickier. Let's start with the assumption that B will form part of the sample, and add that to the output value before deleting it. However, if we later decide to use A, there will be a cost to remove B again; so A's weight decreases by the weight of B. If it ends up with negative weight, we can just clamp it to 0.

Repeat this deletion process until only the original vertex is left; the answer will be the weight of this vertex, plus the saved-up weights from the IAmYourFriend steps.
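
The reverse-order reduction is short enough to sketch in full. The string labels follow the names used above, and the argument layout (index 0 of `host` and `protocol` unused) is an assumption for illustration.

```python
def find_sample(confidence, host, protocol):
    """Best sample weight, processing invitations in reverse order.

    host[i] and protocol[i] describe how person i (for i >= 1) was added;
    entry 0 of both lists is a placeholder.
    """
    weight = list(confidence)
    banked = 0                        # weight committed by IAmYourFriend steps
    for b in range(len(weight) - 1, 0, -1):
        a = host[b]
        if protocol[b] == "YourFriendsAreMyFriends":
            weight[a] += weight[b]                   # A and B can be taken together
        elif protocol[b] == "WeAreYourFriends":
            weight[a] = max(weight[a], weight[b])    # at most one of A and B
        else:                                        # "IAmYourFriend"
            banked += weight[b]                      # assume B is taken ...
            weight[a] = max(0, weight[a] - weight[b])  # ... and charge A for undoing it
    return banked + weight[0]
```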

Holiday

Consider the left-most and right-most cities that Jian-Jia visits. Regardless of where he stops, he will need to travel from the start city to one of the ends, and from there to the other end. There is no point in doing any other back-tracking, so we can tell how many days he spends travelling just from the end-points. This then tells us how many cities he has time to see attractions in, and obviously we will pick the best cities within the range.

That's immediately sufficient to solve the first test case. To solve more, we can consider an incremental approach. Fix one end-point, and gradually extend the other end-point, keeping track of the best cities (and their sum) in a priority queue (with the worst of the best cities at the front). As the range is extended, the number of cities that can be visited shrinks, so items will need to be popped. Of course, the next city in the range needs to be added each time as well. Using a binary heap, this gives an \(O(n^2\log n)\) algorithm: a factor of n for each endpoint, and the \(\log n\) for the priority queue operations. That's sufficient for subtask 3. It's also good enough for subtask 2, because the left endpoint will be city 0, saving a factor of n.

For subtask 4, it is clearly not possible to consider every pair of end-points. Let's try to break things up. Assume (without loss of generality) that we move first left, then back to the start, then right. Let's compute the optimal solution for the left part and the right part separately, then combine them. The catch is that we need to know how we are splitting up our time between the two sides. So we'll need to compute the answer for each side for all possible number of days spent within each side. This seems to leave us no better off, since we're still searching within a two-dimensional space (number of days and endpoint), but it allows us to do some things differently.

We'll just consider the right-hand side. The left-hand side is similar, with minor changes because we need two days for travel (there and back) instead of one. Let f(d) be the optimal end-point if we have d days available. Then with a bit of work one can show that f is non-decreasing (provided one is allowed to pick amongst ties). If we find f(d) for d=1, 2, 3, ... in that order, it doesn't really help: we're only, on average, halving the search space. But we can do better by using a divide-and-conquer approach: if we need to find f for all \(d \in [0, D)\) then we start with \(d = \frac{D}{2}\) to subdivide the space, and then recursively process each half of the interval on disjoint subintervals of the cities. This reduces the search space to \(O(n\log n)\).

This still leaves the problem of efficiently finding the total number of attractions that can be visited for particular intervals and available days. The official solution uses one approach, based on a segment tree over the cities, sorted by number of attractions rather than position. The approach I found is, I think, simpler. Visualise the recursion described above as a tree; instead of working depth-first (i.e., recursively), we work breadth-first. We make \(O(\log n)\) passes, and in each pass we compute f(d) where d is an odd multiple of \(2^i\) (with \(i\) decreasing with each pass). Each pass can be done in a single incremental process, similar to the way we tackled subtask 2. The difference is that each time we cross into the next subinterval, we need to increase \(d\), and hence bring more cities into consideration. To do this, we need either a second priority queue of excluded cities, or we can replace the priority queue with a balanced binary tree. Within each pass, d can only be incremented \(O(n)\) times, so the total running time will be \(O(n\log n)\) per pass, or \(O(n\log n \log n)\) overall.

IOI 2014 day 1 analysis

2014-07-16T08:00:00.000-07:00

IOI 2014 day 1

Since there is no online judge, I haven't tried actually coding any of these. So these ideas are not validated yet. You can find the problems here.

Rails

I found this the most difficult of the three to figure out, although coding it will not be particularly challenging.

Firstly, we can note that distances are symmetric: a route from A to B can be reflected in the two tracks to give a route from B to A. So having only \(\frac{n(n-1)}{2}\) queries is not a limitation, as we can query all distances. This might be useful in tackling the first three subtasks, but I'll go directly to the hardest subtask.

If we know the position and type of a station, there is one other that we can immediately locate: the closest one. It must have the opposite type and be reached by a direct route. Let station X be the closest to station 0. The other stations can be split into three groups:

d(X, Y) < d(X, 0): these are reached directly from station X and of type C, so we can locate them exactly.
d(0, X) + d(X, Y) = d(0, Y), but not in the first group: these are reached from station 0 via station X, so they lie to the left of station 0.
All other stations lie to the right of station X.

Let's now consider just the stations to the right of X, and see how to place them. Let's take them in increasing distance from 0. This ensures that we encounter all the type D stations in order, and any type C station will be encountered at some point after the type D station used to reach it. Suppose Y is the right-most type D station already encountered, and consider the distances for a new station Z. Let \(z = d(0, Y) + d(Y, Z) - d(0, Z)\). If Z is type C, then there must be a type D at distance \(\frac{z}{2}\) to the left of Y. On the other hand, if Z is of type D (and lies to the right of Y), then there must be a type C station at distance \(\frac{z}{2}\) to the left of Y. In the first case, we will already have encountered the station, so we can always distinguish the two cases, and hence determine the position and type of Z.

The stations to the left of station zero can be handled similarly, using station X as the reference point instead of station 0.

How many queries is this? Every station Z except 0 and X accounts for at most three queries: d(0, Z), d(X, Z) and d(Y, Z), where Y can be different for each Z. This gives \(3(n-2) + 1\), which I think can be improved to \(3(n-2)\) just by counting more carefully. Either way, it is sufficient to solve all the subtasks.

Wall

This is a fairly standard interval tree structure problem, similar to Mountain from IOI 2005 (but a little easier). Each node of the tree contains a range to which its children are clamped. To determine the value of any element of the wall, start at the leaf with a value of 0 and read up the tree, clamping the value to the range in each node in turn. Initially, each node has the range [0, inf). When applying a new instruction, it is done top-down, and clamps are pushed down the tree whenever recursion is necessary.
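
Since I haven't coded this against a judge, the following is only a sketch of the clamp tree under my own naming: operations are `(kind, left, right, height)` tuples with `kind` either `"add"` or `"remove"` and an inclusive range.

```python
INF = float("inf")


def build_wall(n, operations):
    """Final wall heights after applying the clamping operations in order."""
    size = 1
    while size < n:
        size *= 2
    lo = [0] * (2 * size)      # each node clamps its subtree to [lo, hi]
    hi = [INF] * (2 * size)

    def clamp(node, low, high):
        # Compose the new clamp [low, high] on top of the node's existing clamp.
        lo[node] = min(max(lo[node], low), high)
        hi[node] = min(max(hi[node], low), high)

    def update(node, node_lo, node_hi, left, right, low, high):
        if right < node_lo or node_hi < left:
            return
        if left <= node_lo and node_hi <= right:
            clamp(node, low, high)
            return
        # Push the older clamp down before recursing, then reset this node.
        clamp(2 * node, lo[node], hi[node])
        clamp(2 * node + 1, lo[node], hi[node])
        lo[node], hi[node] = 0, INF
        mid = (node_lo + node_hi) // 2
        update(2 * node, node_lo, mid, left, right, low, high)
        update(2 * node + 1, mid + 1, node_hi, left, right, low, high)

    for kind, left, right, height in operations:
        if kind == "add":
            update(1, 0, size - 1, left, right, height, INF)
        else:
            update(1, 0, size - 1, left, right, 0, height)

    result = []
    for i in range(n):
        value, node = 0, size + i
        while node >= 1:       # read up the tree, oldest clamp first
            value = min(max(value, lo[node]), hi[node])
            node //= 2
        result.append(value)
    return result
```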

An interesting aspect of the problem is that it is offline, in that only the final configuration is requested and all the information is provided up-front. This makes me think that there may be an alternative solution that processes the data in a different order, but I can't immediately see a nicer solution than the one above.

Game

I liked this problem, partly because I could reverse-engineer a solution from the assumption that it is always possible to win, and partly because it requires neither much algorithm/data-structure training (like Wall) nor tricky consideration of cases (like Rails). Suppose Mei-Yu knows that certain cities are connected. If there are any flights between the cities that she has not asked about, then she can win simply by saving one of these flights for last, since it will not affect whether the country is connected. It follows that for Jian-Jia to win, he must always answer no when asked about a flight between two components that Mei-Yu does not know to be connected, unless this is the last flight between these components.

What if he always answers yes to the last flight between two components? In this case he will win. As long as there are at least two components left, there are uncertain edges between every pair of them, so Mei-Yu can't know whether any of them is connected to any other. All edges within a component are known, so the number of components can only become one after the last question.

What about complexity? We need to keep track of the number of edges between each pair of components, which takes \(O(N^2)\) space. Most operations will just decrement one of these counts. There will be \(N - 1\) component-merging operations, each of which requires a linear-time merging of these edge counts and updating a vertex-to-component table. Thus, the whole algorithm requires \(O(N^2)\) time. This is optimal given that Mei-Yu will ask \(O(N^2)\) questions.
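The bookkeeping can be sketched as follows. This is an illustrative Python version with hypothetical names (`JianJia`, `has_edge`), and it assumes that each pair of cities is asked about at most once.

```python
class JianJia:
    def __init__(self, n):
        self.n = n
        # comp[v] is the representative of the component containing city v.
        self.comp = list(range(n))
        # count[a][b] is the number of unasked flights between components a and b.
        self.count = [[0 if i == j else 1 for j in range(n)] for i in range(n)]

    def has_edge(self, u, v):
        a, b = self.comp[u], self.comp[v]
        self.count[a][b] -= 1
        self.count[b][a] -= 1
        if self.count[a][b] > 0:
            return False  # not the last flight between a and b: answer no
        # Last flight between the two components: answer yes and merge b into a.
        for c in range(self.n):
            if c != a and c != b:
                merged = self.count[a][c] + self.count[b][c]
                self.count[a][c] = self.count[c][a] = merged
        self.comp = [a if x == b else x for x in self.comp]
        return True
```

Rows and columns belonging to a component that has been merged away are left stale, which is harmless because `comp` never maps a city to them again.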

ICPC Problem H: Pachinko

2014-06-30T09:18:00.000-07:00

Problem A has been covered quite well elsewhere, so I won't discuss it. That leaves only problem H. I started by reading the Google translation of a Polish writeup by gawry. The automatic translation wasn't very good, but it gave me one or two ideas I borrowed. I don't know how my solution compares in length of code, but I consider it much simpler conceptually.

This is a fairly typical Markov process, where a system is in some state, and each timestep it randomly selects one state as a function of the current state. One variation is that the process stops once the ball reaches a target, whereas Markov processes don't terminate. I was initially going to model that as the ball always moving from the target to itself, but things would have become slightly complicated.

Gawry has a nice way of making this explicitly a matrix problem. Set up the matrix M as for a Markov process i.e., \(M_{i,j}\) is the probability of a transition from state j to state i. However, for a target state j, we set \(M_{i,j}=0\) for all i. Now if \(b\) is our initial probability vector (equal probability for each empty spot in the first row), then \(M^t b\) represents the probability of the ball being in each position (and the game not having previously finished) after \(t\) timesteps. We can then say that the expected amount of time the ball spends in each position is given by \(\sum_{t=0}^{\infty} M^t b\). The sum of the elements in this vector is the expected length of a game and we're told that it is less than \(10^9\), so we don't need to worry about convergence. However, that doesn't mean that the matrix series itself converges: Gawry points out that if there are unreachable parts of the game with no targets, then the series won't converge. We fix that by doing an initial flood-fill to find all reachable cells and only use those in the matrix. Gawry then shows that under the right conditions, the series converges to \((I - M)^{-1} b\).
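For reference, the closed form comes from a telescoping sum. This is only a sketch, and it assumes that \(I - M\) is invertible and that \(M^{T+1} b \to 0\) as \(T \to \infty\):

\[ (I - M) \sum_{t=0}^{T} M^t b = \sum_{t=0}^{T} M^t b - \sum_{t=1}^{T+1} M^t b = b - M^{T+1} b \]

Letting \(T \to \infty\) under those assumptions gives \((I - M) \sum_{t=0}^{\infty} M^t b = b\), and hence \(\sum_{t=0}^{\infty} M^t b = (I - M)^{-1} b\).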

This is where my solution differs. Gawry dismisses Gaussian elimination, because the matrix can be up to 200,000 square. However, this misses the fact that it is banded: by numbering cells left-to-right then top-to-bottom, we ensure that every non-zero entry in the matrix is at most W cells away from the main diagonal. Gaussian elimination (without pivoting) preserves this property. We can exploit this both to store the matrix compactly, and to perform Gaussian elimination in \(O(W^3H)\) time.
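To make the cost concrete, here is a minimal Python sketch of banded elimination without pivoting. The storage layout (`band[i][d + w]` holding \(A_{i,i+d}\) for \(-w \le d \le w\)) and the function name `solve_banded` are assumptions chosen for illustration, and floating point is used throughout.

```python
def solve_banded(band, rhs, w):
    """Solve A x = rhs for a banded matrix A with half-bandwidth w.

    band[i][d + w] holds A[i][i + d]; entries outside the matrix are ignored.
    No pivoting is done, so every pivot must be non-zero.
    """
    n = len(rhs)
    band = [row[:] for row in band]  # work on copies
    x = rhs[:]
    for col in range(n):
        pivot = band[col][w]
        # Only the next w rows can have a non-zero entry in this column.
        for row in range(col + 1, min(n, col + w + 1)):
            factor = band[row][col - row + w] / pivot
            if factor == 0:
                continue
            # The pivot row is non-zero only in columns col .. col + w,
            # so the update never leaves the band.
            for k in range(col, min(n, col + w + 1)):
                band[row][k - row + w] -= factor * band[col][k - col + w]
            x[row] -= factor * x[col]
    # Back substitution over the upper band.
    for row in range(n - 1, -1, -1):
        total = x[row]
        for k in range(row + 1, min(n, row + w + 1)):
            total -= band[row][k - row + w] * x[k]
        x[row] = total / band[row][w]
    return x
```

With \(n = WH\) unknowns and half-bandwidth \(W\), the elimination loop does about \(W^2\) work per column, which is where the \(O(W^3H)\) bound comes from.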

One concern is the "without pivoting" caveat. I was slightly surprised that my first submission passed. I think it is possible to prove correctness, however. Gaussian elimination without pivoting is known (and easily provable) to work on strictly column diagonally dominant matrices. In our case the diagonal dominance is weak: columns corresponding to empty cells have a sum of zero, while those corresponding to targets have a 1 on the diagonal and zeros elsewhere. However, the matrix is also irreducible, which I think is enough to guarantee that there won't be any division by zero.

EDIT: actually it's not guaranteed to be irreducible, because the probabilities can be zero and hence it's possible to get from A to B without being able to get from B to A. But I suspect that it's enough that one can reach a target from every state.

ICPC Problem L: Wires

2014-06-28T09:59:00.001-07:00

While this problem wasn't too conceptually difficult, it requires a lot of code (my solution is about 400 lines), and careful implementation of a number of geometric algorithms. A good chunk of the code comes from implementing a rational number class in order to precisely represent the intersection points of the wires. It is also very easy to suffer from overflow: I spent a long time banging my head against an assertion failure on the server until I upgraded my rational number class to use 128-bit integers everywhere, instead of just for comparisons.

The wires will divide the space up into connected regions. The regions can be represented in a planar graph, with edges between regions that share an edge. The problem is then to find the shortest path between the regions containing the two new end-points.

My solution works in a number of steps:

Find all the intersection points between wires, and the segments between intersection points. This just tests every wire against every other wire. The case of two parallel wires sharing an endpoint needs to be handled carefully. For each line, I sort the intersection points along the line. I used a dot product for this, which is where my rational number class overflowed, but it would probably have been safer to just sort lexicographically. More than two lines can meet at an intersection point, so I used a std::map to assign a unique ID to each intersection point (I'll call them vertices from here on).
Once the intersection points along a line have been sorted, one can identify the segments connecting them. I create two copies of each segment, one in each direction. With each vertex A I store a list of all segments A->B. Each pair is stored contiguously so that it is trivial to find its partner. Each segment is considered to belong to the region to its left as one travels A->B.
The segments emanating from each vertex are sorted by angle. These comparisons could easily cause overflows again, but one can use a handy trick: instead of using the vector for the segment in an angle comparison, one can use the vector for the entire wire. It has identical direction but has small integer coordinates.
Using the sorted lists from the previous step, each segment is given a pointer to its following segment from the same region. In other words, if one is tracing the boundary of the region and one has just traced A->B, the pointer will point to B->C.
I extract the contours of the regions. A region typically consists of an outer contour and optionally some holes. The outermost region lacks an outer contour (one could add a big square if one needed to, but I didn't). A contour is found by following the next pointers. A case that turns out to be inconvenient later is that some segments might be part of the contour but not enclose any area. This can make a contour disappear completely, in which case it is discarded. Any remaining contours have the property that two adjacent segments are not dual to each other, although it is still possible for both sides of an edge to belong to the same contour.
Each contour is identified as an outer contour or a hole. With integer coordinates I could just measure the signed area of the polygon, but that gets nasty with rational coordinates. Instead, I pick the lexicographically smallest vertex in the contour and examine the angle between the two incident segments (this is why it is important that there is a non-trivial angle between them). I also sort the contours by this lexicographically smallest vertex, which causes any contour to sort before any other contours it encloses.
For each segment I add an edge of weight 1 from its containing region to the containing region of its dual.
For each hole, I search backwards through the other contours to find the smallest non-hole that contains it. I take one vertex of the hole and do a point-in-polygon test. Once again, some care is needed to avoid overflows, and using the vectors for the original wires proves useful. One could then associate the outer contour and the holes it contains into a single region object, but instead I just added an edge to the graph to join them with weight 0. In other words, one can travel from the boundary of a region to the outside of a hole at no cost.
Finally, I identify the regions containing the endpoints of the new wire, using the same search as in the previous step.
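The angle-sorting trick from the third step can be sketched in Python. Each outgoing segment is represented by the integer direction vector of its whole wire (negated where the segment runs against the wire), and the comparison uses only a half-plane test and a cross product. The function names are hypothetical. Python integers cannot overflow, so the benefit of keeping the coordinates small only really shows up in a fixed-width language.

```python
from functools import cmp_to_key


def half(d):
    """0 for directions with angle in [0, pi), 1 for angle in [pi, 2*pi)."""
    x, y = d
    return 0 if (y > 0 or (y == 0 and x > 0)) else 1


def compare_angle(d1, d2):
    """Order non-zero direction vectors counter-clockwise from the positive x axis."""
    h1, h2 = half(d1), half(d2)
    if h1 != h2:
        return h1 - h2
    # Same half-plane: the sign of the cross product gives the turn direction.
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if cross > 0:
        return -1  # d2 is counter-clockwise of d1, so d1 comes first
    if cross < 0:
        return 1
    return 0


def sort_by_angle(directions):
    return sorted(directions, key=cmp_to_key(compare_angle))
```

Because every term is a product of two original input coordinates, the comparison stays within 64-bit range for the problem's input limits without needing the rational class.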

After all this, we still need to implement a shortest path search - but by this point that seems almost trivial in comparison.
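Since every edge in the region graph has weight 0 or 1, one option is a 0-1 breadth-first search on a deque. The sketch below assumes the graph has already been built as a dictionary from region ID to a list of `(neighbour, weight)` pairs; that representation and the name `min_crossings` are assumptions for illustration.

```python
from collections import deque


def min_crossings(adj, source, target):
    """Return the fewest weight-1 edges on a path, or None if unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, weight in adj.get(u, ()):
            candidate = dist[u] + weight
            if v not in dist or candidate < dist[v]:
                dist[v] = candidate
                # Zero-cost moves go to the front so that nodes leave the
                # queue in non-decreasing order of distance.
                if weight == 0:
                    queue.appendleft(v)
                else:
                    queue.append(v)
    return dist.get(target)
```

A plain Dijkstra with a priority queue would work just as well at this input size.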

What is the complexity? There can be \(O(M^2)\) intersections and hence also \(O(M^2)\) contours, but only \(O(M)\) of them can be holes (because two holes cannot be part of the same connected component). The slowest part is the fairly straightforward point-in-polygon test which tests each hole against each non-hole segment, giving \(O(M^3)\) time. There are faster algorithms for doing point location queries, so it is probably theoretically possible to reduce this to \(O(M^2\log M)\) or even \(O(M^2)\), but certainly not necessary for this problem.