Saturday, May 10, 2014

Challenge 24 finals 2014

Here are my thoughts/solutions for the Challenge 24 Finals problems from this year.

A: Halting problem

Firstly, be aware that the problem had a typo: the condition should be "A != 0" rather than "A == 0". The intuition is that "most" of the time the g edge will be taken, but occasionally the h edge might be taken. Given a starting point, let us suppose that we have an efficient way to iterate the algorithm until either an h edge is taken or we terminate (or we determine that neither condition will ever hold). In the former case, we have just determined that we need f(0) for some f. Since there are only O(N) of these values, we can cache them as needed, and so we only require O(Q + N) rounds of the inner algorithm (namely, iterating until we reach A = 0 or terminate).
Now let us solve this inner problem. Consider the directed graph in which each function f has an edge to its corresponding g. A graph in which every vertex has out-degree one has a specific structure: each component contains a single loop, from which upward-directed trees hang. Iteration thus consists of walking up a tree until reaching the loop, then going round and round the loop until we exit. Of course, we might finish early, or we might go around the loop forever.
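As an illustration of that structure (my own sketch, not contest code), here is a minimal way to find which nodes lie on a loop of such a functional graph, assuming the g-edges are given as a successor array nxt of length N:

    from collections import deque

    def find_cycle_nodes(nxt):
        """Return a boolean list marking which nodes lie on a loop.

        nxt[v] is the unique out-edge of v (the g-edge). Nodes that get
        'peeled' because nothing points at them form the trees; whatever
        survives the peeling is exactly the union of the loops.
        """
        n = len(nxt)
        indeg = [0] * n
        for v in range(n):
            indeg[nxt[v]] += 1
        queue = deque(v for v in range(n) if indeg[v] == 0)
        on_cycle = [True] * n
        while queue:
            v = queue.popleft()
            on_cycle[v] = False
            w = nxt[v]
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
        return on_cycle

    # Example: 0 -> 1 -> 2 -> 0 is a loop, 3 and 4 hang off it in a tree.
    print(find_cycle_nodes([1, 2, 0, 0, 3]))  # [True, True, True, False, False]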
For the walking up the tree part, let's use brute force for now. Assuming we make it to the loop, how do we find the exit point? Let's say that every complete time around the loop we increase A by C, and that at some point around the loop we have A = a. Then to exit at this point after t full cycles, we must have a + tC ≡ 0 (mod 2^64). We can solve this using the extended GCD algorithm. We can measure this for every point around the loop, and the smallest valid t value tells us where we will exit. If there are no valid t values, then we will go round forever.
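As a sketch of that step (my own code, not the contest solution), the smallest non-negative t with a + tC ≡ 0 (mod 2^64) comes from a standard linear-congruence solver:

    from math import gcd

    M = 1 << 64

    def exit_time(a, C):
        """Smallest t >= 0 with (a + t*C) % 2**64 == 0, or None if impossible."""
        a %= M
        C %= M
        if C == 0:
            # A never changes around the loop: exit now or never.
            return 0 if a == 0 else None
        d = gcd(C, M)          # d is a power of two
        if a % d != 0:
            return None        # this point never triggers an exit
        # Reduce to (C/d) * t == -(a/d) (mod M/d); C/d is odd, hence invertible.
        m = M // d
        inv = pow(C // d, -1, m)
        return ((-(a // d)) * inv) % m

As mentioned below, in practice the inverse (of C or of its odd part) would be precomputed once per loop rather than per point.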
This is essentially the solution I used in the contest. It is somewhat slow: O(N(N+Q)) I think. I did a lot of micro-optimisation, and even then it took hours for the largest test case. The important optimisations were precomputing parameters for each loop (particularly the inverse of C) once, and reordering the data so that points around a loop are consecutive in memory. In fact, over 75% of the time ended up being spent in pointer-chasing up the trees, because this was cache-unfriendly.
I think the problem can be solved in something like O((N+Q) log(N+Q)), but it is rather more complex, and a blog doesn't really lend itself to the mathematical notation. I might write it up later if I get around to it. I'll give a few hints. If C is odd, then increasing the A value on entry to the loop by a certain amount will increase the time-to-exit for each point by the same amount; this can be generalized if C is an odd multiple of 2^k by partitioning things by their remainder mod 2^k. The tree part can be handled by considering all queries at the same time, in a preorder walk of the tree.

C: Complete program

I had a nasty super-polynomial time solution during the contest, which was complex and only managed to solve about 7 test cases. There is a much simpler and polynomial-time DP solution, however.
Let's start with some observations. Any O not followed by a valid number will need one, so let's add it immediately (it can be anything valid). After this, an O and its number might as well be treated as a single unit (there is no point in inserting anything between an O and a valid number either). There is no point ever adding an I. Adding F's is useful only to increase the nesting depth, so we might as well add them right at the front, since that way everything will be nested inside them. Finally, whenever we add a number, it might as well be a 0.
What is less obvious is that it is only necessary to add an A immediately before a number. This requires checking a few cases, but the reason is that an A added anywhere else can be shuffled to the right until it reaches a number, without affecting whether the token string is valid. Similar rotation arguments show that we never need to add two As or Os in front of a number.
For the DP, let's compute, for each contiguous substring of the input and each F nesting level, the minimum number of tokens we need to add to make it into a valid parse tree. There is a special case for the empty range: we must just add a 0, and we must check that the nesting depth at that point is at least 1. Otherwise, we can either
  • Insert an A or an O.
  • Use the first token in the substring as the root.
In the first case, the A or O will "swallow" the number, so if it is at most 127 we can just use an O, otherwise we must use an A and the nesting level must be sufficiently large. We then use DP to find the best way to handle the rest of the substring. The second case is largely straightforward, unless the first given token is an A. In this case, we must consider all ways in which the rest of the substring may be split into two valid trees.
Finally, we need to determine the nesting level to use at the root. It is possible, although a little slow, to just keep incrementing this value until there are so many initial Fs that it is clearly impossible to improve the solution. A more efficient solution in practice is to compute the DP over each substring as a piecewise constant function of the nesting level; this is more complex since one needs to do various operations on these piecewise constant functions, but it guarantees an O(N^4) solution independent of the size of the numbers in the input.
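To give a flavour of that representation (my own sketch, not the contest code, and only the bookkeeping rather than the actual recurrence), a DP value depending on the nesting level can be stored as a step function supporting evaluation and pointwise minimum:

    import bisect

    INF = float('inf')

    class StepFn:
        """Piecewise constant function of the nesting level.

        For level in [xs[i], xs[i+1]) the value is ys[i]; xs[0] is 0, so the
        function is defined for every non-negative level.
        """
        def __init__(self, xs, ys):
            self.xs, self.ys = xs, ys

        def __call__(self, level):
            return self.ys[bisect.bisect_right(self.xs, level) - 1]

        def pointwise_min(self, other):
            xs = sorted(set(self.xs) | set(other.xs))
            ys = [min(self(x), other(x)) for x in xs]
            # Merge adjacent segments with equal values to keep the list small.
            out_x, out_y = [xs[0]], [ys[0]]
            for x, y in zip(xs[1:], ys[1:]):
                if y != out_y[-1]:
                    out_x.append(x)
                    out_y.append(y)
            return StepFn(out_x, out_y)

    # f costs 3 tokens once the nesting level is at least 2, otherwise impossible.
    f = StepFn([0, 2], [INF, 3])
    g = StepFn([0], [5])
    h = f.pointwise_min(g)
    print(h(0), h(1), h(2))   # 5 5 3

The full DP would presumably also need operations such as adding a constant (for inserted tokens) and shifting the argument as the F nesting level changes, but those follow the same pattern.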

D: Firing game

I solved 9 out of the 10 test cases. The primary observation is that any manager that is unfirable in the initial setup creates a completely independent subtree: firings within this subtree have no effect on the rest of the tree, and vice versa. This partitions the tree into separate Nim piles, and we just need to determine the Grundy number for each such subproblem. In all but one test case, these trees are reasonably small, and can be solved by DP/memoized recursion.
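The firing rules themselves aren't restated here, so the following sketch (mine, not the contest code) only illustrates the standard Sprague-Grundy machinery that the observation reduces the problem to: a memoised Grundy number per independent subproblem, combined by XOR. A toy subtraction game stands in for a single subtree:

    from functools import lru_cache

    def mex(values):
        """Smallest non-negative integer not present in values."""
        values = set(values)
        g = 0
        while g in values:
            g += 1
        return g

    @lru_cache(maxsize=None)
    def grundy(n):
        # Toy stand-in for one subtree: from a pile of n you may remove 1, 2
        # or 3.  The real firing game would enumerate legal firings here.
        return mex(grundy(n - k) for k in (1, 2, 3) if k <= n)

    def first_player_wins(piles):
        """Independent components combine by XOR of their Grundy numbers."""
        total = 0
        for n in piles:
            total ^= grundy(n)
        return total != 0

    print(first_player_wins([4, 7]))   # Grundy values 0 and 3 -> XOR 3 -> True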

N: Dog tags

This problem looked as though it could be extremely nasty, but fortunately the data was somewhat friendly. For example, each bubble was intersected enough times to get a good guess at the volume, even though the problem only guaranteed a single intersection.
Firstly one must identify the bubbles. The loops can be considered to be a graph, with an edge between two loops if they are on adjacent slices and their interiors intersect in the XY plane. I was lazy and considered two loops to intersect if a vertex of one lies inside the other, which is good enough if the loops are sampled well. Each connected component in this graph constitutes a bubble. I estimated the volume as the sum of the areas of the loops, and also found the centroid (mean of all points contained in the loops).
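As a sketch of the geometric primitives involved (my own code, using the same vertex-inside-the-other-loop shortcut), assuming each loop is a list of (x, y) vertices:

    def polygon_area(loop):
        """Unsigned area via the shoelace formula."""
        s = 0.0
        for (x1, y1), (x2, y2) in zip(loop, loop[1:] + loop[:1]):
            s += x1 * y2 - x2 * y1
        return abs(s) / 2.0

    def point_in_polygon(p, loop):
        """Ray-casting test: is point p inside the polygon?"""
        px, py = p
        inside = False
        for (x1, y1), (x2, y2) in zip(loop, loop[1:] + loop[:1]):
            if (y1 > py) != (y2 > py):
                # x-coordinate where this edge crosses the horizontal ray from p
                xc = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                if xc > px:
                    inside = not inside
        return inside

    def loops_touch(a, b):
        """Crude adjacency test: some vertex of one loop lies inside the other."""
        return any(point_in_polygon(p, b) for p in a) or \
               any(point_in_polygon(p, a) for p in b)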
Finding the fiducial bubbles is easy: they are the 3 bubbles with the greatest volume. Determining which one is the corner is not quite as easy, since one does not know the inter-slice distance and hence distances cannot easily be measured. I identified the first plane of bubbles (see later), and noted that the mid-point of the two corner bubbles should be close to the centroid of this set of bubbles. I just tried all 3 edges to find which fits best.
This now gives the grid coordinate system XYZ. One does need to check that the bubbles are on the positive Z side, otherwise the two corners need to be swapped.
Now we can partition the bubbles into planes. I sorted by Z, computed all the differences between adjacent Z values, sorted this list, and looked for a big jump between adjacent values (small values are noise within a plane, big values are plane-to-plane steps).
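A minimal sketch of that gap-detection step (my own illustration, not the contest code), given the bubbles' Z coordinates:

    def split_into_planes(zs):
        """Group Z values into planes by cutting at the large gaps.

        Gaps within a plane are small (noise), gaps between planes are large,
        so the sorted list of gaps shows a big jump between the two
        populations; the threshold is placed inside that jump.
        """
        zs = sorted(zs)
        gaps = [b - a for a, b in zip(zs, zs[1:])]
        if len(gaps) < 2:
            return [zs]
        sg = sorted(gaps)
        jump = max(range(1, len(sg)), key=lambda i: sg[i] - sg[i - 1])
        threshold = (sg[jump - 1] + sg[jump]) / 2.0
        planes, current = [], [zs[0]]
        for z, gap in zip(zs[1:], gaps):
            if gap > threshold:
                planes.append(current)
                current = []
            current.append(z)
        planes.append(current)
        return planes

    # Three slices' worth of bubbles at Z near 0, 5 and 10:
    print(split_into_planes([0.0, 0.1, 0.2, 5.0, 5.1, 10.05, 10.2]))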
The same process can then be used to find each row within a plane, and then bubbles in a row can be sorted by X.