Entropy always increases: February 2013

Here's how my team approached some of the problems in the online round of this year's Challenge 24. You can read the problems here.

A. Cutting back middle management

I didn't completely solve this: A10 was too big for my O(NM²) algorithm. Consider a generalised version of the problem, in which we want to know how many managers can be fired for each possible number of subordinates the CEO ends up managing (0 to M). This can be solved by dynamic programming. Suppose we have solved it for some tree, and wish to graft on another subtree to the root. Given L, the number of subordinates that will end up reporting to the root, we pick the number A that are due to the new subtree, with the remaining L - A coming from the original tree. There is a special case when A = 1, in which case the root of the subtree may be retained. Merging in a subtree in this way requires iterating over L and over A, so takes O(M²) time per edge and hence O(NM²) for the whole tree.

I optimised this slightly by noting that near the bottom of the tree, most values of A are impossible to achieve, and do not need to be iterated over. This is sufficient to get A9 (after running for quite a long time though), but not A10.

B. Requirements

Firstly, how does the number of manuals grow? In one iteration, K manuals generate K(K - 1) new manuals, which together with the originals makes K² manuals. Thus, after T iterations there will be K^{2^T} manuals. Next, which of them will be ultimate manuals? We can compute this using inclusion-exclusion: for each subset of requirements, count the number of manuals that are missing (at least) that set of requirements, and either add or subtract it from the total depending on the parity. The final manuals that are missing a set of requirements are simply those produced from the initial manuals missing those requirements. Counting the initial manuals for each subset can be achieved in O(3^N) time.

The only thing left is to figure out how to efficiently compute K^{2^T}modulo P (where P = 1,000,000,007). Since P is a prime, we can reduce 2^T module P-1 without affecting the result. This is all that is needed to solve all test cases in this problem.

C. Road roller

We didn't have time to do anything with this problem.

D. Octal CNC

We didn't solve this during the contest, but I wrote a solution afterwards using OpenCV. It consists of three parts:

Locating the symbols
Organising the symbols into rows
Identifying the symbols

To locate the symbols, I first eroded the image by 8 pixels (which makes the black blobs larger), thresholded the image to classify each pixel as ink or paper, and detected connected components. For each connected component I took the bounding box (corrected for the erosion) and used this in subsequent steps.

To organise the bounding boxes into rows, I processed them left-to-right (by left edge), and for each symbol picked the row it best matched (by difference in height compared to the last symbol in the row). If it fails some simple heuristics it starts its own row.

To identify the symbols I used a pretty crude image descriptor: I essentially downsampled the bounding box to a 3x3 image, and normalized the result (to compensate for the ink being lighter in some symbols). These 9 values worked fairly well but there were still some ambiguities, particularly with symbol 1. I added a tenth value, the logarithm of the bounding box aspect ratio. Descriptors were compared just on Euclidean distance.

With this approach I solved all test cases except 5 (where this is a checksum failure). I haven't yet investigated which stage failed for this test case.

E. Stack compressor

This stack language is quite difficult to work with, because there is no way to rotate the stack or otherwise make any changes below the TOS without losing data. I realised afterwards that one can just copy the numbers you need from lower in the stack and keep growing the stack indefinitely (well, up to the 20 million limit). During the contest I didn't think of this, which limited how I managed to

The simplest way to do text compression is Huffman coding. I start with a lot of PUSH instructions to put the coded text on the stack, backwards. In each 32-bit word I use only 30 bits. Bit 30 I set to 1 to use as a sentinel (to detect when to move to the next word), and I use only non-negative values (because C99 has no yucky semantics for negative mod). The encoding table is represented in code rather than data. At each state it does a divide by 2 to extract another bit and branches to a child state depending on whether the bit is zero or one. The leaves have an OUT instruction and a jump back to the start.

Putting the table into code causes quite a lot of code bloat. After the contest I made a different version that encoded the transitions into the stack, and which would have scored 100% on many of the test cases. However, there are still some test cases where the 100% goal is below the entropy limit for Huffman encoding, even if all 32 bits per word could be used. Getting 100% for those presumably requires some other encoding, such as encoding of repeated strings, or possibly making use of digraphs.

F. Backup communication

A little physics shows that the cost of a launch is proportional to the distance achieved. Thus, we need to find the point from which the sum of distances is maximised. This is difficult to do directly, but starting with the centroid and using Newton-Raphson iteration solves it easily.

G. Trains

We didn't manage to solve this one.

Entropy always increases

Sunday, February 24, 2013

Challenge 24 Electronic Contest