Entropy always increases: IOI 2012 day 1 analysis

I'm not at IOI this year, but since I didn't spot anything on the web I thought I'd post my thoughts on the Day 1 tasks. It looks like quite a tough problem set, although mostly because there are a lot of subtasks to think about and cases to handle.

UPDATE: The official webpage now has a page with hints. They seem to have essentially the same solutions as me for Crayfish Scrivener and Parachute Rings.

Crayfish Scrivener

This is a task that I proposed.

Half the challenge here is thinking about the problem in the right way. Rather than trying to undo undo's, just think of undo as a time machine: it takes you back to exactly the same state as you were N operations ago. So a simple implementation is just to maintain an array of strings through time, and whenever an undo is encountered, copy the string that was N steps ago.

That's going to use up too much memory for the later subtasks. However, there is a lot of duplication, because for every L command, the next string looks almost the same as the previous one. We can compress the whole thing by using a trie: a tree where each edge is a letter, and each string is a path from the root through the tree. Any string can be encoded as just a pointer to a node in this tree. Adding a letter requires finding the child node with a particular letter (creating it if it doesn't already exist). An undo requires just a copy of a previous pointer. Thus, each step can be simulated in O(1) time and space.

There is still the problem of answering queries. For subtasks where the queries all follow the commands, this is quite easy: just extract the final string once into an array, and answer queries from the array. For the final subtask something smarter is required: some extra information must be stored in each node. A start is to store the node's depth and its parent. Each query can be translated into a query for the kth ancestor of the current node, and the parent pointers can be used to ascend the tree. To make it fast enough, one can also store pointers to the 2ⁱ-th ancestor for all relevant i, which makes it possible to answer queries in O(log N) time. It also requires O(N log N) space: since the task description doesn't list the memory requirements I'm not sure if that will fit. It's also possible to store only the parent and the 2ⁱ-th ancestor where 2ⁱ is the largest power of 2 dividing the depth, for O(N) space.

Parachute Rings

For this problem, the trick was keeping track of all the cases. Firstly, it is quite easy to keep a histogram of the degree of each ring. We can also keep track of the components, and the number of vertices and edges in each component (this can be kept in the root of a standard union-find structure). Now, let's consider cases:

If there is a ring with at least 4 connections, it is the only possible critical ring, because removing any other ring will leave it with at least three connections. If there is more than one such ring, there can be no more critical rings.
If there is a ring with exactly 3 connections, only it and its neighbours can be critical rings. There are thus at most four candidates to consider.
If all rings have at most 2 connections, there could be many critical rings. Every component will be either a chain or a cycle, and these can be distinguished by comparing the number of edges and vertices in the component. If there are no cycles, all rings are critical. If there is one cycle, all rings in that cycle are critical. Otherwise there are no critical rings.

The last case can be checked as we go, but the others are a little more tricky because they identify only candidates, not definitely critical rings. However, at most four rings can become candidates during the lifetime of the program: the first to have at least three connections, plus those three neighbours. As soon as a ring becomes a candidate, we can replay all the previous connections on another union-find data structure with that ring removed, and thereafter maintain that structure so that it can be examined after future connections.

I've omitted one minor detail: we also need to maintain in each union-find structure whether there are 0, 1, or more cycles, and if 1, the size of the component with 1 cycle. This can quite easily be maintained as the connections are made. Thus, total memory is O(N) and total time is O(N α(N)).

Pebbling odometer

This seems like the hardest problem, if only because it takes the most time to work through all the different subtasks. I haven't tried to implement these, so my analysis may be a little off. In terms of implementation, the major annoyance of this language is the lack of subroutine calls. It is likely that you will want to write metaprograms to generate the programs, where a subroutine call is implemented by duplicating code and adjusting internal labels. There is also no mechanism for storing internal state, so state needs to be stored either in the instruction pointer (by duplicating code), or in the grid itself through counts of pebbles.

Subtask 1

Since we don't need to preserve the pebbles, we can just pair off pebbles until one of the piles is empty. Here's an outline of the process, ignoring all the details of moving around the grid.

If there is no pebble at (0, 0), then terminate at (0, 0).
Pick up a pebble from (0, 0).
If there is no pebble at (0, 1), then terminate at (0, 1).
Pick up a pebble from (0, 1).
Repeat.

Subtask 2

We can no longer destroy the state by picking up pebbles. However, we can just move them somewhere for safe keeping. Start by moving all the pebbles in (0, 0) to (1, 0) and from (0, 1) to (1, 1) (using a loop in each case). Then apply the algorithm for subtask 1, and once the result has been decided, move any remaining pebbles back to the top row. Note that there will need to be two code-paths for restoring the pebbles, depending on where you want to terminate.

Subtask 3

This should be doable by moving the pebbles towards each other until they meet. First move right until finding the first pebble. Then

Pick up a pebble.
Move forward one square.
If there is already a pebble on this square, terminate.
Add a pebble to this square.
Continue forward until reaching the other pebble.
Turn around.
Repeat.

Subtask 4

Here it gets a little trickier to tell whether the programs will actually fit, but I think this should work. Firstly, note that there is room for only three commands per cell on average. That seems too little even to do a move, a pebble test, a border test and a jump back to the start of a loop. However, loop unrolling can trade program size for execution length e.g. ((move, pebble) * 4, border, jump) to handle 4 cells gives an average of 2.5 steps per cell. I'm assuming a walk which goes alternatively left and right across rows.

That makes it possible to walk the board in few enough steps, but what do we do when we find a pebble? There probably isn't enough program size to store the number of pebbles in the program counter, so we need to move the pebbles around. We can move the pebble back to the start of the row, then restart the row from the next spot. Even if there are several pebbles in a row, we will still make forward progress. Once we've reached the end of the board, we can process the two extreme columns in a similar way to move all the pebbles up to the top-left corner. Moving pebbles will be expensive (each will cost up to a few thousand instructions, depending on implementation), but as there are at most 15 of them it is not a major issue.

Subtask 5

I've got a few ideas for this one, but nothing that I'm confident will fit into 444 instructions. The fundamental issue is that there is no way to save state in the board, since the initial board can be in any legal state and a valid solution cannot destroy information (because it has to match the final board). Thus, the only way to save state is in the program counter, which leads to massive code duplication. One is quite likely to need to store the minimum count.

This probably requires some very careful micro-optimisation: Johnny Ho's perfect score in fact takes 449 instructions, and is rounded up.

UPDATE: see the official hints page for a solution to subtask 5.

2 comments:

Unknown said...: I was speaking to Andrew Carlotti, the UK contestant who got perfect score on odometer. The solution for the last subtask is actually just to scan through the grid and check for 0s, then 1s, then 2s, etc. He did this in 435 instructions, and he's sure he could have taken at least 10 off that.; 12:21 PM
Bruce Merry said...: Ah, I'd missed that once you've checked for N, you can assume N+1 in the next pass, so you only have to do one pebble test per cell per pass.; 12:35 PM

Wednesday, September 26, 2012

IOI 2012 day 1 analysis