Entropy always increases

Monday, June 02, 2014

New website

I've finally gotten around to replacing my ugly nineties-looking personal web page that was cobbled together with raw HTML and m4, with a much prettier version that nevertheless still contains roughly the same content. It is generated using Sphinx, with the individual pages written in reStructuredText. I've also merged my publications page into the main website instead of leaving it on the equally ugly but separately cobbled-together page I originally set up for my PhD.

Saturday, May 10, 2014

Challenge 24 finals 2014

Here are my thoughts/solutions for the Challenge 24 Finals problems from this year.

A: Halting problem

Firstly, be aware that the problem had a typo: the condition should be "A != 0" rather than "A == 0". The intuition is that "most" of the time the g edge will be taken, but occasionally the h edge might be taken. Given a starting point, let us suppose that we have an efficient way to iterate the algorithm until either an h edge is taken or we terminate (or we determine that neither condition will ever hold). In the former case, we have just determine that we need f(0) for some f. Since there are only O(N) of these values, we can cache them as needed, and so only require O(Q + N) rounds of the inner algorithm (namely, iterating until we reach A = 0 or terminate).

Now let us solve this inner problem. Consider the directed graph in which each function f is connected to g. A graph where every vertex has out-degree one has a specific structure, in which each component contains a single loop from which hang upward-directed trees. Iteration thus consists of walking up a tree until reaching the loop, then going round and round the loop until we exit. Of course, we might finish early, or we might go around the loop forever.

For the walking up the tree part, let's use brute force for now. Assuming we make it to the loop, how do we find the exit point? Let's say that every complete time around the loop we increase A by C, and that at some point around the loop we have A = a. Then to exit at this point after t full cycles, we must have a + tC = 0 (mod 2⁶⁴). We can solve this using the extended GCD algorithm. We can measure this for every point around the loop, and the smallest valid t value tells us where we will exit. If there are no valid t values, then we will go round forever.

This is essentially the solution I used in the contest. It is somewhat slow: O(N(N+Q)) I think. I did a lot of microoptimisation, and even then it took hours for the largest test case. The important optimisations were precomputing parameters for each loop (particularly the inverse of C) once, and reordering the data so that points around a loop are consecutive in memory. In fact over 75% of time ended up being spent in pointer-chasing up the trees, because this was cache-unfriendly.

I think the problem can be solved in something like O((N+Q)log (N+Q)), but it is rather more complex, and a blog doesn't really lend itself to the mathematical notation. I might write it up later if I get around to it. I'll give a few hints. If C is odd then increasing the A value on entry to the loop by a certain amount will increase the time-to-exit for each point by the same amount; this can be generalized if C is an odd multiple of 2^k by partitioning things by their remainder mod 2^k. The tree part can be handled by considering all queries at the same time, in a preorder walk of the tree.

C: Complete program

I had a nasty super-polynomial time solution during the contest, which was complex and only managed to solve about 7 test cases. There is a much simpler and polynomial-time DP solution, however.

Let's start with some observations. Any O not followed by a valid number will need one, so let's add it immediately (it can be anything valid). After this, O and the number might as well be treated as a single unit (there is no point inserted anything between an O and a valid number either). There is no point ever adding an I. Adding F's is useful only to increment the nesting depth, so we might as well add F's only right at the front, since that way everything will be nested inside them. Finally, whenever we add a number, it might as well be a 0.
What is less obvious is that it is only necessary to add an A immediately before a number. This requires checking a few cases, but is because an A added anywhere else can be shuffled to the right until it reaches a number, without affecting whether a token string is valid. Similar rotation arguments show that we never need to add two As or Os in front of a number.

For the DP, let's compute, for each contiguous substring of the input and each F nesting level, the minimum number of tokens we need to add to make it into a valid parse tree. There is a special case for the empty range: we must just add a 0, and we must check that the nesting depth at that point is at least 1. Otherwise, we can either

Insert an A or an O.
Use the first token in the substring as the root.

In the first case, the A or O will "swallow" the number, so if it is at most 127 we can just use an O, otherwise we must use an A and the nesting level must be sufficiently large. We then use DP to find the best way to handle the rest of the substring. The second case is largely straightforward, unless the first given token is an A. In this case, we must consider all ways in which the rest of the substring may be split into two valid trees.
Finally, we need to determine the nesting level to use at the root. It is possible, although a little slow, to just keep incrementing this value until there are so many initial Fs that it is clearly impossible to improve the solution. A more efficient solution in practice is to compute the DP over each substring as a piecewise constant function of the nesting level; this is more complex since one needs to do various operations on these piecewise constant functions, but it guarantees an O(N⁴) solution independent of the size of the numbers in the input.

D: Firing game

I solved 9 out of the 10 test cases. The primary observation is that any manager that is unfirable in the initial setup creates a completely independent subtree: firings within this subtree have no effect on the rest of the tree, and vice versa. This partitions the tree into separate Nim piles, and we just need to determine the Grundy number for each such subproblem. In all but one test case, these trees are reasonably small, and can be solved by DP/memoized recursion.

N: Dog tags

This problem looked as though it could be extremely nasty, but fortunately the data was somewhat friendly. For example, each bubble was intersected enough times to get a good guess at the volume, even though the problem only guaranteed a single intersection.

Firstly one must identify the bubbles. The loops can be considered to be a graph, with an edge between two loops if they are on adjacent slices and their interiors intersect in the XY plane. I was lazy and considered two loops to intersect if a vertex of one lay inside another, which is good enough if the loops are sampled well. Each connected component in this graph constitutes a bubble. I estimated the volume as the sum of areas of the loops, and also found the centroid (mean of all points contained in the loops).

Finding the fiduciary bubbles is easy: they are the 3 bubbles with greatest volume. Determining which one is the corner is not quite as easy, since one does not know the inter-slice distance and hence distances cannot easily be measured. I identified the first plane of bubbles (see later), and noted that the mid-point of the two corner bubbles should be close to the centroid of this set of bubbles. I just took all 3 edges to find which is best.

This now gives the grid coordinate system XYZ. One does need to check that the bubbles are on the positive Z side, otherwise the two corners need to be swapped.

Now we can partition the bubbles into planes. I sorted by Z, computed all the differences between adjacent Z values, sorted this list, and looked for a big jump between adjacent values (small values are noise within a plane, big values are plane-to-plane steps).

The same process can then be used to find each row within a plane, and then bubbles in a row can be sorted by X.

Saturday, February 08, 2014

Challenge 24 EC round 2014

Here are my thoughts on the problems we solved from this year's electronic round of Challenge 24.

A. Safe

I didn't get a complete solution to this, but I was able to solve cases 1-9. I noticed that in most cases the state space for the wheels (keys, but the pictures made me think of them as wheels) is reasonably small. For each password, the solution will involve first pressing the keys a number of times to set up the initial state, and then typing the password. So we want to find the initial states from which we can input the password, and pick the cheapest amongst them.
This can be done by dynamic programming, running backwards over the password. When there are no more characters to type, any state is fine. Considering each letter in turn, we need to find the states in which we can type the next letter on one of the keys, and from which the resulting state allows us to complete the password.

For A10 there are 6.4 billion states, and it was going to take a long time (it did churn out the first two passwords in about 3 hours). I suspect it might be possible to do it by keeping an explicit list/set of reachable states and working from there, rather than iterating over the entire state space.

B. Wiretapping

This problem basically just requires an application of Kirchhoff's Theorem, which gives the number of spanning trees of the graph in polynomial time. To count the number of spanning trees that use the tapped link, just collapse the two endpoints into a single vertex and count the number of spanning trees of the remaining graph. The ratio between the number of spanning trees in this reduced graph and the original graph is the answer.
One complication is that the number of spanning trees may overflow a double-precision float. I worked around that by scaling down the Laplacian matrix by some factor f (I used 1/32 for the larger cases), which reduces the determinant of the cofactor by f^N-1.

C. Visual Programming - VM

This was a fairly straightforward implementation problem, just requiring careful interpretation of the rules, and recognising that a lot of the information given is unnecessary (e.g., for lines you only need to keep the two endpoints).

D. Visual Programming

Due to lack of time I only solved D1, D2 and D4, and had a broken solution to D3. As with most VM problems, I invented a slightly higher-level language that is possible (but still hard) to code in by hand, and write a simpler compiler. I made a few simplifying assumptions

Polygons are laid out one below the other. Each polygon is rectangular, except that the right hand edge may be ragged.
Each line is only ever used in one direction. To ensure this, the outgoing lines from a polygon all emanate from the top part of the right edge, and incoming lines arrive at the bottom part of the right edge. There is also an unconditional jump from each polygon to the next one to ensure that the routing terminates before encountering any of the incoming lines.
To avoid lines cutting through polygons, each line emanates directly horizontally from the polygon to (256, y), then has a second segment to the target point. The x value of the start is chosen depending on which register is to be used in the comparison.
I reserved registers 0 and 1 to always contain the values 0 and 1, which made it easier to construct some other operations, as well as to build an unconditional jump.

The programming mini-language just specifies instructions and conditional a list of conditional jumps that takes place after each instruction (falling through to the next instruction if none match). The instructions I wrote at the level of setting the parameters (A-F, X-Z, T, Q, L, H), but I had useful defaults for all of them so that most instructions only changed one or two defaults e.g. setting A=3 B=15 would generate an instruction that copies from SIO into register 3.

Once all that is done, it is not particularly difficult for anyone who has done any assembly to write code for D1, D2, and D4. The only slightly tricky part was to extract the lower 8 bits from the result, which I did by multiplying to 256 then dividing by 256.

G. Spy Union

At first I thought this could be done greedily, but it doesn't work: reinstating a fired person might let you fire an additional person in each hierarchy. Instead, we can use network flow. Let's consider just one hierarchy for now, and connect the source to every node, connect each node to its parent, and connect the root to the sink (directed edges). The edges from the source have capacity 1, with the interpretation that they have flow if the corresponding employee is fired. The flow through each intermediate edge will then be the total number of employees fired from the corresponding subtree, so we will set the capacities to be the maximum number we can fire each subtree. The maximum flow will thus be the maximum number of people we can fire.

However, we have two hierarchies. Since network flow is symmetric, we can apply the same treatment to the second hierarchy, but connect the source to the root, and the individual employees to the sink. To join up the problems, we get rid of the source in the first graph and the sink in the second graph, and create capacity-1 edges directly from each employee in the second graph to the matching employee in the first. Once again, these edges have flow if the corresponding employee is fired, and the maximum flow gives the solution.

Saturday, November 23, 2013

Fun with C++11 variadic templates

The other day I decided to experiment a bit with the support for variadic template support in C++11 (the new C++ standard). Like many C++ features, they're incredibly powerful, but not always that easy to use. After some experimentation I figured out how to do what I wanted, and I thought I'd write it up in case anyone else wanted to try to do the same thing.

I wanted to create a fixed-size vector class (similar to std::array), but with a constructor that accepts the appropriate number of arguments. So for example, Vec should have a constructor that accepts 3 floats.

The simplest solution is to use initializer lists, which allow a constructor to take an arbitrary sequence of arguments of the same type. However, this has a number of disadvantages:

It can only be used with brace-initializer syntax, not parentheses.
It does not enforce the correct length at compile time.

Variadic templates allow a function to take a variable number of arguments. My second attempt at a constructor started by looking something like this:

template Vec(Args&&... args);

This is just a constructor that accepts an arbitrary set of arguments, so it doesn't seem to fix the second problem, and additionally it doesn't even require the types to be correct. We can fix that using SFINAE to statically validate the arguments before the template is instantiated. The tricky part is figuring out where to insert the enable_if: the constructor has no return type, so it can't go there; and we can't add a default argument, because that makes the interpretation of the variadic argument list ambiguous. However, we can add an extra template parameter with a default value:

template
typename E = typename enable_if_all
std::is_convertible::value...>::type>
Vec(Args&&... args);

Here enable_if_all is a variadic version of enable_if, which has a type member if all its arguments are true. It's not part of the standard, so I had to implement it myself. I won't go into details since this isn't the solution I used. It works, but it's roundabout and I'm not convinced that it won't differ in some subtle ways from a constructor that actually accepts only type Twhen in the presence of other constructors (because every parameter will match exactly, where as a constructor with float parameters might require a conversion sequence which would make this a worse match than another constructor).

In the end I realised that the solution was to create a template parameter pack with N copies of T. At first I thought this couldn't be done, because parameter packs are not types and so one cannot typedef one in a helper class to be expanded in the constructor. The solution is to create it in the vector class itself, using recursion to generate the appropriate type. You can see the code here. The VecBase class is templated with the type T, followed by N copies of T as a variadic parameter pack. Thus, for example, VecBase is a vector with 3 ints. This makes it easy to write the constructor, since we just expand the parameter pack.

The tricky part is that we really want to write Vec rather than VecBase. This is where VecHelper comes in: it recursively adds copies of T to the type name. Thus, VecHelper::type is typedefed to VecHelper::type, which in turn is typedefed to VecHelper and finally VecHelper. This is the base case that is handled in the partial specialization at line 29 to be VecBase.

Finally, we make use of the new using feature to typedef Vec to be VecHelper::type, and the problem is solved.

Friday, July 19, 2013

IOI 2013 day 2 analysis

Cave

This is a nice problem following a recent IOI trend of having problems where the limiting factor is not execution time, but some abstract measure of efficiency. This is slightly easier than some of the previous problems in each vein.

To achieve 100%, we can use up to 14 queries per door. Since 2¹⁴ is slightly more than the number of doors, this strongly suggests some form of binary search on each door. It will be easiest if we attack the doors in order. Once we know which switch and position opens a door, we can lock that switch in place so that we will always be able to see the next door, and thereafter pretend the switch does not even exist.
To solve for a door, we can start with all switches down, which will immediately tell us which switch position opens the door. At this point every switch is a candidate. We can then test with half the candidates down and half up, which will eliminate half the candidates. This process is then repeated until only one candidate remains.

Robots

As is commonly the case for problems that ask how long some agents need to achieve a goal, the answer can be found by a binary search on the answer. So we need to decide whether the robots can clear the floor in S seconds.

We can simplify the problem slightly by noting that for each toy, we only need to know how many weak and how many small robots are able to move it, which can be found by binary searching the two lists (after sorting them). Of course, if a toy cannot be moved by any robot, return -1.

Let's first decide the actions of the weak robots, starting from the weakest. There will be some set of toys it can handle. Since they effectively differ only in size, the weakest robot should work from the largest downwards, so as to make the job of the small robots easier. Also, there is never any reason for it to move fewer than S toys, unless it runs out. Now consider the 2nd-weakest robot. There will be extra toys it can handle, plus any light toys that the weakest robot didn't have time for. Since the weakest robot is finished, the difference in weights are irrelevant, and the 2nd-weakest robot should again work in decreasing order of size amongst the toys it can handle. The same process can be continued for the remaining weak robots.

Now consider the small robots, from largest to smallest. These can again take up to S toys, starting from the largest remaining one. If a robot is unable to handle the largest remaining toy, then S was too small.

Implementation can be done using a priority queue, implemented as a binary heap, representing toys that are light enough to be handled by the current robot and ordered with the largest at the head. The toys are initially sorted by weight. Each time a new weak robot is considered, new elements are added from the list of toys to the priority queue, and the robot then removes items starting from the head of the queue.

Assuming that T is larger than A, B, the running time will be O(T (log T)²): one log T for the binary search, the other for the priority queue operations. I think that it may be possible to reduce this to something like O(T·log T·log(max(A, B))) using the right sort of data structure for the priority queue (to allow S items to be removed in log time): something like an interval tree for the number of toys of each size.

Game

I disliked this problem because it has a nice solution that takes just a bit too much memory. I only managed to get 80% for it in the time I spent on it, and I didn't feel inspired to modify my solution to pass fully.

In 1D, this can be solved by a fairly straightforward use of a segment tree: each node stores the GCD of its two children. Since R can be quite big, this needs to be a sparse segment tree; another alternative would be a balanced binary tree.

In 2D, it is tempting to use a quadtree, but in fact that doesn't guarantee poly-logarithmic time. A 1xC query will force refinement down to the individual non-zero cell entries. Instead, we can use a range tree, which is a tree of trees: an outer tree is built over the columns, and for each column span corresponding to a node in this tree, we have an inner tree over the rows. Each node in this inner tree corresponds to a rectangle of the grid, and stores the GCD of elements from this rectangle. A query now uses the columns to select O(log C) nodes from the outer tree i.e., O(log C) inner trees, and applies a query to each of them. Queries thus take O(log R·log C) time when the implementation uses segment trees for the inner and outer trees. With balanced binary trees, it would only be O((log Nu)²).
Unfortunately, using segment trees also requires O(log R·log C) memory per non-zero element, which just exceeds the available memory. Using balanced binary trees instead should fit within memory, but is a lot more painful to implement. I think it might also be possible to make it work by sticking with segment trees, but compressing the representation by compacting chains of nodes with one child.

Wednesday, July 17, 2013

IOI 2013 day 1 analysis

I finally got around to actually solving the IOI day 1 tasks in the contest system - I'd had ideas, but no time to test them out. So here are my solutions (I haven't looked around for any other writeups, although no doubt they exist). I'm hoping to have time to tackle day 2 before the contest system is shut down.

Art class

This is a heuristic problem that can probably be solved in many ways. Since it was a hammer I have, I decided to hit it with a very simple wavelet-based spectral analysis. To find the highest-frequency components, downsample the image, upsample it again, and subtract it from the original. Then take the sum of squared values in the residual. Now start with the downsampled image and repeat recursively to get progressively lower frequencies. For the downsample and upsample I took a simple box filter. I kept 6 frequencies, since 2⁶ is less than the minimum image size.

For each style, I computed the mean and standard deviation of each of the 6 power values from the examples. To classify a picture, I sum up the squared differences from the means, with the differences scaled using the standard deviations. It's not 100% accurate, but good enough to get a perfect score.

Dreaming

This is a combination of a few common tree processing tasks. Firstly, the longest path might just be within one of the original trees, i.e., a tree diameter. This can be computed recursively on a tree by determining, for each node, the two longest paths downwards via different children (one or both can be zero if there are fewer than 2 children). The diameter path will have a highest node, and so the diameter will be the sum of these two lengths.

When add an edge to a tree, we must decide where to make the connection. The longest path from the connection point to anywhere in the tree ought to be as short as possible, and so for each point in the tree we need to know the distance to the furthest point. This is slightly more complex than before, since we also have to consider paths that start upwards. However, a second recursive walk (this time computing top-down instead of bottom-up) allows the shortest such paths to be found. For a given tree, let the radius be the distance from the optimal connection point to the furthest point in the tree.
Finally, we must decide how to connect the trees together. Sort the trees by decreasing radius r₁ > r₂ > .... Clearly, there will be a path of at least length r₁ + r₂ + L. If there at least three trees, they can't all be connected to each other, so there must also be a path of at least length r₂ + r₃ + 2L. Conversely, by connecting the first tree to every other tree (always using the optimal connection points), it is not hard to see that there are no other paths that can be longer than the worst of these.

Wombats

I found this to be the most difficult of the tasks. Apart from being conceptually difficult, it also required a reasonably amount of tuning, and my solution still takes over 10s in many cases.

The basis of the solution is to note that C is relatively small, and so it is feasible to precompute the costs to get from any point on row X to any point on row Y, for some X and Y. Let's write such a table as {X, Y}. What's less obvious is that it's possible to combine {X, Y} and {Y, Z} to produce {X, Z} in O(C²) time. The trick is to use the fact that optimal paths won't cross over each other. Thus, if i < j, (X, i) to (Z, j-1) goes via (Y, p), and (X, i+1) to (Z, j) goes via (Y, q), then the optimal path from (X, i) to (Z, j) will go via (Y, r) where p ≤ r ≤ q. By iterating in order of increasing j - i, it is possible to compute {X, Z} in quadratic time.

We can combine this observation with a segment tree: for each appropriate i and a, we maintain {a·2ⁱ, (a+1)·2ⁱ}, computing it either directly (i = 0) or by combining two smaller intervals as above (where the upper bound exceeds R - 1, it is clamped). Each change invalidates O(log R) of these, so the time to update after a change is O(C^2 log R). Queries can be answered in O(1) time using the root of the segment tree.

The catch with this approach is that it requires too much memory: we can't even afford R different {X, Y} tables. Instead of keeping {X, X+1} at the finest level of the segment tree, we can instead store, say, {10X, 10X+10} for each X, and use 1/10th the memory. The cost is that updating the base level of the segment tree will now take 10 times as long.

Sunday, February 24, 2013

Challenge 24 Electronic Contest

Here's how my team approached some of the problems in the online round of this year's Challenge 24. You can read the problems here.

A. Cutting back middle management

I didn't completely solve this: A10 was too big for my O(NM²) algorithm. Consider a generalised version of the problem, in which we want to know how many managers can be fired for each possible number of subordinates the CEO ends up managing (0 to M). This can be solved by dynamic programming. Suppose we have solved it for some tree, and wish to graft on another subtree to the root. Given L, the number of subordinates that will end up reporting to the root, we pick the number A that are due to the new subtree, with the remaining L - A coming from the original tree. There is a special case when A = 1, in which case the root of the subtree may be retained. Merging in a subtree in this way requires iterating over L and over A, so takes O(M²) time per edge and hence O(NM²) for the whole tree.

I optimised this slightly by noting that near the bottom of the tree, most values of A are impossible to achieve, and do not need to be iterated over. This is sufficient to get A9 (after running for quite a long time though), but not A10.

B. Requirements

Firstly, how does the number of manuals grow? In one iteration, K manuals generate K(K - 1) new manuals, which together with the originals makes K² manuals. Thus, after T iterations there will be K^{2^T} manuals. Next, which of them will be ultimate manuals? We can compute this using inclusion-exclusion: for each subset of requirements, count the number of manuals that are missing (at least) that set of requirements, and either add or subtract it from the total depending on the parity. The final manuals that are missing a set of requirements are simply those produced from the initial manuals missing those requirements. Counting the initial manuals for each subset can be achieved in O(3^N) time.

The only thing left is to figure out how to efficiently compute K^{2^T}modulo P (where P = 1,000,000,007). Since P is a prime, we can reduce 2^T module P-1 without affecting the result. This is all that is needed to solve all test cases in this problem.

C. Road roller

We didn't have time to do anything with this problem.

D. Octal CNC

We didn't solve this during the contest, but I wrote a solution afterwards using OpenCV. It consists of three parts:

Locating the symbols
Organising the symbols into rows
Identifying the symbols

To locate the symbols, I first eroded the image by 8 pixels (which makes the black blobs larger), thresholded the image to classify each pixel as ink or paper, and detected connected components. For each connected component I took the bounding box (corrected for the erosion) and used this in subsequent steps.

To organise the bounding boxes into rows, I processed them left-to-right (by left edge), and for each symbol picked the row it best matched (by difference in height compared to the last symbol in the row). If it fails some simple heuristics it starts its own row.

To identify the symbols I used a pretty crude image descriptor: I essentially downsampled the bounding box to a 3x3 image, and normalized the result (to compensate for the ink being lighter in some symbols). These 9 values worked fairly well but there were still some ambiguities, particularly with symbol 1. I added a tenth value, the logarithm of the bounding box aspect ratio. Descriptors were compared just on Euclidean distance.

With this approach I solved all test cases except 5 (where this is a checksum failure). I haven't yet investigated which stage failed for this test case.

E. Stack compressor

This stack language is quite difficult to work with, because there is no way to rotate the stack or otherwise make any changes below the TOS without losing data. I realised afterwards that one can just copy the numbers you need from lower in the stack and keep growing the stack indefinitely (well, up to the 20 million limit). During the contest I didn't think of this, which limited how I managed to

The simplest way to do text compression is Huffman coding. I start with a lot of PUSH instructions to put the coded text on the stack, backwards. In each 32-bit word I use only 30 bits. Bit 30 I set to 1 to use as a sentinel (to detect when to move to the next word), and I use only non-negative values (because C99 has no yucky semantics for negative mod). The encoding table is represented in code rather than data. At each state it does a divide by 2 to extract another bit and branches to a child state depending on whether the bit is zero or one. The leaves have an OUT instruction and a jump back to the start.

Putting the table into code causes quite a lot of code bloat. After the contest I made a different version that encoded the transitions into the stack, and which would have scored 100% on many of the test cases. However, there are still some test cases where the 100% goal is below the entropy limit for Huffman encoding, even if all 32 bits per word could be used. Getting 100% for those presumably requires some other encoding, such as encoding of repeated strings, or possibly making use of digraphs.

F. Backup communication

A little physics shows that the cost of a launch is proportional to the distance achieved. Thus, we need to find the point from which the sum of distances is maximised. This is difficult to do directly, but starting with the centroid and using Newton-Raphson iteration solves it easily.

G. Trains

We didn't manage to solve this one.

Thursday, September 27, 2012

IOI 2012 Day 2 analysis

Last Supper

Firstly, it's not even obvious how to simulate Leonardo's strategy efficiently. It can be done though. First, you need to be able to tell, for each request, what the next request of the same colour will be (we'll see why later). This can be computed by making a backwards pass over the data, while keeping track of the last seen request for each colour. Now we can implement Leonardo's strategy in a forward pass. Over time, let's keep track of the next time each colour will be needed. After each request, we need to update this for the just-requested colour to the point to the next instance for this colour. Each time Leonardo requests a paint colour that isn't on the scaffold, we can examine all the colours on the scaffold to determine which will be needed the furthest in the future. That would still require O(NK) time, but we can keep the next-needed-time for all the scaffold colours in a max priority queue. Since keys need to be changed as we go, a multi_set is a good choice in C++ for O(N log K) time.

With that in place, it's straightforward to implement the cases where we have at least ceil(log K) bits per request, because we can just encode the index of the colour on the scaffold to replace. However, to solve subtasks 4 and 5 you need something smarter. The key point is that you don't have to exactly implement Leonardo's algorithm. Let's do a thought experiment: Leonardo has two shelves on his scaffold, top and bottom. The top shelf contains the colours he will use at least once more before they're taken away. The bottom shelf contains the colours he knows will be taken away before they're used again. Since he knows the sequence of colours and he assumes his assistant follows his strategy, he can arrange the initial K paints on the right shelves, and every time he uses a paint he also puts it back on the right shelf. His assistant will only ever take away a colour from the bottom shelf, and Leonardo will only pick up paint either from his assistant or from the top shelf.

And now for the key observation: Leonardo never looks at the bottom shelf. Thus, it doesn't actually matter which colour the assistant takes off the bottom shelf. If he instead takes away an arbitrary colour from the bottom shelf, the set of top shelf paints will still be exactly as before, step after step. The only possible difference is that a requested colour might turn out now to already be on the bottom shelf, but this can't happen because it would contradict the optimality of Leonardo's strategy.

Thus, it is sufficient for the advice to provide enough information to know which colours are on which shelf, and the assistant can pick bottom-shelf colours arbitrary to take away. We need K bits to indicate the shelf for each of the initial K colours, and we need N bits to indicate which shelf each paint goes back on after it has been used. N + K bits is enough to solve all subtasks, and in fact subtask 5 can be solved with only 125000 bits.

Jousting tournament

There are a number of observations one can make. Firstly, let's say a knight is a candidate for a tournament if he either participates in the tournament or in another tournament whose winner was a candidate (recursively). Then it is easy to see that the candidates for a tournament form a contiguous interval of the original line of knights, and that the location of the interval is independent of the ranks of the knights. Computing this interval efficiently is not completely trivial, but let's come back to that and assume we know the candidate interval for every tournament. We can also see that the winner of any tournament is the candidate with highest rank.

Suppose the popular knight is a candidate for a tournament. We can immediately determine who all the other candidates for that tournament are, regardless of exactly where he is in the line. We can also compute whether he will win. Now, for each tournament going backwards in time, we can compute how many times he will win if that is his first tournament: if he loses it will be 0, otherwise it will be more one that the number for the tournament he advances to. We can also find the leftmost slot (if any) he could occupy that doesn't require advancement from a previous tournament. We then pick the tournament with the highest such score and take the leftmost slot (breaking ties leftwards as required).

Done naively this will require O(N²) time, which should suffice for the first 2 subtasks. We need a few operations to make this small enough to tackle the final subtask. Firstly, we need to be able to determine the candidate ranges. This can be done with a binary indexed tree. As the tournaments are played, store an array of N values in a BIT, where each value is 1 for knights that haven't competed or that had the lowest index (not rank) of all candidates in their last tournament, and 0 for all others. For a new tournament (S, E), the first candidate will be the first knight whose (inclusive) prefix sum is S+1, and the last candidate will be one to the left of the first knight whose (inclusive) prefix sum is E+2. Binary searching for a given prefix sum can be implemented in O(log N) time, but we also need to find the bits to zero out in the BIT. These can be found by keeping the locations of all 1 bits in a linked list. Each bit is zeroed at most once, so this process takes O(N log N) time.

For each tournament, we also need to know to which tournament the winner advances. This can be computed at the same time as the previous step, by storing the tournament index in the linked list and building a tournament tree as we go.

Finally, we need to be able to determine the winner of each tournament, assuming it contains the popular knight. For this we need to determine the maximum value within a range of values. This is the standard range max query problem, for which there are a number of O(N log N) and even O(N) algorithms to answer O(N) queries on N items. A simple one is to precompute for each element the maximum of the next 1, 2, 4, 8, ... elements in O(N log N) time, which then allows queries to be answered in O(1) time.

UPDATE: a commenter made a good observation: we don't care about the ranks of the other knights, only whether they are higher or lower than that of the popular knight. The range max queries can all be done on a boolean array, which is simpler. For example, by precomputing prefix sums one can determine the number of higher-ranked knights in an interval, and the popular knight will win a tournament if and only if the corresponding sum is zero.

Ideal City

This one had me totally stumped yesterday, but I woke up this morning with what I think is a solution. It depends on a conjecture. Let's call the x-weight of a path the number of horizontal steps, and similarly for y-weight. Let's call a path from A to B x-shortest if it has minimal x-weight, and let the x-distance from A to B be x-weight of this path (again, similarly for y-shortest and y-weight). I conjecture that the distance from A to B is the sum of the x-distance and the y-distance. Put another way, I conjecture that the shortest path from A to B is both x-shortest and y-shortest. I can't formally prove it, but the intuition is that any x-shortest path can be "straightened out" until it matches a shortest path, because there are no internal obstacles to get in the way.

Given that, we can separate the problem into finding the sum of x-distances and the sum of y-distances. Let's look at just the y-distances. The city can be divided into contiguous horizontal strips. Consider each such strip as a node in a graph, where two nodes are connected if the strips are directly connected. In fact, the graph must be a tree, since a cycle would imply that the city is not ideal. Also, travel within a strip is free when computing y-distance, so the y-distance between two blocks is equal to the number of tree edges between their corresponding nodes.

Computing the sum of all distances in a tree is a more conventional problem with conventional recursive techniques. For each subtree, compute the sum of all distances within the subtree, the number of blocks, and the sum of distances from every block to the root. When merging two trees at their roots, these values can be computed in O(1) time for the combined tree using the values for the original tree and the new subtree, so the process takes O(N) time.

UPDATE: I realised that this construction can also prove the conjecture (as did a commenter). Consider the shortest path, and assume it is not y-shortest. Project it onto the tree of horizontal strips. It is then not a shortest path through this tree, since all y steps are projected to steps in this tree. But a non-shortest path through a tree must visit some vertex twice, which means that the original path exits some strip and then re-enters it. We could then shorten the path by short-cutting directly between the exit and entry points along the path.

Wednesday, September 26, 2012

IOI 2012 day 1 analysis

I'm not at IOI this year, but since I didn't spot anything on the web I thought I'd post my thoughts on the Day 1 tasks. It looks like quite a tough problem set, although mostly because there are a lot of subtasks to think about and cases to handle.

UPDATE: The official webpage now has a page with hints. They seem to have essentially the same solutions as me for Crayfish Scrivener and Parachute Rings.

Crayfish Scrivener

This is a task that I proposed.

Half the challenge here is thinking about the problem in the right way. Rather than trying to undo undo's, just think of undo as a time machine: it takes you back to exactly the same state as you were N operations ago. So a simple implementation is just to maintain an array of strings through time, and whenever an undo is encountered, copy the string that was N steps ago.

That's going to use up too much memory for the later subtasks. However, there is a lot of duplication, because for every L command, the next string looks almost the same as the previous one. We can compress the whole thing by using a trie: a tree where each edge is a letter, and each string is a path from the root through the tree. Any string can be encoded as just a pointer to a node in this tree. Adding a letter requires finding the child node with a particular letter (creating it if it doesn't already exist). An undo requires just a copy of a previous pointer. Thus, each step can be simulated in O(1) time and space.

There is still the problem of answering queries. For subtasks where the queries all follow the commands, this is quite easy: just extract the final string once into an array, and answer queries from the array. For the final subtask something smarter is required: some extra information must be stored in each node. A start is to store the node's depth and its parent. Each query can be translated into a query for the kth ancestor of the current node, and the parent pointers can be used to ascend the tree. To make it fast enough, one can also store pointers to the 2ⁱ-th ancestor for all relevant i, which makes it possible to answer queries in O(log N) time. It also requires O(N log N) space: since the task description doesn't list the memory requirements I'm not sure if that will fit. It's also possible to store only the parent and the 2ⁱ-th ancestor where 2ⁱ is the largest power of 2 dividing the depth, for O(N) space.

Parachute Rings

For this problem, the trick was keeping track of all the cases. Firstly, it is quite easy to keep a histogram of the degree of each ring. We can also keep track of the components, and the number of vertices and edges in each component (this can be kept in the root of a standard union-find structure). Now, let's consider cases:

If there is a ring with at least 4 connections, it is the only possible critical ring, because removing any other ring will leave it with at least three connections. If there is more than one such ring, there can be no more critical rings.
If there is a ring with exactly 3 connections, only it and its neighbours can be critical rings. There are thus at most four candidates to consider.
If all rings have at most 2 connections, there could be many critical rings. Every component will be either a chain or a cycle, and these can be distinguished by comparing the number of edges and vertices in the component. If there are no cycles, all rings are critical. If there is one cycle, all rings in that cycle are critical. Otherwise there are no critical rings.

The last case can be checked as we go, but the others are a little more tricky because they identify only candidates, not definitely critical rings. However, at most four rings can become candidates during the lifetime of the program: the first to have at least three connections, plus those three neighbours. As soon as a ring becomes a candidate, we can replay all the previous connections on another union-find data structure with that ring removed, and thereafter maintain that structure so that it can be examined after future connections.

I've omitted one minor detail: we also need to maintain in each union-find structure whether there are 0, 1, or more cycles, and if 1, the size of the component with 1 cycle. This can quite easily be maintained as the connections are made. Thus, total memory is O(N) and total time is O(N α(N)).

Pebbling odometer

This seems like the hardest problem, if only because it takes the most time to work through all the different subtasks. I haven't tried to implement these, so my analysis may be a little off. In terms of implementation, the major annoyance of this language is the lack of subroutine calls. It is likely that you will want to write metaprograms to generate the programs, where a subroutine call is implemented by duplicating code and adjusting internal labels. There is also no mechanism for storing internal state, so state needs to be stored either in the instruction pointer (by duplicating code), or in the grid itself through counts of pebbles.

Subtask 1

Since we don't need to preserve the pebbles, we can just pair off pebbles until one of the piles is empty. Here's an outline of the process, ignoring all the details of moving around the grid.

If there is no pebble at (0, 0), then terminate at (0, 0).
Pick up a pebble from (0, 0).
If there is no pebble at (0, 1), then terminate at (0, 1).
Pick up a pebble from (0, 1).
Repeat.

Subtask 2

We can no longer destroy the state by picking up pebbles. However, we can just move them somewhere for safe keeping. Start by moving all the pebbles in (0, 0) to (1, 0) and from (0, 1) to (1, 1) (using a loop in each case). Then apply the algorithm for subtask 1, and once the result has been decided, move any remaining pebbles back to the top row. Note that there will need to be two code-paths for restoring the pebbles, depending on where you want to terminate.

Subtask 3

This should be doable by moving the pebbles towards each other until they meet. First move right until finding the first pebble. Then

Pick up a pebble.
Move forward one square.
If there is already a pebble on this square, terminate.
Add a pebble to this square.
Continue forward until reaching the other pebble.
Turn around.
Repeat.

Subtask 4

Here it gets a little trickier to tell whether the programs will actually fit, but I think this should work. Firstly, note that there is room for only three commands per cell on average. That seems too little even to do a move, a pebble test, a border test and a jump back to the start of a loop. However, loop unrolling can trade program size for execution length e.g. ((move, pebble) * 4, border, jump) to handle 4 cells gives an average of 2.5 steps per cell. I'm assuming a walk which goes alternatively left and right across rows.

That makes it possible to walk the board in few enough steps, but what do we do when we find a pebble? There probably isn't enough program size to store the number of pebbles in the program counter, so we need to move the pebbles around. We can move the pebble back to the start of the row, then restart the row from the next spot. Even if there are several pebbles in a row, we will still make forward progress. Once we've reached the end of the board, we can process the two extreme columns in a similar way to move all the pebbles up to the top-left corner. Moving pebbles will be expensive (each will cost up to a few thousand instructions, depending on implementation), but as there are at most 15 of them it is not a major issue.

Subtask 5

I've got a few ideas for this one, but nothing that I'm confident will fit into 444 instructions. The fundamental issue is that there is no way to save state in the board, since the initial board can be in any legal state and a valid solution cannot destroy information (because it has to match the final board). Thus, the only way to save state is in the program counter, which leads to massive code duplication. One is quite likely to need to store the minimum count.

This probably requires some very careful micro-optimisation: Johnny Ho's perfect score in fact takes 449 instructions, and is rounded up.

UPDATE: see the official hints page for a solution to subtask 5.

Saturday, May 14, 2011

The myth of "up" and "down" in computer graphics

Today is another of my rare technical rants, er, blog posts. It's a topic that I struggled with a number of years ago until I understood how to think about it properly, and which I still see people struggling with regularly.

The OpenGL coordinate system causes a lot of confusion because it uses a "bottom-left" origin, which is seen as "upside-down" relative to other coordinate systems typically used in window systems. This leads to people putting in lots of extra code to "flip images over" to make them right, often in the wrong part of a pipeline and requiring further reflections elsewhere to correct for it; as well as mis-guided requests for features in various APIs.

The way to stop your brain hurting is to stop labelling things as "up" or "down". Up and down are functions of an eyeball, and so it can only be directly applied to things that you see on a screen. Windows on a screen are visible, and this is the one place where OpenGL really is a bottom-up coordinate system. However, textures are not directly visible on the screen, either for normal sampling or for render-to-texture, so the use of up and down in the specification is purely a convenience for describing the behaviour. When uploading a texture, the first texel that is provided has texel coordinates (0, 0), and when this texture is treated as a rendertarget, that same texel has window coordinates (0, 0). This is true in both OpenGL and in Direct3D - there's no need to make any changes.

Things get more complicated at interface boundaries with other formats that do specify an up and a down. For example, PNG files do not have a standard coordinate system, but they do have a well-defined top and a bottom. So if a PNG file is used as a texture, where should the coordinate system origin be placed? That's not obvious, and needs to be consistent for an entire toolchain e.g. if your 3D modelling package shows you previews in which texture coordinates of (0, 0) map to the bottom-left of your image, then you should probably do the same thing in an application that consumes those models. Note that not all image file formats work the same way: formats created specifically for use with textures (e.g. KTX) may specify an origin rather than an up/down orientation (KTX also has an optional hint to tell viewer/editor applications how to display the image). The asset format may also provide a convention e.g. COLLADA explicitly indicates that (0, 0) corresponds to the lower-left corner of a texture image.

So, to summarize:

Avoid using the terms "up" and "down" where they are not absolutely necessary. They'll just confuse you.
Correct applications never flip images "upside-down" - they just sometimes have to re-arrange pixels in memory to conform to an interface. An upside-down image is a bug.
When defining interfaces between systems which have a defined "up" and "down" (e.g. a PNG file) and systems which have a defined coordinate system (e.g. OpenGL textures), make sure you know what the correspondence is (using precedent set by thirdparty tools or file formats where possible), then stick it to throughout your toolchain.

Hopefully this will reduce the amount of confusion in the world.

Sunday, March 27, 2011

Going home!

Yes, after 3 years of cold, cloud, not quite so cold, travel, cloud, work, pub lunches, bad pizza, more cloud, rain, snow, more cloud, more work, more travel etc etc, I'M GOING HOME! That's right - as of 5 May, I'll be back in the Mother City, back to my old haunts with my old friends and enjoying and missing my Cambridge friends, instead of the other way around.

I'll be doing a post-doc in the computer science department at UCT. When people ask me exactly what it is I'm going to do I find it hard to explain (possibly because I don't precisely know yet myself), but you can read the job advert here.

I've decided that nanny states are a real pain. Time was, people just made sure they had something put away for when they couldn't work any more. Now governments have very complicated rules by which you get a tax break if you put money into a pension scheme which you then can't withdraw from and can only spend in certain ways, and it creates a huge amount of red tape. Frankly dealing with red tape is the most stressful part about moving back - the actual moving is stressful but a doddle by comparison.

Anyway, if you're friend/acquaintance in the UK and want to see me before I leave, get in touch; and if you're a Cape Town friend/acquaintance I'll see you soon!

Saturday, September 25, 2010

Bored in Vegas

As usual, my advice on Vegas is, don't bother. This trip, I've gotten sick (something flu-like), been fed far too much, sucked in a lot of signature smoke, been too hot outside and too cold inside, and now my flight home is indefinitely delayed due to mechanical problems (I'm now in the airport).

Unfortunately, I'm coming back in 2 weeks for the TopCoder Open. Oh well.

Sunday, September 05, 2010

Visualising sorting algorithms

This is another one of my rare technical posts, as opposed to news of which countries I've been visiting.

If you're in computer science, you've probably seen an animation of sorting algorithms, maybe heard a rendition, or seen a visual representation. I have, somewhat by accident, discovered a different way to visualise a sorting algorithm: plot points for memory accesses, with address on the X axis and time (counted by accesses) on the Y axis, and different colours for reads and writes. It produces some rather pretty pictures. Note that these are not to scale relative to each other - the Y axis has been compressed to fit the entire sort into a fixed height.

Ye olde bubblesort. Some of the patterns are an optical illusion due to aliasing, but the green spikes are a feature of the algorithm.

Insertion sort - the version optimized for mostly sorted content. Although the data is random, you can see that in many cases it reduces the search distance.

Shellsort, clearly showing the phases.

Selection sort:

Heapsort: the solid lines at the top are the heap-building phase, while the rest shows the extraction. Note the very slight slope to the bottom-right line: as the heap gets smaller, the heap extraction gets faster, but only as O(log N).

Divide-and-conquer algorithms have a pretty fractal nature. This is quicksort - the perturbations in the fractal indicate the random selection of pivots (it just picks the middle, rather than median-of-3).

Mergesort: this diagram is twice as wide as the others because it uses temporary storage on the right.

Monday, May 03, 2010

Budapest (again)

Yes, it's blog time again! Once again, it was the 24-hour Challenge in Budapest.

Like last year, there were some very interesting and tough problems. This year they did a much better job of implementation, and there were far fewer issues with things crashing and so it was more enjoyable. Unfortunately we didn't do as well as last year, but we still managed to come 5th.

After the contest, we finally got our cruise on the Danube. This time we came better prepared with info on where to catch the tourist boats. It turns out we were just looking in the wrong area, and if you go to the right place there are scores of them.

Friday, April 16, 2010

IT Challenge over

It ran smoothly, so not much to report. Stellenbosch won, UKZN came second and UCT third. Now I get a week of holiday in Cape Town; too bad half my friends have moved to other places.

Saturday, April 10, 2010

IT Challenge 2010

I'm off to South Africa next week to help run the finals of the Standard Bank IT Challenge. This is a contest for teams of 4 university students. We've already run the heats to select one team from each of 9 universities; on Thursday the teams will meet at Standard Bank headquarters to do the final.

It's been some crazy hard work (including just about every free moment for the last week or two), but it should be a really great contest. We're hoping to have live standings during the contest (although we'll stop them an hour before the end) at http://sbitc2010.dyndns.org/ (yes, that link won't work right now, and probably won't after the contest either).

After that it's off to Cape Town for a week of hard-earned relaxation.

Wednesday, November 18, 2009

All Jammed up

Following my usual trend of only blogging when I travel and have some pictures I need to post, here's some photos from my last trip to the Googleplex for Google Code Jam. I was somewhat off form, but still managed to make 10th. If you want to see some seriously scary problems, take a look at the site. There is also an article about it here, including a video showing some of my ugly mug.

Sunday, August 16, 2009

Photos from Bulgaria

I'm back in England. The contest was great fun and I'm going to miss the Bulgarian warmth and sunshine.

Wednesday, August 12, 2009

How not to be seen

Don't turn up.

That seems to be the view of most of the people I've talked to about the IOI planned excursion to the Black Sea. It apparently involves 4-6 hours on a bus. Each way. Not including the time you spend waiting around at the beginning until everyone gets on the bus. And leaving at 6am. When the time comes, I for one won't be found wanting. I won't be found at all, since I will be sleeping the sleep of the just, or possible just very tired.

Apart from the attempt to inflict long periods of time on buses (which in my case normally consists of periods of boredom interspersed by vomiting), things have been going pretty well here - in fact far smoother than usual. The SA team also did pretty respectably on the first day, and the second day is finishing in a few minutes. I haven't been on any of the excursions so far (being on the scientific committee seems to involve lots of work, but it's all been fun), but I plan to take myself off to the old town tomorrow (hopefully with some other leaders to make it more fun).

Saturday, August 08, 2009

Bulgaria, land of enormous hotel rooms

It's blog time again! As usual, it's because I'm travelling - this time to IOI again, in Bulgaria. Haven't been here long, but so far I've been blown away by the size of the hotel room. More news when I've actually been out and about more.