Saturday, November 23, 2013

Fun with C++11 variadic templates

The other day I decided to experiment a bit with the support for variadic templates in C++11 (the new C++ standard). Like many C++ features, they're incredibly powerful, but not always that easy to use. After some experimentation I figured out how to do what I wanted, and I thought I'd write it up in case anyone else wanted to try to do the same thing.

I wanted to create a fixed-size vector class (similar to std::array), but with a constructor that accepts the appropriate number of arguments. So for example, Vec<float, 3> should have a constructor that accepts 3 floats.

The simplest solution is to use initializer lists, which allow a constructor to take an arbitrary sequence of arguments of the same type. However, this has a number of disadvantages:
  1. It can only be used with brace-initializer syntax, not parentheses.
  2. It does not enforce the correct length at compile time.
Variadic templates allow a function to take a variable number of arguments. My second attempt at a constructor started by looking something like this:

template<typename... Args> Vec(Args&&... args);

This is just a constructor that accepts an arbitrary set of arguments, so it doesn't seem to fix the second problem, and additionally it doesn't even require the types to be correct. We can fix that using SFINAE to statically validate the arguments before the template is instantiated. The tricky part is figuring out where to insert the enable_if: the constructor has no return type, so it can't go there; and we can't add a default argument, because that makes the interpretation of the variadic argument list ambiguous. However, we can add an extra template parameter with a default value:

template<typename... Args,
         typename E = typename enable_if_all<
             std::is_convertible<Args, T>::value...>::type>
Vec(Args&&... args);

Here enable_if_all is a variadic version of enable_if, which has a type member if all its arguments are true. It's not part of the standard, so I had to implement it myself. I won't go into details since this isn't the solution I used. It works, but it's roundabout, and I'm not convinced that it won't differ in some subtle ways from a constructor that actually accepts only type T when in the presence of other constructors (because every parameter will match exactly, whereas a constructor with float parameters might require a conversion sequence, which would make this a worse match than another constructor).
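
For illustration, one possible C++11 implementation of such an enable_if_all (a sketch; not necessarily the version I used):

#include <type_traits>

// all_true: logical AND over a pack of compile-time booleans.
template<bool... B> struct all_true;
template<> struct all_true<> : std::true_type {};
template<bool... Rest> struct all_true<true, Rest...> : all_true<Rest...> {};
template<bool... Rest> struct all_true<false, Rest...> : std::false_type {};

// enable_if_all<B...> has a ::type member only if every B is true.
template<bool... B>
struct enable_if_all : std::enable_if<all_true<B...>::value> {};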

In the end I realised that the solution was to create a template parameter pack with N copies of T. At first I thought this couldn't be done, because parameter packs are not types and so one cannot typedef one in a helper class to be expanded in the constructor. The solution is to create it in the vector class itself, using recursion to generate the appropriate type. You can see the code here. The VecBase class is templated with the type T, followed by N copies of T as a variadic parameter pack. Thus, for example, VecBase<int, int, int, int> is a vector with 3 ints. This makes it easy to write the constructor, since we just expand the parameter pack.

The tricky part is that we really want to write Vec<float, 3> rather than VecBase<float, float, float, float>. This is where VecHelper comes in: it recursively adds copies of T to the type name. Thus, VecHelper<float, 3>::type is typedefed to VecHelper<float, 2, float>::type, which in turn is typedefed to VecHelper<float, 1, float, float>::type and finally VecHelper<float, 0, float, float, float>::type. This is the base case that is handled in the partial specialization at line 29 to be VecBase<float, float, float, float>.

Finally, we make use of the new using feature to typedef Vec<T, N> to be VecHelper<T, N>::type, and the problem is solved.
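
As a minimal sketch of the whole arrangement (the names follow the description above; the parameter order and details in the linked code may differ):

#include <cstddef>

// VecBase<T, T, ..., T>: the constructor takes exactly one argument per T in the pack.
template<typename T, typename... Args>
class VecBase {
public:
    VecBase(Args... args) : data{args...} {}
private:
    T data[sizeof...(Args) == 0 ? 1 : sizeof...(Args)];
};

// VecHelper recursively adds copies of T until N reaches 0.
template<typename T, std::size_t N, typename... Args>
struct VecHelper {
    typedef typename VecHelper<T, N - 1, T, Args...>::type type;
};

template<typename T, typename... Args>
struct VecHelper<T, 0, Args...> {
    typedef VecBase<T, Args...> type;   // base case: the pack now has N copies of T
};

template<typename T, std::size_t N>
using Vec = typename VecHelper<T, N>::type;

// Vec<float, 3> is VecBase<float, float, float, float>; its constructor
// requires exactly three floats, e.g. Vec<float, 3> v(1.0f, 2.0f, 3.0f);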

Friday, July 19, 2013

IOI 2013 day 2 analysis

Cave

This is a nice problem following a recent IOI trend of having problems where the limiting factor is not execution time, but some abstract measure of efficiency. This is slightly easier than some of the previous problems in this vein.
To achieve 100%, we can use up to 14 queries per door. Since 2^14 is slightly more than the number of doors, this strongly suggests some form of binary search on each door. It will be easiest if we attack the doors in order. Once we know which switch and position opens a door, we can lock that switch in place so that we will always be able to see the next door, and thereafter pretend the switch does not even exist.
To solve for a door, we can start with all switches down, which will immediately tell us which switch position opens the door. At this point every switch is a candidate. We can then test with half the candidates down and half up, which will eliminate half the candidates. This process is then repeated until only one candidate remains.
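A sketch of this strategy, assuming the task's interactive interface (tryCombination reports the index of the first closed door, or -1 if every door is open, and answer submits the final switch positions and door assignments; treat the exact signatures as assumptions):

#include <vector>

extern int tryCombination(int S[]);
extern void answer(int S[], int D[]);

void exploreCave(int N) {
    std::vector<int> S(N, 0), D(N, 0);
    std::vector<bool> known(N, false);
    for (int door = 0; door < N; door++) {
        for (int i = 0; i < N; i++)
            if (!known[i]) S[i] = 0;                   // unknown switches all down
        // Which position of the controlling switch opens this door?
        int openPos = (tryCombination(S.data()) == door) ? 1 : 0;
        std::vector<int> cand;                         // candidate switches
        for (int i = 0; i < N; i++)
            if (!known[i]) cand.push_back(i);
        while (cand.size() > 1) {
            int half = (int) cand.size() / 2;
            for (int i = 0; i < (int) cand.size(); i++)
                S[cand[i]] = (i < half) ? 1 : 0;       // first half up, rest down
            bool open = tryCombination(S.data()) != door;
            // The controlling switch is currently at openPos iff the door is open.
            bool inFirstHalf = (open == (openPos == 1));
            if (inFirstHalf)
                cand.resize(half);
            else
                cand.erase(cand.begin(), cand.begin() + half);
        }
        int sw = cand[0];
        S[sw] = openPos;                               // lock it so later doors stay visible
        D[sw] = door;
        known[sw] = true;
    }
    answer(S.data(), D.data());
}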

Robots

As is commonly the case for problems that ask how long some agents need to achieve a goal, the answer can be found by a binary search on the answer. So we need to decide whether the robots can clear the floor in S seconds.
We can simplify the problem slightly by noting that for each toy, we only need to know how many weak and how many small robots are able to move it, which can be found by binary searching the two lists (after sorting them). Of course, if a toy cannot be moved by any robot, return -1.
Let's first decide the actions of the weak robots, starting from the weakest. There will be some set of toys it can handle. Since they effectively differ only in size, the weakest robot should work from the largest downwards, so as to make the job of the small robots easier. Also, there is never any reason for it to move fewer than S toys, unless it runs out. Now consider the 2nd-weakest robot. There will be extra toys it can handle, plus any light toys that the weakest robot didn't have time for. Since the weakest robot is finished, the differences in weight are irrelevant, and the 2nd-weakest robot should again work in decreasing order of size amongst the toys it can handle. The same process can be continued for the remaining weak robots.
Now consider the small robots, from largest to smallest. These can again take up to S toys, starting from the largest remaining one. If a robot is unable to handle the largest remaining toy, then S was too small.
Implementation can be done using a priority queue, implemented as a binary heap, representing toys that are light enough to be handled by the current robot and ordered with the largest at the head. The toys are initially sorted by weight. Each time a new weak robot is considered, new elements are added from the list of toys to the priority queue, and the robot then removes items starting from the head of the queue.
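Something like the following feasibility test, run inside the binary search on S (a sketch; it assumes toys are pre-sorted by increasing weight, robot limits are strict upper bounds as in the task, and unmovable toys have already been ruled out):

#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

struct Toy { int weight, size; };

bool feasible(long long S,
              const std::vector<Toy> &toys,    // sorted by increasing weight
              const std::vector<int> &weak,    // weight limits, sorted ascending
              std::vector<int> smallLimits) {  // size limits
    std::priority_queue<int> bySize;           // sizes of toys the current robot can lift
    std::size_t next = 0;
    for (int limit : weak) {                   // weakest robot first
        while (next < toys.size() && toys[next].weight < limit)
            bySize.push(toys[next++].size);
        for (long long t = 0; t < S && !bySize.empty(); t++)
            bySize.pop();                      // it takes the largest toys it can lift
    }
    // Everything left over must be handled by the small robots.
    std::vector<int> rest;
    while (!bySize.empty()) { rest.push_back(bySize.top()); bySize.pop(); }
    for (std::size_t i = next; i < toys.size(); i++) rest.push_back(toys[i].size);
    std::sort(rest.rbegin(), rest.rend());               // largest toys first
    std::sort(smallLimits.rbegin(), smallLimits.rend()); // largest robots first
    std::size_t pos = 0;
    for (int limit : smallLimits)
        for (long long t = 0; t < S && pos < rest.size() && rest[pos] < limit; t++)
            pos++;
    return pos == rest.size();
}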
Assuming that T is larger than A and B, the running time will be O(T (log T)^2): one log T for the binary search, the other for the priority queue operations. I think that it may be possible to reduce this to something like O(T·log T·log(max(A, B))) using the right sort of data structure for the priority queue (to allow S items to be removed in log time): something like an interval tree for the number of toys of each size.

Game

I disliked this problem because it has a nice solution that takes just a bit too much memory. I only managed to get 80% for it in the time I spent on it, and I didn't feel inspired to modify my solution to pass fully.
In 1D, this can be solved by a fairly straightforward use of a segment tree: each node stores the GCD of its two children. Since R can be quite big, this needs to be a sparse segment tree; another alternative would be a balanced binary tree.
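For the 1D case, a sketch of such a sparse segment tree over [0, R), with nodes created only along update paths (an illustrative sketch, not taken from a real submission):

typedef unsigned long long u64;

static u64 gcd2(u64 a, u64 b) { while (b) { u64 t = a % b; a = b; b = t; } return a; }

struct Node {
    u64 g;
    Node *left, *right;
    Node() : g(0), left(nullptr), right(nullptr) {}
};

// Point update: set position pos in [lo, hi) to value.
void update(Node *n, u64 lo, u64 hi, u64 pos, u64 value) {
    if (hi - lo == 1) { n->g = value; return; }
    u64 mid = lo + (hi - lo) / 2;
    if (pos < mid) {
        if (!n->left) n->left = new Node();
        update(n->left, lo, mid, pos, value);
    } else {
        if (!n->right) n->right = new Node();
        update(n->right, mid, hi, pos, value);
    }
    n->g = gcd2(n->left ? n->left->g : 0, n->right ? n->right->g : 0);
}

// GCD over [a, b); missing nodes contribute 0, the identity for GCD.
u64 query(const Node *n, u64 lo, u64 hi, u64 a, u64 b) {
    if (!n || b <= lo || hi <= a) return 0;
    if (a <= lo && hi <= b) return n->g;
    u64 mid = lo + (hi - lo) / 2;
    return gcd2(query(n->left, lo, mid, a, b), query(n->right, mid, hi, a, b));
}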
In 2D, it is tempting to use a quadtree, but in fact that doesn't guarantee poly-logarithmic time: a 1xC query will force refinement down to the individual non-zero cell entries. Instead, we can use a range tree, which is a tree of trees: an outer tree is built over the columns, and for each column span corresponding to a node in this tree, we have an inner tree over the rows. Each node in this inner tree corresponds to a rectangle of the grid, and stores the GCD of elements from this rectangle. A query now uses the columns to select O(log C) nodes from the outer tree, i.e., O(log C) inner trees, and applies a query to each of them. Queries thus take O(log R·log C) time when the implementation uses segment trees for the inner and outer trees. With balanced binary trees, it would only be O((log N_u)^2), where N_u is the number of updates.
Unfortunately, using segment trees also requires O(log R·log C) memory per non-zero element, which just exceeds the available memory. Using balanced binary trees instead should fit within memory, but is a lot more painful to implement. I think it might also be possible to make it work by sticking with segment trees, but compressing the representation by compacting chains of nodes with one child.

Wednesday, July 17, 2013

IOI 2013 day 1 analysis

I finally got around to actually solving the IOI day 1 tasks in the contest system - I'd had ideas, but no time to test them out. So here are my solutions (I haven't looked around for any other writeups, although no doubt they exist). I'm hoping to have time to tackle day 2 before the contest system is shut down.

Art class

This is a heuristic problem that can probably be solved in many ways. Since it was a hammer I have, I decided to hit it with a very simple wavelet-based spectral analysis. To find the highest-frequency components, downsample the image, upsample it again, and subtract it from the original. Then take the sum of squared values in the residual. Now start with the downsampled image and repeat recursively to get progressively lower frequencies. For the downsample and upsample I took a simple box filter. I kept 6 frequencies, since 2^6 is less than the minimum image size.
For each style, I computed the mean and standard deviation of each of the 6 power values from the examples. To classify a picture, I sum up the squared differences from the means, with the differences scaled using the standard deviations. It's not 100% accurate, but good enough to get a perfect score.
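A sketch of the residual-energy computation (assuming dimensions divisible by 2 at every level; the classification step then just compares these values as described):

#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::vector<float> > Image;

// Box-filter downsample by a factor of 2 in each dimension.
static Image downsample(const Image &img) {
    int H = (int) img.size() / 2, W = (int) img[0].size() / 2;
    Image out(H, std::vector<float>(W));
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            out[y][x] = 0.25f * (img[2*y][2*x] + img[2*y][2*x+1]
                               + img[2*y+1][2*x] + img[2*y+1][2*x+1]);
    return out;
}

// power[l] = energy of the detail lost when downsampling at level l
// (highest frequencies first).
std::vector<double> spectrum(Image img, int levels) {
    std::vector<double> power;
    for (int l = 0; l < levels; l++) {
        Image low = downsample(img);
        double e = 0.0;
        for (std::size_t y = 0; y < img.size(); y++)
            for (std::size_t x = 0; x < img[0].size(); x++) {
                float diff = img[y][x] - low[y/2][x/2];   // box upsample and subtract
                e += diff * diff;
            }
        power.push_back(e);
        img = std::move(low);
    }
    return power;
}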

Dreaming

This is a combination of a few common tree processing tasks. Firstly, the longest path might just be within one of the original trees, i.e., a tree diameter. This can be computed recursively on a tree by determining, for each node, the two longest paths downwards via different children (one or both can be zero if there are fewer than 2 children). The diameter path will have a highest node, and so the diameter will be the sum of these two lengths.
When adding an edge to a tree, we must decide where to make the connection. The longest path from the connection point to anywhere in the tree ought to be as short as possible, and so for each point in the tree we need to know the distance to the furthest point. This is slightly more complex than before, since we also have to consider paths that start upwards. However, a second recursive walk (this time computing top-down instead of bottom-up) allows these distances to be found. For a given tree, let the radius be the distance from the optimal connection point to the furthest point in the tree.
Finally, we must decide how to connect the trees together. Sort the trees by decreasing radius r1 > r2 > .... Clearly, there will be a path of at least length r1 + r2 + L. If there are at least three trees, they can't all be connected to each other, so there must also be a path of at least length r2 + r3 + 2L. Conversely, by connecting the first tree to every other tree (always using the optimal connection points), it is not hard to see that there are no other paths that can be longer than the worst of these.
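In code, the final combination step might look like this (a sketch: r holds the radii in decreasing order, d is the largest diameter of any single tree, and L is the length of each new edge):

#include <algorithm>
#include <vector>

long long longest_path(const std::vector<long long> &r, long long d, long long L) {
    long long best = d;                                             // path inside one tree
    if (r.size() >= 2) best = std::max(best, r[0] + r[1] + L);      // via one new edge
    if (r.size() >= 3) best = std::max(best, r[1] + r[2] + 2 * L);  // via two new edges
    return best;
}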

Wombats

I found this to be the most difficult of the tasks. Apart from being conceptually difficult, it also required a reasonable amount of tuning, and my solution still takes over 10s in many cases.
The basis of the solution is to note that C is relatively small, and so it is feasible to precompute the costs to get from any point on row X to any point on row Y, for some X and Y. Let's write such a table as {X, Y}. What's less obvious is that it's possible to combine {X, Y} and {Y, Z} to produce {X, Z} in O(C^2) time. The trick is to use the fact that optimal paths won't cross over each other. Thus, if i < j, the optimal path from (X, i) to (Z, j-1) goes via (Y, p), and the optimal path from (X, i+1) to (Z, j) goes via (Y, q), then the optimal path from (X, i) to (Z, j) will go via (Y, r) where p ≤ r ≤ q. By iterating in order of increasing j - i, it is possible to compute {X, Z} in quadratic time.
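A sketch of the quadratic merge using this monotonicity (XY[i][k] is the cost from (X, i) to (Y, k) and YZ[k][j] from (Y, k) to (Z, j); iterating i downwards and j upwards keeps the bounding opt values available and gives the same O(C^2) total):

#include <climits>
#include <vector>

typedef std::vector<std::vector<long long> > Table;

Table combine(const Table &XY, const Table &YZ) {
    int C = (int) XY.size();
    Table XZ(C, std::vector<long long>(C));
    std::vector<std::vector<int> > opt(C, std::vector<int>(C, 0));
    for (int i = C - 1; i >= 0; i--)
        for (int j = 0; j < C; j++) {
            int lo = (j > 0) ? opt[i][j - 1] : 0;          // monotone lower bound
            int hi = (i + 1 < C) ? opt[i + 1][j] : C - 1;  // monotone upper bound
            long long best = LLONG_MAX;
            int arg = lo;
            for (int k = lo; k <= hi; k++) {
                long long cost = XY[i][k] + YZ[k][j];
                if (cost < best) { best = cost; arg = k; }
            }
            XZ[i][j] = best;
            opt[i][j] = arg;
        }
    return XZ;
}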
We can combine this observation with a segment tree: for each appropriate i and a, we maintain {a·2^i, (a+1)·2^i}, computing it either directly (i = 0) or by combining two smaller intervals as above (where the upper bound exceeds R - 1, it is clamped). Each change invalidates O(log R) of these, so the time to update after a change is O(C^2 log R). Queries can be answered in O(1) time using the root of the segment tree.
The catch with this approach is that it requires too much memory: we can't even afford R different {X, Y} tables. Instead of keeping {X, X+1} at the finest level of the segment tree, we can instead store, say, {10X, 10X+10} for each X, and use 1/10th the memory. The cost is that updating the base level of the segment tree will now take 10 times as long.

Sunday, February 24, 2013

Challenge 24 Electronic Contest

Here's how my team approached some of the problems in the online round of this year's Challenge 24. You can read the problems here.

A. Cutting back middle management

I didn't completely solve this: A10 was too big for my O(NM^2) algorithm. Consider a generalised version of the problem, in which we want to know how many managers can be fired for each possible number of subordinates the CEO ends up managing (0 to M). This can be solved by dynamic programming. Suppose we have solved it for some tree, and wish to graft on another subtree to the root. Given L, the number of subordinates that will end up reporting to the root, we pick the number A that are due to the new subtree, with the remaining L - A coming from the original tree. There is a special case when A = 1, in which case the root of the subtree may be retained. Merging in a subtree in this way requires iterating over L and over A, so takes O(M^2) time per edge and hence O(NM^2) for the whole tree.

I optimised this slightly by noting that near the bottom of the tree, most values of A are impossible to achieve, and do not need to be iterated over. This is sufficient to get A9 (after running for quite a long time though), but not A10.

B. Requirements

Firstly, how does the number of manuals grow? In one iteration, K manuals generate K(K - 1) new manuals, which together with the originals makes K^2 manuals. Thus, after T iterations there will be K^(2^T) manuals. Next, which of them will be ultimate manuals? We can compute this using inclusion-exclusion: for each subset of requirements, count the number of manuals that are missing (at least) that set of requirements, and either add or subtract it from the total depending on the parity. The final manuals that are missing a set of requirements are simply those produced from the initial manuals missing those requirements. Counting the initial manuals for each subset can be achieved in O(3^N) time.

The only thing left is to figure out how to efficiently compute K^(2^T) modulo P (where P = 1,000,000,007). Since P is prime, we can reduce 2^T modulo P - 1 without affecting the result. This is all that is needed to solve all test cases in this problem.
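
A sketch of that computation (assuming K is not a multiple of P, so Fermat's little theorem applies; the K % P == 0 case is handled separately):

typedef long long ll;
const ll P = 1000000007LL;

// b^e mod m by repeated squaring.
ll powmod(ll b, ll e, ll m) {
    ll r = 1 % m;
    b %= m;
    while (e > 0) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
        e >>= 1;
    }
    return r;
}

// K^(2^T) mod P, reducing the exponent modulo P - 1.
ll count_manuals(ll K, ll T) {
    if (K % P == 0) return 0;          // Fermat does not apply in this case
    ll e = powmod(2, T, P - 1);
    return powmod(K % P, e, P);
}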

C. Road roller

We didn't have time to do anything with this problem.

D. Octal CNC

We didn't solve this during the contest, but I wrote a solution afterwards using OpenCV. It consists of three parts:
  1. Locating the symbols
  2. Organising the symbols into rows
  3. Identifying the symbols
To locate the symbols, I first eroded the image by 8 pixels (which makes the black blobs larger), thresholded the image to classify each pixel as ink or paper, and detected connected components. For each connected component I took the bounding box (corrected for the erosion) and used this in subsequent steps.

To organise the bounding boxes into rows, I processed them left-to-right (by left edge), and for each symbol picked the row it best matched (by difference in height compared to the last symbol in the row). If a symbol failed some simple heuristics, it started its own row.

To identify the symbols I used a pretty crude image descriptor: I essentially downsampled the bounding box to a 3x3 image, and normalized the result (to compensate for the ink being lighter in some symbols). These 9 values worked fairly well but there were still some ambiguities, particularly with symbol 1. I added a tenth value, the logarithm of the bounding box aspect ratio. Descriptors were compared just on Euclidean distance.
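
Roughly, the descriptor computation looks like this (a sketch; patch holds the ink intensity of a bounding box in row-major order, and the exact details are illustrative):

#include <array>
#include <cmath>
#include <vector>

// 9 normalized cell sums from a 3x3 grid over the box, plus log aspect ratio.
std::array<float, 10> descriptor(const std::vector<float> &patch, int w, int h) {
    std::array<float, 10> d = {};
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            d[(3 * y / h) * 3 + (3 * x / w)] += patch[y * w + x];
    float norm = 0.0f;
    for (int i = 0; i < 9; i++) norm += d[i] * d[i];
    norm = std::sqrt(norm);
    if (norm > 0.0f)
        for (int i = 0; i < 9; i++) d[i] /= norm;    // compensate for lighter ink
    d[9] = std::log((float) w / (float) h);          // aspect-ratio feature
    return d;
}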

With this approach I solved all test cases except 5 (where there is a checksum failure). I haven't yet investigated which stage failed for this test case.

E. Stack compressor

This stack language is quite difficult to work with, because there is no way to rotate the stack or otherwise make any changes below the TOS without losing data. I realised afterwards that one can just copy the numbers you need from lower in the stack and keep growing the stack indefinitely (well, up to the 20 million limit). During the contest I didn't think of this, which limited what I could do.

The simplest way to do text compression is Huffman coding. I start with a lot of PUSH instructions to put the coded text on the stack, backwards. In each 32-bit word I use only 30 bits. Bit 30 I set to 1 to use as a sentinel (to detect when to move to the next word), and I use only non-negative values (to avoid C99's yucky semantics for negative mod). The encoding table is represented in code rather than data. At each state it does a divide by 2 to extract another bit and branches to a child state depending on whether the bit is zero or one. The leaves have an OUT instruction and a jump back to the start.

Putting the table into code causes quite a lot of code bloat. After the contest I made a different version that encoded the transitions into the stack, and which would have scored 100% on many of the test cases. However, there are still some test cases where the 100% goal is below the entropy limit for Huffman encoding, even if all 32 bits per word could be used. Getting 100% for those presumably requires some other encoding, such as encoding of repeated strings, or possibly making use of digraphs.

F. Backup communication

A little physics shows that the cost of a launch is proportional to the distance achieved. Thus, we need to find the point from which the sum of distances is maximised. This is difficult to do directly, but starting with the centroid and using Newton-Raphson iteration solves it easily.

G. Trains

We didn't manage to solve this one.