Sunday, February 24, 2013

Challenge 24 Electronic Contest

Here's how my team approached some of the problems in the online round of this year's Challenge 24. You can read the problems here.

A. Cutting back middle management

I didn't completely solve this: A10 was too big for my O(NM2) algorithm. Consider a generalised version of the problem, in which we want to know how many managers can be fired for each possible number of subordinates the CEO ends up managing (0 to M). This can be solved by dynamic programming. Suppose we have solved it for some tree, and wish to graft on another subtree to the root. Given L, the number of subordinates that will end up reporting to the root, we pick the number A that are due to the new subtree, with the remaining L - A coming from the original tree. There is a special case when A = 1, in which case the root of the subtree may be retained. Merging in a subtree in this way requires iterating over L and over A, so takes O(M2) time per edge and hence O(NM2) for the whole tree.

I optimised this slightly by noting that near the bottom of the tree, most values of A are impossible to achieve, and do not need to be iterated over. This is sufficient to get A9 (after running for quite a long time though), but not A10.

B. Requirements

Firstly, how does the number of manuals grow? In one iteration, K manuals generate K(K - 1) new manuals, which together with the originals makes K2 manuals. Thus, after T iterations there will be K2T manuals. Next, which of them will be ultimate manuals? We can compute this using inclusion-exclusion: for each subset of requirements, count the number of manuals that are missing (at least) that set of requirements, and either add or subtract it from the total depending on the parity. The final manuals that are missing a set of requirements are simply those produced from the initial manuals missing those requirements. Counting the initial manuals for each subset can be achieved in O(3N) time.

The only thing left is to figure out how to efficiently compute K2modulo P (where P = 1,000,000,007). Since P is a prime, we can reduce 2T module P-1 without affecting the result. This is all that is needed to solve all test cases in this problem.

C. Road roller

We didn't have time to do anything with this problem.

D. Octal CNC

We didn't solve this during the contest, but I wrote a solution afterwards using OpenCV. It consists of three parts:
  1. Locating the symbols
  2. Organising the symbols into rows
  3. Identifying the symbols
To locate the symbols, I first eroded the image by 8 pixels (which makes the black blobs larger), thresholded the image to classify each pixel as ink or paper, and detected connected components. For each connected component I took the bounding box (corrected for the erosion) and used this in subsequent steps.

To organise the bounding boxes into rows, I processed them left-to-right (by left edge), and for each symbol picked the row it best matched (by difference in height compared to the last symbol in the row). If it fails some simple heuristics it starts its own row.

To identify the symbols I used a pretty crude image descriptor: I essentially downsampled the bounding box to a 3x3 image, and normalized the result (to compensate for the ink being lighter in some symbols). These 9 values worked fairly well but there were still some ambiguities, particularly with symbol 1. I added a tenth value, the logarithm of the bounding box aspect ratio. Descriptors were compared just on Euclidean distance.

With this approach I solved all test cases except 5 (where this is a checksum failure). I haven't yet investigated which stage failed for this test case.

E. Stack compressor

This stack language is quite difficult to work with, because there is no way to rotate the stack or otherwise make any changes below the TOS without losing data. I realised afterwards that one can just copy the numbers you need from lower in the stack and keep growing the stack indefinitely (well, up to the 20 million limit). During the contest I didn't think of this, which limited how I managed to 

The simplest way to do text compression is Huffman coding. I start with a lot of PUSH instructions to put the coded text on the stack, backwards. In each 32-bit word I use only 30 bits. Bit 30 I set to 1 to use as a sentinel (to detect when to move to the next word), and I use only non-negative values (because C99 has no yucky semantics for negative mod). The encoding table is represented in code rather than data. At each state it does a divide by 2 to extract another bit and branches to a child state depending on whether the bit is zero or one. The leaves have an OUT instruction and a jump back to the start.

Putting the table into code causes quite a lot of code bloat. After the contest I made a different version that encoded the transitions into the stack, and which would have scored 100% on many of the test cases. However, there are still some test cases where the 100% goal is below the entropy limit for Huffman encoding, even if all 32 bits per word could be used. Getting 100% for those presumably requires some other encoding, such as encoding of repeated strings, or possibly making use of digraphs.

F. Backup communication

A little physics shows that the cost of a launch is proportional to the distance achieved. Thus, we need to find the point from which the sum of distances is maximised. This is difficult to do directly, but starting with the centroid and using Newton-Raphson iteration solves it easily.

G. Trains

We didn't manage to solve this one.

Thursday, September 27, 2012

IOI 2012 Day 2 analysis

Last Supper

Firstly, it's not even obvious how to simulate Leonardo's strategy efficiently. It can be done though. First, you need to be able to tell, for each request, what the next request of the same colour will be (we'll see why later). This can be computed by making a backwards pass over the data, while keeping track of the last seen request for each colour. Now we can implement Leonardo's strategy in a forward pass. Over time, let's keep track of the next time each colour will be needed. After each request, we need to update this for the just-requested colour to the point to the next instance for this colour. Each time Leonardo requests a paint colour that isn't on the scaffold, we can examine all the colours on the scaffold to determine which will be needed the furthest in the future. That would still require O(NK) time, but we can keep the next-needed-time for all the scaffold colours in a max priority queue. Since keys need to be changed as we go, a multi_set is a good choice in C++ for O(N log K) time.

With that in place, it's straightforward to implement the cases where we have at least ceil(log K) bits per request, because we can just encode the index of the colour on the scaffold to replace. However, to solve subtasks 4 and 5 you need something smarter. The key point is that you don't have to exactly implement Leonardo's algorithm. Let's do a thought experiment: Leonardo has two shelves on his scaffold, top and bottom. The top shelf contains the colours he will use at least once more before they're taken away. The bottom shelf contains the colours he knows will be taken away before they're used again. Since he knows the sequence of colours and he assumes his assistant follows his strategy, he can arrange the initial K paints on the right shelves, and every time he uses a paint he also puts it back on the right shelf. His assistant will only ever take away a colour from the bottom shelf, and Leonardo will only pick up paint either from his assistant or from the top shelf.

And now for the key observation: Leonardo never looks at the bottom shelf. Thus, it doesn't actually matter which colour the assistant takes off the bottom shelf. If he instead takes away an arbitrary colour from the bottom shelf, the set of top shelf paints will still be exactly as before, step after step. The only possible difference is that a requested colour might turn out now to already be on the bottom shelf, but this can't happen because it would contradict the optimality of Leonardo's strategy.

Thus, it is sufficient for the advice to provide enough information to know which colours are on which shelf, and the assistant can pick bottom-shelf colours arbitrary to take away. We need K bits to indicate the shelf for each of the initial K colours, and we need N bits to indicate which shelf each paint goes back on after it has been used. N + K bits is enough to solve all subtasks, and in fact subtask 5 can be solved with only 125000 bits.

Jousting tournament

There are a number of observations one can make. Firstly, let's say a knight is a candidate for a tournament if he either participates in the tournament or in another tournament whose winner was a candidate (recursively). Then it is easy to see that the candidates for a tournament form a contiguous interval of the original line of knights, and that the location of the interval is independent of the ranks of the knights. Computing this interval efficiently is not completely trivial, but let's come back to that and assume we know the candidate interval for every tournament. We can also see that the winner of any tournament is the candidate with highest rank.

Suppose the popular knight is a candidate for a tournament. We can immediately determine who all the other candidates for that tournament are, regardless of exactly where he is in the line. We can also compute whether he will win. Now, for each tournament going backwards in time, we can compute how many times he will win if that is his first tournament: if he loses it will be 0, otherwise it will be more one that the number for the tournament he advances to. We can also find the leftmost slot (if any) he could occupy that doesn't require advancement from a previous tournament. We then pick the tournament with the highest such score and take the leftmost slot (breaking ties leftwards as required).

Done naively this will require O(N2) time, which should suffice for the first 2 subtasks. We need a few  operations to make this small enough to tackle the final subtask. Firstly, we need to be able to determine the candidate ranges. This can be done with a binary indexed tree. As the tournaments are played, store an array of N values in a BIT, where each value is 1 for knights that haven't competed or that had the lowest index (not rank) of all candidates in their last tournament, and 0 for all others. For a new tournament (S, E), the first candidate will be the first knight whose (inclusive) prefix sum is S+1, and the last candidate will be one to the left of the first knight whose (inclusive) prefix sum is E+2. Binary searching for a given prefix sum can be implemented in O(log N) time, but we also need to find the bits to zero out in the BIT. These can be found by keeping the locations of all 1 bits in a linked list. Each bit is zeroed at most once, so this process takes O(N log N) time.

For each tournament, we also need to know to which tournament the winner advances. This can be computed at the same time as the previous step, by storing the tournament index in the linked list and building a tournament tree as we go.

Finally, we need to be able to determine the winner of each tournament, assuming it contains the popular knight. For this we need to determine the maximum value within a range of values. This is the standard range max query problem, for which there are a number of O(N log N) and even O(N) algorithms to answer O(N) queries on N items. A simple one is to precompute for each element the maximum of the next 1, 2, 4, 8, ... elements in O(N log N) time, which then allows queries to be answered in O(1) time.

UPDATE:  a commenter made a good observation: we don't care about the ranks of the other knights, only whether they are higher or lower than that of the popular knight. The range max queries can all be done on a boolean array, which is simpler. For example, by precomputing prefix sums one can determine the number of higher-ranked knights in an interval, and the popular knight will win a tournament if and only if the corresponding sum is zero.

Ideal City

This one had me totally stumped yesterday, but I woke up this morning with what I think is a solution. It depends on a conjecture. Let's call the x-weight of a path the number of horizontal steps, and similarly for y-weight. Let's call a path from A to B x-shortest if it has minimal x-weight, and let the x-distance from A to B be x-weight of this path (again, similarly for y-shortest and y-weight). I conjecture that the distance from A to B is the sum of the x-distance and the y-distance. Put another way, I conjecture that the shortest path from A to B is both x-shortest and y-shortest. I can't formally prove it, but the intuition is that any x-shortest path can be "straightened out" until it matches a shortest path, because there are no internal obstacles to get in the way.
Given that, we can separate the problem into finding the sum of x-distances and the sum of y-distances. Let's look at just the y-distances. The city can be divided into contiguous horizontal strips. Consider each such strip as a node in a graph, where two nodes are connected if the strips are directly connected. In fact, the graph must be a tree, since a cycle would imply that the city is not ideal. Also, travel within a strip is free when computing y-distance, so the y-distance between two blocks is equal to the number of tree edges between their corresponding nodes.
Computing the sum of all distances in a tree is a more conventional problem with conventional recursive techniques. For each subtree, compute the sum of all distances within the subtree, the number of blocks, and the sum of distances from every block to the root. When merging two trees at their roots, these values can be computed in O(1) time for the combined tree using the values for the original tree and the new subtree, so the process takes O(N) time.

UPDATE:  I realised that this construction can also prove the conjecture (as did a commenter). Consider the shortest path, and assume it is not y-shortest. Project it onto the tree of horizontal strips. It is then not a shortest path through this tree, since all y steps are projected to steps in this tree. But a non-shortest path through a tree must visit some vertex twice, which means that the original path exits some strip and then re-enters it. We could then shorten the path by short-cutting directly between the exit and entry points along the path.

Wednesday, September 26, 2012

IOI 2012 day 1 analysis

I'm not at IOI this year, but since I didn't spot anything on the web I thought I'd post my thoughts on the Day 1 tasks. It looks like quite a tough problem set, although mostly because there are a lot of subtasks to think about and cases to handle.

UPDATE: The official webpage now has a page with hints. They seem to have essentially the same solutions as me for Crayfish Scrivener and Parachute Rings.

Crayfish Scrivener

This is a task that I proposed.

Half the challenge here is thinking about the problem in the right way. Rather than trying to undo undo's, just think of undo as a time machine: it takes you back to exactly the same state as you were N operations ago. So a simple implementation is just to maintain an array of strings through time, and whenever an undo is encountered, copy the string that was N steps ago.

That's going to use up too much memory for the later subtasks. However, there is a lot of duplication, because for every L command, the next string looks almost the same as the previous one. We can compress the whole thing by using a trie: a tree where each edge is a letter, and each string is a path from the root through the tree. Any string can be encoded as just a pointer to a node in this tree. Adding a letter requires finding the child node with a particular letter (creating it if it doesn't already exist). An undo requires just a copy of a previous pointer. Thus, each step can be simulated in O(1) time and space.

There is still the problem of answering queries. For subtasks where the queries all follow the commands, this is quite easy: just extract the final string once into an array, and answer queries from the array. For the final subtask something smarter is required: some extra information must be stored in each node. A start is to store the node's depth and its parent. Each query can be translated into a query for the kth ancestor of the current node, and the parent pointers can be used to ascend the tree. To make it fast enough, one can also store pointers to the 2i-th ancestor for all relevant i, which makes it possible to answer queries in O(log N) time. It also requires O(N log N) space: since the task description doesn't list the memory requirements I'm not sure if that will fit. It's also possible to store only the parent and the 2i-th ancestor where 2i is the largest power of 2 dividing the depth, for O(N) space.

Parachute Rings

For this problem, the trick was keeping track of all the cases. Firstly, it is quite easy to keep a histogram of the degree of each ring. We can also keep track of the components, and the number of vertices and edges in each component (this can be kept in the root of a standard union-find structure). Now, let's consider cases:
  1. If there is a ring with at least 4 connections, it is the only possible critical ring, because removing any other ring will leave it with at least three connections. If there is more than one such ring, there can be no more critical rings.
  2. If there is a ring with exactly 3 connections, only it and its neighbours can be critical rings. There are thus at most four candidates to consider.
  3. If all rings have at most 2 connections, there could be many critical rings. Every component will be either a chain or a cycle, and these can be distinguished by comparing the number of edges and vertices in the component. If there are no cycles, all rings are critical. If there is one cycle, all rings in that cycle are critical. Otherwise there are no critical rings.
The last case can be checked as we go, but the others are a little more tricky because they identify only candidates, not definitely critical rings. However, at most four rings can become candidates during the lifetime of the program: the first to have at least three connections, plus those three neighbours. As soon as a ring becomes a candidate, we can replay all the previous connections on another union-find data structure with that ring removed, and thereafter maintain that structure so that it can be examined after future connections.

I've omitted one minor detail: we also need to maintain in each union-find structure whether there are 0, 1, or more cycles, and if 1, the size of the component with 1 cycle. This can quite easily be maintained as the connections are made. Thus, total memory is O(N) and total time is O(N α(N)).

Pebbling odometer

This seems like the hardest problem, if only because it takes the most time to work through all the different subtasks. I haven't tried to implement these, so my analysis may be a little off. In terms of implementation, the major annoyance of this language is the lack of subroutine calls. It is likely that you will want to write metaprograms to generate the programs, where a subroutine call is implemented by duplicating code and adjusting internal labels. There is also no mechanism for storing internal state, so state needs to be stored either in the instruction pointer (by duplicating code), or in the grid itself through counts of pebbles.

 Subtask 1

Since we don't need to preserve the pebbles, we can just pair off pebbles until one of the piles is empty. Here's an outline of the process, ignoring all the details of moving around the grid.
  1. If there is no pebble at (0, 0), then terminate at (0, 0).
  2. Pick up a pebble from (0, 0).
  3. If there is no pebble at (0, 1), then terminate at (0, 1).
  4. Pick up a pebble from (0, 1).
  5. Repeat.

Subtask 2

We can no longer destroy the state by picking up pebbles. However, we can just move them somewhere for safe keeping. Start by moving all the pebbles in (0, 0) to (1, 0) and from (0, 1) to (1, 1) (using a loop in each case). Then apply the algorithm for subtask 1, and once the result has been decided, move any remaining pebbles back to the top row. Note that there will need to be two code-paths for restoring the pebbles, depending on where you want to terminate.

Subtask 3

This should be doable by moving the pebbles towards each other until they meet. First move right until finding the first pebble. Then
  1. Pick up a pebble.
  2. Move forward one square.
  3. If there is already a pebble on this square, terminate.
  4. Add a pebble to this square.
  5. Continue forward until reaching the other pebble.
  6. Turn around.
  7. Repeat.

Subtask 4

Here it gets a little trickier to tell whether the programs will actually fit, but I think this should work. Firstly, note that there is room for only three commands per cell on average. That seems too little even to do a move, a pebble test, a border test and a jump back to the start of a loop. However, loop unrolling can trade program size for execution length e.g. ((move, pebble) * 4, border, jump) to handle 4 cells gives an average of 2.5 steps per cell. I'm assuming a walk which goes alternatively left and right across rows.

That makes it possible to walk the board in few enough steps, but what do we do when we find a pebble? There probably isn't enough program size to store the number of pebbles in the program counter, so we need to move the pebbles around. We can move the pebble back to the start of the row, then restart the row from the next spot. Even if there are several pebbles in a row, we will still make forward progress. Once we've reached the end of the board, we can process the two extreme columns in a similar way to move all the pebbles up to the top-left corner. Moving pebbles will be expensive (each will cost up to a few thousand instructions, depending on implementation), but as there are at most 15 of them it is not a major issue.

Subtask 5

I've got a few ideas for this one, but nothing that I'm confident will fit into 444 instructions. The fundamental issue is that there is no way to save state in the board, since the initial board can be in any legal state and a valid solution cannot destroy information (because it has to match the final board). Thus, the only way to save state is in the program counter, which leads to massive code duplication. One is quite likely to need to store the minimum count.

This probably requires some very careful micro-optimisation: Johnny Ho's perfect score in fact takes 449 instructions, and is rounded up.

UPDATE: see the official hints page for a solution to subtask 5.

Saturday, May 14, 2011

The myth of "up" and "down" in computer graphics

Today is another of my rare technical rants, er, blog posts. It's a topic that I struggled with a number of years ago until I understood how to think about it properly, and which I still see people struggling with regularly.

The OpenGL coordinate system causes a lot of confusion because it uses a "bottom-left" origin, which is seen as "upside-down" relative to other coordinate systems typically used in window systems. This leads to people putting in lots of extra code to "flip images over" to make them right, often in the wrong part of a pipeline and requiring further reflections elsewhere to correct for it; as well as mis-guided requests for features in various APIs.

The way to stop your brain hurting is to stop labelling things as "up" or "down". Up and down are functions of an eyeball, and so it can only be directly applied to things that you see on a screen. Windows on a screen are visible, and this is the one place where OpenGL really is a bottom-up coordinate system. However, textures are not directly visible on the screen, either for normal sampling or for render-to-texture, so the use of up and down in the specification is purely a convenience for describing the behaviour. When uploading a texture, the first texel that is provided has texel coordinates (0, 0), and when this texture is treated as a rendertarget, that same texel has window coordinates (0, 0). This is true in both OpenGL and in Direct3D - there's no need to make any changes.

Things get more complicated at interface boundaries with other formats that do specify an up and a down. For example, PNG files do not have a standard coordinate system, but they do have a well-defined top and a bottom. So if a PNG file is used as a texture, where should the coordinate system origin be placed? That's not obvious, and needs to be consistent for an entire toolchain e.g. if your 3D modelling package shows you previews in which texture coordinates of (0, 0) map to the bottom-left of your image, then you should probably do the same thing in an application that consumes those models. Note that not all image file formats work the same way: formats created specifically for use with textures (e.g. KTX) may specify an origin rather than an up/down orientation (KTX also has an optional hint to tell viewer/editor applications how to display the image). The asset format may also provide a convention e.g. COLLADA explicitly indicates that (0, 0) corresponds to the lower-left corner of a texture image.

So, to summarize:
  • Avoid using the terms "up" and "down" where they are not absolutely necessary. They'll just confuse you.
  • Correct applications never flip images "upside-down" - they just sometimes have to re-arrange pixels in memory to conform to an interface. An upside-down image is a bug.
  • When defining interfaces between systems which have a defined "up" and "down" (e.g. a PNG file) and systems which have a defined coordinate system (e.g. OpenGL textures), make sure you know what the correspondence is (using precedent set by thirdparty tools or file formats where possible), then stick it to throughout your toolchain.
Hopefully this will reduce the amount of confusion in the world.

Sunday, March 27, 2011

Going home!

Yes, after 3 years of cold, cloud, not quite so cold, travel, cloud, work, pub lunches, bad pizza, more cloud, rain, snow, more cloud, more work, more travel etc etc, I'M GOING HOME! That's right - as of 5 May, I'll be back in the Mother City, back to my old haunts with my old friends and enjoying and missing my Cambridge friends, instead of the other way around.

I'll be doing a post-doc in the computer science department at UCT. When people ask me exactly what it is I'm going to do I find it hard to explain (possibly because I don't precisely know yet myself), but you can read the job advert here.

I've decided that nanny states are a real pain. Time was, people just made sure they had something put away for when they couldn't work any more. Now governments have very complicated rules by which you get a tax break if you put money into a pension scheme which you then can't withdraw from and can only spend in certain ways, and it creates a huge amount of red tape. Frankly dealing with red tape is the most stressful part about moving back - the actual moving is stressful but a doddle by comparison.

Anyway, if you're friend/acquaintance in the UK and want to see me before I leave, get in touch; and if you're a Cape Town friend/acquaintance I'll see you soon!

Saturday, September 25, 2010

Bored in Vegas

As usual, my advice on Vegas is, don't bother. This trip, I've gotten sick (something flu-like), been fed far too much, sucked in a lot of signature smoke, been too hot outside and too cold inside, and now my flight home is indefinitely delayed due to mechanical problems (I'm now in the airport).

Unfortunately, I'm coming back in 2 weeks for the TopCoder Open. Oh well.

Sunday, September 05, 2010

Visualising sorting algorithms

This is another one of my rare technical posts, as opposed to news of which countries I've been visiting.

If you're in computer science, you've probably seen an animation of sorting algorithms, maybe heard a rendition, or seen a visual representation. I have, somewhat by accident, discovered a different way to visualise a sorting algorithm: plot points for memory accesses, with address on the X axis and time (counted by accesses) on the Y axis, and different colours for reads and writes. It produces some rather pretty pictures. Note that these are not to scale relative to each other - the Y axis has been compressed to fit the entire sort into a fixed height.

Ye olde bubblesort. Some of the patterns are an optical illusion due to aliasing, but the green spikes are a feature of the algorithm.

Insertion sort - the version optimized for mostly sorted content. Although the data is random, you can see that in many cases it reduces the search distance.Shellsort, clearly showing the phases.

Selection sort:Heapsort: the solid lines at the top are the heap-building phase, while the rest shows the extraction. Note the very slight slope to the bottom-right line: as the heap gets smaller, the heap extraction gets faster, but only as O(log N).
Divide-and-conquer algorithms have a pretty fractal nature. This is quicksort - the perturbations in the fractal indicate the random selection of pivots (it just picks the middle, rather than median-of-3). Mergesort: this diagram is twice as wide as the others because it uses temporary storage on the right.