Sunday, October 26, 2014

2014 NEERC Southern Subregional

The ICPC NEERC South Subregional was mirrored on Codeforces. It was a very nice contest, with some approachable but also challenging problems. Here are my thoughts on the solutions (I solved everything except A, J and L during the contest).

A: Nasta Rabbara

This is quite a nasty one, and my initial attempt during the contest was all wrong. The first thing to be aware of is that a single query can be answered in O(L log L) time (or less) using a modification of the standard union-find data structure: each edge in the union-find structure is labelled to indicate whether its end-points have the same or the opposite parity, and the find operation tracks whether the returned root has the same or opposite parity as the query point. That way, an edge that closes a cycle can be classified as closing either an odd or an even cycle. Of course, this won't be fast enough if all the queries are large.

Ideally, one would like a data structure that allows both new edges to be added and existing edges to be removed. That would allow for a sliding-window approach, in which we identify the maximal left end-point for each right end-point. However, the 10 second time limit suggests that this is not the right approach.

Instead, there is an \(O(N+(M+Q)\sqrt{M}\log N)\) solution. Divide the series into blocks of length \(\sqrt{M}\). For each block, identify all the queries with a right end-point inside the block. Now build up a union-find structure going right-to-left, starting from the left edge of the block. Whenever you hit the left end-point of one of the identified queries, add the remaining episodes for that query (which will all come from inside the block), answer the query, then undo the effect of these extra episodes before continuing. As long as you don't do path compression, each union operation can be unwound in O(1) time. This will miss queries that occur entirely inside a block, but those are short enough to be answered by the algorithm from the first paragraph.
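
Here is a minimal sketch of such a union-find structure with parity labels and O(1) rollback (union by size, no path compression, and an undo stack); the names and interface are illustrative rather than the exact code I used.

    #include <vector>
    #include <utility>

    struct ParityDSU {
        std::vector<int> parent, size;
        std::vector<int> parity;            // parity of the link to the parent
        std::vector<int> history;           // roots that were attached, for undo
        ParityDSU(int n) : parent(n), size(n, 1), parity(n, 0) {
            for (int i = 0; i < n; i++) parent[i] = i;
        }
        // returns {root, parity of x relative to the root}
        std::pair<int, int> find(int x) const {
            int p = 0;
            while (parent[x] != x) { p ^= parity[x]; x = parent[x]; }
            return {x, p};
        }
        // returns true if adding edge (a, b) closes an odd cycle
        bool unite(int a, int b) {
            auto [ra, pa] = find(a);
            auto [rb, pb] = find(b);
            if (ra == rb) {
                history.push_back(-1);      // nothing to undo
                return pa == pb;            // same parity => odd cycle
            }
            if (size[ra] < size[rb]) { std::swap(ra, rb); std::swap(pa, pb); }
            parent[rb] = ra;
            parity[rb] = pa ^ pb ^ 1;       // end-points must get opposite parity
            size[ra] += size[rb];
            history.push_back(rb);
            return false;
        }
        void undo() {                       // unwind the most recent unite()
            int rb = history.back(); history.pop_back();
            if (rb < 0) return;
            size[parent[rb]] -= size[rb];
            parent[rb] = rb;
            parity[rb] = 0;
        }
    };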

B: Colored blankets

It turns out that it is always possible to find a solution. Firstly, blankets with no colour can be given an arbitrary colour on one side. It is fairly easy to see that we need only allocate blankets to kits such that each kit contains blankets of at most two colours. Repeat the following until completion:
  1. If any colour has at most K/N blankets remaining and that colour has not been painted onto any kit, put those blankets into a new kit and paint it with that colour (this might put zero blankets into that kit).
  2. Otherwise, pick any colour that has not been painted on a kit. There must be more than K/N blankets of that colour. Use enough of them to fill up any non-full painted kit. There must be such a kit, otherwise there would be more than K blankets in total.
Since each kit is painted with a unique colour, this generates at most N kits; and since each kit has K/N blankets in it, it must generate exactly N kits.

C: Component tree

The only difficulty with this problem is that the tree might be very deep, causing the naive algorithm to spend a lot of time walking up the tree. This can be solved with a heavy-light decomposition. On each heavy path, for each attribute that appears anywhere on the path, store a sorted list (by depth) of the nodes containing that attribute. When arriving on a heavy path during a walk, a binary search can tell where the nearest ancestor with that property occurs on the heavy path. I think this makes each query O(log N) time.

D: Data center

This is a reasonably straightforward sliding window problem. Sort the servers of each type, and start with the minimum number of low voltage servers (obviously taking biggest first). One might also be required to take all the low voltage servers plus some high voltage servers. Then remove the low voltage servers one at a time (smallest first), and after each removal, add high voltage servers (largest first) until the capacity is made up. Then compare this combination to the best solution so far.

E: Election

Firstly, ties can be treated as losses (both within a station and in the election), because we want the mayor to win. When merging two stations, there are two useful results that can occur: win+loss -> win, or loss+loss -> loss; in both cases the number of wins stays the same, while the number of losses goes down by one. So they are equally useful. We can determine the number of viable merges by DP: either the last two can be merged, and we solve for the other N-2; or the last one is untouched, and we solve for the other N-1.

F: Ilya Muromets

Note that the gap closing up after a cut is just a distraction: any set of heads we can cut, can also be cut as two independent cuts of the original set of heads.

We can determine the best cut for every prefix in linear time, by keeping a sliding window of sums (or using prefix sums). Similarly, we can determine the best cut for every suffix. Any pair of cuts can be separated into a cut of a prefix and of the corresponding suffix, so we need only consider each split point in turn.
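
A sketch of this combination step, assuming (for illustration) that heads[i] holds the non-negative value of head i, m is the maximum width of a cut, and there are at least two heads:

    #include <vector>
    #include <algorithm>

    long long bestTwoCuts(const std::vector<long long>& heads, int m) {
        int n = (int)heads.size();
        std::vector<long long> pre(n + 1, 0);
        for (int i = 0; i < n; i++) pre[i + 1] = pre[i] + heads[i];
        auto winEnd = [&](int i) {   // window of length <= m ending at i
            return pre[i + 1] - pre[std::max(0, i + 1 - m)];
        };
        auto winStart = [&](int i) { // window of length <= m starting at i
            return pre[std::min(n, i + m)] - pre[i];
        };
        std::vector<long long> bestPrefix(n), bestSuffix(n);
        bestPrefix[0] = winEnd(0);
        for (int i = 1; i < n; i++) bestPrefix[i] = std::max(bestPrefix[i - 1], winEnd(i));
        bestSuffix[n - 1] = winStart(n - 1);
        for (int i = n - 2; i >= 0; i--) bestSuffix[i] = std::max(bestSuffix[i + 1], winStart(i));
        long long best = 0;
        for (int split = 0; split + 1 < n; split++)      // one cut left of the split, one right
            best = std::max(best, bestPrefix[split] + bestSuffix[split + 1]);
        return best;
    }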

G: FacePalm

Let's consider each window of k contiguous days in turn, going left to right. If the current sum is non-negative, we need to reduce some of the values. We might as well reduce the right-most value we can, since that affects as many future windows as possible (past windows have already been fixed to be negative, so they are not worth considering). So we reduce the last value as much as needed, or until it reaches the lower limit. If necessary, we then reduce the second-last value, and so on until the sum is negative.

The only catch is that we need a way to quickly skip over long runs of values already at the minimum, to avoid quadratic running time. I kept a cache of the previous non-minimum value (similar to path compression in union-find structures); a stack of the positions of non-minimum values should work too.
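
A sketch of the greedy with a path-compressed skip pointer; the exact problem shape here (integer values, a lower limit per day, strictly negative window sums, a feasible instance) is an assumption for illustration.

    #include <vector>
    #include <algorithm>

    // a[i]: planned value for day i; k: window length; lowLimit: smallest value a day
    // may be reduced to. Every window of k consecutive days must end up strictly negative.
    // Returns the total amount removed.
    long long fixWindows(std::vector<long long>& a, int k, long long lowLimit) {
        int n = (int)a.size();
        std::vector<int> skip(n);                 // like path compression in union-find
        for (int i = 0; i < n; i++) skip[i] = i - 1;
        auto findReducible = [&](int i) {         // right-most j <= i with a[j] > lowLimit
            int j = i;
            while (j >= 0 && a[j] == lowLimit) j = skip[j];
            for (int p = i; p >= 0 && a[p] == lowLimit; ) {  // compress the path we followed
                int nxt = skip[p]; skip[p] = j; p = nxt;
            }
            return j;
        };
        long long total = 0, windowSum = 0;
        for (int i = 0; i < n; i++) {
            windowSum += a[i];
            if (i >= k) windowSum -= a[i - k];
            if (i >= k - 1 && windowSum >= 0) {
                long long excess = windowSum + 1; // reduce until the sum is at most -1
                int j = i;
                while (excess > 0) {
                    j = findReducible(j);
                    if (j < 0) break;             // infeasible instance
                    long long cut = std::min(excess, a[j] - lowLimit);
                    a[j] -= cut; windowSum -= cut; total += cut; excess -= cut;
                }
            }
        }
        return total;
    }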

H: Minimal Agapov code

The first triangle will clearly consist of the minimum three labels. This divides the polygon into (up to) three sections. Within each section, the next triangle will always stand on the existing diagonal, with the third vertex being the lowest label in the section. This will recursively subdivide the polygon into two more sections with one diagonal, and so on. One catch is that ties need to be broken carefully: pick the one furthest from the base with the smaller label (this led to me getting a WA). Rather than considering all the tie-breaking cases for the first triangle, I started with a first diagonal with the two smallest labels, and found the first triangle through the recursion.

The main tool needed for this is an efficient range minimum query. There are a number of data structures for this, and any of them should work. I used two RMQ structures to cater for the two possible tie-breaking directions. The queries are cyclic rather than linear, which adds a slight complication, but it is just a matter of being careful.

I: Sale in GameStore

This was the trivial problem: sort, get your friends to buy the most expensive item, start filling up on cheap items until you can't buy any more or you've bought everything.

J: Getting Ready for the VIPC

I got a wrong answer to test case 53 on this, and I still don't know why. But I think my idea is sound.

The basic approach is dynamic programming, where one computes the minimum tiredness one can have after completing each contest, assuming it is possible to complete it (this also determines the resulting skill). To avoid some corner cases, I banned entering a contest with tiredness greater than \(h_i - l_i\). However, this is \(O(N^2)\), because for each contest, one must consider all possible previous contests.

The first optimisation is to make a list of outcomes for each day, assuming one enters a contest that day: a list of (result skill, tiredness), one for each contest. If one contest results in both less skill and more tiredness than another, it can be pruned, so that one ends up with a list that increases in both skill and tiredness. Now one can compute the DP for a contest by considering only each previous day, and finding the minimum element in the list for which the skill is great enough to enter the current contest. The search can be done by binary search, so if there are few distinct days with lots of contests each day, this will be efficient; but if every contest is on a different day, we're no better off.

The second optimisation is to note the interesting decay function for tiredness. After about 20 days of inactivity, tiredness is guaranteed to reach 0. Thus, there is no need to look more than this distance into the past: beyond 20 days, we only care about the maximum skill that can be reached on that day, regardless of how tired one is. This reduces the cost to \(O(N\log N\log maxT)\).

K: Treeland

Pick any vertex and take its nearest neighbour: this is guaranteed to be an edge; call the end-points A and B. An edge in a tree partitions the rest of the tree into two parts. For any vertex C, either d(A, C) < d(B, C) or vice versa, and this tells us which partition C belongs to. We can thus compute the partition, and then recursively solve the problem within each partition.

L: Useful roads

I didn't solve this during the contest, and I don't know exactly how to solve it yet.

M: Variable shadowing

This is another reasonably straightforward implementation problem. For each variable I keep a stack of declarations, with each declaration tagged with the source position and a "scope id" (each new left brace creates a new scope id). I also keep a stack of open scope ids. When a right brace arrives, I check the top-of-stack for each variable to see if it matches the just closed scope id, and if so, pop the stack.

Friday, July 18, 2014

IOI 2014 day 2 analysis

I found day 2 much harder than day 1, and I still don't know how to solve all the problems (I am seriously impressed by those getting perfect scores). Here's what I've managed to figure out so far.

Update: I've now solved everything (in theory), and the solutions are below. The official solutions are now also available on the IOI website. I'll try coding the solutions at some point if I get time.

Gondola

This was the easiest of the three. Firstly, what makes a valid gondola sequence? In all the subtasks of this problem, there will be two cases. If you see any of the numbers 1 to n, that immediately locks in the phase, and tells you the original gondola for every position. Otherwise, the phase is unknown. So, the constraints are that
  • if the phase is known, every gondola up to n must appear in the correct spot if it appears;
  • no two gondolas can have the same number.
Now we can consider how to construct a replacement sequence (and also to count them), which also shows that these conditions are sufficient. If the phase is not locked, pick it arbitrarily. Now the "new gondola" column is simply the numbers from n+1 up to the largest gondola, so picking a replacement sequence is equivalent to deciding which gondola replaces each broken gondola. We can assign each gondola greater than n that we can't see to a position (one where the final gondola number is larger), and this will uniquely determine the replacement sequence. We'll call such gondolas hidden.

For the middle set of subtasks, the simplest thing is to assign all hidden gondolas to one position, the one with the highest-numbered gondola in the final state. For counting the number of possible replacement sequences, each hidden gondola can be assigned independently, so we just multiply together the number of options, and also remember to multiply by n if the phase is unknown. In the last subtask there are too many hidden gondolas to deal with one at a time, but they can be handled in batches (those between two visible gondolas), using fast exponentiation.
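
The batching only needs standard fast (binary) exponentiation; a small sketch, with the modulus left as a parameter:

    #include <cstdint>

    // Computes base^exp mod `mod`, squaring and halving the exponent each step.
    uint64_t powMod(uint64_t base, uint64_t exp, uint64_t mod) {
        uint64_t result = 1 % mod;
        base %= mod;
        while (exp > 0) {
            if (exp & 1) result = (__uint128_t)result * base % mod;
            base = (__uint128_t)base * base % mod;
            exp >>= 1;
        }
        return result;
    }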

Friend

This is a weighted maximum independent set problem. On a general graph this is NP-hard, so we will need to exploit the curious way in which the graph is constructed. I haven't figured out how to solve the whole problem, but let's work through the subtasks:
  1. This is small enough to use brute force (consider all subsets and check whether they are independent).
  2. The graph will be empty, so the sample can consist of everyone.
  3. The graph will be complete, so only one person can be picked in a sample. Pick the best one.
  4. The graph will be a tree. There is a fairly standard tree DP to handle this case: for every subtree, compute the best answer, either with the root excluded or included. If the root is included, add up the root-excluded answers for every subtree; otherwise add up the best of the two for every subtree. This takes linear time.
  5. In this case the graph is bipartite and the vertices are unweighted. This is a standard problem which can be solved by finding the maximum bipartite matching. The relatively simple flow-based algorithm for this is theoretically \(O(n^3)\), but it is one of those algorithms that tends to run much faster in most cases, so it may well be sufficient here.
The final test-case clearly requires a different approach, since n can be much larger. I only managed to figure this out after getting a big hint from the SA team leader, who had seen the official solution.

We will process the operations in reverse order. For each operation, we will transform the graph into one that omits the new person, but for which the optimal solution has the same score. Let's say that the last operation had A as the host and B as the invitee, and consider the different cases:
  • YourFriendsAreMyFriends: this is the simplest: any solution using B can also use A, and vice versa. So we can collapse the two vertices into one whose weight is the sum of the original weights, and use it to replace A.
  • WeAreYourFriends: this is almost the same, except now we can use at most one of A and B, and which one we take (if either) has no effect on the rest of the graph. So we can replace A with a single vertex having the larger of the two weights, and delete B.
  • IAmYourFriend: this is a bit trickier. Let's start with the assumption that B will form part of the sample, and add that to the output value before deleting it. However, if we later decide to use A, there will be a cost to remove B again; so A's weight decreases by the weight of B. If it ends up with negative weight, we can just clamp it to 0.
Repeat this deletion process until only the original vertex is left; the answer will be the weight of this vertex, plus the saved-up weights from the IAmYourFriend steps.
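
A sketch of this reverse pass; the integer encoding of the three protocols and the input layout are assumptions made for illustration.

    #include <vector>
    #include <algorithm>

    // Person b > 0 was added with host[b] and protocol[b]:
    //   1 = YourFriendsAreMyFriends, 2 = WeAreYourFriends, 0 = IAmYourFriend.
    // weight[i] is the confidence value of person i.
    long long maxSample(std::vector<long long> weight,
                        const std::vector<int>& host,
                        const std::vector<int>& protocol) {
        long long banked = 0;                      // weight committed by IAmYourFriend steps
        for (int b = (int)weight.size() - 1; b >= 1; b--) {
            int a = host[b];
            if (protocol[b] == 1) {                // take both or neither: merge weights
                weight[a] += weight[b];
            } else if (protocol[b] == 2) {         // at most one of the two: keep the better
                weight[a] = std::max(weight[a], weight[b]);
            } else {                               // IAmYourFriend: assume B is taken,
                banked += weight[b];               // and charge A for evicting it later
                weight[a] = std::max(0LL, weight[a] - weight[b]);
            }
        }
        return banked + weight[0];
    }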

Holiday

Consider the left-most and right-most cities that Jian-Jia visits. Regardless of where he stops, he will need to travel from the start city to one of the ends, and from there to the other end. There is no point in doing any other back-tracking, so we can tell how many days he spends travelling just from the end-points. This then tells us how many cities he has time to see attractions in, and obviously we will pick the best cities within the range.

That's immediately sufficient to solve the first test case. To solve more, we can consider an incremental approach. Fix one end-point, and gradually extend the other end-point, keeping track of the best cities (and their sum) in a priority queue (with the worst of the best cities at the front). As the range is extended, the number of cities that can be visited shrinks, so items will need to be popped. Of course, the next city in the range needs to be added each time as well. Using a binary heap, this gives an \(O(n^2\log n)\) algorithm: a factor of n for each endpoint, and the \(\log n\) for the priority queue operations. That's sufficient for subtask 3.  It's also good enough for subtask 2, because the left endpoint will be city 0, saving a factor of n.

For subtask 4, it is clearly not possible to consider every pair of end-points. Let's try to break things up. Assume (without loss of generality) that we move first left, then back to the start, then right. Let's compute the optimal solution for the left part and the right part separately, then combine them. The catch is that we need to know how we are splitting up our time between the two sides. So we'll need to compute the answer for each side for all possible number of days spent within each side. This seems to leave us no better off, since we're still searching within a two-dimensional space (number of days and endpoint), but it allows us to do some things differently.

We'll just consider the right-hand side. The left-hand side is similar, with minor changes because we need two days for travel (there and back) instead of one. Let f(d) be the optimal end-point if we have d days available. Then with a bit of work one can show that f is non-decreasing (provided one is allowed to pick amongst ties). If we find f(d) for d=1, 2, 3, ... in that order, it doesn't really help: we're only, on average, halving the search space. But we can do better by using a divide-and-conquer approach: if we need to find f for all \(d \in [0, D)\) then we start with \(d = \frac{D}{2}\) to subdivide the space, and then recursively process each half of the interval on disjoint subintervals of the cities. This reduces the search space to \(O(n\log n)\).
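
The monotonicity of f gives the usual divide-and-conquer-optimisation recursion over the days. Here is a generic skeleton; score(d, e) is a stand-in for "attractions collectable with d available days and right end-point e", which in the real solution is evaluated incrementally rather than by a full scan as below.

    #include <vector>
    #include <functional>

    // Fill best[d] for all d in [dLo, dHi), knowing the optimal end-point lies in [eLo, eHi).
    void solveRange(int dLo, int dHi, int eLo, int eHi,
                    const std::function<long long(int, int)>& score,
                    std::vector<long long>& best) {
        if (dLo >= dHi || eLo >= eHi) return;
        int d = (dLo + dHi) / 2;
        int bestE = eLo;
        long long bestVal = -1;
        for (int e = eLo; e < eHi; e++) {          // find the optimal end-point for the midpoint
            long long v = score(d, e);
            if (v > bestVal) { bestVal = v; bestE = e; }
        }
        best[d] = bestVal;
        solveRange(dLo, d, eLo, bestE + 1, score, best);   // smaller d: end-point cannot be larger
        solveRange(d + 1, dHi, bestE, eHi, score, best);   // larger d: end-point cannot be smaller
    }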

This still leaves the problem of efficiently finding the total number of attractions that can be visited for particular intervals and available days. The official solution uses one approach, based on a segment tree over the cities, sorted by number of attractions rather than position. The approach I found is, I think, simpler. Visualise the recursion described above as a tree; instead of working depth-first (i.e., recursively), we work breadth-first. We make \(O(\log n)\) passes, and in each pass we compute f(d) where d is an odd multiple of \(2^i\) (with \(i\) decreasing with each pass). Each pass can be done in a single incremental process, similar to the way we tackled subtask 2. The difference is that each time we cross into the next subinterval, we need to increase \(d\), and hence bring more cities into consideration. To do this, we need either a second priority queue of excluded cities, or we can replace the priority queue with a balanced binary tree. Within each pass, d can only be incremented \(O(n)\) times, so the total running time will be \(O(n\log n)\) per pass, or \(O(n\log^2 n)\) overall.

Wednesday, July 16, 2014

IOI 2014 day 1 analysis

IOI 2014 day 1

Since there is no online judge, I haven't tried actually coding any of these. So these ideas are not validated yet. You can find the problems here.

Rails

I found this the most difficult of the three to figure out, although coding it will not be particularly challenging.

Firstly, we can note that distances are symmetric: a route from A to B can be reflected in the two tracks to give a route from B to A. So having only \(\frac{n(n-1)}{2}\) queries is not a limitation, as we can query all distances. This might be useful in tackling the first three subtasks, but I'll go directly to the hardest subtask.

If we know the position and type of a station, there is one other that we can immediately locate: the closest one. It must have the opposite type and be reached by a direct route. Let station X be the closest to station 0. The other stations can be split into three groups:
  1. d(X, Y) < d(X, 0): these are reached directly from station X and of type C, so we can locate them exactly.
  2. d(0, X) + d(X, Y) = d(0, Y), but not already in group 1: these are reached from station 0 via station X, so they lie to the left of station 0.
  3. All other stations lie to the right of station X.
Let's now consider just the stations to the right of X, and see how to place them. Let's take them in increasing distance from 0. This ensures that we encounter all the type D stations in order, and any type C station will be encountered at some point after the type D station used to reach it. Suppose Y is the right-most type D station already encountered, and consider the distances for a new station Z. Let \(z = d(0, Z) - d(0, Y) - d(Y, Z)\). If Z is type C, then there must be a type D at distance \(\frac{z}{2}\) to the left of Y. On the other hand, if Z is of type D (and lies to the right of Y), then there must be a type C station at distance \(\frac{z}{2}\) to the left of Y. In the first case, we will already have encountered the station, so we can always distinguish the two cases, and hence determine the position and type of Z.
The stations to the left of station zero can be handled similarly, using station X as the reference point instead of station 0.
How many queries is this? Every station Z except 0 and X accounts for at most three queries: d(0, Z), d(X, Z) and d(Y, Z), where Y can be different for each Z. This gives \(3(n-2) + 1\), which I think can be improved to \(3(n-2)\) just by counting more carefully. Either way, it is sufficient to solve all the subtasks.

Wall

This is a fairly standard interval tree structure problem, similar to Mountain from IOI 2005 (but a little easier). Each node of the tree contains a range to which its children are clamped. To determine the value of any element of the wall, start at the leaf with a value of 0 and read up the tree, clamping the value to the range in each node in turn. Initially, each node has the range [0, inf). When applying a new instruction, it is done top-down, and clamps are pushed down the tree whenever recursion is necessary.
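
A sketch of such a clamp-range tree (the interface names are mine): phase 1 "add" operations raise the floor, phase 2 "remove" operations lower the ceiling, and a leaf's final height is 0 pushed through the clamps on its path.

    #include <vector>
    #include <algorithm>
    #include <climits>

    struct WallTree {
        int n;
        std::vector<int> lo, hi;                  // each node clamps to [lo, hi]
        WallTree(int n) : n(n), lo(4 * n, 0), hi(4 * n, INT_MAX) {}
        // compose "clamp to [a, b]" on top of a node's existing clamp
        void apply(int node, int a, int b) {
            lo[node] = std::min(std::max(lo[node], a), b);
            hi[node] = std::min(std::max(hi[node], a), b);
        }
        void push(int node) {                     // push the clamp down to both children
            apply(2 * node, lo[node], hi[node]);
            apply(2 * node + 1, lo[node], hi[node]);
            lo[node] = 0; hi[node] = INT_MAX;     // identity clamp (heights are non-negative)
        }
        void update(int node, int nl, int nr, int l, int r, int a, int b) {
            if (r < nl || nr < l) return;
            if (l <= nl && nr <= r) { apply(node, a, b); return; }
            push(node);
            int mid = (nl + nr) / 2;
            update(2 * node, nl, mid, l, r, a, b);
            update(2 * node + 1, mid + 1, nr, l, r, a, b);
        }
        void addPhase(int l, int r, int h)    { update(1, 0, n - 1, l, r, h, INT_MAX); }
        void removePhase(int l, int r, int h) { update(1, 0, n - 1, l, r, 0, h); }
        void finalHeights(int node, int nl, int nr, std::vector<int>& out) {
            if (nl == nr) { out[nl] = std::min(std::max(0, lo[node]), hi[node]); return; }
            push(node);
            int mid = (nl + nr) / 2;
            finalHeights(2 * node, nl, mid, out);
            finalHeights(2 * node + 1, mid + 1, nr, out);
        }
    };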

An interesting aspect of the problem is that it is offline, in that only the final configuration is requested and all the information is provided up-front. This makes me think that there may be an alternative solution that processes the data in a different order, but I can't immediately see a nicer solution than the one above.

Game

I liked this problem, partly because I could reverse-engineer a solution from the assumption that it is always possible to win, and partly because it requires neither much algorithm/data-structure training (like Wall) nor tricky consideration of cases (like Rails). Suppose Mei-Yu knows that certain cities are connected. If there are any flights between the cities that she has not asked about, then she can win simply by saving one of these flights for last, since it will not affect whether the country is connected. It follows that for Jian-Jia to win, he must always answer no when asked about a flight between two components that Mei-Yu does not know to be connected, unless this is the last flight between these components.

What if he always answers yes to the last flight between two components? In this case he will win. As long as there are at least two components left, there are uncertain edges between every pair of them, so Mei-Yu can't know whether any of them is connected to any other. All edges within a component are known, so the number of components can only become one after the last question.

What about complexity? We need to keep track of the number of edges between each pair of components, which takes \(O(N^2)\) space. Most operations will just decrement one of these counts. There will be \(N - 1\) component-merging operations, each of which requires a linear-time merging of these edge counts and updating a vertex-to-component table. Thus, the whole algorithm requires \(O(N^2)\) time. This is optimal given that Mei-Yu will ask \(O(N^2)\) questions.
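
A sketch of that bookkeeping (names are illustrative); answer(u, v) returns Jian-Jia's reply to a question about the flight between cities u and v.

    #include <vector>

    struct GameStrategy {
        int n;
        std::vector<int> comp;                        // city -> component id
        std::vector<std::vector<int>> remaining;      // unasked flights between components
        GameStrategy(int n) : n(n), comp(n), remaining(n, std::vector<int>(n, 0)) {
            for (int v = 0; v < n; v++) comp[v] = v;
            for (int a = 0; a < n; a++)
                for (int b = 0; b < n; b++)
                    if (a != b) remaining[a][b] = 1;  // one potential flight per pair of cities
        }
        bool answer(int u, int v) {
            int a = comp[u], b = comp[v];
            if (a == b) return false;                 // already known to be connected
            remaining[a][b]--; remaining[b][a]--;
            if (remaining[a][b] > 0) return false;    // say no while other flights remain
            // last flight between the two components: say yes and merge b into a
            for (int c = 0; c < n; c++) {
                if (c == a || c == b) continue;
                remaining[a][c] += remaining[b][c];
                remaining[c][a] += remaining[c][b];
                remaining[b][c] = remaining[c][b] = 0;
            }
            for (int w = 0; w < n; w++) if (comp[w] == b) comp[w] = a;
            return true;
        }
    };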

Monday, June 30, 2014

ICPC Problem H: Pachinko

Problem A has been covered quite well elsewhere, so I won't discuss it. That leaves only problem H. I started by reading the Google translation of a Polish writeup by gawry. The automatic translation wasn't very good, but it gave me one or two ideas I borrowed. I don't know how my solution compares in length of code, but I consider it much simpler conceptually.

This is a fairly typical Markov process, where a system is in some state, and at each timestep it randomly moves to a new state, with probabilities determined by the current state. One variation is that the process stops once the ball reaches a target, whereas Markov processes don't terminate. I was initially going to model that as the ball always moving from the target to itself, but things would have become slightly complicated.

Gawry has a nice way of making this explicitly a matrix problem. Set up the matrix M as for a Markov process i.e., \(M_{i,j}\) is the probability of a transition from state j to state i. However, for a target state j, we set \(M_{i,j}=0\) for all i. Now if \(b\) is our initial probability vector (equal probability for each empty spot in the first row), then \(M^t b\) represents the probability of the ball being in each position (and the game not having previously finished) after \(t\) timesteps. We can then say that the expected amount of time the ball spends in each position is given by \(\sum_{t=0}^{\infty} M^t b\). The sum of the elements in this vector is the expected length of a game and we're told that it is less than \(10^9\), so we don't need to worry about convergence. However, that doesn't mean that the matrix series itself converges: Gawry points out that if there are unreachable parts of the game with no targets, then the series won't converge. We fix that by doing an initial flood-fill to find all reachable cells and only use those in the matrix. Gawry then shows that under the right conditions, the series converges to \((I - M)^{-1} b\).

This is where my solution differs. Gawry dismisses Gaussian elimination, because the matrix can be up to 200,000 square. However, this misses the fact that it is banded: by numbering cells left-to-right then top-to-bottom, we ensure that every non-zero entry in the matrix is at most W cells away from the main diagonal. Gaussian elimination (without pivoting) preserves this property. We can exploit this both to store the matrix compactly, and to perform Gaussian elimination in \(O(W^3H)\) time.
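
A sketch of the banded elimination: band[i][k] stores the matrix entry in row i, column i - W + k, so each row only keeps the 2W+1 entries near the diagonal. No pivoting is done, which relies on the structure discussed below.

    #include <vector>
    #include <algorithm>

    // Solve A x = rhs where A is banded with half-bandwidth W.
    std::vector<double> solveBanded(std::vector<std::vector<double>> band,
                                    std::vector<double> rhs, int W) {
        int n = (int)rhs.size();
        // forward elimination: rows below the pivot within the band
        for (int col = 0; col < n; col++) {
            for (int row = col + 1; row <= std::min(n - 1, col + W); row++) {
                double factor = band[row][col - row + W] / band[col][W];
                if (factor == 0.0) continue;
                for (int c = col; c <= std::min(n - 1, col + W); c++)
                    band[row][c - row + W] -= factor * band[col][c - col + W];
                rhs[row] -= factor * rhs[col];
            }
        }
        // back substitution
        std::vector<double> x(n);
        for (int row = n - 1; row >= 0; row--) {
            double sum = rhs[row];
            for (int c = row + 1; c <= std::min(n - 1, row + W); c++)
                sum -= band[row][c - row + W] * x[c];
            x[row] = sum / band[row][W];
        }
        return x;
    }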

One concern is the "without pivoting" caveat. I was slightly surprised that my first submission passed. I think it is possible to prove correctness, however. Gaussian elimination without pivoting is known (and easily provable) to work on strictly column diagonally dominant matrices. In our case the diagonal dominance is weak: columns corresponding to empty cells have a sum of zero, those corresponding to targets have a 1 on the diagonal and zeros elsewhere. However, the matrix is also irreducible, which I think is enough to guarantee that there won't be any division by zero.

EDIT: actually it's not guaranteed to be irreducible, because the probabilities can be zero and hence it's possible to get from A to B without being able to get from B to A. But I suspect that it's enough that one can reach a target from every state.

Saturday, June 28, 2014

ICPC Problem L: Wires

While this problem wasn't too conceptually difficult, it requires a lot of code (my solution is about 400 lines), and careful implementation of a number of geometric algorithms. A good chunk of the code comes from implementing a rational number class in order to precisely represent the intersection points of the wires. It is also very easy to suffer from overflow: I spent a long time banging my head against an assertion failure on the server until I upgraded my rational number class to use 128-bit integers everywhere, instead of just for comparisons.

The wires will divide the space up into connected regions. The regions can be represented in a planar graph, with edges between regions that share an edge. The problem is then to find the shortest path between the regions containing the two new end-points.

My solution works in a number of steps:
  1. Find all the intersection points between wires, and the segments between intersection points. This just tests every wire against every other wire. The case of two parallel wires sharing an endpoint needs to be handled carefully. For each line, I sort the intersection points along the line. I used a dot product for this, which is where my rational number class overflowed, but it would probably have been safer to just sort lexicographically. More than two lines can meet at an intersection point, so I used a std::map to assign a unique ID to each intersection point (I'll call them vertices from here on).
  2. Once the intersection points along a line have been sorted, one can identify the segments connecting them. I create two copies of each segment, one in each direction. With each vertex A I store a list of all segments A->B. Each pair is stored contiguously so that it is trivial to find its partner. Each segment is considered to belong to the region to its left as one travels A->B.
  3. The segments emanating from each vertex are sorted by angle. These comparisons could easily cause overflows again, but one can use a handy trick: instead of using the vector for the segment in an angle comparison, one can use the vector for the entire wire. It has identical direction but has small integer coordinates.
  4. Using the sorted lists from the previous step, each segment is given a pointer to its following segment from the same region. In other words, if one is tracing the boundary of the region and one has just traced A->B, the pointer will point to B->C.
  5. I extract the contours of the regions. A region typically consists of an outer contour and optionally some holes. The outermost region lacks an outer contour (one could add a big square if one needed to, but I didn't). A contour is found by following the next pointers. A case that turns out to be inconvenient later is that some segments might be part of the contour but not enclose any area. This can make a contour disappear completely, in which case it is discarded. Any remaining contours have the property that two adjacent segments are not dual to each other, although it is still possible for both sides of an edge to belong to the same contour.
  6. Each contour is identified as an outer contour or a hole. With integer coordinates I could just measure the signed area of the polygon, but that gets nasty with rational coordinates. Instead, I pick the lexicographically smallest vertex in the contour and examine the angle between the two incident segments (this is why it is important that there is a non-trivial angle between them). I also sort the contours by this lexicographically smallest vertex, which causes any contour to sort before any other contours it encloses.
  7. For each segment I add an edge of weight 1 from its containing region to the containing region of its dual.
  8. For each hole, I search backwards through the other contours to find the smallest non-hole that contains it. I take one vertex of the hole and do a point-in-polygon test. Once again, some care is needed to avoid overflows, and using the vectors for the original wires proves useful. One could then associate the outer contour and the holes it contains into a single region object, but instead I just added an edge to the graph to join them with weight 0. In other words, one can travel from the boundary of a region to the outside of a hole at no cost.
  9. Finally, I identify the regions containing the endpoints of the new wire, using the same search as in the previous step.
After all this, we still need to implement a shortest path search - but by this point that seems almost trivial in comparison.

What is the complexity? There can be \(O(M^2)\) intersections and hence also \(O(M^2)\) contours, but only \(O(M)\) of them can be holes (because two holes cannot be part of the same connected component). The slowest part is the fairly straightforward point-in-polygon test which tests each hole against each non-hole segment, giving \(O(M^3)\) time. There are faster algorithms for doing point location queries, so it is probably theoretically possible to reduce this to \(O(M^2\log N)\) or even \(O(M^2)\), but certainly not necessary for this problem.

ICPC Problem J: Skiing

I'll start by mentioning that there is also some analysis discussion on the Topcoder forums, which includes alternative solutions that in some cases I think are much nicer than mine.

So, on to problem J - which no team solved, and only one team attempted. I'm a little surprised by this, since it's the sort of problem where one can do most of the work on paper while someone else is using the machine. A possible reason is that it's difficult to determine the runtime: I just coded it up and hoped for the best.

It is a slightly annoying problem for a few reasons. Firstly, there are a few special cases, because \(v_y\) or \(a\) can be zero. We also have to find the lexicographically smallest answer.

Let's deal with those special cases first. If \(v_y = 0\) then we can only reach targets with \(y = 0\). If \(a \not= 0\) then we can reach all of them in any order, otherwise we can only reach those with \(x = 0\). To get the lexicographically smallest answer, just go through them in order and print out those that are reachable. I actually dealt with the \(v_y \not= 0\), \(a = 0\) case as part of my general solution, but it is also easy enough to deal with separately. We can only reach targets with \(x = 0\), and we can only reach them in increasing order of y. One catch is that targets are not guaranteed to have distinct coordinates, so if two targets have the same coordinates we must be careful to visit them in lexicographical order. I handled this just by using a stable sort.

So, onwards to the general case. Before we can start doing any high-level optimisation, we need to start by answering this question: if we are at position P with X velocity \(v\) and want to pass through position Q, what possible velocities can we arrive with? I won't prove it formally, but it's not hard to believe that to arrive with the smallest velocity, you should start by accelerating (at the maximum acceleration) for some time, then decelerate for the rest of the time. Let's say that the total time between P and Q is \(T\), the X separation is \(x\), the initial acceleration time is \(T - t\) and the final deceleration time is \(t\). Some basic integration then gives
\[x = v(T-t)+\tfrac{1}{2}a(T-t)^2 + (v+a(T-t))t - \tfrac{1}{2}at^2.\]
This gives a quadratic in \(t\) where the linear terms conveniently cancel, leaving
\[t = \sqrt{\frac{vT+\tfrac{1}{2}aT^2-x}{a}}\]
The final velocity is just \(v+aT-2at\). We can also find the largest possible arrival velocity simply by replacing \(a\) with \(-a\). Let's call these min and max velocity functions \(V_l(v, T, x)\) and \(V_h(v, T, x)\).
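
In code these are just the formulas above, assuming the target is actually reachable, i.e. \(|vT - x| \le \frac{1}{2}aT^2\):

    #include <cmath>

    // Smallest arrival X velocity: accelerate for T - t, then decelerate for t.
    double arriveMin(double v, double T, double x, double a) {
        double t = std::sqrt((v * T + 0.5 * a * T * T - x) / a);
        return v + a * T - 2.0 * a * t;
    }
    // Largest arrival X velocity: the same derivation with a negated.
    double arriveMax(double v, double T, double x, double a) {
        return arriveMin(v, T, x, -a);
    }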

More generally, if we can leave P with a given range of velocities \([v_0, v_1]\), with what velocities can we arrive at Q? We first need to clamp the start range to the range from which we can actually reach Q i.e., that satisfy \(|vT - x| \le \frac{1}{2}aT^2\). A bit of calculus shows that \(V_l\) and \(V_h\) are decreasing functions of \(v\), so the arrival range will be \([V_l(v_1), V_h(v_0)]\).

Finally, we are ready to tackle the problem as a whole, with multiple targets. We will use dynamic programming to answer this question, starting from the last target (highest Y): if I start at target i and want to hit a total of j targets, what are the valid X velocities at i? The answer will in general be a set of intervals. We will make a pseudo-target at (0, 0), and the answer will then be the largest j for which 0.0 is a valid velocity at this target.

Computing the DP is generally straightforward, except that the low-level queries we need are the reverse of those we discussed above i.e. knowing the arrival velocities we need to compute departure velocities. No problem, this sort of physics is time-reversible, so we just need to be careful about which signs get flipped. For each (i, j) we consider all options for the next target, back-propagate the set of intervals from that next target, and merge them into the answer for (i, j). Of course, for \(j = 1\) we use the interval \((-\infty, \infty)\).

The final inconvenience is that we must produce a lexicographical minimum output. Now we will see the advantage of doing the DP in the direction we chose. We build a sequence forwards, keeping track of the velocity interval for our current position. Initially the position will be (0, 0) and the velocity interval will be [0.0, 0.0]. To test whether a next position is valid, we forward-propagate the velocity interval to this next position, and check whether it has non-empty intersection with the set of velocities that would allow us to hit the required number of remaining targets. We then just take the next valid position with the lowest ID.

What is the complexity? It could in theory be exponential, because every path through the targets could induce a separate interval in the interval set. However, in reality the intervals merge a lot, and I wouldn't be surprised if there is some way to prove that there can only be a polynomial number of intervals to consider. My solution still ran pretty close to the time limit, though.

Friday, June 27, 2014

More ICPC analysis

Now we're getting on to the harder problems. Today I'll cover two that I didn't know how to solve. Some of the others I have ideas for, but I don't want to say anything until I've had a chance to code them up.

Firstly, problem I. This is a maximum clique problem, which on a general graph is NP-complete. So we will need to use the geometry in some way. Misof has a nice set of slides showing how it is done: http://people.ksp.sk/~misof/share/wf_pres_I.pdf. I had the idea for the first step (picking the two points that are furthest apart), but it didn't occur to me that the resulting conflict graph would be bipartite.

Now problem G. I discovered that this is a problem that has had a number of research papers published on the topic, one of which achieves \(O(N^3)\). Fortunately, we don't need to be quite that efficient. Let's start by finding a polynomial-time solution. Let's suppose we've already decided the diameters of the two clusters, D(A) and D(B), and just want to find out whether this is actually possible. For each shipment we have a boolean variable that says whether it goes into part A (false) or part B (true). The constraints become boolean expressions: if d(i, j) > D(A) then we must have i || j, and if d(i, j) > D(B) then we must have !i || !j. Determining whether the variables can be satisfied is just 2-SAT, which can be solved in \(O(N^2)\) time.
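
A sketch of the feasibility check for a chosen pair of diameters, with a small 2-SAT solver (Kosaraju SCC on the implication graph). The literal encoding and the shape of the distance matrix are assumptions for illustration.

    #include <vector>
    #include <functional>

    struct TwoSat {
        int n;
        std::vector<std::vector<int>> adj, radj;
        // variable i has literal 2*i ("i in part B") and 2*i+1 ("i in part A")
        TwoSat(int vars) : n(2 * vars), adj(n), radj(n) {}
        void addClause(int a, int b) {              // (a OR b): ~a -> b, ~b -> a
            adj[a ^ 1].push_back(b); radj[b].push_back(a ^ 1);
            adj[b ^ 1].push_back(a); radj[a].push_back(b ^ 1);
        }
        bool satisfiable() {
            std::vector<int> order, comp(n, -1), seen(n, 0);
            std::function<void(int)> dfs1 = [&](int u) {
                seen[u] = 1;
                for (int v : adj[u]) if (!seen[v]) dfs1(v);
                order.push_back(u);
            };
            std::function<void(int, int)> dfs2 = [&](int u, int c) {
                comp[u] = c;
                for (int v : radj[u]) if (comp[v] < 0) dfs2(v, c);
            };
            for (int u = 0; u < n; u++) if (!seen[u]) dfs1(u);
            int c = 0;
            for (int i = n - 1; i >= 0; i--)
                if (comp[order[i]] < 0) dfs2(order[i], c++);
            for (int i = 0; i < n; i += 2)
                if (comp[i] == comp[i ^ 1]) return false;
            return true;
        }
    };

    // Can the shipments be split so cluster A has diameter <= DA and B has diameter <= DB?
    bool feasible(const std::vector<std::vector<long long>>& d, long long DA, long long DB) {
        int m = (int)d.size();
        TwoSat sat(m);
        for (int i = 0; i < m; i++)
            for (int j = i + 1; j < m; j++) {
                if (d[i][j] > DA) sat.addClause(2 * i, 2 * j);           // i || j
                if (d[i][j] > DB) sat.addClause(2 * i + 1, 2 * j + 1);   // !i || !j
            }
        return sat.satisfiable();
    }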

Now, how do we decide which diameters to test? There are \(O(N^2)\) choices for each, so the naive approach will take \(O(N^6)\) time. We can reduce this to \(O(N^4\log N)\) by noting that once we've chosen one, we can binary search the other (it's also possible to eliminate the log factor, but it's still too slow). So far, this is what I deduced during the contest.

The trick (which I found in one of those research papers) is that one can eliminate most candidates for the larger diameter. If there is an odd-length cycle, at least one of the edges must be within a cluster, and so that is a lower bound for the larger diameter. What's more, we can ignore the shortest edge of an even cycle (with some tie-breaker), because if the shortest edge lies within a cluster, then so must at least one other edge.

We can exploit this to generate an O(N)-sized set of candidates: process the edges from longest to shortest, adding each to a new bipartite graph (as for constructing a maximum spanning tree with Kruskal's algorithm). There are three cases:
  1. The next edge connects two previously disconnected components. Add the edge to the graph (which will remain bipartite, since one can always re-colour one of the components). This edge length is a candidate diameter.
  2. The next edge connects two vertices in the same component, but the graph remains bipartite. This edge is thus part of an even cycle, and so can be ignored.
  3. The next edge connects two vertices, forming an odd cycle. This edge length is a candidate diameter, and the algorithm terminates.
The edges selected in step 1 form a tree, so there are only O(N) of them.

Wednesday, June 25, 2014

ACM ICPC 2014

ACM ICPC 2014 is over. The contest was incredibly tough: solving less than half the problems was enough to get a medal, and 20 teams didn't solve any of the problems (unfortunately including the UCT team, who got bogged down in problem K).

I managed to solve 5 problems during the contest (in 4:30, since I didn't wake up early enough to be in at the start), and I've solved one more since then. Here are my solutions, in approximately easy-to-hard order.

EDIT: note that I've made follow-up blog posts with more analysis.

D: game strategy

This was definitely the easiest one, and this is reflected in the scores. We can use DP to determine the set of states from which Alice can force a win within i moves. Let's call this w[i]. Of course, w[0] = {target}. To compute w[i+1], consider each state s that is not already a winning state. If one of Alice's options is the set S and \(S \subseteq w[i]\), then Alice can win from this state in i+1 moves.

We can repeat this N times (since no optimal winning sequence can have more than N steps), or just until we don't find any new winning states. We don't actually need to maintain all the values of w, just the latest one.
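
A sketch of that iteration; the representation of Alice's options is an assumption made for illustration (options[s] lists her moves from state s, and each move is the set of states the opponent may choose from).

    #include <vector>

    std::vector<bool> winningStates(const std::vector<std::vector<std::vector<int>>>& options,
                                    int target) {
        int n = (int)options.size();
        std::vector<bool> win(n, false);
        win[target] = true;
        bool changed = true;
        while (changed) {                       // at most n useful rounds
            changed = false;
            for (int s = 0; s < n; s++) {
                if (win[s]) continue;
                for (const auto& moveSet : options[s]) {
                    bool all = !moveSet.empty();
                    for (int t : moveSet) if (!win[t]) { all = false; break; }
                    if (all) { win[s] = true; changed = true; break; }  // S is within the winning set
                }
            }
        }
        return win;
    }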

K: surveillance

Let's first construct a slower polynomial-time algorithm. Firstly, for each i, we can compute the longest contiguous sequence of walls we can cover starting at i. This can be done in a single pass. We sort all the possible cameras by their starting wall, and as we sweep through the walls we keep track of the largest-numbered ending wall amongst cameras whose starting wall we have passed. Cameras with \(a > b\) are a bit tricky: we can handle them by treating them as two cameras \([a - N, b]\) and \([a, b + N]\).

Let's suppose that we know we will use a camera starting at wall i. Then we can use this jump table to determine the minimum number of cameras: cover as much wall starting from i as possible with one camera, then again starting from the next empty slot, and so on until we've wrapped all the way around and covered at least N walls. We don't know the initial i, but we can try all values of i. Unfortunately, this requires \(O(N^2)\) time.

To speed things up, we can compute accelerated versions of the jump table. If \(J_c[i]\) is the number of walls we can cover starting from i by using c cameras, then we already have \(J_1\), and we can easily calculate \(J_{a+b}\) given \(J_a\) and \(J_b\). In particular, we can compute \(J_2, J_4, J_8\) and so on, and then combine these powers of two to make any other number. This can then be used in a binary search to find the smallest c such that \(J_c\) contains a value that is at least N. The whole algorithm thus requires \(O(N\log N)\) time.
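
One way to use the doubled tables is binary lifting from each possible starting wall. Here is a sketch, assuming the walls have been unrolled to indices [0, 2N) and nxt[i] is the first wall left uncovered after placing the best single camera whose coverage starts at wall i (with nxt[i] = i if no camera covers wall i from there).

    #include <vector>
    #include <algorithm>

    int minCameras(const std::vector<int>& nxt, int N) {
        int M = (int)nxt.size();                     // typically 2N
        int LOG = 1;
        while ((1LL << LOG) <= N) LOG++;
        std::vector<std::vector<int>> up(LOG + 1, std::vector<int>(M + 1, M));
        for (int i = 0; i < M; i++) up[0][i] = std::min(M, std::max(nxt[i], i));
        for (int k = 1; k <= LOG; k++)               // up[k] = up[k-1] applied twice
            for (int i = 0; i <= M; i++)
                up[k][i] = up[k - 1][up[k - 1][i]];
        int best = -1;
        for (int start = 0; start < N; start++) {
            int pos = start, cams = 0;
            for (int k = LOG; k >= 0; k--)           // largest camera count that still falls short
                if (up[k][pos] < start + N) { pos = up[k][pos]; cams += 1 << k; }
            pos = up[0][pos]; cams++;                // one more camera to wrap past start + N
            if (pos >= start + N && (best < 0 || cams < best)) best = cams;
        }
        return best;                                 // -1 if the walls cannot all be covered
    }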

One catch that broke my initial submission is that if the jump table entries are computed naively, they can become much bigger than N. This isn't a problem, until they overflow and become negative. Clamping them to N fixed the problem.

C: crane balancing

This is a reasonably straightforward geometry problem, requiring some basic mechanics. Let the mass of the crane be \(q\), let the integral of the x position over the crane be \(p\), let the base be the range \([x_0, x_1]\), and let the x position of the weight be \(x\). The centre of mass of the crane is \(\frac p q\), but with a weight of mass \(m\), it will be \(\frac{p+mx}{q+m}\). The crane is stable provided that this value lies in the interval \([x_0, x_1]\). This turns into two linear inequalities in \(m\). Some care is then needed to deal with the different cases of the coefficient being positive, negative or zero.

Computing the area and the integral of the x value is also reasonably straightforward: triangulate the crane by considering triangles with vertices at \((0, i, i+1)\) and computing the area (from the cross product) and centre of mass (average of the vertices) and then multiply the area by the x component of the centre of mass to get the integral.
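
A sketch of that computation with a fan triangulation from vertex 0 (plain doubles here; as noted below, I actually scaled coordinates to stay in integers):

    #include <vector>
    #include <utility>

    // Returns {q, p}: q is the signed area of the polygon and p is the integral of x
    // over it, so the centre of mass has x coordinate p / q.
    std::pair<double, double> areaAndXIntegral(const std::vector<std::pair<double, double>>& poly) {
        double q = 0.0, p = 0.0;
        for (size_t i = 1; i + 1 < poly.size(); i++) {
            double ax = poly[i].first - poly[0].first;
            double ay = poly[i].second - poly[0].second;
            double bx = poly[i + 1].first - poly[0].first;
            double by = poly[i + 1].second - poly[0].second;
            double area = 0.5 * (ax * by - ay * bx);                     // signed triangle area
            double cx = (poly[0].first + poly[i].first + poly[i + 1].first) / 3.0;
            q += area;
            p += area * cx;                                              // integral of x over the triangle
        }
        return {q, p};
    }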

I prefer to avoid floating point issues whenever possible, so I multiplied all the X coordinates by 6 up front and worked entirely in integers. This does also cause the masses to be 6 times larger, so they have to be adjusted to compensate.

E: maze reduction

This is effectively the same problem I set in Topcoder SRM 378 (thanks to ffao on Topcoder for reminding me where I'd seen this). If you have an account on Topcoder you can read about an alternative solution in the match analysis, in which it is easier to compute worst-case bounds.

The approach I used in this contest is similar in concept to D, in that you incrementally update a hypothesis about which things are distinguishable. We will assign a label to every door of each chamber. Two doors are given the same label if we do not (yet) know of a way to distinguish them. Initially, we just label each door with the degree of the room it is in, since there is nothing else we can tell by taking zero steps.
Now we can add one inference step. Going clockwise from a given door, we can explore all the corridors leading from the current chamber and note down the labels on the matching doors at the opposite end of each corridor. This forms a signature for the door. Having done this for each door, we can now assign new labels, where each unique signature is assigned a corresponding label. We can repeat this process until we achieve stability, i.e., the number of labels does not change. We probably ought to use each door's original label in the signature too, but my solution passed without requiring this.
Finally, two rooms are effectively identical if their sequence of door labels is the same up to rotation. I picked the lexicographically minimum rotation for each room and used this as a signature for the room.
I'm not sure what the theoretical worst-case performance is, but I suspect it would be quite large. For a start, my algorithm requires \(O(NE\log N)\) time per iteration, which I suspect could be improved by using a hash table with some kind of rolling hash to handle rotations in O(1) time. I was surprised when my first submission ran in time.
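
As an illustration of the refinement loop, here is the same idea in a simplified form: it operates on chambers with unordered neighbour labels rather than on doors in clockwise order, so it is coarser than the solution described above, but it shows the iterate-until-stable structure.

    #include <vector>
    #include <map>
    #include <algorithm>

    std::vector<int> refineLabels(const std::vector<std::vector<int>>& adj) {
        int n = (int)adj.size();
        std::vector<int> label(n);
        for (int v = 0; v < n; v++) label[v] = (int)adj[v].size();   // start from the degree
        int distinct = 0;
        while (true) {
            std::map<std::vector<int>, int> ids;
            std::vector<int> next(n);
            for (int v = 0; v < n; v++) {
                std::vector<int> sig;
                sig.push_back(label[v]);                              // own label first
                for (int u : adj[v]) sig.push_back(label[u]);
                std::sort(sig.begin() + 1, sig.end());                // neighbours as a multiset
                auto it = ids.find(sig);
                if (it == ids.end()) it = ids.emplace(sig, (int)ids.size()).first;
                next[v] = it->second;
            }
            int newDistinct = (int)ids.size();
            label = next;
            if (newDistinct == distinct) break;                       // the partition is stable
            distinct = newDistinct;
        }
        return label;
    }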

B: buffed buffet

This one was deceptively sneaky, but the principles are reasonably simple. It's not too hard to guess that one will solve the continuous and discrete problems separately, and then consider all partitions between the two.
Let's start with the continuous problem, since it is a little easier. I don't have a formal proof, but it shouldn't be too hard to convince yourself that a greedy strategy works. We start by eating the tastiest food. Once it degrades to being as tasty as the second-tastiest food, we then eat both, in the proportion that makes them decay at the same rate. In fact, at this point we can treat them as a combined food with a combined (slower) decay rate. We continue eating this mix until tastiness decays to match the third-tastiest, and so on. There are some corner cases that need to be handled if there are foods that don't decay.
Now what about the discrete case? Because the items have different weights, a greedy strategy won't work here, as with any knapsack problem. There is a reasonably straightforward \(O(DW^2)\) dynamic programming solution, where you consider each possible number of servings of each discrete food type, but this will be too slow. But there is some structure in the problem, so let's try to exploit it. Let's say that \(f(i)\) is the maximum tastiness for a weight of \(i\) using only the foods we've already considered, and we're now computing an updated \(f'\) by adding a new discrete food with weight \(w\), initial tastiness \(t\) and decay rate \(d\). For a given i, \(f'(i)\) clearly only depends on \(f(j)\) where \(i - j\) is a multiple of \(w\), so let's split things up by the remainder modulo \(w\). Fix a remainder \(r\) and let \(g(i) = f(iw + r)\) and \(g'(i) = f'(iw + r)\). Now we can say that
\[
\begin{aligned}
g'(i) &= \max_{0 \le j \le i}\big\{ g(j) + \sum_{n=1}^{i-j}(t - (n-1)d)\big\}\\
&= \max_{0 \le j \le i}\big\{ g(j) + (i-j)t - \frac{(i-j-1)(i-j)}{2}\cdot d\big\}\\
&= \max_{0 \le j \le i}\big\{ g(j) + (i-j)t - \frac{i(i-1)+j(j+1)-2ij}{2}\cdot d\big\}\\
&= it - \frac{i(i-1)d}{2} + \max_{0 \le j \le i}\big\{ g(j)-\frac{j(j+1)d}{2} - jt + ijd\big\}\\
&= it - \frac{i(i-1)d}{2} + \max_{0 \le j \le i}\big\{ h(j) + ijd \big\}
\end{aligned}
\]
Here we have defined \(h(j) = g(j)-\frac{j(j+1)d}{2} - jt\), which can be computed in constant time per \(j\).

The key observation is that we have turned the expression inside the maximum into a linear function of \(j\). Thus, if we plot \(h\) on a graph, only the upper convex hull is worth considering. We can maintain this upper hull as we increase \(i\) (remember, \(j \le i\)). The second observation is that the optimal choice of \(j\) will increase as \(i\) increases, because increasing \(i\) will increase the \(ijd\) term more for larger values of \(j\). It follows that we can incrementally find the optimal \(j\) for each \(i\) by increasing the previous \(j\) along the upper hull until increasing it further would start decreasing the objective function. There are a few corner cases: the upper hull might not have any valid entries (because there might be no way to make up a weight of exactly \(jw+r\)), and we might have popped off the previous \(j\) as part of updating the hull. These just need careful coding. Another snag to watch out for is that if all the foods decay very fast, the optimal result may in fact overflow a 32-bit integer in the negative direction.
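
A sketch of the hull maintenance with a monotone query pointer. Lines arrive with non-decreasing slopes (\(jd\) as \(j\) grows) and queries come with non-decreasing \(x = i\): for each \(i\), add the line with slope \(id\) and intercept \(h(i)\), then evaluate query(i) and add \(it - \frac{i(i-1)d}{2}\) to get \(g'(i)\).

    #include <vector>

    struct MonotoneHull {
        std::vector<long long> slope, inter;
        size_t ptr = 0;
        // the middle line b is useless if line c overtakes line a no later than b does
        bool bad(size_t a, size_t b, size_t c) const {
            return (__int128)(inter[c] - inter[a]) * (slope[b] - slope[a]) >=
                   (__int128)(inter[b] - inter[a]) * (slope[c] - slope[a]);
        }
        void addLine(long long m, long long b) {        // slopes must be non-decreasing
            if (!slope.empty() && slope.back() == m) {  // equal slopes: keep the higher line
                if (b <= inter.back()) return;
                slope.pop_back(); inter.pop_back();
            }
            slope.push_back(m); inter.push_back(b);
            while (slope.size() >= 3 && bad(slope.size() - 3, slope.size() - 2, slope.size() - 1)) {
                slope.erase(slope.end() - 2); inter.erase(inter.end() - 2);
            }
            if (ptr >= slope.size()) ptr = slope.size() - 1;
        }
        long long query(long long x) {                  // x values must be non-decreasing
            while (ptr + 1 < slope.size() &&
                   slope[ptr + 1] * x + inter[ptr + 1] >= slope[ptr] * x + inter[ptr]) ptr++;
            return slope[ptr] * x + inter[ptr];
        }
    };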

The discrete part of the algorithm now requires \(O(DW)\) time, and the continuous part requires \(O(D\log D + W)\).

F: messenger (solved after contest)

This is a nasty problem mostly because of numerical stability problems. The problem itself is numerically unstable: a rounding error in measuring the total lengths of the paths is sufficient to change the answer between possible and impossible. It requires the black art of choosing good epsilons. I'll only describe the ideal case in which none of these nasty rounding problems occur.

Suppose we pick a point on both Misha and Nadia's path. In general this won't work, because the courier will arrive too late or too early at this point to intercept Nadia. Let L(A, B) be the number of time units that the courier is late when travelling from A to B. L can be negative if the courier is early. Valid solutions are those where L = 0.

First, let's prove that L is an increasing function of A (where "increasing" means that A moves further along Misha's path). Suppose that the courier, instead of moving straight to B, kept pace with Misha for a while and then went to B. Obviously this would mean arriving later (or at the same time). But this is equivalent to the arrival time if A increases. A similar argument shows that L is a decreasing function of B. In both cases, this is not strict.

This means that there can be at most \(O(N)\) pairs of segments for which L=0, rather than \(O(N^2)\), because we cannot move forward on one path and backwards on another. We can iterate over the candidate pairs by successively advancing either one segment or the other, by measuring L at pairs of segment endpoints.

Now we have reduced the problem to a single segment from each path. Given a point on one path, we can find the corresponding point on the other by solving what looks like a quadratic but which degenerates into a linear equation. Again, there are corner cases where the linear coefficient is also zero, in which case we have to break ties so as to minimise the travel time. Using this mapping function, we can identify the range of positions on Misha's path that correspond to valid positions on Nadia's path. I then used ternary search to find the optimal position along this segment. I didn't actually prove that the function is convex, but it seemed to work.

The solution I implemented that passed is actually rather messier than what I've described, and ran close to the time limit. I tried implementing it as described above but it hits assertions in my code due to numeric instabilities, but I think it should still be possible to fix it.

Monday, June 02, 2014

New website

I've finally gotten around to replacing my ugly nineties-looking personal web page that was cobbled together with raw HTML and m4, with a much prettier version that nevertheless still contains roughly the same content. It is generated using Sphinx, with the individual pages written in reStructuredText. I've also merged my publications page into the main website instead of leaving it on the equally ugly but separately cobbled-together page I originally set up for my PhD.

Saturday, May 10, 2014

Challenge 24 finals 2014

Here are my thoughts/solutions for the Challenge 24 Finals problems from this year.

A: Halting problem

Firstly, be aware that the problem had a typo: the condition should be "A != 0" rather than "A == 0". The intuition is that "most" of the time the g edge will be taken, but occasionally the h edge might be taken. Given a starting point, let us suppose that we have an efficient way to iterate the algorithm until either an h edge is taken or we terminate (or we determine that neither condition will ever hold). In the former case, we have just determined that we need f(0) for some f. Since there are only O(N) of these values, we can cache them as needed, and so only require O(Q + N) rounds of the inner algorithm (namely, iterating until we reach A = 0 or terminate).
Now let us solve this inner problem. Consider the directed graph in which each function f has an edge to its g target. A graph where every vertex has out-degree one has a specific structure, in which each component contains a single loop from which hang upward-directed trees. Iteration thus consists of walking up a tree until reaching the loop, then going round and round the loop until we exit. Of course, we might finish early, or we might go around the loop forever.
For the walking up the tree part, let's use brute force for now. Assuming we make it to the loop, how do we find the exit point? Let's say that every complete time around the loop we increase A by C, and that at some point around the loop we have A = a. Then to exit at this point after t full cycles, we must have a + tC = 0 (mod 2^64). We can solve this using the extended GCD algorithm. We can measure this for every point around the loop, and the smallest valid t value tells us where we will exit. If there are no valid t values, then we will go round forever.
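
A sketch of this step; since the modulus is a power of two, one can use the trailing-zero structure of C and a Newton-style inverse of its odd part instead of a general extended GCD (uint64_t arithmetic is naturally mod 2^64).

    #include <cstdint>
    #include <optional>

    static uint64_t inverseOdd(uint64_t x) {            // x must be odd
        uint64_t y = 1;                                 // correct mod 2^1
        for (int i = 0; i < 6; i++) y *= 2 - x * y;     // Newton: doubles the correct bits
        return y;
    }

    // Smallest t >= 0 with a + t*C == 0 (mod 2^64), or nothing if no such t exists.
    std::optional<uint64_t> firstExit(uint64_t a, uint64_t C) {
        if (C == 0) {
            if (a == 0) return 0;
            return std::nullopt;
        }
        int v = __builtin_ctzll(C);                     // C = odd * 2^v
        uint64_t target = -a;                           // we need t*C == -a (mod 2^64)
        if (v > 0 && (target & ((1ULL << v) - 1)) != 0) return std::nullopt;
        uint64_t inv = inverseOdd(C >> v);
        uint64_t t = (target >> v) * inv;               // a solution mod 2^(64 - v)
        if (v > 0) t &= (~0ULL) >> v;                   // reduce to the smallest non-negative t
        return t;
    }
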
This is essentially the solution I used in the contest. It is somewhat slow: O(N(N+Q)) I think. I did a lot of microoptimisation, and even then it took hours for the largest test case. The important optimisations were precomputing parameters for each loop (particularly the inverse of C) once, and reordering the data so that points around a loop are consecutive in memory. In fact over 75% of time ended up being spent in pointer-chasing up the trees, because this was cache-unfriendly.
I think the problem can be solved in something like O((N+Q)log (N+Q)), but it is rather more complex, and a blog doesn't really lend itself to the mathematical notation. I might write it up later if I get around to it. I'll give a few hints. If C is odd then increasing the A value on entry to the loop by a certain amount will increase the time-to-exit for each point by the same amount; this can be generalized if C is an odd multiple of 2^k by partitioning things by their remainder mod 2^k. The tree part can be handled by considering all queries at the same time, in a preorder walk of the tree.

C: Complete program

I had a nasty super-polynomial time solution during the contest, which was complex and only managed to solve about 7 test cases. There is a much simpler and polynomial-time DP solution, however.
Let's start with some observations. Any O not followed by a valid number will need one, so let's add it immediately (it can be anything valid). After this, O and the number might as well be treated as a single unit (there is no point inserting anything between an O and a valid number either). There is no point ever adding an I. Adding F's is useful only to increment the nesting depth, so we might as well add F's only right at the front, since that way everything will be nested inside them. Finally, whenever we add a number, it might as well be a 0.
What is less obvious is that it is only necessary to add an A immediately before a number. This requires checking a few cases, but is because an A added anywhere else can be shuffled to the right until it reaches a number, without affecting whether a token string is valid. Similar rotation arguments show that we never need to add two As or Os in front of a number.
For the DP, let's compute, for each contiguous substring of the input and each F nesting level, the minimum number of tokens we need to add to make it into a valid parse tree. There is a special case for the empty range: we must just add a 0, and we must check that the nesting depth at that point is at least 1. Otherwise, we can either
  • Insert an A or an O.
  • Use the first token in the substring as the root.
In the first case, the A or O will "swallow" the number, so if it is at most 127 we can just use an O, otherwise we must use an A and the nesting level must be sufficiently large. We then use DP to find the best way to handle the rest of the substring. The second case is largely straightforward, unless the first given token is an A. In this case, we must consider all ways in which the rest of the substring may be split into two valid trees.
Finally, we need to determine the nesting level to use at the root. It is possible, although a little slow, to just keep incrementing this value until there are so many initial F's that it is clearly impossible to improve the solution. A more efficient solution in practice is to compute the DP over each substring as a piecewise constant function of the nesting level; this is more complex, since one needs to do various operations on these piecewise constant functions, but it guarantees an \(O(N^4)\) solution independent of the size of the numbers in the input.

D: Firing game

I solved 9 out of the 10 test cases. The primary observation is that any manager that is unfirable in the initial setup creates a completely independent subtree: firings within this subtree have no effect on the rest of the tree, and vice versa. This partitions the tree into separate Nim piles, and we just need to determine the Grundy number for each such subproblem. In all but one test case, these trees are reasonably small, and can be solved by DP/memoized recursion.
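For what it's worth, here is a generic memoised Grundy computation in Python; `moves` is a hypothetical stand-in for the problem's move generator, which I'm not reproducing here.

```python
def grundy(position, moves, memo=None):
    """Grundy number of `position`, where moves(position) yields the reachable positions."""
    if memo is None:
        memo = {}
    if position not in memo:
        reachable = {grundy(q, moves, memo) for q in moves(position)}
        g = 0
        while g in reachable:   # mex: smallest non-negative integer not in the set
            g += 1
        memo[position] = g
    return memo[position]

# The overall position is a loss for the player to move iff the XOR of the
# independent subtrees' Grundy numbers is zero.
```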

N: Dog tags

This problem looked as though it could be extremely nasty, but fortunately the data was somewhat friendly. For example, each bubble was intersected by enough slices to give a good estimate of its volume, even though the problem only guaranteed a single intersection.
Firstly, one must identify the bubbles. The loops can be treated as the vertices of a graph, with an edge between two loops if they are on adjacent slices and their interiors intersect in the XY plane. I was lazy and considered two loops to intersect if a vertex of one lay inside the other, which is good enough if the loops are sampled well. Each connected component of this graph constitutes a bubble. I estimated the volume as the sum of the areas of the loops, and also found the centroid (the mean of all points contained in the loops).
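The "vertex of one loop inside the other" test is just a standard point-in-polygon check; a minimal ray-casting version (my own helper, not the contest code) might look like this.

```python
def point_in_polygon(px, py, poly):
    """Ray-casting test: is (px, py) inside the polygon given as a list of (x, y) vertices?"""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does the horizontal ray going right from (px, py) cross edge (x1,y1)-(x2,y2)?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                inside = not inside
    return inside
```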
Finding the fiducial bubbles is easy: they are the 3 bubbles with the greatest volume. Determining which one is the corner is not quite as easy, since one does not know the inter-slice distance and hence distances cannot easily be measured. I identified the first plane of bubbles (see later), and noted that the mid-point of the two corner bubbles should be close to the centroid of this set of bubbles; I just tried all 3 edges to see which fits best.
This now gives the grid coordinate system XYZ. One does need to check that the bubbles are on the positive Z side, otherwise the two corners need to be swapped.
Now we can partition the bubbles into planes. I sorted by Z, computed all the differences between adjacent Z values, sorted this list, and looked for a big jump between adjacent values (small values are noise within a plane, big values are plane-to-plane steps).
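A sketch of that gap-based splitting, assuming the per-bubble Z coordinates have already been computed (the threshold heuristic here is a simplification of what I describe above, and the names are mine):

```python
def split_by_gaps(values, ratio=5.0):
    """Sort the values and split them into groups wherever the gap to the next value
    is much larger than the typical (median) gap.  `ratio` is an arbitrary threshold."""
    vs = sorted(values)
    gaps = [b - a for a, b in zip(vs, vs[1:])]
    if not gaps:
        return [vs]
    typical = sorted(gaps)[len(gaps) // 2]        # median gap ~ within-plane noise
    groups, current = [], [vs[0]]
    for cur, gap in zip(vs[1:], gaps):
        if gap > ratio * max(typical, 1e-9):      # big jump: start a new plane/row
            groups.append(current)
            current = []
        current.append(cur)
    groups.append(current)
    return groups
```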
The same process can then be used to find each row within a plane, and then bubbles in a row can be sorted by X.

Saturday, February 08, 2014

Challenge 24 EC round 2014

Here are my thoughts on the problems we solved from this year's electronic round of Challenge 24.

A. Safe

I didn't get a complete solution to this, but I was able to solve cases 1-9. I noticed that in most cases the state space for the wheels (keys, but the pictures made me think of them as wheels) is reasonably small. For each password, the solution will involve first pressing the keys a number of times to set up the initial state, and then typing the password. So we want to find the initial states from which we can input the password, and pick the cheapest amongst them.
This can be done by dynamic programming, running backwards over the password. When there are no more characters to type, any state is fine. Considering each letter in turn, we need to find the states in which we can type the next letter on one of the keys, and from which the resulting state allows us to complete the password.
For A10 there are 6.4 billion states, and it was going to take a long time (it did churn out the first two passwords in about 3 hours). I suspect it might be possible to do it by keeping an explicit list/set of reachable states and working from there, rather than iterating over the entire state space.
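A rough sketch of the straightforward version of this backwards DP (sweeping the entire state space per letter, which is exactly what makes A10 slow); `letter_shown` and `press` are hypothetical stand-ins for the key/wheel mechanics, not the real problem interface.

```python
def feasible_start_states(password, states, keys, letter_shown, press):
    """Return the states from which the whole password can be typed.
    letter_shown(s, k): the letter key k would type in state s (hypothetical helper).
    press(s, k): the state reached after pressing key k in state s (hypothetical helper)."""
    ok = set(states)                 # after the last letter, any state is acceptable
    for ch in reversed(password):
        ok = {s for s in states
              if any(letter_shown(s, k) == ch and press(s, k) in ok for k in keys)}
    return ok
```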

B. Wiretapping

This problem basically just requires an application of Kirchhoff's Theorem, which gives the number of spanning trees of the graph in polynomial time. To count the number of spanning trees that use the tapped link, just collapse the two endpoints into a single vertex and count the number of spanning trees of the remaining graph. The ratio between the number of spanning trees in this reduced graph and the original graph is the answer.
One complication is that the number of spanning trees may overflow a double-precision float. I worked around that by scaling down the Laplacian matrix by some factor f (I used 1/32 for the larger cases), which scales the determinant of the cofactor by \(f^{N-1}\).
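A minimal sketch of the Kirchhoff step (in floating point, so it hits exactly the overflow issue mentioned above): build the Laplacian of the multigraph, delete one row and column, and take the determinant. The contraction of the tapped edge is not shown.

```python
import numpy as np

def num_spanning_trees(n, edges):
    """Kirchhoff's theorem: the number of spanning trees of an undirected multigraph
    on vertices 0..n-1 equals the determinant of any cofactor of its Laplacian."""
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1
        L[v, v] += 1
        L[u, v] -= 1
        L[v, u] -= 1
    return np.linalg.det(L[1:, 1:])   # delete row and column 0, take the determinant
```

The answer is then the ratio of this count for the graph with the tapped edge's endpoints merged to the count for the original graph.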

C. Visual Programming - VM

This was a fairly straightforward implementation problem, just requiring careful interpretation of the rules, and recognising that a lot of the information given is unnecessary (e.g., for lines you only need to keep the two endpoints).

D. Visual Programming

Due to lack of time I only solved D1, D2 and D4, and had a broken solution to D3. As with most VM problems, I invented a slightly higher-level language that is possible (but still hard) to code in by hand, and wrote a simple compiler for it. I made a few simplifying assumptions:
  • Polygons are laid out one below the other. Each polygon is rectangular, except that the right hand edge may be ragged.
  • Each line is only ever used in one direction. To ensure this, the outgoing lines from a polygon all emanate from the top part of the right edge, and incoming lines arrive at the bottom part of the right edge. There is also an unconditional jump from each polygon to the next one to ensure that the routing terminates before encountering any of the incoming lines.
  • To avoid lines cutting through polygons, each line emanates directly horizontally from the polygon to (256, y), then has a second segment to the target point. The x value of the start is chosen depending on which register is to be used in the comparison.
  • I reserved registers 0 and 1 to always contain the values 0 and 1, which made it easier to construct some other operations, as well as to build an unconditional jump.
The programming mini-language just specifies instructions and a list of conditional jumps that take place after each instruction (falling through to the next instruction if none match). I wrote the instructions at the level of setting the parameters (A-F, X-Z, T, Q, L, H), but I had useful defaults for all of them, so most instructions only changed one or two of them; e.g., setting A=3 B=15 would generate an instruction that copies from SIO into register 3.
Once all that is done, it is not particularly difficult for anyone who has done any assembly to write code for D1, D2, and D4. The only slightly tricky part was to extract the lower 8 bits from the result, which I did by multiplying by 256 then dividing by 256.

G. Spy Union

At first I thought this could be done greedily, but it doesn't work: reinstating a fired person might let you fire an additional person in each hierarchy. Instead, we can use network flow. Let's consider just one hierarchy for now, and connect the source to every node, connect each node to its parent, and connect the root to the sink (directed edges). The edges from the source have capacity 1, with the interpretation that they carry flow if the corresponding employee is fired. The flow through each intermediate edge will then be the total number of employees fired from the corresponding subtree, so we set its capacity to the maximum number we can fire from that subtree. The maximum flow will thus be the maximum number of people we can fire.
However, we have two hierarchies. Since network flow is symmetric, we can apply the same treatment to the second hierarchy, but connect the source to the root, and the individual employees to the sink. To join up the problems, we get rid of the source in the first graph and the sink in the second graph, and create capacity-1 edges directly from each employee in the second graph to the matching employee in the first. Once again, these edges have flow if the corresponding employee is fired, and the maximum flow gives the solution.
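A rough sketch of this construction using networkx follows; the per-subtree firing limits `limit1`/`limit2` and the node naming are assumptions of mine (the real capacities come from the problem statement), so this only shows the shape of the graph.

```python
import networkx as nx

def max_firable(n, parent1, parent2, limit1, limit2):
    """Employees are 0..n-1; parentX[v] is v's manager in hierarchy X (None for the root);
    limitX[v] is an assumed cap on how many can be fired from v's subtree in hierarchy X."""
    G = nx.DiGraph()
    for v in range(n):
        # Second hierarchy, oriented from the source down towards the employees.
        if parent2[v] is None:
            G.add_edge('source', ('h2', v), capacity=limit2[v])
        else:
            G.add_edge(('h2', parent2[v]), ('h2', v), capacity=limit2[v])
        # Unit edge joining the two copies of the employee: it carries flow iff v is fired.
        G.add_edge(('h2', v), ('h1', v), capacity=1)
        # First hierarchy, oriented from the employees up towards the sink.
        if parent1[v] is None:
            G.add_edge(('h1', v), 'sink', capacity=limit1[v])
        else:
            G.add_edge(('h1', v), ('h1', parent1[v]), capacity=limit1[v])
    value, _ = nx.maximum_flow(G, 'source', 'sink')
    return value
```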