Monday, June 12, 2017

Extra DCJ 2017 R2 analysis


My solution during the contest was essentially the same as the official analysis. Afterwards I realised a potential slight simplification: if one starts by computing the second-order differences (i.e., the differences of the differences), then one is looking for the longest run of zeros, rather than the longest run of the same value. That removes the need to communicate the value used in the runs at the start and end of each section.

Number Bases

I missed the trick of being able to uniquely determine the base from the first point at which X[i] + Y[i] ≠ Z[i]. Instead, at every point where X[i] + Y[i] ≠ Z[i], I determine two candidate bases (depending on whether there is a carry of not). Then I collect the candidates and test each of them. If more than three candidates are found, then the test case is impossible, since there must be two disjoint candidate pairs.

Broken Memory

My approach was slightly different. Each node binary searches for its broken value, using two other nodes to help (and simultaneously helping two other nodes). Let's say we know the broken value is in a particular interval. Split that interval in half, and compute hashes for each half on the node (h1 and h2) and on two other nodes (p1 and p2, q1 and q2). If h1 equals p1 or q1, then the broken value must be in interval 2, or vice versa. If neither applies, then nodes p and q both have broken values, in the opposite interval to that of the current node. We can tell which by checking whether p1 = q1 or p2 = q2.

This does rely on not having collisions in the hash function. In the contest I relied on the contest organisers not breaking my exact choice of hash function, but it is actually possible to write a solution that works on all test data. Let P be a prime greater than \(10^{18}\). To hash an interval, compute the sums \(\sum m_i\) and \(\sum i m_i\), both mod P, giving a 128-bit hash. Suppose two sequences p and q collide, but differ in at most two positions. The sums are the same, so they must differ in exactly two positions j and k, with \(p_j - q_j = q_k - p_k\) (all mod P). But then the second sums will differ by
\(jp_j + kp_k - jq_j - kq_k = (j - k)(p_j - q_j)\), and since P is prime and each factor is less than P, this will be non-zero.

Alternative Code Jam 2017 R3 solution

This last weekend was round 3 of Code Jam 2017 and round 2 of Distributed Code Jam 2017. I'm not going to describe how to solve all the problems since there are official analyses (here and here), but just mention some alternatives. For this post I'll just talk about one problem from Code Jam; some commentary on DCJ will follow later.

Slate Modern (Code Jam)

The idea I had for this seems simpler (in my opinion, without having tried implementing both) than the official solution, but unfortunately I had a bug that I couldn't find until about 10 minutes after the contest finished.

As noted in the official analysis, one can first check whether a solution is possible by comparing each pair of fixed cells: if the difference is value is greater than D times the Manhattan distance, then it is impossible; if no such pair exists, it is possible. The best solution is then found by setting each cell to the smallest lower bound imposed by any of the fixed cells.

Let's try to reduce the complexity by a factor of C, by computing the sum of a single row quickly. If we look at the upper bound imposed by one fixed cell, it has the shape \(b + |x - c| D\), where \(x\) is the column and b, c are constants. When combining the upper bounds, we take the lowest of them. Each function will be the smallest for some contiguous (possibly empty) interval. By sweeping through the fixed cells in order of c, we can identify those that contribute, similar to finding a Pareto front. Then, by comparing adjacent functions one can find the range in which each function is smallest, and a bit of algebra gives the sum over that range.

This reduces the complexity to O(RN + N²) (the N² is to check whether a solution is possible, but also hides an O(N log N) to sort the fixed cells). That's obviously still too slow. The next insight is that most of the time, moving from one row to the next changes very little: each of the functions increases or decreases by one (depending on whether the corresponding fixed cell is above or below the row), and the range in which each function is smallest grows or shrinks slightly. It thus seems highly likely that the row sum will be a low-degree polynomial in the row number. From experimentation with the small dataset, I found that it is actually a quadratic (conceptually it shouldn't be too hard to prove, but I didn't want to get bogged down in the details).

Note that I said "most of the time". This will only be true piecewise, and we need to find the "interesting" rows where the coefficients change. It is fairly obvious that rows containing fixed cells will be interesting. Additionally, when the range in which a function is smallest disappears or appears will be interesting. Consider just two fixed cells, and colour the whole grid according to which of the two fixed cells gives the lower upper bound. The boundary between the two colours can have diagonal, horizontal and vertical portions, and it is these horizontal portions that are (potentially) interesting. I took the conservative approach of adding all O(N²) such rows as interesting.

Now that we have partitioned the rows into homogeneous intervals, we need to compute the sum over each interval efficiently. Rather than determine the quadratic coefficients analytically, I just interfered them by taking the row sums of three consecutive rows (if there are fewer than three rows in the interval, just add up their row sums directly). A bit more algebra to find the sum of a quadratic series, and we're done! There are O(N²) intervals to sum and it requires O(N) to evaluate each of the row sums needed, giving a running time of O(N³). I suspect that this could probably be reduced to O(N² log N) by being smarter about how interesting rows are picked, but it is not necessary given the constraints on N.

Tuesday, November 22, 2016

An alternative DCJ 2016 solution

During this year's Distributed Code Jam I had an idea for solving the hard part of Toothpick Sculptures, but had no time to implement it, or even to work out all the details. When the official solution was presented, I was surprised that it was very different to mine. Recently Pablo from Google has been able to help me check that my solution does indeed pass the system tests (thanks Pablo), so it seems like a good time to describe the solution.

There are going to be a lot of different trees involved, so let's give them names. A "minitree" is one of the original sculptures of N toothpicks, and its root is a "miniroot". The "maxitree" is the full tree of 1000N toothpicks. Let's assume that each minitree is built out of toothpicks of one colour. We can then consider a tree whose vertices are colours, called the "colour tree", which contains an edge between two colours if and only if the maxitree contains an edge between two vertices of those colours.

For each colour, we can find its children in the colour tree by walking the corresponding minitree looking for children which are miniroots. This can be done in parallel over the colours.

For a given vertex, we can use DP to solve the following two problems for the subtree: what is the minimum cost to stabilise the subtree if we stabilise the root, and what is the minimum cost if we do not stabilise the root? But this will not be so easy to parallelise. We can go one step further: if we fix some vertex C and cut off the subtree rooted at C, then we can answer the same two problems for each case of C being stabilised or not stabilised (four questions in total). If we later obtain the answers to the original two queries for C, then we can combine this with the four answer we have to answer the original queries for the full tree.

This is sufficient to solve the small problem, and more generally any test case where the colour tree does not branch. For each miniroot, compute the DP, where the miniroot corresponding to the single child (if any) in the colour tree is the cutoff. These DPs can be done in parallel, and the master can combine the results.

Let's consider another special case, where the colour tree is shallow. In this case, one can solve one layer at a time, bottom up, without needing to use the cutoff trick at all. The colours in each layer of the colourtree are independent and can be solved in parallel, so the latency is proportional to the depth of the tree. The results of each layer are fed into the calculations of the layer above.

So, we have a way to handle long paths, and a way to handle shallow trees. This should immediately suggest light-heavy decomposition. Let the "light-depth" of a node in the colour tree be the number of light edges between it and the root. The native of light-heavy decomposition guarantees that light-depth is at most logarithmic in the tree size (which is 1000). We will process all nodes with the same light-depth in parallel. This means that a node in a tree may be processed at the same time as its children, but only along a heavy path. We thus handle each heavy path using the same technique as in the small problem. For other children in the colour tree, the subtree results were computed in a previous pass and are sent to the slave.

TCO 2016 finals

The final round of TCO 2016 did not go well for me. I implemented an over-complicated solution to the 500 that did not work (failed about 1 in 1000 of my random cases), and I started implementing the 400 before having properly solved it, leading to panic and rushed attempts to fix things in the code rather than solving the problem.


The fact that there are only four mealtimes is a clue that this will probably require some exponential time solution. I haven't worked out the exact complexity, but it has at least a factor of \(2^\binom{n}{n/2}\) for n mealtimes.

Let's restrict our attention to meal plans that contain no meals numbered higher than m and have t meals in the plan (t can be less than 4). We can further categorise these meals by which subsets of the mealtimes can accommodate the plan. This will form the state space for dynamic programming. To propagate forwards from this state, we iterate over the number of meals of type m+1 (0 to 4 of them). For each choice, we determine which subsets of mealtimes can accommodate this new meal plan (taking every subset that can accommodate the original plan and combining it with any disjoint subset that can accommodate the new meals of type m+1).


Let's start by working out whether a particular bracket sequence can be formed. Consider splitting the sequence into a prefix and suffix at some point. If the number of ('s in the prefix differs between the initial and target sequence, then the swaps that cross the split need to be used to increase or decrease the number of ('s. How many ('s can we possibly move to the left? We can answer that greedily, by running through all the operations and applying them if and only if they swap )( to (). Similarly, we can find the least possible number of ('s in the final prefix.

If our target sequence qualifies by not having too many or too few ('s in any prefix, can we definitely obtain it? It turns out that we can, although it is not obvious why, and I did not bother to prove it during the contest. Firstly, if there is any (proper) prefix which has exactly the right number of ('s, we can split the sequence at this point, choose not to apply any operations that cross the split, and recursively figure out how to satisfy each half (note that at this point we are not worrying whether the brackets are balanced — this argument applies to any sequence of brackets).

We cannot cross between having too few and too many ('s when growing the prefix without passing through having the right number, so at the bottom of our recursion we have sequences where every proper prefix has too few ('s (or too many, but that can be handled by symmetry). We can solve this using a greedy algorithm: for each operation, apply it if and only if it swaps )( to () and the number of ('s in the prefix is currently too low. We can compare the result after each operation to the maximising algorithm that tries to maximise the number of ('s in each prefix. It is not hard to see (and prove, using induction) that for any prefix and after each operation is considered, the number of ('s in the prefix is:
  • between the original value and the target value (inclusive); and
  • the smaller of the target value and the value found in the maximising algorithm.
Thus, after all operations have been considered, we will have reached the target.

So now we know how to determine whether a specific bracket sequence can be obtained. Counting the number of possible bracket sequences is straightforward dynamic programming: for each prefix length, we count the number of prefixes
that are valid bracket sequence prefixes (no unmatched right brackets) for each possible nesting depth.


This is an exceptionally mean problem, and I very much doubt I would have solved it in the fully 85 minutes; congratulations to Makoto for solving all three!

For convenience, let H be the complement of G2 (i.e. n copies of G1). A Hamiltonian path in G2 is simply a permutation of the vertices, such that no two consecutive vertices form an edge in H. We can count them using inclusion-exclusion, which requires us to count, for each m, the number of paths that pass through any m chosen edges. Once we pick a set of m edges and assign them directions (in such a way that we have a forest of paths), we can permute all the vertices that are not tails of edges, and the remaining vertices have their positions uniquely determined.

Let's start by solving the case n=1. Since k is very small, we can expect an exponential-time solution. For every subset S of the k vertices and every number m of edges, we can count the number of ways to pick m edges with orientation.  We can start by solving this for only connected components, which means that m = |S| - 1 and the edges form a path. This is a standard exponential DP that is used in the Travelling Salesman problem, where one counts the number of paths within each subset ending at each vertex.

Now we need to generalise to m < |S| - 1. Let v be the first vertex in the set. We can consider every subset for the path containing v, and then use dynamic programming to solve for the remaining set of vertices. This is another exponential DP, with a \(O(3^n)\) term.

Now we need to generalise to n > 1. If a graph contains two parts that are not connected, then edges can be chosen independently, and so the counts for the graph are obtained by convolving the counts for the two parts. Unfortunately, a na├»ve implementation of this convolution would require something like O(n²k) time, which will be too slow. But wait, why are we returning the answer modulo 998244353? Where is our old friend 1000000007, and his little brother 1000000009? In fact, \(998244353 = 2^{23} \times 7 \times 17 + 1\), which strongly suggests that the Fast Fourier Transform will be involved. Indeed, we can perform an 1048576-element FFT in the field modulo 998244353, raise each element to the power of n, and then invert the FFT.

Despite the large number of steps above, the solution actually requires surprisingly little new code, as long as one has library code for the FFT and modulo arithmetic.

Monday, November 21, 2016

TCO 2016 semifinal 2 solutions

Semifinal 2 was much tougher than semifinal 1, with only one contestant correctly finishing two problems. The first two problems each needed a key insight, after which very little coding is needed (assuming one has the appropriate library code available). The hard problem was more traditional, requiring a serious of smaller steps to work towards the solution.


Consider the smallest special clique. What happens if we remove an element? Then we will still have a clique, but the AND of all the elements is non-zero. In particular, there is some bit position which is 1 in every element of the special clique except for 1.

We can turn that around: it suffices to find a bit position B, a number T which is zero on bit B, and a special clique containing T whose other elements are all 1 on bit B. We can easily make a list of candidates to put in this special clique: any element which is 1 on bit B and which is connected to T. What's more, these candidates are all connected to each other (through bit B), so taking T plus all candidates forms a clique. What's more, if we can make a special clique by using only a subset of the candidates, then the clique formed using all candidates will be special too, so the latter is the only one we need to test.


The key insight is to treat this as a flow/matching problem. Rather than thinking of filling and emptying an unlabelled tree, we can just build a labelled tree with the properties that if A's parent is B, then B < A and A appears earlier in p than B does. A few minutes thinking should convince you that any labelled tree satisfying these constraints can be used for the sorting.

We thus need to match each number to a parent (either another number or the root). We can set this up with a bipartite graph in the usual way: each number appears on the left, connected to a source (for network flow) with an edge of capacity 1. Each number also appears on the right, along with the root, all connected to the sink with N edges of capacity 1 (we'll see later why I don't say one edge of capacity N). Between the two sides, we add an edge A → B if B is a viable parent for A, again with capacity 1. Each valid assignment can be represented by a maxflow on this network (with flow N), where the edges across the middle are parent-child relationships. Note that normally using this approach to building a tree risks creating cycles, but the heap requirement on the tree makes that impossible.

That tells us which trees are viable, but we still need to incorporate the cost function. That is where the edges from the right numbers to the sink come in. A (non-root) vertex starts with a cost of 1, then adding more children increases the cost by 3, 5, 7, ... Thus, we set the costs of the edges from each number to the sink to this sequence. Similarly, the costs of the edges from the root to the sink have costs 1, 3, 5, 7, ... The min-cost max-flow will automatically ensure that the cheaper edges get used first, so the cost of the flow will match the cost of the tree (after accounting for the initial cost of 1 for each number, which can be handled separately).


This one requires a fairly solid grasp on linear algebra and vector spaces. The set with \(2^k\) elements is a vector subspace of \(\mathbb{Z}_2^N\) of dimension k, for some unknown N. It will have a basis of size k. To deal more easily with ordering, we will choose to consider a particular canonical basis. Given an arbitrary initial basis, one can use Gaussian reduction and back propagation to obtain a (unique) basis with the following property: the leading 1 bit in each basis element is a 0 bit in every other basis element. With this basis, it is not difficult to see that including any basis element in a sum will increase rather than decrease the sum. The combination of basis elements forming the ith sorted element is thus given exactly by the binary representation of i.

We can now take the information we're given an recast it is a set of linear equations. Furthermore, we can perform Gaussian elimination on these linear equations (which we actually do in code, unlike the thought experiment Gaussian elimination above). This leaves some basis elements fixed (uniquely determined by the smaller basis elements) and others free. Because of the constraints of the basis elements, we also get some more information. If basis i appears in an equation, its leading bit must correspond to a 1 bit in the value (and to the first 1 bit, if i is the leading basis in the equation). Similarly, if basis i does not appear, then it must correspond to a 0 bit in the value. This allows us to build a mask of where the leading bit for each basis element can be.

We now switch to dynamic programming to complete the count. We count the number of ways to assign the first B basis elements using the first C value bits. Clearly dp[0][C] = 1. To add new basis, we can consider the possible positions of its leading bit (within the low C bits), and then count the degrees of freedom for the remaining bits. If the basis is fixed then there is 1 degree of freedom; otherwise, there is a degree of freedom for each bit that isn't the leading bit of a previous basis, and it is easy to count these.

There are a few details that remain to do with checking whether there are an infinite number of solutions (which can only happen if the largest basis is unconstrained), and distinguishing between a true 0 and a 0 that is actually a multiple of 1000000007, but those are left as exercises for the reader.

Sunday, November 20, 2016

Topcoder Open 2016 Semifinal 1

Here are my solutions to the first semifinal of the TCO 2016 algorithm contest.


Let's start with some easy cases. If s = t, then the answer is obviously zero. Otherwise, it is at least 1. If s and t share a common factor, then the answer is 1, otherwise it is at least 2.
We can eliminate some cases by noting that some vertices are isolated. Specifically, if either s or t is 1 or is a prime greater than n/2 (and s≠t) then there is no solution.
Assuming none of the cases above apply, can the answer be exactly 2? This would mean a path s → x → t, where x must have a common factor with s and a (different) common factor with t. The smallest x can be is thus the product of the smallest prime factors of s and t. If this is at most n, then we have a solution of length 2, otherwise the answer is at least 3.
If none of the above cases applies, then answer is in fact 3. Let p be the smallest prime factor of s and q be the smallest prime factor of t. Then the path s → 2p → 2q → t works (note that 2p and 2q are at most n because we eliminated cases where s or t is a prime greater than n/2 above).


We can create a graph with the desired properties recursively. If n is 1, then we need just two vertices, source and sink, with an edge from source to sink of arbitrary weight.
If n is even, we can start with a graph with n/2 min cuts and augment it. We add a new vertex X and edges source → X and X → sink, both of weight 1. For every min cut of the original graph, X can be placed on either side of the cut to make a min cut for the new graph.
If n is odd, we can start with a graph with n - 1 min cuts and augment it. Let the cost of the min cut of the old graph be C. We create a new source, and connect it to the original source with an edge of weight C. The cost of the min cut of this new graph is again C (this can easily be seen by considering the effect on the max flow rather than the min cut). There are two ways to achieve a cut of cost C: either cut the new edge only, or make a min cut on the original graph.
We just need to check that this will not produce too many vertices. In the binary representation of n, we need a vertex per bit and a vertex per 1 bit. n can have at most 10 bits and at most 9 one bits, so we need at most 19 vertices.


I made a complete mess of this during the contest, solving the unweighted version and then trying to retrofit it to handle a weighted version. That doesn't work.

Here's how to do it, which I worked out based on some hints from tourist and coded at 3am while failing to sleep due to jetlag. Firstly, we'll make a change that makes some of the later steps a little easier: for each pair of adjacent vertices, add an edge with weight ∞. This won't violate the nesting property, and guarantees that there is always a solution. Let the length of an edge be the difference in indices of the vertices (as opposed to the cost, which is an input). Call an edge right-maximal if it's the longest edge emanating from a particular vertex, and left-maximal if it is the longest edge terminating at a particular vertex. A useful property is that an edge is always either right-maximal, left-maximal, or both. To see this, imagine an edge that is neither: it must then be nested inside longer edges terminating at each end, but these longer edges would cross, violating the given properties.

We can now make the following observations:
  1. If x → y is right-maximal and color[x] is used, then color[y] must be used too. This is because the path must pass through x, and after x cannot skip over y.
  2. If x → y is the only edge leaving x, then the edge will appear in a path if and only if color[x] is used.
  3. If x → y is right-maximal and x →  z is the next-longest edge from x, then x → y is in a path if and only if color[x] is used and color[z] is not used.
These statements all apply similarly for left-maximal edges of course. We can consider vertices 0 and n to be of colour 0 in the above, which is a color that must be picked. What is less obvious is that condition 1 (together with its mirror) is sufficient to determine whether a set of colours is admissible. Suppose we have a set of colours that satisfies condition 1, and consider the subset S of vertices whose colours come from this set. We want to be sure that every pair of consecutive vertices in S are linked by an edge. Suppose there is a pair (u, v) which are not. Consider the right-maximal edge from u: by condition 1, it must terminate at some vertex in S, which must then be to the right of v. But the left-maximal edge to v must start to the left of u, and so these two edges cross.

We can now use this information to construct a new graph, whose minimum cut will give us the cost of the shortest path. It has a vertex per colour, a source vertex corresponding to the special colour 0, and a sink vertex not corresponding to a colour. The vertices on the source side of the cut correspond to the colours that are visited. Every time colour C implies colour D, add an edge C → D of weight ∞, to prevent C being picked while D is not. If an edge falls into case 2 above, add an edge from color[x] to the sink with weight equal to the edge cost. Otherwise it falls into case 3; add an edge from color[x] to color[y] with weight equal to the edge cost. Each edge in the original graph now corresponds to an edge in this new graph, and the original edge is traversed if and only if the new edge forms part of the cut.

Monday, June 13, 2016

Code Jam and Distributed Code Jam 2016 solutions

Code Jam

Since the official analysis page still says "coming soon", I thought I'd post my solutions to both the Code Jam and Distributed Code Jam here.

Teaching Assistant

We can start by observing that when we take a new task, it might as well match the mood of the assistant: doing so will give us at least 5 points, not doing so will give us at most 5. Also, there is no point in getting to the end of the course with unsubmitted assignments: if we did, we should not have taken that assignment, instead submitting a previous assignment (a parity argument proves that there was a previous assignment). So what we're looking for is a way to pair up days of the course, obeying proper nesting, scoring 10 points if the paired days have the same mood and 5 points otherwise.

That's all we need to start on the small case, which can be done with a standard DP. For every even-length interval of the course, we compute the maximum score. We iterate to pick the day to match up to the first day of the interval, then use previous results to find the best scores for the two subintervals this generates.

That's O(N³), which is too slow for the large case. It turns out that a greedy approach works: every day, if the mood matches the top of the stack, submit, otherwise take a new assignment. The exception is that once the stack height matches the number of remaining days, one must submit every day to ensure that the stack is emptied.

I haven't quite got my head around a proof, but since it worked on the test case I had for the small input I went ahead and submitted it.

Forest university

The very weak accuracy requirement, and the attention drawn to it, is a strong hint that the answer should be found by simulation rather than analytically. Thus, we need to find a way to uniformly sample a topological walk of the forest. This is not as simple as always uniformly picking one of the available courses. Consider one tree in the forest: it can be traversed in some number of ways (all of them starting with the root of the tree), and the rest of the forest can be traversed in some number of ways, and then these can be interleaved arbitrarily. If we consider all ways to interleave A items and B items, then A/(A+B) of them will start with an element of A. Thus, the probability that the root of a particular tree will be chosen is proportional to the size of that tree.

After picking the first element, the available courses again form a forest, and one can continue. After uniformly sampling a sequence of courses, simply check which of the cool words appear on the hat.

Rebel against the empire

I consider this problem to be quite a bit harder than D. It's easier to see what to do, but reams of tricky code.

For the small case, time is not an issue because the asteroids are static. Thus, one just needs to find the bottleneck distance to connect asteroids 0 and 1. I did this by adding edges from shortest to longest to a union-find structure until 0 and 1 were connected (ala Kruskal's algorithm), but it can also be done by priority-first search (Prim's algorithm) or by binary searching for the answer and then checking connectivity.

For the large case we'll take that last approach, binary searching over the answers and then checking whether it is possible. Naturally, we will need to know at which times it's possible to make jumps between each pair of asteroids. This is a bit of geometry/algebra, which gives a window of time for each pair (possibly empty, possibly infinite). Now consider a particular asteroid A. In any contiguous period of time during which at least one window is open, it's possible to remain on A, regardless of S, by simply jumping away and immediately back any time security are about to catch up. Also, if two of these intervals are separated in time by at most S, they can be treated as one interval, because one can sit tight on A during the gap between windows. On the other hand, any period longer than S with no open windows is of no use.

The key is to ask for the earliest time at which one can arrive at each interval. This can be done with a minor modification of Dijkstra's algorithm. The modification is that the outgoing edges from an interval are only those windows which have not already closed by the time of arrival at the interval.


I liked this problem, and I really wish the large had been worth fewer points, because I might then have taken it on and solved it.

The key is to realise that the good strings don't really matter. Apart from the trivial case of the bad string also being a good string, it is always possible to solve the problem by producing a pair of programs that can produce every string apart from the bad one.

For the small case, we can use 0?0?0?0? (N repetitions) for the first program, and 111 (N-1 repetitions) for the second (but be careful of the special case N=1!) Each of the 1's can be interleaved once into the first program to produce a 1 in the output, but because there are only N-1 1's, it is impossible to produce the bad string.

For the large case, the same idea works, but we need something a bit more cunning. The first program is basically the same: alternating digits and ?'s, with the digits forming the complement of the bad string. To allow all strings but the bad string to be output, we want the second program to be a string which does not contain the bad string as a subsequence, but which contains every subsequence of the bad string as a subsequence. This can be achieved by replacing every 1 in the bad string by 01 and every 0 by 10, concatenating them all together, then dropping the last character. Proving that this works is left as an exercise for the reader.

Distributed Code Jam


The code appears to be summing up every product of an element in A with an element in B. However, closer examination shows that those where the sum of the indices is a multiple of M (being the number of nodes) are omitted. However, once we pick an index modulo M for each, we're either adding all or none. The sum of all products of pairs is the same as the product of the sums. So, for each remainder modulo M, we can add up all elements of A, and all elements of B. Then, for each pair of remainders, we either multiply these sums together and accumulate into the grand total, or not. There are some details regarding the modulo 1000000007, but they're not difficult.

For the large case, we obviously need to distribute this, which can be done by computing each sum on a different node, then sending all the sums back to the master.


Here, a valid lisp program is just a standard bracket sequence (although it is required to be non-empty). It's well-known that a sequence is valid if and only if:
  • the nesting level (number of left brackets minus number of right brackets) of every prefix is non-negative; and
  • the nesting level of the whole sequence is zero.
If the nesting level ever goes negative in a left-to-right scan, then the point just before it went negative is the longest sequence that can be completed. This makes the small case trivial to implement.

To distribute this, we can do the usual thing of assigning each node an interval to work on. We can use a parallel prefix sum to find the nesting level at every point (there is lots of information on the internet about how to do this). Then, each node can find it's first negative nesting level, and send this back to a master to find the globally first one. We also need to know the total nesting level by sending the sum for each interval to the master, but that's already done as part of the parallel prefix sum.


I liked this one better than the Code Jam asteroids problem, but it was also rather fiddly. The small case is a straightforward serial DP: going from bottom to top, compute the maximum sum possible when ending one's turn on each location.

At first I thought that this could be parallelised in the same manner as Rocks or Mutex from last year's problems. However, those had only bottom-up and left-to-right propagation of DP state. In this problem, we have bottom-up, left-to-right and right-to-left. On the other hand, the left-to-right and right-to-left propagation is slow: at one most unit for every unit upwards. We can use this!

Each node will be responsible for a range of columns. However, let's say it starts with knowledge of the prior state for B extra columns on either side. Then after one row of DP iteration, it will correctly know the state for B-1 extra columns, and after B iterations it will still have the correct information for the columns its responsible for. At this stage it needs to exchange information with its neighbours: it will receive information about the B edge columns on either side, allowing it to continue on its way for another B rows.

There are a few issues involved in picking B. If it's too small (less than 60), then nodes will need to send more than 1000 messages. If it's too large, then nodes will do an excessive number of extra GetPosition queries. Also, B must not be larger than the number of columns a node is handling; in narrow cases we actually need to reduce the number of nodes being used.

Gas stations

The serial version of this problem is a known problem, so I'll just recap a standard solution quickly. At each km, you need to decide how much petrol to add. If there is a town within T km that is cheaper, you should add just enough petrol to reach it (the first one if there are multiple), since any extra you could have waited until you got there. Otherwise, you should fill up completely. A sweep algorithm can give the next cheaper town for every town in linear time.

What about the large case? The constraint is interesting: if we had enough memory, and if GetGasPrice was faster, there is enough time to do it serially! In fact, we can do it serially, just passing the baton from one node to the next as the car travels between the intervals assigned to each node.

What about computing the next cheaper station after each station? That would be difficult, but we don't actually need it. We only need to know whether there is a cheaper one within T km. We can start by checking the current node, using a node-local version of the next-cheapest array we had before. If that tells us that there is nothing within the current node, and T km away is beyond the end of the current node, then we will just find the cheapest within T km and check if that is cheaper than the current value. The range will completely overlap some nodes; for those nodes, we can use a precomputed array of the cheapest for each node. There will then be some prefix of another node. We'll just call GetGasPrice to fetch values on this node. Each iteration this prefix will grow by one, so we only need one call to GetGasPrice per iteration (plus an initial prefix). This means we're calling GetGasPrice at most three times per station (on average), which turns out to the fast enough. It's possible to reduce it to 2 by computing the minimum of the initial prefix on the node that already has the data, then sending it to the node that needs it.

There is one catch, which caused me to make a late resubmission. I was calling GetGasPrice to grow the prefix in the serial part of the code! You need to make all the calls up front in the parallel part of the code and cache the results.

Sunday, March 06, 2016

Solutions to Facebook Hacker Cup 2016

Since the official solutions aren't out yet, here are my solutions. Although I scored 0 in the contest, four of my solutions were basically correct, one I was able to solve fairly easily after another contestant pointed out that my approach was wrong, and I figured out Grundy Graphs on the way home.

Snake and Ladder

I'll describe two solutions to this. There are a few things one can start off with. Firstly, if there are rows at the top or bottom that are completely full of flowers, just cut them off — they make no difference. Then, handle N=1 specially: if K=1, the answer is 1, if K=0, it is 2. A number of people (including all the contest organisers!) messed up the first, and I messed up the second. Also, if one rung has two flowers on it, the answer is 0.

My contest solution used dynamic programming. If one cuts the ladder half-way between two rungs and considers the state of the bottom half, it can be divided into four cases:
  1. Only the left-hand side has the snake on it.
  2. Only the right-hand side has the snake on it.
  3. Both sides have the snake on it, and the two are connected somewhere below.
  4. Both sides have the snake on it, and the two do not connect.
It is reasonably straightforward (as long as one is careful) to determine the dynamic programming transitions to add a rung, for each possibility of a flower on the left, on the right, or no flower. Advancing one rung at a time will be too slow for the bound on N, but the transitions can be expressed as 4×4 matrices and large gaps between flowers can be crossed with fast matrix exponentiation. Finally, one must double the answer to account for the choice of which end is the head.

The alternative solution is to argue on a case-by-case basis. If there no flowers, then one can pick a row for the head and a row for the tail, and pick a column for the head (the column for the tail is then forced by parity). Once these choices are made, there is only one way to complete the snake. Also, the head and tail can only occupy the same row if it is the top or bottom row, giving 2N(N-1)+4 ways.

If there is at least one flower, then the head must be above the top flower and the tail below the bottom flower (or vice versa). Again, one can choose a row for each, with the column being forced by parity. One must also consider special cases for a flower in the top/bottom row.

Boomerang Crew

I think this was possibly the easiest of the problems, provided you thought clearly about it (which I failed to do). Clearly any opponents weaker than (or equal to) your champion can be defeated just be putting them up against your champion. Also, if you're going to defeat P players, they might as well be the weakest P players of the opposition. For a given P, can this be done? There will be a certain number of strong players (stronger than our champion), which we will have to sacrifice our weaker players against to wear them down. What's more, once we've worn one down and beaten him/her, we have to use the winner of that game to wear down the next strong opponent. Thus, we should match up the S strong players against our best S players in some order, and use our remaining players as sacrifices. Ideally we'd like to wear down each opponent to exactly the skill level of the player we will use to win; anything more is wasted effort. We can apply this idea greedily: for each opponent, find the remaining strong player of ours who can win by the smallest margin.

With that method to test a particular P, it is a standard binary search to determine the number of players we can beat.

Grundy Graph

I think this was the hardest problem: my submission wasn't a real solution and was just for a laugh; several other people I spoke to had submitted but didn't believe in their solution. During the contest I suspected that the solution would be related to 2-SAT, but it wasn't until the flight home that I could figure out how to incorporate turn ordering.

Let us construct an auxiliary directed graph A whose vertices correspond to those of the original. An edge u->v in A means that if u is black, then v must be black as well for Alice to win. All the edges in the original graph are clearly present in A. However, for each u->v in the original, we also add v'->u', where x' is the vertex assigned a colour at the same time as x. This is because if v' is black but u' is white, then it implies that u is black and v is white. This is the same as the contrapositive edge in a 2-SAT graph. Let x=>y indicate that y is reachable from x in A.

There are a number of situations in which we can see that Bob can win:
  • If any strongly connected component contains both x and x', then Bob wins regardless of what Alice and Bob play.
  • If Bob controls vertices x and y and x=>y, then Bob wins by assigning x black and y white, regardless of Alice's play.
  • If Bob controls x, Alice controls y, x=>y, x' => y', and y is played before x, then Bob wins by assigning y the opposite colour to x.
We claim that if none of the above apply, then Alice wins. Her strategy is that on her turn, she must colour x black if there is any u such that u => x and either u has already been coloured black, u could later be coloured black by Bob, or u = x'. This ensures that it does not immediately allow Bob to win, nor allows Bob to use it to win later. The only way this could fail is if both x and x' are forced to be black (if neither is forced, Alice can pick freely).

Suppose u => x and v => x', where u and v are as described above. Then x => v', so u => v'. Firstly consider u = x'. We cannot have v = x (because otherwise x, x' are in the same SCC). Now x' => v', so v => v'. If Bob controls v, then we hit the second case above; if Alice controls v, then v => v' means she would never have chosen v. So we have a contradiction. Now, if u ≠ x' (and by symmetry, v ≠ x), then consider u => v' again. Alice cannot control v/v', since she would then have chosen v' rather than v. But similarly, v => u', so Alice couldn't have chosen u. It follows that Bob controls u and v, which can only be possible if u = v'. But u and u' can only both be relevant if Bob hasn't decided their colour yet, which is the third case above.

The implementation details are largely left as an exercise. Hint: construct the SCCs of A, and the corresponding DAG of the SCCs, then walk it in topological order to check the conditions. The whole thing takes linear time.


The concept here is a reasonably straightforward exponential DP, but needs a careful implementation to get all the formulae right, particularly in the face of potential numeric instability. The graph itself isn't particularly relevant; the only "interesting" vertices are the start and the gifts, and the only useful information is the distance from each interesting vertex to each other one (if reachable) and the distance from each gift to the nearest dead-end. The DP state is the set of gifts collected and the current position (one of the interesting vertices). When one is at a gift, there are two possible options:
  1. Pick another gift, and make towards it along the shortest path. Based on the path length L, there is a probability P^L of reaching it in time DL, and a probability 1 - P^L of respawning, with an expected time computable from a formula.
  2. Aim to respawn by heading for the nearest dead-end. Again, one can work out a formula for the expected time to respawn.
When one is at the start, one should of course pick a gift and head for it. Again, one can compute an expected time to reach it (one obtains a formula that includes its own result, but a little algebra sorts that out).

Maximinimax flow

I think I liked this problem the best, even though I screwed up the limits on the binary search. Firstly, what is the minimax flow? For a general graph it would be nasty to compute, but this graph has the same number of edges as vertices. This means that it has exactly one cycle with trees hanging off of it. It shouldn't be too hard to convince yourself that the minimax flow is the smaller of the smallest edge outside the cycle and the sum of the two smallest edges inside the cycle.

Veterans will immediately reach for a binary search, testing whether it is possible to raise the minimax flow to a given level. For the edges outside the ring, this is a somewhat standard problem that can be solved with query structures e.g. Fenwick trees, indexed by edge value. To raise the minimum to a given level, one needs to know the number of edges below that level (one Fenwick tree, storing a 1 for each edge), and their current sum (a second tree, storing the edge weight for each edge).

For the vertices inside the cycle it is only slightly more complex. If the required level is less than double the second-smallest edge, then one need only augment the smallest edge. Otherwise one raises all edges to half the target, with a bit of special-casing when the target is odd. The Fenwick tree can also be queried to find the two smallest edges, but I just added a std::multiset to look this up.

Rainbow strings

This seemed to be one of the most-solved problems, and is possibly the easiest if you have a suffix tree/array routine in your toolbox. A right-to-left sweep will yield the shortest and longest possible substring for each starting position (next green for the shortest, next red for the longest). The trick is to process queries in order of length. Each entry in the suffix array will become usable at some length, and unusable again at another length, and these events, along with the queries, can be sorted by length for processing. The only remaining work is to determine the Kth of the active suffixes, which can be done with a Fenwick tree or other query structure.

Finally, one needs the frequency table for each prefix to be able to quickly count the frequency in each substring.

Wednesday, May 20, 2015

Analysis of ICPC 2015 world finals problems

Analysis of ICPC 2015 world finals problems

I haven't solved all the problems yet, but here are solutions to those I have solved (I've spent quite a bit more than 5 hours on it though).

A: Amalgamated Artichokes

This is a trivial problem that basically everybody solved - I won't go into it.

B: Asteroids

This looks quite nasty, but it can actually be solved by assembling a number of fairly standard computational geometry tools, and the example input was very helpful. To simplify things, let's switch to the frame of reference of the first asteroid, and assume only the second asteroid moves (call them P and Q for short). The first thing to notice is that the area is a piecewise quadratic function of time, with changes when a vertex of one polygon crosses an edge of the other. This is because the derivative of area depends on the sections of boundary of Q inside P, and those vary linearly in those intervals. Finding the boundaries between intervals is a standard line-line intersection problem. To distinguish between zero and "never", touching is considered to be an intersection too.
We also need a way to compute the area of the intersection. Clipping a convex polygon to a half-space is a standard problem too - vertices are classified as inside or outside, and edges that go from inside to outside or vice versa introduce a new interpolated vertex, so repeated clipping will give the intersection of convex polygons. And finding the area of a polygon is also very standard.
Finally, we need to handle the case where the best area occurs midway through one of the quadratic pieces. Rather than try to figure out the quadratic coefficients geometrically, I just sampled it at three points (start, middle, end) and solved for the coefficients, and hence the maximum, with algebra.

C: Catering

I'm sure I'm seen something similar, but I'm not sure where. It can be set up as a min-cost max-flow problem. Split every site into two, an arrival area and a departure area. Connect the source to each departure area, with cap 1 for clients and cap K for the company. Connect each arrival area to the sink similarly. Connect every departure area to every later arrival area with cap 1, except for the company->company link with cap K. Edge costs match the table of costs where appropriate, zero elsewhere. The maximum flow is now K+N (K teams leave the company, and equipment arrives at and leaves every client), and the minimum cost is the cost of the operation.

D: Cutting cheese

This is basically just a binary search for each cut position, plus some mathematics to give the volume of a sphere that has been cut by a plane.

E: Parallel evolution

Start by sorting the strings by length. A simplistic view is that we need to go through these strings in order, assigning each to one path or the other. This can be done with fairly standard dynamic programming, but it will be far too slow.
Add an "edge" from each string to the following one if it is a subsequence of this following one. This will break the strings up into connected chains. A key observation is that when assigning the strings in a chain, it is never necessary to switch sides more than once: if two elements in a chain are on one path, one can always put all the intervening elements on the same path without invalidating a solution. Thus, one need only consider two cases for a chain: put all elements in one path (the opposite path to the last element of the previous chain), or start the chain on one path then switch to the other path part-way through. In the latter case, one should make the switch as early as possible (given previous chains), to make it easier to place subsequent strings. A useful feature is that in either case, we know the last element in both paths, which is all that matters for placing future chains. Dynamic programming takes care of deciding which option to use. Only a linear number of subsequence queries is required.
I really liked this problem, even though the implementation in messy. It depends only on the compatibility relationship being a partial order and there being a cheap way to find a topological sort.

F: Keyboarding

This can be done with a fairly standard BFS. The graph has a state for each cursor position and number of completed letters. There are up to 5 edges from each state, corresponding to the 5 buttons. I precomputed where each arrow key moves to from each grid position, but I don't know if that is needed. It's also irrelevant that keys form connected regions.

H: Qanat

This is a good one for a mathsy person to work out on paper while someone else is coding - the code itself is trivial. Firstly, the slope being less than one implies that dirt from the channel always goes out one of the two nearest shafts - whichever gets it to the surface quicker (it helps to think of there being a zero-height shaft at x=0). Between each pair of shafts, one can easily find the crossover point.
Consider just three consecutive shafts, and the cost to excavate them and the section of channel between them. If the outer shafts are fixed, the cost is a quadratic in the position of the middle shaft, from which one can derive a formula for its position in terms of its neighbours. This gives a relationship of the form xi+2 - gxi+1 + xi = 0. In theory, this can now be used to iteratively find the positions of all the shafts, up to a scale factor which is solved by the requirement for the final shaft (the mother well) to be at x=W.
I haven't tested it, but I think evaluating the recurrence could lead to catastrophic rounding problems, because errors introduced early on are exponentially scaled up, faster than the sequence itself grows. The alternative I used is to find a closed formula for the recurrence. This is a known result: let a and b be the roots of x2 - gx + 1 = 0; then the ith term is r(ai - bi) for the scale factor r. Finding the formula for g shows that the roots will always be real (and distinct) rather than complex.

I: Ship traffic

This mostly needed some careful implementation. For each ship, there is an interval during which the ferry cannot launch. Sort the start and end times of all these intervals and sweeping through them, keeping track of how many ships will be hit for each interval between events. The longest such interval with zero collisions is the answer.

J: Tile cutting

Firstly, which sizes can be cut, and in how many ways? If the lengths from the corners of the parallelograms to the corners of the tile are a, b, c, d, then the area is ab + cd. So the number of ways to make a size is the number of ways it can be written as ab + cd, for a, b, c, d > 0. The number of ways to write a number as ab can be computed by brute force (iterate over all values of a and b for which ab <= 500000). The number of ways to write a number as ab + cd is the convolution of this function with itself (ala polynomial multiplication). There are well-known algorithms to do this in O(N log N) time (with an FFT) or in O(N1.58) time (divide-and-conquer), and either is fast enough.

K: Tours

I've got a solution that passes, but I don't think I can fully prove why.
I start by running a DFS to decompose the graph into a forest plus edges from a node to an ancestor in the forest. Each such "up" edge corresponds to a simple cycle. Clearly, the number of companies must divide into the length of any cycle, so we can take the GCD of these cycle lengths.
If the cycles were all disjoint, this would (I think) be sufficient, but overlapping cycles are a problem. Suppose two cycles overlap. That means there are two points A and B, with three independent paths between them, say X, Y, Z. If X has more than its share of routes from some company, then Y must have less than its share, to balance the tour X-Y. Similarly, Z must have less than its share. But then the tour Y-Z has too few routes from that company. It follows that X, Y and Z must all have equal numbers of routes from each company, and hence the number of companies must divide into each of their lengths.
And now for the bit I can't prove: it seems to be sufficient to consider only pairs of the simple cycles corresponding to using a single up edge. I just iterate over all pairs and compute the overlap. That is itself not completely trivial, requiring an acceleration structure to compute kth ancestors and least common ancestors.

L: Weather report

This is an interesting variation of a classic result in compression theory, Hamming codes. The standard way to construct an optimal prefix-free code from a frequency table is to keep a priority queue of coding trees, and repeatedly combine the two least frequent items by giving them a common parent. That can't be done directly here because there are trillions of possible strings to code, but it can be done indirectly by noting that permutations have identical frequency, and storing them as a group rather than individual items. The actions of the original algorithm then need to be simulated, pairing up large numbers of identical items in a single operation.