Tuesday, November 22, 2016

An alternative DCJ 2016 solution

During this year's Distributed Code Jam I had an idea for solving the hard part of Toothpick Sculptures, but had no time to implement it, or even to work out all the details. When the official solution was presented, I was surprised that it was very different to mine. Recently Pablo from Google has been able to help me check that my solution does indeed pass the system tests (thanks Pablo), so it seems like a good time to describe the solution.

There are going to be a lot of different trees involved, so let's give them names. A "minitree" is one of the original sculptures of N toothpicks, and its root is a "miniroot". The "maxitree" is the full tree of 1000N toothpicks. Let's assume that each minitree is built out of toothpicks of one colour. We can then consider a tree whose vertices are colours, called the "colour tree", which contains an edge between two colours if and only if the maxitree contains an edge between two vertices of those colours.

For each colour, we can find its children in the colour tree by walking the corresponding minitree looking for children which are miniroots. This can be done in parallel over the colours.

For a given vertex, we can use DP to solve the following two problems for the subtree: what is the minimum cost to stabilise the subtree if we stabilise the root, and what is the minimum cost if we do not stabilise the root? But this will not be so easy to parallelise. We can go one step further: if we fix some vertex C and cut off the subtree rooted at C, then we can answer the same two problems for each case of C being stabilised or not stabilised (four questions in total). If we later obtain the answers to the original two queries for C, then we can combine this with the four answer we have to answer the original queries for the full tree.

This is sufficient to solve the small problem, and more generally any test case where the colour tree does not branch. For each miniroot, compute the DP, where the miniroot corresponding to the single child (if any) in the colour tree is the cutoff. These DPs can be done in parallel, and the master can combine the results.

Let's consider another special case, where the colour tree is shallow. In this case, one can solve one layer at a time, bottom up, without needing to use the cutoff trick at all. The colours in each layer of the colourtree are independent and can be solved in parallel, so the latency is proportional to the depth of the tree. The results of each layer are fed into the calculations of the layer above.

So, we have a way to handle long paths, and a way to handle shallow trees. This should immediately suggest light-heavy decomposition. Let the "light-depth" of a node in the colour tree be the number of light edges between it and the root. The native of light-heavy decomposition guarantees that light-depth is at most logarithmic in the tree size (which is 1000). We will process all nodes with the same light-depth in parallel. This means that a node in a tree may be processed at the same time as its children, but only along a heavy path. We thus handle each heavy path using the same technique as in the small problem. For other children in the colour tree, the subtree results were computed in a previous pass and are sent to the slave.

TCO 2016 finals

The final round of TCO 2016 did not go well for me. I implemented an over-complicated solution to the 500 that did not work (failed about 1 in 1000 of my random cases), and I started implementing the 400 before having properly solved it, leading to panic and rushed attempts to fix things in the code rather than solving the problem.


The fact that there are only four mealtimes is a clue that this will probably require some exponential time solution. I haven't worked out the exact complexity, but it has at least a factor of \(2^\binom{n}{n/2}\) for n mealtimes.

Let's restrict our attention to meal plans that contain no meals numbered higher than m and have t meals in the plan (t can be less than 4). We can further categorise these meals by which subsets of the mealtimes can accommodate the plan. This will form the state space for dynamic programming. To propagate forwards from this state, we iterate over the number of meals of type m+1 (0 to 4 of them). For each choice, we determine which subsets of mealtimes can accommodate this new meal plan (taking every subset that can accommodate the original plan and combining it with any disjoint subset that can accommodate the new meals of type m+1).


Let's start by working out whether a particular bracket sequence can be formed. Consider splitting the sequence into a prefix and suffix at some point. If the number of ('s in the prefix differs between the initial and target sequence, then the swaps that cross the split need to be used to increase or decrease the number of ('s. How many ('s can we possibly move to the left? We can answer that greedily, by running through all the operations and applying them if and only if they swap )( to (). Similarly, we can find the least possible number of ('s in the final prefix.

If our target sequence qualifies by not having too many or too few ('s in any prefix, can we definitely obtain it? It turns out that we can, although it is not obvious why, and I did not bother to prove it during the contest. Firstly, if there is any (proper) prefix which has exactly the right number of ('s, we can split the sequence at this point, choose not to apply any operations that cross the split, and recursively figure out how to satisfy each half (note that at this point we are not worrying whether the brackets are balanced — this argument applies to any sequence of brackets).

We cannot cross between having too few and too many ('s when growing the prefix without passing through having the right number, so at the bottom of our recursion we have sequences where every proper prefix has too few ('s (or too many, but that can be handled by symmetry). We can solve this using a greedy algorithm: for each operation, apply it if and only if it swaps )( to () and the number of ('s in the prefix is currently too low. We can compare the result after each operation to the maximising algorithm that tries to maximise the number of ('s in each prefix. It is not hard to see (and prove, using induction) that for any prefix and after each operation is considered, the number of ('s in the prefix is:
  • between the original value and the target value (inclusive); and
  • the smaller of the target value and the value found in the maximising algorithm.
Thus, after all operations have been considered, we will have reached the target.

So now we know how to determine whether a specific bracket sequence can be obtained. Counting the number of possible bracket sequences is straightforward dynamic programming: for each prefix length, we count the number of prefixes
that are valid bracket sequence prefixes (no unmatched right brackets) for each possible nesting depth.


This is an exceptionally mean problem, and I very much doubt I would have solved it in the fully 85 minutes; congratulations to Makoto for solving all three!

For convenience, let H be the complement of G2 (i.e. n copies of G1). A Hamiltonian path in G2 is simply a permutation of the vertices, such that no two consecutive vertices form an edge in H. We can count them using inclusion-exclusion, which requires us to count, for each m, the number of paths that pass through any m chosen edges. Once we pick a set of m edges and assign them directions (in such a way that we have a forest of paths), we can permute all the vertices that are not tails of edges, and the remaining vertices have their positions uniquely determined.

Let's start by solving the case n=1. Since k is very small, we can expect an exponential-time solution. For every subset S of the k vertices and every number m of edges, we can count the number of ways to pick m edges with orientation.  We can start by solving this for only connected components, which means that m = |S| - 1 and the edges form a path. This is a standard exponential DP that is used in the Travelling Salesman problem, where one counts the number of paths within each subset ending at each vertex.

Now we need to generalise to m < |S| - 1. Let v be the first vertex in the set. We can consider every subset for the path containing v, and then use dynamic programming to solve for the remaining set of vertices. This is another exponential DP, with a \(O(3^n)\) term.

Now we need to generalise to n > 1. If a graph contains two parts that are not connected, then edges can be chosen independently, and so the counts for the graph are obtained by convolving the counts for the two parts. Unfortunately, a na├»ve implementation of this convolution would require something like O(n²k) time, which will be too slow. But wait, why are we returning the answer modulo 998244353? Where is our old friend 1000000007, and his little brother 1000000009? In fact, \(998244353 = 2^{23} \times 7 \times 17 + 1\), which strongly suggests that the Fast Fourier Transform will be involved. Indeed, we can perform an 1048576-element FFT in the field modulo 998244353, raise each element to the power of n, and then invert the FFT.

Despite the large number of steps above, the solution actually requires surprisingly little new code, as long as one has library code for the FFT and modulo arithmetic.

Monday, November 21, 2016

TCO 2016 semifinal 2 solutions

Semifinal 2 was much tougher than semifinal 1, with only one contestant correctly finishing two problems. The first two problems each needed a key insight, after which very little coding is needed (assuming one has the appropriate library code available). The hard problem was more traditional, requiring a serious of smaller steps to work towards the solution.


Consider the smallest special clique. What happens if we remove an element? Then we will still have a clique, but the AND of all the elements is non-zero. In particular, there is some bit position which is 1 in every element of the special clique except for 1.

We can turn that around: it suffices to find a bit position B, a number T which is zero on bit B, and a special clique containing T whose other elements are all 1 on bit B. We can easily make a list of candidates to put in this special clique: any element which is 1 on bit B and which is connected to T. What's more, these candidates are all connected to each other (through bit B), so taking T plus all candidates forms a clique. What's more, if we can make a special clique by using only a subset of the candidates, then the clique formed using all candidates will be special too, so the latter is the only one we need to test.


The key insight is to treat this as a flow/matching problem. Rather than thinking of filling and emptying an unlabelled tree, we can just build a labelled tree with the properties that if A's parent is B, then B < A and A appears earlier in p than B does. A few minutes thinking should convince you that any labelled tree satisfying these constraints can be used for the sorting.

We thus need to match each number to a parent (either another number or the root). We can set this up with a bipartite graph in the usual way: each number appears on the left, connected to a source (for network flow) with an edge of capacity 1. Each number also appears on the right, along with the root, all connected to the sink with N edges of capacity 1 (we'll see later why I don't say one edge of capacity N). Between the two sides, we add an edge A → B if B is a viable parent for A, again with capacity 1. Each valid assignment can be represented by a maxflow on this network (with flow N), where the edges across the middle are parent-child relationships. Note that normally using this approach to building a tree risks creating cycles, but the heap requirement on the tree makes that impossible.

That tells us which trees are viable, but we still need to incorporate the cost function. That is where the edges from the right numbers to the sink come in. A (non-root) vertex starts with a cost of 1, then adding more children increases the cost by 3, 5, 7, ... Thus, we set the costs of the edges from each number to the sink to this sequence. Similarly, the costs of the edges from the root to the sink have costs 1, 3, 5, 7, ... The min-cost max-flow will automatically ensure that the cheaper edges get used first, so the cost of the flow will match the cost of the tree (after accounting for the initial cost of 1 for each number, which can be handled separately).


This one requires a fairly solid grasp on linear algebra and vector spaces. The set with \(2^k\) elements is a vector subspace of \(\mathbb{Z}_2^N\) of dimension k, for some unknown N. It will have a basis of size k. To deal more easily with ordering, we will choose to consider a particular canonical basis. Given an arbitrary initial basis, one can use Gaussian reduction and back propagation to obtain a (unique) basis with the following property: the leading 1 bit in each basis element is a 0 bit in every other basis element. With this basis, it is not difficult to see that including any basis element in a sum will increase rather than decrease the sum. The combination of basis elements forming the ith sorted element is thus given exactly by the binary representation of i.

We can now take the information we're given an recast it is a set of linear equations. Furthermore, we can perform Gaussian elimination on these linear equations (which we actually do in code, unlike the thought experiment Gaussian elimination above). This leaves some basis elements fixed (uniquely determined by the smaller basis elements) and others free. Because of the constraints of the basis elements, we also get some more information. If basis i appears in an equation, its leading bit must correspond to a 1 bit in the value (and to the first 1 bit, if i is the leading basis in the equation). Similarly, if basis i does not appear, then it must correspond to a 0 bit in the value. This allows us to build a mask of where the leading bit for each basis element can be.

We now switch to dynamic programming to complete the count. We count the number of ways to assign the first B basis elements using the first C value bits. Clearly dp[0][C] = 1. To add new basis, we can consider the possible positions of its leading bit (within the low C bits), and then count the degrees of freedom for the remaining bits. If the basis is fixed then there is 1 degree of freedom; otherwise, there is a degree of freedom for each bit that isn't the leading bit of a previous basis, and it is easy to count these.

There are a few details that remain to do with checking whether there are an infinite number of solutions (which can only happen if the largest basis is unconstrained), and distinguishing between a true 0 and a 0 that is actually a multiple of 1000000007, but those are left as exercises for the reader.

Sunday, November 20, 2016

Topcoder Open 2016 Semifinal 1

Here are my solutions to the first semifinal of the TCO 2016 algorithm contest.


Let's start with some easy cases. If s = t, then the answer is obviously zero. Otherwise, it is at least 1. If s and t share a common factor, then the answer is 1, otherwise it is at least 2.
We can eliminate some cases by noting that some vertices are isolated. Specifically, if either s or t is 1 or is a prime greater than n/2 (and s≠t) then there is no solution.
Assuming none of the cases above apply, can the answer be exactly 2? This would mean a path s → x → t, where x must have a common factor with s and a (different) common factor with t. The smallest x can be is thus the product of the smallest prime factors of s and t. If this is at most n, then we have a solution of length 2, otherwise the answer is at least 3.
If none of the above cases applies, then answer is in fact 3. Let p be the smallest prime factor of s and q be the smallest prime factor of t. Then the path s → 2p → 2q → t works (note that 2p and 2q are at most n because we eliminated cases where s or t is a prime greater than n/2 above).


We can create a graph with the desired properties recursively. If n is 1, then we need just two vertices, source and sink, with an edge from source to sink of arbitrary weight.
If n is even, we can start with a graph with n/2 min cuts and augment it. We add a new vertex X and edges source → X and X → sink, both of weight 1. For every min cut of the original graph, X can be placed on either side of the cut to make a min cut for the new graph.
If n is odd, we can start with a graph with n - 1 min cuts and augment it. Let the cost of the min cut of the old graph be C. We create a new source, and connect it to the original source with an edge of weight C. The cost of the min cut of this new graph is again C (this can easily be seen by considering the effect on the max flow rather than the min cut). There are two ways to achieve a cut of cost C: either cut the new edge only, or make a min cut on the original graph.
We just need to check that this will not produce too many vertices. In the binary representation of n, we need a vertex per bit and a vertex per 1 bit. n can have at most 10 bits and at most 9 one bits, so we need at most 19 vertices.


I made a complete mess of this during the contest, solving the unweighted version and then trying to retrofit it to handle a weighted version. That doesn't work.

Here's how to do it, which I worked out based on some hints from tourist and coded at 3am while failing to sleep due to jetlag. Firstly, we'll make a change that makes some of the later steps a little easier: for each pair of adjacent vertices, add an edge with weight ∞. This won't violate the nesting property, and guarantees that there is always a solution. Let the length of an edge be the difference in indices of the vertices (as opposed to the cost, which is an input). Call an edge right-maximal if it's the longest edge emanating from a particular vertex, and left-maximal if it is the longest edge terminating at a particular vertex. A useful property is that an edge is always either right-maximal, left-maximal, or both. To see this, imagine an edge that is neither: it must then be nested inside longer edges terminating at each end, but these longer edges would cross, violating the given properties.

We can now make the following observations:
  1. If x → y is right-maximal and color[x] is used, then color[y] must be used too. This is because the path must pass through x, and after x cannot skip over y.
  2. If x → y is the only edge leaving x, then the edge will appear in a path if and only if color[x] is used.
  3. If x → y is right-maximal and x →  z is the next-longest edge from x, then x → y is in a path if and only if color[x] is used and color[z] is not used.
These statements all apply similarly for left-maximal edges of course. We can consider vertices 0 and n to be of colour 0 in the above, which is a color that must be picked. What is less obvious is that condition 1 (together with its mirror) is sufficient to determine whether a set of colours is admissible. Suppose we have a set of colours that satisfies condition 1, and consider the subset S of vertices whose colours come from this set. We want to be sure that every pair of consecutive vertices in S are linked by an edge. Suppose there is a pair (u, v) which are not. Consider the right-maximal edge from u: by condition 1, it must terminate at some vertex in S, which must then be to the right of v. But the left-maximal edge to v must start to the left of u, and so these two edges cross.

We can now use this information to construct a new graph, whose minimum cut will give us the cost of the shortest path. It has a vertex per colour, a source vertex corresponding to the special colour 0, and a sink vertex not corresponding to a colour. The vertices on the source side of the cut correspond to the colours that are visited. Every time colour C implies colour D, add an edge C → D of weight ∞, to prevent C being picked while D is not. If an edge falls into case 2 above, add an edge from color[x] to the sink with weight equal to the edge cost. Otherwise it falls into case 3; add an edge from color[x] to color[y] with weight equal to the edge cost. Each edge in the original graph now corresponds to an edge in this new graph, and the original edge is traversed if and only if the new edge forms part of the cut.