Entropy always increases: Solutions to ICPC world finals 2021

In case you're wondering why I'm only posting this in 2022: the "2021" world finals happened in November 2022, in Dhaka. You can see the problems here. I haven't seen a writeup of the solutions (not that I've looked very hard). So I'm writing up my solutions. These are my own solutions, except for K, where I watched the official solution and particularly liked it. You can find the official solution videos here. The proof of correctness for E was provided by Derek Kisman.

A: Crystal Crosswind

Consider a single wind direction, and two cells A at \((x, y)\) and B at \((x - w_x, y - w_y)\). If there is a boundary at A, things are simple: there is a molecule at A and there is no molecule at B. If there is no boundary at A we can't immediately determine where the molecules are (unless B lies outside the grid, in which case there is on molecule at A), but we know that if there is a molecule at A there is also one at B, and conversely if there is no molecule at B then there is no molecule at A. We can add these are edges to two different graphs.

Now every time we know there is a molecule in a particular location, we can follow edges in the first graph to find new molecules, and similarly when we know there is a hole somewhere, we can follow edges of the second graph to find new holes. We can just keep doing this until there is no new information to be found.

This may still leave us with unknown cells. The key observation is that leaving all these locations as holes will yield a valid configuration. Suppose we made a hole at \(P_1\), and that implied there had to be a hole at \(P_2\), which implied there had to a hole at \(P_3\) etc, until eventually it required a hole at some \(P_k\), which already contained a molecule. That cannot happen, because the molecule at \(P_k\) would already have forced a molecule at \(P_{k-1}\).

Similarly, we can safely fill all unknown cells with molecules to get the maximum crystal structure.

B: Dungeon Crawler

Assume we've decided to end the traversal at some node E. Any edge on the path from S to E will necessarily be traversed an odd number of times, and any other edge an even (and positive) number of times. Let r be the sum of all the edge lengths, and let d(P, Q) be the distance between nodes P and Q. If we didn't have to worry about the trap, then we'd be able to achieve \(2r - d(S, E)\), using each edge the minimum number of times.

Let's consider the tree to be rooted at K, and let L be the least common ancestor of S and T, and M be the least common ancestor of L and E. If L = T then the problem is impossible (can't get from S to K without going through T). Otherwise, every edge on the path from L to M has to be traversed 3 times: first to get to the key, then to get to the trap, and finally to get to the end. So the total cost will be \(2r - d(S, E) + 2d(L, M)\). We can now consider two cases.

E lies in the subtree of L, so L = M. Then the cost is just \(2r - d(S, E)\). So for this case, we want E to be as far from S as possible.
E does not lie in the subtree of L. In this case a bit of manipulation (using knowledge of which vertices are ancestors of which) shows that the cost is equivalent to \(2r - d(E, K) + d(L, K) - d(L, S)\). Since L, K and S are independent of E, it follows that we should choose E to be the vertex furthest from K.

For each case, we want to choose E as far from some vertex V (either S or K) as possible, while excluding a particular subtree. Let's consider the tree to be rooted at V, and index the vertices using a pre-order walk. The range of vertices to exclude are contiguous in this pre-order walk. Thus, we wish to consider only a prefix and suffix of the vertices. By pre-computing prefix and suffix maximums (of depth) for every possible root, we can choose E in each case in constant time (once we have L and its neighbour towards K).

The time complexity depends on the method used to find least common ancestors. With a simple structure where every node stores ancestors at power-of-two distances, it will require \(O(n^2\log n + q\log n)\); with more sophisticated structures (such as heavy-light trees) this can be reduced to \(O(n^2 + q\log n)\), and I believe it can (very) theoretically be reduced to \(O(n^2 + q)\) using the method of the Four Russians.

C: Fair Division

It's slightly easier to work with the fraction of loot that gets passed on, rather than the fraction each pirate takes. Let's call that g (so g = 1 - f). Let's work out how much loot each pirate gets. Let's say a pirate gets x in the first round. In the next round they get \(g^nx\), then \(g^{2n}x\) and so on. This is a geometric series, and in the limit they will get \(\frac{x}{1-g^n}\). For the ith pirate (counting from 0), x will be \(g^i(1-g)m\), so the total loot will be \(\frac{g^i(1-g)m}{1-g^n}\).

Let \(g = \frac{a}{b}\), where a and b are relatively prime. Multiplying top and bottom by \(b^n\), we see that the loot is \(\frac{a^i b^{n-1-i}(b - a)m}{b^n - a^n}\), and in particular \(b^n - a^n\) must divide into \(a^i b^{n-1-i}(b - a)m\). Suppose some prime p divides into b. Then it divides into \(b^n\) but not into \(a^n\) and hence does not divide into \(b^n - a^n\). Thus, the denominator is relatively prime to \(b^{n-1-i}\), and similarly to \(a^i\). It's thus required that \(b^n - a^n\) divides into \((b - a)m\).

We now see the reason behind the condition n ≥ 6: it strongly limits how big b can be. It's sufficient to test all values of b up to the 5th root of m to find those that satisfy the condition.

D: Guardians of the Gallery

This is a typical hard geometry problem: relatively simple conceptually, but with lots of corner cases and opportunities for floating-point instabilities.

I don't have a formal proof, but the guard's path can be split into two stages:

Optionally move to a vertex of the polygon, possibly via other vertices. We can find the minimum distance for this by considering a graph consisting of the polygon vertices and the guard, with edges where the line segment connecting vertices is unobstructed. The graph is small so Floyd-Warshall is sufficient.
Move to the nearest point on a segment originating at the art, passing through a vertex, and terminating where it is no longer possible to view the art. This might be an orthogonal projection onto the segment (interior to the polygon) or it might be the endpoint of the segment.

We thus need to know how to find the terminating point of a segment as contemplated in the second part. We can trace the polygon counter-clockwise, and check each edge against this ray. If it completely crosses the ray, that gives us a bound. But what if it just touches the ray? We can separately track edges that touch the ray from one side (obscuring one half of the art) or the other (obscuring the other half); the further of these two determines the cutoff point.

We also need to be able to check whether a line segment is obstructed by the polygon. It's complicated by the fact that either endpoint may lie on the polygon itself. It's not sufficient to determine whether the polygon intersects the interior of the line segment (i.e., excluding the endpoints), because the segment may lie entirely outside the polygon, or it may lie along an edge of the polygon or pass through a vertex. Solving this boils down to careful handling of polygon vertices that lie along the line segment, to ensure that the line segment does not go outside the polygon at any of those vertices.

E: Hand of the Free Marked

Firstly, it is quite easy to show that probabilistic strategies are of no benefit. Suppose the team use a probabilistic strategy that involves picking a die from a selection (with a number of sides appropriate to the situation) and rolling it. Then instead, they could just roll all the dice in advance, before the cards are selected. Each possible roll would have some probability of success, according to the chosen strategy. They might as well just fix the dice in the combination that gives the best probability, and consider that a deterministic strategy.

We can now see this as a bipartite matching problem. Let U be the set of all possible (unordered) sets of cards that could be drawn, and let V be the set of possible ordered sets of k - 1 cards plus a marking for the final card. The assistant maps each element of U to one element of V, and the magician maps each element of V to some element of U. The trick cannot work for two different elements of U that map to the same V (the magician will only be correct for one of them), so the elements for which the trick works form a bipartite matching.

Edges exist where the element of U and of V have consistent information. Let's refer to the different markings as "colours", and let the "spectrum" of a set of cards consist of the number of cards of each colour of the set. The spectrum is known for each element of V, so it follows that each spectrum constitutes a separate connected component of the bipartite graph (we haven't provided that each spectrum is connected, but it's simple; the important point is that there are no edges between spectra). So we can solve the problem independently for each spectrum then combine the results.

From this point on, we'll assume a fixed spectrum, and let U and V denote only the elements within this spectrum. We can hypothesize that the size of the maximum matching is simply the smaller of |U| and |V|. This is not true for general bipartite graphs, but these graphs have a somewhat regular structure that suggests it (and it is clearly not practical to use a generic bipartite matching algorithm). If you take it on faith, you'll be able to solve the problem, with some basic combinatorics to compute |U| and |V|. The number of possible spectra is small enough to be easily handled.

Let's prove the result though. We'll use Hall's Theorem. Elements of V can be classified according to the colour of the hidden card. Let's denote the subsets of V as \(V_1\), \(V_2\), \(V_s\) for some s. All elements in \(V_i\) have the same degree (let's call it \(d_i\)). What's more, every element in U has the same number of connections to \(V_i\), say \(e_i\). It follows that \(d_i|V_i| = e_i|U|\). Now we can consider the two cases:

|U| ≤ |V|. Consider some subset A of U, whose image in V in B. Let \(B_i\) be the intersection of B and \(V_i\). Let's consider the total number of edges between A and \(B_i\). On the one hand it must equal \(e_i|A|\). On the other hand, it cannot exceed the sum of degrees of \(B_i\), namely \(d_i|B_i|\). So \(d_i|B_i| \ge e_i|A| = d_i|A||V_i|/|U|\) and hence \(|B_i| \ge |A||V_i|/|U|\). Adding these inequalities gives \(|B| \ge |A||V|/|U| \ge |A|\). Thus, the Hall condition is satisfied.
|U| ≥ |V|. Consider some subset B of V, whose image in U is A. Define \(B_i\) as before. Between A and \(B_i\) there will be \(d_i|B_i|\) edges, and also at most \(e_i|A|\) edges, giving \(d_i|B_i| \le e_i|A|\). The argument proceeds as before, just with the directions of inequalities reversed.

F: Islands from the Sky

Firstly, congratulations to the judges for creating a geometry problem (and a 3D geometry problem no less!) that is actually relatively easy to solve.

Each island needs to be completely scanned by some plane. So we can take each island, each plane and check what angle needs to be used by that plane to scan that island. For each island we can take the minimum angle over all planes, and then we can take the largest angle over all islands.

Because each plane scans a convex region, it's sufficient to check the vertices of an island. For each vertex, project the vertex onto the (2D) flight path to determine the point at which it is scanned, then use linear interpolation to find the altitude of the plane at that point. Then simple trigometry (using the distance of the vertex from the projection) gives the minimum angle.

G: Mosaic Browsing

It's clear that a naïve implementation will be too slow. But considering every possible way to slide two signals over each other and compute some summary seems remarkably like convolution. We can in fact make it so with some tricks. First, assign each colour a unique point on the complex unit circle (ideally equally spaced, but that's only important for numeric precision). In the motif, use the conjugate values, and use 0 for uncoloured cells. When multiplying a value from the mosaic with a value from the motif, the result will be 1 if the colours match, or something with a real part less than 1 if they do not. Thus, the real part of the sum will equal the number of coloured cells of the motif if and only if there is a match.

There are a few details to take care of (around sign conventions and padding), but essentially the problem has been reduced to a 2D convolution. Using the Fast Fourier Transform, it can thus be implemented in \(O(RC \log(RC))\), where \(R = r_p + r_q\) and \(C = c_p + c_q\).

H: Prehistoric Programs

Firstly, if the total numbers of opening and closing parentheses are not equal, then clearly it is impossible. Otherwise, we can summarise each tablet with two numbers: its balance (number of opening minus the number of closing parens) and its depth (negative of the minimum balance of any prefix).

Note that it is never necessary to have a negative-balance tablet followed by a positive-balance tablet: if this occurs in a valid solution, they can be swapped around to give another valid solution. Similarly, given two adjacent positive-balance tablets, we can always place the one with the smaller depth first. Tablets with zero balance can be (arbitrarily) treated as positive-balance.

This leads to a greedy solution: from the left, try to place the positive-balance tablets in increasing order of depth, then the negative-balance tablets in decreasing order of depth. Either it gives a valid solution (easy to check) or there is no valid solution.

I: Spider Walk

The traversal rules are time-symmetric, so we can just as easily ask how many bridges to add to ensure that starting at strand S and walking inwards will reach the centre from a particular strand. Let us maintain some distance d and track the minimum cost (in new bridges) to reach the point on each strand at a distance D, starting from S. We start with D at infinity and incrementally reduce it by considering the bridges in decreasing distance from the centre.

Suppose there is a bridge linking strands P and Q at distance d. For D slightly larger than d, we have costs to reach P and Q of \(c_P\) and \(c_Q\) respectively. Once D shrinks below d, the cost to reach P is at most \(c_Q\), and the cost to reach Q is at most \(c_P\). However, the costs may actually be less if the costs to reach the neighbouring strands are \(c_P - 1\) or \(c_Q - 1\) respectively.

Once the new costs for P and Q have been computed, the costs to reach every other strand need to be updated using these as starting points (e.g. if the new cost for P is \(e_P\), then a strand 5 away from P can now be reached with cost \(e_P + 5\), which may be better than the previous value).

Performing this cost update efficiently will require a suitable data structure. It requires the following operations:

Set a single value
Update a range of values [L, R) to the minimum of the current value and \(v + i\), where v is given and i is the index.
As above, but use \(v - i\) rather than \(v + i\).

The \(v + i\) and \(v - i\) updates correspond to the two directions one could traverse the web. I found it simplest to use two separate segment trees for the two types of updates, with the final result being the smaller of the two. Each segment tree node holds a value of \(v\) that has been applied to its entire corresponding subtree. Setting a single value requires pushing these internal values down the tree.

Overall complexity is \(O(n\log n + m\log n + m\log m)\) (the final term is just to sort the bridges; it could probably be eliminated, but there is little need.

J: Splitstream

For each edge, we'll need to know the total number of values that are sent along it. This can be done with dynamic programming, or recursively with memoisation. Then to find the kth element along an edge, one needs only consider a few cases. If the edge start from a split node, it is simple to determine the corresponding index on the edge into the split node. For a merge node it is slightly more complex, depending on whether one or the other incoming edge will run out before the kth element is output.

K: Take On Meme

As a reminder, this is not my solution, but rather one presented during the livestream.

We can imagine all possible final memes as points in a plane, and we're trying to find the one furthest from the origin. This will lie on the convex hull of the possible memes. Let's construct that convex hull.

Suppose we wish to find the point which lies furthest in a particular direction (formally, has the largest dot product with a chosen vector). This is a reasonably straightforward tree DP, where for each node of the tree we find the meme with the largest and smallest dot product. Whatever point we find is guaranteed to be on the hull. By picking a few (e.g. 4) directions, we can find a points on the hull. We can then find the rest recursively: pick two adjacent candidates, and search for the furthest point in the direction orthogonal to the line joining them. This will either yield another point on the hull, or prove that the two candidates are neighbours on the hull.

This leaves the question of how many points might lie on the convex hull — after all, there are exponentially many ways to select winning memes. However, the points all have integer coordinates with magnitude limited to 10⁷, which strongly limits the number of points that can be placed on the hull: a hull with coordinates limited to X has at most \(O(X^{\frac{2}{3}})\) points.

L: Where Am I?

Take one step at a time, keeping track of sets of starting positions that are indistinguishable so far. When taking a new step, each set is partitioned into two (although possibly all into one side) based on whether the new step would reveal a marker or an empty space for given starting positions. Every time a set of size 1 is produced, you know how many steps are needed to identify the location for someone who starts in that location.

Update: apparently the judges intend this solution to time out; I can't tell whether mine does or not since there is no online judge, but my implementation (which isn't super-optimised) runs in 0.9s.

Entropy always increases

Saturday, December 10, 2022

Solutions to ICPC world finals 2021