Sunday, November 26, 2017

Analysis of Croatian Open contest 2017r3

This round of COCI was more challenging than usual, with some of the lower-scoring problems being particularly difficult.


This was quite straightforward: run through the letters, and each time a letter is different to the previous one (or is the first one), increment a counter.


Firstly, one word can be made by rearranging the letters of another if and only if they have the same letter frequencies. So if we have a fast way to find the letter frequencies in any substring, the problem is solved. Consider the sequence whose ith element is 1 if S[i] is 'a' and 0 otherwise: the number of a's in a substring is then just a contiguous sum of this sequence. By precomputing the prefix sum, we can find the sum of any interval in constant time. We thus just need to store 26 prefix sum tables, one for each letter.


The first part (finding the length) is a reasonably standard although fiddly dynamic programming problem. At any point in the process, one has a position relative to the original grid and a nesting depth, and need to know the longest suffix that will complete a valid bracket sequence. For each of the (up to) three possible moves, one can compute the state would be reached, and find the longest possible sequence from previously computed values.

My attempt at the second part during the contest was overly-complex and algorithmically too slow, involving trying to rank all length-L strings represented by DP states, for each L in increasing order. This required a lot of sorting, making the whole thing O(N³ log N) (where N is O(R + C)).

A simpler and faster solution I wrote after the contest is to reconstruct the path non-deterministically, one bracket at a time. Instead of working from a single source state, keep a set of candidate states (in super-position). From those, run a breadth-first (or depth-first) search to find all states where one might encounter the next bracket. Any transition that is optimal (for length) in the original DP now becomes an edge in this search. Once the candidate states for the next bracket has been found, there might be a mix of opening and closing brackets. If so, discard those states corresponding to closing brackets, since they will not be lexicographically minimal.


I didn't attempt this problem, and am not sure how to solve it. I suspect it requires coming up with and proving a number of simplifying assumptions which allow the search space to be reduced.


I really liked this problem. First of all, M = 1 is a bit tricky, because it's the only case where you can't swap two numbers without having an effect. It's easy enough to solve by hand and treat as a special case.

Now let's count all the cases where the player can't win. Let's say that the lit elements have an XOR of K. For M > 1, we can't have \(K = 2^M - 1\), since otherwise the player could swap either two lit or two unlit numbers and win. Let \(K' = K \oplus 2^M - 1\), and consider a lit number A. To win, the player has to swap it with B such that  \(A \oplus B = K'\). As the RHS can't be 0, the lit numbers must be grouped into pairs with XOR of K'. The XOR of all the lit numbers must thus be either 0 or K', depending on whether there are an even or odd number of such pairs. But this is K (by definition); and since K = K' is impossible, we must have K = 0.

So now we need to find intervals such that the XOR of all elements is 0, and for every pair of complementary elements A and B, they are either both inside or both outside the interval. The first part is reasonably easy to handle: we can take a prefix sums (using \(\oplus\) instead of addition), and an interval with XOR of 0 is one where two prefix sums are equal. For the second part, we can use some sweeps: for each potential left endpoint, establish an upper bound on the right endpoint (considering only complementary pairs that lie on opposite sides of the left endpoint), and vice versa - doable with an ordered set or a priority queue to keep track of right elements of complementary pairs during the sweep. Now perform another left-to-right sweep, keeping track of possible left endpoints that are still "active" (upper bound not yet reached), and each time a right endpoint is encountered, query how many of the left endpoints with the same suffix sum are above the lower bound for the right end-point. Actually getting the count in logarithmic time requires a bit of fancy footwork with data structures that I won't go into (e.g. a Fenwick tree, a segment tree or a balanced binary tree with node sizes in the internal nodes).


I somehow miscounted and didn't realise that this problem even existed until after the contest (maybe because I don't expect more than three harder problems in COCI). It's disappointing, because it only took me about 15 minutes to solve after the contest, yet was worth the most points.

For each K-summary, draw a line between each pair of adjacent segments of that summary. For example, with a 2-summary and 3-summary and N=9, you would get |01|2|3|45|67|8|9|. Clearly in any marked interval with more than one number, none of them can be determined. It is also not hard to see that in any interval with exactly one number, that number can be determined (hint: find the sum of all numbers except that one). Thus, we need the count of numbers X such that X is a multiple of some Ki and X+1 is a multiple of another Kj. Note that if any K=1 then the answer is simply N, which we'll assume is handled as a special case.

If one has chosen a single Ki and Kj, it is not difficult to determine the numbers that satisfy the property, using the Chinese Remainder Theorem (they will be in arithmetic progression, with period Ki×Kj, and so are easy to count). However, this can easily lead to double-counting, and we thus need to apply some form of inclusion-exclusion principle. That is, we can pick any two subsets S and T of the K's, with products P and Q, and count the number of X's such that P divides X and Q divides X + 1. Note that S and T must be disjoint, as otherwise the common element would need to divide X and X+1 (and we assumed Ki > 1). We also need to determine the appropriate coefficient to scale this term in the sum. Rather than trying to derive equations, I let the computer do the work: for each value of |S| and |T| I consider every possible subset of S and of T (actually, just each possible size), add up the coefficients they're already contributing, and set the coefficient for this value of |S| and |T| to ensure that the total is 1.