Friday, July 18, 2014

IOI 2014 day 2 analysis

I found day 2 much harder than day 1, and I still don't know how to solve all the problems (I am seriously impressed by those getting perfect scores). Here's what I've managed to figure out so far.

Update: I've now solved everything (in theory), and the solutions are below. The official solutions are now also available on the IOI website. I'll try coding the solutions at some point if I get time.

Gondola

This was the easiest of the three. Firstly, what makes a valid gondola sequence? In all the subtasks of this problem, there will be two cases. If you see any of the numbers 1 to n, that immediately locks in the phase, and tells you the original gondola for every position. Otherwise, the phase is unknown. So, the constraints are that
• if the phase is known, every gondola up to n must appear in the correct spot if it appears;
• no two gondolas can have the same number.
Now we can consider how to construct a replacement sequence (and also to count them), which also shows that these conditions are sufficient. If the phase is not locked, pick it arbitrarily. Now the "new gondola" column is simply the numbers from n+1 up to the largest gondola, so picking a replacement sequence is equivalent to deciding which gondola replaces each broken gondola. We can assign each gondola greater than n that we can't see to a position (one where the final gondola number is larger), and this will uniquely determine the replacement sequence. We'll call such gondolas hidden.

For the middle set of subtasks, the simplest thing is to assign all hidden gondolas to one position, the one with the highest-numbered gondola in the final state. For counting the number of possible replacement sequences, each hidden gondola can be assigned independently, so we just multiply together the number of options, and also remember to multiply by n if the phase is unknown. In the last subtask there are too many hidden gondolas to deal with one at a time, but they can be handled in batches (those between two visible gondolas), using fast exponentiation.

Friend

This is a weighted maximum independent set problem. On a general graph this is NP-hard, so we will need to exploit the curious way in which the graph is constructed. I haven't figured out how to solve the whole problem, but let's work through the subtasks:
1. This is small enough to use brute force (consider all subsets and check whether they are independent).
2. The graph will be empty, so the sample can consist of everyone.
3. The graph will be complete, so only one person can be picked in a sample. Pick the best one.
4. The graph will be a tree. There is a fairly standard tree DP to handle this case: for every subtree, compute the best answer, either with the root excluded or included. If the root is included, add up the root-excluded answers for every subtree; otherwise add up the best of the two for every subtree. This takes linear time.
5. In this case the graph is bipartite and the vertices are unweighted. This is a standard problem which can be solved by finding the maximum bipartite matching. The relatively simple flow-based algorithm for this is theoretically $$O(n^3)$$, but it is one of those algorithms that tends to run much faster in most cases, so it may well be sufficient here.
The final test-case clearly requires a different approach, since n can be much larger. I only managed to figure this out after getting a big hint from the SA team leader, who had seen the official solution.

We will process the operations in reverse order. For each operation, we will transform the graph into one that omits the new person, but for which the optimal solution has the same score. Let's say that the last operation had A as the host and B as the invitee, and consider the different cases:
• YourFriendsAreMyFriends: this is the simplest: any solution using B can also use A, and vice versa. So we can collapse the two vertices into one whose weight is the sum of the original weights, and use it to replace A.
• WeAreYourFriends: this is almost the same, except now we can use at most one of A and B, and which one we take (if either) has no effect on the rest of the graph. So we can replace A with a single vertex having the larger of the two weights, and delete B.
• IAmYourFriend: this is a bit trickier. Let's start with the assumption that B will form part of the sample, and add that to the output value before deleting it. However, if we later decide to use A, there will be a cost to remove B again; so A's weight decreases by the weight of B. If it ends up with negative weight, we can just clamp it to 0.
Repeat this deletion process until only the original vertex is left; the answer will be the weight of this vertex, plus the saved-up weights from the IAmYourFriend steps.

Holiday

Consider the left-most and right-most cities that Jian-Jia visits. Regardless of where he stops, he will need to travel from the start city to one of the ends, and from there to the other end. There is no point in doing any other back-tracking, so we can tell how many days he spends travelling just from the end-points. This then tells us how many cities he has time to see attractions in, and obviously we will pick the best cities within the range.

That's immediately sufficient to solve the first test case. To solve more, we can consider an incremental approach. Fix one end-point, and gradually extend the other end-point, keeping track of the best cities (and their sum) in a priority queue (with the worst of the best cities at the front). As the range is extended, the number of cities that can be visited shrinks, so items will need to be popped. Of course, the next city in the range needs to be added each time as well. Using a binary heap, this gives an $$O(n^2\log n)$$ algorithm: a factor of n for each endpoint, and the $$\log n$$ for the priority queue operations. That's sufficient for subtask 3.  It's also good enough for subtask 2, because the left endpoint will be city 0, saving a factor of n.

For subtask 4, it is clearly not possible to consider every pair of end-points. Let's try to break things up. Assume (without loss of generality) that we move first left, then back to the start, then right. Let's compute the optimal solution for the left part and the right part separately, then combine them. The catch is that we need to know how we are splitting up our time between the two sides. So we'll need to compute the answer for each side for all possible number of days spent within each side. This seems to leave us no better off, since we're still searching within a two-dimensional space (number of days and endpoint), but it allows us to do some things differently.

We'll just consider the right-hand side. The left-hand side is similar, with minor changes because we need two days for travel (there and back) instead of one. Let f(d) be the optimal end-point if we have d days available. Then with a bit of work one can show that f is non-decreasing (provided one is allowed to pick amongst ties). If we find f(d) for d=1, 2, 3, ... in that order, it doesn't really help: we're only, on average, halving the search space. But we can do better by using a divide-and-conquer approach: if we need to find f for all $$d \in [0, D)$$ then we start with $$d = \frac{D}{2}$$ to subdivide the space, and then recursively process each half of the interval on disjoint subintervals of the cities. This reduces the search space to $$O(n\log n)$$.

This still leaves the problem of efficiently finding the total number of attractions that can be visited for particular intervals and available days. The official solution uses one approach, based on a segment tree over the cities, sorted by number of attractions rather than position. The approach I found is, I think, simpler. Visualise the recursion described above as a tree; instead of working depth-first (i.e., recursively), we work breadth-first. We make $$O(\log n)$$ passes, and in each pass we compute f(d) where d is an odd multiple of $$2^i$$ (with $$i$$ decreasing with each pass). Each pass can be done in a single incremental process, similar to the way we tackled subpass 2. The difference is that each time we cross into the next subinterval, we need to increase $$d$$, and hence bring more cities into consideration. To do this, we need either a second priority queue of excluded cities, or we can replace the priority queue with a balanced binary tree. Within each pass, d can only be incremented $$O(n)$$ times, so the total running time will be $$O(n\log n)$$ per pass, or $$O(n\log n \log n)$$ overall.