Friday, July 18, 2014

IOI 2014 day 2 analysis

I found day 2 much harder than day 1, and I still don't know how to solve all the problems (I am seriously impressed by those getting perfect scores). Here's what I've managed to figure out so far.

Update: I've now solved everything (in theory), and the solutions are below. The official solutions are now also available on the IOI website. I'll try coding the solutions at some point if I get time.


This was the easiest of the three. Firstly, what makes a valid gondola sequence? In all the subtasks of this problem, there will be two cases. If you see any of the numbers 1 to n, that immediately locks in the phase, and tells you the original gondola for every position. Otherwise, the phase is unknown. So, the constraints are that
  • if the phase is known, every gondola up to n must appear in the correct spot if it appears;
  • no two gondolas can have the same number.
Now we can consider how to construct a replacement sequence (and also to count them), which also shows that these conditions are sufficient. If the phase is not locked, pick it arbitrarily. Now the "new gondola" column is simply the numbers from n+1 up to the largest gondola, so picking a replacement sequence is equivalent to deciding which gondola replaces each broken gondola. We can assign each gondola greater than n that we can't see to a position (one where the final gondola number is larger), and this will uniquely determine the replacement sequence. We'll call such gondolas hidden.

For the middle set of subtasks, the simplest thing is to assign all hidden gondolas to one position, the one with the highest-numbered gondola in the final state. For counting the number of possible replacement sequences, each hidden gondola can be assigned independently, so we just multiply together the number of options, and also remember to multiply by n if the phase is unknown. In the last subtask there are too many hidden gondolas to deal with one at a time, but they can be handled in batches (those between two visible gondolas), using fast exponentiation.


This is a weighted maximum independent set problem. On a general graph this is NP-hard, so we will need to exploit the curious way in which the graph is constructed. I haven't figured out how to solve the whole problem, but let's work through the subtasks:
  1. This is small enough to use brute force (consider all subsets and check whether they are independent).
  2. The graph will be empty, so the sample can consist of everyone.
  3. The graph will be complete, so only one person can be picked in a sample. Pick the best one.
  4. The graph will be a tree. There is a fairly standard tree DP to handle this case: for every subtree, compute the best answer, either with the root excluded or included. If the root is included, add up the root-excluded answers for every subtree; otherwise add up the best of the two for every subtree. This takes linear time.
  5. In this case the graph is bipartite and the vertices are unweighted. This is a standard problem which can be solved by finding the maximum bipartite matching. The relatively simple flow-based algorithm for this is theoretically \(O(n^3)\), but it is one of those algorithms that tends to run much faster in most cases, so it may well be sufficient here.
The final test-case clearly requires a different approach, since n can be much larger. I only managed to figure this out after getting a big hint from the SA team leader, who had seen the official solution.

We will process the operations in reverse order. For each operation, we will transform the graph into one that omits the new person, but for which the optimal solution has the same score. Let's say that the last operation had A as the host and B as the invitee, and consider the different cases:
  • YourFriendsAreMyFriends: this is the simplest: any solution using B can also use A, and vice versa. So we can collapse the two vertices into one whose weight is the sum of the original weights, and use it to replace A.
  • WeAreYourFriends: this is almost the same, except now we can use at most one of A and B, and which one we take (if either) has no effect on the rest of the graph. So we can replace A with a single vertex having the larger of the two weights, and delete B.
  • IAmYourFriend: this is a bit trickier. Let's start with the assumption that B will form part of the sample, and add that to the output value before deleting it. However, if we later decide to use A, there will be a cost to remove B again; so A's weight decreases by the weight of B. If it ends up with negative weight, we can just clamp it to 0.
Repeat this deletion process until only the original vertex is left; the answer will be the weight of this vertex, plus the saved-up weights from the IAmYourFriend steps.


Consider the left-most and right-most cities that Jian-Jia visits. Regardless of where he stops, he will need to travel from the start city to one of the ends, and from there to the other end. There is no point in doing any other back-tracking, so we can tell how many days he spends travelling just from the end-points. This then tells us how many cities he has time to see attractions in, and obviously we will pick the best cities within the range.

That's immediately sufficient to solve the first test case. To solve more, we can consider an incremental approach. Fix one end-point, and gradually extend the other end-point, keeping track of the best cities (and their sum) in a priority queue (with the worst of the best cities at the front). As the range is extended, the number of cities that can be visited shrinks, so items will need to be popped. Of course, the next city in the range needs to be added each time as well. Using a binary heap, this gives an \(O(n^2\log n)\) algorithm: a factor of n for each endpoint, and the \(\log n\) for the priority queue operations. That's sufficient for subtask 3.  It's also good enough for subtask 2, because the left endpoint will be city 0, saving a factor of n.

For subtask 4, it is clearly not possible to consider every pair of end-points. Let's try to break things up. Assume (without loss of generality) that we move first left, then back to the start, then right. Let's compute the optimal solution for the left part and the right part separately, then combine them. The catch is that we need to know how we are splitting up our time between the two sides. So we'll need to compute the answer for each side for all possible number of days spent within each side. This seems to leave us no better off, since we're still searching within a two-dimensional space (number of days and endpoint), but it allows us to do some things differently.

We'll just consider the right-hand side. The left-hand side is similar, with minor changes because we need two days for travel (there and back) instead of one. Let f(d) be the optimal end-point if we have d days available. Then with a bit of work one can show that f is non-decreasing (provided one is allowed to pick amongst ties). If we find f(d) for d=1, 2, 3, ... in that order, it doesn't really help: we're only, on average, halving the search space. But we can do better by using a divide-and-conquer approach: if we need to find f for all \(d \in [0, D)\) then we start with \(d = \frac{D}{2}\) to subdivide the space, and then recursively process each half of the interval on disjoint subintervals of the cities. This reduces the search space to \(O(n\log n)\).

This still leaves the problem of efficiently finding the total number of attractions that can be visited for particular intervals and available days. The official solution uses one approach, based on a segment tree over the cities, sorted by number of attractions rather than position. The approach I found is, I think, simpler. Visualise the recursion described above as a tree; instead of working depth-first (i.e., recursively), we work breadth-first. We make \(O(\log n)\) passes, and in each pass we compute f(d) where d is an odd multiple of \(2^i\) (with \(i\) decreasing with each pass). Each pass can be done in a single incremental process, similar to the way we tackled subpass 2. The difference is that each time we cross into the next subinterval, we need to increase \(d\), and hence bring more cities into consideration. To do this, we need either a second priority queue of excluded cities, or we can replace the priority queue with a balanced binary tree. Within each pass, d can only be incremented \(O(n)\) times, so the total running time will be \(O(n\log n)\) per pass, or \(O(n\log n \log n)\) overall.

Wednesday, July 16, 2014

IOI 2014 day 1 analysis

IOI 2014 day 1

Since there is no online judge, I haven't tried actually coding any of these. So these ideas are not validated yet. You can find the problems here.


I found this the most difficult of the three to figure out, although coding it will not be particularly challenging.

Firstly, we can note that distances are symmetric: a route from A to B can be reflected in the two tracks to give a route from B to A. So having only \(\frac{n(n-1)}{2}\) queries is not a limitation, as we can query all distances. This might be useful in tackling the first three subtasks, but I'll go directly to the hardest subtask.

If we know the position and type of a station, there is one other that we can immediately locate: the closest one. It must have the opposite type and be reached by a direct route. Let station X be the closest to station 0. The other stations can be split into three groups:
  1. d(X, Y) < d(X, 0): these are reached directly from station X and of type C, so we can locate them exactly.
  2. d(0, X) + d(X, Y) = d(0, Y), but not of type 1: these are reached from station 0 via station X, so they lie to the left of station 0.
  3. All other stations lie to the right of station X.
Let's now consider just the stations to the right of X, and see how to place them. Let's take them in increasing distance from 0. This ensures that we encounter all the type D stations in order, and any type C station will be encountered at some point after the type D station used to reach it. Suppose Y is the right-most type D station already encountered, and consider the distances for a new station Z. Let \(z = d(0, Z) - d(0, Y) - d(Y, Z)\). If Z is type C, then there must be a type D at distance \(\frac{z}{2}\) to the left of Y. On the other hand, if Z is of type D (and lies to the right of Y), then there must be a type C station at distance \(\frac{z}{2}\) to the left of Y. In the first case, we will already have encountered the station, so we can always distinguish the two cases, and hence determine the position and type of Z.
The stations to the left of station zero can be handled similarly, using station X as the reference point instead of station 0.
How many queries is this? Every station Z except 0 and X accounts for at most three queries: d(0, Z), d(X, Z) and d(Y, Z), where Y can be different for each Z. This gives \(3(n-2) + 1\), which I think can be improved to \(3(n-2)\) just by counting more carefully. Either way, it is sufficient to solve all the subtasks.


This is a fairly standard interval tree structure problem, similar to Mountain from IOI 2005 (but a little easier). Each node of the tree contains a range to which its children are clamped. To determine the value of any element of the wall, start at the leaf with a value of 0 and read up the tree, clamping the value to the range in each node in turn. Initially, each node has the range [0, inf). When applying a new instruction, it is done top-down, and clamps are pushed down the tree whenever recursion is necessary.

An interesting aspect of the problem is that it is offline, in that only the final configuration is requested and all the information is provided up-front. This makes me think that there may be an alternative solution that processes the data in a different order, but I can't immediately see a nicer solution than the one above.


I liked this problem, partly because I could reverse-engineer a solution from the assumption that it is always possible to win, and partly because it requires neither much algorithm/data-structure training (like Wall) nor tricky consideration of cases (like Rails). Suppose Mei-Yu knows that certain cities are connected. If there are any flights between the cities that she has not asked about, then she can win simply by saving one of these flights for last, since it will not affect whether the country is connected. It follows that for Jian-Jia to win, he must always answer no when asked about a flight between two components that Mei-Yu does not know to be connected, unless this is the last flight between these components?

What if he always answers yes to the last flight between two components? In this case he will win. As long as there are at least two components left, there are uncertain edges between every pair of them, so Mei-Yu can't know whether any of them is connected any other. All edges within a component are known, so the number of components can only become one after the last question.

What about complexity? We need to keep track of the number of edges between each pair of components, which takes \(O(N^2)\) space. Most operations will just decrement one of these counts. There will be \(N - 1\) component-merging operations, each of which requires a linear-time merging of these edge counts and updating a vertex-to-component table. Thus, the whole algorithm requires \(O(N^2)\) time. This is optimal given that Mei-Yu will ask \(O(N^2)\) questions.