Wednesday, May 20, 2015

Analysis of ICPC 2015 world finals problems

Analysis of ICPC 2015 world finals problems

I haven't solved all the problems yet, but here are solutions to those I have solved (I've spent quite a bit more than 5 hours on it though).

A: Amalgamated Artichokes

This is a trivial problem that basically everybody solved - I won't go into it.

B: Asteroids

This looks quite nasty, but it can actually be solved by assembling a number of fairly standard computational geometry tools, and the example input was very helpful. To simplify things, let's switch to the frame of reference of the first asteroid, and assume only the second asteroid moves (call them P and Q for short). The first thing to notice is that the area is a piecewise quadratic function of time, with changes when a vertex of one polygon crosses an edge of the other. This is because the derivative of area depends on the sections of boundary of Q inside P, and those vary linearly in those intervals. Finding the boundaries between intervals is a standard line-line intersection problem. To distinguish between zero and "never", touching is considered to be an intersection too.
We also need a way to compute the area of the intersection. Clipping a convex polygon to a half-space is a standard problem too - vertices are classified as inside or outside, and edges that go from inside to outside or vice versa introduce a new interpolated vertex, so repeated clipping will give the intersection of convex polygons. And finding the area of a polygon is also very standard.
Finally, we need to handle the case where the best area occurs midway through one of the quadratic pieces. Rather than try to figure out the quadratic coefficients geometrically, I just sampled it at three points (start, middle, end) and solved for the coefficients, and hence the maximum, with algebra.

C: Catering

I'm sure I'm seen something similar, but I'm not sure where. It can be set up as a min-cost max-flow problem. Split every site into two, an arrival area and a departure area. Connect the source to each departure area, with cap 1 for clients and cap K for the company. Connect each arrival area to the sink similarly. Connect every departure area to every later arrival area with cap 1, except for the company->company link with cap K. Edge costs match the table of costs where appropriate, zero elsewhere. The maximum flow is now K+N (K teams leave the company, and equipment arrives at and leaves every client), and the minimum cost is the cost of the operation.

D: Cutting cheese

This is basically just a binary search for each cut position, plus some mathematics to give the volume of a sphere that has been cut by a plane.

E: Parallel evolution

Start by sorting the strings by length. A simplistic view is that we need to go through these strings in order, assigning each to one path or the other. This can be done with fairly standard dynamic programming, but it will be far too slow.
Add an "edge" from each string to the following one if it is a subsequence of this following one. This will break the strings up into connected chains. A key observation is that when assigning the strings in a chain, it is never necessary to switch sides more than once: if two elements in a chain are on one path, one can always put all the intervening elements on the same path without invalidating a solution. Thus, one need only consider two cases for a chain: put all elements in one path (the opposite path to the last element of the previous chain), or start the chain on one path then switch to the other path part-way through. In the latter case, one should make the switch as early as possible (given previous chains), to make it easier to place subsequent strings. A useful feature is that in either case, we know the last element in both paths, which is all that matters for placing future chains. Dynamic programming takes care of deciding which option to use. Only a linear number of subsequence queries is required.
I really liked this problem, even though the implementation in messy. It depends only on the compatibility relationship being a partial order and there being a cheap way to find a topological sort.

F: Keyboarding

This can be done with a fairly standard BFS. The graph has a state for each cursor position and number of completed letters. There are up to 5 edges from each state, corresponding to the 5 buttons. I precomputed where each arrow key moves to from each grid position, but I don't know if that is needed. It's also irrelevant that keys form connected regions.

H: Qanat

This is a good one for a mathsy person to work out on paper while someone else is coding - the code itself is trivial. Firstly, the slope being less than one implies that dirt from the channel always goes out one of the two nearest shafts - whichever gets it to the surface quicker (it helps to think of there being a zero-height shaft at x=0). Between each pair of shafts, one can easily find the crossover point.
Consider just three consecutive shafts, and the cost to excavate them and the section of channel between them. If the outer shafts are fixed, the cost is a quadratic in the position of the middle shaft, from which one can derive a formula for its position in terms of its neighbours. This gives a relationship of the form xi+2 - gxi+1 + xi = 0. In theory, this can now be used to iteratively find the positions of all the shafts, up to a scale factor which is solved by the requirement for the final shaft (the mother well) to be at x=W.
I haven't tested it, but I think evaluating the recurrence could lead to catastrophic rounding problems, because errors introduced early on are exponentially scaled up, faster than the sequence itself grows. The alternative I used is to find a closed formula for the recurrence. This is a known result: let a and b be the roots of x2 - gx + 1 = 0; then the ith term is r(ai - bi) for the scale factor r. Finding the formula for g shows that the roots will always be real (and distinct) rather than complex.

I: Ship traffic

This mostly needed some careful implementation. For each ship, there is an interval during which the ferry cannot launch. Sort the start and end times of all these intervals and sweeping through them, keeping track of how many ships will be hit for each interval between events. The longest such interval with zero collisions is the answer.

J: Tile cutting

Firstly, which sizes can be cut, and in how many ways? If the lengths from the corners of the parallelograms to the corners of the tile are a, b, c, d, then the area is ab + cd. So the number of ways to make a size is the number of ways it can be written as ab + cd, for a, b, c, d > 0. The number of ways to write a number as ab can be computed by brute force (iterate over all values of a and b for which ab <= 500000). The number of ways to write a number as ab + cd is the convolution of this function with itself (ala polynomial multiplication). There are well-known algorithms to do this in O(N log N) time (with an FFT) or in O(N1.58) time (divide-and-conquer), and either is fast enough.

K: Tours

I've got a solution that passes, but I don't think I can fully prove why.
I start by running a DFS to decompose the graph into a forest plus edges from a node to an ancestor in the forest. Each such "up" edge corresponds to a simple cycle. Clearly, the number of companies must divide into the length of any cycle, so we can take the GCD of these cycle lengths.
If the cycles were all disjoint, this would (I think) be sufficient, but overlapping cycles are a problem. Suppose two cycles overlap. That means there are two points A and B, with three independent paths between them, say X, Y, Z. If X has more than its share of routes from some company, then Y must have less than its share, to balance the tour X-Y. Similarly, Z must have less than its share. But then the tour Y-Z has too few routes from that company. It follows that X, Y and Z must all have equal numbers of routes from each company, and hence the number of companies must divide into each of their lengths.
And now for the bit I can't prove: it seems to be sufficient to consider only pairs of the simple cycles corresponding to using a single up edge. I just iterate over all pairs and compute the overlap. That is itself not completely trivial, requiring an acceleration structure to compute kth ancestors and least common ancestors.

L: Weather report

This is an interesting variation of a classic result in compression theory, Hamming codes. The standard way to construct an optimal prefix-free code from a frequency table is to keep a priority queue of coding trees, and repeatedly combine the two least frequent items by giving them a common parent. That can't be done directly here because there are trillions of possible strings to code, but it can be done indirectly by noting that permutations have identical frequency, and storing them as a group rather than individual items. The actions of the original algorithm then need to be simulated, pairing up large numbers of identical items in a single operation.

5 comments:

Eduardo Quintana Miranda said...

For problem H-Qanat the form of the cost function can be found:

F=Sum_{i=0}^{n} h[i+1]^2 / 2 + h[i]*(p[i]-x[i])+(p[i]-x[i])^2/2 + h[i+1]*(x[i+1]-p[i])+(x[i+1]-p[i])^2/2

where m=h/w is the slope of the hypotenuse, x[0]=0, x[n+1]=w, and x[i] for 1<=i<=n are the unknown optimal position of the shafts. Also h[i]=m*x[i] is the height of the ith shaft, and p[i] is a point lying between shafts i and i+1, which is at an equal distance from both shaft top points.
p[i]=(1-m)*x[i]/2 + (1+m)*x[i+1]/2

Due to problems special conditions (h<w) it can be easily proven that x[i]<p[i]<x[i+1].

A bit of algebraic work leads to a closed form dependence with x[i] only:

F=Sum_{i=0}^{n} x[i+1]^2 * (m^2+m)/ 2 - x[i]^2 * m/2 + (1-m^2)*(x[i+1]-x[i])^2/4

The Cost function F depends quadratically on x[i]. Then by simple partial derivation we find n linear and tri-diagonal equations to be satisfied by the positions x[i] (1<=i<=n).

x[i+1]+x[i]*a+x[i-1]=0

where a=2/(m^2-1)

These linear equations can be solved in O(n) using gaussian elimination for tri-diagonal systems.

Osman Ay said...

Problem I: Ship traffic

Before start the real implementation, you can get the mirror images of the ships at one side of the ferry line to the other side and then transfer all the ships onto the first lane. So that, all the ships are in one lane and have the same direction. Mirroring and transferring require some basic calculations.

alapan said...

Bruce, could you please link to the questions (or post them if appropriate)?

Caleb said...

In the uva online judge site I believe they modified problem A a bit (in terms of time)
If you look at:
http://icpc.baylor.edu/worldfinals/problems/icpc2015.pdf
You can see the time limit is 5s and that there's only one test each time.
However in the UVA site:
https://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=865&page=show_problem&problem=4782
(If you can't see the problem click on "PDF")
They not only reduced the time limit to 3s but "The input file contains several test cases"
I have coded a linear O(n) solution an got accepted in here:
https://icpc.kattis.com/problems
(since n < 10^6 it should pass the limits restriction, but it doesn't in uva probably for the several test cases.)
So I wanted to know if either there exists a faster solution (which the UVA team may have found) or they are not right by establishing those limits.
Since you didn't post a solution to that problem I'm asking in the comments.
Thanks in advance.

Bruce Merry said...

@alapan: http://icpc.baylor.edu/download/worldfinals/problems/icpc2015.pdf

@Caleb: I'm not aware of a sub-O(n) solution, but I suppose it is possible that one exists. It would need to exploit properties of the generating function in some way.