Monday, June 30, 2014

ICPC Problem H: Pachinko

Problem A has been covered quite well elsewhere, so I won't discuss it. That leaves only problem H. I started by reading the Google translation of a Polish writeup by gawry. The automatic translation wasn't very good, but it gave me one or two ideas I borrowed. I don't know how my solution compares in length of code, but I consider it much simpler conceptually.

This is a fairly typical Markov process, where a system is in some state, and each timestep it randomly selects a new state as a function of the current state. One variation is that the process stops once the ball reaches a target, whereas a Markov process doesn't terminate. I initially planned to model that as the ball always moving from a target back to itself, but that would have made things slightly complicated.

Gawry has a nice way of making this explicitly a matrix problem. Set up the matrix M as for a Markov process i.e., \(M_{i,j}\) is the probability of a transition from state j to state i. However, for a target state j, we set \(M_{i,j}=0\) for all i. Now if \(b\) is our initial probability vector (equal probability for each empty spot in the first row), then \(M^t b\) represents the probability of the ball being in each position (and the game not having previously finished) after \(t\) timesteps. We can then say that the expected amount of time the ball spends in each position is given by \(\sum_{t=0}^{\infty} M^t b\). The sum of the elements in this vector is the expected length of a game and we're told that it is less than \(10^9\), so we don't need to worry about convergence. However, that doesn't mean that the matrix series itself converges: Gawry points out that if there are unreachable parts of the game with no targets, then the series won't converge. We fix that by doing an initial flood-fill to find all reachable cells and only use those in the matrix. Gawry then shows that under the right conditions, the series converges to \((I - M)^{-1} b\).

This is where my solution differs. Gawry dismisses Gaussian elimination because the matrix can be up to 200,000 × 200,000. However, this misses the fact that the matrix is banded: by numbering cells left-to-right then top-to-bottom, we ensure that every non-zero entry is at most W columns away from the main diagonal. Gaussian elimination (without pivoting) preserves this property. We can exploit this both to store the matrix compactly and to perform Gaussian elimination in \(O(W^3H)\) time.
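To make that concrete, here's a rough sketch of the kind of banded solver this allows (not my actual submission; the storage layout and names are illustrative). It assumes \(A = I - M\) is stored so that band[i][k] holds the entry in row i and column i + k − W, with everything outside the band equal to zero:

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Solves A x = b without pivoting, overwriting band and b.
// band[i][k] holds A[i][i + k - W] for 0 <= k <= 2W.
vector<double> solveBanded(vector<vector<double>> &band, vector<double> &b, int W) {
    int n = b.size();
    // Forward elimination: all fill-in stays inside the band.
    for (int i = 0; i < n; i++) {
        for (int r = i + 1; r <= min(n - 1, i + W); r++) {
            double factor = band[r][i - r + W] / band[i][W];
            if (factor == 0.0) continue;
            for (int c = i; c <= min(n - 1, i + W); c++)
                band[r][c - r + W] -= factor * band[i][c - i + W];
            b[r] -= factor * b[i];
        }
    }
    // Back substitution, again touching only entries inside the band.
    vector<double> x(n);
    for (int i = n - 1; i >= 0; i--) {
        double sum = b[i];
        for (int c = i + 1; c <= min(n - 1, i + W); c++)
            sum -= band[i][c - i + W] * x[c];
        x[i] = sum / band[i][W];
    }
    return x;
}
```

Each pivot row does \(O(W^2)\) work, and with \(N = WH\) rows that is the \(O(W^3H)\) bound above.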

One concern is the "without pivoting" caveat. I was slightly surprised that my first submission passed. I think it is possible to prove correctness, however. Gaussian elimination without pivoting is known (and easily provable) to work on strictly column diagonally dominant matrices. In our case the diagonal dominance is weak: columns corresponding to empty cells have a sum of zero, those corresponding to targets have a 1 on the diagonal and zeros elsewhere. However, the matrix is also irreducible, which I think is enough to guarantee that there won't be any division by zero.

EDIT: actually it's not guaranteed to be irreducible, because the probabilities can be zero and hence it's possible to get from A to B without being able to get from B to A. But I suspect that it's enough that one can reach a target from every state.

Saturday, June 28, 2014

ICPC Problem L: Wires

While this problem wasn't too conceptually difficult, it requires a lot of code (my solution is about 400 lines), and careful implementation of a number of geometric algorithms. A good chunk of the code comes from implementing a rational number class in order to precisely represent the intersection points of the wires. It is also very easy to suffer from overflow: I spent a long time banging my head against an assertion failure on the server until I upgraded my rational number class to use 128-bit integers everywhere, instead of just for comparisons.
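For what it's worth, the critical part of that upgrade is just doing every comparison in 128 bits. A minimal sketch of the idea (my real class also implements arithmetic and normalisation, which is where the intermediate values get large):

```cpp
#include <cstdint>

struct Rational {
    int64_t num, den;                   // invariant: den > 0
    Rational(int64_t n = 0, int64_t d = 1) : num(n), den(d) {
        if (den < 0) { num = -num; den = -den; }
    }
    bool operator<(const Rational &other) const {
        // Cross-multiplying two 64-bit values cannot overflow 128 bits.
        return (__int128)num * other.den < (__int128)other.num * den;
    }
};
```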

The wires will divide the plane up into connected regions. The regions can be represented as the nodes of a planar graph, with an edge between two regions whenever they share a boundary segment. The problem is then to find the shortest path between the regions containing the two new end-points.

My solution works in a number of steps:
  1. Find all the intersection points between wires, and the segments between intersection points. This just tests every wire against every other wire. The case of two parallel wires sharing an endpoint needs to be handled carefully. For each line, I sort the intersection points along the line. I used a dot product for this, which is where my rational number class overflowed; it would probably have been safer to just sort lexicographically. More than two lines can meet at an intersection point, so I used a std::map to assign a unique ID to each intersection point (I'll call them vertices from here on).
  2. Once the intersection points along a line have been sorted, one can identify the segments connecting them. I create two copies of each segment, one in each direction. With each vertex A I store a list of all segments A->B. Each pair is stored contiguously so that it is trivial to find its partner. Each segment is considered to belong to the region to its left as one travels A->B.
  3. The segments emanating from each vertex are sorted by angle. These comparisons could easily cause overflows again, but one can use a handy trick: instead of using the vector for the segment in an angle comparison, one can use the vector for the entire wire. It has identical direction but small integer coordinates (a comparator along these lines is sketched just after this list).
  4. Using the sorted lists from the previous step, each segment is given a pointer to its following segment from the same region. In other words, if one is tracing the boundary of the region and one has just traced A->B, the pointer will point to B->C.
  5. I extract the contours of the regions. A region typically consists of an outer contour and optionally some holes. The outermost region lacks an outer contour (one could add a big square if one needed to, but I didn't). A contour is found by following the next pointers. A case that turns out to be inconvenient later is that some segments might be part of the contour but not enclose any area. This can make a contour disappear completely, in which case it is discarded. Any remaining contours have the property that two adjacent segments are not dual to each other, although it is still possible for both sides of an edge to belong to the same contour.
  6. Each contour is identified as an outer contour or a hole. With integer coordinates I could just measure the signed area of the polygon, but that gets nasty with rational coordinates. Instead, I pick the lexicographically smallest vertex in the contour and examine the angle between the two incident segments (this is why it is important that there is a non-trivial angle between them). I also sort the contours by this lexicographically smallest vertex, which causes any contour to sort before any other contours it encloses.
  7. For each segment I add an edge of weight 1 from its containing region to the containing region of its dual.
  8. For each hole, I search backwards through the other contours to find the smallest non-hole that contains it. I take one vertex of the hole and do a point-in-polygon test. Once again, some care is needed to avoid overflows, and using the vectors for the original wires proves useful. One could then associate the outer contour and the holes it contains into a single region object, but instead I just added an edge to the graph to join them with weight 0. In other words, one can travel from the boundary of a region to the outside of a hole at no cost.
  9. Finally, I identify the regions containing the endpoints of the new wire, using the same search as in the previous step.
After all this, we still need to implement a shortest path search - but by this point that seems almost trivial in comparison.
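Going back to step 3, here is roughly what that angle comparator looks like. It assumes each outgoing segment stores the direction vector (dx, dy) of its whole wire, so the cross products below fit comfortably in 64 bits; the names are illustrative rather than taken from my code:

```cpp
#include <cstdint>

struct Dir { int64_t dx, dy; };

// True if the direction lies in the half-plane [0, pi), i.e. dy > 0,
// or dy == 0 with the vector pointing along the positive x axis.
static bool upperHalf(const Dir &d) {
    return d.dy > 0 || (d.dy == 0 && d.dx > 0);
}

// Orders non-zero directions by angle in [0, 2*pi).
bool angleLess(const Dir &a, const Dir &b) {
    bool ua = upperHalf(a), ub = upperHalf(b);
    if (ua != ub) return ua;                  // the upper half-plane sorts first
    return a.dx * b.dy - a.dy * b.dx > 0;     // positive cross product: a comes first
}
```

Splitting into half-planes first and only then using a cross product gives a strict ordering over the full turn without ever computing an actual angle.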

What is the complexity? There can be \(O(M^2)\) intersections and hence also \(O(M^2)\) contours, but only \(O(M)\) of them can be holes (because two holes cannot be part of the same connected component). The slowest part is the fairly straightforward point-in-polygon test which tests each hole against each non-hole segment, giving \(O(M^3)\) time. There are faster algorithms for doing point location queries, so it is probably theoretically possible to reduce this to \(O(M^2\log M)\) or even \(O(M^2)\), but that is certainly not necessary for this problem.

ICPC Problem J: Skiing

I'll start by mentioning that there is also some analysis discussion on the Topcoder forums, which includes alternative solutions that in some cases I think are much nicer than mine.

So, on to problem J - which no team solved, and only one team attempted. I'm a little surprised by this, since it's the sort of problem where one can do most of the work on paper while someone else is using the machine. A possible reason is that it's difficult to determine the runtime: I just coded it up and hoped for the best.

It is a slightly annoying problem for a few reasons. Firstly, there are a few special cases, because \(v_y\) or \(a\) can be zero. We also have to find the lexicographically smallest answer.

Let's deal with those special cases first. If \(v_y = 0\) then we can only reach targets with \(y = 0\). If \(a \not= 0\) then we can reach all of them in any order, otherwise we can only reach those with \(x = 0\). To get the lexicographically smallest answer, just go through them in order and print out those that are reachable. I actually dealt with the \(v_y \not= 0\), \(a = 0\) case as part of my general solution, but it is also easy enough to deal with separately. We can only reach targets with \(x = 0\), and we can only reach them in increasing order of y. One catch is that targets are not guaranteed to have distinct coordinates, so if two targets have the same coordinates we must be careful to visit them in lexicographical order. I handled this just by using a stable sort.

So, onwards to the general case. Before we can start doing any high-level optimisation, we need to start by answering this question: if we are at position P with X velocity \(v\) and want to pass through position Q, what possible velocities can we arrive with? I won't prove it formally, but it's not hard to believe that to arrive with the smallest velocity, you should start by accelerating (at the maximum acceleration) for some time, then decelerate for the rest of the time. Let's say that the total time between P and Q is \(T\), the X separation is \(x\), the initial acceleration time is \(T - t\) and the final deceleration time is \(t\). Some basic integration then gives
\[x = v(T-t)+\tfrac{1}{2}a(T-t)^2 + (v+a(T-t))t - \tfrac{1}{2}at^2.\]
This gives a quadratic in \(t\) where the linear terms conveniently cancel, leaving
\[t = \sqrt{\frac{vT+\tfrac{1}{2}aT^2-x}{a}}\]
The final velocity is just \(v+aT-2at\). We can also find the largest possible arrival velocity simply by replacing \(a\) with \(-a\). Let's call these min and max velocity functions \(V_l(v, T, x)\) and \(V_h(v, T, x)\).

More generally, if we can leave P with a given range of velocities \([v_0, v_1]\), with what velocities can we arrive at Q? We first need to clamp the start range to the range from which we can actually reach Q i.e., that satisfy \(|vT - x| \le \frac{1}{2}aT^2\). A bit of calculus shows that \(V_l\) and \(V_h\) are decreasing functions of \(v\), so the arrival range will be \([V_l(v_1), V_h(v_0)]\).
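In code, these low-level primitives might look something like the sketch below (the names are my own for illustration, not lifted from my solution). velLow and velHigh implement \(V_l\) and \(V_h\), and propagate clamps a departure interval to the reachable range before applying them:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
using namespace std;

// Smallest possible arrival X velocity at Q, leaving P with X velocity v,
// travel time T > 0, X separation x and maximum acceleration a > 0.
// Assumes Q is actually reachable, i.e. |v*T - x| <= a*T*T/2.
double velLow(double v, double T, double x, double a) {
    double t = sqrt((v * T + 0.5 * a * T * T - x) / a);   // time spent decelerating
    return v + a * T - 2 * a * t;
}

// Largest possible arrival velocity: the same formula with a negated,
// obtained here by mirroring the x axis.
double velHigh(double v, double T, double x, double a) {
    return -velLow(-v, T, -x, a);
}

// Given departure velocities [lo, hi] at P, return the arrival interval at Q,
// or an empty interval (first > second) if Q cannot be reached at all.
pair<double, double> propagate(double lo, double hi, double T, double x, double a) {
    // Keep only departure velocities that can reach Q: |v*T - x| <= a*T*T/2.
    lo = max(lo, (x - 0.5 * a * T * T) / T);
    hi = min(hi, (x + 0.5 * a * T * T) / T);
    if (lo > hi) return make_pair(1.0, 0.0);
    // V_l and V_h are decreasing in v, so the interval endpoints swap over.
    return make_pair(velLow(hi, T, x, a), velHigh(lo, T, x, a));
}
```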

Finally, we are ready to tackle the problem as a whole, with multiple targets. We will use dynamic programming to answer this question, starting from the last target (highest Y): if I start at target i and want to hit a total of j targets, what are the valid X velocities at i? The answer will in general be a set of intervals. We will make a pseudo-target at (0, 0), and the answer will then be the largest j for which 0.0 is a valid velocity at this target.

Computing the DP is generally straightforward, except that the low-level queries we need are the reverse of those we discussed above i.e. knowing the arrival velocities we need to compute departure velocities. No problem, this sort of physics is time-reversible, so we just need to be careful about which signs get flipped. For each (i, j) we consider all options for the next target, back-propagate the set of intervals from that next target, and merge them into the answer for (i, j). Of course, for \(j = 1\) we use the interval \((-\infty, \infty)\).

The final inconvenience is that we must produce a lexicographical minimum output. Now we will see the advantage of doing the DP in the direction we chose. We build a sequence forwards, keeping track of the velocity interval for our current position. Initially the position will be (0, 0) and the velocity interval will be [0.0, 0.0]. To test whether a next position is valid, we forward-propagate the velocity interval to this next position, and check whether it has non-empty intersection with the set of velocities that would allow us to hit the required number of remaining targets. We then just take the next valid position with the lowest ID.

What is the complexity? It could in theory be exponential, because every path through the targets could induce a separate interval in the interval set. However, in reality the intervals merge a lot, and I wouldn't be surprised if there is some way to prove that there can only be a polynomial number of intervals to consider. My solution still ran pretty close to the time limit, though.

Friday, June 27, 2014

More ICPC analysis

Now we're getting on to the harder problems. Today I'll cover two that I didn't know how to solve. Some of the others I have ideas for, but I don't want to say anything until I've had a chance to code them up.

Firstly, problem I. This is a maximum clique problem, which on a general graph is NP-complete. So we will need to use the geometry in some way. Misof has a nice set of slides showing how it is done: http://people.ksp.sk/~misof/share/wf_pres_I.pdf. I had the idea for the first step (picking the two points that are furthest apart), but it didn't occur to me that the resulting conflict graph would be bipartite.

Now problem G. I discovered that this is a problem that has had a number of research papers published on the topic, one of which achieves \(O(N^3)\). Fortunately, we don't need to be quite that efficient. Let's start by finding a polynomial-time solution. Let's suppose we've already decided the diameters of the two clusters, D(A) and D(B), and just want to find out whether this is actually possible. For each shipment we have a boolean variable that says whether it goes into part A (false) or part B (true). The constraints become boolean expressions: if d(i, j) > D(A) then we must have i || j, and if d(i, j) > D(B) then we must have !i || !j. Determining whether the variables can be satisfied is just 2-SAT, which can be solved in \(O(N^2)\) time.
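As a sketch, the feasibility test for a fixed pair of candidate diameters might look like the following, with variable i being true meaning that shipment i goes into cluster B. The 2-SAT solver here is a generic Kosaraju-based one written for illustration, not anything from my or the judges' code:

```cpp
#include <functional>
#include <vector>
using namespace std;

struct TwoSat {
    int n;
    vector<vector<int>> adj, radj;        // implication graph and its reverse
    TwoSat(int n) : n(n), adj(2 * n), radj(2 * n) {}
    // Adds the clause (x_i == ti) OR (x_j == tj).
    // Node 2*k is the literal "x_k is false", node 2*k+1 is "x_k is true".
    void addClause(int i, bool ti, int j, bool tj) {
        int a = 2 * i + ti, b = 2 * j + tj;
        adj[a ^ 1].push_back(b); radj[b].push_back(a ^ 1);   // !a implies b
        adj[b ^ 1].push_back(a); radj[a].push_back(b ^ 1);   // !b implies a
    }
    bool solve() {
        vector<int> order, comp(2 * n, -1);
        vector<char> seen(2 * n, 0);
        function<void(int)> dfs1 = [&](int u) {
            seen[u] = 1;
            for (int v : adj[u]) if (!seen[v]) dfs1(v);
            order.push_back(u);
        };
        function<void(int, int)> dfs2 = [&](int u, int c) {
            comp[u] = c;
            for (int v : radj[u]) if (comp[v] < 0) dfs2(v, c);
        };
        for (int u = 0; u < 2 * n; u++) if (!seen[u]) dfs1(u);
        int c = 0;
        for (int k = 2 * n - 1; k >= 0; k--)
            if (comp[order[k]] < 0) dfs2(order[k], c++);
        // Satisfiable iff no variable shares a component with its negation.
        for (int i = 0; i < n; i++) if (comp[2 * i] == comp[2 * i + 1]) return false;
        return true;
    }
};

// Can the shipments be split into clusters with diameters at most DA and DB?
bool feasible(const vector<vector<long long>> &d, long long DA, long long DB) {
    int n = d.size();
    TwoSat sat(n);
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) {
            if (d[i][j] > DA) sat.addClause(i, true, j, true);    // not both in A
            if (d[i][j] > DB) sat.addClause(i, false, j, false);  // not both in B
        }
    return sat.solve();
}
```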

Now, how do we decide which diameters to test? There are \(O(N^2)\) choices for each, so the naive approach will take \(O(N^6)\) time. We can reduce this to \(O(N^4\log N)\) by noting that once we've chosen one, we can binary search the other (it's also possible to eliminate the log factor, but it's still too slow). So far, this is what I deduced during the contest.

The trick (which I found in one of those research papers) is that one can eliminate most candidates for the larger diameter. If there is an odd-length cycle, at least one of the edges must be within a cluster, and so that is a lower bound for the larger diameter. What's more, we can ignore the shortest edge of an even cycle (with some tie-breaker), because if the shortest edge lies within a cluster, then so must at least one other edge.

We can exploit this to generate an O(N)-sized set of candidates: process the edges from longest to shortest, adding each to a new bipartite graph (as for constructing a maximum spanning tree with Kruskal's algorithm). There are three cases:
  1. The next edge connects two previously disconnected components. Add the edge to the graph (which will remain bipartite, since one can always re-colour one of the components). This edge length is a candidate diameter.
  2. The next edge connects two vertices in the same component, but the graph remains bipartite. This edge is thus part of an even cycle, and so can be ignored.
  3. The next edge connects two vertices, forming an odd cycle. This edge length is a candidate diameter, and the algorithm terminates.
The edges selected in step 1 form a tree, so there are only O(N) of them.
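Here's a sketch of that candidate-generation pass, using a union-find structure augmented with a parity bit so that odd cycles are detected when they first appear (the edges are assumed to be pre-sorted from longest to shortest, and the names are made up for illustration):

```cpp
#include <numeric>
#include <utility>
#include <vector>
using namespace std;

struct Edge { int u, v; long long len; };

struct DSU {
    vector<int> parent, parity;               // parity is the colour relative to the parent
    DSU(int n) : parent(n), parity(n, 0) { iota(parent.begin(), parent.end(), 0); }
    pair<int, int> find(int x) {              // returns (root, colour relative to the root)
        if (parent[x] == x) return make_pair(x, 0);
        pair<int, int> r = find(parent[x]);
        parent[x] = r.first;
        parity[x] ^= r.second;
        return make_pair(parent[x], parity[x]);
    }
    void unite(int u, int v) {                // join so that u and v get opposite colours
        pair<int, int> a = find(u), b = find(v);
        parent[a.first] = b.first;
        parity[a.first] = a.second ^ b.second ^ 1;
    }
};

// Candidate values for the larger diameter, from the longest edge downwards.
vector<long long> candidates(int n, const vector<Edge> &edges /* sorted descending */) {
    vector<long long> cand;
    DSU dsu(n);
    for (size_t k = 0; k < edges.size(); k++) {
        pair<int, int> a = dsu.find(edges[k].u), b = dsu.find(edges[k].v);
        if (a.first != b.first) {             // case 1: joins two components
            dsu.unite(edges[k].u, edges[k].v);
            cand.push_back(edges[k].len);
        } else if (a.second == b.second) {    // case 3: closes an odd cycle
            cand.push_back(edges[k].len);
            break;
        }                                     // case 2: even cycle, ignore
    }
    return cand;
}
```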

Wednesday, June 25, 2014

ACM ICPC 2014

ACM ICPC 2014 is over. The contest was incredibly tough: solving less than half the problems was enough to get a medal, and 20 teams didn't solve any of the problems (unfortunately including the UCT team, who got bogged down in problem K).

I managed to solve 5 problems during the contest (in 4:30, since I didn't wake up early enough to be in at the start), and I've solved one more since then. Here are my solutions, in approximately easy-to-hard order.

EDIT: note that I've made follow-up blog posts with more analysis.

D: game strategy

This was definitely the easiest one, and this is reflected in the scores. We can use DP to determine the set of states from which Alice can force a win within i moves. Let's call this w[i]. Of course, w[0] = {target}. To compute w[i+1], consider each state s that is not already a winning state. If one of Alice's options is the set S and \(S \subseteq w[i]\), then Alice can win from this state in i+1 moves.

We can repeat this N times (since no optimal winning sequence can have more than N steps), or just until we don't find any new winning states. We don't actually need to maintain all the values of w, just the latest one.
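A sketch of that iteration (the state/move representation here is my own invention for illustration): moves[s] lists Alice's options from state s, each option being the non-empty set of states the game could end up in next.

```cpp
#include <vector>
using namespace std;

// depth[s] = number of moves within which Alice can force a win from s,
// or INF if she cannot (within n moves, which is all that matters).
vector<int> winDepth(int n, int target, const vector<vector<vector<int>>> &moves) {
    const int INF = 1 << 30;
    vector<int> depth(n, INF);
    depth[target] = 0;                        // w[0] = {target}
    for (int round = 1; round <= n; round++) {
        bool changed = false;
        for (int s = 0; s < n; s++) {
            if (depth[s] != INF) continue;
            for (const vector<int> &option : moves[s]) {
                bool allWinning = !option.empty();
                for (int nxt : option)
                    if (depth[nxt] >= round) { allWinning = false; break; }
                // Every state in this option is already winning within round-1
                // moves, so s is winning within round moves.
                if (allWinning) { depth[s] = round; changed = true; break; }
            }
        }
        if (!changed) break;                  // no new winning states: we are done
    }
    return depth;
}
```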

K: surveillance

Let's first construct a slower polynomial-time algorithm. Firstly, for each i, we can compute the longest contiguous sequence of walls we can cover starting at i. This can be done in a single pass. We sort all the possible cameras by their starting wall, and as we sweep through the walls we keep track of the largest-numbered ending wall amongst cameras whose starting wall we have passed. Cameras with \(a > b\) are a bit tricky: we can handle them by treating them as two cameras \([a - N, b]\) and \([a, b + N]\).

Let's suppose that we know we will use a camera starting at wall i. Then we can use this jump table to determine the minimum number of cameras: cover as much wall starting from i as possible with one camera, then again starting from the next empty slot, and so on until we've wrapped all the way around and covered at least N walls. We don't know the initial i, but we can try all values of i. Unfortunately, this requires \(O(N^2)\) time.

To speed things up, we can compute accelerated versions of the jump table. If \(J_c[i]\) is the number of walls we can cover starting from i by using c cameras, then we already have \(J_1\), and we can easily calculate \(J_{a+b}\) given \(J_a\) and \(J_b\). In particular, we can compute \(J_2, J_4, J_8\) and so on, and then combine these powers of two to make any other number. This can then be used in a binary search to find the smallest c such that \(J_c\) contains a value that is at least N. The whole algorithm thus requires \(O(N\log N)\) time.

One catch that broke my initial submission is that if the jump table entries are computed naively, they can become much bigger than N. This isn't a problem, until they overflow and become negative. Clamping them to N fixed the problem.
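The combining step itself is tiny; here's a sketch that also includes the clamping just described (I'm glossing over the exact indexing used to deal with the circular wall, so treat the layout as an assumption):

```cpp
#include <algorithm>
#include <vector>
using namespace std;

// Builds J_{a+b} from J_a and J_b: cover ja[i] walls with the first a cameras,
// then continue from the first uncovered wall with the other b cameras.
vector<int> combine(const vector<int> &ja, const vector<int> &jb, int N) {
    int n = ja.size();
    vector<int> out(n);
    for (int i = 0; i < n; i++) {
        int cover = ja[i];
        int next = i + ja[i];                 // first wall not yet covered
        if (next < n) cover += jb[next];
        out[i] = min(cover, N);               // clamp so the entries never overflow
    }
    return out;
}
```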

C: crane balancing

This is a reasonably straightforward geometry problem, requiring some basic mechanics. Let the mass of the crane be \(q\), let the integral of the x position over the crane be \(p\), let the base be the range \([x_0, x_1]\), and let the x position of the weight be \(x\). The centre of mass of the crane is \(\frac p q\), but with a weight of mass \(m\), it will be \(\frac{p+mx}{q+m}\). The crane is stable provided that this value lies in the interval \([x_0, x_1]\). This turns into two linear inequalities in \(m\). Some care is then needed to deal with the different cases of the coefficient being positive, negative or zero.
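Spelled out, and using \(q + m > 0\) (both are positive), the stability condition becomes
\[
x_0 \le \frac{p + mx}{q + m} \le x_1
\quad\Longleftrightarrow\quad
m(x_0 - x) \le p - qx_0 \;\text{ and }\; m(x - x_1) \le qx_1 - p.
\]
Each inequality has the form \(cm \le k\), which is exactly where those sign cases on the coefficient come from.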

Computing the area and the integral of the x value is also reasonably straightforward: triangulate the crane by considering triangles with vertex indices \((0, i, i+1)\), computing each one's area (from the cross product) and centre of mass (the average of the vertices), and then multiplying the area by the x component of the centre of mass to get the integral.

I prefer to avoid floating point issues whenever possible, so I multiplied all the X coordinates by 6 up front and worked entirely in integers. This does also cause the masses to be 6 times larger, so they have to be adjusted to compensate.
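A sketch of that accumulation is below. To stay in integers it returns \(2q\) and \(6p\) in the notation above (my actual scaling was a little different, but it is the same trick); the caller then has to carry those factors through the inequalities:

```cpp
#include <utility>
#include <vector>
using namespace std;
typedef long long ll;

// Returns (2 * area, 6 * integral of x) over the crane polygon,
// fanning triangles out from vertex 0.
pair<ll, ll> craneMoments(const vector<pair<ll, ll>> &pts) {
    ll area2 = 0, xint6 = 0;
    for (size_t i = 1; i + 1 < pts.size(); i++) {
        ll ax = pts[i].first - pts[0].first,     ay = pts[i].second - pts[0].second;
        ll bx = pts[i + 1].first - pts[0].first, by = pts[i + 1].second - pts[0].second;
        ll cross = ax * by - ay * bx;            // twice the signed triangle area
        area2 += cross;
        // Integral of x over the triangle = area * centroid_x
        //                                 = (cross / 2) * (x0 + xi + xi1) / 3.
        xint6 += cross * (pts[0].first + pts[i].first + pts[i + 1].first);
    }
    return make_pair(area2, xint6);
}
```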

E: maze reduction

This is effectively the same problem I set in Topcoder SRM 378 (thanks to ffao on Topcoder for reminding me where I'd seen this). If you have an account on Topcoder you can read about an alternative solution in the match analysis, in which it is easier to compute worst-case bounds.

The approach I used in this contest is similar in concept to D, in that you incrementally update a hypothesis about which things are distinguishable. We will assign a label to every door of each chamber. Two doors are given the same label if we do not (yet) know of a way to distinguish them. Initially, we just label each door with the degree of the room it is in, since there is nothing else we can tell by taking zero steps.
Now we can add one inference step. Going clockwise from a given door, we can explore all the corridors leading from the current chamber and note down the labels on the matching doors at the opposite end of each corridor. This forms a signature for the door. Having done this for each door, we can now assign new labels, where each unique signature is assigned a corresponding label. We can repeat this process until we achieve stability, i.e., the number of labels does not change. We probably ought to use each door's original label in the signature too, but my solution passed without requiring this.
Finally, two rooms are effectively identical if their sequence of door labels is the same up to rotation. I picked the lexicographically minimum rotation for each room and used this as a signature for the room.
I'm not sure what the theoretical worst-case performance is, but I suspect it would be quite large. For a start, my algorithm requires \(O(NE\log N)\) time per iteration, which could probably be improved by using a hash table and some kind of rolling hash to handle rotations in O(1) time. I was surprised when my first submission ran in time.
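A sketch of the refinement loop, with invented names: door[c] lists chamber c's doors in clockwise order, and across[d] is the door at the far end of the corridor behind door d. This version does keep each door's own label in its signature, which also makes the label count non-decreasing, so comparing counts is a safe stopping test:

```cpp
#include <map>
#include <utility>
#include <vector>
using namespace std;

vector<int> refineLabels(const vector<vector<int>> &door, const vector<int> &across) {
    int D = across.size();
    vector<int> label(D);
    // Initial labels: the degree of the chamber the door belongs to.
    for (size_t c = 0; c < door.size(); c++)
        for (int d : door[c]) label[d] = door[c].size();
    int numLabels = 0;
    while (true) {
        map<vector<int>, int> ids;            // signature -> new label
        vector<int> next(D);
        for (size_t c = 0; c < door.size(); c++) {
            int k = door[c].size();
            for (int s = 0; s < k; s++) {
                // Signature: this door's label, then the labels of the doors
                // reached through each corridor, clockwise starting from it.
                vector<int> sig;
                sig.push_back(label[door[c][s]]);
                for (int off = 0; off < k; off++)
                    sig.push_back(label[across[door[c][(s + off) % k]]]);
                auto res = ids.insert(make_pair(sig, (int)ids.size()));
                next[door[c][s]] = res.first->second;
            }
        }
        if ((int)ids.size() == numLabels) break;   // stable: no new distinctions
        numLabels = ids.size();
        label = next;
    }
    return label;
}
```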

B: buffed buffet

This one was deceptively sneaky, but the principles are reasonably simple. It's not too hard to guess that one will solve the continuous and discrete problems separately, and then consider every way to split the total weight between the two.
Let's start with the continuous problem, since it is a little easier. I don't have a formal proof, but it shouldn't be too hard to convince yourself that a greedy strategy works. We start by eating the tastiest food. Once it degrades to being as tasty as the second-tastiest food, we then eat both, in the proportion that makes them decay at the same rate. In fact, at this point we can treat them as a combined food with a combined (slower) decay rate. We continue eating this mix until its tastiness decays to match the third-tastiest food, and so on. There are some corner cases that need to be handled if there are foods that don't decay.
Now what about the discrete case? Because the items have different weights, a greedy strategy won't work here, as with any knapsack problem. There is a reasonably straightforward \(O(DW^2)\) dynamic program, where you consider each possible number of servings of each discrete food type, but this will be too slow. But there is some structure in the problem, so let's try to exploit it. Let's say that \(f(i)\) is the maximum tastiness for a weight of \(i\) using only the foods we've already considered, and we're now computing an updated \(f'\) by adding a new discrete food with weight \(w\), initial tastiness \(t\) and decay rate \(d\). For a given i, \(f'(i)\) clearly only depends on \(f(j)\) where \(i - j\) is a multiple of \(w\), so let's split things up by the remainder modulo \(w\). Fix a remainder \(r\) and let \(g(i) = f(iw + r)\) and \(g'(i) = f'(iw + r)\). Now we can say that
\[
\begin{aligned}
g'(i) &= \max_{0 \le j \le i}\big\{ g(j) + \sum_{n=1}^{i-j}(t - (n-1)d)\big\}\\
&= \max_{0 \le j \le i}\big\{ g(j) + (i-j)t - \frac{(i-j-1)(i-j)}{2}\cdot d\big\}\\
&= \max_{0 \le j \le i}\big\{ g(j) + (i-j)t - \frac{i(i-1)+j(j+1)-2ij}{2}\cdot d\big\}\\
&= it - \frac{i(i-1)d}{2} + \max_{0 \le j \le i}\big\{ g(j)-\frac{j(j+1)d}{2} - jt + ijd\big\}\\
&= it - \frac{i(i-1)d}{2} + \max_{0 \le j \le i}\big\{ h(j) + ijd \big\}
\end{aligned}
\]
Here we have defined \(h(j) = g(j)-\frac{j(j+1)d}{2} - jt\), which can be computed in constant time per \(j\).

The key observation is that we have turned the expression inside the maximum into a linear function of \(j\). Thus, if we plot \(h\) on a graph, only the upper convex hull is worth considering. We can maintain this upper hull as we increase \(i\) (remember, \(j \le i\)). The second observation is that the optimal choice of \(j\) will increase as \(i\) increases, because increasing \(i\) will increase the \(ijd\) term more for larger values of \(j\). It follows that we can incrementally find the optimal \(j\) for each \(i\) by advancing the previous \(j\) along the upper hull until advancing it further would start decreasing the objective function. There are a few corner cases: the upper hull might not have any valid entries (because there might be no way to make up a weight of exactly \(jw+r\)), and we might have popped off the previous \(j\) as part of updating the hull. These just need careful coding. Another snag to watch out for is that if all the foods decay very fast, the optimal result may in fact overflow a 32-bit integer in the negative direction.
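Here's a sketch of one remainder class of the update. It uses a deque rather than an explicit pointer, but it is the same monotone idea: lines arrive in order of non-decreasing slope \(jd\) and queries arrive at increasing \(i\), so dominated lines are discarded from the back and beaten lines from the front. The names are illustrative, and everything is kept in 64 bits:

```cpp
#include <climits>
#include <deque>
#include <utility>
#include <vector>
using namespace std;
typedef long long ll;

const ll NEG_INF = LLONG_MIN / 4;        // "weight unreachable"

// g[j] = best tastiness with total weight j*w + r before adding this food;
// returns the updated values g'[j]. t and d are the food's initial tastiness
// and decay rate (d >= 0).
vector<ll> updateClass(const vector<ll> &g, ll t, ll d) {
    int n = g.size();
    vector<ll> out(n, NEG_INF);
    deque<pair<ll, ll>> hull;            // lines y = first * x + second, slopes increasing
    auto at = [](pair<ll, ll> l, ll x) { return l.first * x + l.second; };
    // True if the middle line b is nowhere strictly above both a and c.
    auto dominated = [](pair<ll, ll> a, pair<ll, ll> b, pair<ll, ll> c) {
        return (__int128)(c.second - a.second) * (b.first - a.first)
            >= (__int128)(b.second - a.second) * (c.first - a.first);
    };
    for (int i = 0; i < n; i++) {
        if (g[i] > NEG_INF) {
            // Insert the line for j = i: slope j*d, intercept h(j).
            pair<ll, ll> line((ll)i * d, g[i] - d * ((ll)i * (i + 1) / 2) - (ll)i * t);
            bool keep = true;
            if (!hull.empty() && hull.back().first == line.first) {
                if (hull.back().second >= line.second) keep = false;  // worse duplicate slope
                else hull.pop_back();
            }
            if (keep) {
                while (hull.size() >= 2 && dominated(hull[hull.size() - 2], hull.back(), line))
                    hull.pop_back();
                hull.push_back(line);
            }
        }
        if (hull.empty()) continue;       // weight i*w + r is still unreachable
        // Queries come at increasing x = i, so front lines can be dropped for good.
        while (hull.size() >= 2 && at(hull[1], i) >= at(hull[0], i))
            hull.pop_front();
        out[i] = at(hull[0], i) + (ll)i * t - d * ((ll)i * (i - 1) / 2);
    }
    return out;
}
```

Running this once per remainder class for each discrete food gives the \(O(DW)\) discrete part mentioned below.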

The discrete part of the algorithm now requires \(O(DW)\) time, and the continuous part requires \(O(D\log D + W)\).

F: messenger (solved after contest)

This is a nasty problem mostly because of numerical stability problems. The problem itself is numerically unstable: a rounding error in measuring the total lengths of the paths is sufficient to change the answer between possible and impossible. It requires the black art of choosing good epsilons. I'll only describe the ideal case in which none of these nasty rounding problems occur.

Suppose we pick a point on both Misha and Nadia's path. In general this won't work, because the courier will arrive too late or too early at this point to intercept Nadia. Let L(A, B) be the number of time units that the courier is late when travelling from A to B; L can be negative if the courier is early. Valid solutions are those where L = 0.

First, let's prove that L is an increasing function of A (where "increasing" means that A moves further along Misha's path). Suppose that instead of moving straight to B, the courier kept pace with Misha for a while and then went to B. Obviously this would mean arriving later (or at the same time). But this is equivalent to the arrival time if A increases. A similar argument shows that L is a decreasing function of B. In both cases the monotonicity is not strict.

This means that there can be at most \(O(N)\) pairs of segments for which L=0, rather than \(O(N^2)\), because we cannot move forward on one path and backwards on another. We can iterate over the candidate pairs by successively advancing either one segment or the other, by measuring L at pairs of segment endpoints.

Now we have reduced the problem to a single segment from each path. Given a point on one path, we can find the corresponding point on the other by solving what looks like a quadratic but degenerates into a linear equation. Again, there are corner cases where the linear coefficient is also zero, in which case we have to break ties so as to minimise the travel time. Using this mapping function, we can identify the range of positions on Misha's path that correspond to valid positions on Nadia's path. I then used ternary search to find the optimal position along this segment. I didn't actually prove that the function is convex, but it seemed to work.
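The ternary search itself is standard; a sketch is below, assuming a function travelTime(s) (treated as given here) that returns the courier's travel time when the hand-over happens at parameter s along the current segment, and that this function is unimodal on the segment:

```cpp
#include <functional>

// Minimises a unimodal function on [lo, hi] by ternary search.
double minimise(const std::function<double(double)> &travelTime, double lo, double hi) {
    for (int iter = 0; iter < 100; iter++) {   // a fixed iteration count avoids epsilon fiddling
        double m1 = lo + (hi - lo) / 3;
        double m2 = hi - (hi - lo) / 3;
        if (travelTime(m1) < travelTime(m2)) hi = m2; else lo = m1;
    }
    return travelTime((lo + hi) / 2);
}
```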

The solution I implemented that passed is actually rather messier than what I've described, and ran close to the time limit. I tried implementing it as described above, but it hit assertions in my code due to numerical instabilities; I think it should still be possible to fix it.

Monday, June 02, 2014

New website

I've finally gotten around to replacing my ugly nineties-looking personal web page that was cobbled together with raw HTML and m4, with a much prettier version that nevertheless still contains roughly the same content. It is generated using Sphinx, with the individual pages written in reStructuredText. I've also merged my publications page into the main website instead of leaving it on the equally ugly but separately cobbled-together page I originally set up for my PhD.