This last weekend was round 3 of Code Jam 2017 and round 2 of Distributed Code Jam 2017. I'm not going to describe how to solve all the problems since there are official analyses (here and here), but just mention some alternatives. For this post I'll just talk about one problem from Code Jam; some commentary on DCJ will follow later.

### Slate Modern (Code Jam)

The idea I had for this seems simpler (in my opinion, without having tried implementing both) than the official solution, but unfortunately I had a bug that I couldn't find until about 10 minutes after the contest finished.

As noted in the official analysis, one can first check whether a solution is possible by comparing each pair of fixed cells: if the difference is value is greater than D times the Manhattan distance, then it is impossible; if no such pair exists, it is possible. The best solution is then found by setting each cell to the smallest lower bound imposed by any of the fixed cells.

Let's try to reduce the complexity by a factor of C, by computing the sum of a single row quickly. If we look at the upper bound imposed by one fixed cell, it has the shape \(b + |x - c| D\), where \(x\) is the column and b, c are constants. When combining the upper bounds, we take the lowest of them. Each function will be the smallest for some contiguous (possibly empty) interval. By sweeping through the fixed cells in order of c, we can identify those that contribute, similar to finding a Pareto front. Then, by comparing adjacent functions one can find the range in which each function is smallest, and a bit of algebra gives the sum over that range.

This reduces the complexity to O(RN + N²) (the N² is to check whether a solution is possible, but also hides an O(N log N) to sort the fixed cells). That's obviously still too slow. The next insight is that most of the time, moving from one row to the next changes very little: each of the functions increases or decreases by one (depending on whether the corresponding fixed cell is above or below the row), and the range in which each function is smallest grows or shrinks slightly. It thus seems highly likely that the row sum will be a low-degree polynomial in the row number. From experimentation with the small dataset, I found that it is actually a quadratic (conceptually it shouldn't be too hard to prove, but I didn't want to get bogged down in the details).

Note that I said "most of the time". This will only be true piecewise, and we need to find the "interesting" rows where the coefficients change. It is fairly obvious that rows containing fixed cells will be interesting. Additionally, when the range in which a function is smallest disappears or appears will be interesting. Consider just two fixed cells, and colour the whole grid according to which of the two fixed cells gives the lower upper bound. The boundary between the two colours can have diagonal, horizontal and vertical portions, and it is these horizontal portions that are (potentially) interesting. I took the conservative approach of adding all O(N²) such rows as interesting.

Now that we have partitioned the rows into homogeneous intervals, we need to compute the sum over each interval efficiently. Rather than determine the quadratic coefficients analytically, I just interfered them by taking the row sums of three consecutive rows (if there are fewer than three rows in the interval, just add up their row sums directly). A bit more algebra to find the sum of a quadratic series, and we're done! There are O(N²) intervals to sum and it requires O(N) to evaluate each of the row sums needed, giving a running time of O(N³). I suspect that this could probably be reduced to O(N² log N) by being smarter about how interesting rows are picked, but it is not necessary given the constraints on N.

## 3 comments:

Thanks for posting alternative solutions to the problems. I hope this post will help everyone looking to participate in code jam in the coming years.

Hadoop concepts, Applying modelling through R programming using Machine learning algorithms and illustrate impeccable Data Visualization by leveraging on 'R' capabilities.With companies across industries striving to bring their research and analysis (R&A) departments up to speed, the demand for qualified data scientists is rising.

data science training in bangalore

thanks for posting this blog

machine learning training in bangalore

hadoop training in bangalore

Post a Comment