<p>Disassembling Endo, part 2 (Entropy always increases, 2021-05-02)</p><p>If you haven't read <a href="https://blog.brucemerry.org.za/2021/04/disassembling-endo-part-1.html">Part 1</a> yet, read that first. This is Part 2.</p><p>As a reminder, here's how far we got in Part 1:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Sgadzx9Swwc/YIGTYzCrJoI/AAAAAAAABRs/wkh-jwNZfmknPMZg7twFDP-f_c2GokaQACLcBGAsYHQ/s600/037_goldfish.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Sgadzx9Swwc/YIGTYzCrJoI/AAAAAAAABRs/wkh-jwNZfmknPMZg7twFDP-f_c2GokaQACLcBGAsYHQ/s320/037_goldfish.png" /></a></div><br /><p>We have a bit of trouble with the VMU. It's time we started doing a bit of disassembly to see why. In general, static disassembly is impossible, because the machine is entirely built on self-modifying code, but fortunately most of the time one rule follows another in a sensible way. There are exceptions where some data is embedded directly into the code (rather than stored on the stack or in separate symbols), but these can be treated as special cases.</p><p>The first thing we see in the function <span style="font-family: courier;">vmu</span> is IIICCPIICC, which emits the RNA CCPIICC. Why? It's not valid RNA. This was mentioned on help page 2181889, suggesting that these codes starting with a C indicate "a small part of the morphing process" is done. In fact, you'll find that all the functions start with such a piece of RNA, and they have unique codes. From this, it's possible to examine an RNA trace to determine the control flow at the function level. 
I ended up not really needing this, because I had some tools in my DNA interpreter for determining what code was running (since copying the code to the red zone just references the original code, it's possible to determine the origin of instructions), but it was quite useful when our team originally competed.</p><p>Incidentally, while RNA is technically part of the pattern-matching rules, it almost always occurs at the very beginning and so can be treated as if it is an instruction on its own, and I'll continue to treat it like that. The "next" instruction, then, is (![7512628]) -> (0)IIIIIIIIIIIIIIIIIIIIIIIPIIIIIIIIIIIIIIIIIIIIIIIP. If you do some calculations based on how much of the function is left in the red zone, you'll see that this is adding some data to the start of the stack. Those calculations rapidly get tedious, so it's worth writing a tool to match various patterns and print out higher-level information. In this case one can recognise it as a push and print that out.</p><p>Next is ![33](![213118]CCIICCIIIIIIIIIIIIIIIIIP) -> (0). This is a conditional forward jump: it moves to a particular address (which is vmuMode) and tries to match a number there (51). If it all matches, then the next 33 acids will vanish; otherwise nothing happens. The next 33 acids are exactly the length of the next instruction, which is ![218] ->, which is an unconditional jump. Taken together, these say to jump ahead if <span style="font-family: courier;">vmuMode</span> is <i>not</i> 51.</p><p>Now we have ![2906](![4406243](![1805])![3101361]) -> (0)(1)IICCCIICICCCCCICCCIIICIPICICCICICCICIIIIIIIIIIIP. Once again, one needs to interpret the numbers: 2906 is the remainder of the red zone, 4406243 and 1805 are the address and size of the function <span style="font-family: courier;">ufo-with-smoke</span>, and 3101361 takes us to the blue zone. 
We then erase the rest of the red zone, copy <span style="font-family: courier;">ufo-with-smoke</span> to the red zone, and push two numbers on the stack. That sounds like a function call, and indeed you can confirm that the two numbers are the address of the following instruction, and the length of the remainder of the function, for returning to.</p><p>After this there is another jump (![2586] ->), and another conditional jump: ![33](![212772]CCCCCIIIIIIIIIIIIIIIIIIP) -> (0), this time comparing <span style="font-family: courier;">vmuMode</span> to 31. Aha, we didn't know to try that value earlier!</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Egxsk6bDs48/YIGYmgzCqaI/AAAAAAAABR0/zSXNGkEjN3ENidJyALcdkMFmNYyt-39vgCLcBGAsYHQ/s600/100_vmu_no_code.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Egxsk6bDs48/YIGYmgzCqaI/AAAAAAAABR0/zSXNGkEjN3ENidJyALcdkMFmNYyt-39vgCLcBGAsYHQ/s320/100_vmu_no_code.png" /></a></div><p>Fortunately we've already followed the clues to find the registration code (Out_of_Band_II), so let's write that into <span style="font-family: courier;">vmuRegCode</span> (remember to terminate the string with a value of 255). And now we get our caravan, but in the wrong place:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-onQsGaINhcM/YIGZVYb6CtI/AAAAAAAABR8/Qu5z9sUDmlU4CgQISrq66q0pvoKwSBw3gCLcBGAsYHQ/s600/101_vmu.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-onQsGaINhcM/YIGZVYb6CtI/AAAAAAAABR8/Qu5z9sUDmlU4CgQISrq66q0pvoKwSBw3gCLcBGAsYHQ/s320/101_vmu.png" /></a></div><p>Wait a second, the caravan was encrypted, and we didn't decrypt it. 
But we know that the VMU registration code is also the encryption key for the caravan, so presumably <span style="font-family: courier;">vmu</span> is doing the decryption for us. Let's take a look at the next few instructions:</p><p> (![7511175])![48] -> (0)IICIICICICCIIIIICCCICICPCICIICCICCIICIIIIIIIIIIP</p><p>((![1200])![7509907]) -> (1)(0)</p><p>(![211586](![1152])) -> IIPIPIIIIIIICIICPIICIIPIPIIIICICCCCICCIICICIICCCPIICIPIIIIIIICIICPIICIPPCPIPPPIIC(0)(1)</p><p>The first writes some values into those stack variables we saw earlier: specifically, the address and size of <span style="font-family: courier;">caravan</span>. Then it pushes more data to the stack. Weirdly, it's pushing the immediately following 1200 acids, which form code. It turns out that the compiler the organisers used to generate code makes function calls by first pushing a block of garbage to the stack, then overwriting it, rather than directly pushing arguments. So this is just making space on the stack for 1200 acids of arguments.</p><p>What's going on with the third instruction? It's writing a long bunch of stuff at the business end, prior to any substitutions. Code generating code! This turns out to be fairly common, and disassembly becomes much easier if you detect this, parse the embedded code and present it as a separate rule. Where it's important I'll prefix such follow-on instructions with a +. So let's try to parse that again:</p><p>(![211586](![1152])) -> (0)(1)<br />+ (![1152])(![7510992])![1152] -> (1)(0)</p><p>The first instruction copies 1152 acids from the green zone (<span style="font-family: courier;">vmuRegCode</span>) and inserts them at the "start" of the DNA (although not really the start, because it's after the embedded instruction). Then the embedded rule vacuums it back up and writes it into the stack. 
This shows one advantage of the embedded rule: if it had been written as a separate rule, then the first rule would have inserted the data <i>before</i> the second rule, and things would have gone wrong.<br /></p><p>You may be wondering why this needed to be done with this trick, instead of just a single rule that both collected and delivered the data to the stack. It would be possible, but if the data source occurred after the destination in the DNA, it would have required a different sort of pattern. With this approach, the loading and storing are independent, which probably makes the compiler easier to write.</p><p>Carrying on we see just such usage:</p><p>(![7511999](![24])) -> (0)(1)<br />+(![24])(![7510823])![24] -> (1)(0)</p><p>(![7511878](![24])) -> (0)(1)<br />+(![24])(![7510654])![24] -> (1)(0)</p><p>This copies the address and size of <span style="font-family: courier;">caravan</span> to that 1200-acid block as well. Here's more evidence that a compiler (and not a super-optimised one) is involved: a human would have just written the values directly to where they were needed, rather than writing them into a temporary and then copying the temporary.</p><p>Perhaps not surprisingly, at this point we call <span style="font-family: courier;">crypt</span>. We've given it the decryption key, address and size of caravan, so it will do its thing. Note that, as documented, strings are always passed as 128 characters, even if they are actually cut short by a terminator.</p><p>The rest of the function isn't particularly exciting, so I'll jump ahead to near the end.</p><p>(![7509644])![48] -> (0)<br />RNA: CFPICFP<br />![75](![7509409])(![24])(![24]) -> IIPIP(1)IIPIP(2)IICIICIICIPPPIPPCPIIC(0)</p><p>The first pops 48 acids from the stack (the ones we pushed right at the start), which should leave the return address at the top of the stack. Then we see the RNA sequence that we're told indicates a return. 
The return statement itself uses more code-generating-code, but this time it's not just a fixed rule stuck on the front of the template: the rule is generated based on values in the pattern! This is how indirection is performed. To split off the leading rule, I had to invent some new syntax: <t:<i>X</i>> means part of a rule is defined by the template at the previous level. So rewriting the return using this syntax gives:</p><p>![75](![7509409])(![24])(![24]) -> (0)<br />+(![<t:(1)>](![<t:(2)>])) -> (0)(1)</p><p>In other words, the second rule will skip forward not by a constant difference, but by the number found in replacement (1) of the first rule i.e., the return address; then it will skip forward by (2) from the first rule i.e. the return size (remaining amount of code to execute from the calling function).</p><p>Parsing this requires the disassembler to do another impossible thing, because the parsing of the inner rule depends on the replacement values e.g. if (1) was empty then the following IIP would be the number zero on skip instead of an opening bracket. Once again, it's possible to make progress because the actual code seen is generated in fairly consistent ways and so heuristics work well e.g. if one sees a skip instruction immediately followed by a template parameter, it's a good chance that it will be substituted by a number, rather than some arbitrary code. And again, there are exceptions that need to be handled on a case-by-case basis.</p><p>There is also some trailing stuff at the end of the function which I haven't entirely understood. It seems to occur at the end of every function. I've split it across several lines since it appears to consist of distinct pieces:<br /></p><p>CPPPPPP<br />IICIICIICIICIICIICIICIICIICIIC<br />CCFIICIIC<br />IFFCPICCFPICICFCPICIICIIC = ?[IFPICFPPCIFP] -><br />IIII</p><p>Overall I think the idea is to terminate the machine if code is being executed in the wrong place (e.g. 
if you patched code in a broken way that meant rule decoding wasn't happening on the intended boundaries). I don't know what the first and last lines are for, although the first might be intended to terminate numbers and to create a match item that won't match. Then there are a lot of IIC's, which should ensure that any open brackets get closed; excess IIC's will be consumed in pairs (forming empty rules). The middle line can be interpreted in two possible ways: if all the previous IIC's were consumed, then it is the rule IIC ->, which won't match so is ignored. If the last IIC on the previous line formed an (empty) pattern, then CCFIIC becomes the template IIC, which will combine with the remaining IIC to form another empty rule. So regardless of the parity, the end of this line is highly likely to be the end of a rule so that we're all synchronized for line 4, which searches for a marker that's right at the end of the DNA and skips to it, shutting everything down.</p><p>Let's see how addition works, because it comes up here and there. There is a function called <span style="font-family: courier;">addInts</span>, which seems like a good place to start. It's almost never called, addition normally being done inline, so I suspect it's mostly provided for educational purposes. The core instructions look like this:</p><p>(![7510064](![24])) -> (1)<br />+(![7510040](![24])) -> (1)<br />+(![170])(![<t:<t:(0,1)>>]![<t:(0)>]) -> (0)|1|(1)</p><p>(?[P]) -> F(0)<br />+(![<t:|0|>])P -> (0)IIIIIIIIIIIIIIIIIIIIIIIP<br />+F(![23])?[P](![7509918])![24] -> (1)(0)P</p><p>Now we have 3 layers of rules in one, and nested <t:...> constructs. <t:<t:...>> means that the content is defined by the rule <i>two</i> lines above.</p><p>The first set of rules does the actual addition. The first two lines just grab the two values to add (from the blue zone). The third line does the real work. 
![170] moves just past the following rule, and then we have two skips, with the distances determined by the two numbers we grabbed. We then write the <i>length</i> of this combined skip (i.e. the sum) just after the next rule.</p><p>So we're using the length-of template operation as a hack to implement addition. The paper by the organisers mentioned that this feature was specifically added to make arithmetic faster, although it's useful in other places too. It also explains the documentation for the <span style="font-family: courier;">init</span> function: "Ensures happy, trouble-free arithmetic by growing the DNA to the right length." That is necessary because if the jumps go off the end of the DNA the pattern will be considered not to have matched.</p><p>The reason for the second rule is that the first will write the sum with the minimum number of bases, whereas the higher-level code is all designed to deal with 24-acid numbers (or sometimes 9 or 12). So we need to pad the number out to the right length. We want to be able to grab the bits of the number <i>without</i> the terminating P, so that we can add some I's after it. Unfortunately, (?[P]) includes the P, so we need to use a little trickery. Let's say the number is currently CICCP: 4 bits and P. The first step inserts an F in front of the number. The second reads 5 bits, namely FCICC, discards the original P, and appends a bunch of I's and a replacement P. The third discards the F, reads 23 bits (the original 4 plus some of the I's), skips ahead to the P, then places those 23 bits and a P. So we now have a 24-acid number in the right place.</p><p>Now that we've seen some function calls and returns, let's see if we can fix something in the picture. Note that there are many, many ways to achieve each goal, and I'm just going to list one (or a few) of them.</p><p>We'll replace the apple trees with pear trees. 
To do that, we'll replace the code in <span style="font-family: courier;">appletree</span> to just call <span style="font-family: courier;">peartree</span> instead of drawing an apple tree. Here's the code we'll write at the front:</p><p>![14036](![5800492](![14123])) -> (0)(1)</p><p>This looks similar to the function call we saw earlier, but the return address is missing, and so is the jump to the start of the blue zone (which we no longer need since we're not pushing anything). So how does <span style="font-family: courier;">peartree</span> know where to return to? The function that called <span style="font-family: courier;">appletree</span> will have pushed a return address, and that's what <span style="font-family: courier;">peartree</span> will pop and return to. This is what's known as a <i>tail call</i>, and it saves us the bother of writing our own return statement.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ihHezBL1ykk/YIG3MqsJdvI/AAAAAAAABSE/FTvGpjRQ8bI_ZsvdCM5qo-gRVvWcrEhBQCLcBGAsYHQ/s600/102_peartree.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-ihHezBL1ykk/YIG3MqsJdvI/AAAAAAAABSE/FTvGpjRQ8bI_ZsvdCM5qo-gRVvWcrEhBQCLcBGAsYHQ/s320/102_peartree.png" /></a></div><br /><p>Let's take a look at what went wrong with the virus help page that caused the title to be upside down (and if you swapped out the wingding font for a normal one, you'd have noticed that all the text is upside down). It has a very deeply nested rule:</p><p>(![313]) -> (0)(0)</p><p>(![313])(?[P]) -> <br />+(![<t:(1)>]?[IIIPCCCCCP]) -> IIIPCCCCCP(0)<br />+(![<t:|0|>])![10] -> (0)IIIPIPIPIP<br />+![10](![<t:|0|>]) -> <t:<t:<t:(0,3)>>>|0|(0)<br />+(![313]) -> (0)(0)</p><p>This shows one way to implement a loop, particularly in standalone code that can't rely on the green zone. 
The first instruction reads the body of the loop and makes two copies of it. The last instruction of the loop does the same thing, so there is always one copy of the loop being consumed and one backup copy.</p><p>The loop itself reads a number embedded immediately after the (backup) loop code, jumps that far ahead (using <t:(1)> to read the number loaded in the previous line), then searches for a string. It also adds a copy of that string to the front of the DNA. This is for a similar reason to <span style="font-family: courier;">addInts</span> adding an F to the front: it allows one to skip to the <i>start</i> of the searched-for string, instead of the end (third line of the loop body), and replace it with a different string. The fourth line discards those 10 filler acids that were placed at the front, and makes a jump purely so that its length can be used in the replacement. The template replaces the backup copy of the loop (which was consumed in the first line) and puts the jump distance after it (replacing the number read in the first line).</p><p>What does all that mean? It's a search-and-replace. The number stored after the backup loop is the distance already searched, so that searches don't have to restart from the beginning. It jumps ahead to the position indicated by the number, then finds the next match and replaces it. One thing to keep in mind with nested rules is that if one rule in a set doesn't match, the subsequent ones aren't produced and hence are not run. Thus, once the last match is replaced, the loop stops, because the first step erases the backup copy of the loop (and the number) and the steps that replace it won't occur.</p><p>In this case, the loop replaces IIIPCCCCCP, the RNA for counter-clockwise turn, with IIIPIPIPIP (nonsense RNA). This is followed by a similar loop that replaces IIIPFFFFFP with IIIPCCCCCP, then another to replace IIIPIPIPIP with IIIPFFFFFP (RNA for clockwise turn). 
Thus, it's swapped left with right, which explains the upside-down page.</p><p>We probably wouldn't have been warned about this for nothing. Maybe this loop is the virus that the page warns us about? We can search for it elsewhere. Some parts of it change depending on what swap is being made, but the replacement template <t:<t:<t:(0,3)>>>|0|(0) seems pretty unusual, and is independent of the replacement code. That corresponds to IPCCPPPPFPFPPFPFPFP in the original DNA, which we can search for. There are 4 matches: the 3 we've already seen, and one in <span style="font-family: courier;">surfaceTransform</span>. Just by running that function, we can see that it's responsible for the hills, but also recall from part 1 that when we tried setting <span style="font-family: courier;">hillsEnabled</span>, things went horribly wrong:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-X9tGG-C7_pY/YIHA8CiUuJI/AAAAAAAABSM/1jLAxx4_lgAFbiwHYTYcSlo56zDmFpmvACLcBGAsYHQ/s600/024_history-hillsEnabled.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-X9tGG-C7_pY/YIHA8CiUuJI/AAAAAAAABSM/1jLAxx4_lgAFbiwHYTYcSlo56zDmFpmvACLcBGAsYHQ/s320/024_history-hillsEnabled.png" /></a></div><br /><p>In that case the virus is replacing IIIPIIPICP (RNA for resetting the colour bucket) with IIIPIPIPIF (nonsense RNA). No wonder everything seems so green. Let's consider a general approach to getting rid of sections of code we don't want: using a forward jump. We overwrite the start of the virus with the rule ![n] ->, for some n. It's actually not quite trivial to determine n, because it needs to be the length of the virus <i>minus</i> the length of the replacement rule, and we don't know the length of the rule until we know how many acids we need to encode n. 
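</p><p>The circular dependency just described (n depends on the rule length, which depends on how many acids encode n) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names <span style="font-family: courier;">asnat</span> and <span style="font-family: courier;">jump_over</span> are hypothetical, and the rule layout (IP, the number, then IIC to end the pattern and IIC to end the empty template) is taken from the contest's DNA specification and agrees with the 33-acid ![218] -> seen earlier. Rather than iterating on an estimate, this variant simply tries each encoding width in turn and keeps the first one that fits, padding with high-order I's where necessary.</p>

```python
def asnat(n, width=None):
    """Encode n as acids: bits LSB-first (I = 0, C = 1), then a terminating P.

    If width is given, pad with high-order I's to exactly that many bits.
    """
    bits = ""
    while n > 0:
        bits += "C" if n & 1 else "I"
        n >>= 1
    if width is not None:
        bits = bits.ljust(width, "I")
    return bits + "P"


def jump_over(total):
    """Return (n, rule) where rule encodes `![n] ->` and len(rule) + n == total.

    `total` is the length of the region to disable (hypothetical parameter).
    """
    # IP opens the skip, P ends the number, IIC ends the pattern,
    # and a second IIC ends the (empty) template.
    overhead = len("IP") + len("P") + len("IIC") + len("IIC")
    for width in range(0, 64):
        n = total - overhead - width
        if n < 0:
            break
        if n.bit_length() <= width:
            return n, "IP" + asnat(n, width) + "IICIIC"
    raise ValueError("region too short to hold a jump rule")
```

<p>For a 1043-acid region the shortest rule needs one padding bit: a 10-bit number would leave n = 1024, which needs 11 bits, so the search settles on 11 bits with n = 1023.</p><p>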
One approach is just to use a large fixed number of bits (with 0's in the high-order bits); this seems to be generally the approach taken by the organisers' compiler. However, we get a higher score if we use a shorter prefix, so there is some motivation to use shorter replacements. My approach was to use a function that would take an initial estimate of the position of the end of the rule, compile it, then update the estimate and repeat until convergence. In theory there are some corner cases where this could oscillate and a padding bit would be required, but I've not hit one yet.</p><p>A more efficient solution (in terms of changing fewer bases) is just to change the string that is searched for. But the tool above will come in handy elsewhere.<br /></p><p>With that fixed, the hills now appear in the scene, although they're not yet in quite the right positions:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-dSMKJD7fyrs/YIHDsfNxPnI/AAAAAAAABSc/mT_ipwfrjhk6OByiY0KXyCHqNVrkUMNHQCLcBGAsYHQ/s600/103_virus.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-dSMKJD7fyrs/YIHDsfNxPnI/AAAAAAAABSc/mT_ipwfrjhk6OByiY0KXyCHqNVrkUMNHQCLcBGAsYHQ/s320/103_virus.png" /></a></div><br /><p>So how are those hills drawn anyway? It actually involves a fair amount of code. 
Here's what my disassembler spits out for the first hill:<br /></p><p>0x0542 <blue+0x018> := 0<br />0x058e <blue+0x000> := 242<br />0x05da CALL moveTo<br />0x0693 PUSHS 120<br />0x06d3 <blue+0x018> := 8388580<br />0x071f <blue+0x000> := 3348<br />0x076b CALL functionParabola<br />0x0824 cbfArray+0x48[:72] := POP 72<br />0x08a1 PUSHS 144<br />0x08e2 <blue+0x030> := 408<br />0x092e <blue+0x018> := 7<br />0x097a <blue+0x000> := 4<br />0x09c6 CALL functionSine<br />0x0a7f cbfArray+0x90[:72] := POP 72<br />0x0afc PUSHS 216<br />0x0b3d <blue+0x048>[:72] = cbfArray+0x48[:72]<br />0x0bd4 <blue+0x000>[:72] = cbfArray+0x90[:72]<br />0x0c6b CALL functionAdd<br />0x0d24 cbfArray[:72] := POP 72<br />0x0da1 PUSHS 120<br />0x0de1 <blue+0x030>[:72] = cbfArray[:72]<br />0x0e78 <blue+0x018> := 250<br />0x0ec4 <blue+0x000> := 50<br />0x0f10 CALL drawFunctionBase<br /></p><p>PUSHS X means make X acids of space on the stack. [:72] just indicates the size of a value. The disassembler doesn't know about negative numbers and two's complement, so 8388580 is really 8388580 - 2^23 = -28. In summary, it generates two closures, one from <span style="font-family: courier;">functionParabola</span> and one from <span style="font-family: courier;">functionSine</span>, and adds them, then passes the resulting closure to <span style="font-family: courier;">drawFunctionBase</span>.</p><p>The second hill is similar, but with different parameters and <span style="font-family: courier;">moveTo</span> location, while the third uses only a sine function. The first piece of text we found in the DNA mentioned that the organisers had sabotaged the DNA by swapping some parabolas around, so see what happens if you swap the parameters of the two parabolas. To do this you'll need to find the positions of the numbers in the DNA (they're quoted). 
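</p><p>As a rough sketch of how one might locate such a constant, the Python below encodes a value as a 24-acid number, applies one level of quoting, and searches for the result. The helper names are hypothetical, the quoting table (I to C, C to F, F to P, P to IC) is taken from the contest's DNA specification, and it assumes the literal sits at a single level of quoting; a constant buried in a more deeply nested rule would need <span style="font-family: courier;">quote</span> applied repeatedly. The small <span style="font-family: courier;">to_signed</span> helper covers the two's complement reading mentioned above.</p>

```python
# Quoting as applied to literal acids inside a template.
QUOTE = {"I": "C", "C": "F", "F": "P", "P": "IC"}


def encode_fixed(n, acids=24):
    """Encode n in exactly `acids` acids: LSB-first bits (I=0, C=1), then P."""
    bits = acids - 1
    if n < 0 or n >= 1 << bits:
        raise ValueError("value does not fit")
    return "".join("C" if (n >> i) & 1 else "I" for i in range(bits)) + "P"


def quote(dna):
    """Apply one level of quoting to a string of acids."""
    return "".join(QUOTE[base] for base in dna)


def find_quoted(dna, value, acids=24):
    """Return every offset at which the quoted form of `value` occurs."""
    needle = quote(encode_fixed(value, acids))
    hits = []
    pos = dna.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = dna.find(needle, pos + 1)
    return hits


def to_signed(n, bits=23):
    """Interpret an unsigned `bits`-bit value as two's complement."""
    return n - (1 << bits) if n >= 1 << (bits - 1) else n
```

<p>A search like this can return false positives, so it is still worth checking each hit against the disassembly before patching it.</p><p>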
I found it very useful to reuse some of my disassembly and assembly code to disassemble the desired instruction, replace the value (after checking that it matched what I expected it to be) and reassemble the instruction. That allowed me to put in a number of safety checks so that typing errors didn't send me off on long debugging sessions.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-NV1WKVCsb2Q/YIHHKu_4kqI/AAAAAAAABSk/COGrco_7BO05JrOOuq6rRoyzPRv_6zp4QCLcBGAsYHQ/s600/104_parabolas.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-NV1WKVCsb2Q/YIHHKu_4kqI/AAAAAAAABSk/COGrco_7BO05JrOOuq6rRoyzPRv_6zp4QCLcBGAsYHQ/s320/104_parabolas.png" /></a></div><p></p><p>That looks worse, but actually it is progress. If you compare this to the target image, you'll find that the left two hills are merely at the wrong heights. It helps to use an image editor that lets you shift the images relative to each other while showing a difference or other blend of them (I used the GIMP, opening the images as two layers and dragging one across the other). On the first hill, change the second argument to <span style="font-family: courier;">moveTo</span> from 242 to 218, and on the second hill, from 209 to 235.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-2qtdqw6QU7Y/YIHIMWWoSAI/AAAAAAAABSs/vPamh3yooFQYZ_ZG1Ec0GNwAkJj8btMFwCLcBGAsYHQ/s600/105_parabolas_heights.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-2qtdqw6QU7Y/YIHIMWWoSAI/AAAAAAAABSs/vPamh3yooFQYZ_ZG1Ec0GNwAkJj8btMFwCLcBGAsYHQ/s320/105_parabolas_heights.png" /></a></div><p>The third hill is trickier. 
You can <i>almost</i> make it match by changing both the x and y positions, but there are always a few pixels that don't quite match. I don't know if there is a clever way to determine the solution (maybe analysing <span style="font-family: courier;">functionSine</span> to determine the meaning of the parameters and then fitting them), but brute force (automatically trying multiple values) will get you there if you know that only one of the parameters is wrong. In fact, the second argument must change from 65 to 60.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-9VqSfuPsKyA/YIHJQYJu50I/AAAAAAAABS0/yXXChZrwtjg4SxZxzRkkjyLGIwUCGh8DQCLcBGAsYHQ/s600/106_hills.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-9VqSfuPsKyA/YIHJQYJu50I/AAAAAAAABS0/yXXChZrwtjg4SxZxzRkkjyLGIwUCGh8DQCLcBGAsYHQ/s320/106_hills.png" /></a></div><p>While it's generally not this bad, the contest does have a fair amount of tedious modification of coordinates of things to put them into the right place. For example, the caravan will need to move; you can find the position being set at offset 0x3b66 in scenario if you want to try your hand.</p><p>For now let's keep things interesting by looking at some code. But first let's get the last few pages of the repair guide. If you disassemble main, you'll see various bits of code that compare the value at <span style="font-family: courier;">helpScreen</span> to a constant and then call one of the help functions we've already seen. Even if you don't have a disassembler that can interpret all the rules, you should be able to extract these to determine the valid repair page numbers. 
There are two that weren't shown in Part 1, and we missed them because they're implemented inline in <span style="font-family: courier;">main</span> rather than in a <span style="font-family: courier;">help-</span> function we could call.</p><p>1024 (which you might have found by brute force):</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ITT4ERIvGPo/YIHLyRglxbI/AAAAAAAABS8/bEVaKP0hujUHJ2nye0rsPNOPtrZSQ9AqACLcBGAsYHQ/s600/1024_compressor.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-ITT4ERIvGPo/YIHLyRglxbI/AAAAAAAABS8/bEVaKP0hujUHJ2nye0rsPNOPtrZSQ9AqACLcBGAsYHQ/s320/1024_compressor.png" /></a></div><br /> 180878:<p></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-NJvGrt0GzW4/YIHMA4XSUHI/AAAAAAAABTA/IJNCUprDrmQCuAPFqQ3cpXPVvU4GgRNtgCLcBGAsYHQ/s600/180878_spirograph.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-NJvGrt0GzW4/YIHMA4XSUHI/AAAAAAAABTA/IJNCUprDrmQCuAPFqQ3cpXPVvU4GgRNtgCLcBGAsYHQ/s320/180878_spirograph.png" /></a></div><br />That yellow shape in the middle looks like exactly what we need to put at the centre of the sun (although it's too large). But page 1024 tells us more about the compressor hinted at by page 123456. Now that we've seen the virus, the principle of a loop that keeps copying itself makes more sense. 
It says it's used for bitmaps, so let's look inside a page that has a bitmap: <span style="font-family: courier;">alien-lifeforms</span>.<p></p><p>The compressor itself looks like this:</p><p>0x182b (![19])(![944])(?[P]) -> (0)(2)(1)<br />0x186c (![56]![128]![32]) -> (1)(0)(1)<br />0x18a5 (![672])(![178]) -> (1)(0)(1)<br /></p><p>Or at least it appears to: this is one of those rare cases of self-modifying code that fools disassembly. One should be suspicious of the middle line because it appears to reference a capture group that doesn't exist. Here's the raw DNA for it:</p><p>IIPIPIIICCCPIPIIIII IICIIPIPIIIIICPIICIICIPPCPIPPPIPPCPIIC</p><p>The first rule grabs the next number from the green area, and pastes it into the <i>middle</i> of the next instruction (at the gap shown). It appears after IPIIIII, which turns it into a jump that's 32× bigger than the number (left shift by 5). So for example, if the number is 3, the rule becomes</p><p>(![56]![96])(![32]) -> (1)(0)(1)</p><p>The ![56] jumps over the 3rd instruction, and the ![96] jumps to the 3rd 32-acid table entry, which is then copied to the front and executed. The final instruction just restores the red part from the orange part.</p><p>What's in these 32-acid table entries? One could conceivably do quite a lot, but they aren't used for much. Most of them consist of IPIIIIII...IPIII<RNA> i.e. a jump of 0 (just for padding) and then one piece of RNA (all of which get welded onto the front of the next instruction). The last entry (20) is special: it simply jumps over the rest of the compressor to resume execution.</p><p>Now that we know what the decompressor looks like, let's go see where it gets used. We can take the raw DNA and just search for it in the downloaded DNA. 
It appears in the following functions: 'M-class-planet', 'alien-lifeforms', 'cargobox', 'fontCombinator', 'fuundoc1', 'fuundoc2', 'fuundoc3', 'grass1', 'grass2', 'grass3', 'grass4', 'help-steganography', 'most-wanted', 'printGeneTable', 'sticky', 'transmission-buffer'.</p><p>Most of these are unsurprising, since they contain some sort of image or complex shape. But what about printGeneTable? It's just text — at least on the pages we asked for. Maybe, like the repair guide, there are hidden pages? Let's ask for page 15:</p><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-SPS02frP4GY/YIU8sVj1qPI/AAAAAAAABTU/uR1xh30q9wwua3tqpeSvUhkhV-YqqaIkwCLcBGAsYHQ/s600/107_gene_table15.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-SPS02frP4GY/YIU8sVj1qPI/AAAAAAAABTU/uR1xh30q9wwua3tqpeSvUhkhV-YqqaIkwCLcBGAsYHQ/s320/107_gene_table15.png" /></a></div><p>Notice the image in the bottom-right corner? It's a match for the whale spout in the target picture, although upside down. We'll come back to it when we start assembling the final picture.</p><p>The cargo box also uses the compressor. Dump the RNA in the table:</p><p><span style="font-family: courier;">0x0256 Entry 0: RNA: move<br />0x0276 Entry 1: RNA: cw<br />0x0296 Entry 2: RNA: line<br />0x02b6 Entry 3: RNA: mark<br />0x02d6 Entry 4: RNA: ccw<br />0x02f6 Entry 5: RNA: red<br />0x0316 Entry 6: RNA: black<br />0x0336 Entry 7: RNA: white<br />0x0356 Entry 8: RNA: magenta<br />0x0376 Entry 9: RNA: fill<br />0x0396 Entry 10: RNA: reset<br />0x03b6 Entry 11: RNA: cyan<br />0x03d6 Entry 12: RNA: green</span><br /></p><p>The source image has a rather purple cargo box, which is probably built from the magenta in the table. 
What if we change that table entry to yellow?</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-afUkfJHKuCk/YIU_jv5H95I/AAAAAAAABTc/viTIKDk1VQo-TUzOlGoCtLgFkbXU9QagwCLcBGAsYHQ/s600/108_cargobox1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-afUkfJHKuCk/YIU_jv5H95I/AAAAAAAABTc/viTIKDk1VQo-TUzOlGoCtLgFkbXU9QagwCLcBGAsYHQ/s320/108_cargobox1.png" /></a></div><p>Visually it looks right, but comparing pixel values to the target, the filling in the A is now slightly the wrong colour, with too much green and not enough blue. The only other table colour that uses different amounts of green and blue is the last one (green). So maybe we need to change that to blue to balance it out? This does indeed work.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-7KLggyQ0834/YIVARIikfnI/AAAAAAAABTk/DTlV4qEknj89eaTK2ih863vb2CLXAMRBACLcBGAsYHQ/s600/109_cargobox2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-7KLggyQ0834/YIVARIikfnI/AAAAAAAABTk/DTlV4qEknj89eaTK2ih863vb2CLXAMRBACLcBGAsYHQ/s320/109_cargobox2.png" /></a></div><p>It's about time we turned the weather on again so that we can get the clouds and work on the cow. 
Set <span style="font-family: courier;">weather</span> to 2 and <span style="font-family: courier;">enableBioMorph</span> to true:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-nkvhR3FTuKU/YIVeECX4I8I/AAAAAAAABUM/cvX9DnygyB0O1GsJqOAnamUwOxFXlNuQACLcBGAsYHQ/s600/110_weather.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-nkvhR3FTuKU/YIVeECX4I8I/AAAAAAAABUM/cvX9DnygyB0O1GsJqOAnamUwOxFXlNuQACLcBGAsYHQ/s320/110_weather.png" /></a></div><br /><p></p><p>There are a few elements in the scene we don't want, such as the lightning bolt and the rain. We've seen one way to disable code we don't want (replace a chunk of code by a jump), but I'll demonstrate another that can disable individual rules with just a one-acid change (which will help our score). If you disassemble <span style="font-family: courier;">sky-day-bodies</span>, you'll see the call to <span style="font-family: courier;">lightningBolt</span> at 0x29a, with DNA that starts with a jump:<br /></p><p>IPIIIICICIIIICIIIIIIIIIIIIP = ![2128]<br /></p><p>What happens if we change the last I to a P? It becomes</p><p>IPIIIICICIIIICIIIIIIIIIIIPP = ![2128]F</p><p>Suddenly the rule expects to find an F in a particular place (the start of the green zone) or it won't match, and there isn't one there. 
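</p><p>To see why a one-acid change has this effect, here is a minimal sketch of a pattern decoder, assuming the pattern encoding from the contest specification (C, F and P stand for the bases I, C and F; IC is P; IP starts a skip whose length is a little-endian number ended by P; IIP opens a group and IIC or IIF closes one). The function names are my own, and searches and embedded RNA are deliberately left out.</p>

```python
# Single-acid pattern codes: the acid on the left matches the base on the right.
BASES = {"C": "I", "F": "C", "P": "F"}


def decode_nat(dna, pos):
    """Decode a little-endian number starting at pos.

    I and F are 0 bits, C is a 1 bit, and P terminates the number.
    Returns the value and the position just after the terminator.
    Raises IndexError if the DNA ends before a P is found.
    """
    value, bit = 0, 0
    while dna[pos] != "P":
        if dna[pos] == "C":
            value += 2 ** bit
        bit += 1
        pos += 1
    return value, pos + 1


def decode_pattern(dna):
    """Render the pattern at the start of dna in the notation used in this post."""
    out, depth, pos = [], 0, 0
    while pos != len(dna):
        if dna[pos] in BASES:
            out.append(BASES[dna[pos]])
            pos += 1
        elif dna.startswith("IC", pos):
            out.append("P")
            pos += 2
        elif dna.startswith("IP", pos):
            count, pos = decode_nat(dna, pos + 2)
            out.append(f"![{count}]")
        elif dna.startswith("IIP", pos):
            out.append("(")
            depth += 1
            pos += 3
        elif dna.startswith(("IIC", "IIF"), pos):
            pos += 3
            if depth == 0:
                break  # a close at the outermost level ends the pattern
            out.append(")")
            depth -= 1
        else:
            # Searches (IF) and RNA (III) are not handled in this sketch.
            raise ValueError(f"unsupported or truncated pattern at offset {pos}")
    return "".join(out)


print(decode_pattern("IPIIIICICIIIICIIIIIIIIIIIIP"))
print(decode_pattern("IPIIIICICIIIICIIIIIIIIIIIPP"))
```

<p>The two calls correspond to the original and patched jump: in the second, the number terminates one acid earlier and the leftover P becomes a literal F that has to match.</p><p>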
So instead of making a call, nothing happens:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-JpZ_fyuUB0E/YIVeLKlXviI/AAAAAAAABUQ/KVDj-oYsF7E5yF7DbbjW33ufT_LR9SOyQCLcBGAsYHQ/s600/111_no_lightning.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-JpZ_fyuUB0E/YIVeLKlXviI/AAAAAAAABUQ/KVDj-oYsF7E5yF7DbbjW33ufT_LR9SOyQCLcBGAsYHQ/s320/111_no_lightning.png" /></a></div><p>Note that this only works for calls to functions without arguments, since the arguments are normally pushed separately and without the function call, they won't get popped again.</p><p>While we have this tool handy, let's also zap the calls to <span style="font-family: courier;">lambda-id</span> (at <span style="font-family: courier;">scenario+0x4322</span>), <span style="font-family: courier;">crater</span> (at <span style="font-family: courier;">scenario+0x8b30</span>) and <span style="font-family: courier;">cloak-rain</span> (at <span style="font-family: courier;">scenario+0xa370</span>).</p><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-cmcjSgxDPpE/YIVeVYRL5CI/AAAAAAAABUY/x3clpiA4G-UkLIbqgWXrISLCH8GHAAIdACLcBGAsYHQ/s600/112_no_things.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-cmcjSgxDPpE/YIVeVYRL5CI/AAAAAAAABUY/x3clpiA4G-UkLIbqgWXrISLCH8GHAAIdACLcBGAsYHQ/s320/112_no_things.png" /></a></div><p>What's with all the red, yellow and black? Disassembling the various functions shows that a variable called <span style="font-family: courier;">cloudy</span> seems to play a role. Patching its value in the original DNA doesn't seem to help. 
With some hacks on the DNA interpreter it's possible to see where it gets changed (or you could just disassemble all the things): when <span style="font-family: courier;">cloud</span> is run, it sets it to true. The trick we used earlier doesn't have quite the same effect here: we end up replacing</p><p>(![831561])![1] -> (0)P</p><p>with</p><p>(![831561]F)![1] -> (0)P</p><p>and since <span style="font-family: courier;">cloudy</span> starts off as <span style="font-family: courier;">F</span>, the rule still matches, and ends up writing a <span style="font-family: courier;">P</span> into the following acid. Fortunately it's one that doesn't matter, but there is an alternative way to disable the rule, which will be useful later. The ![1] encodes as IPCP, and we change that to PPCP, which decodes as FFIF, which is unlikely to match unless we're very unlucky (if we are though, things will go badly wrong because we'll be resizing the green zone).</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-qQEgJW6K8ss/YIVnZovGojI/AAAAAAAABUg/OBIWc0h5B3sAWxnFV32LyBb74ECJKBUPACLcBGAsYHQ/s600/113_no_german.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-qQEgJW6K8ss/YIVnZovGojI/AAAAAAAABUg/OBIWc0h5B3sAWxnFV32LyBb74ECJKBUPACLcBGAsYHQ/s320/113_no_german.png" /></a></div><br /><p>In Part 1 I pointed out that we can see a faint shadow of the desired cow behind the endo-cow hybrid. Let's see if we can recover the cow. If you look at the start of <span style="font-family: courier;">bmu</span>, you'll see RNA early on to put one opaque and 9 transparent into the bucket. 
What if we change all the transparent to opaque?</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-fB6yFIP0fxc/YIgsWhuvPeI/AAAAAAAABUw/2oZc5P5fGLYqD1dACGtDBQ10aEjJwRUFQCLcBGAsYHQ/s600/114_cow_opacity1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-fB6yFIP0fxc/YIgsWhuvPeI/AAAAAAAABUw/2oZc5P5fGLYqD1dACGtDBQ10aEjJwRUFQCLcBGAsYHQ/s320/114_cow_opacity1.png" /></a></div><br /><p>Not quite what we were hoping for. You'll notice that after the transparency RNA there is a fill; so the entire layer is now solid black. The reason only a cow-shaped region appears black is that the cow is used to clip this layer. What we actually want is to compose the cow onto the scene. There are a few ways to fix this. One is to change the clip RNA into compose, and instead of changing the transparent commands to opaque, change the one opaque command to transparent (if the opaque command is left in, the whole scene ends up a bit too dark).</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-6KNDAhijknM/YIguSqeTeFI/AAAAAAAABU4/c_ofjHsbahw3hFxH_whNtRWfoZusr9xfwCLcBGAsYHQ/s600/115_cow_opacity2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-6KNDAhijknM/YIguSqeTeFI/AAAAAAAABU4/c_ofjHsbahw3hFxH_whNtRWfoZusr9xfwCLcBGAsYHQ/s320/115_cow_opacity2.png" /></a></div><br />We'd better get rid of the unwanted <span style="font-family: courier;">endocow</span>, which we can do the same way we've dealt with other unwanted scene elements.<p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-GVc_mTgQXw0/YIguoMom-dI/AAAAAAAABVA/pU6Ex5o9ntQEZp_05_7sE5Msm1ML0YpZQCLcBGAsYHQ/s600/116_cow_opacity3.png" style="margin-left: 
1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-GVc_mTgQXw0/YIguoMom-dI/AAAAAAAABVA/pU6Ex5o9ntQEZp_05_7sE5Msm1ML0YpZQCLcBGAsYHQ/s320/116_cow_opacity3.png" /></a></div><p>His tail is missing, because I forgot to decrypt it. In this case, the code actually runs an integrity check on the tail (as well as on <span style="font-family: courier;">cow-spot-middle</span>) and skips it if the integrity check fails. Recall from Part 1 that it is encrypted with 9546.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-IoSqeYQNAJg/YIgvcmqfJjI/AAAAAAAABVI/P931F_fUrG0y8aRMbL1C-HGN_UlDzMgqgCLcBGAsYHQ/s600/117_cow_tail.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-IoSqeYQNAJg/YIgvcmqfJjI/AAAAAAAABVI/P931F_fUrG0y8aRMbL1C-HGN_UlDzMgqgCLcBGAsYHQ/s320/117_cow_tail.png" /></a></div><p></p><p>At first glance it looks good, but the colour is wrong. In fact, it's completely opaque, whereas in the target image it is translucent, and you can see the grass through it. We're going to need to patch it to insert some opaque and transparent commands to get the colour right. We'll need to determine the correct number. </p><p>One complication is that the entire scene is overlaid by a fine grid of almost-transparent lines, which is produced by the function <span style="font-family: courier;">anticompressant</span> (as the name suggests, the purpose is to make it more difficult to brute-force the problem by generating the image directly with RNA). While not strictly necessary, it'll be easier to solve colour problems if we don't have to worry about it. For the source picture we can use one of the techniques already discussed to disable the function, but what about the target picture? 
To fix that, we'll start by finding the image that <span style="font-family: courier;">anticompressant</span> overlays. We can do that by getting our RNA-to-image tool to spit out the first image (including the alpha channel) each time composition is done. It's easy to mistake the anticompressant pattern for a black image because it is so faint. With the levels adjusted in an image editor, the colour and alpha components look like this:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-VD3QV2pFc2M/YIhN02ch00I/AAAAAAAABVQ/LZ5Rv0qZNwgILCv-bGzMKtM9N5MwBme3QCLcBGAsYHQ/s600/anticompressant-rgb-bright.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-VD3QV2pFc2M/YIhN02ch00I/AAAAAAAABVQ/LZ5Rv0qZNwgILCv-bGzMKtM9N5MwBme3QCLcBGAsYHQ/s320/anticompressant-rgb-bright.png" /></a><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-pgiuVouhrKI/YIhN2LnUF7I/AAAAAAAABVU/qSTfrpgVAAgrYz0lazyxb-ScLMJY_g9sACLcBGAsYHQ/s600/anticompressant-alpha-bright.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-pgiuVouhrKI/YIhN2LnUF7I/AAAAAAAABVU/qSTfrpgVAAgrYz0lazyxb-ScLMJY_g9sACLcBGAsYHQ/s320/anticompressant-alpha-bright.png" /></a></div></div><br /><p>Now using the equation for composing images from the specification, you can reverse the process. There is a rounding step which loses information, but because the overlay is so faint, in most cases the colour value can be recovered exactly, and in other cases there are only two possibilities.</p><p>So returning to the problem of the tail: let's choose a pixel and check its colour when the tail is absent (e.g. 
it wasn't decrypted), when it's present and fully opaque, and in the target picture; in each case with the anti-compressant absent or reversed. I got the following at (454, 348):</p><ul style="text-align: left;"><li>(0, 100, 0) when absent</li><li>(119, 166, 219) when opaque (and indeed on all pixels of the tail)</li><li>(83, 145, 152) in the target</li></ul><p>Let's say we now add some alpha RNA to the mix, so that the tail has an alpha value of <i>a</i>. The colour will then be (119<i>a</i>/255, 166<i>a</i>/255, 219<i>a</i>/255, <i>a</i>), where division rounds down. Just looking at the red component (to which the background contributes nothing), this tells us that 83*255 ≤ 119<i>a</i> < 84*255, and similarly from the blue component, 152*255 ≤ 219<i>a</i> < 153*255. The only integer value for <i>a</i> that falls into both intervals is 178. 178/255 = 0.6980, which looks pretty close to 0.7, and indeed creating 70% opacity (by issuing opaque 7 times and transparent 3 times) will produce an alpha of 178.</p><p>That tells us what RNA to add, but how do we do it? There isn't room in the function to add them directly, but we can do some compression. For example, if one replaces some RNA with a rule like (![30]) -> (0)(0)(0) it will repeat the following three pieces of RNA three times. So as long as we write a rule that is a multiple of 10 bases long, we can overwrite some of the existing RNA, and generate our new RNA plus the RNA we overwrite. I'll leave the details as an exercise for the reader (if you get stuck, there is a solution listed <a href="https://jochen-hoenicke.de/icfp07/prefix.html">here</a>).</p><p>Unfortunately when you do this, the tail disappears again! Now the integrity check is working against us: because we modified the function, it fails the integrity check. 
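</p><p>Before dealing with the integrity check, the alpha value derived above can be double-checked with a brute-force search over all 256 candidates. This is a minimal sketch: the pixel values are the ones quoted for (454, 348), the helper names are my own, and the <span style="font-family: courier;">composed</span> step assumes the compose rule adds the premultiplied tail colour to the background scaled by (255 - alpha)/255 with rounding down.</p>

```python
# Pixel values quoted in the post for (454, 348).
BACKGROUND = (0, 100, 0)    # tail absent
OPAQUE = (119, 166, 219)    # tail present and fully opaque
TARGET = (83, 145, 152)     # target picture


def candidate_alphas(opaque_value, target_value):
    """Alphas a in 0..255 for which floor(opaque_value * a / 255) equals target_value."""
    return {a for a in range(256) if opaque_value * a // 255 == target_value}


def bucket_alpha(opaque_count, transparent_count):
    """Average alpha of a bucket holding only opaque (255) and transparent (0) entries."""
    return 255 * opaque_count // (opaque_count + transparent_count)


def composed(channel, alpha):
    """Assumed compose rule: premultiplied tail colour over the background pixel."""
    tail = OPAQUE[channel] * alpha // 255
    return tail + BACKGROUND[channel] * (255 - alpha) // 255


# Red and blue are zero in the background, so they constrain alpha directly.
red = candidate_alphas(OPAQUE[0], TARGET[0])
blue = candidate_alphas(OPAQUE[2], TARGET[2])
alphas = red & blue

print(sorted(red), sorted(blue), sorted(alphas))
print(bucket_alpha(7, 3))
print([composed(channel, 178) for channel in range(3)])
```

<p>The green channel, which does pick up a contribution from the grass, serves as an independent check on the result.</p><p>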
We can hack <span style="font-family: courier;">checkIntegrity</span> to always return true by disabling the jump at offset 0x611, which then gives us the cow with the correct translucent tail:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-vtqKqiz93TI/YIhVpvzwoDI/AAAAAAAABVc/U2AKYCRNFyU0t8-nUDhat-sHRIH9FhkfQCLcBGAsYHQ/s600/118_cow_tail_opacity.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-vtqKqiz93TI/YIhVpvzwoDI/AAAAAAAABVc/U2AKYCRNFyU0t8-nUDhat-sHRIH9FhkfQCLcBGAsYHQ/s320/118_cow_tail_opacity.png" /></a></div><br /><p>What about the whale? When we simply called <span style="font-family: courier;">whale</span> it looked alive, but in our picture it looks dead. So possibly it examines some state to determine whether it is alive or dead? Indeed, the code has an IF statement (following the pattern we've seen before), and we can disable it in the usual way (replacing the high bit of a jump with a P in the instruction at offset 0xcb1). Incidentally, it seems like the function takes two boolean arguments but only uses one of them; I don't know why.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-10ppjrC-8Io/YIhY-mWkKSI/AAAAAAAABVk/Eqi9-W-9neohEiqIg1NGJsbqueo_dxXLwCLcBGAsYHQ/s600/119_whale_live.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-10ppjrC-8Io/YIhY-mWkKSI/AAAAAAAABVk/Eqi9-W-9neohEiqIg1NGJsbqueo_dxXLwCLcBGAsYHQ/s320/119_whale_live.png" /></a></div><p>While we're dealing with animals, what about the ducks? If you disassemble scenario you'll see that there is a call to <span style="font-family: courier;">motherDuckWithChicks</span>, so where are they? 
Look a little further and you'll realise that it's unreachable code: a boolean is set to true, and then if that same boolean is true, we jump over the code. Break the goto in the usual way, and the ducks appear — although one seems to become lost in the trees.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-gtmeN6AGmlk/YIhaoiheV0I/AAAAAAAABVs/_ioSKNSjIUQygM78iLhmTdarv8wO8b8hwCLcBGAsYHQ/s600/120_ducks.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-gtmeN6AGmlk/YIhaoiheV0I/AAAAAAAABVs/_ioSKNSjIUQygM78iLhmTdarv8wO8b8hwCLcBGAsYHQ/s320/120_ducks.png" /></a></div><p>What's not immediately obvious, but far more serious, is that half the elements in the scene have shifted two pixels to the left! You might remember from part 1 that <span style="font-family: courier;">motherDuck</span> failed the integrity check, so there is probably an issue somewhere there that causes the cursor to end up in the wrong position.</p><p>Page 8 of the field repair guide describes how polygons are encoded, and says that the last pair is the sum of all movements. What if it isn't? It will lead to exactly this sort of problem. Polygons are easy to recognise in the DNA with a regex (remember that they appear quoted), so we can find the polygons and check them, and indeed one of the polygons in <span style="font-family: courier;">motherDuck</span> is 2 pixels off. Finding which offset is wrong takes a little more work. I just used brute force: try adjusting every X value by 2 pixels, render them all, and check which matches the target (it turns out to be the 63 numbers from the start of the polygon). 
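</p><p>The polygon check and the brute-force search can be sketched as follows. This is only an illustration: it assumes a polygon has already been parsed out of the DNA into a list of (dx, dy) movements plus the closing pair, and both function names are hypothetical. Rendering each candidate and comparing it with the target is left to whatever RNA-to-image tool you have.</p>

```python
def closure_error(moves, closing_pair):
    """How far the stored closing pair is from the sum of all movements."""
    sum_x = sum(dx for dx, _ in moves)
    sum_y = sum(dy for _, dy in moves)
    return (closing_pair[0] - sum_x, closing_pair[1] - sum_y)


def x_patch_candidates(moves, error_x):
    """Yield (index, patched_moves), adjusting one X movement at a time by error_x.

    Every candidate satisfies the closing-pair rule again; only one of them
    should reproduce the target picture once rendered.
    """
    for index, (dx, dy) in enumerate(moves):
        patched = list(moves)
        patched[index] = (dx + error_x, dy)
        yield index, patched


# Made-up polygon whose closing pair is 2 pixels off in X.
moves = [(3, 0), (0, 4), (-1, -4)]
closing_pair = (4, 0)

error = closure_error(moves, closing_pair)
candidates = list(x_patch_candidates(moves, error[0]))
print(error, len(candidates))
```

<p>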
Visually not much has changed, but everything is back in the right place:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Svtc4gD0-7g/YIhciKcvh5I/AAAAAAAABV0/l67KV-kUfMIQY7mVSVXBTf7s-2s6atIiACLcBGAsYHQ/s600/121_fix_motherDuck.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Svtc4gD0-7g/YIhciKcvh5I/AAAAAAAABV0/l67KV-kUfMIQY7mVSVXBTf7s-2s6atIiACLcBGAsYHQ/s320/121_fix_motherDuck.png" /></a></div><p>Next let's sort out the text at the bottom. It's going to be a little trickier than just replacing the text in the DNA, because the replacement text is longer. However, when everything went German, the text changed to "Endo hat gemorpht" which is exactly the right length for our replacement. So perhaps we can use just that bit of the code instead. Here's the relevant area of code:</p><p>0x9444 <blue+0x4b0>[:9216] = fontTable_Cyperus[:9216]<br />0x94f0 <blue+0x498> := 285<br />0x953c <blue+0x480> := 570<br />0x9588 <blue+0x000>[:162] := "Endo hat gemorpht"<br />0x968d CALL drawString<br />0x9746 GOTO 0x9c16<br />0x9767 colorTable := ...<br />0x98a2 charColorCallback := useColorTable,6709<br />0x9908 PUSHS 10416<br />0x994f <blue+0x4b0>[:9216] = fontTable_Cyperus[:9216]<br />0x99fb <blue+0x498> := 285<br />0x9a47 <blue+0x480> := 570<br />0x9a93 <blue+0x000>[:108] := "Morph Endo!"<br />0x9b5d CALL drawString<br />0x9c16 RNA: compose<br /></p><p>Immediately after the PUSHS at 0x9908, we'll make a (backwards) jump to 0x9444, which will print the longer piece of text (and which will then safely jump forwards to 0x9c16). What does a backwards jump look like? Unlike a forward jump, the target code doesn't exist in the red zone, so we have to restore it from the green zone. We copy the code from the jump target to just after the jump instruction. 
As for forward jumps, we can use two passes to solve the problem of not knowing where the jump instruction ends until after we've compiled it. In my implementation, it ends up looking like this:<br /></p><p>(![5074344](![1358])) -> (0)(1) <br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-3u9tr2mElp0/YIxQ8m2HVUI/AAAAAAAABWI/jBwzqzsHm48_HdoQZF6nU-RmiJS4WrXIACLcBGAsYHQ/s600/122_morphed.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-3u9tr2mElp0/YIxQ8m2HVUI/AAAAAAAABWI/jBwzqzsHm48_HdoQZF6nU-RmiJS4WrXIACLcBGAsYHQ/s320/122_morphed.png" /></a></div><br />Let's get the sun back into the sky. We saw in Part 1 that we could see it, in the right place but the wrong shade of yellow, by setting weather to 3. From the target picture (particularly with the anticompressant removed) one can see that the sun is the same shade as the flowers in the bottom-right. Disassembling <span style="font-family: courier;">flowerbed</span> shows that colour to be <span style="font-family: courier;">colorSoftYellow</span>. So we need to somehow call that before drawing the sun. And as luck would have it, there is some code we don't need or want in <span style="font-family: courier;">sky-day-bodies</span> just before drawing the sun, which is to check the value of <span style="font-family: courier;">weather</span>!<p></p><p>0x072c IF weather != 3 GOTO 0x0aae<br />0x07ac PUSHS 48<br />0x07eb <blue+0x018> := 480<br />0x0837 <blue+0x000> := 20<br />0x0883 CALL setOrigin<br />0x093c CALL sun<br />0x09f5 CALL resetOrigin<br />0x0aae RNA: compose<br /></p><p>There is plenty of room in that weather check to overwrite it with a function call, but we can also keep the prefix length down by not calling the whole function and instead just copying the RNA (which saves having to jump to the blue zone and write a return address). 
The recipe for this looks exactly like the backwards jump from above: we're copying code from the green zone to the front of the red zone, without disturbing the remaining code of the current function. Immediately after that we have to insert a forward jump to 0x07ac to skip over the remnants of the original code.<br /></p><p>Unfortunately on its own this won't fix the colour of the sun, because sun starts by setting the colour:</p><p>0x000a RNA: reset<br />0x0014 RNA: yellow<br /></p><p>We can fix this either by modifying the call to <span style="font-family: courier;">sun</span> to enter after the unwanted RNA, or just alter the RNA to become nonsense RNA that will be ignored. With that in place (and don't forget to replace the whole <span style="font-family: courier;">sun</span> function with the XOR of <span style="font-family: courier;">flower</span> and <span style="font-family: courier;">sunflower</span> first):</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-eSTmlnfTaz0/YIxbO54mDgI/AAAAAAAABWQ/UHzB_h4bxlMdIoS49iZsMIfeoU4yOG9vwCLcBGAsYHQ/s600/123_sun.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-eSTmlnfTaz0/YIxbO54mDgI/AAAAAAAABWQ/UHzB_h4bxlMdIoS49iZsMIfeoU4yOG9vwCLcBGAsYHQ/s320/123_sun.png" /></a></div><p></p><p>Next let's sort out the spirograph at the centre of the sun. As we saw, the code for it is in main, but we need a way to call it. A powerful technique is to patch the return addresses of function calls: this allows you to resume execution anywhere, even in another function, rather than with the next instruction. We're going to want to be able to control the position, so it'll be useful to get a call to <span style="font-family: courier;">setOrigin</span> before executing the desired code. 
As it happens, <span style="font-family: courier;">crater</span> has a call to <span style="font-family: courier;">setOrigin</span> very early on. So instead of disabling the call to <span style="font-family: courier;">crater</span>, let's leave it enabled and hijack it to draw the spirograph.</p><p>Change the return address at <span style="font-family: courier;">crater+0xf5</span> to <span style="font-family: courier;">main+0x6f9a</span>, which starts drawing the central spirograph. It's completed at main+0x866d, which has a <i>compose</i> RNA to balance the <i>add</i> RNA from the start of crater. We still need a <span style="font-family: courier;">resetOrigin</span> to balance the <span style="font-family: courier;">setOrigin</span> added by <span style="font-family: courier;">crater</span>. So add a forward jump to <span style="font-family: courier;">main+0x8ae7</span> (which is a call to <span style="font-family: courier;">resetOrigin</span>).</p><p>After that <span style="font-family: courier;">resetOrigin</span> call, we don't want to run the rest of the code in main. We could add another forward jump from after it to the end, or change the return address, but there is another trick we can use that requires a smaller change: we can remove the return address entirely, turning this into a tail call so that the callee returns directly to the caller. This isn't completely trivial: if we just terminate the template early then the skips in the call instruction will go to the wrong place (because we'll have consumed less of the red zone than expected). So we have to remove the return address without changing the length of the rule. We can do that by replacing the first three bases of the return address with IPP and the last with P. This turns this part of the template into (n,0), where n is some very large number. 
The specification says that such substitutions are simply ignored.</p><p>That gives us this (don't forget to remove the code that disabled the crater):</p><p><br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-8mNlBDYt77s/YI048avEk2I/AAAAAAAABWg/n94ZlNyFTncWiEL3S5NZzw2QRteLj0wMQCLcBGAsYHQ/s600/124_spirograph.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-8mNlBDYt77s/YI048avEk2I/AAAAAAAABWg/n94ZlNyFTncWiEL3S5NZzw2QRteLj0wMQCLcBGAsYHQ/s320/124_spirograph.png" /></a></div><br />Clearly both the size and position will need to be fixed. We'll sort out a whole bunch of positions later, but let's look at the size. The first argument to <span style="font-family: courier;">spirograph</span> is called <span style="font-family: courier;">magnify</span>, which is a bit of a clue. One option is just to change it from 4 to 1 each time <span style="font-family: courier;">spirograph</span> is called. If you want to make a smaller patch, one can look inside <span style="font-family: courier;">spirograph</span>. It multiplies the next two arguments (radiusSum and radiusMoving) by <span style="font-family: courier;">magnify</span>, writing the results back in place. We'd like to disable that code. The writeback looks like this:<p></p><p>(![7519442])(![24]) -> (0)<br />+ (![7519707])![24] -> (0)<t:(1,1)><br /></p><p>The first line fetches and removes the return value left by <span style="font-family: courier;">mulInts</span>, and the second line uses it to replace an existing value. We can't disable the whole rule (remember, the + indicates that the second line is really a rule emitted by the template on the first line), but if we disable the second line it will have the desired effect. 
We can do this in our usual way (writing a P just before the end of the number in a jump), except that this time everything is quoted by an extra level, so we write IC instead.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-YWhTE_Rm0-c/YI0_myAI7FI/AAAAAAAABWo/VEKXaKywD3M6jWUwMZ52hu9cf46Z2OCbQCLcBGAsYHQ/s600/125_spirograph_size.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-YWhTE_Rm0-c/YI0_myAI7FI/AAAAAAAABWo/VEKXaKywD3M6jWUwMZ52hu9cf46Z2OCbQCLcBGAsYHQ/s320/125_spirograph_size.png" /></a></div><br />While we're calling into odd bits of code to draw missing picture elements, let's sort out the whale spout. We can again pick a function that we suppressed earlier and repurpose it. We have to be a little careful in our choice though, because it needs to be drawn before the whale (which overwrites part of it). I chose to use <span style="font-family: courier;">lambda-id</span>. For this part I haven't tried to be optimal, just to get something working. Replace the start of <span style="font-family: courier;">lambda-id</span> with a tail call to <span style="font-family: courier;">printGeneTable+0x16ef5</span> (which sets the location before drawing the spout). At <span style="font-family: courier;">printGeneTable+0x23c62</span> place a jump to <span style="font-family: courier;">printGeneTable+0x271dc</span> (the return statement). The return statement contains a ![1] to pop the boolean argument from the stack; change it to ![0] (without changing the length). In the RNA compression table, swap the entries for clockwise and counter-clockwise rotation. 
Don't forget to re-enable the call to <span style="font-family: courier;">lambda-id</span>.<br /><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-qTJ7mGPR5Zg/YI1FJvrV77I/AAAAAAAABW4/0ihLxsFsL6w46ijPDsIANQZQJU73iVM2wCLcBGAsYHQ/s600/126_whale_spout1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-qTJ7mGPR5Zg/YI1FJvrV77I/AAAAAAAABW4/0ihLxsFsL6w46ijPDsIANQZQJU73iVM2wCLcBGAsYHQ/s320/126_whale_spout1.png" /></a></div><br /><p>Well that went badly wrong. I'm not sure exactly why, but when objects wrap around (particularly to negative positions) it seems to confuse the code that keeps track of the current position. It's probably time to fix a whole bunch of positions. This is tedious work: find the code that calls <span style="font-family: courier;">moveTo</span> or <span style="font-family: courier;">setOrigin</span>, check pixel positions in an image editor, figure out how much to adjust the position by, and make the patch. In some cases one needs to provide a negative position to <span style="font-family: courier;">setOrigin</span> so that the actual drawn position is in range (encoded using 2's complement). 
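</p><p>For the negative positions, here is a minimal sketch of a 2's complement encoder and decoder for little-endian acid strings (I for a 0 bit, C for a 1 bit). The width is an assumption on my part: I've used 23 value acids followed by a terminating P, so check it against the integers in your own disassembly before patching anything. Both function names are hypothetical.</p>

```python
def encode_signed(value, bits=23):
    """Encode value as `bits` little-endian acids in 2's complement, plus a P.

    The default width of 23 value acids is an assumption, not a confirmed fact.
    """
    unsigned = value % 2 ** bits  # wraps negative values into 0 .. 2**bits - 1
    acids = "".join("C" if unsigned // 2 ** i % 2 else "I" for i in range(bits))
    return acids + "P"


def decode_signed(dna, bits=23):
    """Inverse of encode_signed: read `bits` acids and reinterpret the sign bit."""
    unsigned = sum(2 ** i for i in range(bits) if dna[i] == "C")
    half = 2 ** (bits - 1)
    # unsigned // half is 1 exactly when the top (sign) bit is set.
    return unsigned - 2 ** bits if unsigned // half else unsigned


# The crater position from the list in this post uses a negative X value.
encoded = encode_signed(-133)
print(encoded, decode_signed(encoded))
```

<p>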
I'll just provide a list of instruction offsets (for the first of two instructions that set x and y) and the correct values:</p><ul style="text-align: left;"><li>scenario+0x3ba5: (267, 210) (caravan)<br /></li><li>scenario+0x862c: (410, 200) (whale)</li><li>scenario+0x3157: (171, 410) (chick)</li><li>crater+0x005d: (-133, 147) <br /></li><li>printGeneTable+0x16f34: (385, -75) (spout)</li><li>flowerbed+0x0049: (34, 0) (flowers in bottom right)</li><li>flowerbed+0x04c8: (0, 0)</li><li>flowerbed+0x0951: (58, 12)</li><li>flowerbed+0x0dda: (17, 24)</li><li>clouds+0x0049: (20, 25)</li><li>clouds+0x03df: (180, 55)</li><li>clouds+0x077f: (340, 30)</li></ul><p>We also need to fix the sizes of the clouds (in practice, you'd do this before trying to fix the positions):</p><ul style="text-align: left;"><li>clouds+0x56e: 10</li><li>clouds+0x90e: 20 <br /></li></ul><div><p>That makes things look a lot better:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DIa2DuxYF7k/YI1KGNEOFDI/AAAAAAAABXM/Dfa2fK5XWloZZjjgkZRc0mkjRCXcaC0FgCLcBGAsYHQ/s600/127_positions.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-DIa2DuxYF7k/YI1KGNEOFDI/AAAAAAAABXM/Dfa2fK5XWloZZjjgkZRc0mkjRCXcaC0FgCLcBGAsYHQ/s320/127_positions.png" /></a></div><p>What do we still need to fix?</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-IV01My12lQc/YI1KKrq3cBI/AAAAAAAABXQ/yN_uZjfM2BcW0k9bhusM-lyohHUyomGygCLcBGAsYHQ/s600/diff.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-IV01My12lQc/YI1KKrq3cBI/AAAAAAAABXQ/yN_uZjfM2BcW0k9bhusM-lyohHUyomGygCLcBGAsYHQ/s320/diff.png" /></a></div><br />The speech bubble and the swimming pool are pretty clear, but there is also a little bit of red at 
the base of the windmill, because the chick is drawn before the windmill and so the top of its head is clipped. Let's sort that out first. We need to call the code that draws the chick at some point after the windmill. There are probably lots of ways to do this, but I chose one that also lets us eliminate the rain at the same time without a separate patch. At <span style="font-family: courier;">scenario+0xa2b7</span> (the instruction before calling <span style="font-family: courier;">cloak-rain</span>), change the return address to <span style="font-family: courier;">scenario+0x32a8</span> (which is the start of the call to chick). Then after 0x346d, jump forward to 0xa429 (just after the call to <span style="font-family: courier;">cloak-rain</span>). The chick now takes its position from the position set at 0xa21f, so the position needs to be patched there instead of where it was originally patched.<p></p><p>We also need to prevent the chick from being drawn earlier, since then the forward jump will skip right over the windmill (and a lot of other things besides). At 0x273c there is an instruction to set <span style="font-family: courier;">ducksShown</span> to true, which is later checked to decide whether to draw the chick. We can disable that in the same way we've disabled other instructions that set a boolean to true.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-HYmNonFDyyo/YI1V5UhlKHI/AAAAAAAABXc/GWXLqoSMPP0RwYyxmF5OmRpGD2GF-tFFwCLcBGAsYHQ/s600/128_chick_overlay.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-HYmNonFDyyo/YI1V5UhlKHI/AAAAAAAABXc/GWXLqoSMPP0RwYyxmF5OmRpGD2GF-tFFwCLcBGAsYHQ/s320/128_chick_overlay.png" /></a></div><p>Now let's change the λ to a μ. The speech bubble is drawn in the function <span style="font-family: courier;">balloon</span>. 
And the relevant bit of code is</p><p>0x16e4 PUSHS 10416<br />0x172b &lt;blue+0x4b0&gt;[:9216] = fontTable_Tempus-Bold-Huge[:9216]<br />0x17d7 &lt;blue+0x498&gt; := 98<br />0x1823 &lt;blue+0x480&gt; := 20<br />0x186f &lt;blue+0x000&gt;[:18] := "L"<br />0x18d5 CALL drawString<br /></p><p style="text-align: left;">In Part 1 we swapped out the font on help page 10646 to see different fonts, but if you try this with Tempus-Bold-Huge it'll draw the lambda, then (in my implementation) crash out with an integer overflow. I'm left with this:</p><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-3zfofXi6bEY/YI1X-PM7FvI/AAAAAAAABXk/5BTZwWx8QlAfXJ2b2ltLPN12w-cUzZTvwCLcBGAsYHQ/s600/129_tempus_huge.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-3zfofXi6bEY/YI1X-PM7FvI/AAAAAAAABXk/5BTZwWx8QlAfXJ2b2ltLPN12w-cUzZTvwCLcBGAsYHQ/s320/129_tempus_huge.png" /></a></p></div><p style="text-align: left;"></p></div><div style="text-align: left;"><p>So there may be something wrong with this font. The symbol table includes charInfo_Tempus-Bold-Huge_, charInfo_Tempus-Bold-Huge_L and charInfo_Tempus-Bold-Huge_M; maybe one of them is broken. On extracting the DNA from the first two we see that there are no F's and that the P's are generally preceded by either C or P, which suggests they are sequences of variable-length numbers. We were told that the RNA compressor was used for font tables, so this isn't surprising. However, the last one seems almost random, with a mix of all four bases.</p></div><div style="text-align: left;"><p>Random... or encrypted? In Part 1 we found one password that we haven't found a use for yet: no1@Ax3. 
And indeed, if we decrypt charInfo_Tempus-Bold-Huge_M with it, the font table looks healthy again, and we have our μ.</p></div><div style="text-align: left;"><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-0i5FnYa-FFg/YI17V0D5nUI/AAAAAAAABXs/PkgFgvht_98TP-UJE2No7w8uni3SaNjxgCLcBGAsYHQ/s600/129_tempus.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-0i5FnYa-FFg/YI17V0D5nUI/AAAAAAAABXs/PkgFgvht_98TP-UJE2No7w8uni3SaNjxgCLcBGAsYHQ/s320/129_tempus.png" /></a></p></div><p>Now we can change the "L" to an "M" where it is used:</p></div><div style="text-align: left;"><p></p></div><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-gT9f7D_epjA/YI17uPwTQ7I/AAAAAAAABX4/n_MmGT8UNmMSIdpa1PO40nemMbDr3FvOACLcBGAsYHQ/s600/130_mu.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-gT9f7D_epjA/YI17uPwTQ7I/AAAAAAAABX4/n_MmGT8UNmMSIdpa1PO40nemMbDr3FvOACLcBGAsYHQ/s320/130_mu.png" /></a></p></div><p style="text-align: left;"><br /></p><div style="text-align: left;"><p>Getting the shape of the speech balloon right is much harder. As far as I'm aware, the right shape isn't present in Endo's DNA. One approach is to reverse-engineer the polyline that forms it, and replace the existing polyline. I took a different approach: drawing the outline directly with RNA. However, RNA is rather unwieldy (requiring 10 bases per command), so some compression is in order. We'll start without it though.</p></div><div style="text-align: left;"><p>Firstly we'll extract the shape of the balloon. Start at some arbitrary point inside the balloon, and do a flood-fill. The inside isn't quite a uniform colour, so match anything that's close enough to gray, e.g. 
difference between min and max channel is at most 50.</p></div><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-wbcU0e35mhM/YI2AuXJtw_I/AAAAAAAABYE/i19bvlp_uxAyXM6dfADZJE-uVHhrI-fygCLcBGAsYHQ/s600/131_balloon_inside.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-wbcU0e35mhM/YI2AuXJtw_I/AAAAAAAABYE/i19bvlp_uxAyXM6dfADZJE-uVHhrI-fygCLcBGAsYHQ/s320/131_balloon_inside.png" /></a></p></div><p style="text-align: left;">Next, identify just the border pixels: those that are inside the balloon, but share an edge with the outside.<br /></p><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-aWeHwLu7fk0/YI2AubjpjRI/AAAAAAAABYI/tDQ9vN4YsNAk5ofSPAFvu-lIfeOGBjWbgCLcBGAsYHQ/s600/132_balloon_border.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-aWeHwLu7fk0/YI2AubjpjRI/AAAAAAAABYI/tDQ9vN4YsNAk5ofSPAFvu-lIfeOGBjWbgCLcBGAsYHQ/s320/132_balloon_border.png" /></a></p></div><p style="text-align: left;"><br /></p><div style="text-align: left;"><p>These pixels can be arranged in a linear order with each adjacent (possibly diagonally) to the previous one. In most cases it's obvious which pixel to go to next, but a little care is needed at the bottom corner where the path loops back on itself. We can trace out this shape using RNA, using the commands to move forward and turn left and right. Before leaving a pixel, issue a mark command, and after arriving at the next pixel, issue a line command. 
It doesn't really matter where you start (we'll add a <span style="font-family: courier;">moveTo</span> call beforehand to get us there), but it's important to end facing east (which is also the starting direction) and to end in the same place you started, as otherwise the higher-level code will be confused about the current location.</p></div><div style="text-align: left;"><p>Once we've done some compression we'll be able to patch the balloon function in place, but for now we don't have enough room. We'll pick some other function we don't need and overwrite it (I used <span style="font-family: courier;">contest-2005</span>). Overwrite it with a call to <span style="font-family: courier;">moveTo</span> (use 0, 0 as coordinates for now), then the RNA, then a return that pops 24529 from the stack (look at the end of <span style="font-family: courier;">drawPolyline</span> to see what that looks like). Now in <span style="font-family: courier;">balloon</span>, replace the call to <span style="font-family: courier;">drawPolyline</span> with a call to the replacement function. 
The result will depend on which point on the boundary you picked as a starting point; I used the left-most point on the top row.</p></div><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-Av42CmB-M2I/YI2dZ9pI-BI/AAAAAAAABYU/YAr5eiMtdzIoUJW6YLwCMx8VLEMqO_WigCLcBGAsYHQ/s600/133_balloon_rna.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Av42CmB-M2I/YI2dZ9pI-BI/AAAAAAAABYU/YAr5eiMtdzIoUJW6YLwCMx8VLEMqO_WigCLcBGAsYHQ/s320/133_balloon_rna.png" /></a></p></div><p style="text-align: left;"></p><div style="text-align: left;"><p>Definitely not a complete success, but we can see the top-right of the balloon (if it looks a bit out-of-shape, that's just because you're seeing the sail of the windmill through the hole), and it tells us a bit about how it is being drawn. It's a little hard to see, but that white rectangle isn't a uniform colour; it has a gradient to it. The balloon shape is then used as a mask on this gradient rectangle.</p></div><div style="text-align: left;"><p>We can see this more clearly if we apply a transformation to the colours to make gradients more apparent. I wrote a small tool that multiplies R by 11, G by 17 and B by 29 (all wrapping modulo 256). 
The anticompressant really interferes with this, so here's the result for the image above but with the anticompressant disabled.</p></div><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-JrlXcfM92M4/YI2fVIo9ErI/AAAAAAAABYc/1N1ILTn9LX8RFN5twowLt0q3mKu02FrGgCLcBGAsYHQ/s600/134_balloon_rna-no-ac-gradient.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-JrlXcfM92M4/YI2fVIo9ErI/AAAAAAAABYc/1N1ILTn9LX8RFN5twowLt0q3mKu02FrGgCLcBGAsYHQ/s320/134_balloon_rna-no-ac-gradient.png" /></a></p></div><p style="text-align: left;"></p><div style="text-align: left;"><p>And the target, with the anticompressant reversed:</p></div><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/-6EsBegGToKo/YI2fdeGtQ7I/AAAAAAAABYg/ic_0cNpT2J4qWtmUnmCPVv6dRZ3giez_wCLcBGAsYHQ/s600/target-gradient.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-6EsBegGToKo/YI2fdeGtQ7I/AAAAAAAABYg/ic_0cNpT2J4qWtmUnmCPVv6dRZ3giez_wCLcBGAsYHQ/s320/target-gradient.png" /></a></p></div><p style="text-align: left;"></p><div style="text-align: left;"><p>Since we can only see a small part of the gradient in the target it's not that easy to determine where the rectangle should move to. What's more, if you try to line them up you'll find that they seem to have different slopes. It's worth looking at how the gradient is generated. <span style="font-family: courier;">balloon</span> calls <span style="font-family: courier;">drawGradientCornerNW</span>. Before doing so, it sets a special flag <span style="font-family: courier;">colorReset</span> to false. 
This prevents the colour callback (in this case, <span style="font-family: courier;">colorWhite</span>) from resetting the bucket before adding a colour. So as <span style="font-family: courier;">drawGradientCornerNW</span> proceeds, it will keep adding more white to the bucket, gradually changing the overall colour. As the colour bucket gets fuller, each new addition has less relative impact, which is why the gradient is strongest at the top-right and gets gentler towards the bottom left.</p></div><div style="text-align: left;"><p><span style="font-family: courier;">drawGradientCornerNW</span> works by setting a mark at the current position (the NW corner), moving to the NE corner, then stepping down the east edge, drawing a line to the mark at each pixel, before repeating the process along the bottom edge. On the target we can get an estimate of where the NW corner is by extending the lines of colour in the image to their intersection, and we can estimate how wide the rectangle is by looking at the slopes of the lines and comparing them to the slopes in our generated image. Getting the height is trickier, because we only get a little information from the stem of the balloon, but we know it has to extend at least to the bottom of the balloon and can just try multiple options from there. It turns out that the rectangle is exactly the bounding box of the balloon, which is also something you might have guessed and tried.</p><p>So, we're going to make a few changes:</p><ul style="text-align: left;"><li>Change the coordinates of our call to <span style="font-family: courier;">moveTo</span> to 67, 0, to move the bubble to the correct position relative to the box.</li><li>Change the coordinates of the <span style="font-family: courier;">setOrigin</span> call at <span style="font-family: courier;">scenario+0x900b</span> to 198, 324 to place the NW corner of the box correctly. 
<br /></li><li>Change the arguments to <span style="font-family: courier;">drawGradientCornerNW</span> in balloon to 101, 169 to set the size of the box.</li><li>Change the coordinates of the <span style="font-family: courier;">moveTo</span> at <span style="font-family: courier;">balloon+0x13ca</span> to 74, 37. This is the point from which the balloon is filled, and we need it to be somewhere inside the balloon. There are lots of choices, but this one requires changing only a single bit.</li></ul><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-pJfOpiROypE/YI2mKprdngI/AAAAAAAABYs/BI_adfOSM1cHauUVlQhvv41g6248ypMJQCLcBGAsYHQ/s600/135_balloon.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-pJfOpiROypE/YI2mKprdngI/AAAAAAAABYs/BI_adfOSM1cHauUVlQhvv41g6248ypMJQCLcBGAsYHQ/s320/135_balloon.png" /></a></div><p style="text-align: left;">Nearly there! We just need to shift the μ to the right place. At <span style="font-family: courier;">balloon+0x17d7</span>, change the coordinates to 56, 2.</p></div><div class="separator" style="clear: both; text-align: center;"><p><a href="https://1.bp.blogspot.com/--f6erG7EYNc/YI2p5u6XcII/AAAAAAAABY0/oMrIbgYRimU9EMxE2gjp-7hkbelZkozJACLcBGAsYHQ/s600/136_balloon_mu.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/--f6erG7EYNc/YI2p5u6XcII/AAAAAAAABY0/oMrIbgYRimU9EMxE2gjp-7hkbelZkozJACLcBGAsYHQ/s320/136_balloon_mu.png" /></a></p></div><p style="text-align: left;">That just leaves the swimming pool for the whale. Once again, we can repurpose a function that we previously suppressed. This time it needs to occur after the whale, so we'll use <span style="font-family: courier;">endocow</span>. 
In Part 1 we saw something similar to what we need, after setting the weather to 2 but before activating the VMU. Here it is again:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-jDXxwJOhfWE/YI6cyRSy_AI/AAAAAAAABY8/nmz9Hl8kc9MRQYR2-boqjV4THVPWqqrKgCLcBGAsYHQ/s600/033_weather_2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-jDXxwJOhfWE/YI6cyRSy_AI/AAAAAAAABY8/nmz9Hl8kc9MRQYR2-boqjV4THVPWqqrKgCLcBGAsYHQ/s320/033_weather_2.png" /></a></div><p></p><p>Notice that the cupola of the UFO is half-filled with water. It's also the shape we need, but we need it upside down. Let's see if we can get just that part into our scene, without trying to flip it over.</p><p>As usual, we'll rewrite a return address in <span style="font-family: courier;">endocow</span> so that the return jumps into the code we actually want. <span style="font-family: courier;">endocow</span> starts with a call to <span style="font-family: courier;">setOrigin</span>, so we'll change the return address of that to jump to <span style="font-family: courier;">ufo+0x0a08</span>, which calls <span style="font-family: courier;">water</span>. For now we'll also change the coordinates in the <span style="font-family: courier;">setOrigin</span> call in <span style="font-family: courier;">endocow</span> to 55, 42, to match those that would have been used for the water if we'd run through <span style="font-family: courier;">ufo</span> from the top.<br /></p><p>The code that follows in <span style="font-family: courier;">ufo</span> needs a bit of explanation: it first draws the shape of the cup (inline) and uses it to clip the water to the right shape. It then draws the shape again (again inline, rather than by calling <span style="font-family: courier;">ufo-cup</span>), partially transparent, and composes it. 
That code ends at 0x1c30 with a jump to 0x1f53. We don't want to run any of the subsequent code, so replace the instruction at 0x1f53 with a jump to the return statement (at 0x2c77).</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-UKMXaQxFX9Y/YI6fdhUl0TI/AAAAAAAABZE/r_8lpjmwL9U1lTMVYRWT8ID0ThzdstXzACLcBGAsYHQ/s600/137_cup1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-UKMXaQxFX9Y/YI6fdhUl0TI/AAAAAAAABZE/r_8lpjmwL9U1lTMVYRWT8ID0ThzdstXzACLcBGAsYHQ/s320/137_cup1.png" /></a></div><p>The cup is visible (between the balloon and the windmill), but the cow has disappeared. This happened because by jumping into the middle of <span style="font-family: courier;">ufo</span>, we've messed up the image layers. If you step through the RNA a piece at a time, you can see the water being drawn on the same layer as the cow, and when it is clipped, the cow is clipped away. So we need to add an extra layer. Fortunately, <span style="font-family: courier;">endocow</span> starts with a piece of junk RNA to identify the function, which we can change to an <span style="font-family: courier;">add</span> RNA.</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-GvwmRKDur7k/YI6gKCmcr4I/AAAAAAAABZM/HZ1qnmxure4SYyEOJtlm6hDfL32TmU4YQCLcBGAsYHQ/s600/138_cup2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-GvwmRKDur7k/YI6gKCmcr4I/AAAAAAAABZM/HZ1qnmxure4SYyEOJtlm6hDfL32TmU4YQCLcBGAsYHQ/s320/138_cup2.png" /></a></div><p></p><p>Now let's flip the cup upside down. We'll need to wedge in two turns between drawing the water and drawing the first instance of the cup. 
We can overwrite the CFPICFP marker RNA in the return of <span style="font-family: courier;">water</span>. For the polyline used for clipping, the colour doesn't matter, so we can replace the white RNA at <span style="font-family: courier;">ufo+0x0b8e</span> with a second turn. We also need to restore the original orientation afterwards. Since we added a jump at <span style="font-family: courier;">ufo+0x1f53</span>, we can just insert two turn RNAs at the start of that jump.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Y-c0IRfN7Bw/YI6jeKDWdTI/AAAAAAAABZU/GJa7BNpofl4A96HJJO5w7TgwPal6N9AnACLcBGAsYHQ/s600/139_cup3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Y-c0IRfN7Bw/YI6jeKDWdTI/AAAAAAAABZU/GJa7BNpofl4A96HJJO5w7TgwPal6N9AnACLcBGAsYHQ/s320/139_cup3.png" /></a></div><br /><p>That's flipped the cup upside-down. The water now appears to be filling the top instead of the bottom. That's because the water is just a polyline, and we're now seeing the bottom edge instead of the top. More seriously, of course, we've messed up the position tracking, so a number of elements of the scene are now in the wrong place. This is not unexpected when we flip the direction under the hood.</p><p>One way to fix this is to make sure that the movements done while flipped bring us back to where we started. The last movement before the flip is a call to <span style="font-family: courier;">moveTo</span> in <span style="font-family: courier;">water</span>; the last before we flip back is another call to <span style="font-family: courier;">moveTo</span> in <span style="font-family: courier;">ufo</span>. 
We don't need to change the coordinates of either: the one in <span style="font-family: courier;">water</span> is called within a <span style="font-family: courier;">setOrigin</span>/<span style="font-family: courier;">resetOrigin</span> pair (one started in <span style="font-family: courier;">endocow</span>), and we can bring them into alignment by adjusting those coordinates. Specifically, the coordinates that we changed to 55, 42 earlier instead need to become 48, 8 for the arithmetic to balance.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-zKHfjpuVQm4/YI6lBFVsBqI/AAAAAAAABZc/ZZxtnRTa8lwSegwIW9fpG-Vaorn3wvLbgCLcBGAsYHQ/s600/140_cup4.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-zKHfjpuVQm4/YI6lBFVsBqI/AAAAAAAABZc/ZZxtnRTa8lwSegwIW9fpG-Vaorn3wvLbgCLcBGAsYHQ/s320/140_cup4.png" /></a></div><p>This has also shifted the water relative to the cup, and we'll need to adjust it further. But haven't we just pinned down the knob we have to adjust these relative positions? There is another knob: polylines have a starting position, encoded as the 2nd and 3rd numbers in the list (refer back to repair guide page 8). That is currently 56, 65; it'll take a little trial and error to get it right (you first need to get it close enough to see some reference points), but these need to be replaced by 62, 72. 
Don't forget to update both copies of the polyline.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-6aP5FaWTn00/YI6mBUuZqRI/AAAAAAAABZk/Ptlf8Ms8Or8VGh0s6ND-iVIEtnSL6hbSwCLcBGAsYHQ/s600/141_cup5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-6aP5FaWTn00/YI6mBUuZqRI/AAAAAAAABZk/Ptlf8Ms8Or8VGh0s6ND-iVIEtnSL6hbSwCLcBGAsYHQ/s320/141_cup5.png" /></a></div><p>There's just one more step! At <span style="font-family: courier;">bmu+0x45ee</span>, we need to change the position from (160, 310) to (372, 257) to place the cup in the right position.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Su5-S1pJKBA/YI6mmUnFgEI/AAAAAAAABZs/dKIfZgAJrGAxXF1750x17Cse_a0r7ayfwCLcBGAsYHQ/s600/142_finished.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Su5-S1pJKBA/YI6mmUnFgEI/AAAAAAAABZs/dKIfZgAJrGAxXF1750x17Cse_a0r7ayfwCLcBGAsYHQ/s320/142_finished.png" /></a></div><p></p><p>If you made it this far, thanks for reading, and if you've actually produced a perfect prefix yourself, congratulations!</p><p>I had originally planned to write a Part 3 which showed how to optimise the prefix, but this series has already become very long and I need a break. I've also incorporated a lot of tricks into Part 2. Jochen Hoenicke's <a href="https://jochen-hoenicke.de/icfp07/prefix.html">page</a> describes his incredibly short prefix, and you can probably learn a few things from it (I certainly did: several ideas presented here are originally his). If I do get back to it one day, the interesting parts would be</p><ul style="text-align: left;"><li>Compressing the RNA for the balloon. 
I've written a decompressor that maps each of I, C, F and P to a short sequence of integers, after which the standard RNA decompressor is used to decompress those into RNA. It works out to 773 bases for the decompressor and data (but requires a separate <span style="font-family: courier;">moveTo</span> call first). If one could find the right tradeoff between complexity of the decompressor and length of the data it might be competitive with Jochen's polygon decompressor.</li><li>Writing a code reverser to turn <span style="font-family: courier;">duolc</span> into <span style="font-family: courier;">cloud</span> and an XORer to fix <span style="font-family: courier;">sun</span>, in DNA.</li><li>Writing a patcher that takes a compact table of patches to apply and runs a loop to apply them.<br /></li></ul><div style="text-align: left;"><p></p></div><p>Disassembling Endo, part 1 (published 2021-04-21, updated 2021-05-03)</p><p>Since it was recently Easter, I decided to revisit what's probably my favourite easter egg hunt / programming contest ever: <a href="https://save-endo.cs.uu.nl/">ICFP 2007</a>. It's a problem where you are given a description of a strange virtual machine together with a large and slightly damaged program for it, and have to modify it to make it produce the right output, with reward for the most minimal modifications. The original contest was 72 hours, but this time around I'm spending a lot more time on it because I really wanted to investigate all the intricacies and solve all the puzzles.</p><p>I'm going to make a walkthrough on this blog since I find it interesting to look back at the chain of clues to see just how much depth the problem authors put into it. 
At the time our team wrote a <a href="https://marco-za.blogspot.com/2007/07/icfp-how-we-reached-top-15.html">blog post</a> which described how far we got, and it has links to a lot of similar writeups, but I'm not aware of a complete walkthrough. The link above has a report from the authors that briefly describes a few of the puzzles, but not everything. Jochen Hoenicke also has a <a href="https://jochen-hoenicke.de/icfp07/prefix.html">page</a> listing a highly optimised solution which perfectly reproduces the target image, but it's not a walkthrough: without having tackled the problem yourself the explanation will not make much sense.<br /></p><p>Like any walkthrough, this blog is going to be <i>full</i> of spoilers. If you want to try the problem yourself, stop reading now and have a go, and come back when you get stuck.</p><p>I plan to write this in 2-3 parts. Part 1 will find most of the hidden clues you need. Part 2 will dive deeper into the internals of the code to understand what it does and how we can write new code to achieve our goals, aiming to finish up by perfectly creating the target scene. If I still have the energy I'll write a Part 3 to look at optimising the prefix to improve Endo's chances of survival.<br /></p><p>Ok, onwards! I assume you've read the problem statement. I'll be writing a lot of pattern matching rules, so I should explain my syntax. 
It's basically the syntax used in the problem statement, but tweaked to make it plain ASCII that can be parsed unambiguously:</p><ul style="text-align: left;"><li>*xxxxxxx* is RNA (IIIxxxxxxx in the DNA)</li><li>?[xxx] is search for xxx</li><li>![n] is skip n</li><li>() in patterns for bracketing subexpressions</li><li>(n,l) in templates for a reference to the nth element of the environment at protection level l (note: they appear in the opposite order in the DNA)<br /></li><li>(n) is shorthand for (n,0)</li><li>|n| for the length of the nth element of the environment.<br /></li></ul><p>So clearly there are two things you'll need to implement: the DNA → RNA conversion, and the rendering of RNA. The first is by far the trickier of the two. Conceptually it's not too hard, but it needs to be implemented carefully or it'll take hours to run. They're not kidding when they say you need to be able to do large jumps in sub-linear time, as well as pasting large strings when expanding the template. You're going to be running the simulator a <i>lot</i>, so if it's taking more than 30 seconds then it's going to be worth doing some optimisation early on. Here are a few hints:</p><ul style="text-align: left;"><li>The search operator (?) is fairly rare, and the strings searched for tend to be quite short (less than 20 characters). Don't spend time implementing fancy string searches like Boyer-Moore. There are a few puzzles which will take much longer to run if this is horribly inefficient.<br /></li><li>It's quite common that the template ends with an (unquoted) substitution of a subexpression that ends at the end of the pattern, for example, (![123456](![100])) -> (0)(1). In that case you don't need to remove it and add it back again.</li><li>The specification doesn't put an upper bound on integer size. You won't need more than 32 bits (although some have more than 32 bases, but with zeros in the high bits). 
I've actually found it quite useful to crash out when this is violated, since it indicates that I've done something wrong.</li><li>Don't worry if you get some RNA that doesn't match any of the valid combinations. It's normal. <br /></li></ul><p>I ended up implementing strings as a binary tree of string views, pointing at either the original DNA, or dynamically allocated strings (basically a <a href="https://en.m.wikipedia.org/wiki/Rope_(data_structure)">rope</a>). I also tried ensuring balance of the tree by using a treap, but it didn't make much difference except in a few special cases. A useful optimisation was to collapse trees with fewer than a certain number of characters, since it's cheaper to do the copy once than to walk through the tree each time.</p><p>Ok, so assuming you've implemented your simulator correctly, you should see the starting output:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-00La8vl48AA/YG86J0gK85I/AAAAAAAABGQ/fPnt2vip0CcqMKjyp4knoWMPvxENe0xBACLcBGAsYHQ/s600/empty.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-00La8vl48AA/YG86J0gK85I/AAAAAAAABGQ/fPnt2vip0CcqMKjyp4knoWMPvxENe0xBACLcBGAsYHQ/s320/empty.png" /></a></div><p>What now? The problem statement gives you a hint that you should try a particular sequence. I suggest also working out what each of these sequences does, as it'll give insight into the machine. You can do it by hand, but it's also useful to have some disassembler functionality in your simulator. In this case it translates to (?[IFPP])F -> (0)P. In other words, find the sequence IFPP, and if there is an F after it, change it to a P. 
And now we get:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-YUTpwKcC1kY/YG8785tV3xI/AAAAAAAABGY/vOAtEP3ydcgL8pJhaHpWlOIDDKXhwlj2wCLcBGAsYHQ/s600/selftest.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-YUTpwKcC1kY/YG8785tV3xI/AAAAAAAABGY/vOAtEP3ydcgL8pJhaHpWlOIDDKXhwlj2wCLcBGAsYHQ/s320/selftest.png" /></a></div><br /><p>Or of course, you might get some failures. Note that even if something says OK, it might still be broken. My initial implementation had messed up some of the alpha processing so the alpha composition didn't look right.</p><p>This is one of the points where it is quite easy to get stuck, just because you haven't yet reached a point where clues start branching out. Probably the easiest way to find the next clue is to slow down the drawing process so that you can see how the image is built up. For example, just before each compose or clip operation, dump the two bitmaps involved. You'll be able to see how each element of the image is first drawn on its own layer before compositing it onto the main image. But you'll also see something else right at the start:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-xlXA13B5rxc/YG8_zx5qndI/AAAAAAAABGg/XOMCwKxd-a8py_oWJGeqk-5eXiWXQpPLACLcBGAsYHQ/s600/003_startup_clue.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-xlXA13B5rxc/YG8_zx5qndI/AAAAAAAABGg/XOMCwKxd-a8py_oWJGeqk-5eXiWXQpPLACLcBGAsYHQ/s320/003_startup_clue.png" /></a></div><br /><p>Another way to find that clue was to notice that the DNA starts with a big chunk of RNA; if you decided to try that out while someone was implementing the DNA → RNA program, you would probably see the above. Type that in (carefully!) 
and let's see what happens. For reference, it decompiles as (?[IFPCFFP])I -> (0)C, so once again we're searching for some marker string and changing the character after it.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Z1pqtxxZW9A/YG9AYe7l_1I/AAAAAAAABGo/NaUhMwlzu1oNi9-AaxMySU0YPL-m3FsTACLcBGAsYHQ/s600/background.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Z1pqtxxZW9A/YG9AYe7l_1I/AAAAAAAABGo/NaUhMwlzu1oNi9-AaxMySU0YPL-m3FsTACLcBGAsYHQ/s320/background.png" /></a></div><br /><p>Ah, now we're making progress. A field repair guide sounds useful, but I'll show the result of the second clue first, just because it doesn't immediately lead to more clues. The prefix is (?[IFPFI])P -> (0)F, so again, replacing one base (or "acid", as the repair guide calls them) after a marker.</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-WuaOpoa9TRk/YG9A9ZbK5NI/AAAAAAAABGw/ljNOBj1ism04rnhEjmGH4BSVFGR9z0M7ACLcBGAsYHQ/s600/sunup.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-WuaOpoa9TRk/YG9A9ZbK5NI/AAAAAAAABGw/ljNOBj1ism04rnhEjmGH4BSVFGR9z0M7ACLcBGAsYHQ/s320/sunup.png" /></a></div><br /><p>So as promised, it's rotated the earth to face the sun (although the sun itself isn't visible). This looks like good progress! We probably now have a lot more correct pixels than we started with. 
If you have ImageMagick installed, the compare program is useful to visually check how close one is getting:<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-UTyYUQiQJTw/YG9BehoQP5I/AAAAAAAABG8/agP90eKIYJkJew6koF1wrcJ9x7j6uZNgwCLcBGAsYHQ/s600/diff.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-UTyYUQiQJTw/YG9BehoQP5I/AAAAAAAABG8/agP90eKIYJkJew6koF1wrcJ9x7j6uZNgwCLcBGAsYHQ/s320/diff.png" /></a></div><p></p><p></p><p>Ok, let's get back to the field repair guide. This time the prefix is (?[IFPCFFP])II -> (0)IC. That's pretty similar to a previous one: the same search pattern, but now we're replacing II with IC. And it gives this:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Cu1CBEYa_us/YG9CTckXHHI/AAAAAAAABHE/gwLte2d19N8zRRo2B4X0IJa6vtAozZMLgCLcBGAsYHQ/s600/catalog.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Cu1CBEYa_us/YG9CTckXHHI/AAAAAAAABHE/gwLte2d19N8zRRo2B4X0IJa6vtAozZMLgCLcBGAsYHQ/s320/catalog.png" /></a></div><p>Thankfully from now on we'll see more of that font, which is a lot more readable than the last one. It's also time to start writing our own prefix! While at this stage one can do things by hand, it's worth investing in some tooling as you're going to use it a lot. So I recommend writing something that will take human-readable pattern rules and turn them into DNA. It doesn't need to be super-efficient; I put something together with Python and the pyparsing package.</p><p>How do we get to other pages? We've already seen two repair guide pages, both of which skip ahead to a particular pattern and then replace some acids after it. 
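</p><p>Rules of this shape (search for a marker, capture it, replace what follows) can be assembled mechanically. Here is a minimal Python sketch of such an assembler, assuming the rule encoding from the problem statement. It builds rules from function calls rather than parsing text, so it is not the pyparsing tool mentioned above, and all the helper names are hypothetical.</p>

```python
QUOTE = {"I": "C", "C": "F", "F": "P", "P": "IC"}


def bases(text):
    """Literal bases to match (in a pattern) or emit (in a template)."""
    return "".join(QUOTE[c] for c in text)


def nat(n):
    """Little-endian number: I = 0 bit, C = 1 bit, terminated by P."""
    out = []
    while n:
        out.append("C" if n & 1 else "I")
        n >>= 1
    return "".join(out) + "P"


def skip(n):
    """![n]: skip n bases."""
    return "IP" + nat(n)


def search(text):
    """?[text]: search forward for text.

    The third acid is ignored by the machine; F is used here so that the
    operator itself never spells IFP. The literal run is read greedily,
    so a search must not be followed directly by more literal bases.
    """
    return "IFF" + bases(text)


def group(*items):
    """( ... ): capture whatever the enclosed items consume."""
    return "IIP" + "".join(items) + "IIC"


def ref(n, level=0):
    """(n): insert captured group n, quoted `level` times."""
    return "IP" + nat(level) + nat(n)


def rule(pattern, template):
    """Join pattern and template items, each terminated by IIC."""
    return "".join(pattern) + "IIC" + "".join(template) + "IIC"


# The self-test hint from the problem statement: (?[IFPP])F -> (0)P
selftest = rule([group(search("IFPP")), bases("F")], [ref(0), bases("P")])
```

<p>In this sketch a repair guide page would be selected with something like <span style="font-family: courier;">rule([group(search("IFPCFFP")), bases("II")], [ref(0), bases("IC")])</span>.</p><p>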
If you open up the DNA in a text editor and find the pattern, you'll see the next few acids are IIIIIIIIIIIIIIIIIIIIIIIP. So according to this encoding description, that's zero, and we've been to pages 1 and 2. At this point one could just edit the DNA directly (either with a text editor or with code), but let's practice writing a pattern to get page 1337: (?[IFPCFFP])IIIIIIIIIII -> (0)CIICCCIICIC:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-y7ie0m_ed8U/YG9EMAp1U7I/AAAAAAAABHM/m4H7nJbnDMAVqk2PE3__nE6ju19dRibyQCLcBGAsYHQ/s600/catalog_index.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-y7ie0m_ed8U/YG9EMAp1U7I/AAAAAAAABHM/m4H7nJbnDMAVqk2PE3__nE6ju19dRibyQCLcBGAsYHQ/s320/catalog_index.png" /></a></div><br /><p>Now we're really getting somewhere! We can repeat for each of those page numbers (although it won't work for the encrypted ones):</p><p>1729 (a taxicab number):</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-D3pPyES-s-Y/YG9E1HUHKwI/AAAAAAAABHU/lB_v4P9G9LsasMbQLijIEXggo-2WqDlZQCLcBGAsYHQ/s600/1729_structure.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-D3pPyES-s-Y/YG9E1HUHKwI/AAAAAAAABHU/lB_v4P9G9LsasMbQLijIEXggo-2WqDlZQCLcBGAsYHQ/s320/1729_structure.png" /></a></div><p>8:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-YBKPxH0W1NQ/YG9FB5SorJI/AAAAAAAABHY/_NeJcCyEpEUQecHXmMxsoSSBI3d-8uilwCLcBGAsYHQ/s600/8_more_notes.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-YBKPxH0W1NQ/YG9FB5SorJI/AAAAAAAABHY/_NeJcCyEpEUQecHXmMxsoSSBI3d-8uilwCLcBGAsYHQ/s320/8_more_notes.png" /></a></div><br /><p>42 (the answer to life, the universe and everything):</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-s8ORMZZ13FQ/YG9Fh4MACCI/AAAAAAAABHk/A2CTswSjCZY95kD3305VRwMx_i5qF-HhACLcBGAsYHQ/s600/42_gene_list.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-s8ORMZZ13FQ/YG9Fh4MACCI/AAAAAAAABHk/A2CTswSjCZY95kD3305VRwMx_i5qF-HhACLcBGAsYHQ/s320/42_gene_list.png" /></a></div><br />112 (I guess because it's the international emergency number?):<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-UuCri9XKmwY/YG9Fu7nhUjI/AAAAAAAABHo/LQRVjpjwpEoTydPNYMNaGEVlhKCTkl_4gCLcBGAsYHQ/s600/112_something_to_look_out_for.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-UuCri9XKmwY/YG9Fu7nhUjI/AAAAAAAABHo/LQRVjpjwpEoTydPNYMNaGEVlhKCTkl_4gCLcBGAsYHQ/s320/112_something_to_look_out_for.png" /></a></div><br />10646 (the joke is that ISO 10646 defines the Universal Character Set):<p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-h9Q-LQQGHoI/YG9GJns-FxI/AAAAAAAABH0/C2hn9hWnKkI3xq4P0yvVrKpwwpX0u41aACLcBGAsYHQ/s600/10646_charset.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-h9Q-LQQGHoI/YG9GJns-FxI/AAAAAAAABH0/C2hn9hWnKkI3xq4P0yvVrKpwwpX0u41aACLcBGAsYHQ/s320/10646_charset.png" /></a></div><p>85:</p><div class="separator" style="clear: both; text-align: center;"><a 
href="https://1.bp.blogspot.com/-OJlGSrYXZXg/YG9Gis9DNRI/AAAAAAAABH8/nP2fiU3Vo6AeIDS352XMsMMG1bsxs4cNwCLcBGAsYHQ/s600/85_field_repairing.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-OJlGSrYXZXg/YG9Gis9DNRI/AAAAAAAABH8/nP2fiU3Vo6AeIDS352XMsMMG1bsxs4cNwCLcBGAsYHQ/s320/85_field_repairing.png" /></a></div><p>2181889 (not sure what the reference is to):</p><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-psMbgDI7hdo/YG9HVQ1mQgI/AAAAAAAABII/2BNfzdGcpaUBaYYXHU0FREJWtPGc47URgCLcBGAsYHQ/s600/2181889_weird_rna.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-psMbgDI7hdo/YG9HVQ1mQgI/AAAAAAAABII/2BNfzdGcpaUBaYYXHU0FREJWtPGc47URgCLcBGAsYHQ/s320/2181889_weird_rna.png" /></a></div><p>5:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-4gjl_DzQ4Mo/YG9HheEYtBI/AAAAAAAABIQ/hekGz2F7uhMU16uwdNEcRKD8sEZHVcBNACLcBGAsYHQ/s600/5_synthesis.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-4gjl_DzQ4Mo/YG9HheEYtBI/AAAAAAAABIQ/hekGz2F7uhMU16uwdNEcRKD8sEZHVcBNACLcBGAsYHQ/s320/5_synthesis.png" /></a></div><br /><p>4405829 (apparently the patent number for RSA encryption):</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-SlwjG3YaVWQ/YG9Ht1je-pI/AAAAAAAABIY/WA6Keb1qFHMy2ZTaiPOKpXkDXrwYyOc7ACLcBGAsYHQ/s600/4405829_security.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-SlwjG3YaVWQ/YG9Ht1je-pI/AAAAAAAABIY/WA6Keb1qFHMy2ZTaiPOKpXkDXrwYyOc7ACLcBGAsYHQ/s320/4405829_security.png" /></a></div><p>123456:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-46VryhWo2AQ/YG9ae_q-pkI/AAAAAAAABI8/dJgtX8_PwkQ4NcL1d7KUTu2KANlbIeNRACLcBGAsYHQ/s600/017b_123456_compression.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-46VryhWo2AQ/YG9ae_q-pkI/AAAAAAAABI8/dJgtX8_PwkQ4NcL1d7KUTu2KANlbIeNRACLcBGAsYHQ/s320/017b_123456_compression.png" /></a></div><br /><p></p><p>That opens up a whole lot of parallel threads to explore, so I'm just going to pick an order. I'm going to start by learning as much about the machine as possible before doing anything much about trying to match the target image. In practice one probably wants to work on both in parallel.</p><p>The red/green/blue zone explanation tells us something about how this crazy machine is programmed. Since the blue zone grows and shrinks at one end, it sounds like a stack, and it is. The green zone sounds more like a code or data segment: it can be edited, but you can't go shuffling things around without breaking code that expects to know where things are. And the red zone is "born" from the green zone in the sense that code is copied from the green zone to the red zone to be executed. That makes sense, because once a pattern rule is executed, it disappears from the DNA, so if you want to have reusable code you need to copy it before executing it.</p><p>We also have the first page of what looks like a symbol table, and it tells us how to find the green zone: it starts with IFPICFPPCFFPP. Incidentally, how can one be sure that such markers won't accidentally match in the wrong place? 
You might have noticed when implementing the simulator that in a pattern, IF can be followed by any character; in a template, IF is equivalent to IP; and in both, IIF is equivalent to IIC. It's thus possible to avoid using the sequence IFP in producing any rule (unless you want it in some RNA, but none of the defined RNA sequences need that), and it's a good idea to avoid it in your code too to avoid accidentally messing up patterns that search for such markers.</p><p>Can we get the rest of the symbol table? The first symbol (AAA_geneTablePageNr) is a bit of a clue. If we go find the green zone marker (it's 13615 acids into the download), go forward 0x510 acids, and then look at the next 0x18 acids, they're IIIIIIIIIIIIIIIIIIIIIIIP. We've seen before that this encodes 0, so what happens if we change the first acid to a C to encode 1? We'll use (?[IFPICFPPCFFPP]![1283])![1] -> (0)C, (combined with the rule to show the gene table). Note that 1283 is not 0x510: we have to subtract 13 to account for the ? operator taking us to the <i>end</i> of the green zone marker. 
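</p><p>The offset arithmetic and the number encoding are easy to get wrong, so it can help to try the same edit on the host first. The sketch below is only an illustration in Python: the function names are my own, and it assumes (as observed above) that stored numbers occupy 24 acids, little-endian, with I for 0, C for 1 and a terminating P.</p>

```python
GREEN_MARKER = "IFPICFPPCFFPP"


def encode_number(value, width=24):
    """Fixed-width number: width - 1 little-endian bits (I = 0, C = 1), then P."""
    digits = []
    for _ in range(width - 1):
        digits.append("C" if value & 1 else "I")
        value >>= 1
    if value:
        raise ValueError("value does not fit in the given width")
    return "".join(digits) + "P"


def decode_number(acids):
    """Inverse of encode_number; stops at the first P."""
    value = 0
    for bit, acid in enumerate(acids):
        if acid == "P":
            return value
        if acid == "C":
            value |= 1 << bit
    raise ValueError("missing P terminator")


def poke_number(dna, offset, value, width=24):
    """Overwrite the number stored `offset` acids after the START of the marker."""
    start = dna.index(GREEN_MARKER) + offset
    return dna[:start] + encode_number(value, width) + dna[start + width:]


# A ?[marker] search leaves us at the END of the marker, so the skip that
# follows it is shorter than the offset by the marker length.
skip_after_search = 0x510 - len(GREEN_MARKER)
```

<p>Here <span style="font-family: courier;">skip_after_search</span> comes out as 1283, the number used in the rule above.</p><p>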
If you've done it right, you'll get:<br /></p><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-llNIBKe7TsI/YG9NIpcxFjI/AAAAAAAABIk/E2W0MZGikwgJMoKKoOwj60elFVXSE2odACLcBGAsYHQ/s600/018_gene_table_1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-llNIBKe7TsI/YG9NIpcxFjI/AAAAAAAABIk/E2W0MZGikwgJMoKKoOwj60elFVXSE2odACLcBGAsYHQ/s320/018_gene_table_1.png" /></a></div><p>And we can continue in the same way (for larger numbers you'll obviously need to replace more bits):</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Bbqbt0mQyag/YHXZfgBm87I/AAAAAAAABM4/5uKCJcoMo5s0TNV7WoD1v8i6Z4C2yO4ywCLcBGAsYHQ/s600/018_gene_table_02.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Bbqbt0mQyag/YHXZfgBm87I/AAAAAAAABM4/5uKCJcoMo5s0TNV7WoD1v8i6Z4C2yO4ywCLcBGAsYHQ/s320/018_gene_table_02.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DlCg4LSp60E/YHXZfhzjL2I/AAAAAAAABMw/WZo5nLe_gzI5J1gLXIOzUa2LcdkreawJACLcBGAsYHQ/s600/018_gene_table_03.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-DlCg4LSp60E/YHXZfhzjL2I/AAAAAAAABMw/WZo5nLe_gzI5J1gLXIOzUa2LcdkreawJACLcBGAsYHQ/s320/018_gene_table_03.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-_U8tKrQ5IS0/YHXZfsdKQJI/AAAAAAAABM0/CXJtQsR6QScVrbPXV0ErSis1Kjxo59m4gCLcBGAsYHQ/s600/018_gene_table_04.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" 
data-original-width="600" height="320" src="https://1.bp.blogspot.com/-_U8tKrQ5IS0/YHXZfsdKQJI/AAAAAAAABM0/CXJtQsR6QScVrbPXV0ErSis1Kjxo59m4gCLcBGAsYHQ/s320/018_gene_table_04.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-9V1PJjrhHjI/YHXZgSJMtaI/AAAAAAAABM8/FmOgHoubVbEeZrrG3XX5pqqOwKAoF1DKACLcBGAsYHQ/s600/018_gene_table_05.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-9V1PJjrhHjI/YHXZgSJMtaI/AAAAAAAABM8/FmOgHoubVbEeZrrG3XX5pqqOwKAoF1DKACLcBGAsYHQ/s320/018_gene_table_05.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-MF6P3gjkhMQ/YHXZgU6_wjI/AAAAAAAABNA/7-zrAjs1UqgnaKkCOni62zVfNIkhfYE8wCLcBGAsYHQ/s600/018_gene_table_06.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-MF6P3gjkhMQ/YHXZgU6_wjI/AAAAAAAABNA/7-zrAjs1UqgnaKkCOni62zVfNIkhfYE8wCLcBGAsYHQ/s320/018_gene_table_06.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-SgDFyvEBYGQ/YHXZghTlJwI/AAAAAAAABNE/u2bNFEhpvmcnZxJon1fCa2Scl84rIS5kACLcBGAsYHQ/s600/018_gene_table_07.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-SgDFyvEBYGQ/YHXZghTlJwI/AAAAAAAABNE/u2bNFEhpvmcnZxJon1fCa2Scl84rIS5kACLcBGAsYHQ/s320/018_gene_table_07.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DTRpGsoLWL8/YHXZgwIsc8I/AAAAAAAABNI/F_Ymt4kK5I0QitfGnQOMDKAzHegGpHZGQCLcBGAsYHQ/s600/018_gene_table_08.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" 
height="320" src="https://1.bp.blogspot.com/-DTRpGsoLWL8/YHXZgwIsc8I/AAAAAAAABNI/F_Ymt4kK5I0QitfGnQOMDKAzHegGpHZGQCLcBGAsYHQ/s320/018_gene_table_08.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-MunZ4axChhA/YHXZhEhtjaI/AAAAAAAABNM/Go9sBZAEz5Mzui_KwL3L-U_6xElU8DczACLcBGAsYHQ/s600/018_gene_table_09.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-MunZ4axChhA/YHXZhEhtjaI/AAAAAAAABNM/Go9sBZAEz5Mzui_KwL3L-U_6xElU8DczACLcBGAsYHQ/s320/018_gene_table_09.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-naz_L_em4PY/YHXZhZw0McI/AAAAAAAABNQ/N0BmZBv3VrAQFTitdrvdjAUfSPcHKYnrQCLcBGAsYHQ/s600/018_gene_table_10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-naz_L_em4PY/YHXZhZw0McI/AAAAAAAABNQ/N0BmZBv3VrAQFTitdrvdjAUfSPcHKYnrQCLcBGAsYHQ/s320/018_gene_table_10.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-hTThSpgpw04/YHXZhsgFkgI/AAAAAAAABNU/-09t6zgTg6kek1HnEGbfP8C-IDcai_o6wCLcBGAsYHQ/s600/018_gene_table_11.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-hTThSpgpw04/YHXZhsgFkgI/AAAAAAAABNU/-09t6zgTg6kek1HnEGbfP8C-IDcai_o6wCLcBGAsYHQ/s320/018_gene_table_11.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-aT2edMDE5hQ/YHXZh12q3EI/AAAAAAAABNc/ryjqf33uIJg98kaEXSTSnbaJ442pFSh8wCLcBGAsYHQ/s600/018_gene_table_12.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-aT2edMDE5hQ/YHXZh12q3EI/AAAAAAAABNc/ryjqf33uIJg98kaEXSTSnbaJ442pFSh8wCLcBGAsYHQ/s320/018_gene_table_12.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-kaFU6FwB5BM/YHXZiNXLpNI/AAAAAAAABNY/OmJUHc1iGUUvXH4X6kzKBDelmTn_x0YgACLcBGAsYHQ/s600/018_gene_table_13.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-kaFU6FwB5BM/YHXZiNXLpNI/AAAAAAAABNY/OmJUHc1iGUUvXH4X6kzKBDelmTn_x0YgACLcBGAsYHQ/s320/018_gene_table_13.png" /></a></div><p>It seems like some of the entries are damaged; we'll eventually want to fix that.<br /></p><p></p>You'll want to start capturing some of this information so that you can pull symbols out to examine or automate modifying them, but don't bother typing in the whole table; later on we'll be able to extract it automatically. But it's worth reading over to look for clues. Some interesting symbols are <span style="font-family: courier;">hitWithTheClueStick</span>, <span style="font-family: courier;">blueZoneStart</span> (which tells you how big the green zone is), and of course all the <span style="font-family: courier;">help-*</span> functions (which suggests there might be more help pages we haven't seen yet). You can also confirm that the value we changed to bring up the sun corresponds to <span style="font-family: courier;">night-or-day</span>, and that the repair guide page number is <span style="font-family: courier;">helpScreen</span>.<br /><p>The character set page was less than totally helpful, being just a grid of dots. One of the symbols was called <span style="font-family: courier;">fontTable_Dots</span>, so maybe it shows the characters but the font is just dots? We have no idea how font tables work yet, but they're all the same size, so maybe we can just replace it with another one? 
To do that we'll need to advance to the source font (say, <span style="font-family: courier;">fontTable_Messenger</span>), advance over it (in parentheses, to capture it), advance to the target, and skip over it to consume it, then put back everything else we consumed plus the replacement:</p><p>(?[IFPICFPPCFFPP]![610254](![9216])![42728])![9216] -> (1)(0)</p><p>Don't forget to add this to the previous prefix that set the repair guide page number (incidentally, you can see this is called <span style="font-family: courier;">helpScreen</span> in the gene list).<br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-4YJgmznCDpc/YG9RTRAmpHI/AAAAAAAABIs/bbJGVftZgugANwqZoeFeI_Nq_3PSDegMACLcBGAsYHQ/s600/019_font_override.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-4YJgmznCDpc/YG9RTRAmpHI/AAAAAAAABIs/bbJGVftZgugANwqZoeFeI_Nq_3PSDegMACLcBGAsYHQ/s320/019_font_override.png" /></a></div><p><br />Voila! Now we can interpret any text we find. There is another way you could have figured this out: we're told the intergalactic character set has fallen into disuse on Earth, which if you know some history, describes <a href="https://en.wikipedia.org/wiki/EBCDIC">EBCDIC</a>. One look at the Wikipedia page should show you a pattern similar to the dot pattern, although everything is shifted up by 4 lines (and ICS text only goes up to 0xbf). But there are a lot of variations on EBCDIC, so it's helpful to confirm how the punctuation is laid out.</p><p>We've also been given a clue that one can go searching for text. We'll start simple by ignoring quoting, and just search for sequences of 9-acid (8-bit) numbers, ending with 255. Regular expressions are good for that, and here's what pops out:</p><blockquote><p><br /> bcdefghij klmnopqrs tuvwxyzA BCDEFGHIJ KLMNOPQRS TUVWXYZ0 123456789a Help! 
We are a group of computer scientists held prisoner on the remote planet of Utrecht in the Orion nebula by the evil Fuun. The Fuun have already conquered thousands of worlds, and it appears that Earth is next. Their modus operandi is always the same: they organize a fake `programming contest' on the victim world to repair a supposedly disabled Fuun. This enables them to identify that planet's best and brightest minds, who are then eliminated in advance of the actual invasion, leaving the planet defenseless against the Fuun's superior weaponry. You must not, under any circumstances, repair the Endo creature, as he - when reactivated - will surely destroy his `rescuers' and give the attack signal to the Fuun invasion force massing near Sullust. Do not give in to the lure of rewards, monetary or otherwise! * * * It is too late for us, but in the unlikely event that Earth manages to stave off the Fuun invasion, we would appreciate a monument of some sort to honour us (especially since we sabotaged the Fuun DNA by swapping some parabolas). We are: Alexey Rodriguez * Andres Loeh * Arie Middelkoop * Bastiaan Heeren * Chris Eidhof * Clara Loeh * Eelco Dolstra * Eelco Lempsink * Jeroen Leeuwestein * Johan Jeuring * John van Schie * Jurriaan Hage * Maaike Gerritsen * Mark Stobbe * Martijn van Steenbergen * Stefan Holdermans<br /><br /></p></blockquote><p>Well, that's fun ☺. And eventually we'll use the clue about swapped parabolas.<br /></p><p>We can also start searching for text that's been quoted (I -> C, C -> F, F -> P, P -> IC). That turns up a huge amount of stuff: what looks like some function documentation, some of the text we've already seen (including the gene table, plus we can see names for the damaged entries), a story about Major Imp, some history of previous ICFP contests, and some help pages we haven't seen yet. It's all a bit fragmented and out-of-order, and a lot of it we'll see in more readable form later anyway, so I'm not going to delve into it here. 
If you're attempting the problem yourself, however, I recommend going through it, as you may find clues that you otherwise struggle to unlock.</p><p>Now let's tackle the gibberish on the Fuun security features page. It looks encrypted, but there are clearly patterns that suggest it's not good encryption. What's the worst encryption around? ROT13 of course. Typing it all in to decrypt (or decrypting by hand) is pretty tedious, but having bits of the text available as raw text (albeit incomplete and out-of-order) will help. When you get it decrypted, you can see it describes an encryption algorithm. If you've studied encryption, you'll recognise that this is a stream cipher, the best known of which is RC4, and indeed the description matches RC4. There is unfortunately a bug on this page (one of the very few actual mistakes the problem authors made): Step 5B is missing a lookup of the sum into the table before using the result. This also gives us a hint that we might be able to crack some of the encrypted items by using the purchase codes we can see in the symbol table and the <span style="font-family: courier;">crackKey</span> gene.</p><p>But maybe we're not done with ROT13 just yet. We can see that <span style="font-family: courier;">help-activating-genes</span> doesn't have a corresponding purchase code in the symbol table, so maybe it's encrypted with the simple encryption? Since there are only four characters in the DNA alphabet, it's going to be ROT2 (I → F, C → P, F → I, P → C). Trying to implement that transformation as a prefix will be challenging at this point, but we can just pull out the current value, transform it in a normal language, and then write a prefix that stuffs the replacement into the right place (or just do everything on the host system). 
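</p><p>The host-side transformation itself is tiny. A minimal Python sketch (the function name is my own), using the mapping given above:</p>

```python
# ROT2 over the four-letter DNA alphabet: I -> F, C -> P, F -> I, P -> C.
ROT2 = str.maketrans("ICFP", "FPIC")


def rot2(dna):
    """Rotate every acid two places around the cycle I, C, F, P."""
    return dna.translate(ROT2)
```

<p>Because the alphabet has four letters, applying <span style="font-family: courier;">rot2</span> twice returns the original string, so the same function both encrypts and decrypts.</p><p>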
That gives us this:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-u7W-SZobHfg/YG9YiGJIDaI/AAAAAAAABI0/bPLmjnDju6s94nsbosEIrPZzN1KkPTEFwCLcBGAsYHQ/s600/020_help-activating-genes.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-u7W-SZobHfg/YG9YiGJIDaI/AAAAAAAABI0/bPLmjnDju6s94nsbosEIrPZzN1KkPTEFwCLcBGAsYHQ/s320/020_help-activating-genes.png" /></a></div><p>So this tells us a bit more about how to call "genes" (functions) and pass them "adaptations" (arguments). It also tells us something about how returns work. But recall that help page 85 also told us how to use the "adapter" to simplify making function calls.</p><p>Let's see if we can crack some of the other encryption. There is one other repair guide page we're told is encrypted (page 84, "How to fix corrupted DNA") and we can see a symbol called <span style="font-family: courier;">help-error-correcting-codes_purchase_code</span>. Let's read it out, then follow the instructions for the adapter to run this pair of rules:<br /></p><p>(?[IFPICFPPCFIPP]) -> (0)CPCPFPPCFCIIPFICCFCFIFCC<br />(?[IFPICFPPCCC](?[IFPICFPPCCC])) -> (0)CIICICCIIICICIICIICCICCPICCICIIIICCICP(1)</p><p>The first rule pushes the purchase code (which we've extracted externally) onto the stack, and the second calls the gene <span style="font-family: courier;">crackKeyAndPrint</span>. 
This will run for quite a long time (minutes) and eventually produces:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-P6xGgIz-BEU/YG9bY8mTJqI/AAAAAAAABJE/lG7Kbe9TUIQpNkK2wBzSjtBQcHirZwgzgCLcBGAsYHQ/s600/021_crack_key.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-P6xGgIz-BEU/YG9bY8mTJqI/AAAAAAAABJE/lG7Kbe9TUIQpNkK2wBzSjtBQcHirZwgzgCLcBGAsYHQ/s320/021_crack_key.png" /></a></div><p>Ok, but now how do we apply it? There is a <span style="font-family: courier;">crypt</span> function in the symbol table (and RC4 is symmetric, so the same process will work for decryption), but we don't know what parameters it takes. We won't actually need the information this key unlocks until much later, so you could just wait, or you could take some guesses, or you could implement RC4 in your language of choice and do the decryption outside of the Fuun machine. You'll need to make a guess about how the 8-bit values are split into groups of 4 (little-endian, just like Fuun numbers). If you want to check that it's implemented correctly, just decrypt the purchase key: it should decrypt to III...IIIP i.e., zero. Now we can view page 84:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-U8mURmQQiJo/YG9dA0iSSaI/AAAAAAAABJM/P7dQMnTa3oU3XNIlE72ijN5jX92KqgEFwCLcBGAsYHQ/s600/022_ecc.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-U8mURmQQiJo/YG9dA0iSSaI/AAAAAAAABJM/P7dQMnTa3oU3XNIlE72ijN5jX92KqgEFwCLcBGAsYHQ/s320/022_ecc.png" /></a></div><br /><p>There isn't a whole lot we can do with this for now. 
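</p><p>For the host-side route, plain RC4 is only a few lines. The sketch below is standard byte-oriented RC4 in Python, not the Fuun variant: packing acids into 8-bit values (and back) is left out, since that layout is exactly the part one has to guess. Note the final table lookup on the sum, which is the step the help page omits.</p>

```python
def rc4(key, data):
    """Standard RC4; encryption and decryption are the same operation."""
    # Key scheduling: start from the identity permutation and shuffle it
    # under control of the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Keystream generation, XORed into the data.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        # Look the sum up in the table before using it.
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)
```

<p>Since RC4 just XORs a keystream into the data, calling <span style="font-family: courier;">rc4</span> again with the same key undoes the encryption.</p><p>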
We can try cracking other keys using the other purchase codes in the table, but only one of them is short enough to crack by brute force (3 characters, which is quick in C++), and in fact we won't need to (one thing I felt was really well-designed in this contest is that a lot of clues can be solved in more than one way).</p><p>Now that we've seen how to call one function, maybe we should try to call some others to see what they do? But first, let's see if we can extract the symbol table, rather than having to type in addresses. We already saw that the symbol names all appear (quoted) in the raw DNA; maybe the addresses and sizes are too? Let's take some symbol (it might as well be <span style="font-family: courier;">crackKeyAndPrint</span>), find it in the DNA, and dump the surrounding acids. Incidentally, by this point it is well worth having a library of routines to do text encoding, quoting, unquoting, number parsing and so on. You'll see that the previous 50 acids are:<br /></p><p>FCCFCFFCCCFCFCCFCCFFCFFIC<br />CFFCFCCCCFFCFCCCCCCCCCCIC<br /></p><p>which when unquoted and interpreted as numbers are 0x6c9469 and 0x1616, just what we needed. So we can modify our text-finding regular expression to hunt for symbols, consisting of two quoted 24-base numbers followed by a quoted string. Compared to the table we extracted earlier, this one also has the "damaged" entries!</p><p>At this point I wrote a script to just try calling every symbol that looks big enough to be a function (in separate invocations). In each case, follow the call with a call to <span style="font-family: courier;">terminate</span> so that the output isn't overwritten by the original code. Most of them just come out blank, some crashed (by triggering my check for integer overflows), and some ran forever and I had to list them for exclusion. 
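</p><p>As an example of the kind of helper routines that pay off here, this Python sketch unquotes the two 25-acid strings shown earlier and parses them as numbers. The function names are my own, and it assumes the usual quoting (I -> C, C -> F, F -> P, P -> IC) and little-endian numbers terminated by P.</p>

```python
def unquote(quoted):
    """Undo one level of quoting: C -> I, F -> C, P -> F, IC -> P."""
    out = []
    pos = 0
    while pos < len(quoted):
        if quoted[pos] == "C":
            out.append("I")
            pos += 1
        elif quoted[pos] == "F":
            out.append("C")
            pos += 1
        elif quoted[pos] == "P":
            out.append("F")
            pos += 1
        elif quoted.startswith("IC", pos):
            out.append("P")
            pos += 2
        else:
            raise ValueError(f"not quoted text at position {pos}")
    return "".join(out)


def parse_number(acids):
    """Little-endian number: C is a 1 bit, I or F a 0 bit, P terminates."""
    value = 0
    for bit, acid in enumerate(acids):
        if acid == "P":
            return value
        if acid == "C":
            value |= 1 << bit
    raise ValueError("missing P terminator")


first = parse_number(unquote("FCCFCFFCCCFCFCCFCCFFCFFIC"))
second = parse_number(unquote("CFFCFCCCCFFCFCCCCCCCCCCIC"))
```

<p>The two results are 0x6c9469 and 0x1616, matching the values quoted for <span style="font-family: courier;">crackKeyAndPrint</span>.</p><p>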
Some of the functions require arguments and we don't know yet what they are, so I just pushed 10 24-acid integers onto the stack first.</p><p>That still leaves a lot of useful help pages and clues:<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-msiQ24Mywzw/YG9hHj-JCTI/AAAAAAAABJU/tgvmDxs-MZM2qEo-PTuoWJmsPWmDLiRDgCLcBGAsYHQ/s600/call_alien-lifeforms.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-msiQ24Mywzw/YG9hHj-JCTI/AAAAAAAABJU/tgvmDxs-MZM2qEo-PTuoWJmsPWmDLiRDgCLcBGAsYHQ/s320/call_alien-lifeforms.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-LPweMHPHOIw/YG9iAC5PqKI/AAAAAAAABJg/Yj5vISXoDUsUdwh760AwkHIcz5kXkdTMACLcBGAsYHQ/s600/call_babel-survey.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-LPweMHPHOIw/YG9iAC5PqKI/AAAAAAAABJg/Yj5vISXoDUsUdwh760AwkHIcz5kXkdTMACLcBGAsYHQ/s320/call_babel-survey.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-pJKDtp42b-o/YG9h_zHGUOI/AAAAAAAABJc/UN_y-Zp70GUfzIHjXxQNs_YGxlTW667iwCLcBGAsYHQ/s600/call_contest-1998.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-pJKDtp42b-o/YG9h_zHGUOI/AAAAAAAABJc/UN_y-Zp70GUfzIHjXxQNs_YGxlTW667iwCLcBGAsYHQ/s320/call_contest-1998.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Q_H0vfBrRpw/YG9iAEFHDAI/AAAAAAAABJk/VM_mjKqc7uA-x0jr2ehMANXQcatdRlhEQCLcBGAsYHQ/s600/call_contest-1999.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" 
data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Q_H0vfBrRpw/YG9iAEFHDAI/AAAAAAAABJk/VM_mjKqc7uA-x0jr2ehMANXQcatdRlhEQCLcBGAsYHQ/s320/call_contest-1999.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-v80rsUCwZSY/YG9iAjyIl4I/AAAAAAAABJo/Y9errPZbr3Q4S-ut5bmtQYPhtmkfX2LSACLcBGAsYHQ/s600/call_contest-2000.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-v80rsUCwZSY/YG9iAjyIl4I/AAAAAAAABJo/Y9errPZbr3Q4S-ut5bmtQYPhtmkfX2LSACLcBGAsYHQ/s320/call_contest-2000.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-bx5v0LWZMAI/YG9iA9nJq_I/AAAAAAAABJs/ryBJUhefUasGPK4LnCOBuzD5XM0r_QsQwCLcBGAsYHQ/s600/call_contest-2001.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-bx5v0LWZMAI/YG9iA9nJq_I/AAAAAAAABJs/ryBJUhefUasGPK4LnCOBuzD5XM0r_QsQwCLcBGAsYHQ/s320/call_contest-2001.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-JFWYZ2JejXo/YG9iA0oCfKI/AAAAAAAABJw/yq8mCOLO2H8w88F-w2HmJTIktV-hl4BnwCLcBGAsYHQ/s600/call_contest-2002.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-JFWYZ2JejXo/YG9iA0oCfKI/AAAAAAAABJw/yq8mCOLO2H8w88F-w2HmJTIktV-hl4BnwCLcBGAsYHQ/s320/call_contest-2002.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-xjYmRl-GDws/YG9iBZjAqAI/AAAAAAAABJ0/lmt11FoZ68Ip6dgoI_br4gGQTioQ7AHLACLcBGAsYHQ/s600/call_contest-2003.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" 
height="320" src="https://1.bp.blogspot.com/-xjYmRl-GDws/YG9iBZjAqAI/AAAAAAAABJ0/lmt11FoZ68Ip6dgoI_br4gGQTioQ7AHLACLcBGAsYHQ/s320/call_contest-2003.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-hmo6EUxCDAQ/YG9iBdyCTxI/AAAAAAAABJ4/dN_CNt5ih9MdQ7ZbekkwW_d7PoRVkFDWgCLcBGAsYHQ/s600/call_contest-2004.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-hmo6EUxCDAQ/YG9iBdyCTxI/AAAAAAAABJ4/dN_CNt5ih9MdQ7ZbekkwW_d7PoRVkFDWgCLcBGAsYHQ/s320/call_contest-2004.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-h7EwSv9XqIk/YG9iBdfKGTI/AAAAAAAABJ8/q8GXU0hYjiQBBkEWj7-px4P3YsWV016ygCLcBGAsYHQ/s600/call_contest-2005.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-h7EwSv9XqIk/YG9iBdfKGTI/AAAAAAAABJ8/q8GXU0hYjiQBBkEWj7-px4P3YsWV016ygCLcBGAsYHQ/s320/call_contest-2005.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-S5knJD-pYlU/YG9iB16E6YI/AAAAAAAABKA/Exh3Uz35mfAbvRkdtFE5nb4WeMM-GkBXwCLcBGAsYHQ/s600/call_contest-2006.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-S5knJD-pYlU/YG9iB16E6YI/AAAAAAAABKA/Exh3Uz35mfAbvRkdtFE5nb4WeMM-GkBXwCLcBGAsYHQ/s320/call_contest-2006.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-yKzJy9GLPV8/YG9iCIjW0oI/AAAAAAAABKE/7I5q13VQrawYJN1jAFQ60jkYgxQIu_TmQCLcBGAsYHQ/s600/call_contest-2007.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-yKzJy9GLPV8/YG9iCIjW0oI/AAAAAAAABKE/7I5q13VQrawYJN1jAFQ60jkYgxQIu_TmQCLcBGAsYHQ/s320/call_contest-2007.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-s9VF1vl0j1E/YG9iCFHPw-I/AAAAAAAABKI/-rDX_gZcR9saJEscIuYuDCAMven2a-X1wCLcBGAsYHQ/s600/call_fuundoc1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-s9VF1vl0j1E/YG9iCFHPw-I/AAAAAAAABKI/-rDX_gZcR9saJEscIuYuDCAMven2a-X1wCLcBGAsYHQ/s320/call_fuundoc1.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-0EZLubKsj2U/YG9iCm9tWvI/AAAAAAAABKM/NlfdRjmsUjkPjEk0bsmw-7lCI09Ff9bFwCLcBGAsYHQ/s600/call_fuundoc2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-0EZLubKsj2U/YG9iCm9tWvI/AAAAAAAABKM/NlfdRjmsUjkPjEk0bsmw-7lCI09Ff9bFwCLcBGAsYHQ/s320/call_fuundoc2.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-zm83x7bHa88/YG9iCyBAxLI/AAAAAAAABKQ/wQIXYfXDX3QpUp78DBBb99MRGVgUVK5VACLcBGAsYHQ/s600/call_fuundoc3.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-zm83x7bHa88/YG9iCyBAxLI/AAAAAAAABKQ/wQIXYfXDX3QpUp78DBBb99MRGVgUVK5VACLcBGAsYHQ/s320/call_fuundoc3.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-DV6Y0EeQ-8A/YG9iC0K04-I/AAAAAAAABKU/R0GmxFDZDA8rWNT0ldAyCTVbTQNe_STKwCLcBGAsYHQ/s600/call_help-adaptive-genes.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-DV6Y0EeQ-8A/YG9iC0K04-I/AAAAAAAABKU/R0GmxFDZDA8rWNT0ldAyCTVbTQNe_STKwCLcBGAsYHQ/s320/call_help-adaptive-genes.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-AndrrnabA-4/YG9iDKpTTPI/AAAAAAAABKY/vIPzBWIJ1goVNKFpt2nduEc0YWzdq09sACLcBGAsYHQ/s600/call_help-initial-cond.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-AndrrnabA-4/YG9iDKpTTPI/AAAAAAAABKY/vIPzBWIJ1goVNKFpt2nduEc0YWzdq09sACLcBGAsYHQ/s320/call_help-initial-cond.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-BgtqGQ35yIA/YG9iDd_kuOI/AAAAAAAABKc/d96__RBLTCkjQ0RZb4tcgufHlWdEH_-ggCLcBGAsYHQ/s600/call_help-palindromes.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-BgtqGQ35yIA/YG9iDd_kuOI/AAAAAAAABKc/d96__RBLTCkjQ0RZb4tcgufHlWdEH_-ggCLcBGAsYHQ/s320/call_help-palindromes.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-LDKZZDi6_TI/YG9iDn4KtoI/AAAAAAAABKg/F3CRcT9f2b0e9DIxz2_oWlqzkcmkOQOzACLcBGAsYHQ/s600/call_help-steganography.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-LDKZZDi6_TI/YG9iDn4KtoI/AAAAAAAABKg/F3CRcT9f2b0e9DIxz2_oWlqzkcmkOQOzACLcBGAsYHQ/s320/call_help-steganography.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-iJQuRbq6r-s/YG9iDm1tV7I/AAAAAAAABKk/4q3yluUhXsc9jim5d5Kug3nhkWywtwBgACLcBGAsYHQ/s600/call_help-vmu.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" 
height="320" src="https://1.bp.blogspot.com/-iJQuRbq6r-s/YG9iDm1tV7I/AAAAAAAABKk/4q3yluUhXsc9jim5d5Kug3nhkWywtwBgACLcBGAsYHQ/s320/call_help-vmu.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-EJUyaZDifxI/YHXcGMbhmZI/AAAAAAAABN4/KNZ_vBA2F58lE6hVq7wx1nQw03vb2SnqACLcBGAsYHQ/s600/100_alien_virus.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-EJUyaZDifxI/YHXcGMbhmZI/AAAAAAAABN4/KNZ_vBA2F58lE6hVq7wx1nQw03vb2SnqACLcBGAsYHQ/s320/100_alien_virus.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-zlh8Ty4VOwc/YG9iEDpWrVI/AAAAAAAABKo/nn8rIRRImFsARnUoG6EcrAN-vpOd74CuQCLcBGAsYHQ/s600/call_impdoc1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-zlh8Ty4VOwc/YG9iEDpWrVI/AAAAAAAABKo/nn8rIRRImFsARnUoG6EcrAN-vpOd74CuQCLcBGAsYHQ/s320/call_impdoc1.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-CGYanb4t0uE/YG9iEVzcSlI/AAAAAAAABKw/mIM2d6ve4OkzX9BrrlkRKlhtPA7wOLU0ACLcBGAsYHQ/s600/call_impdoc2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-CGYanb4t0uE/YG9iEVzcSlI/AAAAAAAABKw/mIM2d6ve4OkzX9BrrlkRKlhtPA7wOLU0ACLcBGAsYHQ/s320/call_impdoc2.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-0M-lxi7YOJU/YG9iEvM28jI/AAAAAAAABK0/TiV_qIpddh4paQ3WWHi52ZPWAzv7BGZXQCLcBGAsYHQ/s600/call_impdoc3.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-0M-lxi7YOJU/YG9iEvM28jI/AAAAAAAABK0/TiV_qIpddh4paQ3WWHi52ZPWAzv7BGZXQCLcBGAsYHQ/s320/call_impdoc3.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-TDm8lTgNRDg/YG9iFPfi3sI/AAAAAAAABK4/hVtRNSybOp83L1ilHsfrj_8wKwOCjJbKQCLcBGAsYHQ/s600/call_impdoc4.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-TDm8lTgNRDg/YG9iFPfi3sI/AAAAAAAABK4/hVtRNSybOp83L1ilHsfrj_8wKwOCjJbKQCLcBGAsYHQ/s320/call_impdoc4.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-e3MvxARef8Y/YG9iFOpqzmI/AAAAAAAABK8/xO_DkJR2M0g-xcwx43ZanRGLqAhPlwuzwCLcBGAsYHQ/s600/call_impdoc5.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-e3MvxARef8Y/YG9iFOpqzmI/AAAAAAAABK8/xO_DkJR2M0g-xcwx43ZanRGLqAhPlwuzwCLcBGAsYHQ/s320/call_impdoc5.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Gb5Uxv_3WgI/YG9iFJ3FJJI/AAAAAAAABLA/fGznVqVRK5YY5-fk5sPzDuqPkWoQf3XMwCLcBGAsYHQ/s600/call_impdoc6.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Gb5Uxv_3WgI/YG9iFJ3FJJI/AAAAAAAABLA/fGznVqVRK5YY5-fk5sPzDuqPkWoQf3XMwCLcBGAsYHQ/s320/call_impdoc6.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-67NXrkqLnh0/YG9iFlkYxHI/AAAAAAAABLE/y3caGBhDPtIEUNxh4tEtVWDcymgxG00ggCLcBGAsYHQ/s600/call_impdoc7.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-67NXrkqLnh0/YG9iFlkYxHI/AAAAAAAABLE/y3caGBhDPtIEUNxh4tEtVWDcymgxG00ggCLcBGAsYHQ/s320/call_impdoc7.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-LhJCNsAX_20/YG9iGBjH01I/AAAAAAAABLI/3x6QmfIT-J0JumwjFlwy9ZCjWteowE9dACLcBGAsYHQ/s600/call_impdoc8.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-LhJCNsAX_20/YG9iGBjH01I/AAAAAAAABLI/3x6QmfIT-J0JumwjFlwy9ZCjWteowE9dACLcBGAsYHQ/s320/call_impdoc8.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-ZKaRGOvACQM/YG9iGHPiBUI/AAAAAAAABLM/dI_tueGpCPkQIFGnFn2eYZ0eV5Mu54t8wCLcBGAsYHQ/s600/call_impdoc9.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-ZKaRGOvACQM/YG9iGHPiBUI/AAAAAAAABLM/dI_tueGpCPkQIFGnFn2eYZ0eV5Mu54t8wCLcBGAsYHQ/s320/call_impdoc9.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-MH3JnR-P0VM/YG9iEVpyw8I/AAAAAAAABKs/05VpaIOl4HI29SdMLA7Gh0HapIowt8jowCLcBGAsYHQ/s600/call_impdoc10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-MH3JnR-P0VM/YG9iEVpyw8I/AAAAAAAABKs/05VpaIOl4HI29SdMLA7Gh0HapIowt8jowCLcBGAsYHQ/s320/call_impdoc10.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-brUCT2PJHYs/YG9iGVy64oI/AAAAAAAABLQ/jHB9aNChQSw2vcTkEHv4UbHFkrzkEGK2ACLcBGAsYHQ/s600/call_sticky.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-brUCT2PJHYs/YG9iGVy64oI/AAAAAAAABLQ/jHB9aNChQSw2vcTkEHv4UbHFkrzkEGK2ACLcBGAsYHQ/s320/call_sticky.png" /></a></div><p></p>There are some easter eggs, including the story of Major Imp. It also contains the <a href="https://en.wikipedia.org/wiki/Arecibo_message">Arecibo Message</a> (which I wasted a lot of time trying to interpret during the contest). You'll also see a lot of the elements of the scene (bits of saucer etc).<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-clymkNagV-U/YG9iy0yPC4I/AAAAAAAABLw/5cLr7yXXJ5YS__pE6URMcz_HOEid5GE5wCLcBGAsYHQ/s600/call_transmission-buffer.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-clymkNagV-U/YG9iy0yPC4I/AAAAAAAABLw/5cLr7yXXJ5YS__pE6URMcz_HOEid5GE5wCLcBGAsYHQ/s320/call_transmission-buffer.png" /></a></div><br /><p>Let's start with the prefix we've been given in the photo of the organisers, (?[IFPC])![27] -> (0)ICCICIICPCCCICIICPCICIIIICP. Look for that marker and match things to the symbol table, and we find that we're setting <span style="font-family: courier;">giveMeAPresent</span>, and the value looks like text, which turns out to be <span style="font-family: courier;">OPE</span>. 
If you decided to try cracking purchase codes earlier you might recognise this as the encryption key for <span style="font-family: courier;">vmu-code</span>, and indeed this prefix gives us that page:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-JJOJzU-7J5M/YHUyKx4FB2I/AAAAAAAABMo/AYYJcKVt0qQ8h1TGAYF6j7AKBPFBUeNjQCLcBGAsYHQ/s600/023_vmu-reg-code.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-JJOJzU-7J5M/YHUyKx4FB2I/AAAAAAAABMo/AYYJcKVt0qQ8h1TGAYF6j7AKBPFBUeNjQCLcBGAsYHQ/s320/023_vmu-reg-code.png" /></a></div> <p></p><p>There is a symbol called <span style="font-family: courier;">vmuRegCode</span>, so maybe we need to put it in there? Unfortunately that doesn't seem to have any effect, and if we also try setting <span style="font-family: courier;">vmuMode</span> to some obvious small values we get an error page. We'll come back to this later.</p><p>By the way, "Out of Band II" is a reference to a spaceship in "A Fire Upon The Deep", a most excellent piece of science fiction.<br /></p><p>The steganography page mentioned yellow dots. If you look at the contest history closely, some of the letters are in yellow. And all of them are either i, c, f or p. Put them all together (chronologically) and you get another prefix, which decodes as (?[IFPFCC])F -> (0)P, and matching that pattern to the symbol table tells us that it's setting <span style="font-family: courier;">hillsEnabled</span> to 1. 
Unfortunately, while the hills appear, something has gone very wrong:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-FcNsAS4jG84/YG9l5xR_PRI/AAAAAAAABMA/yXoOUOAlKXAyvfEpX6y2A8_pa123KPZCQCLcBGAsYHQ/s600/024_history-hillsEnabled.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-FcNsAS4jG84/YG9l5xR_PRI/AAAAAAAABMA/yXoOUOAlKXAyvfEpX6y2A8_pa123KPZCQCLcBGAsYHQ/s320/024_history-hillsEnabled.png" /></a></div><p></p><p>If you compare this to the target image, you'll also see that the hills aren't in quite the right places. There was the clue earlier about some parabolas being swapped, but we'll need to learn more about how to read Fuun code before we can try to fix that.</p><p>What does the steganography page mean about the one image being hidden in the other? One of the simplest forms of steganography is to put information in the least significant bits. If we take each pixel, keep only the LSB of each channel, and multiply it by 255, we get this, showing the hidden image:<br /></p><p></p><div class="separator" style="clear: both; text-align: center;"><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-S4dVWqhu_Sw/YHXdxdQQU_I/AAAAAAAABOE/2KCH4PCq5CUaXnIXUB2ebG-VMbEnDa5QgCLcBGAsYHQ/s600/025_stego_stego.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-S4dVWqhu_Sw/YHXdxdQQU_I/AAAAAAAABOE/2KCH4PCq5CUaXnIXUB2ebG-VMbEnDa5QgCLcBGAsYHQ/s320/025_stego_stego.png" /></a></div></div><p>So let's try the same thing on all the other images we have. 
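</p><p>The least-significant-bit trick is easy to express in code. The following is a minimal sketch on plain RGB tuples rather than a real image file; the function name <span style="font-family: courier;">reveal_lsb</span> is hypothetical, and loading and saving the pixels (for example with an imaging library) is left out.</p>

```python
def reveal_lsb(pixels):
    """Keep only the least significant bit of each channel, scaled to 0 or 255."""
    return [tuple((channel & 1) * 255 for channel in pixel) for pixel in pixels]


# Two example pixels: the low bits are (1, 0, 1) and (0, 0, 1).
cover = [(201, 100, 53), (64, 254, 7)]
print(reveal_lsb(cover))  # [(255, 0, 255), (0, 0, 255)]
```

<p>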
You can easily miss it (I did this time around, and only rediscovered it when reading our team blog again), but the left side of ET contains the number 9546. It's not immediately clear what that's good for though; if you try it as a repair guide page number it won't work, nor is it the encryption code for <span style="font-family: courier;">help-beautiful-numbers</span>.<br /></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-m0p6hJPztPY/YHXeoUga24I/AAAAAAAABOQ/G5HkppCwdzEABmRcN_LJmRID3XwDdom7wCLcBGAsYHQ/s600/025_stego_et.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-m0p6hJPztPY/YHXeoUga24I/AAAAAAAABOQ/G5HkppCwdzEABmRcN_LJmRID3XwDdom7wCLcBGAsYHQ/s320/025_stego_et.png" /></a></div><p>One of the symbols is called <span style="font-family: courier;">hitWithTheClueStick</span>, which sounds like it should give us a clue. If you extract it, you'll notice that there are a lot of I's and C's, very few P's, and no F's. Numbers are encoded with I's and C's terminated by a P, so this looks more like data than code. The first chunk also seems to have some patterns in it. Maybe the Arecibo Message is a hint? It was a string of bits that was intended to be wrapped into a bitmap. 
In that case the length was the product of two primes, making it easy to guess the dimensions, whereas we have 12800 bits up to the first P. Still, some trial and error shows that a 16×800 bitmap is readable:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-R0ofX_M5zAo/YHXzzfzaduI/AAAAAAAABPo/9KpPyU9Eu9YWoz2Oiz8cuf2LsBLSUkUggCLcBGAsYHQ/s800/026b_clue0.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="800" data-original-width="16" height="320" src="https://1.bp.blogspot.com/-R0ofX_M5zAo/YHXzzfzaduI/AAAAAAAABPo/9KpPyU9Eu9YWoz2Oiz8cuf2LsBLSUkUggCLcBGAsYHQ/s320/026b_clue0.png" /></a></div>Well, that's a pretty strong clue that the next chunk is going to be a PNG file. If we assume every 8 characters produce one byte (little endian), we get this:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-lBRx-Rjx5JY/YHX0Npp_BmI/AAAAAAAABPw/fG4HQc4olEIDyRgyEhDTqiIB4og-YWsBQCLcBGAsYHQ/s135/026c_clue1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="29" data-original-width="135" src="https://1.bp.blogspot.com/-lBRx-Rjx5JY/YHX0Npp_BmI/AAAAAAAABPw/fG4HQc4olEIDyRgyEhDTqiIB4og-YWsBQCLcBGAsYHQ/s0/026c_clue1.png" /></a></div><p>Ok, so the next chunk is going to be an audio file. We don't know what type, but there are tools for identifying file types from content, which tell us it's an MP3. And it's a voice reading out I's, C's, F's and P's. 
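</p><p>Going back to the decoding step for a moment: turning runs of I's and C's into rows of a bitmap, or into bytes, takes only a few lines. This sketch assumes I=0 and C=1 with the least significant bit first; the function names are invented for illustration, and the sample input is the start of the PNG signature encoded by hand rather than data taken from <span style="font-family: courier;">hitWithTheClueStick</span>.</p>

```python
def wrap_rows(acids, width):
    """Cut a string of acids into rows of the given width, e.g. 16 for a 16x800 bitmap."""
    return [acids[i:i + width] for i in range(0, len(acids), width)]


def acids_to_bytes(acids):
    """Convert a string of I (0) and C (1) to bytes, 8 acids per byte, LSB first."""
    if len(acids) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    out = bytearray()
    for chunk in wrap_rows(acids, 8):
        value = 0
        for bit, acid in enumerate(chunk):
            if acid == "C":
                value |= 1 << bit
            elif acid != "I":
                raise ValueError("unexpected acid %r" % acid)
        out.append(value)
    return bytes(out)


def bytes_to_acids(data):
    """Inverse of acids_to_bytes, used here only to build sample input."""
    return "".join("C" if (byte >> bit) & 1 else "I"
                   for byte in data for bit in range(8))


png_magic = b"\x89PNG"
encoded = bytes_to_acids(png_magic)
print(encoded[:8])                           # CIICIIIC
print(acids_to_bytes(encoded) == png_magic)  # True
```

<p>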
The recording is a bit tedious to transcribe (it helps to use some software to slow it down), but it produces the prefix (?[IFPFP])![7] -> (0)CCICCCC, which gives us this image:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-edCGwPcKC2A/YHX1HKhaB0I/AAAAAAAABP4/0SLPxcth41wD45U-FRb-R1KKjfey0jKbwCLcBGAsYHQ/s600/029_beautiful_numbers.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-edCGwPcKC2A/YHX1HKhaB0I/AAAAAAAABP4/0SLPxcth41wD45U-FRb-R1KKjfey0jKbwCLcBGAsYHQ/s320/029_beautiful_numbers.png" /></a></div><p>Maybe some beautiful numbers are repair guide pages? 6, 28 and 496 don't do anything special, but 8128 causes my simulator to crash, so it's definitely special, but we don't yet know why.<br /></p><p>We also now have documentation on a lot of the functions. One of these is <span style="font-family: courier;">printGeneTable</span>, where we can see a boolean argument controlling integrity checking. We've already been given some hints that various parts of the DNA are damaged, and knowing which parts might help. While we're at it, let's see if we can fix the damaged entries: since they're visible in the source, they can't be that badly damaged. We saw that each symbol name is preceded by the address and size, but what else? In between the symbols are bits of code that look like this:</p><p>IPPIICCCPPP<br />IPPCICCCPICIC</p><p>The general pattern seems to be "IPP", a number (which increments by 1 each time), then P or IC, and another P or IC. If we assume the last two are boolean flags, then they're probably quoted (since F is false and P is true, and those become P and IC when quoted). 
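</p><p>To check that reading against the two samples, here is a small sketch that splits such a fragment into its counter and two flags. The function name <span style="font-family: courier;">parse_marker</span> is hypothetical, and the interpretation (quoted booleans, with P for false and IC for true) is the assumption just stated rather than anything confirmed by the contest documentation.</p>

```python
def parse_marker(fragment):
    """Split 'IPP' + number + two quoted booleans into (number, flag1, flag2)."""
    if not fragment.startswith("IPP"):
        raise ValueError("fragment does not start with IPP")
    pos = 3
    # Little-endian number: I=0, C=1, terminated by P.
    value = 0
    bit = 0
    while fragment[pos] != "P":
        if fragment[pos] == "C":
            value |= 1 << bit
        elif fragment[pos] != "I":
            raise ValueError("unexpected acid in number at offset %d" % pos)
        bit += 1
        pos += 1
    pos += 1  # skip the terminating P
    flags = []
    for _ in range(2):
        if fragment.startswith("IC", pos):
            flags.append(True)   # quoted P (true)
            pos += 2
        elif fragment.startswith("P", pos):
            flags.append(False)  # quoted F (false)
            pos += 1
        else:
            raise ValueError("expected P or IC at offset %d" % pos)
    if pos != len(fragment):
        raise ValueError("trailing acids after the flags")
    return value, flags[0], flags[1]


print(parse_marker("IPPIICCCPPP"))    # (28, False, False)
print(parse_marker("IPPCICCCPICIC"))  # (29, True, True)
```

<p>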
Match them up to the symbols, and one can deduce that these appear after the name, that the first boolean indicates whether the entry is damaged, and that the second probably indicates whether it is a function or data.</p><p>So, let's just flip the damaged flag to fix the entries! Wait a second, how do we do that without changing the size? Fortunately the number immediately before is variable-size, so if we change IC to P, we can just insert an extra zero bit into the number (without changing the value) to make up the space. Here's my Python code for this:</p><p><span style="font-family: courier;">import re<br /><br />code = re.sub(<br /> '([CF]{23}IC[CF]{23}IC(?:[CF]{8}IC){1,}FFFFFFFFICIPP[CI]*)PIC',<br /> r'\1IPP', code)</span><br /></p><p>Next, we push a <span style="font-family: courier;">P</span> onto the stack (to enable the integrity checking) and call <span style="font-family: courier;">printGeneTable</span> (followed by <span style="font-family: courier;">terminate</span>), and:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-KOq5TyNmIhU/YHXm_HrCqTI/AAAAAAAABOc/NR2pxtFmB0wGVBotTZCpJ8e8jqvKfc76ACLcBGAsYHQ/s600/027_gene_table_fixed_00.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-KOq5TyNmIhU/YHXm_HrCqTI/AAAAAAAABOc/NR2pxtFmB0wGVBotTZCpJ8e8jqvKfc76ACLcBGAsYHQ/s320/027_gene_table_fixed_00.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-n_PbKcliy7A/YHXm_a3xG9I/AAAAAAAABOg/uYMWfImWiDwMsJfV0QTl11bJkGdr0YqmQCLcBGAsYHQ/s600/027_gene_table_fixed_01.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-n_PbKcliy7A/YHXm_a3xG9I/AAAAAAAABOg/uYMWfImWiDwMsJfV0QTl11bJkGdr0YqmQCLcBGAsYHQ/s320/027_gene_table_fixed_01.png" /></a></div><br /><div 
class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-kS1JVod1Yhw/YHXm_G0IWSI/AAAAAAAABOY/xo_edtk7rLkUFzUUUkus25wltVF5r2CjwCLcBGAsYHQ/s600/027_gene_table_fixed_02.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-kS1JVod1Yhw/YHXm_G0IWSI/AAAAAAAABOY/xo_edtk7rLkUFzUUUkus25wltVF5r2CjwCLcBGAsYHQ/s320/027_gene_table_fixed_02.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-tZ6yFYH7MlQ/YHXnAG1UGLI/AAAAAAAABOk/9IV7RYNCiksrIq224--7Dh6a1UzH7lgYwCLcBGAsYHQ/s600/027_gene_table_fixed_03.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-tZ6yFYH7MlQ/YHXnAG1UGLI/AAAAAAAABOk/9IV7RYNCiksrIq224--7Dh6a1UzH7lgYwCLcBGAsYHQ/s320/027_gene_table_fixed_03.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-dS4HYik4fgI/YHXnAdGc-yI/AAAAAAAABOo/M1pMWlpPbA01-2ndygjFaOS620NZb3pXQCLcBGAsYHQ/s600/027_gene_table_fixed_04.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-dS4HYik4fgI/YHXnAdGc-yI/AAAAAAAABOo/M1pMWlpPbA01-2ndygjFaOS620NZb3pXQCLcBGAsYHQ/s320/027_gene_table_fixed_04.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-O77wUmHd1R4/YHXnAobFVaI/AAAAAAAABOs/w2gkZmcykS8GRviLrnGNJ5iN1-aBJjd5QCLcBGAsYHQ/s600/027_gene_table_fixed_05.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-O77wUmHd1R4/YHXnAobFVaI/AAAAAAAABOs/w2gkZmcykS8GRviLrnGNJ5iN1-aBJjd5QCLcBGAsYHQ/s320/027_gene_table_fixed_05.png" 
/></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-E0Vc49EMn6k/YHXnA2jbchI/AAAAAAAABOw/-M6YzO2GxG0P1YlLQfq39dcEOkt1vrmTwCLcBGAsYHQ/s600/027_gene_table_fixed_06.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-E0Vc49EMn6k/YHXnA2jbchI/AAAAAAAABOw/-M6YzO2GxG0P1YlLQfq39dcEOkt1vrmTwCLcBGAsYHQ/s320/027_gene_table_fixed_06.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QYanewlEfgc/YHXnBTumXOI/AAAAAAAABO0/z8VeST3FuGcoGnfve8S9dRzbda5RHHxtwCLcBGAsYHQ/s600/027_gene_table_fixed_07.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-QYanewlEfgc/YHXnBTumXOI/AAAAAAAABO0/z8VeST3FuGcoGnfve8S9dRzbda5RHHxtwCLcBGAsYHQ/s320/027_gene_table_fixed_07.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-UbaPFDuTD7A/YHXnBp8jhwI/AAAAAAAABO4/HJO-ookOI90JTC6g7maJyUMJq_yGKd4WgCLcBGAsYHQ/s600/027_gene_table_fixed_08.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-UbaPFDuTD7A/YHXnBp8jhwI/AAAAAAAABO4/HJO-ookOI90JTC6g7maJyUMJq_yGKd4WgCLcBGAsYHQ/s320/027_gene_table_fixed_08.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-MxFX7Vk_Bw4/YHXnBp-_aBI/AAAAAAAABO8/z528Gb16CrIu5ON_JfFCl7ARk-LzSWXbgCLcBGAsYHQ/s600/027_gene_table_fixed_09.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" 
src="https://1.bp.blogspot.com/-MxFX7Vk_Bw4/YHXnBp-_aBI/AAAAAAAABO8/z528Gb16CrIu5ON_JfFCl7ARk-LzSWXbgCLcBGAsYHQ/s320/027_gene_table_fixed_09.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-xFIHl33lcZ8/YHXnBxqAYaI/AAAAAAAABPA/BjE_cqHXB8sV7ksUZh0Al6QlO4R-pNNVACLcBGAsYHQ/s600/027_gene_table_fixed_10.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-xFIHl33lcZ8/YHXnBxqAYaI/AAAAAAAABPA/BjE_cqHXB8sV7ksUZh0Al6QlO4R-pNNVACLcBGAsYHQ/s320/027_gene_table_fixed_10.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-V5npF5l-DXY/YHXnCI4sKTI/AAAAAAAABPE/gwbObJKu0y8ov_sP9ITsQ55i7-SoPvwKwCLcBGAsYHQ/s600/027_gene_table_fixed_11.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-V5npF5l-DXY/YHXnCI4sKTI/AAAAAAAABPE/gwbObJKu0y8ov_sP9ITsQ55i7-SoPvwKwCLcBGAsYHQ/s320/027_gene_table_fixed_11.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-VleHYGhcFpY/YHXnCUvx5HI/AAAAAAAABPI/ZaIHvGZPPBMqmBIXArriuOxCKU5prGlnACLcBGAsYHQ/s600/027_gene_table_fixed_12.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-VleHYGhcFpY/YHXnCUvx5HI/AAAAAAAABPI/ZaIHvGZPPBMqmBIXArriuOxCKU5prGlnACLcBGAsYHQ/s320/027_gene_table_fixed_12.png" /></a></div><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-6TvRf6S5sjs/YHXnCmcyT9I/AAAAAAAABPM/ck4iZN7jWNElM8G_sWGy8BHfse8V0U0iQCLcBGAsYHQ/s600/027_gene_table_fixed_13.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" 
data-original-width="600" height="320" src="https://1.bp.blogspot.com/-6TvRf6S5sjs/YHXnCmcyT9I/AAAAAAAABPM/ck4iZN7jWNElM8G_sWGy8BHfse8V0U0iQCLcBGAsYHQ/s320/027_gene_table_fixed_13.png" /></a></div><br /><p>So indeed, some of the functions are damaged. A few of them were encrypted, and we've decrypted them (help-activating-genes, help-error-correcting-codes, vmu-code), but that still leaves more. We also have a few potential passwords (<span style="font-family: courier;">no1@Ax3</span>, <span style="font-family: courier;">9546</span>, <span style="font-family: courier;">8128</span>, and <span style="font-family: courier;">Out_of_Band_II</span>), and we now have a tool that will tell us if we've decrypted with the correct password. Some trial and error will show that <span style="font-family: courier;">9546</span> decrypts <span style="font-family: courier;">cow-tail</span> and <span style="font-family: courier;">Out_of_Band_II</span> decrypts <span style="font-family: courier;">caravan</span>.</p><p>What about <span style="font-family: courier;">cow-spot-middle</span>? There is the interestingly-named <span style="font-family: courier;">cow-spot-middle-ecc</span> symbol, of the same size, and the help page on error-correcting codes suggested we should expect 4 parity bits per 4 data bits. There is also the <span style="font-family: courier;">correctErrors</span> function in the symbol table. You'll find that calling that first (with the right arguments) will cause <span style="font-family: courier;">cow-spot-middle</span> to pass. When calling functions with multiple arguments, keep in mind that the first argument gets pushed to the stack first, and hence is further from the top (front) of the stack; and that integers must be encoded with 24 acids.</p><p>What about that page on palindromes? The text fades away, but you can find all the useful information in the raw text. It strongly suggests that some code might have a copy stored backwards as a backup. 
That would be the same size, so let's see what happens if we group symbols by size. We know <span style="font-family: courier;">cloud</span> is one of the damaged symbols, and there is a symbol of the same size called <span style="font-family: courier;">duolc</span> - cloud spelled backwards! And indeed, copying and reversing <span style="font-family: courier;">duolc</span> over <span style="font-family: courier;">cloud</span> will fix it.</p><p>How about <span style="font-family: courier;">sun</span>? There is a function called <span style="font-family: courier;">sunflower</span>, which is exactly the same size, and there are no sunflowers in the picture. In the case of <span style="font-family: courier;">cloud<span style="font-family: inherit;">-</span>duolc</span>, the alteration of the name was a clue — so maybe <span style="font-family: courier;">sunflower</span> is somehow a combination of <span style="font-family: courier;">sun</span> and <span style="font-family: courier;">flower</span>? Again a little trial and error is needed to consider the possibilities, but it turns out that it is their XOR (again with I=0, C=1, F=2, P=3).</p><p>Let's take a break from examining functions and try just poking some of the variables we can see. I'll assume that the functions have been fixed as appropriate. Clearly we need to turn Endo into a cow, and the VMU help page hinted that there is a Biomorphological Unit that might help. 
What happens if we set <span style="font-family: courier;">enableBioMorph</span> to P (true)?</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-lg-VIu6qjf8/YH3kdkOBlTI/AAAAAAAABQQ/quaOc4ZRsQ4ZlepFGtXnpDo6wj_kNL1iwCLcBGAsYHQ/s600/031_bmu.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-lg-VIu6qjf8/YH3kdkOBlTI/AAAAAAAABQQ/quaOc4ZRsQ4ZlepFGtXnpDo6wj_kNL1iwCLcBGAsYHQ/s320/031_bmu.png" /></a></div><br /><p>Well, he's roughly the right shape now, but he's an <a href="https://ocaml.org/">OCaml</a>. But there is a variable called <span style="font-family: courier;">ocamlrules</span>. Maybe if we change that from P (true) to F (false)?</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Mnlb7zvVd0c/YH3lOd-CKEI/AAAAAAAABQY/4MVne23xVnImzhRkhXmoXbdd2RZLb6pfgCLcBGAsYHQ/s600/032_mlephant.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-Mnlb7zvVd0c/YH3lOd-CKEI/AAAAAAAABQY/4MVne23xVnImzhRkhXmoXbdd2RZLb6pfgCLcBGAsYHQ/s320/032_mlephant.png" /></a></div><p>Clearly the contest organisers had a lot of fun. 
Well the BMU should be adapting Endo to the local conditions, and both camels and elephants are found in dry areas — maybe if we change the weather (it's a 24-acid number)?</p><p>weather=1:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-EghUsDg5LHs/YH3l2C9j_cI/AAAAAAAABQo/NAXQFMfRJI4zY_P-9DpWNmL1Virxq2fQwCLcBGAsYHQ/s600/033_weather_1.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-EghUsDg5LHs/YH3l2C9j_cI/AAAAAAAABQo/NAXQFMfRJI4zY_P-9DpWNmL1Virxq2fQwCLcBGAsYHQ/s320/033_weather_1.png" /></a></div>weather=2:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-eYokqAF30IY/YH3l10WByWI/AAAAAAAABQg/ERiBy3fa090CDp_6wNTG2OQcavu47KriwCLcBGAsYHQ/s600/033_weather_2.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-eYokqAF30IY/YH3l10WByWI/AAAAAAAABQg/ERiBy3fa090CDp_6wNTG2OQcavu47KriwCLcBGAsYHQ/s320/033_weather_2.png" /></a></div>weather=3:<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-u3dYkY37E_g/YH3l14tYKqI/AAAAAAAABQk/iyonzwrsTQ8NehBl7ZqG0_X_P7zEg8ZWACLcBGAsYHQ/s600/033_weather_3.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-u3dYkY37E_g/YH3l14tYKqI/AAAAAAAABQk/iyonzwrsTQ8NehBl7ZqG0_X_P7zEg8ZWACLcBGAsYHQ/s320/033_weather_3.png" /></a></div><p>(remember to fix <span style="font-family: courier;">clouds</span> and <span style="font-family: courier;">sun</span> before trying these). 
Other values of weather don't seem to do anything.<br /></p><p>So we can get the clouds to appear (although not in the right places or with the right sizes), together with some unwanted rain and lightning, and a bunch of things suddenly become very German. We can also get the sun to appear, in the right place, but too yellow and without the pattern in the middle. We can get Endo to turn half-way into a cow.</p><p>Notice how when weather is 1 or 2, there is also a faint shadow in the outline of the desired cow shape. Maybe the real cow is there but just almost completely transparent? We'll come back to this later.</p><p>There is a help page about Lindenmayer systems, and if you've played with them, the weeds (brown sticks on the left horizon) might look related. The impdoc page on <span style="font-family: courier;">lsystem-weeds</span> is also a strong hint that these are L-systems with a random component. You can try calling the function yourself with different depths, but none of the options match the target. Maybe the target uses different random numbers? There is a <span style="font-family: courier;">seed</span> variable, so you can try poking different numbers in there, and indeed the weeds will change. But what is the right value? Well the help page on L-systems refers to <i>The Algorithmic Beauty of Plants</i>, and we know that perfect numbers are considered beautiful, so let's try some of those. And indeed the 4th perfect number (8128) is the right value:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-mgxPa01ziME/YH3o3miybdI/AAAAAAAABQ0/SySlkVmtzs8W1clE-Ttp9vLb4RqC6ukMACLcBGAsYHQ/s600/034_seed.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-mgxPa01ziME/YH3o3miybdI/AAAAAAAABQ0/SySlkVmtzs8W1clE-Ttp9vLb4RqC6ukMACLcBGAsYHQ/s320/034_seed.png" /></a></div><p>What about the windmill? 
Windmills rotate, so if anything uses polar coordinates it'll probably be that. Let's write to <span style="font-family: courier;">polarAngleIncr</span> (it also works to call <span style="font-family: courier;">setGlobalPolarRotation</span>). You can use trial and error to home in on the correct angle, or you can recall that Fuuns use 256 angle steps in a circle and measure the angle adjustment you need to make (although you'll still need an experiment or two to determine the sign convention). You'll want to set it to 5:</p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-8ODiPcGDkY4/YH3xXZuQehI/AAAAAAAABRE/Oo7WsYiJkREbsb2v7spVx0BoR5WojvI7wCLcBGAsYHQ/s600/035_windmill.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-8ODiPcGDkY4/YH3xXZuQehI/AAAAAAAABRE/Oo7WsYiJkREbsb2v7spVx0BoR5WojvI7wCLcBGAsYHQ/s320/035_windmill.png" /></a></div><br /> <p></p><p>We haven't yet used the clues about <span style="font-family: courier;">bioMul</span> and super-adaptive genes. It looks suspiciously like functional programming, and it turns out that there is an entire functional language hidden away inside this machine! The Major Imp episodes hinted at this (Imp being short for Imperative). The terminology is a little confusing though. I'll try to explain it, but as I'm not very familiar with functional programming I'll probably make real functional programmers cry.</p><p>"Genes" are functions (the imperative functions we've seen so far are also called "genes", but they're not the same). "Adaptation Trees" are expressions. An Adaptation Tree consists of the address of a gene (function) followed by a sequence of arguments. Some of those arguments are themselves expected to be expressions; where an expression is expected, it is always represented by its length. 
One of the most common genes to use is activateAdaptationTree, which is followed by the address of an adaptation tree in the green zone. In this case, it first evaluates the expression in that adaptation tree (which resolves to a function, because this is a lambda calculus where <i>everything</i> is a function, including integers), which is then invoked with the arguments.</p><p>In the symbol table, symbols with the <span style="font-family: courier;">_adaptation</span> suffix are adaptation trees. There are also some functional genes (like <span style="font-family: courier;">k</span> and <span style="font-family: courier;">intBox</span>) that don't have the suffix. We can use the information we've been given about the encoding to see what bioAdd looks like:</p><p>[ activateAdaptationTree caseVar1_adaptation<br /> [ activateAdaptationTree var2_adaptation ]<br /> [ activateAdaptationTree apply1_adaptation<br /> [ activateAdaptationTree bioSucc_adaptation ]<br /> [ activateAdaptationTree apply2_adaptation<br /> [ activateAdaptationTree bioAdd_adaptation ]<br /> [ activateAdaptationTree var1_adaptation ]<br /> [ activateAdaptationTree var2_adaptation ]<br /> ]<br /> ]<br />]<br /></p><p>One subtlety I didn't appreciate at first is that the BioNat type isn't a function taking two arguments (although note that <a href="https://en.wikipedia.org/wiki/Currying">currying</a> is prevalent), but a function taking a pair (which is a distinct data type). So working through this and reading the Fuun docs, we see that if var1 is 0, it returns var2, otherwise it returns the successor of something. apply2 will call var1 and var2 on the pair (getting the first and second elements), reassemble them into a pair, then pass it to bioAdd. That is actually somewhat redundant and the whole apply2 subtree could be replaced by using bioAdd directly, but no matter. This matches the description of bioAdd.</p><p>That should give us what we need to implement bioMul. 
Here's my implementation:</p><p>[ activateAdaptationTree caseVar1_adaptation<br /> [ activateAdaptationTree bioZero_adaptation ]<br /> [ activateAdaptationTree apply2_adaptation<br /> [ activateAdaptationTree bioAdd_adaptation ]<br /> [ activateAdaptationTree var2_adaptation ]<br /> [ activateAdaptationTree bioMul_adaptation ]<br /> ]<br />]<br /></p><p>In other words, 0 * y = 0, and (x+1) * y = y + x*y. My original implementation had the last two lines swapped around, which makes no semantic difference, but the implementation of this functional language is pretty simplistic and tends to evaluate some expressions multiple times in a way that can cause a stack explosion. I spent a lot of frustrated time debugging integer overflow errors as a result.</p><p>When you patch the adaptation tree in the green zone, remember that it must be preceded by the length and followed by the length of the rest of the green zone. Unfortunately making this change doesn't seem to have any effect. If you poke around the other adaptation trees for a while you might find <span style="font-family: courier;">biomorph_adaptation</span>, which is a big complicated expression starting with this:</p><p>[ activateAdaptationTree enableBioMorph_adaptation<br /> [ activateAdaptationTree payloadBioMorph_adaptation<br /> [ activateAdaptationTree fromNat_adaptation<br /> [ activateAdaptationTree apply2_adaptation<br /> [ activateAdaptationTree bioMul_adaptation ]<br /> [ k<br /> [ activateAdaptationTree mkSucc_adaptation<br /> [ activateAdaptationTree mkZero_adaptation ]<br /> ]<br /> ]<br /> [ k<br /> [ activateAdaptationTree mkSucc_adaptation<br /> [ activateAdaptationTree mkSucc_adaptation<br /> [ activateAdaptationTree mkSucc_adaptation<br /> [ activateAdaptationTree mkZero_adaptation ]<br /> ]<br /> ]<br /> ]<br /> ]<br /> </p><p>So it looks like it's constructing some constants (1 and 3) and multiplying them (<span style="font-family: courier;">k</span> takes a value and returns a constant 
function that always returns that value), and obviously it's depending on us to fix bioMul. But there is also that <span style="font-family: courier;">enableBioMorph_adaptation</span> adaptation right at the top. What does that do?</p><p>[ false ]</p><p>Aha! Confusingly, it has nothing to do with enableBioMorph. If we change it to [true], then the clumps of grass move into the correct positions:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-QtfWl0U9ftk/YH3xcIbhVNI/AAAAAAAABRI/EpUBiK0Oma8n6IsoGjBBiC2mlr9fs8CMwCLcBGAsYHQ/s600/036_grass.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-QtfWl0U9ftk/YH3xcIbhVNI/AAAAAAAABRI/EpUBiK0Oma8n6IsoGjBBiC2mlr9fs8CMwCLcBGAsYHQ/s320/036_grass.png" /></a></div><p>Looking through the <span style="font-family: courier;">_adaptation</span> symbols, there seem to be a few related to the goldfish. 
In particular, take a look at <span style="font-family: courier;">goldenFish_adaptation</span>:</p><p>[ activateAdaptationTree mkBeforeAbove_adaptation<br /> [ activateAdaptationTree mkEmp_adaptation<br /> [ intBox 20 ]<br /> [ intBox 20 ]<br /> ]<br /> [ activateAdaptationTree mkBeforeAbove_adaptation<br /> [ activateAdaptationTree mkAbove_adaptation<br /> [ activateAdaptationTree mkBeforeAbove_adaptation<br /> [ activateAdaptationTree mkGoldfishL_adaptation ]<br /> [ activateAdaptationTree mkBeforeAbove_adaptation<br /> [ activateAdaptationTree mkEmp_adaptation<br /> [ intBox 8388606 ]<br /> [ intBox 8388607 ]<br /> ]<br /> [ activateAdaptationTree mkGoldfishR_adaptation ]<br /> ]<br /> ]<br /> [ activateAdaptationTree mkBeforeAbove_adaptation<br /> [ activateAdaptationTree mkEmp_adaptation<br /> [ intBox 45 ]<br /> [ intBox 24 ]<br /> ]<br /> [ activateAdaptationTree mkGoldfishL_adaptation ]<br /> ]<br /> ]<br /> [ activateAdaptationTree mkBeforeAbove_adaptation<br /> [ activateAdaptationTree mkEmp_adaptation<br /> [ intBox 0 ]<br /> [ intBox 24 ]<br /> ]<br /> [ activateAdaptationTree mkAbove_adaptation<br /> [ activateAdaptationTree mkGoldfishR_adaptation ]<br /> [ activateAdaptationTree mkBeforeAbove_adaptation<br /> [ activateAdaptationTree mkEmp_adaptation<br /> [ intBox 25 ]<br /> [ intBox 29 ]<br /> ]<br /> [ activateAdaptationTree threeFish_adaptation<br /> [ activateAdaptationTree mkGoldfishR_adaptation ]<br /> [ activateAdaptationTree emptyBox_adaptation ]<br /> ]<br /> ]<br /> ]<br /> ]<br /> ]<br />]<br /></p><p>This is describing the layout of the goldfish in some way. You don't need to understand all the details. Just try changing the various numbers to see what happens. Also try changing some of the goldfish from L to R or vice versa. You can get things mostly correct this way, but there just aren't enough goldfish. 
But notice that <span style="font-family: courier;">emptyBox_adaptation</span> right at the end: what if we change it to <span style="font-family: courier;">mkGoldfishR_adaptation</span>?</p><p>The full solution is to swap the direction of all the fish, change the 0 to 18, and swap out the empty box as above. Now we have this:</p><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-t588fBYLepU/YH74Y_1UFiI/AAAAAAAABRY/CDzWQeagL74BB0DQSkvVRE7d_ES4JTdrwCLcBGAsYHQ/s600/037_goldfish.png" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="600" data-original-width="600" height="320" src="https://1.bp.blogspot.com/-t588fBYLepU/YH74Y_1UFiI/AAAAAAAABRY/CDzWQeagL74BB0DQSkvVRE7d_ES4JTdrwCLcBGAsYHQ/s320/037_goldfish.png" /></a></div><br /><p>That's it for the functional programming language in terms of fixing up the scene. While trying to track down bugs in my bioMul implementation I ended up learning a lot more about it, which I'll share just for interest:</p><ul style="text-align: left;"><li>True is a function taking two arguments and returning the first, while false takes two arguments and returns the second.</li><li>Zero is <i>also</i> a function taking two arguments and returning the first. So in fact, <span style="font-family: courier;">true</span>, <span style="font-family: courier;">k</span> and <span style="font-family: courier;">mkZero_adaptation</span> have identical implementations (for whatever reason the last is an adaptation tree that simply calls an anonymous gene, which seems to be quite common).</li><li>Positive integers are functions that take two arguments and call the second with one less than that integer. If this sounds a lot like what <span style="font-family: courier;">caseVar1</span> does, it is. 
There is another (undocumented) adaptation called <span style="font-family: courier;">caseNat</span> which takes a single integer (instead of a pair), and its implementation is an identity function.<br /></li><li>A pair is a functor that takes a function and calls it with the elements of the pair i.e. λf.f(x)(y).</li><li>Apart from the "Nat" type (Peano integers), the language does allow "raw" 24-acid numbers to be used, but only where they're expected (using them in the wrong place causes them to be confused with lengths; you might have already seen this if you try to fully decompile <span style="font-family: courier;">biomorph_adaptation</span>). <span style="font-family: courier;">intBox</span> allows passing such a raw integer to the following expression. <span style="font-family: courier;">fromNat</span> converts a Nat to a raw integer and passes it to the following expression. These are used to interface with imperative code via a special gene called <span style="font-family: courier;">wrapImp</span> (which is in turn wrapped by adaptation trees like <span style="font-family: courier;">wrapSub_adaptation</span>).</li></ul><p>We won't get a whole lot further by just poking at variables. We'll need to start taking apart some of the code to see how it works. For example, we'll discover that there is another repair guide page that we haven't spotted yet. But that's for Part 2. If you made it this far, I hope you're having fun!<br /></p>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-57345543162377941592020-01-26T02:45:00.003-08:002020-01-26T02:45:59.984-08:00Thoughts on Rust<div dir="ltr" style="text-align: left;" trbidi="on">I've recently spent a few days learning to program in Rust, and thought I'd write down my thoughts so far. I used programming contest problems as a way to get practical experience, which probably biases things somewhat. 
For example, I didn't look into multi-threading, testing, or see very much of the standard library.<br /><h3 style="text-align: left;">References</h3><div style="text-align: left;">This is obviously the stand-out feature of Rust. It seems like a nice idea, and eliminating use-after-free, double-free, null pointer dereferences, and many forms of memory leak sounds great from a C/C++ perspective. While modern C++ makes it much easier to manage ownership safely, it doesn't really address borrowing.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The mutability rules also sound like a huge win for safe concurrency, but I haven't looked into any of the details.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The one thing I've found non-intuitive is that it seems sometimes references are implicitly dereferenced (e.g., you can add two references to integers) but in other places they aren't (e.g., adding in place to a mutable reference to integer).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">I haven't really tried using references very heavily: my contest coding style tends not to use a lot of pointers anywhere, preferring standard library structures like vectors. </div><h3 style="text-align: left;">Types</h3><div style="text-align: left;">I found it really annoying that there is no automatic coercion between integer types. What's worse, the typecast operator ("as") doesn't follow the same rules as arithmetic operations, namely panicking on overflow in debug builds. It's also annoying that indexing requires an unsigned type. 
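To make the casting complaint concrete, here is a minimal sketch (the helper names are hypothetical, invented purely for illustration, and not taken from my contest solutions). It contrasts "as", which never panics and just keeps the low bits, with the checked alternatives that report the failure, and shows that a signed value has to be converted before it can be used as an index.

```rust
use std::convert::TryFrom;

/// Narrow with `as`: never panics, silently keeps only the low 8 bits.
fn narrow_with_as(x: i32) -> u8 {
    x as u8
}

/// Narrow with `TryFrom`: returns `None` when the value does not fit.
fn narrow_checked(x: i32) -> Option<u8> {
    u8::try_from(x).ok()
}

/// Add with overflow detection; a plain `+` would panic here in a debug build.
fn add_checked(a: u8, b: u8) -> Option<u8> {
    a.checked_add(b)
}

/// Indexing a slice needs a `usize`, so a signed index is converted first.
fn get_signed(v: &[i32], i: i32) -> Option<i32> {
    let idx = usize::try_from(i).ok()?; // negative values are rejected here
    v.get(idx).copied() // out-of-range indices give `None` instead of a panic
}
```

With these helpers, casting 300 to u8 with "as" quietly produces 44, whereas the TryFrom version returns None, which is closer to how overflowing arithmetic behaves in a debug build.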
If you have an index (of type usize) and an offset to it (of type i32, possibly negative) it is a real pain to add them to produce a new index.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The type inference also seems like spooky action at a distance: I'm all for omitting the type when declaring a variable and inferring it from the initialiser, but inferring it from usage elsewhere in the code takes some getting used to. It leads to weirdness where changing/removing some apparently unrelated line of code can cause a compilation failure, or even silently change the type of a variable. Given that the indexing requires unsigned, I worry that some variable that might need to store negative values could end up unsigned without one being aware of it just because of an inference chain from an index.</div><h3 style="text-align: left;">Overloading</h3><div style="text-align: left;">Coming from C++, it's disappointing to have no function/method overloading, not even default arguments. It leads to having to invent names for lots of variants of basically the same function. And unfortunately I don't think it can be easily fixed, because of the aggressive type inference: overloading uses the types of arguments to select a function overload, but type inference uses the types of formal parameters to infer the types of arguments. So as soon as you add a new overload, you're reducing the power of type inference and potentially making old code no longer valid.</div><h3 style="text-align: left;">Traits and generics</h3><div style="text-align: left;">While it took a bit of getting used to when coming from traditional OO languages like C++ and Python, I quite like the traits system. It's a lot better than Java interfaces, because you can have default implementations, and you can define new traits and bolt them on to existing classes. 
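As a small sketch of those two points (the trait and its method names are made up for illustration), a trait can carry a default implementation, and it can be implemented for types defined elsewhere, such as the built-in integer and boolean types:

```rust
/// A hypothetical trait with one required method and one default method.
trait Describe {
    /// Required: each implementor must supply a name.
    fn name(&self) -> String;

    /// Default implementation, built on top of `name`.
    fn describe(&self) -> String {
        format!("<{}>", self.name())
    }
}

/// Bolt the trait onto an existing built-in type, keeping the default method.
impl Describe for i32 {
    fn name(&self) -> String {
        format!("int {}", self)
    }
}

/// Another existing type, this time overriding the default method.
impl Describe for bool {
    fn name(&self) -> String {
        String::from(if *self { "yes" } else { "no" })
    }

    fn describe(&self) -> String {
        format!("bool:{}", self.name())
    }
}
```

Once the trait is in scope, 7i32.describe() goes through the default method, while true.describe() uses the override.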
It's a much cleaner way to put type bounds on generics than SFINAE in C++, and seems like it probably has many of the same advantages as C++ concepts (not that I've looked at the latest incarnations of C++ concepts).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">I also really like the way traits allow you to use a trait with either static dispatch (ala C++ templates) or dynamic dispatch (ala C++ virtual functions), rather than forcing the API designer to choose one or the other. I suspect there are also some performance advantages: in C++ when using a polymorphic class, you pay for a virtual function call even when the exact class is known to the programmer, unless the compiler can also determine that it is known or the programmer uses final classes/methods. With Rust there is no inheritance, so an object whose type is a concrete struct will have exactly that type.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The one thing I disliked is that methods defined in traits end up looking the same as normal methods on the class. If you're not aware that a particular method is actually provided by a trait (or you don't know which trait), it can mysteriously fail to exist if you forget to import the trait into your namespace.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The generics system still has some way to go before it catches up to C++ e.g. there are no non-type template parameters (although it's being worked on), no variadic templates, and from what I could see, no real specialisation to override more generic implementations.</div><h3 style="text-align: left;">Enums</h3><div style="text-align: left;">I think this is one of the more under-rated features of Rust. Rust enums are really discriminated unions (ala boost::variant), with first-class language support. I particularly like the "?" 
operator: given an expression of type Result&lt;T, E&gt; (which holds either an "ok" or an error), put a "?" after it, and if it is an error it will immediately return it from the function. This means that although Rust doesn't have exceptions in the same way as C++/Java/Python, one can propagate errors with very little boilerplate, and with the benefit that it's explicit where early returns might happen. It certainly looks nicer than what I've seen of Go.</div><h3 style="text-align: left;">Performance</h3><div style="text-align: left;">I translated a few contest solutions from C++ to Rust, and was pleasantly surprised by the performance: generally faster than the C++ code (which might just be because Rust uses LLVM, which is pretty good and often better than GCC, particularly since Codeforces uses a 32-bit GCC). The one area where it was much worse is writing large outputs to stdout, because Rust always makes stdout line-buffered while GCC makes it fully-buffered when it is not a TTY (which is a known issue in Rust). After wrapping a buffer around stdout the performance was good again.</div><h3 style="text-align: left;">Summary</h3><div style="text-align: left;">In general I like Rust, although I think it still needs a few years to mature before I'd consider abandoning C++ for it. While Go seems to be getting all the popularity, I think a high-performance language really needs to avoid garbage collection and provide a strong compile-time generics system.</div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-81731269198305258312018-01-20T10:12:00.004-08:002018-01-20T10:12:51.639-08:00COCI 2017/2018 r5 analysis<div dir="ltr" style="text-align: left;" trbidi="on"><div style="text-align: left;">For the problem texts, see <a href="http://hsin.hr/coci/">here</a>. 
I found this contest easier than usual for a COCI.</div><h2 style="text-align: left;">Olivander</h2><div style="text-align: left;">Given two wands A < B and two boxes X > Y, there is never any point in putting A in X and B in Y; if they fit, then they also fit the other way around. So sort the boxes and wands, match them up in this order and check if it all fits.</div><h2 style="text-align: left;">Spirale</h2><div style="text-align: left;">This is really just an implementation challenge. For each spiral, walk along it, filling in the grid where the spiral overlaps it. With more than one spiral, set each grid point to the minimum of the values induced by each spiral. Of course, it is not necessary to go to \(10^{100}\); about 10000 is enough to ensure that the grid is completely covered by the partial spiral.</div><h2 style="text-align: left;">Birokracija</h2><div style="text-align: left;">The rule for determining which employee does the next task is a red herring. Each employee does exactly one task, and the order doesn't matter. The money earned by an employee is the sum of the distance from that employee to all underlings. This can be computed by a tree DP (computing both this sum and the size of each subtree).</div><h2 style="text-align: left;">Karte</h2><div style="text-align: left;">Suppose there is a solution. It can be modified by a series of transformations into a canonical form. Firstly, moving a false claim below an adjacent true claim will not alter either of them; thus, we can push all the false claims to the bottom and true claims to the top. At this point, the true claims can be freely reordered amongst themselves. Similarly, if we have a false claim with small <i>a</i> above one with a larger <i>a</i>, we can swap them without making either true. So we can assume that the false claims are increasing from bottom to top in the deck. 
Finally, given a true claim with large <i>a</i> and a false claim with small <i>a</i>, we can swap them and they will both flip (so the positions of false claims stay the same).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">After these transformations, the deck will, from bottom to top, contain the largest K cards in increasing order, followed by the rest in arbitrary order (let's say increasing). To solve the problem, we simply construct this canonical form, then check if it indeed satisfies the conditions.</div><h2 style="text-align: left;">Pictionary</h2><div style="text-align: left;">When building roads on the day with factor F, we don't actually need to build roads between every pair of multiples of F: it is equivalent to connect every multiple of F to F. This gives O(N log M) roads. We can represent the connected components after each day with a standard union-find structure. For reasons we'll see later, we won't use path compression, but always putting the smaller component under the larger one in the tree is sufficient to ensure a shallow tree (in theory O(log N), but I found the maximum depth was 6). A slow solution would be to check every remaining query after each day to see whether the two mathematicians are in the same component yet.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">To speed this up, we can record extra information in the tree: for each edge, we record the day on which it was added. If the largest label on a path from A to B is D, then D was the first day on which they were connected (this property would be broken by path compression, which is why we cannot use it). Thus, to answer a query, we need only walk up the tree from each side to the least common ancestor; and given the shallowness of the tree, this is cheap.</div><h2 style="text-align: left;">Planinarenje</h2><div style="text-align: left;">I really liked this problem. Take a single starting peak P. 
Let A be the size of the maximum matching of the graph, and let B be the size of the maximum matching excluding P. Suppose A = B. Then Mirko can win as follows: take the latter matching, and whenever Slavko moves to a valley, Mirko moves to the matched peak. Slavko can never move to a valley without a match, because otherwise the journey would form an augmenting path that would give a matching for the graph of size B + 1.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Conversely, suppose A > B. Then take a whole-graph matching, which by the assumption must include a match for P. Slavko can win by always moving to the matched valley. By a similar argument, Mirko can never reach an unmatched peak, because otherwise toggling all the edges on their journey would give a maximum matching that excludes P.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">To implement it, it may not be efficient enough to construct a new subgraph matching from every peak. Instead, one can start with a full-graph matching, remove P and its match from the graph (if any), then re-augment starting from that match. 
This should give an O(NM) algorithm (O(NM) for the initial matching, then O(M) per query).</div></div>Unknownnoreply@blogger.com6tag:blogger.com,1999:blog-31847281.post-38381065761615770402017-11-26T12:26:00.004-08:002017-11-26T12:26:54.309-08:00Analysis of Croatian Open contest 2017r3<div dir="ltr" style="text-align: left;" trbidi="on">This round of <a href="http://hsin.hr/coci/">COCI</a> was more challenging than usual, with some of the lower-scoring problems being particularly difficult.<br /><h3 style="text-align: left;">Aron</h3><div style="text-align: left;">This was quite straightforward: run through the letters, and each time a letter is different to the previous one (or is the first one), increment a counter.</div><h3 style="text-align: left;">Programiranje</h3><div style="text-align: left;">Firstly, one word can be made by rearranging the letters of another if and only if they have the same letter frequencies. So if we have a fast way to find the letter frequencies in any substring, the problem is solved. Consider the sequence whose ith element is 1 if S[i] is 'a' and 0 otherwise: the number of a's in a substring is then just a contiguous sum of this sequence. By precomputing the prefix sum, we can find the sum of any interval in constant time. We thus just need to store 26 prefix sum tables, one for each letter.</div><h3 style="text-align: left;">Retro</h3><div style="text-align: left;">The first part (finding the length) is a reasonably standard although fiddly dynamic programming problem. At any point in the process, one has a position relative to the original grid and a nesting depth, and needs to know the longest suffix that will complete a valid bracket sequence. 
For each of the (up to) three possible moves, one can compute the state that would be reached, and find the longest possible sequence from previously computed values.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">My attempt at the second part during the contest was overly complex and algorithmically too slow, involving trying to rank all length-L strings represented by DP states, for each L in increasing order. This required a lot of sorting, making the whole thing O(N³ log N) (where N is O(R + C)).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">A simpler and faster solution I wrote after the contest is to reconstruct the path non-deterministically, one bracket at a time. Instead of working from a single source state, keep a set of candidate states (in superposition). From those, run a breadth-first (or depth-first) search to find all states where one might encounter the next bracket. Any transition that is optimal (for length) in the original DP now becomes an edge in this search. Once the candidate states for the next bracket have been found, there might be a mix of opening and closing brackets. If so, discard those states corresponding to closing brackets, since they will not be lexicographically minimal.</div><h3 style="text-align: left;">Portal</h3><div style="text-align: left;">I didn't attempt this problem, and am not sure how to solve it. I suspect it requires coming up with and proving a number of simplifying assumptions which allow the search space to be reduced.</div><h3 style="text-align: left;">Dojave</h3><div style="text-align: left;">I really liked this problem. First of all, M = 1 is a bit tricky, because it's the only case where you can't swap two numbers without having an effect. It's easy enough to solve by hand and treat as a special case.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Now let's count all the cases where the player <i>can't</i> win. 
Let's say that the lit elements have an XOR of K. For M > 1, we can't have \(K = 2^M - 1\), since otherwise the player could swap either two lit or two unlit numbers and win. Let \(K' = K \oplus (2^M - 1)\), and consider a lit number A. To win, the player has to swap it with B such that \(A \oplus B = K'\). As the RHS can't be 0, the lit numbers must be grouped into pairs with XOR of K'. The XOR of all the lit numbers must thus be either 0 or K', depending on whether there are an even or odd number of such pairs. But this is K (by definition); and since K = K' is impossible, we must have K = 0.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">So now we need to find intervals such that the XOR of all elements is 0, and for every pair of complementary elements A and B, they are either both inside or both outside the interval. The first part is reasonably easy to handle: we can take prefix sums (using \(\oplus\) instead of addition), and an interval with XOR of 0 is one where two prefix sums are equal. For the second part, we can use some sweeps: for each potential left endpoint, establish an upper bound on the right endpoint (considering only complementary pairs that lie on opposite sides of the left endpoint), and vice versa - doable with an ordered set or a priority queue to keep track of right elements of complementary pairs during the sweep. Now perform another left-to-right sweep, keeping track of possible left endpoints that are still "active" (upper bound not yet reached), and each time a right endpoint is encountered, query how many of the left endpoints with the same prefix sum are above the lower bound for the right endpoint. Actually getting the count in logarithmic time requires a bit of fancy footwork with data structures that I won't go into (e.g. 
a Fenwick tree, a segment tree or a balanced binary tree with node sizes in the internal nodes).</div><h3 style="text-align: left;">Sažetak</h3><div style="text-align: left;">I somehow miscounted and didn't realise that this problem even existed until after the contest (maybe because I don't expect more than three harder problems in COCI). It's disappointing, because it only took me about 15 minutes to solve after the contest, yet was worth the most points.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">For each K-summary, draw a line between each pair of adjacent segments of that summary. For example, with a 2-summary and 3-summary and N=9, you would get |01|2|3|45|67|8|9|. Clearly in any marked interval with more than one number, none of them can be determined. It is also not hard to see that in any interval with exactly one number, that number can be determined (hint: find the sum of all numbers except that one). Thus, we need the count of numbers X such that X is a multiple of some Ki and X+1 is a multiple of another Kj. Note that if any K=1 then the answer is simply N, which we'll assume is handled as a special case.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">If one has chosen a single Ki and Kj, it is not difficult to determine the numbers that satisfy the property, using the Chinese Remainder Theorem (they will be in arithmetic progression, with period Ki×Kj, and so are easy to count). However, this can easily lead to double-counting, and we thus need to apply some form of inclusion-exclusion principle. That is, we can pick any two subsets S and T of the K's, with products P and Q, and count the number of X's such that P divides X and Q divides X + 1. Note that S and T must be disjoint, as otherwise the common element would need to divide X and X+1 (and we assumed Ki > 1). We also need to determine the appropriate coefficient to scale this term in the sum. 
Rather than trying to derive equations, I let the computer do the work: for each value of |S| and |T| I consider every possible subset of S and of T (actually, just each possible size), add up the coefficients they're already contributing, and set the coefficient for this value of |S| and |T| to ensure that the total is 1.</div></div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-31847281.post-18479708433254053842017-06-12T23:19:00.000-07:002017-06-12T23:19:01.320-07:00Extra DCJ 2017 R2 analysis<div dir="ltr" style="text-align: left;" trbidi="on"><h3 style="text-align: left;">Flagpoles</h3><div style="text-align: left;">My solution during the contest was essentially the same as the official analysis. Afterwards I realised a potential slight simplification: if one starts by computing the second-order differences (i.e., the differences of the differences), then one is looking for the longest run of zeros, rather than the longest run of the same value. That removes the need to communicate the value used in the runs at the start and end of each section.</div><h3 style="text-align: left;">Number Bases</h3><div style="text-align: left;">I missed the trick of being able to uniquely determine the base from the <i>first</i> point at which X[i] + Y[i] ≠ Z[i]. Instead, at every point where X[i] + Y[i] ≠ Z[i], I determine two candidate bases (depending on whether there is a carry or not). Then I collect the candidates and test each of them. If more than three candidates are found, then the test case is impossible, since there must be two disjoint candidate pairs.</div><h3 style="text-align: left;">Broken Memory</h3><div style="text-align: left;">My approach was slightly different. Each node binary searches for its broken value, using two other nodes to help (and simultaneously helping two other nodes). Let's say we know the broken value is in a particular interval. 
Split that interval in half, and compute hashes for each half on the node (h1 and h2) and on two other nodes (p1 and p2, q1 and q2). If h1 equals p1 or q1, then the broken value must be in interval 2, or vice versa. If neither applies, then nodes p and q both have broken values, in the opposite interval to that of the current node. We can tell which by checking whether p1 = q1 or p2 = q2.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">This does rely on not having collisions in the hash function. In the contest I relied on the contest organisers not breaking my exact choice of hash function, but it is actually possible to write a solution that works on all test data. Let P be a prime greater than \(10^{18}\). To hash an interval, compute the sums \(\sum m_i\) and \(\sum i m_i\), both mod P, giving a 128-bit hash. Suppose two sequences p and q collide, but differ in at most two positions. The sums are the same, so they must differ in exactly two positions j and k, with \(p_j - q_j = q_k - p_k\) (all mod P). But then the second sums will differ by</div><div style="text-align: left;">\(jp_j + kp_k - jq_j - kq_k = (j - k)(p_j - q_j)\), and since P is prime and each factor is less than P, this will be non-zero.</div></div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-31847281.post-43450258690438233972017-06-12T00:13:00.001-07:002017-06-12T00:15:26.674-07:00Alternative Code Jam 2017 R3 solution<div dir="ltr" style="text-align: left;" trbidi="on">This last weekend was round 3 of Code Jam 2017 and round 2 of Distributed Code Jam 2017. I'm not going to describe how to solve all the problems since there are official analyses (<a href="https://code.google.com/codejam/contest/8304486/dashboard#s=a">here</a> and <a href="https://code.google.com/codejam/contest/3284486/dashboard#s=a">here</a>), but just mention some alternatives. 
For this post I'll just talk about one problem from Code Jam; some commentary on DCJ will follow later.<br /><h3 style="text-align: left;">Slate Modern (Code Jam)</h3><div style="text-align: left;">The idea I had for this seems simpler (in my opinion, without having tried implementing both) than the official solution, but unfortunately I had a bug that I couldn't find until about 10 minutes after the contest finished.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">As noted in the official analysis, one can first check whether a solution is possible by comparing each pair of fixed cells: if the difference in value is greater than D times the Manhattan distance, then it is impossible; if no such pair exists, it is possible. The best solution is then found by setting each cell to the smallest upper bound imposed by any of the fixed cells.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Let's try to reduce the complexity by a factor of C, by computing the sum of a single row quickly. If we look at the upper bound imposed by one fixed cell, it has the shape \(b + |x - c| D\), where \(x\) is the column and b, c are constants. When combining the upper bounds, we take the lowest of them. Each function will be the smallest for some contiguous (possibly empty) interval. By sweeping through the fixed cells in order of c, we can identify those that contribute, similar to finding a Pareto front. Then, by comparing adjacent functions one can find the range in which each function is smallest, and a bit of algebra gives the sum over that range.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">This reduces the complexity to O(RN + N²) (the N² is to check whether a solution is possible, but also hides an O(N log N) to sort the fixed cells). That's obviously still too slow. 
The next insight is that most of the time, moving from one row to the next changes very little: each of the functions increases or decreases by one (depending on whether the corresponding fixed cell is above or below the row), and the range in which each function is smallest grows or shrinks slightly. It thus seems highly likely that the row sum will be a low-degree polynomial in the row number. From experimentation with the small dataset, I found that it is actually a quadratic (conceptually it shouldn't be too hard to prove, but I didn't want to get bogged down in the details).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Note that I said "most of the time". This will only be true piecewise, and we need to find the "interesting" rows where the coefficients change. It is fairly obvious that rows containing fixed cells will be interesting. Additionally, rows where the range in which a function is smallest appears or disappears will be interesting. Consider just two fixed cells, and colour the whole grid according to which of the two fixed cells gives the lower upper bound. The boundary between the two colours can have diagonal, horizontal and vertical portions, and it is these horizontal portions that are (potentially) interesting. I took the conservative approach of adding all O(N²) such rows as interesting.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Now that we have partitioned the rows into homogeneous intervals, we need to compute the sum over each interval efficiently. Rather than determine the quadratic coefficients analytically, I just inferred them by taking the row sums of three consecutive rows (if there are fewer than three rows in the interval, just add up their row sums directly). A bit more algebra to find the sum of a quadratic series, and we're done! There are O(N²) intervals to sum and it requires O(N) to evaluate each of the row sums needed, giving a running time of O(N³). 
I suspect that this could probably be reduced to O(N² log N) by being smarter about how interesting rows are picked, but it is not necessary given the constraints on N.</div></div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-31847281.post-53295121464490044972016-11-22T22:45:00.001-08:002016-11-22T22:45:15.422-08:00An alternative DCJ 2016 solution<div dir="ltr" style="text-align: left;" trbidi="on">During this year's Distributed Code Jam I had an idea for solving the hard part of Toothpick Sculptures, but had no time to implement it, or even to work out all the details. When the official solution was presented, I was surprised that it was very different to mine. Recently Pablo from Google has been able to help me check that my solution does indeed pass the system tests (thanks Pablo), so it seems like a good time to describe the solution.<br /><br />There are going to be a lot of different trees involved, so let's give them names. A "minitree" is one of the original sculptures of N toothpicks, and its root is a "miniroot". The "maxitree" is the full tree of 1000N toothpicks. Let's assume that each minitree is built out of toothpicks of one colour. We can then consider a tree whose vertices are colours, called the "colour tree", which contains an edge between two colours if and only if the maxitree contains an edge between two vertices of those colours.<br /><br />For each colour, we can find its children in the colour tree by walking the corresponding minitree looking for children which are miniroots. This can be done in parallel over the colours.<br /><br />For a given vertex, we can use DP to solve the following two problems for the subtree: what is the minimum cost to stabilise the subtree if we stabilise the root, and what is the minimum cost if we do not stabilise the root? But this will not be so easy to parallelise. 
We can go one step further: if we fix some vertex C and cut off the subtree rooted at C, then we can answer the same two problems for each case of C being stabilised or not stabilised (four questions in total). If we later obtain the answers to the original two queries for C, then we can combine this with the four answers we have to answer the original queries for the full tree.<br /><br />This is sufficient to solve the small problem, and more generally any test case where the colour tree does not branch. For each miniroot, compute the DP, where the miniroot corresponding to the single child (if any) in the colour tree is the cutoff. These DPs can be done in parallel, and the master can combine the results.<br /><br />Let's consider another special case, where the colour tree is shallow. In this case, one can solve one layer at a time, bottom up, without needing to use the cutoff trick at all. The colours in each layer of the colour tree are independent and can be solved in parallel, so the latency is proportional to the depth of the tree. The results of each layer are fed into the calculations of the layer above.<br /><br />So, we have a way to handle long paths, and a way to handle shallow trees. This should immediately suggest light-heavy decomposition. Let the "light-depth" of a node in the colour tree be the number of light edges between it and the root. The nature of light-heavy decomposition guarantees that light-depth is at most logarithmic in the tree size (which is 1000). We will process all nodes with the same light-depth in parallel. This means that a node in a tree may be processed at the same time as its children, but only along a heavy path. We thus handle each heavy path using the same technique as in the small problem. 
For other children in the colour tree, the subtree results were computed in a previous pass and are sent to the slave.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-45148395329081301192016-11-22T22:37:00.003-08:002016-11-22T22:46:24.051-08:00TCO 2016 finals<div dir="ltr" style="text-align: left;" trbidi="on">The final round of TCO 2016 did not go well for me. I implemented an over-complicated solution to the 500 that did not work (failed about 1 in 1000 of my random cases), and I started implementing the 400 before having properly solved it, leading to panic and rushed attempts to fix things in the code rather than solving the problem.<br /><h3 style="text-align: left;">Easy</h3><div style="text-align: left;">The fact that there are only four mealtimes is a clue that this will probably require some exponential time solution. I haven't worked out the exact complexity, but it has at least a factor of \(2^\binom{n}{n/2}\) for n mealtimes.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Let's restrict our attention to meal plans that contain no meals numbered higher than m and have t meals in the plan (t can be less than 4). We can further categorise these meals by which subsets of the mealtimes can accommodate the plan. This will form the state space for dynamic programming. To propagate forwards from this state, we iterate over the number of meals of type m+1 (0 to 4 of them). For each choice, we determine which subsets of mealtimes can accommodate this new meal plan (taking every subset that can accommodate the original plan and combining it with any disjoint subset that can accommodate the new meals of type m+1).<br /><h3 style="text-align: left;">Medium</h3><div style="text-align: left;">Let's start by working out whether a particular bracket sequence can be formed. Consider splitting the sequence into a prefix and suffix at some point. 
If the number of ('s in the prefix differs between the initial and target sequence, then the swaps that cross the split need to be used to increase or decrease the number of ('s. How many ('s can we possibly move to the left? We can answer that greedily, by running through all the operations and applying them if and only if they swap )( to (). Similarly, we can find the least possible number of ('s in the final prefix.<br /><br />If our target sequence qualifies by not having too many or too few ('s in any prefix, can we definitely obtain it? It turns out that we can, although it is not obvious why, and I did not bother to prove it during the contest. Firstly, if there is any (proper) prefix which has exactly the right number of ('s, we can split the sequence at this point, choose not to apply any operations that cross the split, and recursively figure out how to satisfy each half (note that at this point we are not worrying whether the brackets are balanced — this argument applies to any sequence of brackets).<br /><br />We cannot cross between having too few and too many ('s when growing the prefix without passing through having the right number, so at the bottom of our recursion we have sequences where every proper prefix has too few ('s (or too many, but that can be handled by symmetry). We can solve this using a greedy algorithm: for each operation, apply it if and only if it swaps )( to () and the number of ('s in the prefix is currently too low. We can compare the result after each operation to the maximising algorithm that tries to maximise the number of ('s in each prefix. 
It is not hard to see (and prove, using induction) that for any prefix and after each operation is considered, the number of ('s in the prefix is:</div><ul style="text-align: left;"><li>between the original value and the target value (inclusive); and</li><li>the smaller of the target value and the value found in the maximising algorithm.</li></ul><div style="text-align: left;">Thus, after all operations have been considered, we will have reached the target.<br /><br />So now we know how to determine whether a specific bracket sequence can be obtained. Counting the number of possible bracket sequences is straightforward dynamic programming: for each prefix length, we count the number of prefixes that are valid bracket sequence prefixes (no unmatched right brackets) for each possible nesting depth.</div><h3 style="text-align: left;">Hard</h3><div style="text-align: left;">This is an exceptionally mean problem, and I very much doubt I would have solved it in the full 85 minutes; congratulations to Makoto for solving all three!<br /><br />For convenience, let H be the complement of G2 (i.e. n copies of G1). A Hamiltonian path in G2 is simply a permutation of the vertices, such that no two consecutive vertices form an edge in H. We can count them using inclusion-exclusion, which requires us to count, for each m, the number of paths that pass through any m chosen edges. Once we pick a set of m edges and assign them directions (in such a way that we have a forest of paths), we can permute all the vertices that are not tails of edges, and the remaining vertices have their positions uniquely determined.<br /><br />Let's start by solving the case n=1. Since k is very small, we can expect an exponential-time solution. For every subset S of the k vertices and every number m of edges, we can count the number of ways to pick m edges with orientation. We can start by solving this for only connected components, which means that m = |S| - 1 and the edges form a path. 
This is a standard exponential DP that is used in the Travelling Salesman problem, where one counts the number of paths within each subset ending at each vertex.<br /><br />Now we need to generalise to m < |S| - 1. Let v be the first vertex in the set. We can consider every subset for the path containing v, and then use dynamic programming to solve for the remaining set of vertices. This is another exponential DP, with an \(O(3^n)\) term.<br /><br />Now we need to generalise to n > 1. If a graph contains two parts that are not connected, then edges can be chosen independently, and so the counts for the graph are obtained by convolving the counts for the two parts. Unfortunately, a naïve implementation of this convolution would require something like O(n²k) time, which will be too slow. But wait, why are we returning the answer modulo 998244353? Where is our old friend 1000000007, and his little brother 1000000009? In fact, \(998244353 = 2^{23} \times 7 \times 17 + 1\), which strongly suggests that the Fast Fourier Transform will be involved. Indeed, we can perform a 1048576-element FFT in the field modulo 998244353, raise each element to the power of n, and then invert the FFT.<br /><br />Despite the large number of steps above, the solution actually requires surprisingly little new code, as long as one has library code for the FFT and modulo arithmetic.</div></div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-42541078882516302972016-11-21T03:59:00.003-08:002016-11-21T03:59:32.870-08:00TCO 2016 semifinal 2 solutions<div dir="ltr" style="text-align: left;" trbidi="on">Semifinal 2 was much tougher than semifinal 1, with only one contestant correctly finishing two problems. The first two problems each needed a key insight, after which very little coding is needed (assuming one has the appropriate library code available). 
The hard problem was more traditional, requiring a series of smaller steps to work towards the solution.<br /><h3 style="text-align: left;">Easy</h3><div style="text-align: left;">Consider the smallest special clique. What happens if we remove an element? Then we will still have a clique, but the AND of all the elements is non-zero. In particular, there is some bit position which is 1 in every element of the special clique except for one.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">We can turn that around: it suffices to find a bit position B, a number T which is zero on bit B, and a special clique containing T whose other elements are all 1 on bit B. We can easily make a list of candidates to put in this special clique: any element which is 1 on bit B and which is connected to T. What's more, these candidates are all connected to each other (through bit B), so taking T plus all candidates forms a clique. Moreover, if we can make a special clique by using only a subset of the candidates, then the clique formed using all candidates will be special too, so the latter is the only one we need to test.</div><h3 style="text-align: left;">Medium</h3><div style="text-align: left;">The key insight is to treat this as a flow/matching problem. Rather than thinking of filling and emptying an unlabelled tree, we can just build a labelled tree with the properties that if A's parent is B, then B < A and A appears earlier in p than B does. A few minutes' thinking should convince you that any labelled tree satisfying these constraints can be used for the sorting.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">We thus need to match each number to a parent (either another number or the root). We can set this up with a bipartite graph in the usual way: each number appears on the left, connected to a source (for network flow) with an edge of capacity 1. 
Each number also appears on the right, along with the root, all connected to the sink with N edges of capacity 1 (we'll see later why I don't say one edge of capacity N). Between the two sides, we add an edge A → B if B is a viable parent for A, again with capacity 1. Each valid assignment can be represented by a maxflow on this network (with flow N), where the edges across the middle are parent-child relationships. Note that normally using this approach to building a tree risks creating cycles, but the heap requirement on the tree makes that impossible.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">That tells us which trees are viable, but we still need to incorporate the cost function. That is where the edges from the right numbers to the sink come in. A (non-root) vertex starts with a cost of 1, then adding more children increases the cost by 3, 5, 7, ... Thus, we set the costs of the edges from each number to the sink to this sequence. Similarly, the edges from the root to the sink have costs 1, 3, 5, 7, ... The min-cost max-flow will automatically ensure that the cheaper edges get used first, so the cost of the flow will match the cost of the tree (after accounting for the initial cost of 1 for each number, which can be handled separately).</div><h3 style="text-align: left;">Hard</h3><div style="text-align: left;">This one requires a fairly solid grasp on linear algebra and vector spaces. The set with \(2^k\) elements is a vector subspace of \(\mathbb{Z}_2^N\) of dimension k, for some unknown N. It will have a basis of size k. To deal more easily with ordering, we will choose to consider a particular canonical basis. Given an arbitrary initial basis, one can use Gaussian elimination and back substitution to obtain a (unique) basis with the following property: the leading 1 bit in each basis element is a 0 bit in every other basis element. 
With this basis, it is not difficult to see that including any basis element in a sum will increase rather than decrease the sum. The combination of basis elements forming the ith sorted element is thus given exactly by the binary representation of i.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">We can now take the information we're given and recast it as a set of linear equations. Furthermore, we can perform Gaussian elimination on these linear equations (which we actually do in code, unlike the thought experiment Gaussian elimination above). This leaves some basis elements fixed (uniquely determined by the smaller basis elements) and others free. Because of the constraints of the basis elements, we also get some more information. If basis i appears in an equation, its leading bit must correspond to a 1 bit in the value (and to the first 1 bit, if i is the leading basis in the equation). Similarly, if basis i does not appear, then it must correspond to a 0 bit in the value. This allows us to build a mask of where the leading bit for each basis element can be.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">We now switch to dynamic programming to complete the count. We count the number of ways to assign the first B basis elements using the first C value bits. Clearly dp[0][C] = 1. To add a new basis element, we can consider the possible positions of its leading bit (within the low C bits), and then count the degrees of freedom for the remaining bits. 
If the basis is fixed then there is 1 degree of freedom; otherwise, there is a degree of freedom for each bit that isn't the leading bit of a previous basis, and it is easy to count these.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">There are a few details that remain to do with checking whether there are an infinite number of solutions (which can only happen if the largest basis is unconstrained), and distinguishing between a true 0 and a 0 that is actually a multiple of 1000000007, but those are left as exercises for the reader.</div></div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-31847281.post-76213296937327593642016-11-20T09:10:00.005-08:002016-11-20T09:15:22.277-08:00Topcoder Open 2016 Semifinal 1<div dir="ltr" style="text-align: left;" trbidi="on">Here are my solutions to the first semifinal of the TCO 2016 algorithm contest.<br /><h3 style="text-align: left;">Easy</h3><div style="text-align: left;">Let's start with some easy cases. If s = t, then the answer is obviously zero. Otherwise, it is at least 1. If s and t share a common factor, then the answer is 1, otherwise it is at least 2.<br /></div><div style="text-align: left;">We can eliminate some cases by noting that some vertices are isolated. Specifically, if either s or t is 1 or is a prime greater than n/2 (and s≠t) then there is no solution.<br /></div><div style="text-align: left;">Assuming none of the cases above apply, can the answer be exactly 2? This would mean a path s → x → t, where x must have a common factor with s and a (different) common factor with t. The smallest x can be is thus the product of the smallest prime factors of s and t. If this is at most n, then we have a solution of length 2, otherwise the answer is at least 3.<br /></div><div style="text-align: left;">If none of the above cases applies, then the answer is in fact 3. Let p be the smallest prime factor of s and q be the smallest prime factor of t. 
Then the path s → 2p → 2q → t works (note that 2p and 2q are at most n because we eliminated cases where s or t is a prime greater than n/2 above).</div><h3 style="text-align: left;">Medium</h3><div style="text-align: left;">We can create a graph with the desired properties recursively. If n is 1, then we need just two vertices, source and sink, with an edge from source to sink of arbitrary weight.<br /></div><div style="text-align: left;">If n is even, we can start with a graph with n/2 min cuts and augment it. We add a new vertex X and edges source → X and X → sink, both of weight 1. For every min cut of the original graph, X can be placed on either side of the cut to make a min cut for the new graph.<br /></div><div style="text-align: left;">If n is odd, we can start with a graph with n - 1 min cuts and augment it. Let the cost of the min cut of the old graph be C. We create a new source, and connect it to the original source with an edge of weight C. The cost of the min cut of this new graph is again C (this can easily be seen by considering the effect on the max flow rather than the min cut). There are two ways to achieve a cut of cost C: either cut the new edge only, or make a min cut on the original graph.<br /></div><div style="text-align: left;">We just need to check that this will not produce too many vertices. In the binary representation of n, we need a vertex per bit and a vertex per 1 bit. n can have at most 10 bits and at most 9 one bits, so we need at most 19 vertices.</div><h3 style="text-align: left;">Hard</h3><div style="text-align: left;">I made a complete mess of this during the contest, solving the unweighted version and then trying to retrofit it to handle a weighted version. That doesn't work.<br /><br />Here's how to do it, which I worked out based on some hints from tourist and coded at 3am while failing to sleep due to jetlag. 
Firstly, we'll make a change that makes some of the later steps a little easier: for each pair of adjacent vertices, add an edge with weight ∞. This won't violate the nesting property, and guarantees that there is always a solution. Let the <i>length</i> of an edge be the difference in indices of the vertices (as opposed to the <i>cost</i>, which is an input). Call an edge <i>right-maximal</i> if it's the longest edge emanating from a particular vertex, and <i>left-maximal</i> if it is the longest edge terminating at a particular vertex. A useful property is that an edge is always either right-maximal, left-maximal, or both. To see this, imagine an edge that is neither: it must then be nested inside longer edges terminating at each end, but these longer edges would cross, violating the given properties.<br /><br />We can now make the following observations:<br /><ol style="text-align: left;"><li>If x → y is right-maximal and color[x] is used, then color[y] must be used too. This is because the path must pass through x, and after x cannot skip over y.</li><li>If x → y is the only edge leaving x, then the edge will appear in a path if and only if color[x] is used.</li><li>If x → y is right-maximal and x → z is the next-longest edge from x, then x → y is in a path if and only if color[x] is used and color[z] is not used. </li></ol>These statements all apply similarly for left-maximal edges of course. We can consider vertices 0 and n to be of colour 0 in the above, which is a color that must be picked. What is less obvious is that condition 1 (together with its mirror) is sufficient to determine whether a set of colours is admissible. Suppose we have a set of colours that satisfies condition 1, and consider the subset S of vertices whose colours come from this set. We want to be sure that every pair of consecutive vertices in S are linked by an edge. Suppose there is a pair (u, v) which are not. 
Consider the right-maximal edge from u: by condition 1, it must terminate at some vertex in S, which must then be to the right of v. But the left-maximal edge to v must start to the left of u, and so these two edges cross.<br /><br />We can now use this information to construct a new graph, whose minimum cut will give us the cost of the shortest path. It has a vertex per colour, a source vertex corresponding to the special colour 0, and a sink vertex not corresponding to a colour. The vertices on the source side of the cut correspond to the colours that are visited. Every time colour C implies colour D, add an edge C → D of weight ∞, to prevent C being picked while D is not. If an edge falls into case 2 above, add an edge from color[x] to the sink with weight equal to the edge cost. Otherwise it falls into case 3; add an edge from color[x] to color[y] with weight equal to the edge cost. Each edge in the original graph now corresponds to an edge in this new graph, and the original edge is traversed if and only if the new edge forms part of the cut.</div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-81301064002682603772016-06-13T01:44:00.003-07:002016-06-13T01:44:48.519-07:00Code Jam and Distributed Code Jam 2016 solutions<div dir="ltr" style="text-align: left;" trbidi="on"><h2 style="text-align: left;">Code Jam</h2><div style="text-align: left;">Since the official analysis page still says "coming soon", I thought I'd post my solutions to both the Code Jam and Distributed Code Jam here.</div><h3 style="text-align: left;">Teaching Assistant</h3><div style="text-align: left;">We can start by observing that when we take a new task, it might as well match the mood of the assistant: doing so will give us at least 5 points, not doing so will give us at most 5. 
Also, there is no point in getting to the end of the course with unsubmitted assignments: if we did, we should not have taken that assignment, instead submitting a previous assignment (a parity argument proves that there was a previous assignment). So what we're looking for is a way to pair up days of the course, obeying proper nesting, scoring 10 points if the paired days have the same mood and 5 points otherwise.<br /><br />That's all we need to start on the small case, which can be done with a standard DP. For every even-length interval of the course, we compute the maximum score. We iterate to pick the day to match up to the first day of the interval, then use previous results to find the best scores for the two subintervals this generates.<br /><br />That's O(N³), which is too slow for the large case. It turns out that a greedy approach works: every day, if the mood matches the top of the stack, submit, otherwise take a new assignment. The exception is that once the stack height matches the number of remaining days, one must submit every day to ensure that the stack is emptied.<br /><br />I haven't quite got my head around a proof, but since it worked on the test case I had for the small input I went ahead and submitted it.<br /><h3 style="text-align: left;">Forest university</h3><div style="text-align: left;">The very weak accuracy requirement, and the attention drawn to it, is a strong hint that the answer should be found by simulation rather than analytically. Thus, we need to find a way to uniformly sample a topological walk of the forest. This is not as simple as always uniformly picking one of the available courses. Consider one tree in the forest: it can be traversed in some number of ways (all of them starting with the root of the tree), and the rest of the forest can be traversed in some number of ways, and then these can be interleaved arbitrarily. 
If we consider all ways to interleave A items and B items, then A/(A+B) of them will start with an element of A. Thus, the probability that the root of a particular tree will be chosen is proportional to the size of that tree.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">After picking the first element, the available courses again form a forest, and one can continue. After uniformly sampling a sequence of courses, simply check which of the cool words appear on the hat.</div><h3 style="text-align: left;">Rebel against the empire</h3><div style="text-align: left;">I consider this problem to be quite a bit harder than D. It's easier to see what to do, but it takes reams of tricky code.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">For the small case, time is not an issue because the asteroids are static. Thus, one just needs to find the bottleneck distance to connect asteroids 0 and 1. I did this by adding edges from shortest to longest to a union-find structure until 0 and 1 were connected (à la Kruskal's algorithm), but it can also be done by priority-first search (Prim's algorithm) or by binary searching for the answer and then checking connectivity.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">For the large case we'll take that last approach, binary searching over the answer and then checking whether each candidate is feasible. Naturally, we will need to know at which times it's possible to make jumps between each pair of asteroids. This is a bit of geometry/algebra, which gives a window of time for each pair (possibly empty, possibly infinite). Now consider a particular asteroid A. In any contiguous period of time during which at least one window is open, it's possible to remain on A, regardless of S, by simply jumping away and immediately back any time security are about to catch up. 
Also, if two of these intervals are separated in time by at most S, they can be treated as one interval, because one can sit tight on A during the gap between windows. On the other hand, any period longer than S with no open windows is of no use.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The key is to ask for the earliest time at which one can arrive at each interval. This can be done with a minor modification of Dijkstra's algorithm. The modification is that the outgoing edges from an interval are only those windows which have not already closed by the time of arrival at the interval.</div><h3 style="text-align: left;">Go++</h3><div style="text-align: left;">I liked this problem, and I really wish the large had been worth fewer points, because I might then have taken it on and solved it.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The key is to realise that the good strings don't really matter. Apart from the trivial case of the bad string also being a good string, it is always possible to solve the problem by producing a pair of programs that can produce <i>every</i> string apart from the bad one.</div><div style="text-align: left;"><br />For the small case, we can use 0?0?0?0? (N repetitions) for the first program, and 111 (N-1 repetitions) for the second (but be careful of the special case N=1!) Each of the 1's can be interleaved once into the first program to produce a 1 in the output, but because there are only N-1 1's, it is impossible to produce the bad string.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">For the large case, the same idea works, but we need something a bit more cunning. The first program is basically the same: alternating digits and ?'s, with the digits forming the complement of the bad string. 
To allow all strings but the bad string to be output, we want the second program to be a string which does not contain the bad string as a subsequence, but which contains every subsequence of the bad string as a subsequence. This can be achieved by replacing every 1 in the bad string by 01 and every 0 by 10, concatenating them all together, then dropping the last character. Proving that this works is left as an exercise for the reader.</div><h2 style="text-align: left;">Distributed Code Jam</h2><h3 style="text-align: left;">Again</h3><div style="text-align: left;">The code appears to be summing up every product of an element in A with an element in B. However, closer examination shows that those where the sum of the indices is a multiple of M (being the number of nodes) are omitted. Even so, once we pick an index modulo M for each, we're either adding all or none. The sum of all products of pairs is the same as the product of the sums. So, for each remainder modulo M, we can add up all elements of A, and all elements of B. Then, for each pair of remainders, we either multiply these sums together and accumulate into the grand total, or not. There are some details regarding the modulo 1000000007, but they're not difficult.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">For the large case, we obviously need to distribute this, which can be done by computing each sum on a different node, then sending all the sums back to the master.</div><h3 style="text-align: left;">lisp_plus_plus</h3><div style="text-align: left;">Here, a valid lisp program is just a standard bracket sequence (although it is required to be non-empty). 
It's well-known that a sequence is valid if and only if:</div><ul style="text-align: left;"><li>the nesting level (number of left brackets minus number of right brackets) of every prefix is non-negative; and</li><li>the nesting level of the whole sequence is zero.</li></ul></div><div style="text-align: left;">If the nesting level ever goes negative in a left-to-right scan, then the point just before it went negative is the longest sequence that can be completed. This makes the small case trivial to implement.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">To distribute this, we can do the usual thing of assigning each node an interval to work on. We can use a <i>parallel prefix sum</i> to find the nesting level at every point (there is lots of information on the internet about how to do this). Then, each node can find its first negative nesting level, and send this back to a master to find the globally first one. We also need to know the total nesting level by sending the sum for each interval to the master, but that's already done as part of the parallel prefix sum.</div><div style="text-align: left;"><h4 style="text-align: left;">Asteroids</h4><div style="text-align: left;">I liked this one better than the Code Jam asteroids problem, but it was also rather fiddly. The small case is a straightforward serial DP: going from bottom to top, compute the maximum sum possible when ending one's turn on each location.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">At first I thought that this could be parallelised in the same manner as Rocks or Mutex from last year's problems. However, those had only bottom-up and left-to-right propagation of DP state. In this problem, we have bottom-up, left-to-right <i>and</i> right-to-left. On the other hand, the left-to-right and right-to-left propagation is slow: at most one unit for every unit upwards. 
We can use this!</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Each node will be responsible for a range of columns. However, let's say it starts with knowledge of the prior state for B extra columns on either side. Then after one row of DP iteration, it will correctly know the state for B-1 extra columns, and after B iterations it will still have the correct information for the columns it's responsible for. At this stage it needs to exchange information with its neighbours: it will receive information about the B edge columns on either side, allowing it to continue on its way for another B rows.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">There are a few issues involved in picking B. If it's too small (less than 60), then nodes will need to send more than 1000 messages. If it's too large, then nodes will do an excessive number of extra GetPosition queries. Also, B must not be larger than the number of columns a node is handling; in narrow cases we actually need to reduce the number of nodes being used.</div><h3 style="text-align: left;">Gas stations</h3><div style="text-align: left;">The serial version of this problem is a known problem, so I'll just recap a standard solution quickly. At each km, you need to decide how much petrol to add. If there is a town within T km that is cheaper, you should add just enough petrol to reach it (the first one if there are multiple), since any extra you could have waited until you got there. Otherwise, you should fill up completely. A sweep algorithm can give the next cheaper town for every town in linear time.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">What about the large case? The constraint is interesting: if we had enough memory, and if GetGasPrice was faster, there is enough time to do it serially! 
In fact, we <i>can</i> do it serially, just passing the baton from one node to the next as the car travels between the intervals assigned to each node.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">What about computing the next cheaper station after each station? That would be difficult, but we don't actually need it. We only need to know whether there is a cheaper one within T km. We can start by checking the current node, using a node-local version of the next-cheapest array we had before. If that tells us that there is nothing within the current node, and T km away is beyond the end of the current node, then we will just find the cheapest within T km and check if that is cheaper than the current value. The range will completely overlap some nodes; for those nodes, we can use a precomputed array of the cheapest for each node. There will then be some prefix of another node. We'll just call GetGasPrice to fetch values on this node. Each iteration this prefix will grow by one, so we only need one call to GetGasPrice per iteration (plus an initial prefix). This means we're calling GetGasPrice at most three times per station (on average), which turns out to be fast enough. It's possible to reduce it to 2 by computing the minimum of the initial prefix on the node that already has the data, then sending it to the node that needs it.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">There is one catch, which caused me to make a late resubmission. I was calling GetGasPrice to grow the prefix in the <i>serial</i> part of the code! 
You need to make all the calls up front in the parallel part of the code and cache the results.</div></div></div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-31847281.post-43314163602563016682016-03-06T10:52:00.000-08:002016-03-06T10:52:05.508-08:00Solutions to Facebook Hacker Cup 2016<div dir="ltr" style="text-align: left;" trbidi="on"><div><div>Since the official solutions aren't out yet, here are my solutions. Although I scored 0 in the contest, four of my solutions were basically correct, one I was able to solve fairly easily after another contestant pointed out that my approach was wrong, and I figured out Grundy Graphs on the way home.<br /><h3 style="text-align: left;">Snake and Ladder</h3><div style="text-align: left;">I'll describe two solutions to this. There are a few things one can start off with. Firstly, if there are rows at the top or bottom that are completely full of flowers, just cut them off — they make no difference. Then, handle N=1 specially: if K=1, the answer is 1, if K=0, it is 2. A number of people (including all the contest organisers!) messed up the first, and I messed up the second. Also, if one rung has two flowers on it, the answer is 0.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">My contest solution used dynamic programming. If one cuts the ladder half-way between two rungs and considers the state of the bottom half, it can be divided into four cases:</div><ol style="text-align: left;"><li>Only the left-hand side has the snake on it.</li><li>Only the right-hand side has the snake on it.</li><li>Both sides have the snake on it, and the two are connected somewhere below.</li><li>Both sides have the snake on it, and the two do not connect.</li></ol></div>It is reasonably straightforward (as long as one is careful) to determine the dynamic programming transitions to add a rung, for each possibility of a flower on the left, on the right, or no flower. 
Advancing one rung at a time will be too slow for the bound on N, but the transitions can be expressed as 4×4 matrices and large gaps between flowers can be crossed with fast matrix exponentiation. Finally, one must double the answer to account for the choice of which end is the head.<br /><br />The alternative solution is to argue on a case-by-case basis. If there are no flowers, then one can pick a row for the head and a row for the tail, and pick a column for the head (the column for the tail is then forced by parity). Once these choices are made, there is only one way to complete the snake. Also, the head and tail can only occupy the same row if it is the top or bottom row, giving 2N(N-1)+4 ways.<br /><br />If there is at least one flower, then the head must be above the top flower and the tail below the bottom flower (or vice versa). Again, one can choose a row for each, with the column being forced by parity. One must also consider special cases for a flower in the top/bottom row.<br /><h3 style="text-align: left;">Boomerang Crew</h3><div style="text-align: left;">I think this was possibly the easiest of the problems, provided you thought clearly about it (which I failed to do). Clearly any opponents weaker than (or equal to) your champion can be defeated just by putting them up against your champion. Also, if you're going to defeat P players, they might as well be the weakest P players of the opposition. For a given P, can this be done? There will be a certain number of strong players (stronger than our champion), which we will have to sacrifice our weaker players against to wear them down. What's more, once we've worn one down and beaten him/her, we have to use the winner of that game to wear down the next strong opponent. Thus, we should match up the S strong players against our best S players in some order, and use our remaining players as sacrifices. 
Ideally we'd like to wear down each opponent to exactly the skill level of the player we will use to win; anything more is wasted effort. We can apply this idea greedily: for each opponent, find the remaining strong player of ours who can win by the smallest margin.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">With that method to test a particular P, it is a standard binary search to determine the number of players we can beat.</div><h3 style="text-align: left;">Grundy Graph</h3><div style="text-align: left;">I think this was the hardest problem: my submission wasn't a real solution and was just for a laugh; several other people I spoke to had submitted but didn't believe in their solution. During the contest I suspected that the solution would be related to 2-SAT, but it wasn't until the flight home that I could figure out how to incorporate turn ordering.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Let us construct an auxiliary directed graph A whose vertices correspond to those of the original. An edge u->v in A means that if u is black, then v must be black as well for Alice to win. All the edges in the original graph are clearly present in A. However, for each u->v in the original, we also add v'->u', where x' is the vertex assigned a colour at the same time as x. This is because if v' is black but u' is white, then it implies that u is black and v is white. This is the same as the contrapositive edge in a 2-SAT graph. 
Let x=>y indicate that y is reachable from x in A.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">There are a number of situations in which we can see that Bob can win:</div><ul style="text-align: left;"><li>If any strongly connected component contains both x and x', then Bob wins regardless of what Alice and Bob play.</li><li>If Bob controls vertices x and y and x=>y, then Bob wins by assigning x black and y white, regardless of Alice's play.</li><li>If Bob controls x, Alice controls y, x=>y, x' => y', and y is played before x, then Bob wins by assigning y the opposite colour to x.</li></ul>We claim that if none of the above apply, then Alice wins. Her strategy is that on her turn, she must colour x black if there is any u such that u => x and either u has already been coloured black, u could later be coloured black by Bob, or u = x'. This ensures that it does not immediately allow Bob to win, nor allows Bob to use it to win later. The only way this could fail is if both x and x' are forced to be black (if neither is forced, Alice can pick freely).<br /><br />Suppose u => x and v => x', where u and v are as described above. Then x => v', so u => v'. Firstly consider u = x'. We cannot have v = x (because otherwise x, x' are in the same SCC). Now x' => v', so v => v'. If Bob controls v, then we hit the second case above; if Alice controls v, then v => v' means she would never have chosen v. So we have a contradiction. Now, if u ≠ x' (and by symmetry, v ≠ x), then consider u => v' again. Alice cannot control v/v', since she would then have chosen v' rather than v. But similarly, v => u', so Alice couldn't have chosen u. It follows that Bob controls u and v, which can only be possible if u = v'. But u and u' can only both be relevant if Bob hasn't decided their colour yet, which is the third case above.<br /><br />The implementation details are largely left as an exercise. 
Hint: construct the SCCs of A, and the corresponding DAG of the SCCs, then walk it in topological order to check the conditions. The whole thing takes linear time.<br /><h3 style="text-align: left;">RNG</h3><div style="text-align: left;">The concept here is a reasonably straightforward exponential DP, but needs a careful implementation to get all the formulae right, particularly in the face of potential numeric instability. The graph itself isn't particularly relevant; the only "interesting" vertices are the start and the gifts, and the only useful information is the distance from each interesting vertex to each other one (if reachable) and the distance from each gift to the nearest dead-end. The DP state is the set of gifts collected and the current position (one of the interesting vertices). When one is at a gift, there are two possible options:</div><ol style="text-align: left;"><li>Pick another gift, and make towards it along the shortest path. Based on the path length L, there is a probability P^L of reaching it in time DL, and a probability 1 - P^L of respawning, with an expected time computable from a formula.</li><li>Aim to respawn by heading for the nearest dead-end. Again, one can work out a formula for the expected time to respawn.</li></ol></div>When one is at the start, one should of course pick a gift and head for it. Again, one can compute an expected time to reach it (one obtains a formula that includes its own result, but a little algebra sorts that out).<br /><h3 style="text-align: left;">Maximinimax flow</h3><div style="text-align: left;">I think I liked this problem the best, even though I screwed up the limits on the binary search. Firstly, what is the minimax flow? For a general graph it would be nasty to compute, but this graph has the same number of edges as vertices. This means that it has exactly one cycle with trees hanging off of it. 
It shouldn't be too hard to convince yourself that the minimax flow is the smaller of the smallest edge outside the cycle and the sum of the two smallest edges inside the cycle.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Veterans will immediately reach for a binary search, testing whether it is possible to raise the minimax flow to a given level. For the edges outside the ring, this is a somewhat standard problem that can be solved with query structures e.g. Fenwick trees, indexed by edge value. To raise the minimum to a given level, one needs to know the number of edges below that level (one Fenwick tree, storing a 1 for each edge), and their current sum (a second tree, storing the edge weight for each edge).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">For the edges inside the cycle it is only slightly more complex. If the required level is less than double the second-smallest edge, then one need only augment the smallest edge. Otherwise one raises all edges to half the target, with a bit of special-casing when the target is odd. The Fenwick tree can also be queried to find the two smallest edges, but I just added a std::multiset to look this up.</div><h3 style="text-align: left;">Rainbow strings</h3><div style="text-align: left;">This seemed to be one of the most-solved problems, and is possibly the easiest if you have a suffix tree/array routine in your toolbox. A right-to-left sweep will yield the shortest and longest possible substring for each starting position (next green for the shortest, next red for the longest). The trick is to process queries in order of length. Each entry in the suffix array will become usable at some length, and unusable again at another length, and these events, along with the queries, can be sorted by length for processing. 
The only remaining work is to determine the Kth of the active suffixes, which can be done with a Fenwick tree or other query structure.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Finally, one needs the frequency table for each prefix to be able to quickly count the frequency in each substring.</div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-72473165469976218642015-05-20T13:58:00.001-07:002015-05-21T06:52:37.720-07:00Analysis of ICPC 2015 world finals problems<div dir="ltr" style="text-align: left;" trbidi="on"><h2 style="text-align: left;">Analysis of ICPC 2015 world finals problems</h2><div style="text-align: left;">I haven't solved all the problems yet, but here are solutions to those I have solved (I've spent quite a bit more than 5 hours on it though).</div><h3 style="text-align: left;">A: Amalgamated Artichokes</h3><div style="text-align: left;">This is a trivial problem that basically everybody solved - I won't go into it.</div><h3 style="text-align: left;">B: Asteroids</h3><div style="text-align: left;">This looks quite nasty, but it can actually be solved by assembling a number of fairly standard computational geometry tools, and the example input was very helpful. To simplify things, let's switch to the frame of reference of the first asteroid, and assume only the second asteroid moves (call them P and Q for short). The first thing to notice is that the area is a piecewise quadratic function of time, with changes when a vertex of one polygon crosses an edge of the other. This is because the derivative of area depends on the sections of boundary of Q inside P, and those vary linearly in those intervals. Finding the boundaries between intervals is a standard line-line intersection problem. To distinguish between zero and "never", touching is considered to be an intersection too.</div><div style="text-align: left;">We also need a way to compute the area of the intersection. 
Clipping a convex polygon to a half-space is a standard problem too - vertices are classified as inside or outside, and edges that go from inside to outside or vice versa introduce a new interpolated vertex, so repeated clipping will give the intersection of convex polygons. And finding the area of a polygon is also very standard.</div><div style="text-align: left;">Finally, we need to handle the case where the best area occurs midway through one of the quadratic pieces. Rather than try to figure out the quadratic coefficients geometrically, I just sampled it at three points (start, middle, end) and solved for the coefficients, and hence the maximum, with algebra.</div><h3 style="text-align: left;">C: Catering</h3><div style="text-align: left;">I'm sure I've seen something similar, but I'm not sure where. It can be set up as a min-cost max-flow problem. Split every site into two, an arrival area and a departure area. Connect the source to each departure area, with cap 1 for clients and cap K for the company. Connect each arrival area to the sink similarly. Connect every departure area to every later arrival area with cap 1, except for the company->company link with cap K. Edge costs match the table of costs where appropriate, zero elsewhere. The maximum flow is now K+N (K teams leave the company, and equipment arrives at and leaves every client), and the minimum cost is the cost of the operation.</div><h3 style="text-align: left;">D: Cutting cheese</h3><div style="text-align: left;">This is basically just a binary search for each cut position, plus some mathematics to give the volume of a sphere that has been cut by a plane.</div><h3 style="text-align: left;">E: Parallel evolution</h3><div style="text-align: left;">Start by sorting the strings by length. A simplistic view is that we need to go through these strings in order, assigning each to one path or the other. 
This can be done with fairly standard dynamic programming, but it will be far too slow.<br />Add an "edge" from each string to the following one if it is a subsequence of this following one. This will break the strings up into connected chains. A key observation is that when assigning the strings in a chain, it is never necessary to switch sides more than once: if two elements in a chain are on one path, one can always put all the intervening elements on the same path without invalidating a solution. Thus, one need only consider two cases for a chain: put all elements in one path (the opposite path to the last element of the previous chain), or start the chain on one path then switch to the other path part-way through. In the latter case, one should make the switch as early as possible (given previous chains), to make it easier to place subsequent strings. A useful feature is that in either case, we know the last element in both paths, which is all that matters for placing future chains. Dynamic programming takes care of deciding which option to use. Only a linear number of subsequence queries is required.<br />I really liked this problem, even though the implementation is messy. It depends only on the compatibility relationship being a partial order and there being a cheap way to find a topological sort.</div><h3 style="text-align: left;">F: Keyboarding</h3><div style="text-align: left;">This can be done with a fairly standard BFS. The graph has a state for each cursor position and number of completed letters. There are up to 5 edges from each state, corresponding to the 5 buttons. I precomputed where each arrow key moves to from each grid position, but I don't know if that is needed. It's also irrelevant that keys form connected regions.<br /><h3 style="text-align: left;">H: Qanat</h3><div style="text-align: left;">This is a good one for a mathsy person to work out on paper while someone else is coding - the code itself is trivial. 
Firstly, the slope being less than one implies that dirt from the channel always goes out one of the two nearest shafts - whichever gets it to the surface quicker (it helps to think of there being a zero-height shaft at x=0). Between each pair of shafts, one can easily find the crossover point.</div><div style="text-align: left;">Consider just three consecutive shafts, and the cost to excavate them and the section of channel between them. If the outer shafts are fixed, the cost is a quadratic in the position of the middle shaft, from which one can derive a formula for its position in terms of its neighbours. This gives a relationship of the form x<sub>i+2</sub> - gx<sub>i+1</sub> + x<sub>i</sub> = 0. In theory, this can now be used to iteratively find the positions of all the shafts, up to a scale factor which is solved by the requirement for the final shaft (the mother well) to be at x=W.<br />I haven't tested it, but I think evaluating the recurrence could lead to catastrophic rounding problems, because errors introduced early on are exponentially scaled up, faster than the sequence itself grows. The alternative I used is to find a closed formula for the recurrence. This is a known result: let a and b be the roots of x<sup>2</sup> - gx + 1 = 0; then the ith term is r(a<sup>i</sup> - b<sup>i</sup>) for the scale factor r. Finding the formula for g shows that the roots will always be real (and distinct) rather than complex.</div></div><h3 style="text-align: left;">I: Ship traffic</h3><div style="text-align: left;">This mostly needed some careful implementation. For each ship, there is an interval during which the ferry cannot launch. Sort the start and end times of all these intervals and sweep through them, keeping track of how many ships will be hit for each interval between events. 
The longest such interval with zero collisions is the answer.</div><h3 style="text-align: left;">J: Tile cutting</h3><div style="text-align: left;">Firstly, which sizes can be cut, and in how many ways? If the lengths from the corners of the parallelograms to the corners of the tile are a, b, c, d, then the area is ab + cd. So the number of ways to make a size is the number of ways it can be written as ab + cd, for a, b, c, d > 0. The number of ways to write a number as ab can be computed by brute force (iterate over all values of a and b for which ab <= 500000). The number of ways to write a number as ab + cd is the convolution of this function with itself (ala polynomial multiplication). There are well-known algorithms to do this in O(N log N) time (with an FFT) or in O(N<sup>1.58</sup>) time (divide-and-conquer), and either is fast enough.<br /><h3 style="text-align: left;">K: Tours</h3><div style="text-align: left;">I've got a solution that passes, but I don't think I can fully prove why.<br />I start by running a DFS to decompose the graph into a forest plus edges from a node to an ancestor in the forest. Each such "up" edge corresponds to a simple cycle. Clearly, the number of companies must divide into the length of any cycle, so we can take the GCD of these cycle lengths.<br />If the cycles were all disjoint, this would (I think) be sufficient, but overlapping cycles are a problem. Suppose two cycles overlap. That means there are two points A and B, with three independent paths between them, say X, Y, Z. If X has more than its share of routes from some company, then Y must have less than its share, to balance the tour X-Y. Similarly, Z must have less than its share. But then the tour Y-Z has too few routes from that company. 
It follows that X, Y and Z must all have equal numbers of routes from each company, and hence the number of companies must divide into each of their lengths.<br />And now for the bit I can't prove: it seems to be sufficient to consider only pairs of the simple cycles corresponding to using a single up edge. I just iterate over all pairs and compute the overlap. That is itself not completely trivial, requiring an acceleration structure to compute kth ancestors and least common ancestors.</div><h3 style="text-align: left;">L: Weather report</h3><div style="text-align: left;">This is an interesting variation of a classic result in compression theory, Huffman codes. The standard way to construct an optimal prefix-free code from a frequency table is to keep a priority queue of coding trees, and repeatedly combine the two least frequent items by giving them a common parent. That can't be done directly here because there are trillions of possible strings to code, but it can be done indirectly by noting that permutations have identical frequency, and storing them as a group rather than individual items. The actions of the original algorithm then need to be simulated, pairing up large numbers of identical items in a single operation.</div></div></div>Unknownnoreply@blogger.com5tag:blogger.com,1999:blog-31847281.post-8725731193559615342014-10-26T02:42:00.001-07:002014-10-26T02:42:54.172-07:002014 NEERC Southern Subregional<div dir="ltr" style="text-align: left;" trbidi="on">The ICPC NEERC South Subregional was mirrored on Codeforces. It was a very nice contest, with some approachable but also challenging problems. Here are my thoughts on the solutions (I solved everything except A, J and L during the contest).<br /><h3 style="text-align: left;">A: Nasta Rabbara</h3><div style="text-align: left;">This is quite a nasty one, and my initial attempt during the contest was all wrong. 
The first thing to be aware of is that a single query can be answered in O(L log L) time (or less) using a modification of the standard union-find data structure: each edge in the union-find structure is labelled to indicate whether end-points have the same or the opposite parity, and the find operation tracks whether the returned root has the same or opposite parity as the query point. That way, edges that create cycles can be found to be either odd- or even-length cycles. Of course, this won't be fast enough if all the queries are large.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Ideally, one would like a data structure that allows both new edges to be added and existing edges to be removed. That would allow for a sliding-window approach, in which we identify the maximal left end-point for each right end-point. However, the 10 second time limit suggests that this is not the right approach.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Instead, there is a \(O(N+(M+Q)\sqrt{M}\log N)\) solution. Divide the series into blocks of length \(\sqrt{M}\). For each block, identify all the queries with a right end-point inside the block. Now build up a union-find structure going right-to-left, starting from the left edge of the block. Whenever you hit the left end-point of one of the identified queries, add the remaining episodes for the query (which will all come from inside the block) to answer the query, then undo the effect of these extra episodes before continuing. As long as you don't do path compression, each union operation can be unwound in O(1) time. This will miss queries that occur entirely inside a block, but these can be answered by the algorithm from the first paragraph as they are short.</div><h3 style="text-align: left;">B: Colored blankets</h3><div style="text-align: left;">It turns out that it is always possible to find a solution. 
Firstly, blankets with no colour can be given an arbitrary colour on one side. It is fairly easy to see that we need only allocate blankets to kits such that each kit contains blankets of at most two colours. Repeat the following until completion:</div><ol style="text-align: left;"><li>If any colour has at most K/N blankets remaining and that colour has not been painted onto any kit, put those blankets into a new kit and paint it with that colour (this might involve putting zero blankets into that kit).</li><li>Otherwise, pick any colour that has not been painted on a kit. There must be more than K/N blankets of that colour. Use enough of them to fill up any non-full painted kit. There must be such a kit, otherwise there are more than K blankets in total.</li></ol>Since each kit is painted with a unique colour, this generates at most N kits; and since each kit has K/N blankets in it, it must generate exactly N kits.<br /><h3 style="text-align: left;">C: Component tree</h3><div style="text-align: left;">The only difficulty with this problem is that the tree might be very deep, causing the naive algorithm to spend a lot of time walking up the tree. This can be solved with a heavy-light decomposition. On each heavy path, for each attribute that appears anywhere on the path, store a sorted list (by depth) of the nodes containing that attribute. When arriving on a heavy path during a walk, a binary search can tell where the nearest ancestor with that property occurs on the heavy path. I think this makes each query O(log N) time.</div><h3 style="text-align: left;">D: Data center</h3><div style="text-align: left;">This is a reasonably straightforward sliding window problem. Sort the servers of each type, and start with the minimum number of low voltage servers (obviously taking biggest first). One might also be required to take all the low voltage servers plus some high voltage servers. 
Then remove the low voltage servers one at a time (smallest first), and after each removal, add high voltage servers (largest first) until the capacity is made up. Then compare this combination to the best solution so far.</div><h3 style="text-align: left;">E: Election</h3><div style="text-align: left;">Firstly, ties can be treated as losses (both within a station and in the election), because we want the mayor to win. When merging two stations, there are two useful results that can occur: win+loss -> win, or loss+loss -> loss; in both cases the number of wins stays the same, while the number of losses goes down by one. So they are equally useful. We can determine the number of viable merges by DP: either the last two can be merged, and we solve for the other N-2; or the last one is untouched, and we solve for the other N-1.</div><h3 style="text-align: left;">F: Ilya Muromets</h3><div style="text-align: left;">Note that the gap closing up after a cut is just a distraction: any set of heads we can cut, can also be cut as two independent cuts of the original set of heads.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">We can determine the best cut for every prefix in linear time, by keeping a sliding window of sums (or using a prefix sum). Similarly, we can determine the best cut for every suffix. Any pair of cuts can be separated into a cut of a prefix and of the corresponding suffix, so we need only consider each split point in turn.</div><h3 style="text-align: left;">G: FacePalm</h3><div style="text-align: left;">Let's consider each k contiguous days in turn, going left to right. If the current sum is non-negative, we need to reduce some of the values. We might as well reduce the right-most value we can, since that will have an effect on as many future values as possible (past values have already been fixed to be negative, so that is not worth considering). 
So we reduce the last value as much as needed, or until it reaches the lower limit. If necessary, we then reduce the second-last value, and so on until the sum is negative.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The only catch is that we need a way to quickly skip over long sequences of the minimum value, to avoid quadratic running time. I kept a cache of the previous non-minimum value (similar to path compression in union-find structures); a stack of the positions of non-minimum values should work too.</div><h3 style="text-align: left;">H: Minimal Agapov code</h3><div style="text-align: left;">The first triangle will clearly consist of the minimum three labels. This divides the polygon into (up to) three sections. Within each section, the next triangle will always stand on the existing diagonal, with the third vertex being the lowest label in the section. This will recursively subdivide the polygon into two more sections with one diagonal, and so on. One catch is that ties need to be broken carefully: pick the one furthest from the base with the smaller label (this led to me getting a WA). Rather than considering all the tie-breaking cases for the first triangle, I started with a first diagonal with the two smallest labels, and found the first triangle through the recursion.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The main tool needed for this is an efficient range minimum query. There are a number of data structures for this, and any of them should work. I used two RMQ structures to cater for the two possible tie-breaking directions. 
The cyclic rather than linear nature of the queries complicates things slightly, but it is just a matter of being careful.</div><h3 style="text-align: left;">I: Sale in GameStore</h3><div style="text-align: left;">This was the trivial problem: sort, get your friends to buy the most expensive item, start filling up on cheap items until you can't buy any more or you've bought everything.</div><h3 style="text-align: left;">J: Getting Ready for the VIPC</h3><div style="text-align: left;">I got a wrong answer to test case 53 on this, and I still don't know why. But I think my idea is sound.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The basic approach is dynamic programming, where one computes the minimum tiredness one can have after completing each contest, assuming it is possible to complete it (this also determines the resulting skill). To avoid some corner cases, I banned entering a contest with tiredness greater than \(h_i - l_i\). However, this is \(O(N^2)\), because for each contest, one must consider all possible previous contests.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The first optimisation one can do is that one can make a list of outcomes for each day, assuming one enters a contest that day: a list of (result skill, tiredness), one for each contest. If one contest results in both less skill and more tiredness than another, it can be pruned, so that one ends up with a list that increases in both skill and tiredness. Now one can compute the DP for a contest by considering only each previous day, and finding the minimum element in the list for which the skill is great enough to enter the current contest. 
The search can be done by binary search, so if there are few distinct days with lots of contests each day, this will be efficient; but if every contest is on a different day, we're no better off.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">The second optimisation is to note the interesting decay function for tiredness. After about 20 days of inactivity, tiredness is guaranteed to reach 0. Thus, there is no need to look more than this distance into the past: beyond 20 days, we only care about the maximum skill that can be reached on that day, regardless of how tired one is. This reduces the cost to \(O(N\log N\log maxT)\). </div><h3 style="text-align: left;">K: Treeland</h3><div style="text-align: left;">Pick any vertex and take its nearest neighbour: this is guaranteed to be an edge; call the end-points A and B. An edge in a tree partitions the rest of the tree into two parts. For any vertex C, either d(A, C) < d(B, C) or vice versa, and this tells us which partition C belongs to. We can thus compute the partition, and then recursively solve the problem within each partition.</div><h3 style="text-align: left;">L: Useful roads</h3><div style="text-align: left;">I didn't solve this during the contest, and I don't know exactly how to solve it yet.</div><h3 style="text-align: left;">M: Variable shadowing</h3><div style="text-align: left;">This is another reasonably straightforward implementation problem. For each variable I keep a stack of declarations, with each declaration tagged with the source position and a "scope id" (each new left brace creates a new scope id). I also keep a stack of open scope ids. 
When a right brace arrives, I check the top-of-stack for each variable to see if it matches the just closed scope id, and if so, pop the stack.</div></div>Unknownnoreply@blogger.com4tag:blogger.com,1999:blog-31847281.post-56399056529668202882014-07-18T01:54:00.002-07:002014-07-31T12:03:04.609-07:00IOI 2014 day 2 analysis<div dir="ltr" style="text-align: left;" trbidi="on"><div><div><div>I found day 2 much harder than day 1, and I still don't know how to solve all the problems (I am seriously impressed by those getting perfect scores). Here's what I've managed to figure out so far.<br /><br /><b>Update</b>: I've now solved everything (in theory), and the solutions are below. The <a href="http://www.ioi2014.org/index.php/competition/contest-tasks">official solutions</a> are now also available on the IOI website. I'll try coding the solutions at some point if I get time.<br /><h3 style="text-align: left;">Gondola</h3><div style="text-align: left;">This was the easiest of the three. Firstly, what makes a valid gondola sequence? In all the subtasks of this problem, there will be two cases. If you see any of the numbers 1 to n, that immediately locks in the phase, and tells you the original gondola for every position. Otherwise, the phase is unknown. So, the constraints are that</div><ul style="text-align: left;"><li>if the phase is known, every gondola up to n must appear in the correct spot if it appears;</li><li>no two gondolas can have the same number.</li></ul></div>Now we can consider how to construct a replacement sequence (and also to count them), which also shows that these conditions are sufficient. If the phase is not locked, pick it arbitrarily. Now the "new gondola" column is simply the numbers from n+1 up to the largest gondola, so picking a replacement sequence is equivalent to deciding which gondola replaces each broken gondola. 
We can assign each gondola greater than n that we can't see to a position (one where the final gondola number is larger), and this will uniquely determine the replacement sequence. We'll call such gondolas <i>hidden</i>.<br /><br />For the middle set of subtasks, the simplest thing is to assign all hidden gondolas to one position, the one with the highest-numbered gondola in the final state. For counting the number of possible replacement sequences, each hidden gondola can be assigned independently, so we just multiply together the number of options, and also remember to multiply by n if the phase is unknown. In the last subtask there are too many hidden gondolas to deal with one at a time, but they can be handled in batches (those between two visible gondolas), using fast exponentiation.<br /><h3>Friend</h3><div style="text-align: left;">This is a weighted maximum independent set problem. On a general graph this is NP-hard, so we will need to exploit the curious way in which the graph is constructed. I haven't figured out how to solve the whole problem, but let's work through the subtasks:</div><ol style="text-align: left;"><li>This is small enough to use brute force (consider all subsets and check whether they are independent).</li><li>The graph will be empty, so the sample can consist of everyone. </li><li>The graph will be complete, so only one person can be picked in a sample. Pick the best one.</li><li>The graph will be a tree. There is a fairly standard tree DP to handle this case: for every subtree, compute the best answer, either with the root excluded or included. If the root is included, add up the root-excluded answers for every subtree; otherwise add up the best of the two for every subtree. This takes linear time.</li><li>In this case the graph is bipartite and the vertices are unweighted. This is a standard problem which can be solved by finding the maximum bipartite matching. 
The relatively simple flow-based algorithm for this is theoretically \(O(n^3)\), but it is one of those algorithms that tends to run much faster in most cases, so it may well be sufficient here.</li></ol></div>The final test-case clearly requires a different approach, since n can be much larger. I only managed to figure this out after getting a big hint from the SA team leader, who had seen the official solution.<br /><br />We will process the operations in reverse order. For each operation, we will transform the graph into one that omits the new person, but for which the optimal solution has the same score. Let's say that the last operation had A as the host and B as the invitee, and consider the different cases:<br /><ul style="text-align: left;"><li>YourFriendsAreMyFriends: this is the simplest: any solution using B can also use A, and vice versa. So we can collapse the two vertices into one whose weight is the sum of the original weights, and use it to replace A.</li><li>WeAreYourFriends: this is almost the same, except now we can use at most one of A and B, and which one we take (if either) has no effect on the rest of the graph. So we can replace A with a single vertex having the larger of the two weights, and delete B.</li><li>IAmYourFriend: this is a bit trickier. Let's start with the assumption that B will form part of the sample, and add that to the output value before deleting it. However, if we later decide to use A, there will be a cost to remove B again; so A's weight <i>decreases</i> by the weight of B. If it ends up with negative weight, we can just clamp it to 0.</li></ul></div>Repeat this deletion process until only the original vertex is left; the answer will be the weight of this vertex, plus the saved-up weights from the IAmYourFriend steps.<br /><div><h3 style="text-align: left;">Holiday</h3><div style="text-align: left;">Consider the left-most and right-most cities that Jian-Jia visits. 
Regardless of where he stops, he will need to travel from the start city to one of the ends, and from there to the other end. There is no point in doing any other back-tracking, so we can tell how many days he spends travelling just from the end-points. This then tells us how many cities he has time to see attractions in, and obviously we will pick the best cities within the range.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">That's immediately sufficient to solve the first test case. To solve more, we can consider an incremental approach. Fix one end-point, and gradually extend the other end-point, keeping track of the best cities (and their sum) in a priority queue (with the worst of the best cities at the front). As the range is extended, the number of cities that can be visited shrinks, so items will need to be popped. Of course, the next city in the range needs to be added each time as well. Using a binary heap, this gives an \(O(n^2\log n)\) algorithm: a factor of n for each endpoint, and the \(\log n\) for the priority queue operations. That's sufficient for subtask 3. It's also good enough for subtask 2, because the left endpoint will be city 0, saving a factor of n.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">For subtask 4, it is clearly not possible to consider every pair of end-points. Let's try to break things up. Assume (without loss of generality) that we move first left, then back to the start, then right. Let's compute the optimal solution for the left part and the right part separately, then combine them. The catch is that we need to know how we are splitting up our time between the two sides. So we'll need to compute the answer for each side for every possible number of days spent within that side. 
This seems to leave us no better off, since we're still searching within a two-dimensional space (number of days and endpoint), but it allows us to do some things differently.<br /><br />We'll just consider the right-hand side. The left-hand side is similar, with minor changes because we need two days for travel (there and back) instead of one. Let f(d) be the optimal end-point if we have d days available. Then with a bit of work one can show that f is non-decreasing (provided one is allowed to pick amongst ties). If we find f(d) for d=1, 2, 3, ... in that order, it doesn't really help: we're only, on average, halving the search space. But we can do better by using a divide-and-conquer approach: if we need to find f for all \(d \in [0, D)\) then we start with \(d = \frac{D}{2}\) to subdivide the space, and then recursively process each half of the interval on disjoint subintervals of the cities. This reduces the search space to \(O(n\log n)\).<br /><br />This still leaves the problem of efficiently finding the total number of attractions that can be visited for particular intervals and available days. The official solution uses one approach, based on a segment tree over the cities, sorted by number of attractions rather than position. The approach I found is, I think, simpler. Visualise the recursion described above as a tree; instead of working depth-first (i.e., recursively), we work breadth-first. We make \(O(\log n)\) passes, and in each pass we compute f(d) where d is an odd multiple of \(2^i\) (with \(i\) decreasing with each pass). Each pass can be done in a single incremental process, similar to the way we tackled subtask 2. The difference is that each time we cross into the next subinterval, we need to increase \(d\), and hence bring more cities into consideration. To do this, we need either a second priority queue of excluded cities, or we can replace the priority queue with a balanced binary tree. 
Within each pass, d can only be incremented \(O(n)\) times, so the total running time will be \(O(n\log n)\) per pass, or \(O(n\log n \log n)\) overall.</div></div></div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-31847281.post-30935558131283065492014-07-16T08:00:00.000-07:002014-07-17T03:41:53.826-07:00IOI 2014 day 1 analysis<div dir="ltr" style="text-align: left;" trbidi="on"><div><h2 style="text-align: left;">IOI 2014 day 1</h2><div style="text-align: left;">Since there is no online judge, I haven't tried actually coding any of these. So these ideas are not validated yet. You can find the problems <a href="http://www.ioi2014.org/index.php/competition/contest-tasks">here</a>.</div><h3 style="text-align: left;">Rails</h3><div style="text-align: left;">I found this the most difficult of the three to figure out, although coding it will not be particularly challenging.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Firstly, we can note that distances are symmetric: a route from A to B can be reflected in the two tracks to give a route from B to A. So having only \(\frac{n(n-1)}{2}\) queries is not a limitation, as we can query all distances. This might be useful in tackling the first three subtasks, but I'll go directly to the hardest subtask.</div><div style="text-align: left;"><br /></div>If we know the position and type of a station, there is one other that we can immediately locate: the closest one. It must have the opposite type and be reached by a direct route. Let station X be the closest to station 0. 
The other stations can be split into three groups:<br /><ol style="text-align: left;"><li>d(X, Y) < d(X, 0): these are reached directly from station X and are of type C, so we can locate them exactly.</li><li>d(0, X) + d(X, Y) = d(0, Y), but not in group 1: these are reached from station 0 via station X, so they lie to the left of station 0.</li><li>All other stations lie to the right of station X.</li></ol>Let's now consider just the stations to the right of X, and see how to place them. Let's take them in increasing distance from 0. This ensures that we encounter all the type D stations in order, and any type C station will be encountered at some point after the type D station used to reach it. Suppose Y is the right-most type D station already encountered, and consider the distances for a new station Z. Let \(z = d(0, Z) - d(0, Y) - d(Y, Z)\). If Z is type C, then there must be a type D at distance \(\frac{z}{2}\) to the left of Y. On the other hand, if Z is of type D (and lies to the right of Y), then there must be a type C station at distance \(\frac{z}{2}\) to the left of Y. In the first case, we will already have encountered the station, so we can always distinguish the two cases, and hence determine the position and type of Z.</div><div></div><div>The stations to the left of station zero can be handled similarly, using station X as the reference point instead of station 0.</div><div></div><div>How many queries is this? Every station Z except 0 and X accounts for at most three queries: d(0, Z), d(X, Z) and d(Y, Z), where Y can be different for each Z. This gives \(3(n-2) + 1\), which I think can be improved to \(3(n-2)\) just by counting more carefully. Either way, it is sufficient to solve all the subtasks.<br /><h3 style="text-align: left;">Wall </h3></div><div style="text-align: left;">This is a fairly standard interval tree structure problem, similar to Mountain from IOI 2005 (but a little easier). 
Each node of the tree contains a range to which its children are clamped. To determine the value of any element of the wall, start at the leaf with a value of 0 and read up the tree, clamping the value to the range in each node in turn. Initially, each node has the range [0, inf). When applying a new instruction, it is done top-down, and clamps are pushed down the tree whenever recursion is necessary.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">An interesting aspect of the problem is that it is offline, in that only the final configuration is requested and all the information is provided up-front. This makes me think that there may be an alternative solution that processes the data in a different order, but I can't immediately see a nicer solution than the one above. </div><h3 style="text-align: left;">Game</h3><div style="text-align: left;">I liked this problem, partly because I could reverse-engineer a solution from the assumption that it is always possible to win, and partly because it requires neither much algorithm/data-structure training (like Wall) nor tricky consideration of cases (like Rails). Suppose Mei-Yu knows that certain cities are connected. If there are any flights between the cities that she has not asked about, then she can win simply by saving one of these flights for last, since it will not affect whether the country is connected. It follows that for Jian-Jia to win, he must always answer no when asked about a flight between two components that Mei-Yu does not know to be connected, <i>unless</i> this is the last flight between these components.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">What if he always answers yes to the last flight between two components? In this case he will win. As long as there are at least two components left, there are uncertain edges between every pair of them, so Mei-Yu can't know whether any of them is connected to any other. 
All edges within a component are known, so the number of components can only become one after the last question.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">What about complexity? We need to keep track of the number of edges between each pair of components, which takes \(O(N^2)\) space. Most operations will just decrement one of these counts. There will be \(N - 1\) component-merging operations, each of which requires a linear-time merging of these edge counts and updating a vertex-to-component table. Thus, the whole algorithm requires \(O(N^2)\) time. This is optimal given that Mei-Yu will ask \(O(N^2)\) questions.</div></div>Unknownnoreply@blogger.com1tag:blogger.com,1999:blog-31847281.post-72428844748534215182014-06-30T09:18:00.000-07:002014-06-30T09:27:26.945-07:00ICPC Problem H: Pachinko<div dir="ltr" style="text-align: left;" trbidi="on">Problem A has been covered quite well <a href="http://keet.wordpress.com/2014/06/28/acm-icpc-2014-solution-to-problem-a-baggage/">elsewhere</a>, so I won't discuss it. That leaves only problem H. I started by reading the Google translation of a <a href="http://fajnezadania.wordpress.com/2014/06/28/pachinko/">Polish writeup</a> by gawry. The automatic translation wasn't very good, but it gave me one or two ideas I borrowed. I don't know how my solution compares in length of code, but I consider it much simpler conceptually.<br /><br />This is a fairly typical Markov process, where a system is in some state, and each timestep it randomly selects one state as a function of the current state. One variation is that the process stops once the ball reaches a target, whereas Markov processes don't terminate. I was initially going to model that as the ball always moving from the target to itself, but things would have become slightly complicated.<br /><br />Gawry has a nice way of making this explicitly a matrix problem. 
Set up the matrix M as for a Markov process i.e., \(M_{i,j}\) is the probability of a transition from state j to state i. However, for a target state j, we set \(M_{i,j}=0\) for all i. Now if \(b\) is our initial probability vector (equal probability for each empty spot in the first row), then \(M^t b\) represents the probability of the ball being in each position (and the game not having previously finished) after \(t\) timesteps. We can then say that the expected amount of time the ball spends in each position is given by \(\sum_{t=0}^{\infty} M^t b\). The sum of the elements in this vector is the expected length of a game and we're told that it is less than \(10^9\), so we don't need to worry about convergence. However, that doesn't mean that the matrix series itself converges: Gawry points out that if there are unreachable parts of the game with no targets, then the series won't converge. We fix that by doing an initial flood-fill to find all reachable cells and only use those in the matrix. Gawry then shows that under the right conditions, the series converges to \((I - M)^{-1} b\).<br /><br />This is where my solution differs. Gawry dismisses Gaussian elimination, because the matrix can be up to 200,000 square. However, this misses the fact that it is <i>banded</i>: by numbering cells left-to-right then top-to-bottom, we ensure that every non-zero entry in the matrix is at most W cells away from the main diagonal. Gaussian elimination (without pivoting) preserves this property. We can exploit this both to store the matrix compactly, and to perform Gaussian elimination in \(O(W^3H)\) time.<br /><br />One concern is the "without pivoting" caveat. I was slightly surprised that my first submission passed. I think it is possible to prove correctness, however. Gaussian elimination without pivoting is known (and easily provable) to work on strictly column diagonally dominant matrices. 
In our case the diagonal dominance is weak: columns corresponding to empty cells have a sum of zero, those corresponding to targets have a 1 on the diagonal and zeros elsewhere. However, the matrix is also irreducible, which I think is enough to guarantee that there won't be any division by zero.<br /><br />EDIT: actually it's not guaranteed to be irreducible, because the probabilities can be zero and hence it's possible to get from A to B without being able to get from B to A. But I suspect that it's enough that one can reach a target from every state. </div>Unknownnoreply@blogger.com4tag:blogger.com,1999:blog-31847281.post-10944906129548100282014-06-28T09:59:00.001-07:002014-06-28T12:27:23.563-07:00ICPC Problem L: Wires<div dir="ltr" style="text-align: left;" trbidi="on">While this problem wasn't too conceptually difficult, it requires a <i>lot</i> of code (my solution is about 400 lines), and careful implementation of a number of geometric algorithms. A good chunk of the code comes from implementing a rational number class in order to precisely represent the intersection points of the wires. It is also very easy to suffer from overflow: I spent a long time banging my head against an assertion failure on the server until I upgraded my rational number class to use 128-bit integers everywhere, instead of just for comparisons.<br /><br />The wires will divide the space up into connected regions. The regions can be represented in a planar graph, with edges between regions that share an edge. The problem is then to find the shortest path between the regions containing the two new end-points.<br /><br />My solution works in a number of steps:<br /><ol style="text-align: left;"><li>Find all the intersection points between wires, and the segments between intersection points. This just tests every wire against every other wire. The case of two parallel wires sharing an endpoint needs to be handled carefully. For each line, I sort the intersection points along the line. 
I used a dot product for this, which is where my rational number class overflowed, but it would probably have been safer to just sort lexicographically. More than two lines can meet at an intersection point, so I used a std::map to assign a unique ID to each intersection point (I'll call them vertices from here on).</li><li>Once the intersection points along a line have been sorted, one can identify the segments connecting them. I create two copies of each segment, one in each direction. With each vertex A I store a list of all segments A->B. Each pair is stored contiguously so that it is trivial to find its partner. Each segment is considered to belong to the region to its left as one travels A->B.</li><li>The segments emanating from each vertex are sorted by angle. These comparisons could easily cause overflows again, but one can use a handy trick: instead of using the vector for the segment in an angle comparison, one can use the vector for the entire wire. It has identical direction but has small integer coordinates.</li><li>Using the sorted lists from the previous step, each segment is given a pointer to its following segment from the same region. In other words, if one is tracing the boundary of the region and one has just traced A->B, the pointer will point to B->C.</li><li>I extract the <i>contours</i> of the regions. A region typically consists of an outer contour and optionally some holes. The outermost region lacks an outer contour (one could add a big square if one needed to, but I didn't). A contour is found by following the next pointers. A case that turns out to be inconvenient later is that some segments might be part of the contour but not enclose any area. This can make a contour disappear completely, in which case it is discarded. 
Any remaining contours have the property that two adjacent segments are not dual to each other, although it is still possible for both sides of an edge to belong to the same contour.</li><li>Each contour is identified as an outer contour or a hole. With integer coordinates I could just measure the signed area of the polygon, but that gets nasty with rational coordinates. Instead, I pick the lexicographically smallest vertex in the contour and examine the angle between the two incident segments (this is why it is important that there is a non-trivial angle between them). I also sort the contours by this lexicographically smallest vertex, which causes any contour to sort before any other contours it encloses.</li><li>For each segment I add an edge of weight 1 from its containing region to the containing region of its dual.</li><li>For each hole, I search backwards through the other contours to find the smallest non-hole that contains it. I take one vertex of the hole and do a point-in-polygon test. Once again, some care is needed to avoid overflows, and using the vectors for the original wires proves useful. One could then associate the outer contour and the holes it contains into a single region object, but instead I just added an edge to the graph to join them with weight 0. In other words, one can travel from the boundary of a region to the outside of a hole at no cost.</li><li>Finally, I identify the regions containing the endpoints of the new wire, using the same search as in the previous step.</li></ol>After all this, we still need to implement a shortest path search - but by this point that seems almost trivial in comparison.<br /><br />What is the complexity? There can be \(O(M^2)\) intersections and hence also \(O(M^2)\) contours, but only \(O(M)\) of them can be holes (because two holes cannot be part of the same connected component). 
The slowest part is the fairly straightforward point-in-polygon test which tests each hole against each non-hole segment, giving \(O(M^3)\) time. There are faster algorithms for doing point location queries, so it is probably theoretically possible to reduce this to \(O(M^2\log N)\) or even \(O(M^2)\), but that is certainly not necessary for this problem.</div>Unknownnoreply@blogger.com3tag:blogger.com,1999:blog-31847281.post-86068129023016434372014-06-28T02:23:00.000-07:002014-06-30T09:18:51.138-07:00ICPC Problem J: Skiing<div dir="ltr" style="text-align: left;" trbidi="on">I'll start by mentioning that there is also some analysis discussion on the <a href="http://apps.topcoder.com/forums/?module=Thread&threadID=823018&start=0&mc=5#1894272">Topcoder forums</a>, which includes alternative solutions that in some cases I think are much nicer than mine.<br /><br />So, on to problem J - which no team solved, and only one team attempted. I'm a little surprised by this, since it's the sort of problem where one can do most of the work on paper while someone else is using the machine. A possible reason is that it's difficult to determine the runtime: I just coded it up and hoped for the best.<br /><br />It is a slightly annoying problem for a few reasons. Firstly, there are a few special cases, because \(v_y\) or \(a\) can be zero. We also have to find the lexicographically smallest answer.<br /><br />Let's deal with those special cases first. If \(v_y = 0\) then we can only reach targets with \(y = 0\). If \(a \not= 0\) then we can reach all of them in any order, otherwise we can only reach those with \(x = 0\). To get the lexicographically smallest answer, just go through them in order and print out those that are reachable. I actually dealt with the \(v_y \not= 0\), \(a = 0\) case as part of my general solution, but it is also easy enough to deal with directly. We can only reach targets with \(x = 0\), and we can only reach them in increasing order of y. 
One catch is that targets are not guaranteed to have distinct coordinates, so if two targets have the same coordinates we must be careful to visit them in lexicographical order. I handled this just by using a stable sort.<br /><br />So, onwards to the general case. Before we can start doing any high-level optimisation, we need to start by answering this question: if we are at position P with X velocity \(v\) and want to pass through position Q, what possible velocities can we arrive with? I won't prove it formally, but it's not hard to believe that to arrive with the smallest velocity, you should start by accelerating (at the maximum acceleration) for some time, then decelerate for the rest of the time. Let's say that the total time between P and Q is \(T\), the X separation is \(x\), the initial acceleration time is \(T - t\) and the final deceleration time is \(t\). Some basic integration then gives<br />\[x = v(T-t)+\tfrac{1}{2}a(T-t)^2 + (v+a(T-t))t - \tfrac{1}{2}at^2.\]<br />This gives a quadratic in \(t\) where the linear terms conveniently cancel, leaving<br />\[t = \sqrt{\frac{vT+\tfrac{1}{2}aT^2-x}{a}}\]<br />The final velocity is just \(v+aT-2at\). We can also find the largest possible arrival velocity simply by replacing \(a\) with \(-a\). Let's call these min and max velocity functions \(V_l(v, T, x)\) and \(V_h(v, T, x)\).<br /><br />More generally, if we can leave P with a given range of velocities \([v_0, v_1]\), with what velocities can we arrive at Q? We first need to clamp the start range to the range from which we can actually reach Q i.e., that satisfy \(|vT - x| \le \frac{1}{2}aT^2\). A bit of calculus shows that \(V_l\) and \(V_h\) are decreasing functions of \(v\), so the arrival range will be \([V_l(v_1), V_h(v_0)]\).<br /><br />Finally, we are ready to tackle the problem as a whole, with multiple targets. 
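<br /><br />A minimal Python sketch of the two arrival-velocity formulas above, assuming \(a > 0\). The function name <code>arrival_velocity_range</code> and its argument order are made up for illustration and are not taken from the contest solution; it returns the pair \((V_l, V_h)\) for a single departure velocity, or <code>None</code> if Q cannot be reached.

```python
import math

def arrival_velocity_range(v, T, x, a):
    """Return (V_l, V_h) for departure velocity v, or None if x is unreachable.

    v: X velocity at P, T: travel time, x: X separation, a: max acceleration (> 0).
    """
    # Reachability condition from the text: |vT - x| <= a*T^2 / 2.
    if abs(v * T - x) > 0.5 * a * T * T:
        return None
    # Slowest arrival: accelerate for T - t, then decelerate for t.
    t_low = math.sqrt(max(0.0, (v * T + 0.5 * a * T * T - x) / a))
    v_low = v + a * T - 2 * a * t_low
    # Fastest arrival: the same formulas with a replaced by -a.
    t_high = math.sqrt(max(0.0, (x - v * T + 0.5 * a * T * T) / a))
    v_high = v - a * T + 2 * a * t_high
    return (v_low, v_high)
```

The <code>max(0.0, ...)</code> guards only protect against tiny negative values caused by rounding at the boundary of the reachable range.<br /><br />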
We will use dynamic programming to answer this question, starting from the last target (highest Y): if I start at target <i>i</i> and want to hit a total of <i>j</i> targets, what are the valid X velocities at <i>i</i>? The answer will in general be a set of intervals. We will make a pseudo-target at (0, 0), and the answer will then be the largest <i>j</i> for which 0.0 is a valid velocity at this target.<br /><br />Computing the DP is generally straightforward, except that the low-level queries we need are the reverse of those we discussed above i.e. knowing the arrival velocities we need to compute departure velocities. No problem, this sort of physics is time-reversible, so we just need to be careful about which signs get flipped. For each (<i>i, j</i>) we consider all options for the next target, back-propagate the set of intervals from that next target, and merge them into the answer for (<i>i, j</i>). Of course, for \(j = 1\) we use the interval \((-\infty, \infty)\).<br /><br />The final inconvenience is that we must produce a lexicographical minimum output. Now we will see the advantage of doing the DP in the direction we chose. We build a sequence forwards, keeping track of the velocity interval for our current position. Initially the position will be (0, 0) and the velocity interval will be [0.0, 0.0]. To test whether a next position is valid, we forward-propagate the velocity interval to this next position, and check whether it has non-empty intersection with the set of velocities that would allow us to hit the required number of remaining targets. We then just take the next valid position with the lowest ID.<br /><br />What is the complexity? It could in theory be exponential, because every path through the targets could induce a separate interval in the interval set. However, in reality the intervals merge a lot, and I wouldn't be surprised if there is some way to prove that there can only be a polynomial number of intervals to consider. 
My solution still ran pretty close to the time limit, though.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-84951348545507614602014-06-27T02:40:00.000-07:002014-06-27T02:40:42.907-07:00More ICPC analysis<div dir="ltr" style="text-align: left;" trbidi="on"><div>Now we're getting on to the harder problems. Today I'll cover two that I didn't know how to solve. Some of the others I have ideas for, but I don't want to say anything until I've had a chance to code them up.<br /><br />Firstly, problem I. This is a maximum clique problem, which on a general graph is NP-complete. So we will need to use the geometry in some way. Misof has a nice set of slides showing how it is done: http://people.ksp.sk/~misof/share/wf_pres_I.pdf. I had the idea for the first step (picking the two points that are furthest apart), but it didn't occur to me that the resulting conflict graph would be bipartite.<br /><br />Now problem G. I discovered that this is a problem that has had a number of research papers published on the topic, one of which achieves \(O(N^3)\). Fortunately, we don't need to be quite that efficient. Let's start by finding a polynomial-time solution. Let's suppose we've already decided the diameters of the two clusters, D(A) and D(B), and just want to find out whether this is actually possible. For each shipment we have a boolean variable that says whether it goes into part A (false) or part B (true). The constraints become boolean expressions: if d(i, j) > D(A) then we must have i || j, and if d(i, j) > D(B) then we must have !i || !j. Determining whether the variables can be satisfied is just 2-SAT, which can be solved in \(O(N^2)\) time.<br /><br />Now, how do we decide which diameters to test? There are \(O(N^2)\) choices for each, so the naive approach will take \(O(N^6)\) time. 
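<br /><br />The feasibility test for one fixed pair of diameters could be sketched in Python roughly as follows. This is an illustration rather than the contest code: the function names and the distance-matrix input are assumptions, and the 2-SAT instance is solved with a standard strongly-connected-components pass (Kosaraju's algorithm).

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm (iterative); returns a component id per vertex."""
    n = len(adj)
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, 0)]
        while stack:
            v, k = stack.pop()
            if k < len(adj[v]):
                stack.append((v, k + 1))
                w = adj[v][k]
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, 0))
            else:
                order.append(v)  # all successors finished: post-order
    radj = [[] for _ in range(n)]
    for v in range(n):
        for w in adj[v]:
            radj[w].append(v)
    comp, count = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = count
        stack = [s]
        while stack:
            v = stack.pop()
            for w in radj[v]:
                if comp[w] == -1:
                    comp[w] = count
                    stack.append(w)
        count += 1
    return comp

def two_clusters_feasible(dist, diam_a, diam_b):
    """True if the items can be split into parts A and B with these diameters."""
    n = len(dist)
    # Literal 2*i means "i is in B" (true); 2*i + 1 means "i is in A" (false).
    adj = [[] for _ in range(2 * n)]

    def add_clause(u, v):
        # (u or v) is equivalent to (not u -> v) and (not v -> u).
        adj[u ^ 1].append(v)
        adj[v ^ 1].append(u)

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] > diam_a:
                add_clause(2 * i, 2 * j)          # i || j: not both in A
            if dist[i][j] > diam_b:
                add_clause(2 * i + 1, 2 * j + 1)  # !i || !j: not both in B
    comp = strongly_connected_components(adj)
    # Satisfiable iff no variable shares a component with its negation.
    return all(comp[2 * i] != comp[2 * i + 1] for i in range(n))
```

There are \(O(N^2)\) clauses, so building the implication graph and running the component search both take \(O(N^2)\) time, matching the bound quoted above.<br /><br />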
We can reduce the running time to \(O(N^4\log N)\) by noting that once we've chosen one, we can binary search the other (it's also possible to eliminate the log factor, but it's still too slow). So far, this is what I deduced during the contest.<br /><br />The trick (which I found in one of those research papers) is that one can eliminate most candidates for the larger diameter. If there is an odd-length cycle, at least one of the edges must be within a cluster, and so that is a lower bound for the larger diameter. What's more, we can ignore the shortest edge of an even cycle (with some tie-breaker), because if the shortest edge lies within a cluster, then so must at least one other edge.<br /><br />We can exploit this to generate an O(N)-sized set of candidates: process the edges from longest to shortest, adding each to a new bipartite graph (as for constructing a maximum spanning tree with Kruskal's algorithm). There are three cases:<br /><ol style="text-align: left;"><li>The next edge connects two previously disconnected components. Add the edge to the graph (which will remain bipartite, since one can always re-colour one of the components). This edge length is a candidate diameter.</li><li>The next edge connects two vertices in the same component, but the graph remains bipartite. This edge is thus part of an even cycle, and so can be ignored.</li><li>The next edge connects two vertices, forming an odd cycle. This edge length is a candidate diameter, and the algorithm terminates.</li></ol></div>The edges selected in step 1 form a tree, so there are only O(N) of them.</div>Unknownnoreply@blogger.com2tag:blogger.com,1999:blog-31847281.post-39471291094526605002014-06-25T11:43:00.004-07:002014-06-28T10:17:31.896-07:00ACM ICPC 2014<div dir="ltr" style="text-align: left;" trbidi="on">ACM ICPC 2014 is over. 
The contest was incredibly tough: solving less than half the problems was enough to get a medal, and 20 teams didn't solve any of the problems (unfortunately including the UCT team, who got bogged down in problem K).<br /><br />I managed to solve 5 problems during the contest (in 4:30, since I didn't wake up early enough to be in at the start), and I've solved one more since then. Here are my solutions, in approximately easy-to-hard order.<br /><br />EDIT: note that I've made follow-up blog posts with more analysis. <br /><h3 style="text-align: left;">D: game strategy</h3><div style="text-align: left;">This was definitely the easiest one, and this is reflected in the scores. We can use DP to determine the set of states from which Alice can force a win within <i>i</i> moves. Let's call this w[i]. Of course, w[0] = {target}. To compute w[i+1], consider each state s that is not already a winning state. If one of Alice's options is the set S and \(S \subseteq w[i]\), then Alice can win from this state in i+1 moves.<br /><br />We can repeat this N times (since no optimal winning sequence can have more than N steps), or just until we don't find any new winning states. We don't actually need to maintain all the values of w, just the latest one. </div><h3 style="text-align: left;">K: surveillance</h3><div style="text-align: left;">Let's first construct a slower polynomial-time algorithm. Firstly, for each i, we can compute the longest contiguous sequence of walls we can cover starting at i. This can be done in a single pass. We sort all the possible cameras by their starting wall, and as we sweep through the walls we keep track of the largest-numbered ending wall amongst cameras whose starting wall we have passed. 
The cameras with \(a > b\) are a bit tricky: we can handle them by treating them as two cameras \([a - N, b]\) and \([a, b + N]\).</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Let's suppose that we know we will use a camera starting at wall i. Then we can use this jump table to determine the minimum number of cameras: cover as much wall starting from i as possible with one camera, then again starting from the next empty slot, and so on until we've wrapped all the way around and covered at least N walls. We don't know the initial i, but we can try all values of i. Unfortunately, this requires \(O(N^2)\) time.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">To speed things up, we can compute accelerated versions of the jump table. If \(J_c[i]\) is the number of walls we can cover starting from i by using c cameras, then we already have \(J_1\), and we can easily calculate \(J_{a+b}\) given \(J_a\) and \(J_b\). In particular, we can compute \(J_2, J_4, J_8\) and so on, and then combine these powers of two to make any other number. This can then be used in a binary search to find the smallest c such that \(J_c\) contains a value that is at least N. The whole algorithm thus requires \(O(N\log N)\) time.<br /><br />One catch that broke my initial submission is that if the jump table entries are computed naively, they can become much bigger than N. This isn't a problem, until they overflow and become negative. Clamping them to N fixed the problem.</div><h3 style="text-align: left;">C: crane balancing</h3><div style="text-align: left;">This is a reasonably straightforward geometry problem, requiring some basic mechanics. Let the mass of the crane be \(q\), let the integral of the x position over the crane be \(p\), let the base be the range \([x_0, x_1]\), and let the x position of the weight be \(x\). 
The centre of mass of the crane is \(\frac p q\), but with a weight of mass \(m\), it will be \(\frac{p+mx}{q+m}\). The crane is stable provided that this value lies in the interval \([x_0, x_1]\). This turns into two linear inequalities in \(m\). Some care is then needed to deal with the different cases of the coefficient being positive, negative or zero.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">Computing the area and the integral of the x value is also reasonably straightforward: triangulate the crane by considering triangles with vertices at \((0, i, i+1)\) and computing the area (from the cross product) and centre of mass (average of the vertices) and then multiply the area by the x component of the centre of mass to get the integral.</div><div style="text-align: left;"><br /></div><div style="text-align: left;">I prefer to avoid floating point issues whenever possible, so I multiplied all the X coordinates by 6 up front and worked entirely in integers. This does also cause the masses to be 6 times larger, so they have to be adjusted to compensate.</div><h3 style="text-align: left;">E: maze reduction</h3><div style="text-align: left;">This is effectively the same problem I set in Topcoder SRM 378 (thanks to ffao on Topcoder for reminding me where I'd seen this). If you have an account on Topcoder you can read about an alternative solution in the match analysis, in which it is easier to compute worst-case bounds.<br /><br />The approach I used in this contest is similar in concept to D, in that you incrementally update a hypothesis about which things are distinguishable. We will assign a label to every door of each chamber. Two doors are given the same label if we do not (yet) know of a way to distinguish them. Initially, we just label each door with the degree of the room it is in, since there is nothing else we can tell by taking zero steps.</div><div style="text-align: left;">Now we can add one inference step. 
Going clockwise from a given door, we can explore all the corridors leading from the current chamber and note down the labels on the matching doors at the opposite end of each corridor. This forms a signature for the door. Having done this for each door, we can now assign new labels, where each unique signature is assigned a corresponding label. We can repeat this process until we achieve stability, i.e., the number of labels does not change. We probably ought to use each door's original label in the signature too, but my solution passed without requiring this.</div><div style="text-align: left;">Finally, two rooms are effectively identical if their sequence of door labels is the same up to rotation. I picked the lexicographically minimum rotation for each room and used this as a signature for the room.</div><div style="text-align: left;">I'm not sure what the theoretical worst-case performance is, but I suspect it would be quite large. For a start, my algorithm requires \(O(NE\log N)\) time <i>per iteration</i>, which I suspect could be improved by using a hash table with some kind of rolling hash to handle each rotation in O(1) time. I was surprised when my first submission ran in time.</div><h3 style="text-align: left;">B: buffed buffet</h3><div style="text-align: left;">This one was deceptively sneaky, but the principles are reasonably simple. It's not too hard to guess that one will solve the continuous and discrete problems separately, and then consider all partitions between the two.</div><div style="text-align: left;">Let's start with the continuous problem, since it is a little easier. I don't have a formal proof, but it shouldn't be too hard to convince yourself that a greedy strategy works. We start by eating the tastiest food. Once it degrades to being as tasty as the second-tastiest food, we then eat both, in the proportion that makes them decay at the same rate. In fact, at this point we can treat them as a combined food with a combined (slower) decay rate. 
We continue eating this mix until tastiness decays to match the third-tastiest, and so on. There are some corner cases that need to be handled if there are foods that don't decay.</div><div style="text-align: left;">Now what about the discrete case? Because the items have different weights, a greedy strategy won't work here, as with any knapsack problem. There is a reasonably straightforward \(O(DW^2)\) dynamic programming, where you consider each possible number of servings of each discrete food type, but this will be too slow. But there is some structure in the problem, so let's try to exploit it. Let's say that \(f(i)\) is the maximum tastiness for a weight of \(i\) using only the foods we've already considered, and we're now computing an updated \(f'\) by adding a new discrete food with weight \(w\), initial tastiness \(t\) and decay rate \(d\). For a given i, \(f'(i)\) clearly only depends on \(f(j)\) where \(i - j\) is a multiple of \(w\), so let's split things up by the remainder modulo \(w\). Fix a remainder \(r\) and let \(g(i) = f(iw + r)\) and \(g'(i) = f'(iw + r)\). Now we can say that</div><div style="text-align: left;">\[<br />\begin{aligned}<br />g'(i) &= \max_{0 \le j \le i}\big\{ g(j) + \sum_{n=1}^{i-j}(t - (n-1)d)\big\}\\<br />&= \max_{0 \le j \le i}\big\{ g(j) + (i-j)t - \frac{(i-j-1)(i-j)}{2}\cdot d\big\}\\<br />&= \max_{0 \le j \le i}\big\{ g(j) + (i-j)t - \frac{i(i-1)+j(j+1)-2ij}{2}\cdot d\big\}\\<br />&= it - \frac{i(i-1)d}{2} + \max_{0 \le j \le i}\big\{ g(j)-\frac{j(j+1)d}{2} - jt + ijd\big\}\\<br />&= it - \frac{i(i-1)d}{2} + \max_{0 \le j \le i}\big\{ h(j) + ijd \big\}<br />\end{aligned}<br />\]<br />Here we have defined \(h(j) = g(j)-\frac{j(j+1)d}{2} - jt\), which can be computed in constant time per \(j\).<br /><br />The key observation is that the expression inside the maximum is now \(h(j)\) plus a term that, for fixed \(i\), is linear in \(j\). Thus, if we plot \(h\) on a graph, only the upper convex hull is worth considering. 
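<br /><br />As a sanity check on the algebra, the direct definition of \(g'(i)\) and the rewritten form can be compared numerically. The Python sketch below is only an illustration: the function names are made up, the inputs are assumed to be integers, and every \(g(j)\) is assumed to be finite (i.e. every weight is achievable). It still takes the maximum by brute force rather than using the hull.

```python
def g_prime_direct(g, i, t, d):
    """max over j of g(j) plus the tastiness of i - j servings of the new food."""
    return max(
        g[j] + sum(t - (n - 1) * d for n in range(1, i - j + 1))
        for j in range(i + 1)
    )

def g_prime_rewritten(g, i, t, d):
    """The same quantity via h(j) = g(j) - j(j+1)d/2 - jt."""
    # j*(j+1) and i*(i-1) are always even, so integer division is exact.
    h = [g[j] - j * (j + 1) * d // 2 - j * t for j in range(i + 1)]
    best = max(h[j] + i * j * d for j in range(i + 1))
    return i * t - i * (i - 1) * d // 2 + best
```

For any integer table <code>g</code>, the two functions should agree for every valid <code>i</code>.<br /><br />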
We can maintain this upper hull as we increase \(i\) (remember, \(j \le i\)). The second observation is that the optimal choice of \(j\) will increase as \(i\) increases, because increasing \(i\) will increase the \(ijd\) more for larger values of \(j\). It follows that we can incrementally find the optimal \(j\) for each \(i\) by increasing the previous \(j\) along the upper hull until increasing it further would start decreasing the objective function. There are a few corner cases: the upper hull might not have any valid entries (because there might be no way to make up a weight of exactly \(jw+r\)), and we might have popped off the previous \(j\) as part of updating the hull. These just need careful coding. Another snag to watch out for is that if all the foods decay very fast, the optimal result may in fact overflow a 32-bit integer in the negative direction.<br /><br />The discrete part of the algorithm now requires \(O(DW)\), and the continuous part requires \(O(D\log D + W)\).</div><h3 style="text-align: left;">F: messenger (solved after contest)</h3><div style="text-align: left;">This is a nasty problem mostly because of numerical stability problems. The problem itself is numerically unstable: a rounding error in measuring the total lengths of the paths is sufficient to change the answer between possible and impossible. It requires the black art of choosing good epsilons. I'll only describe the ideal case in which none of these nasty rounding problems occur.<br /><br />Suppose we pick a point on both Misha and Nadia's path. In general this won't work, because the courier will arrive too late or too early at this point to intercept Nadia. Let L(A, B) be the number of time units that the courier is late when travelling from A to B. L is negative if the courier is early. Valid solutions are those where L = 0.<br /><br />First, let's prove that L is an increasing function of A (where "increasing" means that A moves further along Misha's path). 
Suppose that, instead of moving straight to B, the courier kept pace with Misha for a while and then went to B. Obviously this would mean arriving later (or at the same time). But this is equivalent to the arrival time if A increases. A similar argument shows that L is a decreasing function of B. In both cases, this is not strict.<br /><br />This means that there can be at most \(O(N)\) pairs of segments for which L=0, rather than \(O(N^2)\), because we cannot move forward on one path and backwards on another. We can iterate over the candidate pairs by successively advancing either one segment or the other, by measuring L at pairs of segment endpoints.<br /><br />Now we have reduced the problem to a single segment from each path. Given a point on one path, we can find the corresponding point on the other by solving what looks like a quadratic but which degenerates into a linear equation. Again, there are corner cases where the linear coefficient is also zero, in which case we have to break ties so as to minimise the travel time. Using this mapping function, we can identify the range of positions on Misha's path that correspond to valid positions on Nadia's path. I then used ternary search to find the optimal position along this segment. I didn't actually prove that the function is convex, but it seemed to work.<br /><br />The solution I implemented that passed is actually rather messier than what I've described, and ran close to the time limit. 
I tried implementing it as described above, but it hits assertions in my code due to numeric instabilities; I think it should still be possible to fix it, though.</div></div>Unknownnoreply@blogger.com12tag:blogger.com,1999:blog-31847281.post-78518432906570376112014-06-02T03:23:00.004-07:002014-06-02T03:23:39.904-07:00New website<div dir="ltr" style="text-align: left;" trbidi="on">I've finally gotten around to replacing my ugly nineties-looking <a href="http://www.brucemerry.org.za/">personal web page</a> that was cobbled together with raw HTML and m4, with a much prettier version that nevertheless still contains roughly the same content. It is generated using <a href="http://sphinx-doc.org/">Sphinx</a>, with the individual pages written in <a href="http://docutils.sf.net/rst.html">reStructuredText</a>. I've also merged my publications page into the main website instead of leaving it on the equally ugly but separately cobbled-together page I originally set up for my PhD.</div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-45467314060592092412014-05-10T08:45:00.002-07:002014-05-10T08:45:23.889-07:00Challenge 24 finals 2014<div dir="ltr" style="text-align: left;" trbidi="on">Here are my thoughts/solutions for the Challenge 24 Finals problems from this year.<br /><h3 style="text-align: left;">A: Halting problem</h3><div style="text-align: left;">Firstly, be aware that the problem had a typo: the condition should be "A != 0" rather than "A == 0". The intuition is that "most" of the time the g edge will be taken, but occasionally the h edge might be taken. Given a starting point, let us suppose that we have an efficient way to iterate the algorithm until either an h edge is taken or we terminate (or we determine that neither condition will ever hold). In the former case, we have just determined that we need f(0) for some f. 
Since there are only O(N) of these values, we can cache them as needed, and so only require O(Q + N) rounds of the inner algorithm (namely, iterating until we reach A = 0 or terminate).</div><div style="text-align: left;">Now let us solve this inner problem. Consider the directed graph in which each function f is connected to g. A graph where every vertex has out-degree one has a specific structure, in which each component contains a single loop from which hang upward-directed trees. Iteration thus consists of walking up a tree until reaching the loop, then going round and round the loop until we exit. Of course, we might finish early, or we might go around the loop forever.</div><div style="text-align: left;">For the walking up the tree part, let's use brute force for now. Assuming we make it to the loop, how do we find the exit point? Let's say that every complete time around the loop we increase A by C, and that at some point around the loop we have A = a. Then to exit at this point after t full cycles, we must have a + tC = 0 (mod 2<sup>64</sup>). We can solve this using the extended GCD algorithm. We can measure this for every point around the loop, and the smallest valid t value tells us where we will exit. If there are no valid t values, then we will go round forever.</div><div style="text-align: left;">This is essentially the solution I used in the contest. It is somewhat slow: O(N(N+Q)) I think. I did a lot of microoptimisation, and even then it took hours for the largest test case. The important optimisations were precomputing parameters for each loop (particularly the inverse of C) once, and reordering the data so that points around a loop are consecutive in memory. 
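<br /><br />The loop-exit equation a + tC = 0 (mod 2<sup>64</sup>) can be solved for a single point on the loop along the following lines. This Python sketch is an illustration only: the function name is made up, and it uses Python's built-in modular inverse (<code>pow(x, -1, m)</code>, available from Python 3.8) in place of a hand-written extended GCD.

```python
MOD = 1 << 64  # A is treated as an unsigned 64-bit value

def first_exit_round(a, c):
    """Smallest t >= 0 with (a + t*c) % 2**64 == 0, or None if there is none."""
    a %= MOD
    c %= MOD
    if c == 0:
        # A never changes from one round to the next.
        return 0 if a == 0 else None
    g = c & -c        # largest power of two dividing c, i.e. gcd(c, 2**64)
    rhs = (-a) % MOD  # we need t*c == rhs (mod 2**64)
    if rhs % g != 0:
        return None   # no solution: we would go round the loop forever
    m = MOD // g
    # c // g is odd, so it is invertible modulo the power of two m.
    return (rhs // g) * pow(c // g, -1, m) % m
```

Taking the minimum of this value over all points on the loop gives the exit point; the inverse of <code>c // g</code> depends only on the loop, which is why it is worth precomputing once per loop.<br /><br />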
In fact, over 75% of the time ended up being spent pointer-chasing up the trees, because this was cache-unfriendly.</div><div style="text-align: left;">I think the problem can be solved in something like O((N+Q)log (N+Q)), but it is rather more complex, and a blog doesn't really lend itself to the mathematical notation. I might write it up later if I get around to it. I'll give a few hints. If C is odd then increasing the A value on entry to the loop by a certain amount will increase the time-to-exit for each point by the same amount; this can be generalized if C is an odd multiple of 2^k by partitioning things by their remainder mod 2^k. The tree part can be handled by considering all queries at the same time, in a preorder walk of the tree.</div><h3 style="text-align: left;">C: Complete program</h3><div style="text-align: left;">I had a nasty super-polynomial time solution during the contest, which was complex and only managed to solve about 7 test cases. There is a much simpler and polynomial-time DP solution, however.</div><div style="text-align: left;">Let's start with some observations. Any O not followed by a valid number will need one, so let's add it immediately (it can be anything valid). After this, O and the number might as well be treated as a single unit (there is no point inserting anything between an O and a valid number either). There is no point ever adding an I. Adding F's is useful only to increment the nesting depth, so we might as well add F's only right at the front, since that way everything will be nested inside them. Finally, whenever we add a number, it might as well be a 0.<br />What is less obvious is that it is only necessary to add an A immediately before a number. This requires checking a few cases, but is because an A added anywhere else can be shuffled to the right until it reaches a number, without affecting whether a token string is valid. 
Similar rotation arguments show that we never need to add two As or Os in front of a number.</div><div style="text-align: left;">For the DP, let's compute, for each contiguous substring of the input and each F nesting level, the minimum number of tokens we need to add to make it into a valid parse tree. There is a special case for the empty range: we must just add a 0, and we must check that the nesting depth at that point is at least 1. Otherwise, we can either</div><ul style="text-align: left;"><li>Insert an A or an O.</li><li>Use the first token in the substring as the root.</li></ul>In the first case, the A or O will "swallow" the number, so if it is at most 127 we can just use an O, otherwise we must use an A and the nesting level must be sufficiently large. We then use DP to find the best way to handle the rest of the substring. The second case is largely straightforward, unless the first given token is an A. In this case, we must consider all ways in which the rest of the substring may be split into two valid trees.<br />Finally, we need to determine the nesting level to use at the root. It is possible, although a little slow, to just keep incrementing this value until there are so many initial Fs that it is clearly impossible to improve the solution. A more efficient solution in practice is to compute the DP over each substring as a piecewise constant function of the nesting level; this is more complex since one needs to do various operations on these piecewise constant functions, but it guarantees an O(N<sup>4</sup>) solution independent of the size of the numbers in the input.<br /><h3 style="text-align: left;">D: Firing game</h3><div style="text-align: left;">I solved 9 out of the 10 test cases. The primary observation is that any manager that is unfirable in the initial setup creates a completely independent subtree: firings within this subtree have no effect on the rest of the tree, and vice versa. 
This partitions the tree into separate Nim piles, and we just need to determine the Grundy number for each such subproblem. In all but one test case, these trees are reasonably small, and can be solved by DP/memoized recursion.</div><h3 style="text-align: left;">N: Dog tags</h3><div style="text-align: left;">This problem looked as though it could be extremely nasty, but fortunately the data was somewhat friendly. For example, each bubble was intersected enough times to get a good guess at the volume, even though the problem only guaranteed a single intersection.</div><div style="text-align: left;">Firstly one must identify the bubbles. The loops can be considered to be a graph, with an edge between two loops if they are on adjacent slices and their interiors intersect in the XY plane. I was lazy and considered two loops to intersect if a vertex of one lay inside another, which is good enough if the loops are sampled well. Each connected component in this graph constitutes a bubble. I estimated the volume as the sum of areas of the loops, and also found the centroid (mean of all points contained in the loops).</div><div style="text-align: left;">Finding the fiduciary bubbles is easy: they are the 3 bubbles with greatest volume. Determining which one is the corner is not quite as easy, since one does not know the inter-slice distance and hence distances cannot easily be measured. I identified the first plane of bubbles (see later), and noted that the mid-point of the two corner bubbles should be close to the centroid of this set of bubbles. I just took all 3 edges to find which is best.</div><div style="text-align: left;">This now gives the grid coordinate system XYZ. One does need to check that the bubbles are on the positive Z side, otherwise the two corners need to be swapped.</div><div style="text-align: left;">Now we can partition the bubbles into planes. 
I sorted by Z, computed all the differences between adjacent Z values, sorted this list, and looked for a big jump between adjacent values (small values are noise within a plane, big values are plane-to-plane steps).</div><div style="text-align: left;">The same process can then be used to find each row within a plane, and then bubbles in a row can be sorted by X.</div></div>Unknownnoreply@blogger.com0tag:blogger.com,1999:blog-31847281.post-78303376044283592462014-02-08T09:39:00.001-08:002014-02-08T09:41:56.929-08:00Challenge 24 EC round 2014<div dir="ltr" style="text-align: left;" trbidi="on">Here are my thoughts on the problems we solved from this year's electronic round of Challenge 24.<br /><h2 style="text-align: left;">A. Safe</h2><div style="text-align: left;">I didn't get a complete solution to this, but I was able to solve cases 1-9. I noticed that in most cases the state space for the wheels (keys, but the pictures made me think of them as wheels) is reasonably small. For each password, the solution will involve first pressing the keys a number of times to set up the initial state, and then typing the password. So we want to find the initial states from which we can input the password, and pick the cheapest amongst them.<br />This can be done by dynamic programming, running backwards over the password. When there are no more characters to type, any state is fine. Considering each letter in turn, we need to find the states in which we can type the next letter on one of the keys, and from which the resulting state allows us to complete the password.</div><div style="text-align: left;"></div><div style="text-align: left;">For A10 there are 6.4 billion states, and it was going to take a long time (it did churn out the first two passwords in about 3 hours). I suspect it might be possible to do it by keeping an explicit list/set of reachable states and working from there, rather than iterating over the entire state space.</div><h2 style="text-align: left;">B. 
Wiretapping</h2><div style="text-align: left;">This problem basically just requires an application of <a href="http://en.wikipedia.org/wiki/Kirchhoff%27s_theorem">Kirchhoff's Theorem</a>, which gives the number of spanning trees of the graph in polynomial time. To count the number of spanning trees that use the tapped link, just collapse the two endpoints into a single vertex and count the number of spanning trees of the remaining graph. The ratio between the number of spanning trees in this reduced graph and the original graph is the answer.<br />One complication is that the number of spanning trees may overflow a double-precision float. I worked around that by scaling down the Laplacian matrix by some factor f (I used 1/32 for the larger cases), which scales the determinant of the cofactor by a factor of f<sup>N-1</sup>. </div><div style="text-align: left;"><h2 style="text-align: left;">C. Visual Programming - VM</h2><div style="text-align: left;">This was a fairly straightforward implementation problem, just requiring careful interpretation of the rules, and recognising that a lot of the information given is unnecessary (e.g., for lines you only need to keep the two endpoints).</div><h2 style="text-align: left;">D. Visual Programming</h2><div style="text-align: left;">Due to lack of time I only solved D1, D2 and D4, and had a broken solution to D3. As with most VM problems, I invented a slightly higher-level language that is possible (but still hard) to code in by hand, and wrote a simple compiler for it. I made a few simplifying assumptions:</div><ul style="text-align: left;"><li>Polygons are laid out one below the other. Each polygon is rectangular, except that the right hand edge may be ragged.</li><li>Each line is only ever used in one direction. To ensure this, the outgoing lines from a polygon all emanate from the top part of the right edge, and incoming lines arrive at the bottom part of the right edge. 
There is also an unconditional jump from each polygon to the next one to ensure that the routing terminates before encountering any of the incoming lines.</li><li>To avoid lines cutting through polygons, each line emanates directly horizontally from the polygon to (256, y), then has a second segment to the target point. The x value of the start is chosen depending on which register is to be used in the comparison.</li><li>I reserved registers 0 and 1 to always contain the values 0 and 1, which made it easier to construct some other operations, as well as to build an unconditional jump.</li></ul></div><div style="text-align: left;">The programming mini-language just specifies instructions and a list of conditional jumps that take place after each instruction (falling through to the next instruction if none match). The instructions I wrote at the level of setting the parameters (A-F, X-Z, T, Q, L, H), but I had useful defaults for all of them so that most instructions only changed one or two defaults, e.g. setting A=3 B=15 would generate an instruction that copies from SIO into register 3.</div><div style="text-align: left;">Once all that is done, it is not particularly difficult for anyone who has done any assembly to write code for D1, D2, and D4. The only slightly tricky part was to extract the lower 8 bits from the result, which I did by multiplying by 256 then dividing by 256.</div><div style="text-align: left;"><h2 style="text-align: left;">G. Spy Union</h2><div style="text-align: left;">At first I thought this could be done greedily, but it doesn't work: reinstating a fired person might let you fire an additional person <i>in each hierarchy</i>. Instead, we can use network flow. Let's consider just one hierarchy for now, and connect the source to every node, connect each node to its parent, and connect the root to the sink (directed edges). 
The edges from the source have capacity 1, with the interpretation that they have flow if the corresponding employee is fired. The flow through each intermediate edge will then be the total number of employees fired from the corresponding subtree, so we will set the capacities to be the maximum number we can fire from each subtree. The maximum flow will thus be the maximum number of people we can fire. </div><div style="text-align: left;"></div><div style="text-align: left;">However, we have two hierarchies. Since network flow is symmetric, we can apply the same treatment to the second hierarchy, but connect the source to the root, and the individual employees to the sink. To join up the problems, we get rid of the source in the first graph and the sink in the second graph, and create capacity-1 edges directly from each employee in the second graph to the matching employee in the first. Once again, these edges have flow if the corresponding employee is fired, and the maximum flow gives the solution.</div><div style="text-align: left;"><br /></div></div></div>Unknownnoreply@blogger.com0
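The two-hierarchy flow network described for Spy Union can be sketched roughly as follows. This is only an illustration, not the code used in the contest: the input layout (`parent1`, `limit1`, `parent2`, `limit2`), the function names, and the assumption that a subtree's limit counts the manager at its root are all hypothetical, and a plain Edmonds-Karp max-flow stands in for whatever flow routine one prefers.

```python
from collections import deque


def max_flow(n, edges, s, t):
    """Edmonds-Karp max-flow; edges is a list of (u, v, capacity)."""
    adj = [[] for _ in range(n)]
    to, cap = [], []
    for u, v, c in edges:
        # Edge 2k is the forward edge, edge 2k+1 its residual partner.
        adj[u].append(len(to)); to.append(v); cap.append(c)
        adj[v].append(len(to)); to.append(u); cap.append(0)
    total = 0
    while True:
        via = [-1] * n  # edge used to reach each node in this BFS
        via[s] = -2
        queue = deque([s])
        while queue and via[t] == -1:
            u = queue.popleft()
            for e in adj[u]:
                if cap[e] > 0 and via[to[e]] == -1:
                    via[to[e]] = e
                    queue.append(to[e])
        if via[t] == -1:
            return total  # no augmenting path left
        # Find the bottleneck along the path, then push that much flow.
        push, v = float("inf"), t
        while v != s:
            push = min(push, cap[via[v]])
            v = to[via[v] ^ 1]
        v = t
        while v != s:
            cap[via[v]] -= push
            cap[via[v] ^ 1] += push
            v = to[via[v] ^ 1]
        total += push


def max_fired(parent1, limit1, parent2, limit2):
    """Hypothetical input: parentK[v] is v's manager in hierarchy K (-1 for
    the root) and limitK[v] is the most people that may be fired from v's
    subtree (assumed here to include v)."""
    n = len(parent1)
    source, sink = 2 * n, 2 * n + 1
    edges = []
    for v in range(n):
        # Hierarchy 1 uses nodes 0..n-1; flow runs child -> parent -> sink.
        up = sink if parent1[v] < 0 else parent1[v]
        edges.append((v, up, limit1[v]))
        # Hierarchy 2 uses nodes n..2n-1; flow runs source -> parent -> child.
        down = source if parent2[v] < 0 else n + parent2[v]
        edges.append((down, n + v, limit2[v]))
        # The capacity-1 joining edge carries flow iff employee v is fired.
        edges.append((n + v, v, 1))
    return max_flow(2 * n + 2, edges, source, sink)


# Hierarchy 1: 0 manages 1 and 2. Hierarchy 2: a chain 2 -> 1 -> 0.
print(max_fired([-1, 0, 0], [2, 1, 1], [1, 2, -1], [1, 1, 3]))  # prints 2
```

In the small example, the first hierarchy allows at most two firings overall and the second allows at most one from employees 0 and 1 together, so the best is to fire employee 2 plus one of the other two.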