Sunday, April 09, 2017

Google Code Jam 2017 in Lisp: Qualification Round

In 2016 I was quite happy with my performance in GCJ, even though I didn't make it past Round 1. So despite being quite busy, I registered for the 2017 edition as well. The Qualification Round happened yesterday. It lasted for 27 hours, starting at 1AM in my time zone. I looked at the puzzles after breakfast, and as usual found them intriguing. I initially skipped the first one (Oversized Pancake Flipper) because I knew it must be easy but I didn't find an obvious approach right away. The general approach I took was to first read the problem (out loud if a family member would be willing to listen :-), then think about it while scribbling in a notebook, then when I thought I had an efficient (should be "efficient enough") solution code it up in Lisp more or less top-down. For your amusement, I'm sharing the entirety of my this year's notes, including errors and culs-de-sac.

B. Tidy Numbers

This seemed easy and linear, and I didn't have to do much scribbling. We're looking for the largest Tidy Number that's less than or equal to the input number. We represent the input as a vector of digits. Then we start from the left and look at consecutive pairs of numbers. As long as di≤di+1 we're happy. If di>di+1, we need to "fix it up". Fixing it up means usually means changing di to di-1 and replacing everything to the right with nines. We cannot run into di=0, because there are no leading zeroes and we have already checked that the prefix of the number up to and including digit i is Tidy, so the digits cannot become lower. The local fixup may result in the overall number not being Tidy anymore, i.e. we may have made di<di-1. So we need to recurse (or reiterate, but hey we're thinking in Lisp, right?) on the result. We also may have introduced a leading zero that should be removed. So we remove those. The resulting code is in B.lisp. The code is recursive but not purely functional, because it munches the array using (setf (aref ...) ...). That's ugly and might have made debugging harder, but then the code should be easy enough to not need debugging. And probably this (half-) destructive approach saves a few cycles...

C. Bathroom Stalls

This was definitely the one I had the most (and longest) fun with. It took me several hours (mostly unfocused/doing other things) to get to the breakthrough, which would of course have killed me in a "real" time-limited round.
At first I drew up some simple/small examples. Starting from initial settings with (N=) 7, 6, 5, or 4 spaces, I added all N respective occupants K in order, to get an intuitive picture of how the game plays out. But I didn't find the general pattern.
Pretty quickly, I had a complete understanding of the case where N=2i-1: The first person (K=1=20) has 2i-1-1 space to the left and to the right, the next two (21) persons each have 2i-2-1 space to the left and right, and so on. But how to generalize this to Ns that do not have this form?
At any "round of the game" (K), there is a set of "extents" of free space. In each round, the largest of the current extents is "split": If the size of this largest extent is odd and larger than one, that is, of the form 2x+1, then it will be split into two extents of size x. That was the "easy" case that covers the problem completely when N=2i-1.
But for other kinds of N, I still had no good idea of the pattern how the overall set of extents develops. In a desperate attempt, I tried to draw this up as a tree for N=6. That looked somewhat promising, but I still couldn't figure out the general pattern. At least it shows the special case of the size-2 extent, which doesn't really get split into two extents but degenerates into a single size-1 extent. Hm, this seems complicated.
In an even more desperate attempt, I decided that I needed to look at a bigger example and started drawing up the tree for N=230. Nothing very particular about that number, except that I wanted to start with an even number because that seems to be the more complicated case.
Now that led to the breakthrough. In this non-trivial example, I saw that each level of the tree only contained (a maximum of) two different values with a difference of one.
So a complete and useful description of a given level of the tree is the distribution between these two different values. This is what I wrote up in the right-hand column: On level 1 (after the first split) we have extents of 1x114 and 1x115, on level 2, 1x56 and 3x57, on level 3, 1x27 and 7x28, on level 4, 9x13 and 7x14, and so on. Generalizing to level 0, we can say that we start with 1x230 (1xN) and 0x231.
But how does one get from level j to level j+1? Well, each odd extent splits into two equal extents, and each even one splits into one that's half the original number and one that's one lower. That leads to the two cases below the large (partially written-out) tree. (I started writing the cases with m and n and then replaced that with a and b, respectively, but didn't do so all the way because I had enough to write the code.) The code is in C.lisp, and I'm quite happy with it. Initially I had switched the even and odd cases, which led to wrong results on the tiny example under the problem description. First I panicked, but by simply running a few examples under (trace minmax-1) I could locate the error. I knew the code was O(log(N)) efficient, so I fed it the small, still-small, and large cases with confidence, and each was processed in less than 10 milliseconds.

A. Oversized Pancake Flipper

At this point I was fairly certain to have passed the mark with just two problems done, assuming that at least one of my "large" data set solutions were correct. But it nagged me that I hadn't turned in A, which was supposedly the easiest puzzle. I thought about it again and came up with the following reasoning: The basic operator (multi-pancake flip) is commutative—it doesn't matter whether I flip 4–6 and then 3–5, or the other way round. That means we should be able to strictly process the pancakes in some order, e.g. from left to right. The operator is also an inversion... applying it twice returns the original state and, because of the commutative property, never makes sense.
So we can just go to the row of pancakes left-to-right. If the current pancake is –, we turn it and the others to its right. If the current pancake is + we leave it. Then we shift one place to the right and repeat. When we're at the end of the row (which is K places before the end because of the flipper restriction), we're done. The last pancakes must then all be + or we lose.
The code is in A.lisp. Again, this is written recursively but destructively: The flipping is done by modifying elements in the vector (string). Probably the code would be more elegant if rewritten using a list instead of a vector, and constructing a fresh list when flipping.
When reading the initial state of pancakes, I skipped the reverse (or nreverse) step because it really doesn't matter whether we operate on a mirror image.

D. Fashion Show

Wow, that problem sure seemed from a different league. Fewer than 1'000 contestants even got the small solution right, whereas all other problems had at least 13'000 correct solutions. My mental analysis and scribbling didn't get me very far. You cannot have more than one non-+ in the same row or column, not more than one non-x in the same diagonal. If you want to have an o (twice the value of a + or x), then all the other non-empty models in the same row/column need to be +, all on the same diagonal need to be x.
There's also a parallel to the N-queens problem, in that in both cases you cannot have more than one "thing" (where thing means "queen" or "non-+/x") per row/column/diagonal. But where N-queens only has one "thing", the queen, in this case the things are different (non-+/non-x), and what's more we need to maximize some value.
So I couldn't figure out a useful approach, and the fact that some initial models have been placed makes the reasoning even harder. So I gave up before even trying very hard.

Conclusion

The next morning I checked my results for the large data sets and found them all correct. I ranked in the 4'400s, but then I started late. My 65 points would have been sufficient to enter the top 1000 if I had finished everything in 2h15. That would have been somewhat unlikely, even if I had started immediately and had been fully focused. The first sub-round of round 1 will start at 3AM local time, so I'll probably skip it and try 1B a week later, and then probably 1C. Let's see!
I didn't have too much trouble with the SLIME/SBCL combination; for more than 25 years I have mostly been using Allegro and its Emacs interface when programming, and this is still much easier for me, especially when debugging. But for the contest I really wanted to use a Free/Open Source Lisp.

Sunday, May 08, 2016

Google Code Jam 2016 in Lisp: Round 1B/1C

After failing to qualify in round 1A in this year's Google Code Jam, I had two more chances to advance to round 2.

Round 1B

In round 1B, I only managed to solve the first problem, Getting The Digits.  It took me quite some time, but eventually I found the recommended approach to count letters that are specific to a single digit (e.g. Z occurs in no other digits than ZERO), then eliminate those digits; initially all the even digits have unique characters (0:Z, 2:W, 4:U, 6:X, 8:G), but after those have been eliminated, this holds for other digits as well: (1:O, 3:T, 5:F, 7:S) and finally (9:I).  The program is ugly in that this sequence of letter/digit combinations is hard-coded; a good program would extract the letters by itself so that it could be ported more easily to other languages etc.  My Lisp code is here.

Anyway, both of the other problems resisted my analysis.  I should have been able to solve Close Match, but I couldn't decide whether it should be solved from left to right or right to left, and ended up spinning in circles.  The final problem, Technobabble, is beyond my current capabilities: I neither succeeded in mapping the problem description to the correct graph problem (minimum edge cover on a bipartite graph), nor would I have been able to code an efficient solution to that problem.  As an undergraduate I was good at these kinds of things, but lacked practice over the past 20 years or so.

With only one problem set solved, I ranked #4259 and was very far from qualifying.  The other Lispers didn't fare much better in this round, except for orivej, who advanced by ranking #847.  He or she only used Lisp for the first problem, and Python for the third.

Round 1C

Again I was able to solve the first (and easiest) problem, Senate Evacuation.  There are many viable approaches, and it took me a bit too long to find one.  Here's my slightly embellished A.lisp (raw submission).

After 35 minutes, I attacked the second problem, Slides! After lots of scribbling of graphs and diagonal matrices, I found a method to construct slides for an arbitrary M below the maximum that maps nicely to binary numbers.  The resulting code (B.lisp, cleaned up from the actual submission) is short and sweet and even turns out to work.  But again it has taken me much too long to get there.

With only 16 minutes left, I looked at the final problem, Fashion Police.  I knew I didn't have the slightest chance to solve the problem properly, so I started coding a brute-force algorithm that might solve the small dataset.  But that was not as small as it looked, and the problem not as simple as I thought, so I failed.  Well, anything else would have been a miracle.

My final rank was in this round #1326.  I didn't qualify, but I was quite happy with this honorable (for me) result.  To qualify, I would have had to finished my two working programs in 1h39'30" instead of 2h13'46".  Maybe with more practice I'd be able to save those 34 minutes somewhere—but it looks like a long shot!

Looking at the other Lispers in the contest: This round saw an awesome result from lpetru, who handed in a perfect score (100 points, all problem sets solved), landing him or her on rank #98.  And so far he or she submitted all solutions in Lisp.  I know whom to root for in round 2!

Final remarks

Again it was lots of fun to participate in Google Code Jam.  It was exhausting and sometimes frustrating (especially in round 1B).  But it's good to have those old brain cells work outside their comfort zone.  So I'll probably try again and hope that I'll eventually manage to reach round 2.  See you next year!

Sunday, April 17, 2016

Google Code Jam 2016 in Lisp: Round 1A

Having survived (like 96% of all Lispers and 81.5% of the wider population) the Qualification Round, I'm trying to pass the first "real" round. As expected, things are getting much more serious now! Round 1 is held in three sub-rounds, and you can participate in each, or until you advance to round 2.  In each sub-round, the 1'000 highest-ranking contestants advance to round 2.

Where the qualification round had consisted of four problems (subjectively, two easy and two somewhat tricky) with 27 hours time to solve them, round 1 is three problems (an easier one and two quite tricky ones) that have to be solved within 2h30.  From previous participations I knew that I would likely run into the time limit.

Getting off to a good start


Although the first sub-round, Round 1A, started at 3AM in my time zone, I decided to give it a try.  I went to bed somewhat early and asked one of my sons, who was going out, to wake me up a bit before three, which he dutifully did.  (Thank you!)

The first puzzle (The Last Word) went quite well.  Somehow I convinced myself quickly that it is sufficient to put the next letter in front if it is greater than or equal to the first letter of the current substring, and at the end if it is smaller.  My straightforward Lisp implementation can be found here.

The second task (Rank and File) was cute.  After a bit of scribbling and thinking, I suddenly realized that it's sufficient to look for numbers that occur an odd number of times, sort those numbers and be done with it.  I coded this up fairly quickly as well (Lisp source here).

So I was done with the first two problems after 44'22" with no penalties.  I was also reasonably certain that I got both the large input sets right as well.  (In the Qualification Round I had wrongly believed so as well, but in the critical case of problem C, the difference between small and large input was qualitative rather than quantitative.)

Problem C: close, but no cigar


With 1h45 left, I attacked the last problem, BFFs.  This is a great problem—very simple in its formulation, yet requiring good (but not impossible) analysis, and also a bit of programming skills.  It would be a nice recruitment interview problem.

This is pretty clearly a graph problem.  The input set is a rather special set of directed graph with out-degree 1 (the BFF function).  This must somehow be mapped to another kind of graph that represents a circle with some constraints.  But this mapping is not that straightforward.  Studying the sample inputs on the problem page, I quickly found that there are two different types of circle: One is a complete cyclic sub-graph of the input—that's easy to find given the simple type of input graph.  The other possibility is to have a "core" circular graph of two nodes (a "couple" of mutual BFFs), with "extensions" of other nodes (kids) that (BFF-) point towards the nodes of the core.  I wrote the code for both cases and tested it successfully against the tiny example on the problem page.  My code looked for the maximum cycle, and the maximum graph of the second type, and would return the size of the bigger one.

To my great disappointment, the system told me that my solution for the small input set was incorrect.  I couldn't figure out what was wrong, and time ran out on me.  It was 5h30, and I had finished at rank #2137, with correct solutions for the first two problems and none for the third.

I went to bed slightly frustrated, and of course couldn't sleep right away but continued to think about the problem... what if the entire class consists of happy couples, i.e. mutual BFFs? Then I could put everybody in the circle.  Hmmm... that's a case I had somehow missed! In fact the second type of graph (couple with "inbound" extensions) can occur arbitrarily often in the circle.  Having understood my error, I could finally sleep.

In the next morning I had to write the code, although the contest was over.  Although I had coded (correctly, I think) a good part of the necessary logic the night before, it took me several hours to get the algorithm right for the actual solution.  I was too ambitious and wanted to write a really efficient solution ("early optimization..."), although a stupid O(N2) approach would probably have worked even for the maximum problem size of a 1'000 kids class.  Probably I was also tired.  Also, I saw that I don't know how to do proper debugging in SBCL, and had written the code in a way that made debugging somewhat hard.  Finally I got the code right, see C.lisp here.  Although the code is not particularly compact, I'm happy with it because at least I can read and understand it (for how long is another question), and it is O(N) with a pretty low constant factor for what that is worth—it runs in 20 milliseconds on the large practice set including I/O.

What about the other Lispers?


The per-language dashboard for Lisp on go-hero.net shows that none of us advanced in sub-round 1A.  A very honorable mention goes to vii, who ended up on rank #1019 with correct solutions for all problems and input sets, but with a few penalties landing him or her 2'33" above the qualifying threshold.  Good job & better luck next time! I'm sure he or she has what it takes.

Will I continue to use Lisp (and SBCL)?


While coding the last problem (BFFs), I sometimes thought that the program would have been easier to write in another language, even in C.  I couldn't find an elegant recursive approach, and I always found writing iterative code in Lisp a bit awkward; I never really mastered loop or any of the other "modern" iteration constructs—why learn them when do will do? But the primary reason why I would have preferred another language was data structures.  I built my graph from "node" objects (expressed as defstructs) with pointers, which was natural enough and worked well.  But in another language I would have used a convenient vector representation... in Lisp the syntax for dereferencing vectors is so verbose and awkward that I was worried getting things wrong.  And I felt that in the short time, I couldn't write a nice data type abstraction to hide the intricacies of such a vector representation.

What about SBCL? As mentioned before, I normally use Franz Inc's excellent Allegro Common Lisp, but I wanted to use a "real" open source system for the contest.  I'm slowly getting used to SLIME after more than 20 years of Franz's Emacs-Lisp interface.  But when I had to debug my final BFFs solution, I felt almost blind because I wasn't able to look at local variables from within a backtrace, something that works quite nicely with Allegro.  So I had to resort to the equivalent of printf()s (warn), which made me feel a bit dirty.  Probably I should learn how others debug under SBCL, I'm sure there's a better way! [Edited to add: (declaim (optimize (debug 3))) seems to go a long way!]

In the weeks before the contest, I had toyed with the idea to use Go this year, but I didn't have the time to learn the language well enough to make this viable.  And overall, Lisp has treated me quite well this year.  Also I've been using it for about 30 years now.  So I don't feel like quitting mid-contest.

There are still two opportunities for me to try to advance to round 2.  Timezone-wise they are much more convenient for me.  On the other hand, I need some luck to find a good solution for the medium-tough problem as quickly as in 1A, and a bit more to find a good solution for the tough problem at all.  But I'm an eternal optimist...

Sunday, April 10, 2016

Google Code Jam 2016 Qualification Round in Lisp

This year I registered for Google Code Jam again. The qualification round was this weekend, and I submitted solutions in Lisp.  This time I used SBCL with GNU Emacs and SLIME as the IDE, rather than Allegro Common Lisp with GNU Emacs and Franz Inc's emacs-lisp interface.
The puzzles were lots of fun, as usual.  A (Counting Sheep) and B (Revenge of the Pancakes) were easy.  When I looked at C (Coin Jam), I found it tricky, so I put that aside and did D (Fractiles) next.  Explaining the problem to my 19-year-old son helped me find a good approach.  After implementing it and submitting the solutions, I turned back to puzzle C.  My solution looked complex, and I suspected that there would be a more elegant way.  Still I was happy with the solution as it produced a result for the large set very quickly, so I submitted the results and was done. My solutions can be found here.

So how did I fare?

The round went on for 27 hours, and I started somewhere in the middle due to timezone.  So after I was done with my solutions, I had to wait until the next day to see the final score.  The "small" problem datasets are checked immediately after submission, so I already knew I had all of those right before I went to bed.  Since they already accounted for 37 points, and I needed 40 to advance to the next round, I could be pretty optimistic that I had passed—having any large dataset correct would get me more than the three missing points.
The intermediate scoreboard put me on rank #1189; this is assuming (for everybody) that all the "large" solutions that were submitted would be correct.  I was pretty sure that mine were all correct, so I had hopes that my final rank would be even higher.
But my final rank was #2748, because it turned out that my "large" solution for D (Fractiles) was incorrect.  This was a big disappointment, because I was very confident in my solution approach.  And it turned out that my error was a stupid one in one of the "easy" parts of the implementation; the decision whether the number of grad students (S) is sufficient to clean enough tiles in the base-length-K and complexity C artwork to detect whether there's any gold tile.  The correct test would be
  (if (< (* s c) k)
      "IMPOSSIBLE"
...but I used:
  (let ((min-s (truncate k c)))
    (if (< s min-s)
        "IMPOSSIBLE"
My test was too lax, and I sometimes output a "solution" when I should have printed IMPOSSIBLE.  This stupid mistake cost me >2000 ranks.  It would be nice if I managed to avoid this in the next round.  But realistically I'll make even more of these mistakes because the problems will be harder and the time pressure much higher.

How did the other Lispers do?

On www.go-hero.net there are statistics about GCJ participants and submissions, and it's easy to find all submissions for this round that used Lisp.  We Lispers were quite successful this weekend, with 26 out of 27 proceeding to the next round, a success rate of 96% compared to 81.5% for the entire population.
The highest ranking Lisper was Ipetru at #594 with all solutions in Lisp (and of course all correct).  I looked at his solutions for C and D, and they are so incredibly compact that I couldn't believe my eyes.  D used the same approach as I had, just very elegantly written—the code proper is about two lines; much harder to hide stupid mistakes in there! C used a completely different approach, deterministically generating Jamcoins rather than checking candidates as I had.
The second-ranking Lisper was DarkKnight. at #720.  He only wrote one solution in Lisp.  In fact he used different languages for all solutions, and I mean all 8 solutions, not all 4 puzzles! bc, OCaml, Lisp, Racket, Lua, Perl, Octave, R.  Impressive! :-)

Friday, February 13, 2015

BloFELD

Panopticlick illustrates that even without cookies etc., modern browsers usually send enough specific-enough information to Web servers so that you can be tracked. The various useful standard HTTP headers effectively work like a fingerprint of your browser/OS/hardware combination.

When I look at the Panopticlick results of my favorite browser (I get 22.26 bits of "identifying information", while more senior users have reported scores up to full 24 :-), one thing that stands out is a long list of "System Fonts". Arguably it is useful for me when Web sites know what fonts I have installed on my system, so that they can present Web pages with fonts that I actually have rather than having to send me the fonts as well. So the intention is good, but the implementation discloses too much of my typographic tastes. What can we do to fix this?

Well, that should be quite obvious: Instead of my browser sending the full list of fonts, it could send a Bloom filter that matches the fonts that I have. When a Web server wants to render a document for me, it can check for some candidate fonts whether I have them. Bloom filters are approximative and will sometimes generate false positives, but one Comic Sans Web page in 1'000 or so should be a small price to pay to get my virginitprivacy back.

You may respond that a priori the Bloom filter discloses as much of my privacy as the full list of fonts. But! I can simply send a new Bloom filter ("salted" differently) to each site I visit. VoilĂ  how I'll defeat all traceability of Web visits, undermine the business model of the Internet economy, and destroy the entire Western civilization. Muahaha!

-- BloFELD
(Bloom Filter-Enabled Limited Disclosure)

(Apologies to smb and Bill Cheswick, who seem to have fully baked and published a better version of this idea in 2007, and to the probably numerous others)