Sunday, May 09, 2010

GCJ 2010: Theme Park in Common Lisp (Qualification Round)

A trivial implementation works fine for the small cases, but not for a day with 10⁸ rounds! I convinced myself (correctly, I think) that I needed to memoize intermediate results. I decided to memoize, for each group that can be at the head of the queue, two values: the number of people (and thus Euros) that fit into the roller coaster when boarding starts with that group, and the pointer to the next head of the queue (the index of the first group that does not fit). Then I could just iterate over the R rounds, starting with queue pointer 0, and add up the (mostly cached) results. The roller-coaster occupancy has to be computed at most once per group in the queue (at most 1000 in the large case).
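
As a concrete illustration with made-up numbers (not from the problem statement): with capacity k = 6 and a queue of group sizes 1, 4, 2, 3, the cache entry for queue position 0 would be the cons (5 . 2), because groups 0 and 1 board (1 + 4 = 5 people, so 5 Euros) and group 2 becomes the new head of the queue.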

However, I still have to iterate over all R rounds, so depending on the speed of the computer this can take a long time. During my attempt at the large input, I suspected that the laptop I was running this on wouldn't complete the computation in time, and copied everything over to a faster machine. On the laptop it took 2'40", on the faster machine around 45".

This is the (inefficient) code as used in the contest:

(defvar *get-riders-cache*)

(defun solve (input)
  (with-open-file (in input)
    (let ((n (read in)))
      (dotimes (k n)
        (solve-case k in)))))

(defun solve-case (caseno in)
  (let* ((R (read in))
         (k (read in))
         (N (read in)))
    (let ((gs (make-array (list N))))
      (dotimes (i N)
        (setf (aref gs i) (read in)))
      (let ((*get-riders-cache* (make-riders-cache N)))
        (solve-1 caseno R k gs 0 0)))))

(defun make-riders-cache (N)
  (make-array (list N) :initial-element nil))

(defun get-riders-cache (N) (aref *get-riders-cache* N))
(defun set-riders-cache (N new)
  (setf (aref *get-riders-cache* N) new))
  
(defun solve-1 (caseno R k gs ptr sum)
  (if (= R 0)
      (finish-solution caseno sum)
    (multiple-value-bind (next-group-size next-ptr)
        (get-riders k gs ptr)
      (solve-1 caseno (1- R) k gs next-ptr (+ sum next-group-size)))))

;; The first argument is really the seat capacity k, despite being named R.
(defun get-riders (R gs ptr)
  (let ((cached (get-riders-cache ptr)))
    (if cached
        (values (car cached) (cdr cached))
      (multiple-value-bind (size next-ptr)
          (get-riders-1 R gs ptr (length gs) 0)
        (set-riders-cache ptr (cons size next-ptr))
        (values size next-ptr)))))
  
;; R here is the remaining seat capacity, RQL the number of queue groups not
;; yet considered this round, and SIZE the number of riders boarded so far.
(defun get-riders-1 (R gs ptr rql size)
  (declare (type (unsigned-byte 30) R ptr rql size))
  (declare (optimize (debug 0) (speed 3)))
  (if (or (= rql 0) (< R (aref gs ptr)))
      (values size ptr)
    (get-riders-1 (- R (aref gs ptr)) gs (mod (1+ ptr) (length gs)) (1- rql) (+ size (aref gs ptr)))))

(defun finish-solution (caseno sum)
  (format t "Case #~D: ~D~%" (1+ caseno) sum))

Back at home, I decided that I had to optimize this some more. In particular, once a cycle is found (a queue index that has already been encountered in a previous round), one can simply compute how many complete cycles fit into the remaining rounds and add that number, multiplied by the Euros earned during one cycle, to the total. That leaves only a small number of rounds (fewer than the cycle length) still to execute.

(defun solve-1 (caseno R k gs ptr sum)
  (assert (>= R 0))
  (if (= R 0)
      (finish-solution caseno sum)
    (multiple-value-bind (next-group-size next-ptr last-seen old-sum)
        (get-riders k R gs ptr sum)
      (if last-seen
          (let ((cycle-length (- last-seen R))
                (per-cycle (- sum old-sum)))
            (multiple-value-bind (ncycles remaining-rounds)
                (truncate R cycle-length)
              (incf sum (* ncycles per-cycle))
              (setq R remaining-rounds))
            (if (zerop R)
                (finish-solution caseno sum)
              (solve-1 caseno (1- R) k gs next-ptr (+ sum next-group-size))))
        (solve-1 caseno (1- R) k gs next-ptr (+ sum next-group-size))))))
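
The post does not show the matching change to get-riders, which now also receives the remaining rounds and the running sum, and additionally returns the values recorded the first time the current queue position was at the head. Here is a minimal sketch of what it might look like, assuming a second per-test-case cache; the variable name and layout are my own:

(defvar *last-seen-cache*)   ; assumed: N-element array of NIL or (R . sum),
                             ; bound per test case like *get-riders-cache*

(defun get-riders (k R gs ptr sum)
  ;; Occupancy and next pointer, cached exactly as in the contest version.
  (multiple-value-bind (size next-ptr)
      (let ((cached (get-riders-cache ptr)))
        (if cached
            (values (car cached) (cdr cached))
          (multiple-value-bind (size next-ptr)
              (get-riders-1 k gs ptr (length gs) 0)
            (set-riders-cache ptr (cons size next-ptr))
            (values size next-ptr))))
    ;; On the first visit remember the round counter and Euro total; on a
    ;; later visit hand them back so solve-1 can detect the cycle.
    (let ((seen (aref *last-seen-cache* ptr)))
      (if seen
          (values size next-ptr (car seen) (cdr seen))
        (progn
          (setf (aref *last-seen-cache* ptr) (cons R sum))
          (values size next-ptr nil nil))))))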

This code takes 0.34 seconds for my large input, even on my netbook at home, which took 20 minutes for my initial (naïve with memoization) solution! Only eight minutes are allowed for submission of a large solution...

In a way I was right not to optimize this further, because the naïve memoized solution turned out to be (just) fast enough. But this is exactly the kind of not-thinking-the-problem-through that will hurt me in the "real" rounds of the competition. Maybe it would be good training to change the solution into one that calculates all possible values up front rather than on demand (caching).

Followup: Complexity Analysis and Iterative Solution

The complexity of the first algorithm (naïve with memoization) is O(R + N²).  The work for each of the R rounds is just a memory access (within the relatively small cache), an integer addition that fits into 64 bits, and a few assignments.  Therefore it still works when R is close to 10⁸.  The N² term comes from computing the occupancy of the roller coaster for each possible group that may be at the front of the queue at the start of a round.  There are N such groups, and for each of them, computing the occupancy may take up to N steps.  Note that by "computing the occupancy" I always mean computing both the occupancy and the next head-of-queue group.
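
Plugging in the limits for the large case (R up to 10⁸ rounds and N up to 1000 groups), that is roughly 10⁸ + 10⁶ ≈ 1.01 × 10⁸ elementary steps, so the R term dominates.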

The complexity of the second algorithm (like the first, but with cycle detection) is O(N²), because each possible position in the queue (there are N of them) is visited at most twice.  Again, the complexity of finding the occupancy of the roller coaster for a given queue pointer is O(N).

The complexity can be brought down to O(N) as follows: The occupancy is computed completely only for the first head-of-queue group. The occupancy for each subsequent head group can then be computed incrementally from the previous one: subtract the group that just left the front, then advance the end pointer only as far as further groups still fit. Since the end pointer only ever moves forward (modulo N), it advances at most about 2N times over the whole scan, so the incremental updates cost amortized constant time per group. Here is an implementation of this algorithm:
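
The implementation relies on a group structure whose definition is not shown in the post; the following is my reconstruction, inferred from the accessors used below (the constructor argument and slot defaults are guesses):

(defstruct (group (:constructor make-group (size)))
  size                       ; number of people in the group
  (next-pointer 0)           ; index of the first group that does not fit
  (fill-level 0)             ; riders on board when this group is at the head
  (last-seen nil)            ; R value when this group was last at the head, or NIL
  (euros-at-last-seen 0))    ; Euro total at that moment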

(defun solve-case (caseno in)
  (let* ((R (read in))
         (k (read in))
         (N (read in)))
    (declare (type (integer 0 100000000) R)   ; 0 included: R is counted down below
             (type (integer 1 1000000000) k)
             (type (integer 1 1000) N))
    (let ((g (make-array (list N))))
      (dotimes (i N)
        (setf (aref g i)
          (make-group (read in))))
      (let ((fill-level 0)
            (end 0))
        ;; Sliding window: for each possible head group START, find END (the
        ;; first group that does not fit) and the number of riders that board.
        (do ((start 0 (1+ start)))
            ((= start (length g)))
          (setq fill-level
            (if (zerop start)
                0
              (- fill-level (group-size (aref g (1- start))))))
          (loop
            (when (> (+ fill-level (group-size (aref g end))) k)
              (return))
            (incf fill-level (group-size (aref g end)))
            (setq end (mod (1+ end) (length g)))
            (when (= start end)
              (return)))
          (setf (group-next-pointer (aref g start)) end)
          (setf (group-fill-level (aref g start)) fill-level)))
      ;; Run the R rounds, skipping ahead as soon as a cycle is detected.
      (let ((ptr 0) (sum 0) (found-cycle nil))
        (declare (type (integer 0 999) ptr))
        (let ((result
               (loop
                 (when (= R 0) (return sum))
                 (let ((group (aref g ptr)))
                   (when (and (not found-cycle) (group-last-seen group))
                     (setq found-cycle t)
                     (let ((cycle-length (- (group-last-seen group) R))
                           (per-cycle (- sum (group-euros-at-last-seen group))))
                       ;;(declare (type (integer 1 1000) cycle-length))
                       (multiple-value-bind (ncycles remaining-rounds)
                           (truncate R cycle-length)
                         (incf sum (* ncycles per-cycle))
                         (setq R remaining-rounds))
                       (when (zerop R) (return sum))))
                   (setf (group-last-seen group) R)
                   (setf (group-euros-at-last-seen group) sum)
                   (decf R)
                   (setq ptr (group-next-pointer group))
                   (incf sum (group-fill-level group))))))
          (format t "Case #~D: ~D~%" (1+ caseno) result))))))

For my large-input set, this is not significantly faster than the second program (0.21s instead of 0.34s on my netbook). But it would scale to a queue with massively more groups. And it should be optimal in terms of complexity class, since just reading the input already takes O(N).
