2010年6月12日土曜日

ICFP 2009 orbit binary file reader

In preparation for next weekend's ICFP contest, I decided to do some clojure (<= 1.1) exercises this weekend, using last year's problem.

I will not do the actual problems, but I will implement the complete infrastructure, i.e a binary reader, the vm, a binary writer for the output file, and a graphical visualizer, as that is what will probably help me most in terms of improving my clojure skills.

This post takes care of the orbit binary file (obf) reader. I tried to implement the reader the clojure way, i.e. using sequences and operations on sequences rather than using nio and working on arrays of binary data directly to squeeze out performance - optimizations later.

Usage of the reader is straightforward: You provide the function obf-parse with a sequence of bytes (the data type is not Bytes, but Integers that can take values from 0-255, as returned by Java's InputStreams) and you get a sequence of vectors each containing a data value (type Double) and instructions (that are another vector consisting of the opcode and 2 parameters).

Anyone who has done ICFP 2009 knows that the obf format has its quirks - it look contrived, but in professional reality, you would often wish you could work with formats as nice as this one. The obf reader tries to take care of all these quirks, so that the vm gets a nice and clean stream of data and instructions.

Here's a sample output of the 10 first parsed frames of a obf file:



DATA OPCODE PARAM1 PARAM2
--------------------------------
1.00e+00 :noop 0 0
0.00e+00 :copy 0 265
3.00e+01 :noop 0 0
0.00e+00 :noop 0 0
0.00e+00 :copy 0 248
0.00e+00 :sub 4 3
0.00e+00 :cmpz = 5
0.00e+00 :phi 2 1
0.00e+00 :sub 7 0
0.00e+00 :copy 0 263



I'm posting this in the hope that I might get some feedback and that it can be used as motivation and inspiration for anyone who considers participating in the contest next week. It would be great if we got some successful clojure contributions!

If there's anything that could improve readability or usability of the code, or performance improvements that do not turn it into a mess, I'd like to know about it. And of course if you find any bugs, shoot a comment!


(ns orbit-binary-reader)

(defn byte-seq
[input-stream]
"Turns an InputStream into a sequence of bytes, lazily read, closing the stream when the end is reached"
(let [a-byte (. #^java.io.InputStream input-stream read)]
(if (= a-byte -1)
(do (. #^java.io.InputStream input-stream close) nil)
(lazy-seq (cons a-byte (byte-seq input-stream))))))

(defn- bytes-to-integer
"Turns big-endian ordered bytes into a integer number (of type int, long, etc.)"
[bytes]
(reduce #(+ (bit-shift-left %1 8) %2) 0 bytes))

(defn- bytes-to-double
"Turns big-endian ordered bytes into a double"
[bytes]
(Double/longBitsToDouble (bytes-to-integer bytes)))

(defn- obf-byte-seq-normalize
"Turns the input byte-seq into a byte-seq where data-bytes and instruction-bytes are alternating,
thus removing the even/odd oddity; also changes it to big-endian order"

[byte-seq]
(let [step (fn step [byte-seq even]
(when (not (empty? byte-seq))
(let [properly-ordered
(if even
(concat (reverse (take 8 byte-seq)) (reverse (take 4 (drop 8 byte-seq))))
(reverse (take 12 byte-seq)))]
(if (not= 12 (count properly-ordered))
(throw (Exception. (format "Expected 12 bytes, but only %d bytes left in seq" (count properly-ordered)))))
(concat properly-ordered (lazy-seq (step (drop 12 byte-seq) (not even)))))))]
(step byte-seq true)))

(def opcode-map
{
2r10001 :add
2r10010 :sub
2r10011 :mult
2r10100 :div
2r10101 :output
2r10110 :phi

2r00000 :noop
2r00001 :cmpz
2r00010 :sqrt
2r00011 :copy
2r00100 :input
})

(def cmpz-operator-map
{
2r0000000000 < ; LTZ
2r0010000000 <= ; LEZ
2r0100000000 = ; EQZ
2r0110000000 >= ; GEZ
2r1000000000 > ; GTZ
})

(defn- obf-parse-instruction
"Parses an instruction of type long into a vector [opcode arg*]"
[instruction-long]
(let [d-opcode (bit-and (bit-shift-right instruction-long 28) 2r1111)
s? (zero? d-opcode)
; we unify d-opcodes and s-opcodes into one format, d-opcodes have the fifth bit set, s-opcodes do not
; this makes it easier to look up the opcodes in the map
opcode (opcode-map (if s? (bit-and (bit-shift-right instruction-long 24) 2r1111) (bit-or d-opcode 2r10000)))
param1 (bit-and (bit-shift-right instruction-long 14) (if s? 2r1111111111 2r11111111111111))
param1 (if (= :cmpz opcode) (cmpz-operator-map (bit-and 2r1110000000 param1)) param1)
param2 (bit-and instruction-long 2r11111111111111)]
[opcode param1 param2]))


(defn- obf-normalized-parse
"Parses a normalized obf byte sequence, i.e. big-endian, and data / instruction alternating into
a sequence of [data [opcode param1 param2]] vectors, where data is a double and instruction an integer (int or long)"

[normalized-byte-seq]
(map
#(let [[data instruction] (split-at 8 %)] [(bytes-to-double data) (obf-parse-instruction (bytes-to-integer instruction))])
(partition 12 normalized-byte-seq)))

(defn obf-seq
"Parses a raw obf byte sequence (like from an obf file) into a sequence of [data [opcode param1 param2]] vectors,
where data is a double and instruction an integer (int or long)"

[byte-seq]
(obf-normalized-parse (obf-byte-seq-normalize byte-seq)))

0 件のコメント:

コメントを投稿

フォロワー