BitSyntax-0.3.2: A module to aid in the (de)serialisation of binary data

Data.BitSyntax

Contents

Description

This module contains fuctions and templates for building up and breaking down packed bit structures. It's something like Erlang's bit-syntax (or, actually, more like Python's struct module).

This code uses Data.ByteString which is included in GHC 6.5 and you can get it for 6.4 at http://www.cse.unsw.edu.au/~dons/fps.html

Synopsis

Building bit structures

The core function here is makeBits, which is a perfectly normal function. Here's an example which makes a SOCKS4a request header: makeBits [U8 4, U8 1, U16 80, U32 10, NullTerminated "username", NullTerminated "www.haskell.org"]

data BitBlock Source

Constructors

U8 Int

Unsigned 8-bit int

U16 Int

Unsigned 16-bit int

U32 Int

Unsigned 32-bit int

U16LE Int

Little-endian, unsigned 16-bit int

U32LE Int

Little-endian, unsigned 32-bit int

NullTerminated String

Appends the string with a trailing NUL byte

RawString String

Appends the string without any terminator

RawByteString ByteString

Appends a ByteString

PackBits [(Int, Int)]

Packs a series of bit fields together. The argument is a list of pairs where the first element is the size (in bits) and the second is the value. The sum of the sizes for a given PackBits must be a multiple of 8

Instances

makeBits :: [BitBlock] -> ByteStringSource

Make a binary string from the list of elements given

Breaking up bit structures

The main function for this is bitSyn, which is a template function and so you'll need to run with -fth to enable template haskell http://www.haskell.org/th/.

To expand the function you use the splice command: $(bitSyn [...])

The expanded function has type ByteString -> (...) where the elements of the tuple depend of the argument to bitSyn (that's why it has to be a template function).

Heres an example, translated from the Erlang manual, which parses an IP header:

 decodeOptions bs ([_, hlen], _, _, _, _, _, _, _, _, _)
   | hlen > 5  = return $ BS.splitAt (fromIntegral ((hlen - 5) * 4)) bs
   | otherwise = return (BS.empty, bs)
 ipDecode = $(bitSyn [PackedBits [4, 4], Unsigned 1, Unsigned 2, Unsigned 2,
                      PackedBits [3, 13], Unsigned 1, Unsigned 1, Unsigned 2,
                      Fixed 4, Fixed 4, Context 'decodeOptions, Rest])
 ipPacket = BS.pack [0x45, 0, 0, 0x34, 0xd8, 0xd2, 0x40, 0, 0x40, 0x06,
                     0xa0, 0xca, 0xac, 0x12, 0x68, 0x4d, 0xac, 0x18,
                     0x00, 0xaf]

This function has several weaknesses compared to the Erlang version: The elements of the bit structure are not named in place, instead you have to do a pattern match on the resulting tuple and match up the indexes. The type system helps in this, but it's still not quite as nice.

data ReadType Source

Constructors

Unsigned Integer

An unsigned number of some number of bytes. Valid arguments are 1, 2 and 4

UnsignedLE Integer

An unsigned, little-endian integer of some number of bytes. Valid arguments are 2 and 4

Variable Name

A variable length element to be decoded by a custom function. The function's name is given as the single argument and should have type Monad m => ByteString -> m (v, ByteString)

Skip Integer

Skip some number of bytes

Fixed Integer

A fixed size field, the result of which is a ByteString of that length.

Ignore ReadType

Decode a value and ignore it (the result will not be part of the returned tuple)

Context Name

Like variable, but the decoding function is passed the entire result tuple so far. Thus the function whose name passed has type Monad m => ByteString -> (...) -> m (v, ByteString)

LengthPrefixed

Takes the most recent element of the result tuple and interprets it as the length of this field. Results in a ByteString

PackedBits [Integer]

Decode a series of bit fields, results in a list of Integers. Each element of the argument is the length of the bit field. The sums of the lengths must be a multiple of 8

Rest

Results in a ByteString containing the undecoded bytes so far. Generally used at the end to return the trailing body of a structure, it can actually be used at any point in the decoding to return the trailing part at that point.