repa-convert-4.2.2.1: Packing and unpacking flat tables.

Safe HaskellNone
LanguageHaskell98

Data.Repa.Convert

Contents

Description

Convert tuples of Haskell values to and from ASCII or packed binary representations.

This package is intended for cheap and cheerful serialisation and deserialisation of flat tables, where each row has a fixed format. If you have a table consisting of a couple hundred megs of Pipe-Separated-Variables issued by some filthy enterprise system, then this package is for you.

If you want to parse context-free, or context-sensitive languages then try the parsec or attoparsec packages. If you have binary data that does not have a fixed format then try the binary or cereal packages.

For testing purposes, use packToString which takes a format, a record, and returns a list of bytes.

> import Data.Repa.Convert

> let format   = mkSep '|' (VarChars :*: IntAsc :*: DoubleAsc :*: ())
> let Just str = packToString format ("foo" :*: 66 :*: 93.42 :*: ())
> str
"foo|66|93.42"

We can then unpack the raw bytes back to Haskell values with unpackFromString.

> unpackFromString format str 
Just ("foo" :*: (66 :*: (93.42 :*: ())))

In production code use unsafeRunPacker and unsafeRunUnpacker to work directly with a buffer in foreign memory.

  • NOTE that in the current version the separating character is un-escapable.
  • The above means that the format (Sep ',') does NOT parse a CSV file according to the CSV specification: http://tools.ietf.org/html/rfc4180.

Synopsis

Documentation

The Formats module contains the pre-defined data formats.

Data formats

class Format f where Source

Relates a storage format to the Haskell type of the value that is stored in that format.

Associated Types

type Value f Source

Get the type of a value with this format.

Methods

fieldCount :: f -> Int Source

Yield the number of separate fields in this format.

minSize :: f -> Int Source

Yield the minumum number of bytes that a value of this format will take up.

Packing a value into this format is guaranteed to use at least this many bytes. This is exact for fixed-size formats.

fixedSize :: f -> Maybe Int Source

For fixed size formats, yield their size (length) in bytes.

Yields Nothing if this is not a fixed size format.

packedSize :: f -> Value f -> Maybe Int Source

Yield the maximum packed size of the value in this format.

If fixedSize returns a size then packedSize returns the same size.

For variable length formats, packedSize is an over-approximation. We allow the actual packed value to use less space, as it may not be possible to determine how much space it needs without actually packing it.

Yields Nothing when a collection of values is to be packed into a fixed length format, but the size of the collection does not match the format.

Type constraints

forFormat :: format -> Value format -> Value format Source

Constrain the type of a value to match the given format.

The value itself is not used.

listFormat :: format -> [Value format] -> [Value format] Source

Constrain the type of some values to match the given format.

The value itself is not used.

High-level interface

for ByteStrings

packToByteString :: Packable format => format -> Value format -> Maybe ByteString Source

Pack a value to a freshly allocated ByteString.

unpackFromByteString :: Unpackable format => format -> ByteString -> Maybe (Value format) Source

Unpack a value from a ByteString.

for Lists of Word8

packToList8 :: Packable format => format -> Value format -> Maybe [Word8] Source

Pack a value to a list of Word8.

unpackFromList8 :: Unpackable format => format -> [Word8] -> Maybe (Value format) Source

Unpack a value from a list of Word8.

for Strings

packToString :: Packable format => format -> Value format -> Maybe String Source

Pack a value to a (hated) Haskell String.

unpackFromString :: Unpackable format => format -> String -> Maybe (Value format) Source

Unpack a value from a (hated) Haskell String.

Low-level interface

Packing data

class Format format => Packable format where Source

Class of storage formats that can have values packed and unpacked from foreign bufferes.

The methods are written using continuations to make it easier for GHC to optimise its core code when packing/unpacking many fields.

Minimal complete definition

packer

Methods

pack Source

Arguments

:: format

Storage format.

-> Value format

Value to pack.

-> Packer

Packer that can write the value.

Pack a value into a buffer using the given format.

packer Source

Arguments

:: format

Data format.

-> Value format

Value to pack.

-> Addr#

Pointer to start of buffer.

-> IO ()

Signal failure.

-> (Addr# -> IO ())

Accept the address after the packed field.

-> IO () 

Low level packing function for the given format.

data Packer Source

Packer wraps a function that can write to a buffer.

Constructors

Packer 

Fields

fromPacker :: Addr# -> IO () -> (Addr# -> IO ()) -> IO ()

Takes start of buffer; failure action; and a continuation.

We try to pack data into the given buffer. If packing succeeds then we call the continuation with a pointer to the next byte after the packed value, otherwise we call the failure action.

Instances

unsafeRunPacker Source

Arguments

:: Packer

Packer to run.

-> Ptr Word8

Start of buffer.

-> IO (Maybe (Ptr Word8))

Pointer to the byte after the last one written.

Pack data into the given buffer.

PRECONDITION: The buffer needs to be big enough to hold the packed data, otherwise you'll corrupt the heap (bad). Use packedSize to work out how big it needs to be.

Unpacking data

class Format format => Unpackable format where Source

Minimal complete definition

unpacker

Methods

unpack Source

Arguments

:: format

Storage format.

-> Unpacker (Value format)

Unpacker for that format.

Unpack a value from a buffer using the given format.

unpacker Source

Arguments

:: format

Data format.

-> Addr#

Start of buffer.

-> Addr#

Pointer to first byte after end of buffer.

-> (Word8 -> Bool)

Detect a field terminator.

-> IO ()

Signal failure.

-> (Addr# -> Value format -> IO ())

Accept an unpacked value.

-> IO () 

Low level unpacking function for the given format.

data Unpacker a Source

Constructors

Unpacker 

Fields

fromUnpacker :: Addr# -> Addr# -> (Word8 -> Bool) -> IO () -> (Addr# -> a -> IO ()) -> IO ()

Takes pointers to the first byte in the buffer; the first byte after the buffer; a predicate to detect a field terminator; a failure action; and a continuation.

The field terminator is used by variable length encodings where the length of the encoded data cannot be determined from the encoding itself.

We try to unpack a value from the buffer. If unpacking succeeds then call the continuation with a pointer to the next byte after the unpacked value, and the value itself, otherwise call the failure action.

unsafeRunUnpacker Source

Arguments

:: Unpacker a

Unpacker to run.

-> Ptr Word8

Source buffer.

-> Int

Length of source buffer.

-> (Word8 -> Bool)

Detect a field terminator.

-> IO (Maybe (a, Ptr Word8))

Unpacked result, and pointer to the byte after the last one read.

Unpack data from the given buffer.

PRECONDITION: The buffer must be at least the minimum size of the format (minSize). This allows us to avoid repeatedly checking for buffer overrun when unpacking fixed size format. If the buffer is not long enough then you'll get an indeterminate result (bad).