packman-0.3.0: Serialization library for GHC

Copyright: (c) Jost Berthold, 2010-2015
License: BSD3
Maintainer: jost.berthold@gmail.com
Stability: experimental
Portability: no (depends on GHC internals)
Safe Haskell: None
Language: Haskell2010

GHC.Packing

Description

Serialisation of Haskell data structures (independent of evaluation)

Haskell heap structures can be serialised, capturing their current state of evaluation, and deserialised later during the same program run (effectively duplicating the data). Serialised data can also be written to storage or sent over a network, and deserialised in a different run or different instance of the same executable binary.

The feature can be used to implement message passing over a network (which is where the runtime support originated), or for various applications based on data persistence, for instance checkpointing and memoisation.

The library described here supports an operation to serialise Haskell heap data:

trySerialize :: a -> IO (Serialized a)

The routine will throw a PackException if an error occurs inside the C code which accesses the Haskell heap (see PackException). In the presence of concurrent threads, another thread might be evaluating data referred to by the data to be serialised. In this case, the calling thread will block on the ongoing evaluation and continue when the evaluated data is available. Internally, there is a PackException P_BLACKHOLE to signal this condition, but it is hidden inside the core library (see Background Information below).

The inverse operation to serialisation is

deserialize :: Serialized a -> IO a

The data type Serialized a includes a phantom type a to ensure type safety within one and the same program run. Type a can be polymorphic (at compile time, that is) as long as the Serialized a value is used only as an argument to deserialize.

The Show, Read, and Binary instances of Serialized a require an additional Typeable context (which requires a to be monomorphic) in order to implement dynamic type checks when parsing and deserialising data from external sources. Consequently, the PackException type contains exceptions which indicate parse errors and type/binary mismatch.
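A round trip within a single program run can be sketched as follows. This is a minimal illustration that assumes only the two signatures given above; the list xs is just an example value.

```haskell
import GHC.Packing (deserialize, trySerialize)

main :: IO ()
main = do
  -- An unevaluated list: it is serialised in its current state, as a thunk.
  let xs = map (* 2) [1 .. 10 :: Int]
  packet <- trySerialize xs   -- packet :: Serialized [Int]
  ys <- deserialize packet    -- ys is a separate copy of the heap structure
  print (sum ys)
```

No Typeable constraint is needed here, because the packet never leaves the program and the phantom type alone ties it to [Int].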

Synopsis

Serialisation Operations

trySerialize :: a -> IO (Serialized a)

Serialises its argument (in current evaluation state, as a thunk). May block if the argument captures (blackhole'd) data under evaluation, may throw PackExceptions to signal errors. This version uses a default buffer of 10MB (see trySerializeWith for a version with flexible buffer size).

trySerializeWith :: a -> Int -> IO (Serialized a)

Extended serialisation interface: Allocates a buffer of given size (in bytes), serialises data into it, then truncates the buffer to the required size before returning it (as Serialized a).
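One possible use of the explicit buffer size is to start small and grow the buffer on demand. The sketch below assumes that a too-small buffer is reported by throwing P_NOBUFFER (as the constructor documentation suggests); serializeGrowing and the sizes used are hypothetical and not part of the library.

```haskell
import Control.Exception (throwIO, try)
import GHC.Packing (PackException (..), Serialized, deserialize, trySerializeWith)

-- | Hypothetical helper: retry with a doubled buffer while P_NOBUFFER is
-- thrown and the doubled size stays within the given limit (in bytes).
serializeGrowing :: Int -> Int -> a -> IO (Serialized a)
serializeGrowing size limit x = do
  result <- try (trySerializeWith x size)
  case result of
    Right packet -> return packet
    Left P_NOBUFFER
      | size * 2 <= limit -> serializeGrowing (size * 2) limit x
    Left err -> throwIO err   -- any other PackException is passed on

main :: IO ()
main = do
  let xs = [1 .. 100000] :: [Int]
  -- Force the list first so that the packet holds evaluated data.
  length xs `seq` return ()
  packet <- serializeGrowing 4096 (64 * 1024 * 1024) xs
  ys <- deserialize packet
  print (length ys)
```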

deserialize :: Serialized a -> IO a

Deserialisation function. May throw PackException P_GARBLED.

Data Types and instances

data Serialized a

The type of Serialized data. Phantom type a ensures that we unpack data as the expected type.

Instances

Typeable * a => Read (Serialized a)

Reads the format generated by the Show instance, checks hash values for executable and type and parses exactly as much as the included data size announces.

Typeable * a => Show (Serialized a)

Prints the packet as a Word array in 4 columns (Word meaning the machine word size), and additionally includes Fingerprint hash values for executable binary and type.

Typeable * a => Binary (Serialized a)

The binary format of Serialized a data includes Fingerprint hash values for type and executable binary, which are checked when reading Serialized data back in using get.

The power of evaluation-orthogonal serialisation is that one can externalise partially evaluated data (containing thunks), for instance write it to disk or send it over a network.

Therefore, the module defines a Binary instance for Serialized a, as well as instances for Read and Show which satisfy read . show == id :: Serialized a -> Serialized a.

The phantom type is enough to ensure type-correctness when serialised data remain in one single program run. However, when data from previous runs are read in from an external source, their type needs to be checked at runtime. Type information must be stored together with the (binary) serialisation data.

The serialised data contain pointers to static data in the generating program (top-level functions and constants) and very likely to additional library code. Therefore, the exact same binary must be used when reading in serialised data from an external source. A hash of the executable is therefore included in the representation as well.
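Both external formats can be exercised roughly as follows. This sketch assumes the instances listed above together with encode and decode from Data.Binary; the example value is arbitrary, and its type must be monomorphic because of the Typeable constraint.

```haskell
import Data.Binary (decode, encode)
import GHC.Packing (Serialized, deserialize, trySerialize)

main :: IO ()
main = do
  packet <- trySerialize (zip [1 :: Int ..] "abc")

  -- Binary route: encode to a lazy ByteString and decode it again.
  let bytes = encode packet
      fromBytes = decode bytes :: Serialized [(Int, Char)]
  deserialize fromBytes >>= print

  -- Text route: Show and Read.
  let text = show packet
      fromText = read text :: Serialized [(Int, Char)]
  deserialize fromText >>= print
```

The type annotations on decode and read name the type that the stored fingerprint is checked against. In a real application, the bytes or text would be written out and read back by a later run of the same executable.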

data PackException

Packing exception codes, matching error codes implemented in the runtime system or describing errors which can occur within Haskell.

Constructors

P_SUCCESS

no error, ==0. Internal code, should never be seen by users.

P_BLACKHOLE

RTS: packing hit a blackhole. Used internally, not passed to users.

P_NOBUFFER

RTS: buffer too small

P_CANNOTPACK

RTS: contains closure which cannot be packed (MVar, TVar)

P_UNSUPPORTED

RTS: contains unsupported closure type (implementation missing)

P_IMPOSSIBLE

RTS: impossible case (stack frame, message,...RTS bug!)

P_GARBLED

RTS: corrupted data for deserialisation

P_ParseError

Haskell: Packet data could not be parsed

P_BinaryMismatch

Haskell: Executable binaries do not match

P_TypeMismatch

Haskell: Packet data encodes unexpected type

PackExceptions can occur at Haskell level or in the foreign primop. The Haskell-level exceptions all occur when reading in serialised data, and are:

  • P_BinaryMismatch: the serialised data have been produced by a different executable (must be the same binary).
  • P_TypeMismatch: the serialised data have the wrong type
  • P_ParseError: serialised data could not be parsed (from binary or text format)

The other exceptions are return codes of the foreign primitive operation, and indicate errors at the C level. Most of them occur when serialising data; the exception is P_GARBLED which indicates corrupt serialised data.
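The two groups of exceptions can be told apart by pattern matching on the constructors. The following sketch assumes that PackException is an Exception instance (it is thrown by the library) and that serialising an MVar is reported as P_CANNOTPACK, as the constructor documentation indicates; describe is a hypothetical helper.

```haskell
import Control.Concurrent.MVar (MVar, newEmptyMVar)
import Control.Exception (try)
import GHC.Packing (PackException (..), trySerialize)

-- | Hypothetical helper: a short description for each exception code.
describe :: PackException -> String
describe P_SUCCESS        = "no error (internal)"
describe P_BLACKHOLE      = "hit a blackhole (internal)"
describe P_NOBUFFER       = "buffer too small"
describe P_CANNOTPACK     = "contains a closure which cannot be packed"
describe P_UNSUPPORTED    = "contains an unsupported closure type"
describe P_IMPOSSIBLE     = "impossible case (RTS bug)"
describe P_GARBLED        = "corrupted data for deserialisation"
describe P_ParseError     = "packet data could not be parsed"
describe P_BinaryMismatch = "produced by a different executable"
describe P_TypeMismatch   = "packet data encodes an unexpected type"

main :: IO ()
main = do
  mv <- newEmptyMVar :: IO (MVar Int)
  result <- try (trySerialize mv)
  case result of
    Left err -> putStrLn ("serialisation failed: " ++ describe err)
    Right _  -> putStrLn "serialised"
```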

Serialisation and binary file I/O

encodeToFile :: Typeable a => FilePath -> a -> IO ()

Write serialised binary data directly to a file. May throw PackExceptions.

decodeFromFile :: Typeable a => FilePath -> IO a

Directly read binary serialised data from a file. May throw PackExceptions (catches I/O and Binary exceptions from decoding the file and re-throws P_ParseError)
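A simple checkpointing pattern built on these two functions might look like the sketch below. The file name checkpoint.bin is hypothetical. Following the description above, a missing or unreadable file is expected to surface as a PackException, so any such exception is treated here as "no usable checkpoint".

```haskell
import Control.Exception (try)
import GHC.Packing (PackException, decodeFromFile, encodeToFile)

-- Hypothetical file name used for the checkpoint.
checkpointFile :: FilePath
checkpointFile = "checkpoint.bin"

main :: IO ()
main = do
  restored <- try (decodeFromFile checkpointFile) :: IO (Either PackException [Integer])
  case restored of
    Right xs -> putStrLn ("restored, first elements: " ++ show (take 5 xs))
    Left _ -> do
      -- An infinite lazy list: it is written in its current (unevaluated) state.
      let squares = map (^ 2) [1 ..] :: [Integer]
      encodeToFile checkpointFile squares
      putStrLn "no usable checkpoint, wrote a new one"
```

The checkpoint is only readable by the same executable binary, so a rebuild of the program invalidates existing files.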

Background Information

The functionality exposed by this module builds on serialisation of Haskell heap graph structures, first implemented for GUM (Graph reduction on a Unified Memory System), an implementation of GpH, and later adopted by the implementation of Eden. Independent of their evaluation state, data and thunks can be transferred between the (independent) heaps of several running Haskell runtime system instances which execute the same executable.

The idea of exposing the heap data serialisation functionality (often called packing) to Haskell itself was first described in: Jost Berthold. Orthogonal Serialisation for Haskell. In Jurriaan Hage and Marco Morazan, editors, IFL'10, 22nd Symposium on Implementation and Application of Functional Languages, Springer LNCS 6647, pages 38-53, 2011. This paper can be found at http://www.mathematik.uni-marburg.de/~eden/papers/mainIFL10-withCopyright.pdf, the original publication is available at http://www.springerlink.com/content/78642611n7623551/.

The core runtime support consists of just two operations: (slightly paraphrasing the way in which GHC implements the IO monad here)

serialize#   :: a -> IO ByteArray# -- OUTDATED, see below
deserialize# :: ByteArray# -> IO a -- which is actually pure from a mathematical POV

However, these operations are completely unsafe with respect to Haskell types, and may fail at runtime for various other reasons as well. Type safety can be established by a phantom type, but needs to be checked at runtime when the resulting data structure is externalised (for instance, saved to a file). Besides prohibiting unprotected type casts, another restriction that needs to be explicitly checked in this case is that different programs cannot exchange data by this serialisation. When data are serialised during execution, they can only be deserialised by exactly the same executable binary because they contain code pointers that will change even by recompilation.
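The two runtime checks described here (same executable, expected type) can be modelled with facilities from base alone. The following is a simplified sketch under stated assumptions, not packman's actual representation: Header, Mismatch, typeHashOf and checkHeader are hypothetical names, and the real library reports failures as PackExceptions.

```haskell
import Data.Typeable (Proxy (..), Typeable, typeRep, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint, getFileHash)
import System.Environment (getExecutablePath)

-- | Hypothetical header stored next to the serialised data.
data Header = Header
  { exeHash  :: Fingerprint  -- hash of the executable that produced the data
  , typeHash :: Fingerprint  -- fingerprint of the serialised value's type
  }

data Mismatch = BinaryMismatch | TypeMismatch
  deriving (Eq, Show)

-- | Fingerprint of a (monomorphic) type, obtained through Typeable.
typeHashOf :: Typeable a => Proxy a -> Fingerprint
typeHashOf = typeRepFingerprint . typeRep

-- | Accept a header only if it was produced by the same executable
-- and for the type the reader expects.
checkHeader :: Typeable a => Fingerprint -> Proxy a -> Header -> Either Mismatch ()
checkHeader exe proxy hdr
  | exeHash hdr /= exe               = Left BinaryMismatch
  | typeHash hdr /= typeHashOf proxy = Left TypeMismatch
  | otherwise                        = Right ()

main :: IO ()
main = do
  exe <- getExecutablePath >>= getFileHash
  let hdr = Header exe (typeHashOf (Proxy :: Proxy [Int]))
  print (checkHeader exe (Proxy :: Proxy [Int]) hdr)  -- same binary, same type
  print (checkHeader exe (Proxy :: Proxy Bool) hdr)   -- same binary, wrong type
```

Hashing the executable file is what makes the check sensitive to recompilation: a rebuilt binary yields a different hash, so data written by the old build is rejected before any code pointers are unpacked.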

Other failures can occur because of the runtime system's limitations, and because some mutable data types are not allowed to be serialised. A newer API therefore adds exception handling and improves usability. The original primitive serialize# was modified and now returns error codes, leading to the following type (again paraphrasing):

trySerialize# :: a -> IO ( Int# , ByteArray# )

where the Int# encodes potential error conditions returned by the runtime.

A second primitive operation has been defined, which uses a pre-allocated ByteArray#

trySerializeWith# :: a -> ByteArray# -> IO ( Int# , ByteArray# )

Besides returning error codes, the newer primitive operations do not block the calling thread when the serialisation encounters a blackhole in the heap. It would be possible to observe the existence of blackholes from Haskell through the return code of these primitive operations. This could, in theory, be used to explicitly control and avoid blocking (avoiding unresponsive behaviour). In practice, however, making blackholes observable from Haskell is certainly undesirable. The primitive operations return the address of the blackhole, and the caller will block on this blackhole at the Haskell level (see code in the GHC.Packing.Core module).

The Haskell layer and its types protect the interface function deserialize from being applied to grossly wrong data (by checking a fingerprint of the executable and the expected type), but deserialisation is still rather fragile (unpacking code pointers and data). The primitive operation in the runtime system will only detect grossly wrong formats, and it returns error code P_GARBLED when data corruption is detected, which leads to the following type (again paraphrasing):

deserialize# :: ByteArray# -> IO ( Int# , a )