Copyright | (c) Jost Berthold, 2010-2015 |
---|---|
License | BSD3 |
Maintainer | jost.berthold@gmail.com |
Stability | experimental |
Portability | no (depends on GHC internals) |
Safe Haskell | None |
Language | Haskell2010 |
Serialisation of Haskell data structures (independent of evaluation)
Haskell heap structures can be serialised, capturing their current state of evaluation, and deserialised later during the same program run (effectively duplicating the data). Serialised data can also be written to storage or sent over a network, and deserialised in a different run or different instance of the same executable binary.
The feature can be used to implement message passing over a network (which is where the runtime support originated), or for various applications based on data persistence, for instance checkpointing and memoisation.
The library described here supports an operation to serialise Haskell heap data:
trySerialize :: a -> IO (Serialized a)
The routine will throw a PackException if an error occurs inside the
C code which accesses the Haskell heap (see PackException).
In presence of concurrent threads, another thread might be evaluating
data referred to by the data to be serialised. In this case, the calling
thread will block on the ongoing evaluation and continue when evaluated
data is available.
Internally, there is a PackException, P_BLACKHOLE, to signal this
condition, but it is hidden inside the core library
(see Background Information below).
The inverse operation to serialisation is
deserialize :: Serialized a -> IO a
The data type Serialized a includes a phantom type a to ensure
type safety within one and the same program run. Type a can be
polymorphic (at compile time, that is) when Serialized a is not used
apart from being an argument to deserialize.
The Show, Read, and Binary instances of Serialized a require an
additional Typeable context (which requires a to be monomorphic)
in order to implement dynamic type checks when parsing and deserialising
data from external sources.
Consequently, the PackException type contains exceptions which indicate
parse errors and type/binary mismatch.
- trySerialize :: a -> IO (Serialized a)
- trySerializeWith :: a -> Int -> IO (Serialized a)
- deserialize :: Serialized a -> IO a
- data Serialized a
- data PackException
- encodeToFile :: Typeable a => FilePath -> a -> IO ()
- decodeFromFile :: Typeable a => FilePath -> IO a
Serialisation Operations
trySerialize :: a -> IO (Serialized a)
Serialises its argument (in current evaluation state, as a thunk).
May block if the argument captures (blackhole'd) data under evaluation,
may throw PackExceptions to signal errors.
This version uses a default buffer of 10MB (see trySerializeWith
for a version with flexible buffer size).
trySerializeWith :: a -> Int -> IO (Serialized a)
Extended serialisation interface: Allocates a buffer of given size (in
bytes), serialises data into it, then truncates the buffer to the
required size before returning it (as Serialized a).
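As an illustration of the flexible buffer size, the following is a minimal sketch that retries with a larger buffer when serialisation fails with P_NOBUFFER. It assumes the packman package (which provides GHC.Packing) is installed for a compatible GHC. The helper name serializeGrowing, the doubling strategy, and the size cap are assumptions made for this example and are not part of the library.

```haskell
import Control.Exception (throwIO, try)
import GHC.Packing (PackException (..), Serialized, trySerializeWith)

-- | Hypothetical helper (not part of GHC.Packing): serialise with a buffer
-- of 'size' bytes and double the buffer on P_NOBUFFER, up to 'cap' bytes.
-- Any other PackException is re-thrown unchanged.
serializeGrowing :: Int -> Int -> a -> IO (Serialized a)
serializeGrowing size cap x = do
  r <- try (trySerializeWith x size)
  case r of
    Right s -> return s
    Left P_NOBUFFER
      | size * 2 <= cap -> serializeGrowing (size * 2) cap x
    Left e -> throwIO e

main :: IO ()
main = do
  -- Start at 1 KiB and allow growth up to 64 MiB (both values are arbitrary).
  _ <- serializeGrowing 1024 (64 * 1024 * 1024) (replicate 1000 'x')
  putStrLn "serialised"
```

Doubling keeps the number of retries logarithmic in the final size, at the cost of repeating the serialisation work on every failed attempt.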
deserialize :: Serialized a -> IO a
Deserialisation function. May throw a PackException (P_GARBLED).
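The two operations can be combined into a round trip within a single program run. The sketch below assumes the packman package is available. The list and the variable names are chosen purely for illustration.

```haskell
import GHC.Packing (deserialize, trySerialize)

main :: IO ()
main = do
  -- 'xs' may still be an unevaluated thunk at this point.
  let xs = map (* 2) [1 .. 10 :: Int]
  -- Capture 'xs' in whatever evaluation state it currently has.
  packed <- trySerialize xs
  -- Unpack a duplicate; the phantom type fixes its type to [Int].
  ys <- deserialize packed
  print (sum ys)
```

Because the phantom type of the serialised value is [Int], no type annotation is needed at the call to deserialize.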
Data Types and instances
data Serialized a
The type of Serialized data. Phantom type a ensures that we
unpack data as the expected type.
Typeable * a => Read (Serialized a) | Reads the format generated by the Show instance. |
Typeable * a => Show (Serialized a) | Prints the packet as a Word array in 4 columns (Word meaning the machine word size), and additionally includes Fingerprint hash values for executable binary and type. |
Typeable * a => Binary (Serialized a) | The binary format of |
The power of evaluation-orthogonal serialisation is that one can externalise partially evaluated data (containing thunks), for instance write it to disk or send it over a network.
Therefore, the module defines a Binary instance for Serialized a,
as well as instances for Read and Show which satisfy
read . show == id :: Serialized a -> Serialized a.
The phantom type is enough to ensure type-correctness when serialised data remain in one single program run. However, when data from previous runs are read in from an external source, their type needs to be checked at runtime. Type information must be stored together with the (binary) serialisation data.
The serialised data contain pointers to static data in the generating program (top-level functions and constants) and very likely to additional library code. Therefore, the exact same binary must be used when reading in serialised data from an external source. A hash of the executable is therefore included in the representation as well.
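The runtime type check described above can be illustrated with a toy model that uses only the base library. This is not packman's actual data format: the Tagged type and the tag and untag functions are hypothetical, and the payload is a plain String instead of packed heap data. The sketch only shows the idea of storing a type fingerprint alongside the data and comparing it with the expected type on the way back in.

```haskell
import Data.Typeable (Proxy (..), Typeable, typeOf, typeRep, typeRepFingerprint)
import GHC.Fingerprint (Fingerprint)

-- | Toy stand-in for externalised data: a textual payload together with the
-- fingerprint of the type it was produced at (hypothetical, not packman's format).
data Tagged = Tagged Fingerprint String

-- | Record the type fingerprint next to the payload.
tag :: (Typeable a, Show a) => a -> Tagged
tag x = Tagged (typeRepFingerprint (typeOf x)) (show x)

-- | Only parse the payload if the stored fingerprint matches the type
-- expected by the caller.
untag :: (Typeable a, Read a) => Tagged -> Maybe a
untag (Tagged fp payload) = check Proxy
  where
    check :: (Typeable b, Read b) => Proxy b -> Maybe b
    check p
      | fp == typeRepFingerprint (typeRep p) = Just (read payload)
      | otherwise = Nothing

main :: IO ()
main = do
  let t = tag (42 :: Int)
  print (untag t :: Maybe Int)  -- prints "Just 42"
  print (untag t :: Maybe Bool) -- prints "Nothing"
```

A check of the executable's hash would follow the same pattern, with a second fingerprint stored next to the one for the type.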
data PackException
Packing exception codes, matching error codes implemented in the runtime system or describing errors which can occur within Haskell.
P_SUCCESS | no error, ==0. Internal code, should never be seen by users. |
P_BLACKHOLE | RTS: packing hit a blackhole. Used internally, not passed to users. |
P_NOBUFFER | RTS: buffer too small |
P_CANNOTPACK | RTS: contains closure which cannot be packed (MVar, TVar) |
P_UNSUPPORTED | RTS: contains unsupported closure type (implementation missing) |
P_IMPOSSIBLE | RTS: impossible case (stack frame, message,...RTS bug!) |
P_GARBLED | RTS: corrupted data for deserialisation |
P_ParseError | Haskell: Packet data could not be parsed |
P_BinaryMismatch | Haskell: Executable binaries do not match |
P_TypeMismatch | Haskell: Packet data encodes unexpected type |
PackExceptions can occur at Haskell level or in the foreign primop.
The Haskell-level exceptions all occur when reading in Serialized data, and are:
- P_BinaryMismatch: the serialised data have been produced by a different executable (must be the same binary).
- P_TypeMismatch: the serialised data have the wrong type.
- P_ParseError: serialised data could not be parsed (from binary or text format).
The other exceptions are return codes of the foreign primitive
operation, and indicate errors at the C level. Most of them occur when
serialising data; the exception is P_GARBLED, which indicates corrupt
serialised data.
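A caller can distinguish these cases by catching PackException and matching on its constructors. The following sketch assumes the packman package is available. The helper explain and its message strings are made up for this example. According to the table above, an MVar should be rejected with P_CANNOTPACK, although the exact outcome depends on the runtime system in use.

```haskell
import Control.Concurrent.MVar (MVar, newMVar)
import Control.Exception (try)
import GHC.Packing (PackException (..), Serialized, trySerialize)

-- | Hypothetical helper: map a PackException to a short explanation.
explain :: PackException -> String
explain e = case e of
  P_NOBUFFER       -> "buffer too small; retry with a larger buffer via trySerializeWith"
  P_CANNOTPACK     -> "data contains a closure that cannot be packed (MVar, TVar)"
  P_UNSUPPORTED    -> "data contains a closure type the runtime does not support"
  P_GARBLED        -> "serialised data are corrupt"
  P_ParseError     -> "serialised data could not be parsed"
  P_BinaryMismatch -> "serialised data come from a different executable"
  P_TypeMismatch   -> "serialised data have an unexpected type"
  other            -> "internal error code: " ++ show other

main :: IO ()
main = do
  mv <- newMVar (0 :: Int)
  -- Mutable data such as MVars are not allowed to be serialised.
  r <- try (trySerialize mv) :: IO (Either PackException (Serialized (MVar Int)))
  putStrLn (either explain (const "serialised") r)
```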
Serialisation and binary file I/O
encodeToFile :: Typeable a => FilePath -> a -> IO ()
Write serialised binary data directly to a file. May throw PackExceptions.
decodeFromFile :: Typeable a => FilePath -> IO a
Directly read binary serialised data from a file. May throw
PackExceptions (catches I/O and Binary exceptions from decoding
the file and re-throws P_ParseError).
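A simple checkpointing round trip through a file might look as follows. This is a sketch assuming the packman package is available. The file name is hypothetical. The file can only be read back by the same executable binary that wrote it, so the mismatch cases are handled explicitly.

```haskell
import Control.Exception (try)
import GHC.Packing (PackException (..), decodeFromFile, encodeToFile)

main :: IO ()
main = do
  let path = "squares.packed" -- hypothetical file name
  -- The list is written in its current evaluation state, possibly as a thunk.
  encodeToFile path (map (\x -> x * x) [1 .. 5 :: Int])
  -- The annotation selects the type that is checked against the file contents.
  r <- try (decodeFromFile path) :: IO (Either PackException [Int])
  case r of
    Right xs -> print xs
    Left P_BinaryMismatch -> putStrLn "file was written by a different executable"
    Left P_TypeMismatch -> putStrLn "file holds data of a different type"
    Left e -> putStrLn ("decoding failed: " ++ show e)
```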
Background Information
The functionality exposed by this module builds on serialisation of Haskell heap graph structures, first implemented for GUM (Graph reduction on a Unified Memory System), the implementation of GpH, and later adopted by the implementation of Eden. Independent of their evaluation state, data and thunks can be transferred between the (independent) heaps of several running Haskell runtime system instances which execute the same executable.
The idea to expose the heap data serialisation functionality (often called packing) to Haskell by itself was first described in: Jost Berthold. Orthogonal Serialisation for Haskell. In Jurriaan Hage and Marco Morazan, editors, IFL'10, 22nd Symposium on Implementation and Application of Functional Languages, Springer LNCS 6647, pages 38-53, 2011. This paper can be found at http://www.mathematik.uni-marburg.de/~eden/papers/mainIFL10-withCopyright.pdf, the original publication is available at http://www.springerlink.com/content/78642611n7623551/.
The core runtime support consists of just two operations: (slightly paraphrasing the way in which GHC implements the IO monad here)
serialize# :: a -> IO ByteArray# -- OUTDATED, see below
deserialize# :: ByteArray# -> IO a -- which is actually pure from a mathematical POV
However, these operations are completely unsafe with respect to Haskell types, and may fail at runtime for various other reasons as well. Type safety can be established by a phantom type, but needs to be checked at runtime when the resulting data structure is externalised (for instance, saved to a file). Besides prohibiting unprotected type casts, another restriction that needs to be explicitly checked in this case is that different programs cannot exchange data by this serialisation. When data are serialised during execution, they can only be deserialised by exactly the same executable binary because they contain code pointers that will change even by recompilation.
Other failures can occur because of the runtime system's limitations,
and because some mutable data types are not allowed to be serialised.
A newer API therefore suggests additions towards exception handling
and better usability.
The original primitive serialize is modified and now returns error
codes, leading to the following type (again paraphrasing):
trySerialize# :: a -> IO ( Int# , ByteArray# )
where the Int# encodes potential error conditions returned by the runtime.
A second primitive operation has been defined, which uses a pre-allocated
ByteArray#:
trySerializeWith# :: a -> ByteArray# -> IO ( Int# , ByteArray# )
In addition to returning error codes, the newer primitive operations do not block
the calling thread when the serialisation encounters a blackhole in the
heap.
It would be possible to observe the existence of blackholes from Haskell by
the return code of these primitive operations. This could, in theory, be
used to explicitly control and avoid blocking (avoiding unresponsive behaviour).
In practice, however, making blackholes observable from Haskell is
certainly undesirable. The primitive operations return the address of the
blackhole, and the caller will block on this blackhole at
the Haskell level (see code in the GHC.Packing.Core module).
The Haskell layer and its types protect the interface function deserialize
from being applied to grossly wrong data (by checking a fingerprint of the
executable and the expected type), but deserialisation is still rather fragile
(unpacking code pointers and data).
The primitive operation in the runtime system will only detect grossly wrong
formats, and the primitive will return error code P_GARBLED when data
corruption is detected:
deserialize# :: ByteArray# -> IO ( Int# , a )