{-# OPTIONS_HADDOCK prune #-}
{-# LANGUAGE ScopedTypeVariables #-}

{- | 

Module      : GHC.Packing
Copyright   : (c) Jost Berthold, 2010-2015
License     : BSD3
Maintainer  : jost.berthold@gmail.com
Stability   : experimental
Portability : no (depends on GHC internals)

= Serialisation of Haskell data structures (independent of evaluation)

Haskell heap structures can be serialised, capturing their current
state of evaluation, and deserialised later during the same program
run (effectively duplicating the data). Serialised data can also be
written to storage or sent over a network, and deserialised in a
different run or different instance of the /same/ executable binary.

The feature can be used to implement message passing over a network
(which is where the runtime support originated), or for various
applications based on data persistence, for instance checkpointing and
memoisation.

The library described here supports an operation to serialise Haskell
heap data:

> trySerialize :: a -> IO (Serialized a)

The routine will throw a 'PackException' if an error occurs inside the
C code which accesses the Haskell heap (see @'PackException'@).
In presence of concurrent threads, another thread might be evaluating
data /referred to/ by the data to be serialised. In this case, the calling
thread will /block/ on the ongoing evaluation and continue when evaluated
data is available.
Internally, there is a 'PackException' 'P_BLACKHOLE' to signal the
condition, but it is hidden inside the core library
(see <#background Background Information> below).

The inverse operation to serialisation is

> deserialize :: Serialized a -> IO a
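
As a minimal sketch of a round trip within one program run, assuming only
the two operations above (the value @xs@ and the @main@ wrapper are
illustrative and not part of the library):

> import GHC.Packing (trySerialize, deserialize)
>
> main :: IO ()
> main = do
>   let xs = map (* 2) [1 .. 10 :: Int]  -- typically still unevaluated here
>   packed <- trySerialize xs            -- does not force xs
>   ys     <- deserialize packed         -- an independent copy of the data
>   print (sum ys)                       -- evaluation happens only here

Since serialisation is independent of evaluation, @ys@ is a copy of the
list in whatever state of evaluation @xs@ had when it was serialised.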

The data type 'Serialized' @a@ includes a phantom type @a@ to ensure
type safety within one and the same program run. Type @a@ can be
polymorphic (at compile time, that is) when 'Serialized' @a@ is not used
apart from being an argument to 'deserialize'.

The @Show@, @Read@, and @Binary@ instances of @Serialized a@ require an
additional 'Typeable' context (which requires @a@ to be monomorphic)
in order to implement dynamic type checks when parsing and deserialising
data from external sources.
Consequently, the 'PackException' type contains exceptions which indicate
parse errors and type/binary mismatch.

-}

module GHC.Packing
    ( -- * Serialisation Operations
      trySerialize, trySerializeWith
    , deserialize

      -- * Data Types and instances
    , Serialized
      -- $ShowReadBinary
    , PackException(..)
      -- $packexceptions

      -- * Serialisation and binary file I/O
    , encodeToFile 
    , decodeFromFile

    -- * Background Information
      -- $primitives
   )
    where

-- all essentials are defined in other modules, and reexported here
import GHC.Packing.PackException
import GHC.Packing.Type
import GHC.Packing.Core

import Data.Binary
import Control.Exception
import Data.Typeable


-- | Write serialised binary data directly to a file. May throw 'PackException's.
encodeToFile :: Typeable a => FilePath -> a -> IO ()
encodeToFile path x = trySerialize x >>= encodeFile path

-- | Read binary serialised data directly from a file. May throw
-- 'PackException's: an 'ErrorCall' raised while decoding the file is
-- caught and re-thrown as 'P_ParseError'; other exceptions (for
-- instance I/O errors) propagate unchanged.
decodeFromFile :: Typeable a => FilePath -> IO a
decodeFromFile path = do
    ser <- decodeFile path
             `catch` (\(_ :: ErrorCall) -> throwIO P_ParseError)
    deserialize ser -- exceptions raised here propagate unchanged

----------------------------------------
-- digressive documentation

{- $ShowReadBinary

The power of evaluation-orthogonal serialisation is that one can
/externalise/ partially evaluated data (containing thunks), for
instance write it to disk or send it over a network.

Therefore, the module defines a 'Binary' instance for
'Serialized' @a@, as well as instances for 'Read' and 'Show' which
satisfy @ 'read' . 'show' == 'id' :: 'Serialized' a -> 'Serialized' a@.

The phantom type is enough to ensure type-correctness when serialised
data remain in one single program run. However, when data from
previous runs are read in from an external source, their type needs to
be checked at runtime. Type information must be stored together with
the (binary) serialisation data.

The serialised data contain pointers to static data in the generating
program (top-level functions and constants) and very likely to
additional library code. Therefore, the /exact same binary/ must be
used when reading in serialised data from an external source. A hash
of the executable is therefore included in the representation as well.
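
A small persistence sketch using the file operations of this module,
under the assumption that writer and reader are the same executable
(here even the same run). The file name @table.packed@ and the value
@table@ are made up for illustration:

> import GHC.Packing (encodeToFile, decodeFromFile)
>
> main :: IO ()
> main = do
>   let table = [ (n, n * n) | n <- [1 .. 100] ] :: [(Int, Int)]
>   -- Writes the serialised data via the Binary instance.
>   encodeToFile "table.packed" table
>   -- The annotation fixes the type that the runtime check expects.
>   restored <- decodeFromFile "table.packed" :: IO [(Int, Int)]
>   print (lookup 7 restored)

The type annotation on 'decodeFromFile' matters: it determines the type
against which the stored type information is checked.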

-}

{- $packexceptions

'PackException's can occur at Haskell level or in the foreign primop.
The Haskell-level exceptions all occur when reading in
'Serialized' data, and are:

* 'P_BinaryMismatch': the serialised data have been produced by a
different executable (must be the same binary).
* 'P_TypeMismatch': the serialised data have the wrong type.
* 'P_ParseError': serialised data could not be parsed (from binary or
text format).

The other exceptions are return codes of the foreign primitive
operation, and indicate errors at the C level. Most of them occur when
serialising data; the exception is 'P_GARBLED' which indicates corrupt
serialised data.
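
The Haskell-level exceptions can be handled like any other exception.
The following sketch assumes that the listed constructors take no
arguments (as 'P_ParseError' does in 'decodeFromFile'); the helper
@loadOrDefault@ is hypothetical and not part of this module:

> import Control.Exception (try)
> import GHC.Packing
>
> -- Hypothetical helper: read a list written earlier by encodeToFile,
> -- falling back to a default value on any PackException.
> loadOrDefault :: FilePath -> [Int] -> IO [Int]
> loadOrDefault path def = do
>   result <- try (decodeFromFile path)
>   case result of
>     Right xs              -> return xs
>     Left P_BinaryMismatch -> warn "written by a different executable"
>     Left P_TypeMismatch   -> warn "file holds data of another type"
>     Left P_ParseError     -> warn "file could not be parsed"
>     Left e                -> warn ("runtime error: " ++ show e)
>   where
>     warn msg = putStrLn msg >> return def

Exceptions of other types, for instance an I/O error for a missing file,
are not 'PackException's and pass through this handler.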

-}

{- $primitives

  #background#

The functionality exposed by this module builds on the serialisation of
Haskell heap graph structures, first implemented for GUM (Graph
reduction on a Unified Memory System), an implementation of GpH, and
later adopted by the implementation of Eden. Independent of their
evaluation state, data and thunks can be transferred between the
(independent) heaps of several running Haskell runtime system instances
which execute the same executable.

The idea to expose the heap data serialisation functionality 
(often called /packing/) to Haskell by itself was first described in 
 Jost Berthold. /Orthogonal Serialisation for Haskell/.
 In Jurriaan Hage and Marco Morazan, editors, 
 /IFL'10, 22nd Symposium on Implementation and Application of 
 Functional Languages/, Springer LNCS 6647, pages 38-53, 2011.
This paper can be found at
<http://www.mathematik.uni-marburg.de/~eden/papers/mainIFL10-withCopyright.pdf>;
the original publication is available at
<http://www.springerlink.com/content/78642611n7623551/>.

The core runtime support consists of just two operations
(slightly paraphrasing the way in which GHC implements the IO monad here):

> serialize#   :: a -> IO ByteArray# -- OUTDATED, see below
> deserialize# :: ByteArray# -> IO a -- which is actually pure from a mathematical point of view

However, these operations are completely unsafe with respect to Haskell
types, and may fail at runtime for various other reasons as well. 
Type safety can be established by a phantom type, but needs to be checked
at runtime when the resulting data structure is externalised (for instance,
saved to a file). Besides prohibiting unprotected type casts, another
restriction that needs to be explicitly checked in this case is that 
different programs cannot exchange data by this serialisation. When data are
serialised during execution, they can only be deserialised by exactly the 
same executable binary because they contain code pointers that will change
even by recompilation.
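
The interplay of phantom type and runtime check can be sketched
independently of the runtime support. The following is a simplified
model and not the code of this library: @Packed@, @External@,
@externalise@ and @internalise@ are hypothetical names, the payload is a
plain @String@, and the check of the executable's fingerprint is left out.

> import Data.Typeable (Typeable, typeRep, typeRepFingerprint)
> import GHC.Fingerprint (Fingerprint)
>
> -- Hypothetical stand-in for Serialized: a is a phantom type only.
> newtype Packed a = Packed String
>
> -- Externalised form: the phantom type is lost, so a fingerprint of
> -- the type is stored together with the payload.
> data External = External Fingerprint String
>
> -- Fingerprint of the phantom type (Packed itself acts as the proxy).
> typeFingerprint :: Typeable a => Packed a -> Fingerprint
> typeFingerprint = typeRepFingerprint . typeRep
>
> externalise :: Typeable a => Packed a -> External
> externalise p@(Packed s) = External (typeFingerprint p) s
>
> -- The phantom type is re-established only if the stored fingerprint
> -- matches the type expected by the caller.
> internalise :: Typeable a => External -> Maybe (Packed a)
> internalise (External fp s) = check (Packed s)
>   where
>     check p
>       | fp == typeFingerprint p = Just p
>       | otherwise               = Nothing
>
> main :: IO ()
> main = do
>   let ext = externalise (Packed "payload" :: Packed Int)
>       payload (Packed s) = s
>   print (fmap payload (internalise ext :: Maybe (Packed Int)))   -- Just "payload"
>   print (fmap payload (internalise ext :: Maybe (Packed Bool)))  -- Nothing

Within one run the compiler rules out the second use statically; the
fingerprint comparison only matters once the phantom type has been lost.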

Other failures can occur because of the runtime system's limitations, 
and because some mutable data types are not allowed to be serialised.
A newer API therefore suggests additions towards exception handling
and better usability.
The original primitive @serialize#@ has been modified and now returns error
codes, leading to the following type (again paraphrasing):

> trySerialize# :: a -> IO ( Int# , ByteArray# )

where the @Int#@ encodes potential error conditions returned by the runtime.

A second primitive operation has been defined, which uses a pre-allocated
@ByteArray#@:

> trySerializeWith# :: a -> ByteArray# -> IO ( Int# , ByteArray# )

In addition to returning error codes, the newer primitive operations do not
block the calling thread when the serialisation encounters a blackhole in the
heap.
It would be possible to observe the existence of blackholes from Haskell by
the return code of these primitive operations. This could, in theory, be
used to explicitly control and avoid blocking (avoiding unresponsive behaviour).
In practice, however, making blackholes observable from Haskell is
certainly undesirable. The primitive operations return the address of the
blackhole, and the caller will block on this blackhole at
the Haskell level (see code in the @GHC.Packing.Core@ module).

The Haskell layer and its types protect the interface function @'deserialize'@
from being applied to grossly wrong data (by checking a fingerprint of the
executable and the expected type), but deserialisation is still rather fragile
(unpacking code pointers and data).
The primitive operation in the runtime system will only detect grossly wrong
formats, and will return the error code @'P_GARBLED'@ when data
corruption is detected. Its type is therefore (again paraphrasing):

> deserialize# :: ByteArray# -> IO ( Int# , a )
-}