binary-typed-0.3: Type-safe binary serialization

Safe HaskellSafe-Inferred
LanguageHaskell2010
Extensions
  • ScopedTypeVariables
  • ExplicitForAll

Data.Binary.Typed

Contents

Description

Defines a type-safe Binary instance to ensure data is ecoded with the type it was serialized from.

Synopsis

Core functions

data Typed a Source

A value suitable to be typechecked using the contained extra type information.

Instances

Show a => Show (Typed a)

"typed <format> <value>"

(Binary a, Typeable * a) => Binary (Typed a)

Ensures data is decoded as the appropriate type with high or total confidence (depending on with what TypeFormat the Typed was constructed).

typed :: Typeable a => TypeFormat -> a -> Typed a Source

Construct a Typed value using the chosen type format.

Example:

value = typed Full ("hello", 1 :: Int, 2.34 :: Double)
encoded = encode value

The decode site can now verify whether decoding happens with the right type.

data TypeFormat Source

Different ways of including/verifying type information of serialized messages.

Constructors

Untyped

Include no type information.

  • Requires one byte more compared to using Binary directly (to tag the data as untyped, required for the decoding step).
  • Encoding and decoding require negligible amount of additional computational cost compared to direct (intrinsically untyped) Binary.
Hashed5

Like Hashed32, but uses a 5-bit hash value.

  • Requires the same amount of space as Untyped, i.e. the only overhead compared to it is the computational cost to calculate the hash, which is almost identical to the one of Hashed32.
  • Collisions occur with a probability of 1/2^5 = 1/32. For this reason, this format is only recommended when minimal data size is top priority.
Hashed32

Compare types by their hash values (using the MurmurHash2 algorithm).

  • Requires five bytes more compared to using Binary directly for the type information (one to tag as Hashed32, four for the hash value)
  • Subject to false positive due to hash collisions, although in practice this should almost never happen.
  • Type errors cannot tell the provided type ("Expected X, received type with hash H")
  • Computational cost similar to Hashed64.
Hashed64

Like Hashed32, but uses a 64-bit hash value.

  • Requires nine bytes more compared to using Binary.
  • Hash collisions are even less likely to occur than with Hashed32.
  • Computational cost similar to Hashed32.
Shown

Compare String representation of types, obtained by calling show on the TypeRep, and also include a hash value (like Hashed32). The former is mostly for readable error messages, the latter provides better collision resistance.

  • Data size larger than Hashed32, but usually smaller than Full.
  • Both the hash and the shown type must match to satisfy the typechecker.
  • Useful type errors ("expected X, received Y"). All types are shown unqualified though, making Foo.X and Bar.X look identical in error messages. Remember this when you get a seemingly silly error "expected Foo, but given Foo".
Full

Compare the full representation of a data type.

  • More verbose than Shown. As a rule of thumb, transmitted data is roughly the same as Shown, but all names are fully qualified (package, module, type name).
  • Correct comparison (no false positives). An semi-exception here is when types change between package versions: package-1.0 Foo.X and package-1.1 Foo.X count as the same type.
  • Useful type errors ("expected X, received Y"). All types are shown unqualified though, making Foo.X and Bar.X look identical in error messages. Remember this when you get a seemingly silly error "expected Foo, but given Foo".

erase :: Typed a -> a Source

Extract the value of a Typed, i.e. strip off the explicit type information.

This function is safe to use for all Typed values created by the public API, since all construction sites ensure the actual type matches the contained type description.

erase (typed format x) == x

Useful general helpers

mapTyped :: Typeable b => (a -> b) -> Typed a -> Typed b Source

Modify the value contained in a Typed, keeping the same sort of type representation. In other words, calling mapTyped on something that is typed using Hashed will yield a Hashed value again.

Note: this destroys precached information, so that values have to be precached again if desired. As a consequence, mapTyped id can be used to un-precache values.

reValue :: (a -> a) -> Typed a -> Typed a Source

Change the value contained in a Typed, leaving the type representation unchanged. This can be useful to avoid recomputation of the included type information, and can improve performance significantly if many individual messages are serialized.

Can be seen as a more efficient mapTyped in case f is an endomorphism (i.e. has type a -> a).

reType :: Typeable a => TypeFormat -> Typed a -> Typed a Source

Change the way a type is represented inside a Typed value.

reType format x = typed format (erase x)

preserialize :: TypeInformation -> TypeInformation Source

Sometimes it can be beneficial to serialize the type information in advance, so that the maybe costly serialization step does not have to be repeated on every invocation of encode. Preserialization comes at a price though, as the directly contained ByteString requires its length to be included in the final serialization, yielding a 8-byte overhead for the required Int64, and one for the tag of what was serialized ("shown or full?").

This function calculates the serialized version of TypeInformation in cases where the required 9 bytes are negligible (determined by an arbitrary threshold, currently 10*9 bytes).

Used to make encodeTyped more efficient; the source there also makes a good usage example.

Typed serialization

Encoding

encodeTyped :: forall a. (Typeable a, Binary a) => TypeFormat -> a -> ByteString Source

Encode a Typeable value to ByteString that includes type information. This function is useful to create specialized typed encoding functions, because the type information is cached and does not need to be recalculated on every serialization.

Observationally, encodeTyped format value is equivalent to encode (typed format value). However, encodeTyped does the type information related calculations in advance and shares the results between future invocations of it, making it much more efficient to serialize many values of the same type.

Decoding

decodeTyped :: (Typeable a, Binary a) => ByteString -> Either String a Source

Safely decode data, yielding Either an error String or the value. Equivalent to decodeTypedOrFail stripped of the non-essential data. Based on decodeTypedOrFail.

encoded = encodeTyped Full ("hello", 1 :: Int, 2.34 :: Double)

-- Right <value>:
decodeTyped encoded :: Either String (String, Int, Double)

-- Left "Type error: expected (Char, Int, Double), got (String, Int, Double)"
decodeTyped encoded :: Either String (Char, Int, Double)

decodeTypedOrFail :: forall a. (Typeable a, Binary a) => ByteString -> Either (ByteString, ByteOffset, String) (ByteString, ByteOffset, a) Source

Safely decode data, yielding Either an error String or the value, along with meta-information of the consumed binary data.

unsafeDecodeTyped :: (Typeable a, Binary a) => ByteString -> a Source

Decode a typed value, throwing a descriptive error at runtime on failure. Typed cousin of decode. Based on decodeTypedOrFail.

encoded = encodeTyped Full ("hello", 1 :: Int, 2.34 :: Double)

-- <value>
unsafeDecodeTyped encoded :: (String, Int, Double)

-- (Descriptive) runtime error
unsafeDecodeTyped encoded :: (Char, Int, Double)