serdoc-core: Generated documentation of serialization formats

[ apache, data, library ]

A set of typeclasses, primitives, combinators, and TH utilities for documenting serialization formats in a mostly automatic fashion.


Versions [RSS] 0.1.0.0
Change log CHANGELOG.md
Dependencies base (>=4.14.0.0 && <5), bytestring (>=0.11 && <0.13), containers (>=0.6 && <0.8), mtl (>=2.3.1 && <2.4), tasty (>=1.5 && <1.6), tasty-quickcheck (>=0.10.3 && <0.11), template-haskell (>=2.16 && <2.22), text (>=1.1 && <2.2), th-abstraction (>=0.6 && <0.7), time (>=1.12 && <1.14) [details]
License Apache-2.0
Copyright 2023 IO Global
Author Tobias Dammers
Maintainer tobias@well-typed.com
Category Data
Uploaded by TobiasDammers at 2024-02-06T08:42:41Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Downloads 40 total (5 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2024-02-06 [all 1 reports]

Readme for serdoc-core-0.1.0.0


serdoc

Unified serialization with semi-automatic documentation

Introduction

SerDoc provides:

  • A unified interface for serialization formats ("codecs"), in the form of a Serializable typeclass.
  • A mini-EDSL (FieldInfo) for describing serialization formats as first-class data structures, and a typeclass (HasInfo) to link them to codecs and serializable Haskell types.
  • Building blocks and utility code for implementing Codec, Serializable, and HasInfo for existing or new serialization formats.

It also includes an implementation of these typeclasses for the binary package.

Components

SerDoc is split up into two sub-projects:

  • serdoc-core (this library), which provides the typeclasses and building blocks
  • serdoc-binary, which provides instances for binary

How To Use

Serializing and deserializing values

Encoding is straightforward: use encode.

For decoding, you have a few options, depending on the codec you use. The most general form is decodeM; apart from that, a family of similar functions is provided, named according to these conventions:

  • Monadic decoders have an M suffix; pure decoders (where the decoding Monad is Except err or Identity) have no M suffix (e.g., decode vs. decodeM).
  • Flavors that ignore any remaining unconsumed input have a _ suffix (e.g., decode_).
  • Flavors that convert decoding errors to Eithers have an Either suffix (e.g. decodeMEither); these require that the decoding monad is Except err or ExceptT err m.
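The suffix conventions above can be illustrated with a small, self-contained sketch. It does not use the serdoc API: the toy-prefixed functions, the DecoderM type, and DecodeError are hypothetical names invented here, and a plain function returning m (Either ...) stands in for Except err / ExceptT err m so that the example needs only base. It also assumes that flavors without the _ suffix reject leftover input, which may not match how a particular codec behaves.

```haskell
import Data.Char (isDigit)
import Data.Functor.Identity (Identity (..))

-- Hypothetical error type for this sketch.
newtype DecodeError = DecodeError String
  deriving (Eq, Show)

-- Stand-in for a decoder running in some monad m. It yields either an
-- error or the decoded value plus the unconsumed input.
type DecoderM m a = String -> m (Either DecodeError (a, String))

-- Toy decoder: reads a non-empty run of decimal digits as an Int.
digitsM :: Monad m => DecoderM m Int
digitsM input = pure $ case span isDigit input of
  ("", _) -> Left (DecodeError "expected digits")
  (ds, rest) -> Right (read ds, rest)

-- "M" + "Either": monadic flavor that reports errors as Either.
-- Assumption of this sketch: leftover input counts as an error.
toyDecodeMEither :: Monad m => DecoderM m a -> String -> m (Either DecodeError a)
toyDecodeMEither dec input = do
  result <- dec input
  pure $ case result of
    Left err -> Left err
    Right (x, "") -> Right x
    Right (_, rest) -> Left (DecodeError ("unconsumed input: " ++ rest))

-- Extra "_": same as above, but leftover input is silently dropped.
toyDecodeMEither_ :: Monad m => DecoderM m a -> String -> m (Either DecodeError a)
toyDecodeMEither_ dec input = fmap (fmap fst) (dec input)

-- No "M": pure flavors, with the decoding monad fixed to Identity.
toyDecodeEither :: DecoderM Identity a -> String -> Either DecodeError a
toyDecodeEither dec = runIdentity . toyDecodeMEither dec

toyDecodeEither_ :: DecoderM Identity a -> String -> Either DecodeError a
toyDecodeEither_ dec = runIdentity . toyDecodeMEither_ dec
```

In this sketch, toyDecodeEither digitsM "42abc" fails because of the trailing "abc", while toyDecodeEither_ digitsM "42abc" succeeds and discards it.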

Keep in mind that, depending on how the codec works, the serialized data may be returned or consumed via the Encoded type, or passed by (mutable) reference through the Context object. The API deliberately supports both styles, because a given codec may only support one or the other.
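The difference between the two styles can be sketched without any serdoc types. Everything below is hypothetical and uses only base: encodeReturning models a codec that hands back the serialized bytes, while encodeInto models a codec whose context carries a mutable buffer and whose encoder returns nothing.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import Data.Word (Word8)

-- Style 1: the encoder returns the serialized data as its result.
encodeReturning :: Bool -> [Word8]
encodeReturning b = [if b then 1 else 0]

-- Style 2: the context holds a mutable buffer; the encoder appends to it
-- and returns (). BufferContext is a made-up name for this sketch.
newtype BufferContext = BufferContext (IORef [Word8])

encodeInto :: BufferContext -> Bool -> IO ()
encodeInto (BufferContext ref) b = modifyIORef' ref (++ encodeReturning b)

-- Helper: run an encoding action against a fresh buffer and return
-- whatever was written into it.
runBuffered :: (BufferContext -> IO ()) -> IO [Word8]
runBuffered action = do
  ref <- newIORef []
  action (BufferContext ref)
  readIORef ref
```

With these definitions, runBuffered (\ctx -> encodeInto ctx True >> encodeInto ctx False) collects the bytes of both values in the order they were written.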

Implementing HasInfo and Serializable for an existing codec

  • For newtype wrappers that use the same serialization format as their wrapped payloads, the easiest way is to use GeneralizedNewtypeDeriving to derive instances for both typeclasses.
  • For newtype wrappers that should implement a different serialization format, you may need to hand-write instances; if you do this, take special care to ensure that the HasInfo instance matches the actual serialization.
  • Lists and some other data structures are supported out of the box and require no explicit instance; they are serialized using a 32-bit list length followed by the serialized list elements, in order. If you need a different representation, then newtype-wrapping may be necessary.
  • For enumeration types, a generic wrapper, ViaEnum, is provided, which you can use in combination with the DerivingVia extension; alternatively, you can use enumInfo, encodeEnum, and decodeEnum to write the instances yourself.
  • String, being just a type alias for [Char], can be serialized if and only if an instance for Char exists. However, it is usually preferable to convert your strings to Text.
  • For record types, consider using deriveSerDoc (found in Data.SerDoc.TH). This Template Haskell function generates matching instances for both typeclasses, following the convention of serializing all fields in the order they appear in the type declaration, labelled by their Haskell field names. This only works if instances exist for the types of all record fields.
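Three of the techniques above (newtype deriving, a DerivingVia wrapper for enumerations, and length-prefixed lists) can be shown on a toy class. This is a sketch under stated assumptions, not the serdoc API: ToBytes and ViaEnumSketch are hypothetical stand-ins for Serializable and ViaEnum, big-endian byte order is an arbitrary choice made here, and Word16 is used for enums only because the README names it as the default enum encoding.

```haskell
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Bits (shiftR)
import Data.Word (Word16, Word32, Word8)

-- Hypothetical stand-in for a Serializable-style class.
class ToBytes a where
  toBytes :: a -> [Word8]

instance ToBytes Word8 where
  toBytes w = [w]

-- Big-endian byte order is an assumption of this sketch.
instance ToBytes Word16 where
  toBytes w = [fromIntegral (w `shiftR` 8), fromIntegral w]

instance ToBytes Word32 where
  toBytes w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Lists: a 32-bit length, followed by the elements in order.
instance ToBytes a => ToBytes [a] where
  toBytes xs =
    toBytes (fromIntegral (length xs) :: Word32) ++ concatMap toBytes xs

-- A newtype that reuses the format of its payload.
newtype UserId = UserId Word16
  deriving newtype (ToBytes)

-- Generic wrapper that serializes any Enum as a Word16.
newtype ViaEnumSketch a = ViaEnumSketch a

instance Enum a => ToBytes (ViaEnumSketch a) where
  toBytes (ViaEnumSketch x) = toBytes (fromIntegral (fromEnum x) :: Word16)

-- An enumeration type that picks up its instance through the wrapper.
data Color = Red | Green | Blue
  deriving (Enum)
  deriving (ToBytes) via (ViaEnumSketch Color)
```

Deriving keeps the instance tied to the underlying representation, which is the property the hand-written alternative has to maintain manually.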

Adding your own codecs

A codec is identified by a phantom type. No values of that type ever need to exist at runtime; we merely use it to select the codec we want, so we can define it as a constructorless data type (like Void), e.g.:

data MyFantasticCodec

We then need a Codec instance, which is where we define:

  • A type for a context passed to each invocation of encode and decodeM; this can be anything you want, depending on the needs of your codec. If the codec does not require any context, use ().
  • A monadic data type used for encoding, MonadEncode. For pure codecs, this can be Identity; if you serialize directly to something like a file handle or network socket, it will typically have to be IO, or some MonadIO.
  • A monadic data type used for decoding, MonadDecode. Since decoding can fail, this will typically involve not just the required effects for the decoding process itself, but also some form of error handling. It is recommended to use Except err for pure codecs, and ExceptT err IO (or MonadIO m => ExceptT err m) for codecs that require IO, where err is an appropriate error data structure for your codec.
  • The default encoding to use for enum types (optional, defaults to Word16).
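The four choices listed above map naturally onto associated types. The sketch below is self-contained and does not import serdoc: CodecSketch and the associated type names Ctx, EncodeM, DecodeM, and EnumRep are hypothetical and may differ from the names in the library's Codec class. Either String stands in for Except err to keep the example base-only, and the one-byte Bool format is invented for illustration.

```haskell
{-# LANGUAGE TypeFamilies #-}

import Data.Functor.Identity (Identity (..))
import Data.Kind (Type)
import Data.Proxy (Proxy (..))
import Data.Word (Word16, Word8)

-- Phantom type naming the codec; it has no constructors.
data MyFantasticCodec

-- Hypothetical class mirroring the four choices described above.
class CodecSketch codec where
  type Ctx codec :: Type              -- context passed to every call
  type EncodeM codec :: Type -> Type  -- monad used for encoding
  type DecodeM codec :: Type -> Type  -- monad used for decoding
  type EnumRep codec :: Type          -- default encoding for enum types
  type EnumRep codec = Word16         -- optional; defaults to Word16

-- A pure codec: no context, Identity for encoding, Either for decoding.
instance CodecSketch MyFantasticCodec where
  type Ctx MyFantasticCodec = ()
  type EncodeM MyFantasticCodec = Identity
  type DecodeM MyFantasticCodec = Either String

-- Toy encoder and decoder for Bool, selected through the phantom type.
encodeBool
  :: Proxy MyFantasticCodec
  -> Ctx MyFantasticCodec
  -> Bool
  -> EncodeM MyFantasticCodec [Word8]
encodeBool _ () b = Identity [if b then 1 else 0]

decodeBool
  :: Proxy MyFantasticCodec
  -> Ctx MyFantasticCodec
  -> [Word8]
  -> DecodeM MyFantasticCodec Bool
decodeBool _ () [0] = Right False
decodeBool _ () [1] = Right True
decodeBool _ () _ = Left "invalid Bool encoding"

-- Type-checks only because EnumRep falls back to its Word16 default.
defaultEnumTag :: EnumRep MyFantasticCodec
defaultEnumTag = 2
```

The Proxy argument is how a caller picks the codec, since no value of MyFantasticCodec can be constructed.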

Providing instances for a reasonable set of primitive values and data structures is highly recommended; a minimum viable set might be:

  • ()
  • Bool
  • Int, Int8, Int16, Int32, Int64
  • Word, Word8, Word16, Word32, Word64
  • Whatever type you picked for the default enum encoding
  • [a]
  • Maybe a
  • Either a b
  • Tuples up to 7 elements
  • ByteString
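As a rough illustration of how the compound instances in such a set build on the primitive ones, here is a toy round-trip class covering (), Bool, Maybe, Either, and pairs. It is not serdoc code: ToySerial, put, get, and the one-byte tag format are all assumptions made for this sketch, and a real codec is free to choose a different wire format.

```haskell
import Data.Word (Word8)

-- A decoder consumes a prefix of the input and returns the rest.
type Decoder a = [Word8] -> Either String (a, [Word8])

-- Hypothetical stand-in for a Serializable-style class.
class ToySerial a where
  put :: a -> [Word8]
  get :: Decoder a

instance ToySerial () where
  put () = []
  get bs = Right ((), bs)

instance ToySerial Bool where
  put b = [if b then 1 else 0]
  get (0 : rest) = Right (False, rest)
  get (1 : rest) = Right (True, rest)
  get _ = Left "invalid Bool tag"

-- Maybe: a tag byte, then the payload if present.
instance ToySerial a => ToySerial (Maybe a) where
  put Nothing = [0]
  put (Just x) = 1 : put x
  get (0 : rest) = Right (Nothing, rest)
  get (1 : rest) = do
    (x, rest') <- get rest
    pure (Just x, rest')
  get _ = Left "invalid Maybe tag"

-- Either: a tag byte selecting the branch, then that branch's payload.
instance (ToySerial a, ToySerial b) => ToySerial (Either a b) where
  put (Left x) = 0 : put x
  put (Right y) = 1 : put y
  get (0 : rest) = do
    (x, rest') <- get rest
    pure (Left x, rest')
  get (1 : rest) = do
    (y, rest') <- get rest
    pure (Right y, rest')
  get _ = Left "invalid Either tag"

-- Pairs: both components, in order, with no framing.
instance (ToySerial a, ToySerial b) => ToySerial (a, b) where
  put (x, y) = put x ++ put y
  get bs = do
    (x, rest) <- get bs
    (y, rest') <- get rest
    pure ((x, y), rest')

-- Encode, then decode again; a correct instance yields the original
-- value and no leftover input.
roundTrip :: ToySerial a => a -> Either String (a, [Word8])
roundTrip = get . put
```

A round-trip check like roundTrip is a cheap way to confirm that each new instance decodes exactly what it encodes.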