binary-generic-combinators: Combinators and utilities to make Generic-based deriving of Binary easier and more expressive

[ bsd3, data, library, parsing ]

Versions: 0.4.0.0, 0.4.2.0, 0.4.3.0, 0.4.4.0
Change log: ChangeLog.md
Dependencies: base (>=4.7 && <5), binary, QuickCheck
License: BSD-3-Clause
Copyright: 2021 Georg Rudoy
Author: Georg Rudoy
Maintainer: 0xd34df00d@gmail.com
Category: Data, Parsing
Home page: https://github.com/0xd34df00d/binary-generic-combinators#readme
Bug tracker: https://github.com/0xd34df00d/binary-generic-combinators/issues
Source repo: head: git clone https://github.com/0xd34df00d/binary-generic-combinators
Uploaded: by 0xd34df00d at 2021-07-19T01:06:10Z
Distributions: LTSHaskell:0.4.4.0, NixOS:0.4.4.0, Stackage:0.4.4.0
Downloads: 665 total (25 in the last 30 days)
Status: Docs available
Last success reported on 2021-07-19

Readme for binary-generic-combinators-0.4.3.0

binary-generic-combinators

tl;dr

This library provides a set of combinators and utility types to make Generic-based deriving of binary (de)serialization instances easier and more flexible, especially when dealing with existing formats.

Motivation

Isn't it great to just define data types representing your problem domain and let the compiler derive all the instances? If we are talking about binary (de)serialization, the Binary instance is already derivable for Generic types, but there are a couple of problems with that.

Compound types

Firstly, the serialization format of compound types is carved in stone (again, we're talking about Generic-based deriving). For example, for a list [a] the binary library always writes the list length explicitly before the elements. This is totally fine if we just need to serialize our type and deserialize it later in the same program, without caring about the outside world. But what if we're writing a parser (or serializer) for some existing data format? In this case (and it's often the case) the format might simply imply the strategy of "try to parse elements of some type a until failure, and build a list out of that", pretty much like Alternative's some or many. With stock binary, we'd have to write a custom instance by hand:

data Element = Element { .. } deriving (Generic, Binary)
data MyType = MyType { elems :: [Element] }

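instance Binary MyType where
  get = MyType <$> many get   -- keep parsing Elements until one fails (many is from Control.Applicative)
  put = mapM_ put . elems     -- write the elements back to back, with no length prefix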
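To make the difference concrete, here is a minimal, self-contained sketch of the hand-written approach using only the stock binary package. The two Word8 fields of Element are an assumption made up for illustration (the original elides them), and the sketch is not this library's implementation.

```haskell
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric  #-}

import Control.Applicative (many)
import Data.Binary (Binary (..), decode, encode)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
import GHC.Generics (Generic)

-- Hypothetical element layout: two bytes, written in field order.
data Element = Element { tag :: Word8, value :: Word8 }
  deriving (Show, Eq, Generic, Binary)

newtype MyType = MyType { elems :: [Element] }
  deriving (Show, Eq)

-- Hand-written instance: no length prefix, parse elements until failure.
instance Binary MyType where
  get = MyType <$> many get
  put = mapM_ put . elems

main :: IO ()
main = do
  let xs = [Element 1 2, Element 3 4]
  -- Stock list encoding: an 8-byte big-endian length, then the elements.
  print (BL.unpack (encode xs))           -- [0,0,0,0,0,0,0,2,1,2,3,4]
  -- Hand-written encoding: just the elements.
  print (BL.unpack (encode (MyType xs)))  -- [1,2,3,4]
  -- Round trip through the hand-written instance.
  print (decode (encode (MyType xs)) == MyType xs)  -- True
```

Note that parsing "until failure" means a truncated trailing element is silently ignored rather than reported, which is a property of the strategy itself and worth keeping in mind for strict formats.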

Wouldn't it be great if we could just annotate our types in such a way that we don't have to write that instance? Something like, well... This?

data Element = Element { .. } deriving (Generic, Binary)
data MyType = MyType { elems :: Many Element } deriving (Generic, Binary)

This library provides an array of wrappers that solve precisely this issue.

Utilities

Then, what if we need to skip some number of bytes, or any number of bytes as long as their value is 0xff? Or, maybe, make sure that the input starts with a certain signature sequence of bytes? With stock binary, we again have to write an instance by hand. This library provides a few helpers for that too:

data MyType = MyType
  { header :: MatchBytes "my format header" '[ 0xd3, 0x4d, 0xf0, 0x0d ]   -- consume 0xd34df00d, or fail the parse
  , slack :: SkipByte 0xff                                                -- skip all subsequent 0xff
  , reserved :: SkipCount Word8 4                                         -- 4 bytes reserved
  ..
  } deriving (Generic, Binary)
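For comparison, the hand-written equivalent of these three fields with stock binary might look roughly like the sketch below. The helper names matchBytes, skipWhileByte and parsePayload are hypothetical and only illustrate the behaviour described above; they are not this library's API, and the trailing payload byte is an assumption added so the parser returns something.

```haskell
import Control.Monad (unless, when)
import Data.Binary.Get (Get, getByteString, getWord8, isEmpty, lookAhead, runGet, skip)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Consume exactly the expected bytes, or fail the parse with some context.
matchBytes :: String -> [Word8] -> Get ()
matchBytes ctx expected = do
  actual <- getByteString (length expected)
  unless (BS.unpack actual == expected) $
    fail ("signature mismatch in " ++ ctx)

-- Skip zero or more consecutive bytes equal to the given value.
skipWhileByte :: Word8 -> Get ()
skipWhileByte b = do
  done <- isEmpty
  unless done $ do
    next <- lookAhead getWord8
    when (next == b) (skip 1 >> skipWhileByte b)

-- Header, slack, four reserved bytes, then one (hypothetical) payload byte.
parsePayload :: Get Word8
parsePayload = do
  matchBytes "my format header" [0xd3, 0x4d, 0xf0, 0x0d]
  skipWhileByte 0xff
  skip 4
  getWord8

main :: IO ()
main = print (runGet parsePayload input)  -- 42
  where
    input = BL.pack [0xd3, 0x4d, 0xf0, 0x0d, 0xff, 0xff, 0, 0, 0, 0, 42]
```

The wrapper types move this boilerplate into the field types, so the record declaration itself documents the layout.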

Deriving strategies

With stock binary, serializing an ADT with several constructors first writes an integer denoting the index of the constructor, and then the contents of that constructor. This is not always what's needed. Consider:

data JfifSegment
  = App0Segment (MatchByte "app0 segment" 0xe0, JfifApp0)
  | DqtSegment  (MatchByte "dqt segment"  0xdb, QuantTable)
  | SofSegment  (MatchByte "sof segment"  0xc0, SofInfo)
  | DhtSegment  (MatchByte "dht segment"  0xc4, HuffmanTable)
  | DriSegment  (MatchByte "dri segment"  0xdd, RestartInterval)
  | SosSegment  (MatchByte "sos segment"  0xda, SosImage)
  | UnknownSegment RawSegment

Here, the identifiers of the constructors are effectively defined in the standard (by the way, this is JPEG/JFIF). Deriving Binary for this type would yield incorrect results: we don't need to encode or decode the index of the constructor, since it's already baked into the MatchByte part. In this case, what we need is to try to parse each constructor in order, moving on to the next one if its segment identifier doesn't match what's in the byte stream. That's basically what Alternative's <|> does! Here, we can leverage that via this library's handy Alternatively type and DerivingVia:

{-# LANGUAGE DerivingVia #-}

data JfifSegment
  = App0Segment (MatchByte "app0 segment" 0xe0, JfifApp0)
  | DqtSegment  (MatchByte "dqt segment"  0xdb, QuantTable)
  | SofSegment  (MatchByte "sof segment"  0xc0, SofInfo)
  | DhtSegment  (MatchByte "dht segment"  0xc4, HuffmanTable)
  | DriSegment  (MatchByte "dri segment"  0xdd, RestartInterval)
  | SosSegment  (MatchByte "sos segment"  0xda, SosImage)
  | UnknownSegment RawSegment
  deriving Generic
  deriving Binary via Alternatively JfifSegment
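The "try each constructor in order" strategy can also be sketched by hand with stock binary, which shows what kind of parser the derived instance stands in for. Everything below (the Segment type, its payloads, the 0xfe marker and the expectByte helper) is a hypothetical, simplified stand-in for the JFIF example and not this library's implementation.

```haskell
import Control.Applicative ((<|>))
import Control.Monad (unless)
import Data.Binary.Get (Get, getWord16be, getWord8, runGet)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word16, Word8)

-- Simplified stand-in for JfifSegment with made-up payloads.
data Segment
  = Dri Word16      -- marker 0xdd, then a 16-bit value
  | Comment Word8   -- hypothetical marker 0xfe, then one byte
  | Unknown Word8   -- fallback: any single byte
  deriving (Show, Eq)

-- Consume one byte and fail unless it equals the expected marker.
expectByte :: Word8 -> Get ()
expectByte b = do
  x <- getWord8
  unless (x == b) $ fail "marker mismatch"

-- Alternatives are tried left to right; Get's <|> backtracks on failure,
-- so a mismatching marker byte is not consumed by the failed branch.
getSegment :: Get Segment
getSegment =
      (Dri     <$> (expectByte 0xdd *> getWord16be))
  <|> (Comment <$> (expectByte 0xfe *> getWord8))
  <|> (Unknown <$> getWord8)

main :: IO ()
main = do
  print (runGet getSegment (BL.pack [0xdd, 0x00, 0x10]))  -- Dri 16
  print (runGet getSegment (BL.pack [0xfe, 0x07]))        -- Comment 7
  print (runGet getSegment (BL.pack [0x01]))              -- Unknown 1
```

The order of constructors matters with this strategy: a catch-all such as Unknown has to come last, just as UnknownSegment does in the declaration above.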