Changes between Version 3 and Version 4 of BinaryIO
- Timestamp:
- 12/19/05 16:27:00 (7 years ago)
BinaryIO
--- v3
+++ v4
 = Binary I/O =
+[[PageOutline]]

-Haskell 98 treats I/O as character-based, and lacks a well-defined mechanism for binary I/O. However, a number of competing external libraries exist providing various forms of binary I/O, providing forms of compressed I/O, and serialised, persistent data.
+Haskell 98 treats I/O as character-based, and lacks a well-defined mechanism for binary I/O. However, a number of external libraries exist providing various forms of binary I/O.

+Two forms of binary I/O are considered here:
+ * Word8 based extensions to System.IO, and
+ * Typeclass-based Binary I/O (referred to as Binary) for serialising arbitrary data types, layered over Word8 extensions
+
+== Explanation ==
  * Character-based I/O is needed, at least because systems (e.g. Unix and Windows) have different line-termination conventions that should be hidden from programs. The problem becomes more acute when different environments use different character sets and encodings (see [wiki:Unicode]).
  * Binary I/O is needed both to handle binary data and as a base upon which general treatments of character-encoding conversions (see [wiki:Unicode]) may be layered.
+ * Type-classed binary I/O is needed to support serialisable structures and persistence for arbitrary Haskell data

-One proposal is to add a form of I/O over `Word8` (i.e. octets, 8-bit binary values). See the "Binary input and output" section of [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO] for a rough design.
+== Proposal 1 - System.IO ==
+ * One proposal is to add a form of I/O over `Word8` (i.e. octets, 8-bit binary values). See the "Binary input and output" section of [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO] for a rough design.

-Another would be to look at one of the binary I/O libraries based on [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html The Bits Between The Lambdas], descendants of which have proliferated in the last couple of years. The advantage of this style over the simpler System.IO library is support for serialising more complex data types, using type classes to recursively define binary I/O routines for each type component of the data you wish to serialise. Instances of I/O may be written by hand, or derived mechanically with [http://repetae.net/john/computer/haskell/DrIFT/ DrIFT].
+== Proposal 2 - The Binary class ==
+ * Proposal two is to add a Binary class, based on the type class described in [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html The Bits Between The Lambdas]. The advantage of this form of binary I/O over the simpler System.IO library is support for serialising more complex data types, using type classes to recursively define binary I/O routines for each component of the type. Instances of I/O may be written by hand, or derived mechanically with [http://repetae.net/john/computer/haskell/DrIFT/ DrIFT]. Ideally Binary would be derivable by the compiler (is this feasible?).

-Issues to consider:
- * What language extensions are required?
- * Support for cyclic structures
- * Is it possible to derive I/O instances for types, or must they be written by hand?
+== References ==

-Existing libraries for Binary I/O:
- * The simplest is probably [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO], which provides hGetBuf-style I/O. Really only suitable for arrays.
- * [http://www.cse.unsw.edu.au/~dons/fps.html Packed strings], layered over System.IO is sometimes used, for simple data types, which can be easily converted to and from flat arrays, using list functions.
- * The de-facto standard, and also the fastest, for non-trivial data types, the Binary class, a version of which is [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html described here]. Distributed with nhc, and used by GHC to deal with .hi files. Tool support from DrIFT to derive new instances. Flavours include:
+=== Proposal 1 ===
+ * The simplest implementation option is [http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html System.IO], which provides hGetBuf-style I/O. More sophisticated systems can be layered on top, as external libraries.
+ * [http://www.cse.unsw.edu.au/~dons/fps.html Packed strings], layered over System.IO, are a related interface, and sometimes used for binary I/O of flat data types.
+
+=== Proposal 2 ===
+ * The Binary class is the de-facto standard for more structured data. The origins are [ftp://ftp.cs.york.ac.uk/pub/malcolm/ismm98.html described here]. Distributed with nhc, and used by GHC to deal with .hi files. Tool support from DrIFT to derive new instances. Flavours include:
    * [http://haskell.org/nhc98/libs/Binary.html NHC's binary], the original
    * [http://cvs.haskell.org/cgi-bin/cvsweb.cgi/~checkout~/fptools/ghc/compiler/utils/Binary.hs GHC's Binary], used internally by GHC.
-   * [http://www.n-heptane.com/nhlab/repos/NewBinary/ NewBinary], the standard
-   * [http://www.cse.unsw.edu.au/~dons/code/hmp3/Binary.hs Lambdabot/Hmp3's Binary], a faster, Handle-only version of Binary.
+   * [http://www.n-heptane.com/nhlab/repos/NewBinary/ NewBinary], the standard version today
+   * [http://www.cse.unsw.edu.au/~dons/code/hmp3/Binary.hs Lambdabot/Hmp3's Binary], a stripped-down Handle-only version of Binary.
    * [http://www.cs.helsinki.fi/u/ekarttun/SerTH/ SerTH] is a Binary-alike, which uses Template Haskell to derive serialiser instances for each data type. It's an alternative to using DrIFT (or handwriting) your own Binary instances. Obviously requires TH. Supports serialising cyclic structures
    * [http://freearc.narod.ru/ ByteStream], a new high-performance serialisation library, using gzip compression.

-Further information:
- * [http://www.haskell.org/pipermail/haskell/2005-December/017029.html A recent mailing list thread].
- * [http://haskell.org/hawiki/BinaryIo A page on the Haskell wiki]
+== Pros/Cons : System.IO ==

-The two simplest options are to go with only the System.IO extension, or the Binary class.
+=== Pros ===
+ * System.IO extensions are already in common use, simple to implement
+ * More sophisticated binary I/O may be layered on top

-Pros:
- * The Binary class (particularly as implemented in NewBinary) is simple, elegant and widely used.
- * Binary IO is an oft requested feature, lack of which is sometimes considered a flaw in Haskell98, so we should do something about it.
+=== Cons ===
+ * Possible that the API is not rich enough for many binary I/O requirements, we should strive for more?

-Cons:
- * Ideally(?) Binary should be derivable without an external tool
- * Binary only supports I/O from Handles and memory buffers. Some people require other kinds of streams
- * There is an overlap with Storable that isn't exploited or explained in any existing library.
- * Some new developments are underway to combine SerTH's cyclic structure support with the speed of NewBinary
- * What about a NewIO library, how will this overlap/interact?
+== Pros/Cons : Binary ==
+
+=== Pros ===
+ * The Binary class (particularly as implemented in NewBinary) is simple to implement and widely used.
+ * Binary IO is an oft requested feature, lack of which is sometimes considered a flaw in Haskell98.
+ * Difficult to serialise data without this class
+
+=== Cons ===
+ * There is an overlap with the Storable class that isn't exploited
+ * Doesn't support cyclic structures
+ * Lack of derivability can be annoying
+
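
For illustration only, the two forms of binary I/O discussed on this page might be sketched as follows. This is a minimal sketch, not the API of NewBinary or any other library listed above: the class `Bin`, its methods `putBin`/`getBin`, the helpers `hPutBytes`/`hGetBytes`/`roundTrip`, the `Shape` type, and the file name are all hypothetical names introduced here. The only library calls assumed are `hPutBuf`, `hGetBuf` and `withBinaryFile` from System.IO and the array helpers from Foreign.Marshal.Array. The first part reads and writes `Word8` octets through a Handle; the second layers a type class over it so that compound types are serialised component by component.

```haskell
import Control.Monad (replicateM)
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word32)
import Foreign.Marshal.Array (allocaArray, peekArray, withArrayLen)
import System.IO (Handle, IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

-- Layer 1: Word8 I/O over a Handle (hypothetical helper names).

-- | Write a list of octets using hPutBuf.
hPutBytes :: Handle -> [Word8] -> IO ()
hPutBytes h ws = withArrayLen ws $ \n ptr -> hPutBuf h ptr n

-- | Read up to n octets using hGetBuf; fewer are returned at end of file.
hGetBytes :: Handle -> Int -> IO [Word8]
hGetBytes h n = allocaArray n $ \ptr -> do
  got <- hGetBuf h ptr n
  peekArray got ptr

-- Layer 2: a type class over the Word8 layer (hypothetical class).

class Bin a where
  putBin :: Handle -> a -> IO ()
  getBin :: Handle -> IO a

instance Bin Word8 where
  putBin h w = hPutBytes h [w]
  getBin h = do
    ws <- hGetBytes h 1
    case ws of
      [w] -> return w
      _   -> ioError (userError "getBin: unexpected end of input")

-- | Word32 is stored as four octets, most significant first.
instance Bin Word32 where
  putBin h w = hPutBytes h [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]
  getBin h = do
    ws <- hGetBytes h 4
    if length ws == 4
      then return (foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0 ws)
      else ioError (userError "getBin: unexpected end of input")

instance Bin Bool where
  putBin h b = putBin h (fromIntegral (fromEnum b) :: Word8)
  getBin h = do
    w <- getBin h :: IO Word8
    return (w /= 0)

instance (Bin a, Bin b) => Bin (a, b) where
  putBin h (a, b) = putBin h a >> putBin h b
  getBin h = do
    a <- getBin h
    b <- getBin h
    return (a, b)

-- | Lists are stored as a Word32 length followed by the elements.
instance Bin a => Bin [a] where
  putBin h xs = do
    putBin h (fromIntegral (length xs) :: Word32)
    mapM_ (putBin h) xs
  getBin h = do
    n <- getBin h :: IO Word32
    replicateM (fromIntegral n) (getBin h)

-- | A hand-written instance for a user type: one tag octet per constructor.
data Shape = Circle Word8 | Rect Word8 Word8
  deriving (Eq, Show)

instance Bin Shape where
  putBin h (Circle r) = putBin h (0 :: Word8) >> putBin h r
  putBin h (Rect w l) = putBin h (1 :: Word8) >> putBin h w >> putBin h l
  getBin h = do
    tag <- getBin h :: IO Word8
    case tag of
      0 -> Circle <$> getBin h
      1 -> Rect <$> getBin h <*> getBin h
      _ -> ioError (userError "getBin: bad Shape tag")

-- | Write a value to a file in binary mode, then read it back.
roundTrip :: Bin a => FilePath -> a -> IO a
roundTrip path x = do
  withBinaryFile path WriteMode (\h -> putBin h x)
  withBinaryFile path ReadMode getBin

main :: IO ()
main = do
  let shapes = [(Circle 3, True), (Rect 4 5, False)]
  back <- roundTrip "binaryio-demo.bin" shapes
  print (back == shapes)
```

The hand-written `Shape` instance is the kind of code that the page suggests deriving mechanically (for example with DrIFT). Like the Binary class as described in the Cons above, this sketch does not handle cyclic structures.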
