|Portability||tested on GHC only|
|Maintainer||Simon Meier <email@example.com>|
Blaze.ByteString.Builder is the main module, which you should import as a user.
It provides you with a type Builder that lets you efficiently construct
lazy bytestrings with a large average chunk size.
A Builder denotes the construction of a part of a lazy
bytestring. Builders can be created either with one of the primitive
combinators in Blaze.ByteString.Builder.Write or with one of the predefined
combinators for standard Haskell values (see the exposed modules of this
package). Builders are concatenated using mappend.
Here is a small example that serializes a list of strings using the UTF-8 encoding.
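    import qualified Data.ByteString.Lazy as L
    import Data.Monoid (mconcat)
    import Blaze.ByteString.Builder (Builder, toLazyByteString)
    import Blaze.ByteString.Builder.Char.Utf8 (fromString)

    strings :: [String]
    strings = replicate 10000 "Hello there!"

The function fromString yields a Builder denoting its UTF-8 encoded
argument. Hence, UTF-8 encoding and concatenating all
strings can be done as follows.

    concatenation :: Builder
    concatenation = mconcat $ map fromString strings

    result :: L.ByteString
    result = toLazyByteString concatenation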
result is a lazy bytestring containing 10000 repetitions of the string
"Hello there!" encoded using UTF-8. The corresponding 120000 bytes are
distributed among three chunks of 32kb and a last, smaller chunk holding the
remaining bytes (about 21kb, since three 32kb chunks cover 98304 bytes).
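
The chunk distribution described above can be inspected directly. The following is a minimal sketch, assuming the blaze-builder and bytestring packages are installed; the name chunkSizes is introduced here for illustration only. The exact sizes reported depend on the library version and its default buffer sizes, so no particular output is claimed.

```haskell
import qualified Data.ByteString      as S
import qualified Data.ByteString.Lazy as L
import Blaze.ByteString.Builder (toLazyByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)

-- Same input as in the example above.
strings :: [String]
strings = replicate 10000 "Hello there!"

result :: L.ByteString
result = toLazyByteString (mconcat (map fromString strings))

-- Sizes of the strict chunks that make up the lazy bytestring
-- (chunkSizes is a hypothetical helper, not part of the library).
chunkSizes :: [Int]
chunkSizes = map S.length (L.toChunks result)
```

Evaluating chunkSizes in GHCi lists one entry per chunk. Whatever the distribution is, the entries sum to the 120000 bytes of the encoded input.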
A note on history. This serialization library was inspired by the
Data.Binary.Builder module provided by the
binary package. It was
originally developed with the specific needs of the
blaze-html package in
mind. Since then it has been restructured to serve as a drop-in replacement
for Data.Binary.Builder, which it improves upon in speed, among other respects.
- data Builder
- module Blaze.ByteString.Builder.Write
- module Blaze.ByteString.Builder.Int
- module Blaze.ByteString.Builder.Word
- module Blaze.ByteString.Builder.ByteString
- flush :: Builder
- toLazyByteString :: Builder -> ByteString
- toLazyByteStringWith :: Int -> Int -> Int -> Builder -> ByteString -> ByteString
- toByteString :: Builder -> ByteString
- toByteStringIO :: (ByteString -> IO ()) -> Builder -> IO ()
- toByteStringIOWith :: Int -> (ByteString -> IO ()) -> Builder -> IO ()
- empty :: Builder
- singleton :: Word8 -> Builder
- append :: Builder -> Builder -> Builder
- putWord16be :: Word16 -> Builder
- putWord32be :: Word32 -> Builder
- putWord64be :: Word64 -> Builder
- putWord16le :: Word16 -> Builder
- putWord32le :: Word32 -> Builder
- putWord64le :: Word64 -> Builder
- putWordhost :: Word -> Builder
- putWord16host :: Word16 -> Builder
- putWord32host :: Word32 -> Builder
- putWord64host :: Word64 -> Builder
Intuitively, a builder denotes the construction of a lazy bytestring.
Builders can be created from primitive buffer manipulations using the
abstraction provided by Blaze.ByteString.Builder.Write. However, for
many Haskell values there already exist predefined functions that do this.
For example, UTF-8 encoding of String values is provided by the
functions in Blaze.ByteString.Builder.Char.Utf8. Concatenating builders is done
using mappend.
Semantically, builders are nothing special. They just denote a sequence of bytes. However, their representation is chosen such that this sequence of bytes can be computed efficiently (in terms of CPU cycles) in an incremental, chunk-wise fashion such that the average chunk size is large. Note that the large average chunk size makes it possible to make good use of cache prefetching in later processing steps (e.g. compression) and to reduce the system call overhead when writing the resulting lazy bytestring to a file or sending it over the network.
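
As an illustration of builders denoting plain byte sequences, here is a small sketch that combines predefined functions from Blaze.ByteString.Builder.Word and Blaze.ByteString.Builder.Char.Utf8. The names greeting and greetingBytes and the chosen values are made up for this example.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Monoid (mappend)
import Data.Word (Word8)
import Blaze.ByteString.Builder (Builder, toLazyByteString)
import Blaze.ByteString.Builder.Word (fromWord8, fromWord16be)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)

-- Three builders from different modules, concatenated with mappend.
greeting :: Builder
greeting =
    fromWord8 0x01                   -- one raw byte
      `mappend` fromWord16be 0x0203  -- two bytes, most significant first
      `mappend` fromString "hi"      -- UTF-8 encoded text

-- The byte sequence denoted by the builder: [1,2,3,104,105]
greetingBytes :: [Word8]
greetingBytes = L.unpack (toLazyByteString greeting)
```

Running the builder only at the very end, after all parts have been appended, is what lets the library fill large buffers instead of producing one small chunk per part.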
To understand the performance of a specific builder precisely,
benchmarking is unavoidable. It also helps to understand the
implementation of builders and the predefined combinators. The source code of
Blaze.ByteString.Builder.Internal and the other modules of this library
should be accessible to the average Haskell programmer.
The guiding implementation principle was to reduce the abstraction cost per
output byte. We use continuation passing to achieve a constant time append.
The output buffer is filled by the individual builders as long as possible.
They call each other directly when they are done and control is returned to
the driver (e.g.,
toLazyByteString) only when the buffer is full, a
bytestring needs to be inserted directly, or no more bytes can be written.
We also try to take the pressure off the cache by moving variables as far
out of loops as possible. This leads to some duplication of code, but
results in sometimes dramatic increases in performance. For example, see the
function in Blaze.ByteString.Builder.Word.
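
To see why a function-based representation gives a constant time append, consider the following toy model. It is only an analogy in the style of a difference list, not the representation used in Blaze.ByteString.Builder.Internal: there is no buffer and no driver here, and all names (ToyBuilder, toyByte, toyAppend, runToy) are hypothetical.

```haskell
import Data.Word (Word8)

-- A toy builder: a function that puts its own bytes in front of
-- whatever output comes after it. NOT the library's representation.
newtype ToyBuilder = ToyBuilder ([Word8] -> [Word8])

toyByte :: Word8 -> ToyBuilder
toyByte w = ToyBuilder (w :)

-- Appending is function composition, which costs O(1) regardless of
-- how much output either side will eventually produce.
toyAppend :: ToyBuilder -> ToyBuilder -> ToyBuilder
toyAppend (ToyBuilder f) (ToyBuilder g) = ToyBuilder (f . g)

-- Running the builder supplies the empty rest of the output.
runToy :: ToyBuilder -> [Word8]
runToy (ToyBuilder f) = f []
```

In this model the nesting of toyAppend does not affect the cost of running the builder, which is the property the continuation passing design is meant to provide for real buffers.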
Output all data written in the current buffer and start a new chunk.
The use of this function depends on how the resulting bytestrings are
consumed. flush is possibly not very useful in non-interactive scenarios.
However, it is kept for compatibility with the builder provided by
Data.Binary.Builder.
When using toLazyByteString to extract a lazy ByteString from a
Builder, this means that a new chunk will be started in the resulting lazy
ByteString. The remaining part of the buffer is spilled if the
remaining free space is smaller than the minimal desired buffer size.
Extract the lazy
ByteString from the builder by running it with default
buffer sizes. Use this function, if you do not have any special
considerations with respect to buffer sizes.
    toLazyByteString mempty          == mempty
    toLazyByteString (x `mappend` y) == toLazyByteString x `mappend` toLazyByteString y
However, in the second equation, the left-hand-side is generally faster to execute.
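
The two equations can be checked on concrete values. This is a small sketch with made-up builders x and y; lawsHold is a hypothetical name. Equality on lazy bytestrings compares contents only, so the two sides may still differ in how they are split into chunks.

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Monoid (mappend, mempty)
import Blaze.ByteString.Builder (Builder, toLazyByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)

x, y :: Builder
x = fromString "Hello, "
y = fromString "world!"

-- Left-hand side: concatenate the builders, then run once.
viaBuilder :: L.ByteString
viaBuilder = toLazyByteString (x `mappend` y)

-- Right-hand side: run each builder, then concatenate the bytestrings.
viaByteString :: L.ByteString
viaByteString = toLazyByteString x `mappend` toLazyByteString y

-- Both equations, checked on these sample values only.
lawsHold :: Bool
lawsHold = viaBuilder == viaByteString
        && toLazyByteString mempty == L.empty
```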
Arguments, in order:
- Int: Buffer size (upper-bounds the resulting chunk size).
- Int: Minimal free buffer space for continuing to fill the same buffer after a flush.
- Int: Size of the first buffer to be used and copied for larger resulting sequences.
- Builder: Builder to run.
- ByteString: Lazy bytestring to output after the builder is finished.
- ByteString (result): Resulting lazy bytestring.
Run a Builder with the given buffer sizes.
Use this function for integrating the
Builder type with other libraries
that generate lazy bytestrings.
Note that the builders should guarantee that on average the desired chunk size is attained. Builders may decide to start a new buffer and not completely fill the existing buffer, if this is faster. However, they should not spill too much of the buffer, if they cannot compensate for it.
toLazyByteStringWith bufSize minBufSize firstBufSize will generate
a lazy bytestring according to the following strategy. First, we allocate
a buffer of size firstBufSize and start filling it. If it overflows, we
allocate a buffer of size minBufSize and copy the first buffer to it in
order to avoid generating too small a chunk. Finally, every subsequent buffer
will be of size bufSize. This slow startup strategy is required to achieve
good speed for short (<200 bytes) resulting bytestrings, as for them the
allocation cost of a large buffer cannot be amortized. Moreover, this
strategy also allows us to avoid spilling too much memory for short
resulting bytestrings.
Note that setting firstBufSize >= minBufSize implies that the first buffer
is no longer copied but allocated and filled directly. Hence, setting
firstBufSize = bufSize means that all chunks will use an underlying buffer
of size bufSize. This is recommended if you know that you always output
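
A call to toLazyByteStringWith might look as follows. This is a sketch only: the three sizes are illustrative values chosen for the example and are not the library's defaults, and payload, output, and chunkSizes are hypothetical names. The resulting chunk sizes depend on the library version, so none are claimed here.

```haskell
import qualified Data.ByteString      as S
import qualified Data.ByteString.Lazy as L
import Blaze.ByteString.Builder (Builder, toLazyByteStringWith)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)

-- Illustrative sizes only; these are NOT the library's defaults.
bufSize, minBufSize, firstBufSize :: Int
bufSize      = 1024
minBufSize   = 256
firstBufSize = 64

-- 5000 bytes of ASCII digits.
payload :: Builder
payload = mconcat (replicate 500 (fromString "0123456789"))

-- The last argument is the lazy bytestring emitted after the builder's
-- own output; here it is a single newline byte.
output :: L.ByteString
output = toLazyByteStringWith bufSize minBufSize firstBufSize payload trailer
  where
    trailer = L.pack [0x0A]

-- Sizes of the chunks actually produced, for inspection.
chunkSizes :: [Int]
chunkSizes = map S.length (L.toChunks output)
```

The trailing lazy bytestring argument is what makes this function convenient for combining a builder with output produced by other libraries: their result is simply passed as the continuation of the builder's output.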
Run the builder to construct a strict bytestring containing the sequence of bytes denoted by the builder. This is done by first serializing to a lazy bytestring and then packing its chunks into an appropriately sized strict bytestring.
    toByteString = packChunks . toLazyByteString

    toByteString mempty          == mempty
    toByteString (x `mappend` y) == toByteString x `mappend` toByteString y
However, in the second equation, the left-hand-side is generally faster to execute.
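
For short outputs that are needed as one contiguous block, toByteString can be used directly. A minimal sketch, with a made-up input string and the hypothetical names cafe and cafeBytes:

```haskell
import qualified Data.ByteString as S
import Blaze.ByteString.Builder (Builder, toByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)

-- "café", with the non-ASCII character written as an escape.
cafe :: Builder
cafe = fromString "caf\233"

-- UTF-8 uses two bytes for U+00E9, so the result has five bytes:
-- [99,97,102,195,169]
cafeBytes :: S.ByteString
cafeBytes = toByteString cafe
```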
toByteStringIO is a Monoid homomorphism in the following sense.

    toByteStringIO io mempty          == return ()
    toByteStringIO io (x `mappend` y) == toByteStringIO io x >> toByteStringIO io y
Arguments, in order:
- Int: Buffer size (upper bounds the number of bytes forced per call to the IO action).
- (ByteString -> IO ())
- Builder
- IO () (result)
toByteStringIOWith bufSize io b runs the builder b with a buffer of
at least the size bufSize and executes the IO action io whenever the
buffer is full.
Compared to toLazyByteStringWith, this function requires less allocation,
as the output buffer is only allocated once at the start of the
serialization and whenever something bigger than the current buffer size has
to be copied into the buffer, which should happen very seldom for the
default buffer size of 32kb. Hence, the pressure on the garbage collector is
reduced, which can be an advantage when building long sequences of bytes.
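
The following sketch runs a builder with toByteStringIOWith and records every chunk passed to the IO action. The helper collectChunks and the builder sample are hypothetical names. Because the text above says the output buffer is allocated only once, the sketch assumes the bytestring handed to the action may share that buffer, and therefore copies each chunk before keeping it. Whether the copy is strictly necessary depends on the library version.

```haskell
import qualified Data.ByteString as S
import Control.Exception (evaluate)
import Data.IORef (newIORef, modifyIORef, readIORef)
import Blaze.ByteString.Builder (Builder, toByteStringIOWith)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)

-- Run a builder and return the chunks handed to the IO action, in order.
collectChunks :: Int -> Builder -> IO [S.ByteString]
collectChunks bufSize b = do
    ref <- newIORef []
    let keep chunk = do
          -- Copy eagerly: the chunk may reference a buffer that is reused.
          c <- evaluate (S.copy chunk)
          modifyIORef ref (c :)
    toByteStringIOWith bufSize keep b
    fmap reverse (readIORef ref)

-- 1200 bytes of sample output.
sample :: Builder
sample = mconcat (replicate 100 (fromString "Hello there!"))
```

In a typical program the action would write to a handle or socket instead, for example by passing S.hPut applied to a handle, in which case no copy is needed because each chunk is consumed before the action returns.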
Compatibility to Data.Binary.Builder from the binary package
The following functions ensure that Blaze.ByteString.Builder is a
drop-in replacement for Data.Binary.Builder from the binary
package. Note that these functions are deprecated and may be removed
in future versions of this package.
O(1). Serialize a single byte.
O(1). Append two builders.
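
Code that relies on the deprecated compatibility functions can be written with the Monoid operations and the functions from Blaze.ByteString.Builder.Word instead. The correspondence shown in the comments below is an assumption based on the names and types in the synopsis, not something this page states, so check the deprecation messages of your installed version. The names migrated and migratedBytes are hypothetical.

```haskell
import qualified Data.ByteString as S
import Data.Monoid (mappend, mempty)
import Data.Word (Word8)
import Blaze.ByteString.Builder (Builder, toByteString)
import Blaze.ByteString.Builder.Word (fromWord8, fromWord16be, fromWord16le)

-- Each line notes the deprecated function it is assumed to replace.
migrated :: Builder
migrated =
    mempty                           -- assumed replacement for: empty
      `mappend` fromWord8 0x2A       -- assumed replacement for: singleton 0x2A
      `mappend` fromWord16be 0x0102  -- assumed replacement for: putWord16be 0x0102
      `mappend` fromWord16le 0x0102  -- assumed replacement for: putWord16le 0x0102

-- Big-endian writes the high byte first, little-endian the low byte first:
-- [42,1,2,2,1]
migratedBytes :: [Word8]
migratedBytes = S.unpack (toByteString migrated)
```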