blaze-builder-0.2.0.1: Efficient construction of bytestrings.

Portability: tested on GHC only
Stability: experimental
Maintainer: Simon Meier <iridcode@gmail.com>

Blaze.ByteString.Builder

Description

Blaze.ByteString.Builder is the main module, which you should import as a user of the blaze-builder library.

 import Blaze.ByteString.Builder

It provides you with a type Builder that allows you to efficiently construct lazy bytestrings with a large average chunk size.

Intuitively, a Builder denotes the construction of a part of a lazy bytestring. Builders can either be created using one of the primitive combinators in Blaze.ByteString.Builder.Write or by using one of the predefined combinators for standard Haskell values (see the exposed modules of this package). Concatenation of builders is done using mappend from the Monoid typeclass.

Here is a small example that serializes a list of strings using the UTF-8 encoding.

 import Blaze.ByteString.Builder.Char.Utf8
 import qualified Data.ByteString.Lazy as L  -- provides L.ByteString used below
 strings :: [String]
 strings = replicate 10000 "Hello there!"

The function fromString creates a Builder denoting the UTF-8 encoded argument. Hence, UTF-8 encoding and concatenating all strings can be done as follows.

 concatenation :: Builder
 concatenation = mconcat $ map fromString strings

The function toLazyByteString can be used to execute a Builder and obtain the resulting lazy bytestring.

 result :: L.ByteString
 result = toLazyByteString concatenation

The result is a lazy bytestring containing 10000 repetitions of the string "Hello there!" encoded using UTF-8. The corresponding 120000 bytes are distributed among three chunks of 32kb and a last chunk of 6kb.
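To see how the bytes are actually distributed among chunks on a given installation, the chunk sizes can be inspected directly. The following is a minimal sketch; the helper name chunkSizes is an assumption introduced here and is not part of the library. The exact sizes depend on the library version and its default buffer sizes, so no output is shown.

```haskell
import Blaze.ByteString.Builder (Builder, toLazyByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- The same builder as in the example above.
concatenation :: Builder
concatenation = mconcat (map fromString (replicate 10000 "Hello there!"))

-- Hypothetical helper: length in bytes of each strict chunk of the
-- lazy bytestring produced by running the builder.
chunkSizes :: Builder -> [Int]
chunkSizes = map S.length . L.toChunks . toLazyByteString
```

Evaluating chunkSizes concatenation in GHCi lists one length per chunk; whatever the distribution, the lengths sum to 120000.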

A note on history. This serialization library was inspired by the Data.Binary.Builder module provided by the binary package. It was originally developed with the specific needs of the blaze-html package in mind. Since then it has been restructured to serve as a drop-in replacement for Data.Binary.Builder, which it improves upon in both speed and expressivity.

The Builder type

data Builder

Intuitively, a builder denotes the construction of a lazy bytestring.

Builders can be created from primitive buffer manipulations using the Write abstraction provided by Blaze.ByteString.Builder.Write. However, for many Haskell values there are predefined functions that already do this. For example, UTF-8 encoding of Char and String values is provided by the functions in Blaze.ByteString.Builder.Char.Utf8. Concatenating builders is done using their Monoid instance.
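As a small illustration of combining predefined builders through the Monoid instance, the sketch below serializes a string as a length-prefixed record. The record format and the name lengthPrefixed are assumptions made for this example only; fromWord32be, fromByteString and fromString are taken from the modules of this package named in the imports.

```haskell
import Blaze.ByteString.Builder (Builder, toByteString)
import Blaze.ByteString.Builder.ByteString (fromByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import Blaze.ByteString.Builder.Word (fromWord32be)
import qualified Data.ByteString as S

-- Hypothetical record format: a 4-byte big-endian length followed by
-- the UTF-8 encoded payload.
lengthPrefixed :: String -> Builder
lengthPrefixed s =
    fromWord32be (fromIntegral (S.length payload)) `mappend` fromByteString payload
  where
    -- Encode the payload first so that its length in bytes is known.
    payload = toByteString (fromString s)
```

Encoding the payload up front costs an extra copy; it is done here only because the length prefix must count bytes, not characters.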

Semantically, builders are nothing special. They just denote a sequence of bytes. However, their representation is chosen such that this sequence of bytes can be efficiently (in terms of CPU cycles) computed in an incremental, chunk-wise fashion such that the average chunk size is large. Note that the large average chunk size makes it possible to make good use of cache prefetching in later processing steps (e.g. compression) or to reduce the system call overhead when writing the resulting lazy bytestring to a file or sending it over the network.

To understand the performance of a specific Builder precisely, benchmarking is unavoidable. It also helps to understand the implementation of builders and the predefined combinators. This should be within reach of the average Haskell programmer who reads the source code of Blaze.ByteString.Builder.Internal and the other modules of this library.

The guiding implementation principle was to reduce the abstraction cost per output byte. We use continuation passing to achieve a constant time append. The output buffer is filled by the individual builders as long as possible. They call each other directly when they are done and control is returned to the driver (e.g., toLazyByteString) only when the buffer is full, a bytestring needs to be inserted directly, or no more bytes can be written. We also try to take the pressure off the cache by moving variables as far out of loops as possible. This leads to some duplication of code, but results in sometimes dramatic increases in performance. For example, see the fromWord8s function in Blaze.ByteString.Builder.Word.
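The idea behind the constant-time append can be illustrated with a deliberately simplified model. The sketch below is not the representation used by this library (which fills buffers rather than building lists); ToyBuilder, toyWord8 and runToy are hypothetical names introduced only to show why composing continuations makes append O(1).

```haskell
import Data.Word (Word8)

-- Toy model: a builder is a function that prepends its bytes to
-- whatever output comes after it.
newtype ToyBuilder = ToyBuilder ([Word8] -> [Word8])

instance Semigroup ToyBuilder where
  -- Append is function composition, so it takes constant time
  -- regardless of how much output either side denotes.
  ToyBuilder f <> ToyBuilder g = ToyBuilder (f . g)

instance Monoid ToyBuilder where
  mempty = ToyBuilder id

-- A builder denoting a single byte.
toyWord8 :: Word8 -> ToyBuilder
toyWord8 w = ToyBuilder (w :)

-- Run the builder by supplying the empty rest of the output.
runToy :: ToyBuilder -> [Word8]
runToy (ToyBuilder f) = f []
```

In this model the work of producing bytes is deferred until runToy is called, which mirrors how control stays with the builders until the driver needs to act.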

Creating builders

flush :: Builder

Output all data written in the current buffer and start a new chunk.

The use of this function depends on how the resulting bytestrings are consumed. flush is possibly not very useful in non-interactive scenarios. However, it is kept for compatibility with the builder provided by Data.Binary.Builder.

When using toLazyByteString to extract a lazy ByteString from a Builder, this means that a new chunk will be started in the resulting lazy ByteString. The remaining part of the buffer is spilled if the remaining free space is smaller than the minimal desired buffer size.
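A minimal sketch of flush in use follows. The two messages and the names twoMessages and messageChunks are assumptions made for illustration. The bytes denoted by the builder are unaffected by flush; only the chunk boundaries of the result may change, so the chunk list is best inspected rather than assumed.

```haskell
import Blaze.ByteString.Builder (Builder, flush, toLazyByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L

-- Two hypothetical protocol messages separated by an explicit flush.
twoMessages :: Builder
twoMessages = mconcat [fromString "PING\n", flush, fromString "PONG\n"]

-- The strict chunks of the result; flush asks for a chunk boundary
-- between the two messages.
messageChunks :: [S.ByteString]
messageChunks = L.toChunks (toLazyByteString twoMessages)
```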

Executing builders

toLazyByteString :: Builder -> ByteString

Extract the lazy ByteString from the builder by running it with default buffer sizes. Use this function if you do not have any special considerations with respect to buffer sizes.

 toLazyByteString b = toLazyByteStringWith defaultBufferSize defaultMinimalBufferSize defaultFirstBufferSize b L.empty

Note that toLazyByteString is a Monoid homomorphism.

 toLazyByteString mempty          == mempty
 toLazyByteString (x `mappend` y) == toLazyByteString x `mappend` toLazyByteString y

However, in the second equation, the left-hand side is generally faster to execute.
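The second equation can be checked for concrete builders with a small predicate. This is only a sketch: homomorphismHolds and sample are names introduced here, and the check compares the denoted bytes, not the chunk structure or the running time.

```haskell
import Blaze.ByteString.Builder (Builder, toLazyByteString)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)

-- True when both sides of the second equation denote the same bytes.
-- Equality on lazy bytestrings compares contents, not chunk boundaries.
homomorphismHolds :: Builder -> Builder -> Bool
homomorphismHolds x y =
    toLazyByteString (x `mappend` y)
      == toLazyByteString x `mappend` toLazyByteString y

-- One concrete instance of the law.
sample :: Bool
sample = homomorphismHolds (fromString "foo") (fromString "bar")
```

In practice this means builders should be appended first and run once, rather than run separately and concatenated afterwards.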

toLazyByteStringWith

Arguments

:: Int

Buffer size (upper-bounds the resulting chunk size).

-> Int

Minimal free buffer space required to continue filling the same buffer after a flush or a direct bytestring insertion. This corresponds to the minimal desired chunk size.

-> Int

Size of the first buffer, which is used initially and copied into a larger buffer if the resulting sequence outgrows it.

-> Builder

Builder to run.

-> ByteString

Lazy bytestring to output after the builder is finished.

-> ByteString

Resulting lazy bytestring

Run a Builder with the given buffer sizes.

Use this function for integrating the Builder type with other libraries that generate lazy bytestrings.

Note that the builders should guarantee that the desired chunk size is attained on average. Builders may decide to start a new buffer and not completely fill the existing buffer if this is faster. However, they should not spill too much of the buffer if they cannot compensate for it.

A call toLazyByteStringWith bufSize minBufSize firstBufSize will generate a lazy bytestring according to the following strategy. First, we allocate a buffer of size firstBufSize and start filling it. If it overflows, we allocate a buffer of size minBufSize and copy the first buffer to it in order to avoid generating a chunk that is too small. Finally, every subsequent buffer will be of size bufSize. This slow-startup strategy is required to achieve good speed for short (<200 bytes) resulting bytestrings, as for them the allocation cost of a large buffer cannot be compensated. Moreover, this strategy also allows us to avoid spilling too much memory for short resulting bytestrings.

Note that setting firstBufSize >= minBufSize implies that the first buffer is no longer copied but allocated and filled directly. Hence, setting firstBufSize = bufSize means that all chunks will use an underlying buffer of size bufSize. This is recommended if you know that you always output more than minBufSize bytes.
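The recommendation for large outputs can be sketched as follows. The concrete sizes and the name runLarge are assumptions chosen for illustration, not defaults of the library; the argument order follows the signature documented above.

```haskell
import Blaze.ByteString.Builder (Builder, toLazyByteStringWith)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import qualified Data.ByteString.Lazy as L

-- Hypothetical sizes, chosen for illustration only.
bufSize, minBufSize :: Int
bufSize    = 64 * 1024
minBufSize = 4 * 1024

-- Run a builder whose output is known to be large. Passing bufSize as
-- the first-buffer size skips the slow startup, so every chunk is
-- backed by a buffer of size bufSize. The second argument of runLarge
-- is the lazy bytestring to output after the builder is finished.
runLarge :: Builder -> L.ByteString -> L.ByteString
runLarge b rest = toLazyByteStringWith bufSize minBufSize bufSize b rest

-- Usage sketch: 3000 bytes of payload followed by a single zero byte.
largeExample :: L.ByteString
largeExample = runLarge (mconcat (replicate 1000 (fromString "abc"))) (L.pack [0])
```

The trailing lazy bytestring is what makes this function convenient for integrating with other code that already produces lazy bytestrings.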

toByteString :: Builder -> ByteString

Run the builder to construct a strict bytestring containing the sequence of bytes denoted by the builder. This is done by first serializing to a lazy bytestring and then packing its chunks into an appropriately sized strict bytestring.

 toByteString = packChunks . toLazyByteString

Note that toByteString is a Monoid homomorphism.

 toByteString mempty          == mempty
 toByteString (x `mappend` y) == toByteString x `mappend` toByteString y

However, in the second equation, the left-hand side is generally faster to execute.

toByteStringIO :: (ByteString -> IO ()) -> Builder -> IO ()

Run the builder with a buffer of size defaultBufferSize and execute the given IO action whenever the buffer is full or gets flushed.

 toByteStringIO = toByteStringIOWith defaultBufferSize

This is a Monoid homomorphism in the following sense.

 toByteStringIO io mempty          == return ()
 toByteStringIO io (x `mappend` y) == toByteStringIO io x >> toByteStringIO io y

toByteStringIOWith

Arguments

:: Int

Buffer size (upper-bounds the number of bytes forced per call to the IO action).

-> (ByteString -> IO ())

IO action to execute per full buffer, which is referenced by a strict ByteString.

-> Builder

Builder to run.

-> IO ()

Resulting IO action.

toByteStringIOWith bufSize io b runs the builder b with a buffer of at least the size bufSize and executes the IO action io whenever the buffer is full.

Compared to toLazyByteStringWith, this function requires less allocation, as the output buffer is only allocated once at the start of the serialization and whenever something bigger than the current buffer size has to be copied into the buffer, which should happen very rarely for the default buffer size of 32kb. Hence, the pressure on the garbage collector is reduced, which can be an advantage when building long sequences of bytes.
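The sketch below shows one way to drive toByteStringIOWith, collecting the pieces handed to the IO action. The names collectChunks and roundTrips and the buffer size of 64 are assumptions for illustration. Because the text above says the output buffer is allocated once, the sketch assumes the strict ByteString passed to the action may refer to a buffer that is reused, and therefore copies each piece before keeping it.

```haskell
import Blaze.ByteString.Builder (Builder, toByteString, toByteStringIOWith)
import Blaze.ByteString.Builder.Char.Utf8 (fromString)
import qualified Data.ByteString as S
import Data.IORef (modifyIORef, newIORef, readIORef)

-- Hypothetical helper: run a builder and collect every piece passed
-- to the IO action, in order.
collectChunks :: Int -> Builder -> IO [S.ByteString]
collectChunks bufSize b = do
    ref <- newIORef []
    let record chunk = do
          -- Copy eagerly: the chunk is assumed to alias a reused buffer.
          let copied = S.copy chunk
          copied `seq` modifyIORef ref (copied :)
    toByteStringIOWith bufSize record b
    fmap reverse (readIORef ref)

-- Usage sketch: the collected pieces concatenate to the full output.
roundTrips :: IO Bool
roundTrips = do
    let b = mconcat (replicate 100 (fromString "abc"))
    chunks <- collectChunks 64 b
    return (S.concat chunks == toByteString b)
```

In a real program the action would more typically write each piece to a handle or socket right away, in which case no copy is needed.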

Compatibility with Data.Binary.Builder from the binary package

The following functions ensure that Blaze.ByteString.Builder is a drop-in replacement for Data.Binary.Builder from the binary package. Note that these functions are deprecated and may be removed in future versions of the blaze-builder package.

empty :: Builder

O(1). An empty builder.

Deprecated: use mempty instead.

singleton :: Word8 -> Builder

O(1). Serialize a single byte.

Deprecated: use fromWord8 instead.

append :: Builder -> Builder -> Builder

O(1). Append two builders.

Deprecated: use mappend instead.

putWord16be :: Word16 -> Builder

O(1). Serialize a Word16 in big endian format.

Deprecated: use fromWord16be instead.

putWord32be :: Word32 -> Builder

O(1). Serialize a Word32 in big endian format.

Deprecated: use fromWord32be instead.

putWord64be :: Word64 -> Builder

O(1). Serialize a Word64 in big endian format.

Deprecated: use fromWord64be instead.

putWord16le :: Word16 -> Builder

O(1). Serialize a Word16 in little endian format.

Deprecated: use fromWord16le instead.

putWord32le :: Word32 -> Builder

O(1). Serialize a Word32 in little endian format.

Deprecated: use fromWord32le instead.

putWord64le :: Word64 -> Builder

O(1). Serialize a Word64 in little endian format.

Deprecated: use fromWord64le instead.

putWordhost :: Word -> Builder

O(1). Serialize a Word in host endian format.

Deprecated: use fromWordhost instead.

putWord16host :: Word16 -> Builder

O(1). Serialize a Word16 in host endian format.

Deprecated: use fromWord16host instead.

putWord32host :: Word32 -> Builder

O(1). Serialize a Word32 in host endian format.

Deprecated: use fromWord32host instead.

putWord64host :: Word64 -> Builder

O(1). Serialize a Word64 in host endian format.

Deprecated: use fromWord64host instead.
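Migrating away from the deprecated names is a matter of substituting the replacements listed above. The sketch below writes a small header using only the non-deprecated functions; the header layout and the names header and headerBytes are assumptions made for this example.

```haskell
import Blaze.ByteString.Builder (Builder, toByteString)
import Blaze.ByteString.Builder.Word (fromWord16be, fromWord32le, fromWord8)
import qualified Data.ByteString as S

-- Hypothetical header: a version byte, a big-endian tag and a
-- little-endian length, combined with mconcat instead of append.
header :: Builder
header = mconcat
    [ fromWord8 1               -- was: singleton 1
    , fromWord16be 0x1234       -- was: putWord16be 0x1234
    , fromWord32le 0x12345678   -- was: putWord32le 0x12345678
    ]

-- The seven bytes denoted by the header.
headerBytes :: S.ByteString
headerBytes = toByteString header
```

S.unpack headerBytes yields the version byte, then the tag with its most significant byte first, then the length with its least significant byte first.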