lz4-frame-conduit-0.1.0.2: Conduit implementing the official LZ4 frame streaming format
Copyright: (c) Niklas Hambüchen 2020
License: MIT
Maintainer: mail@nh2.me
Stability: stable
Safe Haskell: Safe-Inferred
Language: Haskell2010

Codec.Compression.LZ4.Conduit

Description

Help Wanted / TODOs

Please feel free to send me a pull request for any of the following items:

  • TODO Block checksumming
  • TODO Dictionary support
  • TODO Performance: Write a version of compress that emits ByteStrings of known constant length. That will allow us to do compression in a zero-copy fashion, writing compressed bytes directly into the ByteStrings (e.g. using unsafePackMallocCString or equivalent). We currently don't do that (instead, we use allocaBytes + a copying packCStringLen) to ensure that the generated ByteStrings are as compact as possible (for the case that `written < size`), since the current compress conduit directly yields the outputs of LZ4F_compressUpdate() (unless they are of 0 length, which happens when the input is buffered in the context's tmp buffer).
  • TODO Try enabling checksums, then corrupt a bit and see if lz4c detects it.
  • TODO Add `with*` style bracketed functions for creating the LZ4F_createCompressionContext and Lz4FramePreferencesPtr for prompt resource release, in addition to the GC'd variants below. This would replace our use of finalizeForeignPtr in the conduit. finalizeForeignPtr seems almost as good, but note that it doesn't guarantee prompt resource release on exceptions; a `with*` style function that uses bracket does. However, it isn't clear yet which one would be faster (what the cost of mask is compared to foreign pointer finalizers). Also note that prompt freeing has side benefits, such as reduced malloc() fragmentation: the closer malloc() and free() are to each other, the smaller the chance of having other malloc()s on top of our malloc() in the heap, and thus the smaller the chance that we cannot decrease the heap pointer upon free() (because "mallocs on top" render heap memory unreturnable to the OS, i.e. memory fragmentation).
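As a rough illustration of the `with*` idea from the last item, and of the "copy only what was written" approach from the performance item, here is a minimal sketch that uses only base and bytestring. The names DemoBuffer, withDemoBuffer and writeAndCopyOut are hypothetical and are not part of this package; the malloc()ed buffer merely stands in for a natively allocated resource such as a compression context.

```haskell
import Control.Exception (bracket)
import qualified Data.ByteString as BS
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes)
import Foreign.Marshal.Array (pokeArray)
import Foreign.Ptr (Ptr, castPtr)

-- Hypothetical stand-in for a natively allocated resource such as a
-- compression context; here it is just a malloc()ed byte buffer.
newtype DemoBuffer = DemoBuffer (Ptr Word8)

-- Acquire the buffer, run the action, and free the buffer promptly,
-- even if the action throws (bracket masks async exceptions around
-- acquisition and release).
withDemoBuffer :: Int -> (DemoBuffer -> IO a) -> IO a
withDemoBuffer size action =
  bracket (mallocBytes size) free (action . DemoBuffer)

-- Write some bytes into a 64-byte buffer and copy out only the bytes
-- actually written, so the resulting ByteString is exactly as long as
-- its contents (the `written < size` case).
writeAndCopyOut :: [Word8] -> IO BS.ByteString
writeAndCopyOut bytes =
  withDemoBuffer 64 $ \(DemoBuffer ptr) -> do
    let written = min 64 (length bytes)
    pokeArray ptr (take written bytes)
    BS.packCStringLen (castPtr ptr, written)
```

Calling `writeAndCopyOut [1, 2, 3]` allocates the buffer, copies three bytes out into a fresh ByteString, and frees the buffer before returning. Whether this style is faster than foreign pointer finalizers is exactly the open question of the TODO; the sketch only shows the shape of the API.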
Documentation

data ContentChecksum

data FrameInfo

Constructors

FrameInfo 

Fields

compress :: (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m ()

Note [Single call to LZ4F_compressUpdate() can create multiple blocks]

A single call to LZ4F_compressUpdate() can create multiple blocks, and handles buffers whose sizes do not fit in 32 bits; see: https://github.com/lz4/lz4/blob/52cac9a97342641315c76cfb861206d6acd631a8/lib/lz4frame.c#L601

So we don't need to loop around LZ4F_compressUpdate() to compress an arbitrarily large amount of input data, as long as the destination buffer is large enough.

compressYieldImmediately :: (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m ()

Compresses the incoming stream of ByteStrings with the lz4 frame format.

Yields every LZ4 output as a ByteString as soon as the lz4 frame library produces it.

Note that this does not imply LZ4 frame autoFlush (which affects when the lz4 frame library produces outputs).

compressWithOutBufferSize :: forall m. (MonadUnliftIO m, MonadResource m) => CSize -> ConduitT ByteString ByteString m ()

Compresses the incoming stream of ByteStrings with the lz4 frame format.

This function implements two optimisations to reduce unnecessary allocations:

  • Incoming ByteStrings are processed in blocks of 16 KB, allowing us to use a single intermediate output buffer throughout the lifetime of the conduit.
  • The size of the output buffer can be controlled by the caller via the bufferSize argument, to reduce the number of small ByteStrings being yielded (especially in the case that the input data compresses very well, e.g. a stream of zeros).

Note that the given bufferSize is not a hard limit; it can only be used to *increase* the amount of output buffer we're allowed to use. The function will choose `max(bufferSize, minBufferSizeNeededByLz4)` as the eventual output buffer size.

Setting `bufferSize = 0` is a legitimate way to request the minimum output buffer size required to compress 16 KB inputs, and is still a fast default.
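The 16 KB input blocking and the `max(bufferSize, minBufferSizeNeededByLz4)` rule described above can be sketched in pure Haskell as follows. This is an illustration under assumptions, not the package's implementation: inputBlockSize, chunkInput and chooseOutBufferSize are hypothetical names, and the minimum size needed by lz4 is taken as a plain argument rather than queried from the lz4 frame library.

```haskell
import qualified Data.ByteString as BS
import Foreign.C.Types (CSize)

-- The 16 KB processing block size mentioned in the text.
inputBlockSize :: Int
inputBlockSize = 16 * 1024

-- Split an incoming ByteString into blocks of at most 16 KB.
-- BS.splitAt does not copy; the blocks share the original buffer.
chunkInput :: BS.ByteString -> [BS.ByteString]
chunkInput bs
  | BS.null bs = []
  | otherwise =
      let (block, rest) = BS.splitAt inputBlockSize bs
      in block : chunkInput rest

-- The caller-supplied size can only increase the output buffer size:
-- it never goes below the minimum that lz4 needs (passed in here as
-- an assumption, since this sketch does not call into lz4).
chooseOutBufferSize :: CSize -> CSize -> CSize
chooseOutBufferSize requested minNeededByLz4 = max requested minNeededByLz4
```

With this rule, `chooseOutBufferSize 0 minNeeded` always evaluates to `minNeeded`, which is why passing 0 selects the smallest workable buffer.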

decompress :: forall m. (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m ()

TODO check why decompressSizeHint is always 4

Internals

newtype ScopedLz4FrameCompressionContext