Copyright | (c) Niklas Hambüchen 2020 |
---|---|
License | MIT |
Maintainer | mail@nh2.me |
Stability | stable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Codec.Compression.LZ4.Conduit
Contents
Description
Help Wanted / TODOs
Please feel free to send me a pull request for any of the following items:
- TODO Block checksumming
- TODO Dictionary support
- TODO Performance:
Write a version of
compress
that emits ByteStrings of known constant length. That will allow us to do compression in a zero-copy fashion, writing compressed bytes directly into a the ByteStrings (e.g usingunsafePackMallocCString
or equivalent). We currently don't do that (instead, use allocaBytes + copying packCStringLen) to ensure that the ByteStrings generated are as compact as possible (for the case that `written < size`), since the currentcompress
conduit directly yields the outputs of LZ4F_compressUpdate() (unless they are of 0 length when they are buffered in the context tmp buffer). - TODO Try enabling checksums, then corrupt a bit and see if lz4c detects it.
- TODO Add `with*` style bracketed functions for creating the
LZ4F_createCompressionContext and Lz4FramePreferencesPtr
for prompt resource release,
in addition to the GC'd variants below.
This would replace our use of
finalizeForeignPtr
in the conduit.finalizeForeignPtr
seems almost as good, but note that it doesn't guarantee prompt resource release on exceptions; a `with*` style function that usesbracket
does. However, it isn't clear yet which one would be faster (what the cost ofmask
is compared to foreign pointer finalizers). Also note that prompt freeing has side benefits, such as reduced malloc() fragmentation (the closer malloc() and free() are to each other, the smaller is the chance to have malloc()s on top of the our malloc() in the heap, thus the smaller the chance that we cannot decrease the heap pointer upon free() (because "mallocs on top" render heap memory unreturnable to the OS; memory fragmentation).
Synopsis
- data Lz4FrameException = Lz4FormatException String
- data BlockSizeID
- data BlockMode
- data ContentChecksum
- data BlockChecksum
- data FrameType
- data FrameInfo = FrameInfo {}
- data Preferences = Preferences {}
- lz4DefaultPreferences :: Preferences
- compress :: (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m ()
- compressYieldImmediately :: (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m ()
- compressWithOutBufferSize :: forall m. (MonadUnliftIO m, MonadResource m) => CSize -> ConduitT ByteString ByteString m ()
- decompress :: forall m. (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m ()
- bsChunksOf :: Int -> ByteString -> [ByteString]
- newtype Lz4FrameCompressionContext = Lz4FrameCompressionContext {}
- newtype ScopedLz4FrameCompressionContext = ScopedLz4FrameCompressionContext {}
- newtype ScopedLz4FramePreferencesPtr = ScopedLz4FramePreferencesPtr {}
- newtype Lz4FramePreferencesPtr = Lz4FramePreferencesPtr {}
- newtype Lz4FrameDecompressionContext = Lz4FrameDecompressionContext {}
- lz4fCreatePreferences :: IO Lz4FramePreferencesPtr
- lz4fCreateCompressonContext :: HasCallStack => IO Lz4FrameCompressionContext
- lz4fCreateDecompressionContext :: HasCallStack => IO Lz4FrameDecompressionContext
- withScopedLz4fPreferences :: HasCallStack => (ScopedLz4FramePreferencesPtr -> IO a) -> IO a
- withScopedLz4fCompressionContext :: HasCallStack => (ScopedLz4FrameCompressionContext -> IO a) -> IO a
Documentation
data Lz4FrameException Source #
Constructors
Lz4FormatException String |
Instances
data BlockSizeID Source #
Constructors
LZ4F_default | |
LZ4F_max64KB | |
LZ4F_max256KB | |
LZ4F_max1MB | |
LZ4F_max4MB |
Instances
Constructors
LZ4F_blockLinked | |
LZ4F_blockIndependent |
Instances
Storable BlockMode Source # | |
Defined in Codec.Compression.LZ4.CTypes | |
Show BlockMode Source # | |
Eq BlockMode Source # | |
Ord BlockMode Source # | |
data ContentChecksum Source #
Constructors
LZ4F_noContentChecksum | |
LZ4F_contentChecksumEnabled |
Instances
data BlockChecksum Source #
Constructors
LZ4F_noBlockChecksum | |
LZ4F_blockChecksumEnabled |
Instances
Constructors
LZ4F_frame | |
LZ4F_skippableFrame |
Instances
Storable FrameType Source # | |
Defined in Codec.Compression.LZ4.CTypes | |
Show FrameType Source # | |
Eq FrameType Source # | |
Ord FrameType Source # | |
Constructors
FrameInfo | |
Fields
|
Instances
Storable FrameInfo Source # | |
Defined in Codec.Compression.LZ4.CTypes |
data Preferences Source #
Constructors
Preferences | |
Fields
|
Instances
Storable Preferences Source # | |
Defined in Codec.Compression.LZ4.CTypes Methods sizeOf :: Preferences -> Int # alignment :: Preferences -> Int # peekElemOff :: Ptr Preferences -> Int -> IO Preferences # pokeElemOff :: Ptr Preferences -> Int -> Preferences -> IO () # peekByteOff :: Ptr b -> Int -> IO Preferences # pokeByteOff :: Ptr b -> Int -> Preferences -> IO () # peek :: Ptr Preferences -> IO Preferences # poke :: Ptr Preferences -> Preferences -> IO () # |
compress :: (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m () Source #
Note [Single call to LZ4F_compressUpdate() can create multiple blocks] A single call to LZ4F_compressUpdate() can create multiple blocks, and handles buffers > 32-bit sizes; see: https://github.com/lz4/lz4/blob/52cac9a97342641315c76cfb861206d6acd631a8/lib/lz4frame.c#L601 So we don't need to loop around LZ4F_compressUpdate() to compress an arbitrarily large amount of input data, as long as the destination buffer is large enough.
compressYieldImmediately :: (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m () Source #
Compresses the incoming stream of ByteStrings with the lz4 frame format.
Yields every LZ4 output as a ByteString as soon as the lz4 frame library produces it.
Note that this does not imply ZL4 frame autoFlush (which affects when the lz4 frame library produces outputs).
compressWithOutBufferSize :: forall m. (MonadUnliftIO m, MonadResource m) => CSize -> ConduitT ByteString ByteString m () Source #
Compresses the incoming stream of ByteStrings with the lz4 frame format.
This function implements two optimisations to reduce unnecessary allocations:
- Incoming ByteStrings are processed in blocks of 16 KB, allowing us to use a single intermediate output buffer through the lifetime of the conduit.
- The
bufferSize
of the output buffer can controlled by the caller via thebufferSize
argument, to reduce the number of smallByteString
s beingyield
ed (especially in the case that the input data compresses very well, e.g. a stream of zeros).
Note that the given bufferSize
is not a hard limit, it can only be
used to *increase* the amount of output buffer we're allowed to use:
The function will choose `max(bufferSize, minBufferSizeNeededByLz4)`
as the eventual output buffer size.
Setting `bufferSize = 0` is the legitimate way to set the output buffer size to be the minimum required to compress 16 KB inputs and is still a fast default.
decompress :: forall m. (MonadUnliftIO m, MonadResource m) => ConduitT ByteString ByteString m () Source #
TODO check why decompressSizeHint is always 4
bsChunksOf :: Int -> ByteString -> [ByteString] Source #
Internals
newtype Lz4FrameCompressionContext Source #
Constructors
Lz4FrameCompressionContext | |
Fields |
Instances
newtype ScopedLz4FrameCompressionContext Source #
Constructors
ScopedLz4FrameCompressionContext | |
Instances
newtype ScopedLz4FramePreferencesPtr Source #
Constructors
ScopedLz4FramePreferencesPtr | |
Fields |
Instances
newtype Lz4FramePreferencesPtr Source #
Constructors
Lz4FramePreferencesPtr | |
Instances
newtype Lz4FrameDecompressionContext Source #
Constructors
Lz4FrameDecompressionContext | |
Fields |
Instances
lz4fCreateDecompressionContext :: HasCallStack => IO Lz4FrameDecompressionContext Source #
All notes that apply to lz4fCreateCompressonContext
apply here as well.
withScopedLz4fPreferences :: HasCallStack => (ScopedLz4FramePreferencesPtr -> IO a) -> IO a Source #
withScopedLz4fCompressionContext :: HasCallStack => (ScopedLz4FrameCompressionContext -> IO a) -> IO a Source #