zlib-0.6.1.2: Compression and decompression in the gzip and zlib formats

Codec.Compression.Zlib.Internal

Description

Pure and IO stream based interfaces to lower level zlib wrapper

Synopsis

# Pure interface

Compress a data stream provided as a lazy ByteString.

There are no expected error conditions. All input data streams are valid. It is possible for unexpected errors to occur, such as running out of memory, or finding the wrong version of the zlib C library, these are thrown as exceptions.

Decompress a data stream provided as a lazy ByteString.

It will throw an exception if any error is encountered in the input data. If you need more control over error handling then use one the incremental versions, decompressST or decompressIO.

# Monadic incremental interface

The pure compress and decompress functions are streaming in the sense that they can produce output without demanding all input, however they need the input data stream as a lazy ByteString. Having the input data stream as a lazy ByteString often requires using lazy I/O which is not appropriate in all cicumstances.

For these cases an incremental interface is more appropriate. This interface allows both incremental input and output. Chunks of input data are supplied one by one (e.g. as they are obtained from an input source like a file or network source). Output is also produced chunk by chunk.

The incremental input and output is managed via the CompressStream and DecompressStream types. They represents the unfolding of the process of compressing and decompressing. They operates in either the ST or IO monads. They can be lifted into other incremental abstractions like pipes or conduits, or they can be used directly in the following style.

## Using incremental compression

In a loop:

• Inspect the status of the stream
• When it is CompressInputRequired then you should call the action, passing a chunk of input (or empty when no more input is available) to get the next state of the stream and continue the loop.
• When it is CompressOutputAvailable then do something with the given chunk of output, and call the action to get the next state of the stream and continue the loop.
• When it is CompressStreamEnd then terminate the loop.

Note that you cannot stop as soon as you have no more input, you need to carry on until all the output has been collected, i.e. until you get to CompressStreamEnd.

Here is an example where we get input from one file handle and send the compressed output to another file handle.

go :: Handle -> Handle -> CompressStream IO -> IO ()
go inh outh (CompressInputRequired next) = do
inchunk <- BS.hGet inh 4096
go inh outh =<< next inchunk
go inh outh (CompressOutputAvailable outchunk next) =
BS.hPut outh outchunk
go inh outh =<< next
go _ _ CompressStreamEnd = return ()

The same can be achieved with foldCompressStream:

foldCompressStream
(\next -> do inchunk <- BS.hGet inh 4096; next inchunk)
(\outchunk next -> do BS.hPut outh outchunk; next)
(return ())

The unfolding of the compression process, where you provide a sequence of uncompressed data chunks as input and receive a sequence of compressed data chunks as output. The process is incremental, in that the demand for input and provision of output are interleaved.

Constructors

 CompressInputRequired FieldscompressSupplyInput :: ByteString -> m (CompressStream m) CompressOutputAvailable FieldscompressOutput :: !ByteString compressNext :: m (CompressStream m) CompressStreamEnd

Incremental compression in the ST monad. Using ST makes it possible to write pure lazy functions while making use of incremental compression.

Incremental compression in the IO monad.

foldCompressStream :: Monad m => ((ByteString -> m a) -> m a) -> (ByteString -> m a -> m a) -> m a -> CompressStream m -> m a Source #

A fold over the CompressStream in the given monad.

One way to look at this is that it runs the stream, using callback functions for the three stream events.

foldCompressStreamWithInput :: (ByteString -> a -> a) -> a -> (forall s. CompressStream (ST s)) -> ByteString -> a Source #

A variant on foldCompressStream that is pure rather than operating in a monad and where the input is provided by a lazy ByteString. So we only have to deal with the output and end parts, making it just like a foldr on a list of output chunks.

For example:

toChunks = foldCompressStreamWithInput (:) []

## Using incremental decompression

The use of DecompressStream is very similar to CompressStream but with a few differences:

• There is the extra possibility of a DecompressStreamError
• There can be extra trailing data after a compressed stream, and the DecompressStreamEnd includes that.

Otherwise the same loop style applies, and there are fold functions.

The unfolding of the decompression process, where you provide a sequence of compressed data chunks as input and receive a sequence of uncompressed data chunks as output. The process is incremental, in that the demand for input and provision of output are interleaved.

To indicate the end of the input supply an empty input chunk. Note that for gzipFormat with the default decompressAllMembers True you will have to do this, as the decompressor will look for any following members. With decompressAllMembers False the decompressor knows when the data ends and will produce DecompressStreamEnd without you having to supply an empty chunk to indicate the end of the input.

Constructors

 DecompressInputRequired FieldsdecompressSupplyInput :: ByteString -> m (DecompressStream m) DecompressOutputAvailable FieldsdecompressOutput :: !ByteString decompressNext :: m (DecompressStream m) DecompressStreamEnd Includes any trailing unconsumed input data. FieldsdecompressUnconsumedInput :: ByteString DecompressStreamError An error code FieldsdecompressStreamError :: DecompressError

The possible error cases when decompressing a stream.

This can be shown to give a human readable error message.

Constructors

 TruncatedInput The compressed data stream ended prematurely. This may happen if the input data stream was truncated. DictionaryRequired It is possible to do zlib compression with a custom dictionary. This allows slightly higher compression ratios for short files. However such compressed streams require the same dictionary when decompressing. This error is for when we encounter a compressed stream that needs a dictionary, and it's not provided. DictionaryMismatch If the stream requires a dictionary and you provide one with the wrong DictionaryHash then you will get this error. DataFormatError String If the compressed data stream is corrupted in any way then you will get this error, for example if the input data just isn't a compressed zlib data stream. In particular if the data checksum turns out to be wrong then you will get all the decompressed data but this error at the end, instead of the normal sucessful StreamEnd.

Instances

 Source # Methods Source # MethodsshowList :: [DecompressError] -> ShowS # Source # Methods

Incremental decompression in the ST monad. Using ST makes it possible to write pure lazy functions while making use of incremental decompression.

Incremental decompression in the IO monad.

foldDecompressStream :: Monad m => ((ByteString -> m a) -> m a) -> (ByteString -> m a -> m a) -> (ByteString -> m a) -> (DecompressError -> m a) -> DecompressStream m -> m a Source #

A fold over the DecompressStream in the given monad.

One way to look at this is that it runs the stream, using callback functions for the four stream events.

foldDecompressStreamWithInput :: (ByteString -> a -> a) -> (ByteString -> a) -> (DecompressError -> a) -> (forall s. DecompressStream (ST s)) -> ByteString -> a Source #

A variant on foldCompressStream that is pure rather than operating in a monad and where the input is provided by a lazy ByteString. So we only have to deal with the output, end and error parts, making it like a foldr on a list of output chunks.

For example:

toChunks = foldDecompressStreamWithInput (:) [] throw

# The compression parameter types

The full set of parameters for compression. The defaults are defaultCompressParams.

The compressBufferSize is the size of the first output buffer containing the compressed data. If you know an approximate upper bound on the size of the compressed data then setting this parameter can save memory. The default compression output buffer size is 16k. If your extimate is wrong it does not matter too much, the default buffer size will be used for the remaining chunks.

Constructors

 CompressParams

Instances

 Source # MethodsshowList :: [CompressParams] -> ShowS #

The default set of parameters for compression. This is typically used with the compressWith function with specific parameters overridden.

The full set of parameters for decompression. The defaults are defaultDecompressParams.

The decompressBufferSize is the size of the first output buffer, containing the uncompressed data. If you know an exact or approximate upper bound on the size of the decompressed data then setting this parameter can save memory. The default decompression output buffer size is 32k. If your extimate is wrong it does not matter too much, the default buffer size will be used for the remaining chunks.

One particular use case for setting the decompressBufferSize is if you know the exact size of the decompressed data and want to produce a strict ByteString. The compression and deccompression functions use lazy ByteStrings but if you set the decompressBufferSize correctly then you can generate a lazy ByteString with exactly one chunk, which can be converted to a strict ByteString in O(1) time using concat . toChunks.

Constructors

 DecompressParams Fields

Instances

 Source # MethodsshowList :: [DecompressParams] -> ShowS #

The default set of parameters for decompression. This is typically used with the compressWith function with specific parameters overridden.

data Format Source #

The format used for compression or decompression. There are three variations.

Constructors

 GZip Deprecated: Use gzipFormat. Format constructors will be hidden in version 0.7 Zlib Deprecated: Use zlibFormat. Format constructors will be hidden in version 0.7 Raw Deprecated: Use rawFormat. Format constructors will be hidden in version 0.7 GZipOrZlib Deprecated: Use gzipOrZlibFormat. Format constructors will be hidden in version 0.7

Instances

 Source # Methods Source # MethodstoEnum :: Int -> Format #enumFrom :: Format -> [Format] #enumFromThen :: Format -> Format -> [Format] #enumFromTo :: Format -> Format -> [Format] #enumFromThenTo :: Format -> Format -> Format -> [Format] # Source # Methods(==) :: Format -> Format -> Bool #(/=) :: Format -> Format -> Bool # Source # Methods(<) :: Format -> Format -> Bool #(<=) :: Format -> Format -> Bool #(>) :: Format -> Format -> Bool #(>=) :: Format -> Format -> Bool #max :: Format -> Format -> Format #min :: Format -> Format -> Format # Source # MethodsshowsPrec :: Int -> Format -> ShowS #showList :: [Format] -> ShowS # Source # Associated Typestype Rep Format :: * -> * # Methodsfrom :: Format -> Rep Format x #to :: Rep Format x -> Format # type Rep Format Source # type Rep Format = D1 (MetaData "Format" "Codec.Compression.Zlib.Stream" "zlib-0.6.1.2-3sjfdoEPlrt37YiyVABDl7" False) ((:+:) ((:+:) (C1 (MetaCons "GZip" PrefixI False) U1) (C1 (MetaCons "Zlib" PrefixI False) U1)) ((:+:) (C1 (MetaCons "Raw" PrefixI False) U1) (C1 (MetaCons "GZipOrZlib" PrefixI False) U1)))

The gzip format uses a header with a checksum and some optional meta-data about the compressed file. It is intended primarily for compressing individual files but is also sometimes used for network protocols such as HTTP. The format is described in detail in RFC #1952 http://www.ietf.org/rfc/rfc1952.txt

The zlib format uses a minimal header with a checksum but no other meta-data. It is especially designed for use in network protocols. The format is described in detail in RFC #1950 http://www.ietf.org/rfc/rfc1950.txt

The 'raw' format is just the compressed data stream without any additional header, meta-data or data-integrity checksum. The format is described in detail in RFC #1951 http://www.ietf.org/rfc/rfc1951.txt

This is not a format as such. It enabled zlib or gzip decoding with automatic header detection. This only makes sense for decompression.

The compression level parameter controls the amount of compression. This is a trade-off between the amount of compression and the time required to do the compression.

Constructors

 DefaultCompression Deprecated: Use defaultCompression. CompressionLevel constructors will be hidden in version 0.7 NoCompression Deprecated: Use noCompression. CompressionLevel constructors will be hidden in version 0.7 BestSpeed Deprecated: Use bestSpeed. CompressionLevel constructors will be hidden in version 0.7 BestCompression Deprecated: Use bestCompression. CompressionLevel constructors will be hidden in version 0.7 CompressionLevel Int

Instances

 Source # Methods Source # MethodsshowList :: [CompressionLevel] -> ShowS # Source # Associated Typestype Rep CompressionLevel :: * -> * # Methods Source # type Rep CompressionLevel = D1 (MetaData "CompressionLevel" "Codec.Compression.Zlib.Stream" "zlib-0.6.1.2-3sjfdoEPlrt37YiyVABDl7" False) ((:+:) ((:+:) (C1 (MetaCons "DefaultCompression" PrefixI False) U1) (C1 (MetaCons "NoCompression" PrefixI False) U1)) ((:+:) (C1 (MetaCons "BestSpeed" PrefixI False) U1) ((:+:) (C1 (MetaCons "BestCompression" PrefixI False) U1) (C1 (MetaCons "CompressionLevel" PrefixI False) (S1 (MetaSel (Nothing Symbol) NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Int))))))

The default compression level is 6 (that is, biased towards higher compression at expense of speed).

No compression, just a block copy.

The fastest compression method (less compression)

The slowest compression method (best compression).

A specific compression level between 0 and 9.

data Method Source #

The compression method

Constructors

 Deflated Deprecated: Use deflateMethod. Method constructors will be hidden in version 0.7

Instances

 Source # Methods Source # MethodstoEnum :: Int -> Method #enumFrom :: Method -> [Method] #enumFromThen :: Method -> Method -> [Method] #enumFromTo :: Method -> Method -> [Method] #enumFromThenTo :: Method -> Method -> Method -> [Method] # Source # Methods(==) :: Method -> Method -> Bool #(/=) :: Method -> Method -> Bool # Source # Methods(<) :: Method -> Method -> Bool #(<=) :: Method -> Method -> Bool #(>) :: Method -> Method -> Bool #(>=) :: Method -> Method -> Bool #max :: Method -> Method -> Method #min :: Method -> Method -> Method # Source # MethodsshowsPrec :: Int -> Method -> ShowS #showList :: [Method] -> ShowS # Source # Associated Typestype Rep Method :: * -> * # Methodsfrom :: Method -> Rep Method x #to :: Rep Method x -> Method # type Rep Method Source # type Rep Method = D1 (MetaData "Method" "Codec.Compression.Zlib.Stream" "zlib-0.6.1.2-3sjfdoEPlrt37YiyVABDl7" False) (C1 (MetaCons "Deflated" PrefixI False) U1)

'Deflate' is the only method supported in this version of zlib. Indeed it is likely to be the only method that ever will be supported.

This specifies the size of the compression window. Larger values of this parameter result in better compression at the expense of higher memory usage.

The compression window size is the value of the the window bits raised to the power 2. The window bits must be in the range 8..15 which corresponds to compression window sizes of 256b to 32Kb. The default is 15 which is also the maximum size.

The total amount of memory used depends on the window bits and the MemoryLevel. See the MemoryLevel for the details.

Constructors

 WindowBits Int DefaultWindowBits Deprecated: Use defaultWindowBits. WindowBits constructors will be hidden in version 0.7

Instances

 Source # Methods Source # Methods Source # MethodsshowList :: [WindowBits] -> ShowS # Source # Associated Typestype Rep WindowBits :: * -> * # Methods type Rep WindowBits Source # type Rep WindowBits = D1 (MetaData "WindowBits" "Codec.Compression.Zlib.Stream" "zlib-0.6.1.2-3sjfdoEPlrt37YiyVABDl7" False) ((:+:) (C1 (MetaCons "WindowBits" PrefixI False) (S1 (MetaSel (Nothing Symbol) NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Int))) (C1 (MetaCons "DefaultWindowBits" PrefixI False) U1))

The default WindowBits is 15 which is also the maximum size.

A specific compression window size, specified in bits in the range 8..15

The MemoryLevel parameter specifies how much memory should be allocated for the internal compression state. It is a tradoff between memory usage, compression ratio and compression speed. Using more memory allows faster compression and a better compression ratio.

The total amount of memory used for compression depends on the WindowBits and the MemoryLevel. For decompression it depends only on the WindowBits. The totals are given by the functions:

compressTotal windowBits memLevel = 4 * 2^windowBits + 512 * 2^memLevel
decompressTotal windowBits = 2^windowBits

For example, for compression with the default windowBits = 15 and memLevel = 8 uses 256Kb. So for example a network server with 100 concurrent compressed streams would use 25Mb. The memory per stream can be halved (at the cost of somewhat degraded and slower compressionby) by reducing the windowBits and memLevel by one.

Decompression takes less memory, the default windowBits = 15 corresponds to just 32Kb.

Constructors

 DefaultMemoryLevel Deprecated: Use defaultMemoryLevel. MemoryLevel constructors will be hidden in version 0.7 MinMemoryLevel Deprecated: Use minMemoryLevel. MemoryLevel constructors will be hidden in version 0.7 MaxMemoryLevel Deprecated: Use maxMemoryLevel. MemoryLevel constructors will be hidden in version 0.7 MemoryLevel Int

Instances

 Source # Methods Source # MethodsshowList :: [MemoryLevel] -> ShowS # Source # Associated Typestype Rep MemoryLevel :: * -> * # Methods type Rep MemoryLevel Source # type Rep MemoryLevel = D1 (MetaData "MemoryLevel" "Codec.Compression.Zlib.Stream" "zlib-0.6.1.2-3sjfdoEPlrt37YiyVABDl7" False) ((:+:) ((:+:) (C1 (MetaCons "DefaultMemoryLevel" PrefixI False) U1) (C1 (MetaCons "MinMemoryLevel" PrefixI False) U1)) ((:+:) (C1 (MetaCons "MaxMemoryLevel" PrefixI False) U1) (C1 (MetaCons "MemoryLevel" PrefixI False) (S1 (MetaSel (Nothing Symbol) NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 Int)))))

The default memory level. (Equivalent to memoryLevel 8)

Use minimum memory. This is slow and reduces the compression ratio. (Equivalent to memoryLevel 1)

Use maximum memory for optimal compression speed. (Equivalent to memoryLevel 9)

A specific level in the range 1..9

The strategy parameter is used to tune the compression algorithm.

The strategy parameter only affects the compression ratio but not the correctness of the compressed output even if it is not set appropriately.

Constructors

 DefaultStrategy Deprecated: Use defaultStrategy. CompressionStrategy constructors will be hidden in version 0.7 Filtered Deprecated: Use filteredStrategy. CompressionStrategy constructors will be hidden in version 0.7 HuffmanOnly Deprecated: Use huffmanOnlyStrategy. CompressionStrategy constructors will be hidden in version 0.7

Instances

 Source # Methods Source # Source # Methods Source # Methods Source # Methods Source # Associated Typestype Rep CompressionStrategy :: * -> * # Methods Source # type Rep CompressionStrategy = D1 (MetaData "CompressionStrategy" "Codec.Compression.Zlib.Stream" "zlib-0.6.1.2-3sjfdoEPlrt37YiyVABDl7" False) ((:+:) (C1 (MetaCons "DefaultStrategy" PrefixI False) U1) ((:+:) (C1 (MetaCons "Filtered" PrefixI False) U1) (C1 (MetaCons "HuffmanOnly" PrefixI False) U1)))

Use this default compression strategy for normal data.

Use the filtered compression strategy for data produced by a filter (or predictor). Filtered data consists mostly of small values with a somewhat random distribution. In this case, the compression algorithm is tuned to compress them better. The effect of this strategy is to force more Huffman coding and less string matching; it is somewhat intermediate between defaultCompressionStrategy and huffmanOnlyCompressionStrategy.

Use the Huffman-only compression strategy to force Huffman encoding only (no string match).