MissingH: Large utility library

Maintainer: John Goerzen <jgoerzen@complete.org>




GZip file decompression

Copyright (c) 2004 John Goerzen, jgoerzen@complete.org

The GZip format is described in RFC1952.


GZip Files

GZip files contain one or more Sections. Each Section, on disk, begins with a GZip Header, then stores the compressed data itself, and finally stores a GZip Footer.

The Header identifies the file as a GZip file, records the original modification date and time, and, in some cases, also records the original filename and comments.
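RFC 1952 fixes the first ten bytes of every section header. They are the magic bytes 0x1f 0x8b, the compression method, the flags, a little-endian 32-bit modification time, the extra flags, and the operating system. The following is a minimal sketch of reading that fixed prefix. It is not this module's read_header. It assumes each Char of the input String carries one byte (code point 0 to 255). The FixedHeader type, its field names, le32 and parseFixedHeader are all hypothetical. The optional fields the flags may announce (extra data, filename, comment, header CRC) follow this prefix and are not handled here.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import Data.Word (Word32)

-- Hypothetical record for the fixed 10-byte prefix of a gzip header
-- (RFC 1952); this is not the module's Header type.
data FixedHeader = FixedHeader
  { fhMethod :: Int    -- CM: 8 means deflate
  , fhFlags  :: Int    -- FLG: which optional fields follow
  , fhMtime  :: Word32 -- MTIME: Unix timestamp
  , fhXfl    :: Int    -- XFL: extra flags
  , fhOs     :: Int    -- OS: creating operating system
  } deriving (Eq, Show)

-- Combine bytes, least significant first, into a Word32.
le32 :: String -> Word32
le32 = foldr (\c acc -> (acc `shiftL` 8) .|. fromIntegral (ord c)) 0

-- Parse the fixed prefix and return it with the unconsumed input.
-- Assumes one byte per Char.
parseFixedHeader :: String -> Either String (FixedHeader, String)
parseFixedHeader s = case splitAt 10 s of
  ([i1, i2, cm, flg, m0, m1, m2, m3, xfl, os], rest)
    | i1 /= '\x1f' || i2 /= '\x8b' -> Left "not a gzip header"
    | ord cm /= 8 -> Left "unsupported compression method"
    | otherwise ->
        Right (FixedHeader (ord cm) (ord flg) (le32 [m0, m1, m2, m3]) (ord xfl) (ord os), rest)
  _ -> Left "truncated header"
```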

The Footer contains a GZip CRC32 checksum over the decompressed data, as well as the length of the decompressed data stored as a 32-bit value (RFC 1952 defines it as the length modulo 2^32). The module Data.Hash.CRC32.GZip is used to validate stored CRC32 values.
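Per RFC 1952, the footer is 8 bytes long. It holds the CRC-32 followed by ISIZE, both stored little-endian. The sketch below splits the footer off the input, again assuming one byte per Char. le32, parseFooter and isizeMatches are hypothetical names, not part of this module.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Char (ord)
import Data.Word (Word32)

-- Combine bytes, least significant first, into a Word32.
le32 :: String -> Word32
le32 = foldr (\c acc -> (acc `shiftL` 8) .|. fromIntegral (ord c)) 0

-- Split an 8-byte gzip footer (CRC-32, then ISIZE, both little-endian)
-- off the front of the input. Assumes one byte per Char.
parseFooter :: String -> Either String ((Word32, Word32), String)
parseFooter s = case splitAt 8 s of
  (bytes, rest) | length bytes == 8 ->
    let (crcBytes, sizeBytes) = splitAt 4 bytes
    in Right ((le32 crcBytes, le32 sizeBytes), rest)
  _ -> Left "truncated footer"

-- ISIZE holds the uncompressed length modulo 2^32; fromIntegral into
-- Word32 performs the same wrap-around.
isizeMatches :: Word32 -> Integer -> Bool
isizeMatches stored actualLength = stored == fromIntegral actualLength
```

ISIZE wraps at 2^32, so comparing it against the real length is only meaningful modulo 2^32. Converting the Integer length with fromIntegral gives exactly that behavior.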

The vast majority of GZip files contain only one Section. Standard tools that work with GZip files create single-section files by default.

Multi-section files can be created by concatenating two existing GZip files. When reading such files back, the standard gunzip and zcat tools concatenate the decompressed data of every section, and the decompress function in this module does the same.

When reading data from this module, be careful about how you access it. For instance, if you want to write the decompressed stream to disk and validate its CRC32 value, you could use the decompress function. However, you should process the entire stream before you check the Maybe GZipError value it returns. Otherwise, you will force Haskell to buffer the entire file in memory just so it can check the CRC32.
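The ordering advice above can be captured in a small helper. This is a sketch, not part of this module. streamThenCheck is a hypothetical name, and decompressFn stands in for any function shaped like decompress, so the example runs without MissingH installed. The sink would typically be hPutStr applied to an output handle. Whether memory use actually stays bounded depends on how lazily the real decompress produces its output.

```haskell
-- Hypothetical helper: feed the decompressed text to a sink such as
-- `hPutStr outHandle` before looking at the error value. `decompressFn`
-- stands in for a function shaped like this module's decompress.
streamThenCheck :: (String -> IO ()) -> (String -> (String, Maybe e)) -> String -> IO (Maybe e)
streamThenCheck sink decompressFn input = do
  let (body, err) = decompressFn input
  sink body   -- consumes the lazy output first
  return err  -- the error is left for the caller to inspect afterwards
```

A typical call might be `streamThenCheck (hPutStr outH) decompress input`. Inspecting err before calling the sink would still produce a correct result. However, it forces the whole body to be produced and retained first, which is the buffering the paragraph above warns about.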


data Header

The data structure representing the GZip header. This occurs at the beginning of each Section on disk.




method :: Int

Compression method. Only 8 is defined at present.

flags :: Int
extra :: Maybe String
filename :: Maybe String
comment :: Maybe String
mtime :: Word32

Modification time of the original file

xfl :: Int

Extra flags

os :: Int

Operating system on which the file was created
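The flags field determines which optional header fields are present. RFC 1952 assigns each flag to a bit:

- bit 0: FTEXT
- bit 1: FHCRC
- bit 2: FEXTRA
- bit 3: FNAME
- bit 4: FCOMMENT

Bits 5 to 7 are reserved and must be zero. The Maybe fields extra, filename and comment presumably mirror the FEXTRA, FNAME and FCOMMENT bits, but that mapping is an assumption. The decoder below is hypothetical. FlagInfo, decodeFlags, reservedBitsClear and readZString are illustrative names.

```haskell
import Data.Bits (testBit)

-- Hypothetical summary of a gzip FLG byte (RFC 1952, section 2.3.1).
data FlagInfo = FlagInfo
  { isText     :: Bool  -- bit 0, FTEXT: content is probably text
  , hasHcrc    :: Bool  -- bit 1, FHCRC: a 16-bit header CRC is present
  , hasExtra   :: Bool  -- bit 2, FEXTRA: extra field present
  , hasName    :: Bool  -- bit 3, FNAME: zero-terminated file name present
  , hasComment :: Bool  -- bit 4, FCOMMENT: zero-terminated comment present
  } deriving (Eq, Show)

decodeFlags :: Int -> FlagInfo
decodeFlags f = FlagInfo (testBit f 0) (testBit f 1) (testBit f 2) (testBit f 3) (testBit f 4)

-- RFC 1952 requires the reserved bits 5 to 7 to be zero.
reservedBitsClear :: Int -> Bool
reservedBitsClear f = not (any (testBit f) [5, 6, 7])

-- FNAME and FCOMMENT end with a zero byte; split one such field off.
readZString :: String -> Maybe (String, String)
readZString s = case break (== '\0') s of
  (field, '\0' : rest) -> Just (field, rest)
  _                    -> Nothing  -- no terminator: truncated header
```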


type Section = (Header, String, Footer)

A section represents a compressed component in a GZip file. Every GZip file has at least one.

data GZipError



CRC-32 check failed


Couldn't find a GZip header


Compressed with something other than method 8 (deflate)

UnknownError String

Other problem arose

data Footer

Stored on-disk at the end of each section.




size :: Word32

The size of the original, decompressed data

crc32 :: Word32

The stored GZip CRC-32 of the original, decompressed data

crc32valid :: Bool

Whether or not the stored CRC-32 matches the calculated CRC-32 of the data
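In this module, computing the CRC-32 of the decompressed data is delegated to Data.Hash.CRC32.GZip. As a self-contained illustration of what that check computes, below is the standard bitwise CRC-32 used by gzip. It uses the reflected polynomial 0xEDB88320, with an initial value and final XOR of 0xFFFFFFFF. crc32Of and crcMatches are hypothetical names, not that module's API. The one-byte-per-Char assumption applies again.

```haskell
import Data.Bits (complement, shiftR, xor, (.&.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

-- Bit-at-a-time CRC-32 as used by gzip (RFC 1952): reflected polynomial
-- 0xEDB88320, initial value 0xFFFFFFFF, result complemented at the end.
crc32Of :: String -> Word32
crc32Of = complement . foldl' step 0xFFFFFFFF
  where
    -- Mix one byte into the register, then shift it through 8 times.
    step crc c = iterate shiftOnce (crc `xor` fromIntegral (ord c)) !! 8
    shiftOnce r
      | r .&. 1 == 1 = (r `shiftR` 1) `xor` 0xEDB88320
      | otherwise    = r `shiftR` 1

-- Analogue of crc32valid: does the stored value match the data?
crcMatches :: Word32 -> String -> Bool
crcMatches stored body = stored == crc32Of body
```

The bit-at-a-time loop is slow. Practical implementations usually use a 256-entry lookup table, which yields the same values.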

Whole-File Processing

decompress :: String -> (String, Maybe GZipError)

Read a GZip file, decompressing all sections that are found.

Returns the decompressed data stream and Nothing, or an unreliable String and Just an error. If you get anything other than Nothing, discard the returned String.
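When the input is small, one convenient way to honor the "discard the String on error" rule is to collapse the pair into an Either. resultToEither is a hypothetical helper, not part of this module. With decompress, deciding between Left and Right presumably requires decoding the whole input. That means the full output is held in memory, so this pattern does not suit large files.

```haskell
-- Hypothetical helper: turn a (result, Maybe error) pair, the shape that
-- decompress returns, into an Either so the unreliable result is dropped.
-- Choosing Left or Right forces the error value first.
resultToEither :: (a, Maybe e) -> Either e a
resultToEither (_, Just err) = Left err
resultToEither (out, Nothing) = Right out
```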



:: Handle -> Handle -> IO (Maybe GZipError)

The first Handle is the input handle; the second is the output handle.

Read a GZip file, decompressing all sections found.

Writes the decompressed data stream to the given output handle.

Returns Nothing if the action was successful, or Just GZipError if there was a problem. If there was a problem, the data written to the output handle should be discarded.

Section Processing

read_header :: String -> Either GZipError (Header, String)

Read the GZip header, returning (Header, Remainder), where Remainder is the input that follows the header.

read_section :: String -> Either GZipError (Section, String)

Read one section, returning (ThisSection, Remainder).
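read_header and read_section follow a remainder-passing style. Each returns what it parsed plus the unconsumed input. A multi-section file can therefore be handled by calling read_section until nothing is left and joining the data of each Section. Below is a minimal sketch. The reader is passed in as a parameter so the example runs without this module. readAllSections and sectionsData are hypothetical names, and treating the String in a Section as that section's decompressed data is an assumption.

```haskell
-- Hypothetical driver: apply a reader shaped like read_section repeatedly
-- until the input is used up. Each call must consume some input, or this
-- loops forever.
readAllSections :: (String -> Either e (s, String)) -> String -> Either e [s]
readAllSections readOne = go
  where
    go "" = Right []
    go input = do
      (sec, rest) <- readOne input
      secs <- go rest
      return (sec : secs)

-- Concatenate the data of every section, as gunzip and zcat do.
sectionsData :: [(h, String, f)] -> String
sectionsData = concatMap (\(_, body, _) -> body)
```

The Either monad only returns Right after every section has succeeded, so this driver is not incremental. It trades the streaming behavior described for decompress for simplicity.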