Portability | portable |
---|---|
Stability | experimental |
Maintainer | Timo B. Huebel (tbh@holumbus.org) |
Safe Haskell | None |
Version : 0.1
This module provides compression for streams of 32-bit words. Because of some internal restriction in GHC, which makes all fixed integer size equal in terms of bit-width, the algorithm tries to crunch as much numbers as possible into a single 64-bit word.
Based on the Simple9 encoding scheme from this article:
- Vo N. Anh, Alstair Moffat, "Inverted Index Compression Using Word-Aligned Binary Codes", Information Retrieval, 8 (1), 2005, pages 151-166
Compression
crunch64 :: [Word64] -> [Word64]Source
Crunch some values by encoding several values into one Word64
. The values may not exceed
the upper limit of (2 ^ 60) - 1
. This precondition is not checked! The compression works
best on small values, therefore a difference encoding (like the one in
Holumbus.Data.DiffList) prior to compression pays off well.
Decompression
decrunch16 :: [Word64] -> [Word16]Source
Decrunching to Word16
values, defined in terms of decrunch64
.
decrunch32 :: [Word64] -> [Word32]Source
Decrunching to Word32
values, defined in terms of decrunch64
.
decrunch64 :: [Word64] -> [Word64]Source
Decrunch a list of crunched values. No checking for properly encoded values is done, weird results have to be expected if calling this function on a list of arbitrary values.