Portability | portable |
---|---|
Stability | experimental |
Maintainer | Timo B. Huebel (tbh@holumbus.org) |
Safe Haskell | None |

Version : 0.1

This module provides compression for streams of 32-bit words. Because of an internal restriction in GHC, which makes all fixed-size integer types equal in bit width, the algorithm tries to crunch as many numbers as possible into a single 64-bit word.

Based on the Simple9 encoding scheme from this article:

- Vo N. Anh, Alistair Moffat, "*Inverted Index Compression Using Word-Aligned Binary Codes*", Information Retrieval, 8 (1), 2005, pages 151-166
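To make the idea of packing several small numbers into one 64-bit word concrete, here is a minimal sketch. It is not the module's actual implementation: the layout (a 4-bit selector in the top bits, 60 payload bits below it), the `layouts` table, and the names `fits` and `packSketch` are all assumptions chosen for illustration. They are consistent with the documented `(2 ^ 60) - 1` limit, but the real selector table may differ.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.Word (Word64)

-- Hypothetical selector table: (values per word, bits per value).
-- Every entry satisfies count * width <= 60.
layouts :: [(Int, Int)]
layouts =
  [ (60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7)
  , (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60) ]

-- Does the value fit into the given number of bits?
fits :: Int -> Word64 -> Bool
fits width v = v < (1 `shiftL` width)

-- Greedy packing: take the densest layout for which enough values
-- remain and all of them fit, then continue with the rest.
packSketch :: [Word64] -> [Word64]
packSketch [] = []
packSketch vs =
  case [ (s, c, w)
       | (s, (c, w)) <- zip [0 ..] layouts
       , let chunk = take c vs
       , length chunk == c
       , all (fits w) chunk ] of
    [] -> error "packSketch: value exceeds (2 ^ 60) - 1"
    ((s, c, w) : _) ->
      let (chunk, rest) = splitAt c vs
          -- The first value of the chunk ends up in the highest payload bits.
          payload = foldl (\acc v -> (acc `shiftL` w) .|. v) 0 chunk
      in ((s `shiftL` 60) .|. payload) : packSketch rest
```

In this sketch, sixty values that each fit into one bit occupy a single word, while values that need the full 60 bits take one word each. The greedy choice is simple but not guaranteed to be optimal.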

# Compression

crunch64 :: [Word64] -> [Word64]

Crunch some values by encoding several values into one `Word64`. The values must not exceed the upper limit of `(2 ^ 60) - 1`. This precondition is not checked! The compression works best on small values, so a difference encoding (like the one in Holumbus.Data.DiffList) prior to compression pays off well.
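The following sketch illustrates why a difference encoding helps and how the unchecked precondition could be verified by the caller. The helpers `deltaEncode`, `deltaDecode`, `maxCrunchable` and `allCrunchable` are hypothetical names invented for this example. They are not part of this module, and they are not claimed to match what Holumbus.Data.DiffList does internally.

```haskell
import Data.Word (Word64)

-- Replace each value by its distance to the predecessor.
-- Assumes the input is sorted in ascending order; otherwise the
-- subtraction wraps around and produces very large values.
deltaEncode :: [Word64] -> [Word64]
deltaEncode xs = zipWith (-) xs (0 : xs)

-- Inverse of deltaEncode: running sums restore the original values.
deltaDecode :: [Word64] -> [Word64]
deltaDecode = scanl1 (+)

-- Largest value the documentation allows as input to crunch64.
maxCrunchable :: Word64
maxCrunchable = 2 ^ (60 :: Int) - 1

-- Caller-side check for the precondition that crunch64 does not verify.
allCrunchable :: [Word64] -> Bool
allCrunchable = all (<= maxCrunchable)
```

For a sorted list such as `[1000, 1003, 1010]`, the differences `[1000, 3, 7]` are mostly small, which is the case the compression handles best.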

# Decompression

decrunch16 :: [Word64] -> [Word16]

Decrunching to `Word16` values, defined in terms of `decrunch64`.

decrunch32 :: [Word64] -> [Word32]

Decrunching to `Word32` values, defined in terms of `decrunch64`.
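The documentation only states that the narrower variants are defined in terms of `decrunch64`. Assuming the narrowing step is a plain `fromIntegral` (an assumption, not confirmed here), the sketch below shows what that implies for values that do not fit the target type. The names `narrow16` and `narrow32` are hypothetical.

```haskell
import Data.Word (Word16, Word32, Word64)

-- Narrow decoded values to 16 bits. fromIntegral between Word types
-- keeps only the low bits, so oversized values wrap silently.
narrow16 :: [Word64] -> [Word16]
narrow16 = map fromIntegral

-- Same idea for 32-bit results.
narrow32 :: [Word64] -> [Word32]
narrow32 = map fromIntegral
```

Under this assumption, decoding data that was crunched from larger values into a narrower type loses information without any error, so the target width should match the range of the original values.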

decrunch64 :: [Word64] -> [Word64]

Decrunch a list of crunched values. No check for properly encoded values is performed, so calling this function on a list of arbitrary values yields unpredictable results.
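A small decoding sketch shows why arbitrary input produces meaningless output. It assumes a hypothetical word layout: a 4-bit selector in the top bits that indexes a table of (count, width) pairs, followed by 60 payload bits with the first value in the highest position. The table and the names `layouts`, `unpackWord` and `unpackSketch` are illustrative only and may not match the module's real encoding.

```haskell
import Data.Bits (shiftL, shiftR, (.&.))
import Data.Word (Word64)

-- Hypothetical selector table: (values per word, bits per value).
layouts :: [(Int, Int)]
layouts =
  [ (60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7)
  , (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60) ]

-- Decode a single word by reading the selector and slicing the payload.
unpackWord :: Word64 -> [Word64]
unpackWord word =
  case drop (fromIntegral (word `shiftR` 60)) layouts of
    [] -> []  -- selectors 14 and 15 are unused in this sketch
    ((count, width) : _) ->
      let mask = (1 `shiftL` width) - 1
      in [ (word `shiftR` (width * i)) .&. mask
         | i <- [count - 1, count - 2 .. 0] ]

-- Decode a whole list of packed words.
unpackSketch :: [Word64] -> [Word64]
unpackSketch = concatMap unpackWord
```

Every bit pattern is interpreted as some selector plus payload, so in this sketch the single word `0` decodes to sixty zeros rather than to one value. A decoder of this kind cannot tell a valid encoding from arbitrary data.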