Holumbus-Searchengine-1.2.3: A search and indexing engine.

Portabilityportable
Stabilityexperimental
MaintainerTimo B. Huebel (tbh@holumbus.org)
Safe HaskellNone

Holumbus.Data.Crunch

Contents

Description

Version : 0.1

This module provides compression for streams of 32-bit words. Because of some internal restriction in GHC, which makes all fixed integer size equal in terms of bit-width, the algorithm tries to crunch as much numbers as possible into a single 64-bit word.

Based on the Simple9 encoding scheme from this article:

  • Vo N. Anh, Alstair Moffat, "Inverted Index Compression Using Word-Aligned Binary Codes", Information Retrieval, 8 (1), 2005, pages 151-166

Synopsis

Compression

crunch8 :: [Word8] -> [Word64]Source

Crunching Word8 values, defined in terms of crunch64.

crunch16 :: [Word16] -> [Word64]Source

Crunching Word16 values, defined in terms of crunch64.

crunch32 :: [Word32] -> [Word64]Source

Crunching Word32 values, defined in terms of crunch64.

crunch64 :: [Word64] -> [Word64]Source

Crunch some values by encoding several values into one Word64. The values may not exceed the upper limit of (2 ^ 60) - 1. This precondition is not checked! The compression works best on small values, therefore a difference encoding (like the one in Holumbus.Data.DiffList) prior to compression pays off well.

Decompression

decrunch8 :: [Word64] -> [Word8]Source

Decrunching to Word8 values, defined in terms of decrunch64.

decrunch16 :: [Word64] -> [Word16]Source

Decrunching to Word16 values, defined in terms of decrunch64.

decrunch32 :: [Word64] -> [Word32]Source

Decrunching to Word32 values, defined in terms of decrunch64.

decrunch64 :: [Word64] -> [Word64]Source

Decrunch a list of crunched values. No checking for properly encoded values is done, weird results have to be expected if calling this function on a list of arbitrary values.