Data.BloomFilter

bloomfilter-1.0: Pure and impure Bloom Filter implementations.

Data.BloomFilter

Portability	portable
Stability	unstable
Maintainer	Bryan O'Sullivan <bos@serpentine.com>

Contents

Ease of use
Performance

Immutable Bloom filters

Creation
Accessors

Mutable Bloom filters

Creation
Accessors
Mutation

The underlying representation

Description

A fast, space efficient Bloom filter implementation. A Bloom filter is a set-like data structure that provides a probabilistic membership test.

Queries do not give false negatives. When an element is added to a filter, a subsequent membership test will definitely return True.
False negatives are possible. If an element has not been added to a filter, a membership test may nevertheless indicate that the element is present.

This module provides low-level control. For an easier to use interface, see the Data.BloomFilter.Easy module.

Synopsis

type Hash = Word32

data Bloom a

data MBloom s a

unfoldB :: (a -> [Hash]) -> Int -> (b -> Maybe (a, b)) -> b -> Bloom a

fromListB :: (a -> [Hash]) -> Int -> [a] -> Bloom a

createB :: (a -> [Hash]) -> Int -> (forall s. MBloom s a -> ST s z) -> Bloom a

lengthB :: Bloom a -> Int

elemB :: a -> Bloom a -> Bool

newMB :: (a -> [Hash]) -> Int -> ST s (MBloom s a)

unsafeFreezeMB :: MBloom s a -> ST s (Bloom a)

thawMB :: Bloom a -> ST s (MBloom s a)

lengthMB :: MBloom s a -> Int

elemMB :: a -> MBloom s a -> ST s Bool

insertMB :: MBloom s a -> a -> ST s ()

bitArrayB :: Bloom a -> UArray Int Hash

bitArrayMB :: MBloom s a -> STUArray s Int Hash

Each of the functions for creating Bloom filters accepts two parameters:

The number of bits that should be used for the filter. Note that a filter is fixed in size; it cannot be resized after creation.
A function that accepts a value, and should return a fixed-size list of hashes of that value. To keep the false positive rate low, the hashes computes should, as far as possible, be independent.

By choosing these parameters with care, it is possible to tune for a particular false positive rate. The suggestSizing function in the Data.BloomFilter.Easy module calculates useful estimates for these parameters.

This module provides both mutable and immutable interfaces for creating and querying a Bloom filter. It is most useful as a low-level way to create a Bloom filter with a custom set of characteristics, perhaps in combination with the hashing functions in Hash.

For a higher-level interface that is easy to use, see the Data.BloomFilter.Easy module.

The implementation has been carefully tuned for high performance and low space consumption.

For efficiency, the number of bits requested when creating a Bloom filter is rounded up to the nearest power of two. This lets the implementation use bitwise operations internally, instead of much more expensive multiplication, division, and modulus operations.

type Hash = Word32

A hash value is 32 bits wide. This limits the maximum size of a filter to about four billion elements, or 512 megabytes of memory.

data Bloom a

An immutable Bloom filter, suitable for querying from pure code.

show/hide

Instances

data MBloom s a

A mutable Bloom filter, for use within the ST monad.

show/hide

Instances

Show (MBloom s a)

Immutable Bloom filters

unfoldB

::
=> a -> [Hash]	family of hash functions to use
-> Int	number of bits in filter
-> b -> Maybe (a, b)	seeding function
-> b	initial seed
-> Bloom a
Build an immutable Bloom filter from a seed value. The seeding function populates the filter as follows. If it returns `Nothing`, it is finished producing values to insert into the filter. If it returns `Just (a,b)`, `a` is added to the filter and `b` is used as a new seed.

fromListB

::
=> a -> [Hash]	family of hash functions to use
-> Int	number of bits in filter
-> [a]	values to populate with
-> Bloom a
Create an immutable Bloom filter, populating it from a list of values. Here is an example that uses the `cheapHashes` function from the Data.BloomFilter.Hash module to create a hash function that returns three hashes. import Data.BloomFilter.Hash (cheapHashes) filt = fromListB (cheapHashes 3) 1024 ["foo", "bar", "quux"]

createB

::
=> a -> [Hash]	family of hash functions to use
-> Int	number of bits in filter
-> forall s. MBloom s a -> ST s z	setup function (result is discarded)
-> Bloom a
Create an immutable Bloom filter, using the given setup function which executes in the `ST` monad. Example: import Data.BloomFilter.Hash (cheapHashes) filter = createB (cheapHashes 3) 1024 $ mf -> do insertMB mf "foo" insertMB mf "bar" Note that the result of the setup function is not used.

lengthB :: Bloom a -> Int

Return the size of an immutable Bloom filter, in bits.

elemB :: a -> Bloom a -> Bool

Query an immutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.

Mutable Bloom filters

newMB

::
=> a -> [Hash]	family of hash functions to use
-> Int	number of bits in filter
-> ST s (MBloom s a)
Create a new mutable Bloom filter. For efficiency, the number of bits used may be larger than the number requested. It is always rounded up to the nearest higher power of two. For a safer creation interface, use `createB`. To convert a mutable filter to an immutable filter for use in pure code, use `unsafeFreezeMB`.

unsafeFreezeMB :: MBloom s a -> ST s (Bloom a)

Create an immutable Bloom filter from a mutable one. The mutable filter must not be modified afterwards, or a runtime crash may occur. For a safer creation interface, use createB.

thawMB :: Bloom a -> ST s (MBloom s a)

Copy an immutable Bloom filter to create a mutable one. There is no non-copying equivalent.

lengthMB :: MBloom s a -> Int

Return the size of a mutable Bloom filter, in bits.

elemMB :: a -> MBloom s a -> ST s Bool

Query a mutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.

insertMB :: MBloom s a -> a -> ST s ()

Insert a value into a mutable Bloom filter. Afterwards, a membership query for the same value is guaranteed to return True.

The underlying representation

If you serialize the raw bit arrays below to disk, do not expect them to be portable to systems with different conventions for endianness or word size.

The raw bit array used by the immutable Bloom type.

bitArrayB :: Bloom a -> UArray Int Hash

The raw bit array used by the immutable MBloom type.

bitArrayMB :: MBloom s a -> STUArray s Int Hash

Produced by Haddock version 2.4.2