Portability | portable |
---|---|

Stability | unstable |

Maintainer | Bryan O'Sullivan <bos@serpentine.com> |

A fast, space efficient Bloom filter implementation. A Bloom filter is a set-like data structure that provides a probabilistic membership test.

- Queries do not give false negatives. When an element is added to
a filter, a subsequent membership test will definitely return
`True`

. - False positives
*are*possible. If an element has not been added to a filter, a membership test*may*nevertheless indicate that the element is present.

This module provides low-level control. For an easier to use interface, see the Data.BloomFilter.Easy module.

- type Hash = Word32
- data Bloom a
- data MBloom s a
- unfoldB :: forall a b. (a -> [Hash]) -> Int -> (b -> Maybe (a, b)) -> b -> Bloom a
- fromListB :: (a -> [Hash]) -> Int -> [a] -> Bloom a
- emptyB :: (a -> [Hash]) -> Int -> Bloom a
- singletonB :: (a -> [Hash]) -> Int -> a -> Bloom a
- lengthB :: Bloom a -> Int
- elemB :: a -> Bloom a -> Bool
- notElemB :: a -> Bloom a -> Bool
- insertB :: a -> Bloom a -> Bloom a
- insertListB :: [a] -> Bloom a -> Bloom a
- createB :: (a -> [Hash]) -> Int -> (forall s. MBloom s a -> ST s z) -> Bloom a
- modifyB :: (forall s. MBloom s a -> ST s z) -> Bloom a -> Bloom a
- newMB :: (a -> [Hash]) -> Int -> ST s (MBloom s a)
- unsafeFreezeMB :: MBloom s a -> ST s (Bloom a)
- thawMB :: Bloom a -> ST s (MBloom s a)
- lengthMB :: MBloom s a -> Int
- elemMB :: a -> MBloom s a -> ST s Bool
- insertMB :: MBloom s a -> a -> ST s ()
- bitArrayB :: Bloom a -> UArray Int Hash
- bitArrayMB :: MBloom s a -> STUArray s Int Hash

# Overview

Each of the functions for creating Bloom filters accepts two parameters:

- The number of bits that should be used for the filter. Note that a filter is fixed in size; it cannot be resized after creation.
- A function that accepts a value, and should return a fixed-size list of hashes of that value. To keep the false positive rate low, the hashes computes should, as far as possible, be independent.

By choosing these parameters with care, it is possible to tune for
a particular false positive rate. The `suggestSizing`

function in
the Data.BloomFilter.Easy module calculates useful estimates for
these parameters.

## Ease of use

This module provides both mutable and immutable interfaces for
creating and querying a Bloom filter. It is most useful as a
low-level way to create a Bloom filter with a custom set of
characteristics, perhaps in combination with the hashing functions
in `Hash`

.

For a higher-level interface that is easy to use, see the
`Data.BloomFilter.Easy`

module.

## Performance

The implementation has been carefully tuned for high performance and low space consumption.

For efficiency, the number of bits requested when creating a Bloom filter is rounded up to the nearest power of two. This lets the implementation use bitwise operations internally, instead of much more expensive multiplication, division, and modulus operations.

# Types

A hash value is 32 bits wide. This limits the maximum size of a filter to about four billion elements, or 512 megabytes of memory.

An immutable Bloom filter, suitable for querying from pure code.

# Immutable Bloom filters

## Creation

:: (a -> [Hash]) | family of hash functions to use |

-> Int | number of bits in filter |

-> [a] | values to populate with |

-> Bloom a |

Create an immutable Bloom filter, populating it from a list of values.

Here is an example that uses the `cheapHashes`

function from the
Data.BloomFilter.Hash module to create a hash function that
returns three hashes.

import Data.BloomFilter.Hash (cheapHashes) filt = fromListB (cheapHashes 3) 1024 ["foo", "bar", "quux"]

Create an empty Bloom filter.

This function is subject to fusion with `insertB`

and `insertListB`

.

:: (a -> [Hash]) | family of hash functions to use |

-> Int | number of bits in filter |

-> a | element to insert |

-> Bloom a |

Create a Bloom filter with a single element.

This function is subject to fusion with `insertB`

and `insertListB`

.

## Accessors

elemB :: a -> Bloom a -> BoolSource

Query an immutable Bloom filter for membership. If the value is
present, return `True`

. If the value is not present, there is
*still* some possibility that `True`

will be returned.

notElemB :: a -> Bloom a -> BoolSource

Query an immutable Bloom filter for non-membership. If the value
*is* present, return `False`

. If the value is not present, there
is *still* some possibility that `True`

will be returned.

## Mutators

insertB :: a -> Bloom a -> Bloom aSource

Create a new Bloom filter from an existing one, with the given member added.

This function may be expensive, as it is likely to cause the underlying bit array to be copied.

Repeated applications of this function with itself are subject to fusion.

insertListB :: [a] -> Bloom a -> Bloom aSource

Create a new Bloom filter from an existing one, with the given members added.

This function may be expensive, as it is likely to cause the underlying bit array to be copied.

Repeated applications of this function with itself are subject to fusion.

# Mutable Bloom filters

## Immutability wrappers

:: (a -> [Hash]) | family of hash functions to use |

-> Int | number of bits in filter |

-> (forall s. MBloom s a -> ST s z) | setup function (result is discarded) |

-> Bloom a |

Create an immutable Bloom filter, using the given setup function
which executes in the `ST`

monad.

Example:

import Data.BloomFilter.Hash (cheapHashes) filter = createB (cheapHashes 3) 1024 $ mf -> do insertMB mf "foo" insertMB mf "bar"

Note that the result of the setup function is not used.

## Creation

Create a new mutable Bloom filter. For efficiency, the number of bits used may be larger than the number requested. It is always rounded up to the nearest higher power of two, but clamped at a maximum of 4 gigabits, since hashes are 32 bits in size.

For a safer creation interface, use `createB`

. To convert a
mutable filter to an immutable filter for use in pure code, use
`unsafeFreezeMB`

.

unsafeFreezeMB :: MBloom s a -> ST s (Bloom a)Source

Create an immutable Bloom filter from a mutable one. The mutable
filter *must not* be modified afterwards, or a runtime crash may
occur. For a safer creation interface, use `createB`

.

thawMB :: Bloom a -> ST s (MBloom s a)Source

Copy an immutable Bloom filter to create a mutable one. There is no non-copying equivalent.

## Accessors

elemMB :: a -> MBloom s a -> ST s BoolSource

Query a mutable Bloom filter for membership. If the value is
present, return `True`

. If the value is not present, there is
*still* some possibility that `True`

will be returned.

## Mutation

insertMB :: MBloom s a -> a -> ST s ()Source

Insert a value into a mutable Bloom filter. Afterwards, a
membership query for the same value is guaranteed to return `True`

.

# The underlying representation

If you serialize the raw bit arrays below to disk, do not expect them to be portable to systems with different conventions for endianness or word size.

The raw bit array used by the immutable `Bloom`

type.

The raw bit array used by the immutable `MBloom`

type.