bloomfilter-2.0.1.0: Pure and impure Bloom Filter implementations.

Copyright: Bryan O'Sullivan
License: BSD3
Maintainer: Bryan O'Sullivan <bos@serpentine.com>
Stability: unstable
Portability: portable
Safe Haskell: None
Language: Haskell98

Data.BloomFilter.Mutable

Description

A fast, space-efficient Bloom filter implementation. A Bloom filter is a set-like data structure that provides a probabilistic membership test.

  • Queries do not give false negatives. When an element is added to a filter, a subsequent membership test will definitely return True.
  • False positives are possible. If an element has not been added to a filter, a membership test may nevertheless indicate that the element is present.
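Both properties can be seen in a toy model. The sketch below is a hypothetical pure illustration, not this module's API: a Word8 stands in for the filter's bits, and the hash lists returned by hashesFor are hand-picked so that "cow", which is never inserted, lands on bits already set by "cat" and "dog".

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word8)

-- An 8-bit toy filter: a set bit means some inserted value hashed there.
type ToyFilter = Word8

-- Hypothetical, hand-picked hash lists chosen to force a collision.
hashesFor :: String -> [Int]
hashesFor "cat" = [1, 3]
hashesFor "dog" = [4, 6]
hashesFor "cow" = [3, 4]
hashesFor _     = [0, 7]

-- Inserting sets every bit the value hashes to.
insertToy :: String -> ToyFilter -> ToyFilter
insertToy x f = foldl setBit f (hashesFor x)

-- A value is reported present only if all of its bits are set.
memberToy :: String -> ToyFilter -> Bool
memberToy x f = all (testBit f) (hashesFor x)

main :: IO ()
main = do
  let f = foldr insertToy 0 ["cat", "dog"]
  print (memberToy "cat" f)  -- True: inserted values are always found
  print (memberToy "dog" f)  -- True
  print (memberToy "cow" f)  -- True: a false positive, bits 3 and 4 came from "cat" and "dog"
  print (memberToy "eel" f)  -- False: bits 0 and 7 were never set
```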

This module provides low-level control. For an easier-to-use interface, see the Data.BloomFilter.Easy module.

Overview

Each of the functions for creating Bloom filters accepts two parameters:

  • The number of bits that should be used for the filter. Note that a filter is fixed in size; it cannot be resized after creation.
  • A function that accepts a value and returns a fixed-size list of hashes of that value. To keep the false positive rate low, the hashes computed should be as independent of one another as possible.
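A common way to get several roughly independent hashes from two base hashes is double hashing, g_i(x) = h1(x) + i*h2(x). The sketch below builds a function of type String -> [Hash] in that style; it is an assumption for illustration, not the hashing this package uses. fnv1a and hashFamily are hypothetical names: h1 is 32-bit FNV-1a over character code points, and h2 is the same function with an arbitrary second offset basis, forced odd so that in a power-of-two-sized filter the k probe positions are all distinct as long as k does not exceed the filter size.

```haskell
import Data.Bits (xor, (.|.))
import Data.Char (ord)
import Data.List (foldl')
import Data.Word (Word32)

type Hash = Word32

-- 32-bit FNV-1a over the code points of a string, from a given offset basis.
-- Word32 arithmetic wraps modulo 2^32, as FNV requires.
fnv1a :: Word32 -> String -> Hash
fnv1a basis = foldl' step basis
  where
    step h c = (h `xor` fromIntegral (ord c)) * 16777619

-- Hypothetical family of k hashes by double hashing: h1 + i * h2 (mod 2^32).
hashFamily :: Int -> String -> [Hash]
hashFamily k s = [h1 + fromIntegral i * h2 | i <- [0 .. k - 1]]
  where
    h1 = fnv1a 2166136261 s         -- standard FNV-1a 32-bit offset basis
    h2 = fnv1a 0x9747b28c s .|. 1   -- arbitrary second seed, forced odd

main :: IO ()
main = print (hashFamily 3 "hello")
```

Partially applied, hashFamily k has exactly the shape (a -> [Hash]) that the creation functions expect, with a fixed list length of k.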

By choosing these parameters with care, it is possible to tune for a particular false positive rate. The suggestSizing function in the Data.BloomFilter.Easy module calculates useful estimates for these parameters.
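For reference, the standard textbook approximations relate the filter size m (bits), the number of hashes k, the number of inserted elements n, and the false positive rate p. The sketch below implements them under hypothetical names; whether suggestSizing uses exactly these formulas is not stated here, so treat it as illustrative only.

```haskell
-- Textbook estimate of the false positive rate after inserting n elements
-- into a filter of m bits using k hashes: (1 - e^(-k*n/m))^k.
falsePositiveRate :: Int -> Int -> Int -> Double
falsePositiveRate m k n =
  (1 - exp (negate (fromIntegral (k * n)) / fromIntegral m)) ^ k

-- Textbook sizing for n elements at a target false positive rate p:
-- m = -n * ln p / (ln 2)^2 bits, and k = (m / n) * ln 2 hashes (at least 1).
estimateSizing :: Int -> Double -> (Int, Int)
estimateSizing n p = (m, max 1 k)
  where
    m = ceiling (negate (fromIntegral n) * log p / (log 2 ^ (2 :: Int)))
    k = round (fromIntegral m / fromIntegral n * log 2)

main :: IO ()
main = do
  let (m, k) = estimateSizing 1000 0.01
  print (m, k)                        -- (9586,7)
  print (falsePositiveRate m k 1000)  -- close to 0.01
```

Because this module rounds the bit count up to a power of two, a request for 9586 bits would become 16384 bits, which can only lower the false positive rate for the same n and k.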

Ease of use

This module provides a mutable interface for creating and querying a Bloom filter. It is most useful as a low-level way to manage a Bloom filter with a custom set of characteristics.

Performance

The implementation has been carefully tuned for high performance and low space consumption.

For efficiency, the number of bits requested when creating a Bloom filter is rounded up to the nearest power of two. This lets the implementation use bitwise operations internally, instead of much more expensive multiplication, division, and modulus operations.
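Concretely, when the filter size m is a power of two, reducing a hash to a bit index needs only a bitwise AND with m - 1, because the low log2(m) bits of a number are its remainder modulo m. The sketch below (function names are hypothetical) checks that the two forms agree on a few sample hashes.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word32)

-- Bit index via masking; assumes m is a power of two, so m - 1 is a mask
-- of the low log2(m) bits.
bitIndexMask :: Int -> Word32 -> Int
bitIndexMask m h = fromIntegral h .&. (m - 1)

-- The general form, which works for any m but needs a division.
bitIndexMod :: Int -> Word32 -> Int
bitIndexMod m h = fromIntegral h `mod` m

main :: IO ()
main = do
  let m  = 1024  -- 2^10
      hs = [0, 1, 1023, 1024, 123456789, maxBound] :: [Word32]
  print (map (bitIndexMask m) hs == map (bitIndexMod m) hs)  -- True
```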

Types

type Hash = Word32

A hash value is 32 bits wide, so it can address at most 2^32 distinct bit positions. This limits the maximum size of a filter to about four billion bits, or 512 megabytes of memory.
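The 512-megabyte figure follows directly from the 32-bit hash width:

```latex
2^{32}\ \text{bits} = \frac{2^{32}}{8}\ \text{bytes} = 2^{29}\ \text{bytes} = 536{,}870{,}912\ \text{bytes} = 512\ \text{MiB}
```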

data MBloom s a

A mutable Bloom filter, for use within the ST monad.

Instances

Show (MBloom s a) 

Mutable Bloom filters

Creation

new :: (a -> [Hash]) -> Int -> ST s (MBloom s a)

Arguments

  • (a -> [Hash]): family of hash functions to use
  • Int: number of bits in filter

Create a new mutable Bloom filter. For efficiency, the number of bits used may be larger than the number requested: it is always rounded up to the next power of two, and clamped at a maximum of 4 gigabits (2^32 bits), since hashes are 32 bits in size.
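The size rule above can be written as a small function. This is a hypothetical sketch of the arithmetic only (effectiveBits is an illustrative name, and it assumes a positive request); it returns an Integer so that the 2^32 cap cannot overflow on platforms with a 32-bit Int.

```haskell
-- Hypothetical: bits actually used for a request of n bits (n >= 1),
-- rounded up to a power of two and capped at 2^32.
effectiveBits :: Int -> Integer
effectiveBits n = min maxBits (head [p | p <- iterate (* 2) 1, p >= toInteger n])
  where
    maxBits = 2 ^ (32 :: Int)  -- 4 gigabits, the most a 32-bit hash can address

main :: IO ()
main = mapM_ (print . effectiveBits) [1000, 1024, 1025]  -- 1024, 1024, 2048
```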

Accessors

length :: MBloom s a -> Int

Return the size of a mutable Bloom filter, in bits.

elem :: a -> MBloom s a -> ST s Bool

Query a mutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.

Mutation

insert :: MBloom s a -> a -> ST s ()

Insert a value into a mutable Bloom filter. Afterwards, a membership query for the same value is guaranteed to return True.
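To show how new, insert, and elem fit together inside ST, here is a minimal self-contained stand-in. It is a sketch under stated assumptions, not this module's implementation: ToyMBloom, newToy, lengthToy, insertToy, elemToy, and intHashes are hypothetical names mirroring the documented signatures, the bits live in an unboxed STUArray of Bool rather than MBloom's own representation, and intHashes is an arbitrary multiplicative family chosen only so the example runs.

```haskell
import Control.Monad.ST (ST, runST)
import Data.Array.ST (STUArray, newArray, readArray, writeArray)
import Data.Bits ((.&.))
import Data.Word (Word32)

type Hash = Word32

-- Hypothetical stand-in for MBloom: hash family, size in bits (a power of
-- two), and an unboxed array of bits.
data ToyMBloom s a = ToyMBloom
  { toyHashes :: a -> [Hash]
  , toySize   :: Int
  , toyBits   :: STUArray s Int Bool
  }

-- Like new: round the requested size up to a power of two, all bits clear.
newToy :: (a -> [Hash]) -> Int -> ST s (ToyMBloom s a)
newToy hs n = do
  bits <- newArray (0, size - 1) False
  return (ToyMBloom hs size bits)
  where
    size = head [p | p <- iterate (* 2) 1, p >= n]

-- Like length: the filter size in bits.
lengthToy :: ToyMBloom s a -> Int
lengthToy = toySize

-- Reduce a hash to a bit index with a mask; valid because the size is 2^b.
bitIndex :: ToyMBloom s a -> Hash -> Int
bitIndex f h = fromIntegral h .&. (toySize f - 1)

-- Like insert: set every bit the value hashes to.
insertToy :: ToyMBloom s a -> a -> ST s ()
insertToy f x = mapM_ (\h -> writeArray (toyBits f) (bitIndex f h) True) (toyHashes f x)

-- Like elem: True only if all of the value's bits are set.
elemToy :: a -> ToyMBloom s a -> ST s Bool
elemToy x f = and <$> mapM (readArray (toyBits f) . bitIndex f) (toyHashes f x)

-- Hypothetical hash family for Int keys (arbitrary constants).
intHashes :: Int -> [Hash]
intHashes x = [w * 2654435761, w * 40503 + 7, w * 97 + 13]
  where
    w = fromIntegral x :: Word32

-- Insert 1..100 into a filter requested at 1000 bits, then query them back.
demo :: ST s (Int, Bool)
demo = do
  f <- newToy intHashes 1000
  mapM_ (insertToy f) [1 .. 100]
  found <- mapM (`elemToy` f) [1 .. 100]
  return (lengthToy f, and found)

main :: IO ()
main = print (runST demo)  -- (1024,True)
```

Because insertToy sets exactly the bits that elemToy later reads, every inserted value is found, which is the no-false-negatives guarantee; values that were never inserted may still be reported present.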

The underlying representation

If you serialize the raw bit arrays below to disk, do not expect them to be portable to systems with different conventions for endianness or word size.

The raw bit array used by the mutable MBloom type.
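Given the portability caveat above, one option is to write the bits out in an explicitly chosen byte order instead of dumping the array's raw memory. The sketch below assumes the filter's bits have already been extracted as a list of Word32 words (reading them out of the array is not shown), and all helper names are hypothetical. The hash functions themselves cannot be serialized, so a reader must rebuild the filter with the same hash family and size for the restored bits to mean anything.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32, Word8)

-- Encode one 32-bit word as four bytes, most significant first (big-endian),
-- regardless of the host machine's byte order.
wordToBytesBE :: Word32 -> [Word8]
wordToBytesBE w = [fromIntegral (w `shiftR` s) | s <- [24, 16, 8, 0]]

-- Decode four big-endian bytes back into a 32-bit word.
bytesToWordBE :: [Word8] -> Word32
bytesToWordBE = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- Serialize a list of bit-array words in a fixed byte order.
encodeWords :: [Word32] -> [Word8]
encodeWords = concatMap wordToBytesBE

-- Inverse of encodeWords; assumes the byte count is a multiple of 4.
decodeWords :: [Word8] -> [Word32]
decodeWords [] = []
decodeWords bs = bytesToWordBE (take 4 bs) : decodeWords (drop 4 bs)

main :: IO ()
main = do
  let ws = [0x01020304, 0xdeadbeef]
  print (encodeWords ws)                      -- [1,2,3,4,222,173,190,239]
  print (decodeWords (encodeWords ws) == ws)  -- True
```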