bloomfilter-1.2.6.6: Pure and impure Bloom Filter implementations.

Portabilityportable
Stabilityunstable
MaintainerBryan O'Sullivan <bos@serpentine.com>

Data.BloomFilter.Easy

Contents

Description

An easy-to-use Bloom filter interface.

Synopsis

Easy creation and querying

data Bloom a Source

An immutable Bloom filter, suitable for querying from pure code.

Instances

Show (Bloom a) 
NFData (Bloom a) 

easyListSource

Arguments

:: Hashable a 
=> Double

desired false positive rate (0 < e < 1)

-> [a]

values to populate with

-> Bloom a 

Create a Bloom filter with the given false positive rate and members. The hash functions used are computed by the cheapHashes function from the Data.BloomFilter.Hash module.

elemB :: a -> Bloom a -> BoolSource

Query an immutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.

notElemB :: a -> Bloom a -> BoolSource

Query an immutable Bloom filter for non-membership. If the value is present, return False. If the value is not present, there is still some possibility that True will be returned.

lengthB :: Bloom a -> IntSource

Return the size of an immutable Bloom filter, in bits.

Example: a spell checker

This example reads a dictionary file containing one word per line, constructs a Bloom filter with a 1% false positive rate, and spellchecks its standard input. Like the Unix spell command, it prints each word that it does not recognize.

import Data.BloomFilter.Easy (easyList, elemB)

main = do
  filt <- (easyList 0.01 . words) `fmap` readFile "usrsharedictwords"
  let check word | elemB word filt = ""
                 | otherwise         = word ++ "\n"
  interact (concat . map check . lines)

Useful defaults for creation

safeSuggestSizingSource

Arguments

:: Int

expected maximum capacity

-> Double

desired false positive rate (0 < e < 1)

-> Either String (Int, Int) 

Suggest a good combination of filter size and number of hash functions for a Bloom filter, based on its expected maximum capacity and a desired false positive rate.

The false positive rate is the rate at which queries against the filter should return True when an element is not actually present. It should be a fraction between 0 and 1, so a 1% false positive rate is represented by 0.01.

suggestSizingSource

Arguments

:: Int

expected maximum capacity

-> Double

desired false positive rate (0 < e < 1)

-> (Int, Int) 

Behaves as safeSuggestSizing, but calls error if given invalid or out-of-range inputs.