Copyright | Bryan O'Sullivan |
---|---|
License | BSD3 |
Maintainer | Bryan O'Sullivan <bos@serpentine.com> |
Stability | unstable |
Portability | portable |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
An easy-to-use Bloom filter interface.
Easy creation and querying
An immutable Bloom filter, suitable for querying from pure code.
:: Hashable a | |
=> Double | desired false positive rate (0 < e < 1) |
-> [a] | values to populate with |
-> Bloom a |
Create a Bloom filter with the given false positive rate and
members. The hash functions used are computed by the cheapHashes
function from the Hash
module.
elem :: a -> Bloom a -> Bool Source #
Query an immutable Bloom filter for membership. If the value is
present, return True
. If the value is not present, there is
still some possibility that True
will be returned.
notElem :: a -> Bloom a -> Bool Source #
Query an immutable Bloom filter for non-membership. If the value
is present, return False
. If the value is not present, there
is still some possibility that False
will be returned.
Example: a spell checker
This example reads a dictionary file containing one word per line,
constructs a Bloom filter with a 1% false positive rate, and
spellchecks its standard input. Like the Unix spell
command, it
prints each word that it does not recognize.
import Data.BloomFilter.Easy (easyList, elemB) main = do filt <- (easyList
0.01 . words) `fmap` readFile "usrsharedictwords" let check word |elemB
word filt = "" | otherwise = word ++ "\n" interact (concat . map check . lines)
Useful defaults for creation
:: Int | expected maximum capacity |
-> Double | desired false positive rate (0 < e < 1) |
-> Either String (Int, Int) |
Suggest a good combination of filter size and number of hash functions for a Bloom filter, based on its expected maximum capacity and a desired false positive rate.
The false positive rate is the rate at which queries against the
filter should return True
when an element is not actually
present. It should be a fraction between 0 and 1, so a 1% false
positive rate is represented by 0.01.
Behaves as safeSuggestSizing
, but calls error
if given
invalid or out-of-range inputs.