bloomfilter-1.2.1: Pure and impure Bloom Filter implementations.Source codeContentsIndex
Data.BloomFilter.Easy
Portabilityportable
Stabilityunstable
MaintainerBryan O'Sullivan <bos@serpentine.com>
Contents
Easy creation and querying
Example: a spell checker
Useful defaults for creation
Description
An easy-to-use Bloom filter interface.
Synopsis
data Bloom a
easyList :: Hashable a => Double -> [a] -> Bloom a
elemB :: a -> Bloom a -> Bool
notElemB :: a -> Bloom a -> Bool
lengthB :: Bloom a -> Int
suggestSizing :: Int -> Double -> (Int, Int)
Easy creation and querying
data Bloom a Source
An immutable Bloom filter, suitable for querying from pure code.
show/hide Instances
easyListSource
:: Hashable a
=> Doubledesired false positive rate (0 < e < 1)
-> [a]values to populate with
-> Bloom a
Create a Bloom filter with the given false positive rate and members. The hash functions used are computed by the cheapHashes function from the Data.BloomFilter.Hash module.
elemB :: a -> Bloom a -> BoolSource
Query an immutable Bloom filter for membership. If the value is present, return True. If the value is not present, there is still some possibility that True will be returned.
notElemB :: a -> Bloom a -> BoolSource
Query an immutable Bloom filter for non-membership. If the value is present, return False. If the value is not present, there is still some possibility that False will be returned.
lengthB :: Bloom a -> IntSource
Return the size of an immutable Bloom filter, in bits.
Example: a spell checker

This example reads a dictionary file containing one word per line, constructs a Bloom filter with a 1% false positive rate, and spellchecks its standard input. Like the Unix spell command, it prints each word that it does not recognize.

import Data.BloomFilter.Easy (easyList, elemB)

main = do
  filt <- (easyList 0.01 . words) `fmap` readFile "usrsharedictwords"
  let check word | elemB word filt = ""
                 | otherwise         = word ++ "\n"
  interact (concat . map check . lines)
Useful defaults for creation
suggestSizingSource
:: Intexpected maximum capacity
-> Doubledesired false positive rate (0 < e < 1)
-> (Int, Int)

Suggest a good combination of filter size and number of hash functions for a Bloom filter, based on its expected maximum capacity and a desired false positive rate.

The false positive rate is the rate at which queries against the filter should return True when an element is not actually present. It should be a fraction between 0 and 1, so a 1% false positive rate is represented by 0.01.

Produced by Haddock version 2.4.2