Safe Haskell  None 

Language  Haskell2010 
 data BloomFilter a
 newtype BloomFilterException = BloomFilterException String
 new :: SipKey > Int > Int > IO (BloomFilter a)
 data SipKey :: * = SipKey ~Word64 ~Word64
 insert :: Hashable a => BloomFilter a > a > IO Bool
 lookup :: Hashable a => BloomFilter a > a > IO Bool
 unionInto :: BloomFilter a > BloomFilter a > IO ()
 intersectionInto :: BloomFilter a > BloomFilter a > IO ()
 clone :: BloomFilter a > IO (BloomFilter a)
 serialize :: StableHashable a => BloomFilter a > IO ByteString
 deserialize :: StableHashable a => SipKey > ByteString > IO (BloomFilter a)
 fpr :: Int > Int > Int > Int > Double
Documentation
A threadsafe mutable bloom filter. Some additional functionality for advanced users is exposed in Control.Concurrent.BloomFilter.Internal.
data BloomFilter a Source #
A mutable bloom filter representing a set of Hashable
values of type a
.
A bloom filter is a setlike probabilistic hashbased data structure.
Elements can be insert
ed and lookup
ed again. The bloom filter
takes a constant amount of memory regardless of the size and number of
elements inserted, however as the bloom filter "grows" the likelihood of
falsepositives being returned by lookup
increases. fpr
can be used to
estimate the expected falsepositive rate.
newtype BloomFilterException Source #
Exceptions that may be thrown by operations in this library.
Creation
:: SipKey  The secret key to be used for hashing values for this bloom filter. 
> Int 

> Int 

> IO (BloomFilter a) 
Create a new bloom filter of elements of type a
with the given hash key
and parameters. fpr
can be useful for calculating the k
parameter, or
determining a good filter size.
The parameters must satisfy the following conditions, otherwise a
BloomFilterException
will be thrown:
k > 0
log2l >= 0 && log2l <= wordSizeInBits
log2l + k*(logBase 2 wordSizeInBits) <= 128
In addition, performance on 64bit machines will be best when
log2l + k*(logBase 2 wordSizeInBits) <= 64
where we require only 64 hash
bits for each element. (Performance on 32bit machines will be worse in all
cases, as we're doing 64bit arithmetic.)
Example: on a 32bit machine, the following produces a ~4KB bloom filter of
2^10
32bit words, using 3 bits per element:
do key read <$ getEnv THE_SECRET_KEY Bloom.new key 3 10
A 128bit secret key. This should be generated randomly and must be kept secret.
Operations
lookup :: Hashable a => BloomFilter a > a > IO Bool Source #
O(size_of_element). Look up the value in the bloom filter, returning
True
if the element is possibly in the set, and False
if the element is
certainly not in the set.
The likelihood that this returns True
on an element that was not
previously insert
ed depends on the parameters the filter was created
with, and the number of elements already inserted. The fpr
function can
help you estimate this.
Copying and Combining
:: BloomFilter a  Source, left unmodified. 
> BloomFilter a  Target, receiving elements from source. 
> IO () 
O(l_src+l_target). Write all elements in the first bloom filter into the second. This operation is lossless; ignoring writes to the source bloom filter that happen during this operation (see below), the target bloom filter will be identical to the filter produced had the elements been inserted into the target originally.
The source and target must have been created with the same key and
k
value. In addition the target must not be larger (i.e. the l
value)
than the source, and they must both use 128/64 bit hashes. This throws a
BloomFilterException
when those constraints are not met.
This operation is not linearizable with respect to insert
type operations;
elements being written to the source bloomfilter during this operation may
or may not make it into the target "at random".
:: BloomFilter a  Source, left unmodified. 
> BloomFilter a  Target, receiving elements from source. 
> IO () 
O(l_src+l_target). Make target
the intersection of the source and
target sets. This operation is "lossy" in that the false positive ratio of
target after the operation may be higher than if the elements forming the
intersection had been insert
ed directly into target.
The constraints and comments re. linearizability in unionInto
also apply
here.
clone :: BloomFilter a > IO (BloomFilter a) Source #
Create a copy of the input BloomFilter
.
This operation is not linearizable with respect to insert
type operations;
elements being written to the source bloomfilter during this operation may
or may not make it into the target "at random".
Serialization
serialize :: StableHashable a => BloomFilter a > IO ByteString Source #
Serialize a bloom filter to a strict ByteString
, which can be
deserialize
ed once again. Only a hash of the SipKey
is stored in the
serialized format.
This operation is not linearizable with respect to insert
type operations;
elements being written to the source bloomfilter during this operation may
or may not make it into the serialized ByteString
"at random".
deserialize :: StableHashable a => SipKey > ByteString > IO (BloomFilter a) Source #
Deserialize a BloomFilter
from a ByteString
created with serialize
or unsafeSerialize
. The key that was used to create the bloom filter is
not stored for security, and must be provided here. However if the key
provided does not match the key it was originally created with, a
BloomFilterException
will be thrown.
Utilities
:: Int 

> Int 

> Int 

> Int 

> Double 
An estimate of the falsepositive rate for a bloom1 filter. For a filter
with the provided parameters and having n
elements, the value returned
here is the percentage, over the course of many queries for elements not in
the filter, of those queries which would be expected to return an incorrect
True
result.
This function is slow but the complexity is bounded and can accept inputs of any size.