unagi-bloomfilter-0.1.1.0: A fast, cache-efficient, concurrent bloom filter

Safe HaskellNone
LanguageHaskell2010

Control.Concurrent.BloomFilter.Internal

Synopsis

Documentation

Some additional unsafe, low-level, and internal functions are exposed here for advanced users. The API should remain stable, except that functions may be added and no promises are made about the internals of the BloomFilter type itself.

new Source #

Arguments

:: SipKey

The secret key to be used for hashing values for this bloom filter.

-> Int

k: Number of independent bits of w to which we map an element. 3 is a good choice.

-> Int

log2l: The size of the filter, in machine words, as a power of 2. e.g. 3 means 2^3 or 8 machine words.

-> IO (BloomFilter a) 

Create a new bloom filter of elements of type a with the given hash key and parameters. fpr can be useful for calculating the k parameter, or determining a good filter size.

The parameters must satisfy the following conditions, otherwise a BloomFilterException will be thrown:

  • k > 0
  • log2l >= 0 && log2l <= wordSizeInBits
  • log2l + k*(logBase 2 wordSizeInBits) <= 128

In addition, performance on 64-bit machines will be best when log2l + k*(logBase 2 wordSizeInBits) <= 64 where we require only 64 hash bits for each element. (Performance on 32-bit machines will be worse in all cases, as we're doing 64-bit arithmetic.)

Example: on a 32-bit machine, the following produces a ~4KB bloom filter of 2^10 32-bit words, using 3 bits per element:

 do key read <$ getEnv THE_SECRET_KEY
    Bloom.new key 3 10

data BloomFilter a Source #

A mutable bloom filter representing a set of Hashable values of type a.

A bloom filter is a set-like probabilistic hash-based data structure. Elements can be insert-ed and lookup-ed again. The bloom filter takes a constant amount of memory regardless of the size and number of elements inserted, however as the bloom filter "grows" the likelihood of false-positives being returned by lookup increases. fpr can be used to estimate the expected false-positive rate.

Constructors

BloomFilter 

Fields

insert :: Hashable a => BloomFilter a -> a -> IO Bool Source #

O(size_of_element). Atomically insert a new element into the bloom filter.

This returns True if the element did not exist before the insert, and False if the element did already exist (subject to false-positives; see lookup). Note that this is reversed from lookup.

lookup :: Hashable a => BloomFilter a -> a -> IO Bool Source #

O(size_of_element). Look up the value in the bloom filter, returning True if the element is possibly in the set, and False if the element is certainly not in the set.

The likelihood that this returns True on an element that was not previously insert-ed depends on the parameters the filter was created with, and the number of elements already inserted. The fpr function can help you estimate this.

unionInto Source #

Arguments

:: BloomFilter a

Source, left unmodified.

-> BloomFilter a

Target, receiving elements from source.

-> IO () 

O(l_src+l_target). Write all elements in the first bloom filter into the second. This operation is lossless; ignoring writes to the source bloom filter that happen during this operation (see below), the target bloom filter will be identical to the filter produced had the elements been inserted into the target originally.

The source and target must have been created with the same key and k-value. In addition the target must not be larger (i.e. the l-value) than the source, and they must both use 128/64 bit hashes. This throws a BloomFilterException when those constraints are not met.

This operation is not linearizable with respect to insert-type operations; elements being written to the source bloomfilter during this operation may or may not make it into the target "at random".

intersectionInto Source #

Arguments

:: BloomFilter a

Source, left unmodified.

-> BloomFilter a

Target, receiving elements from source.

-> IO () 

O(l_src+l_target). Make target the intersection of the source and target sets. This operation is "lossy" in that the false positive ratio of target after the operation may be higher than if the elements forming the intersection had been insert-ed directly into target.

The constraints and comments re. linearizability in unionInto also apply here.

clone :: BloomFilter a -> IO (BloomFilter a) Source #

Create a copy of the input BloomFilter.

This operation is not linearizable with respect to insert-type operations; elements being written to the source bloomfilter during this operation may or may not make it into the target "at random".

data SipKey :: * #

A 128-bit secret key. This should be generated randomly and must be kept secret.

Constructors

SipKey ~Word64 ~Word64 

fpr Source #

Arguments

:: Int

n: Number of elements in filter

-> Int

l: Size of filter, in machine words

-> Int

k: Number of bits to map an element to

-> Int

w: word-size in bits (e.g. 32 or 64)

-> Double 

An estimate of the false-positive rate for a bloom-1 filter. For a filter with the provided parameters and having n elements, the value returned here is the percentage, over the course of many queries for elements not in the filter, of those queries which would be expected to return an incorrect True result.

This function is slow but the complexity is bounded and can accept inputs of any size.

serialize :: StableHashable a => BloomFilter a -> IO ByteString Source #

Serialize a bloom filter to a strict ByteString, which can be deserialize-ed once again. Only a hash of the SipKey is stored in the serialized format.

This operation is not linearizable with respect to insert-type operations; elements being written to the source bloomfilter during this operation may or may not make it into the serialized ByteString "at random".

unsafeSerialize :: StableHashable a => BloomFilter a -> IO ByteString Source #

Serialize a bloom filter to a strict ByteString, which can be deserialize-ed once again. Only a hash of the SipKey is stored in the serialized format. This operation is very fast and does no copying.

This is unsafe in that the source BloomFilter must not be modified after this operation, otherwise the ByteString will change, breaking referential transparency. Use serialize if uncertain.

deserialize :: StableHashable a => SipKey -> ByteString -> IO (BloomFilter a) Source #

Deserialize a BloomFilter from a ByteString created with serialize or unsafeSerialize. The key that was used to create the bloom filter is not stored for security, and must be provided here. However if the key provided does not match the key it was originally created with, a BloomFilterException will be thrown.

deserializeByteArray :: forall a. StableHashable a => SipKey -> MutableByteArray RealWorld -> IO (BloomFilter a) Source #

A low-level deserialization routine. This is very fast, and does no copying.