unagi-bloomfilter-0.1.1.0: A fast, cache-efficient, concurrent bloom filter

Control.Concurrent.BloomFilter.Internal

Synopsis

# Documentation

Some additional unsafe, low-level, and internal functions are exposed here for advanced users. The API should remain stable, except that functions may be added and no promises are made about the internals of the BloomFilter type itself.

Arguments

 :: SipKey The secret key to be used for hashing values for this bloom filter. -> Int k: Number of independent bits of w to which we map an element. 3 is a good choice. -> Int log2l: The size of the filter, in machine words, as a power of 2. e.g. 3 means 2^3 or 8 machine words. -> IO (BloomFilter a)

Create a new bloom filter of elements of type a with the given hash key and parameters. fpr can be useful for calculating the k parameter, or determining a good filter size.

The parameters must satisfy the following conditions, otherwise a BloomFilterException will be thrown:

• k > 0
• log2l >= 0 && log2l <= wordSizeInBits
• log2l + k*(logBase 2 wordSizeInBits) <= 128

In addition, performance on 64-bit machines will be best when log2l + k*(logBase 2 wordSizeInBits) <= 64 where we require only 64 hash bits for each element. (Performance on 32-bit machines will be worse in all cases, as we're doing 64-bit arithmetic.)

Example: on a 32-bit machine, the following produces a ~4KB bloom filter of 2^10 32-bit words, using 3 bits per element:

 do key read <\$ getEnv THE_SECRET_KEY
Bloom.new key 3 10


data BloomFilter a Source #

A mutable bloom filter representing a set of Hashable values of type a.

A bloom filter is a set-like probabilistic hash-based data structure. Elements can be insert-ed and lookup-ed again. The bloom filter takes a constant amount of memory regardless of the size and number of elements inserted, however as the bloom filter "grows" the likelihood of false-positives being returned by lookup increases. fpr can be used to estimate the expected false-positive rate.

Constructors

 BloomFilter Fieldskey :: !SipKey k :: !Int hash64Enough :: Boolif we need no more than 64-bits we can use the faster siphash64_1_3l_minus1 :: !Word64 log2l :: !Int arr :: !(MutableByteArray RealWorld)

newtype BloomFilterException Source #

Exceptions that may be thrown by operations in this library.

Constructors

 BloomFilterException String

Instances

 Source # Methods Source # Methods

insert :: Hashable a => BloomFilter a -> a -> IO Bool Source #

O(size_of_element). Atomically insert a new element into the bloom filter.

This returns True if the element did not exist before the insert, and False if the element did already exist (subject to false-positives; see lookup). Note that this is reversed from lookup.

lookup :: Hashable a => BloomFilter a -> a -> IO Bool Source #

O(size_of_element). Look up the value in the bloom filter, returning True if the element is possibly in the set, and False if the element is certainly not in the set.

The likelihood that this returns True on an element that was not previously insert-ed depends on the parameters the filter was created with, and the number of elements already inserted. The fpr function can help you estimate this.

Arguments

 :: BloomFilter a Source, left unmodified. -> BloomFilter a Target, receiving elements from source. -> IO ()

O(l_src+l_target). Write all elements in the first bloom filter into the second. This operation is lossless; ignoring writes to the source bloom filter that happen during this operation (see below), the target bloom filter will be identical to the filter produced had the elements been inserted into the target originally.

The source and target must have been created with the same key and k-value. In addition the target must not be larger (i.e. the l-value) than the source, and they must both use 128/64 bit hashes. This throws a BloomFilterException when those constraints are not met.

This operation is not linearizable with respect to insert-type operations; elements being written to the source bloomfilter during this operation may or may not make it into the target "at random".

Arguments

 :: BloomFilter a Source, left unmodified. -> BloomFilter a Target, receiving elements from source. -> IO ()

O(l_src+l_target). Make target the intersection of the source and target sets. This operation is "lossy" in that the false positive ratio of target after the operation may be higher than if the elements forming the intersection had been insert-ed directly into target.

The constraints and comments re. linearizability in unionInto also apply here.

clone :: BloomFilter a -> IO (BloomFilter a) Source #

Create a copy of the input BloomFilter.

This operation is not linearizable with respect to insert-type operations; elements being written to the source bloomfilter during this operation may or may not make it into the target "at random".

data SipKey :: * #

A 128-bit secret key. This should be generated randomly and must be kept secret.

Constructors

 SipKey ~Word64 ~Word64

Instances

 Methods(==) :: SipKey -> SipKey -> Bool #(/=) :: SipKey -> SipKey -> Bool # Methods MethodsshowsPrec :: Int -> SipKey -> ShowS #showList :: [SipKey] -> ShowS #

Arguments

 :: Int n: Number of elements in filter -> Int l: Size of filter, in machine words -> Int k: Number of bits to map an element to -> Int w: word-size in bits (e.g. 32 or 64) -> Double

An estimate of the false-positive rate for a bloom-1 filter. For a filter with the provided parameters and having n elements, the value returned here is the percentage, over the course of many queries for elements not in the filter, of those queries which would be expected to return an incorrect True result.

This function is slow but the complexity is bounded and can accept inputs of any size.

Serialize a bloom filter to a strict ByteString, which can be deserialize-ed once again. Only a hash of the SipKey is stored in the serialized format.

This operation is not linearizable with respect to insert-type operations; elements being written to the source bloomfilter during this operation may or may not make it into the serialized ByteString "at random".

Serialize a bloom filter to a strict ByteString, which can be deserialize-ed once again. Only a hash of the SipKey is stored in the serialized format. This operation is very fast and does no copying.

This is unsafe in that the source BloomFilter must not be modified after this operation, otherwise the ByteString will change, breaking referential transparency. Use serialize if uncertain.

Deserialize a BloomFilter from a ByteString created with serialize or unsafeSerialize. The key that was used to create the bloom filter is not stored for security, and must be provided here. However if the key provided does not match the key it was originally created with, a BloomFilterException will be thrown.

A low-level deserialization routine. This is very fast, and does no copying.