Safe Haskell | None |
---|---|
Language | Haskell2010 |
- new :: SipKey -> Int -> Int -> IO (BloomFilter a)
- data BloomFilter a = BloomFilter {}
- newtype BloomFilterException = BloomFilterException String
- insert :: Hashable a => BloomFilter a -> a -> IO Bool
- lookup :: Hashable a => BloomFilter a -> a -> IO Bool
- unionInto :: BloomFilter a -> BloomFilter a -> IO ()
- intersectionInto :: BloomFilter a -> BloomFilter a -> IO ()
- clone :: BloomFilter a -> IO (BloomFilter a)
- data SipKey :: * = SipKey ~Word64 ~Word64
- fpr :: Int -> Int -> Int -> Int -> Double
- serialize :: StableHashable a => BloomFilter a -> IO ByteString
- unsafeSerialize :: StableHashable a => BloomFilter a -> IO ByteString
- deserialize :: StableHashable a => SipKey -> ByteString -> IO (BloomFilter a)
- deserializeByteArray :: forall a. StableHashable a => SipKey -> MutableByteArray RealWorld -> IO (BloomFilter a)
Documentation
Some additional unsafe, low-level, and internal functions are exposed here
for advanced users. The API should remain stable, except that functions may
be added and no promises are made about the internals of the BloomFilter
type itself.
:: SipKey | The secret key to be used for hashing values for this bloom filter. |
-> Int |
|
-> Int |
|
-> IO (BloomFilter a) |
Create a new bloom filter of elements of type a
with the given hash key
and parameters. fpr
can be useful for calculating the k
parameter, or
determining a good filter size.
The parameters must satisfy the following conditions, otherwise a
BloomFilterException
will be thrown:
k > 0
log2l >= 0 && log2l <= wordSizeInBits
log2l + k*(logBase 2 wordSizeInBits) <= 128
In addition, performance on 64-bit machines will be best when
log2l + k*(logBase 2 wordSizeInBits) <= 64
where we require only 64 hash
bits for each element. (Performance on 32-bit machines will be worse in all
cases, as we're doing 64-bit arithmetic.)
Example: on a 32-bit machine, the following produces a ~4KB bloom filter of
2^10
32-bit words, using 3 bits per element:
do key read <$ getEnv THE_SECRET_KEY Bloom.new key 3 10
data BloomFilter a Source #
A mutable bloom filter representing a set of Hashable
values of type a
.
A bloom filter is a set-like probabilistic hash-based data structure.
Elements can be insert
-ed and lookup
-ed again. The bloom filter
takes a constant amount of memory regardless of the size and number of
elements inserted, however as the bloom filter "grows" the likelihood of
false-positives being returned by lookup
increases. fpr
can be used to
estimate the expected false-positive rate.
BloomFilter | |
|
newtype BloomFilterException Source #
Exceptions that may be thrown by operations in this library.
lookup :: Hashable a => BloomFilter a -> a -> IO Bool Source #
O(size_of_element). Look up the value in the bloom filter, returning
True
if the element is possibly in the set, and False
if the element is
certainly not in the set.
The likelihood that this returns True
on an element that was not
previously insert
-ed depends on the parameters the filter was created
with, and the number of elements already inserted. The fpr
function can
help you estimate this.
:: BloomFilter a | Source, left unmodified. |
-> BloomFilter a | Target, receiving elements from source. |
-> IO () |
O(l_src+l_target). Write all elements in the first bloom filter into the second. This operation is lossless; ignoring writes to the source bloom filter that happen during this operation (see below), the target bloom filter will be identical to the filter produced had the elements been inserted into the target originally.
The source and target must have been created with the same key and
k
-value. In addition the target must not be larger (i.e. the l
-value)
than the source, and they must both use 128/64 bit hashes. This throws a
BloomFilterException
when those constraints are not met.
This operation is not linearizable with respect to insert
-type operations;
elements being written to the source bloomfilter during this operation may
or may not make it into the target "at random".
:: BloomFilter a | Source, left unmodified. |
-> BloomFilter a | Target, receiving elements from source. |
-> IO () |
O(l_src+l_target). Make target
the intersection of the source and
target sets. This operation is "lossy" in that the false positive ratio of
target after the operation may be higher than if the elements forming the
intersection had been insert
-ed directly into target.
The constraints and comments re. linearizability in unionInto
also apply
here.
clone :: BloomFilter a -> IO (BloomFilter a) Source #
Create a copy of the input BloomFilter
.
This operation is not linearizable with respect to insert
-type operations;
elements being written to the source bloomfilter during this operation may
or may not make it into the target "at random".
A 128-bit secret key. This should be generated randomly and must be kept secret.
:: Int |
|
-> Int |
|
-> Int |
|
-> Int |
|
-> Double |
An estimate of the false-positive rate for a bloom-1 filter. For a filter
with the provided parameters and having n
elements, the value returned
here is the percentage, over the course of many queries for elements not in
the filter, of those queries which would be expected to return an incorrect
True
result.
This function is slow but the complexity is bounded and can accept inputs of any size.
serialize :: StableHashable a => BloomFilter a -> IO ByteString Source #
Serialize a bloom filter to a strict ByteString
, which can be
deserialize
-ed once again. Only a hash of the SipKey
is stored in the
serialized format.
This operation is not linearizable with respect to insert
-type operations;
elements being written to the source bloomfilter during this operation may
or may not make it into the serialized ByteString
"at random".
unsafeSerialize :: StableHashable a => BloomFilter a -> IO ByteString Source #
Serialize a bloom filter to a strict ByteString
, which can be
deserialize
-ed once again. Only a hash of the SipKey
is stored in the
serialized format. This operation is very fast and does no copying.
This is unsafe in that the source BloomFilter
must not be modified after
this operation, otherwise the ByteString will change, breaking referential
transparency. Use serialize
if uncertain.
deserialize :: StableHashable a => SipKey -> ByteString -> IO (BloomFilter a) Source #
Deserialize a BloomFilter
from a ByteString
created with serialize
or unsafeSerialize
. The key that was used to create the bloom filter is
not stored for security, and must be provided here. However if the key
provided does not match the key it was originally created with, a
BloomFilterException
will be thrown.
deserializeByteArray :: forall a. StableHashable a => SipKey -> MutableByteArray RealWorld -> IO (BloomFilter a) Source #
A low-level deserialization routine. This is very fast, and does no copying.