Copyright | (c) Chris Coffey 2018 |
---|---|
License | MIT |
Maintainer | chris@foldl.io |
Stability | experimental |
Safe Haskell | None |
Language | Haskell2010 |
Cuckoo filters are an alternative data structure to Bloom filters. They use a different approach to hashing items into the filter, which provides different behavior under load.
Inserting an item in to a Bloom filter always succeeds, although as the load factor on the filter increases the false positive probability trends towards %100. Cuckoo filters on the other hand hold the false positive probability constant under load, but will begin to fail inserts.
Cuckoo filters also support deletion natively, which allows reclaiming some used space from the filter to ward off insertion failures.
For more details see: Fan, . Cuckoo Filter: Practically Better Than Bloom. Retrieved August 22, 2016, from https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf
Synopsis
- data Size
- makeSize :: Natural -> Maybe Size
- data Filter a
- data MFilter a
- initialize :: CuckooFilter filt m => Size -> m (filt a)
- insert :: (Hashable a, Monad m, CuckooFilter filt m) => filt a -> a -> m (Maybe (filt a))
- member :: (Hashable a, Monad m, CuckooFilter filt m) => a -> filt a -> m Bool
- delete :: (Hashable a, Monad m, CuckooFilter filt m) => filt a -> a -> m (filt a)
The Cuckoo Filter
A non-zero natural number. Generally this is a power of two, although there's no hard requirement for that given the current implementation.
A Cuckoo Filter with a fixed size. The current implementation uses 8 bit fingerprints and 4 element buckets.
Instances
Monad m => CuckooFilter Filter m Source # | |
Defined in Data.CuckooFilter.Pure | |
Eq (Filter a) Source # | |
Show (Filter a) Source # | |
Generic (Filter a) Source # | |
ToJSON (Filter a) Source # | |
Defined in Data.CuckooFilter.Pure | |
FromJSON (Filter a) Source # | |
Serialize (Filter a) Source # | |
type Rep (Filter a) Source # | |
Defined in Data.CuckooFilter.Pure type Rep (Filter a) = D1 (MetaData "Filter" "Data.CuckooFilter.Pure" "cuckoo-filter-0.2.0.1-De2yBz7IqJ1IdBJJH37Xnd" False) (C1 (MetaCons "F" PrefixI True) (S1 (MetaSel (Just "buckets") NoSourceUnpackedness NoSourceStrictness DecidedLazy) (Rec0 (IntMap Bucket)) :*: (S1 (MetaSel (Just "numBuckets") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 Natural) :*: S1 (MetaSel (Just "size") NoSourceUnpackedness SourceStrict DecidedStrict) (Rec0 Size)))) |
initialize :: CuckooFilter filt m => Size -> m (filt a) Source #
Create a new cuckoo filter of the specified size
Working with a Cuckoo Filter
:: (Hashable a, Monad m, CuckooFilter filt m) | |
=> filt a | Current filter state |
-> a | Item to hash and store in the filter |
-> m (Maybe (filt a)) |
In exchange for the stable false-positive probability, insertion into a cuckoo filter may fail as the load factor increases.
Amoritized O(1)
Note, because of how cuckoo hashing works, inserts will fail when there's a set of items s that hash to the same fingerprint and share either IndexA or IndexB with probability (2numBuckets * 1256)^ (|s|-1). Alternatively, inserting the same item 2b+1 times will trigger the failure as well.
:: (Hashable a, Monad m, CuckooFilter filt m) | |
=> a | Check if this element is in the filter |
-> filt a | The filter |
-> m Bool |
Checks whether a given item is within the filter.
O(1)
delete :: (Hashable a, Monad m, CuckooFilter filt m) => filt a -> a -> m (filt a) Source #
Deletes a single occurance of a
from the Filter
. It first checks for a value
in the primary index, then in the secondary index.
Deleting an element not in the Cuckoo Filter is a noop and returns the filter unchanged.
O(1)