# cuckoo-filter [![Hackage](https://img.shields.io/badge/Hackage-0.1.0.1-blue.svg)](https://hackage.haskell.org/package/cuckoo-filter)[![Build Status](https://travis-ci.org/ChrisCoffey/cuckoo-filter.svg?branch=master)](https://travis-ci.org/ChrisCoffey/cuckoo-filter) Cuckoo filters are a probabilistic data structure used to answer questions like "Have I already seen this user" or "Is this word in the English language?". They're _probabilistic_ because each membership operation has a false positive probability. It guarnatees that there will never be a false negative, but may have a low chance of false positives. Bloom filters are the cannonical probabilistic filter structure, and cuckoo filters are a simlar but different tool. As a bloom filter's load factor increases, the chance of false positive trends towards 100%, but the inserts will never fail. On the other hand, a Cuckoo filter retains a relatively stable false positive probability under load, but as load approahes 95% inserts will begin to fail. In either case you probably want to resize your filter... This implementation has the following properties: - Buckets of 4 elements - 8 bit fingerprints - Cycle termination during item kicking occurs after (0.1 * size) buckets have been checked. - Size may be any non-zero natural number (not limited to powers of 2) For more details about how Cuckoo filters work, I recommend you read Fan et. al.'s 2016 paper https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf. ### Usage Cuckoo filters support three operations: `insert`, `member`, and `delete`. See the [haddocks](https://hackage.haskell.org/package/cuckoo-filter) for details. ### Performance As you'll find in the criterion results, the pure version of the filter can handle ~1.6 million insertions/s. From memory profiles, the vast majority of the memory is taken up by the underlying implementation of `Filter`, so this is an obvious area for improvement. The current implementation avoids pre-allocating memory for the filter, so the heap usage will incrase linearly with `insert` calls. This obviously helps keep heap usage low for sparse filters, but also means inserts are slower than they would be in a mutable implementation. ### TODO - [ ] Benchmark against a Bloom filter implementation - [ ] Introduce a mutable version of `Filter` and a typeclass for the storage interactions