A hash table using the cuckoo strategy. (See http://en.wikipedia.org/wiki/Cuckoo_hashing). Use this hash table if you...
- want the fastest possible inserts, and very fast lookups.
- are conscious of memory usage; this table has less space overhead than Data.HashTable.ST.Basic, but more than Data.HashTable.ST.Linear.
- don't care that a table resize might pause for a long time to rehash all of the key-value mappings.
The basic idea of cuckoo hashing, first introduced by Pagh and Rodler in 2001,
is to use d hash functions instead of only one; in this implementation d=2
and the strategy we use is to split up a flat array of slots into
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+ |x0|x1|x2|x3|x4|x5|x6|x7|y0|y1|y2|y3|y4|y5|y6|y7|z0|z1|z2........| +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+ [ ^^^ bucket 0 ^^^ ][ ^^^ bucket 1 ^^^ ]...
There are actually three parallel arrays: one unboxed array of
Ints for hash
codes, one boxed array for keys, and one boxed array for values. When looking
up a key-value mapping, we hash the key using two hash functions and look in
both buckets in the hash code array for the key. Each bucket is cache-line
sized, with its keys in no particular order. Because the hash code array is
unboxed, we can search it for the key using a highly-efficient branchless
strategy in C code, using SSE instructions if available.
On insert, if both buckets are full, we knock out a randomly-selected entry from one of the buckets (using a random walk ensures that "key cycles" are broken with maximum probability) and try to repeat the insert procedure. This process may not succeed; if all items have not successfully found a home after some number of tries, we give up and rehash all of the elements into a larger table.
Space overhead: experimental results
The implementation of cuckoo hash given here is almost as fast for lookups as
the basic open-addressing hash table using linear probing, and on average is
more space-efficient: in randomized testing on my 64-bit machine (see
test/compute-overhead/ComputeOverhead.hs in the source distribution), mean
overhead is 1.71 machine words per key-value mapping, with a standard deviation
of 0.30 words, and 2.46 words per mapping at the 95th percentile.
- A. Pagh and F. Rodler. Cuckoo hashing. In /Proceedings of the 9th Annual European Symposium on Algorithms/, pp. 121-133, 2001.
- data HashTable s k v
- new :: ST s (HashTable s k v)
- newSized :: Int -> ST s (HashTable s k v)
- delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s ()
- lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)
- insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()
- mapM_ :: ((k, v) -> ST s a) -> HashTable s k v -> ST s ()
- foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a