hashtables-1.2.3.3: Mutable hash tables in the ST monad

Safe HaskellNone
LanguageHaskell2010

Data.HashTable.ST.Cuckoo

Description

A hash table using the cuckoo strategy. (See http://en.wikipedia.org/wiki/Cuckoo_hashing). Use this hash table if you...

  • want the fastest possible inserts, and very fast lookups.
  • are conscious of memory usage; this table has less space overhead than Data.HashTable.ST.Basic or Data.HashTable.ST.Linear.
  • don't care that a table resize might pause for a long time to rehash all of the key-value mappings.

Details:

The basic idea of cuckoo hashing, first introduced by Pagh and Rodler in 2001, is to use d hash functions instead of only one; in this implementation d=2 and the strategy we use is to split up a flat array of slots into k buckets, each cache-line-sized:

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
|x0|x1|x2|x3|x4|x5|x6|x7|y0|y1|y2|y3|y4|y5|y6|y7|z0|z1|z2........|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+----------+
[  ^^^  bucket 0  ^^^  ][  ^^^  bucket 1  ^^^  ]...

There are actually three parallel arrays: one unboxed array of Ints for hash codes, one boxed array for keys, and one boxed array for values. When looking up a key-value mapping, we hash the key using two hash functions and look in both buckets in the hash code array for the key. Each bucket is cache-line sized, with its keys in no particular order. Because the hash code array is unboxed, we can search it for the key using a highly-efficient branchless strategy in C code, using SSE instructions if available.

On insert, if both buckets are full, we knock out a randomly-selected entry from one of the buckets (using a random walk ensures that "key cycles" are broken with maximum probability) and try to repeat the insert procedure. This process may not succeed; if all items have not successfully found a home after some number of tries, we give up and rehash all of the elements into a larger table.

Space overhead: experimental results

The implementation of cuckoo hash given here is almost as fast for lookups as the basic open-addressing hash table using linear probing, and on average is more space-efficient: in randomized testing on my 64-bit machine (see test/compute-overhead/ComputeOverhead.hs in the source distribution), mean overhead is 0.77 machine words per key-value mapping, with a standard deviation of 0.29 words, and 1.23 words per mapping at the 95th percentile.

References:

  • A. Pagh and F. Rodler. Cuckoo hashing. In /Proceedings of the 9th Annual European Symposium on Algorithms/, pp. 121-133, 2001.
Synopsis

Documentation

data HashTable s k v Source #

A cuckoo hash table.

Instances
HashTable HashTable Source # 
Instance details

Defined in Data.HashTable.ST.Cuckoo

Methods

new :: ST s (HashTable s k v) Source #

newSized :: Int -> ST s (HashTable s k v) Source #

mutate :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> (Maybe v, a)) -> ST s a Source #

mutateST :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a Source #

insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s () Source #

delete :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s () Source #

lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v) Source #

foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a Source #

mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s () Source #

lookupIndex :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe Word) Source #

nextByIndex :: HashTable s k v -> Word -> ST s (Maybe (Word, k, v)) Source #

computeOverhead :: HashTable s k v -> ST s Double Source #

Show (HashTable s k v) Source # 
Instance details

Defined in Data.HashTable.ST.Cuckoo

Methods

showsPrec :: Int -> HashTable s k v -> ShowS #

show :: HashTable s k v -> String #

showList :: [HashTable s k v] -> ShowS #

new :: ST s (HashTable s k v) Source #

See the documentation for this function in new.

newSized :: Int -> ST s (HashTable s k v) Source #

See the documentation for this function in newSized.

delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s () Source #

See the documentation for this function in delete.

lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v) Source #

See the documentation for this function in lookup.

insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s () Source #

See the documentation for this function in insert.

mutate :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> (Maybe v, a)) -> ST s a Source #

mutateST :: (Eq k, Hashable k) => HashTable s k v -> k -> (Maybe v -> ST s (Maybe v, a)) -> ST s a Source #

mapM_ :: ((k, v) -> ST s a) -> HashTable s k v -> ST s () Source #

See the documentation for this function in mapM_.

foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a Source #

See the documentation for this function in foldM.

lookupIndex :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s (Maybe Word) Source #

Find index of given key in the hashtable.

nextByIndex :: HashTable s k v -> Word -> ST s (Maybe (Word, k, v)) Source #

Find the next entry in the hashtable starting at the given index.