A basic open-addressing hash table using linear probing. Use this hash table if you...
- want the fastest possible lookups, and very fast inserts.
- don't care about wasting a little bit of memory to get it.
- don't care that a table resize might pause for a long time to rehash all of the key-value mappings.
- have a workload which is not heavy with deletes; deletes clutter the table with deleted markers and force the table to be completely rehashed fairly often.
Of the hash tables in this collection, this hash table has the best insert and lookup performance, with the following caveats.
This table is not especially memory-efficient; firstly, the table has a maximum load factor of 0.83 and will be resized if load exceeds this value. Secondly, to improve insert and lookup performance, we store the hash code for each key in the table.
Each hash table entry requires three words, two for the pointers to the key and
value and one for the hash code. We don't count key and value pointers as
overhead, because they have to be there -- so the overhead for a full slot is
one word -- but empty slots in the hash table count for a full three words of
m as the number of slots in the table and
n as the number
of key value mappings. If the load factor is
k=n/m, the amount of space
w(n) = 1*n + 3(m-n)
w(n) = n + 3(n/k - n) = n (3/k-2)
k=0.83, the maximum load factor, gives a minimum overhead of 2
words per mapping. If
k=0.5, under normal usage the maximum overhead
situation, then the overhead would be 4 words per mapping.
Space overhead: experimental results
In randomized testing (see
test/compute-overhead/ComputeOverhead.hs in the
source distribution), mean overhead (that is, the number of words needed to
store the key-value mapping over and above the two words necessary for the key
and the value pointers) is approximately 2.29 machine words per key-value
mapping with a standard deviation of about 0.44 words, and 3.14 words per
mapping at the 95th percentile.
If enough elements are inserted into the table to make it exceed the maximum
load factor, the table is resized. A resize involves a complete rehash of all
the elements in the table, which means that any given call to
take O(n) time in the size of the table, with a large constant factor. If a
long pause waiting for the table to resize is unacceptable for your
application, you should choose the included linear hash table instead.
- Knuth, Donald E. The Art of Computer Programming, vol. 3 Sorting and Searching. Addison-Wesley Publishing Company, 1973.
- data HashTable s k v
- new :: ST s (HashTable s k v)
- newSized :: Int -> ST s (HashTable s k v)
- delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s ()
- lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)
- insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()
- mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s ()
- foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a
- computeOverhead :: HashTable s k v -> ST s Double
An open addressing hash table using linear probing.