Portability | portable |
---|---|
Stability | provisional |
Maintainer | libraries@haskell.org |
Safe Haskell | Trustworthy |
An implementation of extensible hash tables, as described in
Per-Ake Larson, Dynamic Hash Tables, CACM 31(4), April 1988,
pp. 446--457. The implementation is also derived from the one
in GHC's runtime system (ghc/rts/Hash.{c,h}
).
- data HashTable key val
- new :: (key -> key -> Bool) -> (key -> Int32) -> IO (HashTable key val)
- newHint :: (key -> key -> Bool) -> (key -> Int32) -> Int -> IO (HashTable key val)
- insert :: HashTable key val -> key -> val -> IO ()
- delete :: HashTable key val -> key -> IO ()
- lookup :: HashTable key val -> key -> IO (Maybe val)
- update :: HashTable key val -> key -> val -> IO Bool
- fromList :: Eq key => (key -> Int32) -> [(key, val)] -> IO (HashTable key val)
- toList :: HashTable key val -> IO [(key, val)]
- hashInt :: Int -> Int32
- hashString :: String -> Int32
- prime :: Int32
- longestChain :: HashTable key val -> IO [(key, val)]
Basic hash table operations
:: (key -> key -> Bool) |
|
-> (key -> Int32) |
|
-> IO (HashTable key val) | Returns: an empty hash table |
Creates a new hash table. The following property should hold for the eq
and hash
functions passed to new
:
eq A B => hash A == hash B
:: (key -> key -> Bool) |
|
-> (key -> Int32) |
|
-> Int |
|
-> IO (HashTable key val) | Returns: an empty hash table |
Creates a new hash table with the given minimum size.
insert :: HashTable key val -> key -> val -> IO ()Source
Inserts a key/value mapping into the hash table.
Note that insert
doesn't remove the old entry from the table -
the behaviour is like an association list, where lookup
returns
the most-recently-inserted mapping for a key in the table. The
reason for this is to keep insert
as efficient as possible. If
you need to update a mapping, then we provide update
.
lookup :: HashTable key val -> key -> IO (Maybe val)Source
Looks up the value of a key in the hash table.
update :: HashTable key val -> key -> val -> IO BoolSource
Updates an entry in the hash table, returning True
if there was
already an entry for this key, or False
otherwise. After update
there will always be exactly one entry for the given key in the table.
insert
is more efficient than update
if you don't care about
multiple entries, or you know for sure that multiple entries can't
occur. However, update
is more efficient than delete
followed
by insert
.
Converting to and from lists
fromList :: Eq key => (key -> Int32) -> [(key, val)] -> IO (HashTable key val)Source
Convert a list of key/value pairs into a hash table. Equality on keys is taken from the Eq instance for the key type.
toList :: HashTable key val -> IO [(key, val)]Source
Converts a hash table to a list of key/value pairs.
Hash functions
This implementation of hash tables uses the low-order n bits of the hash value for a key, where n varies as the hash table grows. A good hash function therefore will give an even distribution regardless of n.
If your keyspace is integrals such that the low-order bits between
keys are highly variable, then you could get away with using fromIntegral
as the hash function.
We provide some sample hash functions for Int
and String
below.
A sample (and useful) hash function for Int and Int32, implemented by extracting the uppermost 32 bits of the 64-bit result of multiplying by a 33-bit constant. The constant is from Knuth, derived from the golden ratio:
golden = round ((sqrt 5 - 1) * 2^32)
We get good key uniqueness on small inputs (a problem with previous versions): (length $ group $ sort $ map hashInt [-32767..65536]) == 65536 + 32768
hashString :: String -> Int32Source
A sample hash function for Strings. We keep multiplying by the golden ratio and adding. The implementation is:
hashString = foldl' f golden where f m c = fromIntegral (ord c) * magic + hashInt32 m magic = 0xdeadbeef
Where hashInt32 works just as hashInt shown above.
Knuth argues that repeated multiplication by the golden ratio will minimize gaps in the hash space, and thus it's a good choice for combining together multiple keys to form one.
Here we know that individual characters c are often small, and this produces frequent collisions if we use ord c alone. A particular problem are the shorter low ASCII and ISO-8859-1 character strings. We pre-multiply by a magic twiddle factor to obtain a good distribution. In fact, given the following test:
testp :: Int32 -> Int testp k = (n - ) . length . group . sort . map hs . take n $ ls where ls = [] : [c : l | l <- ls, c <- ['\0'..'\xff']] hs = foldl' f golden f m c = fromIntegral (ord c) * k + hashInt32 m n = 100000
We discover that testp magic = 0.
Diagnostics
longestChain :: HashTable key val -> IO [(key, val)]Source
This function is useful for determining whether your hash function is working well for your data set. It returns the longest chain of key/value pairs in the hash table for which all the keys hash to the same bucket. If this chain is particularly long (say, longer than 14 elements or so), then it might be a good idea to try a different hash function.