Safe Haskell | None |
---|

An implementation of linear hash tables. (See http://en.wikipedia.org/wiki/Linear_hashing). Use this hash table if you...

- care a lot about fitting your data set into memory; of the hash tables included in this collection, this one has the lowest space overhead
- don't care that inserts and lookups are slower than the other hash table
implementations in this collection (this one is slightly faster than
`Data.HashTable`

from the base library in most cases) - have a soft real-time or interactive application for which the risk of introducing a long pause on insert while all of the keys are rehashed is unacceptable.

*Details:*

Linear hashing allows for the expansion of the hash table one slot at a time, by moving a "split" pointer across an array of pointers to buckets. The number of buckets is always a power of two, and the bucket to look in is defined as:

bucket(level,key) = hash(key) mod (2^level)

The "split pointer" controls the expansion of the hash table. If the hash
table is at level `k`

(i.e. `2^k`

buckets have been allocated), we first
calculate `b=bucket(level-1,key)`

. If `b < splitptr`

, the destination bucket is
calculated as `b'=bucket(level,key)`

, otherwise the original value `b`

is used.

The split pointer is incremented once an insert causes some bucket to become fuller than some predetermined threshold; the bucket at the split pointer (*not* the bucket which triggered the split!) is then rehashed, and half of its keys can be expected to be rehashed into the upper half of the table.

When the split pointer reaches the middle of the bucket array, the size of the bucket array is doubled, the level increases, and the split pointer is reset to zero.

Linear hashing, although not quite as fast for inserts or lookups as the implementation of linear probing included in this package, is well suited for interactive applications because it has much better worst case behaviour on inserts. Other hash table implementations can suffer from long pauses, because it is occasionally necessary to rehash all of the keys when the table grows. Linear hashing, on the other hand, only ever rehashes a bounded (effectively constant) number of keys when an insert forces a bucket split.

*Space overhead: experimental results*

In randomized testing (see `test/compute-overhead/ComputeOverhead.hs`

in the
source distribution), mean overhead is approximately 1.51 machine words per
key-value mapping with a very low standard deviation of about 0.06 words, 1.60
words per mapping at the 95th percentile.

*Unsafe tricks*

Then the `unsafe-tricks`

flag is on when this package is built (and it is on by
default), we use some unsafe tricks (namely `unsafeCoerce#`

and
`reallyUnsafePtrEquality#`

) to save indirections in this table. These
techniques rely on assumptions about the behaviour of the GHC runtime system
and, although they've been tested and should be safe under normal conditions,
are slightly dangerous. Caveat emptor. In particular, these techniques are
incompatible with HPC code coverage reports.

References:

- W. Litwin. Linear hashing: a new tool for file and table addressing. In
*Proc. 6th International Conference on Very Large Data Bases, Volume 6*, pp. 212-223, 1980. - P-A. Larson. Dynamic hash tables.
*Communications of the ACM*31: 446-457, 1988.

- data HashTable s k v
- new :: ST s (HashTable s k v)
- newSized :: Int -> ST s (HashTable s k v)
- delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s ()
- lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)
- insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()
- mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s ()
- foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s a
- computeOverhead :: HashTable s k v -> ST s Double

# Documentation

new :: ST s (HashTable s k v)Source

See the documentation for this function in Data.HashTable.Class.

newSized :: Int -> ST s (HashTable s k v)Source

See the documentation for this function in Data.HashTable.Class.

delete :: (Hashable k, Eq k) => HashTable s k v -> k -> ST s ()Source

See the documentation for this function in Data.HashTable.Class.

lookup :: (Eq k, Hashable k) => HashTable s k v -> k -> ST s (Maybe v)Source

See the documentation for this function in Data.HashTable.Class.

insert :: (Eq k, Hashable k) => HashTable s k v -> k -> v -> ST s ()Source

See the documentation for this function in Data.HashTable.Class.

mapM_ :: ((k, v) -> ST s b) -> HashTable s k v -> ST s ()Source

See the documentation for this function in Data.HashTable.Class.

foldM :: (a -> (k, v) -> ST s a) -> a -> HashTable s k v -> ST s aSource

See the documentation for this function in Data.HashTable.Class.

computeOverhead :: HashTable s k v -> ST s DoubleSource

See the documentation for this function in Data.HashTable.Class.