Safe Haskell | None |
---|---|
Language | Haskell2010 |
- class Hashable a where
- class Eq h => Hash h where
- newtype FNV32 = FNV32 {}
- hashFNV32 :: Hashable a => a -> FNV32
- newtype FNV64 = FNV64 {}
- hashFNV64 :: Hashable a => a -> FNV64
- fnvPrime32 :: Word32
- fnvPrime64 :: Word64
- fnvOffsetBasis32 :: FNV32
- fnvOffsetBasis64 :: FNV64
- mixConstructor :: Hash h => Word8 -> h -> h
Documentation
The core of this library consists of
- the
Hashable
class which defines how hashable chunks of bytes are delivered to a hash function; new instances can be defined to support the hashing of new datatypes using an existing algorithm - the
Hash
class which implements a particular hashing algorithm, consuming bytes delivered inhash
; new instances can be defined to support hashing existingHashable
types with a new algorithm.
Currently we implement only the 32 and 64-bit variations of the
FNV-1a non-cryptographic hashing algorithm
(hashFNV32
and hashFNV64
), which have good hashing properties and are
easy to implement in different languages and on different platforms.
Please see the project description for more information.
A class of types that can be converted into a hash value. We expect all
instances to display "good" hashing properties (wrt avalanche, bit
independence, etc.) when passed to a "good" Hash
function.
We try to ensure that bytes are extracted from values in a way that is portable across architectures (where possible), and straightforward to replicate on other platforms and in other languages. Exceptions are NOTE-ed in instance docs.
See the section "Defining Hashable instances" for details of what we expect from instances.
hash :: Hash h => h -> a -> h Source
Add the bytes from the second argument into the hash, producing a new
hash value. This is essentially a left fold of the methods of Hash
over individual bytes extracted from a
.
For some instances of Hash
, this method might be a complete hashing
algorithm, or might comprise the core of a hashing algorithm (perhaps
with some final mixing), or might do something completely apart from
hashing (e.g. simply cons bytes into a list for debugging).
Hashable Bool | hash h = hash h . \b-> if b then (1::Word8) else 0 |
Hashable Char | Hash a |
Hashable Double | Hash a Double as IEEE 754 double-precision format bytes. This is terribly slow; direct complaints to http://hackage.haskell.org/trac/ghc/ticket/4092 |
Hashable Float | Hash a Float as IEEE 754 single-precision format bytes. This is terribly slow; direct complaints to http://hackage.haskell.org/trac/ghc/ticket/4092 |
Hashable Int | NOTE: |
Hashable Int8 | |
Hashable Int16 | |
Hashable Int32 | |
Hashable Int64 | |
Hashable Integer | Arbitrary-precision integers are hashed as follows: the magnitude is
represented with 32-bit chunks (at least one, for zero; but no more than
necessary), then bytes are added to the hash from most to least significant
(including all initial padding 0s). Finally |
Hashable Ordering | |
Hashable Word | NOTE: |
Hashable Word8 | |
Hashable Word16 | |
Hashable Word32 | |
Hashable Word64 | |
Hashable () | hash = const . mixConstructor 0 |
Hashable Unique | |
Hashable Version | The (now deprecated) |
Hashable ThreadId | NOTE: no promise of consistency across runs or platforms. |
Hashable TypeRep | NOTE: no promise of consistency across platforms or GHC versions. |
Hashable ByteString | Lazy |
Hashable ShortByteString | Exposed only in bytestring >= v0.10.4 |
Hashable ByteString | Strict |
Hashable ByteArray | Here we hash each byte of the array in turn. If using this to hash some
data stored internally as a |
Hashable Text | Lazy |
Hashable Text | Strict |
Hashable a => Hashable [a] | |
(Integral a, Hashable a) => Hashable (Ratio a) | hash s a = s `hash` numerator a `hash` denominator a |
Hashable (StableName a) | NOTE: No promise of stability across runs or platforms. Implemented via
|
Hashable a => Hashable (Maybe a) | |
(Hashable a, Hashable b) => Hashable (Either a b) | |
(Hashable a1, Hashable a2) => Hashable (a1, a2) | |
(Hashable a1, Hashable a2, Hashable a3) => Hashable (a1, a2, a3) | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4) => Hashable (a1, a2, a3, a4) | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5) => Hashable (a1, a2, a3, a4, a5) | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6) => Hashable (a1, a2, a3, a4, a5, a6) | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6, Hashable a7) => Hashable (a1, a2, a3, a4, a5, a6, a7) | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6, Hashable a7, Hashable a8) => Hashable (a1, a2, a3, a4, a5, a6, a7, a8) |
class Eq h => Hash h where Source
A class for hash functions which take a running hash value and
incrementally mix in bytes (or chunks of bytes). Bytes are fed to these
methods in our Hashable
instances, which promise to call these methods in
a platform-independent way.
Instances of Hash
only need to define mix8
, but may additional handle
mix
-ing in larger word chunks for performance reasons. For instance a hash
function which operates on four bytes at a time might make use of mix32
,
and perhaps in mix8
pad with three additional 0s.
Hashing with the FNV-1a algorithm
The FNV-1a hash algorithm. See http://www.isthe.com/chongo/tech/comp/fnv/
hashFNV32 :: Hashable a => a -> FNV32 Source
Hash a value using the standard spec-prescribed 32-bit seed value. For relevant instances of primitive types, we expect this to produce values following the FNV1a spec.
hashFNV32 =hash
fnvOffsetBasis32
hashFNV64 :: Hashable a => a -> FNV64 Source
Hash a value using the standard spec-prescribed 64-bit seed value. For relevant instances of primitive types, we expect this to produce values following the FNV1a spec.
This may be slow on 32-bit machines.
hashFNV64 =hash
fnvOffsetBasis64
Internals
FNV Primes
Standard seed values
The arbitrary initial seed values for different output hash sizes. These
values are part of the spec, but there is nothing special about them;
supposedly, in terms of hash quality, any non-zero value seed should be
fine passed to hash
:
Creating Hash and Hashable instances
Defining principled Hashable instances
Special care needs to be taken when defining instances of Hashable for your
own types, especially for recursive types and types with multiple
constructors. First instances need to ensure that /distinct values produce
distinct hash values. Here's an example of a bad/ implementation for Maybe
:
instance (Hashable a)=> Hashable (Maybe a) where -- BAD! hash h (Just a) = h `hash` a -- BAD! hash h Nothing = h `hash` (1::Word8) -- BAD!
Here Just (1::Word8)
hashes to the same value as Nothing
.
Second and more tricky, instances should not permit a function
f :: a -> (a,a)
such that
x
... or something.
The idea is we want to avoid the following kinds of collisions:hash
y == x hash
y1 hash
y2 where (y1,y2) = f y
hash [Just 1, Nothing] == hash [Just 1] -- BAD! hash ([1,2], [3]) == hash ([1], [2,3] -- BAD!)
Maybe what we mean is that where a
is a Monoid
, we expect replacing
mappend
with the hash operation to always yield different values. This
needs clarifying; please help.
Here are a few rules of thumb which should result in principled instances for your own types (This is a work-in-progress; please help):
- If all values of a type have a static structure, i.e. the arrangement and number of child parts to be hashed is knowable from the type, then one may simply hash each child element of the type in turn. This is the case for product types like tuples (where the arity is reflected in the type), or primitive numeric values composed of a static number of bits.
Otherwise if the type has variable structure, e.g. if it has multiple constructors or is an array type...
- Every possible value of a type should inject at least one byte of entropy apart from any recursive calls to child elements; we can ensure this is the case by hashing an initial or final distinct byte for each distinct constructor of our type
To ensure hashing remains consistent across platforms, instances should not
compile-time-conditionally call different mix
-family Hash
functions.
This rule doesn't matter for instances like FNV32
which mix in data one byte
at a time, but other Hash
instances may operate on multiple bytes at a time,
perhaps using padding bytes, so this becomes important.
A final important note: we're not concerned with collisions between values of
different types; in fact in many cases "equivalent" values of different
types intentionally hash to the same value. This also means instances cannot
rely on the hashing of child elements being uncorrelated. That might be one
interpretation of the mistake in our faulty Maybe
instance above