Safe Haskell | None |
---|---|
Language | Haskell2010 |
- newtype Hash32 a = Hash32 {
- hashWord32 :: Word32
- newtype Hash64 a = Hash64 {
- hashWord64 :: Word64
- data Hash128 a = Hash128 {
- hashWord128_0 :: !Word64
- hashWord128_1 :: !Word64
- data SipKey = SipKey !Word64 !Word64
- siphash64 :: Hashable a => SipKey -> a -> Hash64 a
- siphash64_1_3 :: Hashable a => SipKey -> a -> Hash64 a
- siphash128 :: Hashable a => SipKey -> a -> Hash128 a
- hashFNV32 :: Hashable a => a -> Hash32 a
- hashFNV64 :: Hashable a => a -> Hash64 a
- fnvPrime32 :: Word32
- fnvPrime64 :: Word64
- fnvOffsetBasis32 :: FNV32
- fnvOffsetBasis64 :: FNV64
- class Hashable a where
- mixConstructor :: HashState h => Word8 -> h -> h
- class Hashable a => StableHashable a where
- newtype TypeHash a = TypeHash {}
- mixType :: forall a t. StableHashable a => TypeHash t -> TypeHash (t a)
- typeHashOf :: StableHashable a => a -> TypeHash a
- typeHashOfProxy :: StableHashable a => proxy a -> TypeHash a
- class HashState h where
Documentation
The core of this library consists of
- the
Hashable
class which defines how hashable chunks of bytes are delivered to the data-consuming portion of a hash function; new instances can be defined to support the hashing of new datatypes using an existing algorithm - the
HashState
class which implements the data-consuming portion of a particular hashing algorithm, consuming bytes delivered inhash
; a new instance can be defined to implement a new hashing function that works on existingHashable
types.
We also include implementations for the following hash functions:
hashFNV32
, hashFNV64
, siphash64
, and siphash128
.
Please see the project description for more information, including motivation.
Hash Functions
Hashes of different widths. We tag these hash types with the types of
the data they were produced from so that e.g. we get a sensible Eq
instance.
Hash128 | |
|
Hashing with the SipHash algorithm
SipHash is a fast hashing algorithm with very good mixing properties, designed to be very secure against hash-flooding DOS attacks. SipHash is a good choice whenever your application may be hashing untrusted user data.
A 128-bit secret key. This should be generated randomly and must be kept secret.
siphash64 :: Hashable a => SipKey -> a -> Hash64 a Source #
An implementation of 64-bit siphash-2-4.
This function is fast on 64-bit machines, and provides very good hashing properties and protection against hash flooding attacks.
This uses the "standard" recommended parameters of 2 and 4 rounds,
recommended by the original paper, but siphash64_1_3
may be a faster and
equally secure choice.
siphash64_1_3 :: Hashable a => SipKey -> a -> Hash64 a Source #
An implementation of 64-bit siphash-1-3.
This is somewhat faster than siphash-2-4 (implemented in siphash64
), while
the authors claim it should still offer good protection against known
attacks. This is currently the standard hash function used in the Rust
language.
siphash128 :: Hashable a => SipKey -> a -> Hash128 a Source #
An implementation of 128-bit siphash-2-4.
This function is fast on 64-bit machines, and provides very good hashing properties and protection against hash flooding attacks.
Hashing with the FNV-1a algorithm
The FNV-1a hash (see http://www.isthe.com/chongo/tech/comp/fnv/) is a fast and extremely simple hashing algorithm with fairly good mixing properties. Its simplicity makes it a good choice if you need to implement the same hashing routines on multiple platforms e.g. to verify a hash generated in JS on a web client with a hash stored on your server.
If you are hashing untrusted user data and are concerned with hash flooding attacks, consider SipHash instead; performance is about the same in the current implementation.
hashFNV32 :: Hashable a => a -> Hash32 a Source #
Hash a value using the standard spec-prescribed 32-bit seed value.
hashFNV32 =Hash32
. fnv32 .hash
fnvOffsetBasis32
hashFNV64 :: Hashable a => a -> Hash64 a Source #
Hash a value using the standard spec-prescribed 64-bit seed value. This may be slow on 32-bit machines.
hashFNV64 =Hash64
. fnv64 .hash
fnvOffsetBasis64
FNV-1a Internal Parameters
Magic FNV primes:
fnvPrime32 :: Word32 Source #
fnvPrime64 :: Word64 Source #
The arbitrary initial seed values for different output hash sizes. These
values are part of the spec, but there is nothing special about them;
supposedly, in terms of hash quality, any non-zero value seed should be
fine passed to hash
:
fnvOffsetBasis32 :: FNV32 Source #
fnvOffsetBasis64 :: FNV64 Source #
Hashable types
class Hashable a where Source #
A class of types that can be converted into a hash value. We expect all instances to display "good" hashing properties (wrt avalanche, bit independence, etc.) when passed to an ideal hash function.
We try to ensure that bytes are extracted from values in a way that is
portable across architectures (where possible), and straightforward to
replicate on other platforms and in other languages. Portable instances are
also instances of StableHashable
, and non-portable instances are
NOTE-ed in instance docs here as well.
See the section "Defining Hashable instances" for details of what we expect from instances.
hash :: HashState h => h -> a -> h Source #
Add the bytes from the second argument into the hash, producing a new
hash value. This is essentially a left fold of the methods of
HashState
over individual bytes extracted from a
.
For some instances of HashState
, this method might be a complete
hashing algorithm, or might comprise the core of a hashing algorithm
(perhaps with some final mixing), or might do something completely apart
from hashing (e.g. simply cons bytes into a list for debugging).
Implementations must ensure that, for the same data:
Word16/32/64
arguments passed into the methods ofHashState
, and...- the choice of
mix
function itself...
...are consistent across architectures of different word size and
endianness. For example do not define an instance which conditionally
implements mix64
only on 64-bit architectures.
Hashable Bool Source # | hash h = hash h . \b-> if b then (1::Word8) else 0 |
Hashable Char Source # | Hash a |
Hashable Double Source # | Hash a Double by way of |
Hashable Float Source # | Hash a Float by way of |
Hashable Int Source # | NOTE: |
Hashable Int8 Source # | |
Hashable Int16 Source # | |
Hashable Int32 Source # | |
Hashable Int64 Source # | |
Hashable Integer Source # | Arbitrary-precision integers are hashed as follows: the magnitude is
represented with 32-bit chunks (at least one, for zero; but no more than
necessary), then bytes are added to the hash from most to least significant
(including all initial padding 0s). Finally |
Hashable Ordering Source # | |
Hashable Word Source # | NOTE: |
Hashable Word8 Source # | |
Hashable Word16 Source # | |
Hashable Word32 Source # | |
Hashable Word64 Source # | |
Hashable TypeRep Source # | NOTE: no promise of consistency across platforms or GHC versions. |
Hashable () Source # | hash = const . mixConstructor 0 |
Hashable BigNat Source # | The Exposed only in GHC 7.10. |
Hashable Natural Source # | The Exposed only in GHC 7.10 |
Hashable Void Source # | hash _ _ = absurd Exposed only in GHC 7.10 |
Hashable Version Source # | The (now deprecated) |
Hashable Unique Source # | NOTE: No promise of stability across runs or platforms. Implemented via
|
Hashable ThreadId Source # | NOTE: no promise of consistency across runs or platforms. |
Hashable ShortByteString Source # | Exposed only in bytestring >= v0.10.4 |
Hashable ByteString Source # | Lazy |
Hashable ByteString Source # | Strict |
Hashable ByteArray Source # | Here we hash each byte of the array in turn. If using this to hash some
data stored internally as a |
Hashable Text Source # | Lazy |
Hashable Text Source # | Strict |
Hashable a => Hashable [a] Source # | |
Hashable a => Hashable (Maybe a) Source # | |
(Integral a, Hashable a) => Hashable (Ratio a) Source # | hash s a = s `hash` numerator a `hash` denominator a |
Hashable (StableName a) Source # | NOTE: No promise of stability across runs or platforms. Implemented via
|
(Hashable a, Hashable b) => Hashable (Either a b) Source # | |
(Hashable a1, Hashable a2) => Hashable (a1, a2) Source # | |
(Hashable a1, Hashable a2, Hashable a3) => Hashable (a1, a2, a3) Source # | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4) => Hashable (a1, a2, a3, a4) Source # | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5) => Hashable (a1, a2, a3, a4, a5) Source # | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6) => Hashable (a1, a2, a3, a4, a5, a6) Source # | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6, Hashable a7) => Hashable (a1, a2, a3, a4, a5, a6, a7) Source # | |
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6, Hashable a7, Hashable a8) => Hashable (a1, a2, a3, a4, a5, a6, a7, a8) Source # | |
(Hashable a, Hashable b, Hashable c, Hashable d, Hashable e, Hashable f, Hashable g, Hashable h, Hashable i) => Hashable (a, b, c, d, e, f, g, h, i) Source # | |
(Hashable a, Hashable b, Hashable c, Hashable d, Hashable e, Hashable f, Hashable g, Hashable h, Hashable i, Hashable j) => Hashable (a, b, c, d, e, f, g, h, i, j) Source # | |
(Hashable a, Hashable b, Hashable c, Hashable d, Hashable e, Hashable f, Hashable g, Hashable h, Hashable i, Hashable j, Hashable k) => Hashable (a, b, c, d, e, f, g, h, i, j, k) Source # | |
(Hashable a, Hashable b, Hashable c, Hashable d, Hashable e, Hashable f, Hashable g, Hashable h, Hashable i, Hashable j, Hashable k, Hashable l) => Hashable (a, b, c, d, e, f, g, h, i, j, k, l) Source # | |
(Hashable a, Hashable b, Hashable c, Hashable d, Hashable e, Hashable f, Hashable g, Hashable h, Hashable i, Hashable j, Hashable k, Hashable l, Hashable m) => Hashable (a, b, c, d, e, f, g, h, i, j, k, l, m) Source # | |
(Hashable a, Hashable b, Hashable c, Hashable d, Hashable e, Hashable f, Hashable g, Hashable h, Hashable i, Hashable j, Hashable k, Hashable l, Hashable m, Hashable n) => Hashable (a, b, c, d, e, f, g, h, i, j, k, l, m, n) Source # | |
(Hashable a, Hashable b, Hashable c, Hashable d, Hashable e, Hashable f, Hashable g, Hashable h, Hashable i, Hashable j, Hashable k, Hashable l, Hashable m, Hashable n, Hashable o) => Hashable (a, b, c, d, e, f, g, h, i, j, k, l, m, n, o) Source # | |
Creating your own Hashable instances
When defining Hashable
instances for your own algebraic data types you
should do the following.
For types with a single constructor, simply call hash
on each of the
constructor's children, for instance:
instance (Hashable a, Hashable b, Hashable c) => Hashable (a, b, c) where hash h (a,b,c) = h `hash` a `hash` b `hash` c
And when a type has multiple constructors you should additionally call
mixConstructor
with a different argument for each constructor.
instance (Hashable a, Hashable b) => Hashable (Eithers a b) where hash h (Lefts a0 a1) = mixConstructor 0 (h `hash` a0 `hash` a1) hash h (Rights b0 b1 b2) = mixConstructor 1 (h `hash` b0 `hash` b1 `hash` b2)
In the future we may offer a way to derive instances like this automatically.
:: HashState h | |
=> Word8 | Constructor number. We recommend starting from 0 and incrementing. |
-> h | Hash state value to mix our byte into |
-> h | New hash state |
mixConstructor n h = h `mix8` (0xFF - n)
Stable "hashes" of Types
class Hashable a => StableHashable a where Source #
Types whose hashes can be compared across platforms. This is somewhat like
a limited, but cross-platform Typeable
.
Instances are expected to be universally-unique, and should be generated
randomly. Type parameters can be hashed together using mixType
, like:
instance (StableHashable b) => StableHashable (A b) where typeHash = mixType (TypeHash 530184177609460980)
When Hashable
instances change, the TypeHash
must be changed to a new
random value. This lets us "version" a set of hashes; if we store a
TypeHash
along with a set of hashes in program A, in program B we can
compare the stored value with our own TypeHash
and verify that hashes we
generate in program B can be meaningfully compared.
Note, obviously this doesn't ensure that values were hashed with the same hashing algorithm, and you should come up with your own means to serialize that information if you need to.
A value that uniquely identifies a StableHashable
type. This serves to
both version a type with respect to its Hashable
instance, and distinguish
types from each other (similar to TypeRep
) across program runs, platforms
and library versions.
mixType :: forall a t. StableHashable a => TypeHash t -> TypeHash (t a) Source #
A helper for implementing typeHash
by xor
-ing type parameters with a
new random hash value. E.g.:
instance (StableHashable a, StableHashable b) => StableHashable (a, b) where typeHash = mixType (mixType (TypeHash 12071780118071628513)) __ a new random value for (,) ____ mix in the type hash fora
__ mix in the type hash forb
typeHashOf :: StableHashable a => a -> TypeHash a Source #
typeHashOfProxy :: StableHashable a => proxy a -> TypeHash a Source #
Implementing new hash functions
class HashState h where Source #
A class for defining how a hash function consumes input data. Bytes are
fed to these methods in our Hashable
instances, which promise to call
these methods in a platform-independent way.
Instances of HashState
only need to define mix8
, but may additionally
handle mix
-ing in larger word chunks for performance reasons. For instance
a hash function which operates on four bytes at a time might make use of
mix32
, and perhaps in mix8
pad with three additional 0s.
Endianness is normalized in Hashable
instances, so these mix methods can
expect to receive identical words across platforms.
mix8 :: h -> Word8 -> h Source #
Mix in one byte.
mix16 :: h -> Word16 -> h Source #
Mix in a 2-byte word. Defaults to two mix8
on bytes from most to
least significant.
mix32 :: h -> Word32 -> h Source #
Mix in a 4-byte word. Defaults to four mix8
on bytes from most to
least significant.
mix64 :: h -> Word64 -> h Source #
Mix in a 8-byte word. Defaults to two mix32
on 32-byte words from
most to least significant.
Detailed discussion of principled Hashable instances
This is a work-in-progress, and purely IYI.
Special care needs to be taken when defining instances of Hashable for your
own types, especially for recursive types and types with multiple
constructors. First instances need to ensure that
distinct values produce distinct hash values. Here's an example of a bad
implementation for Maybe
:
instance (Hashable a)=> Hashable (Maybe a) where -- BAD! hash h (Just a) = h `hash` a -- BAD! hash h Nothing = h `hash` (1::Word8) -- BAD!
Here Just (1::Word8)
hashes to the same value as Nothing
.
Second and more tricky, instances should not permit a function
f :: a -> (a,a)
such that
x
... or something.
The idea is we want to avoid the following kinds of collisions:hash
y == x hash
y1 hash
y2 where (y1,y2) = f y
hash [Just 1, Nothing] == hash [Just 1] -- BAD! hash ([1,2], [3]) == hash ([1], [2,3]) -- BAD!
Maybe what we mean is that where a
is a Monoid
, we expect replacing
mappend
with the hash operation to always yield different values. This
needs clarifying; please help.
Here are a few rules of thumb which should result in principled instances for your own types (This is a work-in-progress; please help):
- If all values of a type have a static structure, i.e. the arrangement and number of child parts to be hashed is knowable from the type, then one may simply hash each child element of the type in turn. This is the case for product types like tuples (where the arity is reflected in the type), or primitive numeric values composed of a static number of bits.
Otherwise if the type has variable structure, e.g. if it has multiple constructors or is an array type...
- Every possible value of a type should inject at least one byte of entropy apart from any recursive calls to child elements; we can ensure this is the case by hashing an initial or final distinct byte for each distinct constructor of our type
To ensure hashing remains consistent across platforms, instances should not
compile-time-conditionally call different mix
-family HashState
functions.
This rule doesn't matter for instances like FNV32
which mix in data one byte
at a time, but other HashState
instances may operate on multiple bytes at a time,
perhaps using padding bytes, so this becomes important.
A final important note: we're not concerned with collisions between values of
different types; in fact in many cases "equivalent" values of different
types intentionally hash to the same value. This also means instances cannot
rely on the hashing of child elements being uncorrelated. That might be one
interpretation of the mistake in our faulty Maybe
instance above