hashabler-0.1.0.1: Principled, cross-platform & extensible hashing of types, including an implementation of the FNV-1a algorithm.

Safe HaskellNone
LanguageHaskell2010

Data.Hashabler

Contents

Synopsis

Documentation

The core of this library consists of

  • the Hashable class which defines how hashable chunks of bytes are delivered to a hash function; new instances can be defined to support the hashing of new datatypes using an existing algorithm
  • the Hash class which implements a particular hashing algorithm, consuming bytes delivered in hash; new instances can be defined to support hashing existing Hashable types with a new algorithm.

Currently we implement only the 32 and 64-bit variations of the FNV-1a non-cryptographic hashing algorithm (hashFNV32 and hashFNV64), which have good hashing properties and are easy to implement in different languages and on different platforms.

Please see the project description for more information.

class Hashable a where Source

A class of types that can be converted into a hash value. We expect all instances to display "good" hashing properties (wrt avalanche, bit independence, etc.) when passed to a "good" Hash function.

We try to ensure that bytes are extracted from values in a way that is portable across architectures (where possible), and straightforward to replicate on other platforms and in other languages. Exceptions are NOTE-ed in instance docs.

See the section "Defining Hashable instances" for details of what we expect from instances.

Methods

hash :: Hash h => h -> a -> h Source

Add the bytes from the second argument into the hash, producing a new hash value. This is essentially a left fold of the methods of Hash over individual bytes extracted from a.

For some instances of Hash, this method might be a complete hashing algorithm, or might comprise the core of a hashing algorithm (perhaps with some final mixing), or might do something completely apart from hashing (e.g. simply cons bytes into a list for debugging).

Instances

Hashable Bool
hash h = hash h . \b-> if b then (1::Word8) else 0
Hashable Char

Hash a Char as big endian UTF-16. Note that Char permits values in the reserved unicode range U+D800 to U+DFFF; these Char values are added to the hash just as if they were valid 16-bit characters.

Hashable Double

Hash a Double as IEEE 754 double-precision format bytes. This is terribly slow; direct complaints to http://hackage.haskell.org/trac/ghc/ticket/4092

Hashable Float

Hash a Float as IEEE 754 single-precision format bytes. This is terribly slow; direct complaints to http://hackage.haskell.org/trac/ghc/ticket/4092

Hashable Int

NOTE: Int has platform-dependent size. When hashing on 64-bit machines if the Int value to be hashed falls in the 32-bit Int range, we first cast it to an Int32. This should help ensure that programs that are correct across architectures will also produce the same hash values.

Hashable Int8 
Hashable Int16 
Hashable Int32 
Hashable Int64 
Hashable Integer

Arbitrary-precision integers are hashed as follows: the magnitude is represented with 32-bit chunks (at least one, for zero; but no more than necessary), then bytes are added to the hash from most to least significant (including all initial padding 0s). Finally mixConstructor is called on the resulting hash value, with (1::Word8) if the Integer was negative, otherwise with 0.

Hashable Ordering 
Hashable Word

NOTE: Word has platform-dependent size. When hashing on 64-bit machines if the Word value to be hashed falls in the 32-bit Word range, we first cast it to a Word32. This should help ensure that programs that are correct across architectures will also produce the same hash values.

Hashable Word8 
Hashable Word16 
Hashable Word32 
Hashable Word64 
Hashable ()
hash = const . mixConstructor 0
Hashable Unique 
Hashable Version

The (now deprecated) versionTags field is ignored, and we follow the Eq instance which does not ignore trailing zeros.

Hashable ThreadId

NOTE: no promise of consistency across runs or platforms.

Hashable TypeRep

NOTE: no promise of consistency across platforms or GHC versions.

Hashable ByteString

Lazy ByteString

Hashable ShortByteString

Exposed only in bytestring >= v0.10.4

Hashable ByteString

Strict ByteString

Hashable ByteArray

Here we hash each byte of the array in turn. If using this to hash some data stored internally as a ByteArray#, be aware that depending on the size and alignment requirements of that data, as well as the endianness of your machine, this might result in different hash values across different architectures.

Hashable Text

Lazy Text, hashed as big endian UTF-16.

Hashable Text

Strict Text, hashed as big endian UTF-16.

Hashable a => Hashable [a] 
(Integral a, Hashable a) => Hashable (Ratio a)
hash s a = s `hash` numerator a `hash` denominator a
Hashable (StableName a)

NOTE: No promise of stability across runs or platforms. Implemented via hashStableName.

Hashable a => Hashable (Maybe a) 
(Hashable a, Hashable b) => Hashable (Either a b) 
(Hashable a1, Hashable a2) => Hashable (a1, a2) 
(Hashable a1, Hashable a2, Hashable a3) => Hashable (a1, a2, a3) 
(Hashable a1, Hashable a2, Hashable a3, Hashable a4) => Hashable (a1, a2, a3, a4) 
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5) => Hashable (a1, a2, a3, a4, a5) 
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6) => Hashable (a1, a2, a3, a4, a5, a6) 
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6, Hashable a7) => Hashable (a1, a2, a3, a4, a5, a6, a7) 
(Hashable a1, Hashable a2, Hashable a3, Hashable a4, Hashable a5, Hashable a6, Hashable a7, Hashable a8) => Hashable (a1, a2, a3, a4, a5, a6, a7, a8) 

class Eq h => Hash h where Source

A class for hash functions which take a running hash value and incrementally mix in bytes (or chunks of bytes). Bytes are fed to these methods in our Hashable instances, which promise to call these methods in a platform-independent way.

Instances of Hash only need to define mix8, but may additional handle mix-ing in larger word chunks for performance reasons. For instance a hash function which operates on four bytes at a time might make use of mix32, and perhaps in mix8 pad with three additional 0s.

Minimal complete definition

mix8

Methods

mix8 :: h -> Word8 -> h Source

Hash in one byte.

mix16 :: h -> Word16 -> h Source

Hash in a 2-byte word. Defaults to mix8 on bytes from most to least significant.

mix32 :: h -> Word32 -> h Source

Hash in a 4-byte word. Defaults to mix8 on bytes from most to least significant.

Instances

Hash FNV64
mix8 (FNV64 h64) b = FNV64 $ (h64 ``xor`` fromIntegral b) * fnvPrime64
Hash FNV32
mix8 (FNV32 h32) b = FNV32 $ (h32 ``xor`` fromIntegral b) * fnvPrime32

Hashing with the FNV-1a algorithm

newtype FNV32 Source

The FNV-1a hash algorithm. See http://www.isthe.com/chongo/tech/comp/fnv/

Constructors

FNV32 

Fields

fnv32 :: Word32
 

Instances

Eq FNV32 
Ord FNV32 
Read FNV32 
Show FNV32 
Hash FNV32
mix8 (FNV32 h32) b = FNV32 $ (h32 ``xor`` fromIntegral b) * fnvPrime32

hashFNV32 :: Hashable a => a -> FNV32 Source

Hash a value using the standard spec-prescribed 32-bit seed value. For relevant instances of primitive types, we expect this to produce values following the FNV1a spec.

  hashFNV32 = hash fnvOffsetBasis32

newtype FNV64 Source

Constructors

FNV64 

Fields

fnv64 :: Word64
 

Instances

Eq FNV64 
Ord FNV64 
Read FNV64 
Show FNV64 
Hash FNV64
mix8 (FNV64 h64) b = FNV64 $ (h64 ``xor`` fromIntegral b) * fnvPrime64

hashFNV64 :: Hashable a => a -> FNV64 Source

Hash a value using the standard spec-prescribed 64-bit seed value. For relevant instances of primitive types, we expect this to produce values following the FNV1a spec.

This may be slow on 32-bit machines.

  hashFNV64 = hash fnvOffsetBasis64

Internals

FNV Primes

Standard seed values

The arbitrary initial seed values for different output hash sizes. These values are part of the spec, but there is nothing special about them; supposedly, in terms of hash quality, any non-zero value seed should be fine passed to hash:

Creating Hash and Hashable instances

mixConstructor Source

Arguments

:: Hash h 
=> Word8

Constructor number. We recommend starting from 0 and incrementing.

-> h

Hash value to mix our byte into

-> h

New hash value

Defining principled Hashable instances

Special care needs to be taken when defining instances of Hashable for your own types, especially for recursive types and types with multiple constructors. First instances need to ensure that /distinct values produce distinct hash values. Here's an example of a bad/ implementation for Maybe:

instance (Hashable a)=> Hashable (Maybe a) where              -- BAD!
    hash h (Just a) = h `hash` a          -- BAD!
    hash h Nothing  = h `hash` (1::Word8) -- BAD!

Here Just (1::Word8) hashes to the same value as Nothing.

Second and more tricky, instances should not permit a function f :: a -> (a,a) such that x hash y == x hash y1 hash y2 where (y1,y2) = f y... or something. The idea is we want to avoid the following kinds of collisions:

hash [Just 1, Nothing] == hash [Just 1]     -- BAD!
hash ([1,2], [3])      == hash ([1], [2,3]  -- BAD!)

Maybe what we mean is that where a is a Monoid, we expect replacing mappend with the hash operation to always yield different values. This needs clarifying; please help.

Here are a few rules of thumb which should result in principled instances for your own types (This is a work-in-progress; please help):

  • If all values of a type have a static structure, i.e. the arrangement and number of child parts to be hashed is knowable from the type, then one may simply hash each child element of the type in turn. This is the case for product types like tuples (where the arity is reflected in the type), or primitive numeric values composed of a static number of bits.

Otherwise if the type has variable structure, e.g. if it has multiple constructors or is an array type...

  • Every possible value of a type should inject at least one byte of entropy apart from any recursive calls to child elements; we can ensure this is the case by hashing an initial or final distinct byte for each distinct constructor of our type

To ensure hashing remains consistent across platforms, instances should not compile-time-conditionally call different mix-family Hash functions. This rule doesn't matter for instances like FNV32 which mix in data one byte at a time, but other Hash instances may operate on multiple bytes at a time, perhaps using padding bytes, so this becomes important.

A final important note: we're not concerned with collisions between values of different types; in fact in many cases "equivalent" values of different types intentionally hash to the same value. This also means instances cannot rely on the hashing of child elements being uncorrelated. That might be one interpretation of the mistake in our faulty Maybe instance above