hashable-1.2.0.10: A class for types that can be converted to a hash value

Portabilityportable
Stabilityprovisional
Maintainerjohan.tibell@gmail.com
Safe HaskellNone

Data.Hashable

Contents

Description

This module defines a class, Hashable, for types that can be converted to a hash value. This class exists for the benefit of hashing-based data structures. The module provides instances for most standard types. Efficient instances for other types can be generated automatically and effortlessly using the generics support in GHC 7.2 and above.

The easiest way to get started is to use the hash function. Here is an example session with ghci.

 Prelude> import Data.Hashable
 Prelude> hash "foo"
 60853164

Synopsis

Hashing and security

Applications that use hash-based data structures to store input from untrusted users can be susceptible to "hash DoS", a class of denial-of-service attack that uses deliberately chosen colliding inputs to force an application into unexpectedly behaving with quadratic time complexity.

To mitigate the risk from collision attacks, this library provides an environment variable named HASHABLE_SALT that allows the default salt used by the hash function to be chosen at application startup time.

  • In the normal case, the environment variable is not set, and a fixed salt is used that does not vary between runs. (This choice can be made permanent by building this package with the -ffixed-salt flag.)
  • If the value is the string random, the system's cryptographic pseudo-random number generator will be used to supply a salt. While this may offer added security, it can also violate the assumption of some Haskell libraries that expect the results of hash to be stable across application runs. Choose this behaviour with care (and testing)!
  • When the value is an integer (prefixed with 0x for hexadecimal), it will be used as the salt.

If HASHABLE_SALT cannot be parsed, then the first time that a call to hash is made, the application will halt with an informative error message.

(Implementation note: FNV-1, the hash function used for strings, can still be susceptible to collision attacks, even if a salt unknown to the attacker is used. Future versions of the library might improve on this situation.)

Computing hash values

hash :: Hashable a => a -> IntSource

Return a hash value for the argument. Defined in terms of hashWithSalt and a default salt.

(See the "Hashing and security" section of the Data.Hashable documentation for an important note on working safely with untrusted user input.)

class Hashable a whereSource

The class of types that can be converted to a hash value.

Methods

hashWithSalt :: Int -> a -> IntSource

Return a hash value for the argument, using the given salt.

The general contract of hashWithSalt is:

  • If a value is hashed using the same salt during distinct runs of an application, the result must remain the same. (This is necessary to make it possible to store hashes on persistent media.)
  • If two values are equal according to the == method, then applying the hashWithSalt method on each of the two values must produce the same integer result if the same salt is used in each case.
  • It is not required that if two values are unequal according to the == method, then applying the hashWithSalt method on each of the two values must produce distinct integer results. (Every programmer will be aware that producing distinct integer results for unequal values will improve the performance of hashing-based data structures.)

This method can be used to compute different hash values for the same input by providing a different salt in each application of the method. This implies that any instance that defines hashWithSalt must make use of the salt in its implementation.

Avalanche behavior

A good hash function has a 50% probability of flipping every bit of its result in response to a change of just one bit in its input. This property is called avalanche. To be truly general purpose, hash functions must have strong avalanche behavior.

All of the Hashable instances provided by this module have excellent avalanche properties.

Creating new instances

There are two ways to create new instances: by deriving instances automatically using GHC's generic programming support or by writing instances manually.

Generic instances

Beginning with GHC 7.2, the recommended way to make instances of Hashable for most types is to use the compiler's support for automatically generating default instances.

 {-# LANGUAGE DeriveGeneric #-}

 import GHC.Generics (Generic)
 import Data.Hashable

 data Foo a = Foo a String
              deriving (Eq, Generic)

 instance Hashable a => Hashable (Foo a)

 data Colour = Red | Green | Blue
               deriving Generic

 instance Hashable Colour

If you omit a body for the instance declaration, GHC will generate a default instance that correctly and efficiently hashes every constructor and parameter.

Understanding a compiler error

Suppose you intend to use the generic machinery to automatically generate a Hashable instance.

 data Oops = Oops
      -- forgot to add "deriving Generic" here!

 instance Hashable Oops

And imagine that, as in the example above, you forget to add a "deriving Generic" clause to your data type. At compile time, you will get an error message from GHC that begins roughly as follows:

 No instance for (GHashable (Rep Oops))

This error can be confusing, as GHashable is not exported (it is an internal typeclass used by this library's generics machinery). The correct fix is simply to add the missing "deriving Generic".

Writing instances by hand

To maintain high quality hashes, new Hashable instances should be built using existing Hashable instances, combinators, and hash functions.

The functions below can be used when creating new instances of Hashable. For many string-like types the hashWithSalt method can be defined in terms of either hashPtrWithSalt or hashByteArrayWithSalt. Here's how you could implement an instance for the ByteString data type, from the bytestring package:

 import qualified Data.ByteString as B
 import qualified Data.ByteString.Internal as B
 import qualified Data.ByteString.Unsafe as B
 import Data.Hashable
 import Foreign.Ptr (castPtr)

 instance Hashable B.ByteString where
     hashWithSalt salt bs = B.inlinePerformIO $
                            B.unsafeUseAsCStringLen bs $ \(p, len) ->
                            hashPtrWithSalt p (fromIntegral len) salt

Use hashWithSalt to compute a hash from several values, using this recipe:

 data Product a b = P a b

 instance (Hashable a, Hashable b) => Hashable (Product a b) where
     hashWithSalt s (P a b) = s `hashWithSalt` a `hashWithSalt` b

You can chain hashes together using hashWithSalt, by following this recipe:

 combineTwo h1 h2 = h1 `hashWithSalt` h2

hashUsingSource

Arguments

:: Hashable b 
=> (a -> b)

Transformation function.

-> Int

Salt.

-> a

Value to transform.

-> Int 

Transform a value into a Hashable value, then hash the transformed value using the given salt.

This is a useful shorthand in cases where a type can easily be mapped to another type that is already an instance of Hashable. Example:

 data Foo = Foo | Bar
          deriving (Enum)

 instance Hashable Foo where
     hashWithSalt = hashUsing fromEnum

hashPtrSource

Arguments

:: Ptr a

pointer to the data to hash

-> Int

length, in bytes

-> IO Int

hash value

Compute a hash value for the content of this pointer.

hashPtrWithSaltSource

Arguments

:: Ptr a

pointer to the data to hash

-> Int

length, in bytes

-> Int

salt

-> IO Int

hash value

Compute a hash value for the content of this pointer, using an initial salt.

This function can for example be used to hash non-contiguous segments of memory as if they were one contiguous segment, by using the output of one hash as the salt for the next.

hashByteArraySource

Arguments

:: ByteArray#

data to hash

-> Int

offset, in bytes

-> Int

length, in bytes

-> Int

hash value

Compute a hash value for the content of this ByteArray#, beginning at the specified offset, using specified number of bytes. Availability: GHC.

hashByteArrayWithSaltSource

Arguments

:: ByteArray#

data to hash

-> Int

offset, in bytes

-> Int

length, in bytes

-> Int

salt

-> Int

hash value

Compute a hash value for the content of this ByteArray#, using an initial salt.

This function can for example be used to hash non-contiguous segments of memory as if they were one contiguous segment, by using the output of one hash as the salt for the next.

Availability: GHC.

Hashing types with multiple constructors

For a type with several value constructors, there are a few possible approaches to writing a Hashable instance.

If the type is an instance of Enum, the easiest (and safest) path is to convert it to an Int, and use the existing Hashable instance for Int.

 data Color = Red | Green | Blue
              deriving Enum

 instance Hashable Color where
     hashWithSalt = hashUsing fromEnum

This instance benefits from the fact that the Hashable instance for Int has excellent avalanche properties.

In contrast, a very weak hash function would be:

 terribleHash :: Color -> Int
 terribleHash salt = fromEnum

This has terrible avalanche properties, as the salt is ignored, and every input is mapped to a small integer.

If the type's constructors accept parameters, it can be important to distinguish the constructors.

 data Time = Days Int
           | Weeks Int
           | Months Int

The weak hash function below guarantees a high probability of days, weeks, and months all colliding when hashed.

 veryBadHash :: Time -> Int
 veryBadHash (Days  d)  = hash d
 veryBadHash (Weeks w)  = hash w
 veryBadHash (Months m) = hash m

It is easy to distinguish the constructors using the hashWithSalt function.

 instance Hashable Time where
     hashWithSalt s (Days n)   = s `hashWithSalt`
                                 (0::Int) `hashWithSalt` n
     hashWithSalt s (Weeks n)  = s `hashWithSalt`
                                 (1::Int) `hashWithSalt` n
     hashWithSalt s (Months n) = s `hashWithSalt`
                                 (2::Int) `hashWithSalt` n

If a constructor accepts multiple parameters, their hashes can be chained.

 data Date = Date Int Int Int

 instance Hashable Date where
     hashWithSalt s (Date yr mo dy) =
         s `hashWithSalt`
         yr `hashWithSalt`
         mo `hashWithSalt` dy