{-# LANGUAGE CPP #-} ------------------------------------------------------------------------ -- | -- Module : Data.Hashable -- Copyright : (c) Milan Straka 2010 -- (c) Johan Tibell 2011 -- (c) Bryan O'Sullivan 2011, 2012 -- License : BSD-style -- Maintainer : johan.tibell@gmail.com -- Stability : provisional -- Portability : portable -- -- This module defines a class, 'Hashable', for types that can be -- converted to a hash value. This class exists for the benefit of -- hashing-based data structures. The module provides instances for -- most standard types. Efficient instances for other types can be -- generated automatically and effortlessly using the generics support -- in GHC 7.2 and above. -- -- The easiest way to get started is to use the 'hash' function. Here -- is an example session with @ghci@. -- -- > Prelude> import Data.Hashable -- > Prelude> hash "foo" -- > 60853164 module Data.Hashable ( -- * Hashing and security -- $security -- * Computing hash values hash , Hashable(..) -- ** Avalanche behavior -- $avalanche -- * Creating new instances -- | There are two ways to create new instances: by deriving -- instances automatically using GHC's generic programming -- support or by writing instances manually. -- ** Generic instances -- $generics -- *** Understanding a compiler error -- $generic_err -- ** Writing instances by hand -- $blocks , hashUsing , hashPtr , hashPtrWithSalt #if defined(__GLASGOW_HASKELL__) , hashByteArray , hashByteArrayWithSalt #endif -- ** Hashing types with multiple constructors -- $ctors ) where import Data.Hashable.Class #ifdef GENERICS import Data.Hashable.Generic () #endif -- $avalanche -- -- A good hash function has a 50% probability of flipping every bit of -- its result in response to a change of just one bit in its -- input. This property is called /avalanche/. To be truly general -- purpose, hash functions must have strong avalanche behavior. -- -- All of the 'Hashable' instances provided by this module have -- excellent avalanche properties. -- $generics -- -- Beginning with GHC 7.2, the recommended way to make instances of -- 'Hashable' for most types is to use the compiler's support for -- automatically generating default instances. -- -- > {-# LANGUAGE DeriveGeneric #-} -- > -- > import GHC.Generics (Generic) -- > import Data.Hashable -- > -- > data Foo a = Foo a String -- > deriving (Eq, Generic) -- > -- > instance Hashable a => Hashable (Foo a) -- > -- > data Colour = Red | Green | Blue -- > deriving Generic -- > -- > instance Hashable Colour -- -- If you omit a body for the instance declaration, GHC will generate -- a default instance that correctly and efficiently hashes every -- constructor and parameter. -- $generic_err -- -- Suppose you intend to use the generic machinery to automatically -- generate a 'Hashable' instance. -- -- > data Oops = Oops -- > -- forgot to add "deriving Generic" here! -- > -- > instance Hashable Oops -- -- And imagine that, as in the example above, you forget to add a -- \"@deriving 'Generic'@\" clause to your data type. At compile time, -- you will get an error message from GHC that begins roughly as -- follows: -- -- > No instance for (GHashable (Rep Oops)) -- -- This error can be confusing, as 'GHashable' is not exported (it is -- an internal typeclass used by this library's generics machinery). -- The correct fix is simply to add the missing \"@deriving -- 'Generic'@\". -- $blocks -- -- To maintain high quality hashes, new 'Hashable' instances should be -- built using existing 'Hashable' instances, combinators, and hash -- functions. -- -- The functions below can be used when creating new instances of -- 'Hashable'. For many string-like types the -- 'hashWithSalt' method can be defined in terms of either -- 'hashPtrWithSalt' or 'hashByteArrayWithSalt'. Here's how you could -- implement an instance for the 'B.ByteString' data type, from the -- @bytestring@ package: -- -- > import qualified Data.ByteString as B -- > import qualified Data.ByteString.Internal as B -- > import qualified Data.ByteString.Unsafe as B -- > import Data.Hashable -- > import Foreign.Ptr (castPtr) -- > -- > instance Hashable B.ByteString where -- > hashWithSalt salt bs = B.inlinePerformIO $ -- > B.unsafeUseAsCStringLen bs $ \(p, len) -> -- > hashPtrWithSalt p (fromIntegral len) salt -- -- Use 'hashWithSalt' to compute a hash from several values, using -- this recipe: -- -- > data Product a b = P a b -- > -- > instance (Hashable a, Hashable b) => Hashable (Product a b) where -- > hashWithSalt s (P a b) = s `hashWithSalt` a `hashWithSalt` b -- -- You can chain hashes together using 'hashWithSalt', by following -- this recipe: -- -- > combineTwo h1 h2 = h1 `hashWithSalt` h2 -- $ctors -- -- For a type with several value constructors, there are a few -- possible approaches to writing a 'Hashable' instance. -- -- If the type is an instance of 'Enum', the easiest (and safest) path -- is to convert it to an 'Int', and use the existing 'Hashable' -- instance for 'Int'. -- -- > data Color = Red | Green | Blue -- > deriving Enum -- > -- > instance Hashable Color where -- > hashWithSalt = hashUsing fromEnum -- -- This instance benefits from the fact that the 'Hashable' instance -- for 'Int' has excellent avalanche properties. -- -- In contrast, a very weak hash function would be: -- -- > terribleHash :: Color -> Int -- > terribleHash salt = fromEnum -- -- This has terrible avalanche properties, as the salt is ignored, and -- every input is mapped to a small integer. -- -- If the type's constructors accept parameters, it can be important -- to distinguish the constructors. -- -- > data Time = Days Int -- > | Weeks Int -- > | Months Int -- -- The weak hash function below guarantees a high probability of days, -- weeks, and months all colliding when hashed. -- -- > veryBadHash :: Time -> Int -- > veryBadHash (Days d) = hash d -- > veryBadHash (Weeks w) = hash w -- > veryBadHash (Months m) = hash m -- -- It is easy to distinguish the constructors using the `hashWithSalt` -- function. -- -- > instance Hashable Time where -- > hashWithSalt s (Days n) = s `hashWithSalt` -- > (0::Int) `hashWithSalt` n -- > hashWithSalt s (Weeks n) = s `hashWithSalt` -- > (1::Int) `hashWithSalt` n -- > hashWithSalt s (Months n) = s `hashWithSalt` -- > (2::Int) `hashWithSalt` n -- -- If a constructor accepts multiple parameters, their hashes can be -- chained. -- -- > data Date = Date Int Int Int -- > -- > instance Hashable Date where -- > hashWithSalt s (Date yr mo dy) = -- > s `hashWithSalt` -- > yr `hashWithSalt` -- > mo `hashWithSalt` dy -- $security -- #security# -- -- Applications that use hash-based data structures to store input -- from untrusted users can be susceptible to \"hash DoS\", a class of -- denial-of-service attack that uses deliberately chosen colliding -- inputs to force an application into unexpectedly behaving with -- quadratic time complexity. -- -- This library uses the SipHash algorithm to hash strings. SipHash -- was designed to be more robust against collision attacks than -- traditional hash algorithms, while retaining good performance. -- -- To further mitigate the risk from collision attacks, this library -- provides an environment variable named @HASHABLE_SALT@ that allows -- the default salt used by the 'hash' function to be chosen at -- application startup time. -- -- * In the normal case, the environment variable is not set, and a -- fixed salt is used that does not vary between runs. (This choice -- can be made permanent by building this package with the -- @-ffixed-salt@ flag.) -- -- * If the value is the string @random@, the system's cryptographic -- pseudo-random number generator will be used to supply a salt. -- While this may offer added security, it can also violate the -- assumption of some Haskell libraries that expect the results of -- 'hash' to be stable across application runs. Choose this -- behaviour with care (and testing)! -- -- * When the value is an integer (prefixed with @0x@ for hexadecimal), -- it will be used as the salt. -- -- If @HASHABLE_SALT@ cannot be parsed, then the first time that a -- call to 'hash' is made, the application will halt with an -- informative error message. -- -- (Implementation note: while SipHash is used for strings, a -- faster—and almost certainly less secure—algorithm is -- used for numeric types, on the assumption that strings are much -- more likely as a hash DoS attack vector.)