Ticket #4108 (closed bug: fixed)

Opened 3 years ago

Last modified 3 years ago

GHC.Integer.hashInteger is a misnomer and confuses people

Reported by: duncan Owned by:
Priority: normal Milestone: 7.0.1
Component: Compiler Version: 6.12.2
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

GHC.Integer.hashInteger from the integer-gmp package does not hash Integers. It extracts the first Int from an Integer which might possibly be a useful utility in a hash function (it is used in Data.Unique.hashUnique).

The fact this function has this name makes people tempted to use it. This is a bad thing. There is at least one package on hackage using this (name withheld to protect the guilty).

The function should be renamed to reflect what it actually does. The fact that this might break programs using this function should be seen as a bonus.

Change History

  Changed 3 years ago by guest

It seems to me that this function serves perfectly well as a hash function.

A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array. — Wikipedia

It seems to me that this function evenly distributes the integers among the Ints and is fast

-- Eric Mertens

  Changed 3 years ago by igloo

  • status changed from new to closed
  • resolution set to invalid

I agree with Eric.

Hard to say more without knowing which package you are talking about, and what it expects hashInteger to do.

follow-up: ↓ 4   Changed 3 years ago by duncan

The new berp package uses it (but will stop using it in the next release). The author did not realise that GHC.Integer.hashInteger is basically the identity function and would not have used it if they had known.

By comparison, Data.HashTable.hashInt is not an identity. I defer to Eric that perhaps the existing hashInteger could be usable as a hash function in some circumstances. It does however seem a bit special-purpose to use that name, and it could be named something less misleading.

in reply to: ↑ 3   Changed 3 years ago by simonmar

  • status changed from closed to new
  • resolution invalid deleted

Replying to duncan:

The new berp package uses it (but will stop using it in the next release). The author did not realise that GHC.Integer.hashInteger is basically the identity function and would not have used it if they had known.

Why's that, just out of interest?

By comparison, Data.HashTable.hashInt is not an identity. I defer to Eric that perhaps the existing hashInteger could be usable as a hash function in some circumstances. It does however seem a bit special-purpose to use that name, and it could be named something less misleading.

It's probably not a great hash function, in the same way that (mod power-of-2) is not a great hash function either (disclaimer: I'm not an expert on hash functions), because it's too easy to construct a collection of values that have the same hash. So at least we ought to add something to the documentation to say what the function does.

  Changed 3 years ago by igloo

  • milestone set to 6.14.1

  Changed 3 years ago by simonmar

I just took a look at this, and I think 'hashInteger' is actually wrong:

hashInteger :: Integer -> Int#
hashInteger (S# i) = i
hashInteger (J# s d) = if s ==# 0#
                       then 0#
                       else indexIntArray# d 0#

Suppose the Integer is -j for some j and fits in Int#. If the representation is S# then hashInteger returns -j, but if the representation is J# then hashInteger returns j. We don't guarantee to represent an Integer using S# if it fits in an Int#, so I think this is wrong.

  Changed 3 years ago by simonmar

Fixed:

Fri Aug 13 16:29:26 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * implement integer2Int# and integer2Word# in Haskell, not foreign prim
Shall I push this patch? (1/2)  [ynW...], or ? for more options: y
Fri Aug 13 16:31:42 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * fix hashInteger to be the same as fromIntegral, and document it (#4108)

  Changed 3 years ago by simonmar

  • status changed from new to closed
  • resolution set to fixed
Note: See TracTickets for help on using tickets.