Ticket #2528 (new bug)

Opened 5 years ago

Last modified 20 months ago

nub not as reliable as nubBy

Reported by: jdressel Owned by:
Priority: normal Milestone: 6.10.1
Component: libraries/base Version: 6.8.2
Keywords: Cc:
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

While working on a fun math/programming exercise I discovered a peculiar behavior in "nub". While it worked the vast majority of the time, there were a few cases where it failed to purge all duplicates as defined by (==). Replacing the use of "nub" with "nubBy (==)" correctly purged all cases.

I have linked a codepad evaluation that outputs the strange behavior:  http://codepad.org/VNauTAam

I would have isolated the bug better for you, but like I say nub works by far the majority of the time. This must be a peculiar clash with my custom data structures and strange definition of (==).

Cheers.

Change History

  Changed 5 years ago by JeremyShaw

  • status changed from new to closed
  • resolution set to invalid

Your (==) instance is broken. Specifically, for:

-- ((9+9)+2) a = Branch (Branch (Leaf (Just 9)) Add (Leaf (Just 9))) Add (Leaf (Just 2))

-- (9+(2+9)) b = Branch (Leaf (Just 9)) Add (Branch (Leaf (Just 2)) Add (Leaf (Just 9)))

we see that:

*Main> a == b True *Main> b == a False *Main>

There may be other bugs as well, that is just the first instance I found. nubBy (==) will break as well, you just have not seen it yet.

All instances of Eq must be transitive, reflexive, symmetric, and antisymmetric. Haskell can't automatically check this for you -- so it is left up the developer to do, either by formal proof, or by unit testing.

follow-up: ↓ 6   Changed 5 years ago by guest

  • status changed from closed to reopened
  • resolution invalid deleted
  • component changed from Compiler to libraries/base

(From Adrian Hey)

Well there still seems to be a problem, considering the what the library docs say about nubBy..

"The 'nubBy' function behaves just like 'nub', except it uses a user-supplied equality predicate instead of the overloaded '==' function."

..and nub was indeed originally defined as,..

nub=nubBy compare

..as is the local nub in the example.

So right or wrong, shouldn't we see the same result in each case? Here's the source of the problem I think:

elem_by :: (a -> a -> Bool) -> a -> [a] -> Bool
elem_by _  _ []         =  False
elem_by eq y (x:xs)     =  x `eq` y || elem_by eq y xs

vs.

elem _ []	= False
elem x (y:ys)	= x==y || elem x ys

Also, where does the H98 report say all instances of Eq must be transitive, reflexive, symmetric, and antisymmetric? It just says "The Eq class provides equality (==)..", whatever that might mean :-)

  Changed 5 years ago by guest

(from Adrian Hey)

Erm..that should be..

nub=nubBy (==)

  Changed 5 years ago by jdressel

Jeremy is correct: I forgot a case in my definition of (==) that makes it not completely symmetric for second layer commutative equivalence (oops).

The relevant demonstration for the example case that failed in the linked codepaste is:

  *Main> let a = Branch (Branch (Leaf (Just 9)) Add (Leaf (Just 9))) Add (Leaf (Just 2))
  *Main> let b = Branch (Leaf (Just 9)) Add (Branch (Leaf (Just 9)) Add  (Leaf (Just 2)))
  *Main> a == b
  True
  *Main> b == a
  False
  *Main> List.nubBy (==) [a,b]
  [((9+9)+2)]
  *Main> List.nub [a,b]
  [((9+9)+2),(9+(9+2))]

I guess the lack of symmetry in (==) causes the difference in behavior, which makes more sense to me now.

However, Adrian is also correct, the difference in behavior between "nub" and "nubBy (==)" is what startled me. I would have found the bug in my code eventually (especially after unit testing).

  Changed 5 years ago by simonmar

  • owner set to simonmar
  • difficulty set to Unknown
  • status changed from reopened to new

in reply to: ↑ 2   Changed 5 years ago by JeremyShaw

Replying to guest:

(From Adrian Hey) Well there still seems to be a problem, considering the what the library docs say about nubBy..

hrm, interesting. I am guessing that even with 'valid' Eq instances that could be important due to laziness and bottom?

Also, where does the H98 report say all instances of Eq must be transitive, reflexive, symmetric, and antisymmetric? It just says "The Eq class provides equality (==)..", whatever that might mean :-)

Well, it does not say it explicitly, but I suspect H98's usage of Eq implicitly demands those laws be followed. Hopefully in Haskell' the laws will not only be stated, but there will be some QuickCheck?-style properties you can use to test your own instances ;)

  Changed 5 years ago by igloo

  • milestone set to 6.10.1

  Changed 5 years ago by simonmar

  • status changed from new to closed
  • resolution set to fixed

Fixed:

Tue Sep  2 10:29:50 BST 2008  Simon Marlow <marlowsd@gmail.com>                 
  * #2528: reverse the order of args to (==) in nubBy to match nub

  Changed 5 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

  Changed 20 months ago by maeder

  • owner simonmar deleted
  • failure set to None/Unknown
  • status changed from closed to new
  • resolution fixed deleted

The last change made the implementation and reported prelude version (and thus the documentation of elem_by) inconsistent, see  http://www.haskell.org/pipermail/libraries/2011-September/016758.html

I propose to reverse the order or args under "USE_REPORT_PRELUDE", too, because I think, this is the right behavior and has the least impact.

The filter predicate could also be written as "(not . (`eq` x))".

"elem_by eq y xs" could be replaced by "any (eq y) xs" (and elem_by deleted), but I don't know if this has any performance impacts.

  Changed 20 months ago by maeder

Well, at least the wrong comment containing "same order" should be corrected, extended or removed.

Note: See TracTickets for help on using tickets.