Ticket #4086 (closed bug: fixed)
Data.List 'nub' function is O(n^2)
| Reported by: | Pete | Owned by: | igloo |
|---|---|---|---|
| Priority: | normal | Milestone: | 7.0.1 |
| Component: | libraries/base | Version: | 6.12.1 |
| Keywords: | Cc: | ||
| Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
| Type of failure: | Documentation bug | Difficulty: | Easy (less than 1 hour) |
| Test Case: | Blocked By: | ||
| Blocking: | Related Tickets: |
Description (last modified by igloo) (diff)
I recently discovered that some Haskell code was running much slower than I would have expected. I eventually traced the problem to the 'nub' function in the Data.List, which ghc implements as follows:
nub l = nub' l []
where
nub' [] _ = []
nub' (x:xs) ls
| x `elem` ls = nub' xs ls
| otherwise = x : nub' xs (x:ls)
This would seem to be O(n**2), because it accumulates the values it sees in a list. If it used a different data structure like a Set, it could be made O(n log n).
I'm not sure whether this should be considered a bug or not. The list nub returns is correct. On the other hand, when I call a library function in any programming language, I would normally expect it to use an algorithm that provides the best achievable asymptotic performance.
If you decide that you don't consider this to be a bug, can I suggest adding a note to the documentation, so people are aware that nub should only be used with short lists?
