Ticket #2143 (closed bug: fixed)

Opened 4 years ago

Last modified 2 years ago

Yhc's sort is faster than GHC's

Reported by: NeilMitchell Owned by: igloo
Priority: normal Milestone: Not GHC
Component: libraries/base Version: 6.8.2
Keywords: Cc: lennart@…, gwern0@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

The sort code in the Yhc libraries is faster than GHC. In some cases its asymptotically better. Some benchmarks have shown a doubling in performance. I know why this is, but will need to make sure the code is as good as it can be, check all the benchmarks agree etc.

Original work by Ian:  http://www.haskell.org/pipermail/glasgow-haskell-users/2002-May/003376.html

I will track this down and aim for a libraries proposal in a month or so.

Attachments

SortBy.hs Download (0.8 KB) - added by malcolm.wallace@… 3 years ago.
A stable O(n log n) sorting algorithm, by Thomas Nordin, 2002
sort.txt Download (32.7 KB) - added by guest 2 years ago.
The terminal output of my criterion run
sort.hs Download (3.1 KB) - added by guest 2 years ago.
The module running Criterion on GHC & YHC sorts
sort-reverse.txt Download (4.1 KB) - added by guest 2 years ago.
a run with YHC functions benchmarked before GHC; YHC still wins, showing that other run wasn't due to GHC being penalized by going first

Change History

Changed 4 years ago by igloo

  • difficulty set to Unknown
  • milestone set to Not GHC

Changed 4 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 4 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple

Changed 3 years ago by daniel.is.fischer

  • failure set to None/Unknown

The Yhc code doesn't seem to be in the library yet. If it's really faster, shouldn't it be included even if there's still room for improvement?

I'll run a few benchmarks in the next days.

Changed 3 years ago by daniel.is.fischer

For pseudo-random lists, my measurements showed the Yhc code consistently faster, though not very much (3-12%). For (rev)sorted or almost (rev)sorted lists, Yhc's code is up to 10 times faster (according to my measurements).

Any chance to get the Yhc code into GHC?

Changed 3 years ago by malcolm.wallace@…

A stable O(n log n) sorting algorithm, by Thomas Nordin, 2002

Changed 3 years ago by malcolm.wallace@…

The attachment is the sorting algorithm from Yhc (actually, nhc98), as originally contributed by Thomas Nordin.

Changed 3 years ago by NeilMitchell

If memory serves, Lennart said he was the originally author of this code, so the path probably originates with HBC. The code in Yhc was taken unmodified from nhc98.

Changed 3 years ago by simonpj

  • cc lennart@… added
  • owner changed from NeilMitchell to igloo

OK, well assuming Lennart is happy (I've added him to cc), let's just adopt this. Ian can you action? Although the milestone says "not ghc", these sorting routines are in the base libraries which we ship with GHC, aren't they?

Simon

Changed 2 years ago by guest

  • cc gwern0@… added

Changed 2 years ago by guest

The terminal output of my criterion run

Changed 2 years ago by guest

The module running Criterion on GHC & YHC sorts

Changed 2 years ago by guest

I put together a little module tonight which uses Criterion 0.2 (0.4 won't install under 6.10.2) to benchmark GHC & YHC sorts using ascending ints, descending ints, random ints, and the Shakesperean corpus ( http://www.gutenberg.org/etext/100). (Besides the attachment, you can see  http://hpaste.org/fastcgi/hpaste.fcgi/view?id=14873#a14875 )

YHC, as measured by mean, won on all of them, even random and Shakespeare. The margins were not all 2x, but the narrowest YHC victory was Shakespeare - YHC averaged 25.3s while GHC puffed along at 29.5s.

Caveats: this was my first time using Criterion, no doubt there are major inefficiencies and bad style etc.

But I doubt any of my errors were sufficient to upend the results. Given the margins in a core function which is used all over the place (I grepped through all the repos I've downloaded; 'sort' is used a *lot*), that the replacement is correct and faster in all cases, I think not making the change as soon as possible is a mistake. This is something that should have been done yesterday, as the expression goes.

Changed 2 years ago by guest

a run with YHC functions benchmarked before GHC; YHC still wins, showing that other run wasn't due to GHC being penalized by going first

Changed 2 years ago by malcolm.wallace@…

  • status changed from new to closed
  • resolution set to fixed

Pushed the faster version to the base library. It should probably be merged into ghc-6.12.2.

Changed 2 years ago by simonmar

Thanks Malcolm.

Note: See TracTickets for help on using tickets.