groupBy: Replacement definition of Data.List.GroupBy


Please see the README on Github at https://github.com/oisdk/groupBy#readme


Modules


Data
- List
  - Data.List.GroupBy

Downloads

groupBy-0.1.0.0.tar.gz (Cabal source package)

Maintainer's Corner

Package maintainers

oisdk


Versions	0.1.0.0
Dependencies	base (>=4 && <5)
License	MIT
Copyright	2018 Donnacha Oisín Kidney
Author	Donnacha Oisín Kidney
Maintainer	mail@doisinkidney.com
Uploaded	by oisdk at 2018-01-30T00:14:45Z
Category	Data
Home page	https://github.com/oisdk/groupBy#readme
Bug tracker	https://github.com/oisdk/groupBy/issues
Source repo	head: git clone https://github.com/oisdk/groupBy
Reverse Dependencies	1 direct, 0 indirect
Downloads	1251 total (1 in the last 30 days)
Rating	(no votes yet)
Status	Docs available; last success reported on 2018-01-30 (1 report)

Readme for groupBy-0.1.0.0


groupBy

This package provides a drop-in replacement for Data.List.groupBy, with benchmarks and tests.

The original Data.List.groupBy has (perhaps unexpected) behaviour: it compares each element with the first element of its group, not with its neighbour. For example, if you wanted to group a list into ascending sequences, you would get:

>>> Data.List.groupBy (<=) [1,2,2,3,1,2,0,4,5,2]
[[1,2,2,3,1,2],[0,4,5,2]]

The replacement has three distinct advantages:

It compares adjacent elements, so the example above behaves as expected:

>>> Data.List.GroupBy.groupBy (<=) [1,2,2,3,1,2,0,4,5,2]
[[1,2,2,3],[1,2],[0,4,5],[2]]

It is both a good producer and a good consumer for list fusion, with rewrite rules similar to those for Data.List.scanl. The old version was defined in terms of span:

groupBy                 :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy _  []           =  []
groupBy eq (x:xs)       =  (x:ys) : groupBy eq zs
                           where (ys,zs) = span (eq x) xs

Because this definition is directly recursive (through span) rather than written with foldr and build, it cannot be a good producer or consumer.

It is significantly faster than the original in most cases.
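To illustrate the first two points together, here is a minimal sketch, assuming GHC, of an adjacent-comparison groupBy that consumes its input with a single foldr and produces its result through build (exported by GHC.Exts), so that foldr/build fusion can apply once the definition is inlined. The name groupByFB is hypothetical, and this is an illustration of the technique rather than the package's actual definition; in particular, the package's scanl-style rewrite rules are not reproduced, and a plain INLINE pragma stands in for them.

```haskell
import GHC.Exts (build)

-- | Hypothetical adjacent-comparison groupBy (illustrative sketch, not the
-- package's code). The input is consumed by a single foldr and the outer
-- list is produced with build, so GHC's foldr/build fusion can apply
-- after inlining.
groupByFB :: (a -> a -> Bool) -> [a] -> [[a]]
groupByFB p xs = build (\cons nil ->
  let
    -- step x k joins:
    --   x     : the current element
    --   k     : the fold over the rest of the list; it takes a predicate
    --           deciding whether the next element joins x's group
    --   joins : decides whether x joins the group of its predecessor
    -- The result pairs the rest of the current group with the later groups.
    step x k joins
      | joins x   = (x : ys, zs)
      | otherwise = ([], cons (x : ys) zs)
      where
        -- The next element is compared with x itself (its neighbour),
        -- not with the first element of the group. This pattern binding
        -- is lazy, which keeps the function productive on partial lists.
        (ys, zs) = k (p x)
    -- End of input: the current group is finished and no groups follow.
    done _ = ([], nil)
  -- The first element never joins an earlier group, so it opens the first one.
  in snd (foldr step done xs (const False)))
{-# INLINE groupByFB #-}
```

On the README's example it gives the adjacent grouping:

>>> groupByFB (<=) [1,2,2,3,1,2,0,4,5,2]
[[1,2,2,3],[1,2],[0,4,5],[2]]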

Tests

The tests check that the function behaves the same as the original when the supplied relation is an equivalence, and that it performs the expected adjacent comparisons when the relation isn't transitive.
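A minimal sketch of how these two properties can be checked, assuming exhaustive testing over short lists of small Ints rather than whatever generators the package's own test suite uses. The names smallLists, agreesOnEquivalence, isAdjacentGrouping and performsAdjacentComparisons are hypothetical. Each check takes the candidate groupBy as an argument, so any implementation can be plugged in.

```haskell
import qualified Data.List as L

-- A groupBy candidate, specialised to Int for testing.
type GroupBy = (Int -> Int -> Bool) -> [Int] -> [[Int]]

-- All lists of length 0..n whose elements are drawn from [0 .. k].
smallLists :: Int -> Int -> [[Int]]
smallLists n k = concatMap (\len -> sequence (replicate len [0 .. k])) [0 .. n]

-- Property 1: for equivalence relations the candidate must agree with
-- the original Data.List.groupBy.
agreesOnEquivalence :: GroupBy -> Bool
agreesOnEquivalence gb =
  and [ gb eq xs == L.groupBy eq xs
      | eq <- [(==), \a b -> even a == even b]
      , xs <- smallLists 6 3 ]

-- Specification of adjacent grouping for an arbitrary relation p:
-- the groups concatenate back to the input, none is empty, neighbours
-- inside a group are related, and neighbours across a boundary are not.
isAdjacentGrouping :: (Int -> Int -> Bool) -> [Int] -> [[Int]] -> Bool
isAdjacentGrouping p xs gs =
  concat gs == xs
    && all (not . null) gs
    && all (\g -> and (zipWith p g (drop 1 g))) gs
    && and (zipWith (\g h -> not (p (last g) (head h))) gs (drop 1 gs))

-- Property 2: with non-transitive relations, only neighbours are compared.
performsAdjacentComparisons :: GroupBy -> Bool
performsAdjacentComparisons gb =
  and [ isAdjacentGrouping p xs (gb p xs)
      | p <- [(<=), (<), \a b -> abs (a - b) == 1]
      , xs <- smallLists 6 3 ]
```

Both Data.List.groupBy and an adjacent-comparison replacement satisfy agreesOnEquivalence, while performsAdjacentComparisons tells them apart: Data.List.groupBy (<=) [1,2,1] returns [[1,2,1]], whose neighbours 2 and 1 are not related by (<=).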

The tests also check that laziness is preserved, as specified by:

>>> head (groupBy (==) (1:2:undefined))
[1]

>>> (head . head) (groupBy undefined (1:undefined))
1

>>> (head . head . tail) (groupBy (==) (1:2:undefined))
2
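The three laziness properties above can be bundled into one check that reports whether a candidate groupBy is lazy enough. This sketch uses evaluate and try from Control.Exception (in base) to catch the error raised when undefined is forced; lazinessHolds and checkLaziness are hypothetical names, not the package's test code.

```haskell
import Control.Exception (SomeException, evaluate, try)

-- A groupBy candidate, specialised to Int for testing.
type GroupBy = (Int -> Int -> Bool) -> [Int] -> [[Int]]

-- The three laziness properties as a single Bool. Each conjunct inspects
-- only as much of the result as the corresponding example does, so a
-- sufficiently lazy implementation never forces an undefined.
lazinessHolds :: GroupBy -> Bool
lazinessHolds gb =
  head (gb (==) (1 : 2 : undefined)) == [1]
    && (head . head) (gb undefined (1 : undefined)) == 1
    && (head . head . tail) (gb (==) (1 : 2 : undefined)) == 2

-- False if any property is False, or if evaluating it forces an undefined
-- (the resulting exception is caught and treated as a failure).
checkLaziness :: GroupBy -> IO Bool
checkLaziness gb = do
  result <- try (evaluate (lazinessHolds gb)) :: IO (Either SomeException Bool)
  pure (either (const False) id result)
```

For example, wrapping any implementation as `\eq xs -> length xs `seq` groupBy eq xs` makes checkLaziness return False, because length forces the undefined tail of the input.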

Benchmarks

Benchmarks compare the function to three other implementations: the current Data.List.groupBy, a version from the utility-ht package, and a version by Brandon Simmons.

The benchmarks test functions that force the outer list:

length . groupBy eq

And functions which force the contents of the inner lists:

import Data.List (foldl')

sum' = foldl' (+) 0

sum' . map sum' . groupBy eq

Each benchmark is run on lists where the groups are small, the groups are large, and where there is only one group. The default size is 10000, but other sizes can be provided with the --size=[x,y,z] flag to the benchmarks.
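A minimal sketch of the benchmark inputs and the two measures described above. The generators smallGroups, largeGroups and oneGroup are hypothetical stand-ins for the package's actual inputs, the relation is assumed to be (==), and Data.List.groupBy is used only so that the sketch compiles without the package; swap in the implementation under test. A benchmarking harness would then time forceOuter and forceInner on each input shape.

```haskell
import Data.List (foldl', groupBy)

-- Strict sum, as in the benchmark description.
sum' :: [Int] -> Int
sum' = foldl' (+) 0

-- Hypothetical input shapes, all intended to be grouped with (==).

-- Many small groups: 0,0,1,1,2,2,... (groups of two).
smallGroups :: Int -> [Int]
smallGroups n = map (`div` 2) [0 .. n - 1]

-- A few large groups of n `div` 10 elements each
-- (exactly ten groups when n is a multiple of 10).
largeGroups :: Int -> [Int]
largeGroups n = map (`div` size) [0 .. n - 1]
  where
    size = max 1 (n `div` 10)

-- Exactly one group.
oneGroup :: Int -> [Int]
oneGroup n = replicate n 0

-- Forces the outer list of groups.
forceOuter :: [Int] -> Int
forceOuter = length . groupBy (==)

-- Also forces every element of every inner list.
forceInner :: [Int] -> Int
forceInner = sum' . map sum' . groupBy (==)
```

At the default size the three shapes produce very different numbers of groups:

>>> map forceOuter [smallGroups 10000, largeGroups 10000, oneGroup 10000]
[5000,10,1]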

The new definition is slower than the old one only when the sublists are much longer than the outer list (that is, when there are a few very large groups). Forcing the pair from the accumulator (or using a strict pair) would make the new definition faster in that case, but it would also slow it down to the old definition's speed in the other cases, which I would imagine are more common.