Ticket #3271 (closed proposal: fixed)

Opened 4 years ago

Last modified 4 years ago

New methods for Data.Sequence

Reported by: LouisWasserman Owned by: LouisWasserman
Priority: normal Milestone: Not GHC
Component: libraries (other) Version:
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Data.Sequence is meant as a general-purpose implementation of finite sequences. The range of methods it offers, however, is considerably more constrained than Data.List, even allowing for the constraint that sequences are finite.

The following methods cannot be efficiently implemented based on currently exported methods from Data.Sequence.

  • zipWith and its variants. Note: it suffices to define zipWith alone.
  • replicate. (This can be implemented in O(log n) time with node sharing.)

The following methods are relatively simple extensions of already-exported methods. It may be more appropriate to allow library users to reimplement them themselves.

  • scanl, scanr, and variants. This is relatively straightforward using methods borrowed from the Traversable instance.
  • tails and inits. tails is equivalent to scanr (<|) empty; inits is similar.
  • takeWhile, dropWhile, span, break (and perhaps from-the-end variations). Finding a breakpoint index can be done as efficiently on lists as on sequences; find the appropriate breakpoint index after converting to a list and use splitAt.
  • unfoldr and iterate, though the latter would require a finite number of iterations to perform.
  • partition. I can conceive of a slightly more efficient implementation than the trivial one, but the gains may be minimal.

Attachments

second-round-of-methods-for-data_sequence.dpatch Download (11.9 KB) - added by LouisWasserman 4 years ago.
The remaining methods specified in the ticket.
SequenceCheck.hs Download (2.4 KB) - added by LouisWasserman 4 years ago.
Quick Check properties testing that each function specified in the ticket behaves analogously to the Data.List equivalents.
new-methods-for-data_sequence.dpatch Download (41.5 KB) - added by LouisWasserman 4 years ago.
Final version of all methods for Data.Sequence, including sort and sortBy.
new-methods-for-data_sequence.2.dpatch Download (37.9 KB) - added by LouisWasserman 4 years ago.
new-methods-for-data_sequence.3.2.dpatch Download (46.9 KB) - added by LouisWasserman 4 years ago.
Considerably better documented, benchmarked, optimized, and more concise implementation of the above methods.
new-methods-for-data_sequence.3.dpatch Download (46.9 KB) - added by LouisWasserman 4 years ago.
Considerably better documented, benchmarked, optimized, and more concise implementation of the above methods.
new-methods-for-data_sequence.4.dpatch Download (48.3 KB) - added by LouisWasserman 4 years ago.
This version contains the resolved-upon compromise, with a stable sort based on Data.List.sort for sort and sortBy and a speedy but unstable sort in unstableSort and unstableSortBy.
new-methods-for-data_sequence.5.dpatch Download (50.3 KB) - added by LouisWasserman 4 years ago.
Contains a variety of new methods to deal with indices, and span-type methods starting from the end of the sequence.
new-methods-for-data_sequence.6.dpatch Download (48.7 KB) - added by LouisWasserman 4 years ago.
Reduced usage of partial pattern matches, and standardization of the XXXL/XXXR methods. New methods: foldWithIndexL, foldWithIndexR.

Change History

Changed 4 years ago by LouisWasserman

  • component changed from libraries/base to libraries (other)

Changed 4 years ago by LouisWasserman

Discussion deadline: 2 weeks after ticket submission. (June 16)

Changed 4 years ago by LouisWasserman

The remaining methods specified in the ticket.

Changed 4 years ago by LouisWasserman

Quick Check properties testing that each function specified in the ticket behaves analogously to the Data.List equivalents.

Changed 4 years ago by LouisWasserman

New additional method: a sort method, possibly faster than the Data.List equivalent.

Changed 4 years ago by LouisWasserman

  • owner set to LouisWasserman

Changed 4 years ago by LouisWasserman

Final version of all methods for Data.Sequence, including sort and sortBy.

Changed 4 years ago by igloo

  • difficulty set to Unknown
  • milestone set to Not GHC

Changed 4 years ago by LouisWasserman

Changed 4 years ago by LouisWasserman

In response to Ross's useful suggestions (which I had not actually noticed until recently), I have made considerable changes to my Data.Sequence patch here.

Here are the relevant points:

  • I completely rewrote the sorting method. The sort is unstable, but it is 40 lines (much more maintainable than the several-hundred line implementation from earlier) and runs *twice as fast as* (fromList . Data.List.sort . toList) for n=50000. (For sorted lists, it clocks in at about 4x faster.)

o This is with no RTS options. With -H128m, the advantage is considerably thinner -- 39ms vs. 43ms (about one standard deviation) -- but the implementation is sufficiently compact that the moderate benefit is satisfactory.

  • tails and inits are considerably more sophisticated, running to about 50 lines apiece (although some of that code is shared between them), but

o This implementation is genuinely asymptotically faster than the previous implementation: evaluating any tail from the sequence returned by tails takes O(log (min (i, n-i))), as opposed to O(n) in scanr (<|) empty xs, but evaluating every tail from the sequence takes O(n) overall, as opposed to O(n log n) in fromList [drop n xs | n <- [0..length xs]]. o Without direct access to the internals of the sequence it is impossible to match the asymptotic performance of this implementation. o This implementation is also a hair faster in practice.

  • partition is now in fact implemented via a simple foldl', which is actually faster than my old, several-dozen-line implementation as well as obviously simpler.
  • filter has been added to the list of methods available in Data.Sequence.
  • iterate has been renamed to iterateN to reinforce the different type, which requires a finite bound on the sequence length.
  • On the back end, replicate, iterateN, and sortBy do not use fromList, but instead use a common framework that wraps construction of a sequence in an Applicative functor. This automatically induces the node sharing that gives replicate performance O(log n), but behaves exactly correctly for all its other requirements.
  • As a result, there is a faster alternative to fromList if the length of the list is known. The name and type of this method seems like it might become genuinely contentious, so I'm not inclined to expose that faster method at the moment.

Changed 4 years ago by LouisWasserman

Considerably better documented, benchmarked, optimized, and more concise implementation of the above methods.

Changed 4 years ago by LouisWasserman

Considerably better documented, benchmarked, optimized, and more concise implementation of the above methods.

Changed 4 years ago by LouisWasserman

This version contains the resolved-upon compromise, with a stable sort based on Data.List.sort for sort and sortBy and a speedy but unstable sort in unstableSort and unstableSortBy.

Changed 4 years ago by LouisWasserman

  • version 6.10.2 deleted

Changed 4 years ago by LouisWasserman

Contains a variety of new methods to deal with indices, and span-type methods starting from the end of the sequence.

Changed 4 years ago by LouisWasserman

Reduced usage of partial pattern matches, and standardization of the XXXL/XXXR methods. New methods: foldWithIndexL, foldWithIndexR.

Changed 4 years ago by ross

  • status changed from new to closed
  • resolution set to fixed

This was discussed at length by Louis and Ross, with no dissent from anyone else. During the discussion the patch went through several iterations, cutting a lot of specialized code, but also adding several more methods.

In committing the patch, I've tweaked the doc comments a little, and renamed foldWithIndexL and foldWithIndexR as foldlWithIndex and foldrWithIndex, parallelling foldlWithKey and foldrWithKey in Data.Map.

Note: See TracTickets for help on using tickets.