The tdigest package

Tags: bsd3, library, numeric

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

For details, see the original paper, "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl: https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf



Properties

Versions: 0, 0.1
Change log: CHANGELOG.md
Dependencies: base (>=4.7 && <4.11), base-compat (>=0.9.1 && <0.10), binary (>=0.7.1.0 && <0.10), deepseq (>=1.3.0.2 && <1.5), reducers (>=3.12.1 && <3.13), semigroupoids (>=5.1 && <5.3), semigroups (>=0.18.2 && <0.19), vector (>=0.11 && <0.13), vector-algorithms (>=0.7.0.1 && <0.8)
License: BSD3
Author: Oleg Grenrus <oleg.grenrus@iki.fi>
Maintainer: Oleg Grenrus <oleg.grenrus@iki.fi>
Category: Numeric
Home page: https://github.com/futurice/haskell-tdigest#readme
Bug tracker: https://github.com/futurice/haskell-tdigest/issues
Source repository: head: git clone https://github.com/futurice/haskell-tdigest
Uploaded: Wed Mar 8 12:31:14 UTC 2017 by phadej
Updated: Thu Jul 27 22:31:25 UTC 2017 by phadej to revision 2
Distributions: LTSHaskell:0.1, NixOS:0.1, Stackage:0.1, Tumbleweed:0.1
Downloads: 153 total (14 in the last 30 days)
Status: Docs available
Last success reported: 2017-03-08

Downloads

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.


Readme for tdigest-0.1

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

For details, see the original paper, "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl.

Synopsis

λ *Data.TDigest > :set -XDataKinds
λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737
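
The same query can be written as a standalone program. This is a minimal sketch, not taken from the package's own examples: it assumes the tdigest package is installed, that Data.TDigest exports tdigest, median and quantile, and that quantile takes the requested quantile (a Double between 0 and 1) as its first argument. Check those names and signatures against the package documentation for your version.

```haskell
{-# LANGUAGE DataKinds #-}
module Main (main) where

-- Assumption: these four names are exported by Data.TDigest.
import Data.TDigest (TDigest, median, quantile, tdigest)

-- A digest of the numbers 1..1000 with compression parameter 3,
-- matching the GHCi session above. The type-level literal 3 is
-- why the DataKinds extension is needed.
digest :: TDigest 3
digest = tdigest [1 .. 1000]

main :: IO ()
main = do
  -- Approximate median; Nothing would mean the digest is empty.
  print (median digest)
  -- Approximate 99th percentile (assumed argument order: quantile first).
  print (quantile 0.99 digest)
```

Both results are wrapped in Maybe, so an empty input is reported as Nothing rather than as an error.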

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; not a valid approximation of the median, included mostly to measure PRNG speed
  • sorting using vector-algorithms: 33s, using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s
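
Why the average is not a usable stand-in for the median on this input can be worked out directly. The benchmark does not state the rate of the exponential distribution, so the symbol $\lambda$ below is an assumption introduced only for this derivation.

```latex
% Exponential distribution with rate \lambda > 0 (rate is an assumed symbol).
\begin{align*}
  F(x) &= 1 - e^{-\lambda x}, \qquad x \ge 0 \\
  \text{mean} &= \frac{1}{\lambda} \\
  F(m) = \tfrac{1}{2}
    &\;\Longrightarrow\; e^{-\lambda m} = \tfrac{1}{2}
    \;\Longrightarrow\; m = \frac{\ln 2}{\lambda} \approx \frac{0.693}{\lambda} \\
  \frac{\text{mean}}{m} &= \frac{1}{\ln 2} \approx 1.443
\end{align*}
```

Whatever the rate, the mean overshoots the median by about 44%, so the "average" row serves only as a timing baseline.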

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
cp output.svg example.svg
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg

[Image: example histogram]