tdigest: On-line accumulation of rank-based statistics

[ bsd3, library, numeric ] [ Propose Tags ]

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl for more details https://github.com/tdunning/t-digest/blob/07b8f2ca2be8d0a9f04df2feadad5ddc1bb73c88/docs/t-digest-paper/histo.pdf.


[Skip to Readme]

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

Versions [RSS] 0, 0.1, 0.2, 0.2.1, 0.2.1.1, 0.3
Change log CHANGELOG.md
Dependencies base (>=4.7 && <4.12), base-compat (>=0.10.1 && <0.11), binary (>=0.7.1.0 && <0.10), deepseq (>=1.3.0.2 && <1.5), reducers (>=3.12.2 && <3.13), semigroupoids (>=5.2.2 && <5.3), semigroups (>=0.18.4 && <0.19), transformers (>=0.3 && <0.6), vector (>=0.12.0.1 && <0.13), vector-algorithms (>=0.7.0.1 && <0.8) [details]
License BSD-3-Clause
Author Oleg Grenrus <oleg.grenrus@iki.fi>
Maintainer Oleg Grenrus <oleg.grenrus@iki.fi>
Category Numeric
Home page https://github.com/futurice/haskell-tdigest#readme
Bug tracker https://github.com/futurice/haskell-tdigest/issues
Source repo head: git clone https://github.com/futurice/haskell-tdigest
Uploaded by phadej at 2018-04-13T12:57:29Z
Distributions Arch:0.3, LTSHaskell:0.3, NixOS:0.3, Stackage:0.3
Reverse Dependencies 11 direct, 19 indirect [details]
Downloads 10874 total (132 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs available [build log]
Last success reported on 2018-04-13 [all 1 reports]

Readme for tdigest-0.2

[back to package description]

tdigest

A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means.

See original paper: "Computing extremely accurate quantiles using t-digest" by Ted Dunning and Otmar Ertl

Synopsis

λ *Data.TDigest > median (tdigest [1..1000] :: TDigest 3)
Just 499.0090729817737

Benchmarks

Using 50M exponentially distributed numbers:

  • average: 16s; incorrect approximation of median, mostly to measure prng speed
  • sorting using vector-algorithms: 33s; using 1000MB of memory
  • sparking t-digest (using some par): 53s
  • buffered t-digest: 68s
  • sequential t-digest: 65s

Example histogram

tdigest-simple -m tdigest -d standard -s 100000 -c 10 -o output.svg -i 34
cp output.svg example.svg
inkscape --export-png=example.png --export-dpi=80 --export-background-opacity=0 --without-gui example.svg

Example