tdigest-0.1: On-line accumulation of rank-based statistics

Safe HaskellNone
LanguageHaskell2010

Data.TDigest.Internal

Contents

Description

Internals of TDigest.

Tree implementation is based on Adams’ Trees Revisited by Milan Straka http://fox.ucw.cz/papers/bbtree/bbtree.pdf

Synopsis

Documentation

assert :: Bool -> String -> a -> a Source #

type Size = Int Source #

data TDigest compression Source #

TDigest is a tree of centroids.

compression is a 1/δ. The greater the value of compression the less likely value merging will happen.

Constructors

Node !Size !Mean !Weight !Weight !(TDigest compression) !(TDigest compression)

Tree node

Nil

Empty tree

Instances

KnownNat comp => Reducer Double (TDigest comp) Source #

Both cons and snoc are insert

Methods

unit :: Double -> TDigest comp #

snoc :: TDigest comp -> Double -> TDigest comp #

cons :: Double -> TDigest comp -> TDigest comp #

Show (TDigest compression) Source # 

Methods

showsPrec :: Int -> TDigest compression -> ShowS #

show :: TDigest compression -> String #

showList :: [TDigest compression] -> ShowS #

KnownNat comp => Semigroup (TDigest comp) Source # 

Methods

(<>) :: TDigest comp -> TDigest comp -> TDigest comp #

sconcat :: NonEmpty (TDigest comp) -> TDigest comp #

stimes :: Integral b => b -> TDigest comp -> TDigest comp #

KnownNat comp => Monoid (TDigest comp) Source # 

Methods

mempty :: TDigest comp #

mappend :: TDigest comp -> TDigest comp -> TDigest comp #

mconcat :: [TDigest comp] -> TDigest comp #

KnownNat comp => Binary (TDigest comp) Source #

TDigest isn't compressed after de-serialisation, but it can be still smaller.

Methods

put :: TDigest comp -> Put #

get :: Get (TDigest comp) #

putList :: [TDigest comp] -> Put #

NFData (TDigest comp) Source #

TDigest has only strict fields.

Methods

rnf :: TDigest comp -> () #

But*, the benefit vs. code explosion is not yet worth.

totalWeight :: TDigest comp -> Weight Source #

Total count of samples.

>>> totalWeight (tdigest [1..100] :: TDigest 5)
100.0

size :: TDigest comp -> Int Source #

minimumValue :: TDigest comp -> Mean Source #

Center of left-most centroid. Note: may be different than min element inserted.

>>> minimumValue (tdigest [1..100] :: TDigest 3)
1.0

maximumValue :: TDigest comp -> Mean Source #

Center of right-most centroid. Note: may be different than max element inserted.

>>> maximumValue (tdigest [1..100] :: TDigest 3)
99.0

combineDigest :: KnownNat comp => TDigest comp -> TDigest comp -> TDigest comp Source #

insertCentroid :: forall comp. KnownNat comp => Centroid -> TDigest comp -> TDigest comp Source #

node :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Constructor which calculates size and total weight.

balanceR :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Balance after right insertion.

balanceL :: Mean -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Balance after left insertion.

node' :: Int -> Mean -> Weight -> Weight -> TDigest comp -> TDigest comp -> TDigest comp Source #

Alias to Node

singNode :: Mean -> Weight -> TDigest comp Source #

Create singular node.

combinedCentroid :: Mean -> Weight -> Mean -> Weight -> Centroid Source #

Add two weighted means together.

threshold Source #

Arguments

:: Double

total weight

-> Double

quantile

-> Double

compression (1/δ)

-> Double 

Calculate the threshold, i.e. maximum weight of centroid.

compress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp Source #

Compress TDigest.

Reinsert the centroids in "better" order (in original paper: in random) so they have opportunity to merge.

Compression will happen only if size is both: bigger than relMaxSize * comp and bigger than absMaxSize.

forceCompress :: forall comp. KnownNat comp => TDigest comp -> TDigest comp Source #

Perform compression, even if current size says it's not necessary.

toMVector Source #

Arguments

:: KnownNat comp 
=> TDigest comp

t-Digest

-> ST s (MVector s (Centroid, Double))

return also a "space left in the centroid" value for "shuffling"

relMaxSize :: Int Source #

Relative size parameter. Hard-coded value: 25.

absMaxSize :: Int Source #

Absolute size parameter. Hard-coded value: 1000.

debugPrint :: TDigest comp -> IO () Source #

Output the TDigest tree.

validate :: TDigest comp -> Either String (TDigest comp) Source #

Check various invariants in the TDigest tree.

insert Source #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value into TDigest.

insert' Source #

Arguments

:: KnownNat comp 
=> Double

element

-> TDigest comp 
-> TDigest comp 

Insert single value, don't compress TDigest even if needed.

For sensibly bounded input, it makes sense to let TDigest grow (it might grow linearly in size), and after that compress it once.

singleton :: KnownNat comp => Double -> TDigest comp Source #

Make a TDigest of a single data point.

tdigest :: (Foldable f, KnownNat comp) => f Double -> TDigest comp Source #

Strict foldl' over Foldable structure.

>>> :set -XDataKinds