online: online statistics

[ bsd3, deprecated, library, project ]
Deprecated in favor of mealy

transformation of statistics to online algorithms


Versions [RSS] 0.2.0, 0.2.1.0, 0.2.2.0, 0.2.3.0, 0.3.0.0, 0.4.0.0, 0.5.0, 0.6.0
Dependencies base (>=4.7 && <5), foldl, numhask, protolude, tdigest, vector, vector-algorithms [details]
License BSD-3-Clause
Copyright Tony Day
Author Tony Day
Maintainer tonyday567@gmail.com
Category statistics
Home page https://github.com/tonyday567/online
Source repo head: git clone https://github.com/tonyday567/online
Uploaded by tonyday567 at 2017-07-23T05:12:53Z
Distributions
Reverse Dependencies 1 direct, 0 indirect [details]
Downloads 3403 total (19 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Status Docs available [build log]
Last success reported on 2017-07-23 [all 1 reports]

Readme for online-0.2.0


online


online turns a statistic (in Haskell, this can usually be thought of as a fold over a Foldable) into an online algorithm.

motivation

Imagine a data stream, like an ordered indexed container or a time-series of measurements. An exponential moving average can be calculated as a repeated iteration over a stream of xs:

\[ ema_t = ema_{t-1} * 0.9 + x_t * 0.1 \]

The 0.1 is akin to the learning rate in machine learning, and the 0.9 can be thought of as a decay rate, or a rate of forgetting. An exponential moving average learns about what the value of x has been lately, where lately is, on average, about 1/0.1 = 10 x's ago. All very neat.
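The recursion above can be sketched directly as a left fold. This is a minimal illustration using only base, not the library's code: the names emaStep and emaFrom are made up here, and the explicit starting value is an assumption, since the formula does not say how the first ema is chosen.

```haskell
import Data.List (foldl')

-- One step of the recursion: ema_t = ema_{t-1} * 0.9 + x_t * 0.1
-- (hypothetical helper, not part of the online package)
emaStep :: Double -> Double -> Double
emaStep prev x = prev * 0.9 + x * 0.1

-- Run the recursion over a whole stream, starting from an assumed seed.
emaFrom :: Double -> [Double] -> Double
emaFrom = foldl' emaStep
```

The state carried between steps is a single Double, whatever the length of the stream.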

The first bit of neat is speed: each update costs two multiplications and one addition. The next is space: an ema represents the recent xs in a size as big as a single x. Compare that with a simple moving average, where you have to keep the history of the last n xs around to keep up (just try it).
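To make the space comparison concrete, here is a rough sketch of a simple moving average over the last n values. The Sma type and the sma* functions are hypothetical names for illustration only, and the plain list window is the simplest choice rather than an efficient one.

```haskell
import Data.List (foldl')

-- State for a simple moving average: the window size plus the
-- window itself, newest value first (hypothetical type).
data Sma = Sma Int [Double]

-- Push a new value and drop anything older than the last n.
smaStep :: Sma -> Double -> Sma
smaStep (Sma n w) x = Sma n (take n (x : w))

-- Average of whatever is currently in the window (0 if empty).
smaValue :: Sma -> Double
smaValue (Sma _ w)
  | null w = 0
  | otherwise = sum w / fromIntegral (length w)

-- Moving average of the last n values of a stream.
sma :: Int -> [Double] -> Double
sma n = smaValue . foldl' smaStep (Sma n [])
```

The state here grows with n, whereas the ema gets by with one number.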

It's so neat, it's probably a viable monoidal category all by itself.

online

Haskell allows us to abstract the compound ideas in an ema and create polymorphic routines over a wide variety of statistics, so that they all retain these properties of speed, space and rigour.

-- L is presumably Control.Foldl (from the foldl dependency), imported qualified
av xs = L.fold (online identity (.* 0.9)) xs
-- av [0..10] == 6.030559401413827
-- av [0..100] == 91.00241448887785

online identity (.* 0.9) is how you express an ema with a decay rate of 0.9.
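One way to read the numbers above is as a decayed sum divided by a decayed count. The sketch below works under that assumption and uses only base; onlineSketch and avSketch are hypothetical stand-ins, not the library's implementation, and they take a list where the library produces a fold.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for `online f decay`: keep a decayed sum of
-- (f x) and a decayed count, and report their ratio at the end.
-- An empty input gives 0/0, which is NaN.
onlineSketch :: (a -> Double) -> (Double -> Double) -> [a] -> Double
onlineSketch f decay xs = total / weight
  where
    (total, weight) = foldl' step (0, 0) xs
    step (s, w) x =
      let s' = decay s + f x   -- decay the old sum, add the new value
          w' = decay w + 1     -- decay the old count, count the new value
      in s' `seq` w' `seq` (s', w')

-- Counterpart of `av` above, with a decay rate of 0.9.
avSketch :: [Double] -> Double
avSketch = onlineSketch id (* 0.9)
```

Under this reading, avSketch [0..10] comes out at roughly 6.0306 and avSketch [0..100] at roughly 91.0024, in line with the values quoted for av.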

online works for any statistic. Here's the construction of standard deviation using applicative style:

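-- Standard deviation with decay rate r, computed as
-- sqrt (E[x^2] - E[x]^2) from two online folds combined applicatively.
std :: Double -> L.Fold Double Double
std r = (\s ss -> sqrt (ss - s**2)) <$> ma r <*> sqma r
  where
    ma r = online identity (.*r)   -- moving average of x
    sqma r = online (**2) (.*r)    -- moving average of x^2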
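The same construction can be tried without the library. The following sketch assumes online behaves as a decayed sum over a decayed count; onlineSketch and stdSketch are hypothetical names. It traverses the list twice for simplicity, whereas combining folds applicatively with foldl runs them in a single pass. The max 0 guard is an addition here, so that rounding error cannot push the variance below zero.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for `online f decay` (see the assumption above).
onlineSketch :: (a -> Double) -> (Double -> Double) -> [a] -> Double
onlineSketch f decay xs = total / weight
  where
    (total, weight) = foldl' step (0, 0) xs
    step (s, w) x =
      let s' = decay s + f x
          w' = decay w + 1
      in s' `seq` w' `seq` (s', w')

-- Decayed standard deviation: sqrt (E[x^2] - E[x]^2).
stdSketch :: Double -> [Double] -> Double
stdSketch r xs = sqrt (max 0 (ss - s * s))
  where
    s  = onlineSketch id (* r) xs       -- moving average of x
    ss = onlineSketch (** 2) (* r) xs   -- moving average of x^2
```

With r = 1 nothing is forgotten, and stdSketch reduces to the ordinary population standard deviation.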