Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Online statistics for ordered data (such as time-series data), modelled as mealy machines
Synopsis
- data Mealy a b = forall c. Mealy (a -> c) (c -> a -> c) (c -> b)
- dipure :: (a -> a -> a) -> Mealy a a
- pattern M :: (a -> c) -> (c -> a -> c) -> (c -> b) -> Mealy a b
- scan :: Mealy a b -> [a] -> [b]
- fold :: Mealy a b -> [a] -> b
- newtype Averager a b = Averager {
- sumCount :: (a, b)
- pattern A :: a -> b -> Averager a b
- av :: Divisive a => Averager a a -> a
- av_ :: (Eq a, Additive a, Divisive a) => Averager a a -> a -> a
- online :: (Divisive b, Additive b) => (a -> b) -> (b -> b) -> Mealy a b
- ma :: (Divisive a, Additive a) => a -> Mealy a a
- absma :: (Divisive a, Absolute a) => a -> Mealy a a
- sqma :: (Divisive a, Additive a) => a -> Mealy a a
- std :: ExpField a => a -> Mealy a a
- cov :: Field a => Mealy a a -> Mealy (a, a) a
- corrGauss :: ExpField a => a -> Mealy (a, a) a
- corr :: ExpField a => Mealy a a -> Mealy a a -> Mealy (a, a) a
- beta1 :: ExpField a => Mealy a a -> Mealy (a, a) a
- alpha1 :: ExpField a => Mealy a a -> Mealy (a, a) a
- reg1 :: ExpField a => Mealy a a -> Mealy (a, a) (a, a)
- beta :: (ExpField a, KnownNat n, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a)
- alpha :: (ExpField a, KnownNat n, Eq a) => a -> Mealy (Array '[n] a, a) a
- reg :: (ExpField a, KnownNat n, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a, a)
- asum :: Additive a => Mealy a a
- aconst :: b -> Mealy a b
- last :: Mealy a a
- maybeLast :: a -> Mealy (Maybe a) a
- delay1 :: a -> Mealy a a
- delay :: [a] -> Mealy a a
- window :: Int -> Mealy a (Seq a)
- diff :: (a -> a -> b) -> Mealy a b
- gdiff :: (a -> b) -> (a -> a -> b) -> Mealy a b
- same :: Eq b => (a -> b) -> Mealy a Bool
- countM :: Ord a => Mealy a (Map a Int)
- sumM :: (Ord a, Additive b) => Mealy (a, b) (Map a b)
- listify :: Mealy a b -> Mealy [a] [b]
- data Medianer a b = Medianer {}
- onlineL1 :: (Ord b, Field b, Absolute b) => b -> b -> (a -> b) -> (b -> b) -> Mealy a b
- maL1 :: (Ord a, Field a, Absolute a) => a -> a -> a -> Mealy a a
Types
A Mealy
a b is a triple of functions
- (a -> s) inject Convert an input into the state type.
- (s -> a -> s) step Update state given prior state and (new) input.
- (s -> b) extract Convert state to the output type.
By adopting this order, a Mealy sum looks like:
M id (+) id
where the first id is the initial injection to a contravariant position, and the second id is the covariant extraction.
inject kicks off state on the initial element of the Foldable, but is otherwise independent of step.
scan (M i s e) (x : xs) = e <$> scanl' s (i x) xs
forall c. Mealy (a -> c) (c -> a -> c) (c -> b) |
pattern M :: (a -> c) -> (c -> a -> c) -> (c -> b) -> Mealy a b Source #
Convenience pattern for a Mealy
.
M extract step inject
scan :: Mealy a b -> [a] -> [b] Source #
Run a list through a Mealy
and return a list of values for every step
length (scan _ xs) == length xs
Most common statistics are averages, which are some sort of aggregation of values (sum) and some sort of sample size (count).
av :: Divisive a => Averager a a -> a Source #
extract the average from an Averager
av gives NaN on zero divide
av_ :: (Eq a, Additive a, Divisive a) => Averager a a -> a -> a Source #
substitute a default value on zero-divide
av_ (Averager (0,0)) x == x
online :: (Divisive b, Additive b) => (a -> b) -> (b -> b) -> Mealy a b Source #
online f g
is a Mealy
where f is a transformation of the data and
g is a decay function (usually convergent to zero) applied at each step.
online id id == av
online
is best understood by examining usage
to produce a moving average and standard deviation:
An exponentially-weighted moving average with a decay rate of 0.9
ma r == online id (*r)
An exponentially-weighted moving average of the square.
sqma r = online (\x -> x * x) (* r)
Applicative-style exponentially-weighted standard deviation computation:
std r = (\s ss -> sqrt (ss - s ** 2)) <$> ma r <*> sqma r
Statistics
The doctest examples are composed from some random series generated with Data.Mealy.Simulate.
- xs0, xs1 & xs2 are samples from N(0,1)
- xsp is a pair of N(0,1)s with a correlation of 0.8
>>>
:set -XDataKinds
>>>
import Data.Mealy.Simulate
>>>
g <- create
>>>
xs0 <- rvs g 10000
>>>
xs1 <- rvs g 10000
>>>
xs2 <- rvs g 10000
>>>
xsp <- rvsp g 10000 0.8
ma :: (Divisive a, Additive a) => a -> Mealy a a Source #
A moving average using a decay rate of r. r=1 represents the simple average, and r=0 represents the latest value.
>>>
fold (ma 0) ([1..100])
100.0
>>>
fold (ma 1) ([1..100])
50.5
>>>
fold (ma 0.99) xs0
9.713356299018187e-2
absma :: (Divisive a, Absolute a) => a -> Mealy a a Source #
absolute average
>>>
fold (absma 1) xs0
0.8075705557429647
sqma :: (Divisive a, Additive a) => a -> Mealy a a Source #
average square
fold (ma r) . fmap (**2) == fold (sqma r)
std :: ExpField a => a -> Mealy a a Source #
standard deviation
The construction of standard deviation, using the Applicative instance of a Mealy
:
(\s ss -> sqrt (ss - s ** (one+one))) <$> ma r <*> sqma r
The average deviation of the numbers 1..1000 is about 1 / sqrt 12 * 1000 https://en.wikipedia.org/wiki/Uniform_distribution_(continuous)#Standard_uniform
>>>
fold (std 1) [0..1000]
288.9636655359978
The average deviation with a decay of 0.99
>>>
fold (std 0.99) [0..1000]
99.28328803163829
>>>
fold (std 1) xs0
1.0126438036262801
cov :: Field a => Mealy a a -> Mealy (a, a) a Source #
The covariance of a tuple given an underlying central tendency fold.
>>>
fold (cov (ma 1)) xsp
0.7818936662586868
corrGauss :: ExpField a => a -> Mealy (a, a) a Source #
correlation of a tuple, specialised to Guassian
>>>
fold (corrGauss 1) xsp
0.7978347126677433
corr :: ExpField a => Mealy a a -> Mealy a a -> Mealy (a, a) a Source #
a generalised version of correlation of a tuple
>>>
fold (corr (ma 1) (std 1)) xsp
0.7978347126677433
corr (ma r) (std r) == corrGauss r
beta1 :: ExpField a => Mealy a a -> Mealy (a, a) a Source #
The beta in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.
This is a generalisation of the classical regression formula, where averages are replaced by Mealy
statistics.
\[ \begin{align} \beta & = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \\ & = \frac{n^2 \overline{xy} - n^2 \bar{x} \bar{y}}{n^2 \overline{x^2} - n^2 \bar{x}^2} \\ & = \frac{\overline{xy} - \bar{x} \bar{y}}{\overline{x^2} - \bar{x}^2} \\ \end{align} \]
>>>
fold (beta1 (ma 1)) $ zipWith (\x y -> (y, x + y)) xs0 xs1
0.999747321294513
alpha1 :: ExpField a => Mealy a a -> Mealy (a, a) a Source #
The alpha in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.
\[ \begin{align} \alpha & = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} \\ & = \frac{n^2 \bar{y} \overline{x^2} - n^2 \bar{x} \overline{xy}}{n^2 \overline{x^2} - n^2 \bar{x}^2} \\ & = \frac{\bar{y} \overline{x^2} - \bar{x} \overline{xy}}{\overline{x^2} - \bar{x}^2} \\ \end{align} \]
>>>
fold (alpha1 (ma 1)) $ zipWith (\x y -> ((3+y), x + 0.5 * (3 + y))) xs0 xs1
1.3680496627365146e-2
reg1 :: ExpField a => Mealy a a -> Mealy (a, a) (a, a) Source #
The (alpha, beta) tuple in a simple linear regression of an (independent variable, single dependent variable) tuple given an underlying central tendency fold.
>>>
fold (reg1 (ma 1)) $ zipWith (\x y -> ((3+y), x + 0.5 * (3 + y))) xs0 xs1
(1.3680496627365146e-2,0.4997473212944953)
beta :: (ExpField a, KnownNat n, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a) Source #
multiple regression
\[ \begin{align} {\hat {{\mathbf {B}}}}=({\mathbf {X}}^{{{\rm {T}}}}{\mathbf {X}})^{{ -1}}{\mathbf {X}}^{{{\rm {T}}}}{\mathbf {Y}} \end{align} \]
\[ \begin{align} {\mathbf {X}}={\begin{bmatrix}{\mathbf {x}}_{1}^{{{\rm {T}}}}\\{\mathbf {x}}_{2}^{{{\rm {T}}}}\\\vdots \\{\mathbf {x}}_{n}^{{{\rm {T}}}}\end{bmatrix}}={\begin{bmatrix}x_{{1,1}}&\cdots &x_{{1,k}}\\x_{{2,1}}&\cdots &x_{{2,k}}\\\vdots &\ddots &\vdots \\x_{{n,1}}&\cdots &x_{{n,k}}\end{bmatrix}} \end{align} \]
let ys = zipWith3 (\x y z -> 0.1 * x + 0.5 * y + 1 * z) xs0 xs1 xs2 let zs = zip (zipWith (\x y -> fromList [x,y] :: F.Array '[2] Double) xs1 xs2) ys fold (beta 0.99) zs
- 0.4982692361226971, 1.038192474255091
alpha :: (ExpField a, KnownNat n, Eq a) => a -> Mealy (Array '[n] a, a) a Source #
alpha in a multiple regression
reg :: (ExpField a, KnownNat n, Eq a) => a -> Mealy (Array '[n] a, a) (Array '[n] a, a) Source #
multiple regression
let ys = zipWith3 (\x y z -> 0.1 * x + 0.5 * y + 1 * z) xs0 xs1 xs2 let zs = zip (zipWith (\x y -> fromList [x,y] :: F.Array '[2] Double) xs1 xs2) ys fold (reg 0.99) zs
([0.4982692361226971, 1.038192474255091],2.087160803386695e-3)
maybeLast :: a -> Mealy (Maybe a) a Source #
most recent value if it exists, previous value otherwise.
:: [a] | initial statistical values, delay equals length |
-> Mealy a a |
delays values by n steps
delay [0] == delay1 0
delay [] == id
delay [1,2] = delay1 2 . delay1 1
>>>
scan (delay [-2,-1]) [0..3]
[-2,-1,0,1]
Autocorrelation example:
scan (((,) <$> id <*> delay [0]) >>> beta (ma 0.99)) xs0
window :: Int -> Mealy a (Seq a) Source #
a moving window of a's, most recent at the front of the sequence
median
A rough Median. The average absolute value of the stat is used to callibrate estimate drift towards the median