monoid-statistics-0.3.1: Monoids for calculation of statistics of sample

Stability experimental Alexey Khudyakov

Data.Monoid.Statistics

Contents

Description

Synopsis

# Type class

class Monoid m => StatMonoid m a whereSource

Monoid which corresponds to some stattics. In order to do so it must be commutative. In many cases it's not practical to construct monoids for each element so `papennd` was added. First parameter of type class is monoidal accumulator. Second is type of element over which statistic is calculated.

Statistic could be calculated with fold over sample. Since accumulator is `Monoid` such fold could be easily parralelized. Check examples section for more information.

Instance must satisfy following law:

``` pappend x (pappend y mempty) == pappend x mempty `mappend` pappend y mempty
mappend x y == mappend y x
```

It is very similar to Reducer type class from monoids package but require commutative monoids

Methods

pappend :: a -> m -> mSource

Add one element to monoid accumulator. P stands for point in analogy for Pointed.

Instances

 StatMonoid Max Double StatMonoid Min Double Real a => StatMonoid Variance a Real a => StatMonoid Mean a Integral a => StatMonoid (Count a) b (StatMonoid a x, StatMonoid b x) => StatMonoid (TwoStats a b) x

evalStatistic :: (Foldable d, StatMonoid m a) => d a -> mSource

Calculate statistic over `Foldable`. It's implemented in terms of foldl'.

## Examples

These examples show how to find maximum and minimum of a sample in one pass over data.

This is test data. It's not limited to list but could be anything what could be folded.

``` > let xs = [1..100] :: [Double]
```

Now let calculate maximum of test sample using two methods. First one is to use generic function `evalStatistic` and another one is fold.

``` > evalStatistic xs :: Max
Max {calcMax = 100.0}
> foldl (flip pappend) mempty xs :: Max
Max {calcMax = 100.0}
```

More complicated example allows to combine several monoids together. It allows to calculate two statistics in one pass:

``` > evalStatistic xs :: TwoStats Min Max
TwoStats {calcStat1 = Min {calcMin = 1.0}, calcStat2 = Max {calcMax = 100.0}}
```

Last example shows how to calculate nuber of elements, mean and variance at once:

``` > let v = evalStatistic xs :: Variance
> calcCount v
100
> calcMean v
50.5
> calcStddev v
28.86607004772212
```

data TwoStats a b Source

Monoid which allows to calculate two statistics in parralel

Constructors

 TwoStats FieldscalcStat1 :: !a calcStat2 :: !b

Instances

 Typeable2 TwoStats (Eq a, Eq b) => Eq (TwoStats a b) (Show a, Show b) => Show (TwoStats a b) (Monoid a, Monoid b) => Monoid (TwoStats a b) (StatMonoid a x, StatMonoid b x) => StatMonoid (TwoStats a b) x

# Additional information

Statistic is function of a sample which does not depend on order of elements in a sample. For each statistics corresponding monoid could be constructed:

``` f :: [A] -> B

data F = F [A]

evalF (F xs) = f xs

instance Monoid F here
mempty = F []
(F a) `mappend` (F b) = F (a ++ b)
```

This indeed proves that monoid could be constructed. Monoid above is completely impractical. It runs in O(n) space. However for some statistics monoids which runs in O(1) space could be implemented. Simple examples of such statistics are number of elements in sample or mean of a sample.

On the other hand some statistics could not be implemented in such way. For example calculation of median require O(n) space. Variance could be implemented in O(1) but such implementation will have problems with numberical stability.