Portability | portable |
---|---|
Stability | experimental |
Maintainer | bos@serpentine.com |

Very fast statistics over simple powers of a sample. These can all be computed efficiently in just a single pass over a sample, with that pass subject to stream fusion.

The tradeoff is that some of these functions are less numerically robust than their counterparts in the `Statistics.Sample` module. Where this is the case, the alternatives are noted.

- type Sample = Vector Double
- data Powers
- powers :: Int -> Sample -> Powers
- order :: Powers -> Int
- count :: Powers -> Int
- sum :: Powers -> Double
- mean :: Powers -> Double
- variance :: Powers -> Double
- stdDev :: Powers -> Double
- varianceUnbiased :: Powers -> Double
- centralMoment :: Int -> Powers -> Double
- skewness :: Powers -> Double
- kurtosis :: Powers -> Double

# Types

type Sample = Vector Double

data Powers

# Constructor

powers :: Int -> Sample -> Powers

O(*n*) Collect the *n* simple powers of a sample.

Functions computed over a sample's simple powers require at least a
certain number (or *order*) of powers to be collected.

- To compute the *k*th `centralMoment`, at least *k* simple powers must be collected.
- For the `variance`, at least 2 simple powers are needed.
- For `skewness`, we need at least 3 simple powers.
- For `kurtosis`, at least 4 simple powers are required.

This function is subject to stream fusion.
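As a rough usage sketch, the powers can be collected once at the highest order needed and then queried repeatedly. The example assumes that `Sample` is an unboxed vector from the `vector` package (`Data.Vector.Unboxed`), and it imports names explicitly because this module's `sum` would otherwise clash with the Prelude's.

```haskell
import qualified Data.Vector.Unboxed as U
import Statistics.Sample.Powers
  (powers, count, mean, variance, skewness, kurtosis)

main :: IO ()
main = do
  -- Assumption: Sample is an unboxed vector of Doubles.
  let sample = U.fromList [1, 2, 3, 4, 100] :: U.Vector Double
      -- Order 4 is enough for every statistic queried below.
      ps = powers 4 sample
  print (count ps)     -- number of elements
  print (mean ps)
  print (variance ps)
  print (skewness ps)  -- needs at least 3 simple powers
  print (kurtosis ps)  -- needs at least 4 simple powers
```

Collecting with a lower order, such as `powers 2`, would still serve `mean` and `variance` but not the higher moments.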

# Descriptive functions

count :: Powers -> Int

The number of elements in the original `Sample`. This is the sample's zeroth simple power.

sum :: Powers -> Double

The sum of elements in the original `Sample`. This is the sample's first simple power.
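To make the term concrete: taking the zeroth and first cases above as the pattern, the *i*th simple power is the sum of *x*^*i* over the sample. The following is a minimal list-based sketch of collecting them in one pass. `simplePowers` is a hypothetical name used for illustration only; it is not the library's `powers` and is not tuned for performance.

```haskell
import Data.List (foldl')

-- | Hypothetical list-based stand-in for the library's 'Powers' type:
-- element i of the result is the sum of x^i over the sample.
simplePowers :: Int -> [Double] -> [Double]
simplePowers k = foldl' step (replicate (k + 1) 0)
  where
    -- Add 1, x, x^2, ..., x^k to the running sums
    -- (zipWith stops after the k + 1 accumulators).
    step acc x = zipWith (+) acc (iterate (* x) 1)

main :: IO ()
main = do
  let ps = simplePowers 2 [1, 2, 3]
  print (ps !! 0)  -- zeroth simple power: the count, 3.0
  print (ps !! 1)  -- first simple power: the sum, 6.0
  print (ps !! 2)  -- second simple power: the sum of squares, 14.0
```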

# Statistics of location

mean :: Powers -> Double

The arithmetic mean of elements in the original `Sample`.

This is less numerically robust than the mean function in the `Statistics.Sample` module, but the number is essentially free to compute if you have already collected a sample's simple powers.

# Statistics of dispersion

variance :: Powers -> Double

Maximum likelihood estimate of a sample's variance. Also known as the population variance, where the denominator is *n*. This is the second central moment of the sample.

This is less numerically robust than the variance function in the `Statistics.Sample` module, but the number is essentially free to compute if you have already collected a sample's simple powers.
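Both the cost and the robustness caveat follow from the standard identities mean = *s*1 / *s*0 and variance = *s*2 / *s*0 - mean², where *s*i is the *i*th simple power. The sketch below applies them to a plain tuple. `meanVar` is a hypothetical helper, and the library may evaluate the variance differently. When the mean is large relative to the spread, the subtraction cancels most of the significant digits, which is where the loss of precision comes from.

```haskell
-- | Mean and population variance from the simple powers (s0, s1, s2),
-- where s_i is the sum of x^i over the sample.
meanVar :: (Double, Double, Double) -> (Double, Double)
meanVar (s0, s1, s2) = (m, s2 / s0 - m * m)
  where
    m = s1 / s0

main :: IO ()
main = do
  let xs = [1, 2, 3, 4, 100] :: [Double]
      s0 = fromIntegral (length xs)     -- 5
      s1 = sum xs                       -- 110
      s2 = sum (map (^ (2 :: Int)) xs)  -- 10030
  print (meanVar (s0, s1, s2))          -- prints (22.0,1522.0)
```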

stdDev :: Powers -> Double

Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.

# Functions over central moments

centralMoment :: Int -> Powers -> Double

Compute the *k*th central moment of a `Sample`. The central moment is also known as the moment about the mean.
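A central moment can be recovered from simple powers by expanding (*x* - *m*)^*k* with the binomial theorem, so that the *k*th central moment equals (1/*n*) Σ C(*k*, *i*) (-*m*)^(*k*-*i*) *s*i for *i* from 0 to *k*. The following is a minimal list-based sketch of that identity, not the library's code. `choose` and `centralMomentFrom` are hypothetical names, and the input list is assumed to hold at least *k* + 1 simple powers (and at least two, so that the mean is available).

```haskell
-- | Binomial coefficient "n choose r", via exact Integer arithmetic.
choose :: Int -> Int -> Double
choose n r = fromIntegral (num `div` den)
  where
    num = product [toInteger (n - r + 1) .. toInteger n]
    den = product [1 .. toInteger r]

-- | k-th central moment from the simple powers [s_0, s_1, ..., s_k].
centralMomentFrom :: Int -> [Double] -> Double
centralMomentFrom k ss = sum (zipWith term [0 .. k] ss) / n
  where
    n = head ss      -- s_0 is the element count
    m = ss !! 1 / n  -- mean = s_1 / s_0
    term i s = choose k i * (negate m) ^ (k - i) * s

main :: IO ()
main =
  -- Simple powers of [1, 2, 3, 4, 100] up to order 2.
  print (centralMomentFrom 2 [5, 110, 10030])  -- prints 1522.0
```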

skewness :: Powers -> Double

Compute the skewness of a sample. This is a measure of the asymmetry of its distribution.

A sample with negative skew is said to be *left-skewed*. Most of
its mass is on the right of the distribution, with the tail on the
left.

skewness . powers 3 $ U.fromList [1,100,101,102,103] ==> -1.497681449918257

A sample with positive skew is said to be *right-skewed*.

skewness . powers 3 $ U.fromList [1,2,3,4,100] ==> 1.4975367033335198

A sample's skewness is not defined if its `variance` is zero.
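For cross-checking, skewness can also be computed directly from the data as the third central moment divided by the variance raised to the power 3/2. The sketch below is a two-pass reference over plain lists with hypothetical names (`centralMomentDirect`, `skewnessDirect`). It is not the one-pass method this module uses, so the last few digits may differ from the library's results.

```haskell
-- | k-th central moment computed directly from the data (two passes).
centralMomentDirect :: Int -> [Double] -> Double
centralMomentDirect k xs = sum [(x - m) ^ k | x <- xs] / n
  where
    n = fromIntegral (length xs)
    m = sum xs / n

-- | Skewness: third central moment over variance^(3/2).
skewnessDirect :: [Double] -> Double
skewnessDirect xs =
  centralMomentDirect 3 xs / centralMomentDirect 2 xs ** 1.5

main :: IO ()
main = do
  print (skewnessDirect [1, 100, 101, 102, 103])  -- negative: left-skewed
  print (skewnessDirect [1, 2, 3, 4, 100])        -- positive: right-skewed
```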

kurtosis :: Powers -> Double

Compute the excess kurtosis of a sample. This is a measure of the "peakedness" of its distribution. A high kurtosis indicates that the sample's variance is due more to infrequent severe deviations than to frequent modest deviations.

A sample's excess kurtosis is not defined if its `variance` is zero.
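The zero-variance caveat is easy to see from the usual definition, in which excess kurtosis is the fourth central moment divided by the squared variance, minus 3. The sketch below uses a direct list-based computation with hypothetical names (`moment`, `excessKurtosis`). In this sketch a constant sample yields 0/0, which is NaN for `Double`; how the library itself behaves in that case is not stated here.

```haskell
-- | k-th central moment computed directly from the data.
moment :: Int -> [Double] -> Double
moment k xs = sum [(x - m) ^ k | x <- xs] / n
  where
    n = fromIntegral (length xs)
    m = sum xs / n

-- | Excess kurtosis: fourth central moment over variance squared, minus 3.
excessKurtosis :: [Double] -> Double
excessKurtosis xs = moment 4 xs / moment 2 xs ^ (2 :: Int) - 3

main :: IO ()
main = do
  print (excessKurtosis [1, 2, 3, 4, 5])     -- a flat sample: negative
  print (isNaN (excessKurtosis [2, 2, 2]))   -- zero variance: prints True
```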

# References

- Besset, D.H. (2000) Elements of statistics. *Object-oriented implementation of numerical methods* ch. 9, pp. 311–331. http://www.elsevier.com/wps/product/cws_home/677916
- Anderson, G. (2009) Compute *k*th central moments in one pass. *quantblog*. http://quantblog.wordpress.com/2009/02/07/compute-kth-central-moments-in-one-pass/