Portability | portable |
---|---|
Stability | experimental |
Maintainer | bos@serpentine.com |

Commonly used sample statistics, also known as descriptive statistics.

- type Sample = UArr Double
- mean :: Sample -> Double
- harmonicMean :: Sample -> Double
- geometricMean :: Sample -> Double
- variance :: Sample -> Double
- varianceUnbiased :: Sample -> Double
- stdDev :: Sample -> Double
- fastVariance :: Sample -> Double
- fastVarianceUnbiased :: Sample -> Double
- fastStdDev :: Sample -> Double

# Types

type Sample = UArr Double

# Statistics of location

mean :: Sample -> Double

Arithmetic mean. This uses Welford's algorithm to provide numerical stability, using a single pass over the sample data.
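As a rough illustration of the single-pass idea, the sketch below keeps a running count and a running mean, and nudges the mean toward each new element. It works on a plain list rather than `UArr`, and the names `MeanAcc` and `welfordMean` are hypothetical; this is not the library's source.

```haskell
import Data.List (foldl')

-- | Running state: number of elements seen so far and their mean.
data MeanAcc = MeanAcc !Int !Double

-- | Single-pass arithmetic mean using an incremental update.
-- Returns 0 for an empty list (a choice made for this sketch only).
welfordMean :: [Double] -> Double
welfordMean xs = m
  where
    MeanAcc _ m = foldl' step (MeanAcc 0 0) xs
    -- Move the running mean toward x by 1/n' of the difference.
    step (MeanAcc n mu) x =
      let n' = n + 1
      in  MeanAcc n' (mu + (x - mu) / fromIntegral n')
```

Because the running mean stays on the scale of the data, this avoids building one large sum before dividing.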

harmonicMean :: Sample -> Double

Harmonic mean. This algorithm performs a single pass over the sample.
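The harmonic mean is the element count divided by the sum of reciprocals, so one pass that accumulates both is enough. The following is a minimal sketch on a plain list; `RecipAcc` and `harmonicMeanSketch` are hypothetical names, and the library's own definition may differ in detail.

```haskell
import Data.List (foldl')

-- | Running state: element count and sum of reciprocals.
data RecipAcc = RecipAcc !Int !Double

-- | Single-pass harmonic mean: n / sum (1 / x).
-- An empty list yields NaN (0 / 0) in this sketch.
harmonicMeanSketch :: [Double] -> Double
harmonicMeanSketch xs = fromIntegral n / s
  where
    RecipAcc n s = foldl' step (RecipAcc 0 0) xs
    step (RecipAcc k acc) x = RecipAcc (k + 1) (acc + 1 / x)
```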

geometricMean :: Sample -> Double

Geometric mean of a sample containing no negative values.
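One common way to compute a geometric mean is to exponentiate the arithmetic mean of the logarithms, which also shows why negative values are excluded: their logarithm is undefined. The sketch below assumes that approach and a plain list; `geometricMeanSketch` is a hypothetical name, not the library's definition.

```haskell
-- | Geometric mean as exp of the mean of logs.
-- A zero element gives log 0 = -Infinity, so the result is 0;
-- a negative element gives NaN.
geometricMeanSketch :: [Double] -> Double
geometricMeanSketch xs =
  exp (sum (map log xs) / fromIntegral (length xs))
```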

# Statistics of dispersion

The variance, and hence the standard deviation, of a sample of fewer than two elements is defined to be zero.

## Two-pass functions (numerically robust)

These functions use the compensated summation algorithm of Chan et al. for numerical robustness, but require two passes over the sample data as a result.

Because of the need for two passes, these functions are *not* subject to stream fusion.
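To make the two-pass structure concrete, the sketch below first computes the mean, then sums squared deviations from it, subtracting a correction term that would be zero in exact arithmetic. Whether this is the exact formula the library takes from Chan et al. is an assumption; the names `sumSqDev`, `twoPassVariance`, `twoPassVarianceUnbiased`, and `twoPassStdDev` are hypothetical, and a plain list stands in for `UArr`.

```haskell
import Data.List (foldl')

-- | Element count and corrected sum of squared deviations from the mean.
-- Note: 'length' is an extra traversal of a list; an array would
-- already know its length.
sumSqDev :: [Double] -> (Int, Double)
sumSqDev xs = (n, ss - s * s / fromIntegral n)
  where
    n       = length xs
    m       = sum xs / fromIntegral n          -- first pass: the mean
    (ss, s) = foldl' step (0, 0) xs            -- second pass: deviations
    step (a, b) x =
      let d  = x - m
          a' = a + d * d                       -- sum of squared deviations
          b' = b + d                           -- sum of deviations (correction)
      in  a' `seq` b' `seq` (a', b')

-- | Maximum likelihood estimate (divide by n); zero for fewer than two elements.
twoPassVariance :: [Double] -> Double
twoPassVariance xs
  | n < 2     = 0
  | otherwise = ss / fromIntegral n
  where (n, ss) = sumSqDev xs

-- | Unbiased estimate (divide by n - 1); zero for fewer than two elements.
twoPassVarianceUnbiased :: [Double] -> Double
twoPassVarianceUnbiased xs
  | n < 2     = 0
  | otherwise = ss / fromIntegral (n - 1)
  where (n, ss) = sumSqDev xs

-- | Square root of the maximum likelihood estimate.
twoPassStdDev :: [Double] -> Double
twoPassStdDev = sqrt . twoPassVariance
```

The second pass needs the finished mean from the first, which is why the two traversals cannot be merged into one loop.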

variance :: Sample -> Double

Maximum likelihood estimate of a sample's variance.

varianceUnbiased :: Sample -> Double

Unbiased estimate of a sample's variance.

stdDev :: Sample -> Double

Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.

## Single-pass functions (faster, less safe)

The functions prefixed with the name `fast` below perform a single pass over the sample data using Knuth's algorithm. They usually work well, but see the note below for caveats. These functions are subject to array fusion.

*Note*: in cases where most sample data is close to the sample's
mean, Knuth's algorithm gives inaccurate results due to
catastrophic cancellation.
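For orientation, the single-pass recurrence given by Knuth (see References) updates a running mean and a running sum of squared deviations with each element. The sketch below assumes that recurrence and a plain list; `VarAcc`, `knuthAcc`, `onePassVariance`, `onePassVarianceUnbiased`, and `onePassStdDev` are hypothetical names, not the library's code.

```haskell
import Data.List (foldl')

-- | Running state: count, mean, and sum of squared deviations.
data VarAcc = VarAcc !Int !Double !Double

-- | One traversal, updating all three fields per element.
knuthAcc :: [Double] -> VarAcc
knuthAcc = foldl' step (VarAcc 0 0 0)
  where
    step (VarAcc n m s) x =
      let n' = n + 1
          d  = x - m                      -- deviation from the old mean
          m' = m + d / fromIntegral n'    -- updated mean
      in  VarAcc n' m' (s + d * (x - m')) -- uses old and new deviation

-- | Maximum likelihood estimate; zero for fewer than two elements.
onePassVariance :: [Double] -> Double
onePassVariance xs = case knuthAcc xs of
  VarAcc n _ s
    | n < 2     -> 0
    | otherwise -> s / fromIntegral n

-- | Unbiased estimate; zero for fewer than two elements.
onePassVarianceUnbiased :: [Double] -> Double
onePassVarianceUnbiased xs = case knuthAcc xs of
  VarAcc n _ s
    | n < 2     -> 0
    | otherwise -> s / fromIntegral (n - 1)

-- | Square root of the maximum likelihood estimate.
onePassStdDev :: [Double] -> Double
onePassStdDev = sqrt . onePassVariance
```

Since the whole computation is a single left fold, it is the kind of loop a fusion framework can combine with the producer of the data.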

fastVariance :: Sample -> Double

Maximum likelihood estimate of a sample's variance.

fastVarianceUnbiased :: Sample -> Double

Unbiased estimate of a sample's variance.

fastStdDev :: Sample -> Double

Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.

# References

- Chan, T. F.; Golub, G.H.; LeVeque, R.J. (1979) Updating formulae and a pairwise algorithm for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University. ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
- Knuth, D.E. (1998) The art of computer programming, volume 2: seminumerical algorithms, 3rd ed., p. 232.
- Welford, B.P. (1962) Note on a method for calculating corrected sums of squares and products. *Technometrics* 4(3):419–420. http://www.jstor.org/stable/1266577
- West, D.H.D. (1979) Updating mean and variance estimates: an improved method. *Communications of the ACM* 22(9):532–535. http://doi.acm.org/10.1145/359146.359153