Field | Value |
---|---|
Portability | portable |
Stability | experimental |
Maintainer | bos@serpentine.com |
Commonly used sample statistics, also known as descriptive statistics.
- type Sample = UArr Double
- mean :: Sample -> Double
- harmonicMean :: Sample -> Double
- geometricMean :: Sample -> Double
- variance :: Sample -> Double
- varianceUnbiased :: Sample -> Double
- stdDev :: Sample -> Double
- fastVariance :: Sample -> Double
- fastVarianceUnbiased :: Sample -> Double
- fastStdDev :: Sample -> Double
Types
type Sample = UArr Double
Statistics of location
mean :: Sample -> Double
Arithmetic mean. This uses Welford's algorithm, which provides numerical stability while making only a single pass over the sample data.
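A minimal sketch of Welford's running-mean update, assuming a plain [Double] list in place of UArr Double so that it builds with only base. The list-based Sample and the MeanAcc accumulator are hypothetical illustrations, not the library's code. Each element moves the current mean by (x - m) / k instead of being added to an ever-growing total.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for the library's Sample (UArr Double).
type Sample = [Double]

-- Strict accumulator: running mean and number of elements seen so far.
data MeanAcc = MeanAcc !Double !Int

-- | Arithmetic mean via Welford's update:
--     m_k = m_{k-1} + (x_k - m_{k-1}) / k
-- One pass over the data; no large running sum is ever formed.
mean :: Sample -> Double
mean = finish . foldl' step (MeanAcc 0 0)
  where
    step (MeanAcc m n) x =
      let n' = n + 1
      in MeanAcc (m + (x - m) / fromIntegral n') n'
    -- In this sketch an empty sample yields 0.
    finish (MeanAcc m _) = m
```

In GHCi, mean [1, 2, 3, 4] evaluates to 2.5.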
harmonicMean :: Sample -> Double
Harmonic mean. This algorithm performs a single pass over the sample.
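A minimal single-pass sketch of the harmonic mean, n divided by the sum of reciprocals, again assuming a plain [Double] list in place of UArr Double. The HarmAcc accumulator is a hypothetical name; the library's implementation may differ in detail.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for the library's Sample (UArr Double).
type Sample = [Double]

-- Strict accumulator: sum of reciprocals and element count.
data HarmAcc = HarmAcc !Double !Int

-- | Harmonic mean, n / sum (1 / x_i), computed in one pass.
harmonicMean :: Sample -> Double
harmonicMean = finish . foldl' step (HarmAcc 0 0)
  where
    step (HarmAcc s n) x = HarmAcc (s + recip x) (n + 1)
    -- An empty sample gives 0 / 0, i.e. NaN, in this sketch.
    finish (HarmAcc s n) = fromIntegral n / s
```

For example, harmonicMean [1, 2, 4] is 3 / 1.75, roughly 1.714.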
geometricMean :: Sample -> Double
Geometric mean of a sample containing no negative values.
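A single-pass sketch of the geometric mean, assuming a plain [Double] list in place of UArr Double. Note the swapped-in technique: this sketch averages logarithms and exponentiates, rather than multiplying the values directly, to keep a long running product from overflowing or underflowing. The library's own implementation may compute it differently. GeoAcc is a hypothetical name.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for the library's Sample (UArr Double).
type Sample = [Double]

-- Strict accumulator: sum of natural logs and element count.
data GeoAcc = GeoAcc !Double !Int

-- | Geometric mean, exp (mean (log x_i)), for non-negative samples.
-- A zero element drives the log sum to -Infinity, so the result is 0;
-- negative elements give NaN, matching the "no negative values" rule.
geometricMean :: Sample -> Double
geometricMean = finish . foldl' step (GeoAcc 0 0)
  where
    step (GeoAcc s n) x = GeoAcc (s + log x) (n + 1)
    finish (GeoAcc s n) = exp (s / fromIntegral n)
```

geometricMean [1, 4, 16] is 4 up to rounding error.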
Statistics of dispersion
The variance, and hence the standard deviation, of a sample with fewer than two elements is defined to be zero.
Two-pass functions (numerically robust)
These functions use the compensated summation algorithm of Chan et al. for numerical robustness, but require two passes over the sample data as a result.
Because of the need for two passes, these functions are not subject to stream fusion.
variance :: Sample -> Double
Maximum likelihood estimate of a sample's variance.
varianceUnbiased :: Sample -> Double
Unbiased estimate of a sample's variance.
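For reference, the maximum likelihood and unbiased estimates differ only in their divisor. In standard notation, for a sample x_1, ..., x_n with n ≥ 2 (the symbols are introduced here, not taken from the library):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{n}{n-1}\,\hat{\sigma}^2
```

Here \hat{\sigma}^2 corresponds to variance and s^2 to varianceUnbiased; the standard deviation is \sqrt{\hat{\sigma}^2}.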
stdDev :: Sample -> Double
Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.
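The two-pass functions above can be sketched with the corrected two-pass formula discussed by Chan et al.: pass one computes the mean m; pass two accumulates the sum of squared deviations together with the sum of the raw deviations, which is exactly zero in exact arithmetic and absorbs the rounding error in the computed m. This is a minimal sketch assuming a plain [Double] list in place of UArr Double; robustSumSq, CountSum and DevSums are hypothetical names, not the library's API.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for the library's Sample (UArr Double).
type Sample = [Double]

-- Strict accumulators for each pass.
data CountSum = CountSum !Int !Double      -- pass 1: count, sum of x
data DevSums  = DevSums  !Double !Double   -- pass 2: sum of d^2, sum of d

-- | Sample size and corrected sum of squared deviations from the mean:
--     sum d_i^2 - (sum d_i)^2 / n,  where d_i = x_i - m.
-- Callers must guard n < 2 (the second component is NaN for n == 0).
robustSumSq :: Sample -> (Int, Double)
robustSumSq xs = (n, sq - comp * comp / fromIntegral n)
  where
    -- Pass 1: the mean.
    CountSum n total =
      foldl' (\(CountSum k t) x -> CountSum (k + 1) (t + x)) (CountSum 0 0) xs
    m = total / fromIntegral n
    -- Pass 2: deviations from that mean, plus the compensation term.
    DevSums sq comp = foldl' step (DevSums 0 0) xs
    step (DevSums q c) x = let d = x - m in DevSums (q + d * d) (c + d)

-- | Maximum likelihood estimate (divide by n); 0 for fewer than two elements.
variance :: Sample -> Double
variance xs = case robustSumSq xs of
  (n, ss) | n < 2     -> 0
          | otherwise -> ss / fromIntegral n

-- | Unbiased estimate (divide by n - 1); 0 for fewer than two elements.
varianceUnbiased :: Sample -> Double
varianceUnbiased xs = case robustSumSq xs of
  (n, ss) | n < 2     -> 0
          | otherwise -> ss / fromIntegral (n - 1)

-- | Square root of the maximum likelihood estimate.
stdDev :: Sample -> Double
stdDev = sqrt . variance
```

For the sample [2, 4, 4, 4, 5, 5, 7, 9], variance gives 4, varianceUnbiased gives 32/7 and stdDev gives 2. Because the mean must be known before the second fold starts, the data is traversed twice, which is why functions of this shape cannot be fused into a single loop.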
Single-pass functions (faster, less safe)
The functions below whose names are prefixed with fast perform a single pass over the sample data using Knuth's algorithm. They usually work well, but see the note below for caveats. These functions are subject to array fusion.
Note: in cases where most sample data is close to the sample's mean, Knuth's algorithm gives inaccurate results due to catastrophic cancellation.
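A sketch of the single-pass functions, assuming they follow the running update recurrence usually cited from Knuth's volume 2 (the same recurrence Welford described): the mean and the sum of squared deviations are updated together as each element arrives, so one fold suffices. It uses a plain [Double] list in place of UArr Double; Running, step and fastSumSq are hypothetical names.

```haskell
import Data.List (foldl')

-- Hypothetical stand-in for the library's Sample (UArr Double).
type Sample = [Double]

-- Running state: count, mean so far, sum of squared deviations so far.
data Running = Running !Int !Double !Double

-- One step of the recurrence:
--   m_k = m_{k-1} + (x_k - m_{k-1}) / k
--   s_k = s_{k-1} + (x_k - m_{k-1}) * (x_k - m_k)
step :: Running -> Double -> Running
step (Running n m s) x = Running n' m' s'
  where
    n' = n + 1
    d  = x - m
    m' = m + d / fromIntegral n'
    s' = s + d * (x - m')

-- | Sample size and sum of squared deviations, in a single fold.
fastSumSq :: Sample -> (Int, Double)
fastSumSq xs = let Running n _ s = foldl' step (Running 0 0 0) xs in (n, s)

-- | Maximum likelihood estimate; 0 for fewer than two elements.
fastVariance :: Sample -> Double
fastVariance xs = case fastSumSq xs of
  (n, s) | n < 2     -> 0
         | otherwise -> s / fromIntegral n

-- | Unbiased estimate; 0 for fewer than two elements.
fastVarianceUnbiased :: Sample -> Double
fastVarianceUnbiased xs = case fastSumSq xs of
  (n, s) | n < 2     -> 0
         | otherwise -> s / fromIntegral (n - 1)

-- | Square root of the maximum likelihood estimate.
fastStdDev :: Sample -> Double
fastStdDev = sqrt . fastVariance
```

On the sample [2, 4, 4, 4, 5, 5, 7, 9] these return values within rounding error of 4, 32/7 and 2. Because everything happens inside one left fold, a loop of this shape is the kind that array fusion can merge with the code producing the sample.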
fastVariance :: Sample -> Double
Maximum likelihood estimate of a sample's variance.
fastVarianceUnbiased :: Sample -> Double
Unbiased estimate of a sample's variance.
fastStdDev :: Sample -> Double
Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.
References
- Chan, T.F.; Golub, G.H.; LeVeque, R.J. (1979) Updating formulae and a pairwise algorithm for computing sample variances. Technical Report STAN-CS-79-773, Department of Computer Science, Stanford University. ftp://reports.stanford.edu/pub/cstr/reports/cs/tr/79/773/CS-TR-79-773.pdf
- Knuth, D.E. (1998) The art of computer programming, volume 2: seminumerical algorithms, 3rd ed., p. 232.
- Welford, B.P. (1962) Note on a method for calculating corrected sums of squares and products. Technometrics 4(3):419–420. http://www.jstor.org/stable/1266577
- West, D.H.D. (1979) Updating mean and variance estimates: an improved method. Communications of the ACM 22(9):532–535. http://doi.acm.org/10.1145/359146.359153