statistics-0.2.1: A library of statistical types, data, and functions

Portability: portable
Stability: experimental
Maintainer: bos@serpentine.com

Statistics.Sample

Description

Commonly used sample statistics, also known as descriptive statistics.

Types

type Sample = UArr Double

Sample data.

Statistics of location

mean :: Sample -> Double

Arithmetic mean. This uses Welford's algorithm to provide numerical stability, using a single pass over the sample data.
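The running-mean update usually attributed to Welford can be sketched as follows. This is a minimal illustration over plain lists, not the library's code: the name welfordMean is hypothetical, and a list stands in for the UArr-based Sample type.

```haskell
import Data.List (foldl')

-- Hypothetical list-based stand-in for 'mean' (assumption: not the
-- library's implementation). The fold state is (count so far, running mean).
welfordMean :: [Double] -> Double
welfordMean = snd . foldl' step (0 :: Int, 0)
  where
    step (n, m) x =
      let n' = n + 1
          -- Move the running mean towards x by 1/n' of the gap, which
          -- avoids accumulating one large sum before dividing.
          m' = m + (x - m) / fromIntegral n'
      in n' `seq` m' `seq` (n', m')
```

In this sketch an empty list yields 0; how the library treats an empty sample is not stated here.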

harmonicMean :: Sample -> Double

Harmonic mean. This algorithm performs a single pass over the sample.
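As an illustration of the single pass, the harmonic mean n / Σ(1/x) needs only a count and a running sum of reciprocals. The sketch below assumes plain lists, and the name harmonicMeanSketch is hypothetical.

```haskell
import Data.List (foldl')

-- Hypothetical list-based sketch (assumption: not the library's code).
-- One fold accumulates the element count and the sum of reciprocals.
harmonicMeanSketch :: [Double] -> Double
harmonicMeanSketch xs = fromIntegral n / s
  where
    (n, s) = foldl' step (0 :: Int, 0) xs
    step (k, acc) x =
      let k'   = k + 1
          acc' = acc + 1 / x
      in k' `seq` acc' `seq` (k', acc')
```

For an empty list this sketch evaluates 0/0 and returns NaN.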

geometricMean :: Sample -> Double

Geometric mean of a sample containing no negative values.
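One common way to compute a geometric mean is in the log domain, as the exponential of the mean of the logarithms. The sketch below assumes that formulation and plain lists; whether the library computes it this way is not stated above, and geometricMeanSketch is a hypothetical name.

```haskell
-- Hypothetical list-based sketch (assumption: log-domain formulation).
-- exp (mean (log x)) equals the n-th root of the product of the elements,
-- but avoids overflowing the product for long samples.
geometricMeanSketch :: [Double] -> Double
geometricMeanSketch xs =
  exp (sum (map log xs) / fromIntegral (length xs))
```

With this formulation a negative element makes log return NaN, which is one way to see why the sample must contain no negative values.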

Statistics of dispersion

The variance, and hence the standard deviation, of a sample of fewer than two elements is defined to be zero.

Two-pass functions (numerically robust)

These functions use the compensated summation algorithm of Chan et al. for numerical robustness, but require two passes over the sample data as a result.

Because of the need for two passes, these functions are not subject to stream fusion.
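A corrected two-pass computation can be sketched as follows: the first pass finds the mean, the second sums squared deviations and subtracts a correction term built from the sum of the deviations, which would be zero in exact arithmetic. This is a list-based illustration under stated assumptions; the names ending in Sketch and the helper robustSumSq are hypothetical, and the exact form used by the library may differ.

```haskell
-- Hypothetical helper (assumption: not the library's code).
-- Returns the corrected sum of squared deviations and the sample size.
robustSumSq :: [Double] -> (Double, Int)
robustSumSq xs = (ss - comp * comp / fromIntegral n, n)
  where
    n    = length xs
    m    = sum xs / fromIntegral n        -- first pass: the mean
    devs = map (subtract m) xs            -- second pass: deviations
    ss   = sum (map (^ (2 :: Int)) devs)  -- sum of squared deviations
    comp = sum devs                       -- rounding error left in the mean

-- Maximum likelihood estimate: divide by n.
varianceSketch :: [Double] -> Double
varianceSketch xs
  | n < 2     = 0                         -- fewer than two elements
  | otherwise = s / fromIntegral n
  where (s, n) = robustSumSq xs

-- Unbiased estimate: divide by n - 1.
varianceUnbiasedSketch :: [Double] -> Double
varianceUnbiasedSketch xs
  | n < 2     = 0
  | otherwise = s / fromIntegral (n - 1)
  where (s, n) = robustSumSq xs

-- Square root of the maximum likelihood estimate.
stdDevSketch :: [Double] -> Double
stdDevSketch = sqrt . varianceSketch
```

The only difference between the two estimates is the divisor, so both share the same two passes.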

variance :: Sample -> Double

Maximum likelihood estimate of a sample's variance.

varianceUnbiased :: Sample -> Double

Unbiased estimate of a sample's variance.

stdDev :: Sample -> Double

Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.

Single-pass functions (faster, less safe)

The functions below whose names begin with "fast" perform a single pass over the sample data using Knuth's algorithm. They usually work well, but see the note below for caveats. These functions are subject to array fusion.

Note: in cases where most sample data is close to the sample's mean, Knuth's algorithm gives inaccurate results due to catastrophic cancellation.
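For orientation, the single-pass recurrence commonly presented under Knuth's name keeps a count, a running mean, and a running sum of squared deviations. The sketch below assumes that form and plain lists; the FV type, fastVarState, and fastVarianceSketch are hypothetical names, and the library's code may differ in detail.

```haskell
import Data.List (foldl')

-- Running state: count, mean, and sum of squared deviations.
-- Strict fields keep the fold from building up thunks.
data FV = FV !Int !Double !Double

-- Hypothetical single-pass accumulator (assumption: not the library's code).
fastVarState :: [Double] -> FV
fastVarState = foldl' step (FV 0 0 0)
  where
    step (FV n m s) x = FV n' m' s'
      where
        n' = n + 1
        d  = x - m                       -- deviation from the old mean
        m' = m + d / fromIntegral n'     -- updated mean
        s' = s + d * (x - m')            -- updated sum of squared deviations

-- Maximum likelihood estimate from the accumulated state.
fastVarianceSketch :: [Double] -> Double
fastVarianceSketch xs = case fastVarState xs of
  FV n _ s
    | n > 1     -> s / fromIntegral n
    | otherwise -> 0                     -- fewer than two elements
```

Because the whole computation is one strict left fold, it is the kind of loop that a fusion framework can combine with the producer of the data.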

fastVariance :: Sample -> Double

Maximum likelihood estimate of a sample's variance.

fastVarianceUnbiased :: Sample -> Double

Unbiased estimate of a sample's variance.

fastStdDev :: Sample -> Double

Standard deviation. This is simply the square root of the maximum likelihood estimate of the variance.

References