statistics-0.10.0.0: A library of statistical types, data, and functions

Portability: portable
Stability: experimental
Maintainer: bos@serpentine.com

Statistics.Quantile

Description

Functions for approximating quantiles, i.e. points taken at regular intervals from the cumulative distribution function of a random variable.

The number of quantiles is denoted below by the variable q. For example, with q=4, the 4-quantiles (also known as quartiles) split the data into 4 intervals bounded by 5 points. The parameter k selects the desired point, where 0 ≤ k ≤ q.
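As a small illustration of the k and q convention, the sketch below lists the fraction k/q for every valid k. The name quantileFractions is hypothetical and is not part of this module.

```haskell
-- | Fractions k/q for k = 0..q: the q+1 points that bound q intervals.
-- Hypothetical helper for illustration only; assumes q >= 1.
quantileFractions :: Int -> [Double]
quantileFractions q = [fromIntegral k / fromIntegral q | k <- [0 .. q]]

-- quantileFractions 4  ==>  [0.0,0.25,0.5,0.75,1.0]
```

With q=4, k=2 therefore refers to the median, and k=1 and k=3 to the lower and upper quartiles.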

Quantile estimation functions

weightedAvg

Arguments

 :: Vector v Double
 => Int       -- k, the desired quantile.
 -> Int       -- q, the number of quantiles.
 -> v Double  -- x, the sample data.
 -> Double

O(n log n). Estimate the kth q-quantile of a sample, using the weighted average method.
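The weighted average method is not spelled out on this page. The following is a minimal sketch of one common formulation, assuming linear interpolation at the fractional index (n-1)·k/q of the sorted sample. The name weightedAvgSketch is hypothetical, it works on plain lists rather than vectors, and it should not be read as the library's implementation.

```haskell
import Data.List (sort)

-- | Sketch of a weighted-average quantile estimate (assumed formulation).
-- Assumes a non-empty sample without NaNs, q >= 2 and 0 <= k <= q.
weightedAvgSketch :: Int -> Int -> [Double] -> Double
weightedAvgSketch k q xs = xj + g * (xj1 - xj)
  where
    sorted = sort xs                      -- the O(n log n) step
    n      = length sorted
    -- 0-based fractional index of the requested quantile
    idx    = fromIntegral (n - 1) * fromIntegral k / fromIntegral q :: Double
    j      = floor idx :: Int
    g      = idx - fromIntegral j         -- interpolation weight in [0, 1)
    xj     = sorted !! j
    xj1    = sorted !! min (j + 1) (n - 1) -- clamp so that k == q stays in range

-- weightedAvgSketch 1 4 [1,2,3,4]  ==>  1.75
```

The lower quartile of [1,2,3,4] falls at index 0.75, three quarters of the way from 1 to 2.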

data ContParam

Parameters a and b to the continuousBy function.

Constructors

ContParam !Double !Double 

continuousBy

Arguments

 :: Vector v Double
 => ContParam  -- Parameters a and b.
 -> Int        -- k, the desired quantile.
 -> Int        -- q, the number of quantiles.
 -> v Double   -- x, the sample data.
 -> Double

O(n log n). Estimate the kth q-quantile of a sample x, using the continuous sample method with the given parameters. This is the method used by most statistical software, such as R, Mathematica, SPSS, and S.
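To make the role of a and b concrete, here is a hedged sketch of a continuous sample quantile on plain lists. It assumes the fractional position t = a + p·(n + 1 − a − b) with p = k/q, which this page does not state explicitly; that form is chosen because it agrees with the R method numbers quoted for the parameter sets in this module. Params and continuousSketch are hypothetical names, not the library's API.

```haskell
import Data.List (sort)

-- | Hypothetical stand-in for ContParam: the two parameters a and b.
data Params = Params Double Double

-- | Sketch of a continuous sample quantile estimate (assumed formulation).
-- Assumes a non-empty sample, q >= 2 and 0 <= k <= q.
continuousSketch :: Params -> Int -> Int -> [Double] -> Double
continuousSketch (Params a b) k q xs = (1 - h) * item (j - 1) + h * item j
  where
    sorted = sort xs
    n      = length sorted
    p      = fromIntegral k / fromIntegral q
    -- 1-based fractional position in the sorted sample (assumption)
    t      = a + p * (fromIntegral n + 1 - a - b)
    j      = floor t :: Int
    h      = t - fromIntegral j           -- interpolation weight
    -- 0-based lookup, clamped to the ends of the sample
    item i = sorted !! min (max i 0) (n - 1)
```

Under these assumptions, continuousSketch (Params (1/3) (1/3)) 3 4 [1,1,2,2,3] minus the same call with k=1 evaluates to roughly 1.3333, in line with the midspread example given in this module.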

midspread

Arguments

 :: Vector v Double
 => ContParam  -- Parameters a and b.
 -> Int        -- q, the number of quantiles.
 -> v Double   -- x, the sample data.
 -> Double

O(n log n). Estimate the range between q-quantiles 1 and q-1 of a sample x, using the continuous sample method with the given parameters.

For instance, the interquartile range (IQR) can be estimated as follows:

 midspread medianUnbiased 4 (U.fromList [1,1,2,2,3])
 ==> 1.333333
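The value above can be checked by hand. The working below is a sketch that assumes the fractional position t(p) = a + p(n + 1 - a - b), with linear interpolation between the neighbouring order statistics; that formula is an assumption not stated on this page, though it does reproduce the documented result. Here n = 5, a = b = 1/3, and the sorted sample is 1, 1, 2, 2, 3.

```latex
\begin{align*}
n + 1 - a - b &= 6 - \tfrac{2}{3} = \tfrac{16}{3} \\
t(\tfrac{1}{4}) &= \tfrac{1}{3} + \tfrac{1}{4}\cdot\tfrac{16}{3} = \tfrac{5}{3}
  &&\Rightarrow\; Q_1 = \tfrac{1}{3}\,x_{(1)} + \tfrac{2}{3}\,x_{(2)}
     = \tfrac{1}{3}\cdot 1 + \tfrac{2}{3}\cdot 1 = 1 \\
t(\tfrac{3}{4}) &= \tfrac{1}{3} + \tfrac{3}{4}\cdot\tfrac{16}{3} = \tfrac{13}{3}
  &&\Rightarrow\; Q_3 = \tfrac{2}{3}\,x_{(4)} + \tfrac{1}{3}\,x_{(5)}
     = \tfrac{2}{3}\cdot 2 + \tfrac{1}{3}\cdot 3 = \tfrac{7}{3} \\
Q_3 - Q_1 &= \tfrac{7}{3} - 1 = \tfrac{4}{3} \approx 1.333333
\end{align*}
```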

Parameters for the continuous sample method

cadpw :: ContParam

California Department of Public Works definition, a=0, b=1. Gives a linear interpolation of the empirical CDF. This corresponds to method 4 in R and Mathematica.

hazen :: ContParam

Hazen's definition, a=0.5, b=0.5. This is claimed to be popular among hydrologists. This corresponds to method 5 in R and Mathematica.

s :: ContParam

Definition used by the S statistics application, with a=1, b=1. The interpolation points divide the sample range into n-1 intervals. This corresponds to method 7 in R and Mathematica.

spss :: ContParam

Definition used by the SPSS statistics application, with a=0, b=0 (also known as Weibull's definition). This corresponds to method 6 in R and Mathematica.

medianUnbiased :: ContParam

Median unbiased definition, a=1/3, b=1/3. The resulting quantile estimates are approximately median unbiased regardless of the distribution of x. This corresponds to method 8 in R and Mathematica.

normalUnbiased :: ContParam

Normal unbiased definition, a=3/8, b=3/8. An approximately unbiased estimate if the empirical distribution approximates the normal distribution. This corresponds to method 9 in R and Mathematica.
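To see how the six parameter sets differ in practice, the sketch below tabulates where each one places the median of an n-point sample. It assumes the fractional position t = a + p·(n + 1 − a − b), which is an assumption rather than something stated on this page; the names definitions, position and medianPositions are hypothetical, and the strings are labels only.

```haskell
-- | (label, a, b) for each definition listed above.
definitions :: [(String, Double, Double)]
definitions =
  [ ("cadpw",          0,   1)
  , ("hazen",          0.5, 0.5)
  , ("s",              1,   1)
  , ("spss",           0,   0)
  , ("medianUnbiased", 1/3, 1/3)
  , ("normalUnbiased", 3/8, 3/8)
  ]

-- | Assumed 1-based fractional position for fraction p in a sample of size n.
position :: Double -> Double -> Int -> Double -> Double
position a b n p = a + p * (fromIntegral n + 1 - a - b)

-- | Position of the median (p = 0.5) under each definition.
medianPositions :: Int -> [(String, Double)]
medianPositions n = [(name, position a b n 0.5) | (name, a, b) <- definitions]
```

For n = 5, the five definitions with a = b all place the median at position 3 (up to floating-point rounding), the middle order statistic, while cadpw places it at 2.5. The definitions diverge more noticeably away from the median and for small samples.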

References