statistics-linreg-0.2.3: Linear regression between two samples, based on the 'statistics' package.

Statistics.LinearRegression


# Simple linear regression functions

Simple linear regression between two samples. Takes two vectors Y = {yi} and X = {xi} and returns (alpha, beta) such that alpha + beta*X is the least-squares fit to Y.
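The formulas behind a simple least-squares fit can be sketched on plain Haskell lists. This is a minimal illustration, not the package's code: `olsFit` and `mean` are hypothetical names, the sketch takes X first and Y second (the package's argument order may differ), and it assumes at least two distinct x values.

```haskell
-- Hypothetical sketch on plain lists, not the package's implementation.

-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Ordinary least-squares fit Y = alpha + beta * X.
-- beta  = sum((x - mx) * (y - my)) / sum((x - mx)^2)
-- alpha = my - beta * mx
olsFit :: [Double] -> [Double] -> (Double, Double)
olsFit xs ys = (alpha, beta)
  where
    mx    = mean xs
    my    = mean ys
    sxy   = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    sxx   = sum [ (x - mx) ^ (2 :: Int) | x <- xs ]
    beta  = sxy / sxx
    alpha = my - beta * mx
```

For example, `olsFit [1, 2, 3, 4] [3, 5, 7, 9]` recovers the exact line Y = 1 + 2*X.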

Simple linear regression between two samples. Takes two vectors Y = {yi} and X = {xi} and returns (alpha, beta, r*r) such that alpha + beta*X is the least-squares fit to Y and r is the Pearson product-moment correlation coefficient.
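A sketch of the variant that also returns r*r, again on plain lists with a hypothetical name (`fitWithR2`). For a simple linear regression with an intercept, r*r computed from Pearson's r coincides with the coefficient of determination of the least-squares fit, so both quantities come from the same centred sums.

```haskell
-- Hypothetical sketch: least-squares fit plus r*r (Pearson's r squared).
fitWithR2 :: [Double] -> [Double] -> (Double, Double, Double)
fitWithR2 xs ys = (alpha, beta, r * r)
  where
    n     = fromIntegral (length xs) :: Double
    mx    = sum xs / n
    my    = sum ys / n
    dxs   = map (subtract mx) xs      -- centred x values
    dys   = map (subtract my) ys      -- centred y values
    sxy   = sum (zipWith (*) dxs dys)
    sxx   = sum (map (^ (2 :: Int)) dxs)
    syy   = sum (map (^ (2 :: Int)) dys)
    beta  = sxy / sxx
    alpha = my - beta * mx
    r     = sxy / sqrt (sxx * syy)    -- Pearson product-moment correlation
```

On the points (1, 1), (2, 3), (3, 2) this gives alpha = 1, beta = 0.5 and r*r = 0.25.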

Total Least Squares (TLS) linear regression. Assumes that the x-axis values (and not just the y-axis values) are random variables and that both variables have similar error distributions. The interface is the same as `linearRegression`.
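Assuming the TLS variant fits the line that minimises the sum of squared perpendicular distances (orthogonal regression, i.e. equal error variances on both axes), its slope has a closed form: it is the direction of the principal axis of the centred scatter matrix. The sketch below is hypothetical (`tlsFit` is not a package function) and assumes a nonzero cross-product sum `sxy`.

```haskell
-- Hypothetical sketch of orthogonal (total least squares) regression.
-- Minimises the sum of squared perpendicular distances to the line.
-- Assumes sxy /= 0 (otherwise the slope formula divides by zero).
tlsFit :: [Double] -> [Double] -> (Double, Double)
tlsFit xs ys = (alpha, beta)
  where
    n     = fromIntegral (length xs) :: Double
    mx    = sum xs / n
    my    = sum ys / n
    sxx   = sum [ (x - mx) ^ (2 :: Int) | x <- xs ]
    syy   = sum [ (y - my) ^ (2 :: Int) | y <- ys ]
    sxy   = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    d     = syy - sxx
    -- Slope of the eigenvector for the largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    beta  = (d + sqrt (d * d + 4 * sxy * sxy)) / (2 * sxy)
    alpha = my - beta * mx
```

On the points (1, 1), (2, 3), (3, 2), ordinary least squares gives slope 0.5 while this sketch gives slope 1, because TLS treats both axes symmetrically.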

# Related functions

Pearson's product-moment correlation coefficient

Covariance of two samples
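Both related functions can be sketched in a few lines. The normalisation the package uses for covariance is not stated here; this hypothetical sketch uses the unbiased n - 1 denominator, which cancels out in Pearson's r anyway.

```haskell
-- Hypothetical sketches, not the package's definitions.

-- Sample covariance with the unbiased (n - 1) denominator (an assumption).
covariance :: [Double] -> [Double] -> Double
covariance xs ys = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys) / (n - 1)
  where
    n  = fromIntegral (length xs)
    mx = sum xs / n
    my = sum ys / n

-- Pearson's r: covariance divided by the product of the standard deviations.
-- The (n - 1) factors cancel, so the denominator choice does not affect r.
pearson :: [Double] -> [Double] -> Double
pearson xs ys = covariance xs ys / sqrt (covariance xs xs * covariance ys ys)
```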

# Robust linear regression

Finds a robust linear fit between two samples. The procedure uses randomization and is based on the method described in the reference.
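The documented parameters (pairs of points as starting subsets, an expected outlier fraction) suggest a least-trimmed-squares style search. Under that assumption, the starting-point stage can be illustrated deterministically: instead of sampling random pairs, the hypothetical `bestPairStart` below enumerates every pair, takes the line through it, and keeps the line whose h smallest squared residuals have the lowest sum, with h = n - floor(outlierFraction * n). This is a simplification for small samples, not the package's procedure.

```haskell
import Data.List (minimumBy, sort)
import Data.Ord (comparing)

-- (alpha, beta), as in Y = alpha + beta * X
type Line = (Double, Double)

-- Line through two points; Nothing when the pair is vertical.
lineThrough :: (Double, Double) -> (Double, Double) -> Maybe Line
lineThrough (x1, y1) (x2, y2)
  | x1 == x2  = Nothing
  | otherwise = Just (y1 - b * x1, b)
  where
    b = (y2 - y1) / (x2 - x1)

-- Trimmed cost: sum of the h smallest squared vertical residuals.
trimmedCost :: Int -> [(Double, Double)] -> Line -> Double
trimmedCost h pts (a, b) =
  sum (take h (sort [ (y - a - b * x) ^ (2 :: Int) | (x, y) <- pts ]))

-- Hypothetical deterministic stand-in for random pair sampling:
-- try every pair and keep the lowest-cost line.
bestPairStart :: Double -> [(Double, Double)] -> Maybe Line
bestPairStart outlierFrac pts
  | null candidates = Nothing
  | otherwise       = Just (minimumBy (comparing (trimmedCost h pts)) candidates)
  where
    n          = length pts
    h          = n - floor (outlierFrac * fromIntegral n)
    indexed    = zip [(0 :: Int) ..] pts
    candidates = [ l | (i, p) <- indexed, (j, q) <- indexed, i < j
                     , Just l <- [lineThrough p q] ]
```

Enumerating all pairs costs O(n^2) lines, which is why a random sample of pairs (bounded by a parameter such as `maxSubsetsNum`) is preferable for larger inputs.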

A wrapper that executes `robustFit` using a default random generator (meaning the result is only pseudo-random).

Robust fit that also yields the R-square value of the "clean" dataset.
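One plausible reading of the "clean" dataset, assumed here rather than taken from the package, is the h points with the smallest errors under the robust fit, where h = n - floor(outlierFraction * n). The hypothetical `cleanRSquare` keeps those points and computes r*r on them.

```haskell
import Data.List (sortOn)

-- Hypothetical: r*r computed only on the points the fit treats as inliers.
cleanRSquare :: Double -> (Double, Double) -> [(Double, Double)] -> Double
cleanRSquare outlierFrac (a, b) pts = r * r
  where
    n     = length pts
    h     = n - floor (outlierFrac * fromIntegral n)
    -- Keep the h points with the smallest squared vertical residuals.
    clean = take h (sortOn (\(x, y) -> (y - a - b * x) ^ (2 :: Int)) pts)
    m     = fromIntegral h :: Double
    mx    = sum (map fst clean) / m
    my    = sum (map snd clean) / m
    sxy   = sum [ (x - mx) * (y - my) | (x, y) <- clean ]
    sxx   = sum [ (x - mx) ^ (2 :: Int) | (x, _) <- clean ]
    syy   = sum [ (y - my) ^ (2 :: Int) | (_, y) <- clean ]
    r     = sxy / sqrt (sxx * syy)
```

With six points on Y = 1 + 2*X plus one gross outlier and an outlier fraction of 0.25, the outlier is dropped and the clean r*r is 1, whereas keeping every point (fraction 0) gives a much lower value.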

## Related types

The robust fit algorithm used has various parameters that can be specified using the `EstimationParameters` record.

Constructors

`EstimationParameters`, with fields:

• `outlierFraction :: !Double`: maximal fraction of outliers expected in the sample (default 0.25)
• `shortIterationSteps :: !Int`: number of concentration steps to take for the initial evaluation of a solution (default 3)
• `maxSubsetsNum :: !Int`: maximal number of sampled subsets (pairs of points) to use as starting points (default 500)
• `groupSubsets :: !Int`: if the initial sample is large, and thus gets subdivided, the number of candidate estimations to take from each subgroup, on which complete convergence will be executed (default 10)
• `mediumSetSize :: !Int`: maximal size of a sample that can be analyzed without any subdivision (default 600)
• `largeSetSize :: !Int`: maximal size of a sample that does not require two-step subdivision (see reference article) (default 1500)
• `estimator :: Estimator`: estimator function to use (default `linearRegression`)
• `errorFunction :: ErrorFunction`: error function to use (default `linearRegressionError`)
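Record-update syntax is the idiomatic way to override a few fields while keeping the rest of a default record. The sketch uses a hypothetical local mirror, `Params`, with the documented numeric fields and defaults; `estimator` and `errorFunction` are left out to keep it self-contained. In real code you would update the package's own default parameter value instead of defining a mirror.

```haskell
-- Hypothetical local mirror of the documented numeric fields, for illustration.
data Params = Params
  { outlierFraction     :: !Double
  , shortIterationSteps :: !Int
  , maxSubsetsNum       :: !Int
  , groupSubsets        :: !Int
  , mediumSetSize       :: !Int
  , largeSetSize        :: !Int
  } deriving (Show, Eq)

-- Documented defaults, in field order.
defaults :: Params
defaults = Params 0.25 3 500 10 600 1500

-- Expect more contamination; every other field keeps its default.
moreOutliers :: Params
moreOutliers = defaults { outlierFraction = 0.4 }
```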

type ErrorFunction = EstimatedRelation -> (Double, Double) -> Double

An `ErrorFunction` is a function that computes the error of a given point from an estimate. This module provides two error functions corresponding to the two `Estimator` functions it defines:

• Vertical distance squared via `linearRegressionError` that should be used with `linearRegression`
• Total distance squared via `linearRegressionTLSError` that should be used with `linearRegressionTLS`
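The two error functions can be sketched against the documented type synonyms. The function names here are hypothetical, and reading "total distance" as the perpendicular distance to the line is an assumption consistent with TLS: for a line y = alpha + beta*x, the squared perpendicular distance is the squared vertical residual divided by 1 + beta^2.

```haskell
-- Documented type synonyms, reproduced locally.
type EstimatedRelation = (Double, Double)   -- (alpha, beta)
type ErrorFunction = EstimatedRelation -> (Double, Double) -> Double

-- Hypothetical: squared vertical distance of (x, y) from the line.
verticalSqError :: ErrorFunction
verticalSqError (alpha, beta) (x, y) = (y - alpha - beta * x) ^ (2 :: Int)

-- Hypothetical: squared perpendicular distance of (x, y) from the line.
perpendicularSqError :: ErrorFunction
perpendicularSqError (alpha, beta) (x, y) =
  (y - alpha - beta * x) ^ (2 :: Int) / (1 + beta * beta)
```

For the line y = x and the point (0, 2), the squared vertical distance is 4 and the squared perpendicular distance is 2.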

An `Estimator` is a function that computes an estimated linear relation from two samples. This module provides two estimator functions: `linearRegression` and `linearRegressionTLS`.

type EstimatedRelation = (Double, Double)

An estimated linear relation between 2 samples is (alpha,beta) such that Y = alpha + beta*X.

## Provided values

Default set of parameters to use (see reference for details).

The `linearRegression` error function (`linearRegressionError`) is the square of the vertical distance of a point from the line.

The `linearRegressionTLS` error function (`linearRegressionTLSError`) is the square of the total distance of a point from the line.

## Helper functions

Calculates the optimal (local-minimum) estimate starting from an initial estimate. The local minimum may not be the global (i.e. best) estimate, but starting from enough different initial estimates makes it likely that the global optimum is eventually found.
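Assuming the local optimisation is a concentration-step (C-step) loop, as the `shortIterationSteps` parameter suggests, it can be sketched as: keep the h points with the smallest squared residuals under the current line, refit on them, and repeat while the trimmed cost keeps decreasing. A C-step cannot increase the trimmed cost and there are finitely many h-point subsets, so the loop terminates, but only at a local minimum. `concentrate`, `cStep` and the other names are hypothetical.

```haskell
import Data.List (sort, sortOn)

-- (alpha, beta)
type Line = (Double, Double)

-- Ordinary least-squares fit of y on x over a list of points.
olsFit :: [(Double, Double)] -> Line
olsFit pts = (my - b * mx, b)
  where
    n  = fromIntegral (length pts)
    mx = sum (map fst pts) / n
    my = sum (map snd pts) / n
    b  = sum [ (x - mx) * (y - my) | (x, y) <- pts ]
       / sum [ (x - mx) ^ (2 :: Int) | (x, _) <- pts ]

-- Squared vertical residual of a point.
sqErr :: Line -> (Double, Double) -> Double
sqErr (a, b) (x, y) = (y - a - b * x) ^ (2 :: Int)

-- Trimmed cost: sum of the h smallest squared residuals.
trimmedCost :: Int -> [(Double, Double)] -> Line -> Double
trimmedCost h pts l = sum (take h (sort (map (sqErr l) pts)))

-- One concentration step: refit on the h points closest to the current line.
cStep :: Int -> [(Double, Double)] -> Line -> Line
cStep h pts l = olsFit (take h (sortOn (sqErr l) pts))

-- Repeat C-steps while the trimmed cost strictly decreases.
concentrate :: Int -> [(Double, Double)] -> Line -> Line
concentrate h pts = go
  where
    go l
      | trimmedCost h pts l' < trimmedCost h pts l = go l'
      | otherwise                                  = l
      where
        l' = cStep h pts l
```

Running `concentrate` from many different starting lines (for example, lines through sampled pairs of points) and keeping the lowest-cost result is what makes reaching the global optimum likely.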