statistics-linreg-0.2.3: Linear regression between two samples, based on the 'statistics' package.

Safe Haskell: None




Simple linear regression functions

linearRegression :: Sample -> Sample -> (Double, Double)

Simple linear regression between 2 samples. Takes two vectors Y={yi} and X={xi} and returns (alpha, beta) such that Y = alpha + beta*X
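The ordinary least squares formulas behind this function can be sketched in plain Haskell. This is an illustration of the math, not the library's implementation; it uses lists instead of the package's unboxed Sample vectors:

```haskell
-- Sketch of ordinary least squares on plain lists (an assumption for
-- brevity; the library operates on unboxed 'Sample' vectors):
--   beta  = cov(X, Y) / var(X)
--   alpha = mean(Y) - beta * mean(X)
olsSketch :: [Double] -> [Double] -> (Double, Double)
olsSketch xs ys = (alpha, beta)
  where
    n     = fromIntegral (length xs)
    meanX = sum xs / n
    meanY = sum ys / n
    covXY = sum (zipWith (\x y -> (x - meanX) * (y - meanY)) xs ys) / n
    varX  = sum (map (\x -> (x - meanX) ^ (2 :: Int)) xs) / n
    beta  = covXY / varX
    alpha = meanY - beta * meanX
```

For points lying exactly on Y = 1 + 2*X, e.g. `olsSketch [0,1,2,3] [1,3,5,7]`, the sketch recovers (1.0, 2.0).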

linearRegressionRSqr :: Sample -> Sample -> (Double, Double, Double)

Simple linear regression between 2 samples. Takes two vectors Y={yi} and X={xi} and returns (alpha, beta, r*r) such that Y = alpha + beta*X, where r is the Pearson product-moment correlation coefficient.

linearRegressionTLS :: Sample -> Sample -> (Double, Double)

Total Least Squares (TLS) linear regression. Assumes x-axis values (and not just y-axis values) are random variables and that both variables have similar distributions. The interface is the same as linearRegression.
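When both axes carry comparable error, the TLS slope has a standard closed form. The sketch below illustrates that formula on plain lists; it is not necessarily the library's exact computation:

```haskell
-- Sketch of the closed-form TLS (orthogonal regression) slope,
-- assuming equal error variance on both axes and plain lists
-- instead of the library's 'Sample' type:
--   beta = (s_yy - s_xx + sqrt((s_yy - s_xx)^2 + 4*s_xy^2)) / (2*s_xy)
tlsSketch :: [Double] -> [Double] -> (Double, Double)
tlsSketch xs ys = (alpha, beta)
  where
    n     = fromIntegral (length xs)
    meanX = sum xs / n
    meanY = sum ys / n
    sxx   = sum [ (x - meanX) ^ (2 :: Int) | x <- xs ]
    syy   = sum [ (y - meanY) ^ (2 :: Int) | y <- ys ]
    sxy   = sum (zipWith (\x y -> (x - meanX) * (y - meanY)) xs ys)
    beta  = (syy - sxx + sqrt ((syy - sxx) ^ (2 :: Int) + 4 * sxy * sxy))
              / (2 * sxy)
    alpha = meanY - beta * meanX
```

On noiseless data such as `tlsSketch [0,1,2] [0,2,4]` the TLS and OLS answers coincide at (0.0, 2.0); they differ once both coordinates are noisy.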

Related functions

correl :: Sample -> Sample -> Double

Pearson's product-moment correlation coefficient

covar :: Sample -> Sample -> Double

Covariance of two samples
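The relationship between these two functions can be sketched in plain Haskell (lists instead of the library's Sample vectors; an illustration of the definitions, not the package's code): Pearson's r is the covariance normalized by the two standard deviations.

```haskell
-- Sketch: population covariance of two equal-length lists.
covarSketch :: [Double] -> [Double] -> Double
covarSketch xs ys =
    sum (zipWith (\x y -> (x - meanX) * (y - meanY)) xs ys) / n
  where
    n     = fromIntegral (length xs)
    meanX = sum xs / n
    meanY = sum ys / n

-- Sketch: Pearson's r = cov(X, Y) / (stddev X * stddev Y),
-- where stddev is the square root of a variable's self-covariance.
correlSketch :: [Double] -> [Double] -> Double
correlSketch xs ys = covarSketch xs ys / (stddev xs * stddev ys)
  where stddev vs = sqrt (covarSketch vs vs)
```

For perfectly correlated samples, e.g. `correlSketch [1,2,3] [2,4,6]`, r comes out as 1 up to floating-point rounding.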

Robust linear regression

robustFit :: MonadRandom m => EstimationParameters -> Sample -> Sample -> m EstimatedRelation

Find a robust linear estimate between two samples. The procedure requires randomization and is based on the algorithm described in the reference.

nonRandomRobustFit :: EstimationParameters -> Sample -> Sample -> EstimatedRelation

A wrapper that executes robustFit using a default random generator (meaning it is only pseudo-random).

robustFitRSqr :: MonadRandom m => EstimationParameters -> Sample -> Sample -> m (EstimatedRelation, Double)

Robust fit that also yields the R-squared value of the "clean" dataset.
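A hypothetical usage sketch (the sample data and the assumption that the first Sample argument is X and the second is Y are mine, not from the docs): robustFit can be run in IO via MonadRandom's evalRandIO, while nonRandomRobustFit is the deterministic wrapper.

```haskell
import Control.Monad.Random (evalRandIO)
import qualified Data.Vector.Unboxed as U
import Statistics.LinearRegression

main :: IO ()
main = do
  -- Five points on Y = 1 + 2*X plus one gross outlier, which a
  -- robust fit should down-weight.
  let xs = U.fromList [0, 1, 2, 3, 4, 100]
      ys = U.fromList [1, 3, 5, 7, 9, 0]
  -- Randomized version, executed in IO:
  (alpha, beta) <- evalRandIO (robustFit defaultEstimationParameters xs ys)
  print (alpha, beta)
  -- Pure wrapper with a default (pseudo-random) generator:
  print (nonRandomRobustFit defaultEstimationParameters xs ys)
```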

Related types

data EstimationParameters

The robust fit algorithm used has various parameters that can be specified using the EstimationParameters record.




outlierFraction :: !Double

Maximal fraction of outliers expected in the sample (default 0.25)

shortIterationSteps :: !Int

Number of concentration steps to take for initial evaluation of a solution (default 3)

maxSubsetsNum :: !Int

Maximal number of sampled subsets (pairs of points) to use as starting points (default 500)

groupSubsets :: !Int

If the initial sample is large and thus gets subdivided, this is the number of candidate estimates to take from each subgroup; complete convergence will then be executed on these candidates (default 10)

mediumSetSize :: !Int

Maximal size of sample that can be analyzed without any sub-division (default 600)

largeSetSize :: !Int

Maximal size of sample that does not require two-step sub-division (see reference article) (default 1500)

estimator :: Estimator

Estimator function to use (default linearRegression)

errorFunction :: ErrorFunction

ErrorFunction to use (default linearRegressionError)
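Since EstimationParameters is a plain record, individual fields can be overridden with record-update syntax starting from defaultEstimationParameters. A sketch (the particular field values are illustrative, not recommendations):

```haskell
import Statistics.LinearRegression

-- Sketch: a TLS-based parameter set that tolerates more outliers
-- than the default 0.25 fraction.
tlsParams :: EstimationParameters
tlsParams = defaultEstimationParameters
  { outlierFraction = 0.4
  , estimator       = linearRegressionTLS
  , errorFunction   = linearRegressionTLSError
  }
```

Keeping the estimator and error function consistent (OLS with linearRegressionError, TLS with linearRegressionTLSError) matters, since the concentration steps minimize the chosen error function.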

type ErrorFunction = EstimatedRelation -> (Double, Double) -> Double

An ErrorFunction is a function that computes the error of a given point relative to an estimate. This module provides two error functions, corresponding to the two Estimator functions it defines: linearRegressionError and linearRegressionTLSError.

type Estimator = Sample -> Sample -> EstimatedRelation

An Estimator is a function that generates an estimated linear regression based on 2 samples. This module provides two estimator functions: linearRegression and linearRegressionTLS

type EstimatedRelation = (Double, Double)

An estimated linear relation between 2 samples is (alpha,beta) such that Y = alpha + beta*X.

Provided values

defaultEstimationParameters :: EstimationParameters

Default set of parameters to use (see reference for details).

linearRegressionError :: ErrorFunction

The linearRegression error function is the square of the vertical distance of a point from the line.

linearRegressionTLSError :: ErrorFunction

The linearRegressionTLS error function is the square of the total (orthogonal) distance of a point from the line.
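Both error functions can be sketched directly from the geometry, given an EstimatedRelation (alpha, beta) and a point (x, y). This is an illustration of the definitions above, not the library's code:

```haskell
-- Sketch of the OLS error: squared vertical distance of (x, y)
-- from the line y = alpha + beta * x.
verticalError :: (Double, Double) -> (Double, Double) -> Double
verticalError (alpha, beta) (x, y) = d * d
  where d = y - (alpha + beta * x)

-- Sketch of the TLS error: squared orthogonal distance, which is
-- the squared vertical distance scaled down by (1 + beta^2).
orthogonalError :: (Double, Double) -> (Double, Double) -> Double
orthogonalError est@(_, beta) p =
  verticalError est p / (1 + beta * beta)
```

For the estimate (1, 2) and the point (1, 5), the vertical error is (5 - 3)^2 = 4, and the orthogonal error is 4 / (1 + 4) = 0.8.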

Helper functions

converge :: EstimationParameters -> Sample -> Sample -> EstimatedRelation -> EstimatedRelation

Calculate the optimal (local-minimum) estimate based on an initial estimate. The local minimum may not be the global (i.e. best) estimate, but starting from enough different initial estimates should eventually yield the global optimum.