Safe Haskell: None
- linearRegression :: Sample -> Sample -> (Double, Double)
- linearRegressionRSqr :: Sample -> Sample -> (Double, Double, Double)
- linearRegressionTLS :: Sample -> Sample -> (Double, Double)
- correl :: Sample -> Sample -> Double
- covar :: Sample -> Sample -> Double
- robustFit :: MonadRandom m => EstimationParameters -> Sample -> Sample -> m EstimatedRelation
- nonRandomRobustFit :: EstimationParameters -> Sample -> Sample -> EstimatedRelation
- robustFitRSqr :: MonadRandom m => EstimationParameters -> Sample -> Sample -> m (EstimatedRelation, Double)
- data EstimationParameters = EstimationParameters {
- outlierFraction :: !Double
- shortIterationSteps :: !Int
- maxSubsetsNum :: !Int
- groupSubsets :: !Int
- mediumSetSize :: !Int
- largeSetSize :: !Int
- estimator :: Estimator
- errorFunction :: ErrorFunction
- type ErrorFunction = EstimatedRelation -> (Double, Double) -> Double
- type Estimator = Sample -> Sample -> EstimatedRelation
- type EstimatedRelation = (Double, Double)
- defaultEstimationParameters :: EstimationParameters
- linearRegressionError :: ErrorFunction
- linearRegressionTLSError :: ErrorFunction
- converge :: EstimationParameters -> Sample -> Sample -> EstimatedRelation -> EstimatedRelation
Simple linear regression functions
linearRegression :: Sample -> Sample -> (Double, Double)
Simple linear regression between 2 samples. Takes two vectors Y={yi} and X={xi} and returns (alpha, beta) such that Y = alpha + beta*X
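As a rough illustration of what a simple linear regression computes, the following sketch fits Y = alpha + beta*X by ordinary least squares. It is not this module's implementation: it works on plain lists instead of Sample, and the names mean and olsFit, as well as the argument order (xs first, then ys), are assumptions made for the example only.

```haskell
-- Hypothetical helper: arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean vs = sum vs / fromIntegral (length vs)

-- | Ordinary least squares fit of y = alpha + beta * x.
--   Assumes equal-length inputs and at least two distinct x values.
--   Returns (alpha, beta).
olsFit :: [Double] -> [Double] -> (Double, Double)
olsFit xs ys = (alpha, beta)
  where
    mx    = mean xs
    my    = mean ys
    -- Sum of cross-deviations and sum of squared x-deviations.
    sxy   = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    sxx   = sum [(x - mx) * (x - mx) | x <- xs]
    beta  = sxy / sxx
    alpha = my - beta * mx
```

Loaded into GHCi, olsFit [1, 2, 3, 4] [3, 5, 7, 9] should recover the line y = 1 + 2x up to floating-point rounding.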
linearRegressionRSqr :: Sample -> Sample -> (Double, Double, Double)
Simple linear regression between 2 samples. Takes two vectors Y={yi} and X={xi} and returns (alpha, beta, r*r) such that Y = alpha + beta*X and where r is the Pearson product-moment correlation coefficient
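The r*r value can be sketched directly from the definition of the Pearson correlation coefficient: r = sxy / sqrt(sxx * syy), so r*r = sxy^2 / (sxx * syy). The code below is an illustrative stand-in on plain lists, and the name rSquared is hypothetical. It is not taken from this module.

```haskell
-- | Square of the Pearson product-moment correlation coefficient.
--   Assumes equal-length inputs where neither xs nor ys is constant.
rSquared :: [Double] -> [Double] -> Double
rSquared xs ys = (sxy * sxy) / (sxx * syy)
  where
    n   = fromIntegral (length xs)
    mx  = sum xs / n
    my  = sum ys / n
    sxy = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    sxx = sum [(x - mx) * (x - mx) | x <- xs]
    syy = sum [(y - my) * (y - my) | y <- ys]
```

For points lying exactly on a non-horizontal line the result is 1; for xs = [1, 2, 3] and ys = [1, 3, 2] it works out to 0.25.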
linearRegressionTLS :: Sample -> Sample -> (Double, Double)
Total Least Squares (TLS) linear regression.
Assumes x-axis values (and not just y-axis values) are random variables and that both variables have similar distributions.
The interface is the same as that of linearRegression.
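To show how a total least squares fit differs from an ordinary one, here is a sketch using the standard closed form for orthogonal regression, beta = (syy - sxx + sqrt((syy - sxx)^2 + 4*sxy^2)) / (2*sxy). Whether linearRegressionTLS uses this exact formula is an assumption; the name tlsFit, the list-based inputs, and the argument order are hypothetical.

```haskell
-- | Orthogonal (total least squares) fit of y = alpha + beta * x.
--   Minimises squared perpendicular distances to the line.
--   Assumes equal-length inputs with non-zero covariance (sxy /= 0).
--   Returns (alpha, beta).
tlsFit :: [Double] -> [Double] -> (Double, Double)
tlsFit xs ys = (alpha, beta)
  where
    sq v  = v * v
    n     = fromIntegral (length xs)
    mx    = sum xs / n
    my    = sum ys / n
    sxx   = sum [sq (x - mx) | x <- xs]
    syy   = sum [sq (y - my) | y <- ys]
    sxy   = sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys)
    d     = syy - sxx
    -- Closed-form slope of the orthogonal regression line.
    beta  = (d + sqrt (sq d + 4 * sq sxy)) / (2 * sxy)
    alpha = my - beta * mx
```

For xs = [0, 1, 2] and ys = [0, 2, 1] this gives a slope of 1, whereas an ordinary least squares fit of the same data gives 0.5, because the latter attributes all of the scatter to the y values.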
Related functions
Robust linear regression
robustFit :: MonadRandom m => EstimationParameters -> Sample -> Sample -> m EstimatedRelation
Finds a robust linear fit between two samples. The procedure requires randomization and is based on the one described in the reference.
nonRandomRobustFit :: EstimationParameters -> Sample -> Sample -> EstimatedRelation
A wrapper that executes robustFit using a default random generator (meaning it is only pseudo-random).
robustFitRSqr :: MonadRandom m => EstimationParameters -> Sample -> Sample -> m (EstimatedRelation, Double)
Robust fit that also yields the R-squared value of the "clean" dataset.
Related types
data EstimationParameters
The robust fit algorithm has several parameters, which are specified using the EstimationParameters record.
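Constructors
EstimationParameters
- outlierFraction :: !Double
- shortIterationSteps :: !Int
- maxSubsetsNum :: !Int
- groupSubsets :: !Int
- mediumSetSize :: !Int
- largeSetSize :: !Int
- estimator :: Estimator
- errorFunction :: ErrorFunction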
type ErrorFunction = EstimatedRelation -> (Double, Double) -> Double
An ErrorFunction is a function that computes the error of a given point from an estimate. This module provides two error functions corresponding to the two Estimator functions it defines:
- Vertical distance squared, via linearRegressionError, which should be used with linearRegression
- Total distance squared, via linearRegressionTLSError, which should be used with linearRegressionTLS
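The two error measures can be sketched as follows. This reads "total distance" as the perpendicular (orthogonal) distance from the point to the line, which is what total least squares conventionally minimises. That reading, the (x, y) order of the point tuple, and the names Line, verticalSqError and perpendicularSqError are assumptions for illustration, not the module's definitions.

```haskell
-- Stand-in for EstimatedRelation: (alpha, beta) of y = alpha + beta * x.
type Line = (Double, Double)

-- | Squared vertical distance of the point (x, y) from the line.
verticalSqError :: Line -> (Double, Double) -> Double
verticalSqError (alpha, beta) (x, y) = r * r
  where
    r = y - (alpha + beta * x)

-- | Squared perpendicular distance of the point (x, y) from the line.
--   The vertical residual is scaled by 1 / (1 + beta^2).
perpendicularSqError :: Line -> (Double, Double) -> Double
perpendicularSqError line@(_, beta) p =
  verticalSqError line p / (1 + beta * beta)
```

For the line y = x, given as (0, 1), and the point (0, 2), the vertical measure is 4 while the perpendicular measure is 2. Pairing an estimator with the matching error measure keeps the quantity being ranked consistent with the quantity being minimised.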
type Estimator = Sample -> Sample -> EstimatedRelation
An Estimator is a function that generates an estimated linear regression based on 2 samples. This module provides two estimator functions: linearRegression and linearRegressionTLS.
type EstimatedRelation = (Double, Double)
An estimated linear relation between 2 samples is (alpha,beta) such that Y = alpha + beta*X.
Provided values
defaultEstimationParameters :: EstimationParameters
Default set of parameters to use (see reference for details).
linearRegressionError :: ErrorFunction
The error function for linearRegression: the square of the vertical distance of a point from the line.
linearRegressionTLSError :: ErrorFunction
The error function for linearRegressionTLS: the square of the total distance of a point from the line.
Helper functions
converge :: EstimationParameters -> Sample -> Sample -> EstimatedRelation -> EstimatedRelation
Calculate the optimal (local minimum) estimate based on an initial estimate. The local minimum may not be the global (a.k.a. best) estimate but starting from enough different initial estimates should yield the global optimum eventually.
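One way to picture such a local refinement is the concentration step used in least trimmed squares: rank the points by their error under the current estimate, refit on the best-fitting subset, and repeat until the estimate stops changing. The sketch below assumes converge follows that idea, which this page does not confirm. The names olsFit, sqErr, cStep and convergeSketch are hypothetical, and the explicit step limit and subset size stand in for whatever EstimationParameters supplies.

```haskell
import Data.List (sortOn)

-- Stand-in for EstimatedRelation: (alpha, beta) of y = alpha + beta * x.
type Line = (Double, Double)

-- Ordinary least squares on (x, y) pairs; needs two distinct x values.
olsFit :: [(Double, Double)] -> Line
olsFit pts = (alpha, beta)
  where
    n     = fromIntegral (length pts)
    mx    = sum (map fst pts) / n
    my    = sum (map snd pts) / n
    sxy   = sum [(x - mx) * (y - my) | (x, y) <- pts]
    sxx   = sum [(x - mx) * (x - mx) | (x, _) <- pts]
    beta  = sxy / sxx
    alpha = my - beta * mx

-- Squared vertical distance of a point from the line.
sqErr :: Line -> (Double, Double) -> Double
sqErr (alpha, beta) (x, y) = r * r
  where
    r = y - (alpha + beta * x)

-- One concentration step: keep the h points with the smallest error
-- under the current estimate and refit on them.
cStep :: Int -> [(Double, Double)] -> Line -> Line
cStep h pts est = olsFit (take h (sortOn (sqErr est) pts))

-- Repeat concentration steps until the estimate stops changing
-- or the step budget runs out.
convergeSketch :: Int -> Int -> [(Double, Double)] -> Line -> Line
convergeSketch maxSteps h pts = go maxSteps
  where
    go :: Int -> Line -> Line
    go 0 est = est
    go k est
      | same est next = next
      | otherwise     = go (k - 1) next
      where
        next = cStep h pts est
    same (a1, b1) (a2, b2) = abs (a1 - a2) < 1e-12 && abs (b1 - b2) < 1e-12
```

With eight points on y = 1 + 2x plus two gross outliers, a subset size of 8 and an initial estimate of (0, 2), the iteration should settle on approximately (1, 2). A poor initial estimate can settle elsewhere, which is why several different starting points are tried.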
References
- Two Dimensional Euclidean Regression (Stein) http://www.dspcsp.com/pubs/euclreg.pdf
- Computing LTS Regression For Large Data Sets (Rousseeuw and Van Driessen) http://agoras.ua.ac.be/abstract/Comlts99.htm