regression-simple-0.2.1: Simple linear and quadratic regression
Safe HaskellSafe-Inferred
LanguageHaskell2010

Math.Regression.Simple

Description

regression-simple provides (hopefully) simple regression functions.

The linear :: Foldable f => (a -> (Double, Double)) -> f a -> V2 is the simplest one.

There are variants with weights, y-errors, and x and y-errors. In addition, package includes Levenberg–Marquardt algorithm implementation to fit arbitrary functions (with one, two or three parameters), as long as you can give their partial derivatives as well (ad package is handy for that).

For multiple independent variable ordinary least squares or Levenberg-Marquard with functions with > 3 parameter you should look elsewhere.

Package has been tested to return similar results as fit functionality in gnuplot (L-M doesn't always converge to exactly the same points in parameter space).

Synopsis

Linear regression

linear :: Foldable f => (a -> (Double, Double)) -> f a -> V2 Source #

Linear regression.

>>> let input1 = [(0, 1), (1, 3), (2, 5)]
>>> PP $ linear id input1
V2 2.0000 1.00000
>>> let input2 = [(0.1, 1.2), (1.3, 3.1), (1.9, 4.9), (3.0, 7.1), (4.1, 9.0)]
>>> PP $ linear id input2
V2 2.0063 0.88685

linearFit :: Foldable f => (a -> (Double, Double)) -> f a -> Fit V2 Source #

Like linear but returns complete Fit.

To get confidence intervals you should multiply the errors by quantile (studentT (n - 2)) ci' from statistics package or similar. For big n using value 1 gives 68% interval and using value 2 gives 95% confidence interval. See https://en.wikipedia.org/wiki/Student%27s_t-distribution#Table_of_selected_values (quantile calculates one-sided values, you need two-sided, thus adjust ci value).

The first input is perfect fit:

>>> let fit = linearFit id input1
>>> PP fit
Fit (V2 2.0000 1.00000) (V2 0.00000 0.00000) 1 0.00000

The second input is quite good:

>>> PP $ linearFit id input2
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962

But the third input isn't so much, standard error of a slope parameter is 20%.

>>> let input3 = [(0, 2), (1, 3), (2, 6), (3, 11)]
>>> PP $ linearFit id input3
Fit (V2 3.0000 1.00000) (V2 0.63246 1.1832) 2 4.0000

linearWithWeights :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V2 Source #

Weighted linear regression.

>>> let input2 = [(0.1, 1.2), (1.3, 3.1), (1.9, 4.9), (3.0, 7.1), (4.1, 9.0)]
>>> PP $ linearFit id input2
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962
>>> let input2w = [(0.1, 1.2, 1), (1.3, 3.1, 1), (1.9, 4.9, 1), (3.0, 7.1, 1/4), (4.1, 9.0, 1/4)]
>>> PP $ linearWithWeights id input2w
Fit (V2 2.0060 0.86993) (V2 0.12926 0.23696) 3 0.22074

linearWithYerrors :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V2 Source #

Linear regression with y-errors.

>>> let input2y = [(0.1, 1.2, 0.12), (1.3, 3.1, 0.31), (1.9, 4.9, 0.49), (3.0, 7.1, 0.71), (4.1, 9.0, 1.9)]
>>> let fit = linearWithYerrors id input2y
>>> PP fit
Fit (V2 1.9104 0.98302) (V2 0.13006 0.10462) 3 2.0930

When we know actual y-errors, we can calculate the Q-value using statistics package:

>>> import qualified Statistics.Distribution            as S
>>> import qualified Statistics.Distribution.ChiSquared as S
>>> S.cumulative (S.chiSquared (fitNDF fit)) (fitWSSR fit)
0.446669639443138

or using math-functions

>>> import Numeric.SpecFunctions (incompleteGamma)
>>> incompleteGamma (fromIntegral (fitNDF fit) / 2) (fitWSSR fit / 2)
0.446669639443138

It is not uncommon to deem acceptable on equal terms any models with, say, Q > 0.001. If Q is too large, too near to 1 is most likely caused by overestimating the y-errors.

linearWithXYerrors Source #

Arguments

:: Foldable f 
=> (a -> (Double, Double, Double, Double))

\(x_i, y_i, \delta x_i, \delta y_i\)

-> f a

data

-> NonEmpty (Fit V2) 

Iterative linear regression with x and y errors.

Orear, J. (1982). Least squares when both variables have uncertainties. American Journal of Physics, 50(10), 912–916. doi:10.1119/1.12972

>>> let input2xy = [(0.1, 1.2, 0.01, 0.12), (1.3, 3.1, 0.13, 0.31), (1.9, 4.9, 0.19, 0.49), (3.0, 7.1, 0.3, 0.71), (4.1, 9.0, 0.41, 1.9)]
>>> let fit :| fits = linearWithXYerrors id input2xy

First fit is done using linearWithYerrors:

>>> PP fit
Fit (V2 1.9104 0.98302) (V2 0.13006 0.10462) 3 2.0930

After that the effective variance is used to refine the fit, just a few iterations is often enough:

>>> PP $ take 3 fits
Fit (V2 1.9092 0.99251) (V2 0.12417 0.08412) 3 1.2992
Fit (V2 1.9092 0.99250) (V2 0.12418 0.08414) 3 1.2998
Fit (V2 1.9092 0.99250) (V2 0.12418 0.08414) 3 1.2998

Step-by-step interface

linearFit' :: LinRegAcc -> Fit V2 Source #

Calculate linear fit from LinRegAcc.

data LinRegAcc Source #

Linear regression accumulator.

Constructors

LinRegAcc 

Fields

Instances

Instances details
Show LinRegAcc Source # 
Instance details

Defined in Math.Regression.Simple

NFData LinRegAcc Source # 
Instance details

Defined in Math.Regression.Simple

Methods

rnf :: LinRegAcc -> () #

addLinReg Source #

Arguments

:: LinRegAcc 
-> Double

x

-> Double

y

-> LinRegAcc 

Add a point to linreg accumulator.

addLinRegW Source #

Arguments

:: LinRegAcc 
-> Double

x

-> Double

y

-> Double

w

-> LinRegAcc 

Add a weighted point to linreg accumulator.

Quadratic regression

quadratic :: Foldable f => (a -> (Double, Double)) -> f a -> V3 Source #

Quadratic regression.

>>> let input1 = [(0, 1), (1, 3), (2, 5)]
>>> quadratic id input1
V3 0.0 2.0 1.0
>>> let input2 = [(0.1, 1.2), (1.3, 3.1), (1.9, 4.9), (3.0, 7.1), (4.1, 9.0)]
>>> PP $ quadratic id input2
V3 (-0.00589) 2.0313 0.87155
>>> let input3 = [(0, 2), (1, 3), (2, 6), (3, 11)]
>>> PP $ quadratic id input3
V3 1.00000 0.00000 2.0000

quadraticFit :: Foldable f => (a -> (Double, Double)) -> f a -> Fit V3 Source #

Like quadratic but returns complete Fit.

>>> PP $ quadraticFit id input2
Fit (V3 (-0.00589) 2.0313 0.87155) (V3 0.09281 0.41070 0.37841) 2 0.25910
>>> PP $ quadraticFit id input3
Fit (V3 1.00000 0.00000 2.0000) (V3 0.00000 0.00000 0.00000) 1 0.00000

quadraticWithWeights :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V3 Source #

Weighted quadratic regression.

>>> let input2w = [(0.1, 1.2, 1), (1.3, 3.1, 1), (1.9, 4.9, 1), (3.0, 7.1, 1/4), (4.1, 9.0, 1/4)]
>>> PP $ quadraticWithWeights id input2w
Fit (V3 0.02524 1.9144 0.91792) (V3 0.10775 0.42106 0.35207) 2 0.21484

quadraticWithYerrors :: Foldable f => (a -> (Double, Double, Double)) -> f a -> Fit V3 Source #

Quadratic regression with y-errors.

>>> let input2y = [(0.1, 1.2, 0.12), (1.3, 3.1, 0.31), (1.9, 4.9, 0.49), (3.0, 7.1, 0.71), (4.1, 9.0, 0.9)]
>>> PP $ quadraticWithYerrors id input2y
Fit (V3 0.08776 1.6667 1.0228) (V3 0.10131 0.31829 0.11917) 2 1.5398

quadraticWithXYerrors Source #

Arguments

:: Foldable f 
=> (a -> (Double, Double, Double, Double))

\(x_i, y_i, \delta x_i, \delta y_i\)

-> f a

data

-> NonEmpty (Fit V3) 

Iterative quadratic regression with x and y errors.

Orear, J. (1982). Least squares when both variables have uncertainties. American Journal of Physics, 50(10), 912–916. doi:10.1119/1.12972

Step-by-step interface

quadraticFit' :: QuadRegAcc -> Fit V3 Source #

Calculate quadratic fit from QuadRegAcc.

data QuadRegAcc Source #

Quadratic regression accumulator.

Constructors

QuadRegAcc 

Fields

Instances

Instances details
Show QuadRegAcc Source # 
Instance details

Defined in Math.Regression.Simple

NFData QuadRegAcc Source # 
Instance details

Defined in Math.Regression.Simple

Methods

rnf :: QuadRegAcc -> () #

addQuadReg Source #

Arguments

:: QuadRegAcc 
-> Double

x

-> Double

y

-> QuadRegAcc 

Add a point to quadreg accumulator.

addQuadRegW Source #

Arguments

:: QuadRegAcc 
-> Double

x

-> Double

y

-> Double

w

-> QuadRegAcc 

Add a weighted point to quadreg accumulator.

quadRegAccToLin :: QuadRegAcc -> LinRegAcc Source #

Convert QuadRegAcc to LinRegAcc.

Using this we can try quadratic and linear fits with a single data scan.

Levenberg–Marquardt algorithm

One parameter

levenbergMarquardt1 Source #

Arguments

:: Foldable f 
=> (Double -> a -> (Double, Double, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i)\)

-> Double

initial parameter, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit Double)

non-empty list of iteration results

Levenberg–Marquardt for functions with one parameter.

See levenbergMarquardt2 for examples, this is very similar.

For example we can fit \(f = x \mapsto \beta x + 1\), its derivative is \(\partial_\beta f = x \mapsto x\).

>>> let scale a (x, y) = (y, a * x + 1, x)
>>> PP $ NE.last $ levenbergMarquardt1 scale 1 input2
Fit 1.9685 0.04735 4 0.27914

Not bad, but worse then linear fit which fits the intercept point too.

levenbergMarquardt1WithWeights Source #

Arguments

:: Foldable f 
=> (Double -> a -> (Double, Double, Double, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i), w_i\)

-> Double

initial parameter, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit Double)

non-empty list of iteration results

levenbergMarquardt1 with weights.

levenbergMarquardt1WithYerrors Source #

Arguments

:: Foldable f 
=> (Double -> a -> (Double, Double, Double, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i), \delta y_i\)

-> Double

initial parameter, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit Double)

non-empty list of iteration results

levenbergMarquardt1 with Y-errors.

levenbergMarquardt1WithXYerrors Source #

Arguments

:: Foldable f 
=> (Double -> a -> (Double, Double, Double, Double, Double, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \partial_\beta f(\beta, x_i), \partial_x f(\beta, x_i), \delta x_i, \delta y_i\)

-> Double

initial parameter, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit Double)

non-empty list of iteration results

levenbergMarquardt1 with XY-errors.

Two parameters

levenbergMarquardt2 Source #

Arguments

:: Foldable f 
=> (V2 -> a -> (Double, Double, V2))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i)\)

-> V2

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V2)

non-empty list of iteration results

Levenberg–Marquardt for functions with two parameters.

You can use this sledgehammer to do a a linear fit:

>>> let lin (V2 a b) (x, y) = (y, a * x + b, V2 x 1)

We can then use levenbergMarquardt2 to find a fit:

>>> PP $ levenbergMarquardt2 lin (V2 1 1) input2
Fit (V2 1.00000 1.00000) (V2 1.0175 2.5385) 3 29.470
Fit (V2 1.2782 1.4831) (V2 0.57784 1.4416) 3 9.5041
Fit (V2 1.7254 1.4730) (V2 0.18820 0.46952) 3 1.0082
Fit (V2 1.9796 0.95226) (V2 0.09683 0.24157) 3 0.26687
Fit (V2 2.0060 0.88759) (V2 0.09550 0.23826) 3 0.25962
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962

This is the same result what linearFit returns:

>>> PP $ linearFit id input2
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962

Using AD

You can use ad to calculate derivatives for you.

>>> import qualified Numeric.AD.Mode.Reverse.Double as AD

We need a (Traversable) homogenic triple to represent the two parameters and x:

>>> data H3 a = H3 a a a deriving (Functor, Foldable, Traversable)

Then we define a function ad can operate with:

>>> let linearF (H3 a b x) = a * x + b

which we can use to fit the curve in generic way:

>>> let lin' (V2 a b) (x, y) = case AD.grad' linearF (H3 a b x) of (f, H3 da db _f') -> (y, f, V2 da db)
>>> PP $ levenbergMarquardt2 lin' (V2 1 1) input2
Fit (V2 1.00000 1.00000) (V2 1.0175 2.5385) 3 29.470
Fit (V2 1.2782 1.4831) (V2 0.57784 1.4416) 3 9.5041
Fit (V2 1.7254 1.4730) (V2 0.18820 0.46952) 3 1.0082
Fit (V2 1.9796 0.95226) (V2 0.09683 0.24157) 3 0.26687
Fit (V2 2.0060 0.88759) (V2 0.09550 0.23826) 3 0.25962
Fit (V2 2.0063 0.88685) (V2 0.09550 0.23826) 3 0.25962

Non-polynomial example

We can fit other curves too, for example an example from Wikipedia https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm#Example

>>> let rateF (H3 vmax km s) = (vmax * s) / (km + s)
>>> let rateF' (V2 vmax km) (x, y) = case AD.grad' rateF (H3 vmax km x) of (f, H3 vmax' km' _) -> (y, f, V2 vmax' km')
>>> let input = zip [0.038,0.194,0.425,0.626,1.253,2.500,3.740] [0.050,0.127,0.094,0.2122,0.2729,0.2665,0.3317]
>>> PP $ levenbergMarquardt2 rateF' (V2 0.9 0.2) input
Fit (V2 0.90000 0.20000) (V2 0.43304 0.43936) 5 1.4455
Fit (V2 0.61786 0.36360) (V2 0.23270 0.50259) 5 0.26730
Fit (V2 0.39270 0.49787) (V2 0.05789 0.24170) 5 0.01237
Fit (V2 0.36121 0.54525) (V2 0.04835 0.23315) 5 0.00785
Fit (V2 0.36168 0.55530) (V2 0.04880 0.23790) 5 0.00784
Fit (V2 0.36182 0.55620) (V2 0.04885 0.23826) 5 0.00784
Fit (V2 0.36184 0.55626) (V2 0.04885 0.23829) 5 0.00784

We get the same result as in the article: 0.362 and 0.556

The algorithm terminates when a scaling parameter \(\lambda\) becomes larger than 1e20 or smaller than 1e-20, or relative WSSR change is smaller than 1e-10, or sum-of-squared-residuals candidate becomes NaN (i.e. when it would start to produce garbage). You may want to terminate sooner, Numerical Recipes suggest to stop when WSSR decreases by a neglible amount absolutely or fractionally.

levenbergMarquardt2WithWeights Source #

Arguments

:: Foldable f 
=> (V2 -> a -> (Double, Double, V2, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), w_i\)

-> V2

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V2)

non-empty list of iteration results

levenbergMarquardt2 with weights.

Because levenbergMarquardt2 is an iterative algorithm, not only we can use it to fit curves with known y-errors (levenbergMarquardt2WithYerrors), but also with both x and y-errors (levenbergMarquardt2WithXYerrors).

levenbergMarquardt2WithYerrors Source #

Arguments

:: Foldable f 
=> (V2 -> a -> (Double, Double, V2, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \delta y_i\)

-> V2

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V2)

non-empty list of iteration results

levenbergMarquardt2 with Y-errors.

levenbergMarquardt2WithXYerrors Source #

Arguments

:: Foldable f 
=> (V2 -> a -> (Double, Double, V2, Double, Double, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \partial_x f(\beta, x_i), \delta x_i, \delta y_i\)

-> V2

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V2)

non-empty list of iteration results

levenbergMarquardt2 with XY-errors.

Three parameters

levenbergMarquardt3 Source #

Arguments

:: Foldable f 
=> (V3 -> a -> (Double, Double, V3))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i)\)

-> V3

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V3)

non-empty list of iteration results

Levenberg–Marquardt for functions with three parameters.

See levenbergMarquardt2 for examples, this is very similar.

>>> let quad (V3 a b c) (x, y) = (y, a * x * x + b * x + c, V3 (x * x) x 1)
>>> PP $ NE.last $ levenbergMarquardt3 quad (V3 2 2 2) input3
Fit (V3 1.00000 (-0.00000) 2.0000) (V3 0.00000 0.00000 0.00000) 1 0.00000

Same as quadratic fit, just less direct:

>>> PP $ quadraticFit id input3
Fit (V3 1.00000 0.00000 2.0000) (V3 0.00000 0.00000 0.00000) 1 0.00000

levenbergMarquardt3WithWeights Source #

Arguments

:: Foldable f 
=> (V3 -> a -> (Double, Double, V3, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), w_i\)

-> V3

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V3)

non-empty list of iteration results

levenbergMarquardt3 with weights.

levenbergMarquardt3WithYerrors Source #

Arguments

:: Foldable f 
=> (V3 -> a -> (Double, Double, V3, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \delta y_i\)

-> V3

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V3)

non-empty list of iteration results

levenbergMarquardt3 with Y-errors.

levenbergMarquardt3WithXYerrors Source #

Arguments

:: Foldable f 
=> (V3 -> a -> (Double, Double, V3, Double, Double, Double))

\(\beta, d_i \mapsto y_i, f(\beta, x_i), \nabla_\beta f(\beta, x_i), \partial_x f(\beta, x_i), \delta x_i, \delta y_i\)

-> V3

initial parameters, \(\beta_0\)

-> f a

data, \(d\)

-> NonEmpty (Fit V3)

non-empty list of iteration results

levenbergMarquardt3 with XY-errors.

Auxiliary types

data Fit v Source #

Result of a curve fit.

Constructors

Fit 

Fields

Instances

Instances details
Show v => Show (Fit v) Source # 
Instance details

Defined in Math.Regression.Simple

Methods

showsPrec :: Int -> Fit v -> ShowS #

show :: Fit v -> String #

showList :: [Fit v] -> ShowS #

NFData v => NFData (Fit v) Source # 
Instance details

Defined in Math.Regression.Simple

Methods

rnf :: Fit v -> () #

data V2 Source #

2d vector. Strict pair of Doubles.

Also used to represent linear polynomial: V2 a b \(= a x + b\).

Constructors

V2 !Double !Double 

Instances

Instances details
Show V2 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

showsPrec :: Int -> V2 -> ShowS #

show :: V2 -> String #

showList :: [V2] -> ShowS #

NFData V2 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

rnf :: V2 -> () #

Eq V2 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

(==) :: V2 -> V2 -> Bool #

(/=) :: V2 -> V2 -> Bool #

Add V2 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

zero :: V2 Source #

add :: V2 -> V2 -> V2 Source #

Mult M22 V2 V2 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

mult :: M22 -> V2 -> V2 Source #

Mult SM22 V2 V2 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

mult :: SM22 -> V2 -> V2 Source #

Mult Double V2 V2 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

mult :: Double -> V2 -> V2 Source #

data V3 Source #

3d vector. Strict triple of Doubles.

Also used to represent quadratic polynomial: V3 a b c \(= a x^2 + b x + c\).

Constructors

V3 !Double !Double !Double 

Instances

Instances details
Show V3 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

showsPrec :: Int -> V3 -> ShowS #

show :: V3 -> String #

showList :: [V3] -> ShowS #

NFData V3 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

rnf :: V3 -> () #

Eq V3 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

(==) :: V3 -> V3 -> Bool #

(/=) :: V3 -> V3 -> Bool #

Add V3 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

zero :: V3 Source #

add :: V3 -> V3 -> V3 Source #

Mult M33 V3 V3 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

mult :: M33 -> V3 -> V3 Source #

Mult SM33 V3 V3 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

mult :: SM33 -> V3 -> V3 Source #

Mult Double V3 V3 Source # 
Instance details

Defined in Math.Regression.Simple.LinAlg

Methods

mult :: Double -> V3 -> V3 Source #