Portability | portable |
---|---|
Stability | experimental |
Maintainer | bos@serpentine.com |
Safe Haskell | None |

Deprecated: Use Statistics.Sample.KernelDensity instead.

Kernel density estimation code, providing non-parametric ways to estimate the probability density function of a sample.

The techniques used by functions in this module are relatively fast, but they generally give inferior results to the KDE function in the main `KernelDensity` module (due to the oversmoothing documented for `bandwidth` below).

- epanechnikovPDF :: Vector v Double => Int -> v Double -> (Points, Vector Double)
- gaussianPDF :: Vector v Double => Int -> v Double -> (Points, Vector Double)
- newtype Points = Points {}
- choosePoints :: Vector v Double => Int -> Double -> v Double -> Points
- type Bandwidth = Double
- bandwidth :: Vector v Double => (Double -> Bandwidth) -> v Double -> Bandwidth
- epanechnikovBW :: Double -> Bandwidth
- gaussianBW :: Double -> Bandwidth
- type Kernel = Double -> Double -> Double -> Double -> Double
- epanechnikovKernel :: Kernel
- gaussianKernel :: Kernel
- estimatePDF :: Vector v Double => Kernel -> Bandwidth -> v Double -> Points -> Vector Double
- simplePDF :: Vector v Double => (Double -> Double) -> Kernel -> Double -> Int -> v Double -> (Points, Vector Double)

# Simple entry points

epanechnikovPDF

:: Vector v Double | |
---|---|
=> Int | Number of points at which to estimate |
-> v Double | Data sample |
-> (Points, Vector Double) | |

Simple Epanechnikov kernel density estimator. Returns the uniformly spaced points from the sample range at which the density function was estimated, and the estimates at those points.

gaussianPDF

:: Vector v Double | |
---|---|
=> Int | Number of points at which to estimate |
-> v Double | Data sample |
-> (Points, Vector Double) | |

Simple Gaussian kernel density estimator. Returns the uniformly spaced points from the sample range at which the density function was estimated, and the estimates at those points.

# Building blocks

## Choosing points from a sample

newtype Points

Points from the range of a `Sample`.

choosePoints

:: Vector v Double | |
---|---|
=> Int | Number of points to select |
-> Double | Sample bandwidth |
-> v Double | Input data |
-> Points | |

Choose a uniform range of points at which to estimate a sample's probability density function.

If you are using a Gaussian kernel, multiply the sample's bandwidth by 3 before passing it to this function.

If this function is passed an empty vector, it returns values of positive and negative infinity.
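The uniform grid described above can be illustrated with a minimal list-based sketch. The name `uniformPoints` is hypothetical, and the choice to pad the sample range by the bandwidth on each side is an assumption rather than something this page states. Unlike `choosePoints`, the sketch simply returns an empty list for degenerate inputs.

```haskell
-- Hypothetical list-based stand-in for choosePoints; not the library's code.
-- Assumption: the grid spans [min - h, max + h] with n evenly spaced points.
uniformPoints :: Int -> Double -> [Double] -> [Double]
uniformPoints n h xs
  | n < 2 || null xs = []  -- degenerate inputs are out of scope for this sketch
  | otherwise        = [lo + fromIntegral i * step | i <- [0 .. n - 1]]
  where
    lo   = minimum xs - h
    hi   = maximum xs + h
    step = (hi - lo) / fromIntegral (n - 1)
```

For a Gaussian kernel, the advice above would correspond to calling `uniformPoints n (3 * h) xs`.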

## Bandwidth estimation

bandwidth :: Vector v Double => (Double -> Bandwidth) -> v Double -> Bandwidth

Compute the optimal bandwidth from the observed data for the given kernel.

This function uses an estimate based on the standard deviation of a sample (due to Deheuvels), which performs reasonably well for unimodal distributions but leads to oversmoothing for more complex ones.

epanechnikovBW :: Double -> Bandwidth

Bandwidth estimator for an Epanechnikov kernel.

gaussianBW :: Double -> Bandwidth

Bandwidth estimator for a Gaussian kernel.
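A standard-deviation-based bandwidth of the kind described in this section might look roughly like the sketch below. All three names are hypothetical. The Gaussian factor shown is the common normal-reference rule, and both it and the use of the sample (n - 1) standard deviation are assumptions; the constants used by `gaussianBW` and `epanechnikovBW` may differ.

```haskell
-- Hypothetical helpers illustrating a standard-deviation-based bandwidth.
-- Assumption: sample (n - 1) standard deviation; needs at least two values.
sampleStdDev :: [Double] -> Double
sampleStdDev xs = sqrt (sum [(x - mean) ^ (2 :: Int) | x <- xs] / (n - 1))
  where
    n    = fromIntegral (length xs)
    mean = sum xs / n

-- Normal-reference factor for a Gaussian kernel, as a function of sample size.
-- Assumption: this textbook constant may differ from what gaussianBW uses.
normalReferenceBW :: Double -> Double
normalReferenceBW n = (4 / (3 * n)) ** 0.2

-- Bandwidth = spread of the data times a kernel-specific factor of n.
bandwidthSketch :: (Double -> Double) -> [Double] -> Double
bandwidthSketch factor xs = sampleStdDev xs * factor (fromIntegral (length xs))
```

Because the estimate depends on the data only through its standard deviation and size, a bimodal sample with widely separated modes gets a large bandwidth, which is one way to see where the oversmoothing comes from.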

## Kernels

type Kernel = Double -> Double -> Double -> Double -> Double

The convolution kernel. Its parameters are as follows:

- Scaling factor, 1/*nh*
- Bandwidth, *h*
- A point at which to sample the input, *p*
- One sample value, *v*

epanechnikovKernel :: Kernel

Epanechnikov kernel for probability density function estimation.

gaussianKernel :: Kernel

Gaussian kernel for probability density function estimation.
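To make the four-argument kernel shape concrete, here is a sketch of two kernels written against it. The names `KernelSketch`, `gaussianSketch`, and `epanechnikovSketch` are hypothetical, and the formulas are the textbook definitions; the library's own kernels may scale their arguments differently.

```haskell
-- Hypothetical kernels with the four-argument shape described above:
-- scaling factor f = 1/(n*h), bandwidth h, evaluation point p, sample value v.
type KernelSketch = Double -> Double -> Double -> Double -> Double

-- Textbook Gaussian kernel: standard normal density at u = (v - p) / h.
gaussianSketch :: KernelSketch
gaussianSketch f h p v = f * exp (-0.5 * u * u) / sqrt (2 * pi)
  where u = (v - p) / h

-- Textbook Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere.
epanechnikovSketch :: KernelSketch
epanechnikovSketch f h p v
  | abs u <= 1 = f * 0.75 * (1 - u * u)
  | otherwise  = 0
  where u = (v - p) / h
```

Passing the scaling factor in as an argument lets the caller compute 1/*nh* once per estimate instead of once per kernel evaluation.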

## Low-level estimation

estimatePDF

:: Vector v Double | |
---|---|
=> Kernel | Kernel function |
-> Bandwidth | Bandwidth |
-> v Double | Sample data |
-> Points | Points at which to estimate |
-> Vector Double | |

Kernel density estimator, providing a non-parametric way of estimating the PDF of a random variable.
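The estimator itself can be sketched as a sum of kernel contributions at each evaluation point. The names `gaussK` and `estimateSketch` are hypothetical, plain lists stand in for vectors and `Points`, and the Gaussian kernel is the textbook form, so treat this as an illustration under those assumptions rather than the module's implementation.

```haskell
-- Hypothetical list-based stand-in for estimatePDF; not the library's code.
-- Assumption: a textbook Gaussian kernel with scaling factor f = 1/(n*h).
gaussK :: Double -> Double -> Double -> Double -> Double
gaussK f h p v = f * exp (-0.5 * u * u) / sqrt (2 * pi)
  where u = (v - p) / h

-- For each evaluation point, sum the kernel contributions of every sample value.
estimateSketch
  :: (Double -> Double -> Double -> Double -> Double)  -- kernel
  -> Double                                            -- bandwidth h
  -> [Double]                                          -- sample data
  -> [Double]                                          -- evaluation points
  -> [Double]
estimateSketch kernel h sample points =
  [sum [kernel f h p v | v <- sample] | p <- points]
  where
    f = 1 / (h * fromIntegral (length sample))
```

The cost is proportional to the number of points times the sample size, since every point visits every sample value.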

simplePDF

:: Vector v Double | |
---|---|
=> (Double -> Double) | Bandwidth function |
-> Kernel | Kernel function |
-> Double | Bandwidth scaling factor (3 for a Gaussian kernel, 1 for all others) |
-> Int | Number of points at which to estimate |
-> v Double | Sample data |
-> (Points, Vector Double) | |

A helper for creating a simple kernel density estimation function with automatically chosen bandwidth and estimation points.
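As a rough picture of how such a helper could tie the pieces together, the sketch below picks a bandwidth, builds a padded grid, and evaluates a Gaussian estimate in one function. The name `simpleGaussianSketch` is hypothetical, and the normal-reference bandwidth, the sample standard deviation, and the padding of the range by the scaled bandwidth are all assumptions. It expects at least two sample values and at least two points.

```haskell
-- Hypothetical end-to-end sketch of a "simple" estimator; not the library's code.
-- Assumptions: Gaussian kernel, normal-reference bandwidth, sample standard
-- deviation, and a grid padded by (scale * h) on both sides of the sample range.
simpleGaussianSketch :: Double -> Int -> [Double] -> ([Double], [Double])
simpleGaussianSketch scale numPoints sample = (points, densities)
  where
    n      = fromIntegral (length sample)
    mean   = sum sample / n
    sd     = sqrt (sum [(x - mean) ^ (2 :: Int) | x <- sample] / (n - 1))
    h      = sd * (4 / (3 * n)) ** 0.2            -- bandwidth
    lo     = minimum sample - scale * h
    hi     = maximum sample + scale * h
    step   = (hi - lo) / fromIntegral (numPoints - 1)
    points = [lo + fromIntegral i * step | i <- [0 .. numPoints - 1]]
    f      = 1 / (n * h)                          -- kernel scaling factor
    kern p v  = let u = (v - p) / h
                in f * exp (-0.5 * u * u) / sqrt (2 * pi)
    densities = [sum [kern p v | v <- sample] | p <- points]
```

A call such as `simpleGaussianSketch 3 200 [1, 2, 3, 4, 5]` uses the scaling factor of 3 suggested for Gaussian kernels, so the grid extends far enough to capture nearly all of the estimated density.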

# References

- Deheuvels, P. (1977) Estimation non paramétrique de la densité par histogrammes généralisés. <http://archive.numdam.org/article/RSA_1977__25_3_5_0.pdf>