statistics-0.10.1.0: A library of statistical types, data, and functions

Portability: portable
Stability: experimental
Maintainer: bos@serpentine.com

Statistics.Sample.KernelDensity.Simple

Description

Kernel density estimation code, providing non-parametric ways to estimate the probability density function of a sample.

The techniques used by functions in this module are relatively fast, but they generally give inferior results to the KDE function in the main `Statistics.Sample.KernelDensity` module (due to the oversmoothing documented for `bandwidth` below).


# Simple entry points

Arguments

`:: Vector v Double => Int -> v Double -> (Points, Vector Double)`

• `Int`: number of points at which to estimate
• `v Double`: data sample

Simple Epanechnikov kernel density estimator. Returns the uniformly spaced points from the sample range at which the density function was estimated, and the estimates at those points.
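For reference, the textbook form of the estimator with an Epanechnikov kernel is shown below. It is a standard formulation, not transcribed from this module's source, which may rescale the kernel internally; $X_1, \dots, X_n$ is the sample and $h$ the bandwidth:

$$
\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),
\qquad
K(u) = \tfrac{3}{4}\,(1 - u^2)\,\mathbf{1}\{|u| \le 1\}.
$$

The factor $1/(nh)$ corresponds to the scaling factor that the `Kernel` type described under Kernels takes as its first argument.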

Arguments

`:: Vector v Double => Int -> v Double -> (Points, Vector Double)`

• `Int`: number of points at which to estimate
• `v Double`: data sample

Simple Gaussian kernel density estimator. Returns the uniformly spaced points from the sample range at which the density function was estimated, and the estimates at those points.
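The standard textbook Gaussian kernel, which takes the place of $K$ in the usual kernel density sum, is shown below (again a textbook form, not read from this module's source):

$$
K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}.
$$

Unlike the Epanechnikov kernel, it is nonzero everywhere, so every sample value contributes to every estimate. Only about 0.27% of its mass lies more than $3h$ from the sample value, which is presumably why the point-selection advice below widens the range by a factor of 3 for Gaussian kernels.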

# Building blocks

## Choosing points from a sample

`newtype Points`

Points from the range of a `Sample`.

Constructors

`Points`, with the single field `fromPoints :: Vector Double`

Instances

`Eq Points`, `Show Points`

Arguments

`:: Vector v Double => Int -> Double -> v Double -> Points`

• `Int`: number of points to select, n
• `Double`: sample bandwidth, h
• `v Double`: input data

Choose a uniform range of points at which to estimate a sample's probability density function.

If you are using a Gaussian kernel, multiply the sample's bandwidth by 3 before passing it to this function.

If this function is passed an empty vector, it returns values of positive and negative infinity.
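A list-based sketch of the idea, assuming the range is the sample's minimum and maximum widened by h on each side (an assumption: the description above does not spell out the exact endpoints). `choosePointsSketch` and `sampleRange` are hypothetical names, and lists stand in for `Vector`. Folding the range from (+∞, −∞) is one way to obtain the documented infinities for an empty sample instead of an exception.

```haskell
import Data.List (foldl')

-- Stand-in for the module's Points type, using a list instead of a Vector.
newtype Points = Points { fromPoints :: [Double] }
  deriving (Eq, Show)

-- Smallest and largest sample values. Starting the fold at (+Infinity, -Infinity)
-- means an empty sample yields those infinities rather than an exception.
sampleRange :: [Double] -> (Double, Double)
sampleRange = foldl' step (inf, negate inf)
  where
    inf = 1 / 0
    step (lo, hi) x = (min lo x, max hi x)

-- Hypothetical analogue of the point chooser: n uniformly spaced points covering
-- the sample range widened by h on each side (assumed endpoints).
choosePointsSketch :: Int -> Double -> [Double] -> Points
choosePointsSketch n h xs
  | n < 2     = Points (take n [lo])  -- no points, or just the lower end
  | otherwise = Points [lo + fromIntegral i * step | i <- [0 .. n - 1]]
  where
    (mn, mx) = sampleRange xs
    lo = mn - h
    hi = mx + h
    step = (hi - lo) / fromIntegral (n - 1)
```

For a Gaussian kernel, following the advice above, one would pass `3 * h` as the bandwidth argument.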

## Bandwidth estimation

The width of the convolution kernel used.

`bandwidth :: Vector v Double => (Double -> Bandwidth) -> v Double -> Bandwidth`

Estimate the optimal bandwidth from the observed data for the given kernel.

This function uses an estimate based on the standard deviation of a sample (due to Deheuvels), which performs reasonably well for unimodal distributions but leads to oversmoothing for more complex ones.
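A sketch of a rule-of-thumb estimator of this shape, assuming the estimate is the sample standard deviation times n^(-1/5), handed to a kernel-specific function that applies the kernel's constant. That shape is inferred from the signature above, not copied from the module; `bandwidthSketch` and `stdDevSketch` are hypothetical names, and using the unbiased (n − 1) variance is also an assumption.

```haskell
type Bandwidth = Double

-- Sample standard deviation using the unbiased (n - 1) variance; needs n >= 2.
stdDevSketch :: [Double] -> Double
stdDevSketch xs = sqrt (sum [sq (x - mean) | x <- xs] / (n - 1))
  where
    n = fromIntegral (length xs)
    mean = sum xs / n
    sq d = d * d

-- Rule-of-thumb bandwidth: sigma * n^(-1/5), then passed to a kernel-specific
-- function (for example, multiplication by a kernel constant).
bandwidthSketch :: (Double -> Bandwidth) -> [Double] -> Bandwidth
bandwidthSketch kernelConst xs = kernelConst (stdDevSketch xs * n ** (-0.2))
  where
    n = fromIntegral (length xs)
```

Because the estimate depends only on the spread and size of the sample, it cannot see multiple modes, which is consistent with the oversmoothing noted above.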

Bandwidth estimator for an Epanechnikov kernel.

Bandwidth estimator for a Gaussian kernel.
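For orientation, the textbook normal-reference (AMISE) derivation gives the constants below for kernels written in their standard forms, where $\sigma$ is the standard deviation, $R(g) = \int g(u)^2\,du$ and $\mu_2(K) = \int u^2 K(u)\,du$. These are textbook values, not read from the two estimators above; the module's constants may differ if its kernels are scaled differently.

$$
h^{*} = \left(\frac{R(K)}{\mu_2(K)^2\,R(f'')\,n}\right)^{1/5},
\qquad
R(f'') = \frac{3}{8\sqrt{\pi}\,\sigma^{5}} \ \text{for a normal } f,
$$

$$
\text{Gaussian: } h^{*} = \left(\tfrac{4}{3}\right)^{1/5}\sigma\,n^{-1/5} \approx 1.06\,\sigma\,n^{-1/5},
\qquad
\text{Epanechnikov: } h^{*} = \left(40\sqrt{\pi}\right)^{1/5}\sigma\,n^{-1/5} \approx 2.34\,\sigma\,n^{-1/5}.
$$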

## Kernels

`type Kernel = Double -> Double -> Double -> Double -> Double`

The convolution kernel. Its parameters are as follows:

• Scaling factor, 1/(nh), where n is the sample size
• Bandwidth, h
• A point at which to sample the input, p
• One sample value, v

Epanechnikov kernel for probability density function estimation.

Gaussian kernel for probability density function estimation.
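A sketch of the four-argument `Kernel` shape with textbook Epanechnikov and Gaussian kernels, assuming each kernel returns its contribution f · K((v − p)/h) for one sample value. The names ending in `Sketch` are hypothetical, and the module's own kernels may be scaled differently.

```haskell
-- Arguments: scaling factor f = 1/(n h), bandwidth h, point p, sample value v.
type Kernel = Double -> Double -> Double -> Double -> Double

-- Textbook Epanechnikov kernel: 3/4 (1 - u^2) on |u| <= 1, zero elsewhere.
epanechnikovKernelSketch :: Kernel
epanechnikovKernelSketch f h p v
  | abs u <= 1 = f * 0.75 * (1 - u * u)
  | otherwise  = 0
  where
    u = (v - p) / h

-- Textbook Gaussian kernel: standard normal density of u.
gaussianKernelSketch :: Kernel
gaussianKernelSketch f h p v = f * exp (-0.5 * u * u) / sqrt (2 * pi)
  where
    u = (v - p) / h
```

The Epanechnikov kernel ignores sample values farther than h from p, which makes it cheaper to evaluate and explains why no range widening is needed for it.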

## Low-level estimation

Arguments

`:: Vector v Double => Kernel -> Bandwidth -> v Double -> Points -> Vector Double`

• `Kernel`: kernel function
• `Bandwidth`: bandwidth, h
• `v Double`: sample data
• `Points`: points at which to estimate

Kernel density estimator, providing a non-parametric way of estimating the PDF of a random variable.
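A list-based sketch of such an estimator: for each point p, sum the kernel's contribution from every sample value, passing the scaling factor 1/(nh) as the kernel's first argument. `estimatePDFSketch` and `gaussianSketch` are hypothetical names, lists replace `Points` and `Vector`, and the textbook Gaussian kernel is included only so the block runs on its own.

```haskell
type Kernel = Double -> Double -> Double -> Double -> Double

-- Textbook Gaussian kernel in the four-argument Kernel shape.
gaussianSketch :: Kernel
gaussianSketch f h p v = f * exp (-0.5 * u * u) / sqrt (2 * pi)
  where
    u = (v - p) / h

-- Estimate the density at each point by summing kernel contributions from
-- every sample value; f = 1 / (n h) normalizes the sum.
estimatePDFSketch :: Kernel -> Double -> [Double] -> [Double] -> [Double]
estimatePDFSketch kernel h sample points =
  [sum [kernel f h p v | v <- sample] | p <- points]
  where
    n = fromIntegral (length sample)
    f = 1 / (n * h)
```

This direct form costs one kernel evaluation per (point, sample value) pair, so it is O(points × samples).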

Arguments

`:: Vector v Double => (Double -> Double) -> Kernel -> Double -> Int -> v Double -> (Points, Vector Double)`

• `Double -> Double`: bandwidth function
• `Kernel`: kernel function
• `Double`: bandwidth scaling factor (3 for a Gaussian kernel, 1 for all others)
• `Int`: number of points at which to estimate
• `v Double`: sample data

A helper for creating a simple kernel density estimation function with automatically chosen bandwidth and estimation points.
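A sketch of how such a helper can compose the pieces: compute h, widen the point range by the scaling factor times h, then estimate at each point. It assumes the rule-of-thumb value σ·n^(-1/5) is fed to the bandwidth function and that the range is the sample's minimum and maximum widened by (scale · h) on each side; `simplePDFSketch` and `gaussianSketch` are hypothetical names, and lists replace `Points` and `Vector`.

```haskell
type Kernel = Double -> Double -> Double -> Double -> Double

-- Textbook Gaussian kernel in the four-argument Kernel shape.
gaussianSketch :: Kernel
gaussianSketch f h p v = f * exp (-0.5 * u * u) / sqrt (2 * pi)
  where
    u = (v - p) / h

-- Hypothetical helper: choose h, choose points, then estimate at each point.
-- Assumes at least two sample values and numPoints >= 2.
simplePDFSketch
  :: (Double -> Double)  -- bandwidth function (applies the kernel constant)
  -> Kernel              -- kernel function
  -> Double              -- range scaling factor (3 for Gaussian, 1 otherwise)
  -> Int                 -- number of points at which to estimate
  -> [Double]            -- sample data
  -> ([Double], [Double])
simplePDFSketch bwFun kernel scale numPoints sample = (points, estimates)
  where
    n = fromIntegral (length sample)
    mean = sum sample / n
    sd = sqrt (sum [(x - mean) * (x - mean) | x <- sample] / (n - 1))
    h = bwFun (sd * n ** (-0.2))
    lo = minimum sample - scale * h
    hi = maximum sample + scale * h
    step = (hi - lo) / fromIntegral (numPoints - 1)
    points = [lo + fromIntegral i * step | i <- [0 .. numPoints - 1]]
    f = 1 / (n * h)
    estimates = [sum [kernel f h p v | v <- sample] | p <- points]
```

With an Epanechnikov kernel, the scaling factor would be 1, per the argument description above.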

# References

• Deheuvels, P. (1977) Estimation non paramtrique de la densit par histogrammes gnraliss. Mhttp:archive.numdam.orgarticleRSA_1977__25_3_5_0.pdf>