Statistics.EM.GMM

Description

EM for a mixture of k one-dimensional Gaussians. The procedure tends to produce NaNs whenever more Gaussians are requested than the data call for. This is rather convenient, since the NaNs act as a sign of overfitting. ;-)
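As a rough illustration of what a single EM iteration for one-dimensional Gaussians involves, here is a minimal sketch. The names Component, normalPdf and emStep are hypothetical and chosen for this example only; the module's own Data and Theta types may be laid out differently.

```haskell
import Data.List (transpose)

-- Hypothetical representation of one mixture component:
-- (mixing weight, mean, variance). Not the module's Theta type.
type Component = (Double, Double, Double)

-- Density of a univariate normal distribution with mean mu and variance var.
normalPdf :: Double -> Double -> Double -> Double
normalPdf mu var x =
  exp (negate ((x - mu) ^ (2 :: Int)) / (2 * var)) / sqrt (2 * pi * var)

-- One EM iteration over the data xs.
emStep :: [Double] -> [Component] -> [Component]
emStep xs comps = map update resps
  where
    n = fromIntegral (length xs)
    -- Weighted density of every component at every point (one row per point).
    dens = [ [ w * normalPdf mu var x | (w, mu, var) <- comps ] | x <- xs ]
    -- E-step: normalise each row into responsibilities.
    rows = [ map (/ sum row) row | row <- dens ]
    -- One list of responsibilities per component.
    resps = transpose rows
    -- M-step: re-estimate weight, mean and variance of one component.
    update rs = (nk / n, mu', var')
      where
        nk   = sum rs
        mu'  = sum (zipWith (*) rs xs) / nk
        var' = sum (zipWith (\r x -> r * (x - mu') ^ (2 :: Int)) rs xs) / nk
```

In this sketch, a component that receives no responsibility at all has nk equal to zero, so its new mean becomes 0/0, which is NaN. That is one plausible way for surplus components to show up as NaNs.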

TODO cite paper

Documentation

emFix :: Data -> Theta -> Theta

Find an optimal set of parameters Theta, starting from the given Theta. The additional takeWhile (not . isnan . fst) makes sure that emFix terminates in cases of overfitting. Because the check is made on fst while the result is taken from snd, the returned values will be NaNs whenever NaNs occur.
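The "checking fst, returning snd" behaviour can be sketched on plain Doubles. This is an assumption about the general shape of the iteration, not the module's actual code: fixGuarded is a hypothetical helper, it works on a single Double instead of Theta, and it uses the standard isNaN from the Prelude.

```haskell
-- Iterate step from x0 until the value stops changing or turns into NaN.
-- Consecutive iterates are paired as (previous, next); the guard looks only
-- at the pair's first element, while the result is the second element of the
-- last pair that passed.
fixGuarded :: (Double -> Double) -> Double -> Double
fixGuarded step x0 =
  case takeWhile keepGoing (zip xs (tail xs)) of
    [] -> x0              -- x0 is already a fixed point (or already NaN)
    ps -> snd (last ps)
  where
    xs = iterate step x0
    -- A pair such as (finite, NaN) still passes, because its first element
    -- is not NaN and comparing with NaN via (/=) is always True.
    keepGoing (prev, next) = not (isNaN prev) && prev /= next
```

With a step that converges, such as \x -> x / 2 + 1, the result is the fixed point. With a step that eventually leaves its domain, such as \x -> sqrt (x - 1) started at 1.5, the iteration still terminates, and the result is NaN because the last accepted pair carries the NaN in its second position.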

emStarts :: Int -> Data -> Theta

Given a set of Data and a number k of Gaussian peaks, try to find the optimal GMM. This is done by trying each data point as mu for each Gaussian. Note that this becomes rather slow for larger k (larger than, say, 2 or 3), because the number of starting configurations grows rapidly with k. In that case, a random-drawing method should be chosen.
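To get a feeling for the cost, the following sketch enumerates candidate starting means under the assumption that every data point may serve as mu for every Gaussian independently. candidateMeans is a hypothetical helper for illustration; the module may enumerate its starting points differently.

```haskell
import Control.Monad (replicateM)

-- All ways to pick one data point as the mean of each of k components.
-- In the list monad, replicateM k xs yields every length-k sequence drawn
-- from xs, so the result has (length xs) ^ k entries.
candidateMeans :: Int -> [Double] -> [[Double]]
candidateMeans k xs = replicateM k xs
```

Under this assumption, 100 data points give 100 starting configurations for k = 1, 10,000 for k = 2 and 1,000,000 for k = 3, each of which would need its own EM run.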

TODO xs' -> xs sorting makes me cry!