Portability	portable
Stability	experimental
Maintainer	felipe.lessa@gmail.com
Safe Haskell	Safe-Inferred

Math.Statistics.Dirichlet

Contents

Data types (re-exported)
Options (re-exported)
Training data (re-exported)
Functions (re-exported)

Description

This module re-exports functions from Math.Statistics.Dirichlet.Mixture and Math.Statistics.Dirichlet.Options in a more digestible way. Since this library is under-documented, I recommend reading the documentation of the symbols re-exported here.

This module does not use Math.Statistics.Dirichlet.Density in any way. If you don't need mixtures then you should probably use that module directly since it's faster and more reliable (less magic happens there).

Data types (re-exported)

data DirichletMixture

A Dirichlet mixture.

Constructors

DM
Fields

dmWeights :: !(Vector Double): Weights of each density.
dmDensities :: !Matrix: Values of all parameters of all densities. This matrix has `length dmWeights` rows.

Instances

Eq DirichletMixture
Read DirichletMixture
Show DirichletMixture
NFData DirichletMixture

empty :: Int -> Int -> Double -> DirichletMixture

empty q n x is an "empty" Dirichlet mixture with q components, each with n parameters. Each component's weight is inversely proportional to its index, and all alphas are set to x.
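The description above can be modelled with plain lists. This is only a sketch, not the library's implementation: the name emptyModel is hypothetical, the index is assumed to start at 1, and the weights are assumed to be normalised so that they sum to 1.

```haskell
-- Local stand-in for the library's Component type.
type Component = (Double, [Double])

-- Hypothetical list-based model of `empty`; not the library's code.
emptyModel :: Int -> Int -> Double -> [Component]
emptyModel q n x = [ (w / total, replicate n x) | w <- raw ]
  where
    raw   = [ 1 / fromIntegral i | i <- [1 .. q] ]  -- weight proportional to 1/index (1-based, assumed)
    total = sum raw                                  -- normalisation to 1 is an assumption

main :: IO ()
main = mapM_ print (emptyModel 3 4 0.5)
```

Each printed pair is one component: its weight followed by its n alphas, all equal to x.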

type Component = (Double, [Double])

A list representation of a component of a Dirichlet mixture. Used by fromList and toList only.

fromList :: [Component] -> DirichletMixture

fromList xs constructs a Dirichlet mixture from a non-empty list of components. Each component has a weight and a list of alpha values. The weights must sum to 1, all lists must have the same number of values, and every number must be non-negative. None of these preconditions is verified.
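Because fromList does not check its preconditions, callers may want to check them first. The sketch below is one way to do that; validComponents is a hypothetical helper that is not part of this library, and the 1e-9 tolerance on the weight sum is an arbitrary assumption.

```haskell
-- Local stand-in for the library's Component type.
type Component = (Double, [Double])

-- Hypothetical helper; not part of the library.
validComponents :: [Component] -> Bool
validComponents [] = False                       -- the list must be non-empty
validComponents cs@((_, as0) : _) =
     abs (sum weights - 1) < 1e-9                -- weights sum to 1 (tolerance assumed)
  && all ((== length as0) . length) alphas       -- same number of alphas everywhere
  && all (>= 0) (weights ++ concat alphas)       -- every number is non-negative
  where
    weights = map fst cs
    alphas  = map snd cs

main :: IO ()
main = do
  print (validComponents [(0.5, [1, 2]), (0.5, [3, 4])])  -- True
  print (validComponents [(0.5, [1, 2]), (0.4, [3])])     -- False
```

A list that passes this check satisfies the stated preconditions up to the chosen tolerance.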

toList :: DirichletMixture -> [Component]

toList dm is the inverse of fromList: it constructs a list of components from a Dirichlet mixture. There are no error conditions and toList . fromList == id.

Options (re-exported)

type TrainingVector = Vector Double

A vector used for deriving the parameters of a Dirichlet density or mixture.

type TrainingVectors = Vector TrainingVector

A vector of training vectors. This is the only vector that is not unboxed, since its elements are themselves vectors.

newtype StepSize

Size of each step in the gradient, usually denoted by the lowercase Greek letter eta (η). Should be greater than zero and much less than one.

Constructors

Step Double

type Delta = Double

Maximum difference between costs to consider that the process converged.

data Predicate

Predicate specifying when the training should stop.

Constructors

Pred

Fields

maxIter :: !Int: Maximum number of iterations.
minDelta :: !Delta: Minimum delta to continue iterating. This value is independent of deltaSteps: it is scaled to account for the number of steps between delta calculations, so if deltaSteps is 2 then minDelta is treated as twice as large.
deltaSteps :: !Int: How many estimation steps should be done before recalculating the delta. If deltaSteps is 1 then it will be recalculated on every step.
maxWeightIter :: !Int: Maximum number of iterations on each weight step.
jumpDelta :: !Delta: Used only when calculating mixtures. When the delta drops below this cutoff the computation changes from estimating the alphas to estimating the weights and vice-versa. Should be greater than minDelta.
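The interaction between minDelta and deltaSteps can be illustrated with a small model. This is a sketch of the behaviour described in the field documentation, not the library's code: effectiveMinDelta and converged are hypothetical names, and comparing the absolute value of the delta is an assumption.

```haskell
-- Hypothetical model of the stopping threshold; not the library's code.
-- The threshold grows linearly with the number of steps between delta checks.
effectiveMinDelta :: Double -> Int -> Double
effectiveMinDelta minDelta deltaSteps = minDelta * fromIntegral deltaSteps

-- True when the observed delta is below the scaled threshold.
converged :: Double -> Int -> Double -> Bool
converged minDelta deltaSteps delta =
  abs delta < effectiveMinDelta minDelta deltaSteps

main :: IO ()
main = do
  print (converged 1e-6 1 1.5e-6)  -- False: 1.5e-6 is not below 1e-6
  print (converged 1e-6 2 1.5e-6)  -- True: the threshold is doubled to 2e-6
```

Under this reading, the same minDelta gives comparable stopping behaviour whether the delta is recalculated on every step or only every few steps.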

Instances

Eq Predicate
Read Predicate
Show Predicate

data Reason

Reason why the derivation stopped.

Constructors

Delta	The difference between applications of the cost function dropped below the minimum delta. In other words, it converged.
MaxIter	The maximum number of iterations was reached while the delta was still greater than the minimum delta.
CG Result	CG_DESCENT returned this result, which brought the derivation process to a halt.

Instances

Eq Reason
Read Reason
Show Reason

data Result a

Result of a derivation.

Constructors

Result
Fields

reason :: !Reason: Reason why the derivation was over.
iters :: !Int: Number of iterations spent.
lastDelta :: !Delta: Last difference between costs.
lastCost :: !Double: Last cost (i.e. the cost of the result).
result :: !a: Result obtained.

Instances

Eq a => Eq (Result a)
Read a => Read (Result a)
Show a => Show (Result a)
NFData a => NFData (Result a)

Training data (re-exported)

data TrainingData

Pre-processed training vectors (see prepareTraining).

Instances

Eq TrainingData
Show TrainingData

prepareTraining :: TrainingVectors -> TrainingData

Prepares training vectors to be used as training data. Anything that depends only on the training vectors is precalculated here.

We also try to find columns where all training vectors are zero. Those columns are removed from the derivation process and every component will have zero value on that column. Note that at least one column should have non-zero training vectors.
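The zero-column detection described above can be sketched with plain lists. This is only an illustration of the idea, not how prepareTraining is implemented: zeroColumns and dropColumns are hypothetical names, indices are 0-based, and all vectors are assumed to have the same length.

```haskell
import Data.List (transpose)

-- Indices (0-based) of the columns in which every training vector is zero.
-- Assumes all vectors have the same length.
zeroColumns :: [[Double]] -> [Int]
zeroColumns vs =
  [ i | (i, col) <- zip [0 ..] (transpose vs), all (== 0) col ]

-- Remove the given columns from a single vector.
dropColumns :: [Int] -> [Double] -> [Double]
dropColumns zs v = [ x | (i, x) <- zip [0 ..] v, i `notElem` zs ]

main :: IO ()
main = do
  let vs = [[1, 0, 2], [3, 0, 0], [0, 0, 5]]
      zs = zeroColumns vs
  print zs                         -- [1]
  print (map (dropColumns zs) vs)  -- [[1.0,2.0],[3.0,0.0],[0.0,5.0]]
```

Here only the middle column is zero in every vector, so it is the only one that would be excluded from the derivation.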

Functions (re-exported)

derive :: DirichletMixture -> Predicate -> StepSize -> TrainingData -> Result DirichletMixture

Derives a Dirichlet mixture using the maximum likelihood method described by Karplus et al. (equation 25), optimized with the CG_DESCENT method of Hager and Zhang (see Numeric.Optimization.Algorithms.HagerZhang05). All training vectors should have the same length; however, this is not verified.

cost :: TrainingData -> DirichletMixture -> Double

Cost function for deriving a Dirichlet mixture (equation 18). This function is minimized by derive. Calculated using equations (17) and (54).