statistics-dirichlet-0.6.1: Functions for working with Dirichlet densities and mixtures on vectors.

Safe Haskell: Safe-Inferred




This module re-exports functions from Math.Statistics.Dirichlet.Mixture and Math.Statistics.Dirichlet.Options in a more digestible way. Since this library is under-documented, I recommend reading the documentation of the symbols re-exported here.

This module does not use Math.Statistics.Dirichlet.Density in any way. If you don't need mixtures then you should probably use that module directly since it's faster and more reliable (less magic happens there).


Data types (re-exported)

data DirichletMixture

A Dirichlet mixture.




dmWeights :: !(Vector Double)

Weights of each density.

dmDensities :: !Matrix

Values of all parameters of all densities. This matrix has one row per density, i.e. length dmWeights rows.

empty :: Int -> Int -> Double -> DirichletMixture

empty q n x is an "empty" Dirichlet mixture with q components and n parameters. Each component has size n, a weight inversely proportional to its index, and all alphas set to x.
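A minimal sketch of empty in use, assuming the package is installed and this module is imported unqualified (only names re-exported here are used):

```haskell
import Math.Statistics.Dirichlet (empty, toList)

main :: IO ()
main =
  -- 2 components, 3 parameters each, all alphas set to 0.5.
  mapM_ print (toList (empty 2 3 0.5))
```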

type Component = (Double, [Double])

A list representation of a component of a Dirichlet mixture. Used by fromList and toList only.

fromList :: [Component] -> DirichletMixture

fromList xs constructs a Dirichlet mixture from a non-empty list of components. Each component has a weight and a list of alpha values. The weights must sum to 1, all lists must have the same length, and every number must be non-negative. None of these preconditions are verified.

toList :: DirichletMixture -> [Component]

toList dm is the inverse of fromList: it constructs a list of components from a Dirichlet mixture. There are no error conditions, and toList . fromList == id.
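A sketch of the fromList/toList round trip, assuming the module is in scope; the sample components satisfy the unchecked preconditions (weights sum to 1, equal-length lists, non-negative values):

```haskell
import Math.Statistics.Dirichlet (Component, fromList, toList)

-- Two components over three parameters.
comps :: [Component]
comps = [ (0.7, [1.0, 2.0, 3.0])
        , (0.3, [0.5, 0.5, 0.5]) ]

-- toList . fromList == id, per the documentation.
roundTrips :: Bool
roundTrips = toList (fromList comps) == comps
```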

Options (re-exported)

type TrainingVector = Vector Double

A vector used for deriving the parameters of a Dirichlet density or mixture.

type TrainingVectors = Vector TrainingVector

A vector of training vectors. This is the only vector that is not unboxed (for obvious reasons).

newtype StepSize

Usually denoted by lowercase greek letter eta (η), size of each step in the gradient. Should be greater than zero and much less than one.


Step Double 

type Delta = Double

Maximum difference between costs to consider that the process converged.

data Predicate

Predicate specifying when the training should be over.




maxIter :: !Int

Maximum number of iterations.

minDelta :: !Delta

Minimum delta to continue iterating. This is independent of deltaSteps: if deltaSteps is 2, then minDelta is treated as twice as big to account for the two estimation steps between delta measurements.

deltaSteps :: !Int

How many estimation steps should be done before recalculating the delta. If deltaSteps is 1 then it will be recalculated on every step.

maxWeightIter :: !Int

Maximum number of iterations on each weight step.

jumpDelta :: !Delta

Used only when calculating mixtures. When the delta drops below this cutoff, the computation switches from estimating the alphas to estimating the weights and vice-versa. Should be greater than minDelta.
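The fields above belong to Predicate's record constructor. A hedged sketch of building one; the constructor name Pred is an assumption here (check the module's export list), while the field names are the ones documented above:

```haskell
import Math.Statistics.Dirichlet

-- Stop after 1000 iterations or once the cost difference drops below 1e-8;
-- recompute the delta every 10 steps; cap each weight step at 50 iterations;
-- switch between alpha and weight estimation below a delta of 1e-4
-- (jumpDelta > minDelta, as required).
stopWhen :: Predicate
stopWhen = Pred { maxIter       = 1000
                , minDelta      = 1e-8
                , deltaSteps    = 10
                , maxWeightIter = 50
                , jumpDelta     = 1e-4 }
```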

data Reason

Reason why the derivation finished.



Delta

The difference between applications of the cost function dropped below the minimum delta. In other words, it converged.


MaxIter

The maximum number of iterations was reached while the delta was still greater than the minimum delta.

CG Result

CG_DESCENT returned this result, which brought the derivation process to a halt.

data Result a

Result of a derivation.




reason :: !Reason

Reason why the derivation finished.

iters :: !Int

Number of iterations spent.

lastDelta :: !Delta

Last difference between costs.

lastCost :: !Double

Last cost (i.e. the cost of the result).

result :: !a

Result obtained.


Eq a => Eq (Result a) 
Read a => Read (Result a) 
Show a => Show (Result a) 
NFData a => NFData (Result a) 

Training data (re-exported)

data TrainingData

Pre-processed training vectors (see prepareTraining).

prepareTraining :: TrainingVectors -> TrainingData

Prepares training vectors to be used as training data. Anything that depends only on the training vectors is precalculated here.

We also try to find columns where all training vectors are zero. Those columns are removed from the derivation process and every component will have zero value on that column. Note that at least one column should have non-zero training vectors.

Functions (re-exported)

derive :: DirichletMixture -> Predicate -> StepSize -> TrainingData -> Result DirichletMixture

Derives a Dirichlet mixture using a maximum-likelihood method as described by Karplus et al. (equation 25), using the CG_DESCENT method of Hager and Zhang (see Numeric.Optimization.Algorithms.HagerZhang05). All training vectors should have the same length; however, this is not verified.
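Putting the re-exports together, an end-to-end sketch. The Pred constructor name is an assumption (check the export list); everything else is documented above. TrainingVectors is a boxed vector of unboxed vectors, hence the V./U. split:

```haskell
import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U
import Math.Statistics.Dirichlet

-- Four 3-dimensional training vectors; all the same length, as derive expects.
sample :: TrainingVectors
sample = V.fromList [ U.fromList [1, 2, 3]
                    , U.fromList [2, 2, 2]
                    , U.fromList [3, 1, 1]
                    , U.fromList [1, 1, 4] ]

trainMixture :: TrainingVectors -> Result DirichletMixture
trainMixture tvs = derive initial stop (Step 1e-2) (prepareTraining tvs)
  where
    initial = empty 2 3 0.1  -- 2 components, 3 parameters, alphas at 0.1
    stop    = Pred { maxIter = 500, minDelta = 1e-8, deltaSteps = 1
                   , maxWeightIter = 20, jumpDelta = 1e-4 }

main :: IO ()
main = do
  let r = trainMixture sample
  print (reason r, iters r, lastCost r)
```

cost (prepareTraining sample) can then be used to score any candidate mixture against the same training data.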

cost :: TrainingData -> DirichletMixture -> Double

Cost function for deriving a Dirichlet mixture (equation 18). This function is minimized by derive. It is calculated using equations (17) and (54).