maxent-learner-hw-0.2.1: Hayes and Wilson's maxent learning algorithm for phonotactic grammars.

Copyright© 2016-2017 George Steel and Peter Jurgec
Safe HaskellNone



Library for using DFAs to represent maxent grammars. A mexent grammar consists of a set of constraints, each of which is given a weight, which define a probability diatribution over the set of strings of each given length. The relative probability (maxent score) of each string is equal to the negative exponential of the the total weight of the violated constraints. In this module, such a grammar is reperesented by a DFST which can count violations and a Vec of weights.

This module is mainly concerned with calculating probabilities of samples of text and finding the optimal weights to maximize that probability. There are also functions to randomly generate text using the distribution implied by a mexent grammar.



data Lexicon sigma Source #

Returns the probability (as a logarithm) of a lexicon with aand associated length distribution.





Show sigma => Show (Lexicon sigma) Source # 


showsPrec :: Int -> Lexicon sigma -> ShowS #

show :: Lexicon sigma -> String #

showList :: [Lexicon sigma] -> ShowS #

sortLexicon :: Ord sigma => [([sigma], Int)] -> Lexicon sigma Source #

Convert jumbled list of words and frequencies to sorted lexicon.

lengthCdf :: Lexicon sigma -> Cdf Length Source #

Retrieve length distribution as a Cdf for sampling.

lengthPdf :: Lexicon sigma -> [(Length, Double)] Source #

Retrieve length distribution as a normalized probability mass function. Probabilities add up to 1.

maxentProb :: Vec -> Multicount -> Double Source #

Apply weights to violation counts to get a relative probability.

lexLogProbTotalDeriv Source #


:: Ix sigma 
=> MulticountDFST sigma

DFST counting constraint violations

-> Array Length Int

Length distribution of lexicon

-> Vec

Observed violations in lexicon

-> Vec

Weights to give constraints

-> (Double, Vec)

Probability and its derivative w.r.t. the weights

For a given set of consteraints (reperesented by a DFST counting violations), lexicon (reprersented as length distribution and total violation count, which should be precomputed), and weight vector, returns the absolute probability (as a negative logarithm) and its derivative with respect to the weight vector.

Minimize this to find the optimal weights. To prevent overfitting, this function includes an exponential (L₁) prior equivalent to each constraint being violated once for existing. This intentionally differs from Hayes and Wilson since their gaussian (L₂²) prior had a strong preference for as many simillar constraints as possible as opposed to a single constraint. The exponential prior was chosen since it is independent of splitting constraints into duplicates with the weight distributed between them.

lexLogProbPartialDeriv :: Ix sigma => MulticountDFST sigma -> Array Length Int -> Vec -> Vec -> Vec -> Double Source #

Compute partial derivative of lexicon probability. Much faster equivalent of

lexLogProbPartialDeriv ctr lengths oviols weights dir = dir `innerProd` snd (lexLogProbTotalDeriv ctr lengths oviols weights)

llpOptimizeWeights :: Ix sigma => Array Length Int -> PackedText sigma -> MulticountDFST sigma -> Vec -> Vec Source #

Calculate weights to maximize probability of lexicon. Takes starting position of search which MUST have the correct dimensionality (do not use zero)

sampleWord Source #


:: (RandomGen g, Ix sigma, MonadState g m) 
=> DFST Int sigma Double

Probability DFST

-> Length

Maximum length to greate generator fot

-> Length -> m [sigma]

Random generator taking length and returning action.

Returns a monadic action to sample random words from a probability transducer, which may be generated from a violation counter with (fmap (maxentProb weights) ctr)). For efficiency, evaluate this once then sequence the action repeatedly as intermediate values will be memoized.

sampleWordSalad :: (RandomGen g, Ix sigma, MonadState g m) => DFST Int sigma Double -> Cdf Length -> Int -> m [[sigma]] Source #

Like sampleWord but generates multiple words. Length distribution is specified as a Cdf and number of words to generate.