maxent-learner-hw-0.2.1: Hayes and Wilson's maxent learning algorithm for phonotactic grammars.

Copyright: © 2016-2017 George Steel and Peter Jurgec
License: GPL-2+
Maintainer: george.steel@gmail.com
Safe Haskell: None
Language: Haskell2010

Text.PhonotacticLearner.PhonotacticConstraints.Generators

Description

Functions for generating sets of candidate constraints. For basic use, CandidateSettings and candidateGrammar are sufficient, while the other functions provide more fine-grained control.

The classesByGenerality function enumerates the classes defined by a feature table in a sensible order, removing duplicate descriptions of the same class. The ug functions take these classes and combine them into globs in various ways. For efficiency, classes are represented as (NaturalClass, SegSet SegRef) pairs and constraints are output as (ClassGlob, ListGlob SegRef) pairs, avoiding the need for repeated conversions and copying of classes.

Documentation

data CandidateSettings

Settings for grammar generation

Constructors

CandidateSettings 

Fields

  • useEdges :: Bool

    Allow single classes and bigrams restricted to word boundaries.

  • useTrigrams :: Maybe [Text]

    Allows trigrams as long as at least one class is [] or [±x] where x is in the included list.

  • useBroken :: Maybe [Text]

    Allows long-distance constraints of the form AB+C where A,C are classes and B = [] or [±x] with x in the list.

candidateGrammar :: FeatureTable sigma -> CandidateSettings -> (Int, Int, [(ClassGlob, ListGlob SegRef)])

Generate a reasonable set of candidate constraints based on single classes, bigrams, and the additional constraint types specified in the settings. The first and second return values are the number of classes and the number of candidates in the grammar; the third is the set of candidates.

ngrams :: Int -> [a] -> [[a]]

Given a number n and a sequence, returns all subsequences of length n.
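
The description does not say whether the subsequences must be contiguous. The following is a minimal sketch of both readings, not the library's implementation; the names contiguousNgrams and subsequencesOfLength are hypothetical and exist only for this illustration.

```haskell
import Data.List (tails)

-- Hypothetical helper: every contiguous window of exactly n elements.
contiguousNgrams :: Int -> [a] -> [[a]]
contiguousNgrams n xs
  | n <= 0    = []
  | otherwise = [w | t <- tails xs, let w = take n t, length w == n]

-- Hypothetical helper: every order-preserving selection of n elements,
-- whether or not the chosen elements are adjacent.
subsequencesOfLength :: Int -> [a] -> [[a]]
subsequencesOfLength 0 _      = [[]]
subsequencesOfLength _ []     = []
subsequencesOfLength n (x:xs) =
  map (x :) (subsequencesOfLength (n - 1) xs) ++ subsequencesOfLength n xs
```

On the input "abc" with n = 2, the first reading gives ["ab","bc"] and the second gives ["ab","ac","bc"]. Check the behaviour of ngrams itself against these before relying on either reading.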

classesByGenerality :: FeatureTable sigma -> Int -> [(Int, (NaturalClass, SegSet SegRef))]

Enumerate all classes (and their inverses) up to a given number of features, in descending order of the number of segments the uninverted class contains. Discards duplicates (classes having the same set of segments).

Each class is returned as a nested tuple containing the number of segments in the class (negated for sorting), the class label, and the set of segments it contains.
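
The enumeration described above can be sketched on a toy feature table. This is an illustration under stated assumptions, not the library's code: the types Segment and Label, the table toyFeatures, and the function classesSketch are all hypothetical, and the sketch leaves out inverted classes. It shows the two ideas in the description: keeping one label per distinct segment set, and storing the negated size so that an ascending sort puts the most general classes first.

```haskell
import Data.List (sortOn, subsequences)
import Data.Maybe (fromMaybe)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Segment = Char
type Label = [(Bool, String)]  -- [(True, "voice")] stands for [+voice]

-- Hypothetical toy table: feature name -> segments that are "+" for it.
-- "strid" deliberately picks out the same segments as "cont".
toyFeatures :: [(String, S.Set Segment)]
toyFeatures =
  [ ("voice", S.fromList "bdz")
  , ("cont",  S.fromList "sz")
  , ("strid", S.fromList "sz")
  ]

allSegments :: S.Set Segment
allSegments = S.fromList "ptbdsz"

-- Segments matched by a conjunction of feature values.
extension :: Label -> S.Set Segment
extension = foldr narrow allSegments
  where
    narrow (plus, f) acc =
      let pos = fromMaybe S.empty (lookup f toyFeatures)
      in if plus then S.intersection acc pos else S.difference acc pos

-- All labels that mention at most k features.
labelsUpTo :: Int -> [Label]
labelsUpTo k =
  [ zip signs fs
  | fs <- subsequences (map fst toyFeatures), length fs <= k
  , signs <- mapM (const [True, False]) fs ]

-- One entry per distinct, non-empty segment set, most general first.
classesSketch :: Int -> [(Int, (Label, S.Set Segment))]
classesSketch k = sortOn fst
  [ (negate (S.size segs), (lbl, segs))
  | (segs, lbl) <- M.toList byExtension, not (S.null segs) ]
  where
    -- Keyed by segment set, so duplicate descriptions collapse; on a
    -- clash the label with fewer features wins, ties keep the earlier one.
    byExtension = M.fromListWith shorter [(extension l, l) | l <- labelsUpTo k]
    shorter new old = if length new < length old then new else old
```

With this table, classesSketch 1 yields five classes with weights -6, -4, -3, -3, -2: the labels [+strid] and [-strid] are dropped because [+cont] and [-cont] already describe the same segment sets.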

ugSingleClasses :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]

Given a set of classes, return a set of globs matching those classes.

ugBigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]

Given a set of classes, return a set of globs matching class pairs, ordered by total weight. At most one class may be inverted.
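
As a rough illustration of the pairing rule, the sketch below assumes that "total weight" means the sum of the two Int weights attached to the classes; the documentation does not spell this out. The WeightedClass type, the bigramSketch function, and the "^" prefix used to print an inverted class are hypothetical stand-ins, not part of the library.

```haskell
import Data.List (sortOn)

-- Hypothetical stand-in for a weighted class: (weight, (inverted?, label)).
type WeightedClass = (Int, (Bool, String))

-- All ordered pairs of classes in which at most one member is inverted,
-- sorted by the sum of the two weights (assumed meaning of "total weight").
bigramSketch :: [WeightedClass] -> [(Int, [String])]
bigramSketch cs = sortOn fst
  [ (w1 + w2, [render c1, render c2])
  | (w1, c1@(inv1, _)) <- cs
  , (w2, c2@(inv2, _)) <- cs
  , not (inv1 && inv2) ]
  where
    render (inv, l) = (if inv then "^" else "") ++ l
```

For the input [(-3, (False, "A")), (-2, (True, "B"))] this produces the pairs A A, A ^B, and ^B A with weights -6, -5, -5; the pair ^B ^B is excluded because both members are inverted.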

ugEdgeClasses :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]

Given a set of classes, return a set of globs matching those classes at word boundaries. At most one class may be inverted.

ugEdgeBigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]

Given a set of classes, return a set of globs matching class pairs at word boundaries, ordered by total weight. At most one class may be inverted.

ugLimitedTrigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)]

Given a set of classes and a smaller subset, return a set of globs matching trigrams of classes from the set where at least one class is contained in the subset. At most one class may be inverted.

ugLongDistance :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)]

Given two sets of classes, return globs matching a pair of classes in the first set separated by any number of occurrences of a class in the second set. At most one class may be inverted. This can lead to fairly large grammar DFAs when multiple such constraints are merged.
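
To make the shape of such a constraint concrete, here is a small matcher for a single pattern of the form A B+ C over plain lists. It is a sketch only: matchesABplusC is a hypothetical name, classes are modelled as ordinary Data.Set values rather than the library's SegSet, and "any number of occurrences" is read as one or more, following the AB+C notation.

```haskell
import Data.List (tails)
import qualified Data.Set as S

-- True when the word contains a segment from a, immediately followed by
-- one or more segments from b, immediately followed by a segment from c.
matchesABplusC :: Ord s => S.Set s -> S.Set s -> S.Set s -> [s] -> Bool
matchesABplusC a b c word = any fromHere (tails word)
  where
    -- Try to start a match at the head of this suffix.
    fromHere (x:y:rest) = x `S.member` a && y `S.member` b && afterB rest
    fromHere _          = False
    -- At least one b has been consumed: either finish with a c now or
    -- consume another b. Trying both handles segments that are in b and c.
    afterB (z:zs) = z `S.member` c || (z `S.member` b && afterB zs)
    afterB []     = False
```

For example, with a = {s}, b = {a, e, i, o, u, t}, and c = {S}, the word "sataS" matches while "sS" and "Satas" do not.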

ugHayesWilson :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)]

Combine the above functions (not including ugLongDistance) into the original candidate generator from the Hayes and Wilson paper.