Copyright | © 2016-2017 George Steel and Peter Jurgec |
---|---|
License | GPL-2+ |
Maintainer | george.steel@gmail.com |
Safe Haskell | None |
Language | Haskell2010 |
Functions for generating sets of candidate constraint sets.
For efficiency, classes are reperesented as (
pairs and
constraints are output as NaturalClass
, SegSet
SegRef
)(
pairs, avoiding the need for repeated conversions and copying of classes.ClassGlob
, ListGlob
SegRef
)
The classesByGenreraity
function enumerates the classes defined by a feature table in a sensible order, removing duplicate descriptions of the same class. The ug functions then take these classes and then combine them imto globs in various ways.
- ngrams :: Int -> [a] -> [[a]]
- classesByGenerality :: FeatureTable sigma -> Int -> [(Int, (NaturalClass, SegSet SegRef))]
- ugSingleClasses :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]
- ugBigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]
- ugEdgeClasses :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]
- ugEdgeBigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)]
- ugLimitedTrigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)]
- ugLongDistance :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)]
- ugHayesWilson :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)]
Documentation
ngrams :: Int -> [a] -> [[a]] Source #
Given a number n and a sequence, returns all subsewuences of length n.
classesByGenerality :: FeatureTable sigma -> Int -> [(Int, (NaturalClass, SegSet SegRef))] Source #
Enumerate all classes (and their inverses) to a certain number of features in descending order of the number of segments the uninverted class contains. Discards duplicates (having the same set of segments).
Each segment is returned as a tripple with the (negated for sorting) numbet of segments in the class, the class label, and the set of segments it contains.
ugSingleClasses :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)] Source #
Given a set of classes, return a set of globs matching those classes.
ugBigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)] Source #
Given a set of classes, return a set pf globs matching class pairs, ordered by total weight. At most one class may be inverted.
ugEdgeClasses :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)] Source #
ugEdgeBigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(ClassGlob, ListGlob SegRef)] Source #
Given a set of classes, return a set pf globs matching class pairs at word boundaries, ordered by total weight. At most one class may be inverted.
ugLimitedTrigrams :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)] Source #
Given a set of classes ansd a smaller subset, return a set of globs matching trigrams of classes from the set where at least one class is contained in the subset. At most one class may be inverted.
ugLongDistance :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)] Source #
Given two sets of classes, return globs matching a pair oc slasses in the first set separated by any number of occurrences of a class in the second set. At most one class may be inverted. At most one class may be inverted. This can lead to fairly large grammar DFAs when multiple such constraints are merged.
ugHayesWilson :: [(Int, (NaturalClass, SegSet SegRef))] -> [(NaturalClass, SegSet SegRef)] -> [(ClassGlob, ListGlob SegRef)] Source #
Combine the above functions (not including ugLongDistance
) into the original candidate generator from the Hayes and Wilson paper.