maxent-learner-hw-0.2.0: Hayes and Wilson's maxent learning algorithm for phonotactic grammars.

Copyright: © 2016-2017 George Steel and Peter Jurgec
License: GPL-2+
Maintainer: george.steel@gmail.com
Safe Haskell: None
Language: Haskell2010

Text.PhonotacticLearner.PhonotacticConstraints.FileFormats

Description

Functions for saving and loading lexicons and ClassGlob constraint grammars in standard formats.

Documentation

segmentFiero

Arguments

:: Set String

All possible segments

-> String

Raw text

-> [String]

Segmented text

Given a set of possible segments and a string, break the string into segments. Uses the rules of Fiero orthography (a phonetic writing system using ASCII characters), in which the longest possible match is always taken and apostrophes are used as digraph breaks.
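The longest-match rule described above can be illustrated with a small standalone sketch. This is not the library's implementation: segmentSketch is a hypothetical name, and the handling of characters that match no segment (they are skipped) and of empty segments (they are ignored) are assumptions made only for this example.

```haskell
import qualified Data.Set as Set
import Data.List (isPrefixOf, sortOn)
import Data.Ord (Down (..))

-- | Hypothetical stand-in for the documented function, not the library's code.
segmentSketch :: Set.Set String -> String -> [String]
segmentSketch segs = go
  where
    -- Candidates ordered longest first, so the first hit is the longest match.
    -- Empty segments are dropped so that every match consumes input.
    byLength = sortOn (Down . length) (filter (not . null) (Set.toList segs))

    go [] = []
    go ('\'' : rest) = go rest            -- apostrophe: digraph break, emits nothing
    go txt@(_ : rest) =
      case filter (`isPrefixOf` txt) byLength of
        (s : _) -> s : go (drop (length s) txt)
        []      -> go rest                -- assumption: skip unknown characters
```

Under these assumptions, with the segments s, h, sh and a, the text "sha" is split as ["sh","a"], while "s'ha" is split as ["s","h","a"].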

joinFiero

Arguments

:: Set String

All possible segments

-> [String]

Segmented text

-> String

Raw text

Joins segments together using Fiero rules. Inserts apostrophes where necessary.
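One way to read "where necessary" is: an apostrophe is needed after a segment whenever a longer segment would otherwise match at that position under the longest-match rule. The sketch below implements that reading. It is an illustration only; joinSketch is a hypothetical name, and the sketch assumes that every input segment is in the set and that no segment contains an apostrophe.

```haskell
import qualified Data.Set as Set
import Data.List (isPrefixOf)

-- | Hypothetical stand-in for the documented function, not the library's code.
joinSketch :: Set.Set String -> [String] -> String
joinSketch segs = go
  where
    go [] = ""
    go [x] = x
    go (x : rest)
      | misread x joined = x ++ "'" ++ joined   -- break needed before the next segment
      | otherwise        = x ++ joined
      where
        joined = go rest

    -- True when some longer segment would match at the start of x plus the
    -- text that follows it, so a longest-match reader would not recover x.
    misread x following =
      any (\s -> length s > length x && s `isPrefixOf` (x ++ following))
          (Set.toList segs)
```

With the segments s, h, sh and a, joining ["s","h","a"] gives "s'ha", while joining ["sh","a"] gives "sha" with no apostrophe.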

data LexRow

Structure for representing lexicon entries.

Constructors

LexRow [String] Int 

parseWordlist :: Set String -> Text -> [LexRow]

Parse a lexicon from a file. Words are segmented using Fiero rules (which also decode space-separated segments and single-character segments). Each word may optionally be followed by a tab character and an integer frequency (1 by default).
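The line format described above (a word, optionally followed by a tab and an integer frequency) can be sketched as follows. This is a simplified illustration rather than the library's parser: it works on String instead of Text, handles only the space-separated form of a word, and silently drops lines whose frequency is not an integer, which is an assumption. Row, parseLineSketch and parseWordlistSketch are hypothetical names.

```haskell
import Data.Maybe (mapMaybe)
import Text.Read (readMaybe)

-- | Local stand-in mirroring the documented constructor: segments and frequency.
data Row = Row [String] Int deriving (Eq, Show)

-- | Parse one line of the form "segments" or "segments<TAB>frequency".
parseLineSketch :: String -> Maybe Row
parseLineSketch line =
  case break (== '\t') line of
    (word, "")          -> Just (Row (words word) 1)   -- no tab: frequency defaults to 1
    (word, _ : freqTxt) -> Row (words word) <$> readMaybe freqTxt

-- | Parse every non-empty line; malformed lines are dropped (an assumption).
parseWordlistSketch :: String -> [Row]
parseWordlistSketch = mapMaybe parseLineSketch . filter (not . null) . lines
```

For example, the two-line input "k a t" followed by a tab and 12, then "d o g" on its own line, yields one row with frequency 12 and one with the default frequency 1.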

collateWordlist :: Set String -> Text -> [LexRow]

Collate a list of words and frequencies from raw phonetic text.
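As a rough illustration of collation, the sketch below counts how often each whitespace-separated word occurs in a text. It is not the library's code: collateSketch is a hypothetical name, words are left unsegmented, and both the whitespace tokenization and the alphabetical output order are assumptions of this example.

```haskell
import qualified Data.Map.Strict as Map

-- | Count occurrences of each whitespace-separated word.
-- The result is ordered by word because it is read back out of a Map.
collateSketch :: String -> [(String, Int)]
collateSketch = Map.toList . Map.fromListWith (+) . map (\w -> (w, 1)) . words
```

Each resulting pair corresponds to what one LexRow would carry once the word has been segmented.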

serWordlist :: Set String -> [LexRow] -> Text

Serializes a list of words and frequencies to a string for decoding with parseWordlist. Connects segments using Fiero rules.

serWordlistSpaced :: [LexRow] -> Text

Serializes a list of words and frequencies to a string for decoding with parseWordlist. Puts spaces between segments.

data PhonoGrammar

Representation of a ClassGlob grammar.

Constructors

PhonoGrammar 

Fields

parseGrammar :: Text -> Maybe PhonoGrammar

Parse a grammar from a file. Blank lines and lines beginning with # are ignored. The first regular line must contain a list of (Length,Int) pairs, and subsequent lines must contain a weight followed by a ClassGlob.
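The line-level structure described above can be sketched without interpreting the length list or the ClassGlob syntax, neither of which is specified here. The sketch below only filters out blank and comment lines, keeps the first remaining line as an unparsed header, and splits each later line into a numeric weight and the remaining text. RawGrammar and splitGrammarSketch are hypothetical names, and returning Nothing for an empty file or a line without a leading number is an assumption.

```haskell
import Data.Char (isSpace)

-- | Hypothetical intermediate form; the real PhonoGrammar fields are not shown here.
data RawGrammar = RawGrammar
  { rawLengths :: String               -- header line, left unparsed
  , rawRules   :: [(Double, String)]   -- weight and unparsed constraint text
  } deriving (Eq, Show)

splitGrammarSketch :: String -> Maybe RawGrammar
splitGrammarSketch txt =
  case filter keep (lines txt) of
    []            -> Nothing                            -- no regular lines at all
    (hdr : rules) -> RawGrammar hdr <$> traverse rule rules
  where
    -- Drop blank lines and lines beginning with '#'.
    keep l = not (all isSpace l) && take 1 l /= "#"

    -- A rule line is a weight followed by the rest of the line.
    rule l = case reads l :: [(Double, String)] of
      [(w, rest)] -> Just (w, dropWhile isSpace rest)
      _           -> Nothing
```

A full parser would go on to decode the header into (Length,Int) pairs and each remaining string into a ClassGlob; this sketch stops before those steps.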

serGrammarRules :: [ClassGlob] -> Vec -> Text

Serialize a grammar without its length distribution.

serGrammar :: PhonoGrammar -> Text

Serialize a grammar including its length distribution.