concraft-0.9.1: Morphological disambiguation based on constrained CRFs

Safe Haskell: None

NLP.Concraft

Model

data Concraft

Concraft model data, covering both the guessing model and the disambiguation model.

Constructors

Concraft 

saveModel :: FilePath -> Concraft -> IO ()

Save model in a file. Data is compressed using the gzip format.

loadModel :: FilePath -> IO Concraft

Load model from a file.
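As an illustration of what gzip-compressed model storage can look like, here is a minimal sketch built on the `binary` and `zlib` packages. The names `saveGz` and `loadGz` are hypothetical, and the sketch assumes the stored value has a `Binary` instance; it is not necessarily how `saveModel` and `loadModel` are implemented.

```haskell
import qualified Codec.Compression.GZip as GZip
import           Data.Binary (Binary, decode, encode)
import qualified Data.ByteString.Lazy as BL

-- | Serialize a value, gzip-compress it and write it to a file.
-- (Hypothetical helper, not part of NLP.Concraft.)
saveGz :: Binary a => FilePath -> a -> IO ()
saveGz path = BL.writeFile path . GZip.compress . encode

-- | Read a file, gzip-decompress it and deserialize the value.
-- (Hypothetical helper, not part of NLP.Concraft.)
loadGz :: Binary a => FilePath -> IO a
loadGz path = decode . GZip.decompress <$> BL.readFile path
```

Both helpers work on lazy byte strings, so the file is read lazily by `loadGz`; the contents are only consumed once the decoded value is forced.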

Tagging

tag :: Word w => Concraft -> Sent w Tag -> [(Set Tag, Tag)]

Tag a sentence using the model. In your own code you will typically run your analysis function, translate its results into a container of sentences, evaluate tag on each sentence, and embed the tagging results into your own morphosyntactic structure.

The function returns guessing results as fst elements of the output pairs and disambiguation results as snd elements of the corresponding pairs.
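The last step described above, embedding the output pairs into a structure of your own, could be sketched as follows. `Tag` is replaced by a plain `String` stand-in, and the `Token` type and the `embed` function are hypothetical names introduced only for this illustration.

```haskell
import qualified Data.Set as S

-- Hypothetical stand-in for the library's Tag type.
type Tag = String

-- Hypothetical token of a user-defined morphosyntactic structure.
data Token = Token
  { orth    :: String     -- ^ Orthographic form of the word
  , guessed :: S.Set Tag  -- ^ Guessing result (fst of the output pair)
  , chosen  :: Tag        -- ^ Disambiguation result (snd of the output pair)
  } deriving (Eq, Show)

-- | Pair each word form with the corresponding tagging result.
-- zipWith stops at the shorter list, so a length mismatch between
-- the two inputs is silently truncated.
embed :: [String] -> [(S.Set Tag, Tag)] -> [Token]
embed = zipWith (\o (gs, t) -> Token o gs t)
```

In real code the second argument would be the result of calling tag on the sentence whose word forms are passed as the first argument.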

marginals :: Word w => Concraft -> Sent w Tag -> [WMap Tag]

Determine marginal probabilities corresponding to individual tags w.r.t. the disambiguation model. Since the guessing model is used first, the resulting weighted maps may contain tags not present in the input sentence.
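One common use of per-word marginals is to pick the most probable tag for each word. The sketch below assumes a weighted map can be viewed as an ordinary `Map` from tags to probabilities; the `WMap` and `Tag` types here are simplified stand-ins rather than the library's own definitions, and `bestTag` is a hypothetical helper.

```haskell
import           Data.List (maximumBy)
import qualified Data.Map.Strict as M
import           Data.Ord (comparing)

-- Hypothetical stand-ins for the library's types.
type Tag = String
type WMap a = M.Map a Double

-- | Most probable tag in a weighted map, or Nothing for an empty map.
bestTag :: WMap Tag -> Maybe (Tag, Double)
bestTag m
  | M.null m  = Nothing
  | otherwise = Just (maximumBy (comparing snd) (M.toList m))

-- | Most probable tag for each word of a sentence.
bestTags :: [WMap Tag] -> [Maybe (Tag, Double)]
bestTags = map bestTag
```

Because guessed tags may appear in the maps, the tag selected this way is not guaranteed to be one of the interpretations present in the input sentence.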

Training

train

Arguments

:: (Word w, FromJSON w, ToJSON w)
=> Tagset               Tagset
-> Int                  Number of guessed tags for each word
-> TrainConf            Guessing model training configuration
-> TrainConf            Disambiguation model training configuration
-> IO [Sent w Tag]      Training data
-> IO [Sent w Tag]      Evaluation data
-> IO Concraft

Train guessing and disambiguation models. No reanalysis will be performed.

reAnaTrain

Arguments

:: (Word w, FromJSON w, ToJSON w)
=> Tagset               Tagset
-> Analyse w Tag        Analysis function
-> Int                  Number of guessed tags for each word
-> TrainConf            Guessing model training configuration
-> TrainConf            Disambiguation model training configuration
-> IO [SentO w Tag]     Training data
-> IO [SentO w Tag]     Evaluation data
-> IO Concraft

Train guessing and disambiguation models after dataset reanalysis.

Pruning

prune :: Double -> Concraft -> Concraft

Prune the disambiguation model: discard model features whose values (in the log domain) have an absolute value lower than the given threshold.
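The pruning rule can be illustrated on a toy feature table. The sketch below assumes features are stored in a `Map` from a feature key to its log-domain value; the `Feature` type and the `pruneWeights` function are hypothetical and do not reflect the model's internal representation.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical feature key; the real model uses its own feature type.
type Feature = String

-- | Keep only the features whose absolute value reaches the threshold;
-- features with an absolute value below it are discarded.
pruneWeights :: Double -> M.Map Feature Double -> M.Map Feature Double
pruneWeights threshold = M.filter (\w -> abs w >= threshold)
```

With a threshold of 0.1, a feature valued -0.05 would be dropped, while features valued 0.5 or -0.2 would be kept, since only the magnitude matters and not the sign.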