concraft-0.9.2: Morphological disambiguation based on constrained CRFs

Safe Haskell: None

NLP.Concraft

Model

data Concraft

Concraft data.

Constructors

Concraft 

saveModel :: FilePath -> Concraft -> IO ()

Save model in a file. Data is compressed using the gzip format.

loadModel :: FilePath -> IO Concraft

Load model from a file.

Tagging

tag :: Word w => Concraft -> Sent w Tag -> [(Set Tag, Tag)]

Tag a sentence using the model. In your own code you will probably want to run your analysis function, translate its results into a container of Sentences, evaluate tag on each sentence, and embed the tagging results into your own morphosyntactic structure.

The function returns the guessing results as the fst elements of the output pairs and the disambiguation results as the snd elements of the corresponding pairs.
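The pairing convention can be illustrated with a minimal sketch that does not call the library. Here Tag is a hypothetical String stand-in for the library's tag type, and attach, guessedTags and chosenTags are illustrative helper names, not part of NLP.Concraft. The sketch also assumes that the result list holds one pair per input word, in sentence order.

```haskell
import qualified Data.Set as S

-- Hypothetical stand-in for the library's Tag type.
type Tag = String

-- | Pair each word with its tagging result, assuming one result per word
-- in sentence order (an assumption, not stated by the documentation).
attach :: [w] -> [(S.Set Tag, Tag)] -> [(w, S.Set Tag, Tag)]
attach ws rs = [ (w, guessed, chosen) | (w, (guessed, chosen)) <- zip ws rs ]

-- | Keep only the guessing results (the fst elements).
guessedTags :: [(S.Set Tag, Tag)] -> [S.Set Tag]
guessedTags = map fst

-- | Keep only the disambiguation results (the snd elements).
chosenTags :: [(S.Set Tag, Tag)] -> [Tag]
chosenTags = map snd
```

Something like attach could be the last step before embedding the results into your own sentence structure.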

marginals :: Word w => Concraft -> Sent w Tag -> [WMap Tag]

Determine the marginal probabilities of individual tags with respect to the disambiguation model. Since the guessing model is applied first, the resulting weighted maps for out-of-vocabulary (OOV) words may contain tags not present in the input sentence.
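A minimal sketch of how such per-word weighted maps might be consumed. The library's WMap type is not used here: it is approximated by a plain Map from tags to probabilities, which is an assumption about its shape. Tag, WMapSketch, mostProbable and unseenTags are hypothetical names introduced only for this illustration.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Hypothetical stand-ins; the real WMap representation may differ.
type Tag = String
type WMapSketch = M.Map Tag Double

-- | The tag with the highest marginal probability, if the map is non-empty.
mostProbable :: WMapSketch -> Maybe (Tag, Double)
mostProbable m
  | M.null m  = Nothing
  | otherwise = Just (maximumBy (comparing snd) (M.toList m))

-- | Tags that appear in the weighted map but not among the word's input
-- interpretations, as can happen for OOV words.
unseenTags :: S.Set Tag -> WMapSketch -> [Tag]
unseenTags input m = M.keys (M.filterWithKey (\t _ -> t `S.notMember` input) m)
```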

Training

train

Arguments

:: (Word w, FromJSON w, ToJSON w) 
=> Tagset

A morphosyntactic tagset to which Tags of the training and evaluation input data must correspond.

-> Int

The number of tags the guessing model should produce for a given OOV word. The guessing model is applied (see guessSent) to both the training and the evaluation input data before the disambiguation model is trained.

-> TrainConf

Training configuration for the guessing model.

-> TrainConf

Training configuration for the disambiguation model.

-> IO [Sent w Tag]

Training dataset. This IO action will be executed more than once, so consider using lazy IO if your dataset is large.

-> IO [Sent w Tag]

Evaluation dataset IO action. Consider using lazy IO if your dataset is big.

-> IO Concraft 

Train the Concraft model. No reanalysis of the input data will be performed.

The FromJSON and ToJSON instances are used to store the processed input data in temporary files on disk.
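Because the training dataset is passed as an IO action rather than as a list, it can be re-read each time the trainer needs it. The sketch below shows one way such an action might look. It assumes a hypothetical plain-text format (one token per line, sentences separated by blank lines) and uses [[String]] as a stand-in for [Sent w Tag]. The names splitSentences and readDataset are illustrative and not part of the library.

```haskell
-- | Split raw text into sentences, assuming one token per line and a blank
-- line between sentences (a hypothetical format chosen for this sketch).
splitSentences :: String -> [[String]]
splitSentences = filter (not . null) . go . lines
  where
    go [] = []
    go ls = let (sent, rest) = break null ls
            in  sent : go (drop 1 rest)

-- | A dataset action: every execution re-opens the file, and readFile reads
-- its contents lazily, so sentences are produced on demand.
readDataset :: FilePath -> IO [[String]]
readDataset path = splitSentences <$> readFile path
```

An action built this way can be executed repeatedly without the caller holding the whole dataset in memory between runs.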

reAnaTrain

Arguments

:: (Word w, FromJSON w, ToJSON w) 
=> Tagset

A morphosyntactic tagset to which Tags of the training and evaluation input data must correspond.

-> Analyse w Tag

Analysis function. It will be used to reanalyse the input dataset.

-> Int

The number of tags the guessing model should produce for a given OOV word. The guessing model is applied (see guessSent) to both the training and the evaluation input data before the disambiguation model is trained.

-> TrainConf

Training configuration for the guessing model.

-> TrainConf

Training configuration for the disambiguation model.

-> IO [SentO w Tag]

Training dataset. This IO action will be executed more than once, so consider using lazy IO if your dataset is large.

-> IO [SentO w Tag]

Evaluation dataset IO action. Consider using lazy IO if your dataset is big.

-> IO Concraft 

Train the Concraft model after dataset reanalysis.

The FromJSON and ToJSON instances are used to store the processed input data in temporary files on disk.

Pruning

prune :: Double -> Concraft -> Concraft

Prune the disambiguation model: discard model features whose absolute values (in the log domain) are lower than the given threshold.
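The thresholding rule can be sketched on a plain map of feature weights. This is only an illustration of the criterion, not the library's implementation: the internal model representation is not shown in this documentation, so Feature and pruneSketch are hypothetical names and the Map is an assumed stand-in.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in for a model feature identifier.
type Feature = String

-- | Keep only the features whose weight has an absolute value of at least
-- the threshold; features with smaller absolute values are discarded.
pruneSketch :: Double -> M.Map Feature Double -> M.Map Feature Double
pruneSketch threshold = M.filter (\v -> abs v >= threshold)
```

A larger threshold yields a smaller model, since more features are discarded.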