concraft-0.12.1: Morphological disambiguation based on constrained CRFs

Safe Haskell: None
Language: Haskell98

NLP.Concraft


Model

data Concraft Source #

The Concraft model data.

Constructors

Concraft 
Instances
Binary Concraft Source # 
Instance details

Defined in NLP.Concraft

Methods

put :: Concraft -> Put #

get :: Get Concraft #

putList :: [Concraft] -> Put #

saveModel :: FilePath -> Concraft -> IO () Source #

Save the model to a file. The data is compressed using the gzip format.

loadModel :: FilePath -> IO Concraft Source #

Load the model from a file.
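For illustration, a minimal sketch of round-tripping a model through a gzip-compressed file; the file name is arbitrary and the model value is assumed to be already trained:

import NLP.Concraft (Concraft, saveModel, loadModel)

-- Write a trained model to disk and read it back.
roundTrip :: Concraft -> IO Concraft
roundTrip model = do
  saveModel "concraft-model.gz" model
  loadModel "concraft-model.gz"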

Tagging

tag :: Word w => Concraft -> Sent w Tag -> [(Set Tag, Tag)] Source #

Tag a sentence using the model. In your own code you will typically run your analysis function first, translate its results into a container of Sentences, evaluate tag on each sentence, and embed the tagging results back into your own morphosyntactic structures.

The function returns guessing results as the fst elements of the output pairs and disambiguation results as the snd elements of the corresponding pairs.
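A hedged sketch of tagging an already analysed sentence and keeping only the disambiguated tag chosen for each word; the import path of the morphosyntax types (Sent, Word, Tag) is an assumption and may need adjusting:

import NLP.Concraft (Concraft, tag)
-- Assumed location of the Sent, Word and Tag types:
import NLP.Concraft.Morphosyntax (Sent, Word, Tag)

-- Discard the guessed tag sets (fst) and keep the chosen tags (snd).
chosenTags :: Word w => Concraft -> Sent w Tag -> [Tag]
chosenTags crf sent = map snd (tag crf sent)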

marginals :: Word w => Concraft -> Sent w Tag -> [WMap Tag] Source #

Determine the marginal probabilities of individual tags with respect to the disambiguation model. Since the guessing model is applied first, the weighted maps corresponding to OOV words may contain tags not present in the input sentence.
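A hedged sketch of picking the most probable tag for every word from the marginal distributions; unWMap is an assumed accessor name exposing a WMap as an ordinary Data.Map, and each map is assumed to be non-empty:

import qualified Data.Map as M
import           Data.List (maximumBy)
import           Data.Ord (comparing)
import           NLP.Concraft (Concraft, marginals)
-- Assumed location of the morphosyntax types and the WMap accessor:
import           NLP.Concraft.Morphosyntax (Sent, Word, Tag, unWMap)

-- For every word, take the tag with the highest marginal probability.
bestByMarginals :: Word w => Concraft -> Sent w Tag -> [Tag]
bestByMarginals crf sent =
  [ fst . maximumBy (comparing snd) . M.toList . unWMap $ wm
  | wm <- marginals crf sent ]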

Training

train Source #

Arguments

:: (Word w, FromJSON w, ToJSON w) 
=> Tagset

A morphosyntactic tagset to which Tags of the training and evaluation input data must correspond.

-> Int

The number of tags the guessing model is supposed to produce for a given OOV word. The guesser will be used (see guessSent) on both the training and evaluation input data prior to training the disambiguation model.

-> TrainConf

Training configuration for the guessing model.

-> TrainConf

Training configuration for the disambiguation model.

-> IO [Sent w Tag]

Training dataset. This IO action will be executed a couple of times, so consider using lazy IO if your dataset is big.

-> IO [Sent w Tag]

Evaluation dataset IO action. Consider using lazy IO if your dataset is big.

-> IO Concraft 

Train the Concraft model. No reanalysis of the input data will be performed.

The FromJSON and ToJSON instances are used to store the processed input data in temporary files on disk.
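A hedged sketch of a training run; myTagset, guessConf, disambConf and readDataset are placeholders the caller must supply (in the library source the two configurations come from the guesser and disambiguation sub-modules), and the OOV tag count of 10 is an arbitrary illustrative value:

-- All lower-case names below are placeholders, not library bindings.
trainAndSave :: IO ()
trainAndSave = do
  model <- train myTagset
                 10                         -- tags guessed per OOV word
                 guessConf                  -- guessing model TrainConf
                 disambConf                 -- disambiguation model TrainConf
                 (readDataset "train.json") -- IO [Sent w Tag], may run more than once
                 (readDataset "eval.json")  -- IO [Sent w Tag]
  saveModel "concraft.gz" model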

reAnaTrain Source #

Arguments

:: (Word w, FromJSON w, ToJSON w) 
=> Tagset

A morphosyntactic tagset to which Tags of the training and evaluation input data must correspond.

-> Analyse w Tag

Analysis function. It will be used to reanalyse the input dataset.

-> Int

The number of tags the guessing model is supposed to produce for a given OOV word. The guesser will be used (see guessSent) on both the training and evaluation input data prior to training the disambiguation model.

-> TrainConf

Training configuration for the guessing model.

-> TrainConf

Training configuration for the disambiguation model.

-> IO [SentO w Tag]

Training dataset. This IO action will be executed a couple of times, so consider using lazy IO if your dataset is big.

-> IO [SentO w Tag]

Evaluation dataset IO action. Consider using lazy IO if your dataset is big.

-> IO Concraft 

Train the Concraft model after dataset reanalysis.

The FromJSON and ToJSON instances are used to store the processed input data in temporary files on disk.
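The same sketch adapted to reanalysis; anaFun stands for your Analyse w Tag function, the datasets are SentO values carrying the original sentence text, and all lower-case names are placeholders rather than library bindings:

reAnaTrainAndSave :: IO ()
reAnaTrainAndSave = do
  model <- reAnaTrain myTagset
                      anaFun                      -- Analyse w Tag
                      10                          -- tags guessed per OOV word
                      guessConf                   -- guessing model TrainConf
                      disambConf                  -- disambiguation model TrainConf
                      (readDatasetO "train.json") -- IO [SentO w Tag]
                      (readDatasetO "eval.json")  -- IO [SentO w Tag]
  saveModel "concraft-reana.gz" model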

Pruning

prune :: Double -> Concraft -> Concraft Source #

Prune the disambiguation model: discard model features whose absolute values (in the log-domain) are lower than the given threshold.
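A small sketch of shrinking a saved model; the 0.05 threshold is an arbitrary illustrative value:

import NLP.Concraft (loadModel, saveModel, prune)

-- Load a model, drop weak disambiguation features, and save the result.
shrinkModel :: FilePath -> FilePath -> IO ()
shrinkModel inPath outPath = do
  model <- loadModel inPath
  saveModel outPath (prune 0.05 model)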