concraft-hr-0.1.0.2: Part-of-speech tagger for Croatian

Safe Haskell: None
Language: Haskell98

NLP.Concraft.Croatian

Model

data Concraft :: *

Concraft data.

saveModel :: FilePath -> Concraft -> IO ()

Save model in a file. Data is compressed using the gzip format.

loadModel :: FilePath -> IO Concraft

Load model from a file.

Tagging

tag :: Concraft -> Sent Word Tag -> [(Set Tag, Tag)]

Tag the analysed sentence. The result is a list of tuples (a, b), one for each word of the sentence, where a is the set of possible tags (guessed by the guessing model or produced by the morphosyntactic analyser) and b is the disambiguated tag.
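A minimal sketch of consuming a result shaped like the output of tag. It assumes Tag is a plain String (the library uses its own tag type) and uses illustrative tag strings; chosenTags and outsideCandidates are hypothetical helpers, not part of concraft-hr. Since the disambiguated tag is chosen from the set of possible tags, outsideCandidates would normally be expected to return an empty list.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Stand-in for the library's tag type (assumption: plain strings).
type Tag = String

-- Pair each word with its disambiguated tag, dropping the candidate set.
chosenTags :: [String] -> [(Set Tag, Tag)] -> [(String, Tag)]
chosenTags ws res = zip ws (map snd res)

-- Words whose disambiguated tag is not among their possible tags.
outsideCandidates :: [String] -> [(Set Tag, Tag)] -> [String]
outsideCandidates ws res =
  [ w | (w, (cands, t)) <- zip ws res, not (t `Set.member` cands) ]
```

The functions can be loaded in GHCi and applied to a hand-built list such as [(Set.fromList ["Npmsn", "Npmsa"], "Npmsn")].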

marginals :: Concraft -> Sent Word Tag -> Sent Word Tag

Tag the sentence with marginal probabilities. The resulting sentence assigns a probability to each tag in the set of possible tags of every word.
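To illustrate what the marginals of a single word carry, the sketch below assumes a hypothetical per-word representation: a list of (tag, probability) pairs with Tag as a String. This is not how the library's Sent type stores them. bestTag picks the most probable tag (max-marginal decoding), and isDistribution checks that the probabilities form a distribution.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

-- Stand-in for the library's tag type (assumption: plain strings).
type Tag = String

-- Hypothetical per-word marginals: each possible tag with its probability.
type Marginals = [(Tag, Double)]

-- Most probable tag of one word; assumes a non-empty list.
bestTag :: Marginals -> Tag
bestTag = fst . maximumBy (comparing snd)

-- Marginals of one word should be non-negative and sum to 1 (up to rounding).
isDistribution :: Marginals -> Bool
isDistribution ms = abs (sum (map snd ms) - 1) < 1e-9 && all ((>= 0) . snd) ms
```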

Training

data TrainConf

Training configuration.

Constructors

TrainConf 

Fields

tagset :: Tagset

Tagset.

gSgdArgs :: SgdArgs

SGD parameters for the guessing model.

dSgdArgs :: SgdArgs

SGD parameters for the disambiguation model.

reana :: Bool

Perform reanalysis.

onDisk :: Bool

Store SGD dataset on disk.

guessNum :: Int

Number of guessed tags for each word. The guessing model outputs the possible tags together with their probabilities; the guessNum most probable tags (after sorting in descending order of probability) form the whole set of possible tags.
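The selection described for guessNum can be sketched as follows, assuming the guesser's output is available as (tag, probability) pairs with Tag as a String. topGuesses is a hypothetical helper for illustration, not the library's internal code.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down (..))
import qualified Data.Set as Set
import Data.Set (Set)

-- Stand-in for the library's tag type (assumption: plain strings).
type Tag = String

-- Keep the n most probable guessed tags as the set of possible tags.
-- If fewer than n tags are guessed, all of them are kept.
topGuesses :: Int -> [(Tag, Double)] -> Set Tag
topGuesses n = Set.fromList . map fst . take n . sortBy (comparing (Down . snd))
```

For example, with guessNum = 2 and guesses A: 0.1, B: 0.5, C: 0.3, the resulting set is {B, C}.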

r0 :: R0T

r0T parameter.

train

Arguments

:: TrainConf

Training configuration

-> IO [Sent Word Tag]

Training data

-> IO [Sent Word Tag]

Evaluation data

-> IO Concraft

Trained model

Train the model. If evaluation data is supplied, the periodic report includes an optimistic accuracy of the model. It is optimistic in the sense that the correct tag and the set of possible tags given in the evaluation data are taken as the complete result of the analysis. In practice, the morphosyntactic analyser that provides the set of possible tags may not include the correct tag in that set, in which case the tagger cannot choose it; hence the reported accuracy is optimistic.
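The gap between optimistic and real accuracy can be made concrete by also measuring analyser coverage, the fraction of words whose correct tag appears in the analyser's set of possible tags. The sketch below uses a hypothetical Token record and String tags; it is not the evaluation code used by train. When the predicted tag always comes from the candidate set, accuracy on real analyser output cannot exceed coverage.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Stand-in for the library's tag type (assumption: plain strings).
type Tag = String

-- Hypothetical evaluation token: analyser candidates, gold tag, prediction.
data Token = Token
  { candidates :: Set Tag
  , gold       :: Tag
  , predicted  :: Tag
  }

-- Fraction of tokens satisfying a predicate (0 for an empty list).
ratio :: (Token -> Bool) -> [Token] -> Double
ratio p ts
  | null ts   = 0
  | otherwise = fromIntegral (length (filter p ts)) / fromIntegral (length ts)

-- Accuracy: the predicted tag equals the gold tag.
accuracy :: [Token] -> Double
accuracy = ratio (\t -> predicted t == gold t)

-- Coverage: the gold tag is among the analyser's candidates.
coverage :: [Token] -> Double
coverage = ratio (\t -> gold t `Set.member` candidates t)
```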

Pruning

prune :: Double -> Concraft -> Concraft

Prune disambiguation model: discard model features with absolute values (in log-domain) lower than the given threshold.
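The pruning rule can be sketched on a hypothetical feature-weight table that maps feature names to log-domain weights. The real disambiguation model stores its parameters differently; pruneWeights only mirrors the rule stated above: a feature is discarded when its absolute weight is lower than the threshold, so a weight exactly equal to the threshold is kept.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical feature table: feature name -> weight (log-domain).
type Weights = Map String Double

-- Discard features whose absolute weight is lower than the threshold.
pruneWeights :: Double -> Weights -> Weights
pruneWeights threshold = Map.filter (\w -> abs w >= threshold)
```

With a threshold of 0.1, weights 0.05, -0.2, 0.1 and 1.5 reduce to -0.2, 0.1 and 1.5: negative weights are kept or dropped by magnitude, not sign.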