concraft-pl-0.7.4: Morphological tagger for Polish

Safe Haskell: None
Language: Haskell98

NLP.Concraft.Polish

Model

data Concraft :: *

The Concraft model, as returned by train and loadModel.

saveModel :: FilePath -> Concraft -> IO ()

Save the model to a file. The data is compressed in the gzip format.

loadModel :: FilePath -> IO Concraft

Load model from a file.

Tagging

tag :: Concraft -> Sent Tag -> Sent Tag

Tag the analysed sentence.

marginals :: Concraft -> Sent Tag -> Sent Tag

Tag the sentence with marginal probabilities.
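The signatures above compose directly with macaPar (documented under Analysis). The following is a minimal sketch rather than the library's own driver code: tagParagraph and tagParagraphFrom are hypothetical helper names, and the MacaPool is assumed to have been created elsewhere. Type signatures are omitted because this page does not say which modules export Sent, Tag and MacaPool.

```haskell
import NLP.Concraft.Polish (loadModel, macaPar, tag)

-- Hypothetical helper: analyse a paragraph with Maca, then
-- disambiguate every sentence it returns with the given model.
-- `pool` is an existing MacaPool; `paragraph` is a Text value.
tagParagraph concraft pool paragraph =
    fmap (map (tag concraft)) (macaPar pool paragraph)

-- Hypothetical helper: the same, loading the model from a file first.
-- `modelPath` is a placeholder for the path of a previously saved model.
tagParagraphFrom modelPath pool paragraph = do
    concraft <- loadModel modelPath
    tagParagraph concraft pool paragraph
```

Since tag and marginals share the same type, replacing tag with marginals in tagParagraph should type-check unchanged. When tagging many paragraphs, loading the model once and reusing it through tagParagraph avoids re-reading the file on every call.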

Analysis

macaPar :: MacaPool -> Text -> IO [Sent Tag]

Analyse paragraph with Maca. The function is thread-safe. As a pre-processing step, all non-printable characters are removed from the input (based on empirical observations, Maca behaves likewise).
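To make the pre-processing step concrete, here is a minimal sketch of removing non-printable characters. It assumes that "non-printable" means whatever Data.Char.isPrint rejects; the predicate the library actually uses is not stated here. stripNonPrintable is a hypothetical name, and it works on String rather than Text so that it depends only on base.

```haskell
import Data.Char (isPrint)

-- Hypothetical illustration, not the library's implementation:
-- keep only the characters that Data.Char.isPrint accepts.
-- Control characters such as tab, newline and NUL are dropped;
-- ordinary spaces and letters (including non-ASCII ones) are kept.
stripNonPrintable :: String -> String
stripNonPrintable = filter isPrint
```

Under this assumption, stripNonPrintable "Ala\tma kota\n" gives "Alama kota": the tab and the newline are removed outright rather than replaced by spaces, so words separated only by such characters are joined.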

Training

data TrainConf

Training configuration.

Constructors

TrainConf 

Fields

tagset :: Tagset

Tagset.

sgdArgs :: SgdArgs

SGD parameters.

reana :: Bool

Perform reanalysis.

onDisk :: Bool

Store SGD dataset on disk.

guessNum :: Int

Number of guessed tags for each word.

r0 :: R0T

r0T parameter.

train
    :: TrainConf
    -> IO [SentO Tag]    -- Training data
    -> IO [SentO Tag]    -- Evaluation data
    -> IO Concraft

Train a Concraft model. TODO: It should be possible to supply the two training procedures with different SGD arguments.
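A typical use of train is to train a model, optionally prune it (see prune under Pruning), and save the result. The sketch below uses only functions documented on this page. trainPruneSave is a hypothetical helper name, and the caller is assumed to supply the TrainConf and the two IO actions that produce the training and evaluation data. The type signature is omitted because this page does not say which module exports SentO.

```haskell
import NLP.Concraft.Polish (train, prune, saveModel)

-- Hypothetical helper: train a model, prune it, and write it to disk.
--   conf      : a TrainConf built by the caller
--   trainData : IO action yielding the training sentences
--   evalData  : IO action yielding the evaluation sentences
--   threshold : pruning threshold passed to `prune`
--   path      : output file; `saveModel` gzip-compresses the data
trainPruneSave conf trainData evalData threshold path = do
    model <- train conf trainData evalData
    saveModel path (prune threshold model)
```

The datasets are passed as IO actions rather than plain lists because that is what the signature of train requires.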

Pruning

prune :: Double -> Concraft -> Concraft

Prune disambiguation model: discard model features with absolute values (in log-domain) lower than the given threshold.
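The pruning rule can be illustrated on a simplified stand-in for a model. This sketch assumes a model can be viewed as a list of named feature weights in the log domain; the Weights type and pruneWeights are hypothetical and do not reflect the internal representation of Concraft.

```haskell
-- Hypothetical stand-in for a model: feature names paired with
-- their log-domain weights.
type Weights = [(String, Double)]

-- Discard features whose absolute weight is lower than the threshold;
-- features with an absolute weight equal to the threshold are kept.
pruneWeights :: Double -> Weights -> Weights
pruneWeights threshold = filter (\(_, w) -> abs w >= threshold)
```

For instance, pruneWeights 0.1 [("a", 0.5), ("b", -0.05), ("c", -0.3)] keeps "a" and "c" and drops "b", whose weight is close to zero in the log domain.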