concraft-0.2.0: Morphosyntactic tagging tool based on constrained CRFs

Safe Haskell: None

NLP.Concraft.Guess

Documentation

type Ox t a = Ox (Word t) Text a

The Ox monad specialized to word token type and text observations. TODO: Move to monad-ox package from here and from the nerf library.

type Schema t a = Vector (Word t) -> Int -> Ox t a

A schema is a block of the Ox computation performed within the context of the sentence and the absolute sentence position.

type Ob = ([Int], Text)

An observation consists of an index (of list type) and an actual observation value.
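To illustrate the shape of an observation, here is a minimal, self-contained sketch. It uses String in place of Text so that it depends only on base. Reading the index as an identifier of the rule that produced the value is an assumption made for illustration, and the names ObS, lowerOrth and suffixOb are hypothetical, not part of this module.

```haskell
import Data.Char (toLower)

-- | Simplified stand-in for 'Ob' (String instead of Text).
type ObS = ([Int], String)

-- | Hypothetical rule 1: lowercased word form at relative offset k.
lowerOrth :: Int -> String -> ObS
lowerOrth k w = ([1, k], map toLower w)

-- | Hypothetical rule 2: suffix of length n (the whole word if it is shorter).
suffixOb :: Int -> String -> ObS
suffixOb n w = ([2, n], drop (length w - n) w)

main :: IO ()
main = do
    print (lowerOrth 0 "Cats")
    print (suffixOb 2 "Cats")
    -- Same value, different index: the two observations are distinct.
    print (lowerOrth 0 "ts" == suffixOb 2 "Cats")
```

Under this reading, the index keeps equal strings coming from different rules apart, so they do not collapse into a single feature.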

schematize :: Ord t => Sent t -> Sent Ob t

Schematize the input sentence according to the schema rules.
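The idea of running a schema at every sentence position can be sketched as follows. This is a simplified model under stated assumptions, not the implementation used here: it replaces Vector with a plain list, Text with String, and the Ox monad with a function that returns its observations directly. The names ObS, SchemaS, wordAt, windowSchema and schematizeS are hypothetical.

```haskell
import Data.Char (toLower)

-- | Simplified observation: index plus value (String instead of Text).
type ObS = ([Int], String)

-- | Simplified schema: sentence and absolute position to observations.
type SchemaS = [String] -> Int -> [ObS]

-- | Word at a position, or Nothing outside the sentence boundaries.
wordAt :: [String] -> Int -> Maybe String
wordAt ws i
    | i >= 0 && i < length ws = Just (ws !! i)
    | otherwise               = Nothing

-- | Hypothetical schema: lowercased forms of the previous and current word.
-- Offsets that fall outside the sentence yield no observation.
windowSchema :: SchemaS
windowSchema ws i =
    [ ([1, k], map toLower w)
    | k <- [-1, 0]
    , Just w <- [wordAt ws (i + k)] ]

-- | Apply the schema at every position of the sentence.
schematizeS :: SchemaS -> [String] -> [[ObS]]
schematizeS schema ws = [ schema ws i | i <- [0 .. length ws - 1] ]

main :: IO ()
main = mapM_ print (schematizeS windowSchema ["The", "cat"])
```

Because the schema receives the whole sentence together with the position, it can inspect neighbouring words and handle sentence boundaries itself.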

data Guesser t

A guesser represented by a conditional random field.

Constructors

Guesser 

Fields

crf :: CRF Ob t

The CRF model

ign :: t

The tag indicating unknown words

Instances

(Ord t, Binary t) => Binary (Guesser t) 

guess :: Ord t => Int -> Guesser t -> Sent t -> [[t]]

Determine the k most probable labels for each unknown word in the sentence.
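The selection step, keeping the k most probable labels per word, can be sketched in isolation. The sketch assumes that a per-word label distribution is already available as a list of (label, probability) pairs. How the CRF produces such scores is not shown, and the names topK and guessS are hypothetical.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing, Down(..))

-- | The k most probable labels, in order of decreasing probability.
topK :: Int -> [(t, Double)] -> [t]
topK k = map fst . take k . sortBy (comparing (Down . snd))

-- | Apply 'topK' to the label distribution of every word.
guessS :: Int -> [[(String, Double)]] -> [[String]]
guessS k = map (topK k)

main :: IO ()
main = print $ guessS 2
    [ [("subst", 0.6), ("adj", 0.3), ("verb", 0.1)]
    , [("verb", 0.7), ("subst", 0.3)] ]
```

A larger k gives later disambiguation more candidates to choose from, at the cost of more ambiguity per word.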

tagFile

Arguments

:: Int

Guesser argument (presumably the number k of labels to guess, as in guess)

-> Guesser Text

Guesser itself

-> FilePath

File to tag (plain format)

-> IO Text 

Tag the file.

learn

Arguments

:: SgdArgs

SGD parameters

-> Text

The tag indicating unknown words

-> FilePath

Train file (plain format)

-> Maybe FilePath

Optional evaluation file

-> IO (Guesser Text) 

TODO: Abstract over the format type.