concraft-0.2.0: Morphosyntactic tagging tool based on constrained CRFs

Safe Haskell: None

NLP.Concraft.Disamb

Documentation

data Tier

A tier description.

Constructors

Tier 

Fields

withPos :: Bool

Does it include the part of speech?

withAtts :: Set Attr

Tier grammatical attributes.

data Tag

A tag with optional POS.

Constructors

Tag 

Fields

pos :: Maybe POS
 
atts :: Map Attr Text
 

select :: Tier -> Tag -> Tag

Select tier attributes.
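
A minimal sketch of what selecting tier attributes could look like, assuming that POS and Attr are plain Text aliases (this page does not show their definitions) and that the function keeps the part of speech only when withPos is set and keeps only the attributes listed in withAtts. The types below are local stand-ins that mirror the documented fields; this is not the library's own code.

```haskell
import qualified Data.Map  as M
import qualified Data.Set  as S
import qualified Data.Text as T

-- Assumption: POS and Attr are treated as plain Text for this sketch.
type POS  = T.Text
type Attr = T.Text

-- Local stand-ins mirroring the documented record fields.
data Tier = Tier
  { withPos  :: Bool        -- does the tier include the part of speech?
  , withAtts :: S.Set Attr  -- grammatical attributes covered by the tier
  }

data Tag = Tag
  { pos  :: Maybe POS
  , atts :: M.Map Attr T.Text
  } deriving (Show, Eq, Ord)

-- Keep the POS only if the tier asks for it, and keep only those
-- attributes that belong to the tier.
select :: Tier -> Tag -> Tag
select tier tag = Tag
  { pos  = if withPos tier then pos tag else Nothing
  , atts = M.filterWithKey (\k _ -> k `S.member` withAtts tier) (atts tag)
  }
```

Under these assumptions, a tier with withPos = True and a single case attribute would reduce a full tag to its part of speech plus the case value.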

splitWord :: TierConf -> Word Tag -> Word (Tag, Tag)

Split tags between two layers. TODO: Add support for multiple layers.
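
The split between two layers can be sketched at the level of a single tag. The example below assumes that a TierConf is simply a pair of tiers and models a word's interpretations as a set of tags; both are hypothetical simplifications, since the real TierConf and Word types are not shown on this page. POS and Attr are again assumed to be Text.

```haskell
import qualified Data.Map  as M
import qualified Data.Set  as S
import qualified Data.Text as T

-- Assumption: POS and Attr are treated as plain Text for this sketch.
type POS  = T.Text
type Attr = T.Text

data Tier = Tier { withPos :: Bool, withAtts :: S.Set Attr }

data Tag = Tag { pos :: Maybe POS, atts :: M.Map Attr T.Text }
  deriving (Show, Eq, Ord)

-- Hypothetical stand-in: a two-layer configuration as a pair of tiers.
type TierConf = (Tier, Tier)

-- Project a tag onto one tier (same idea as the documented 'select').
select :: Tier -> Tag -> Tag
select tier tag = Tag
  { pos  = if withPos tier then pos tag else Nothing
  , atts = M.filterWithKey (\k _ -> k `S.member` withAtts tier) (atts tag)
  }

-- Split one tag into its first-layer and second-layer parts.
splitTag :: TierConf -> Tag -> (Tag, Tag)
splitTag (t1, t2) tag = (select t1 tag, select t2 tag)

-- Split every interpretation of a word; the set of tags is a
-- simplified stand-in for the library's Word type.
splitTags :: TierConf -> S.Set Tag -> S.Set (Tag, Tag)
splitTags cfg = S.map (splitTag cfg)
```

With a first tier holding the part of speech and case, and a second tier holding gender, each tag would become a pair whose halves can be disambiguated in separate layers.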

type Ox t a = Ox (Word t) Text a

The Ox monad specialized to word token type and text observations. TODO: Move to monad-ox package from here and from the nerf library.

type Schema t a = Vector (Word t) -> Int -> Ox t a

A schema is a block of the Ox computation performed within the context of the sentence and the absolute sentence position.

type Ob = ([Int], Text)

An observation consists of an index (of list type) and an actual observation value.
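
To illustrate the shape of an observation, here is a simplified sketch that emits lowercased word forms from a window around a sentence position. It uses plain lists instead of Vector and the Ox monad, and the index convention (a hypothetical template number followed by the relative offset) is an assumption made for this example, not the scheme the library uses.

```haskell
import qualified Data.Text as T

-- Same shape as the documented type: an index plus an observation value.
type Ob = ([Int], T.Text)

-- Safe positional lookup; Nothing when the position is out of range.
orthAt :: [T.Text] -> Int -> Maybe T.Text
orthAt ws i
  | i < 0     = Nothing
  | otherwise = case drop i ws of
      (w:_) -> Just w
      []    -> Nothing

-- Hypothetical schema block: lowercased forms at offsets -1, 0 and +1
-- relative to position k. Positions outside the sentence are skipped.
-- The index [0, off] marks template 0 and the relative offset.
observe :: [T.Text] -> Int -> [Ob]
observe ws k =
  [ ([0, off], T.toLower w)
  | off <- [-1, 0, 1]
  , Just w <- [orthAt ws (k + off)]
  ]
```

Keeping the offset in the index lets the same value observed at different relative positions count as distinct features.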

schematize :: Sent t -> Sent Ob t

Schematize the input sentence according to schema rules.

deTear :: TierConf -> Word Tag -> (Tag, Tag) -> Tag

Unsplit the list of tag pairs. TODO: It can be done without the help of the original word.

deTears :: TierConf -> Sent Tag -> [(Tag, Tag)] -> [Tag]
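
The unsplitting step can be sketched in two ways, matching the TODO above: by looking up which of the word's original tags splits into the chosen pair, or by merging the two halves directly. Both functions below are hypothetical illustrations under the same simplifying assumptions as before (POS and Attr as Text, TierConf as a pair of tiers, a word's interpretations as a list of tags); neither is taken from the library.

```haskell
import           Control.Applicative ((<|>))
import           Data.List (find)
import qualified Data.Map  as M
import qualified Data.Set  as S
import qualified Data.Text as T

-- Assumption: POS and Attr are treated as plain Text for this sketch.
type POS  = T.Text
type Attr = T.Text

data Tier = Tier { withPos :: Bool, withAtts :: S.Set Attr }

data Tag = Tag { pos :: Maybe POS, atts :: M.Map Attr T.Text }
  deriving (Show, Eq, Ord)

-- Hypothetical stand-in: a two-layer configuration as a pair of tiers.
type TierConf = (Tier, Tier)

select :: Tier -> Tag -> Tag
select tier tag = Tag
  { pos  = if withPos tier then pos tag else Nothing
  , atts = M.filterWithKey (\k _ -> k `S.member` withAtts tier) (atts tag)
  }

splitTag :: TierConf -> Tag -> (Tag, Tag)
splitTag (t1, t2) tag = (select t1 tag, select t2 tag)

-- Variant 1: use the word's original tags and return the first one
-- whose split equals the chosen pair.
unsplitVia :: TierConf -> [Tag] -> (Tag, Tag) -> Maybe Tag
unsplitVia cfg candidates pair = find ((== pair) . splitTag cfg) candidates

-- Variant 2: merge the halves without consulting the original word.
-- Prefers the first half on conflicts (POS and attribute values).
merge :: (Tag, Tag) -> Tag
merge (a, b) = Tag (pos a <|> pos b) (M.union (atts a) (atts b))
```

The direct merge only restores the original tag when the two tiers together cover its part of speech and every attribute; the lookup variant does not need that but requires the candidate tags.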

data Disamb

The disambiguation model.

disamb :: Disamb -> Sent Tag -> [Tag]

Determine the most probable label sequence.

tagFile

Arguments

:: Text        Tag indicating unknown words
-> Disamb
-> FilePath    File to tag (plain format)
-> IO Text

Tag the file.

learn

Arguments

:: SgdArgs           SGD parameters
-> FilePath          File with positional tagset definition
-> Text              The tag indicating unknown words
-> TierConf          Tiered tagging configuration
-> FilePath          Train file (plain format)
-> Maybe FilePath    Maybe eval file
-> IO Disamb

TODO: Abstract over the format type.