concraft-0.4.0: Morphosyntactic tagging tool based on constrained CRFs

Safe Haskell: None

NLP.Concraft.Schema

Description

Observation schema blocks for Concraft.

Types

type Ob = ([Int], Text)

An observation consists of an index (of list type) and an actual observation value.

type Ox t a = Ox (Word t) Text a

The Ox monad specialized to word token type and text observations.

type Schema t a = Vector (Word t) -> Int -> Ox t a

A schema is a block of the Ox computation performed within the context of the sentence and the absolute sentence position.

void :: a -> Schema t a

A dummy schema block.

sequenceS_ :: [Vector (Word t) -> a -> Ox t b] -> Vector (Word t) -> a -> Ox t ()

Sequence the list of schemas (or blocks) and discard individual values.

Usage

schematize :: Schema t a -> Sent t -> [[Ob]]

Use the schema to extract observations from the sentence.

Configuration

data Body a

Body of configuration entry.

Constructors

Body 

Fields

range :: [Int]

Range argument for the schema block.

oovOnly :: Bool

When true, the entry is used only for out-of-vocabulary (OOV) words.

args :: a

Additional arguments for the schema block.

Instances

Show a => Show (Body a) 
Binary a => Binary (Body a) 

type Entry a = Maybe (Body a)

An optional configuration entry.

entry :: [Int] -> Entry ()

Plain entry with no additional arguments.

entryWith :: a -> [Int] -> Entry a

Entry with additional arguments.
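The following is a minimal standalone sketch of how the Body, Entry, entry and entryWith declarations above fit together. It mirrors the documented types locally instead of importing Concraft. Two things are assumptions made for illustration: that plain entries set oovOnly to False, and the helper onlyOov, which is hypothetical and not part of the documented API.

```haskell
-- Standalone mirror of the documented types; nothing here is imported from Concraft.
data Body a = Body
    { range   :: [Int]   -- relative positions for the schema block
    , oovOnly :: Bool    -- restrict the entry to OOV words
    , args    :: a       -- additional arguments for the schema block
    } deriving (Show, Eq)

type Entry a = Maybe (Body a)

-- Assumption: an entry built this way applies to all words (oovOnly = False).
entryWith :: a -> [Int] -> Entry a
entryWith v xs = Just (Body xs False v)

-- A plain entry carries the unit value as its additional argument.
entry :: [Int] -> Entry ()
entry = entryWith ()

-- Hypothetical helper (not in the documented API): mark an entry as OOV-only.
onlyOov :: Entry a -> Entry a
onlyOov = fmap (\b -> b { oovOnly = True })

main :: IO ()
main = do
    print (entry [-1, 0, 1])
    print (onlyOov (entryWith [1, 2, 3] [0] :: Entry [Int]))
```

In this model, an entry for a prefix block would carry the prefix lengths in args and the positions in range, which matches the Entry [Int] type used by the prefix and suffix fields of the configuration.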

data SchemaConf

Configuration of the schema. Each configuration element specifies the range over which a particular observation type should be taken into account. For example, the range [-1, 0, 2] means that, when identifying the observation set for position k in the input sentence, observations of the given type are extracted with respect to the previous position (k - 1), the current position (k) and the position after the next (k + 2).
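To make the range semantics concrete, here is a small standalone sketch (it does not use Concraft) that resolves a relative range against a position k. The function name absolutePositions is hypothetical, and dropping positions that fall outside the sentence is an assumption of this sketch, not a statement about how the library treats sentence boundaries.

```haskell
-- Standalone illustration of relative ranges; not part of Concraft.

-- | Resolve relative offsets against position k in a sentence of n words.
-- Assumption: positions outside [0, n) are simply dropped.
absolutePositions :: Int -> Int -> [Int] -> [Int]
absolutePositions n k offsets =
    [ i | x <- offsets, let i = k + x, i >= 0, i < n ]

main :: IO ()
main = do
    print (absolutePositions 5 2 [-1, 0, 2])  -- [1,2,4]
    print (absolutePositions 5 0 [-1, 0, 2])  -- [0,2]
    print (absolutePositions 5 3 [-1, 0, 2])  -- [2,3]
```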

Constructors

SchemaConf 

Fields

orthC :: Entry ()

The orthB schema block.

lowOrthC :: Entry ()

The lowOrthB schema block.

lowPrefixesC :: Entry [Int]

The lowPrefixesB schema block. The list of ints given as the additional argument specifies the prefix lengths.

lowSuffixesC :: Entry [Int]

The lowSuffixesB schema block. The list of ints given as the additional argument specifies the suffix lengths.

knownC :: Entry ()

The knownB schema block.

shapeC :: Entry ()

The shapeB schema block.

packedC :: Entry ()

The packedB schema block.

begPackedC :: Entry ()

The begPackedB schema block.

nullConf :: SchemaConf

Null configuration of the observation schema.

fromConf :: SchemaConf -> Schema t ()

Build the schema based on the configuration.

guessConfDefault :: SchemaConf

Default configuration for the guessing observation schema.

disambConfDefault :: SchemaConf

Default configuration for the disambiguation observation schema.

Schema blocks

type Block t a = Vector (Word t) -> [Int] -> Ox t a

A block is a chunk of the Ox computation performed within the context of the sentence and the list of absolute sentence positions.

fromBlock :: Block t a -> [Int] -> Bool -> Schema t a

Transform a block to a schema depending on:

* a list of relative sentence positions,
* a boolean value; if true, the block computation will be performed only on positions where an OOV word resides.
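The relationship between a block (which sees absolute positions) and a schema (which sees one position k) can be sketched with a simplified standalone model. Here observations are plain lists of strings instead of Ox computations, and Token, orthBlock and fromBlockSketch are hypothetical names introduced only for this illustration. Skipping out-of-sentence positions is likewise an assumption.

```haskell
-- Simplified standalone model; the real Block and Schema run in the Ox monad.
data Token = Token { orth :: String, isOov :: Bool }

type Block  = [Token] -> [Int] -> [String]  -- sentence, absolute positions
type Schema = [Token] -> Int -> [String]    -- sentence, current position

-- | Toy block: orthographic forms at the given absolute positions.
-- Assumption: positions outside the sentence are skipped.
orthBlock :: Block
orthBlock sent ks = [ orth (sent !! k) | k <- ks, k >= 0, k < length sent ]

-- | Turn a block into a schema: shift the relative offsets by k and,
-- when the flag is set, produce nothing unless the word at k is OOV.
fromBlockSketch :: Block -> [Int] -> Bool -> Schema
fromBlockSketch blk offsets oovOnlyFlag sent k
    | oovOnlyFlag && not (isOov (sent !! k)) = []
    | otherwise = blk sent [ k + x | x <- offsets ]

example :: [Token]
example = [Token "The" False, Token "gloop" True, Token "sat" False]

main :: IO ()
main = do
    print (fromBlockSketch orthBlock [-1, 0] False example 1)  -- ["The","gloop"]
    print (fromBlockSketch orthBlock [-1, 0] True  example 0)  -- []
    print (fromBlockSketch orthBlock [-1, 0] True  example 1)  -- ["The","gloop"]
```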

orthB :: Block t ()

Orthographic form at the current position.

lowOrthB :: Block t ()

Lowercased orthographic form at the current position.

lowPrefixesB :: [Int] -> Block t ()

List of lowercased prefixes of given lengths.

lowSuffixesB :: [Int] -> Block t ()

List of lowercased suffixes of given lengths.
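As an illustration of what lowercased prefixes and suffixes of given lengths look like for a single word, here is a standalone sketch on plain strings. The names lowPrefixes and lowSuffixes are hypothetical, and the decision to skip lengths that are non-positive or longer than the word is an assumption of this sketch; the library may handle such lengths differently.

```haskell
import Data.Char (toLower)

-- Standalone illustration on String; not the library's implementation.

-- | Lowercased prefixes of the requested lengths.
-- Assumption: lengths that are non-positive or exceed the word are skipped.
lowPrefixes :: [Int] -> String -> [String]
lowPrefixes ns w = [ take n lw | n <- ns, n > 0, n <= length lw ]
  where
    lw = map toLower w

-- | Lowercased suffixes of the requested lengths, under the same assumption.
lowSuffixes :: [Int] -> String -> [String]
lowSuffixes ns w = [ drop (length lw - n) lw | n <- ns, n > 0, n <= length lw ]
  where
    lw = map toLower w

main :: IO ()
main = do
    print (lowPrefixes [1, 2, 3] "Tagging")  -- ["t","ta","tag"]
    print (lowSuffixes [1, 2, 3] "Tagging")  -- ["g","ng","ing"]
    print (lowPrefixes [2, 5] "Cat")         -- ["ca"]
```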

knownB :: Block t ()

Whether the word is known (i.e. not an OOV word).

shapeB :: Block t ()

Shape of the word.

packedB :: Block t ()

Packed shape of the word.
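The difference between a word shape and its packed form can be sketched as follows. This is a standalone illustration, not the library's code: the character classes ('u' for uppercase, 'l' for lowercase, 'd' for digit, 'x' for anything else) and the names shape and packedShape are assumptions chosen for the example. Packing is modelled here as collapsing each run of identical class symbols into one.

```haskell
import Data.Char (isDigit, isLower, isUpper)
import Data.List (group)

-- Standalone illustration; the class letters are assumed, not taken from Concraft.

-- | Map every character to a symbol for its character class.
shape :: String -> String
shape = map classify
  where
    classify c
        | isLower c = 'l'
        | isUpper c = 'u'
        | isDigit c = 'd'
        | otherwise = 'x'

-- | Collapse each run of identical class symbols into a single symbol.
packedShape :: String -> String
packedShape w = [ c | (c:_) <- group (shape w) ]

main :: IO ()
main = do
    putStrLn (shape "Warsaw2013")        -- ullllldddd
    putStrLn (packedShape "Warsaw2013")  -- uld
    putStrLn (packedShape "McDonald-2")  -- ululxd
```

Packing trades detail for generality: words of different lengths, such as "Warsaw" and "Rome", share one packed shape while their full shapes differ.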

begPackedB :: Block t ()

Packed shape of the word, combined with an indicator of whether the word is at the beginning of the sentence.