nerf-0.4.0: Nerf, the named entity recognition tool based on linear-chain CRFs

Safe Haskell: None

NLP.Nerf.Schema

Description

Observation schema blocks for Nerf.

Types

type Ox a = Ox Word Text a

The Ox monad specialized to Word tokens and Text observations.

type Schema a = Vector Word -> Int -> Ox a

A schema is a block of the Ox computation performed within the context of the sentence and the absolute sentence position.

void :: a -> Schema a

A dummy schema block.

sequenceS_ :: [Vector Word -> a -> Ox b] -> Vector Word -> a -> Ox ()

Sequence the list of schemas (or blocks) and discard individual values.
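The sequencing idea can be sketched without Nerf itself. In the sketch below, `Sent`, `Obs`, `obs` and `sequenceSketch_` are hypothetical stand-ins, not part of the library: the Ox monad is replaced by a writer-like pair from base that simply collects observation strings.

```haskell
-- Simplified stand-ins (assumptions, not Nerf's real types).
type Sent = [String]

-- Writer-like pair: the collected observations plus a result value.
type Obs a = ([String], a)

-- Record a single observation.
obs :: String -> Obs ()
obs x = ([x], ())

-- Run every schema (or block) on the same sentence and argument,
-- keeping the observations and discarding the individual results.
sequenceSketch_ :: [Sent -> k -> Obs b] -> Sent -> k -> Obs ()
sequenceSketch_ fs sent k = mapM_ (\f -> f sent k) fs
```

Because the same function works for any argument type `k`, it covers both schemas (a single position) and blocks (a list of positions), which matches the generality of the documented signature.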

Usage

schematize :: Schema a -> [Word] -> Sent Ob

Use the schema to extract observations from the sentence.
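As a rough illustration of what extracting observations from a sentence means, the sketch below applies a schema at every position of the sentence. All names here (`Sent`, `SchemaSketch`, `schematizeSketch`, `prevAndCurrent`) are hypothetical, and observations are plain strings rather than the library's `Ob` values.

```haskell
-- Simplified stand-ins (assumptions, not Nerf's real types).
type Sent = [String]
type SchemaSketch = Sent -> Int -> [String]

-- Apply the schema at each position k of the sentence, in order.
schematizeSketch :: SchemaSketch -> Sent -> [[String]]
schematizeSketch schema sent = [schema sent k | k <- [0 .. length sent - 1]]

-- Hypothetical schema: the words at positions k-1 and k,
-- silently skipping positions that fall outside the sentence.
prevAndCurrent :: SchemaSketch
prevAndCurrent sent k = [sent !! p | p <- [k - 1, k], p >= 0, p < length sent]
```

For example, `schematizeSketch prevAndCurrent ["New", "York"]` evaluates to `[["New"], ["New", "York"]]`.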

Configuration

data Body a

Body of configuration entry.

Constructors

Body 

Fields

range :: [Int]

Range argument for the schema block.

args :: a

Additional arguments for the schema block.

Instances

Show a => Show (Body a) 
Binary a => Binary (Body a) 

type Entry a = Maybe (Body a)

An optional configuration entry: either a Body or Nothing.

entry :: [Int] -> Entry ()

Plain entry with no additional arguments.

entryWith :: a -> [Int] -> Entry a

Entry with additional arguments.
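The configuration types are small enough to reconstruct as a sketch. The type definitions below follow the signatures documented here; the function bodies are an assumption that is merely consistent with those signatures, not a copy of the library source.

```haskell
-- Body of a configuration entry, mirroring the documented fields.
data Body a = Body
    { range :: [Int]  -- range argument for the schema block
    , args  :: a      -- additional arguments for the schema block
    } deriving (Show, Eq)

type Entry a = Maybe (Body a)

-- Assumed body: an entry whose additional argument is the unit value.
entry :: [Int] -> Entry ()
entry = entryWith ()

-- Assumed body: wrap the additional argument and the range in a Body.
entryWith :: a -> [Int] -> Entry a
entryWith v xs = Just (Body xs v)
```

With these definitions, `entryWith [1, 2] [0]` describes a block that takes `[1, 2]` as its additional argument and is evaluated over the range `[0]`.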

data SchemaConf

Configuration of the schema. Each configuration element specifies the range over which a particular observation type is taken into account. For example, the range [-1, 0, 2] means that, when building the observation set for position k in the input sentence, observations of that type are extracted from the previous position (k - 1), the current position (k) and the position after the next one (k + 2).
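The range arithmetic described above can be written down directly. The helper names `absolutePositions` and `inBounds` are hypothetical, and dropping positions that fall outside the sentence is an assumption made for this sketch rather than documented behaviour.

```haskell
-- Shift every relative offset in the range by the current position k.
absolutePositions :: [Int] -> Int -> [Int]
absolutePositions rng k = [k + x | x <- rng]

-- Assumption: positions outside a sentence of length n are ignored.
inBounds :: Int -> [Int] -> [Int]
inBounds n ps = [p | p <- ps, p >= 0, p < n]
```

For the range `[-1, 0, 2]` and `k = 3`, `absolutePositions` gives `[2, 3, 5]`; in a five-word sentence, `inBounds 5` keeps only `[2, 3]`.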

Constructors

SchemaConf 

Fields

orthC :: Entry ()

The orthB schema block.

splitOrthC :: Entry ()

The splitOrthB schema block.

lowPrefixesC :: Entry [Int]

The lowPrefixesB schema block. The additional argument (the first list of ints, stored in args) gives the prefix lengths.

lowSuffixesC :: Entry [Int]

The lowSuffixesB schema block. The additional argument (the first list of ints, stored in args) gives the suffix lengths.

lemmaC :: Entry Int

The lemmaB schema block.

shapeC :: Entry ()

The shapeB schema block.

packedC :: Entry ()

The packedB schema block.

shapePairC :: Entry ()

The shapePairB schema block.

packedPairC :: Entry ()

The packedPairB schema block.

dictC :: Entry [Dict]

Dictionaries of NEs (dictB schema block).

intTrigsC :: Entry Dict

Dictionary of internal triggers.

extTrigsC :: Entry Dict

Dictionary of external triggers.

nullConf :: SchemaConf

Null configuration of the observation schema.

defaultConf
    :: [Dict]          -- Named Entity dictionaries
    -> Maybe Dict      -- Dictionary of internal triggers
    -> Maybe Dict      -- Dictionary of external triggers
    -> IO SchemaConf

Default configuration of the observation schema.

fromConf :: SchemaConf -> Schema ()

Build the schema based on the configuration.

Schema blocks

type Block a = Vector Word -> [Int] -> Ox a

A block is a chunk of the Ox computation performed within the context of the sentence and the list of absolute sentence positions.

fromBlock :: Block a -> [Int] -> Schema a

Transform the block to the schema depending on the list of relative sentence positions.
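The relationship between blocks (absolute positions) and schemas (a single current position) can be sketched with simplified types. Everything below is a hypothetical stand-in: sentences are lists of strings, the Ox monad is dropped in favour of a plain result, and `fromBlockSketch` and `orthSketch` are illustrative names rather than the library's definitions.

```haskell
-- Simplified stand-ins (assumptions, not Nerf's real types).
type Sent = [String]
type BlockSketch a = Sent -> [Int] -> a
type SchemaSketch a = Sent -> Int -> a

-- Turn relative offsets into absolute positions around k,
-- then hand them to the block.
fromBlockSketch :: BlockSketch a -> [Int] -> SchemaSketch a
fromBlockSketch blk rng sent k = blk sent [k + x | x <- rng]

-- Hypothetical block: the orthographic forms at the given positions,
-- skipping positions outside the sentence.
orthSketch :: BlockSketch [String]
orthSketch sent ps = [sent !! p | p <- ps, p >= 0, p < length sent]
```

For instance, `fromBlockSketch orthSketch [-1, 0] ["New", "York", "City"] 1` evaluates to `["New", "York"]`.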

orthB :: Block ()

Orthographic form at the current position.

splitOrthB :: Block ()

Orthographic form split into two observations: the lowercased form and the original form (the latter only when it differs from the lowercased one).
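The split described above amounts to a small function on a single word. `splitOrthSketch` is a hypothetical name, and the sketch works on plain `String` values rather than on the library's word and observation types.

```haskell
import Data.Char (toLower)

-- Always emit the lowercased form; emit the original form as well
-- only when it differs from the lowercased one.
splitOrthSketch :: String -> [String]
splitOrthSketch w
    | w == low  = [low]
    | otherwise = [low, w]
  where
    low = map toLower w
```

So `splitOrthSketch "Paris"` yields `["paris", "Paris"]`, while `splitOrthSketch "city"` yields just `["city"]`.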

lowPrefixesB :: [Int] -> Block ()

List of lowercased prefixes of given lengths.

lowSuffixesB :: [Int] -> Block ()

List of lowercased suffixes of given lengths.
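For positive lengths, lowercased prefixes and suffixes can be sketched as follows. The names `lowPrefixesSketch` and `lowSuffixesSketch` are hypothetical, and skipping lengths longer than the word is an assumption of this sketch, not documented behaviour.

```haskell
import Data.Char (toLower)

-- Lowercased prefixes of the requested (positive) lengths.
-- Assumption: lengths that exceed the word length are skipped.
lowPrefixesSketch :: [Int] -> String -> [String]
lowPrefixesSketch ns w = [take n low | n <- ns, n > 0, n <= length low]
  where
    low = map toLower w

-- Lowercased suffixes of the requested (positive) lengths.
lowSuffixesSketch :: [Int] -> String -> [String]
lowSuffixesSketch ns w =
    [drop (length low - n) low | n <- ns, n > 0, n <= length low]
  where
    low = map toLower w
```

With lengths `[1, 2, 3]`, the word "Paris" gives the prefixes `["p", "pa", "par"]` and the suffixes `["s", "is", "ris"]`.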

lemmaB :: Int -> Block ()

Lemma substitute parametrized by the number specifying the span over which lowercased prefixes and suffixes will be saved. For example, lemmaB 2 will take affixes of lengths 0, -1 and -2 into account.
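One way to read the non-positive lengths in the example is sketched below. This is an assumption, not something this page states: a length of 0 is taken to mean the whole word, and a length of -n to mean the word with n characters removed from the far end. The names `prefixRel`, `suffixRel` and `lemmaAffixes` are hypothetical.

```haskell
import Data.Char (toLower)

-- Assumed convention: a positive n keeps the first n characters;
-- a non-positive n drops |n| characters from the end.
prefixRel :: Int -> String -> String
prefixRel n w
    | n > 0     = take n w
    | otherwise = take (length w + n) w

-- Mirror image: a positive n keeps the last n characters;
-- a non-positive n drops |n| characters from the start.
suffixRel :: Int -> String -> String
suffixRel n w
    | n > 0     = drop (length w - n) w
    | otherwise = drop (negate n) w

-- Lowercased (prefix, suffix) pairs for lengths 0, -1, ..., -n.
lemmaAffixes :: Int -> String -> [(String, String)]
lemmaAffixes n w =
    [(prefixRel k low, suffixRel k low) | k <- [0, -1 .. negate n]]
  where
    low = map toLower w
```

Under this reading, `lemmaAffixes 2 "Warsaw"` produces `[("warsaw", "warsaw"), ("warsa", "arsaw"), ("wars", "rsaw")]`, which approximates a lemma by trimming a few inflectional characters from either end.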

shapeB :: Block ()

Shape of the word.

packedB :: Block ()

Packed shape of the word.
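Word shape features are commonly built by mapping each character to a class and, for the packed variant, collapsing runs of identical classes. The sketch below follows that common recipe; the class letters ('u', 'l', 'd', 'x') and the names `charClass`, `shapeSketch` and `packedSketch` are assumptions of this sketch, not taken from the library.

```haskell
import Data.Char (isDigit, isLower, isUpper)
import Data.List (group)

-- Map a character to a coarse class (assumed class letters).
charClass :: Char -> Char
charClass c
    | isUpper c = 'u'
    | isLower c = 'l'
    | isDigit c = 'd'
    | otherwise = 'x'

-- Shape: one class letter per character.
shapeSketch :: String -> String
shapeSketch = map charClass

-- Packed shape: the shape with runs of equal letters collapsed to one.
packedSketch :: String -> String
packedSketch w = [c | (c : _) <- group (shapeSketch w)]
```

For "Warsaw2012" the shape is `"ullllldddd"` and the packed shape is `"uld"`, so the packed form generalizes over word length.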

shapePairB :: Block ()

Combined shapes of two consecutive words (at positions k-1 and k).

packedPairB :: Block ()

Combined packed shapes of two consecutive words (at positions k-1 and k).

dictB :: Dict -> Block ()

Plain dictionary search determined with respect to the list of relative positions.