crf-chain2-generic-0.1.1: Second-order, generic, constrained, linear conditional random fields

Safe Haskell None

Data.CRF.Chain2.Pair

Contents

Synopsis

# Data types

data Word a b

A word with `a` representing the observation type and `b` representing the compound label type.

Instances

 (Eq a, Eq b) => Eq (Word a b)
 (Ord a, Ord b) => Ord (Word a b)
 (Show a, Show b) => Show (Word a b)

mkWord :: Set a -> Set b -> Word a b

A word constructor which checks that the set of potential labels is non-empty.

type Sent a b = [Word a b]

data Dist a

A probability distribution defined over elements of type `a`. All elements not included in the map have probability equal to 0.

mkDist :: Ord a => [(a, Double)] -> Dist a

Construct the probability distribution.

type WordL a b = (Word a b, Dist b)

A WordL is a labeled word, i.e. a word with a probability distribution defined over its labels. We assume that every label from the distribution's domain is a member of the word's set of potential labels. TODO: Ensure the assumption using the smart constructor.

type SentL a b = [WordL a b]

A sentence of labeled words.
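As a sketch of how these pieces fit together (the concrete observation and label types below are hypothetical; the module keeps them abstract), a labeled sentence can be assembled with `mkWord` and `mkDist`. Here observations are `String`s and labels are (POS, chunk) pairs:

```haskell
import qualified Data.Set as S
import Data.CRF.Chain2.Pair

-- Hypothetical types: String observations, (POS, chunk) label pairs.
-- The word "the" with two potential label pairs; the distribution
-- puts all probability mass on ("DT", "B-NP").
theWord :: Word String (String, String)
theWord = mkWord (S.fromList ["the"])
                 (S.fromList [("DT", "B-NP"), ("DT", "I-NP")])

exampleSent :: SentL String (String, String)
exampleSent = [(theWord, mkDist [(("DT", "B-NP"), 1.0)])]
```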

# CRF

data CRF a b c Source

Constructors

 CRF

  Fields
    codec :: Codec a b c
    model :: Model Ob Lb Feat

Instances

 (Ord a, Ord b, Ord c, Binary a, Binary b, Binary c) => Binary (CRF a b c)

## Training

Arguments

train
  :: (Ord a, Ord b, Ord c)
  => SgdArgs                        -- Args for SGD
  -> IO [SentL a (b, c)]            -- Training data `IO` action
  -> Maybe (IO [SentL a (b, c)])    -- Maybe evaluation data
  -> IO (CRF a b c)                 -- Resulting codec and model

Train the CRF using the stochastic gradient descent method. When the evaluation data `IO` action is `Just`, the iterative training process will notify the user about the current accuracy on the evaluation part every full iteration over the training part. TODO: Add custom feature extraction function.
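A minimal training sketch, assuming the `sgd` package's `sgdArgsDefault` for default SGD settings and hypothetical `loadTrain`/`loadEval` `IO` actions supplied by the caller:

```haskell
import Numeric.SGD (sgdArgsDefault)   -- from the sgd package (assumed)
import Data.CRF.Chain2.Pair

-- Hypothetical loaders; how the labeled data is read is up to the caller.
trainModel
    :: IO [SentL String (String, String)]   -- loadTrain (hypothetical)
    -> IO [SentL String (String, String)]   -- loadEval (hypothetical)
    -> IO (CRF String String String)
trainModel loadTrain loadEval =
    train sgdArgsDefault loadTrain (Just loadEval)
```

Passing `Nothing` instead of `Just loadEval` skips the per-iteration accuracy report.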

## Tagging

tag :: (Ord a, Ord b, Ord c) => CRF a b c -> Sent a (b, c) -> [(b, c)]

Find the most probable label sequence.
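For instance (with the same hypothetical `String`-based types as above), tagging a one-word sentence built with `mkWord`, where `crf` is assumed to come from a prior call to `train`:

```haskell
import qualified Data.Set as S
import Data.CRF.Chain2.Pair

-- Returns the most probable (POS, chunk) pair for each word.
decodeOne :: CRF String String String -> [(String, String)]
decodeOne crf = tag crf
    [ mkWord (S.fromList ["dog"])
             (S.fromList [("NN", "I-NP"), ("VB", "B-VP")]) ]
```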