Safe Haskell	None

Data.CRF.Chain2.Pair

Contents

Data types
CRF
- Training
- Tagging

Data types

data Word a b

A word, where a is the observation type and b is the compound label type.

Instances

(Eq a, Eq b) => Eq (Word a b)
(Eq (Word a b), Ord a, Ord b) => Ord (Word a b)
(Show a, Show b) => Show (Word a b)

mkWord :: Set a -> Set b -> Word a b

A smart constructor for a word; it checks that the set of potential labels is non-empty.

type Sent a b = [Word a b]

data Dist a

A probability distribution defined over elements of type a. All elements not included in the map have probability equal to 0.

mkDist :: Ord a => [(a, Double)] -> Dist a

Construct a probability distribution from a list of (element, probability) pairs.
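The "probability 0 for anything not in the map" rule can be illustrated with a small stand-alone sketch. DemoDist, mkDemoDist and prob are hypothetical names, not part of this module, and the Map-backed representation is an assumption. Whether the library normalizes or merges duplicate entries is not stated here, so the sketch simply sums duplicates.

```haskell
import qualified Data.Map as M

-- Hypothetical stand-in for the library's Dist; the Map-backed
-- representation is an assumption made for this sketch only.
newtype DemoDist a = DemoDist (M.Map a Double)

-- Build a distribution from (element, probability) pairs.
-- Probabilities of repeated elements are summed (a choice of this sketch).
mkDemoDist :: Ord a => [(a, Double)] -> DemoDist a
mkDemoDist = DemoDist . M.fromListWith (+)

-- Probability of an element; anything absent from the map has probability 0.
prob :: Ord a => DemoDist a -> a -> Double
prob (DemoDist m) x = M.findWithDefault 0 x m

main :: IO ()
main = do
  let d = mkDemoDist [("N", 0.75), ("V", 0.25)]
  print (prob d "N")    -- prints 0.75
  print (prob d "ADJ")  -- unlisted element: prints 0.0
```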

type WordL a b = (Word a b, Dist b)

A WordL is a labeled word, i.e. a word with a probability distribution defined over its labels. We assume that every label in the domain of the distribution is a member of the word's set of potential labels. TODO: Enforce this assumption with a smart constructor.
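The assumption above could be enforced along the following lines. This is only a sketch of the idea, not the library's code: mkDemoWordL is a hypothetical name, and plain sets and maps stand in for the internals of Word and Dist, which are not documented here.

```haskell
import qualified Data.Map as M
import qualified Data.Set as S

-- Hypothetical smart constructor; not part of the documented API.
-- A Set of potential labels stands in for Word, a Map stands in for Dist.
-- Returns the pair unchanged when the label set is non-empty and every
-- label in the distribution's domain is a potential label; Nothing otherwise.
mkDemoWordL :: Ord b => S.Set b -> M.Map b Double -> Maybe (S.Set b, M.Map b Double)
mkDemoWordL labels dist
  | S.null labels                        = Nothing
  | M.keysSet dist `S.isSubsetOf` labels = Just (labels, dist)
  | otherwise                            = Nothing

main :: IO ()
main = do
  let labels = S.fromList ["N", "V"]
  -- "N" is a potential label, so the pair is accepted.
  print (mkDemoWordL labels (M.fromList [("N", 1.0)]))
  -- "ADJ" is not a potential label, so this prints Nothing.
  print (mkDemoWordL labels (M.fromList [("ADJ", 1.0)]))
```

Returning Maybe rather than calling error is a design choice of the sketch; it leaves the decision about how to handle inconsistent data to the caller.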

type SentL a b = [WordL a b]

A sentence of labeled words.

CRF

data CRF a b c

Constructors

CRF
Fields

codec :: Codec a b c
model :: Model Ob Lb Feat

Instances

(Ord a, Ord b, Ord c, Binary a, Binary b, Binary c) => Binary (CRF a b c)
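Because CRF has a Binary instance, a trained model can presumably be written to disk and read back with the generic functions from Data.Binary. The sketch below fixes all three type parameters to String for illustration; saveCRF and loadCRF are hypothetical helper names, and the module is imported qualified because it exports a type called Word, which would otherwise clash with the Prelude.

```haskell
import Data.Binary (decodeFile, encodeFile)
import qualified Data.CRF.Chain2.Pair as C

-- Hypothetical helper: serialize a model to the given file.
saveCRF :: FilePath -> C.CRF String String String -> IO ()
saveCRF = encodeFile

-- Hypothetical helper: read a model back. The result type must match
-- the type the model was saved with.
loadCRF :: FilePath -> IO (C.CRF String String String)
loadCRF = decodeFile
```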

Training

train

Arguments

:: (Ord a, Ord b, Ord c)
=> SgdArgs	Arguments for SGD
-> IO [SentL a (b, c)]	Training data `IO` action
-> Maybe (IO [SentL a (b, c)])	Optional evaluation data `IO` action
-> IO (CRF a b c)	Resulting codec and model

Train the CRF using stochastic gradient descent. When the evaluation data IO action is given (Just), the training process reports the current accuracy on the evaluation data after every full pass over the training data. TODO: Add a custom feature extraction function.
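A call to train might look roughly like this. The toy sentence, the helper labeled and the wrapper trainToy are invented for illustration. The SgdArgs value is taken as a parameter rather than constructed, since how to build one is not documented on this page (it is assumed to come from the SGD library this package depends on). The module is imported qualified because its Word type would otherwise clash with the Prelude.

```haskell
import qualified Data.Set as S
import qualified Data.CRF.Chain2.Pair as C

-- Hypothetical helper: a labeled word built from its observations, its
-- potential (b, c) labels, and the gold label, which gets probability 1.
-- The gold label must be one of the potential labels.
labeled :: [String] -> [(String, String)] -> (String, String)
        -> C.WordL String (String, String)
labeled obs potential gold =
  ( C.mkWord (S.fromList obs) (S.fromList potential)
  , C.mkDist [(gold, 1.0)] )

-- Invented toy training set with a single two-word sentence.
trainData :: [C.SentL String (String, String)]
trainData =
  [ [ labeled ["the"]  [("DET", "sg"), ("DET", "pl")]  ("DET", "sg")
    , labeled ["dogs"] [("N", "pl"), ("V", "sg")]      ("N", "pl")
    ]
  ]

-- Train without evaluation data (Nothing), so no accuracy is reported.
-- The signature is left to inference: sgdArgs has the SgdArgs type and
-- the result is IO (C.CRF String String String).
trainToy sgdArgs = C.train sgdArgs (return trainData) Nothing
```

Passing Just (return evalData) as the third argument instead of Nothing would request the per-iteration accuracy reports described above.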

Tagging

tag :: (Ord a, Ord b, Ord c) => CRF a b c -> Sent a (b, c) -> [(b, c)]

Find the most probable label sequence.
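As a usage sketch, tagging takes a trained model and an unlabeled sentence whose words carry their potential (b, c) labels. The sentence and the wrapper tagToy below are invented for illustration, and the model is assumed to have been trained with all three type parameters set to String.

```haskell
import qualified Data.Set as S
import qualified Data.CRF.Chain2.Pair as C

-- Invented two-word sentence; each word lists its observations and the
-- (b, c) label pairs the tagger may choose from.
sentence :: C.Sent String (String, String)
sentence =
  [ C.mkWord (S.fromList ["the"])  (S.fromList [("DET", "sg"), ("DET", "pl")])
  , C.mkWord (S.fromList ["dogs"]) (S.fromList [("N", "pl"), ("V", "sg")])
  ]

-- Hypothetical wrapper: the most probable label sequence for the sentence
-- under the given model.
tagToy :: C.CRF String String String -> [(String, String)]
tagToy crf = C.tag crf sentence
```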