crf-chain1-0.2.3: First-order, linear-chain conditional random fields

Data.CRF.Chain1

Description

The module provides first-order, linear-chain conditional random fields (CRFs).

An important feature of the implemented flavour of CRFs is that transition features which are not included in the CRF model are considered to have a probability of 0. This is particularly useful when the training material determines the set of possible label transitions (e.g. when using the IOB encoding method). Furthermore, this design decision makes the implementation much faster for sparse datasets.
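To illustrate how training material can determine the set of possible label transitions, here is a minimal sketch that collects the label bigrams seen in IOB-encoded data. The names `observedTransitions`, `trainingLabels` and `allowed` are hypothetical and not part of this package; the sketch only shows the idea, not how the library stores its features.

```haskell
import qualified Data.Set as S

-- | All label bigrams (previous label, current label) that occur in the
-- given label sequences. Hypothetical helper, not a library function.
observedTransitions :: Ord b => [[b]] -> S.Set (b, b)
observedTransitions = S.fromList . concatMap bigrams
  where
    bigrams ys = zip ys (drop 1 ys)

-- | Toy IOB-encoded label sequences (made up for illustration).
trainingLabels :: [[String]]
trainingLabels =
  [ ["B-PER", "I-PER", "O", "O"]
  , ["O", "B-PER", "O", "B-PER", "I-PER"]
  ]

-- | Transitions that a model built from this data could contain.
allowed :: S.Set (String, String)
allowed = observedTransitions trainingLabels
```

In this toy data the transition from "O" to "I-PER" never occurs, so under the convention described above it would be treated as impossible rather than merely unlikely.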


# Data types

type Word a = Set a

A Word is represented by a set of observations.

type Sent a = [Word a]

A sentence of words.
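As a sketch of how raw tokens might be turned into this representation, the block below mirrors the two type synonyms locally so that it compiles on its own. The extractor `observe`, the helper `toSent` and the observation strings are hypothetical choices. Note that the Prelude also exports a type called `Word`, hence the `hiding` clause.

```haskell
import Prelude hiding (Word)
import qualified Data.Set as S
import Data.Char (isUpper, toLower)

-- Local copies of the documented synonyms, so the sketch is self-contained.
type Word a = S.Set a
type Sent a = [Word a]

-- | Hypothetical observation extractor: the lowercased form of the token,
-- plus a flag when the token starts with an upper-case letter.
observe :: String -> Word String
observe tok = S.fromList $
    ("form=" ++ map toLower tok) : [ "capitalised" | startsUpper tok ]
  where
    startsUpper (c:_) = isUpper c
    startsUpper []    = False

-- | Turn a list of tokens into a sentence of observation sets.
toSent :: [String] -> Sent String
toSent = map observe
```

Because a word is a set, duplicate observations collapse and their order carries no meaning.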

data Dist a

A probability distribution defined over elements of type a. All elements not included in the map have probability equal to 0.

mkDist :: Ord a => [(a, Double)] -> Dist a

Construct the probability distribution.
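The "absent means probability 0" convention can be sketched with an ordinary map. The type and functions below are stand-ins written for illustration: `mkDistSketch` and `prob` are hypothetical names, and the documentation does not state whether the real `mkDist` normalises its input, so this sketch does not assume it.

```haskell
import qualified Data.Map as M

-- | Stand-in for the library's abstract distribution type
-- (hypothetical representation).
newtype Dist a = Dist (M.Map a Double)

-- | Stand-in constructor: duplicate keys have their weights summed.
-- No normalisation is performed in this sketch.
mkDistSketch :: Ord a => [(a, Double)] -> Dist a
mkDistSketch = Dist . M.fromListWith (+)

-- | Probability of an element; elements missing from the map get 0.
prob :: Ord a => Dist a -> a -> Double
prob (Dist m) x = M.findWithDefault 0 x m
```

For example, `prob (mkDistSketch [("B", 0.75), ("O", 0.25)]) "I"` evaluates to 0 because "I" is not a key of the map.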

type WordL a b = (Word a, Dist b)

A WordL is a labeled word, i.e. a word with a probability distribution defined over labels.

annotate :: Word a -> b -> WordL a b

Annotate the word with the label.

type SentL a b = [WordL a b]

A sentence of labeled words.

# CRF

data CRF a b

A conditional random field model with an additional codec used for data encoding.

Constructors

CRF

Fields

codec :: Codec a b

The codec is used to transform data into the internal representation, where each observation and each label is represented by a unique integer.

model :: Model

The actual model, which is a map from Features to potentials.

Instances

(Ord a, Ord b, Binary a, Binary b) => Binary (CRF a b)

Defined in Data.CRF.Chain1.Train

Methods

put :: CRF a b -> Put
get :: Get (CRF a b)
putList :: [CRF a b] -> Put

## Training

Arguments

 :: (Ord a, Ord b)
 => SgdArgs                      Args for SGD
 -> IO [SentL a b]               Training data IO action
 -> Maybe (b, IO [SentL a b])    Default label and evaluation data
 -> ([(Xs, Ys)] -> [Feature])    Feature selection
 -> IO (CRF a b)                 Resulting model

Train the CRF using the stochastic gradient descent method. The resulting model will contain the features extracted with the user-supplied extraction function. You can use the functions provided by the Data.CRF.Chain1.Feature.Present and Data.CRF.Chain1.Feature.Hidden modules for this purpose. When the evaluation data argument is a Just value, the iterative training process reports the current accuracy on the evaluation data after every full pass over the training data.

## Tagging

tag :: (Ord a, Ord b) => CRF a b -> Sent a -> [b]

Determine the most probable label sequence within the context of the given sentence using the model provided by the CRF.
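Finding the most probable label sequence in a linear-chain model is commonly done with Viterbi-style dynamic programming. The block below is a generic sketch of that technique, not the library's implementation: the function `viterbi`, the log-score tables and the toy values are all hypothetical. Entries missing from a table are scored as negative infinity, which corresponds to probability 0. The label list is assumed to be non-empty.

```haskell
import qualified Data.Map as M
import Data.List (maximumBy)
import Data.Ord (comparing)

type Label = String

-- | Log-score of an impossible event.
negInf :: Double
negInf = -1 / 0

-- | Best label sequence given transition log-scores and one table of
-- per-label log-scores for each position. Missing entries are impossible.
viterbi :: [Label] -> M.Map (Label, Label) Double -> [M.Map Label Double] -> [Label]
viterbi _ _ [] = []
viterbi labels trans (e0 : es) = reverse (snd best)
  where
    score m k = M.findWithDefault negInf k m
    -- For each label: best score of a path ending there, and that path reversed.
    start = [ (y, (score e0 y, [y])) | y <- labels ]
    step prev e =
      [ (y, maximumBy (comparing fst)
              [ (s + score trans (x, y) + score e y, y : path)
              | (x, (s, path)) <- prev ])
      | y <- labels ]
    final = foldl step start es
    best  = maximumBy (comparing fst) (map snd final)

-- Toy IOB-like setup: every transition is allowed except "O" -> "I".
iobLabels :: [Label]
iobLabels = ["B", "I", "O"]

iobTrans :: M.Map (Label, Label) Double
iobTrans = M.fromList
  [ ((x, y), 0) | x <- iobLabels, y <- iobLabels, (x, y) /= ("O", "I") ]

emissions :: [M.Map Label Double]
emissions =
  [ M.fromList [("O", 0), ("B", -5), ("I", -5)]
  , M.fromList [("I", 0), ("B", -1), ("O", -3)]
  ]
```

With these toy scores, `viterbi iobLabels iobTrans emissions` yields `["O","B"]`: the locally best pair "O" then "I" is ruled out because that transition is missing from the table.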

# Feature selection

hiddenFeats :: [(Xs, Ys)] -> [Feature]

Hidden Features of all types which can be constructed based on the dataset.

presentFeats :: [(Xs, Ys)] -> [Feature]

Features of all kinds which occur in the dataset.