alpino-tools-0.2.0: Alpino data manipulation tools

Stability: experimental
Maintainer: Daniël de Kok <me@danieldk.eu>
Safe Haskell: Safe-Inferred

Data.Alpino.Model

Description

Data structures and functions to modify and process training data for the Alpino parse disambiguation and fluency ranking components.

Since the training data follows a very general format, this module and its submodules should also be usable for other parsers and generators. Refer to the description of bsToTrainingInstance for details of the format.

Documentation

data FeatureValue

A feature and its corresponding value.

Constructors

FeatureValue 

data TrainingInstance

A training instance.

Constructors

TrainingInstance 

Fields

instanceType :: TrainingInstanceType

Type of training instance

instanceKey :: ByteString

Training instance identifier

instanceN :: ByteString

Parse/generation number

instanceScore :: Double

Quality score

instanceFeatures :: Features

Features

data TrainingInstanceType

Type of training instance (parsing or generation).

bestScore :: [TrainingInstance] -> Double

Find the highest score of a context.

bestScore' :: [TrainingInstance] -> Double

Find the highest score of a context (strict).
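
The documentation does not show how the two variants are implemented. As a rough illustration of what a lazy and a strict maximum can look like, the sketch below works on bare scores instead of training instances. The names bestLazy and bestStrict are hypothetical, and the starting value 0.0 assumes that scores are non-negative.

```haskell
import Data.List (foldl')

-- Hypothetical stand-ins operating on bare scores; not the library's code.
-- Assumes scores are non-negative, so 0.0 is a safe starting value.
bestLazy :: [Double] -> Double
bestLazy = foldl max 0.0      -- may build a chain of unevaluated thunks

bestStrict :: [Double] -> Double
bestStrict = foldl' max 0.0   -- forces the accumulator at every step
```

Both functions return the same result; the strict fold avoids accumulating thunks on long lists.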

bsToTrainingInstance :: ByteString -> Maybe TrainingInstance

Read a training instance from a ByteString.

The bytestring is assumed to contain five fields separated by the hash (#) character:

  1. An indicator for the type of training instance (P for parse disambiguation, G for fluency ranking).
  2. The identifier of the context (usually the identifier of a sentence or logical form).
  3. Parse/generation number.
  4. A quality score for this training instance.
  5. A list of features and values. List elements are separated by the vertical bar (|), and have the following form: value@feature
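
To make the five-field layout concrete, here is a minimal parsing sketch. It is not the library's implementation: the Instance record, parseFeature, and parseInstance are hypothetical names, and the sketch assumes that feature names contain neither # nor |, and that scores and values are numbers that Haskell's read accepts.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Read (readMaybe)

-- Hypothetical, simplified record; the library's own types differ.
data Instance = Instance
  { iType     :: Char                     -- 'P' or 'G'
  , iKey      :: B.ByteString             -- context identifier
  , iN        :: B.ByteString             -- parse/generation number
  , iScore    :: Double                   -- quality score
  , iFeatures :: [(B.ByteString, Double)] -- (feature, value) pairs
  } deriving (Show, Eq)

-- Parse one "value@feature" element; the value comes before the first '@'.
parseFeature :: B.ByteString -> Maybe (B.ByteString, Double)
parseFeature bs =
  case B.break (== '@') bs of
    (val, rest) | not (B.null rest) -> do
      v <- readMaybe (B.unpack val)
      return (B.drop 1 rest, v)
    _ -> Nothing

-- Parse a full line; Nothing if any field is malformed.
parseInstance :: B.ByteString -> Maybe Instance
parseInstance line =
  case B.split '#' line of
    [t, key, n, score, feats] | B.length t == 1, B.head t `elem` "PG" -> do
      s  <- readMaybe (B.unpack score)
      fs <- mapM parseFeature (B.split '|' feats)
      return (Instance (B.head t) key n s fs)
    _ -> Nothing
```

Under these assumptions, a line such as P#s1#1#0.75#1@f1|2.5@f2 describes a parse disambiguation instance for context s1 with two features.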

filterFeatures :: (Bool -> Bool) -> Set ByteString -> TrainingInstance -> TrainingInstance

Filter features by exact name. A modifier function is applied to the result of the membership test; for instance, passing the not function excludes the specified features instead of keeping them.
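
The role of the modifier argument can be illustrated with a small stand-alone sketch. It uses a hypothetical Feature type (a name and value pair) and a hypothetical filterByName function rather than the library's own types, so it only mirrors the behaviour described above.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in: a feature as a (name, value) pair.
type Feature = (String, Double)

-- Keep a feature when the (possibly modified) membership test holds.
filterByName :: (Bool -> Bool) -> Set.Set String -> [Feature] -> [Feature]
filterByName modifier names =
  filter (\(name, _) -> modifier (name `Set.member` names))

features :: [Feature]
features = [("f1", 1.0), ("f2", 2.0), ("f3", 3.0)]

kept, dropped :: [Feature]
kept    = filterByName id  (Set.fromList ["f1", "f3"]) features  -- keeps f1 and f3
dropped = filterByName not (Set.fromList ["f1", "f3"]) features  -- keeps only f2
```

Passing id keeps exactly the named features, while not inverts the selection.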

filterFeaturesFunctor :: (Bool -> Bool) -> Set ByteString -> TrainingInstance -> TrainingInstance

Filter features by their functor. A modifier function is applied to the result of the membership test; for instance, passing the not function excludes features with the specified functors instead of keeping them.
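
The page does not define how the functor of a feature is determined. The sketch below assumes that feature names look like Prolog terms, such as f1(a,b), and that the functor is the part before the first opening parenthesis. The names functor and filterByFunctor are hypothetical.

```haskell
import qualified Data.Set as Set

-- Assumption: the functor is the feature name up to the first '('.
functor :: String -> String
functor = takeWhile (/= '(')

-- Keep a feature name when the (possibly modified) functor test holds.
filterByFunctor :: (Bool -> Bool) -> Set.Set String -> [String] -> [String]
filterByFunctor modifier functors =
  filter (\name -> modifier (functor name `Set.member` functors))
```

Under this assumption, filtering on the functor f1 selects both f1(a,b) and a bare f1, regardless of arguments.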

randomSample :: MonadRandom m => Int -> [TrainingInstance] -> m [TrainingInstance]

Extract a random sample from a list of instances.

scoreToBinary :: [TrainingInstance] -> [TrainingInstance]

Convert the quality scores to binary scores. The instances with the highest quality score get score 1.0, other instances get score 0.0.
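
As a worked illustration of this conversion, the following sketch applies the same rule to a plain list of scores from one context. The function toBinary is a hypothetical stand-in, not the library's code.

```haskell
-- Hypothetical stand-in working on the bare scores of one context.
toBinary :: [Double] -> [Double]
toBinary [] = []
toBinary scores = map (\s -> if s == best then 1.0 else 0.0) scores
  where best = maximum scores
```

For the scores 0.5, 0.9, and 0.9, both instances tied at 0.9 receive 1.0 and the remaining instance receives 0.0.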

scoreToBinaryNorm :: [TrainingInstance] -> [TrainingInstance]

Divide a score of 1.0 uniformly over instances with the highest quality scores.
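
The uniform division can be sketched on a plain list of scores as follows. The function toBinaryNorm is a hypothetical stand-in: each instance that shares the highest score receives 1.0 divided by the number of such instances, and every other instance receives 0.0.

```haskell
-- Hypothetical stand-in working on the bare scores of one context.
toBinaryNorm :: [Double] -> [Double]
toBinaryNorm [] = []
toBinaryNorm scores = map (\s -> if s == best then share else 0.0) scores
  where
    best  = maximum scores
    nBest = length (filter (== best) scores)  -- instances tied for best
    share = 1.0 / fromIntegral nBest          -- each gets an equal part of 1.0
```

For the scores 0.5, 0.9, and 0.9, the two best instances each receive 0.5, so the scores of the context sum to 1.0.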

scoreToNorm :: [TrainingInstance] -> [TrainingInstance]

Normalize scores over all training instances.
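
The page does not spell out the normalization that is used. One common reading, assumed here, is that each score is divided by the sum of all scores so that the results add up to 1.0. The function normalize is a hypothetical stand-in that leaves the scores unchanged when their sum is zero.

```haskell
-- Assumption: normalization means dividing each score by the total.
normalize :: [Double] -> [Double]
normalize scores
  | total == 0 = scores               -- avoid division by zero
  | otherwise  = map (/ total) scores
  where total = sum scores
```

Under this assumption, the scores 1.0 and 3.0 become 0.25 and 0.75.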

trainingInstanceToBs :: TrainingInstance -> ByteString

Convert a training instance to a ByteString.