Stability | experimental |
---|---|
Maintainer | Daniël de Kok <me@danieldk.eu> |
Safe Haskell | Safe-Inferred |
Data.Alpino.Model
Description
Data structures and functions to modify and process training data for the Alpino parse disambiguation and fluency ranking components.
Since the training data follows a very general format, this module and its submodules should also be usable for other parsers and generators. Please refer to the description of bsToTrainingInstance for more information about the format that is used.
- data FeatureValue = FeatureValue { feature :: ByteString, value :: Double }
- data TrainingInstance = TrainingInstance { instanceType :: TrainingInstanceType, instanceKey :: ByteString, instanceN :: ByteString, instanceScore :: Double, instanceFeatures :: Features }
- data TrainingInstanceType
- bestScore :: [TrainingInstance] -> Double
- bestScore' :: [TrainingInstance] -> Double
- bsToTrainingInstance :: ByteString -> Maybe TrainingInstance
- filterFeatures :: (Bool -> Bool) -> Set ByteString -> TrainingInstance -> TrainingInstance
- filterFeaturesFunctor :: (Bool -> Bool) -> Set ByteString -> TrainingInstance -> TrainingInstance
- randomSample :: MonadRandom m => Int -> [TrainingInstance] -> m [TrainingInstance]
- scoreToBinary :: [TrainingInstance] -> [TrainingInstance]
- scoreToBinaryNorm :: [TrainingInstance] -> [TrainingInstance]
- scoreToNorm :: [TrainingInstance] -> [TrainingInstance]
- trainingInstanceToBs :: TrainingInstance -> ByteString
Documentation
data FeatureValue
A feature and its corresponding value.
Constructors
FeatureValue
Fields
- feature :: ByteString
- value :: Double
data TrainingInstance
A training instance.
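Constructors
TrainingInstance
Fields
- instanceType :: TrainingInstanceType
- instanceKey :: ByteString
- instanceN :: ByteString
- instanceScore :: Double
- instanceFeatures :: Features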
data TrainingInstanceType
Type of training instance (parsing or generation).
Constructors
ParsingInstance
GenerationInstance
bestScore :: [TrainingInstance] -> Double
Find the highest score of a context.
bestScore' :: [TrainingInstance] -> Double
Find the highest score of a context (strict).
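The difference between the lazy and the strict variant can be illustrated with a small sketch. It works on a plain list of scores instead of TrainingInstance values, and the names best and best' as well as the starting value 0.0 (which presumes non-negative scores) are assumptions made for illustration, not the module's actual definitions.

```haskell
import Data.List (foldl')

-- Hypothetical lazy variant: builds up a chain of unevaluated 'max'
-- applications that is only reduced when the result is demanded.
best :: [Double] -> Double
best = foldl max 0.0

-- Hypothetical strict variant: the accumulator is forced at every step,
-- so long contexts are processed in constant stack space.
best' :: [Double] -> Double
best' = foldl' max 0.0
```

Both functions return the same result; the strict version is usually preferable for large contexts.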
bsToTrainingInstance :: ByteString -> Maybe TrainingInstance
Read a training instance from a ByteString.
The bytestring is assumed to contain five fields separated by the hash (#) character:
- An indicator for the type of training instance (P for parse disambiguation, G for fluency ranking).
- The identifier of the context (usually the identifier of a sentence or logical form).
- Parse/generation number.
- A quality score for this training instance.
- A list of features and values. List elements are separated by the vertical bar (|), and have the following form: value@feature
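The five-field format described above could be read roughly as follows. This is a minimal sketch and not the module's implementation: the Instance record, parseInstance, parseFeature and readDouble are hypothetical stand-ins, and the sketch assumes that no field contains an additional hash character.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Read (readMaybe)

-- Hypothetical stand-in for TrainingInstance; the type indicator is kept
-- as a plain character ('P' or 'G').
data Instance = Instance
  { iType     :: Char
  , iKey      :: B.ByteString
  , iN        :: B.ByteString
  , iScore    :: Double
  , iFeatures :: [(B.ByteString, Double)]
  } deriving (Show, Eq)

-- Parse a decimal number, returning Nothing on malformed input.
readDouble :: B.ByteString -> Maybe Double
readDouble = readMaybe . B.unpack

-- Parse one "value@feature" element, splitting at the first '@'.
parseFeature :: B.ByteString -> Maybe (B.ByteString, Double)
parseFeature fv =
  case B.break (== '@') fv of
    (v, rest) | not (B.null rest) -> do
      val <- readDouble v
      Just (B.tail rest, val)
    _ -> Nothing

-- Split the line on '#' and require exactly five fields.
parseInstance :: B.ByteString -> Maybe Instance
parseInstance line =
  case B.split '#' line of
    [t, key, n, score, feats]
      | t == B.pack "P" || t == B.pack "G" ->
          Instance (B.head t) key n
            <$> readDouble score
            <*> mapM parseFeature (B.split '|' feats)
    _ -> Nothing
```

For example, the line P#s1#2#0.5#1@f1|2.5@f2(a) would be read as a parse disambiguation instance for context s1, parse number 2, with quality score 0.5 and two features.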
filterFeatures :: (Bool -> Bool) -> Set ByteString -> TrainingInstance -> TrainingInstance
Filter features by exact names. A modifier function can be applied; for instance, the not function would exclude the specified features.
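The role of the modifier can be sketched on a simplified representation. The sketch assumes that the modifier is applied to the result of a set membership test on the feature name; keepBy and the (name, value) pair representation are hypothetical and only serve as illustration.

```haskell
import qualified Data.Set as Set

-- Hypothetical simplification: a feature is a (name, value) pair.
type Feature = (String, Double)

-- Keep a feature when 'modifier' applied to the membership test is True.
keepBy :: (Bool -> Bool) -> Set.Set String -> [Feature] -> [Feature]
keepBy modifier names = filter (modifier . (`Set.member` names) . fst)
```

Under this assumption, keepBy id retains only the named features, while keepBy not removes them and retains everything else.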
filterFeaturesFunctor :: (Bool -> Bool) -> Set ByteString -> TrainingInstance -> TrainingInstance
Filter features by their functor. A modifier function can be applied; for instance, the not function would exclude the specified features.
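Filtering by functor can be sketched in the same way. The sketch assumes that features are written as Prolog-style terms and that the functor is the part of the feature name before the first opening parenthesis; this definition is an assumption, as are the helper names functor and keepByFunctor.

```haskell
import qualified Data.Set as Set

-- Assumed definition of a functor: everything before the first '('.
-- A feature name without arguments is its own functor.
functor :: String -> String
functor = takeWhile (/= '(')

-- Keep a (name, value) pair when 'modifier' applied to the membership
-- test on the name's functor is True.
keepByFunctor :: (Bool -> Bool) -> Set.Set String
              -> [(String, Double)] -> [(String, Double)]
keepByFunctor modifier functors =
  filter (modifier . (`Set.member` functors) . functor . fst)
```

With this reading, a single functor such as f2 selects every feature of the form f2(...), regardless of its arguments.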
randomSample :: MonadRandom m => Int -> [TrainingInstance] -> m [TrainingInstance]
Extract a random sample from a list of instances.
scoreToBinary :: [TrainingInstance] -> [TrainingInstance]
Convert the quality scores to binary scores. The instances with the highest quality score get score 1.0; all other instances get score 0.0.
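A minimal sketch of this conversion, operating on a plain list of scores for one context rather than on TrainingInstance values (toBinary is a hypothetical name):

```haskell
-- Map the highest score(s) of a context to 1.0 and all others to 0.0.
-- Ties for the highest score all receive 1.0.
toBinary :: [Double] -> [Double]
toBinary [] = []
toBinary scores = map (\s -> if s == best then 1.0 else 0.0) scores
  where
    best = maximum scores
```

For the scores 0.5, 0.9, 0.9 and 0.1 this yields 0.0, 1.0, 1.0 and 0.0.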
scoreToBinaryNorm :: [TrainingInstance] -> [TrainingInstance]
Divide a score of 1.0 uniformly over instances with the highest quality scores.
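The uniform division can be sketched as follows, again on a plain list of scores for one context (toBinaryNorm is a hypothetical name). Each of the k instances sharing the highest score receives 1/k, so the new scores of a context sum to 1.0.

```haskell
-- Give each best-scoring instance an equal share of 1.0; others get 0.0.
toBinaryNorm :: [Double] -> [Double]
toBinaryNorm [] = []
toBinaryNorm scores = map (\s -> if s == best then share else 0.0) scores
  where
    best  = maximum scores
    nBest = length (filter (== best) scores)
    share = 1.0 / fromIntegral nBest
```

For the scores 0.5, 0.9, 0.9 and 0.1 this yields 0.0, 0.5, 0.5 and 0.0.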
scoreToNorm :: [TrainingInstance] -> [TrainingInstance]
Normalize scores over all training instances.
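One plausible reading of this normalization is that each score is divided by the sum of all scores, so that the results sum to 1.0. The sketch below follows that assumption; toNorm is a hypothetical name, and leaving the scores unchanged when the sum is zero is a choice made here to avoid division by zero, not documented behaviour.

```haskell
-- Divide every score by the total, assuming sum-to-one normalization.
toNorm :: [Double] -> [Double]
toNorm scores
  | total == 0 = scores
  | otherwise  = map (/ total) scores
  where
    total = sum scores
```

For the scores 1.0, 1.0 and 2.0 this yields 0.25, 0.25 and 0.5.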
trainingInstanceToBs :: TrainingInstance -> ByteString
Convert a training instance to a ByteString.
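Serialization to the hash-separated format described under bsToTrainingInstance might look like the sketch below. renderInstance is a hypothetical helper that takes the five fields as separate arguments, and formatting numbers with show is an assumption; the module may format scores and values differently.

```haskell
import qualified Data.ByteString.Char8 as B

-- Join the five fields with '#'; features are rendered as value@feature
-- and joined with '|'.
renderInstance :: Char -> B.ByteString -> B.ByteString -> Double
               -> [(B.ByteString, Double)] -> B.ByteString
renderInstance t key n score feats =
  B.intercalate (B.pack "#")
    [ B.singleton t, key, n, B.pack (show score), renderFeatures feats ]
  where
    renderFeatures = B.intercalate (B.pack "|") . map renderFeature
    renderFeature (name, val) = B.concat [B.pack (show val), B.pack "@", name]
```

Note that show switches to scientific notation for very small or very large numbers, which a consumer of the output would need to accept.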