concraft-0.14.0: Morphological disambiguation based on constrained CRFs

Safe Haskell: None
Language: Haskell98

NLP.Concraft.DAG.Segmentation

Description

Baseline word-segmentation functions.

Documentation

data PathTyp

Which path type to search for: shortest (Min), longest (Max), or frequency-based (Freq, configured by a FreqConf).

Constructors

Min 
Max 
Freq FreqConf 

pickPath :: Word b => PathTyp -> DAG a b -> DAG a b

Select the shortest path (or the longest, depending on PathTyp) in the given DAG and remove all edges that are not on this path.
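The pruning step can be pictured with a minimal, self-contained sketch. It does not use concraft's DAG type: the edge map, the EdgeId synonym, and the prune function are hypothetical stand-ins, and the sketch assumes the set of edge IDs on the selected path is already known.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-in for an edge identifier (not concraft's EdgeID).
type EdgeId = Int

-- | Keep only the edges whose identifiers lie on the selected path.
prune :: S.Set EdgeId -> M.Map EdgeId lbl -> M.Map EdgeId lbl
prune onPath edges = M.restrictKeys edges onPath

-- Toy edge labels, made up for illustration.
demoEdges :: M.Map EdgeId String
demoEdges = M.fromList [(0, "a"), (1, "b"), (2, "ab")]

main :: IO ()
main = print (M.toList (prune (S.fromList [2]) demoEdges))
```

Here the path consists of edge 2 alone, so edges 0 and 1 are dropped.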

findPath :: Word b => PathTyp -> DAG a b -> Set EdgeID

Retrieve the edges that belong to the shortest or longest path in the given DAG, depending on the PathTyp argument.
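As an illustration of shortest/longest path search in a DAG, here is a minimal sketch that is independent of concraft. The Goal and Edge types and the bestPath function are hypothetical; the sketch assumes that path length is measured in number of edges and that nodes are numbered in topological order.

```haskell
import           Data.List (foldl', sortOn)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-ins; these are not concraft's types.
data Goal = Shortest | Longest deriving (Eq, Show)

-- | An edge: identifier, source node, target node.
data Edge = Edge { edgeId :: Int, src :: Int, dst :: Int }

-- | Edge IDs of a path from `start` to `end` with the fewest (Shortest)
-- or the most (Longest) edges. Assumes src < dst for every edge.
bestPath :: Goal -> Int -> Int -> [Edge] -> Maybe (S.Set Int)
bestPath goal start end edges = snd <$> M.lookup end best
  where
    -- For each reachable node: (edge count, edge IDs) of the best path so far.
    best = foldl' relax (M.singleton start (0 :: Int, S.empty)) (sortOn src edges)
    relax acc (Edge i u v) =
      case M.lookup u acc of
        Nothing        -> acc  -- u is not reachable from start
        Just (n, path) -> M.insertWith pick v (n + 1, S.insert i path) acc
    -- insertWith passes the new value first and the old value second.
    pick new old
      | better (fst new) (fst old) = new
      | otherwise                  = old
    better :: Int -> Int -> Bool
    better = case goal of
      Shortest -> (<)
      Longest  -> (>)

-- Toy DAG: 0 -> 1 -> 2 -> 3, plus a shortcut edge 0 -> 2.
demo :: [Edge]
demo = [Edge 0 0 1, Edge 1 1 2, Edge 2 0 2, Edge 3 2 3]

main :: IO ()
main = do
  print (S.toList <$> bestPath Shortest 0 3 demo)
  print (S.toList <$> bestPath Longest 0 3 demo)
```

Processing edges in order of their source node means every path into a node is settled before its outgoing edges are relaxed, which is why a single pass suffices.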

Frequencies

computeFreqs :: Word w => [Sent w t] -> Map Text (Int, Int)

Compute chosen/not-chosen counts of the individual orthographic forms in the DAGs. Only the ambiguous segments are taken into account.
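The shape of the resulting map can be sketched as follows. This is not concraft's implementation: the Obs type and countForms are hypothetical, the input is made-up toy data, and the sketch assumes the pair is ordered as (chosen, not chosen) and that only forms from ambiguous segments have been passed in.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as M
import           Data.Text (Text)

-- | Hypothetical observation: an orthographic form and whether the edge
-- carrying it was on the chosen segmentation path.
type Obs = (Text, Bool)

-- | Count, per form, how often it was chosen and how often it was not.
countForms :: [Obs] -> M.Map Text (Int, Int)
countForms = M.fromListWith add . map toCount
  where
    toCount (form, chosen) = (form, if chosen then (1, 0) else (0, 1))
    add (c1, n1) (c2, n2) = (c1 + c2, n1 + n2)

-- Made-up observations from two ambiguous segments.
demoObs :: [Obs]
demoObs =
  [ ("abc", True), ("ab", False), ("c", False)
  , ("abc", False), ("ab", True), ("c", True)
  ]

main :: IO ()
main = print (M.toList (countForms demoObs))
```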

data FreqConf

Configuration related to frequency-based path picking.

Constructors

FreqConf 

Fields

Ambiguity-related stats

computeAmbiStats :: Word w => AmbiCfg -> [Sent w t] -> AmbiStats

Compute:

* the number of tokens participating in ambiguities
* the total number of tokens
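The two counts can be sketched in a self-contained way. The Stats record and the stats function are hypothetical (the real AmbiStats fields are not shown on this page), and each token is reduced to a Bool flag that is True when the token takes part in a segmentation ambiguity.

```haskell
-- | Hypothetical result record; not concraft's AmbiStats.
data Stats = Stats
  { ambiguous :: Int  -- ^ tokens participating in ambiguities
  , total     :: Int  -- ^ all tokens
  } deriving (Eq, Show)

-- | Count ambiguous and total tokens over a list of sentences.
stats :: [[Bool]] -> Stats
stats sents = Stats { ambiguous = length (filter id flags), total = length flags }
  where
    flags = concat sents

-- Made-up input: two sentences with three and two tokens.
demoSents :: [[Bool]]
demoSents = [[False, True, True], [False, False]]

main :: IO ()
main = print (stats demoSents)
```

Dividing ambiguous by total then gives the proportion of tokens affected by segmentation ambiguity.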

data AmbiCfg

Configuration for computing ambiguity statistics (see computeAmbiStats).

Constructors

AmbiCfg 

Fields

Instances

Eq AmbiCfg
Ord AmbiCfg
Show AmbiCfg

data AmbiStats

Numbers of tokens.

Constructors

AmbiStats 

Fields