hist-pl-dawg-0.2.0: A generic, DAWG-based dictionary

Safe HaskellNone

NLP.HistPL.DAWG

Contents

Description

A DAWG-based dictionary with additional information assigned to lexical entries and word forms.

Synopsis

Rule

data Rule Source

A rule for translating a form into another form.

Constructors

Rule 

Fields

cut :: !Int

Number of characters to cut from the end of the form.

suffix :: !Text

A suffix to paste.

Instances

apply :: Rule -> Text -> TextSource

Apply the rule.

between :: Text -> Text -> RuleSource

Determine a rule which translates between two strings.

Entry

data Lex i a b Source

A dictionary entry consists of a Key and a Value.

Constructors

Lex 

Fields

lexKey :: Key i

Entry key.

lexVal :: Val a Text b

Entry value.

Instances

(Eq i, Eq a, Eq b) => Eq (Lex i a b) 
(Ord i, Ord a, Ord b) => Ord (Lex i a b) 
(Show i, Show a, Show b) => Show (Lex i a b) 

data Key i Source

A key of a dictionary entry.

Constructors

Key 

Fields

path :: Text

A path of the entry, i.e. DAWG key.

uid :: i

Unique identifier among entries with the same path.

Instances

Eq i => Eq (Key i) 
Ord i => Ord (Key i) 
Show i => Show (Key i) 

data Val a w b Source

A value of the entry.

Constructors

Val 

Fields

info :: a

Additional information assigned to the entry.

forms :: Map w b

A map of forms with additional info of type b assigned. Invariant: in case of a reverse dictionary (from word forms to base forms) this map should contain exactly one element (a base form and a corresonding information).

Instances

(Eq a, Eq w, Eq b) => Eq (Val a w b) 
(Ord a, Ord w, Ord b) => Ord (Val a w b) 
(Show a, Show w, Show b) => Show (Val a w b) 
(Ord w, Binary a, Binary w, Binary b) => Binary (Val a w b) 

Entry set

type LexSet i a b = Map (Key i) (Val a Text b)Source

A set of dictionary entries.

mkLexSet :: Ord i => [Lex i a b] -> LexSet i a bSource

Make lexical set from a list of entries.

unLexSet :: LexSet i a b -> [Lex i a b]Source

List lexical entries.

type Node i a b = Map i (Val a Rule b)Source

Actual values stored in automaton states contain all entry information but path.

decode :: Ord i => Text -> Node i a b -> LexSet i a bSource

Decode dictionary value given path.

DAWG

type DAWG i a b = DAWG Char Weight (Node i a b)Source

A dictionary parametrized over ID i, with info a for every (key, i) pair and info b for every (key, i, apply rule key) triple.

Initialization

type DAWG'Init i a b = DAWG Char (Node i a b)Source

A DAWG initialization structure (a dynamic DAWG).

empty :: Ord b => DAWG a b

Empty DAWG.

insert :: (Ord i, Ord a, Ord b) => (Text, i, a, Text, b) -> DAWG'Init i a b -> DAWG'Init i a bSource

Insert a (key, ID, entry info, form, entry/form info) into a DAWG'Init structure.

freeze :: DAWG a b -> DAWG a () b

Construct immutable version of the automaton.

Query

lookup :: Ord i => Text -> DAWG i a b -> LexSet i a bSource

Lookup the key in the dictionary.

submap :: Ord i => Text -> DAWG i a b -> DAWG i a bSource

Return the sub-dictionary containing all keys beginning with a prefix.

Weight

type Weight = Int

Weight of a node corresponds to the number of final states reachable from the node. Weight of an edge is a sum of weights of preceding nodes outgoing from the same parent node.

weigh :: DAWG a b c -> DAWG a Weight c

Compute node weights and store corresponding values in transition labels. Be aware, that the entire DAWG will be weighted, even when (because of the use of the submap function) only a part of the DAWG is currently selected.

size :: DAWG a Weight c -> Int

A number of distinct (key, value) pairs in the weighted DAWG.

index :: Text -> DAWG i a b -> Maybe IntSource

Position in a set of all dictionary entries with respect to the lexicographic order.

byIndex :: Int -> DAWG i a b -> Maybe TextSource

Find dictionary entry given its index with respect to the lexicographic order.

Conversion

fromList :: (Ord i, Ord a, Ord b) => [(Text, i, a, Text, b)] -> DAWG i a bSource

Make dictionary from a list of (key, ID, entry info, form, entry/form info) tuples.

toList :: (Ord i, Ord a, Ord b) => DAWG i a b -> [(Text, i, a, Text, b)]Source

Transform dictionary back into the list of (key, ID, key/ID info, elem, key/ID/elem info) tuples.

entries :: Ord i => DAWG i a b -> [Lex i a b]Source

List dictionary lexical entries.

revDAWG :: (Ord i, Ord a, Ord b) => DAWG i a b -> DAWG i a bSource

Reverse the dictionary.