hist-pl-fusion-0.3.0: Merging historical dictionary with PoliMorf

Safe HaskellNone

NLP.HistPL.Fusion

Contents

Synopsis

Rule

data Rule Source

A rule for translating a form into another form.

Constructors

Rule 

Fields

cut :: !Int

Number of characters to cut from the end of the form.

suffix :: !Text

A suffix to paste.

Instances

apply :: Rule -> Text -> TextSource

Apply the rule.

between :: Text -> Text -> RuleSource

Make a rule to translate between two strings.

Basic types

type UID = IntSource

Unique ID in historical dictionary.

type POS = TextSource

Part of speech.

type Word = TextSource

Word form.

type Base = TextSource

Base form.

type IsBase = BoolSource

Is the word form a base form?

Dictionary

Entry

data Lex i a b Source

A lexical entry.

Constructors

Lex 

Fields

lexKey :: LexKey i
 
lexElem :: LexElem a b
 

Instances

(Eq i, Eq a, Eq b) => Eq (Lex i a b) 
(Ord i, Ord a, Ord b) => Ord (Lex i a b) 
(Show i, Show a, Show b) => Show (Lex i a b) 

data LexKey i Source

Lexical entry dictionary key.

Constructors

LexKey 

Fields

key :: Text
 
uid :: i
 

Instances

Eq i => Eq (LexKey i) 
Ord i => Ord (LexKey i) 
Show i => Show (LexKey i) 

data LexElem a b Source

Lexical entry info.

Constructors

LexElem 

Fields

info :: a
 
forms :: Map Word b
 

Instances

(Eq a, Eq b) => Eq (LexElem a b) 
(Ord a, Ord b) => Ord (LexElem a b) 
(Show a, Show b) => Show (LexElem a b) 

type LexSet i a b = Map (LexKey i) (LexElem a b)Source

A set of lexical entries in a map form.

mkLexSet :: Ord i => [Lex i a b] -> LexSet i a bSource

Make lexical set from a list of entries.

unLexSet :: LexSet i a b -> [Lex i a b]Source

List lexical entries.

Dictionary

type Dict i a b = DAWG Char () (Map i (a, Map Rule b))Source

One-way dictionary parametrized over ID i, with info a for every (key, i) pair and info b for every (key, i, apply rule key) triple.

type BaseDict i a b = Dict i a bSource

Dictionary keys represent base forms and rules transform base forms to their corresponding word forms. Info a is assigned to every lexeme and info b to every word form.

type FormDict i a b = Dict i a bSource

Dictionary keys represent word forms and rules transform word forms to their corresponding base forms. Info a is assigned to every lexeme and info b to every word form.

mkDict :: (Ord i, Ord a, Ord b) => [(Text, i, a, Text, b)] -> Dict i a bSource

Make dictionary from a list of (key, ID, key/ID info, elem, key/ID/elem info) tuples.

unDict :: (Ord i, Ord a, Ord b) => Dict i a b -> [(Text, i, a, Text, b)]Source

Transform dictionary back into the list of (key, ID, key/ID info, elem, key/ID/elem info) tuples.

revDict :: (Ord i, Ord a, Ord b) => Dict i a b -> Dict i a bSource

Reverse the dictionary.

lookup :: Ord i => Text -> Dict i a b -> LexSet i a bSource

Lookup the key in the dictionary.

entries :: Ord i => Dict i a b -> [Lex i a b]Source

List dictionary lexical entries.

Bilateral

data Bila i a b Source

Bilateral dictionary.

Constructors

Bila 

Fields

baseDict :: BaseDict i a b
 
formDict :: FormDict i a b
 

Instances

(Eq i, Eq a, Eq b) => Eq (Bila i a b) 
(Ord i, Ord a, Ord b) => Ord (Bila i a b) 
(Show i, Show a, Show b) => Show (Bila i a b) 
(Ord i, Binary i, Binary a, Binary b) => Binary (Bila i a b) 

mkBila :: (Ord i, Ord a, Ord b) => [(Base, i, a, Word, b)] -> Bila i a bSource

Make bilateral dictionary from a list of (base form, ID, additional lexeme info, word form, additional word form info) tuples.

withForm :: Ord i => Bila i a b -> Word -> LexSet i a bSource

Identify entries which contain given word form.

Historical

type Hist = BaseDict UID (Set POS) IsBaseSource

Historical dictionary.

mkHist :: [BinEntry] -> HistSource

Construct historical dictionary.

type HLex = Lex UID (Set POS) IsBaseSource

Historical dictionary entry.

Contemporary

type Poli = Bila POS () ()Source

PoliMorf dictionary in a bilateral form.

type PLex = Lex POS () ()Source

PoliMorf dictionary entry.

type PLexSet = LexSet POS () ()Source

Set of PoliMorf dictionary entries.

mkPoli :: [Entry] -> PoliSource

Make bilateral dictionary from PoliMorf.

Correspondence

type Corresp = Poli -> HLex -> PLexSetSource

A function which determines entries from a bilateral dictionary corresponing to a given historical lexeme.

buildCorresp :: Core -> Filter -> Choice -> CorrespSource

Build Corresp function form individual components.

Components

type Core = Poli -> HLex -> [PLexSet]Source

We provide three component types, Core, Filter and Choice, which can be combined together using the buildCorresp function to construct a Corresp function. The first one, Core, is used to identify a list of potential sets of lexemes. It is natural to define the core function in such a way because the task of determining corresponding lexemes can be usually divided into a set of smaller tasks of the same purpose. For example, we may want to identify LexSets corresponding to individual word forms of the historical lexeme.

type Filter = HLex -> PLex -> BoolSource

Function which can be used to filter out lexemes which do not satisfy a particular predicate. For example, we may want to filter out lexemes with incompatible POS value.

type Choice = [PLexSet] -> PLexSetSource

The final choice of lexemes. Many different strategies can be used here -- sum of the sets, intersection, or voting.

byForms :: CoreSource

Identify LexSets corresponding to individual word forms of the historical lexeme using the withForm function.

posFilter :: FilterSource

Filter out lexemes with POS value incompatible with the set of POS values assigned to the historical lexeme.

sumChoice :: ChoiceSource

Sum of sets of lexemes.

Fusion

type Fused = BaseDict UID () CodeSource

Fused dictionary.

type FLex = Lex UID () CodeSource

Fused dictionary entry.

data Code Source

Code of word form origin.

Constructors

Orig

original (was already present in HLex)

Copy

a copy (from corresponding lexeme)

Instances

extend :: HLex -> PLexSet -> FLexSource

Extend lexeme with forms from the set of lexemes.

fuse :: Corresp -> Hist -> Poli -> FusedSource

Fuse the historical dictionary with bilateral contemporary dictionary using the given Corresp function to determine contemporary lexemes corresponding to individual lexemes from the historical dictionary.