polimorf-0.5.0: Working with the PoliMorf dictionary

Safe Haskell: None

Data.PoliMorf


Description

The module provides functionality for manipulating PoliMorf, the morphological dictionary for Polish. Apart from IO utilities, it provides a merge function which can be used to merge PoliMorf with other dictionary resources.


Core types

type Form = Text

A form.

type Base = Text

A base form.

type Tag = Text

A morphosyntactic tag.

type Cat = Text

A category.

data Entry

An entry from the PoliMorf dictionary.

Constructors

Entry 

Fields

form :: !Form
 
base :: !Base
 
tag :: !Tag
 
cat :: !Cat
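For orientation, the core declarations above might read roughly as follows in source. This is a reconstruction from the documented names and field types, assuming `Text` is `Data.Text.Text`; the `deriving` clause and the sample entry (including its tag and empty category) are illustrative assumptions, not the package's code.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Type synonyms as documented: every component is plain Text.
type Form = Text
type Base = Text
type Tag  = Text
type Cat  = Text

-- One dictionary entry; the bangs make every field strict.
-- ASSUMPTION: the deriving clause is added for this sketch only.
data Entry = Entry
  { form :: !Form
  , base :: !Base
  , tag  :: !Tag
  , cat  :: !Cat
  } deriving (Show, Eq, Ord)

-- Hypothetical sample: "kota" is an inflected form of the noun "kot".
-- The category is left empty because its real values are not documented here.
sampleEntry :: Entry
sampleEntry = Entry (T.pack "kota") (T.pack "kot") (T.pack "subst:sg:gen:m2") T.empty

main :: IO ()
main = print sampleEntry
```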
 

atomic :: Entry -> Bool

Is the entry an atomic one? More precisely, we treat all negative forms starting with "nie" and all superlatives starting with "naj" as non-atomic entries.
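A minimal sketch of the atomicity test described above. It assumes that negation is marked by a `neg` component and the superlative by a `sup` component in a colon-separated tag; the helper `hasTagPart` and the sample tags are hypothetical, and the library's actual check may differ.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

data Entry = Entry
  { form :: !Text, base :: !Text, tag :: !Text, cat :: !Text }
  deriving Show

-- Hypothetical helper. ASSUMPTION: tag components are separated by ':'.
hasTagPart :: Text -> Entry -> Bool
hasTagPart p e = p `elem` T.splitOn ":" (tag e)

-- Non-atomic: a negated form starting with "nie" or a superlative
-- starting with "naj"; everything else counts as atomic.
atomic :: Entry -> Bool
atomic e = not (negated || superlative)
  where
    negated     = hasTagPart "neg" e && "nie" `T.isPrefixOf` form e
    superlative = hasTagPart "sup" e && "naj" `T.isPrefixOf` form e

main :: IO ()
main = mapM_ (print . atomic)
  [ Entry "niepisanie" "pisać" "ger:sg:nom:n:imperf:neg" ""
  , Entry "najlepszy" "dobry" "adj:sg:nom:m1:sup" ""
  , Entry "kota" "kot" "subst:sg:gen:m2" "" ]
```

Checking the tag as well as the prefix keeps words such as an adjective that merely happens to begin with "nie" atomic.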

Parsing

readPoliMorf :: FilePath -> IO [Entry]

Read the PoliMorf dictionary from a file.

parsePoliMorf :: Text -> [Entry]

Parse the PoliMorf dictionary into a list of entries.
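A minimal sketch of the two parsing functions, assuming one entry per line with four tab-separated columns in the order form, base, tag, category. The column layout, the helper `parseLine`, and the decision to skip malformed lines are assumptions; the real parser may handle them differently.

```haskell
import Data.Maybe (mapMaybe)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

data Entry = Entry
  { form :: !Text, base :: !Text, tag :: !Text, cat :: !Text }
  deriving (Show, Eq)

-- Hypothetical helper. ASSUMPTION: columns are form, base, tag, category;
-- lines of any other shape yield Nothing and are skipped.
parseLine :: Text -> Maybe Entry
parseLine line = case T.splitOn (T.pack "\t") line of
  [f, b, t, c] -> Just (Entry f b t c)
  _            -> Nothing

parsePoliMorf :: Text -> [Entry]
parsePoliMorf = mapMaybe parseLine . T.lines

-- Data.Text.IO.readFile decodes using the system locale encoding.
readPoliMorf :: FilePath -> IO [Entry]
readPoliMorf path = parsePoliMorf <$> TIO.readFile path

main :: IO ()
main = mapM_ print (parsePoliMorf (T.pack "kota\tkot\tsubst:sg:gen:m2\tnoun\n"))
```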

Merging

data Rule

A rule for translating a form into another one.

Constructors

Rule 

Fields

cut :: !Int

Number of characters to cut from the end of the form.

suffix :: !Text

A suffix to paste.


apply :: Rule -> Text -> Text

Apply the rule to a form.
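Going by the field descriptions, applying a rule means dropping `cut` characters from the end of the form and appending `suffix`. A minimal sketch under that reading; the `deriving` clause is an assumption.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

data Rule = Rule
  { cut    :: !Int   -- number of characters to drop from the end of the form
  , suffix :: !Text  -- text appended afterwards
  } deriving (Show, Eq, Ord)

-- Drop `cut` trailing characters, then append `suffix`.
apply :: Rule -> Text -> Text
apply (Rule n s) x = T.dropEnd n x `T.append` s

main :: IO ()
main = do
  print (apply (Rule 1 T.empty) (T.pack "kota"))        -- "kot"
  print (apply (Rule 2 (T.pack "ies")) (T.pack "psa"))  -- "pies"
```

Storing a cut count plus a suffix rather than a full base form lets many different forms share the same small set of rules.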

toBase :: Entry -> Maybe Rule

Determine the rule needed to translate the form into its base form.
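One plausible way to derive such a rule is to keep the longest common prefix of the form and its base, cut the remainder of the form, and append the remainder of the base. The sketch below assumes this and uses `Data.Text.commonPrefixes`, which returns `Nothing` when the two texts share no prefix; whether the library returns `Nothing` in exactly that case is an assumption.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

data Rule = Rule { cut :: !Int, suffix :: !Text }
  deriving (Show, Eq, Ord)

data Entry = Entry
  { form :: !Text, base :: !Text, tag :: !Text, cat :: !Text }

apply :: Rule -> Text -> Text
apply (Rule n s) x = T.dropEnd n x `T.append` s

-- ASSUMPTION: the rule preserves the longest common prefix; no shared
-- prefix (e.g. a suppletive form) is reported as Nothing.
toBase :: Entry -> Maybe Rule
toBase e = do
  (_, formRest, baseRest) <- T.commonPrefixes (form e) (base e)
  return (Rule (T.length formRest) baseRest)

main :: IO ()
main = do
  let e = Entry (T.pack "psa") (T.pack "pies") T.empty T.empty
  print (toBase e)
  print (fmap (`apply` form e) (toBase e))  -- Just "pies"
```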

mkRuleMap :: [(Text, Text)] -> DAWG (Set Rule)

Make a rule map from a list of entries.
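A minimal sketch of a rule map built from pairs of texts, using `Data.Map` with `String` keys as a stand-in for the library's DAWG. The helper `ruleBetween` is hypothetical; unlike the `Maybe`-returning `toBase`, it is total and cuts the whole first text when the pair shares no prefix.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Text (Text)
import qualified Data.Text as T

data Rule = Rule { cut :: !Int, suffix :: !Text }
  deriving (Show, Eq, Ord)

-- Hypothetical helper: the rule rewriting x into y, keeping their
-- longest common prefix (of length k).
ruleBetween :: Text -> Text -> Rule
ruleBetween x y = Rule (T.length x - k) (T.drop k y)
  where k = length (takeWhile (uncurry (==)) (T.zip x y))

-- Data.Map stands in for the DAWG; rules for a repeated key are unioned.
mkRuleMap :: [(Text, Text)] -> M.Map String (S.Set Rule)
mkRuleMap xs = M.fromListWith S.union
  [ (T.unpack x, S.singleton (ruleBetween x y)) | (x, y) <- xs ]

main :: IO ()
main = print (mkRuleMap [(T.pack "psa", T.pack "pies"), (T.pack "kota", T.pack "kot")])
```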

type BaseMap = DAWG (Set Rule)

A map from forms to their possible base forms (there may be several, since a form may belong to multiple lexemes).

mkBaseMap :: [Entry] -> BaseMap

Make a BaseMap from a list of entries.

type FormMap = DAWG (Set Rule)

A map from base forms to all their potential forms.

mkFormMap :: [Entry] -> FormMap

Make a FormMap from a list of entries.
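Both maps can plausibly be expressed through a rule map: the BaseMap keys rules by form (form to base), the FormMap keys them by base (base to form). The sketch below assumes this, again with `Data.Map` standing in for the DAWG and a hypothetical `ruleBetween` helper. The ambiguous form "lata" (a form of the noun "lato" and of the verb "latać") shows why a form may map to several rules.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Text (Text)
import qualified Data.Text as T

data Entry = Entry
  { form :: !Text, base :: !Text, tag :: !Text, cat :: !Text }

data Rule = Rule { cut :: !Int, suffix :: !Text }
  deriving (Show, Eq, Ord)

-- Hypothetical helper: rule rewriting x into y via their common prefix.
ruleBetween :: Text -> Text -> Rule
ruleBetween x y = Rule (T.length x - k) (T.drop k y)
  where k = length (takeWhile (uncurry (==)) (T.zip x y))

-- Data.Map stands in for the DAWG.
mkRuleMap :: [(Text, Text)] -> M.Map String (S.Set Rule)
mkRuleMap xs = M.fromListWith S.union
  [ (T.unpack x, S.singleton (ruleBetween x y)) | (x, y) <- xs ]

-- Form -> rules leading to its base forms.
mkBaseMap :: [Entry] -> M.Map String (S.Set Rule)
mkBaseMap es = mkRuleMap [ (form e, base e) | e <- es ]

-- Base form -> rules leading to its inflected forms.
mkFormMap :: [Entry] -> M.Map String (S.Set Rule)
mkFormMap es = mkRuleMap [ (base e, form e) | e <- es ]

-- Tags and categories play no role here and are left empty.
entries :: [Entry]
entries =
  [ Entry (T.pack "lata") (T.pack "lato") T.empty T.empty
  , Entry (T.pack "lata") (T.pack "latać") T.empty T.empty ]

main :: IO ()
main = do
  print (M.lookup "lata" (mkBaseMap entries))   -- two rules: to "lato" and to "latać"
  print (M.lookup "latać" (mkFormMap entries))
```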

data RelCode

Reliability information: how a particular label was assigned to a particular word form.

Constructors

ByForm

Label assigned based on labels of other forms within the same lexeme

ByBase

Label assigned based on a lemma label

Exact

Label assigned in a direct manner
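If the constructors are declared in the order listed and `RelCode` derives `Ord` (both assumptions), then `ByForm < ByBase < Exact`, so the most reliable code can be picked with `max`. The hypothetical `combine` below uses that ordering to reconcile two label maps; the library may resolve such conflicts differently.

```haskell
import qualified Data.Map.Strict as M

-- ASSUMPTION: constructor order as documented, with a derived Ord.
data RelCode = ByForm | ByBase | Exact
  deriving (Show, Eq, Ord)

-- Hypothetical helper: when a label is reached in two ways, keep the
-- more reliable code.
combine :: Ord a => M.Map a RelCode -> M.Map a RelCode -> M.Map a RelCode
combine = M.unionWith max

main :: IO ()
main = print (combine (M.fromList [("noun", ByForm)])
                      (M.fromList [("noun", Exact), ("verb", ByBase)]))
```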

mergeWith :: Ord a => (String -> String -> a -> a) -> BaseMap -> DAWG (Set a) -> DAWG (Map a RelCode)

Merge the BaseMap with a dictionary resource which maps forms to sets of labels. Every label is assigned a RelCode which describes the relation between the label and the form. This is a generalized version of the merge function, with an additional function f x y y'label which can be used to determine the resulting labels for the form x given a "similar" form y and its original label y'label. There are three kinds of labels: Exact labels, assigned in a direct manner; ByBase labels, assigned to all forms whose base form has a label in the input dictionary; and ByForm labels, assigned to all forms which have a related form from the same lexeme with a label in the input dictionary.

merge :: Ord a => BaseMap -> DAWG (Set a) -> DAWG (Map a RelCode)

A specialized version of the mergeWith function which leaves the labels unchanged in the resulting DAWG.
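The merging described for mergeWith and merge can be sketched as follows, with `Data.Map` standing in for the DAWG. The details are assumptions: Exact labels are taken unchanged from the form itself, ByBase labels come from the form's base forms, ByForm labels come from other forms sharing a base (found by inverting the BaseMap), and when one label arises several ways the most reliable code wins. The helper `applyS` and the toy data are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as M
import qualified Data.Set as S
import Data.Text (Text)
import qualified Data.Text as T

data Rule = Rule { cut :: !Int, suffix :: !Text }
  deriving (Show, Eq, Ord)

-- ASSUMPTION: derived Ord gives ByForm < ByBase < Exact.
data RelCode = ByForm | ByBase | Exact
  deriving (Show, Eq, Ord)

type BaseMap = M.Map String (S.Set Rule)

-- Hypothetical helper: apply a rule to a String key.
applyS :: Rule -> String -> String
applyS (Rule n s) x = take (length x - n) x ++ T.unpack s

mergeWith :: Ord a => (String -> String -> a -> a) -> BaseMap
          -> M.Map String (S.Set a) -> M.Map String (M.Map a RelCode)
mergeWith f baseMap dict = M.fromSet labelsFor keys
  where
    keys = M.keysSet baseMap `S.union` M.keysSet dict
    -- Inverted BaseMap: base form -> all forms of its lexeme.
    formsOf = M.fromListWith S.union
      [ (applyS r x, S.singleton x) | (x, rs) <- M.toList baseMap, r <- S.toList rs ]
    basesOf x = maybe [] (map (`applyS` x) . S.toList) (M.lookup x baseMap)
    labelsOf y = maybe [] S.toList (M.lookup y dict)
    -- Keep the most reliable code when a label is reached several ways.
    labelsFor x = M.unionsWith max
      [ M.fromList [ (l, Exact) | l <- labelsOf x ]
      , M.fromList [ (f x b l, ByBase) | b <- basesOf x, l <- labelsOf b ]
      , M.fromList [ (f x y l, ByForm)
                   | b <- basesOf x
                   , y <- maybe [] S.toList (M.lookup b formsOf)
                   , y /= x
                   , l <- labelsOf y ] ]

-- Labels pass through unchanged.
merge :: Ord a => BaseMap -> M.Map String (S.Set a) -> M.Map String (M.Map a RelCode)
merge = mergeWith (\_ _ l -> l)

-- Toy data: the lexeme "kot" with forms "kota" and "kotu".
exampleBaseMap :: BaseMap
exampleBaseMap = M.fromList
  [ ("kot", S.singleton (Rule 0 "")), ("kota", S.singleton (Rule 1 ""))
  , ("kotu", S.singleton (Rule 1 "")) ]

exampleDict :: M.Map String (S.Set String)
exampleDict = M.fromList [("kot", S.singleton "animal"), ("kotu", S.singleton "dative")]

main :: IO ()
main = mapM_ print (M.toList (merge exampleBaseMap exampleDict))
```

Here "kota" has no label of its own, so it receives "animal" as ByBase (from its base "kot") and "dative" as ByForm (from its sibling "kotu").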