polimorf-0.4.1: Working with the PoliMorf dictionary

Safe Haskell: None

Data.PoliMorf

Description

This module provides functionality for manipulating PoliMorf, the morphological dictionary for Polish. Apart from IO utilities, it offers a merge function which can be used to merge PoliMorf with other dictionary resources.

Types

type Form = Text

A form.

type Base = Text

A base form.

type Tag = Text

A morphosyntactic tag.

type Cat = Text

A category.

data Entry

An entry from the PoliMorf dictionary.

Constructors

Entry 

Fields

form :: !Form
 
base :: !Base
 
tag :: !Tag
 
cat :: !Cat
 

Parsing

readPoliMorf :: FilePath -> IO [Entry]

Read the PoliMorf dictionary from a file.

parsePoliMorf :: Text -> [Entry]

Parse the PoliMorf dictionary into a list of entries.

Merging

type BaseMap = Map Form (Set Base)

A map from forms to their possible base forms. A form may have several base forms, since it may belong to multiple lexemes.

mkBaseMap :: [Entry] -> BaseMap

Make the base map from the list of entries.
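The shape of such a map can be illustrated with a minimal, self-contained sketch. It does not use the library and is not its implementation: MiniEntry, mkBaseMapSketch and the sample data are hypothetical names introduced here, and the sketch assumes only that each entry contributes one (form, base) pair.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map.Strict as M
import qualified Data.Set as S
import           Data.Text (Text)

type Form = Text
type Base = Text

-- Hypothetical, reduced entry: only the two fields the base map needs.
data MiniEntry = MiniEntry
    { form :: !Form
    , base :: !Base }

type BaseMap = M.Map Form (S.Set Base)

-- Collect, for every form, the set of base forms it occurs with.
-- Entries sharing a form have their singleton sets unioned.
mkBaseMapSketch :: [MiniEntry] -> BaseMap
mkBaseMapSketch es = M.fromListWith S.union
    [ (form e, S.singleton (base e)) | e <- es ]

-- Illustrative data: each form here belongs to two lexemes.
sample :: [MiniEntry]
sample =
    [ MiniEntry "mam"  "mieć"
    , MiniEntry "mam"  "mama"
    , MiniEntry "mamy" "mama"
    , MiniEntry "mamy" "mieć" ]

main :: IO ()
main = print (M.toList (S.toList <$> mkBaseMapSketch sample))
```

Using fromListWith with set union keeps all base forms of an ambiguous form instead of letting a later entry overwrite an earlier one.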

data RelCode

Reliability information: how a particular label was assigned to a particular word form.

Constructors

Exact

Label assigned in a direct manner

ByBase

Label assigned based on a lemma label

ByForm

Label assigned based on labels of other forms within the same lexeme

merge :: Ord a => BaseMap -> Map Form (Set a) -> Map Form (Map a RelCode)

Merge the BaseMap with a dictionary resource which maps forms to sets of labels. Every label is assigned a RelCode which describes the relation between the label and the form. There are three kinds of labels: Exact labels, assigned in a direct manner; ByBase labels, assigned to all forms which have a base form with a label in the input dictionary; and ByForm labels, assigned to all forms which have a related form from the same lexeme with a label in the input dictionary.

This function is currently far from memory efficient. If you plan to run it on the entire PoliMorf dictionary, do so on a machine with plenty of available memory.
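The three relation codes can be made concrete with a small, self-contained sketch. It is an illustration of the description above, not the library's implementation: mergeSketch, setAt and the sample data are hypothetical, and two choices are assumptions of this sketch rather than documented behaviour, namely that a label reachable in several ways keeps its most direct code (Exact before ByBase before ByForm) and that every form from either input appears as a key in the result.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.Map.Strict as M
import qualified Data.Set as S
import           Data.Text (Text)

type Form = Text
type Base = Text
type BaseMap = M.Map Form (S.Set Base)

-- Constructor order matters here: 'min' picks the most direct code.
data RelCode = Exact | ByBase | ByForm
    deriving (Show, Eq, Ord)

-- Look up a set in a map, treating a missing key as the empty set.
setAt :: Ord k => k -> M.Map k (S.Set v) -> S.Set v
setAt = M.findWithDefault S.empty

-- Sketch only (assumed semantics, see the lead-in).
mergeSketch :: Ord a
            => BaseMap -> M.Map Form (S.Set a) -> M.Map Form (M.Map a RelCode)
mergeSketch poli dict =
    M.fromList [ (x, labelsFor x) | x <- S.toList allForms ]
  where
    allForms = M.keysSet poli `S.union` M.keysSet dict
    -- Inverse of the base map: base form -> forms of that lexeme.
    formsOf = M.fromListWith S.union
        [ (b, S.singleton f) | (f, bs) <- M.toList poli, b <- S.toList bs ]
    labelsFor x = M.fromListWith min (exact ++ byBase ++ byForm)
      where
        bases  = S.toList (setAt x poli)
        sibs   = [ f | b <- bases, f <- S.toList (setAt b formsOf), f /= x ]
        tagAll code fs = [ (l, code) | f <- fs, l <- S.toList (setAt f dict) ]
        exact  = tagAll Exact  [x]    -- the form itself is in the dictionary
        byBase = tagAll ByBase bases  -- one of its base forms is
        byForm = tagAll ByForm sibs   -- another form of the same lexeme is

-- Illustrative data; the label names are made up.
poliSample :: BaseMap
poliSample = M.fromList
    [ ("pies", S.singleton "pies")
    , ("psa",  S.singleton "pies")
    , ("psem", S.singleton "pies") ]

dictSample :: M.Map Form (S.Set Text)
dictSample = M.fromList
    [ ("pies", S.singleton "animal")
    , ("psem", S.singleton "instrumental") ]

main :: IO ()
main = mapM_ print (M.toList (M.toList <$> mergeSketch poliSample dictSample))
```

The sketch builds an inverse map from base forms to forms so that related forms can be found without scanning the whole base map for every key. This is simple but holds both maps in memory at once.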