adict-0.2.0: Approximate dictionary searching

Safe HaskellNone

NLP.Adict

Contents

Description

This module exports main data types and functions of the adict library.

Synopsis

Dictionary representation

The library provides two basic data structures used for dictionary representation. The first one is a Trie, which can be constructed from a list of dictionary entries by using the fromList function.

The trie can be translated into a directed acyclic word graph (DAWG) using the fromTrie function (for the moment it is done in an inefficient manner, though).

There is also a possibility of constructing an implicit DAWG, i.e. a DAWG which is algebraically represented by a trie with sharing of common subtries, by using the implicitDAWG function (which is also inefficient right now; in fact, the fromTrie function uses this one underneath).

Finally, the DAWG can be transformed back to a trie (implicit DAWG) using the fromDAWG function.

Trie

data Trie a b Source

A trie of words with character type a and entry type b. It can be thought of as a map of type [a] -> b.

Constructors

Trie 

Fields

valueIn :: b

Value in the node.

edgeMap :: Map a (Trie a b)

Edges to subtries annotated with characters.

Instances

Functor (Trie a) 
(Eq a, Eq b) => Eq (Trie a b) 
(Eq (Trie a b), Ord a, Ord b) => Ord (Trie a b) 
(Show a, Show b) => Show (Trie a b) 
(Ord a, Binary a, Binary b) => Binary (Trie a b) 

type TrieD a b = Trie a (Maybe b)Source

A Trie with Maybe values in nodes.

fromList :: Ord a => [([a], b)] -> TrieD a bSource

Construct the Trie from the list of (word, value) pairs.

implicitDAWG :: (Ord a, Ord b) => Trie a b -> Trie a bSource

Elminate common subtries. The result is algebraically a trie but is represented as a DAWG in memory.

Directed acyclic word graph

data DAWG a b Source

A directed acyclic word graph with character type a and dictionary entry type b.

Constructors

DAWG 

Fields

root :: Int

Root (index) of the DAWG

array :: Vector (Row a b)

Vector of DAWG nodes

Instances

(Ord a, Binary a, Binary b) => Binary (DAWG a b) 

data Row a b Source

A Row represents one node of the DAWG.

Constructors

Row 

Fields

rowValue :: b

Value in the node.

rowEdges :: Vector (a, Int)

Edges to subnodes (represented by array indices) annotated with characters.

type DAWGD a b = DAWG a (Maybe b)Source

A DAWGD dictionary is a DAWG which may have the Nothing value along the path from the root to a leave.

fromTrie :: (Ord a, Ord b) => Trie a b -> DAWG a bSource

Find and eliminate all common subtries in the input trie and return the trie represented as a DAWG.

fromDAWG :: Ord a => DAWG a b -> Trie a bSource

Transform the DAWG to implicit DAWG in a form of a trie.