adict-0.4.0: Approximate dictionary searching

Safe HaskellNone

NLP.Adict

Contents

Description

This module re-exports main data types and functions from the adict library.

Synopsis

Approximate searching

There are three approximate searching methods implemented in the library. The first one, findAll, can be used to find all matches within the given distance from the query word. The findNearest function, on the other hand, searches only for the nearest to the query word match. The third one, bruteSearch, is provided only for reference and testing purposes.

The findAll function is evaluated against the Trie while the findNearest one is evaluated against the DAWG. The reason to make this distinction is that the findNearest function needs to distinguish between DAG nodes and to know when the particular node is visited for the second time.

Both methods perform the search with respect to the cost function specified by the library user, which can be used to customize weights of edit operations. The Cost structure provides the general representation of the cost and it can be used with the findAll method. The shortest-path algorithm used in the background of the findNearest function is optimized to use the more informative, CostDiv cost representation, which divides edit operations between separate classes with respect to their weight.

Cost function

type Word a = Vector aSource

A word parametrized with character type a.

type Pos = IntSource

Position in a sentence.

type Weight = DoubleSource

Cost of edit operation. It has to be a non-negative value!

data Cost a Source

Cost represents a cost (or weight) of a symbol insertion, deletion or substitution. It can depend on edit operation position and on symbol values.

Constructors

Cost 

Fields

insert :: Pos -> a -> Weight
 
delete :: Pos -> a -> Weight
 
subst :: Pos -> a -> a -> Weight
 

costDefault :: Eq a => Cost aSource

Simple cost function: all edit operations cost 1 unit.

Searching methods

bruteSearch :: Cost a -> Double -> Word a -> [(Word a, b)] -> [(Word a, b, Double)]Source

Find all words within a list with restricted generalized edit distance from x lower or equall to k.

findAllSource

Arguments

:: (Enum a, Unbox w) 
=> Cost a

Cost function

-> Double

Threshold

-> Word a

Query word

-> DAWG a w b 
-> [([a], b, Double)] 

Find all words within a DAWG with restricted generalized edit distance lower than or equal to a given threshold.

findNearestSource

Arguments

:: (Enum a, Unbox w) 
=> CostDiv a

Cost function

-> Double

Threshold

-> Word a

Query word

-> DAWG a w b 
-> Maybe ([a], b, Double) 

We could check, if CostDiv satisfies basic properties. On the other hand, we do not do this for plain Cost function.