Safe Haskell | None |
---|
This module re-exports main data types and functions from the adict library.
- type Word a = Vector a
- type Pos = Int
- type Weight = Double
- data Cost a = Cost {}
- costDefault :: Eq a => Cost a
- bruteSearch :: Cost a -> Double -> Word a -> [(Word a, b)] -> [(Word a, b, Double)]
- findAll :: (Enum a, Unbox w) => Cost a -> Double -> Word a -> DAWG a w b -> [([a], b, Double)]
- findNearest :: (Enum a, Unbox w) => CostDiv a -> Double -> Word a -> DAWG a w b -> Maybe ([a], b, Double)
Approximate searching
There are three approximate searching methods implemented in
the library. The first one, findAll
, can be used to find
all matches within the given distance from the query word.
The findNearest
function, on the other hand, searches only
for the nearest to the query word match.
The third one, bruteSearch
, is provided only for reference
and testing purposes.
The findAll
function is evaluated against the Trie
while the
findNearest
one is evaluated against the DAWG
.
The reason to make this distinction is that the findNearest
function needs to distinguish between DAG nodes and to know
when the particular node is visited for the second time.
Both methods perform the search with respect to the cost function
specified by the library user, which can be used to customize
weights of edit operations. The Cost
structure provides the
general representation of the cost and it can be used with
the findAll
method. The shortest-path algorithm used in
the background of the findNearest
function is optimized to
use the more informative, CostDiv
cost representation,
which divides edit operations between separate classes with
respect to their weight.
Cost function
Cost represents a cost (or weight) of a symbol insertion, deletion or substitution. It can depend on edit operation position and on symbol values.
costDefault :: Eq a => Cost aSource
Simple cost function: all edit operations cost 1 unit.
Searching methods
bruteSearch :: Cost a -> Double -> Word a -> [(Word a, b)] -> [(Word a, b, Double)]Source
Find all words within a list with restricted generalized edit distance from x lower or equall to k.