dawg-0.10: Directed acyclic word graphs

Safe HaskellNone

Data.DAWG.Static

Contents

Description

The module implements directed acyclic word graphs (DAWGs) internaly represented as minimal acyclic deterministic finite-state automata.

In comparison to Data.DAWG.Dynamic module the automaton implemented here:

  • Keeps all nodes in one array and therefore uses less memory,
  • When weighed, it can be used to perform static hashing with index and byIndex functions,
  • Doesn't provide insert/delete family of operations.

Synopsis

DAWG type

data DAWG a b c Source

DAWG a b c constitutes an automaton with alphabet symbols of type a, transition labels of type b and node values of type Maybe c. All nodes are stored in a Vector with positions of nodes corresponding to their IDs.

Instances

(Eq b, Eq c, Unbox b) => Eq (DAWG a b c) 
(Ord b, Ord c, Unbox b) => Ord (DAWG a b c) 
(Show b, Show c, Unbox b) => Show (DAWG a b c) 
(Binary b, Binary c, Unbox b) => Binary (DAWG a b c) 

Query

lookup :: (Enum a, Unbox b) => [a] -> DAWG a b c -> Maybe cSource

Find value associated with the key.

submap :: (Enum a, Unbox b) => [a] -> DAWG a b c -> DAWG a b cSource

Return the sub-DAWG containing all keys beginning with a prefix. The in-memory representation of the resultant DAWG is the same as of the original one, only the pointer to the DAWG root will be different.

numStates :: DAWG a b c -> IntSource

Number of states in the automaton. TODO: The function ignores the root value, it won't work properly after using the submap function.

numEdges :: DAWG a b c -> IntSource

Number of edges in the automaton. TODO: The function ignores the root value, it won't work properly after using the submap function.

Weight

type Weight = IntSource

Weight of a node corresponds to the number of final states reachable from the node. Weight of an edge is a sum of weights of preceding nodes outgoing from the same parent node.

weigh :: DAWG a b c -> DAWG a Weight cSource

Compute node weights and store corresponding values in transition labels. Be aware, that the entire DAWG will be weighted, even when (because of the use of the submap function) only a part of the DAWG is currently selected.

size :: DAWG a Weight c -> IntSource

A number of distinct (key, value) pairs in the weighted DAWG.

index :: Enum a => [a] -> DAWG a Weight c -> Maybe IntSource

Position in a set of all dictionary entries with respect to the lexicographic order.

byIndex :: Enum a => Int -> DAWG a Weight c -> Maybe [a]Source

Find dictionary entry given its index with respect to the lexicographic order.

Construction

empty :: Unbox b => DAWG a b cSource

Empty DAWG.

fromList :: (Enum a, Ord b) => [([a], b)] -> DAWG a () bSource

Construct DAWG from the list of (word, value) pairs. First a DAWG is created and then it is frozen using the freeze function.

fromListWith :: (Enum a, Ord b) => (b -> b -> b) -> [([a], b)] -> DAWG a () bSource

Construct DAWG from the list of (word, value) pairs with a combining function. The combining function is applied strictly. First a DAWG is created and then it is frozen using the freeze function.

fromLang :: Enum a => [[a]] -> DAWG a () ()Source

Make DAWG from the list of words. Annotate each word with the () value. First a DAWG is created and then it is frozen using the freeze function.

Conversion

assocs :: (Enum a, Unbox b) => DAWG a b c -> [([a], c)]Source

Return all (key, value) pairs in the DAWG in ascending key order.

keys :: (Enum a, Unbox b) => DAWG a b c -> [[a]]Source

Return all keys of the DAWG in ascending order.

elems :: Unbox b => DAWG a b c -> [c]Source

Return all elements of the DAWG in the ascending order of their keys.

freeze :: DAWG a b -> DAWG a () bSource

Construct immutable version of the automaton.