dawg-0.11: Directed acyclic word graphs

Safe HaskellNone

Data.DAWG.Static

Contents

Description

The module implements directed acyclic word graphs (DAWGs) internaly represented as minimal acyclic deterministic finite-state automata.

In comparison to Data.DAWG.Dynamic module the automaton implemented here:

  • Keeps all nodes in one array and therefore uses less memory,
  • When weighed, it can be used to perform static hashing with index and byIndex functions,
  • Doesn't provide insert/delete family of operations.

Synopsis

DAWG type

data DAWG a b c Source

DAWG a b c constitutes an automaton with alphabet symbols of type a, transition labels of type b and node values of type Maybe c. All nodes are stored in a Vector with positions of nodes corresponding to their IDs.

Instances

(Eq b, Eq c, Unbox b) => Eq (DAWG a b c) 
(Ord b, Ord c, Unbox b) => Ord (DAWG a b c) 
(Show b, Show c, Unbox b) => Show (DAWG a b c) 
(Binary b, Binary c, Unbox b) => Binary (DAWG a b c) 

ID

type ID = IntSource

Node identifier.

rootID :: DAWG a b c -> IDSource

The actual DAWG root has the 0 ID. Thanks to the rootID attribute, we can represent a submap of a DAWG.

byID :: ID -> DAWG a b c -> Maybe (DAWG a b c)Source

Retrieve sub-DAWG with a given ID (or Nothing, if there's no such DAWG). This function can be used, together with the root function, to store IDs rather than entire DAWGs in a data structure.

Query

lookup :: (Enum a, Unbox b) => [a] -> DAWG a b c -> Maybe cSource

Find value associated with the key.

edges :: Enum a => DAWG a b c -> [(a, DAWG a b c)]Source

A list of outgoing edges.

submap :: (Enum a, Unbox b) => [a] -> DAWG a b c -> DAWG a b cSource

Return the sub-DAWG containing all keys beginning with a prefix. The in-memory representation of the resultant DAWG is the same as of the original one, only the pointer to the DAWG root will be different.

numStates :: DAWG a b c -> IntSource

Number of states in the automaton. TODO: The function ignores the rootID value, it won't work properly after using the submap function.

numEdges :: DAWG a b c -> IntSource

Number of edges in the automaton. TODO: The function ignores the rootID value, it won't work properly after using the submap function.

Weight

type Weight = IntSource

Weight of a node corresponds to the number of final states reachable from the node. Weight of an edge is a sum of weights of preceding nodes outgoing from the same parent node.

weigh :: DAWG a b c -> DAWG a Weight cSource

Compute node weights and store corresponding values in transition labels. Be aware, that the entire DAWG will be weighted, even when (because of the use of the submap function) only a part of the DAWG is currently selected.

size :: DAWG a Weight c -> IntSource

A number of distinct (key, value) pairs in the weighted DAWG.

index :: Enum a => [a] -> DAWG a Weight c -> Maybe IntSource

Position in a set of all dictionary entries with respect to the lexicographic order.

byIndex :: Enum a => Int -> DAWG a Weight c -> Maybe [a]Source

Find dictionary entry given its index with respect to the lexicographic order.

Construction

empty :: Unbox b => DAWG a b cSource

Empty DAWG.

fromList :: (Enum a, Ord b) => [([a], b)] -> DAWG a () bSource

Construct DAWG from the list of (word, value) pairs. First a DAWG is created and then it is frozen using the freeze function.

fromListWith :: (Enum a, Ord b) => (b -> b -> b) -> [([a], b)] -> DAWG a () bSource

Construct DAWG from the list of (word, value) pairs with a combining function. The combining function is applied strictly. First a DAWG is created and then it is frozen using the freeze function.

fromLang :: Enum a => [[a]] -> DAWG a () ()Source

Make DAWG from the list of words. Annotate each word with the () value. First a DAWG is created and then it is frozen using the freeze function.

Conversion

assocs :: (Enum a, Unbox b) => DAWG a b c -> [([a], c)]Source

Return all (key, value) pairs in the DAWG in ascending key order.

keys :: (Enum a, Unbox b) => DAWG a b c -> [[a]]Source

Return all keys of the DAWG in ascending order.

elems :: Unbox b => DAWG a b c -> [c]Source

Return all elements of the DAWG in the ascending order of their keys.

freeze :: DAWG a b -> DAWG a () bSource

Construct immutable version of the automaton.