dawgdic: Generation and traversal of compressed directed acyclic dawg graphs

This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.

[maintain] [Publish]

Warnings:

Generation and traversal of compressed directed acyclic dawg graphs


[Skip to Readme]

Properties

Versions 0.1.0, 0.1.0
Change log CHANGELOG.md
Dependencies base (>=4.17.2.1 && <4.25.0.0), binary, bitvec, deepseq, hashable, primitive, vector, vector-binary, vector-hashtables [details]
License BSD-3-Clause
Author Andrey Prokopenko
Maintainer persiantiger@yandex.ru
Category Data
Uploaded by swamp_agr at 2025-08-28T19:20:45Z

Modules

Flags

Manual Flags

NameDescriptionDefault
trace

Enable tracing

Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag. More info

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees


Readme for dawgdic-0.1.0

[back to package description]

dawgdic

Overview

dawgdic is a library for building and accessing dictionaries implemented with directed acyclic word graphs (DAWG).

This is a ported version of C++ dawgdic library.

Features

This library offers DAWG, Dictionary, Guide and Completer data types as well as their builders.

Input characters must be in range 0-255.

This port does not contain RankedGuide and RankedCompleter (yet).

Comparison with other DAWG libraries

package/lexicon size (words) 2.6K 26K 370K
dawg: Data.DAWG.Dynamic 370868 2823660 29933964
dawg: Data.DAWG.Static 133138 1030578 11128690
dawg-ord: Data.DAWG.Int N/A N/A N/A
dawg-ord: Data.DAWG.Ord N/A N/A N/A
packed-dawg: Data.DAWG.Packed 15619 128491 1481671
dawgdic: Data.DAWG.Dictionary 3088 141328 1603600
dawgdic: Guide (w/ Dictionary) 4640 212000 2405408
feature dawg dawg-ord packed-dawg dawgdic
build from list + + + +
persistence + - + +
insert Dynamic + - Builder
delete key Dynamic + - -
follow character Static ~edges lookupPrefix +
lookup value + + - +
member ~lookup ~lookup + +
keys + + + +
values + + - +
toList (assocs) + + - +
complete word ~submap - ~toList +
fuzzy search - - - -
max size N/A N/A 2^22 nodes N/A

Installation

Add dawgdic dependency to your project and run cabal build.

Building and Querying

Building DAWG from lexicon of words and ignoring insertion failures is as simple as this:

>>> import Data.DAWG.DAWG
>>> dawg <- fromAscList . lines =<< readFile "/path/to/lexicon"

Otherwise, consider mutable builder. It could also be useful if words are associated with values.

buildOrError content = do
  dawgBuilder <- new
  forM_ content \(word, value) -> do
    result <- insert word (Just value) dawgBuilder
    unless result $ error "Insert failed"
  freeze dawgBuilder

Building dictionary and guide:

>>> import qualified Data.DAWG.Dictionary as D
>>> import qualified Data.DAWG.Guide as G
>>> dict <- D.build' dawg
>>> guide <- G.build' dawg dict

Saving dictionary and guide:

>>> D.write "dict.dawg" dict
>>> G.write "guide.dawg" guide

Loading dictionary and guide:

>>> dict <- D.load "dict.dawg"
>>> guide <- G.load "guide.dawg"

Consider using Completer for auto-complete-like queries or if you need to obtain lexicon back from Dictionary and Guide.

Benchmarks

Utilities

Benchmark                             default(μs)
------------------------------------- ------------
Utilities/10/Dawg.fromAscList             5549.76
Utilities/10/Dict.build'                 26981.64
Utilities/10/Dict.contains                  22.46
Utilities/10/Dict.lookup                    22.68
Utilities/10/Dict.follow                    22.42
Utilities/10/Guide.build'                   76.81
Utilities/10/Completer.completeKeys        385.35
------------------------------------- ------------
Utilities/100/Dawg.fromAscList           66486.21
Utilities/100/Dict.build'               194881.38
Utilities/100/Dict.contains                326.16
Utilities/100/Dict.lookup                  323.45
Utilities/100/Dict.follow                  319.86
Utilities/100/Guide.build'                 782.24
Utilities/100/Completer.completeKeys      7016.36
------------------------------------- ------------
Utilities/1000/Dawg.fromAscList         888061.61
Utilities/1000/Dict.build'             1659798.44
Utilities/1000/Dict.contains              3627.54
Utilities/1000/Dict.lookup                3638.10
Utilities/1000/Dict.follow                3564.64
Utilities/1000/Guide.build'               7992.73
Utilities/1000/Completer.completeKeys    82343.20
------------------------------------- ------------

How to reproduce:

cabal install bench-show --overwrite-policy=always
time cabal bench +RTS "-N4 -A64m -n4m -qb0" -RTS  --benchmark-options="--output bench.html --csv results.csv"
bench-show --presentation=Solo report results.csv

Comparison

Benchmark                      default(μs)
------------------------------ -----------
10/dawgdic.follow                    21.04
10/dawg.follow                       85.41
10/dawg-ord.follow                  110.42
10/packed-dawg.follow               147.30
100/dawgdic.follow                  311.05
100/dawg.follow                    1367.81
100/dawg-ord.follow                1774.78
100/packed-dawg.follow             2098.12
1000/dawgdic.follow                3273.02
1000/dawg.follow                  18842.47
1000/dawg-ord.follow              21992.03
1000/packed-dawg.follow           28938.01
------------------------------ -----------
10/dawgdic.lookup value              21.74
10/dawg.lookup value                 61.89
10/dawg-ord.lookup value            209.93
100/dawgdic.lookup value            290.13
100/dawg.lookup value               847.27
100/dawg-ord.lookup value          2973.35
1000/dawgdic.lookup value          3551.44
1000/dawg.lookup value            14217.29
1000/dawg-ord.lookup value        40111.28
10/dawgdic.member                    21.02
------------------------------ -----------
10/dawg.member                       56.43
10/dawg-ord.member                  191.74
10/packed-dawg.member               173.97
100/dawgdic.member                  343.32
100/dawg.member                     873.70
100/dawg-ord.member                2863.96
100/packed-dawg.member             2200.33
1000/dawgdic.member                9288.22
1000/dawg.member                  13878.72
1000/dawg-ord.member              31428.47
1000/packed-dawg.member           29089.83
------------------------------ -----------
10/dawgdic.keys                     108.33
10/dawg.keys                        100.23
10/dawg-ord.keys                    160.51
10/packed-dawg.keys                  50.50
100/dawgdic.keys                   1392.78
100/dawg.keys                      1413.00
100/dawg-ord.keys                  2324.99
100/packed-dawg.keys                692.23
1000/dawgdic.keys                 15628.83
1000/dawg.keys                    16541.79
1000/dawg-ord.keys                27612.77
1000/packed-dawg.keys              7415.48
------------------------------ -----------
10/dawgdic.values                    52.26
10/dawg.values                       72.29
10/dawg-ord.values                  134.79
100/dawgdic.values                  616.02
100/dawg.values                    1060.87
100/dawg-ord.values                1794.82
1000/dawgdic.values                6457.32
1000/dawg.values                  11734.25
1000/dawg-ord.values              21532.54
------------------------------ -----------
10/dawgdic.toList                   107.26
10/dawg.toList                      116.89
10/dawg-ord.toList                  202.35
100/dawgdic.toList                 1367.87
100/dawg.toList                    1608.26
100/dawg-ord.toList                2898.26
1000/dawgdic.toList               16334.34
1000/dawg.toList                  18578.59
1000/dawg-ord.toList              36360.44
------------------------------ -----------
10/dawgdic.complete word            367.97
10/dawg.complete word               248.14
10/packed-dawg.complete word        303.69
100/dawgdic.complete word          6151.51
100/dawg.complete word             4494.85
100/packed-dawg.complete word      4913.83
1000/dawgdic.complete word        71152.49
1000/dawg.complete word           58222.76
1000/packed-dawg.complete word    60828.53
------------------------------ -----------

How to reproduce:

cabal install bench-show --overwrite-policy=always
time cabal bench +RTS "-N4 -A64m -n4m -qb0" -RTS  --benchmark-options="--output bench.html --csv results.csv"
bench-show --presentation=Solo report results.csv

Contributing

In cases of issues please attach callstack, provide minimal dictionary lexicon and provide logs with enabled tracing.

cabal build -ftrace

Acknowledgments