ADPfusion: Efficient, high-level dynamic programming.

[ algorithms, bioinformatics, bsd3, data-structures, formal-languages, library ] [ Propose Tags ]

ADPfusion combines stream-fusion (using the stream interface provided by the vector library) and type-level programming to provide highly efficient dynamic programming combinators.

From the programmers' viewpoint, ADPfusion behaves very much like the original ADP implementation http://bibiserv.techfak.uni-bielefeld.de/adp/ developed by Robert Giegerich and colleagues, though both combinator semantics and backtracking are different.

The library internals, however, are designed not only to speed up ADP by a large margin (which this library does), but also to provide further runtime improvements by allowing the programmer to switch over to other kinds of data structures with better time and space behaviour. Most importantly, dynamic programming tables can be strict, removing indirections present in lazy, boxed tables.

As a simple benchmark, consider the Nussinov78 algorithm which translates to three nested for loops (for C). In the figure, four different approaches are compared using inputs with size 100 characters to 1000 characters in increments of 100 characters. C is an implementation (".C" directory) in C using "gcc -O3". ADP is the original ADP approach (see link above), while GAPC uses the GAP language (http://gapc.eu/).

Please note that actual performance will depend much on table layout and data structures accessed during calculations, but in general performance is very good: close to C and better than other high-level approaches (that I know of).

Even complex ADP code tends to be completely optimized to loops that use only unboxed variables (Int# and others, indexIntArray# and others).

Completely novel (compared to ADP), is the idea of allowing efficient monadic combinators. This facilitates writing code that performs backtracking, or samples structures stochastically, among others things.

This version is still highly experimental and makes use of multiple recent improvements in GHC. This is particularly true for the monadic interface.

Newley added are the ADP.Fusion.GAPlike modules. These allow for writing grammars with only one (non)-terminal combinator. The logic for index manipulation is now moved into data types for terminals and non-terminals.

While this change leads to slightly more complicated instances for each new terminal or non-terminal, the overall code complexity is significantly lower. In addition, Constraint Kinds make complex interactions between (non)-terminals possible, while still managing to produce high-performance code.

The final goal would, of course, be to have no inter-terminal combinators anymore.

  • GHC 7.6, LLVM, and -fnew-codegen recommended: gives a speedup of x2 for GAPcriterion

Long term goals: Outer indices with more than two dimensions, specialized table design, a combinator library, a library for computational biology.

Two algorithms from the realm of computational biology are provided as examples on how to write dynamic programming algorithms using this library: http://hackage.haskell.org/package/Nussinov78 and http://hackage.haskell.org/package/RNAFold.

Changes since 0.0.1.2:

  • require GHC 7.6

  • ADP.Fusion.GAPlike module for (almost) combinator-less grammars

  • ConstraintKinds for constrained parsers in GAPlike.

Changes since 0.0.1.0:

  • compatibility with GHC 7.4

  • note: still using fundeps & and TFs together. The TF-only version does not optimize as well (I know why but not yet how to fix it)

Using the new code generator?

The new code generator is not official yet, but I recommend trying it out:


[Skip to Readme]

Modules

[Index]

Flags

Automatic Flags
NameDescriptionDefault
devel

build criterion benchmarks and pull in QuickCheck

Disabled

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag. More info

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees

Candidates

  • No Candidates
Versions [RSS] 0.0.1.0, 0.0.1.1, 0.0.1.2, 0.1.0.0, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.2.0.3, 0.2.0.4, 0.2.1.0, 0.4.0.0, 0.4.0.1, 0.4.0.2, 0.4.1.0, 0.4.1.1, 0.5.0.0, 0.5.1.0, 0.5.2.0, 0.5.2.2, 0.6.0.0 (info)
Dependencies base (>=4 && <5), criterion (>=0.6 && <0.7), ghc-prim, primitive (>=0.5 && <0.6), PrimitiveArray (>=0.4 && <0.5), QuickCheck (==2.5), vector (>=0.10 && <0.11) [details]
License BSD-3-Clause
Copyright Christian Hoener zu Siederdissen, 2011-2012
Author Christian Hoener zu Siederdissen, 2011-2012
Maintainer choener@tbi.univie.ac.at
Category Algorithms, Data Structures, Bioinformatics
Home page http://www.tbi.univie.ac.at/~choener/adpfusion
Source repo head: git clone git://github.com/choener/ADPfusion
Uploaded by ChristianHoener at 2012-11-07T18:22:28Z
Distributions
Reverse Dependencies 15 direct, 1 indirect [details]
Executables GAPcriterion
Downloads 19317 total (73 in the last 30 days)
Rating (no votes yet) [estimated by Bayesian average]
Your Rating
  • λ
  • λ
  • λ
Status Docs uploaded by user [build log]
Successful builds reported [all 1 reports]

Readme for ADPfusion-0.1.0.0

[back to package description]

ADPfusion (c) 2012, Christian Hoener zu Siederdissen University of Vienna, Vienna, Austria choener@tbi.univie.ac.at LICENSE: BSD3

Introduction

ADPfusion combines stream-fusion (using the stream interface provided by the vector library) and type-level programming to provide highly efficient dynamic programming combinators.

From the programmers' viewpoint, ADPfusion behaves very much like the original ADP implementation http://bibiserv.techfak.uni-bielefeld.de/adp/ developed by Robert Giegerich and colleagues, though both combinator semantics and backtracking are different.

The library internals, however, are designed not only to speed up ADP by a large margin (which this library does), but also to provide further runtime improvements by allowing the programmer to switch over to other kinds of data structures with better time and space behaviour. Most importantly, dynamic programming tables can be strict, removing indirections present in lazy, boxed tables.

As an example, even rather complex ADP code tends to be completely optimized to loops that use only unboxed variables (Int# and others, indexIntArray# and others).

Completely novel (compared to ADP), is the idea of allowing efficient monadic combinators. This facilitates writing code that performs backtracking, or samples structures stochastically, among others things.

This version is still highly experimental and makes use of multiple recent improvements in GHC. This is particularly true for the monadic interface.

Long term goals: Outer indices with more than two dimensions, specialized table design, a combinator library, a library for computational biology.

Two algorithms from the realm of computational biology are provided as examples on how to write dynamic programming algorithms using this library: http://hackage.haskell.org/package/Nussinov78 and http://hackage.haskell.org/package/RNAfold.

Installation

If GHC-7.2.2/GHC-7.4, LLVM and cabal-install are available, you should be all set. I recommend using cabal-dev as it provides a very nice sandbox (replace cabal-dev with cabal otherwise).

If you go with cabal-dev, no explicit installation is necessary and ADPfusion will be installed in the sandbox together with the example algorithms or your own.

For a more global installation, "cabal install ADPfusion" should do the trick.

To run the Quickcheck tests, do an additional "cabal-dev install QuickCheck", then "cabal-dev ghci", ":l ADP/Fusion/QuickCheck.hs", and "allProps". Loading the quickcheck module should take a bit due to compilation. "allProps" tests all properties and should yield no errors.

Notes

If you have problems, find bugs, or want to use this library to write your own DP algorithms, please send me a mail. I'm very interested in hearing what is missing.

One of the things I'll be integrating is an extension to higher dimensions (more than two).

Right now, I am not quite happy with the construction and destruction of the "Box" representations. These will change soon. In addition, an analysis of the actual combinators should remove the need for nested applications of objective functions in many cases.

VERSION HISTORY

  • 0.0.0.3:

    • initial version, together with submitted paper
  • 0.0.0.4:

    • based most combinators on just two generalized Box creators
    • cleaned up and simplified RNAfold example
    • RNAfold execution now a bit slower. Simplified energy functions typically only have three arguments now, which can be of 'Primary' type. While this reduces speed because we will repeatedly ask for the same value, it is much easier to handle the different functions and ``play'' with fusion properties.
    • RNAfold compilation massively faster: execution/compilation tradoff is worth it for experimenting with ADPfusion; still faster than anything except RNAfold itself. We are now now 2.8x times slower, but 3.5x times slower
    • Quickcheck properties for many combinators
    • Unit tests for RNAfold functions
    • will soon split off RNAfold and Nussinov and publish three hackage libraries
    • this version was never available, after being done, a split into library and examples was performed.
  • 0.0.1.0

    • providing just the library. Examples are found in different libraries.
  • 0.0.1.1

    • this version should be compatible with GHC-7.4, at least GHC-7,4.2-rc1.
    • a type family (TF) version has not been able to show the same performance as fundeps. This means that fundeps for Internal.hs stay alive, for now.