| Copyright | (c) Sebastian Graf 2017-2020 |
|---|---|
| License | ISC |
| Maintainer | sgraf1337@gmail.com |
| Portability | portable |
| Safe Haskell | None |
| Language | Haskell2010 |

# What is This?

The purpose of `datafix` is to separate declaring data-flow problems from computing their solutions by fixed-point iteration.

The need for this library arose when I was combining two analyses within GHC for my master's thesis. I held a talk on that topic, feel free to click through if you want to know the details.

You can think of data-flow problems as problems that are solvable by dynamic programming or memoization, except that the dependency graph of data-flow problems doesn't need to be acyclic.
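To make the cyclic case concrete, here is a minimal sketch of fixed-point iteration over a cyclic dependency graph in plain Haskell, independent of the `datafix` API (all names, such as `reaches0`, are made up for illustration): we ask which nodes of a small directed graph can reach node 0, a question whose subproblems depend on each other cyclically.

```haskell
import qualified Data.Map.Strict as Map

type GraphNode = Int

-- A small cyclic graph: 1 -> 2, 2 -> 1, 2 -> 0.
succs :: GraphNode -> [GraphNode]
succs 1 = [2]
succs 2 = [1, 0]
succs _ = []

-- One round of updates: a node reaches 0 iff it is 0 or one of its
-- successors does. Note the cyclic dependency between nodes 1 and 2.
step :: Map.Map GraphNode Bool -> Map.Map GraphNode Bool
step env = Map.mapWithKey (\n _ -> n == 0 || any (env Map.!) (succs n)) env

-- Iterate from bottom (all False) until the approximation stops changing.
-- This terminates because Bool is a finite lattice and 'step' is monotone.
reaches0 :: Map.Map GraphNode Bool
reaches0 = go (Map.fromList [(n, False) | n <- [0 .. 2]])
  where
    go env = let env' = step env
             in if env' == env then env else go env'
```

Evaluating `reaches0` yields `True` for all three nodes, even though a naive recursive definition would loop forever on the 1 ↔ 2 cycle.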

Data-flow problems are declared with the primitives in `Datafix.Description` and solved by `Datafix.Worklist.solveProblem`.

With that out of the way, let's set up the GHCi environment for our examples:

```haskell
>>> :set -XScopedTypeVariables
>>> :set -XTypeApplications
>>> :set -XTypeFamilies
>>> import Datafix
>>> import Data.Proxy (Proxy (..))
>>> import Algebra.Lattice (JoinSemiLattice (..), BoundedJoinSemiLattice (..))
>>> import Numeric.Natural
```

# Use Case: Solving Recurrences

Let's start out by computing the Fibonacci series:

```haskell
>>> :{
fib :: Natural -> Natural
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
:}

>>> fib 3
2

>>> fib 10
55
```

Bring your rabbits to the vet while you can still count them...

Anyway, the Fibonacci series is a typical problem exhibiting *overlapping subproblems*. As a result, our `fib` function from above scales badly in the size of its input argument `n`. Because we repeatedly recompute solutions, the time complexity of the function above is in \(\mathcal{O}(2^n)\)!

We can do better by using *dynamic programming* or *memoization* to keep a
cache of already computed sub-problems, which helps computing the \(n\)th
item in \(\mathcal{O}(n)\) time and space:

```haskell
>>> :{
fib2 :: Natural -> Natural
fib2 n = fibs !! fromIntegral n
  where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
:}

>>> fib2 3
2

>>> fib2 10
55
```

That's one of Haskell's pet issues: Expressing dynamic programs as lists through laziness.
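For comparison, the same memoization trick is often written with a lazy array instead of a list; this is a sketch of that folklore technique (the name `fibArr` is ours, not part of `datafix`):

```haskell
import Data.Array

-- Memoized Fibonacci via a lazy array: each cell refers back to earlier
-- cells of the same array, so every subproblem is computed exactly once.
fibArr :: Int -> Integer
fibArr n = memo ! n
  where
    memo = listArray (0, n) (map go [0 .. n])
    go 0 = 0
    go 1 = 1
    go k = memo ! (k - 1) + memo ! (k - 2)
```

Both encodings rely on the dependency graph being acyclic: cell \(k\) only ever looks at cells \(k-1\) and \(k-2\).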

As promised in the previous section, we can do the same using `datafix`. First, we need to declare a *transfer function* that makes the data dependencies of the recursive case explicit, as if we were using `fix` to eliminate the recursion:

```haskell
>>> :{
transferFib
  :: forall m
   . (MonadDependency m, Domain m ~ Natural)
  => Node -> LiftedFunc Natural m
transferFib (Node 0) = return 0
transferFib (Node 1) = return 1
transferFib (Node n) = do
  a <- dependOn @m (Node (n-1))
  b <- dependOn @m (Node (n-2))
  return (a + b)
:}
```

`MonadDependency` contains a single primitive `dependOn` for that purpose. Every point of the Fibonacci series is modeled as a separate `Node` of the data-flow graph. By looking at the definition of `LiftedFunc`, we can see that `LiftedFunc Natural m ~ m Natural`, so for our simple `Natural` `Domain`, the transfer function is specified directly in `MonadDependency`.

Note that we have indeed eliminated explicit recursion in `transferFib`. This allows the solution algorithm to track and discover the dependencies of the transfer function as it is executed!

With our transfer function (which denotes data-flow nodes in the semantics of `Natural`s) in place, we can construct a `DataFlowFramework`:

```haskell
>>> :{
fibDff
  :: forall m
   . (MonadDependency m, Domain m ~ Natural)
  => DataFlowFramework m
fibDff = DFF transferFib (const (eqChangeDetector @(Domain m)))
:}
```

The `eqChangeDetector` is important for cyclic dependency graphs: it makes sure we detect when a fixed point has been reached.

That's it for describing the data-flow problem of Fibonacci numbers. We will ask `Datafix.Worklist.solveProblem` for a solution in a minute.

The `solveProblem` solver demands an instance of `BoundedJoinSemiLattice` on the `Domain` for when the data-flow graph is cyclic. We conveniently delegate to the total `Ord` instance for `Natural`, knowing that its semantic interpretation is irrelevant to us:

```haskell
>>> instance JoinSemiLattice Natural where (\/) = max
>>> instance BoundedJoinSemiLattice Natural where bottom = 0
```

And now the final incantation of the solver:

```haskell
>>> solveProblem fibDff Sparse NeverAbort (dependOn @(DependencyM _ Natural) (Node 10))
55
```

This will also execute in \(\mathcal{O}(n)\) space and time, all without worrying about a smart solution strategy involving how to tie knots or allocate vectors. Granted, this doesn't really pay off for simple problems like computing fibonacci numbers because of the boilerplate involved and the somewhat devious type-level story, but the intended use case is that of static analysis of programming languages.

Before I delegate you to a blog post about strictness analysis, we will look at a more devious recurrence relation with actual cycles in the resulting data-flow graph.

# Use Case: Solving Cyclic Recurrences

The recurrence relation describing fibonacci numbers admits a clear plan of how to compute a solution, because the dependency graph is obviously acyclic: To compute the next new value of the sequence, only the prior two values are needed.

This is not true of the following recurrence relation:

\[ f(n) = \begin{cases} 2 \cdot f(\frac{n}{2}), & n \text{ even}\\ f(n+1)-1, & n \text{ odd} \end{cases} \]
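(As a quick sanity check, substituting the identity \(f(n) = n\) satisfies both cases:

\[ n \text{ even}: \quad 2 \cdot f\!\left(\tfrac{n}{2}\right) = 2 \cdot \tfrac{n}{2} = n, \qquad n \text{ odd}: \quad f(n+1) - 1 = (n+1) - 1 = n. \]

Verifying a candidate solution is easy; *finding* one is the hard part.)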

The identity function is the only solution to this relation, but it is unclear how we could arrive at that conclusion just by translating the relation into Haskell:

```haskell
>>> :{
f n | even n = 2 * f (n `div` 2)
    | odd n  = f (n + 1) - 1
:}
```

Imagine a call `f 1`: this will call `f 2` recursively, which again will call `f 1`. We hit a cyclic dependency!

Fortunately, we can use `datafix` to compute the solution by fixed-point iteration (which assumes monotonicity of the function to approximate):

```haskell
>>> :{
transferF
  :: forall m
   . (MonadDependency m, Domain m ~ Int)
  => Node -> LiftedFunc Int m
transferF (Node n)
  | even n = (* 2) <$> dependOn @m (Node (n `div` 2))
  | odd n  = (subtract 1) <$> dependOn @m (Node (n + 1))
:}

>>> :{
fDff
  :: forall m
   . (MonadDependency m, Domain m ~ Int)
  => DataFlowFramework m
fDff = DFF transferF (const (eqChangeDetector @(Domain m)))
:}
```

Specification of the data-flow problem works the same as for the `fib` function.

As for `Natural`, we need an instance of `BoundedJoinSemiLattice` for `Int` to compute a solution:

```haskell
>>> instance JoinSemiLattice Int where (\/) = max
>>> instance BoundedJoinSemiLattice Int where bottom = minBound
```

Now it's just a matter of calling `solveProblem` with the right parameters:

```haskell
>>> solveProblem fDff Sparse NeverAbort (dependOn @(DependencyM _ Int) (Node 0))
0

>>> solveProblem fDff Sparse NeverAbort (dependOn @(DependencyM _ Int) (Node 5))
5

>>> solveProblem fDff Sparse NeverAbort (dependOn @(DependencyM _ Int) (Node 42))
42

>>> solveProblem fDff Sparse NeverAbort (dependOn @(DependencyM _ Int) (Node (-10)))
-10
```

Note how the *specification* of the data-flow problem was as unexciting as it was for the Fibonacci sequence (modulo boilerplate), yet the recurrence we solved was already fairly complicated.

Of course, encoding the identity function this way is inefficient. But keep in mind that in general, we don't know the solution to a particular recurrence! It's always possible to solve the recurrence by hand upfront, but that's trading precious developer time for what might be a throw-away problem anyway.

Which brings us to the prime and final use case...

# Use Case: Static Analysis

Recurrence equations occur *all the time* in denotational
semantics and static data-flow analysis.

For every invocation of the compiler, for every module, for every analysis within the compiler, a recurrence relation representing program semantics is to be solved. Naturally, we can't task a human with solving a bunch of complicated recurrences every time we hit compile.

In the imperative world, it is commonplace to have some kind of fixed-point iteration framework carry out the iteration of the data-flow graph, but I have not yet found a similar abstraction for functional programming languages. Analyses for functional languages are typically carried out as iterated traversals of the syntax tree, but that is unsatisfying for a number of reasons:

- Solution logic of the data-flow problem is intertwined with its specification.
- Solution logic is duplicated among multiple analyses, violating DRY.
- A consequence of the last two points is that performance tweaks have to be adapted for every analysis separately. When GHC's Demand Analyser went from chaotic iteration (which corresponds to naive iterated tree traversals) to an iteration scheme that caches the results of inner let-bindings, annotations on the syntax tree were suddenly used like `State` threads, which made the analysis logic even more complex than it already was.
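The caching iteration scheme alluded to above is in essence a worklist algorithm. As a rough, hypothetical sketch in plain Haskell (this is *not* the `datafix` internals; `solveWorklist` and friends are invented names): instead of re-traversing everything each round, only nodes whose inputs changed are re-evaluated.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type NodeId = Int

-- Toy problem: which nodes of a cyclic graph (1 -> 2, 2 -> 1, 2 -> 0)
-- can reach node 0?
succsOf :: NodeId -> [NodeId]
succsOf 1 = [2]
succsOf 2 = [1, 0]
succsOf _ = []

predsOf :: NodeId -> [NodeId]
predsOf n = [p | p <- [0 .. 2], n `elem` succsOf p]

transfer :: Map.Map NodeId Bool -> NodeId -> Bool
transfer env n = n == 0 || any (env Map.!) (succsOf n)

-- Worklist iteration: when a node's value changes, only its dependents
-- (its predecessors in the graph) are put back on the worklist.
solveWorklist :: Map.Map NodeId Bool
solveWorklist = go (Set.fromList [0 .. 2])
                   (Map.fromList [(n, False) | n <- [0 .. 2]])
  where
    go wl env = case Set.minView wl of
      Nothing -> env
      Just (n, wl')
        | new == env Map.! n -> go wl' env
        | otherwise ->
            go (foldr Set.insert wl' (predsOf n)) (Map.insert n new env)
        where new = transfer env n
```

Decoupling this iteration strategy from the transfer functions is exactly the separation of concerns that `datafix` is after.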

So, I can only encourage any compiler dev who wants to integrate static analyses into their compiler to properly specify the data-flow problems in terms of `datafix` and leave the intricacies of finding a good iteration order to this library :)

# Comparison to Datalog/Soufflé

In its most declarative form, `datafix` is an embedded DSL for specifying static analyses. In that regard, it is really similar to Soufflé, except that Soufflé uses an external DSL (a Datalog dialect) to specify the analysis. The resulting compiled executable needs to run in a separate process and receives the facts of the input program encoded as Datalog facts. `datafix` analyses, on the other hand, are compiled into the host program and don't need an additional communication layer.