{-# OPTIONS_GHC -fno-warn-unused-imports #-}

-- |
-- Module      :  Datafix.Tutorial
-- Copyright   :  (c) Sebastian Graf 2018
-- License     :  ISC
-- Maintainer  :  sgraf1337@gmail.com
-- Portability :  portable
--
-- = What is This?
--
-- The purpose of @datafix@ is to separate declaring
-- [data-flow problems](https://en.wikipedia.org/wiki/Data-flow_analysis)
-- from computing their solutions by
-- [fixed-point iteration](https://en.wikipedia.org/wiki/Fixed-point_iteration).
--
-- The need for this library arose when I was combining two analyses
-- within GHC for my master's thesis. I recently
-- [held a talk](https://cdn.rawgit.com/sgraf812/hiw17/2645b206d3f2b5e6e7c95bc791dfa4bf9cbc8d12/slides.pdf)
-- on that topic, feel free to click through if you want to know the details.
--
-- You can think of data-flow problems as problems that are solvable by
-- [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming)
-- or [memoization](https://en.wikipedia.org/wiki/Memoization),
-- except that the dependency graph of data-flow problems doesn't need to be
-- acyclic.
--
-- Data-flow problems are declared with the primitives in
-- @"Datafix.Description"@ and solved by @Datafix.Worklist.'solveProblem'@.
--
-- With that out of the way, let's set in place the GHCi environment of our
-- examples:
--
-- >>> :set -XScopedTypeVariables
-- >>> :set -XTypeApplications
-- >>> :set -XTypeFamilies
-- >>> import Datafix
-- >>> import Data.Proxy (Proxy (..))
-- >>> import Algebra.Lattice (JoinSemiLattice (..), BoundedJoinSemiLattice (..))
-- >>> import Numeric.Natural
--
-- = Use Case: Solving Recurrences
--
-- Let's start out by computing the fibonacci series:
--
-- >>> :{
--   fib :: Natural -> Natural
--   fib 0 = 0
--   fib 1 = 1
--   fib n = fib (n-1) + fib (n-2)
-- :}
--
-- >>> fib 3
-- 2
-- >>> fib 10
-- 55
--
-- Bring your rabbits to the vet while you can still count them...
--
-- Anyway, the fibonacci series is a typical problem exhibiting
-- /overlapping subproblems/. As a result, our @fib@ function from above scales badly in
-- the size of its input argument @n@. Because we repeatedly recompute
-- solutions, the time complexity of our above function is in \(\mathcal{O}(2^n)\)!
--
-- We can do better by using /dynamic programming/ or /memoization/ to keep a
-- cache of already computed sub-problems, which helps computing the \(n\)th
-- item in \(\mathcal{O}(n)\) time and space:
--
-- >>> :{
--   fib2 :: Natural -> Natural
--   fib2 n = fibs !! fromIntegral n
--     where
--       fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
-- :}
--
-- >>> fib2 3
-- 2
-- >>> fib2 10
-- 55
--
-- That's one of Haskell's pet issues: Expressing dynamic programs as lists
-- through laziness.
--
-- As promised in the previous section, we can do the same using @datafix@.
-- First, we need to declare a /transfer function/ that makes the data
-- dependencies for the recursive case explicit, as if we were using
-- 'Data.Function.fix' to eliminate the recursion:
--
-- >>> :{
--   transferFib
--     :: forall m
--      . (MonadDependency m, Domain m ~ Natural)
--     => Node
--     -> LiftedFunc Natural m
--   transferFib (Node 0) = return 0
--   transferFib (Node 1) = return 1
--   transferFib (Node n) = do
--     a <- dependOn @m (Node (n-1))
--     b <- dependOn @m (Node (n-2))
--     return (a + b)
-- :}
--
-- 'MonadDependency' contains a single primitive 'dependOn' for that purpose.
--
-- Every point of the fibonacci series is modeled as a seperate 'Node' of the
-- data-flow graph.
-- By looking at the definition of 'LiftedFunc', we can see that
-- @LiftedFunc Natural m ~ m Natural@, so for our simple
-- 'Natural' 'Domain', the transfer function is specified directly in
-- 'MonadDependency'.
--
-- Note that indeed we eliminated explicit recursion in @transferFib@.
-- This allows the solution algorithm to track and discover dependencies
-- of the transfer function as it is executed!
--
-- With our transfer function (which denotes data-flow nodes in the semantics
-- of 'Natural's) in place, we can construct a 'DataFlowProblem':
--
-- >>> :{
--   fibDfp :: forall m . (MonadDependency m, Domain m ~ Natural) => DataFlowProblem m
--   fibDfp = DFP transferFib (const (eqChangeDetector @(Domain m)))
-- :}
--
-- The 'eqChangeDetector' is important for cyclic dependency graphs and makes
-- sure we detect when a fixed-point has been reached.
--
-- That's it for describing the data-flow problem of fibonacci numbers.
-- We can ask @Datafix.Worklist.'solveProblem'@ for a solution in a minute.
--
-- The 'solveProblem' solver demands an instance of 'BoundedJoinSemiLattice'
-- on the 'Domain' for when the data-flow graph is cyclic. We conveniently
-- delegate to the total @Ord@ instance for 'Numeric.Natural.Natural', knowing
-- that its semantic interpretation is irrelevant to us:
--
-- >>> instance JoinSemiLattice Natural where (\/) = max
-- >>> instance BoundedJoinSemiLattice Natural where bottom = 0
--
-- And now the final incantation of the solver:
--
-- >>> solveProblem fibDfp Sparse NeverAbort (Node 10)
-- 55
--
-- This will also execute in \(\mathcal{O}(n)\) space and time, all without
-- worrying about a smart solution strategy involving how to tie knots or
-- allocate vectors.
-- Granted, this doesn't really pay off for simple problems like computing
-- fibonacci numbers because of the boilerplate involved and the somewhat
-- devious type-level story, but the intended use case is that of static
-- analysis of programming languages.
--
-- Before I delegate you to a blog post about strictness analysis,
-- we will look at a more devious reccurence relation with actual
-- cycles in the resulting data-flow graph.
--
-- = Use Case: Solving Cyclic Recurrences
--
-- The recurrence relation describing fibonacci numbers admits a clear
-- plan of how to compute a solution, because the dependency graph is
-- obviously acyclic: To compute the next new value of the sequence,
-- only the prior two values are needed.
--
-- This is not true of the following reccurence relation:
--
-- \[
-- f(n) = \begin{cases}
--   2 \cdot f(\frac{n}{2}), & n \text{ even}\\
--   f(n+1)-1, & n \text{ odd}
-- \end{cases}
-- \]
--
-- The identity function is the only solution to this, but it is unclear
-- how we could arrive at that conclusion just by translating that relation
-- into Haskell:
--
-- >>> :{
-- f n
--   | even n = 2 * f (n `div` 2)
--   | odd n  = f (n + 1) - 1
-- :}
--
-- Imagine a call @f 1@: This will call @f 2@ recursively, which again
-- will call @f 1@. We hit a cyclic dependency!
--
-- Fortunately, we can use @datafix@ to compute the solution by fixed-point
-- iteration (which assumes monotonicity of the function to approximate):
--
-- >>> :{
--   transferF
--     :: forall m
--      . (MonadDependency m, Domain m ~ Int)
--     => Node
--     -> LiftedFunc Int m
--   transferF (Node n)
--     | even n = (* 2) <$> dependOn @m (Node (n `div` 2))
--     | odd n  = (subtract 1) <$> dependOn @m (Node (n + 1))
-- :}
--
-- >>> :{
--   fDfp :: forall m . (MonadDependency m, Domain m ~ Int) => DataFlowProblem m
--   fDfp = DFP transferF (const (eqChangeDetector @(Domain m)))
-- :}
--
-- Specification of the data-flow problem works the same as for the 'fib'
-- function.
--
-- As for 'Natural', we need an instance of 'BoundedJoinSemiLattice'
-- for 'Int' to compute a solution:
--
-- >>> instance JoinSemiLattice Int where (\/) = max
-- >>> instance BoundedJoinSemiLattice Int where bottom = minBound
--
-- Now it's just a matter of calling 'solveProblem' with the right parameters:
--
-- >>> solveProblem fDfp Sparse NeverAbort (Node 0)
-- 0
-- >>> solveProblem fDfp Sparse NeverAbort (Node 5)
-- 5
-- >>> solveProblem fDfp Sparse NeverAbort (Node 42)
-- 42
-- >>> solveProblem fDfp Sparse NeverAbort (Node (-10))
-- -10
--
-- Note how the /specification/ of the data-flow problem was as unexciting as
-- it was for the fibonacci sequence (modulo boilerplate), yet the recurrence
-- we solved was pretty complicated already.
--
-- Of course, encoding the identity function this way is inefficient.
-- But keep in mind that in general, we don't know the solution to a particular
-- recurrence! It's always possible to solve the recurrence by hand upfront,
-- but that's trading precious developer time for what might be a throw-away
-- problem anyway.
--
-- Which brings us to the prime and final use case...
--
-- = Use Case: Static Analysis
--
-- Recurrence equations occur /all the time/ in denotational
-- semantics and static data-flow analysis.
--
-- For every invocation of the compiler, for every module, for every analysis
-- within the compiler, a recurrence relation representing program semantics
-- is to be solved. Naturally, we can't task a human with solving a bunch of
-- complicated recurrences everytime we hit compile.
--
-- In the imperative world, it's common-place to have some kind of fixed-point
-- iteration framework carry out the iteration of the data-flow graph, but
-- I could not find a similar abstraction for functional programming languages
-- yet. Analyses for functional languages are typically carried out as iterated
-- traversals of the syntax tree, but that is unsatisfying for a number of
-- reasons:
--
--  1.  Solution logic of the data-flow problem is intertwined with its
--      specification.
--  2.  Solution logic is duplicated among multiple analyses, violating DRY.
--  3.  A consequence of the last two points is that performance tweaks
--      have to be adapted for every analysis separately.
--      In the case of GHC's Demand Analyser, going from chaotic iteration
--      (which corresponds to naive iterated tree traversals) to an iteration
--      scheme that caches results of inner let-bindings, annotations to the
--      syntax tree are suddenly used like 'State' threads, which makes
--      the analysis logic even more complex than it already was.
--
-- So, I can only encourage any compiler dev who wants to integrate static
-- analyses into their compiler to properly specify the data-flow problems
-- in terms of @datafix@ and leave the intricacies of finding a good iteration
-- order to this library :)

module Datafix.Tutorial () where

import           Datafix