Motivation
==========

Haskell is a great language for data processing. You load some data in the IO monad, parse it, funnel the data through various functions and write the result back to disk or display it via a web server. The programmer has the `let` and `where` patterns at hand, which can be used to sub-structure a single function, e.g.

    workflow x y = let
        a = f x
        b = g a y
        in h a b

To the calling program, however, the values of the intermediate steps `a` and `b` are invisible, and the reader does not know that you used the auxiliary functions `f`, `g` and `h`, although they might be important when an outsider tries to check the correctness of the result of the `workflow` function. This is where the Provenience monad comes in.

How it works
============

The Provenience monad is an ordinary state monad transformer. The state is a data flow [graph](https://hackage.haskell.org/package/fgl "fgl"), which we call the *variable store*. Nodes are [Pandoc](https://hackage.haskell.org/package/pandoc "pandoc") renderings of so-called *variables*. A variable is simply a pair of an ordinary Haskell value together with its node in the graph. A computation in the Provenience monad performs any number of the following five actions.

* Register a new variable in the variable store.
* Provide a description of a registered variable (in the form of a Pandoc [Block](http://hackage.haskell.org/package/pandoc-types/docs/Text-Pandoc-Definition.html#t:Block "Block")).
* Provide a short name for a registered variable (used in hyperlinks).
* Render the value of a registered variable into its node in the variable store (as a Pandoc `Block`). There is a class for default rendering methods akin to the `Show` class.
* Apply a variable holding a function to a variable holding a value, similar to the `<*>` operator of `Applicative` functors. In the Provenience monad, we write `<%>` instead.

The fifth action is the only action that adds edges to the data dependency graph.
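The five actions can be illustrated with a toy model that uses only `base`. This is a sketch under simplifying assumptions, not the package's implementation: the store is a plain list instead of an fgl graph, descriptions and renderings are `String`s instead of Pandoc `Block`s, and all names (`Prov`, `register`, `describe`, `name`, `renderVar`, `apply`) are hypothetical. In particular, `apply` takes two variables directly, whereas the real `<%>` operator takes an action on its left-hand side.

```haskell
-- Toy model of a variable store. All names here are hypothetical and
-- do NOT reflect the API of the provenience package.
type NodeId = Int

-- | Bookkeeping attached to one variable.
data Node = Node
  { nodeName  :: Maybe String  -- short name, used in hyperlinks
  , nodeDescr :: String        -- description (a Pandoc Block in the real package)
  , nodeValue :: Maybe String  -- rendered value, if the variable was rendered
  } deriving (Show, Eq)

-- | Nodes plus labelled edges (from, to, label).
data Store = Store
  { nodes :: [(NodeId, Node)]
  , edges :: [(NodeId, NodeId, String)]
  } deriving (Show, Eq)

emptyStore :: Store
emptyStore = Store [] []

-- | A variable is a Haskell value together with its node.
data Var a = Var { varNode :: NodeId, varValue :: a }

-- | Hand-rolled state monad over the store.
newtype Prov a = Prov { runProv :: Store -> (a, Store) }

instance Functor Prov where
  fmap f (Prov g) = Prov $ \s -> let (a, s') = g s in (f a, s')

instance Applicative Prov where
  pure a = Prov $ \s -> (a, s)
  Prov f <*> Prov g = Prov $ \s ->
    let (h, s1) = f s
        (a, s2) = g s1
    in (h a, s2)

instance Monad Prov where
  Prov g >>= k = Prov $ \s -> let (a, s') = g s in runProv (k a) s'

-- | Helper: update the bookkeeping of a single node.
modifyNode :: NodeId -> (Node -> Node) -> Prov ()
modifyNode i f = Prov $ \s ->
  ((), s { nodes = [ (j, if j == i then f n else n) | (j, n) <- nodes s ] })

-- 1. Register a new variable.
register :: a -> Prov (Var a)
register x = Prov $ \s ->
  let i = length (nodes s)
  in (Var i x, s { nodes = nodes s ++ [(i, Node Nothing "" Nothing)] })

-- 2. Provide a description.
describe :: Var a -> String -> Prov ()
describe v d = modifyNode (varNode v) (\n -> n { nodeDescr = d })

-- 3. Provide a short name.
name :: Var a -> String -> Prov ()
name v s = modifyNode (varNode v) (\n -> n { nodeName = Just s })

-- 4. Render the value into the node.
renderVar :: Show a => Var a -> Prov ()
renderVar v =
  modifyNode (varNode v) (\n -> n { nodeValue = Just (show (varValue v)) })

-- 5. Apply a function variable to a value variable. The result keeps the
-- node of the function; an edge from the argument's node is added and
-- labelled with the function's current description.
apply :: Var (a -> b) -> Var a -> Prov (Var b)
apply f x = Prov $ \s ->
  let label = maybe "" nodeDescr (lookup (varNode f) (nodes s))
  in ( Var (varNode f) (varValue f (varValue x))
     , s { edges = edges s ++ [(varNode x, varNode f, label)] } )

-- | Small demonstration: two registered nodes, one labelled edge.
demo :: (Int, Store)
demo = flip runProv emptyStore $ do
  f <- register (succ :: Int -> Int)
  describe f "successor function"
  x <- register 4
  describe x "input"
  y <- apply f x
  describe y "successor of the input"
  renderVar y
  pure (varValue y)
```

In `demo`, only two nodes are ever registered. The result `y` occupies the node that was created for `f`, and the single edge carries the description that `f` had at the time of the application.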
Suppose we have registered a variable `f` holding a value of type `a -> b` and a variable `x` holding a value of type `a`. The description of `f` should explain to the reader what the function held by `f` does. The monadic action

    y <- pure f <%> x

does not register `y` as a new variable; instead `y` points to the same node in the variable store as `f`. However, the value of `y` is the application of the value of `f` to the value of `x`, and there is now an edge from `x` to `y` in the data flow graph, labelled with the description of `f`. If `y` is not itself a function but the desired result, you should overwrite the node's description (which is still the description of `f`) with a new description of the value of `y`.

Why this design choice? Because otherwise partial application would be impossible. If `<%>` always registered new variables, then

    f <%> a <%> b

would register both `f(a)` and `f(a)(b)` as variables, which might not be what the user intended. But overwriting `f` also means that we cannot re-use the same function variable in several applications. When that is desired, use a Provenience action that produces a variable instead of the variable itself. Consider the following.

    let f = var succ
    x <- input 4
    y <- f <%> x
    z <- f <%> y

Since the Haskell identifier `f` is bound to a Provenience action that registers a new variable holding the `succ` function, all three of `x`, `y` and `z` are distinct variables. The take-home message is that

    f <- var succ
    x <- input 4
    y <- pure f <%> x

is a dangerous style, because after the application the node in the graph no longer represents the value of `f`.

Alternative representation
--------------------------

In addition to the Pandoc rendering, the variable store can hold an alternative representation of each variable, since you might want to provide a machine-readable data flow graph alongside the Pandoc document.
Similarly to the `IHaskellDisplay` class, each type used in a variable must have a type class instance that allows automatic conversion into the alternative representation. If you don't need this feature, simply choose `()` as the alternative representation type. The graph of alternative representations can be extracted from the variable store. We provide code to assemble the store into a spreadsheet (of static cells). Foldable structures of basic values become columns, while doubly-nested structures become tables.

Example
=======

Continuing the example above, in the Provenience monad you would write something like the following. Of course it is up to the programmer to decide how fine-grained the decomposition into Provenience actions should be.

    workflow x' y' = do
        ---------- register and render the input variables ------------------
        x <- input x' -- register and render x'
        y <- input y'
        x `named` "x" -- links to x show "x" as text
        y `named` "y"
        x renderDefault "first item of input data" -- describe x
        y renderDefault "second item of input data"
        linkx <- linkto x -- create a hyperlink, used below
        let what_f_does = Para [Str "auxiliary function f applied to ",linkx]
        ---------------------------------------------------------------------
        ------ the actual computation is three lines as in the pure code ----
        a <- func f what_f_does <%> x
        b <- func g (renderDefault "auxiliary function g") <%> a <%> y
        c <- func h (renderDefault "auxiliary function h") <%> a <%> b
        ------ only book-keeping below --------------------------------------
        ---------------------------------------------------------------------
        a `named` "a" >> b `named` "b" >> c `named` "result"
        a renderDefault "first intermediate result"
        b renderDefault "second intermediate result"
        c renderDefault "the workflow result"
        render a >> render b >> render c
        return c

Above, the action `func` registers a new variable and immediately supplies a description, which is then used as the edge label by the `<%>` operator on the same line.
You see that for each line of pure Haskell you are burdened with writing four kinds of Provenience actions: *register*, *describe*, *alias* and *render*. But of the four actions, three are only concerned with providing descriptions that the pure code did not contain.

Remarks
=======

This package was inspired by the [Javelin](https://en.wikipedia.org/wiki/Javelin_Software "wikipedia") Software. Thanks to John R Levine, one of the authors of Javelin, for explaining the concepts underlying Javelin.

By using [Pandoc](https://hackage.haskell.org/package/pandoc "pandoc"), the user has a number of output formats to choose from. With a little CSS, the above example may be rendered as follows. Unfortunately, Hackage does not allow raw HTML in Markdown, so you have to convert the Markdown yourself. (For the sake of example, we used `f = abs`, `g = replicate` and `h = fmap concat . replicate`.)
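To make those choices concrete, here is the pure `workflow` from the Motivation section with `f`, `g` and `h` filled in. The type signatures are not given in the text; they are what the three definitions imply when `x` is taken to be an `Int`, so treat them as an assumption.

```haskell
-- Pure version of the workflow with the concrete auxiliary functions.
-- The type signatures are inferred, not taken from the package.
f :: Int -> Int
f = abs

g :: Int -> a -> [a]
g = replicate

-- fmap over functions is composition, so h n xs = concat (replicate n xs).
h :: Int -> [a] -> [a]
h = fmap concat . replicate

workflow :: Int -> a -> [a]
workflow x y = let
    a = f x      -- first intermediate result
    b = g a y    -- second intermediate result
    in h a b     -- the workflow result
```

For example, `workflow (-2) 'x'` has the intermediate results `a = 2` and `b = "xx"` and evaluates to `"xxxx"`. These intermediate values are exactly what the Provenience version records and what the pure version hides.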