Boilerplate Removal with Uniplate

by Neil Mitchell

Generic transformations and queries are often referred to as boilerplate code - they remain relatively similar as the action performed by the code changes, and can often outnumber the actual intent of the code in terms of lines. While other generic traversal schemes have shown how powerful new features can be added to compilers, and how the type system can be manipulated into accepting these operations, this document focuses on a conceptually simpler generic concept. The Uniplate class is introduced, which abstracts over common traversals and queries in a simple manner. A more complete document on the Uniplate class will be available from the project website shortly.

There have been several attempts at generic traversal/query methods in Haskell. One initial paper was "Scrap your boilerplate: a practical design pattern for generic programming" (free copy) - which I will refer to as SYB. Another mechanism is "A Pattern for Almost Compositional Functions" (free copy) - which I refer to as Compos (after the name of their class). A detailed comparison is given in the Uniplate paper (to be submitted).

The principle advantage of the Uniplate class over these two papers is that it requires no type system extensions, compared to rank-2 types for SYB and GADT's for Compos. The simplicity of the types required means that the user is free to concentrate on the operations within the class, without requiring thought as to the type trickery required. The Uniplate pattern has been implemented in Yhc for the Core data type, and in Catch on several data types within the program.

This document proceeds as follows:

The motivation and use cases for Uniplate
How to use the Uniplate class
Extensions to Biplate

The libraries is available through Hackage or darcs:

darcs get --partial http://www.cs.york.ac.uk/fp/darcs/uniplate

If you only wish to read a small fraction of this document, can I suggest you pay particular attention to transform and universe - these are by far the most common traversal patterns.

Acknowledgements

Thanks to Björn Bringert for feedback on an earlier version of this document, Eric Mertens for various ideas and code snippets, and to Matt Naylor and Tom Shackell for helpful discussions.

Motivation and Use Cases

The idea behind the Uniplate class is that there exists a data structure, usually with a reasonable number of constructors, which is often transformed or analysed. The usual example of this would be a compiler, which has at its core an expression type. This can be seen as a form of generic programming.

The Uniplate class has the following goals:

Adding a new constructor should require very few changes
Traversals should have a minimal amount of boiler plate
The obvious code should "just work"
Haskell 98, where possible
Writing new types of traverals is easy
The default traversals cover most cases

The ideas behind the Uniplate class have been used extensively, in both the Yhc compiler and the Catch tool. In Catch there are over 100 traversals using the Uniplate class, showing that the Uniplate class gets extensive use.

Using Uniplate

These examples revolve around a small arithmetic language, given here:

import Data.Generics.Uniplate

data Expr = Val Int
          | Add Expr Expr
          | Sub Expr Expr
          | Div Expr Expr
          | Mul Expr Expr
          | Neg Expr
          deriving (Show, Eq)

Assume that a Uniplate class has already been written; so now instance Uniplate Expr is available to us. Some examples are presented, in rough order of increasingly complexity.

Checking for division by zero

universe :: Uniplate on => on -> [on]

If an expression is divided by zero, this causes a runtime error in our language. As part of the compiler, it's nice to give the user a warning message about this. This can be done with the following test:

hasDivZero :: Expr -> Bool
hasDivZero x = not $ null [() | Div _ (Val 0) <- universe x]

Here the only Uniplate method being used is universe. Given a tree, universe returns all the root of the tree, and all it's subtrees at all levels. This can be used to quickly flatten a tree structure into a list, for quick analysis via list comprehensions, as is done above. For each division by zero found, any value is created in the list comprehension, and then this is checked to see if anything did match. Returning the count of divide by zero errors is trivial, simply use length instead of not $ null. Extra context could perhaps be given by printing some of the value that is being divided by zero, to help narrow down the error.

Exercise: Write a function to find all literals that occur in an expression, together with their count.

Basic optimisation

transform :: Uniplate on => (on -> on) -> on -> on

If we are negating a literal value, this computation can be performed in advance, so let's write a function to do this.

optimise :: Expr -> Expr
optimise = transform $ \x -> case x of
    Neg (Val i) -> Val (negate i)
    x -> x

Here the Uniplate method being used is transform. This applies the given function to all the children of an expression, before applying it to the parent. This can be thought of as bottom-up traversal of the data structure. The optimise code merely pattern matches on the negation of a literal, and replaces it with the literal.

Now lets add another optimisation into the same pass, just before the x -> x line insert:

    Add x y | x == y -> Mul x (Val 2)

This takes an addition where two terms are equal and changes it into a multiplication, causing the nested expression to be executed only once. This shows that normal Haskell applies, the Uniplate lets you write code as before.

Exercise: Extend the optimisation to so that adding x to Mul x (Val 2) produces a multiplication by 3.

Depth of an expression

fold :: Uniplate on => (on -> [res] -> res) -> on -> res

Now lets imagine that programmers in your language are paid by the depth of expression they produce, so lets write a function that computes the maximum depth of an expression.

depth :: Expr -> Int
depth = fold (\_ cs -> 1 + maximum (0:cs))

This function performs a fold over the data structure. The function simply says that for each iteration, add one to the previous depth. An evaluator for this expression language can also be modelled as a fold, see inside the example directory to see an implementation.

Exercise: Write a function that counts the maximum depth of addition only.

Renumbering literals

transformM :: (Monad m, Uniplate on) => (on -> m on) -> on -> m on

The literal values need to be replaced with a sequence of numbers, each unique. This is unlikely for an arithmetic expression, but consider bound variables in lambda calculus and it starts to become a bit more plausible:

uniqueLits :: Expr -> Expr
uniqueLits x = evalState (transformM f x) [0..]
    where
        f (Val i) = do
            y:ys <- get
            put ys
            return (Val y)
        f x = return x

Here a monadic computation is required, the program needs to keep track of what the next item in the list to use is, and replace the current item. By using the state monad, this can be done easily.

Exercise: Allow each literal to occur only once, when a second occurance is detected, replace that literal with zero.

Generating mutants

contexts :: Uniplate on => on -> [(on, on -> on)]

The person who is inputting the expression thinks they made a mistake, they suspect they got one of the values wrong by plus or minus one. Generate all the expressions they might have written.

mutate :: Expr -> [Expr]
mutate x = concat [[gen $ Val $ i-1, gen $ Val $ i+1]
                  | (Val i, gen) <- contexts x]

The transform function is useful for doing an operation to all nodes in a tree, but sometimes you only want to apply a transformation once. This is less common, but is sometimes required. The idea is that the context provides the information required to recreate the original expression, but with this node altered.

Exercise: Replace one multiplication with addition, if there are no multiplications return the original expression.

Reverse notation

uniplate :: Uniplate on => on -> ([on], [on] -> on)

In general, universe and transform are used most of the time, and transformM is used occasionally. The Uniplate class is built upon the operation uniplate, which takes an expression, and returns a pair with the children of that expression, and a function to generate that expression with a new set of children in place. This operation can be directly exploited if required, although should be handled with caution.

Let us make a function that reverses the order of all the inputs, say if the user is working in Right-to-Left mode on their computer:

reverseExpr :: Expr -> Expr
reverseExpr = transform f
    where
        f x = generate $ reverse collect
            where (collect,generate) = uniplate x

This expression does a standard transform, but at each iteration calls uniplate, then reverses the children set before regenerating the original expression. This shows the underlying mechanism on which the library is based, and isn't recommended for average users.

Defining a Uniplate instance

As shown in the reversal example (just above), the only method in the Uniplate class is uniplate. The Uniplate class is defined as:

class Uniplate on where
    uniplate :: on -> ([on], [on] -> on)

The idea is that given an item, you want to return all the children, and a function that will replace all the children. An invariant is that the list given to the second function will be the same length as that returned in the first element of the pair. Let's start by constructing the Uniplate instance for the expression type.

instance Uniplate Expr of
    uniplate x =
        case x of
            Add a b -> ([a,b], \[a,b] -> Add a b)
            ...
            Neg a -> ([a], \[a] -> Neg a)
            Val i -> ([], \[] -> Val i)

A short study of the code should show how this works. The other constructors such as Mul follow the same pattern as Add. To define a Uniplate instance a user should make use of the Data.Derive tool.

Using Biplate

The Biplate class is not standard Haskell, requiring multi-parameter type classes. Where possible try and use the standard Uniplate class. The Biplate class is necessary when working with a data structure that has multiple types within it.

class Uniplate with => Biplate on with where
    biplate :: on -> ([with], [with] -> on)

The biplate method operates much like the biplate, except for the different types. When the types of on and with are different, biplate returns the closest children of the requested type. When the types are the same, this function returns the root element, not it's children.

There are several mechanisms for writing Biplate instances, discussed in the Uniplate paper.

Using the operations from Biplate

To see various operations being used from the Biplate class, see the Uniplate paper. Typically the operations are just the same as Uniplate, with Bi on the end.

universeBi:: Biplate on with => on -> [with]
transformBi :: Biplate on with => (with -> with) -> on -> on
transformBiM :: (Monad m, Biplate on with) => (with -> m with) -> on -> m on