Generic transformations and queries are often referred to as boilerplate code - they remain relatively similar as the action performed by the code changes, and can often outnumber the actual intent of the code in terms of lines. While other generic traversal schemes have shown how powerful new features can be added to compilers, and how the type system can be manipulated into accepting these operations, the Uniplate library focuses on a conceptually simpler generic concept. A more complete document on Uniplate was published at the Haskell Workshop 2007, and is available from the project website, along with a video presentation, and the associated thesis chapter.
Uniplate is a simple, concise and fast generics library. To expand on that sentence:
The Uniplate library can be installed with the standard sequence of cabal commands:
cabal update cabal install uniplate
This document proceeds as follows:
Thanks to Björn Bringert for feedback on an earlier version of this document, Eric Mertens for various ideas and code snippets, and to Matt Naylor and Tom Shackell for helpful discussions.
To demonstrate the facilities of Uniplate, we use a simple arithmetic type:
{-# LANGUAGE DerivingDataTypeable #-} module Expr where import Data.Generics.Uniplate.Data data Expr = Val Int | Add Expr Expr | Sub Expr Expr | Div Expr Expr | Mul Expr Expr | Neg Expr deriving (Show, Eq, Data, Typeable)
In this definition, the Uniplate specific bits are bolded. The three extra parts are:
This definition makes use of the Scrap Your Boilerplate (SYB) based Uniplate implementation. The SYB implementation is compatible with the other implementations, but is slower (between 2 and 8 times) and requires some modest compiler extensions (implemented in GHC for many years). The alternative definition scheme is described towards the end of this document, in "Making Uniplate Faster". I recommend using the SYB implementation to start with, as it requires least setup.
universe :: Uniplate on => on -> [on]
If an expression is divided by zero, this causes a runtime error in our language. As part of the compiler, it's nice to give the user a warning message about this. This can be done with the following test:
hasDivZero :: Expr -> Bool hasDivZero x = not $ null [() | Div _ (Val 0) <- universe x]
Here the only Uniplate method being used is universe. Given a tree, universe returns all the root of the tree, and all it's subtrees at all levels. This can be used to quickly flatten a tree structure into a list, for quick analysis via list comprehensions, as is done above. For each division by zero found, any value is created in the list comprehension, and then this is checked to see if anything did match. Returning the count of divide by zero errors is trivial, simply use length instead of not $ null. Extra context could perhaps be given by printing some of the value that is being divided by zero, to help narrow down the error.
Exercise: Write a function to find all literals that occur in an expression, together with their count.
transform :: Uniplate on => (on -> on) -> on -> on
If we are negating a literal value, this computation can be performed in advance, so let's write a function to do this.
optimise :: Expr -> Expr optimise = transform f where f (Neg (Val i)) = Val (negate i) f x = x
Here the Uniplate method being used is transform. This applies the given function to all the children of an expression, before applying it to the parent. This can be thought of as bottom-up traversal of the data structure. The optimise code merely pattern matches on the negation of a literal, and replaces it with the literal.
Now lets add another optimisation into the same pass, just before the f x = x line insert:
f (Add x y) | x == y = Mul x (Val 2)
This takes an addition where two terms are equal and changes it into a multiplication, causing the nested expression to be executed only once. This shows that normal Haskell applies, the Uniplate lets you write code as before.
Exercise: Extend the optimisation to so that adding x to Mul x (Val 2) produces a multiplication by 3.
para :: Uniplate on => (on -> [res] -> res) -> on -> res
Now lets imagine that programmers in your language are paid by the depth of expression they produce, so lets write a function that computes the maximum depth of an expression.
depth :: Expr -> Int depth = para (\_ cs -> 1 + maximum (0:cs))
This function performs a paramorphism (a bit like a fold) over the data structure. The function simply says that for each iteration, add one to the previous depth. An evaluator for this expression language can also be modelled as a para, see inside the example directory to see an implementation.
Exercise: Write a function that counts the maximum depth of addition only.
transformM :: (Monad m, Uniplate on) => (on -> m on) -> on -> m on
The literal values need to be replaced with a sequence of numbers, each unique. This is unlikely for an arithmetic expression, but consider bound variables in lambda calculus and it starts to become a bit more plausible:
uniqueLits :: Expr -> Expr uniqueLits x = evalState (transformM f x) [0..] where f (Val i) = do y:ys <- get put ys return (Val y) f x = return x
Here a monadic computation is required, the program needs to keep track of what the next item in the list to use is, and replace the current item. By using the state monad, this can be done easily.
Exercise: Allow each literal to occur only once, when a second occurance is detected, replace that literal with zero.
contexts :: Uniplate on => on -> [(on, on -> on)]
The person who is inputting the expression thinks they made a mistake, they suspect they got one of the values wrong by plus or minus one. Generate all the expressions they might have written.
mutate :: Expr -> [Expr] mutate x = concat [[gen $ Val $ i-1, gen $ Val $ i+1] | (Val i, gen) <- contexts x]
The transform function is useful for doing an operation to all nodes in a tree, but sometimes you only want to apply a transformation once. This is less common, but is sometimes required. The idea is that the context provides the information required to recreate the original expression, but with this node altered.
Exercise: Replace one multiplication with addition, if there are no multiplications return the original expression.
All the operations defined in Uniplate have a corresponding Biplate instance. Typically the operations are just the same as Uniplate, with Bi on the end.
universeBi:: Biplate on with => on -> [with] transformBi :: Biplate on with => (with -> with) -> on -> on transformBiM :: (Monad m, Biplate on with) => (with -> m with) -> on -> m on
The biggest difference is for the functions childrenBi and descendBi. In these cases, if the starting type and the target type are the same, then the input value will be returned. For example:
childrenBi (Add (Val 1) (Val 2)) == [Add (Val 1) (Val 2)] children (Add (Val 1) (Val 2)) == [Val 1, Val 2]
To make Uniplate faster import Data.Generics.Uniplate.Direct, and provide Uniplate instances by generating them with the Derive tool.