Data.Derive: A User Manual

by Neil Mitchell & Stefan O'Rear

Data.Derive is a library and a tool for deriving instances for Haskell programs. It is designed to work with custom derivations, SYB and Template Haskell mechanisms. The tool requires GHC, but the generated code is portable to all compilers. We see this tool as a competitor to DrIFT.

This document proceeds as follows:

Obtaining and Installing Data.Derive
Supported Derivations
Using the Derive Program
Using Template Haskell Derivations
Writing a New Derivation

Acknowledgements

Thanks to everyone who has submitted patches and given assistance, including: Twan van Laarhoven, Spencer Janssen, Andrea Vezzosi. Thanks also to Joel Raymont for being the first user of Data.Derive, and generally helping us feel like we weren't wasting our time.

Obtaining and Installing Data.Derive

Data.Derive is available using darcs:

darcs get --partial http://www.cs.york.ac.uk/fp/darcs/derive

Install the program using the standard sequence of Cabal magic:

runhaskell Setup configure
runhaskell Setup build
runhaskell Setup install

Supported Derivations

Data.Derive is not limited to any prebuilt set of derivations; see later for how to add your own. Out of the box, we provide instances for the following libraries.

Prelude

These are the standard classes defined in the Haskell Report, some of which the compiler's built-in deriving mechanism already supports.

Eq
Ord
Bounded
Enum
EnumCyclic
Functor
Read
Show

Base

These are instances for classes from the base libraries which aren't in the Haskell 98 report.

Monoid
NFData

Query

DrIFT defines a number of useful query functions, which are technically not instances, but can be derived in a similar manner. We support some of these as they appear in DrIFT, some with modifications, and some that are brand new:

From
Has
Is
Set
LazySet
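
To make the idea of a query function concrete, here is a hand-written sketch of what Is-style and From-style queries could look like for a small type. The function names (isRGB, isCMYK, fromRGB) and their types are assumptions made for illustration; the code Derive actually generates may be named or shaped differently.

```haskell
data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Eq, Show)

-- "Is"-style queries (hypothetical names): test which constructor
-- was used to build a value.
isRGB :: Color -> Bool
isRGB (RGB _ _ _) = True
isRGB _           = False

isCMYK :: Color -> Bool
isCMYK (CMYK _ _ _ _) = True
isCMYK _              = False

-- "From"-style query (hypothetical name): extract the fields of a known
-- constructor. This is partial and fails on any other constructor.
fromRGB :: Color -> (Int, Int, Int)
fromRGB (RGB r g b) = (r, g, b)
fromRGB _           = error "fromRGB: not an RGB value"
```

None of these are class instances, but each follows mechanically from the shape of the data type, which is why they can be derived in the same way.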

Generics

We support the two classes from the first Scrap Your Boilerplate paper, and the classes from the Play library:

Typeable
Data
Play

Binary

We support the new Binary library, and the BinaryDefer library.

Binary
BinaryDefer

Testing

We support both QuickCheck and the SmallCheck library:

Arbitrary
Serial

Classhacking

From the HList library:

TTypeable

Missing

These derivations are in DrIFT, but not in Derive. If you need them, let us know and we'll implement them.

ATermConvertible - encode terms in the ATerm format.
BitsBinary - efficient binary encoding of terms.
FunctorM - derive reasonable fmapM implementation.
GhcBinary - byte sized binary encoding of terms.
HFoldable - Strafunski hfoldr.
Haskell2Xml - encode terms as XML (HaXml<=1.13).
Observable - HOOD observable.
RMapM - derive reasonable rmapM implementation.
Term - Strafunski representation via Dynamic.
XmlContent - encode terms as XML (HaXml>=1.14).

Using the Derive Program

Let's imagine we've defined a data type:

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Eq, Show)

Now we wish to derive Binary as well, and to switch Eq over to being derived by our library. To do this we simply add an annotation to the deriving clause.

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show {-! Eq, Binary !-})

Now running derive on the program containing this code will generate appropriate instances. How do you combine these instances back into the code? There are various mechanisms supported.
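
As a rough idea of what an appropriate instance looks like, the sketch below writes an Eq instance for Color by hand. It is an illustration only, not the literal output of Derive, whose generated text may differ. Note that to the compiler the {-! ... !-} annotation is an ordinary block comment, so the module still compiles unchanged and only Show is derived by the compiler itself.

```haskell
data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show {-! Eq, Binary !-})

-- Hand-written stand-in for the kind of Eq instance a derivation
-- would produce: one clause per constructor, plus a catch-all.
instance Eq Color where
  RGB x1 x2 x3     == RGB y1 y2 y3     = x1 == y1 && x2 == y2 && x3 == y3
  CMYK x1 x2 x3 x4 == CMYK y1 y2 y3 y4 = x1 == y1 && x2 == y2 && x3 == y3 && x4 == y4
  _                == _                = False
```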

Appending to the module

One way is to append the text to the bottom of the module; this can be done by passing the --append flag. Derive will then generate the required instances and place them at the bottom of the file, along with a checksum. Do not modify these instances.

Using CPP

Another way is to use CPP. Ensure your compiler is set up to run the C preprocessor. For example:

{-# OPTIONS_GHC -cpp #-}
{-# OPTIONS_DERIVE --output=file.h #-}

module ModuleName where

#include "file.h"

Side-by-side Modules

If you had Colour.Type, and wished to place the Binary instance in Colour.Binary, this can be done with:

{-# OPTIONS_DERIVE --output=Binary.hs --module=Colour.Binary --import #-}

Here you ask for the output to go to a particular file, give a specific module name and import this module. This will only work if the data structure is exported non-abstractly.

Using Template Haskell Derivations

One of Derive's major advantages over DrIFT is support for the Template Haskell (henceforth abbreviated "TH") system. This allows Derive to be invoked automatically during the compilation process, and (because it occurs with full access to the renamer tables) transparently supports deriving across module boundaries. The main disadvantage of TH-based deriving is that it is only portable to compilers that support TH; currently that is GHC only.

To use the TH deriving system, with the same example as before:

import Data.Derive.TH
import Data.Derive.Eq
import Data.Derive.Binary

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show)

$( derive makeEq ''Color )
$( derive makeBinary ''Color )

Note two things. First, we need to import the derivations. By convention, a derivation for a class FooBar is located in module Data.Derive.FooBar (nota bene: this need not be in package "derive") and is exported with the name makeFooBar. Secondly, we need to tell the compiler to insert the instance using the TH splice construct, $( ... ) (the spaces are optional). The splice causes the compiler to run the function derive (exported from Data.Derive.TH), passing arguments makeFooBar and ''Color. The second argument deserves more explanation; it is a quoted symbol, somewhat like a quoted symbol in Lisp and with deliberately similar syntax. (Two apostrophes are used to specify that this name is to be resolved as a type constructor; just 'Color would look for a data constructor named Color.)
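
The difference between the two quote forms can be tried out without Derive at all. The sketch below uses only the template-haskell library that ships with GHC; the binding names typeName and ctorName are made up for this example.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH (Name, nameBase)

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show)

-- Two apostrophes: resolve Color as a type constructor.
typeName :: Name
typeName = ''Color

-- One apostrophe: resolve RGB as a data constructor.
ctorName :: Name
ctorName = 'RGB
```

Both quotes produce an ordinary value of type Name, which is what a function such as derive receives as its second argument; nameBase turns such a name back into its unqualified string.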

Writing a New Derivation

There are two methods for writing a new derivation: guessing or coding. The guessing method is substantially easier if it works for you, but is limited to derivations with the following properties:

Inductive - each derivation must be similar to the previous one. Binary does not have this property as a 1 item derivation does not have a tag, but a 2 item derivation does.
Not inductive on the type - it must be an instance for the constructors, not for the type. Typeable violates this property by inducting on the free variables in the data type.
Not type based - the derivation must not change based on the types of the fields. Play and Functor both behave differently given differently typed fields.
Not record based - the derivation must not change on record fields. Show outputs the fields, so this is not allowed.
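
The record restriction is easy to see with the compiler's built-in Show deriving. In the small sketch below (the types Point and PointR are invented for the example), two types with the same fields produce differently shaped output purely because one of them uses record syntax, which is the kind of variation the guesser cannot infer.

```haskell
-- Same fields, once positional and once as a record.
data Point  = Point Int Int
            deriving (Show)

data PointR = PointR { px :: Int, py :: Int }
            deriving (Show)

positional :: String
positional = show (Point 1 2)    -- "Point 1 2"

labelled :: String
labelled = show (PointR 1 2)     -- "PointR {px = 1, py = 2}"
```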

If, however, your instance does meet these properties, you can use derivation by guess. Many instances do meet these conditions, for example Eq, Ord, Data and Serial.

Derivation by Guess

This is a unique feature of this library. You simply give an instance, and the program guesses what your instance derivation code should look like, and returns it. You paste the code in, and you have written an instance without learning any of the types or functions required to construct the abstract syntax. For example, let's take the Data instance. I recommend reading through the source in Data.Derive.Data first, then matching it to this description.

First copy the Data file, changing all the obvious bits (makeData etc) to whatever name you want. Next change the example to match your requirements. You basically define an instance for DataName which is defined as:

data DataName a = CtorZero
                | CtorOne  a
                | CtorTwo  a a
                | CtorTwo' a a
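
As a hedged illustration of what a sample instance over this type can look like, here is an Eq instance for DataName written by hand in a deliberately regular style. It is a sketch for this manual, not a copy of the example shipped in the library, and the y1, y2 names for the right-hand argument are an assumption.

```haskell
data DataName a = CtorZero
                | CtorOne  a
                | CtorTwo  a a
                | CtorTwo' a a

-- Each clause follows the same pattern as the one before it: match the
-- same constructor on both sides and compare the fields pairwise.
instance Eq a => Eq (DataName a) where
  CtorZero       == CtorZero       = True
  CtorOne x1     == CtorOne y1     = x1 == y1
  CtorTwo x1 x2  == CtorTwo y1 y2  = x1 == y1 && x2 == y2
  CtorTwo' x1 x2 == CtorTwo' y1 y2 = x1 == y1 && x2 == y2
  _              == _              = False
```

The four constructors cover arities zero, one and two, plus a second constructor of arity two, so a pattern in the clauses can be generalised both over the number of fields and over the constructor name.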

Try to make your declaration as inductive as possible. Use x1 etc for variable names within a constructor match. Place all the constructors in the correct order. If you cannot see an obvious pattern, then the guesser won't either. Once we have written our sample instance:

> ghci Data.Derive.Data -DGUESS
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base ... linking ... done.
Ok, modules loaded: Data.Derive.Data, Data.DeriveGuess, Language.Haskell.TH.All,
 Language.Haskell.TH.SYB, Language.Haskell.TH.Data, Language.Haskell.TH.FixedPpr
, Language.Haskell.TH.Helper, Language.Haskell.TH.Peephole.
*Data.Derive.Data> guess example

makeData = Derivation data' "Data"
data' dat = [instance_context ["Data","Typeable"] "Data" dat [(FunD (mkName
    "gfoldl") ((map (\(ctorInd,ctor) -> (Clause [(VarP (mkName "k")),(VarP (
    mkName "r")),(ConP (mkName ("" ++ ctorName ctor)) ((map (\field -> (VarP (
    mkName ("x" ++ show field)))) (id [1..ctorArity ctor]))++[]))] (NormalB (
    foldr1With (VarE (mkName "k")) ((map (\field -> (VarE (mkName ("x" ++ show
    field)))) (reverse [1..ctorArity ctor]))++[(AppE (VarE (mkName "r")) (ConE
    (mkName ("" ++ ctorName ctor))))]++[]))) [])) (id (zip [0..] (dataCtors dat
    ))))++[]))]]

And that's it. The block of code spewed out will generate Data instances; we just paste it back into the file.
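
The generated abstract syntax is hard to read, so it can help to compare it with a Data instance for DataName written out by hand. The sketch below uses only Data.Data from base and follows the standard shape of gfoldl, one clause per constructor, which is what the generated clauses appear to build. The helper names (dataNameType, conZero and so on) are invented here, and the gunfold, toConstr and dataTypeOf methods are included only so the instance is complete.

```haskell
import Data.Data

data DataName a = CtorZero
                | CtorOne  a
                | CtorTwo  a a
                | CtorTwo' a a

-- Run-time descriptions of the type and its constructors (names invented).
dataNameType :: DataType
dataNameType = mkDataType "DataName" [conZero, conOne, conTwo, conTwo']

conZero, conOne, conTwo, conTwo' :: Constr
conZero = mkConstr dataNameType "CtorZero" [] Prefix
conOne  = mkConstr dataNameType "CtorOne"  [] Prefix
conTwo  = mkConstr dataNameType "CtorTwo"  [] Prefix
conTwo' = mkConstr dataNameType "CtorTwo'" [] Prefix

instance Data a => Data (DataName a) where
  -- Start from `r Ctor`, then apply `k` to each field, left to right.
  gfoldl _ r CtorZero         = r CtorZero
  gfoldl k r (CtorOne x1)     = r CtorOne `k` x1
  gfoldl k r (CtorTwo x1 x2)  = (r CtorTwo `k` x1) `k` x2
  gfoldl k r (CtorTwo' x1 x2) = (r CtorTwo' `k` x1) `k` x2

  -- Rebuild a value from a constructor description (indices start at 1).
  gunfold k z c = case constrIndex c of
    1 -> z CtorZero
    2 -> k (z CtorOne)
    3 -> k (k (z CtorTwo))
    _ -> k (k (z CtorTwo'))

  toConstr CtorZero       = conZero
  toConstr (CtorOne _)    = conOne
  toConstr (CtorTwo _ _)  = conTwo
  toConstr (CtorTwo' _ _) = conTwo'

  dataTypeOf _ = dataNameType
```

With this instance in scope, generic functions such as gmapQ work on DataName values, for example to count the immediate fields of a constructor.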

There is lots of clever stuff, induction hypotheses etc going on behind all this. If you have an instance which you think should be inferable, but isn't, then let me know.

Derivation by Coding

We use the Template Haskell data types extensively; for examples, take a look at Binary and Functor. It's not particularly hard, but it is harder than just having them guessed.
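
To give a flavour of working with the Template Haskell data types directly, the sketch below inspects a data type with reify and computes the name and arity of each constructor at compile time. It uses only the template-haskell library and none of Derive's own helpers; the name ctorInfo is invented, and the DataD pattern assumes a reasonably recent GHC in which that constructor has six fields.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH

data Color = RGB Int Int Int
           | CMYK Int Int Int Int
           deriving (Show)

-- An empty declaration splice: it closes the declaration group above,
-- so that Color is visible to reify in the splice below.
$(return [])

-- Constructor names and arities, computed while the module is compiled.
ctorInfo :: [(String, Int)]
ctorInfo = $(do
    info <- reify ''Color
    case info of
      TyConI (DataD _ _ _ _ cons _) ->
        listE [ tupE [ stringE (nameBase n)
                     , litE (integerL (fromIntegral (length fs))) ]
              | NormalC n fs <- cons ]
      _ -> fail "expected a plain data type")
```

A coded derivation does essentially the same traversal, but builds instance declarations from the constructor list instead of a list of pairs.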