Programming with binders using Unbound ====================================== *Names* are the bane of every language implementation: they play an unavoidable, central role, yet are tedious to deal with and surprisingly tricky to get right. Unbound is a flexible and powerful library for programming with names and binders, which makes programming with binders easy and painless. Built on top of RepLib's generic programming framework, it does a lot of work behind the scenes to provide you with a seamless, "it just works" experience. This literate Haskell tutorial will walk you through the basics of using Unbound. The [Haddock documentation for Unbound](http://hackage.haskell.org/package/unbound) is also a good source of information. For something more academic, you may also be interested in reading * [Stephanie Weirich, Brent Yorgey, and Tim Sheard. Binders Unbound.](http://www.cis.upenn.edu/~byorgey/papers/binders-unbound.pdf) Submitted, March 2011. The untyped lambda calculus --------------------------- Let's start by writing a simple untyped lambda calculus interpreter. This will illustrate the basic functionality of Unbound. **Preliminaries** First, we need to enable lots of wonderful GHC extensions: > {-# LANGUAGE MultiParamTypeClasses > , TemplateHaskell > , ScopedTypeVariables > , FlexibleInstances > , FlexibleContexts > , UndecidableInstances > #-} You may be worried by `UndecidableInstances`. Sadly, this is necessary in order to typecheck the code generated by RepLib. Rest assured, however, that the instances generated by RepLib *are* decidable; it's just that GHC can't prove it. Now to import the library: > import Unbound.LocallyNameless We import the locally nameless implementation of Unbound (A nominal implementation is also provided in `Unbound.Nominal`. However, at the moment it is likely full of bugs and is poorly documented, so we recommend sticking with the locally nameless implementation for now.) A few other imports we'll need for this particular example: > import Control.Applicative > import Control.Arrow ((+++)) > import Control.Monad > import Control.Monad.Trans.Maybe > import Control.Monad.Trans.Error > > import Text.Parsec hiding ((<|>), Empty) > import qualified Text.Parsec.Token as P > import Text.Parsec.Language (haskellDef) > > import qualified Text.PrettyPrint as PP > import Text.PrettyPrint (Doc, (<+>)) **Representing terms** We now declare a `Term` data type to represent lambda calculus terms. > data Term = Var (Name Term) > | App Term Term > | Lam (Bind (Name Term) Term) > deriving Show The `App` constructor is straightforward, but the other two constructors are worth looking at in detail. First, the `Var` constructor holds a `Name Term`. `Name` is an abstract type for representing names, provided by Unbound. `Name`s are indexed by the sorts of things to which they can refer (or more precisely, the sorts of things which can be substituted for them). Here, a variable is simply a name for some `Term`, so we use the type `Name Term`. Lambdas are where names are *bound*, so we use the special `Bind` combinator, also provided by the library. Something of type `Bind p b` represents a pair consisting of a *pattern* `p` and a *body* `b`. The pattern may bind names which occur in `b`. Here is where the power of generic programming comes into play: we may use (almost) any types at all as patterns and bodies, and Unbound will be able to handle it with very little extra guidance from us. In this particular case, a lambda simply binds a single name, so the pattern is just a `Name Term`, and the body is just another `Term`. Now we tell RepLib to automatically derive a bunch of behind-the-scenes, boilerplate instances for `Term`: > $(derive [''Term]) There are just a couple more things we need to do. First, we make `Term` an instance of `Alpha`, which provides most of the methods we will need for working with the variables and binders within `Term`s. > instance Alpha Term What, no method definitions? Nope! In this case (and in most cases) the default implementations, written in terms of those generic instances we had RepLib derive for us, work just fine. But in special situations it's possible to override specific methods in the `Alpha` class with our own implementations (see the documentation for an example). We only need to provide one more thing: a `Subst Term Term` instance. In general, an instance for `Subst b a` means that we can use the `subst` function to substitute things of type `b` for `Name`s occurring in things of type `a`. The only method we must implement ourselves is `isvar`, which has the type isvar :: a -> Maybe (SubstName a b) The documentation for `isvar` states "If the argument is a variable, return its name wrapped with the 'SubstName' constructor. Return `Nothing` for non-variable arguments." Even the most sophisticated generic programming library can't read our minds: we have to tell it which values of our data type are variables (*i.e.* things that can be substituted for). For `Term` this is not hard: > instance Subst Term Term where > isvar (Var v) = Just (SubstName v) > isvar _ = Nothing That's all! **Trying things out** Now that we've got the necessary preliminaries set up, what can we do with this? First, let's define some convenient helper functions: > lam :: String -> Term -> Term > lam x t = Lam $ bind (string2Name x) t > > var :: String -> Term > var = Var . string2Name Notice that `string2Name` allows us to create a `Name` from a `String`, and `bind` allows us to construct bindings. We can test things out at the `ghci` prompt like so: *Main> lam "x" (lam "y" (var "x")) Lam ( Lam ( Var 1@0)) Don't worry about the `1@0` thing: Unbound handles all the details of this for you. However, if you must know, it is a *de Bruijn index*, which refers to the 0th variable of the 1st (counting outwards from 0) enclosing binding site; that is, to `x`. Recall that the left-hand side of a `Bind` can be an arbitrary data structure potentially containing multiple names (a *pattern*), like a pair or a list; hence the need for the index after the `@`. Of course, in this particular example we only ever bind one name at once, so the index after the `@` will always be zero. We can check that substitution works as we expect. Substituting for `x` in a term where `x` does not occur free has no effect: *Main> subst (string2Name "x") (var "z") (lam "x" (var "x")) Lam ( Var 0@0) If `x` does occur free, the substitution takes place as expected: *Main> subst (string2Name "x") (var "z") (lam "y" (var "x")) Lam ( Var z) Finally, substitution is capture-avoiding: *Main> subst (string2Name "x") (var "y") (lam "y" (var "x")) Lam ( Var y) It may look at first glance like `y` has been incorrectly captured, but the fact that it has a *name* means it is free: if it had been captured we would see `Lam ( Var 0@0)`. **Evaluation** The first thing we want to do is write an evaluator for our lambda calculus. Of course there are many ways to do this; for the sake of simplicity and illustration, we will write an evaluator based on a small-step, call-by-value operational semantics. > -- A convenient synonym for mzero > done :: MonadPlus m => m a > done = mzero > > step :: Term -> MaybeT FreshM Term > step (Var _) = done > step (Lam _) = done > step (App (Lam b) t2) = do > (x,t1) <- unbind b > return $ subst x t2 t1 > step (App t1 t2) = > App <$> step t1 <*> pure t2 > <|> App <$> pure t1 <*> step t2 We define a `step` function with the type `Term -> MaybeT FreshM Term`. `FreshM` is a monad provided by the binding library to handle fresh name generation. It's fairly simple but works just fine in many situations. (If you need to, you can create your own custom monad, make it an instance of the `Fresh` class, and use it in place of `FreshM`.) In order to signal whether a reduction step has taken place, we add failure capability with the `MaybeT` monad transformer. We may freely intermix `FreshM` (which also comes in a transformer variant, `FreshMT`) with all the standard monad transformers found in the `transformers` package. `step` tries to reduce the given term one step if possible. Variables and lambdas cannot be reduced at all, so in those cases we signal that we are done. If the input term is an application of a lambda to another term, we must do a beta-reduction. We first use `unbind` to destruct the binding inside the `Lam` constructor; it automatically chooses a fresh name for the bound variable and gives us back a pair of the variable and body. We then call `subst` to perform the appropriate substitution. Otherwise, we must have an application of something other than a lambda. In this case we try reducing first the left-hand and then the right-hand term. Finally, we define an `eval` function as the transitive closure of `step`, and run it with `runFreshM`: > tc :: (Monad m, Functor m) => (a -> MaybeT m a) -> (a -> m a) > tc f a = do > ma' <- runMaybeT (f a) > case ma' of > Just a' -> tc f a' > Nothing -> return a > > eval :: Term -> Term > eval x = runFreshM (tc step x) **Parsing** We can use [Parsec](http://hackage.haskell.org/package/parsec) to write a tiny parser for our lambda calculus: > lexer = P.makeTokenParser haskellDef > parens = P.parens lexer > brackets = P.brackets lexer > ident = P.identifier lexer > > parseTerm = parseAtom `chainl1` (pure App) > > parseAtom = parens parseTerm > <|> var <$> ident > <|> lam <$> (brackets ident) <*> parseTerm > > runTerm :: String -> Either ParseError Term > runTerm = (id +++ eval) . parse parseTerm "" In fact, there's nothing particularly special about this parser with respect to the binding library: we just get to reuse our `var` and `lam` functions from before, with the result that strings like `"([x] [y] x) x"` are parsed into terms with all the scoping properly resolved. To check that it works, let's compute 2 + 3: *Main> runTerm "([m][n][s][z] m s (n s z)) ([s] [z] s (s z)) ([s][z] s (s (s z))) s z" Right (App (Var s) (App (Var s) (App (Var s) (App (Var s) (App (Var s) (Var z)))))) 2 + 3 is still 5, and all is right with the world. **Pretty-printing and LFresh** Now we want to write a pretty-printer for our lambda calculus (to use in our fantastic type checking error messages, once we get around to adding an amazing, sophisticated type system). Here's a first attempt: > class Pretty' p where > ppr' :: (Applicative m, Fresh m) => p -> m Doc > > instance Pretty' Term where > ppr' (Var x) = return . PP.text . show $ x > ppr' (App t1 t2) = PP.parens <$> ((<+>) <$> ppr' t1 <*> ppr' t2) > ppr' (Lam b) = do > (x, t) <- unbind b > ((PP.brackets . PP.text . show $ x) <+>) <$> ppr' t However, there's a problem: *Main> runFreshM $ ppr' (lam "x" (lam "y" (lam "z" (var "y")))) [x] [y1] [z2] y1 Ugh, what are those numbers doing there? The problem is that `unbind` always generates a new globally fresh name no matter what other names are or aren't in scope. This is fine for evaluation, but for pretty-printing terms that include bound names it's rather ugly. For nicer printing we'll need something a bit more sophisticated. That something is the `LFresh` type class, which gives a slightly different interface for generating *locally fresh* names (as opposed to `Fresh` which generates globally fresh names). A standard `LFreshM` monad is provided (along with a corresponding transformer, `LFreshMT`) which is an instance of `LFresh`. class Monad m => LFresh m where -- | Pick a new name that is fresh for the current (implicit) scope. lfresh :: Rep a => Name a -> m (Name a) -- | Avoid the given names when freshening in the subcomputation. avoid :: [AnyName] -> m a -> m a Monads which are instances of `LFresh` maintain a set of "currently in-scope" names which are to be avoided when generating new names. `lfresh` generates a name which is guaranteed not to be in the set, and `avoid` runs a subcomputation with some additional names added to the in-scope set. You probably won't need to call these methods explicitly very often; more useful are some methods built on top of these such as `lunbind`: lunbind :: (LFresh m, Alpha p, Alpha t) => Bind p t -> ((p, t) -> m r) -> m r `lunbind` corresponds to `unbind` but works in an `LFresh` context. It destructs a binding, avoiding only names curently in scope, and runs a subcomputation while additionally avoiding the chosen name(s). Let's rewrite our pretty-printer in terms of `LFresh`. The only change we need to make is to use continuation-passing style for the call to `lunbind` in place of the normal monadic sequencing used with `unbind`. > class Pretty p where > ppr :: (Applicative m, LFresh m) => p -> m Doc > > instance Pretty Term where > ppr (Var x) = return . PP.text . show $ x > ppr (App t1 t2) = PP.parens <$> ((<+>) <$> ppr t1 <*> ppr t2) > ppr (Lam b) = > lunbind b $ \(x,t) -> > ((PP.brackets . PP.text . show $ x) <+>) <$> ppr t Let's try it: *Main> runLFreshM $ ppr (lam "x" (lam "y" (lam "z" (var "y")))) [x] [y] [z] y *Main> runLFreshM $ ppr (lam "x" (lam "y" (lam "y" (var "y")))) [x] [y] [y1] y1 Much better! Note: the tutorial from this point on is still under construction, so expect some rough edges -- although you may still find the material useful! A simple dependent calculus --------------------------- To illustrate some of the more advanced features of RepLib's binding library, let's consider a simple dependent calculus, defined as follows: [XXX put ott output here or something? How to present the calculus?] This is about a simple as we can get while retaining dependency of types on terms, but it is already rather interesting with regards to binding structure. The main point of interest is the way that *telescopes* work: in a term such as [XXX \[A:*, B:A -> *, x:A, t:A -> B x]. t x] every variable bound in the telescope is in scope not only in the body of the abstraction but also in the type annotations of later bindings in the telescope. For example, `x` shows up both in the type of `t` and in the body of the abstraction. We can imagine a way to encode this using only `Bind`, but it would be rather ugly. [XXX explain why it would be ugly: subtrees etc., doesn't correspond to way we have imagined the syntax, etc.] Instead, we can define a type `Exp` of expressions like this: > data Exp = EVar (Name Exp) > | EStar > | ELam (Bind Tele Exp) > | EApp Exp [Exp] > | EPi (Bind Tele Exp) > deriving Show There's nothing remarkable about this yet; the definition of `Exp` corresponds exactly to the grammar we gave for expressions earlier, and refers to a data type `Tele` of telescopes. However, we can already see that the definition of `Tele` will have to be somewhat interesting: `ELam` and `EPi` declare telescopes as patterns which bind variables within the body (an `Exp`), but as we noted before, telescopes also have their own internal binding structure. > data Tele = Empty > | Cons (Rebind (Name Exp, Embed Exp) Tele) > deriving Show A telescope can be empty, of course, or else it is a variable binding like `(x:A)` followed by another telescope, in which the variable is bound. However, it won't do to use `Bind`: the variable is bound *not only* in the following telescope, but *also* in the body of the abstraction which forms the outer context. So instead of `Bind` we use `Rebind`. `Rebind p b` specifies that the pattern `p` is bound in the body `b`, but is *also* made available to be bound in another outer context. (Of course, that outer context might itself be a `Rebind`, in which case the same variable would be bound in yet another outer context, and so on.) We are not quite done: `Rebind (Name Exp, Exp) Tele` would not be correct, since this would specify that any variables occurring in the `Exp` are bound in the telescope (and also in the outer context), but this is not correct. The `Exp` is a type annotation for the name, and any names occurring in it are actually *references* to previously bound names, not binding sites themselves. For this purpose the `Embed` wrapper is provided, which specifies that the wrapped type -- which would otherwise be considered a binding pattern -- is only an annotation whose names refer back to previous bindings. Pop quiz: why would | Cons (Rebind (Name Exp) (Embed Exp, Tele)) also be incorrect? (Answer: because then variables would be bound within their own type annotations.) Now for some instances: we derive generic representations for `Exp` and `Tele`, and make them both instances of `Alpha`. We also define a `Subst` instance so we can substitute expressions for variables in other expressions. > $(derive [''Exp, ''Tele]) > > instance Alpha Exp > instance Alpha Tele > instance Subst Exp Exp where > isvar (EVar v) = Just (SubstName v) > isvar _ = Nothing We also need to be able to substitute expressions for variables occurring in telescopes. However, since telescopes do not contain free expression variables directly (only binding sites, which are never the target of a substitution), the default definition of `isvar = const Nothing` is all we need, and the generic programming framework takes care of the rest. > instance Subst Exp Tele We define some convenient smart constructors as before: > evar :: String -> Exp > evar = EVar . string2Name > > elam :: [(String, Exp)] -> Exp -> Exp > elam t b = ELam (bind (mkTele t) b) > > epi :: [(String, Exp)] -> Exp -> Exp > epi t b = EPi (bind (mkTele t) b) > > earr :: Exp -> Exp -> Exp > earr t1 t2 = epi [("_", t1)] t2 > > eapp :: Exp -> Exp -> Exp > eapp a b = EApp a [b] > > mkTele :: [(String, Exp)] -> Tele > mkTele [] = Empty > mkTele ((x,e) : t) = Cons (rebind (string2Name x, Embed e) (mkTele t)) These are fairly straightforward, and we note only the second case of `mkTele`, where we use `rebind` for creating a `Rebind` structure, and wrap `e` in an `Embed` constructor. We can test things out so far by creating a few example terms. Here is the polymorphic identity function: *Main> elam [("A", EStar), ("x", evar "A")] (evar "x") ELam (<(Cons (<<(A,{EStar})>> Cons (<<(x,{EVar 0@0})>> Empty)))> EVar 0@1) Inside the `ELam` we have the whole telescope inside angle brackets (indicating the entire thing is a binding pattern), followed by the body of the lambda, `EVar 0@1`, indicating it is a reference to the first enclosing binding pattern (that is, the telescope), and specifically to the second variable bound within the pattern (that is, `x`). Within the telescope, there is a binding for `A` with an annotation (in curly braces) of `EStar`, followed by a binding for `x`, with an annotation of `EVar 0@0` -- that is, `A`. *Main> :{ *Main| elam [ ("A", EStar) *Main| , ("B", earr (evar "A") EStar) *Main| , ("x", evar "A") *Main| , ("t", earr (evar "A") (eapp (evar "B") (evar "x"))) *Main| ] *Main| (eapp (evar "t") (evar "x")) *Main| :} ELam (<(Cons (<<(A,{EStar})>> Cons (<<(B,{EPi (<(Cons (<<(_,{EVar 0@0})>> Empty))> EStar)})>> Cons (<<(x,{EVar 1@0})>> Cons (<<(t,{EPi (<(Cons (<<(_,{EVar 2@0})>> Empty))> EApp (EVar 2@0) [EVar 1@0])})>> Empty)))))> EApp (EVar 0@3) [EVar 0@2]) We will also need functions for appending two telescopes, and for looking up a name in a telescope. Both these functions illustrate the use of `unrebind`, which opens a `Rebind` structure similarly to the way that `unbind` opens `Bind`s. However, it is different in one important respect: while the output of `unbind` must be in a monad for fresh name generation, the output of `unrebind` is pure. This is because `Rebind`s can only ever occur within an enclosing `Bind`, so by the time we get around to opening a `Rebind`, fresh names have already been chosen for the binders by a call to `unbind`, and they need not be freshened again. We also define a monad `M` which will serve as the context for our type checker. > appTele :: Tele -> Tele -> Tele > appTele Empty t2 = t2 > appTele (Cons rb) t2 = Cons (rebind p (appTele t1' t2)) > where (p, t1') = unrebind rb > > type M = ErrorT String LFreshM > > lookUp :: Name Exp -> Tele -> M Exp > lookUp n Empty = throwError $ "Not in scope: " ++ show n > lookUp v (Cons rb) | v == x = return a > | otherwise = lookUp v t' > where ((x, Embed a), t') = unrebind rb (We also note in passing that `appTele` and `lookUp` would be perfect opportunities to use GHC's `ViewPatterns`, but for simplicity's sake we leave this fun to the reader.) We can now write a type checker for our toy language. From the point of view of the binding library, there's nothing too remarkable about it: we use `lunbind` to take apart binders when inferring the types of lambdas, applications, and pis, and use substitution both when checking the types of argument lists (`checkList`) and when substituting the arguments to an application into the type of the result (`multiSubst`). > unPi :: Exp -> M (Bind Tele Exp) > unPi (EPi bnd) = return bnd > unPi e = throwError $ "Expected pi type, got " ++ show e ++ " instead" > > infer :: Tele -> Exp -> M Exp > infer g (EVar x) = lookUp x g > infer _ EStar = return EStar > infer g (ELam bnd) = do > lunbind bnd $ \(delta, m) -> do > b <- infer (g `appTele` delta) m > return . EPi $ bind delta b > infer g (EApp m ns) = do > bnd <- unPi =<< infer g m > lunbind bnd $ \(delta, b) -> do > checkList g ns delta > multiSubst delta ns b > infer g (EPi bnd) = do > lunbind bnd $ \(delta, b) -> do > check (g `appTele` delta) b EStar > return EStar > > check :: Tele -> Exp -> Exp -> M () > check g m a = do > b <- infer g m > checkEq b a > > checkList :: Tele -> [Exp] -> Tele -> M () > checkList _ [] Empty = return () > checkList g (e:es) (Cons rb) = do > let ((x, Embed a), t') = unrebind rb > check g e a > checkList (subst x e g) (subst x e es) (subst x e t') > checkList _ _ _ = throwError $ "Unequal number of parameters and arguments" > > multiSubst :: Tele -> [Exp] -> Exp -> M Exp > multiSubst Empty [] e = return e > multiSubst (Cons rb) (e1:es) e = multiSubst t' es e' > where ((x,_), t') = unrebind rb > e' = subst x e1 e > multiSubst _ _ _ = throwError $ "Unequal lengths in multiSubst" -- shouldn't happen > > -- A conservative, inexpressive notion of equality, just for the sake > -- of the example. > checkEq :: Exp -> Exp -> M () > checkEq e1 e2 = if aeq e1 e2 > then return () > else throwError $ "Couldn't match: " ++ show e1 ++ " " ++ show e2 XXX insert type *checking* example (checking pi-type of a lambda) as illustration of unbind2