Recent posts

The new code generator is nearly ready to go live

I've just spent another week working on the Glorious New Code Generator, and I'm pleased to say it's now in a state that we can think about switching over.

Just to recap, the "new code generator" is a replacement for the STG->Cmm phase of the compiler, using a new intermediate representation based on  Hoopl , and separating out the mechanics of layout out stack frames from code generation. It's mainly an architectural change with little visible benefit, but it will be a big improvement in terms of flexibility and modularity inside the compiler. We'll be able to express many optimisations that were hard before, e.g. #1498.

Furthermore, many bits of cruft will go away, such as large chunks of the horrible CoreToStg pass, and the SRT analysis. These things are replaced by much cleaner analyses of the generated code using Hoopl.

For more details on the new code generator see Commentary/Compiler/NewCodeGen, although that page needs a lot of updating after my recent hacking.

The current status in more detail:

  • everything works, with the possible exception of profiling. I've just got a stage2 compiler working, and the whole test suite passes with no new failures.
  • generated code is on average as good as before, sometimes better and sometimes worse. I have a few examples of tight loops that get better with the new code gen, but of course for complex low-level loopy code LLVM will still be necessary to get really good code. On the other hand, GHC can take advantage of its better knowledge about aliasing, and I expect to make more use of this in the future. There's quite a bit of scope for generating better code and adding cool Hoopl-based optimisations later. Things are definitely looking up for low-level performance in GHC.
  • compile times will get a bit worse, at least initially. This is where I've been spending a lot of effort: when I started work on the new code generator it was adding more than 100% to compile times, and I've got that down to about 5% for optimised compilation, 10-15% for non-optimised. Compile times will improve further when we switch over properly and we can start removing a lot of cruft.

We now have a 7.6 branch, so my plan is to fix the one remaining blocker (profiling) and then switch over. There will no doubt be breakage that hasn't shown up so far, but we have until 7.8 to find and fix it.

EDIT: I should mention that the new code generator is the work of many people, including Norman Ramsey, Simon Peyton Jones, John Dias, Edward Yang and probably others I've forgotten.

  • Posted: 2012-08-03 08:13 (Updated: 2012-08-03 08:27)
  • Author: simonmar
  • Categories: (none)
  • Comments (2)

An overhaul of stack management, and some performance improvements

I just  committed a patch I've been working on for a few days that revamps the way thread stacks are managed in the runtime system. There'll be some nice performance improvements coming in 7.2.1 as a result of this.

Background

In GHC 7.0 and earlier, a thread is represented by a TSO object in the heap. The TSO contains:

  • some thread-specific state and flags
  • a couple of link fields for linking the TSO onto lists
  • the stack, growing backwards from the end of the TSO

when the stack overflows, as long as the maximum stack size (+RTS -K<size>) has not yet been reached, then the runtime system would create a new TSO double the size of the old one, copy the stack and the thread state into the new object, and mark the old TSO as a ThreadRelocated pointing to the new one. This last step is important, because the TSO might be reachable from other places: a ThreadId is basically a pointer to the TSO, for instance.

This ThreadRelocated thing is a nuisance, because it means every time we derefernce a TSO pointer, we have to check for ThreadRelocated in a little loop. This loop was enshrined in a function deRefTSO() in the RTS, and was called from various places. Over the years we've had several bugs caused by a missing deRefTSO.

Overheads of the old way

Obviously there's a certain amount of overhead associated with copying the whole stack every time we need to enlarge the TSO: every stack word is written approximately twice. What's worse, though, is that the entire stack is traversed during every GC, if the thread is active (threads that haven't run since the last GC are marked "clean" and not traversed). If the thread has a deep stack, this makes minor GCs expensive, which shows up as a high GC overhead. A common workaround is to use a larger allocation area, e.g. +RTS -A4m, but that has the disadvantage of making the allocation area larger than the L2 cache, which also hurts performance.

A better way: stack chunks

The solution to the repeated traversal problem is to identify the parts of the stack that are unmodified since the last GC, and not traverse them. The GC already does a lot of this kind of thing - it's called a "write barrier", and it's a basic requirement for generational GC.

There are two ways we could do this: either do a card-marking trick like we do for mutable arrays, or we could divide the stack up into multiple objects and mark each object as either clean or dirty. The card-marking option would have been fairly complicated to administer, so we discarded that idea. In contrast, having multiple objects containing parts of the stack was relatively straightforward.

Perhaps we should have done this ages ago. I put it off because I thought it would be complex - we have lots of places in the RTS that traverse stacks. It turns out that the majority of these places don't care about stack chunks, which is nice.

Death to ThreadRelocated

The obvious solution to the ThreadRelocated problem is to separate the stack from the TSO, and that's exactly what we've done. Now the TSO contains the thread state only, and it points to a separate STACK object for the topmost stack chunk, containing the stack pointer and the stack itself. The result is that a fragile invariant goes away, which is a very good thing.

Why didn't we do this before? Well, in fact I did try making this change a long time ago, and found that it made performance worse for some thread benchmarks. I put it down to the extra cache-line fill required to access the stack pointer in the separate STACK object. However, this time the results look pretty good - I'm not entirely sure why, maybe it's the combination of doing this with stack chunks, or maybe I did something wrong last time, who knows (or cares!).

Performance

Here are some performance figures from a small collection of thread benchmarks ( http://darcs.haskell.org/nofib/smp). This is comparing yesterday's HEAD with my working branch:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed
--------------------------------------------------------------------------------
    callback001        +289.3%    -13.5%     +0.8%     +1.2%
    callback002        +302.0%    -13.5%     -1.3%     -1.3%
          sieve        +329.2%     -0.2%     -1.6%     -1.3%
     threads001        +292.4%     +0.0%     +4.2%     +4.5%
     threads003        +298.9%     -0.5%    -36.5%    -36.4%
     threads006        +304.1%     -2.2%    -66.4%    -66.4%
     threads007        +409.7%     +0.0%     +2.4%     +2.5%
--------------------------------------------------------------------------------
            Min        +289.3%    -13.5%    -66.4%    -66.4%
            Max        +409.7%     +0.0%     +4.2%     +4.5%
 Geometric Mean        +316.3%     -4.5%    -19.3%    -19.2%

Ignore the size figures, it's because the HEAD build was using -split-objs.

We can see that several benchmarks display no difference or get very slightly worse, but a couple (threads003 and threads006) see a big jump in performance. threads003 is an old variant on threadring, and threads006 is a microbenchmark for throwTo. I don't fully understand the difference in performance here, but typically I don't investigate when things go faster, only when they go slower.

GHC itself got a little faster: 1-2% when compiling Cabal.

But what you really wanted to know was... what about threadring, right? Well, it currently goes a few percent slower with the new changes. However, I notice that it's allocating a lot more than before, so I think there's something suspicious going on. I shall investigate...

Update: some more results

BinaryTrees, from the shootout:

  • before: 64.26s
  • after: 17.61s

So I guess the problem with BinaryTrees was more to do with repeated stack traversals than it was to do with a low mortality rate. Nice that we can get good performance from this program without tweaking the heap settings at all now.

Hackage benchmarks, from David Peixotto's fibon benchmark suite:

--------------------------------------------------------------------------------
        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
           Agum        +306.5%     +3.5%     +0.8%     +0.9%     +0.0%
          Bzlib          -----     -----     -----     -----     -----
           Cpsa        +125.1%     +2.1%     -2.2%     -2.3%     +0.0%
         Crypto          -----     -----     -----     -----     -----
            Fgl        +408.9%     +0.1%     +2.0%     +2.0%     -5.4%
            Fst        +137.7%     -0.0%     -4.4%     -4.3%     +0.0%
         Funsat        +151.4%     +0.2%     +1.7%     +1.8%     -2.0%
             Gf         13990k 13885653k      9.03      9.60   116736k
          HaLeX        +195.9%     +0.0%     +1.2%     +1.1%     +0.0%
          Happy        +135.0%     +3.3%     -5.3%     -5.4%    -13.7%
         Hgalib        +236.7%     +1.0%     +0.7%     +0.7%     +0.0%
    Palindromes        +243.1%    -10.1%     +3.4%     +3.4%     +2.1%
          Pappy        +148.9%     +0.5%     +3.7%     +3.8%    -17.7%
     QuickCheck        +139.5%     -0.1%     -1.6%     -1.6%     +0.0%
          Regex        +182.1%     +0.0%     +3.5%     +3.5%    -21.4%
          Simgi        +159.3%     +0.0%     +6.9%     +6.9%     +0.0%
   TernaryTrees        +519.4%     -2.0%    -32.4%    -32.7%    -14.3%
          Xsact        +246.4%     +1.2%     +0.9%     +0.9%     -0.6%
--------------------------------------------------------------------------------
            Min        +125.1%    -10.1%    -32.4%    -32.7%    -21.4%
            Max        +519.4%     +3.5%     +6.9%     +6.9%     +2.1%
 Geometric Mean        +207.6%     -0.1%     -1.9%     -1.9%     -5.2%

One or two nice ones in there: TernaryTrees, Happy, Fst all look good. Simgi probably deserves further investigation.

Notice how the overall memory size is reducing too, by 5% on average.

Update 2: threadring results

threadring 10000000, without -threaded:

  • 6.12.3: 1.50s
  • 7.0.1: 2.05s
  • HEAD with stack patches: 1.67s

So the stack patches recovered almost all the performance lost between 6.12.3 and 7.0.1 on non-threaded threadring. The performance lost in 7.0.1 was due to the BLACKHOLE changes and some other changes to fix pathological cases of bad MVar performance.

More interesting are the -threaded numbers:

  • 6.12.3: 5.86s
  • 7.0.1: 3.00s
  • HEAD with stack patches: 2.31s

The new code is very clearly winning here. I spent a little while with the perf tool on Linux trying to understand the results, but wasn't able to fully account for it, sadly.

  • Posted: 2010-12-15 08:29 (Updated: 2010-12-22 08:19)
  • Author: simonmar
  • Categories: (none)
  • Comments (2)

New directions for Template Haskell

Nota bene: TemplateHaskell/BlogPostChanges is a copy of this blog post, but with subsequent edits and improvements. Don't pay too much attention to the text below; I'm keeping it only so that you can see the context for comments.

This post explores a set of design proposals for Template Haskell. They are inspired by discussion with Tim Sheard, Kathleen Fisher, and Jacques Carette. It was originally triggered by several Template Haskell tickets: including #4230, #4135, #4128, #4170, #4125, #4124, #4364, #6062, #6089. (See also #7016, which work better with the suggestions below.) Taken together, these proposals would make quite a big change to TH, I think for the better. Happily, I'm pretty sure they are relatively easy to implement.

There's an interesting  critique of Template Haskell on StackOverflow, some (but not all) of which is addressed by proposals here.

But I'd like to get the design right; hence this post. I'm going to treat it as a working draft, and modify it in the light of comments. So please comment.

I'm going to assume that you are more or less familiar with Template Haskell. If not, there's lots of background on the  Template Haskell page. It's a Wiki so you can help to improve it.

Here's the implementation page? the describes how we are going to implement this stuff.

Some brief background

After parsing, GHC runs two completely separate passes, one after the other:

  • The renamer resolves scopes, fixing precisely which binding site is connected which occurrence of every variable. For example, you write
    let x = \x -> x+1 in x
    
    and the renamer changes it to
    let x_77 = \x_78 -> x_78 + 1 in x_77
    
    The renamer also performs depdenency analysis, which sorts bindings (both value declarations and type declarations) into the smallest possible mutually recursive groups. This prior sorting is required to maximise polymorphism in mutually recursive value bindings.
  • The typechecker works on the renamed program, and typechecks it.

At the moment these two passes relate to Template Haskell as follows:

  • Quasi-quotes are run in the renamer. Why? Because quasi-quotes can expand to patterns.

Consider this, which has a quasi-quoted pattern:

\x -> \ [pads| blah |] -> x+1

Is the "x" in "x+1" bound by the outer \x or by the 'x' that might be bought into scope by the [pads| blah |] quasi-quote? The only way to know is to run the quasi-quote, so that's what happens.

  • All other Template Haskell stuff is run in the typechecker. Why? Because we try to typecheck quotations before feeding them into TH functions. More on this below.

The main issue

The big issue is this: Template Haskell is both too weakly typed and too strongly typed.

Too weakly typed

A TH quotation has type Q Exp, a type that says nothing about the type of the quoted term. For example, consider

qnot :: Q Exp -> Q Exp
qnot x = [| not $x |]

Presumably the author expects $x to be a boolean-valued term, but that is not checked. For example we might write

h = $(qnot [| "hello" |])

in which we pass a string-valued term to qnot. The splice will typecheck fine, but the returned code will be the ill-typed not "hello". There is no soundness problem because GHC typechecks the result of the splice $(qnot [| "hello" |]), but the error is reported in code that the user never wrote.

Moreover, errors can be delayed. For example, suppose qnot was like this:

qnot :: Q Exp -> Q Exp
qnot x = [| (not $x, length $x) |]

This cannot possibly be right, becuase $x cannot be both a boolean and a list. Yet TH will accept it, because a splice has type forall a.a. The error will only be reported to callers of qnot.

This is bad. MetaML solves this problem by giving types to quoted terms, something like this:

qnot :: TExp Bool -> TExp Bool
qnot x = [| not $x |]

Here TExp (short for typed expressions) has a type parameter that expresses the type of the quoted term.

In TH we deliberately did not do this, because it restricts expressiveness; see  the original TH paper. We could make this choice without losing soundness because in TH, unlike in MetaML, all splicing is done at compile time, and the result of a splice is typechecked from scratch. But still, it's a weakness and, for some users (stand up Jacques), a very serious weakness.

Too strongly typed

Even though TH cannot guarantee to construct only type-correct code, every quotation is typechecked. For example, the quotation [| "hello" && True |] would be rejected because it is internally inconsistent.

But with the advent of type splices (a very useful feature) typechecking quotes has become hard to do. Consider this:

  f :: Q Type -> Q [Dec]
  f t = [d| data T = MkT $t; 
            g (MkT x) = g+1 |]

This function f returns a declaration quote, declaring T and g. You'll see that the constructor MkT takes an argument whose type is passed (as a quotation) to f -- a type splice.

The difficulty is that we can't typecheck the declaration of 'g' until we know what $t is; and we won't know that until f is called. In short,

  • we can't really typecheck the declaration quote at all

An analogous problem occurs in other declaration splices, for example

  f t = [d| instance C $t where ... |]

Here GHC's check that an instance decl is always of form

  instance C (T a b c)

falls over, again because we don't know what $t will be. Here's another example:

      [| let { f :: $t; f = e } in .. |]

We can't sensibly typecheck the term without knowing what f's type signature is, and we can't know that without expanding the splice.

Here's a rather different example, #4364:

type N0 = $( [t| Z |] )
type N1 = $( [t| Z |] )

Faced with a type splice

type N0 = $(...blah...)

the renamer doesn't know what the splice will expand to, because splices are currently run later, in the type checker. So it pessimistically assumes that the splice could expand to mention anything in scope. But that pessimistic assuption triggers the error message

     Cycle in type synonym declarations:
       m.hs:7:1-23: type N0 = $([t| Z |])
       m.hs:8:1-23: type N1 = $([t| Z |])  }}}

All this is quite annoying. Several users have said "I'm just using a quotation as a convenient way to build a syntax tree. Please don't even try to typecheck it; just wait until it is finally spliced into a top-level splice".


A proposal

TH currently embodies an uneasy compromise between being too strongly typed and too weakly typed. So my proposal is to move TH in both directions at once:

  • Part A: Move the existing structures towards the expressive but weakly-typed end.
  • Part B: Add new MetaML-style constructs for strongly-typed metaprogramming.
  • Part C: Clean up and improve reification
  • Part D: Quasi-quotation improvements

Part A: Reduce typechecking for quotes

On the "make-it-weaker" front, here's what I propose:

  • Cease typechecking TH quotes altogether. Instead, to use GHC's terminology, we would rename a quote, but not typecheck it. The renaming pass ensures that the scope hygiene mechanisms would remain unchanged. By not attempting to typecheck we avoid all the tricky problems sketched above.
  • Add pattern splices and local declaration splices, as requested in #1476. For example
    -- mkPat :: Q Pat -> Q Pat
    f $(mkPat [p| x |]) = x+1
    
    -- mkDecl :: Int -> Q [Dec]
    g x = let $(mkDecl 3) in ...
    
    These are not supported today because they bring new variables into scope, and hence are incompatible with running splices only after the renamer is finished; see  Notes on Template Haskell, section 8.
  • Run TH splices in the renamer, uniformly with quasi-quotes. Of course, we must still typecheck the code we are about to run. But there's an existing TH restriction that code run in top-level splices must be imported. So we can typecheck this code even during renaming, because it can only mention imported (and hence already fully-typechecked) code.

This solves #4364 because we run the splice in the renamer, so things are sorted out by the time we are checking for cycles (in the type checker).

  • Allow quoted names as patterns as  requested by Conal Eliott. This is just a variation on allowing splices in patterns, since a quoted name 'x is really just a simple splice

These changes would essentially implement the desires of those who say "I just want to generate syntax trees". All the mentioned bug reports would be fixed. The big loss is that quotes would not be typechecked at all.

Lexical scoping

Consider these definitions:

g :: Int -> Q Pat

y :: Int
y = 7

f :: Int -> Q Exp
f n = [| \ $(g n) -> y+1 |]

Where is the 'y' bound in the RHS of f?

  • Perhaps by the y = 7 that is in scope at the definition of f?
  • Perhaps by the pattern that $(g n) expands to?
  • Perhaps by a 'y' that is in scope at the splice site of f?
  • Does it depend on whether $(g n) in fact binds 'y'?

A major point about TH is that we get lexical scoping (also called "hygienic"). So, to me it seems the the first of these choices is the only reasonable one. If you want the second you can instead use explicit dynamic binding by saying

f n = [| \ $(g n) -> $(return (VarE (mkName "n"))) + 1 |]

So the rule would be:

  • In a quote, a variable 'v' is bound by the lexically enclosing binding of 'v', ignoring all pattern and declaration splices.

To be consistent this should apply to top level splices too.

A variation (probably not)

A possible, rather ad hoc, variation would be to still typecheck quotes that are (a) top level, and (b) expression quotes. For example, we might still reject this:

f x = [| $x + (True 'c') |]

because the quote is obviously ill-typed. Only quotes nested inside top-level splices would avoid the type checker (because if the splice is run in the renamer, we can't typecheck the nexted quote). For example:

$(f [| True 'c' |])

This splice would run in the renamer, and only the result of the splice would be typechecked. But what about this?

f t = [| let g :: $t-> Int; g = e in .... |]

This is still very awkward to typecheck. After all, if $t expands to a polymorphic type, the result of the splice might typecheck, but it's really impossible to typecheck without knowing the signature. Maybe we should just give up if there's a type splice? The only really simple thing is not to typecheck quotes at all.


Part B: Add MetaML-style typed quotes

Template Haskell has quotes for terms, types, patterns, and declarations. They are all untyped, in the sense that the type of the quote tells you nothing about the type of the quoted thing. For example

  [| True |] :: Q Exp

There's no clue that the type of the quoted expression is Bool.

In the case of terms (only), we know how from MetaML to have typed quotations. Here's a proposed extension to TH to add typed term quotes:

  • Add a new type of typed expressions TExp a
  • Add a new term quotation form [|| e ||], called a typed quote; the type of the quote is TExp ty, where ty is the type of e. In the type-system jargon, this is the "introduction form" for TExp.
  • Add a new splice form $$e, called a typed splice. The term e must have type TExp ty, and the splice $$e then has type ty. This is the "elimination form" for TExp.
  • Add a constant which takes a typed quote and returns an untyped one: unType :: TExp a -> Q Exp
  • Run these new typed splices in the typechecker, not the renamer.

(The precise syntax for typed-quotes and type-splices is of course up for grabs. But doubling the symbols seems plausible to me.)

Here's a standard example:

power :: Int -> TExp (Int -> Int)
power n = [|| \x -> $$(go n [|| x ||]) ||]
  where
    go :: TExp Int -> TExp Int
    go 0 x = [|| 1 ||]
    go n x = [|| $x * $$(go (n-1)) ||]

You could do this with existing TH but there'd be no guarantee that power would return a well-typed term. With TExp there is.

Points to notice

  • Unlike TH, the only way to construct a value of type TExp is with a quote. You cannot drop into do-ntation, nor use explicit construction of the values in the Exp algebraic data type. That restriction limits expressiveness, but it enables the strong typing guarantees.
  • There are no declaration, type, or pattern quotes for these typed quotes. Only terms.
  • You can't use an untyped splice in a typed quote, thus [|| ...$(e)... ||]. Similarly, you can't splice a type, pattern, or declaration group in a typed term quote.
  • Using unType you can go from the typed world to the untyped one, which lets you mix the worlds. Example:
    f :: TExp Int -> Q Exp -> Q Exp
    f qt qu = [| $(unType qt) + $qu |]
    
  • Unlike Exp, TExp is an abstract type, so you can't decompose values of type TExp. All you can do with them is splice them (into a program or a larger quote). Or you can convert to a Q Exp and then decompose, using the existing TH mechanisms. For example
    g :: TExp Int -> Q Exp
    g qt = do { tree <- unType qt
              ; case tree of
                  VarE v -> ...
                  ConE c -> ...
                  ...etc...  }
    
  • TExp is not a monad, but it is an applicative type constructor, although not quite an instance of Applicative class:
       pure e   ===   [|| e ||]
       f <*> g   =    [|| $$f $$g ||]
    
    Reminder: the Applicative class looks like this
    class Applicative f where
      pure :: a -> f a
      <*>  :: f (a->b) -> f a -> f b
    
    TExp is only "almost an instance" because pure isn't a function; its argument must be syntactically quoted.

Syntax is always a delicate point.

  • We could use some other kind of bracket, although brackets are always in short supply; eg (| ... |) or {| ... |}.
  • We could add Unicode brackets too (suggestions?); but I think we should have ASCII equivalents.
  • Ian asked whether $(...) could accept either Q Exp or TExp. I think not; we need to know which kind of splice it is before type checking.

Part C: Reification and typechecking

The third part of this proposal concerns reification. The Q monad gives you access to the typechecker's environment. Notably, Template Haskell provides the function

reify :: Name -> Q Info

which, given the Name of a variable, type, or class, looks the Name up in the type environment and returns what the type checker knows about it:

data Info = TyConI       -- A type
               Dec	        -- Declaration of the type

          | ClassI       -- A class
               Dec              -- Declaration of the class
               [ClassInstance]	-- The instances of that class

          ...etc...

What reify sees

A dark corner of reify is this: what types does reify see? Consider

f x = ($(do { xi <- reify 'x; ... }),
       x && True)

Here, reify can be used to examine the type of x. But the type of x isn't fully known until the type checker has seen the term (x && True). So in current TH it's going to be unpredicatable what you see for x, which is hardly satisfactory.

It seems to me that the only decent, predictable story is to say that reify can only consult the top-level type environment. More specifically, Template Haskell processes a program top-down:

  module M where
   import ...
   f x = x
   $(th1 4)
   h y = k y y $(blah1)
   $(th2 10)
   w z = $(blah2)

TH processes this module as follows:

  1. Typecheck down to, but not including, the first splice, $(th1 4). These declarations constitute the first declaration group.
  2. Typecheck and run the splice, which returns a bunch of declarations D1
  3. Typecheck the declarations D1 combined with the declarations down to, but not including, the second splice. These declarations consitute the second declaration group.
  4. Typecheck and run the next splice, $(th2 10)
  5. Rinse and repeat

So the proposal is as follows. A declaration group is the chunk of declarations created by a top-level declaration splice, plus those following it, down to but not includig the next top-level declaration splice. Then the type environment seen by reify includes all the declaration up to the end of the immediately preceding declaration block, but no more.

So, in the preceding example:

  • A reify inside the splice $(th1 ..) would see the definition of f.
  • A reify inside the splice $(blah) woudl see the definition of f, but would not see any bindings created by $(th1...).
  • A reify inside the splice $(th2..) would see the definition of f, all the bindings created by $(th1..), and teh definition of h.
  • A reify inside the splice $(blah2) would see the same definitions as the splice $(th2...).

This would mean that you could not say

f x = x
g y = $(do { info <- reify 'f; ... })

because there is no top=level splice between the declaration of f and the splice. But that seems reasonable to me. If you want that behaviour you can say

f x = x
$(return [])
g y = $(do { info <- reify 'f; ... })

Reifying expressions

But what about expressions? It would be useful (stand up Kathleen) to have a more elaborate reify, like this:

typecheck :: [(Name,Type)]  -- Extend the type environment with this 
          -> Exp	    -- The expression to typecheck
          -> Q (Either String Type)
 -- Typecheck the expression, returning
 --    Left error-message     if typechecking fails
 --    Right type             if typechecking succeeds

(For GHCi users, reify f is like :info f, while typecheck [] (...) is like :type (...).)

You might ask whether we can typeckeck an expression; remember, these Q ty things are going to be run in the renamer. But if the type environment is that in force just before the last top-level splice, then all is well: that stuff has been fully typechecked.


Part D: quasiquotation

This part is unrelated to the preceding proposals, and is responding to #4372 and #2041.

  • For #2041, rather than the proposal made there, I think the nicest thing is for Language.Haskell.TH to expose a Haskell quasiquoter:
    parseHaskell :: QuasiQuoter
    
    Remember that a QuasiQuoter is a quadruple of parsers:
    data QuasiQuoter = QuasiQuoter { quoteExp  :: String -> Q Exp,
                                     quotePat  :: String -> Q Pat,
                                     quoteType :: String -> Q Type,
                                     quoteDec  :: String -> Q [Dec] }
    
    If TH provided such parsers, you could use them to parse antiquotes. That seems better to than having strings in the TH syntax.

See #4430 for an excellent point about fixities.

  • For #4372, I'm a bit agnostic. There is no technical issue here; it's just about syntax. Read the notes on the ticket.
  • See #4429 for a suggestion about reifying Names.

Part E: Other minor issues

This section collects other TH changes that I think should be done.

  • The InfixE construtor of Syntax.Exp should only allow a Var in the operator position. See Trac #4877
  • Posted: 2010-10-18 14:55 (Updated: 2013-05-16 09:12)
  • Author: simonpj
  • Categories: (none)
  • Comments (18)

Let generalisation in GHC 7.0

[Note: updated to GHC 7.2, Sept 2011]

GHC 7.2 has a completely new type inference engine. The old one had grown to resemble the closing stages of a game of Jenga, in which making any change was a delicate operation that risked bringing the whole edifice crashing down. In particular, there were a dozen open bugs concerning type families that I had no idea how to fix.

This summer Dimitrios Vytiniotis, Brent Yorgey and I built a completely new inference engine. It rests on a solid technical foundation developed over the last couple of years, in a collaboration with Dimitrios, Tom Schrijvers, and Martin Sulzmann. There is a series of papers that lead up to it, but our JFP submission " Modular type inference with local assumptions" summarises the whole thing in one consistent notation.

The new engine feels much more solid to me, and indeed I've been able to close almost all those open bugs. Doubtless there will be some new ones, but I feel vastly more confident that we can fix them than I did with the old engine.

One other thing is that the new typechecker has completely swept away the old rather ad-hoc stuff about "rigid" types in GADT pattern matches. So you may find that some GADT programs will typecheck with fewer type annotations than before.

However, there is one way in which GHC 7.2 will type fewer programs than before, and that is the focus of this post. I want explain what the change is, and how to adapt your code to accommodate it.

Local polymorphism

Consider this (contrived) program:

{-# LANGUAGE ScopedTypeVariables #-}

f :: forall a. a -> ((a, Char), (a, Bool))
f x = (g 'v', g True)
    where
      g :: forall b. b -> (a,b)
      g y = (x,y)

This polymorphism is expressed in g's type signature:

  • g is a polymorphic function, because it is applied both to the character 'v' and to the boolean True.
  • However, g is monomorphic in a, because g's body mentions x

The type signature g :: forall b. b -> (a,b) reflects these two points:

  • The "forall b" says that g is polymorphic in b.
  • The a in the signature must is the a in the type of x. This type variable a is brought into scope, in the body of f, by the forall a in the type signature for f.

The LANGUAGE ScopedTypeVariables pragma (a) allows you to use an explicit "forall" in the type signature for a function f, and (b) brings the forall'd type variables (a in this case) into scope in the body of the function f.

What has changed in GHC 7.2?

GHC 6.12 will compile the above program without any type signatures at all. It infers the polymorphic type for g. But, if you are using type families or GADTs, GHC 7.0 will not. More precisely:

  • The flags -XGADTs or -XTypeFamilies, or the corresponding LANGUAGE pragmas, imply the (new) flag -XMonoLocalBinds.
  • With -XMonoLocalBinds, non-top-level (or local) let/where bindings are not generalised.
  • Remember that -fglasgow-exts implies -XGADTs and hence -XTypeFamilies.

So given the type-signature-free code

{-# LANGUAGE MonoLocalBinds #-}
f x = (g 'v', g True)
    where
      g y = (x,y)

GHC 7.2 will reject the program with

Foo.hs:6:17:
    Couldn't match expected type `Char' with actual type `Bool'
    In the first argument of `g', namely `True'
    In the expression: g True
    In the expression: (g 'v', g True)

Why? Because the type of g is not generalised, so it can only be applied to either Char or Bool but not both.

Which bindings are affected?

In the JFP paper we propose the rule that local bindings are not generalised, where "local" means "syntactically not top level". This was the rule implemented by GHC 7.0.

However GHC 7.2 relaxes the restriction slightly. Consider

{-# LANGUAGE MonoLocalBinds #-}
f x = (k 'v', k True)
    where
      h y = (y,y)  -- Note: x is not mentioned
      k z = (h z, h z)

Now the binding for h is "local" in the sense that it is not syntactically top-level, but it has no free variables, so it could have been top-level. Similarly, k is "local" but only mentions h which could have been top-level, and hence k could be top-level too. GHC 7.2 spots this, and generalises both just as if they had been top level.

Here's the rule. With -XMonoLocalBinds (the default), a binding without a type signature is generalised only if all its free variables are closed.

A binding is closed if and only if

  • It has a type signature, and the type signature has no free variables; or
  • It has no type signature, and all its free variables are closed, and it is unaffected by the monomorphism restriction. And hence it is fully generalised.

In our example, h has no free variables and hence is generalised; moreover, k has free variable h, but it is closed, so k too can be generalised.

Here's another example:

x = 4 + 5
y = x + 7

Both bindings are top-level, but x falls under the Monomorphism Restriction and is not generalised for that reason. So x is not closed. Hence y is not generalised either, since it has a non-closed free variable x. (This really fixes a bug in GHC 7.0 which erroneously generalised y because it was syntactically top level.)

How can I fix my program?

Almost all code doesn't need local let definitions to have a polymorphic type, because the function is only used at one type. But not all. There are two ways to fix your program if it falls over because of the MonoLocalBinds change.

  • Check whether you need -XGADTs or -XTypeFamilies. If you are using the blunderbuss -fglasgow-exts, replace it with proper LANGUAGE pragmas.
  • The good way is simply give a type signature to the local definition, as I did for g above. Writing such a type signature may force you to use {-# LANGUAGE ScopedTypeVariables #-} as the above example showed. My experience is that the code is easier to understand with the type signature.
  • The -XGADTs or -XTypeFamilies pragmas switch on MonoLocalBinds but, if you want, you can override it with -XNoMonoLocalBinds (or the equivalent LANGUAGE pragma). The type checker will then do its best to generalise local bindings, and your program will almost certainly work. However, I don't know how to guarantee any good properties of the type inference algorithm. So I think this is ok as as short term fix, but I'd like to encourage you to add that handful of type signatures instead.

You may not be very familiar with the code, so it can be tricky to work out what type the local definition should have. (That's one reason I think that the code is improved by having a type signature!) With this in mind I've added a new warning flag -fwarn-missing-local-sigs, which prints a warning, and the type, of any polymorphic local binding that lacks a type signature. So you can follow this procedure:

  • Compile with -XNoMonoLocalBinds -fwarn-missing-local-sigs. The first flag makes it compile, and the second shows you the types.
  • Compile with neither flag, getting some errors.
  • Add type signatures to cure the errors, using the types printed out in the first step.

Adding the type signature is completely compatible with GHC 6.12 etc, so you shouldn't need any conditional compilation if you want to compile your code with multiple versions of GHC.

A second, bigger example

One relatively common situation in which you need a local type signature is when you use runST with some auxiliary definitions. Here is a typical example:

{-# LANGUAGE ScopedTypeVariables #-}
module Test where
import Control.Monad.ST
import Data.STRef

foo :: forall a. Bool -> a -> a -> a
foo b x y = runST (do { r <- newSTRef x; fill r; readSTRef r })
          where
            -- fill :: forall s. STRef s a -> ST s ()
            fill r = case b of
                       True  -> return ()
                       False -> writeSTRef r y

-- runST :: forall a. (forall s. ST s a) -> a
-- newSTRef   :: a -> ST s (STRef s a)
-- readSTRef  :: STRef s a -> ST s a
-- writeSTRef :: STRef s a -> a -> ST s ()

The function foo is a bit contrived, but it's typical of the way in which runST is often used. We allocate a reference cell, containing x, call fill, and then read out what is now in the reference cell. All fill does is overwrite the cell if b is False. So foo is really an elaborate if function.

But in GHC 7.0, with -XMonoLocalBinds (or -XGADTs, -XTypeFamilies), the program will be rejected with this alarming-seeming error message

Foo.hs:7:11:
    Couldn't match type `s' with `s1'
      because this skolem type variable would escape: `s1'
    This skolem is bound by the polymorphic type `forall s. ST s a'
    The following variables have types that mention s
      fill :: STRef s a -> ST s () (bound at Foo.hs:10:11)
    In the first argument of `runST', namely
      `(do { r <- newSTRef x; fill r; readSTRef r })'

Remember, fill has not been generalised, so it gets a monomorphic type. That in turn means that the argument to runST is not polymorphic enough.

Despite its scariness, in some ways the error message identifies the problem more clearly than the previous example. In particular, the error message identifies fill as the culprit. You can fix it the same way as before, by adding a signature for fill; I've put it in a comment above.

Why make this change?

At first this change seems like retrograde step, so why did we make it? It's a long story, but the short summary is this: I don't know how to build a reliable, predictable type inference engine for a type system that has both (a) generalisation of local let/where and (b) local type equality assumptions, such as those introduced by GADTs. The story is told in our paper " Let should not be generalised" and, at greater length, in the journal version " Modular type inference with local assumptions".

  • Posted: 2010-09-29 18:53 (Updated: 2011-09-07 08:08)
  • Author: simonpj
  • Categories: (none)
  • Comments (1)

First results from GHC's new garbage collector

For a number of months now, we have been designing and developing a new garbage collector for GHC, with the primary goal of improving scaling on multicores. The current garbage collector isn't exactly a slouch: it is a parallel generational design, and after  going to some effort to improve locality we managed to achieve some respectable scaling results on up to 8 cores, and the Intel CnC project has been able to achieve  even greater speedups (as much as 22x on a 32-core machine) with carefully-tuned benchmarks.

Still, we recognise that the garbage collector is often the bottleneck where scaling is concerned. Tuning a program for scaling usually involves reducing its demands on the memory subsystem and hence reducing the amount of GC that happens. Here's how the current GC impedes scaling:

This is a screenshot from  ThreadScope, showing part of the timeline of the execution of a parallel Haskell program - in this case, the  minimax program for searching the game tree of 4x4 noughts and crosses. Each bar represents the execution of a single CPU, and the thin orange/green bars are the GCs. Every GC stops all the threads and performs the GC in parallel, before restarting the computation again. This is known as "stop the world" parallal GC - compared to other techniques such as concurrent GC it's easier to implement and generally gives good performance. Many industrial-strength virtual machines use stop-the-world parallel GC in their server products, because it gives the best throughput. However, the diagram clearly shows that we're wasting time here: the GCs aren't well-balanced (the green parts show the actual work, the blank parts are idle time). We deliberately don't load-balance the work in young-generation collections, because doing so impacts locality and is worse for overall performance (we discussed this in the  ICFP'09 paper and gave measurements). So we end up wasting some of our processor bandwidth, and hence losing some scaling.

Not only that, but the cost of the all-core synchronisation becomes a bottleneck as the number of cores increases, and the need for rapid synchronisations causes problems if the OS decides to borrow one of the cores to do something else for a while: the symptom of this problem has been dubbed the "last core parallel slowdown" in Haskell.

Here's what our new GC looks like on the same program:

The heap organisation in fact hasn't changed much: there are still per-core young generations and a single old generation, but the improvement is that the per-core young generations can be collected independently without stopping the other threads. Collecting the old generation - the "global heap" - still requires stopping all the threads, but since this is the old generation, global collections are much less frequent than local collections.

Since there are fewer synchronisations and less wasted processor bandwidth, the overall throughput is higher - only about 10% on 8 cores with this example, but there are other (less picturesque) examples that improve more, and it's still early days and we have a lot of tuning to do. We expect the benefits to be greater on larger numbers of cores, and furthermore the "last core slowdown" should be finally banished.

Designs like this have appeared in the literature (starting with  Doligez/Leroy POPL'93), and ours borrows ideas from several of these earlier designs while adding a few novel twists of our own. We plan to write up the design properly once all the tuning is done and we have some solid results. I expect it to land in GHC HEAD in a few months time, and it should be in the autumn 2011 major release of GHC.

Debian 6.0 "Squeeze" frozen... announcement mentions GHC

The  announcement for the recent freeze of Debian 6.0 specifically mentions GHC 6.12 alongside Python, Perl, and GCC in the list of compilers and interpreters for "common languages". A milestone reached? (thanks to Reuben Thomas for sending us the link)

  • Posted: 2010-08-25 06:32 (Updated: 2010-08-25 06:33)
  • Author: simonmar
  • Categories: (none)
  • Comments (1)

The GHC blog

As an experiment, we are moving the GHC blog from  http://ghcmutterings.wordpress.com/ to here.

  • Posted: 2010-08-24 06:20 (Updated: 2010-08-24 06:22)
  • Author: simonmar
  • Categories: (none)
  • Comments (0)