|Version 73 (modified by simonpj, 16 months ago)|
GHC's glorious new code generator
This page summarises work that Norman Ramsey, Simon M, Simon PJ, and John Dias are doing on re-architecting GHC's back end. Here is the state of play; see also work on the LLVM back end.
- Norman added a Hoopl performance page
- Edward Yang has a wiki page that describes shortcomings of the code generated by the new pipeline: Commentary/Compiler/NewCodeGenStupidity
- John D has built a complete new codegen pipeline, running alongside the old one, enabled by -fuse-new-codegen. It is described here: Commentary/Compiler/NewCodeGenPipeline. It uses a new representation for Cmm, mostly with "Z" in the name. (Let's call the original Cmm OldCmm and this new one CmmZ.) It has a new conversion STG->CmmZ, and then sequence of passes that optimise and cps-convert the Cmm. Finally, it is converted back to the old Cmm so that it can flow to the old code generators.
- Compiling through the new pipeline passes all tests and GHC is bootstrappable.
- Separately, we have developed yet another, and still better, Cmm representation, the subject of an upcoming ICFP 2010 submission. It uses phantom types and GADTs to add very useful open/closed invariants. This isn't in GHC at all yet. I'll call it CmmGADT for easy reference.
Generally we want to keep old and new pipelines working simultaneously, so that we can swith only when we are sure the new stuff works. Next steps in this grand plan are:
- Check the impact on compilation time of the new route.
- Finalise CmmGADT and make the new pipeline use it.
- Make the Cmm parser (which parses .cmm files from the RTS) produce CmmGADT, and push that down the new pipeline.
- Implement the many refactorings and improvements to the new pipeline described in http://darcs.haskell.org/ghc/compiler/cmm/cmm-notes. See also: Commentary/Compiler/NewCodeGenStupidity
- Instead of converting new Cmm to old Cmm, make the downstream code generators consume CmmGADT, and convert old Cmm to CmmGADT.
- Expand the capability of the new pipeline so that it does native code generation too, and we can ultimately discard the existing code generators. The design of this stage is here: Commentary/Compiler/IntegratedCodeGen
Workflow for the new code generator and Hoopl
We have the following repositories:
- HEAD: the main GHC git repo. http://darcs.haskell.org/ghc.git
- HooplMaster: the master Hoopl Git repository.
(Physical location: linux.cs.tufts.edu:/r/ghc/www/hoopl/hoopl.git)
- HooplLag: a Git repo that is guaranteed to work with GHC HEAD. It is
not automatically updated by pushes to HooplMaster. Instead a manual
process (below) updates it; hence "lag".
Normal GHC developers, who are uninterested in Hoopl, ignore all this. If they download HEAD, and then do ./sync-all get they'll get HooplLag, which is always guaranteed to work with HEAD.
Developers who work on GHC and also need to modify Hoopl need to ensure their changes end up in both repositories.
- In your hoopl directory in your development tree, add HooplMaster as a remote and update your reference there.
- Hack away in the development tree.
- Record Hoopl commits.
- Run validate in the development tree
- Push the commits in hoopl to the HooplMaster Git repo
- Wait for the mirrors to update (the impatient can run /srv/darcs/do_mirrors on darcs.haskell.org)
- Push the commits in hoopl to the HooplLag Git repo (probably the origin remote)
Status report April 2011
Term’s starting up for me again, and as a result, I will have less time to throw at the new code generator. As such, here is the state of play, as of April 2011.
We currently have a code generator that is correct, but slow and stupid. The primary optimization problem that I was tackling at the end of Spring Break was the following: newly compiled programs used a lot more stack space than their old equivalents, due to poor usage of call areas for register reloading and an over-conservative stack layout algorithm. Our new plan is to overhaul the spiller and the stack layout engine in the following manner:
1. We teach the register spiller about another location: a value may be in a register, on the stack in a register slot, OR (and this is the new bit) inside a call area. If it’s in a call area, reload from that instead of of its slot location. This might require borrowing some code from the code motion improvements in my private branch ( http://github.com/ezyang/ghc).
2. We change the stack layout code to perform interference on a per-word, rather than per-area level. Because this may cause a very huge interference graph, we bound the number of conflicts we remember; instead of recording every slot a slot may conflict with, we remember some number, and if there are more, we just say it conflicts with everything. (There is some potential for statistics and tuning parameters here.)
Bug list (code-gen related bugs that we may be able to fix):
- #1498 (avoid redundant heap check on the fast path)
- #3552 (unreachable code)
- #3462 (a feature)
Below here is old stuff, mostly out of date
Loose ends as at October 2008
- Check nofib performance regressions (one or two allocation (odd)); several on time.
- Compile libraries and test
- Nuke CmmCPSGen, CmmCPS, and the two Maybes in CmmInfo data type.
- Remove optionallyConvertAndOrCPS from main pipeline.
- Fix if-then-else special case in StgCmmExpr
- Perhaps some obvious CSE opportunities
- We only use one GC entry point, but there should be a bunch of canned ones.
- To Convention add StaticNative (which does not use R1), and use it appropriately. Replace GC by gc = Native.
- Allow .cmx files to come in before the new pipeline, to aid graceful migration of RTS.
- Migrate RTS to use procedures
- Explore new calling conventions
- Get rid of SRTs and live-variable info from STG, and from the Core-to-Stg phase.
- Do not split proc-points into separate CmmProc. Not a trivial change, because it involves attaching info tables to blocks, not just to CmmProcs.
- Nuke old code gen, and associated Cmm cruft
- Simplify dataflow framework, by doing deep rewriting only. If possible kill LastExit (Simon's favourite hate).
Notes about the state of play in August, 2008
These notes are largely out of date, but I don't want to dump them till we're sure that we've sucked all the juice out of them.
- Code generator: first draft done.
- Control-flow opt: simple ones done
- Common block elimination: done
- Block concatenation: done
- Adams optimisation: currently done in compiler/cmm/CmmProcPointZ.hs. The Adams optimization should be divorced from this module and replaced with common-block elimination, to be done after the proc-point transformation. In principle this combination may be slightly less effective than the current code, since the selection of proc-point protocols is guided by Adams's criteria, but NR thinks it will be easy to get the common, important cases nailed. Note that the Adams optimization is designed to avoid generating extra continuations in the case of heap checks after a function call.
- Proc-point analysis and transformation: working, although there is plenty of room for experimentation with the calling conventions at proc points. In practice NR recommends the following procedure:
- All optional proc points to be generated with no parameters (all live variables on the stack)
- This situation to be remedied when the code generator is reorganized along the lines NR proposed in July 2007, i.e., the register allocator runs on C-- with calls (as opposed to C-- with jumps only) and therefore before proc-point analysis
- Bypassing proc-point analysis for native back ends: not done.
- Add spill/reload: Implemented to NR's satisfaction in compiler/cmm/CmmSpillReload.hs, with the proviso that spilling is done to abstract stack slots rather than real stack positions (see comments below on stack-slot allocation)
- Stack slot allocation: implemented with a greedy algorithm. There is room for improvement here.
- Make stack explicit: done.
- Split into multiple CmmProcs: done.
ToDo: main issues
- How do we write continuations in the RTS? E.g. the update-frame continuation? Michael Adams had a syntax with two sets of parameters, the the ones on the stack and the return values.
- Review code gen for calls with lots of args. In the existing codegen we push magic continuations that say "apply the return value to N more args". Do we want to do this? ToDo: how rare is it to have too many args?
- Figure out how PAPs work. This may interact with the GC check and stack check at the start of a function call. The concern is how to enter the garbage collector with an infotable that properly describes the live variables. Now that we generate info tables on demand at the end of the pipeline, we can enter the gc with a regular procedure call and expect that the proper info table will be generated.
- Was there something about sinking spills and hoisting reloads?
ToDo: small issues
- Shall we rename Branch to GoTo?!
- Change the C-- parser (which parses RTS .cmm files) to directly construct CmmGraph.
- (SLPJ) See let-no-escape todos in StgCmmExpr.