|Version 12 (modified by dterei, 3 years ago)|
Work in Progress on the LLVM Backend
This page is meant to collect together information about people working on (or interested in working on) LLVM in GHC, and the projects they are looking at. See also the state of play of the whole back end.
- David Terei is going to be at Microsoft Research for an internship to work on LLVM from the middle of April
- There is interest on working on LLVM as part of the Google Summer of Code
- The GSoC timeline is May 24th - August 9th ( http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/faqs#timeline), so this overlaps with David's internship
- Max Bolingbroke ( http://www.cl.cam.ac.uk/~mb566) has proposed a SoC project to work on LLVM http://hackage.haskell.org/trac/summer-of-code/ticket/1582
- Alp Mestanogullari ( http://alpmestan.wordpress.com/, http://twitter.com/alpmestan) is interested in working on a SoC project on LLVM
Small Ticket Items
- Use a new Monad instead of passing LlvmEnv around everywhere.
Big Ticket Items
LLVM IR Representation
The LLVM IR is modeled in GHC using an algebraic data type to represent the first order abstract syntax of the LLVM assembly code. The LLVM representation lives in the 'Llvm' subdirectory and also contains code for pretty printing. This is the same approach taken by EHC's LLVM Back-end, and we adapted the module developed by them for this purpose.
The current design is overly complicated and could be faster. It uses String + show operations for printing for example when it should be using FastString + Outputable. Before simplifying this design though it would be good to investigate using the LLVM API instead of the assembly language for interacting with LLVM. This would be done most likely by using the pre-existing Haskell LLVM API bindings found here. This should hopefully provide a speed up in compilation speeds which is greatly needed since the LLVM back-end is ~2x slower at the moment.
We now support TNTC using an approach of gnu as subsections. This seems to work fine but we would like still to move to a pure LLVM solution. Ideally we would implement this in LLVM by allowing a global variable to be associated with a function, so that LLVM is aware that the two will be laid out next to each other and can better optimise (e.g using this approach LLVM should be able to perform constant propagation on info-tables).
Update (30/06/2010): The current TNTC solution doesn't work on Mac OS X. So we need to implement an LLVM based solution. We currently support OS X by post processing the assembly. Pure LLVM is a nicer way forward.
Optimise LLVM for the type of Code GHC produces
At the moment only a some fairly basic benchmarking has been done of the LLVM back-end. Enough to give an indication of how it performs on the whole (well as far as you trust benchmarks anyway) and of what it can sometimes achieve. However this is by no means exauhstive or probably even close to it and doesn't give us enough information about the areas where LLVM performs badly. The LLVM optimisation pass also at the moment just uses the standard '-O' levels, which like GCC entail a whole bunch of optimisation passes. These groups are designed for C programs mostly.
- More benchmarking, particularly finding some bad spots for the LLVM back-end and generating a good picture of the characteristics of the back-end.
- Look into the LLVM optimiser, e.g perhaps some more work in the style of Don's work
- Look at any new optimisation passes that could be written for LLVM which would help to improve the code it generates for GHC.
- Look at general fixes/improvement to LLVM to improve the code it generates for LLVM (e.g at the moment LLVM performs a lot of redundant stack manipulation in the code in generates for GHC, would be good to fix this up).
Update the Back-end to use the new Cmm data types / New Code Generator
There is ongoing work to produce a new, nicer, more modular code generator for GHC (the slightly confusingly name code generator in GHC refers to the pipeline stage where the Core IR is compiled to the Cmm IR). The LLVM back-end could be updated to make sure it works with the new code generator and does so in an efficient manner.
LLVM's Link Time Optimisations
One of LLVM's big marketing features is its support for link time optimisation. This does thinks such as in-lining across module boundaries, more aggressive dead code elimination... ect). The LLVM back-end could be updated to make use of this. Roman apparently tried to use the new 'gold' linker with GHC and it doesn't support all the needed features.
LLVM Cross Compiler / Port
This is more of an experimental idea but the LLVM back-end looks like it would make a great choice for Porting LLVM. That is, instead of porting LLVM through the usual route of via-C and then fixing up the NCG, just try to do it all through the LLVM back-end. As LLVM is quite portable and supported on more platforms then GHC, it would be an interesting and valuable experiment to try to port GHC to a new platform by simply getting the LLVM back-end working on it. (The LLVM back-end works in both unregistered and registered mode, another advantage for porting compared to the C and NCG back-ends).
It would also be interesting to looking into improving GHC to support cross compiling and doing this through the LLVM back-end as it should be easier to fix up to support this feature than the C or NCG back-ends.