| | 1 | This page collects information about the performance of GHC and GHC-compiled code. We're interested in programs that GHC compiles slowly, programs that run slowly (especially small ones), and ideas for improving performance of either GHC itself or GHC-compiled code. |
| | 2 | |
| | 3 | = Performance of GHC = |
| | 4 | |
| | 5 | The Todo list for GHC performance: |
| | 6 | |
| | 7 | * Emit the interface immediately after typechecking in non-optimised compilation |
| | 8 | * Turn on the NCG by default for -O (possibly improve x87 float code first?) |
| | 9 | * Stringbuffer could use mmap() |
| | 10 | * Improve performance of generating output (SimonMarlow: done). |
| | 11 | * Profile & do a space-leak sweep (SimonMarlow: done, fixed one space leak, space usage not great but could be worse). |
| | 12 | * Improve binary interface representation (SimonMarlow: some improvement here). |
| | 13 | |
| | 14 | Half-baked ideas: |
| | 15 | |
| | 16 | * glom the interfaces for a package into a single uber-interface with one string table. |
| | 17 | * Classic loop optimizations that are not invariant hoisting. (Jan-Willem Maessen) |
| | 18 | |
| | 19 | == Modules that GHC compiles slowly == |
| | 20 | |
| | 21 | Simon PJ found a case of non-linearity in the simplifier; we are currently testing a patch that hopefully fixes all of the following: |
| | 22 | |
| | 23 | * Language.Haskell.Syntax |
| | 24 | * large constant structures taking non-linear time to compile |
| | 25 | * large Happy-generated grammars |
| | 26 | * WashNGo-2.4.6/WASH/HTMLMonad98.hs from WASH/CGI (SimonMarlow: investigating) |
| | 27 | * dsrun011 from the test suite (SimonMarlow: fixed) |
| | 28 | * record with lots of fields (20+), deriving Show, with -O (SimonMarlow: fixed) |
| | 29 | |
| | 30 | = Performance of compiled code = |
| | 31 | |
| | 32 | == Programs that run too slowly == |
| | 33 | |
| | 34 | * Various of the shootout benchmarks (Todo: expand). |
| | 35 | |
| | 36 | == Performance Todo list == |
| | 37 | |
| | 38 | Here we list some problems that we know about, and/or ideas for improving things. |
| | 39 | |
| | 40 | * x86_64: use more registers (the NCG needs to save/restore volatile regs over C calls, apart from that we're there). |
| | 41 | * via-C produces code with too many jumps, especially with later versions of GCC. GCC commons up identical basic blocks containing a single jump instruction. |
| | 42 | * GC has been reported to behave non-linearly http://www.haskell.org//pipermail/glasgow-haskell-users/2005-April/008301.html (SimonMarlow: turned out to be caused by swapping) |
| | 43 | * Space leak: a function that is statically known to not use some or all of its arguments should have a function descriptor that reflects the unused pointers as non-pointers, so that a PAP applied to unused args doesn't retain them. |
| | 44 | * Push heap check into a case branch if that means we can avoid a heap check in a tail-recursive loop. |
| | 45 | * Make ticky-ticky profiling work again. |
| | 46 | * Make the NCG handle loops, and perform some simple loop optimisations. |
| | 47 | * Better optimisation pass sequences? (c.f. Laszlo Nemeth's research http://www.tcs.informatik.uni-muenchen.de/~hwloidl/TFP04/Abstracts/17.html) |
| | 48 | * Can we do anything about Double alignment? |
| | 49 | * better GC for chains of THUNK_SELECTORs (see comments in GC.c) |
| | 50 | * Int64 reported to be slow http://www.haskell.org//pipermail/glasgow-haskell-users/2005-June/008574.html (Lemmih: Fixed. It's now only three times slower to use Int64 over Int in that example. (patch not commited yet)). |
| | 51 | * System.Random could be optimized more. |
| | 52 | |
| | 53 | A few slightly larger projects: |
| | 54 | |
| | 55 | * parallel GC for multi-processors |
| | 56 | * Allow strict enumerations to be "unboxed" |
| | 57 | |
| | 58 | Some more open-ended ideas: |
| | 59 | |
| | 60 | * Strict or unlifted types, or strictness annotations on functions |
| | 61 | * How about an anti-strictness annotation? Value is always re-evaluated when used? |
| | 62 | * Can GC performance be improved at all? Perhaps: a combination of depth-first and breadth-first traversal instead of just breadth-first. (SimonMarlow: attempted, no joy. But managed to find some other opportunities for improvement in the GC while I was there.) |
| | 63 | * improve compacting GC performance (ask SimonMarlow for ideas) |
| | 64 | * Whole program optimisation infrastructure. Dump intermediate representation into a special section of the .o files (or use another file) and upon linking the program/library run another optimisation pass over the entire lot before finally generating c/cmm. Only enabled with -O2/-O3 or something. |
| | 65 | |
| | 66 | = Size of compiled code = |
| | 67 | |
| | 68 | List modules or programs that produce code that is too large: |
| | 69 | |
| | 70 | The Todo list for code size: |
| | 71 | |
| | 72 | * Implement dynamic libraries on other platforms: x86/Linux, x86_64/Linux, x86/*BSD, x86_64/*BSD. |
| | 73 | * Find common code fragments and share them |
| | 74 | * Top level CSE of lambdas module alpha equivalence. This come up from time to time when inlining occurs. It doesn't look like it gets done, but I could be missing the pass. (Jan-Willem Maessen) |