| | 24 | |
| | 25 | = Implementation = |
| | 26 | |
| | 27 | == Framework == |
| | 28 | |
| | 29 | * New -fllvm code generation pipeline, involved modifying: |
| | 30 | * {{{main/CodeOutput.lhs}}} - Selects appropriate back-end for code generation (C, NCG, LLVM). |
| | 31 | * {{{main/DynFlags.hs}}} - Stores GHC configuration (command line options, compile time options... ect). Added `HscLlvm` target type. |
| | 32 | * {{{ghc.cabal.in}}} - Stores modules/files to compile for ghc. Added new LLVM files and directory stored under `llvmGen`, and new CPP flag to enable the LLVM code generator (`-DLLVM`). |
| | 33 | * {{{ghc.mk}}} - Added new `GhcWithLlvmCodeGen` option which can be set in `build.mk` to `YES` to enable the LLVM code generator. |
| | 34 | * {{{main/DriverPhases.hs}}} - Added `LlvmAs` phase to invoke the compilation of LLVM bitcode/IR to an object file. After this phase linking can occur. |
| | 35 | * {{{main/DriverPipeline.hs}}} - Added code for new `LlvmAs`, `LlvmOpt` and `LlvmLlc` phases. |
| | 36 | * {{{LlvmAs}}} - Invokes `llvm-as` tool to compile a llvm assembly file ('.ll') to a bitcode file (`.bc`). |
| | 37 | * {{{LlvmOpt}}} - Invokes the llvm `opt` tool to optimise the module. Just use the llvm standard optimisation groups of `O1`, `O2`, `O3`, depending on the optimisation level passed to 'ghc' by the user. |
| | 38 | * {{{LlvmLlc}}} - Invokes the llvm `llc` tool to generate the machine code ('.s' file) from the optimised bitcode. 'As' stage runs next, part of existing 'ghc' pipeline. |
| | 39 | * {{{SysTools.lhs}}} - Stores the path and default settings of the system tools needed, so for LLVM back-end this is `llvm-as`, `opt` and `llc`. |
| | 40 | |
| | 41 | The LLVM pipeline works as specified above. Code generation phase occurs, using the {{{dflags}}} option data the appropriate generator is selected (which is the Llvm back-end is `-fllvm` has been specified on the command line). After code generation, the next phase is determined, this is done from the `HscLlvm` target data constructor which is selected at ghc startup by {{{DynFlags.hs}}}. The next phase is `LlvmAs` which will compile the text IR to an LLVM bitcode file (equivalent to `llvm-as` tool). After this the `LlvmLlc` phase is run, which produces a native object file from the llvm bitcode file (equivalanet to the `llc` tool). At this stage, the output from all three back-ends should be 'equivalent'. After this phase, the `StopLn`, or linking phase occurs which should result in the end result. Compiling some Haskell code with the c-backend and some with the llvm-backend and linking them together is supported. |
| | 42 | |
| | 43 | == LLVM Code Generation == |
| | 44 | |
| | 45 | For LLVM code generation we need a method for representing and generating LLVM code. The [http://llvm.org/docs/FAQ.html#langirgen LLVM FAQ] suggest the following possible approaches: |
| | 46 | * Call into LLVM Libraries using FFI (can probably use [http://hackage.haskell.org/package/llvm Haskell LLVM Bindings]) |
| | 47 | * Emit LLVM Assembly (approach taken by [http://www.cs.uu.nl/wiki/Ehc/WebHome EHC's] LLVM Back-end, can use the [https://subversion.cs.uu.nl/repos/project.UHC.pub/trunk/EHC/src/ehc/LLVM.cag module] developed by them for this) |
| | 48 | * Emit LLVM Bitcode (can't see any reason to do this) |
| | 49 | |
| | 50 | The approach taken was to use the LLVM module from [http://www.cs.uu.nl/wiki/Ehc/WebHome EHC]. This module contains an abstract syntax representation of LLVM Assembly and the ability to pretty print it. It has been heavily modified to increase its language coverage as it was missing several LLVM constructs which were needed. Ideally we would like to add a second pretty printer which calls into the LLVM C++ API to generate LLVM Bitcode. This should hopefully decrease the compile times and make the back-end more resilient to future changes to LLVM Assembly. The LLVM Haskell binding (first option) wasn't used as it represents LLVM at a very high level, which isn't appropriate for the back-end. |
| | 51 | |
| | 52 | == Code Generation == |
| | 53 | |
| | 54 | Code generation consists of translating a list of `GenCmmTop` data types to LLVM code. `GenCmmTop` has the following form: |
| | 55 | |
| | 56 | {{{ |
| | 57 | data GenCmmTop d h g |
| | 58 | = CmmProc -- Function |
| | 59 | h -- Extra header such as the info table |
| | 60 | CLabel -- Procedure name |
| | 61 | CmmFormals -- Argument locals live on entry |
| | 62 | g -- Control flow graph |
| | 63 | |
| | 64 | | CmmData -- Static data |
| | 65 | Section -- Type |
| | 66 | [d] -- Data |
| | 67 | |
| | 68 | data GenBasicBlock i = BasicBlock BlockId [i] |
| | 69 | newtype ListGraph i = ListGraph [[GenBasicBlock i] |
| | 70 | |
| | 71 | type RawCmmTop = GenCmmmTop CmmStatic [CmmStatic] (ListGraph CmmStmt) |
| | 72 | }}} |
| | 73 | |
| | 74 | That is, it consists of two types, static data and functions. Each can largely be handled separately. Just enough information is needed such that pointers can be constructed to them and in many cases this information can be gathered from assumptions and constraints on Cmm. |
| | 75 | |
| | 76 | The code generator lives in `llvmGen` with the driver being `llvmGen/LlvmCodeGen.lhs`. |
| | 77 | |
| | 78 | A large part of the code generation is keeping track of defined variables/functions and their type. An `LlvmEnv` construct is used for this. It is simply a dictionary storing function/variable names with their corresponding type information. This is used to create correct references/pointers between variables and functions. |
| | 79 | |
| | 80 | === Unregistered Vs. Registered === |
| | 81 | |
| | 82 | Code generation can take place in two general modes, `unregistered` and `registered`. There are two major differences from a back-end code generation point of view. Firstly, in unregistered mode a optimisation features called {{{TABLES_NEXT_TO_CODE}}} is disabled. This means that the `h` field of `CmmProc` is empty. In registered mode it instead contains the `CmmStatic` data for the procedures info table which must be placed just before the procedure in the generated code so that both the info table and procedure can be accessed through one pointer. This optimisation can be disabled separately though in `registered` mode. |
| | 83 | |
| | 84 | The other major change is the use of pinned global registers. The `Cmm` language includes a concept called registers. These are used like machine registers or variables in C to store the result of expressions. Unlike `LLVM` they are mutable. `Cmm` includes two types of registers as you can see below: |
| | 85 | |
| | 86 | {{{ |
| | 87 | data CmmReg |
| | 88 | = CmmLocal LocalReg |
| | 89 | | CmmGlobal GlobalReg |
| | 90 | deriving( Eq, Ord ) |
| | 91 | }}} |
| | 92 | |
| | 93 | A `LocalReg` is a temporary general purpose registered used in a procedure with scope of a single procedure. A `GlobalReg` on the other hand has global scope and a specific use. They are used just like machine registers, with a Stack Pointer and Heap Pointer registers creating a virtual machine (`STG`). `GlobalReg` is of the form: |
| | 94 | |
| | 95 | {{{ |
| | 96 | data GlobalReg |
| | 97 | -- Argument and return registers |
| | 98 | = VanillaReg -- pointers, unboxed ints and chars |
| | 99 | {-# UNPACK #-} !Int -- its number |
| | 100 | VGcPtr |
| | 101 | |
| | 102 | | FloatReg -- single-precision floating-point registers |
| | 103 | {-# UNPACK #-} !Int -- its number |
| | 104 | |
| | 105 | | DoubleReg -- double-precision floating-point registers |
| | 106 | {-# UNPACK #-} !Int -- its number |
| | 107 | |
| | 108 | | LongReg -- long int registers (64-bit, really) |
| | 109 | {-# UNPACK #-} !Int -- its number |
| | 110 | |
| | 111 | -- STG registers |
| | 112 | | Sp -- Stack ptr; points to last occupied stack location. |
| | 113 | | SpLim -- Stack limit |
| | 114 | | Hp -- Heap ptr; points to last occupied heap location. |
| | 115 | | HpLim -- Heap limit register |
| | 116 | | CurrentTSO -- pointer to current thread's TSO |
| | 117 | | CurrentNursery -- pointer to allocation area |
| | 118 | | HpAlloc -- allocation count for heap check failure |
| | 119 | |
| | 120 | -- We keep the address of some commonly-called |
| | 121 | -- functions in the register table, to keep code |
| | 122 | -- size down: |
| | 123 | | EagerBlackholeInfo -- stg_EAGER_BLACKHOLE_info |
| | 124 | | GCEnter1 -- stg_gc_enter_1 |
| | 125 | | GCFun -- stg_gc_fun |
| | 126 | |
| | 127 | -- Base offset for the register table, used for accessing registers |
| | 128 | -- which do not have real registers assigned to them. This register |
| | 129 | -- will only appear after we have expanded GlobalReg into memory accesses |
| | 130 | -- (where necessary) in the native code generator. |
| | 131 | | BaseReg |
| | 132 | |
| | 133 | -- Base Register for PIC (position-independent code) calculations |
| | 134 | -- Only used inside the native code generator. It's exact meaning differs |
| | 135 | -- from platform to platform (see module PositionIndependentCode). |
| | 136 | | PicBaseReg |
| | 137 | |
| | 138 | deriving( Show ) |
| | 139 | }}} |
| | 140 | |
| | 141 | In unregistered mode these global registers are all just stored in memory in the heap. A specific pass operating on Cmm that takes place just before code generation thus transforms code such as: |
| | 142 | |
| | 143 | {{{ |
| | 144 | __stginit_ZCMain() { |
| | 145 | { update_frame: <none> |
| | 146 | } |
| | 147 | cfF: |
| | 148 | Sp = Sp + 4; |
| | 149 | jump (I32[Sp - 4]) (); |
| | 150 | } |
| | 151 | }}} |
| | 152 | |
| | 153 | into the following unregistered form for code generation: |
| | 154 | |
| | 155 | {{{ |
| | 156 | __stginit_main::Main() { |
| | 157 | { [] |
| | 158 | } |
| | 159 | crF: |
| | 160 | I32[MainCapability+92] = I32[MainCapability+92] + 4; |
| | 161 | jump (I32[I32[MainCapability+92] - 4]) (); |
| | 162 | } |
| | 163 | }}} |
| | 164 | |
| | 165 | Where `MainCapability` is a label to the start of a RTS defined structure storing all the global registers. |
| | 166 | |
| | 167 | In registered mode as many of these global registers are assigned permanently to fixed hardware registers. This is done as it greatly improves performance. As these registers are accessed very frequently needing to load and store to memory for accessing adds a great cost. So for example on `x86` the following map between `Cmm` global registers and `x86` hardware registers exists: |
| | 168 | |
| | 169 | {{{ |
| | 170 | Base -> %EBX |
| | 171 | Sp -> %EBP |
| | 172 | Hp -> %EDI |
| | 173 | R1 -> %ESI |
| | 174 | }}} |
| | 175 | |
| | 176 | These are all the available `callee save` registers on x86. `callee save` are used as in ghc generated code now saving and restoring of these registers are needed due to there new special use and because GHC uses continuation passing style, so a `'ret'` statement is never actually generated. And since they are `callee save`, foreign code can also be called without any need to handle the `Cmm` registers. |
| | 177 | |
| | 178 | == `CmmData` == |
| | 179 | |
| | 180 | `CmmData` takes the following form: |
| | 181 | |
| | 182 | {{{ |
| | 183 | newtype CmmData = CmmData Section [CmmStatic] |
| | 184 | |
| | 185 | data CmmStatic |
| | 186 | = CmmStaticLit CmmLit -- static value |
| | 187 | | CmmUninitialised Int -- n bytes of uninitialised data |
| | 188 | | CmmAlign Int -- align to next N byte boundary |
| | 189 | | CmmDataLabel CLabel -- label current position in code |
| | 190 | | CmmString [Word8] -- string of 8 bit values |
| | 191 | |
| | 192 | data CmmLit |
| | 193 | = CmmInt Integer Width -- 2 compliments, truncated int |
| | 194 | | CmmFloat Rational Width -- float |
| | 195 | | CmmLabel CLabel -- &l1 |
| | 196 | | CmmLabelOff CLabel Int -- &l1 + offset |
| | 197 | | CmmLabelDiffOff CLabel CLabel Int -- &l1 - &l2 + offset |
| | 198 | | CmmBlock BlockId -- address of code label |
| | 199 | | CmmHighStackMark -- max stack space used during a procedure |
| | 200 | }}} |
| | 201 | |
| | 202 | Code generation takes place mainly in {{{llvmGen/LlvmCodeGen/Data.hs}}}, driven by the main Llvm compiler driver, {{llvmGen/LlvmCodeGen.lhs}}}. |
| | 203 | |
| | 204 | The code generation for data occurs in two phases, firstly the types and all data is generated except for address values. Then the address values are resolved. This two step method is used as in the first pass, we don't know if a address refers to an external address or a procedure/data structure in the current LLVM module. We also need the type information in LLVM to create a pointer. |
| | 205 | |
| | 206 | === 1st Pass : Generation === |
| | 207 | |
| | 208 | All `CmmStatic` is translated to LLVM structures. |
| | 209 | |
| | 210 | == `CmmStaticLit` == |
| | 211 | |
| | 212 | These are translated when possible as follows: |
| | 213 | * `CmmInt` -> Reduced to Int and then an appropriate `LMInt` of correct size is created. As LLVM supports any bit size, this is very straight forward. |
| | 214 | * `CmmFloat` -> Translated to a double, detecting NAN and INFINITY correctly. Then correct LLVM type (`float`, `double`, `float80`, `float128`) is selected. |
| | 215 | * `CmmLabel` -> Left untranslated at first, later resolved once we have determined types. As pointers are cast to word size ints, we can still determine types. |
| | 216 | * `CmmLabelOff` -> As above. |
| | 217 | * `CmmLabelDiffOff` -> As above. |
| | 218 | * `CmmBlock` -> `BlockId` is changed to a `CLabel` and then treated as a `CmmLabel` static type. |
| | 219 | * `CmmHighStackMark` -> Panic occurs if this type is encountered. |
| | 220 | |
| | 221 | ==== `CmmUninitialised` ==== |
| | 222 | |
| | 223 | For this, a zeroed array of `8bit` values is created of correct size. |
| | 224 | |
| | 225 | ==== `CmmAlign` & `CmmDataLabel` ==== |
| | 226 | |
| | 227 | The LLVM back-end can't handle `CmmAlign` or `CmmDataLabel`. A panic occurs if either is encountered. A `CmmDataLabel` is expected at the very start of each list of `CmmStatic`. It is removed and used as the name for the structure and constant instance. |
| | 228 | |
| | 229 | ==== `CmmString` ==== |
| | 230 | |
| | 231 | This is translated into a LLVM string. Ascii characters are used when they are printable, escaped hex values otherwise. A null termination is added. |
| | 232 | |
| | 233 | === 2nd Pass : Resolution === |
| | 234 | |
| | 235 | After the first pass, all types have been determined and all data translated except for address values (CLabel's). All generated llvm data is added to a Map of string to `LlvmType`, string being the data structure name. All `CmmProc's` are added to the map as well, they don't need to be properly passed though, just their names retrieved as they have a constant type of void return and no parameters. |
| | 236 | |
| | 237 | Now appropriate pointers can be generated using the type information from the map and LLVM's `getelementptr` instruction. These are then all passed to int's to allow the types of structures to be determined in advance. If a pointer doesn't have a match in the Map, it is assumed to refer to an external (outside of this module) address. An external reference is declared for this address as: |
| | 238 | |
| | 239 | {{{ |
| | 240 | @label = external global [0 * i32] |
| | 241 | }}} |
| | 242 | |
| | 243 | Where i32 is the pointer size. (i64 if on 64 bit). |
| | 244 | |
| | 245 | == `CmmProc` == |
| | 246 | |
| | 247 | TODO |