| 52 | | The hardware supports only small fixed sized vectors. High level libraries would like to be able to use arbitrary sized vectors. Similar to the design in GCC and LLVM we provide primitive Haskell types and operations for fixed-size vectors. The task of implementing variable sized vectors in terms of fixed-size vector types and primops is left to the next layer up (DPH, vector lib). |
| 53 | | |
| 54 | | That is, in the core primop layer and down, vector support is only for fixed-size vectors. The fixed sizes will be only powers of 2 and only up to some maximum size. The choice of maximum size should reflect the largest vector size supported by the current range of CPUs (e.g. 256bit with AVX). |
| 55 | | |
| 56 | | === Fallbacks === |
| 57 | | |
| 58 | | The portability strategy relies on fallbacks so that we can implement large vectors on machines with only small vector registers, or no vector support at all (either none at all, or none for that type, e.g. only support for integer vectors not floating point, or only 32bit floats not doubles). |
| | 60 | The hardware supports only small fixed-size vectors, whereas high-level libraries would like to use vectors of arbitrary size. Following the design in GCC and LLVM, we will provide primitive Haskell types and operations for fixed-size vectors. The task of implementing variable-size vectors in terms of the fixed-size vector types and primops is left to the next layer up (DPH, vector lib). |
| | 61 | |
| | 62 | That is, in the core primop layer and down, vector support is only for fixed-size vectors. The fixed sizes will be powers of 2 only, up to some maximum size. The choice of maximum size should reflect the largest vector size supported by the current range of CPUs (256-bit with AVX): |
| | 63 | |
| | 64 | || types || || || vector sizes || |
| | 65 | || Int8 || Word8 || || 2, 4, 8, 16, 32 || |
| | 66 | || Int16 || Word16 || || 2, 4, 8, 16 || |
| | 67 | || Int32 || Word32 || Float || 2, 4, 8 || |
| | 68 | || Int64 || Word64 || Double || 2, 4 || |
| | 69 | |
| | 70 | We could choose to support larger fixed sizes, or the same maximum size for all types, but there is no strict need to do so. |
| | 71 | |
| | 72 | === Portability and fallbacks === |
| | 73 | |
| | 74 | To enable portable Haskell code we will provide the same set of vector types and operations on all architectures. Again this follows the approach taken by GCC and LLVM. |
| | 75 | |
| | 76 | We will rely on fallbacks for the cases where certain types or operations are not supported directly in hardware. In particular, we can implement large vectors on machines with only small vector registers. Where there is no vector hardware support at all for a type (e.g. an architecture with no vector unit, or 64-bit doubles on ARM's NEON), we can implement it using scalar code. |
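The two fallbacks described above can be sketched in ordinary Haskell. This is only an illustrative model under stated assumptions: `FloatVec4`, `FloatVec8`, `addFloatVec4` and `addFloatVec8` are hypothetical boxed stand-ins, not the proposed primitive types or primops. The 4-wide add shows a purely scalar fallback, and the 8-wide add shows a large vector built from two smaller ones.

```haskell
-- Hypothetical boxed model of a 4-wide Float vector (not a GHC primitive type).
data FloatVec4 = FloatVec4 !Float !Float !Float !Float
  deriving (Eq, Show)

-- Scalar fallback: with no vector hardware, add element by element.
addFloatVec4 :: FloatVec4 -> FloatVec4 -> FloatVec4
addFloatVec4 (FloatVec4 a0 a1 a2 a3) (FloatVec4 b0 b1 b2 b3) =
  FloatVec4 (a0 + b0) (a1 + b1) (a2 + b2) (a3 + b3)

-- Large vector on small registers: model an 8-wide vector as two 4-wide halves.
data FloatVec8 = FloatVec8 !FloatVec4 !FloatVec4
  deriving (Eq, Show)

-- The 8-wide operation is implemented by applying the 4-wide one to each half.
addFloatVec8 :: FloatVec8 -> FloatVec8 -> FloatVec8
addFloatVec8 (FloatVec8 lo1 hi1) (FloatVec8 lo2 hi2) =
  FloatVec8 (addFloatVec4 lo1 lo2) (addFloatVec4 hi1 hi2)
```

In a real implementation the choice between these strategies would presumably be made per architecture, and the types would be unboxed.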
| 64 | | === Code generators === |
| 65 | | |
| 66 | | We would not extend the portable C backend to emit vector instructions. It would rely on the higher layers transforming vector operations into scalar operations. The portable C backend is not ABI compatible with the other code generators so there is no concern about vector registers in the calling convention. |
| 67 | | |
| 68 | | The LLVM C library supports vector types and instructions directly. The GHC LLVM backend could be extended to translate vector ops at the CMM level into LLVM vector ops. |
| 69 | | |
| 70 | | The NCG (native code generator) may need at least minimal support for vector types if vector registers are to be used in the calling convention. This would be necessary if ABI compatibility is to be preserved with the LLVM backend. It is optional whether vector instructions are used to improve performance. |
| | 82 | == Code generators == |
| | 83 | |
| | 84 | We will not extend the portable C backend to emit vector instructions. It will rely on the higher layers transforming vector operations into scalar operations. The portable C backend is not ABI compatible with the other code generators so there is no concern about vector registers in the calling convention. |
| | 85 | |
| | 86 | The LLVM C library supports vector types and instructions directly. The GHC LLVM backend could be extended to translate vector ops at the Cmm level into LLVM vector ops. |
| | 87 | |
| | 88 | The NCG (native code generator) may need at least minimal support for vector types if vector registers are to be used in the calling convention (see below). If we choose a common calling convention where vectors are passed in registers rather than on the stack then minimal support in the NCG would be necessary if ABI compatibility is to be preserved with the LLVM backend. It is optional whether vector instructions are used to improve performance. |
| | 130 | for width {w} in 8, 16, 32, 64 and "" (empty for the native Int#/Word# width)[[BR]] |
| | 131 | for multiplicity {m} in 2, 4, 8, 16, 32 |
| | 132 | |
| | 133 | `type Int`''{w}''`Vec`''{m}''`#`[[BR]] |
| | 134 | `type Word`''{w}''`Vec`''{m}''`#`[[BR]] |
| | 135 | `type FloatVec`''{m}''`#`[[BR]] |
| | 136 | `type DoubleVec`''{m}''`#`[[BR]] |
| | 137 | |
| 114 | | for width {w} in 8, 16, 32, 64 and "" -- empty for native Int#/Word# width |
| 115 | | for multiplicity {m} in 2, 4, 8, 16, 32 |
| 116 | | {{{ |
| 117 | | type Int{w}Vec{m}# |
| 118 | | type Word{w}Vec{m}# |
| 119 | | type FloatVec{m}# |
| 120 | | type DoubleVec{m}# |
| 121 | | }}} |
| 122 | | It has not yet been decided if we will use a name convention such as: |
| 123 | | {{{ |
| 124 | | IntVec2# IntVec4# IntVec8# ... |
| 125 | | Int8Vec2# ... |
| 126 | | Int16Vec2# |
| 127 | | ... |
| 128 | | }}} |
| 129 | | Or if we will add a new concrete syntax to suggest a parameter, but have it really still part of the name, such as: |
| 130 | | |
| 131 | | Syntax note: here <2> is concrete syntax |
| 132 | | {{{ |
| 133 | | IntVec<2># IntVec<4># IntVec<8># ... |
| 134 | | Int8Vec<2># ... |
| 135 | | Int16Vec<2># |
| 136 | | .. |
| 137 | | }}} |
| 138 | | Similarly there would be families of primops: |
| | 140 | Hence we have individual type names with the following naming convention: |
| | 141 | |
| | 142 | || || length 2 || length 4 || length 8 || etc || |
| | 143 | || native `Int` || `IntVec2#` || `IntVec4#` || `IntVec8#` || ... || |
| | 144 | || `Int8` || `Int8Vec2#` || `Int8Vec4#` || `Int8Vec8#` || ... || |
| | 145 | || `Int16` || `Int16Vec2#` || `Int16Vec4#` || `Int16Vec8#` || ... || |
| | 146 | || etc || ... || ... || ... || ... || |
| | 147 | |
| | 148 | Similarly there will be families of primops: |
| 143 | | From the point of view of the Haskell namespace for values and types, each member of each of these families is distinct. It is just a naming convention that suggests the relationship (with or without the addition of some concrete syntax to support the convention). |
| | 153 | From the point of view of the Haskell namespace for values and types, each member of each of these families is distinct. It is just a naming convention that suggests the relationship. |
| | 154 | |
| | 155 | === Optional extension: extra syntax === |
| | 156 | |
| | 157 | We could add a new concrete syntax using `<...>` to suggest a parameter, while it really remains part of the name: |
| | 158 | |
| | 159 | || || length 2 || length 4 || length 8 || etc || |
| | 160 | || native `Int` || `IntVec<2>#` || `IntVec<4>#` || `IntVec<8>#` || ... || |
| | 161 | || `Int8` || `Int8Vec<2>#` || `Int8Vec<4>#` || `Int8Vec<8>#` || ... || |
| | 162 | || `Int16` || `Int16Vec<2>#` || `Int16Vec<4>#` || `Int16Vec<8>#` || ... || |
| | 163 | || etc || ... || ... || ... || ... || |
| 147 | | Internally in GHC we can take advantage of the obvious parametrisation within the families of primitive types and operations. In particular we extend GHC's primop.txt.pp machinery to enable us to describe the family as a whole and to generate the members. |
| 148 | | |
| 149 | | For example: |
| 150 | | {{{ |
| 151 | | parameter <w> Width 8,16,32,64 |
| 152 | | parameter <m> Multiplicity 2,4,8,16,32 |
| 153 | | |
| 154 | | primop VIntAddOp <w> <m> "addInt<w>Vec<m>#" Dyadic |
| 155 | | Int{w}Vec{m}# -> Int{w}Vec{m}# -> Int{w}Vec{m}# |
| 156 | | {doc comments} |
| 157 | | }}} |
| 158 | | This would generate a family of primops, and an internal representation using the obvious parameters: |
| | 167 | Internally in GHC we can take advantage of the obvious parametrisation within the families of primitive types and operations. In particular we extend GHC's `primop.txt.pp` machinery to enable us to describe the family as a whole and to generate the members. |
| | 168 | |
| | 169 | For example, here is some plausible concrete syntax for `primop.txt.pp`: |
| | 170 | {{{ |
| | 171 | parameter <w, m> Width Multiplicity |
| | 172 | with <w, m> in <8, 2>,<8, 4>,<8, 8>,<8, 16>,<8, 32>, |
| | 173 | <16,2>,<16,4>,<16,8>,<16,16>, |
| | 174 | <32,2>,<32,4>,<32,8>, |
| | 175 | <64,2>,<64,4> |
| | 176 | }}} |
| | 177 | Note that we allow non-rectangular combinations of values for the parameters. We declare the range of values along with the parameter so that we do not have to repeat it for every primtype and primop. |
| | 178 | {{{ |
| | 179 | primtype <w,m> Int<w>Vec<m># |
| | 180 | |
| | 181 | primop VIntAddOp <w,m> "addInt<w>Vec<m>#" Dyadic |
| | 182 | Int<w>Vec<m># -> Int<w>Vec<m># -> Int<w>Vec<m># |
| | 183 | {Vector addition} |
| | 184 | }}} |
| | 185 | |
| | 186 | This would generate a family of primops, and an internal representation using the type names declared for the parameters: |
| 164 | | |
| 165 | | === Optional: primitive int sizes === |
| 166 | | |
| 167 | | The same mechanism could be used to handle parametrisation between Int8#, Int16# etc. Currently these do not exist as primitive types. The types Int8, Int16 etc are implemented as a boxed native-sized Int# plus narrowing. |
| | 192 | It is not yet clear what syntax to use to produce the names of the native-sized types `Int` and `Word`. Perhaps we should use "", e.g. |
| | 193 | {{{ |
| | 194 | parameter <w, m> Width Multiplicity |
| | 195 | with <w, m> in <8, 2>,<8, 4>,<8, 8>,<8, 16>,<8, 32>, |
| | 196 | <16,2>,<16,4>,<16,8>,<16,16>, |
| | 197 | <32,2>,<32,4>,<32,8>, |
| | 198 | <64,2>,<64,4>, |
| | 199 | <"",2>,<"",4> |
| | 200 | }}} |
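To make the expansion of such a parameter declaration concrete, here is a small Haskell model of how the member names could be derived from the non-rectangular list of `<w, m>` pairs. It is a sketch only: `Width`, `vecFamily`, `typeName` and `primopName` are hypothetical names invented for illustration, and nothing here reflects how the `primop.txt.pp` processor is or will be implemented.

```haskell
-- Hypothetical model of the <w, m> parameter declaration.

-- Element width: a fixed bit width, or the native Int#/Word# width ("").
data Width = W Int | Native
  deriving (Eq, Show)

-- The non-rectangular set of (width, multiplicity) pairs listed in the text,
-- including the native-width entries written with "".
vecFamily :: [(Width, Int)]
vecFamily =
     [ (W 8,    m) | m <- [2, 4, 8, 16, 32] ]
  ++ [ (W 16,   m) | m <- [2, 4, 8, 16] ]
  ++ [ (W 32,   m) | m <- [2, 4, 8] ]
  ++ [ (W 64,   m) | m <- [2, 4] ]
  ++ [ (Native, m) | m <- [2, 4] ]

-- The text substituted for <w> in a name: digits, or nothing for native width.
widthName :: Width -> String
widthName (W n)  = show n
widthName Native = ""

-- Instantiate the primtype pattern Int<w>Vec<m>#.
typeName :: (Width, Int) -> String
typeName (w, m) = "Int" ++ widthName w ++ "Vec" ++ show m ++ "#"

-- Instantiate the primop pattern addInt<w>Vec<m>#.
primopName :: (Width, Int) -> String
primopName (w, m) = "addInt" ++ widthName w ++ "Vec" ++ show m ++ "#"
```

For instance, `map typeName vecFamily` yields one distinct type name per declared pair, which matches the view that each family member is a separate name in the Haskell namespace.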
| | 201 | |
| | 202 | === Optional extension: primitive int sizes === |
| | 203 | |
| | 204 | The above mechanism could be used to handle parametrisation between Int8#, Int16# etc. Currently these do not exist as primitive types. The types Int8, Int16 etc are implemented as a boxed native-sized Int# plus narrowing. |
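The "boxed native-sized `Int#` plus narrowing" representation mentioned above can be illustrated as follows. This is a minimal sketch, assuming the `narrow8Int#` primop exported from `GHC.Exts`; the type `SketchInt8` and its helper functions are hypothetical and are not GHC's actual definition of `Int8`.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, narrow8Int#, (+#))

-- Hypothetical stand-in for Int8: a box around a native-sized Int# whose
-- value is always kept within the signed 8-bit range.
data SketchInt8 = SI8# Int#

-- Narrow on construction so the invariant holds from the start.
mkSketchInt8 :: Int -> SketchInt8
mkSketchInt8 (I# x) = SI8# (narrow8Int# x)

-- Add at native width, then narrow the result back to 8 bits.
addSketchInt8 :: SketchInt8 -> SketchInt8 -> SketchInt8
addSketchInt8 (SI8# x) (SI8# y) = SI8# (narrow8Int# (x +# y))

-- Recover an ordinary Int for inspection.
sketchInt8ToInt :: SketchInt8 -> Int
sketchInt8ToInt (SI8# x) = I# x
```

With primitive `Int8#`, `Int16#` etc. types, the narrowing step would instead be part of the primitive operations themselves.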