Analogous to […]. The type parameter r and its functional dependency are necessary, since g must be a function of the form a -> ... -> c -> CodeGenFunction r d and we must ensure that the explicit r and the implicit r in g match.

This is an Applicative functor that registers which extensions are needed in order to run the contained instructions. You can escape from the functor by calling […] and providing a generic implementation.

We use an applicative functor since with a monadic interface we would have to create the specialised code in every case in order to see which extensions were used in the course of creating the instructions.

We use only one (unparameterized) type for all extensions, since this is the simplest solution. Alternatively, we could use a type parameter where class constraints show which extensions are needed. This would be just like exceptions that are explicit in the type signature, as in the control-monad-exception package. However, we would still need to lift all basic LLVM instructions to the new monad.

Declare that a certain plain LLVM instruction depends on a particular extension. This can be useful if you rely on the data layout of a certain architecture when doing a bitcast, or if you know that LLVM translates a certain generic operation to something especially optimal for the declared extension.

Create an intrinsic and register the needed extension. We cannot immediately check whether the signature matches or whether the right extension is given. However, when resolving intrinsics, LLVM will not find the intrinsic if the extension is wrong, and it also checks the signature.

run generic specific generates the specific code if the required extensions are available on the host processor and generic otherwise.
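The extension-registering applicative functor described above can be sketched in plain Haskell. This is only a minimal illustration, not the library's implementation: the names Ext, requires and runExt are hypothetical, extensions are modelled as plain strings, and the "code" is just a string instead of LLVM instructions.

```haskell
-- Hypothetical stand-in for the extension functor:
-- a list of required extension names next to the wrapped value.
data Ext a = Ext [String] a

instance Functor Ext where
  fmap f (Ext needed a) = Ext needed (f a)

instance Applicative Ext where
  pure = Ext []
  -- Combining two computations collects the requirements of both.
  Ext n1 f <*> Ext n2 a = Ext (n1 ++ n2) (f a)

-- Tag a value as depending on one named extension (hypothetical helper).
requires :: String -> a -> Ext a
requires name = Ext [name]

-- Choose the specific variant only if every required extension is available;
-- the requirement list can be inspected without evaluating the wrapped value.
runExt :: [String] -> a -> Ext a -> a
runExt available generic (Ext needed specific)
  | all (`elem` available) needed = specific
  | otherwise = generic

main :: IO ()
main = do
  let code = (++) <$> requires "sse2" "cvtps2dq;" <*> requires "sse41" "roundps"
  putStrLn (runExt ["sse1", "sse2"] "generic" code)
  putStrLn (runExt ["sse2", "sse41"] "generic" code)
```

Because the requirements sit outside the wrapped value, they are known before any specialised code is produced, which is the property a monadic interface would not give.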
Convenient variant of […]: only run the code with extended instructions if an additional condition is given.

Only for debugging purposes.

Construct an array out of single elements. You must assert that the length of the list matches the array size. This can be considered the inverse of […].

Provide the elements of an array as a list of individual virtual registers. This can be considered the inverse of […]. The loop is unrolled, since […] and […] expect constant indices.

This would also work for vectors if LLVM supported select with bool vectors as condition.

An alternative to […] where I try to persuade LLVM to use x86's LOOP instruction. Unfortunately, it becomes even worse. LLVM developers say that x86 LOOP is actually slower than manual decrement, zero test and conditional branch.

This is a variant of […] that may be more convenient, because you only need one lambda expression for both loop condition and loop body.

This construct starts new blocks, so be prepared when continuing after an […].

Branch-free variant of […] that is faster if the enclosed block is very simple, say, if it contains at most two instructions. It can only be used as an alternative to […] if the enclosed block is free of side effects.

The upper two integers are set to zero; there is no instruction that converts to Int64.

MXCSR is not really supported by LLVM-2.6. LLVM does not know about the dependency of all floating point operations on this status register.

Cumulative sum: (a,b,c,d) -> (a,a+b,a+b+c,a+b+c+d). I try to cleverly use horizontal add, but the generic version in the Vector module is better.
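The cumulative sum (a,b,c,d) -> (a,a+b,a+b+c,a+b+c+d) can be illustrated on plain Haskell lists. The following is a sketch under stated assumptions, not the code used by the library: cumulate is a reference version via scanl1, and cumulateByShifts is a hypothetical variant that uses only whole-list shifts and element-wise additions, which is the kind of operation a vector unit offers.

```haskell
-- Reference semantics: running sum from the left.
cumulate :: Num a => [a] -> [a]
cumulate = scanl1 (+)

-- Shift all elements k positions towards higher indices, filling with zeros
-- (hypothetical helper, keeps the list length).
shiftUpBy :: Num a => Int -> [a] -> [a]
shiftUpBy k xs = take (length xs) (replicate k 0 ++ xs)

-- Same result with about log2 n rounds of "shift and add".
cumulateByShifts :: Num a => [a] -> [a]
cumulateByShifts xs = go 1 xs
  where
    n = length xs
    go k ys
      | k >= n = ys
      | otherwise = go (2 * k) (zipWith (+) ys (shiftUpBy k ys))

main :: IO ()
main = do
  print (cumulate [1, 2, 3, 4 :: Int])
  print (cumulateByShifts [1, 2, 3, 4 :: Int])
```

Both calls print [1,3,6,10]. The shift-based variant needs two rounds for four elements: after the first round each element holds the sum of itself and its left neighbour, after the second round it holds the full prefix sum.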
Attention: The rounding and fraction functions only work for floating point values with maximum magnitude of maxBound :: Int32. This way we save expensive handling of possibly rare cases.

The order of addition is chosen for maximum efficiency. We do not try to prevent cancellations.

The first result value is the sum of all vector elements from 0 to div n 2 - 1 and the second result value is the sum of vector elements from div n 2 to n-1. n must be at least D2.

Treat the vector as a concatenation of pairs and add all these pairs. Useful for stereo signal processing. n must be at least D2.

Allows working on records of vectors as if they were vectors of records. This is a reasonable approach for records of different element types, since processor vectors can only be built from elements of the same type. It also makes sense for, say, a chunked stereo signal. In this case we would work on Stereo (Value a).

Manually assemble a vector of equal values. Better use ScalarOrVector.replicate.

Construct a vector out of single elements. You must assert that the length of the list matches the vector size. This can be considered the inverse of […].

Manually implement vector shuffling using insertelement and extractelement. In contrast to LLVM's built-in instruction it supports distinct vector sizes, but it allows only one input vector (or a tuple of vectors, but we cannot shuffle between them). For more complex shuffling we recommend […] and […].

Rotate one element towards the higher elements. I don't want to call it rotateLeft or rotateRight, because there is no preferred layout for the vector elements. In Intel's instruction manual vector elements are indexed like the bits, that is, from right to left.
However, when working with Haskell list and enumeration syntax, the start index is on the left.

Implement the […] method using the methods of the […] class.

Provide the elements of a vector as a list of individual virtual registers. This can be considered the inverse of […].

Like LLVM.Util.Loop.mapVector, but the loop is unrolled, which is faster since it can be packed by the code generator.

Ideally, on ix86 with SSE41 this would be translated to dpps.

If the target vector type is a native type, then the chop operation produces no actual machine instruction (nop). If the vector cannot be evenly divided into chunks, the last chunk will be padded with undefined values.

The target size is determined by the type. If the chunk list provides more data, the excess data is dropped. If the chunk list provides too little data, the target vector is filled with undefined elements.

We partition a vector of size n into chunks of size m and add these chunks using vector additions. We do this by repeated halving of the vector, since this way we do not need assumptions about the native vector size. We reduce the vector size only virtually, that is, we maintain the vector size and fill with undefined values. This is reasonable since LLVM-2.5 and LLVM-2.6 do not allow shuffling between vectors of different size and because LLVM likes to do computations on Vector D2 Float in MMX registers on ix86 CPUs, which interacts badly with FPU usage. Since we fill the vector with undefined values, LLVM actually treats the vectors like vectors of smaller size.
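The repeated halving described above can be sketched on plain Haskell lists. This is an illustration only: halve and reduceToChunk are hypothetical names, the list length is assumed to be m times a power of two, and real vectors would keep their size and be padded with undefined values instead of shrinking.

```haskell
-- Add the upper half of the list onto the lower half (one "vector addition").
halve :: Num a => [a] -> [a]
halve xs = zipWith (+) lower upper
  where
    (lower, upper) = splitAt (length xs `div` 2) xs

-- Reduce a list of length m * 2^k to a single chunk of size m by
-- repeated halving. Assumes the length really has that form.
reduceToChunk :: Num a => Int -> [a] -> [a]
reduceToChunk m xs
  | length xs <= m = xs
  | otherwise = reduceToChunk m (halve xs)

main :: IO ()
main = do
  -- Chunks [1,2], [3,4], [5,6], [7,8] are added element-wise.
  print (reduceToChunk 2 [1 .. 8 :: Int])
  -- With chunk size 1 the result is the total sum.
  print (reduceToChunk 1 [1 .. 8 :: Int])
```

For eight elements and chunk size 2 this takes two halving steps and prints [16,20]; with chunk size 1 it takes three steps and prints [36]. The number of additions grows with the logarithm of the number of chunks, independent of any native vector width.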
Needs (log n) vector additions.

On LLVM-2.6 and X86 this produces branch-free but even slower code than fractionSelect, since the comparison to booleans and back to a floating point number is translated literally to elementwise comparison, conversion to a 0 or -1 byte and then to a floating point number.

LLVM.select on boolean vectors cannot be translated to X86 code in LLVM-2.6, thus I code my own version that calls select on all elements. This is slow but works. When this issue is fixed, this function will be replaced by LLVM.select.

[…] implemented using […]. This will need jumps.

[…] implemented using […]. This will need jumps.

Another implementation of […], this time in terms of binary logical operations. The selecting integers must be (-1) for selecting an element from the first operand and 0 for selecting an element from the second operand. This leads to optimal code. On SSE41 this could be done with blendvps or blendvpd.

The fraction has the same sign as the argument. This is not particularly useful but fast on IEEE implementations.

increment (first operand) may be negative, phase must always be non-negative

both increment and phase must be non-negative

There are functions that are intended for processing scalars but formally have vector input and output. This function breaks a vector function down to a scalar function by accessing the lowest vector element.

An implementation of both […] and Memory.C must ensure that haskellValue is compatible with llvmStruct. That is, writing and reading llvmStruct by LLVM must be the same as accessing haskellValue by Storable methods. ToDo: In future we may also require a Storable constraint for llvmStruct.

We use a functional dependency in order to let type inference work nicely.

Adding the finalizer to a ForeignPtr seems to be the only way that warrants execution of the finalizer (not too early and not never).
However, the normal ForeignPtr finalizers must be independent of the Haskell runtime. In contrast to ForeignPtr finalizers, addFinalizer adds finalizers to boxes that are optimized away. Thus finalizers are run too early or not at all. Concurrent.ForeignPtr and using threaded execution is the only way to get finalizers in Haskell IO.

This and the following type classes are intended for arithmetic operations on wrappers around LLVM types. E.g. you might define a fixed point fraction type by

    newtype Fixed = Fixed Int32

and then use the same methods for floating point and fixed point arithmetic.

In contrast to the arithmetic methods in the llvm wrapper, in our methods the types of operands and result match. Advantage: Type inference determines most of the types automatically. Disadvantage: You cannot use constant values directly, but you have to convert them all to […].

both increment and phase must be non-negative

Isomorphic to ReaderT (CodeGenFunction r z) (ContT z (CodeGenFunction r)) a, where the reader provides the block for […] and the continuation part manages the […].

Counterpart to Data.Maybe.HT.toMaybe.

If the returned position is smaller than the array size, then the returned final state is undefined.
" # $ % & ' ( )*+,-./01231245llvm-extra-0.3LLVM.Extra.ExtensionLLVM.Extra.ExtensionCheck.X86LLVM.Extra.ClassLLVM.Extra.ArrayLLVM.Extra.MonadLLVM.Extra.ArithmeticLLVM.Extra.ControlLLVM.Extra.Extension.X86LLVM.Extra.VectorLLVM.Extra.ScalarOrVectorLLVM.Extra.MemoryLLVM.Extra.ForeignPtrLLVM.Extra.MaybeContinuationLLVM.Extra.ArithmeticPrivateCallArgsT Subtargetwrap intrinsic intrinsicAttrrunrunWhen runUnsafewithwith2with3sse1sse2sse3ssse3sse41sse42MakeValueTuple valueTupleOfZero zeroTuple Undefined undefTuplezeroTuplePointedundefTuplePointedvalueTupleOfFunctorphisTraversableaddPhisFoldablesizeassemble extractAllmapchainliftR2liftR3incdecadvanceArrayElementPtrfcmpcmpandorSelectselect arrayLooparrayLoopWithExitarrayLoop2WithExitfixedLengthLoop whileLoopwhileLoopShared ifThenElseifThenselectTraversable ifThenSelectmaxssminssmaxpsminpsmaxsdminsdmaxpdminpdcmpsscmppscmpsdcmppdpcmpgtbpcmpgtwpcmpgtdpcmpgtqpcmpugtbpcmpugtwpcmpugtdpcmpugtqpminsbpminswpminsdpmaxsbpmaxswpmaxsdpminubpminuwpminudpmaxubpmaxuwpmaxudpabsbpabswpabsdpmuludqpmulldcvtps2dqcvtpd2dqldmxcsrstmxcsr withMXCSRhaddpshaddpddppsdppdroundssroundpsroundsdroundpdabsssabssdabspsabspdRealminmaxabstruncatefractionfloor Arithmeticsum sumToPairsumInterleavedToPaircumulate dotProductmulAccessinsertextract ShuffleMatch shuffleMatch replicate insertChunkiterateshuffle sizeInTuplerotateUp rotateDownreverseshiftUp shiftDownshiftUpMultiZeroshiftDownMultiZeroshuffleMatchTraversableshuffleMatchAccessshuffleMatchPlain1shuffleMatchPlain2insertTraversableextractTraversablemodify mapChunks zipChunksWithchopconcat cumulate1signedFraction umul32to64RationalConstantconstFromRationalIntegerConstantconstFromInteger PseudoModulescale scaleConst ReplicatereplicateConstFraction addToPhaseincPhase replicateOf FirstClassElementRecordCloadstore decomposecomposeelement loadRecord storeRecorddecomposeRecord composeRecordcastStorablePtr loadNewtype storeNewtypedecomposeNewtypecomposeNewtypenewInitnewParamnewTranscendentalsinlogexpcospow 
Algebraicsqrt fromRational'Fieldfdiv fromInteger' PseudoRingAdditivezeroaddsubnegonesquareidiviremConsresolvewithBoolfromBooltoBoolisJustliftguardbind arrayLoop2 llvm-0.10.0.1LLVM.Core.CodeGen FunctionArgsbuildIntrinsic targetNamenamecheck subtargetLLVM.Core.Instructions insertvalue extractvalue cmpSelect_arrayLoopWithExitDecLoop _emitCodeVDoubleVFloat switchFPPred pcmpuFromPcmp valueUnit _cumulate1s replicateCore iterateCore _mapByFoldmapAuto zipAutoWithdotProductPartial sumPartialchopCore getLowestPair_reduceAddInterleaved sumGenericsumToPairGenericreduceSumInterleaved_cumulateSimplecumulateGeneric cumulateFrom1 floorGenericfractionGeneric _floorSelect_fractionSelect selectLogical floorLogicalfractionLogicalorderByorder fractionGen singleton runScalar ConvertStructdecomposeField composeField fromStorable toStorable loadElement storeElementextractElement insertElementpairtriplefieldsImporterderefStartParamPtr derefStartPtrValue_inc_dec valueTypeNamecallIntrinsic1callIntrinsic2 addReadNonebase Data.MaybeNothingJust