Instance for special casing null pointers.

Given a bit pattern, yield all bit masks that it contains. This does *not* attempt to compute a minimal set of bit masks that, when combined, yield the bit pattern; instead, all contained bit masks are produced.

Integral conversion.
Floating conversion.
Obtain a C value from a Haskell value.
Obtain a Haskell value from a C value.
Convert a C enumeration to Haskell.
Convert a Haskell enumeration to C.

[2009..2014] Trevor L. McDonell, BSD

Return a descriptive error string associated with a particular error code.
Raise a CUDAException. Exceptions can be thrown from pure code, but can only be caught in the IO monad.
Raise a CUDAException in the IO monad.
Return the results of a function on successful execution, otherwise throw an exception with an error string associated with the return code.
Throw an exception with an error string associated with an unsuccessful return code, otherwise return unit.
A specially formatted error message.

[2009..2014] Trevor L. McDonell, BSD

Return the version number of the installed CUDA driver.

[2009..2014] Trevor L. McDonell, BSD
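The error-handling pattern described above can be sketched as follows. This is a minimal illustration only: the exception type and helper names (CUDAException, resultIfOk, nothingIfOk) follow the convention suggested by the docs here, and the Status type is a stand-in for the real return-code enumeration.

```haskell
import Control.Exception (Exception, throwIO)

-- Stand-in for the API return-code enumeration
data Status = Success | ErrorInvalidValue
  deriving (Eq, Show)

newtype CUDAException = CUDAException String
  deriving Show

instance Exception CUDAException

-- Return the result on successful execution, otherwise throw an
-- exception carrying the error string for the return code.
resultIfOk :: (Status, a) -> IO a
resultIfOk (Success, a) = return a
resultIfOk (err, _)     = throwIO (CUDAException (show err))

-- Throw on an unsuccessful return code, otherwise return unit.
nothingIfOk :: Status -> IO ()
nothingIfOk Success = return ()
nothingIfOk err     = throwIO (CUDAException (show err))
```

Because the exception is thrown in IO, it can be caught with the usual Control.Exception machinery even when raised from otherwise pure wrapper code.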
Profiler output mode.

Initialise the CUDA profiler. The configuration file is used to specify profiling options and profiling counters. Refer to the "Compute Command Line Profiler User Guide" for supported profiler options and counters. Note that the CUDA profiler cannot be initialised with this function if another profiling tool is already active.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PROFILER.html#group__CUDA__PROFILER

Begin profiling collection by the active profiling tool for the current context. If profiling is already enabled, then this has no effect. The start and stop operations can be used to programmatically control profiling granularity, by allowing profiling to be done only on selected pieces of code.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PROFILER.html#group__CUDA__PROFILER_1g8a5314de2292c2efac83ac7fcfa9190e

Stop profiling collection by the active profiling tool for the current context, and force all pending profiler events to be written to the output file. If profiling is already inactive, this has no effect.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PROFILER.html#group__CUDA__PROFILER_1g4d8edef6174fd90165e6ac838f320a5f

Parameters: configuration file that itemises which counters and/or options to profile; output file where profiling results will be stored.

[2009..2014] Trevor L. McDonell, BSD

Return codes from API functions.
Raise a CUDA exception in the IO monad.
A specially formatted error message.
Return the descriptive string associated with a particular error code.
Return the results of a function on successful execution, otherwise return the error string associated with the return code.
Return the error string associated with an unsuccessful return code, otherwise Nothing.

[2009..2014] Trevor L. McDonell, BSD
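A sketch of bracketing a region of interest with the start/stop controls described above, so that only the selected code is profiled. The module path and function names (Foreign.CUDA.Driver.Profiler, start, stop) are assumptions drawn from these docs; check the package for the exact exports.

```haskell
import qualified Foreign.CUDA.Driver.Profiler as Prof
import Control.Exception (bracket_)

-- Profile only the bracketed action; stop is run even if the action throws.
profiled :: IO a -> IO a
profiled = bracket_ Prof.start Prof.stop
```

Using bracket_ guarantees that pending profiler events are flushed to the output file even when the profiled action fails.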
Return the version number of the installed CUDA driver.
Return the version number of the installed CUDA runtime.

[2009..2014] Trevor L. McDonell, BSD

Warp size.
Maximum number of in-flight threads on a multiprocessor.
Maximum number of thread blocks resident on a multiprocessor.
Maximum number of in-flight warps per multiprocessor.
Number of SIMD arithmetic units per multiprocessor.
Total amount of shared memory per multiprocessor (bytes).
Shared memory allocation unit size (bytes).
Total number of registers in a multiprocessor.
Register allocation unit size.
Register allocation granularity for warps.
Maximum number of registers per thread.
How multiprocessor resources are divided.
PCI bus ID of the device.
PCI device ID.
PCI domain ID.

The properties of a compute device:
Identifier.
Supported compute capability.
Available global memory on the device in bytes.
Available constant memory on the device in bytes.
Available shared memory per block in bytes.
32-bit registers per block.
Warp size in threads (SIMD width).
Maximum number of threads per block.
Maximum number of threads per multiprocessor.
Maximum size of each dimension of a block.
Maximum size of each dimension of a grid.
Maximum texture dimensions.
Clock frequency in kilohertz.
Number of multiprocessors on the device.
Maximum pitch in bytes allowed by memory copies.
Global memory bus width in bits.
Peak memory clock frequency in kilohertz.
Alignment requirement for textures.
Device can concurrently copy memory and execute a kernel.
Device can possibly execute multiple kernels concurrently.
Device supports and has enabled error correction.
Number of asynchronous engines.
Size of the L2 cache in bytes.
PCI device information for the device.
Whether this is a Tesla device using the TCC driver.
Whether there is a runtime limit on kernels.
As opposed to discrete.
Device can use pinned memory.
Device shares a unified address space with the host.
Device supports stream priorities.
Device supports caching globals in L1 cache.
Device supports caching locals in L1 cache.
Device supports allocating managed memory on this system.
Device is on a multi-GPU board.
Unique identifier for a group of devices associated with the same board.
GPU compute capability, major and minor revision number respectively.
The compute mode the device is currently in.
Extract some additional hardware resource limitations for a given device.

[2009..2014] Trevor L. McDonell, BSD

Active threads per multiprocessor.
Active thread blocks per multiprocessor.
Active warps per multiprocessor.
Occupancy of each multiprocessor (percent).

Calculate occupancy data for a given GPU and kernel resource usage.

Optimise multiprocessor occupancy as a function of thread block size and resource usage. This returns the smallest satisfying block size in increments of a single warp.

As above, but with a generator that produces the specific thread block sizes that should be tested. The generated list can produce values in any order, but the last satisfying block size will be returned. Hence, values should be monotonically decreasing to return the smallest block size yielding maximum occupancy, and vice-versa.

Increments in powers-of-two, over the range of supported thread block sizes for the given device.
Decrements in powers-of-two, over the range of supported thread block sizes for the given device.
Decrements in the warp size of the device, over the range of supported thread block sizes.
Increments in the warp size of the device, over the range of supported thread block sizes.

Determine the maximum number of CTAs that can be run simultaneously for a given kernel / device combination.

Parameters: properties of the card in question; threads per block; registers per thread; shared memory per block (bytes). For the block-size optimisers: architecture to optimise for; register count as a function of thread block size; shared memory usage (bytes) as a function of thread block size; and (for the generator variant) the thread block sizes to consider. Result of the resident-blocks query: maximum number of resident blocks.

[2009..2014] Trevor L. McDonell, BSD

Device limit flags.
Possible option values for direct peer memory access.
Device execution flags.
A device identifier.

Select the compute device which best matches the given criteria.
Returns which device is currently being used.
Returns the number of devices available for execution, with compute capability >= 1.0.
Return information about the selected compute device.
Set device to be used for GPU execution.
Set flags to be used for device executions.
Set list of devices for CUDA execution in priority order.
Block until the device has completed all preceding requested tasks. Returns an error if one of the tasks fails.

Explicitly destroys and cleans up all runtime resources associated with the current device in the current process. Any subsequent API call will reinitialise the device. Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.

Queries if the first device can directly access the memory of the second. If direct access is possible, it can then be enabled with the peer-access enable function. Requires CUDA-4.0.
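The occupancy calculator above can be driven as in the following sketch. The module name (Foreign.CUDA.Analysis) and the exact signature of optimalBlockSize are assumptions based on these docs; the per-block-size resource functions shown are hypothetical kernel figures.

```haskell
import Foreign.CUDA.Analysis

-- Find the smallest block size (in warp increments) giving maximum
-- occupancy, for a hypothetical kernel that uses 32 registers per thread
-- and no dynamic shared memory, independent of block size.
bestLaunch :: DeviceProperties -> (Int, Occupancy)
bestLaunch dev = optimalBlockSize dev (const 32) (const 0)
```

The generator-based variant lets you restrict the candidate block sizes, e.g. to powers of two only, which matters when the kernel's indexing scheme assumes one.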
If the devices of both the current and supplied contexts support unified addressing, then enable allocations in the supplied context to be accessible by the current context. Requires CUDA-4.0.
Disable direct memory access from the current context to the supplied context. Requires CUDA-4.0.
Query compute 2.0 call stack limits. Requires CUDA-3.1.
Set compute 2.0 call stack limits. Requires CUDA-3.1.

[2009..2014] Trevor L. McDonell, BSD

Possible option flags for CUDA initialisation. Dummy instance until the API exports actual option values.
Device attributes.
A CUDA device.

Initialise the CUDA driver API. This must be called before any other driver function.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__INITIALIZE.html#group__CUDA__INITIALIZE_1g0a2f1517e1bd8502c7194c3a8c134bc3

Return the compute compatibility revision supported by the device.

Return a handle to the compute device at the given ordinal.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g8bdd1cc7201304b01357b8034f6587cb

Return the selected attribute for the given device.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g9c3e1414f0ad901d3278a4d6645fc266

Return the number of devices with compute capability >= 1.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g52b5ce05cb8c5fb6831b2c0ff2887c74

The identifying name of the device.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gef75aa30df95446a845f2a7b9fffbb7f

Return the properties of the selected device.

The total memory available on the device (bytes).
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gc6a0d6551335a3780f9f3c967a0fde5d
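The initialise-then-query sequence described above can be sketched as follows. The entry points (initialise, count, device, name, totalMem, all re-exported from Foreign.CUDA.Driver) are assumed from these docs; treat the exact names as illustrative.

```haskell
import qualified Foreign.CUDA.Driver as CUDA

main :: IO ()
main = do
  CUDA.initialise []          -- must precede any other driver call
  n    <- CUDA.count          -- devices with compute capability >= 1.0
  dev  <- CUDA.device 0       -- handle to the device at ordinal 0
  name <- CUDA.name dev
  mem  <- CUDA.totalMem dev
  putStrLn (name ++ ": " ++ show mem ++ " bytes (" ++ show n ++ " device(s) total)")
```

Calling any other driver function before initialise results in a not-initialised error, so it is usually the first action of the program.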
[2009..2015] Trevor L. McDonell, BSD

Context creation flags.
A device context.

Create a new CUDA context and associate it with the calling thread. The context is created with a usage count of one, and the caller of create must call destroy when done using the context. If a context is already current to the thread, it is supplanted by the newly created context and must be restored by a subsequent call to pop the new context.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf

Increments the usage count of the context. API: no context flags are currently supported, so this parameter must be empty.

Detach the context, and destroy if no longer used.

Destroy the specified context, regardless of how many threads it is current to. The context will be popped from the current thread's context stack, but if it is current on any other threads it will remain current to those threads, and attempts to access it will result in an error.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g27a365aebb0eb548166309f58a1e8b8e

Return the context bound to the calling CPU thread. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g8f13165846b73750693640fb3e8380d0

Bind the specified context to the calling thread. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gbe562ee6258b4fcc272ca6478ca2a2f7

Return the device of the currently active context.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g4e84b109eba36cdaaade167f34ae881e

Pop the current CUDA context from the CPU thread. The context may then be attached to a different CPU thread by calling push.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g2fac188026a062d92e91a8687d0a7902

Push the given context onto the CPU thread's stack of current contexts. The specified context becomes the CPU thread's current context, so all operations that operate on the current context are affected.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gb02d4c850eb16f861fe5a29682cc90ba

Block until the device has completed all preceding requests. If the context was created with the blocking-sync scheduling flag, the CPU thread will block until the GPU has finished its work.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g7a54725f28d34b8c6299f0c6ca579616

[2009..2015] Trevor L. McDonell, BSD

Possible option values for direct peer memory access.

Queries if the first device can directly access the memory of the second. If direct access is possible, it can then be enabled with the peer-access add function. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g496bdaae1f632ebfb695b99d2c40f19e

If the devices of both the current and supplied contexts support unified addressing, then enable allocations in the supplied context to be accessible by the current context. Note that access is unidirectional, and in order to access memory in the current context from the peer context, a separate symmetric call is required. Requires CUDA-4.0.
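The create/destroy context lifecycle above is naturally expressed with bracket, so the context is torn down even if the enclosed action throws. The names (create, destroy, sync in Foreign.CUDA.Driver) are assumed from these docs.

```haskell
import qualified Foreign.CUDA.Driver as CUDA
import Control.Exception (bracket)

-- Run an action inside a fresh context on the given device. Creation makes
-- the context current to this thread; sync drains outstanding work before
-- the context is destroyed.
withContext :: CUDA.Device -> IO a -> IO a
withContext dev action =
  bracket (CUDA.create dev [])
          CUDA.destroy
          (\_ctx -> do r <- action
                       CUDA.sync
                       return r)
```

Push/pop are the tools for migrating a context between host threads; for a single-threaded program the create/destroy pair shown here is usually sufficient.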
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g0889ec6728e61c05ed359551d67b3f5a

Disable direct memory access from the current context to the supplied peer context, and unregister any registered allocations. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g5b4b6936ea868d4954ce4d841a3b4810

[2009..2014] Trevor L. McDonell, BSD

Get the status of the primary context. Returns whether the current context is active, and the flags it was (or will be) created with. Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g65f3e018721b6d90aa05cfb56250f469

Specify the flags that the primary context should be created with. Note that this is an error if the primary context is already active. Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1gd779a84f17acdad0d9143d9fe719cfdf

Destroy all allocations and reset all state on the primary context of the given device in the current process. Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g5d38802e8600340283958a117466ce12

Release the primary context on the given device. If there are no more references to the primary context it will be destroyed, regardless of how many threads it is current to. Unlike the general context destroy, this does not pop the context from the stack in any circumstances. Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1gf2a8bc16f8df0c88031f6a1ba3d6e8ad

Retain the primary context for the given device, creating it if necessary, and increasing its usage count. The caller must call release when done using the context. Unlike the general context create, the newly retained context is not pushed onto the stack. Requires CUDA-7.0.
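The retain/release discipline for the primary context pairs naturally with bracket. The module path and names (Foreign.CUDA.Driver.Context.Primary, retain, release) are assumptions based on these docs.

```haskell
import qualified Foreign.CUDA.Driver as CUDA
import qualified Foreign.CUDA.Driver.Context.Primary as Primary
import Control.Exception (bracket)

-- Retain the device's primary context (creating it if necessary), run the
-- action, then drop our reference. Note that, unlike create, retain does
-- not push the context onto the thread's context stack.
withPrimaryContext :: CUDA.Device -> (CUDA.Context -> IO a) -> IO a
withPrimaryContext dev =
  bracket (Primary.retain dev)
          (\_ctx -> Primary.release dev)
```

The primary context is what the runtime API uses implicitly, so retaining it is the usual way for driver-API code to interoperate with runtime-API libraries.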
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g9051f2d5c31501997a6cb0530290a300

[2009..2014] Trevor L. McDonell, BSD

Online compilation fallback strategy.
Online compilation target architecture.

Results of online compilation: milliseconds spent compiling PTX; information about PTX assembly; the compiled module.

Just-in-time compilation and linking options: maximum number of registers per thread; number of threads per block to target for; level of optimisation to apply (1-4, default 4); compilation target, otherwise determined from context; fallback strategy if matching cubin not found; generate debug info (-g) (requires CUDA >= 5.5); generate line number information (-lineinfo) (requires CUDA >= 5.5); verbose log messages (requires CUDA >= 5.5).

A reference to a Module object, containing collections of device functions.

Load the contents of the specified file (either a ptx or cubin file) to create a new module, and load that module into the current context.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g366093bd269dafd0af21f1c7d18115d3

Load the contents of the given image into a new module, and load that module into the current context. The image is (typically) the contents of a cubin or PTX file. Note that the image will be copied into a temporary staging area so that it can be passed to C.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g04ce266ce03720f479eab76136b90c0b

As above, but read the image data from the given pointer. The image is a NULL-terminated sequence of bytes.

Load the contents of the given image into a module with online compiler options, and load the module into the current context. The image is (typically) the contents of a cubin or PTX file. The actual attributes of the compiled kernel can be probed using the function-attributes query. Note that the image will be copied into a temporary staging area so that it can be passed to C.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g9e8047e9dbf725f0cd7cafd18bfd4d12

As above, but read the image data from the given pointer. The image is a NULL-terminated sequence of bytes.

Unload a module from the current context.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g8ea3d716524369de3763104ced4ea57b

Device code formats that can be used for online linking.

[2009..2014] Trevor L. McDonell, BSD

A pending JIT linker state.

Create a pending JIT linker invocation. The returned link state should be destroyed once no longer needed. The device code machine size will match the calling application. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g86ca4052a2fab369cb943523908aa80d

Destroy the state of a JIT linker invocation. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g01b7ae2a34047b05716969af245ce2d9

Complete a pending linker invocation and load the current module. The link state will be destroyed. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g818fcd84a4150a997c0bba76fef4e716

Add an input file to a pending linker invocation. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g1224c0fd48d4a683f3ce19997f200a8c

Add an input to a pending linker invocation. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g3ebcd2ccb772ba9c120937a2d2831b77

As above, but read the specified number of bytes of image data from the given pointer.

[2009..2014] Trevor L. McDonell, BSD

Possible option flags for stream initialisation.
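Loading device code via the module interface above typically pairs a file load with a function lookup. The names (loadFile, getFun, unload in Foreign.CUDA.Driver) are assumed from these docs, and the file and kernel names are hypothetical.

```haskell
import qualified Foreign.CUDA.Driver as CUDA

-- Load a ptx/cubin file into the current context and look up one of its
-- __global__ functions by name.
loadKernel :: FilePath -> String -> IO (CUDA.Module, CUDA.Fun)
loadKernel path fname = do
  mdl <- CUDA.loadFile path
  fun <- CUDA.getFun mdl fname
  return (mdl, fun)
```

Keep the Module handle around: unloading it invalidates every Fun obtained from it.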
Dummy instance until the API exports actual option values.

Priority of an execution stream. Work submitted to a higher priority stream may preempt execution of work already executing in a lower priority stream. Lower numbers represent higher priorities.

A processing stream. All operations in a stream are synchronous and executed in sequence, but operations in different non-default streams may happen out-of-order or concurrently with one another. Use events to synchronise operations between streams.

Event creation flags.

Events are markers that can be inserted into the CUDA execution stream and later queried.

A reference to page-locked host memory. A HostPtr is just a plain Ptr, but the memory has been allocated by CUDA into page locked memory. This means that the data can be copied to the GPU via DMA (direct memory access). Note that the use of the system function mlock is not sufficient here --- the CUDA version ensures that the physical address stays the same, not just the virtual address. To copy data into a HostPtr array, you may use for example withHostPtr together with the memory copy operations.

A reference to data stored on the device.

The main execution stream. No operations overlap with operations in the default stream.

Possible option flags for waiting for events.

[2009..2014] Trevor L. McDonell, BSD

Create a new event.
Destroy an event.
Determine the elapsed time (in milliseconds) between two events.
Determine if an event has actually been recorded.
Record an event once all operations in the current context (or optionally specified stream) have completed. This operation is asynchronous.
Makes all future work submitted to the (optional) stream wait until the given event reports completion before beginning execution. Synchronisation is performed on the device, including when the event and stream are from different device contexts.
Requires CUDA-3.2.
Wait until the event has been recorded.

[2009..2014] Trevor L. McDonell, BSD

Create a new asynchronous stream.
Destroy and clean up an asynchronous stream.
Determine if all operations in a stream have completed.
Block until all operations in a Stream have been completed.

The main execution stream (0):

    {-# INLINE defaultStream #-}
    defaultStream :: Stream
    #if CUDART_VERSION < 3010
    defaultStream = Stream 0
    #else
    defaultStream = Stream nullPtr
    #endif

[2009..2014] Trevor L. McDonell, BSD

Kernel function parameters. Doubles will be converted to an internal float representation on devices that do not support doubles natively.

Cache configuration preference.
Maximum block size that can be successfully launched (based on register usage).
Number of registers required for each thread.

A global device function. Note that the use of a string naming a function was deprecated in CUDA 4.1 and removed in CUDA 5.0.

Obtain the attributes of the named global device function. This itemises the requirements to successfully launch the given kernel.

Specify the grid and block dimensions for a device call. Used in conjunction with setParams, this pushes data onto the execution stack that will be popped when a function is launched.

Set the argument parameters that will be passed to the next kernel invocation. This is used in conjunction with setConfig to control kernel execution.

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function. Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.

Invoke the global kernel function on the device.
This must be preceded by a call to setConfig and (if appropriate) setParams.

Invoke a kernel on a (gx * gy) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.

Parameters: device function symbol; grid dimensions; thread block shape; shared memory per block (bytes); (optional) execution stream.

[2009..2015] Trevor L. McDonell, BSD

Device shared memory configuration preference.
Device cache configuration preference.
Device limits flags.

Return the flags that were used to create the current context. Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gf81eef983c1e3b2ef4f166d7a930c86d

Query compute 2.0 call stack limits. Requires CUDA-3.1.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g9f2d47d1745752aa16da7ed0d111b6a8

Specify the size of the call stack, for compute 2.0 devices. Requires CUDA-3.1.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g0651954dfb9788173e60a9af7201e65a

On devices where the L1 cache and shared memory use the same hardware resources, this function returns the preferred cache configuration for the current context. Requires CUDA-3.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g40b6b141698f76744dea6e39b9a25360

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the current context. This is only a preference. Any per-function configuration will be preferred over this context-wide setting. Requires CUDA-3.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g54699acf7e2ef27279d013ca2095f4a3

Return the current size of the shared memory banks in the current context. On devices with configurable shared memory banks, the corresponding set operation can be used to change the configuration, so that subsequent kernel launches will by default use the new bank size. On devices without configurable shared memory, this function returns the fixed bank size of the hardware. Requires CUDA-4.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g17153a1b8b8c756f7ab8505686a4ad74

On devices with configurable shared memory banks, this function will set the context's shared memory bank size that will be used by default for subsequent kernel launches. Changing the shared memory configuration between launches may insert a device synchronisation. Shared memory bank size does not affect shared memory usage or kernel occupancy, but may have major effects on performance. Larger bank sizes allow for greater potential bandwidth to shared memory, but change the kinds of accesses which result in bank conflicts. Requires CUDA-4.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g2574235fa643f8f251bf7bc28fac3692

Returns the numerical values that correspond to the greatest and least priority execution streams in the current context respectively. Stream priorities follow the convention that lower numerical values correspond to higher priorities. The range of meaningful stream priorities is given by the inclusive range [greatestPriority,leastPriority]. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g137920ab61a71be6ce67605b9f294091

[2009..2015] Trevor L. McDonell, BSD

[2009..2014] Trevor L. McDonell, BSD
Create a new event.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g450687e75f3ff992fe01662a43d9d3db

Destroy an event.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g593ec73a8ec5a5fc031311d3e4dca1ef

Determine the elapsed time (in milliseconds) between two events.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1gdfb1178807353bbcaa9e245da497cf97

Determine if an event has actually been recorded.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g6f0704d755066b0ee705749ae911deef

Record an event once all operations in the current context (or optionally specified stream) have completed. This operation is asynchronous.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g95424d3be52c4eb95d83861b70fb89d1

Makes all future work submitted to the (optional) stream wait until the given event reports completion before beginning execution. Synchronisation is performed on the device, including when the event and stream are from different device contexts. Requires CUDA-3.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g6a898b652dfc6aa1d5c8d97062618b2f

Wait until the event has been recorded.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g9e520d34e51af7f5375610bca4add99c

[2009..2015] Trevor L. McDonell, BSD

A CUDA inter-process event handle.

Create an inter-process event handle for a previously allocated event. The event must be created with the inter-process and disable-timing event flags.
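The event operations above combine into the standard GPU timing idiom. The names (create, record, block, elapsedTime, destroy in Foreign.CUDA.Driver.Event) are assumed from these docs.

```haskell
import qualified Foreign.CUDA.Driver.Event as Event

-- Time a GPU action by bracketing it with two events in the default stream.
-- Returns the elapsed time in milliseconds.
timeIt :: IO () -> IO Float
timeIt action = do
  start <- Event.create []
  end   <- Event.create []
  Event.record start Nothing   -- enqueue start marker (asynchronous)
  action
  Event.record end Nothing     -- enqueue end marker
  Event.block end              -- wait until the end event has been recorded
  ms <- Event.elapsedTime start end
  Event.destroy start
  Event.destroy end
  return ms
```

Because record is asynchronous, the block on the end event is what actually orders the host against the device work being measured.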
The returned handle may then be sent to another process and opened there, to allow efficient hardware synchronisation between GPU work in other processes. After the event has been opened in the importing process, record, query, block, and stream-wait may be used in either process. Performing operations on the imported event after the event has been destroyed in the exporting process is undefined. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gea02eadd12483de5305878b13288a86c

Open an inter-process event handle for use in the current process, returning an event that can be used in the current process and behaving as a locally created event with the disable-timing flag specified. The event must be freed with destroy. Performing operations on the imported event after the exported event has been destroyed in the exporting process is undefined. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf1d525918b6c643b99ca8c8e42e36c2e

[2009..2014] Trevor L. McDonell, BSD

Create a new stream.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1ga581f0c5833e21ded8b5a56594e243f4

Create a stream with the given priority. Work submitted to a higher-priority stream may preempt work already executing in a lower priority stream. The convention is that lower numbers represent higher priorities. The default priority is zero. The range of meaningful numeric priorities can be queried using the stream priority range. If the specified priority is outside the supported numerical range, it will automatically be clamped to the highest or lowest number in the range. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g95c1a8c7c3dacb13091692dd9c7f7471

Destroy a stream.
If the device is still doing work in the stream when destroy is called, the function returns immediately and the resources associated with the stream will be released automatically once the device has completed all work.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g244c8833de4596bcd31a06cdf21ee758

Check if all operations in the stream have completed.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g1b0d24bbe97fa68e4bc511fb6adfeb0b

Wait until the device has completed all operations in the Stream.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g15e49dd91ec15991eb7c0a741beb7dad

Query the priority of a stream. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g5bd5cb26915a2ecf1921807339488484

[2009..2014] Trevor L. McDonell, BSD

Function attributes.
A __global__ device function.

Returns the value of the selected attribute requirement for the given kernel.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g5e92a1b0d8d1b82cb00dcfb2de15961b

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function. Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches. Requires CUDA-3.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g40f8c11e81def95dc0072a375f965681

Set the shared memory configuration of a device function. On devices with configurable shared memory banks, this will force all subsequent launches of the given device function to use the specified shared memory bank size configuration.
On launch of the function, the shared memory configuration of the device will be temporarily changed if needed to suit the function configuration. Changes in shared memory configuration may introduce a device-side synchronisation between kernel launches. Any per-function configuration specified by setSharedMemConfig will override the context-wide configuration set with setSharedMem. Changing the shared memory bank size will not increase shared memory usage or affect the occupancy of kernels, but may have major effects on performance: larger bank sizes allow greater potential bandwidth to shared memory, but change which kinds of accesses to shared memory result in bank conflicts. This function does nothing on devices with a fixed shared memory bank size. Requires CUDA-5.0. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g430b913f24970e63869635395df6d9f5

Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific stream. The number of kernel parameters and their offsets and sizes do not need to be specified, as this information is retrieved directly from the kernel's image. This requires the kernel to have been compiled with toolchain version 3.2 or later. The alternative entry point passes the arguments in directly, requiring the application to know the size and alignment/padding of each kernel parameter. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15

Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory.
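A common way to choose the grid dimension for such a launch is to round the problem size up to a whole number of thread blocks. The helper below is hypothetical (not part of this binding), shown only to make the arithmetic concrete.

```haskell
-- Number of blocks needed to cover n elements with the given block size.
-- 'gridDimFor' is a hypothetical helper; not part of the cuda package.
gridDimFor :: Int -> Int -> Int
gridDimFor n blockSize = (n + blockSize - 1) `div` blockSize

main :: IO ()
main = print (gridDimFor 10000 256)  -- 40 blocks, covering 10240 threads
```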
The launch may also be associated with a specific stream. Here the number of kernel parameters and their offsets and sizes do not need to be specified, as this information is retrieved directly from the kernel's image, which requires the kernel to have been compiled with toolchain version 3.2 or later. The alternative entry point passes the arguments in directly, requiring the application to know the size and alignment/padding of each kernel parameter. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15

Invoke the kernel on a size (w,h) grid of blocks, where each block contains the number of threads specified by a previous call to setBlockShape. The launch may also be associated with a specific stream.

Specify the (x,y,z) dimensions of the thread blocks that are created when the given kernel function is launched.

Set the number of bytes of dynamic shared memory to be available to each thread block when the function is launched.

Set the parameters that will be specified the next time the kernel is invoked.

Kernel function parameters. Each launch takes: the function to execute; the block grid dimension; the thread block shape; the shared memory (bytes); an (optional) stream to execute in; and the list of function parameters.

[2009..2014] Trevor L. McDonell, BSD

Look at the contents of device memory. This takes an IO action that will be applied to that pointer, the result of which is returned.
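The device and host pointer operations described in this module mirror the host-side Foreign.Ptr API from base, so their semantics can be checked with ordinary pointers and no GPU. A minimal sketch (the helpers are hypothetical demos, not part of the binding):

```haskell
import Foreign.Ptr

-- Offset (in bytes) of an address aligned upward to the given constraint.
-- Hypothetical demo helper mirroring the alignment operation described here.
alignDemo :: Int -> Int -> Int
alignDemo off align =
  alignPtr (nullPtr `plusPtr` off :: Ptr ()) align `minusPtr` nullPtr

-- The documented relation p2 == p1 `plusPtr` (p2 `minusPtr` p1).
relationHolds :: Int -> Int -> Bool
relationHolds a b =
  let p1 = nullPtr `plusPtr` a :: Ptr ()
      p2 = nullPtr `plusPtr` b :: Ptr ()
  in  p2 == p1 `plusPtr` (p2 `minusPtr` p1)

main :: IO ()
main = do
  print (alignDemo 13 8)      -- 16: next highest address satisfying 8-byte alignment
  print (relationHolds 8 24)  -- True
```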
It would be silly to return the pointer from the action.

Return a unique handle associated with the given device pointer.

Return a device pointer from the given handle.

The constant nullDevPtr contains the distinguished memory location that is not associated with a valid memory location.

Cast a device pointer from one type to another.

Advance the pointer address by the given offset in bytes.

Given an alignment constraint, align the device pointer to the next highest address satisfying the constraint.

Compute the difference between the second and first argument. This fulfils the relation p2 == p1 `plusDevPtr` (p2 `minusDevPtr` p1).

Advance a pointer into a device array by the given number of elements.

Apply an IO action to the memory reference living inside the host pointer object. All uses of the pointer should be inside the withHostPtr bracket.

The constant nullHostPtr contains the distinguished memory location that is not associated with a valid memory location.

Cast a host pointer from one type to another.

Advance the pointer address by the given offset in bytes.

Given an alignment constraint, align the host pointer to the next highest address satisfying the constraint.

Compute the difference between the second and first argument.

Advance a pointer into a host array by a given number of elements.

[2009..2014] Trevor L. McDonell, BSD

Options for unified memory allocations. Options for host allocation.

Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type. The runtime system automatically accelerates calls to functions such as memcpy that refer to page-locked memory. Note that since the amount of pageable memory is thereby reduced, overall system performance may suffer.
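Allocation sizes in these modules are given in elements of a storable type; the corresponding byte count follows from sizeOf. A minimal sketch (the `byteCount` helper is hypothetical, not part of the binding):

```haskell
import Data.Int         (Int32)
import Foreign.Storable (Storable, sizeOf)

-- Bytes required to hold n elements of a storable type.
-- 'byteCount' is a hypothetical helper for illustration only.
byteCount :: Storable a => a -> Int -> Int
byteCount witness n = n * sizeOf witness

main :: IO ()
main = print (byteCount (undefined :: Int32) 1000)  -- 4000 bytes
```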
This is best used sparingly to allocate staging areas for data exchange.

Free page-locked host memory previously allocated with mallocHost.

Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of a storable type. It is suitably aligned, and not cleared.

Execute a computation, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of a storable type. The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this. Note that kernel launches can be asynchronous, so you may need to add a synchronisation point at the end of the computation.

Free previously allocated memory on the device.

Allocate memory that will be automatically managed by the Unified Memory system.

Copy a number of elements from the device to host memory. This is a synchronous operation.

Copy memory from the device asynchronously, possibly associated with a particular stream. The destination memory must be page-locked.

Copy a 2D memory area from the device to the host. This is a synchronous operation.

Copy a 2D memory area from the device to the host asynchronously, possibly associated with a particular stream. The destination array must be page-locked.

Copy a number of elements from the device into a new Haskell list. Note that this requires two memory copies: firstly from the device into a heap-allocated array, and from there marshalled into a list.

Copy a number of elements onto the device. This is a synchronous operation.

Copy memory onto the device asynchronously, possibly associated with a particular stream. The source memory must be page-locked.

Copy a 2D memory area onto the device. This is a synchronous operation.

Copy a 2D memory area onto the device asynchronously, possibly associated with a particular stream. The source array must be page-locked.

Write a list of storable elements into a device array.
The array must be sufficiently large to hold the entire list. This requires two marshalling operations.

Copy the given number of elements from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will not overlap other device operations.

Copy the given number of elements from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, and may be associated with a particular stream.

Copy a 2D memory area from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will not overlap other device operations.

Copy a 2D memory area from the first device array (source) to the second device array (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, and may be associated with a particular stream.

Copy data between the host and device asynchronously, possibly associated with a particular stream. The host-side memory must be page-locked (allocated with mallocHostArray).

Copy a 2D memory area between the host and device. This is a synchronous operation.

Copy a 2D memory area between the host and device asynchronously, possibly associated with a particular stream. The host-side memory must be page-locked.

Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two copy operations: firstly from a Haskell list into a heap-allocated array, and from there into device memory. The array should be freed when no longer required.

Write a list of storable elements into a newly allocated device array. This is newListArrayLen composed with fst.

Temporarily store a list of elements into a newly allocated device array.
An IO action is applied to the array, the result of which is returned. Similar to newListArray, this requires two marshalling operations of the data. As with allocaArray, the memory is freed once the action completes, so you should not return the pointer from the action, and be sure that any asynchronous operations (such as kernel execution) have completed.

A variant of withListArray which also supplies the number of elements in the array to the applied function.

Initialise device memory to a given 8-bit value.

Copy data between host and device. This is a synchronous operation.

The 2D copy operations take the following parameters: width to copy (elements); height to copy (elements); source array; source array width; destination array; destination array width. The one-dimensional copies take: destination; source; number of elements. The memset operation takes: the device memory; number of bytes; value to set for each byte.

[2009..2014] Trevor L. McDonell, BSD

A description of how memory read through the texture cache should be interpreted, including the kind of data and the number of bits of each component (x, y, z, and w, respectively).
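The 2D copies take separate source and destination widths because the rows of a 2D area may be padded: each row starts a fixed pitch (in bytes) after the previous one, which need not equal the width being copied. The offset arithmetic can be sketched as follows (`pitchedOffset` is a hypothetical helper, not part of the binding):

```haskell
-- Byte offset of the element at (row, col) in a 2D area whose rows are
-- 'pitch' bytes apart. Hypothetical helper for illustration only.
pitchedOffset :: Int -> Int -> Int -> Int -> Int
pitchedOffset pitch elemSize row col = row * pitch + col * elemSize

main :: IO ()
main =
  -- Rows of 4-byte elements padded out to a 512-byte pitch:
  print (pitchedOffset 512 4 2 3)  -- 1036
```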
Texture channel format kind.

Access the texture using normalised coordinates in the range [0.0,1.0).

A texture reference.

Bind the memory area associated with the device pointer to a texture reference given by the named symbol. Any previously bound references are unbound.

Bind the two-dimensional memory area to the texture reference associated with the given symbol. The size of the area is constrained by (width,height) in texel units, and the row pitch in bytes. Any previously bound references are unbound.

Return the texture reference associated with the given symbol.

Texture filtering mode. Texture addressing mode.

[2009..2014] Trevor L. McDonell, BSD

Options for unified memory allocations. Options for host allocation.

Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type. Note that since the amount of pageable memory is thereby reduced, overall system performance may suffer. This is best used sparingly to allocate staging areas for data exchange. Host memory allocated in this way is automatically and immediately accessible to all contexts on all devices which support unified addressing. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gdd8311286d2c2691605362c689bc64e0

As mallocHostArray, but return a ForeignPtr instead. The array will be deallocated automatically once the last reference to the ForeignPtr is dropped.

Free a section of page-locked host memory. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g62e0fdbe181dab6b1c90fa1a51c7b92c

Page-lock the specified array (on the host) and map it for the device(s) as specified by the given allocation flags. Subsequently, the memory is accessed directly by the device and so can be read and written with much higher bandwidth than pageable memory that has not been registered.
The memory range is added to the same tracking mechanism as mallocHostArray, to automatically accelerate calls to functions such as memcpy. Note that page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of pageable memory available. This is best used sparingly to allocate staging areas for data exchange. This function has limited support on Mac OS X; OS 10.7 or later is required. Requires CUDA-4.0. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf0a9fe11544326dabd743b7aa6b54223

Unmap the memory from the given pointer, and make it pageable again. Requires CUDA-4.0. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g63f450c8125359be87b7623b1c0b2a14

Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of a storable type. It is suitably aligned for any type, and is not cleared. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb82d2a09844a58dd9e744dc31e8aa467

Execute a computation on the device, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of a storable type. The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this. Note that kernel launches can be asynchronous, so you may want to add a synchronisation point using sync as part of the continuation.

Release a section of device memory. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g89b3f154e17cc89b6eea277dbdf5c93a

Allocate memory that will be automatically managed by the Unified Memory system. The returned pointer is valid on the CPU and on all GPUs which support managed memory.
All accesses to this pointer must obey the Unified Memory programming model. On a multi-GPU system with peer-to-peer support, where multiple GPUs support managed memory, the physical storage is created on the GPU which is active at the time the allocation is made. All other GPUs will access the array at reduced bandwidth via peer mapping over the PCIe bus; the Unified Memory system does not migrate memory between GPUs. On a multi-GPU system where multiple GPUs support managed memory, but not all pairs of such GPUs have peer-to-peer support between them, the physical storage is allocated in system memory (zero-copy memory) and all GPUs will access the data at reduced bandwidth over the PCIe bus. Requires CUDA-6.0. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb347ded34dc326af404aa02af5388a32

Copy a number of elements from the device to host memory. This is a synchronous operation. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g3480368ee0208a98f75019c9a8450893

Copy memory from the device asynchronously, possibly associated with a particular stream. The destination host memory must be page-locked. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g56f30236c7c5247f8e061b59d3268362

Copy a 2D array from the device to the host. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27

Copy a 2D array from the device to the host asynchronously, possibly associated with a particular execution stream. The destination host memory must be page-locked. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274

Copy a number of elements from the device into a new Haskell list. Note that this requires two memory copies: firstly from the device into a heap-allocated array, and from there marshalled into a list.

Copy a number of elements onto the device.
This is a synchronous operation. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4d32266788c440b0220b1a9ba5795169

Copy memory onto the device asynchronously, possibly associated with a particular stream. The source host memory must be page-locked. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g1572263fe2597d7ba4f6964597a354a3

Copy a 2D array from the host to the device. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27

Copy a 2D array from the host to the device asynchronously, possibly associated with a particular execution stream. The source host memory must be page-locked. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274

Write a list of storable elements into a device array. The device array must be sufficiently large to hold the entire list. This requires two marshalling operations.

Copy the given number of elements from the first device array (source) to the second device array (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will never overlap with kernel execution. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g1725774abf8b51b91945f3336b778c8b

Copy the given number of elements from the first device array (source) to the second device array (destination). The copied areas may not overlap. The operation is asynchronous with respect to the host, and can be asynchronous to other device operations by associating it with a particular stream. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g39ea09ba682b8eccc9c3e0c04319b5c8

Copy a 2D array from the first device array (source) to the second device array (destination). The copied areas must not overlap.
This operation is asynchronous with respect to the host, but will never overlap with kernel execution. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27

Copy a 2D array from the first device array (source) to the second device array (destination). The copied areas may not overlap. The operation is asynchronous with respect to the host, and can be asynchronous to other device operations by associating it with a particular execution stream. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274

Copy an array from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host, but serialised with respect to all pending and future asynchronous work in the source and destination contexts. To avoid this synchronisation, use the asynchronous peer copy instead. Requires CUDA-4.0. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ge1f5c7771544fee150ada8853c7cbf4a

Copy from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host and all work in other streams and devices. Requires CUDA-4.0. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g82fcecb38018e64b98616a8ac30112f2

Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two memory copies: firstly from a Haskell list to a heap-allocated array, and from there onto the graphics device. The memory should be freed when no longer required.

Write a list of storable elements into a newly allocated device array. This is newListArrayLen composed with fst.

Temporarily store a list of elements into a newly allocated device array. An IO action is applied to the array, the result of which is returned.
Similar to newListArray, this requires copying the data twice. As with allocaArray, the memory is freed once the action completes, so you should not return the pointer from the action, and be wary of asynchronous kernel execution.

A variant of withListArray which also supplies the number of elements in the array to the applied function.

Set a number of data elements to the specified value, which may be either 8-, 16-, or 32-bits wide. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6e582bf866e9e2fb014297bfaf354d7b http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g7d805e610054392a4d11e8a8bf5eb35c http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g983e8d8759acd1b64326317481fbf132

Set a number of data elements to the specified value, which may be either 8-, 16-, or 32-bits wide. The operation is asynchronous and may optionally be associated with a stream. Requires CUDA-3.2. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gaef08a7ccd61112f94e82f2b30d43627 http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf731438877dd8ec875e4c43d848c878c http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g58229da5d30f1c0cdf667b320ec2c0f5

Return the device pointer associated with a mapped, pinned host buffer, which was allocated with the device-mapped option by mallocHostArray. Currently, no options are supported and this must be empty. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g57a39e5cba26af4d06be67fc77cc62f0

Return the base address and allocation size of the given device pointer. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g64fee5711274a2a0573a789c94d8299b

Return the amount of free and total memory respectively available to the current context (bytes).
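The 8-, 16-, and 32-bit memset variants write the same bytes whenever the wider value is the narrow byte pattern replicated: filling with the 32-bit value 0xABABABAB is byte-for-byte the same as filling with the 8-bit value 0xAB. A sketch of that replication (`widen8to32` is a hypothetical helper, not part of the binding):

```haskell
import Data.Word (Word8, Word32)

-- Replicate an 8-bit pattern into a 32-bit word, so that a 32-bit fill
-- writes the same bytes as an 8-bit fill. Hypothetical helper.
widen8to32 :: Word8 -> Word32
widen8to32 v = fromIntegral v * 0x01010101

main :: IO ()
main = print (widen8to32 0xAB == 0xABABABAB)  -- True
```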
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0

The 2D copy operations take the following parameters: width to copy (elements); height to copy (elements); source array; source array width; source x-coordinate; source y-coordinate; destination array; destination array width; destination x-coordinate; destination y-coordinate; and, for the asynchronous variants, the stream to associate with. The peer copies take: number of array elements; source array and context; destination array and context; and, for the asynchronous variant, the stream to associate with.

[2009..2015] Trevor L. McDonell, BSD
Flags for controlling IPC memory access.

A CUDA memory handle used for inter-process communication.

Create an inter-process memory handle for an existing device memory allocation. The handle can then be sent to another process and made available to that process via open. Requires CUDA-4.1. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6f1b5be767b275f016523b2ac49ebec1

Open an inter-process memory handle exported from another process, returning a device pointer usable in the current process. This maps memory exported by another process with create into the current device address space. For contexts on different devices, open can attempt to enable peer access if the user called add, and this is controlled by the lazy peer-access flag. Each handle from a given device and context may only be opened by one context per device per other process. Memory returned by open must be freed via close. Requires CUDA-4.1. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ga8bd126fcff919a0c996b7640f197b79b

Close and unmap memory returned by open. The original allocation in the exporting process, as well as imported mappings in other processes, are unaffected. Any resources used to enable peer access will be freed if this is the last mapping using them. Requires CUDA-4.1. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gd6f5d5bcf6376c6853b64635b0157b9e

[2009..2014] Trevor L. McDonell, BSD

Texture data formats. Texture read mode options. Texture reference filtering mode. Texture reference addressing modes. A texture reference.

Create a new texture reference. Once created, the application must call setPtr to associate the reference with allocated memory. Other texture reference functions are used to specify the format and interpretation to be used when the memory is read through this reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF__DEPRECATED.html#group__CUDA__TEXREF__DEPRECATED_1g0084fabe2c6d28ffcf9d9f5c7164f16c

Destroy a texture reference. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF__DEPRECATED.html#group__CUDA__TEXREF__DEPRECATED_1gea8edbd6cf9f97e6ab2b41fc6785519d

Bind a linear array address of the given size (bytes) as a texture reference. Any previously bound references are unbound. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g44ef7e5055192d52b3d43456602b50a8

Bind a linear address range to the given texture reference as a two-dimensional arena. Any previously bound reference is unbound. Note that calls to setFormat can not follow a call to bind2D for the same texture reference. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g26f709bbe10516681913d1ffe8756ee2

Get the addressing mode used by a texture reference, corresponding to the given dimension (currently the only supported dimension values are 0 or 1). http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1gfb367d93dc1d20aab0cf8ce70d543b33

Get the filtering mode used by a texture reference. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g2439e069746f69b940f2f4dbc78cdf87

Get the data format and number of channel components of the bound texture. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g90936eb6c7c4434a609e1160c278ae53

Specify the addressing mode for the given dimension of a texture reference. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g85f4a13eeb94c8072f61091489349bcb

Specify the filtering mode to be used when reading memory through a texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g595d0af02c55576f8c835e4efd1f39c0

Specify additional characteristics for reading and indexing the texture reference. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g554ffd896487533c36810f2e45bb7a28

Specify the format of the data and number of packed components per element to be read by the texture reference. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g05585ef8ea2fec728a03c6c8f87cf07a

[2009..2014] Trevor L. McDonell, BSD

Return a function handle. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1ga52be009b0d4045811b30c965e1cb2cf

Return a global pointer, and the size of the global (in bytes). http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1gf3e43672e26073b1081476dbf47a86ab

Return a handle to a texture reference. This texture reference handle should not be destroyed, as the texture will be destroyed automatically when the module is unloaded. http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g9607dcbf911c16420d5264273f2b5608
Package index: cuda-0.7.5.0. Modules: Foreign.CUDA.Driver.Error, Foreign.CUDA.Driver.Utils, Foreign.CUDA.Driver.Profiler, Foreign.CUDA.Runtime.Error, Foreign.CUDA.Runtime.Utils, Foreign.CUDA.Analysis.Device, Foreign.CUDA.Analysis.Occupancy, Foreign.CUDA.Runtime.Device, Foreign.CUDA.Driver.Device, Foreign.CUDA.Driver.Context.Base, Foreign.CUDA.Driver.Context.Peer, Foreign.CUDA.Driver.Context.Primary, Foreign.CUDA.Driver.Module.Base, Foreign.CUDA.Driver.Module.Link, Foreign.CUDA.Types, Foreign.CUDA.Runtime.Event, Foreign.CUDA.Runtime.Stream, Foreign.CUDA.Runtime.Exec, Foreign.CUDA.Driver.Context.Config, Foreign.CUDA.Driver.Event, Foreign.CUDA.Driver.IPC.Event, Foreign.CUDA.Driver.Stream, Foreign.CUDA.Driver.Exec, Foreign.CUDA.Ptr, Foreign.CUDA.Runtime.Marshal, Foreign.CUDA.Runtime.Texture, Foreign.CUDA.Driver.Marshal, Foreign.CUDA.Driver.IPC.Marshal, Foreign.CUDA.Driver.Texture, Foreign.CUDA.Driver.Module.Query, Foreign.CUDA.Internal.C2HS, Foreign.CUDA.Analysis. (The remainder of the interface file is a binary symbol table and is omitted.)
PciDomainIdTexturePitchAlignmentMaximumTexturecubemapWidth!MaximumTexturecubemapLayeredWidth"MaximumTexturecubemapLayeredLayersMaximumSurface1dWidthMaximumSurface2dWidthMaximumSurface2dHeightMaximumSurface3dWidthMaximumSurface3dHeightMaximumSurface3dDepthMaximumSurface1dLayeredWidthMaximumSurface1dLayeredLayersMaximumSurface2dLayeredWidthMaximumSurface2dLayeredHeightMaximumSurface2dLayeredLayersMaximumSurfacecubemapWidth!MaximumSurfacecubemapLayeredWidth"MaximumSurfacecubemapLayeredLayersMaximumTexture1dLinearWidthMaximumTexture2dLinearWidthMaximumTexture2dLinearHeightMaximumTexture2dLinearPitchMaximumTexture2dMipmappedWidthMaximumTexture2dMipmappedHeightComputeCapabilityMajorComputeCapabilityMinorMaximumTexture1dMipmappedWidthStreamPrioritiesSupportedGlobalL1CacheSupportedLocalL1CacheSupported MaxSharedMemoryPerMultiprocessorMaxRegistersPerMultiprocessor ManagedMemory MultiGpuBoardMultiGpuBoardGroupIdCU_DEVICE_ATTRIBUTE_MAX useDevice capabilitydevice attributenametotalMem$fEnumInitFlag$fEnumDeviceAttribute $fEqDevice $fShowDevice$fEqDeviceAttribute$fShowDeviceAttribute ContextFlag SchedAuto SchedSpin SchedYieldSchedBlockingSync SchedMaskLmemResizeToMax FlagsMaskContext useContextcreateattachdetachdestroypoppush$fEnumContextFlag $fEqContext $fShowContext$fEqContextFlag$fShowContextFlag$fBoundedContextFlagstatussetupreleaseretainJITOptionInternalJIT_MAX_REGISTERSJIT_THREADS_PER_BLOCK JIT_WALL_TIMEJIT_INFO_LOG_BUFFERJIT_INFO_LOG_BUFFER_SIZE_BYTESJIT_ERROR_LOG_BUFFERJIT_ERROR_LOG_BUFFER_SIZE_BYTESJIT_OPTIMIZATION_LEVELJIT_TARGET_FROM_CUCONTEXT JIT_TARGETJIT_FALLBACK_STRATEGYJIT_GENERATE_DEBUG_INFOJIT_LOG_VERBOSEJIT_GENERATE_LINE_INFOJIT_CACHE_MODEJIT_NUM_OPTIONS JITInputTypeCubinPTX FatbinaryObjectLibraryCuJitNumInputTypes JITFallback PreferPTX PreferBinary JITTarget Compute10 Compute11 Compute12 Compute13 Compute20 Compute21 Compute30 Compute32 Compute35 Compute37 Compute50 Compute52 JITResultjitTime jitInfoLog jitModule JITOption 
MaxRegistersThreadsPerBlockOptimisationLevelTargetFallbackStrategyGenerateDebugInfoGenerateLineInfoVerboseModule useModuleloadFileloadDataloadDataFromPtr loadDataExloadDataFromPtrExunloadjitOptionUnpackjitTargetOfCompute$fEnumJITOptionInternal$fEnumJITInputType$fEnumJITFallback$fEnumJITTarget $fEqModule $fShowModule$fShowJITResult $fEqJITTarget$fShowJITTarget$fEqJITFallback$fShowJITFallback$fShowJITOption$fEqJITInputType$fShowJITInputType$fEqJITOptionInternal$fShowJITOptionInternal LinkStatecompleteaddFileaddDataaddDataFromPtr$fShowLinkState StreamFlagStreamPriorityStream useStreamWaitFlag EventFlag DisableTiming InterprocessEventuseEventHostPtr useHostPtr DevicePtr useDevicePtr defaultStream$fEnumStreamFlag$fEnumWaitFlag$fEnumEventFlag$fStorableHostPtr $fShowHostPtr$fStorableDevicePtr$fShowDevicePtr $fEqDevicePtr$fOrdDevicePtr $fEqHostPtr $fOrdHostPtr $fEqEvent $fShowEvent $fEqEventFlag$fShowEventFlag$fBoundedEventFlag $fEqStream $fShowStream elapsedTimequeryrecordwaitblockfinishedFunParamIArgFArgDArgVArg CacheConfigNoneSharedL1Equal FunAttributesconstSizeByteslocalSizeBytessharedSizeBytesmaxKernelThreadsPerBlocknumRegsFun attributes setConfig setParamssetCacheConfiglaunch launchKernel$fEnumCacheConfig$fStorableFunAttributes$fShowFunAttributes$fEqCacheConfig$fShowCacheConfig SharedMemDefaultBankSizeFourByteBankSizeEightByteBankSizeCache PreferNone PreferSharedPreferL1 PreferEqual StackSizePrintfFifoSizeMallocHeapSizeDevRuntimeSyncDepthDevRuntimePendingLaunchCountMaxgetFlagsgetCachesetCache getSharedMem$fEnumSharedMem $fEnumCache $fEqCache $fShowCache $fEqSharedMem$fShowSharedMemIPCEventexportopen $fEqIPCEvent$fShowIPCEventcreateWithPriority getPriority FunAttributeMaxKernelThreadsPerBlockSharedSizeBytesConstSizeBytesLocalSizeBytesNumRegs PtxVersion BinaryVersion CacheModeCaCU_FUNC_ATTRIBUTE_MAXrequiressetSharedMemConfigFun launchKernel' setBlockShape setSharedSize$fStorableFunParam$fEnumFunAttribute$fEqFunAttribute$fShowFunAttribute 
withDevicePtrdevPtrToWordPtrwordPtrToDevPtr nullDevPtr castDevPtr plusDevPtr alignDevPtr minusDevPtr advanceDevPtr withHostPtr nullHostPtr castHostPtr plusHostPtr alignHostPtr minusHostPtradvanceHostPtr AttachFlagGlobalHostSingle AllocFlagPortable DeviceMapped WriteCombinedmallocHostArrayfreeHost mallocArray allocaArrayfreemallocManagedArray peekArraypeekArrayAsync peekArray2DpeekArray2DAsync peekListArray pokeArraypokeArrayAsync pokeArray2DpokeArray2DAsync pokeListArraycopyArrayAsync copyArray2DcopyArray2DAsyncnewListArrayLen newListArray withListArraywithListArrayLenmemset$fEnumCopyDirection$fEnumAttachFlag$fEnumAllocFlag $fEqAllocFlag$fShowAllocFlag$fBoundedAllocFlag$fEqAttachFlag$fShowAttachFlag$fBoundedAttachFlag$fEqCopyDirection$fShowCopyDirection FormatDescdepthkind FilterModePointLinear AddressModeWrapClampMirrorBorder FormatKindSignedUnsignedFloatTexture normalised filtering addressingformatbindbind2D$fStorableTexture$fStorableFormatDesc$fEnumFilterMode$fEnumAddressMode$fEnumFormatKind$fEqFormatKind$fShowFormatKind$fEqAddressMode$fShowAddressMode$fEqFilterMode$fShowFilterMode$fEqFormatDesc$fShowFormatDesc $fEqTexture $fShowTextureCuMemAttachGlobalCuMemAttachHostCuMemAttachSinglemallocHostForeignPtr registerArrayunregisterArray copyArrayPeercopyArrayPeerAsync memsetAsync getDevicePtr getBasePtr getMemInfopeekDeviceHandleuseDeviceHandleIPCFlagLazyEnablePeerAccess IPCDevicePtrclose $fEnumIPCFlag $fEqIPCFlag $fShowIPCFlag$fBoundedIPCFlag$fEqIPCDevicePtr$fShowIPCDevicePtrFormatWord8Word16Word32Int8Int16Int32HalfReadMode ReadAsIntegerNormalizedCoordinatesSRGB useTexturegetAddressMode getFilterMode getFormatsetAddressMode setFilterMode setReadMode setFormatpeekTex $fEnumFormat$fEnumReadMode $fEqReadMode$fShowReadMode $fEqFormat $fShowFormatgetFungetPtrgetTex nothingIfNullextractBitMaskscIntConv cFloatConv cFromBoolghc-prim GHC.TypesBoolcToBoolcToEnum cFromEnumwithCStringLenIntConvpeekCStringLenIntConv withIntConv withFloatConv peekIntConv 
peekFloatConvwithBoolpeekBool peekArrayWithwithEnumpeekEnum nothingIfcombineBitMaskscontainsBitMaskIOcuGetErrorString'_cuGetErrorStringcuDriverGetVersion'_cuDriverGetVersioncuProfilerStop'_cuProfilerStart'_cuProfilerInitialize'_cuProfilerInitializecuProfilerStartcuProfilerStop describe'_cudaDriverGetVersion'_cudaRuntimeGetVersion'_cudaRuntimeGetVersioncudaDriverGetVersioncudaDeviceSetLimit'_cudaDeviceGetLimit'_cudaDeviceDisablePeerAccess'_cudaDeviceEnablePeerAccess'_cudaDeviceCanAccessPeer'_cudaDeviceReset'_cudaDeviceSynchronize'_cudaSetValidDevices'_cudaSetDeviceFlags'_cudaSetDevice'_cudaGetDeviceProperties'_cudaGetDeviceCount'_cudaGetDevice'_cudaChooseDevice'_cudaChooseDevice cudaGetDevicecudaGetDeviceCountcudaGetDeviceProperties cudaSetDevicecudaSetDeviceFlagscudaSetValidDevicescudaDeviceSynchronizecudaDeviceResetcudaDeviceCanAccessPeercudaDeviceEnablePeerAccesscudaDeviceDisablePeerAccesscudaDeviceGetLimitcudaDeviceSetLimitcuDeviceTotalMem'_cuDeviceGetName'_cuDeviceGetCount'_cuDeviceGetAttribute'_ cuDeviceGet'_cuInit'_enable_constructors'_enable_constructorscuInit cuDeviceGetcuDeviceGetAttributecuDeviceGetCountcuDeviceGetNamecuDeviceTotalMemcuCtxSynchronize'_cuCtxPushCurrent'_cuCtxPopCurrent'_cuCtxGetDevice'_cuCtxSetCurrent'_cuCtxGetCurrent'_cuCtxDestroy'_ cuCtxDetach'_ cuCtxAttach'_ cuCtxCreate'_ cuCtxCreate cuCtxAttach cuCtxDetach cuCtxDestroycuCtxGetCurrentcuCtxSetCurrentcuCtxGetDevicecuCtxPopCurrentcuCtxPushCurrentcuCtxSynchronizecuCtxDisablePeerAccess'_cuCtxEnablePeerAccess'_cuDeviceCanAccessPeer'_cuDeviceCanAccessPeercuCtxEnablePeerAccesscuCtxDisablePeerAccesscuDevicePrimaryCtxRetain'_cuDevicePrimaryCtxRelease'_cuDevicePrimaryCtxReset'_cuDevicePrimaryCtxSetFlags'_cuDevicePrimaryCtxGetState'_cuDevicePrimaryCtxGetStatecuDevicePrimaryCtxSetFlagscuDevicePrimaryCtxResetcuDevicePrimaryCtxReleasecuDevicePrimaryCtxRetainbytestring-0.10.8.1Data.ByteString.Internal ByteStringcuModuleUnload'_cuModuleLoadDataEx'_cuModuleLoadData'_cuModuleLoad'_ c_strnlen' 
cuModuleLoadcuModuleLoadDatacuModuleLoadDataExcuModuleUnloadpeekMod c_strnlen useLinkStatecuLinkAddData'_cuLinkAddFile'_cuLinkComplete'_cuLinkDestroy'_cuLinkCreate'_ cuLinkCreate cuLinkDestroycuLinkComplete cuLinkAddFile cuLinkAddDatabaseGHC.PtrPtrcudaEventSynchronize'_cudaStreamWaitEvent'_cudaEventRecord'_cudaEventQuery'_cudaEventElapsedTime'_cudaEventDestroy'_cudaEventCreateWithFlags'_cudaEventCreateWithFlagscudaEventDestroycudaEventElapsedTimecudaEventQuerycudaEventRecordcudaStreamWaitEventcudaEventSynchronize peekStreamcudaStreamSynchronize'_cudaStreamQuery'_cudaStreamDestroy'_cudaStreamCreate'_cudaStreamCreatecudaStreamDestroycudaStreamQuerycudaStreamSynchronize cudaLaunch'_cudaFuncSetCacheConfig'_cudaSetDoubleForDevice'_cudaSetupArgument'_cudaConfigureCallSimple'_cudaFuncGetAttributes'_cudaFuncGetAttributescudaConfigureCallSimplecudaSetupArgumentcudaSetDoubleForDevicecudaFuncSetCacheConfig cudaLaunchwithFuncuCtxGetStreamPriorityRange'_cuCtxSetSharedMemConfig'_cuCtxGetSharedMemConfig'_cuCtxSetCacheConfig'_cuCtxGetCacheConfig'_cuCtxSetLimit'_cuCtxGetLimit'_cuCtxGetFlags'_ cuCtxGetFlags cuCtxGetLimit cuCtxSetLimitcuCtxGetCacheConfigcuCtxSetCacheConfigcuCtxGetSharedMemConfigcuCtxSetSharedMemConfigcuCtxGetStreamPriorityRangecuEventSynchronize'_cuStreamWaitEvent'_cuEventRecord'_cuEventQuery'_cuEventElapsedTime'_cuEventDestroy'_cuEventCreate'_ cuEventCreatecuEventDestroycuEventElapsedTime cuEventQuery cuEventRecordcuStreamWaitEventcuEventSynchronizeIPCEventHandle useIPCEventcuIpcOpenEventHandle'_cuIpcGetEventHandle'_cuIpcGetEventHandlecuIpcOpenEventHandlenewIPCEventHandlecuStreamGetPriority'_cuStreamSynchronize'_cuStreamQuery'_cuStreamDestroy'_cuStreamCreateWithPriority'_cuStreamCreate'_cuStreamCreatecuStreamCreateWithPrioritycuStreamDestroy cuStreamQuerycuStreamSynchronizecuStreamGetPriorityuseFun cuParamSetv'_ cuParamSetf'_ 
cuParamSeti'_cuParamSetSize'_cuFuncSetSharedSize'_cuFuncSetBlockShape'_cuLaunchGridAsync'_cuLaunchKernel'_cuFuncSetSharedMemConfig'_cuFuncSetCacheConfig'_cuFuncGetAttribute'_cuFuncGetAttributecuFuncSetCacheConfigcuFuncSetSharedMemConfigcuLaunchKernelcuLaunchGridAsynccuFuncSetBlockShapecuFuncSetSharedSizecuParamSetSize cuParamSeti cuParamSetf cuParamSetv memcpyAsyncmemcpy2D memcpy2DAsync Data.Tuplefst CopyDirection HostToHost HostToDevice DeviceToHostDeviceToDevice cudaMemset'_cudaMemcpy2DAsync'_cudaMemcpy2D'_cudaMemcpyAsync'_ cudaMemcpy'_cudaMallocManaged'_ cudaFree'_ cudaMalloc'_cudaFreeHost'_cudaHostAlloc'_ cudaHostAlloc cudaFreeHost cudaMalloccudaFreecudaMallocManagedmemcpy cudaMemcpycudaMemcpyAsync cudaMemcpy2DcudaMemcpy2DAsync cudaMemsetTextureReferencecudaGetTextureReference'_cudaBindTexture2D'_cudaBindTexture'_cudaBindTexturecudaBindTexture2DcudaGetTextureReferencewith_ withCString_GHC.ForeignPtr ForeignPtr DeviceHandlecuMemGetInfo'_cuMemGetAddressRange'_cuMemHostGetDevicePointer'_cuMemsetD32Async'_cuMemsetD16Async'_cuMemsetD8Async'_ cuMemsetD32'_ cuMemsetD16'_ cuMemsetD8'_cuMemcpyPeerAsync'_cuMemcpyPeer'_cuMemcpy2DDtoDAsync'_cuMemcpy2DDtoD'_cuMemcpyDtoDAsync'_cuMemcpyDtoD'_cuMemcpy2DHtoDAsync'_cuMemcpy2DHtoD'_cuMemcpyHtoDAsync'_cuMemcpyHtoD'_cuMemcpy2DDtoHAsync'_cuMemcpy2DDtoH'_cuMemcpyDtoHAsync'_cuMemcpyDtoH'_cuMemAllocManaged'_ cuMemFree'_ cuMemAlloc'_cuMemHostUnregister'_cuMemHostRegister'_cuMemFreeHost'_cuMemHostAlloc'_finalizerMemFreeHostcuMemHostAlloc cuMemFreeHostcuMemHostRegistercuMemHostUnregister cuMemAlloc cuMemFreecuMemAllocManaged cuMemcpyDtoHcuMemcpyDtoHAsynccuMemcpy2DDtoHcuMemcpy2DDtoHAsync cuMemcpyHtoDcuMemcpyHtoDAsynccuMemcpy2DHtoDcuMemcpy2DHtoDAsync cuMemcpyDtoDcuMemcpyDtoDAsynccuMemcpy2DDtoDcuMemcpy2DDtoDAsync cuMemcpyPeercuMemcpyPeerAsync cuMemsetD8 cuMemsetD16 cuMemsetD32cuMemsetD8AsynccuMemsetD16AsynccuMemsetD32AsynccuMemHostGetDevicePointercuMemGetAddressRange cuMemGetInfo 
IPCMemHandleuseIPCDevicePtrcuIpcCloseMemHandle'_cuIpcOpenMemHandle'_cuIpcGetMemHandle'_cuIpcGetMemHandlecuIpcOpenMemHandlecuIpcCloseMemHandlenewIPCMemHandlecuTexRefSetFormat'_cuTexRefSetFlags'_cuTexRefSetFilterMode'_cuTexRefSetAddressMode'_cuTexRefGetFormat'_cuTexRefGetFilterMode'_cuTexRefGetAddressMode'_cuTexRefSetAddress2DSimple'_cuTexRefSetAddress'_cuTexRefDestroy'_cuTexRefCreate'_cuTexRefCreatecuTexRefDestroycuTexRefSetAddresscuTexRefSetAddress2DSimplecuTexRefGetAddressModecuTexRefGetFilterModecuTexRefGetFormatcuTexRefSetAddressModecuTexRefSetFilterModecuTexRefSetFlagscuTexRefSetFormatcuModuleGetTexRef'_cuModuleGetGlobal'_cuModuleGetFunction'_cuModuleGetFunctioncuModuleGetGlobalcuModuleGetTexRef resultIfFound