q5"      !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~      !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwx y z { | } ~                                                                                                        !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~$None*Instance for special casing null pointers.:Given a bit pattern, yield all bit masks that it contains.This does *not* attempt to compute a minimal set of bit masks that when combined yield the bit pattern, instead all contained bit masks are produced.Integral conversionFloating conversionObtain C value from Haskell .Obtain Haskell  from C value.#Convert a C enumeration to Haskell.#Convert a Haskell enumeration to C.               [2009..2014] Trevor L. McDonellBSDNone +>IReturn a descriptive error string associated with a particular error code?%Raise a CUDAException in the IO Monad@#A specially formatted error messageAReturn the results of a function on successful execution, otherwise throw an exception with an error string associated with the return codeBlThrow an exception with an error string associated with an unsuccessful return code, otherwise return unit.H  !"#$%&'()*+,-./0123456789:;<=>?@ABC  !"#$%&'()*+,-./0123456789:;<=>?@ABC  !"#$%&'()*+,-./0123456789:;<=>?@AB :  !"#$%&'()*+,-./0123456789:;<=>?@AB[2009..2014] Trevor L. McDonellBSDNone C7Return the version number of the installed CUDA driver.CCCC[2009..2014] Trevor L. McDonellBSDNone +GReturn codes from API functionsRaise a D in the IO Monad#A specially formatted error messageEReturn the descriptive string associated with a particular error code|Return the results of a function on successful execution, otherwise return the error string associated with the return codeWReturn the error string associated with an unsuccessful return code, otherwise Nothing]DEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~YDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~YGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~DEF DEFGPHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~[2009..2014] Trevor L. McDonellBSDNone 6Return the version number of the installed CUDA driver7Return the version number of the installed CUDA runtime  [2009..2014] Trevor L. 
Foreign.CUDA.Analysis.Device

  DeviceResources fields:
    threadsPerWarp: Warp size.
    threadsPerMP: Maximum number of in-flight threads on a multiprocessor.
    threadBlocksPerMP: Maximum number of thread blocks resident on a
      multiprocessor.
    warpsPerMP: Maximum number of in-flight warps per multiprocessor.
    coresPerMP: Number of SIMD arithmetic units per multiprocessor.
    sharedMemPerMP: Total amount of shared memory per multiprocessor (bytes).
    sharedMemAllocUnit: Shared memory allocation unit size (bytes).
    regFileSize: Total number of registers in a multiprocessor.
    regAllocUnit: Register allocation unit size.
    regAllocWarp: Register allocation granularity for warps.
    regPerThread: Maximum number of registers per thread.
    allocation: How multiprocessor resources are divided (per Warp or per
      Block).
  PCI fields: PCI bus ID of the device; PCI device ID; PCI domain ID.
  DeviceProperties: The properties of a compute device. Fields:
    deviceName: Identifier.
    computeCapability: Supported compute capability.
    totalGlobalMem: Available global memory on the device in bytes.
    totalConstMem: Available constant memory on the device in bytes.
    sharedMemPerBlock: Available shared memory per block in bytes.
    regsPerBlock: 32-bit registers per block.
    warpSize: Warp size in threads (SIMD width).
    maxThreadsPerBlock: Maximum number of threads per block.
    maxThreadsPerMultiProcessor: Maximum number of threads per multiprocessor.
    maxBlockSize: Maximum size of each dimension of a block.
    maxGridSize: Maximum size of each dimension of a grid.
    maxTextureDim1D, maxTextureDim2D, maxTextureDim3D: Maximum texture
      dimensions.
    clockRate: Clock frequency in kilohertz.
    multiProcessorCount: Number of multiprocessors on the device.
    memPitch: Maximum pitch in bytes allowed by memory copies.
    memBusWidth: Global memory bus width in bits.
    memClockRate: Peak memory clock frequency in kilohertz.
    textureAlignment: Alignment requirement for textures.
    deviceOverlap: Device can concurrently copy memory and execute a kernel.
    concurrentKernels: Device can possibly execute multiple kernels
      concurrently.
    eccEnabled: Device supports and has enabled error correction.
    asyncEngineCount: Number of asynchronous engines.
    cacheMemL2: Size of the L2 cache in bytes.
    pciInfo: PCI device information for the device.
    tccDriverEnabled: Whether this is a Tesla device using the TCC driver.
    kernelExecTimeoutEnabled: Whether there is a runtime limit on kernels.
    integrated: As opposed to discrete.
    canMapHostMemory: Device can use pinned memory.
    unifiedAddressing: Device shares a unified address space with the host.
    streamPriorities: Device supports stream priorities.
    globalL1Cache: Device supports caching globals in L1 cache.
    localL1Cache: Device supports caching locals in L1 cache.
    managedMemory: Device supports allocating managed memory on this system.
    multiGPUBoard: Device is on a multi-GPU board.
    multiGPUBoardGroupID: Unique identifier for a group of devices associated
      with the same board.
    computeMode: The compute mode the device is currently in.
  Compute: GPU compute capability, major and minor revision number
    respectively.
  deviceResources: Extract some additional hardware resource limitations for
    a given device.

Foreign.CUDA.Analysis.Occupancy

  Occupancy fields:
    activeThreads: Active threads per multiprocessor.
    activeThreadBlocks: Active thread blocks per multiprocessor.
    activeWarps: Active warps per multiprocessor.
    occupancy100: Occupancy of each multiprocessor (percent).
  occupancy: Calculate occupancy data for a given GPU and kernel resource
    usage. Arguments: properties of the card in question; threads per block;
    registers per thread; shared memory per block (bytes).
  optimalBlockSize: Optimise multiprocessor occupancy as a function of thread
    block size and resource usage. This returns the smallest satisfying block
    size, in increments of a single warp.
  optimalBlockSizeBy: As optimalBlockSize, but with a generator that produces
    the specific thread block sizes that should be tested. The generated list
    can produce values in any order, but the last satisfying block size will
    be returned. Hence, values should be monotonically decreasing to return
    the smallest block size yielding maximum occupancy, and vice-versa.
    Arguments: properties of the card in question; architecture to optimise
    for; register count as a function of thread block size; shared memory
    usage (bytes) as a function of thread block size.
  incPow2: Increments in powers-of-two, over the range of supported thread
    block sizes for the given device.
  decPow2: Decrements in powers-of-two, over the range of supported thread
    block sizes for the given device.
  decWarp: Decrements in the warp size of the device, over the range of
    supported thread block sizes.
  incWarp: Increments in the warp size of the device, over the range of
    supported thread block sizes.
  maxResidentBlocks: Determine the maximum number of CTAs that can be run
    simultaneously for a given kernel / device combination. Arguments:
    properties of the card in question; threads per block; registers per
    thread; shared memory per block (bytes). Returns the maximum number of
    resident blocks.
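A short sketch of the occupancy calculator in use. It assumes optimalBlockSize
has the type DeviceProperties -> (Int -> Int) -> (Int -> Int) ->
(Int, Occupancy), which matches the argument descriptions above but should be
checked against the installed version of the package.

    import Foreign.CUDA.Analysis

    -- Find the thread block size giving maximum multiprocessor occupancy
    -- for a kernel that uses 32 registers per thread and no dynamic shared
    -- memory; both resource functions here are constants.
    bestBlock :: DeviceProperties -> (Int, Occupancy)
    bestBlock dev = optimalBlockSize dev (const 32) (const 0)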
Foreign.CUDA.Runtime.Device

  Limit: Device limit flags.
  PeerFlag: Possible option values for direct peer memory access.
  DeviceFlag: Device execution flags.
  Device: A device identifier.
  choose: Select the compute device which best matches the given criteria.
  get: Returns which device is currently being used.
  count: Returns the number of devices available for execution, with compute
    capability >= 1.0.
  props: Return information about the selected compute device.
  set: Set the device to be used for GPU execution.
  setFlags: Set flags to be used for device executions.
  setOrder: Set the list of devices for CUDA execution in priority order.
  sync: Block until the device has completed all preceding requested tasks.
    Returns an error if one of the tasks fails.
  reset: Explicitly destroys and cleans up all runtime resources associated
    with the current device in the current process. Any subsequent API call
    will reinitialise the device. Note that this function will reset the
    device immediately. It is the caller's responsibility to ensure that the
    device is not being accessed by any other host threads from the process
    when this function is called.
  accessible: Queries if the first device can directly access the memory of
    the second. If direct access is possible, it can then be enabled with
    add. Requires CUDA-4.0.
  add: If the devices of both the current and supplied contexts support
    unified addressing, then enable allocations in the supplied context to be
    accessible by the current context. Requires CUDA-4.0.
  remove: Disable direct memory access from the current context to the
    supplied context. Requires CUDA-4.0.
  getLimit: Query compute 2.0 call stack limits. Requires CUDA-3.1.
  setLimit: Set compute 2.0 call stack limits. Requires CUDA-3.1.
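A sketch of device selection with the runtime API described above. It assumes
count, props, and set from Foreign.CUDA.Runtime.Device, with devices
identified by Int ordinals, and deviceName from Foreign.CUDA.Analysis; exact
types should be checked against the installed version.

    import Foreign.CUDA.Analysis (deviceName)
    import qualified Foreign.CUDA.Runtime.Device as Device

    -- Enumerate the available devices and select the first for this thread.
    pickFirst :: IO ()
    pickFirst = do
      n <- Device.count
      if n == 0
        then putStrLn "no CUDA devices"
        else do
          p <- Device.props 0
          putStrLn ("using: " ++ deviceName p)
          Device.set 0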
Foreign.CUDA.Driver.Device

  InitFlag: Possible option flags for CUDA initialisation. Dummy instance
    until the API exports actual option values.
  DeviceAttribute: Device attributes.
  Device: A CUDA device.
  initialise: Initialise the CUDA driver API. This must be called before any
    other driver function.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__INITIALIZE.html#group__CUDA__INITIALIZE_1g0a2f1517e1bd8502c7194c3a8c134bc3
  capability: Return the compute compatibility revision supported by the
    device.
  device: Return a handle to the compute device at the given ordinal.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g8bdd1cc7201304b01357b8034f6587cb
  attribute: Return the selected attribute for the given device.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g9c3e1414f0ad901d3278a4d6645fc266
  count: Return the number of devices with compute capability > 1.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g52b5ce05cb8c5fb6831b2c0ff2887c74
  name: The identifying name of the device.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gef75aa30df95446a845f2a7b9fffbb7f
  props: Return the properties of the selected device.
  totalMem: The total memory available on the device (bytes).
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gc6a0d6551335a3780f9f3c967a0fde5d

Foreign.CUDA.Driver.Context.Base

  ContextFlag: Context creation flags.
  Context: A device context.
  create: Create a new CUDA context and associate it with the calling thread.
    The context is created with a usage count of one, and the caller of
    create must call destroy when done using the context. If a context is
    already current to the thread, it is supplanted by the newly created
    context and must be restored by a subsequent call to pop.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf
  attach: Increments the usage count of the context. API note: no context
    flags are currently supported, so this parameter must be empty.
  detach: Detach the context, and destroy it if no longer used.
  destroy: Destroy the specified context, regardless of how many threads it
    is current to. The context will be popped from the current thread's
    context stack, but if it is current on any other threads it will remain
    current to those threads, and attempts to access it will result in an
    error.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g27a365aebb0eb548166309f58a1e8b8e
  get: Return the context bound to the calling CPU thread. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g8f13165846b73750693640fb3e8380d0
  set: Bind the specified context to the calling thread. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gbe562ee6258b4fcc272ca6478ca2a2f7
  device: Return the device of the currently active context.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g4e84b109eba36cdaaade167f34ae881e
  pop: Pop the current CUDA context from the CPU thread. The context may then
    be attached to a different CPU thread by calling push.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g2fac188026a062d92e91a8687d0a7902
  push: Push the given context onto the CPU thread's stack of current
    contexts. The specified context becomes the CPU thread's current context,
    so all operations that operate on the current context are affected.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gb02d4c850eb16f861fe5a29682cc90ba
  sync: Block until the device has completed all preceding requests. If the
    context was created with the blocking-sync flag, the CPU thread will
    block until the GPU has finished its work.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g7a54725f28d34b8c6299f0c6ca579616
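A minimal sketch of the initialise / device / create / destroy sequence
described above, using names exported from the Foreign.CUDA.Driver umbrella
module. A production version would use bracket so the context is destroyed
even if the action throws.

    import qualified Foreign.CUDA.Driver as CUDA

    -- Initialise the driver API, take the first device, and run an action
    -- inside a fresh context, destroying the context afterwards.
    withCUDA :: (CUDA.Context -> IO a) -> IO a
    withCUDA action = do
      CUDA.initialise []
      dev <- CUDA.device 0
      ctx <- CUDA.create dev []
      r   <- action ctx
      CUDA.destroy ctx
      return r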
Foreign.CUDA.Driver.Context.Peer

  PeerFlag: Possible option values for direct peer memory access.
  accessible: Queries if the first device can directly access the memory of
    the second. If direct access is possible, it can then be enabled with
    add. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g496bdaae1f632ebfb695b99d2c40f19e
  add: If the devices of both the current and supplied contexts support
    unified addressing, then enable allocations in the supplied context to be
    accessible by the current context. Note that access is unidirectional,
    and in order to access memory in the current context from the peer
    context, a separate symmetric call to add is required. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g0889ec6728e61c05ed359551d67b3f5a
  remove: Disable direct memory access from the current context to the
    supplied peer context, and unregister any registered allocations.
    Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g5b4b6936ea868d4954ce4d841a3b4810

Foreign.CUDA.Driver.Context.Primary

  status: Get the status of the primary context. Returns whether the current
    context is active, and the flags it was (or will be) created with.
    Requires CUDA-7.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g65f3e018721b6d90aa05cfb56250f469
  setup: Specify the flags that the primary context should be created with.
    Note that this is an error if the primary context is already active.
    Requires CUDA-7.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1gd779a84f17acdad0d9143d9fe719cfdf
  reset: Destroy all allocations and reset all state on the primary context
    of the given device in the current process. Requires CUDA-7.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g5d38802e8600340283958a117466ce12
  release: Release the primary context on the given device. If there are no
    more references to the primary context it will be destroyed, regardless
    of how many threads it is current to. Unlike Context.destroy this does
    not pop the context from the stack in any circumstances. Requires
    CUDA-7.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1gf2a8bc16f8df0c88031f6a1ba3d6e8ad
  retain: Retain the primary context for the given device, creating it if
    necessary, and increasing its usage count. The caller must call release
    when done using the context. Unlike Context.create, the newly retained
    context is not pushed onto the stack. Requires CUDA-7.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g9051f2d5c31501997a6cb0530290a300
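A sketch of the retain / release discipline for the primary context described
above, assuming retain :: Device -> IO Context and release :: Device -> IO ()
as in Foreign.CUDA.Driver.Context.Primary.

    import Foreign.CUDA.Driver (Device)
    import qualified Foreign.CUDA.Driver.Context.Primary as Primary

    -- Share the device's primary context (the one the runtime API uses)
    -- instead of creating a fresh one. Note that the retained context is
    -- not pushed onto the stack, so it must still be made current (for
    -- example with Context.push) before issuing work in it.
    withPrimary :: Device -> IO a -> IO a
    withPrimary dev action = do
      _ctx <- Primary.retain dev
      r    <- action
      Primary.release dev
      return r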
Foreign.CUDA.Driver.Module.Base

  JITFallback: Online compilation fallback strategy.
  JITTarget: Online compilation target architecture.
  JITResult: Results of online compilation: milliseconds spent compiling PTX;
    information about the PTX assembly; the compiled module.
  JITOption: Just-in-time compilation and linking options:
    maximum number of registers per thread;
    number of threads per block to target for;
    level of optimisation to apply (1-4, default 4);
    compilation target, otherwise determined from context;
    fallback strategy if a matching cubin is not found;
    generate debug info (-g) (requires cuda >= 5.5);
    generate line number information (-lineinfo) (requires cuda >= 5.5);
    verbose log messages (requires cuda >= 5.5).
  Module: A reference to a Module object, containing collections of device
    functions.
  loadFile: Load the contents of the specified file (either a ptx or cubin
    file) to create a new module, and load that module into the current
    context.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g366093bd269dafd0af21f1c7d18115d3
  loadData: Load the contents of the given image into a new module, and load
    that module into the current context. The image is (typically) the
    contents of a cubin or PTX file. Note that the image will be copied into
    a temporary staging area so that it can be passed to C.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g04ce266ce03720f479eab76136b90c0b
  loadDataFromPtr: As loadData, but read the image data from the given
    pointer. The image is a NULL-terminated sequence of bytes.
  loadDataEx: Load the contents of the given image into a module with online
    compiler options, and load the module into the current context. The image
    is (typically) the contents of a cubin or PTX file. The actual attributes
    of the compiled kernel can be probed using requires. Note that the image
    will be copied into a temporary staging area so that it can be passed
    to C.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g9e8047e9dbf725f0cd7cafd18bfd4d12
  loadDataFromPtrEx: As loadDataEx, but read the image data from the given
    pointer. The image is a NULL-terminated sequence of bytes.
  unload: Unload a module from the current context.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g8ea3d716524369de3763104ced4ea57b
  JITInputType: Device code formats that can be used for online linking.

Foreign.CUDA.Driver.Module.Link

  LinkState: A pending JIT linker state.
  create: Create a pending JIT linker invocation. The returned LinkState
    should be destroyed once no longer needed. The device code machine size
    will match the calling application. Requires CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g86ca4052a2fab369cb943523908aa80d
  destroy: Destroy the state of a JIT linker invocation. Requires CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g01b7ae2a34047b05716969af245ce2d9
  complete: Complete a pending linker invocation and load the current module.
    The link state will be destroyed. Requires CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g818fcd84a4150a997c0bba76fef4e716
  addFile: Add an input file to a pending linker invocation. Requires
    CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g1224c0fd48d4a683f3ce19997f200a8c
  addData: Add an input to a pending linker invocation. Requires CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g3ebcd2ccb772ba9c120937a2d2831b77
  addDataFromPtr: As addData, but read the specified number of bytes of image
    data from the given pointer.
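A small sketch of module loading as described above; loadFile accepts either a
PTX or cubin image. The loadDataEx variant additionally takes the JITOption
list if online compilation settings are needed.

    import qualified Foreign.CUDA.Driver.Module as Module

    -- Load a pre-compiled module into the current context. A context must
    -- already be current on the calling thread.
    loadKernels :: FilePath -> IO Module.Module
    loadKernels = Module.loadFile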
Foreign.CUDA.Types

  StreamFlag: Possible option flags for stream initialisation. Dummy instance
    until the API exports actual option values.
  StreamPriority: Priority of an execution stream. Work submitted to a higher
    priority stream may preempt execution of work already executing in a
    lower priority stream. Lower numbers represent higher priorities.
  Stream: A processing stream. All operations in a stream are synchronous and
    executed in sequence, but operations in different non-default streams may
    happen out-of-order or concurrently with one another. Use Events to
    synchronise operations between streams.
  EventFlag: Event creation flags.
  Event: Events are markers that can be inserted into the CUDA execution
    stream and later queried.
  HostPtr: A reference to page-locked host memory. A HostPtr is just a plain
    Ptr, but the memory has been allocated by CUDA into page-locked memory.
    This means that the data can be copied to the GPU via DMA (direct memory
    access). Note that the use of the system function mlock is not sufficient
    here --- the CUDA version ensures that the physical address stays the
    same, not just the virtual address. To copy data into a HostPtr array,
    you may use for example withHostPtr together with the asynchronous copy
    operations.
  DevicePtr: A reference to data stored on the device.
  defaultStream: The main execution stream. No operations overlap with
    operations in the default stream.
  WaitFlag: Possible option flags for waiting for events.

Foreign.CUDA.Runtime.Event

  create: Create a new event.
  destroy: Destroy an event.
  elapsedTime: Determine the elapsed time (in milliseconds) between two
    events.
  query: Determines if an event has actually been recorded.
  record: Record an event once all operations in the current context (or
    optionally specified stream) have completed. This operation is
    asynchronous.
  wait: Makes all future work submitted to the (optional) stream wait until
    the given event reports completion before beginning execution.
    Synchronisation is performed on the device, including when the event and
    stream are from different device contexts. Requires CUDA-3.2.
  block: Wait until the event has been recorded.

Foreign.CUDA.Runtime.Stream

  create: Create a new asynchronous stream.
  destroy: Destroy and clean up an asynchronous stream.
  finished: Determine if all operations in a stream have completed.
  block: Block until all operations in a stream have been completed.
  defaultStream: The main execution stream (0). Its implementation varies
    with the runtime version:

    {-# INLINE defaultStream #-}
    defaultStream :: Stream
    #if CUDART_VERSION < 3010
    defaultStream = Stream 0
    #else
    defaultStream = Stream nullPtr
    #endif
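A sketch of GPU timing with the event functions described above, assuming
create, record, block, elapsedTime, and destroy with the shapes suggested by
their descriptions (record takes an optional stream).

    import Foreign.CUDA.Runtime.Event

    -- Time an action on the GPU using a pair of events recorded in the
    -- default stream. 'block' waits until the event has been recorded.
    timeIt :: IO () -> IO Float
    timeIt action = do
      start <- create []
      stop  <- create []
      record start Nothing
      action
      record stop Nothing
      block stop
      ms <- elapsedTime start stop
      destroy start
      destroy stop
      return ms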
Foreign.CUDA.Runtime.Exec

  FunParam: Kernel function parameters. Doubles will be converted to an
    internal float representation on devices that do not support doubles
    natively.
  CacheConfig: Cache configuration preference.
  FunAttributes: the maximum block size that can be successfully launched
    (based on register usage); the number of registers required for each
    thread.
  Fun: A global device function. Note that the use of a string naming a
    function was deprecated in CUDA 4.1 and removed in CUDA 5.0.
  attributes: Obtain the attributes of the named __global__ device function.
    This itemises the requirements to successfully launch the given kernel.
  setConfig: Specify the grid and block dimensions for a device call. Used in
    conjunction with setParams, this pushes data onto the execution stack
    that will be popped when a function is launched. Arguments: grid
    dimensions; block dimensions; shared memory per block (bytes); associated
    processing stream.
  setParams: Set the argument parameters that will be passed to the next
    kernel invocation. This is used in conjunction with setConfig to control
    kernel execution.
  setCacheConfigFun: On devices where the L1 cache and shared memory use the
    same hardware resources, this sets the preferred cache configuration for
    the given device function. This is only a preference; the driver is free
    to choose a different configuration as required to execute the function.
    Switching between configuration modes may insert a device-side
    synchronisation point for streamed kernel launches.
  launch: Invoke the __global__ kernel function on the device. This must be
    preceded by a call to setConfig and (if appropriate) setParams.
  launchKernel: Invoke a kernel on a (gx * gy) grid of blocks, where each
    block contains (tx * ty * tz) threads and has access to a given number of
    bytes of shared memory. The launch may also be associated with a specific
    stream. Arguments: device function symbol; grid dimensions; thread block
    shape; shared memory per block (bytes); (optional) execution stream.

Foreign.CUDA.Driver.Context.Config

  SharedMem: Device shared memory configuration preference.
  Cache: Device cache configuration preference.
  Limit: Device limits flags.
  getFlags: Return the flags that were used to create the current context.
    Requires CUDA-7.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gf81eef983c1e3b2ef4f166d7a930c86d
  getLimit: Query compute 2.0 call stack limits. Requires CUDA-3.1.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g9f2d47d1745752aa16da7ed0d111b6a8
  setLimit: Specify the size of the call stack, for compute 2.0 devices.
    Requires CUDA-3.1.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g0651954dfb9788173e60a9af7201e65a
  getCache: On devices where the L1 cache and shared memory use the same
    hardware resources, this function returns the preferred cache
    configuration for the current context. Requires CUDA-3.2.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g40b6b141698f76744dea6e39b9a25360
  setCache: On devices where the L1 cache and shared memory use the same
    hardware resources, this sets the preferred cache configuration for the
    current context. This is only a preference. Any function configuration
    set via setCacheConfigFun will be preferred over this context-wide
    setting. Requires CUDA-3.2.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g54699acf7e2ef27279d013ca2095f4a3
  getSharedMem: Return the current size of the shared memory banks in the
    current context. On devices with configurable shared memory banks,
    setSharedMem can be used to change the configuration, so that subsequent
    kernel launches will by default use the new bank size. On devices without
    configurable shared memory, this function returns the fixed bank size of
    the hardware. Requires CUDA-3.2.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g17153a1b8b8c756f7ab8505686a4ad74
  setSharedMem: On devices with configurable shared memory banks, this
    function will set the context's shared memory bank size that will be used
    by default for subsequent kernel launches. Changing the shared memory
    configuration between launches may insert a device synchronisation.
    Shared memory bank size does not affect shared memory usage or kernel
    occupancy, but may have major effects on performance. Larger bank sizes
    allow for greater potential bandwidth to shared memory, but change the
    kinds of accesses which result in bank conflicts. Requires CUDA-3.2.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g2574235fa643f8f251bf7bc28fac3692
  getStreamPriorityRange: Returns the numerical values that correspond to the
    greatest and least priority execution streams in the current context
    respectively. Stream priorities follow the convention that lower
    numerical numbers correspond to higher priorities. The range of
    meaningful stream priorities is given by the inclusive range
    [greatestPriority,leastPriority]. Requires CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g137920ab61a71be6ce67605b9f294091
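A sketch of adjusting a context limit with getLimit / setLimit as described
above. The constructor name PrintfFifoSize is assumed here from the symbol
index at the end of this document and should be checked against the installed
version.

    import qualified Foreign.CUDA.Driver.Context.Config as Config

    -- Double the size of the device-side printf FIFO for the current
    -- context (useful when kernels emit a lot of debug output).
    growPrintfBuffer :: IO ()
    growPrintfBuffer = do
      n <- Config.getLimit Config.PrintfFifoSize
      Config.setLimit Config.PrintfFifoSize (2 * n)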
Foreign.CUDA.Driver.Event

  create: Create a new event.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g450687e75f3ff992fe01662a43d9d3db
  destroy: Destroy an event.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g593ec73a8ec5a5fc031311d3e4dca1ef
  elapsedTime: Determine the elapsed time (in milliseconds) between two
    events.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1gdfb1178807353bbcaa9e245da497cf97
  query: Determines if an event has actually been recorded.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g6f0704d755066b0ee705749ae911deef
  record: Record an event once all operations in the current context (or
    optionally specified stream) have completed. This operation is
    asynchronous.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g95424d3be52c4eb95d83861b70fb89d1
  wait: Makes all future work submitted to the (optional) stream wait until
    the given event reports completion before beginning execution.
    Synchronisation is performed on the device, including when the event and
    stream are from different device contexts. Requires CUDA-3.2.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g6a898b652dfc6aa1d5c8d97062618b2f
  block: Wait until the event has been recorded.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g9e520d34e51af7f5375610bca4add99c

Foreign.CUDA.Driver.IPC.Event

  IPCEvent: A CUDA inter-process event handle.
  export: Create an inter-process event handle for a previously allocated
    event. The event must be created with the inter-process and
    timing-disabled event flags. The returned handle may then be sent to
    another process and opened there, to allow efficient hardware
    synchronisation between GPU work in other processes. After the event has
    been opened in the importing process, the usual event operations (record,
    query, wait, block, elapsedTime) may be used in either process.
    Performing operations on the imported event after the event has been
    destroyed in the exporting process is undefined. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gea02eadd12483de5305878b13288a86c
  open: Open an inter-process event handle for use in the current process,
    returning an event that can be used in the current process and behaving
    as a locally created event with timing disabled. The event must be freed
    with destroy. Performing operations on the imported event after the
    exported event has been destroyed in the exporting process is undefined.
    Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf1d525918b6c643b99ca8c8e42e36c2e
Foreign.CUDA.Driver.Stream

  create: Create a new stream.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1ga581f0c5833e21ded8b5a56594e243f4
  createWithPriority: Create a stream with the given priority. Work submitted
    to a higher-priority stream may preempt work already executing in a
    lower-priority stream. The convention is that lower numbers represent
    higher priorities. The default priority is zero. The range of meaningful
    numeric priorities can be queried using getStreamPriorityRange. If the
    specified priority is outside the supported numerical range, it will
    automatically be clamped to the highest or lowest number in the range.
    Requires CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g95c1a8c7c3dacb13091692dd9c7f7471
  destroy: Destroy a stream. If the device is still doing work in the stream
    when destroy is called, the function returns immediately and the
    resources associated with the stream will be released automatically once
    the device has completed all work.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g244c8833de4596bcd31a06cdf21ee758
  finished: Check if all operations in the stream have completed.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g1b0d24bbe97fa68e4bc511fb6adfeb0b
  block: Wait until the device has completed all operations in the stream.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g15e49dd91ec15991eb7c0a741beb7dad
  getPriority: Query the priority of a stream. Requires CUDA-5.5.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g5bd5cb26915a2ecf1921807339488484
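A sketch of creating a high-priority stream, combining getStreamPriorityRange
with createWithPriority as described above. The tuple order (greatest, least)
is an assumption based on the wording of the description and should be
verified.

    import qualified Foreign.CUDA.Driver.Context.Config as Config
    import qualified Foreign.CUDA.Driver.Stream as Stream

    -- Create a stream at the highest priority the context supports; lower
    -- numbers mean higher priorities, per the convention above.
    highPriorityStream :: IO Stream.Stream
    highPriorityStream = do
      (greatest, _least) <- Config.getStreamPriorityRange
      Stream.createWithPriority greatest []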
Foreign.CUDA.Driver.Exec

  FunAttribute: Function attributes.
  Fun: A __global__ device function.
  requires: Returns the value of the selected attribute requirement for the
    given kernel.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g5e92a1b0d8d1b82cb00dcfb2de15961b
  setCacheConfig: On devices where the L1 cache and shared memory use the
    same hardware resources, this sets the preferred cache configuration for
    the given device function. This is only a preference; the driver is free
    to choose a different configuration as required to execute the function.
    Switching between configuration modes may insert a device-side
    synchronisation point for streamed kernel launches. Requires CUDA-3.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g40f8c11e81def95dc0072a375f965681
  setSharedMemConfig: Set the shared memory configuration of a device
    function. On devices with configurable shared memory banks, this will
    force all subsequent launches of the given device function to use the
    specified shared memory bank size configuration. On launch of the
    function, the shared memory configuration of the device will be
    temporarily changed if needed to suit the function configuration. Changes
    in shared memory configuration may introduce a device-side
    synchronisation between kernel launches. Any per-function configuration
    specified by setSharedMemConfig will override the context-wide
    configuration set with setSharedMem. Changing the shared memory bank size
    will not increase shared memory usage or affect occupancy of kernels, but
    may have major effects on performance. Larger bank sizes will allow for
    greater potential bandwidth to shared memory, but will change what kinds
    of accesses to shared memory will result in bank conflicts. This function
    will do nothing on devices with fixed shared memory bank size. Requires
    CUDA-5.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g430b913f24970e63869635395df6d9f5
  launchKernel, launchKernel': Invoke a kernel on a (gx * gy * gz) grid of
    blocks, where each block contains (tx * ty * tz) threads and has access
    to a given number of bytes of shared memory. The launch may also be
    associated with a specific stream. In launchKernel, the number of kernel
    parameters and their offsets and sizes do not need to be specified, as
    this information is retrieved directly from the kernel's image. This
    requires the kernel to have been compiled with toolchain version 3.2 or
    later. The alternative launchKernel' will pass the arguments in directly,
    requiring the application to know the size and alignment/padding of each
    kernel parameter. Arguments (both variants): function to execute; block
    grid dimension; thread block shape; shared memory (bytes); (optional)
    stream to execute in; list of function parameters.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15
  launch: Invoke the kernel on a size (w,h) grid of blocks. Each block
    contains the number of threads specified by a previous call to
    setBlockShape. The launch may also be associated with a specific stream.
  setBlockShape: Specify the (x,y,z) dimensions of the thread blocks that are
    created when the given kernel function is launched.
  setSharedSize: Set the number of bytes of dynamic shared memory to be
    available to each thread block when the function is launched.
  setParams: Set the parameters that will be specified next time the kernel
    is invoked.
  FunParam: Kernel function parameters.
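A sketch of launching a kernel over a one-dimensional problem with the
launchKernel entry point described above. It assumes the FunParam
constructors VArg (for Storable values, including device pointers) and IArg
(for integral arguments), and the argument order named in the description.

    import qualified Foreign.CUDA.Driver as CUDA

    -- Launch 'fn' with 256 threads per block, enough blocks to cover n
    -- elements, no dynamic shared memory, in the default stream.
    launch1D :: CUDA.Fun -> CUDA.DevicePtr Float -> Int -> IO ()
    launch1D fn xs n =
      CUDA.launchKernel fn (blocks,1,1) (256,1,1) 0 Nothing
        [CUDA.VArg xs, CUDA.IArg (fromIntegral n)]
      where
        blocks = (n + 255) `div` 256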
Foreign.CUDA.Ptr

  withDevicePtr: Look at the contents of device memory. This takes an IO
    action that will be applied to that pointer, the result of which is
    returned. It would be silly to return the pointer from the action.
  devPtrToWordPtr: Return a unique handle associated with the given device
    pointer.
  wordPtrToDevPtr: Return a device pointer from the given handle.
  nullDevPtr: The constant nullDevPtr contains the distinguished memory
    location that is not associated with a valid memory location.
  castDevPtr: Cast a device pointer from one type to another.
  plusDevPtr: Advance the pointer address by the given offset in bytes.
  alignDevPtr: Given an alignment constraint, align the device pointer to the
    next highest address satisfying the constraint.
  minusDevPtr: Compute the difference between the second and first argument.
    This fulfils the relation

    p2 == p1 `plusDevPtr` (p2 `minusDevPtr` p1)

  advanceDevPtr: Advance a pointer into a device array by the given number of
    elements.
  withHostPtr: Apply an IO action to the memory reference living inside the
    host pointer object. All uses of the pointer should be inside the
    withHostPtr bracket.
  nullHostPtr: The constant nullHostPtr contains the distinguished memory
    location that is not associated with a valid memory location.
  castHostPtr: Cast a host pointer from one type to another.
  plusHostPtr: Advance the pointer address by the given offset in bytes.
  alignHostPtr: Given an alignment constraint, align the host pointer to the
    next highest address satisfying the constraint.
  minusHostPtr: Compute the difference between the second and first argument.
  advanceHostPtr: Advance a pointer into a host array by a given number of
    elements.
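Two small, pure examples of the pointer arithmetic above: element-wise
advancement with advanceDevPtr, and the plus/minus relation quoted for
minusDevPtr expressed as a testable property.

    import Data.Word (Word8)
    import Foreign.CUDA.Ptr

    -- Point into the second half of an n-element device array.
    secondHalf :: DevicePtr Float -> Int -> DevicePtr Float
    secondHalf p n = p `advanceDevPtr` (n `div` 2)

    -- The relation stated in the minusDevPtr description.
    prop_diff :: DevicePtr Word8 -> DevicePtr Word8 -> Bool
    prop_diff p1 p2 = p2 == p1 `plusDevPtr` (p2 `minusDevPtr` p1)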
Foreign.CUDA.Runtime.Marshal

  AttachFlag: Options for unified memory allocations.
  AllocFlag: Options for host allocation.
  mallocHostArray: Allocate a section of linear memory on the host which is
    page-locked and directly accessible from the device. The storage is
    sufficient to hold the given number of elements of a storable type. The
    runtime system automatically accelerates calls to the asynchronous copy
    functions when they refer to page-locked memory. Note that since the
    amount of pageable memory is thereby reduced, overall system performance
    may suffer. This is best used sparingly, to allocate staging areas for
    data exchange.
  freeHost: Free page-locked host memory previously allocated with
    mallocHostArray.
  mallocArray: Allocate a section of linear memory on the device, and return
    a reference to it. The memory is sufficient to hold the given number of
    elements of storable type. It is suitably aligned, and not cleared.
  allocaArray: Execute a computation, passing a pointer to a temporarily
    allocated block of memory sufficient to hold the given number of elements
    of storable type. The memory is freed when the computation terminates
    (normally or via an exception), so the pointer must not be used after
    this. Note that kernel launches can be asynchronous, so you may need to
    add a synchronisation point at the end of the computation.
  free: Free previously allocated memory on the device.
  mallocManagedArray: Allocates memory that will be automatically managed by
    the Unified Memory system.
  peekArray: Copy a number of elements from the device to host memory. This
    is a synchronous operation.
  peekArrayAsync: Copy memory from the device asynchronously, possibly
    associated with a particular stream. The destination memory must be
    page-locked.
  peekArray2D: Copy a 2D memory area from the device to the host. This is a
    synchronous operation.
  peekArray2DAsync: Copy a 2D memory area from the device to the host
    asynchronously, possibly associated with a particular stream. The
    destination array must be page-locked.
  peekListArray: Copy a number of elements from the device into a new Haskell
    list. Note that this requires two memory copies: firstly from the device
    into a heap-allocated array, and from there marshalled into a list.
  pokeArray: Copy a number of elements onto the device. This is a synchronous
    operation.
  pokeArrayAsync: Copy memory onto the device asynchronously, possibly
    associated with a particular stream. The source memory must be
    page-locked.
  pokeArray2D: Copy a 2D memory area onto the device. This is a synchronous
    operation.
  pokeArray2DAsync: Copy a 2D memory area onto the device asynchronously,
    possibly associated with a particular stream. The source array must be
    page-locked.
  pokeListArray: Write a list of storable elements into a device array. The
    array must be sufficiently large to hold the entire list. This requires
    two marshalling operations.
  copyArray: Copy the given number of elements from the first device array
    (source) to the second (destination). The copied areas may not overlap.
    This operation is asynchronous with respect to the host, but will not
    overlap other device operations. Arguments: destination; source; number
    of elements.
  copyArrayAsync: As copyArray, but asynchronous with respect to the host,
    and may be associated with a particular stream.
  copyArray2D: Copy a 2D memory area from the first device array (source) to
    the second (destination). The copied areas may not overlap. This
    operation is asynchronous with respect to the host, but will not overlap
    other device operations.
  copyArray2DAsync: As copyArray2D, but may be associated with a particular
    stream.
  (internal) Copy data between the host and device asynchronously, possibly
    associated with a particular stream. The host-side memory must be
    page-locked (allocated with mallocHostArray).
  (internal) Copy a 2D memory area between the host and device, either
    synchronously, or asynchronously with page-locked host memory.
  newListArrayLen: Write a list of storable elements into a newly allocated
    device array, returning the device pointer together with the number of
    elements that were written. Note that this requires two copy operations:
    firstly from a Haskell list into a heap-allocated array, and from there
    into device memory. The array should be freed when no longer required.
  newListArray: Write a list of storable elements into a newly allocated
    device array. This is newListArrayLen composed with fst.
  withListArray: Temporarily store a list of elements into a newly allocated
    device array. An IO action is applied to the array, the result of which
    is returned. Similar to newListArray, this requires two marshalling
    operations of the data. As with allocaArray, the memory is freed once the
    action completes, so you should not return the pointer from the action,
    and be sure that any asynchronous operations (such as kernel execution)
    have completed.
  withListArrayLen: A variant of withListArray which also supplies the number
    of elements in the array to the applied function.
  memset: Initialise device memory to a given 8-bit value. Arguments: the
    device memory; number of bytes; value to set for each byte.
  (internal) Copy data between host and device. This is a synchronous
    operation.

  The 2D copy operations take: width to copy (elements); height to copy
  (elements); source array; source array width; destination array;
  destination array width.
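A round-trip sketch using the list-marshalling helpers above (newListArrayLen
and peekListArray, with the names as described; check against the installed
version). As noted, each direction costs two copies.

    import Foreign.CUDA.Runtime.Marshal

    -- Push a list through device memory and read it back.
    roundTrip :: [Float] -> IO [Float]
    roundTrip xs = do
      (dp, n) <- newListArrayLen xs
      ys      <- peekListArray n dp
      free dp
      return ys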
Foreign.CUDA.Runtime.Texture

  FormatDesc: A description of how memory read through the texture cache
    should be interpreted, including the kind of data and the number of bits
    of each component (x, y, z and w, respectively).
  FormatKind: Texture channel format kind.
  normalised: Access the texture using normalised coordinates [0.0,1.0).
  Texture: A texture reference.
  bind: Bind the memory area associated with the device pointer to a texture
    reference given by the named symbol. Any previously bound references are
    unbound.
  bind2D: Bind the two-dimensional memory area to the texture reference
    associated with the given symbol. The size of the area is constrained by
    (width,height) in texel units, and the row pitch in bytes. Any previously
    bound references are unbound.
  getTex: Returns the texture reference associated with the given symbol.
  FilterMode: Texture filtering mode.
  AddressingMode: Texture addressing mode.

Foreign.CUDA.Driver.Marshal

  AttachFlag: Options for unified memory allocations.
  AllocFlag: Options for host allocation.
  mallocHostArray: Allocate a section of linear memory on the host which is
    page-locked and directly accessible from the device. The storage is
    sufficient to hold the given number of elements of a storable type. Note
    that since the amount of pageable memory is thereby reduced, overall
    system performance may suffer. This is best used sparingly, to allocate
    staging areas for data exchange. Host memory allocated in this way is
    automatically and immediately accessible to all contexts on all devices
    which support unified addressing.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gdd8311286d2c2691605362c689bc64e0
  freeHost: Free a section of page-locked host memory.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g62e0fdbe181dab6b1c90fa1a51c7b92c
  registerArray: Page-locks the specified array (on the host) and maps it for
    the device(s) as specified by the given allocation flags. Subsequently,
    the memory is accessed directly by the device so can be read and written
    with much higher bandwidth than pageable memory that has not been
    registered. The memory range is added to the same tracking mechanism as
    mallocHostArray, to automatically accelerate calls to the asynchronous
    copy functions. Note that page-locking excessive amounts of memory may
    degrade system performance, since it reduces the amount of pageable
    memory available. This is best used sparingly to allocate staging areas
    for data exchange. This function has limited support on Mac OS X; OS 10.7
    or later is required. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf0a9fe11544326dabd743b7aa6b54223
  unregisterArray: Unmaps the memory from the given pointer, and makes it
    pageable again. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g63f450c8125359be87b7623b1c0b2a14
  mallocArray: Allocate a section of linear memory on the device, and return
    a reference to it. The memory is sufficient to hold the given number of
    elements of storable type. It is suitably aligned for any type, and is
    not cleared.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb82d2a09844a58dd9e744dc31e8aa467
  allocaArray: Execute a computation on the device, passing a pointer to a
    temporarily allocated block of memory sufficient to hold the given number
    of elements of storable type.
    The memory is freed when the computation terminates (normally or via an
    exception), so the pointer must not be used after this. Note that kernel
    launches can be asynchronous, so you may want to add a synchronisation
    point using sync as part of the continuation.
  free: Release a section of device memory.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g89b3f154e17cc89b6eea277dbdf5c93a
  mallocManagedArray: Allocates memory that will be automatically managed by
    the Unified Memory system. The returned pointer is valid on the CPU and
    on all GPUs which support managed memory. All accesses to this pointer
    must obey the Unified Memory programming model. On a multi-GPU system
    with peer-to-peer support, where multiple GPUs support managed memory,
    the physical storage is created on the GPU which is active at the time
    mallocManagedArray is called. All other GPUs will access the array at
    reduced bandwidth via peer mapping over the PCIe bus. The Unified Memory
    system does not migrate memory between GPUs. On a multi-GPU system where
    multiple GPUs support managed memory, but not all pairs of such GPUs have
    peer-to-peer support between them, the physical storage is allocated in
    system memory (zero-copy memory) and all GPUs will access the data at
    reduced bandwidth over the PCIe bus. Requires CUDA-6.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb347ded34dc326af404aa02af5388a32
  peekArray: Copy a number of elements from the device to host memory. This
    is a synchronous operation.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g3480368ee0208a98f75019c9a8450893
  peekArrayAsync: Copy memory from the device asynchronously, possibly
    associated with a particular stream. The destination host memory must be
    page-locked.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g56f30236c7c5247f8e061b59d3268362
  peekArray2D: Copy a 2D array from the device to the host.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27
  peekArray2DAsync: Copy a 2D array from the device to the host
    asynchronously, possibly associated with a particular execution stream.
    The destination host memory must be page-locked.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274
  peekListArray: Copy a number of elements from the device into a new Haskell
    list. Note that this requires two memory copies: firstly from the device
    into a heap-allocated array, and from there marshalled into a list.
  pokeArray: Copy a number of elements onto the device. This is a synchronous
    operation.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4d32266788c440b0220b1a9ba5795169
  pokeArrayAsync: Copy memory onto the device asynchronously, possibly
    associated with a particular stream. The source host memory must be
    page-locked.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g1572263fe2597d7ba4f6964597a354a3
  pokeArray2D: Copy a 2D array from the host to the device.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27
  pokeArray2DAsync: Copy a 2D array from the host to the device
    asynchronously, possibly associated with a particular execution stream.
    The source host memory must be page-locked.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274
  pokeListArray: Write a list of storable elements into a device array. The
    device array must be sufficiently large to hold the entire list.
    This requires two marshalling operations.
  copyArray: Copy the given number of elements from the first device array
    (source) to the second device array (destination). The copied areas may
    not overlap. This operation is asynchronous with respect to the host, but
    will never overlap with kernel execution.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g1725774abf8b51b91945f3336b778c8b
  copyArrayAsync: Copy the given number of elements from the first device
    array (source) to the second device array (destination). The copied areas
    may not overlap. The operation is asynchronous with respect to the host,
    and can be asynchronous to other device operations by associating it with
    a particular stream.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g39ea09ba682b8eccc9c3e0c04319b5c8
  copyArray2D: Copy a 2D array from the first device array (source) to the
    second device array (destination). The copied areas must not overlap.
    This operation is asynchronous with respect to the host, but will never
    overlap with kernel execution.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27
  copyArray2DAsync: Copy a 2D array from the first device array (source) to
    the second device array (destination). The copied areas may not overlap.
    The operation is asynchronous with respect to the host, and can be
    asynchronous to other device operations by associating it with a
    particular execution stream.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274
  copyArrayPeer: Copies an array from device memory in one context to device
    memory in another context. Note that this function is asynchronous with
    respect to the host, but serialised with respect to all pending and
    future asynchronous work in the source and destination contexts. To avoid
    this synchronisation, use copyArrayPeerAsync instead. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ge1f5c7771544fee150ada8853c7cbf4a
  copyArrayPeerAsync: Copies from device memory in one context to device
    memory in another context. Note that this function is asynchronous with
    respect to the host and all work in other streams and devices. Requires
    CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g82fcecb38018e64b98616a8ac30112f2
  newListArrayLen: Write a list of storable elements into a newly allocated
    device array, returning the device pointer together with the number of
    elements that were written. Note that this requires two memory copies:
    firstly from a Haskell list to a heap-allocated array, and from there
    onto the graphics device. The memory should be freed when no longer
    required.
  newListArray: Write a list of storable elements into a newly allocated
    device array. This is newListArrayLen composed with fst.
  withListArray: Temporarily store a list of elements into a newly allocated
    device array. An IO action is applied to the array, the result of which
    is returned. Similar to newListArray, this requires copying the data
    twice. As with allocaArray, the memory is freed once the action
    completes, so you should not return the pointer from the action, and be
    wary of asynchronous kernel execution.
  withListArrayLen: A variant of withListArray which also supplies the number
    of elements in the array to the applied function.
  memset: Set a number of data elements to the specified value, which may be
    either 8-, 16-, or 32-bits wide.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6e582bf866e9e2fb014297bfaf354d7b
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g7d805e610054392a4d11e8a8bf5eb35c
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g983e8d8759acd1b64326317481fbf132
  memsetAsync: Set the number of data elements to the specified value, which
    may be either 8-, 16-, or 32-bits wide. The operation is asynchronous and
    may optionally be associated with a stream. Requires CUDA-3.2.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gaef08a7ccd61112f94e82f2b30d43627
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf731438877dd8ec875e4c43d848c878c
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g58229da5d30f1c0cdf667b320ec2c0f5
  getDevicePtr: Return the device pointer associated with a mapped, pinned
    host buffer, which was allocated with the device-mapped option by
    mallocHostArray. Currently, no options are supported and this parameter
    must be empty.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g57a39e5cba26af4d06be67fc77cc62f0
  getBasePtr: Return the base address and allocation size of the given device
    pointer.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g64fee5711274a2a0573a789c94d8299b
  getMemInfo: Return the amount of free and total memory respectively
    available to the current context (bytes).
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0

  The 2D copy operations take: width to copy (elements); height to copy
  (elements); source array and array width; source x- and y-coordinates;
  destination array and array width; destination x- and y-coordinates; and,
  for the asynchronous variants, the stream to associate with. The peer
  copies take: number of array elements; source array and context;
  destination array and context; and, for the asynchronous variant, the
  stream to associate with.
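A sketch using the driver-API marshalling functions above, assuming
getMemInfo :: IO (Int64, Int64) as suggested by its description, and
mallocArray / free from the same module. A context must be current.

    import Foreign.CUDA.Ptr (DevicePtr)
    import qualified Foreign.CUDA.Driver.Marshal as M

    -- Report free/total device memory before and after a 1M-element
    -- (4 MB) allocation of Floats.
    probe :: IO ()
    probe = do
      (free0, total) <- M.getMemInfo
      putStrLn ("free: " ++ show free0 ++ " / " ++ show total)
      dp <- M.mallocArray (1024 * 1024) :: IO (DevicePtr Float)
      (free1, _) <- M.getMemInfo
      putStrLn ("after alloc, free: " ++ show free1)
      M.free dp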
Foreign.CUDA.Driver.IPC.Marshal

  IPCFlag: Flags for controlling IPC memory access.
  IPCDevicePtr: A CUDA memory handle used for inter-process communication.
  export: Create an inter-process memory handle for an existing device memory
    allocation. The handle can then be sent to another process and made
    available to that process via open. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6f1b5be767b275f016523b2ac49ebec1
  open: Open an inter-process memory handle exported from another process,
    returning a device pointer usable in the current process. This maps
    memory exported by another process into the current device address space.
    For contexts on different devices, open can attempt to enable peer access
    if the user called Peer.add, and this is controlled by the IPC flag. Each
    handle from a given device and context may only be opened by one context
    per device per other process. Memory returned by open must be freed via
    close. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ga8bd126fcff919a0c996b7640f197b79
  close: Close and unmap memory returned by open. The original allocation in
    the exporting process as well as imported mappings in other processes are
    unaffected. Any resources used to enable peer access will be freed if
    this is the last mapping using them. Requires CUDA-4.0.
    See: http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gd6f5d5bcf6376c6853b64635b0157b9e
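A comment-only sketch of the two-process flow just described, using the
export / open / close operations named above. The handle serialisation and
transport between processes (pipe, file, socket) is application-defined and
left abstract here.

    -- Process A (exporting):
    --   h <- export dp               -- IPC handle for an existing allocation
    --   sendToPeer (serialise h)     -- transport is application-defined
    --
    -- Process B (importing):
    --   h  <- deserialise <$> recvFromPeer
    --   dp <- open h []              -- map into this process's address space
    --   ...                          -- use dp as a normal DevicePtr
    --   close dp                     -- unmap when done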
Foreign.CUDA.Driver.Texture -- (c) [2009..2014] Trevor L. McDonell, BSD

Format: texture data formats. ReadMode: texture read mode options. FilterMode: texture reference filtering mode. AddressMode: texture reference addressing modes. Texture: a texture reference.

create
  Create a new texture reference. Once created, the application must call setPtr to associate the reference with allocated memory. Other texture reference functions are used to specify the format and interpretation to be used when the memory is read through this reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF__DEPRECATED.html#group__CUDA__TEXREF__DEPRECATED_1g0084fabe2c6d28ffcf9d9f5c7164f16c

destroy
  Destroy a texture reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF__DEPRECATED.html#group__CUDA__TEXREF__DEPRECATED_1gea8edbd6cf9f97e6ab2b41fc6785519d

bind
  Bind a linear array address of the given size (bytes) as a texture reference. Any previously bound references are unbound.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g44ef7e5055192d52b3d43456602b50a8

bind2D
  Bind a linear address range to the given texture reference as a two-dimensional arena. Any previously bound reference is unbound. Note that calls to bind can not follow a call to bind2D for the same texture reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g26f709bbe10516681913d1ffe8756ee2

getAddressMode
  Get the addressing mode used by a texture reference, corresponding to the given dimension (currently the only supported dimension values are 0 or 1).
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1gfb367d93dc1d20aab0cf8ce70d543b33

getFilterMode
  Get the filtering mode used by a texture reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g2439e069746f69b940f2f4dbc78cdf87

getFormat
  Get the data format and number of channel components of the bound texture.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g90936eb6c7c4434a609e1160c278ae53

setAddressMode
  Specify the addressing mode for the given dimension of a texture reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g85f4a13eeb94c8072f61091489349bcb

setFilterMode
  Specify the filtering mode to be used when reading memory through a texture reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g595d0af02c55576f8c835e4efd1f39c0

setReadMode
  Specify additional characteristics for reading and indexing the texture reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g554ffd896487533c36810f2e45bb7a28

setFormat
  Specify the format of the data and number of packed components per element to be read by the texture reference.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g05585ef8ea2fec728a03c6c8f87cf07a
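A sketch of configuring a one-dimensional float texture over a linear device allocation, using the setters above. The exact signatures (in particular create taking no arguments, setFormat taking a channel count, and bind taking a byte count) are assumptions and may differ between package versions.

> import Foreign.CUDA.Driver
> import qualified Foreign.CUDA.Driver.Texture as Tex
>
> setupTex :: IO Tex.Texture
> setupTex = do
>   dp  <- mallocArray 512 :: IO (DevicePtr Float)
>   pokeListArray [0 .. 511] dp
>   tex <- Tex.create                  -- deprecated; prefer getTex on a Module
>   Tex.setFormat tex Tex.Float 1      -- one 32-bit float channel per element
>                                      -- (constructor name assumed)
>   Tex.setFilterMode tex Tex.Point    -- nearest-neighbour lookups
>   Tex.setAddressMode tex 0 Tex.Clamp -- clamp out-of-range x-coordinates
>   Tex.bind tex dp (512 * 4)          -- size in bytes
>   return tex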
Foreign.CUDA.Driver.Module.Query -- (c) [2009..2014] Trevor L. McDonell, BSD

getFun
  Returns a function handle.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1ga52be009b0d4045811b30c965e1cb2cf

getPtr
  Return a global pointer, and size of the global (in bytes).
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1gf3e43672e26073b1081476dbf47a86ab

getTex
  Return a handle to a texture reference. This texture reference handle should not be destroyed, as the texture will be destroyed automatically when the module is unloaded.
  http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g9607dcbf911c16420d5264273f2b5608
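Putting the query functions to work: a sketch that loads a pre-compiled module, looks up a kernel with getFun, and launches it via launchKernel from Foreign.CUDA.Driver.Exec. The file name kernels.cubin and the kernel name vecScale are hypothetical.

> import Foreign.CUDA.Driver
>
> main :: IO ()
> main = do
>   initialise []
>   dev <- device 0
>   ctx <- create dev []
>   mdl <- loadFile "kernels.cubin"       -- hypothetical pre-compiled module
>   fun <- getFun mdl "vecScale"          -- hypothetical __global__ kernel
>   xs  <- mallocArray 256 :: IO (DevicePtr Float)
>   pokeListArray (replicate 256 1) xs
>   -- one block of 256 threads, no dynamic shared memory, default stream
>   launchKernel fun (1,1,1) (256,1,1) 0 Nothing [VArg xs, IArg 256]
>   sync                                  -- block until the device is idle
>   print . take 4 =<< peekListArray 256 xs
>   free xs
>   unload mdl
>   destroy ctx

Because launchKernel is asynchronous, the call to sync ensures the subsequent peekListArray observes the kernel's results.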
Package module index (cuda):
  Foreign.CUDA
  Foreign.CUDA.Analysis
  Foreign.CUDA.Analysis.Device
  Foreign.CUDA.Analysis.Occupancy
  Foreign.CUDA.Driver
  Foreign.CUDA.Driver.Context (.Base, .Config, .Peer, .Primary)
  Foreign.CUDA.Driver.Device
  Foreign.CUDA.Driver.Error
  Foreign.CUDA.Driver.Event
  Foreign.CUDA.Driver.Exec
  Foreign.CUDA.Driver.IPC.Event
  Foreign.CUDA.Driver.IPC.Marshal
  Foreign.CUDA.Driver.Marshal
  Foreign.CUDA.Driver.Module (.Base, .Link, .Query)
  Foreign.CUDA.Driver.Stream
  Foreign.CUDA.Driver.Texture
  Foreign.CUDA.Driver.Utils
  Foreign.CUDA.Internal.C2HS
  Foreign.CUDA.Ptr
  Foreign.CUDA.Runtime
  Foreign.CUDA.Runtime.Device
  Foreign.CUDA.Runtime.Error
  Foreign.CUDA.Runtime.Event
  Foreign.CUDA.Runtime.Exec
  Foreign.CUDA.Runtime.Marshal
  Foreign.CUDA.Runtime.Stream
  Foreign.CUDA.Runtime.Texture
  Foreign.CUDA.Runtime.Utils
  Foreign.CUDA.Types