None: instance for special-casing null pointers.
Given a bit pattern, yield all bit masks that it contains. This does *not* attempt to compute a minimal set of bit masks that, when combined, yield the bit pattern; instead, all contained bit masks are produced.
Integral conversion. Floating conversion. Obtain a C value from a Haskell value. Obtain a Haskell value from a C value. Convert a C enumeration to Haskell. Convert a Haskell enumeration to C.

[2017] Trevor L. McDonell, BSD, Safe
The base path to the CUDA toolkit installation that this package was compiled against.
The path where the CUDA toolkit executables, such as nvcc and ptxas, can be found.
The path where the CUDA libraries this package was linked against are located.
The path where the CUDA headers this package was built against are located.

[2009..2017] Trevor L. McDonell, BSD, Safe
Execution stream creation flags.
Priority of an execution stream. Work submitted to a higher-priority stream may preempt execution of work already executing in a lower-priority stream. Lower numbers represent higher priorities.
A processing stream. All operations in a stream are synchronous and executed in sequence, but operations in different non-default streams may happen out of order or concurrently with one another.
Use Events to synchronise operations between streams.
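The event-based inter-stream synchronisation described above can be sketched with the driver API; the module layout (`Foreign.CUDA.Driver.{Stream,Event}`) and the empty flag lists are assumptions based on this package's documentation:

```haskell
-- Sketch: making stream s2 wait on a point recorded in stream s1.
import qualified Foreign.CUDA.Driver.Event  as Event
import qualified Foreign.CUDA.Driver.Stream as Stream

syncStreams :: IO ()
syncStreams = do
  s1 <- Stream.create []
  s2 <- Stream.create []
  ev <- Event.create []
  -- ... enqueue work on s1 here ...
  Event.record ev (Just s1)      -- mark a point in s1
  Event.wait   ev (Just s2) []   -- s2 waits for that point before continuing
  -- ... enqueue dependent work on s2 here ...
  Event.destroy ev
  Stream.destroy s1
  Stream.destroy s2
```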
Possible option flags for waiting for events.
Event creation flags.
Events are markers that can be inserted into the CUDA execution stream and later queried.
A reference to page-locked host memory. A HostPtr is just a plain Ptr, but the memory has been allocated by CUDA into page-locked memory. This means that the data can be copied to the GPU via DMA (direct memory access). Note that use of the system function mlock is not sufficient here --- the CUDA version ensures that the *physical* address stays the same, not just the virtual address. To copy data into a HostPtr array, you may use, for example, withHostPtr together with peekArray or pokeArray.
A reference to data stored on the device.
The main execution stream. No operations overlap with operations in the default stream.

[2009..2017] Trevor L. McDonell, BSD, Safe
Look at the contents of device memory. This takes an IO action that will be applied to that pointer, the result of which is returned. It would be silly to return the pointer from the action.
Return a unique handle associated with the given device pointer.
Return a device pointer from the given handle.
The constant nullDevPtr contains the distinguished memory location that is not associated with a valid memory location.
Cast a device pointer from one type to another.
Advance the pointer address by the given offset in bytes.
Given an alignment constraint, align the device pointer to the next highest address satisfying the constraint.
Compute the difference between the second and first argument. This fulfils the relation p2 == p1 `plusDevPtr` (p2 `minusDevPtr` p1).
Advance a pointer into a device array by the given number of elements.
Apply an IO action to the memory reference living inside the host pointer object.
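The pointer arithmetic above can be sketched directly; this assumes the `Foreign.CUDA.Ptr` combinators of this package (`plusDevPtr`, `minusDevPtr`, `advanceDevPtr`):

```haskell
-- Sketch: device pointer arithmetic.
import Foreign.CUDA.Ptr

-- The relation stated in the documentation:
--   p2 == p1 `plusDevPtr` (p2 `minusDevPtr` p1)
roundTrip :: DevicePtr Float -> DevicePtr Float -> Bool
roundTrip p1 p2 = p2 == p1 `plusDevPtr` (p2 `minusDevPtr` p1)

-- Advancing by elements scales by the element size (4 bytes per Float),
-- whereas plusDevPtr advances by raw bytes.
thirdElement :: DevicePtr Float -> DevicePtr Float
thirdElement p = advanceDevPtr p 2
```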
All uses of the pointer should be inside the withHostPtr bracket.
The constant nullHostPtr contains the distinguished memory location that is not associated with a valid memory location.
Cast a host pointer from one type to another.
Advance the pointer address by the given offset in bytes.
Given an alignment constraint, align the host pointer to the next highest address satisfying the constraint.
Compute the difference between the second and first argument.
Advance a pointer into a host array by a given number of elements.

[2016..2017] Trevor L. McDonell, BSD, Safe
Like Show, but focuses on providing a more detailed description of the value rather than a parseable representation.

[2009..2017] Trevor L. McDonell, BSD, None
Return codes from API functions.
Raise a CUDAException in the IO monad.
A specially formatted error message.
Return the results of a function on successful execution, otherwise return the error string associated with the return code.
Return the error string associated with an unsuccessful return code, otherwise Nothing.
Return the descriptive string associated with a particular error code.

[2009..2017] Trevor L. McDonell, BSD, None
Return the version number of the installed CUDA driver.
Return the version number of the installed CUDA runtime.
Return the version number of the CUDA library (API) that this package was compiled against.

[2009..2017] Trevor L. McDonell, BSD, None
A description of how memory read through the texture cache should be interpreted, including the kind of data and the number of bits of each component (x, y, z and w, respectively).
Texture channel format kind.
Access texture using normalised coordinates [0.0,1.0).
A texture reference.
Bind the memory area associated with the device pointer to a texture reference given by the named symbol.
Any previously bound references are unbound.
Bind the two-dimensional memory area to the texture reference associated with the given symbol. The size of the area is constrained by (width,height) in texel units, and the row pitch in bytes. Any previously bound references are unbound.
Returns the texture reference associated with the given symbol.
Texture addressing mode.
Texture filtering mode.

[2009..2017] Trevor L. McDonell, BSD, None
Create a new asynchronous stream.
Destroy and clean up an asynchronous stream.
Determine if all operations in a stream have completed.
Block until all operations in a stream have been completed.
The main execution stream (0):

{-# INLINE defaultStream #-}
defaultStream :: Stream
#if CUDART_VERSION < 3010
defaultStream = Stream 0
#else
defaultStream = Stream nullPtr
#endif

[2009..2017] Trevor L. McDonell, BSD, None
Options for unified memory allocations.
Options for host allocation.
Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type. The runtime system automatically accelerates calls to the memory copy functions that refer to page-locked memory. Note that since the amount of pageable memory is thereby reduced, overall system performance may suffer. This is best used sparingly to allocate staging areas for data exchange.
Free page-locked host memory previously allocated with mallocHostArray.
Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of a storable type. It is suitably aligned, and not cleared.
Execute a computation, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of a storable type.
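The allocation functions above can be sketched as a staging-area pattern; this assumes the `Foreign.CUDA.Runtime.Marshal` names (`mallocHostArray`, `freeHost`, `mallocArray`, `free`) and an empty allocation-flag list:

```haskell
-- Sketch: paired page-locked host and device allocations.
import Foreign.CUDA.Ptr
import Foreign.CUDA.Runtime.Marshal

staging :: IO ()
staging = do
  hp <- mallocHostArray [] 1024 :: IO (HostPtr Float)   -- page-locked staging area
  dp <- mallocArray 1024        :: IO (DevicePtr Float) -- uninitialised device array
  -- ... copy between hp and dp via the (asynchronous) copy operations ...
  free dp
  freeHost hp
```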
The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this. Note that kernel launches can be asynchronous, so you may need to add a synchronisation point at the end of the computation.
Free previously allocated memory on the device.
Allocate memory that will be automatically managed by the Unified Memory system.
Copy a number of elements from the device to host memory. This is a synchronous operation.
Copy memory from the device asynchronously, possibly associated with a particular stream. The destination memory must be page-locked.
Copy a 2D memory area from the device to the host. This is a synchronous operation.
Copy a 2D memory area from the device to the host asynchronously, possibly associated with a particular stream. The destination array must be page-locked.
Copy a number of elements from the device into a new Haskell list. Note that this requires two memory copies: firstly from the device into a heap-allocated array, and from there marshalled into a list.
Copy a number of elements onto the device. This is a synchronous operation.
Copy memory onto the device asynchronously, possibly associated with a particular stream. The source memory must be page-locked.
Copy a 2D memory area onto the device. This is a synchronous operation.
Copy a 2D memory area onto the device asynchronously, possibly associated with a particular stream. The source array must be page-locked.
Write a list of storable elements into a device array. The array must be sufficiently large to hold the entire list. This requires two marshalling operations.
Copy the given number of elements from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will not overlap other device operations.
Copy the given number of elements from the first device array (source) to the second (destination). The copied areas may not overlap.
This operation is asynchronous with respect to the host, and may be associated with a particular stream.
Copy a 2D memory area from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will not overlap other device operations.
Copy a 2D memory area from the first device array (source) to the second device array (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, and may be associated with a particular stream.
Copy data between the host and device asynchronously, possibly associated with a particular stream. The host-side memory must be page-locked (allocated with mallocHostArray).
Copy a 2D memory area between the host and device. This is a synchronous operation.
Copy a 2D memory area between the host and device asynchronously, possibly associated with a particular stream. The host-side memory must be page-locked.
Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two copy operations: firstly from a Haskell list into a heap-allocated array, and from there into device memory. The array should be freed when no longer required.
Write a list of storable elements into a newly allocated device array. This is the above, discarding the element count.
Temporarily store a list of elements into a newly allocated device array. An IO action is applied to the array, the result of which is returned. Similar to the above, this requires two marshalling operations of the data. As with allocaArray, the memory is freed once the action completes, so you should not return the pointer from the action, and be sure that any asynchronous operations (such as kernel execution) have completed.
A variant which also supplies the number of elements in the array to the applied function.
Initialise device memory to a given 8-bit value.
Copy data between host and device.
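The list-marshalling operations above compose into a simple round trip; this assumes `withListArray` and `peekListArray` from `Foreign.CUDA.Runtime.Marshal` of this package:

```haskell
-- Sketch: marshal a Haskell list to the device and back.
import Foreign.CUDA.Runtime.Marshal

roundTrip :: IO [Float]
roundTrip =
  withListArray [1..10 :: Float] $ \dp ->
    -- dp is only valid inside this bracket; do not return it
    peekListArray 10 dp
```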
This is a synchronous operation.

Arguments to the 2D copy operations (repeated for each of the six 2D variants above): width to copy (elements); height to copy (elements); source array; source array width; destination array; destination array width.
Arguments to the linear copy operations: destination; source; number of elements.
Arguments to the 2D host-device copy operations: destination; width of destination array; source; width of source array; width to copy; height to copy.
Arguments to memset: the device memory; number of bytes; value to set for each byte.

[2009..2017] Trevor L. McDonell, BSD, None
Kernel function parameters. Doubles will be converted to an internal float representation on devices that do not support doubles natively.
Cache configuration preference.
Maximum block size that can be successfully launched (based on register usage).
Number of registers required for each thread.
A global device function. Note that the use of a string naming a function was deprecated in CUDA 4.1 and removed in CUDA 5.0.
Obtain the attributes of the named __global__ device function. This itemises the requirements to successfully launch the given kernel.
Specify the grid and block dimensions for a device call. Used in conjunction with setParams, this pushes data onto the execution stack that will be popped when a function is launched.
Set the argument parameters that will be passed to the next kernel invocation.
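A complete launch with the driver API can be sketched as follows; the module layout, the file name "kernel.ptx" and the kernel name "saxpy" are assumptions for illustration:

```haskell
-- Sketch: load a PTX module and launch a kernel in the default stream.
import qualified Foreign.CUDA.Driver.Context as Context
import Foreign.CUDA.Driver.Device  (initialise, device)
import Foreign.CUDA.Driver.Exec    (launchKernel, FunParam(..))
import Foreign.CUDA.Driver.Module  (loadFile, getFun)
import Foreign.CUDA.Driver.Marshal (newListArray)

saxpyLaunch :: IO ()
saxpyLaunch = do
  initialise []
  dev <- device 0
  ctx <- Context.create dev []
  mdl <- loadFile "kernel.ptx"          -- hypothetical PTX file
  fun <- getFun mdl "saxpy"             -- hypothetical kernel name
  xs  <- newListArray [1..256 :: Float]
  -- one block of 256 threads, no shared memory, default stream
  launchKernel fun (1,1,1) (256,1,1) 0 Nothing [FArg 2.0, VArg xs]
  Context.sync                          -- launches are asynchronous
  Context.destroy ctx
```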
This is used in conjunction with setConfig to control kernel execution.
On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function. Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.
Invoke the __global__ kernel function on the device. This must be preceded by a call to setConfig and (if appropriate) setParams.
Invoke a kernel on a (gx * gy) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.
Arguments: grid dimensions; block dimensions; shared memory per block (bytes); associated processing stream. Alternatively: device function symbol; grid dimensions; thread block shape; shared memory per block (bytes); (optional) execution stream.

[2009..2017] Trevor L. McDonell, BSD, None
Create a new event.
Destroy an event.
Determine the elapsed time (in milliseconds) between two events.
Determine whether an event has actually been recorded.
Record an event once all operations in the current context (or optionally specified stream) have completed. This operation is asynchronous.
Make all future work submitted to the (optional) stream wait until the given event reports completion before beginning execution. Synchronisation is performed on the device, including when the event and stream are from different device contexts. Requires cuda-3.2.
Wait until the event has been recorded.

[2009..2017] Trevor L. McDonell, BSD, None
Raise a CUDAException.
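The event operations above combine into the usual device-side timing pattern; this assumes `Foreign.CUDA.Driver.Event` of this package:

```haskell
-- Sketch: time a block of device work with a pair of events.
import qualified Foreign.CUDA.Driver.Event as Event

timeWork :: IO () -> IO Float
timeWork work = do
  start <- Event.create []
  stop  <- Event.create []
  Event.record start Nothing   -- mark the beginning in the default stream
  work                         -- enqueue the work to be timed
  Event.record stop Nothing
  Event.block stop             -- wait until 'stop' has been recorded
  ms <- Event.elapsedTime start stop
  Event.destroy start
  Event.destroy stop
  return ms                    -- elapsed time in milliseconds
```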
Exceptions can be thrown from pure code, but can only be caught in the IO monad.
Raise a CUDAException in the IO monad.
A specially formatted error message.
Return the results of a function on successful execution, otherwise throw an exception with an error string associated with the return code.
Throw an exception with an error string associated with an unsuccessful return code, otherwise return unit.
Return a descriptive error string associated with a particular error code.

[2009..2017] Trevor L. McDonell, BSD, None
Return the version number of the installed CUDA driver.
Return the version number of the CUDA library (API) that this package was compiled against.

[2009..2017] Trevor L. McDonell, BSD, None
Create a new stream.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1ga581f0c5833e21ded8b5a56594e243f4
Create a stream with the given priority. Work submitted to a higher-priority stream may preempt work already executing in a lower-priority stream. The convention is that lower numbers represent higher priorities. The default priority is zero. The range of meaningful numeric priorities can be queried; if the specified priority is outside the supported numerical range, it will automatically be clamped to the highest or lowest number in the range. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g95c1a8c7c3dacb13091692dd9c7f7471
Destroy a stream. If the device is still doing work in the stream when destroy is called, the function returns immediately and the resources associated with the stream will be released automatically once the device has completed all work.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g244c8833de4596bcd31a06cdf21ee758
Check if all operations in the stream have completed.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g1b0d24bbe97fa68e4bc511fb6adfeb0b
Wait until the device has completed all operations in the stream.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g15e49dd91ec15991eb7c0a741beb7dad
Query the priority of a stream. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g5bd5cb26915a2ecf1921807339488484
Wait on a memory location. Work ordered after the operation will block until the given condition on the memory is satisfied. Requires CUDA-8.0 for 32-bit values; CUDA-9.0 for 64-bit values.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g629856339de7bc6606047385addbb398
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g6910c1258c5f15aa5d699f0fd60d6933
Write a value to memory, (presumably) after all preceding work in the stream has completed. Unless the no-memory-barrier option is supplied, the write is preceded by a system-wide memory fence. Requires CUDA-8.0 for 32-bit values; CUDA-9.0 for 64-bit values.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g091455366d56dc2f1f69726aafa369b0
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1gc8af1e8b96d7561840affd5217dd6830

[2009..2017] Trevor L. McDonell, BSD, None
Profiler output mode.
Initialise the CUDA profiler. The configuration file is used to specify profiling options and profiling counters.
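The start/stop controls of this module can bracket a region of interest so that only selected code is profiled. A sketch; the module name `Foreign.CUDA.Driver.Profiler` and the bare `start`/`stop` actions are assumptions based on the documentation, not confirmed names:

```haskell
-- Sketch: profile only the bracketed action.
import qualified Foreign.CUDA.Driver.Profiler as Prof

profiled :: IO a -> IO a
profiled action = do
  Prof.start        -- begin collection by the active profiling tool
  r <- action
  Prof.stop         -- flush pending profiler events to the output file
  return r
```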
Refer to the "Compute Command Line Profiler User Guide" for supported profiler options and counters. Note that the CUDA profiler cannot be initialised with this function if another profiling tool is already active.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PROFILER.html#group__CUDA__PROFILER
Begin profiling collection by the active profiling tool for the current context. If profiling is already enabled, then this has no effect. Start and stop can be used to programmatically control profiling granularity, by allowing profiling to be done only on selected pieces of code.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PROFILER.html#group__CUDA__PROFILER_1g8a5314de2292c2efac83ac7fcfa9190e
Stop profiling collection by the active profiling tool for the current context, and force all pending profiler events to be written to the output file. If profiling is already inactive, this has no effect.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PROFILER.html#group__CUDA__PROFILER_1g4d8edef6174fd90165e6ac838f320a5f
Arguments: configuration file that itemises which counters and/or options to profile; output file where profiling results will be stored.

[2009..2017] Trevor L. McDonell, BSD, None
Create a new event.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g450687e75f3ff992fe01662a43d9d3db
Destroy an event.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g593ec73a8ec5a5fc031311d3e4dca1ef
Determine the elapsed time (in milliseconds) between two events.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1gdfb1178807353bbcaa9e245da497cf97
Determine whether an event has actually been recorded.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g6f0704d755066b0ee705749ae911deef
Record an event once all operations in the current context (or optionally specified stream) have completed. This operation is asynchronous.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g95424d3be52c4eb95d83861b70fb89d1
Make all future work submitted to the (optional) stream wait until the given event reports completion before beginning execution. Synchronisation is performed on the device, including when the event and stream are from different device contexts. Requires CUDA-3.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__STREAM.html#group__CUDA__STREAM_1g6a898b652dfc6aa1d5c8d97062618b2f
Wait until the event has been recorded.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EVENT.html#group__CUDA__EVENT_1g9e520d34e51af7f5375610bca4add99c

[2009..2017] Trevor L. McDonell, BSD, None
A CUDA inter-process event handle.
Create an inter-process event handle for a previously allocated event. The event must be created with the interprocess and disable-timing event flags. The returned handle may then be sent to another process and opened there to allow efficient hardware synchronisation between GPU work in other processes. After the event has been opened in the importing process, record, block, wait and query may be used in either process. Performing operations on the imported event after the event has been freed in the exporting process is undefined. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gea02eadd12483de5305878b13288a86c
Open an inter-process event handle for use in the current process, returning an event that can be used in the current process and behaving as a locally created event with the disable-timing flag specified. The event must be freed with destroy. Performing operations on the imported event after the exported event has been freed in the exporting process is undefined. Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf1d525918b6c643b99ca8c8e42e36c2e

[2009..2017] Trevor L.
McDonell, BSD, None
Warp size.
Number of SIMD arithmetic units per multiprocessor.
Maximum number of in-flight warps per multiprocessor.
Maximum number of in-flight threads on a multiprocessor.
Maximum number of thread blocks resident on a multiprocessor.
Total amount of shared memory per multiprocessor (bytes).
Maximum amount of shared memory per thread block (bytes).
Total number of registers in a multiprocessor.
Maximum number of registers per block.
Register allocation unit size.
How multiprocessor resources are divided (register allocation granularity).
Maximum number of registers per thread.
Shared memory allocation unit size (bytes).
Warp allocation granularity.
Warp register allocation granularity.
PCI bus ID of the device. PCI device ID. PCI domain ID.
The properties of a compute device: identifier; supported compute capability; available global memory on the device in bytes; available constant memory on the device in bytes; available shared memory per block in bytes; 32-bit registers per block; warp size in threads (SIMD width); maximum number of threads per block; maximum number of threads per multiprocessor; maximum size of each dimension of a block; maximum size of each dimension of a grid; maximum texture dimensions; clock frequency in kilohertz; number of multiprocessors on the device; maximum pitch in bytes allowed by memory copies; global memory bus width in bits; peak memory clock frequency in kilohertz; alignment requirement for textures; whether the device can concurrently copy memory and execute a kernel; whether the device can possibly execute multiple kernels concurrently; whether the device supports and has enabled error correction; number of asynchronous engines; size of the L2 cache in bytes; PCI device information for the device; whether this is a Tesla device using the TCC driver; whether there is a runtime limit on kernels; whether the device is integrated (as opposed to discrete); whether the device can use pinned memory; whether the device shares a unified address space with the host; whether the device supports stream priorities; whether the device supports caching globals in L1 cache; whether the device supports caching locals in L1 cache; whether the device supports allocating managed memory on this system; whether the device is on a multi-GPU board; unique identifier for a group of devices associated with the same board.
GPU compute capability, major and minor revision number respectively.
The compute mode the device is currently in.
Extract some additional hardware resource limitations for a given device.

[2009..2017] Trevor L. McDonell, BSD, None
Device limit flags.
Possible option values for direct peer memory access.
Device execution flags.
A device identifier.
Select the compute device which best matches the given criteria.
Returns which device is currently being used.
Returns the number of devices available for execution, with compute capability >= 1.0.
Return information about the selected compute device.
Set device to be used for GPU execution.
Set flags to be used for device executions.
Set list of devices for CUDA execution in priority order.
Block until the device has completed all preceding requested tasks. Returns an error if one of the tasks fails.
Explicitly destroys and cleans up all runtime resources associated with the current device in the current process. Any subsequent API call will reinitialise the device. Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.
Queries if the first device can directly access the memory of the second. If direct access is possible, it can then be enabled. Requires cuda-4.0.
If the devices of both the current and supplied contexts support unified addressing, then enable allocations in the supplied context to be accessible by the current context. Requires cuda-4.0.
Disable direct memory access from the current context to the supplied context. Requires cuda-4.0.
Query compute 2.0 call stack limits. Requires cuda-3.1.
Set compute 2.0 call stack limits. Requires cuda-3.1.

[2009..2017] Trevor L. McDonell, BSD, None

[2009..2017] Trevor L. McDonell, BSD, None
Device code formats that can be used for online linking.
Online compilation fallback strategy.
Online compilation target architecture.
Results of online compilation: milliseconds spent compiling PTX; information about PTX assembly; the compiled module.
Just-in-time compilation and linking options: maximum number of registers per thread; number of threads per block to target for; level of optimisation to apply (1-4, default 4); compilation target, otherwise determined from context; fallback strategy if matching cubin not found; generate debug info (-g) (requires cuda >= 5.5); generate line number information (-lineinfo) (requires cuda >= 5.5); verbose log messages (requires cuda >= 5.5).
A reference to a Module object, containing collections of device functions.
Load the contents of the specified file (either a ptx or cubin file) to create a new module, and load that module into the current context.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g366093bd269dafd0af21f1c7d18115d3
Load the contents of the given image into a new module, and load that module into the current context. The image is (typically) the contents of a cubin or PTX file. Note that the image will be copied into a temporary staging area so that it can be passed to C.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g04ce266ce03720f479eab76136b90c0b
As above, but read the image data from the given pointer. The image is a NULL-terminated sequence of bytes.
Load the contents of the given image into a module with online compiler options, and load the module into the current context. The image is (typically) the contents of a cubin or PTX file.
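Loading from an in-memory image rather than a file can be sketched as follows; this assumes `loadData` and `unload` from `Foreign.CUDA.Driver.Module` of this package, with the image supplied as a ByteString:

```haskell
-- Sketch: load a module from an in-memory PTX image.
import qualified Data.ByteString.Char8 as B8
import Foreign.CUDA.Driver.Module (loadData, unload)

fromImage :: B8.ByteString -> IO ()
fromImage ptx = do
  mdl <- loadData ptx   -- the image is copied to a staging area internally
  -- ... look up and launch kernels from mdl ...
  unload mdl
```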
The actual attributes of the compiled kernel can be probed using requires. Note that the image will be copied into a temporary staging area so that it can be passed to C.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g9e8047e9dbf725f0cd7cafd18bfd4d12
As above, but read the image data from the given pointer. The image is a NULL-terminated sequence of bytes.
Unload a module from the current context.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g8ea3d716524369de3763104ced4ea57b

[2009..2017] Trevor L. McDonell, BSD, None
A pending JIT linker state.
Create a pending JIT linker invocation. The returned state should be destroyed once no longer needed. The device code machine size will match the calling application. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g86ca4052a2fab369cb943523908aa80d
Destroy the state of a JIT linker invocation. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g01b7ae2a34047b05716969af245ce2d9
Complete a pending linker invocation and load the current module. The link state will be destroyed. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g818fcd84a4150a997c0bba76fef4e716
Add an input file to a pending linker invocation. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g1224c0fd48d4a683f3ce19997f200a8c
Add an input to a pending linker invocation. Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g3ebcd2ccb772ba9c120937a2d2831b77
As above, but read the specified number of bytes of image data from the given pointer.

[2009..2017] Trevor L. McDonell, BSD, None
Possible option flags for CUDA initialisation. Dummy instance until the API exports actual option values.
Device attributes.
A CUDA device.
Initialise the CUDA driver API. This must be called before any other driver function.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__INITIALIZE.html#group__CUDA__INITIALIZE_1g0a2f1517e1bd8502c7194c3a8c134bc3
Return a handle to the compute device at the given ordinal.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g8bdd1cc7201304b01357b8034f6587cb
Return the selected attribute for the given device.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g9c3e1414f0ad901d3278a4d6645fc266
Return the number of devices with compute capability > 1.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1g52b5ce05cb8c5fb6831b2c0ff2887c74
The identifying name of the device.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gef75aa30df95446a845f2a7b9fffbb7f
Return the properties of the selected device.
The total memory available on the device (bytes).
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__DEVICE.html#group__CUDA__DEVICE_1gc6a0d6551335a3780f9f3c967a0fde5d
Return the compute compatibility revision supported by the device.

[2009..2017] Trevor L. McDonell, BSD, None
Context creation flags.
A device context.
Create a new CUDA context and associate it with the calling thread. The context is created with a usage count of one, and the caller of create must call destroy when done using the context.
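The create/destroy pairing described above suits the usual bracket idiom; a sketch assuming `Foreign.CUDA.Driver.Context` and the `initialise`/`device` functions of this package:

```haskell
-- Sketch: run an action with a freshly created context, releasing it
-- even if the action throws.
import qualified Foreign.CUDA.Driver.Context as Context
import Foreign.CUDA.Driver.Device (initialise, device)
import Control.Exception (bracket)

withContext :: (Context.Context -> IO a) -> IO a
withContext action = do
  initialise []
  dev <- device 0
  bracket (Context.create dev []) Context.destroy action
```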
If a context is already current to the thread, it is supplanted by the newly created context, and must be restored by a subsequent call to pop.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g65dc0012348bc84810e2103a40d8e2cf

Increment the usage count of the context. API: no context flags are currently supported, so this parameter must be empty.

Detach the context, and destroy it if it is no longer used.

Destroy the specified context, regardless of how many threads it is current to. The context will be popped from the current thread's context stack, but if it is current on any other threads it will remain current to those threads, and attempts to access it will result in an error.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g27a365aebb0eb548166309f58a1e8b8e

Return the context bound to the calling CPU thread.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g8f13165846b73750693640fb3e8380d0

Bind the specified context to the calling thread.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gbe562ee6258b4fcc272ca6478ca2a2f7

Return the device of the currently active context.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g4e84b109eba36cdaaade167f34ae881e

Pop the current CUDA context from the CPU thread. The context may then be attached to a different CPU thread by calling push.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g2fac188026a062d92e91a8687d0a7902

Push the given context onto the CPU thread's stack of current contexts. The specified context becomes the CPU thread's current context, so all operations that operate on the current context are affected.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gb02d4c850eb16f861fe5a29682cc90ba

Block until the device has completed all preceding requests.
If the context was created with the blocking-sync scheduling flag, the CPU thread will block until the GPU has finished its work.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g7a54725f28d34b8c6299f0c6ca579616

(c) [2009..2017] Trevor L. McDonell, BSD

Options for unified memory allocations.

Options for host allocation.

Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type.
Note that since the amount of pageable memory is thereby reduced, overall system performance may suffer. This is best used sparingly, to allocate staging areas for data exchange.
Host memory allocated in this way is automatically and immediately accessible to all contexts on all devices which support unified addressing.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gdd8311286d2c2691605362c689bc64e0

As above, but return a foreign pointer instead. The array will be deallocated automatically once the last reference to it is dropped.

Free a section of page-locked host memory.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g62e0fdbe181dab6b1c90fa1a51c7b92c

Page-lock the specified array (on the host) and map it for the device(s) as specified by the given allocation flags. Subsequently, the memory is accessed directly by the device, and so can be read and written with much higher bandwidth than pageable memory that has not been registered. The memory range is added to the same tracking mechanism as the page-locked allocation functions, to automatically accelerate calls to functions such as the asynchronous copies.
Note that page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of pageable memory available. This is best used sparingly to allocate staging areas for data exchange.
This function has limited support on Mac OS X. OS 10.7 or later is required.
Requires CUDA-4.0.
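The initialisation, device-query, and context functions described above compose as follows. This is a minimal sketch against the `cuda` package (it assumes an NVIDIA GPU and driver are present); the function and field names (`initialise`, `count`, `device`, `props`, `deviceName`, `create`, `sync`, `destroy`) are as exported by `Foreign.CUDA.Driver` in recent package versions and may differ between releases.

```haskell
import qualified Foreign.CUDA.Driver         as CUDA
import qualified Foreign.CUDA.Driver.Context as Ctx

main :: IO ()
main = do
  CUDA.initialise []              -- must precede any other driver call
  ndev <- CUDA.count              -- number of CUDA-capable devices
  dev  <- CUDA.device 0           -- handle to the first device
  prp  <- CUDA.props dev          -- DeviceProperties record
  putStrLn (CUDA.deviceName prp ++ " (" ++ show ndev ++ " device(s))")
  ctx  <- Ctx.create dev []       -- context becomes current to this thread
  Ctx.sync                        -- block until all preceding work completes
  Ctx.destroy ctx
```

Note that `Ctx.destroy` detaches the context unconditionally; for exception safety, wrap the body in `Control.Exception.bracket`.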
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf0a9fe11544326dabd743b7aa6b54223

Unmap the memory from the given pointer, and make it pageable again.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g63f450c8125359be87b7623b1c0b2a14

Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of storable type. It is suitably aligned for any type, and is not cleared.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb82d2a09844a58dd9e744dc31e8aa467

Execute a computation on the device, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of storable type. The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this.
Note that kernel launches can be asynchronous, so you may want to add a synchronisation point as part of the continuation.

Release a section of device memory.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g89b3f154e17cc89b6eea277dbdf5c93a

Allocate memory that will be automatically managed by the Unified Memory system. The returned pointer is valid on the CPU and on all GPUs which support managed memory. All accesses to this pointer must obey the Unified Memory programming model.
On a multi-GPU system with peer-to-peer support, where multiple GPUs support managed memory, the physical storage is created on the GPU which is active at the time the allocation is made. All other GPUs will access the array at reduced bandwidth via peer mapping over the PCIe bus.
The Unified Memory system does not migrate memory between GPUs.
On a multi-GPU system where multiple GPUs support managed memory, but not all pairs of such GPUs have peer-to-peer support between them, the physical storage is allocated in system memory (zero-copy memory) and all GPUs will access the data at reduced bandwidth over the PCIe bus.
Requires CUDA-6.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb347ded34dc326af404aa02af5388a32

Pre-fetch the given number of elements to the specified destination device. If the specified device is Nothing, the data is pre-fetched to host memory. The pointer must refer to a memory range allocated with the managed memory allocator.
Requires CUDA-8.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1gfe94f8b7fb56291ebcea44261aa4cb84

Copy a number of elements from the device to host memory. This is a synchronous operation.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g3480368ee0208a98f75019c9a8450893

Copy memory from the device asynchronously, possibly associated with a particular stream. The destination host memory must be page-locked.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g56f30236c7c5247f8e061b59d3268362

Copy a 2D array from the device to the host.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27

Copy a 2D array from the device to the host asynchronously, possibly associated with a particular execution stream. The destination host memory must be page-locked.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274

Copy a number of elements from the device into a new Haskell list. Note that this requires two memory copies: firstly from the device into a heap-allocated array, and from there marshalled into a list.

Copy a number of elements onto the device.
This is a synchronous operation.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4d32266788c440b0220b1a9ba5795169

Copy memory onto the device asynchronously, possibly associated with a particular stream. The source host memory must be page-locked.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g1572263fe2597d7ba4f6964597a354a3

Copy a 2D array from the host to the device.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27

Copy a 2D array from the host to the device asynchronously, possibly associated with a particular execution stream. The source host memory must be page-locked.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274

Write a list of storable elements into a device array. The device array must be sufficiently large to hold the entire list. This requires two marshalling operations.

Copy the given number of elements from the first device array (source) to the second device array (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will never overlap with kernel execution.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g1725774abf8b51b91945f3336b778c8b

Copy the given number of elements from the first device array (source) to the second device array (destination). The copied areas may not overlap. The operation is asynchronous with respect to the host, and can be asynchronous to other device operations by associating it with a particular stream.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g39ea09ba682b8eccc9c3e0c04319b5c8

Copy a 2D array from the first device array (source) to the second device array (destination). The copied areas must not overlap.
This operation is asynchronous with respect to the host, but will never overlap with kernel execution.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g27f885b30c34cc20a663a671dbf6fc27

Copy a 2D array from the first device array (source) to the second device array (destination). The copied areas may not overlap. The operation is asynchronous with respect to the host, and can be asynchronous to other device operations by associating it with a particular execution stream.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g4acf155faeb969d9d21f5433d3d0f274

Copy an array from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host, but serialised with respect to all pending and future asynchronous work in the source and destination contexts. To avoid this synchronisation, use the asynchronous peer copy instead.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ge1f5c7771544fee150ada8853c7cbf4a

Copy from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host and all work in other streams and devices.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g82fcecb38018e64b98616a8ac30112f2

Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two memory copies: firstly from a Haskell list to a heap-allocated array, and from there onto the graphics device. The memory should be freed when no longer required.

Write a list of storable elements into a newly allocated device array. This is the above, discarding the element count.

Temporarily store a list of elements into a newly allocated device array. An IO action is applied to the array, the result of which is returned.
Similar to the list-marshalling functions, this requires copying the data twice.
As with the temporary device allocation, the memory is freed once the action completes, so you should not return the pointer from the action, and be wary of asynchronous kernel execution.

A variant of the above which also supplies the number of elements in the array to the applied function.

Set a number of data elements to the specified value, which may be either 8, 16, or 32 bits wide.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6e582bf866e9e2fb014297bfaf354d7b
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g7d805e610054392a4d11e8a8bf5eb35c
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g983e8d8759acd1b64326317481fbf132

Set a number of data elements to the specified value, which may be either 8, 16, or 32 bits wide. The operation is asynchronous and may optionally be associated with a stream.
Requires CUDA-3.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gaef08a7ccd61112f94e82f2b30d43627
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gf731438877dd8ec875e4c43d848c878c
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g58229da5d30f1c0cdf667b320ec2c0f5

Return the device pointer associated with a mapped, pinned host buffer, which was allocated with the device-mapped option.
Currently, no options are supported and this parameter must be empty.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g57a39e5cba26af4d06be67fc77cc62f0

Return the base address and allocation size of the given device pointer.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g64fee5711274a2a0573a789c94d8299b

Return the amount of free and total memory respectively available to the current context (bytes).
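The marshalling functions above combine into a simple host/device round trip. This is a sketch against `Foreign.CUDA.Driver` (it assumes an initialised driver and a current context, and that the names `mallocArray`, `pokeListArray`, `peekListArray`, `free`, and `withListArray` match the installed package version):

```haskell
import qualified Foreign.CUDA.Driver as CUDA

-- Copy a list to the device and back, managing the allocation by hand.
roundTrip :: IO [Float]
roundTrip = do
  let xs = [0 .. 1023] :: [Float]
  d_xs <- CUDA.mallocArray (length xs)         -- uninitialised device array
  CUDA.pokeListArray xs d_xs                   -- host -> device (two copies)
  ys <- CUDA.peekListArray (length xs) d_xs    -- device -> host (two copies)
  CUDA.free d_xs
  return ys

-- The scoped variant frees the array when the action completes, so the
-- device pointer must not escape the bracket.
roundTrip' :: IO [Int]
roundTrip' =
  CUDA.withListArray [1,2,3,4 :: Int] $ \d_arr ->
    CUDA.peekListArray 4 d_arr
```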
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0

Argument conventions for the 2D copy functions: width to copy (elements); height to copy (elements); source array; source array width; source x-coordinate; source y-coordinate; destination array; destination array width; destination x-coordinate; destination y-coordinate; and, for the asynchronous variants, the stream to associate with. The peer copy functions take: number of array elements; source array and context; destination array and context; and, for the asynchronous variant, the stream to associate with.

(c) [2009..2017] Trevor L. McDonell, BSD

Texture data formats.

Texture read mode options.

Texture reference filtering mode.

Texture reference addressing modes.

A texture reference.

Create a new texture reference. Once created, the application must call setPtr to associate the reference with allocated memory.
Other texture reference functions are used to specify the format and interpretation to be used when the memory is read through this reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF__DEPRECATED.html#group__CUDA__TEXREF__DEPRECATED_1g0084fabe2c6d28ffcf9d9f5c7164f16c

Destroy a texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF__DEPRECATED.html#group__CUDA__TEXREF__DEPRECATED_1gea8edbd6cf9f97e6ab2b41fc6785519d

Bind a linear array address of the given size (bytes) as a texture reference. Any previously bound references are unbound.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g44ef7e5055192d52b3d43456602b50a8

Bind a linear address range to the given texture reference as a two-dimensional arena. Any previously bound reference is unbound. Note that calls to set the format can not follow a call to this function for the same texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g26f709bbe10516681913d1ffe8756ee2

Get the addressing mode used by a texture reference, corresponding to the given dimension (currently the only supported dimension values are 0 or 1).
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1gfb367d93dc1d20aab0cf8ce70d543b33

Get the filtering mode used by a texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g2439e069746f69b940f2f4dbc78cdf87

Get the data format and number of channel components of the bound texture.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g90936eb6c7c4434a609e1160c278ae53

Specify the addressing mode for the given dimension of a texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g85f4a13eeb94c8072f61091489349bcb

Specify the filtering mode to be used when reading memory through a texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g595d0af02c55576f8c835e4efd1f39c0

Specify additional characteristics for reading and indexing the texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g554ffd896487533c36810f2e45bb7a28

Specify the format of the data and the number of packed components per element to be read by the texture reference.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TEXREF.html#group__CUDA__TEXREF_1g05585ef8ea2fec728a03c6c8f87cf07a

(c) [2009..2017] Trevor L. McDonell, BSD

Flags for controlling IPC memory access.

A CUDA memory handle used for inter-process communication.

Create an inter-process memory handle for an existing device memory allocation. The handle can then be sent to another process and made available to that process via the corresponding open operation.
Requires CUDA-4.1.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g6f1b5be767b275f016523b2ac49ebec1

Open an inter-process memory handle exported from another process, returning a device pointer usable in the current process.
This maps memory exported by another process into the current device address space. For contexts on different devices, this can attempt to enable peer access, controlled by the access flag.
Each handle from a given device and context may only be opened by one context per device per other process. Memory returned in this way must be freed via the corresponding close operation.
Requires CUDA-4.1.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1ga8bd126fcff919a0c996b7640f197b79

Close and unmap memory returned by the open operation.
The original allocation in the exporting process, as well as imported mappings in other processes, are unaffected.
Any resources used to enable peer access will be freed if this is the last mapping using them.
Requires CUDA-4.1.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gd6f5d5bcf6376c6853b64635b0157b9e

(c) [2009..2017] Trevor L. McDonell, BSD

Get the status of the primary context. Returns whether the current context is active, and the flags it was (or will be) created with.
Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g65f3e018721b6d90aa05cfb56250f469

Specify the flags that the primary context should be created with. Note that this is an error if the primary context is already active.
Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1gd779a84f17acdad0d9143d9fe719cfdf

Destroy all allocations and reset all state on the primary context of the given device in the current process.
Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g5d38802e8600340283958a117466ce12

Release the primary context on the given device. If there are no more references to the primary context it will be destroyed, regardless of how many threads it is current to.
Unlike the general context destroy, this does not pop the context from the stack in any circumstances.
Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1gf2a8bc16f8df0c88031f6a1ba3d6e8ad

Retain the primary context for the given device, creating it if necessary, and increasing its usage count. The caller must release the context when done using it. Unlike context creation, the newly retained context is not pushed onto the stack.
Requires CUDA-7.0.
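The retain/release pattern described above can be wrapped in a bracket. This is a hypothetical sketch: the module path `Foreign.CUDA.Driver.Context.Primary` and the names `retain`, `release`, `push`, and `pop` follow the `cuda` package's layout as described in this documentation, but should be checked against the installed version.

```haskell
import qualified Foreign.CUDA.Driver.Context         as Ctx
import qualified Foreign.CUDA.Driver.Context.Primary as Primary
import qualified Foreign.CUDA.Driver                 as CUDA

-- Use the device's primary context (as the runtime API does) rather
-- than creating a fresh one. Not exception safe; a production version
-- would use Control.Exception.bracket.
withPrimary :: CUDA.Device -> IO a -> IO a
withPrimary dev action = do
  ctx <- Primary.retain dev   -- create if necessary; bump usage count
  Ctx.push ctx                -- retain does not make the context current
  r   <- action
  _   <- Ctx.pop
  Primary.release dev         -- destroyed once the last reference is gone
  return r
```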
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX_1g9051f2d5c31501997a6cb0530290a300

(c) [2009..2017] Trevor L. McDonell, BSD

Peer-to-peer attributes.

Possible option values for direct peer memory access.

Query whether the first device can directly access the memory of the second. If direct access is possible, it can then be enabled.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g496bdaae1f632ebfb695b99d2c40f19e

If the devices of both the current and supplied contexts support unified addressing, then enable allocations in the supplied context to be accessible by the current context.
Note that access is unidirectional; in order to access memory in the current context from the peer context, a separate symmetric call is required.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g0889ec6728e61c05ed359551d67b3f5a

Disable direct memory access from the current context to the supplied peer context, and unregister any registered allocations.
Requires CUDA-4.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g5b4b6936ea868d4954ce4d841a3b4810

Query attributes of the link between two devices.
Requires CUDA-8.0. Since 0.9.0.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PEER__ACCESS.html#group__CUDA__PEER__ACCESS_1g4c55c60508f8eba4546b51f2ee545393

(c) [2009..2017] Trevor L. McDonell, BSD

Device shared memory configuration preference.

Device cache configuration preference.

Device limits flags.

Return the flags that were used to create the current context.
Requires CUDA-7.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1gf81eef983c1e3b2ef4f166d7a930c86d

Query compute 2.0 call stack limits.
Requires CUDA-3.1.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g9f2d47d1745752aa16da7ed0d111b6a8

Specify the size of the call stack, for compute 2.0 devices.
Requires CUDA-3.1.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g0651954dfb9788173e60a9af7201e65a

On devices where the L1 cache and shared memory use the same hardware resources, this function returns the preferred cache configuration for the current context.
Requires CUDA-3.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g40b6b141698f76744dea6e39b9a25360

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the current context. This is only a preference.
Any per-function configuration will be preferred over this context-wide setting.
Requires CUDA-3.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g54699acf7e2ef27279d013ca2095f4a3

Return the current size of the shared memory banks in the current context. On devices with configurable shared memory banks, the corresponding setter can be used to change the configuration, so that subsequent kernel launches will by default use the new bank size. On devices without configurable shared memory, this function returns the fixed bank size of the hardware.
Requires CUDA-4.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g17153a1b8b8c756f7ab8505686a4ad74

On devices with configurable shared memory banks, this function will set the context's shared memory bank size, which is used by default for subsequent kernel launches.
Changing the shared memory configuration between launches may insert a device synchronisation. Shared memory bank size does not affect shared memory usage or kernel occupancy, but may have major effects on performance.
Larger bank sizes allow for greater potential bandwidth to shared memory, but change the kinds of accesses which result in bank conflicts.
Requires CUDA-4.2.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g2574235fa643f8f251bf7bc28fac3692

Return the numerical values that correspond to the greatest and least priority execution streams in the current context, respectively. Stream priorities follow the convention that lower numbers correspond to higher priorities. The range of meaningful stream priorities is given by the inclusive range [greatestPriority,leastPriority].
Requires CUDA-5.5.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__CTX.html#group__CUDA__CTX_1g137920ab61a71be6ce67605b9f294091

(c) [2017] Trevor L. McDonell, BSD

Information about a pointer.

Advise about the usage of a given range of memory. If the supplied device is Nothing, then the preferred location is taken to mean the CPU.
Requires CUDA-8.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__UNIFIED.html#group__CUDA__UNIFIED_1g27608c857a9254789c13f3e3b72029e2

(c) [2009..2017] Trevor L. McDonell, BSD

Kernel function parameters.

Function attributes.

A __global__ device function.

Return the value of the selected attribute requirement for the given kernel.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g5e92a1b0d8d1b82cb00dcfb2de15961b

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function.
Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.
Requires CUDA-3.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g40f8c11e81def95dc0072a375f965681

Set the shared memory configuration of a device function.
On devices with configurable shared memory banks, this will force all subsequent launches of the given device function to use the specified shared memory bank size configuration. On launch of the function, the shared memory configuration of the device will be temporarily changed if needed to suit the function configuration. Changes in shared memory configuration may introduce a device-side synchronisation between kernel launches.
Any per-function configuration specified by setSharedMemConfig will override the context-wide configuration.
Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory result in bank conflicts.
This function will do nothing on devices with fixed shared memory bank size.
Requires CUDA-5.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g430b913f24970e63869635395df6d9f5

Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific stream.
Here, the number of kernel parameters and their offsets and sizes do not need to be specified, as this information is retrieved directly from the kernel's image. This requires the kernel to have been compiled with toolchain version 3.2 or later.
The alternative launch function passes the arguments in directly, requiring the application to know the size and alignment/padding of each kernel parameter.
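A launch under these conventions might look as follows. This is a sketch against the `cuda` package: the module file name and kernel name are hypothetical, and the names `loadFile`, `getFun`, `launchKernel`, `VArg`/`IArg`, `sync`, and `unload` follow recent versions of `Foreign.CUDA.Driver` and may differ in other releases.

```haskell
import qualified Foreign.CUDA.Driver as CUDA

-- Launch a hypothetical kernel "vadd" from pre-compiled device code,
-- over a one-dimensional grid covering n elements.
launchVAdd :: CUDA.DevicePtr Float -> CUDA.DevicePtr Float -> Int -> IO ()
launchVAdd d_xs d_ys n = do
  mdl <- CUDA.loadFile "vadd.cubin"     -- load compiled device code
  fun <- CUDA.getFun mdl "vadd"         -- resolve the __global__ function
  let threads = 256
      blocks  = (n + threads - 1) `div` threads
  -- 1D grid; no dynamic shared memory; default stream
  CUDA.launchKernel fun (blocks,1,1) (threads,1,1) 0 Nothing
    [CUDA.VArg d_xs, CUDA.VArg d_ys, CUDA.IArg (fromIntegral n)]
  CUDA.sync                             -- launches are asynchronous
  CUDA.unload mdl
```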
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15

Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific stream.
The thread blocks can cooperate and synchronise as they execute.
The device on which this kernel is invoked must support cooperative launch.
The total number of blocks launched can not exceed the maximum number of active thread blocks per multiprocessor, multiplied by the number of multiprocessors.
The kernel can not make use of dynamic parallelism.
Requires CUDA-9.0. Since 0.9.0.0.
http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g06d753134145c4584c0c62525c1894cb

Invoke the kernel on a size (w,h) grid of blocks. Each block contains the number of threads specified by a previous call to set the block shape. The launch may also be associated with a specific stream.
Specify the (x,y,z)^ dimensions of the thread blocks that are created when the given kernel function is launched.tSet the number of bytes of dynamic shared memory to be available to each thread block when the function is launchedFSet the parameters that will specified next time the kernel is invokedfunction to executeblock grid dimensionthread block shapeshared memory (bytes)(optional) stream to execute inlist of function parametersfunction to executeblock grid dimensionthread block shapeshared memory (bytes)(optional) stream to execute inlist of function parametersfunction to executeblock grid dimensionthread block shapeshared memory (bytes)(optional) stream to execute inlist of function parameters   B[2009..2017] Trevor L. McDonellBSDNone TAReturns a function handle. {http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1ga52be009b0d4045811b30c965e1cb2cf;Return a global pointer, and size of the global (in bytes). {http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1gf3e43672e26073b1081476dbf47a86abReturn a handle to a texture reference. This texture reference handle should not be destroyed, as the texture will be destroyed automatically when the module is unloaded. {http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MODULE.html#group__CUDA__MODULE_1g9607dcbf911c16420d5264273f2b56086[2009..2017] Trevor L. McDonellBSDNoneU3'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPRSTUVW7[2009..2017] Trevor L. McDonellBSDNoneV0123456789:;<=>?@!"#$%NM/&aK.FG=PQ*+,-TUVWH?_`XYZ[LABC^'()0123456789:;<>@DEIJORS\]bcdefmn'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPRSTUVWrstuvwxyz{|}~      !"#$%&'()*+,-./0123 [2009..2017] Trevor L. 
Foreign.CUDA.Analysis.Occupancy
Copyright: [2009..2017] Trevor L. McDonell. License: BSD.

Occupancy record fields:
  activeThreads      -- Active threads per multiprocessor
  activeThreadBlocks -- Active thread blocks per multiprocessor
  activeWarps        -- Active warps per multiprocessor
  occupancy100       -- Occupancy of each multiprocessor (percent)

occupancy
  Calculate occupancy data for a given GPU and kernel resource usage.
  Arguments: properties of the card in question; threads per block;
  registers per thread; shared memory per block (bytes).

optimalBlockSize
  Optimise multiprocessor occupancy as a function of thread block size and
  resource usage. This returns the smallest satisfying block size, in
  increments of a single warp. Arguments: architecture to optimise for;
  register count as a function of thread block size; shared memory usage
  (bytes) as a function of thread block size.

optimalBlockSizeOf
  As optimalBlockSize, but with a generator that produces the specific
  thread block sizes that should be tested. The generated list can produce
  values in any order, but the last satisfying block size will be returned.
  Hence, values should be monotonically decreasing to return the smallest
  block size yielding maximum occupancy, and vice versa. Arguments:
  architecture to optimise for; thread block sizes to consider; register
  count as a function of thread block size; shared memory usage (bytes) as
  a function of thread block size.

Block size generators:
  incPow2 -- Increments in powers of two, over the range of supported
             thread block sizes for the given device.
  decPow2 -- Decrements in powers of two, over the range of supported
             thread block sizes for the given device.
  decWarp -- Decrements in the warp size of the device, over the range of
             supported thread block sizes.
  incWarp -- Increments in the warp size of the device, over the range of
             supported thread block sizes.

maxResidentBlocks
  Determine the maximum number of CTAs that can be run simultaneously for
  a given kernel / device combination. Arguments: properties of the card
  in question; threads per block; registers per thread; shared memory per
  block (bytes). Returns the maximum number of resident blocks.
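The occupancy calculator is pure, so given a DeviceProperties record it needs no GPU calls of its own; below it is fed from the runtime API. The resource figures (63 registers per thread, 4 KiB of shared memory per block) are made-up example values, and the result shape of optimalBlockSize is assumed to be the block size paired with its Occupancy.

```haskell
-- Sketch: occupancy analysis with Foreign.CUDA.Analysis.
import Foreign.CUDA.Analysis
import qualified Foreign.CUDA.Runtime as CUDA

main :: IO ()
main = do
  props <- CUDA.props =<< CUDA.get

  -- occupancy for 256-thread blocks using 63 registers/thread and
  -- 4 KiB shared memory per block (hypothetical kernel resource usage)
  let occ = occupancy props 256 63 4096
  putStrLn $ "occupancy:       " ++ show (occupancy100 occ) ++ "%"
  putStrLn $ "active warps/MP: " ++ show (activeWarps occ)

  -- smallest block size maximising occupancy, for a kernel whose register
  -- and shared-memory usage do not vary with the block size
  let (blk, best) = optimalBlockSize props (const 63) (const 4096)
  putStrLn $ "best block size: " ++ show blk
          ++ " (" ++ show (occupancy100 best) ++ "%)"
```

When register or shared-memory usage does depend on the block size (e.g. per-thread scratch space in shared memory), pass real functions instead of `const`, and the search will account for it at each candidate size.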
Package: cuda-0.9.0.2

Module index:

- Foreign.CUDA
- Foreign.CUDA.Path
- Foreign.CUDA.Types
- Foreign.CUDA.Ptr
- Foreign.CUDA.Runtime
- Foreign.CUDA.Runtime.Device
- Foreign.CUDA.Runtime.Error
- Foreign.CUDA.Runtime.Event
- Foreign.CUDA.Runtime.Exec
- Foreign.CUDA.Runtime.Marshal
- Foreign.CUDA.Runtime.Stream
- Foreign.CUDA.Runtime.Texture
- Foreign.CUDA.Runtime.Utils
- Foreign.CUDA.Driver
- Foreign.CUDA.Driver.Context.Base
- Foreign.CUDA.Driver.Context.Config
- Foreign.CUDA.Driver.Context.Peer
- Foreign.CUDA.Driver.Context.Primary
- Foreign.CUDA.Driver.Device
- Foreign.CUDA.Driver.Error
- Foreign.CUDA.Driver.Event
- Foreign.CUDA.Driver.Exec
- Foreign.CUDA.Driver.IPC.Event
- Foreign.CUDA.Driver.IPC.Marshal
- Foreign.CUDA.Driver.Marshal
- Foreign.CUDA.Driver.Module
- Foreign.CUDA.Driver.Module.Base
- Foreign.CUDA.Driver.Module.Link
- Foreign.CUDA.Driver.Module.Query
- Foreign.CUDA.Driver.Profiler
- Foreign.CUDA.Driver.Stream
- Foreign.CUDA.Driver.Texture
- Foreign.CUDA.Driver.Unified
- Foreign.CUDA.Driver.Utils
- Foreign.CUDA.Analysis
- Foreign.CUDA.Analysis.Device
- Foreign.CUDA.Analysis.Occupancy
- Foreign.CUDA.Internal.C2HS