cuda-0.9.0.2: FFI binding to the CUDA interface for programming NVIDIA GPUs

Copyright: [2009..2017] Trevor L. McDonell
License: BSD
Safe Haskell: None
Language: Haskell98

Foreign.CUDA.Driver.Exec

Description

Kernel execution control for low-level driver interface

Kernel Execution

newtype Fun

A __global__ device function

Constructors

Fun (Ptr ()) 

data FunParam where

Kernel function parameters

Constructors

IArg :: !Int32 -> FunParam 
FArg :: !Float -> FunParam 
VArg :: Storable a => !a -> FunParam 

setCacheConfigFun :: Fun -> Cache -> IO ()

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function.

Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.

Requires CUDA-3.0.

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g40f8c11e81def95dc0072a375f965681

setSharedMemConfigFun :: Fun -> SharedMem -> IO ()

Set the shared memory configuration of a device function.

On devices with configurable shared memory banks, this forces all subsequent launches of the given device function to use the specified shared memory bank size configuration. On launch of the function, the shared memory configuration of the device is temporarily changed, if needed, to suit the function configuration. Changes in shared memory configuration may introduce a device-side synchronisation point between kernel launches.

Any per-function configuration specified by setSharedMemConfigFun will override the context-wide configuration set with setSharedMem.

Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.

This function will do nothing on devices with fixed shared memory bank size.

Requires CUDA-5.0.

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g430b913f24970e63869635395df6d9f5

launchKernel

Arguments

:: Fun

function to execute

-> (Int, Int, Int)

block grid dimension

-> (Int, Int, Int)

thread block shape

-> Int

shared memory (bytes)

-> Maybe Stream

(optional) stream to execute in

-> [FunParam]

list of function parameters

-> IO () 

Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.

In launchKernel, the number of kernel parameters and their offsets and sizes do not need to be specified, as this information is retrieved directly from the kernel's image. This requires the kernel to have been compiled with toolchain version 3.2 or later.

The alternative launchKernel' will pass the arguments in directly, requiring the application to know the size and alignment/padding of each kernel parameter.
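
As a rough illustration of choosing the grid and block arguments, the sketch below computes a one-dimensional launch configuration that covers n elements. The helper names (blocksFor, launchConfig1D, totalThreads) are hypothetical and not part of this module. The commented call at the end assumes a Fun and a device pointer obtained elsewhere.

```haskell
-- Hypothetical helpers, not part of Foreign.CUDA.Driver.Exec.

-- | Number of blocks needed so that blocks * threadsPerBlock >= n
-- (ceiling division). Assumes n > 0 and threadsPerBlock > 0.
blocksFor :: Int -> Int -> Int
blocksFor n threadsPerBlock = (n + threadsPerBlock - 1) `div` threadsPerBlock

-- | Grid and block shapes in the (x, y, z) form that launchKernel takes,
-- using only the x dimension.
launchConfig1D :: Int -> Int -> ((Int, Int, Int), (Int, Int, Int))
launchConfig1D n threadsPerBlock =
  ((blocksFor n threadsPerBlock, 1, 1), (threadsPerBlock, 1, 1))

-- | Total number of threads created by a (grid, block) configuration:
-- (gx * gy * gz) blocks of (tx * ty * tz) threads each.
totalThreads :: ((Int, Int, Int), (Int, Int, Int)) -> Int
totalThreads ((gx, gy, gz), (tx, ty, tz)) = gx * gy * gz * tx * ty * tz

-- Usage sketch (kernel :: Fun and d_xs, a device pointer, are assumed to
-- have been created elsewhere; 0 bytes of shared memory, default stream):
--
-- > let (grid, block) = launchConfig1D n 256
-- > launchKernel kernel grid block 0 Nothing [VArg d_xs, IArg (fromIntegral n)]
```

Because the block count is rounded up, the grid can contain more threads than there are elements, so a kernel launched this way would typically need its own bounds check against n.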

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15

launchKernel'

Arguments

:: Fun

function to execute

-> (Int, Int, Int)

block grid dimension

-> (Int, Int, Int)

thread block shape

-> Int

shared memory (bytes)

-> Maybe Stream

(optional) stream to execute in

-> [FunParam]

list of function parameters

-> IO () 

Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.

Unlike launchKernel, which retrieves the number of kernel parameters and their offsets and sizes directly from the kernel's image (and therefore requires the kernel to have been compiled with toolchain version 3.2 or later), launchKernel' passes the arguments in directly. The application must therefore know the size and alignment/padding of each kernel parameter.
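
To make "size and alignment/padding of each kernel parameter" concrete, the sketch below computes the byte offset of each parameter in a packed argument buffer, rounding every offset up to that parameter's alignment. It is only an illustration built on Foreign.Storable; the helper names (sizeAlign, alignUp, paramLayout) are hypothetical, and nothing here is claimed to be how this library lays out arguments internally.

```haskell
import Data.Int (Int32)
import Foreign.Storable (Storable, alignment, sizeOf)

-- Hypothetical helpers, not part of Foreign.CUDA.Driver.Exec.

-- | (size, alignment) in bytes of a Storable value, as reported by the host.
sizeAlign :: Storable a => a -> (Int, Int)
sizeAlign x = (sizeOf x, alignment x)

-- | Round an offset up to the next multiple of the given alignment.
-- Assumes align > 0.
alignUp :: Int -> Int -> Int
alignUp offset align = ((offset + align - 1) `div` align) * align

-- | Given (size, alignment) pairs in parameter order, return the padded
-- starting offset of each parameter and the offset just past the last one.
paramLayout :: [(Int, Int)] -> ([Int], Int)
paramLayout = go 0
  where
    go offset [] = ([], offset)
    go offset ((size, align) : rest) =
      let start           = alignUp offset align
          (starts, total) = go (start + size) rest
      in  (start : starts, total)

-- | Layout for an Int32, a Double and a Float, in that order. The Double
-- is padded up to its own alignment after the Int32.
exampleLayout :: ([Int], Int)
exampleLayout =
  paramLayout [sizeAlign (1 :: Int32), sizeAlign (2 :: Double), sizeAlign (3 :: Float)]
```

The sizes and alignments reported by Storable are those of the host platform; whether they match what the device expects for a given kernel is something the application has to verify.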

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1gb8f3dc3031b40da29d5f9a7139e52e15

launchKernelCooperative

Arguments

:: Fun

function to execute

-> (Int, Int, Int)

block grid dimension

-> (Int, Int, Int)

thread block shape

-> Int

shared memory (bytes)

-> Maybe Stream

(optional) stream to execute in

-> [FunParam]

list of function parameters

-> IO () 

Invoke a kernel on a (gx * gy * gz) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific stream.

The thread blocks can cooperate and synchronise as they execute.

The device on which this kernel is invoked must have attribute CooperativeLaunch.

The total number of blocks launched cannot exceed the maximum number of active thread blocks per multiprocessor (threadBlocksPerMP), multiplied by the number of multiprocessors (multiProcessorCount).

The kernel cannot make use of dynamic parallelism.
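
The block-count limit above can be checked before launching. The sketch below is a minimal, pure version of that check. The helper names are hypothetical, and the two limits are passed in as plain Ints on the assumption that the caller has already read threadBlocksPerMP and multiProcessorCount for the target device.

```haskell
-- Hypothetical helpers, not part of Foreign.CUDA.Driver.Exec.

-- | Upper bound on the number of blocks in a cooperative launch:
-- active thread blocks per multiprocessor times the multiprocessor count.
maxCooperativeBlocks :: Int -> Int -> Int
maxCooperativeBlocks blocksPerMP mpCount = blocksPerMP * mpCount

-- | Total number of blocks in a (gx, gy, gz) grid.
gridBlocks :: (Int, Int, Int) -> Int
gridBlocks (gx, gy, gz) = gx * gy * gz

-- | True when the grid stays within the cooperative launch limit.
fitsCooperativeLaunch :: Int -> Int -> (Int, Int, Int) -> Bool
fitsCooperativeLaunch blocksPerMP mpCount grid =
  gridBlocks grid <= maxCooperativeBlocks blocksPerMP mpCount
```

For example, with made-up limits of 16 blocks per multiprocessor and 80 multiprocessors, a (1280, 1, 1) grid passes the check and a (1281, 1, 1) grid does not.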

http://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__EXEC.html#group__CUDA__EXEC_1g06d753134145c4584c0c62525c1894cb

Requires CUDA-9.0.

Since: 0.9.0.0

setBlockShape :: Fun -> (Int, Int, Int) -> IO ()

Deprecated: use launchKernel instead

Specify the (x,y,z) dimensions of the thread blocks that are created when the given kernel function is launched.

setSharedSize :: Fun -> Integer -> IO ()

Deprecated: use launchKernel instead

Set the number of bytes of dynamic shared memory to be available to each thread block when the function is launched.

setParams :: Fun -> [FunParam] -> IO ()

Deprecated: use launchKernel instead

Set the parameters that will be specified the next time the kernel is invoked.

launch :: Fun -> (Int, Int) -> Maybe Stream -> IO ()

Deprecated: use launchKernel instead

Invoke the kernel on a size (w,h) grid of blocks. Each block contains the number of threads specified by a previous call to setBlockShape. The launch may also be associated with a specific Stream.