cuda- FFI binding to the CUDA interface for programming NVIDIA GPUs

Copyright: [2009..2014] Trevor L. McDonell
Safe Haskell: None




Kernel execution control for the C-for-CUDA runtime interface


Kernel Execution

type Fun = FunPtr ()

A global device function.

Note that the use of a string naming a function was deprecated in CUDA 4.1 and removed in CUDA 5.0.

data FunAttributes




constSizeBytes :: !Int64
localSizeBytes :: !Int64
sharedSizeBytes :: !Int64
maxKernelThreadsPerBlock :: !Int

maximum block size that can be successfully launched (based on register usage)

numRegs :: !Int

number of registers required for each thread

data FunParam where

Kernel function parameters. Doubles will be converted to an internal float representation on devices that do not support doubles natively.


IArg :: !Int -> FunParam 
FArg :: !Float -> FunParam 
DArg :: !Double -> FunParam 
VArg :: Storable a => !a -> FunParam 
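As a rough illustration of the constructors above, the sketch below builds an argument list for a hypothetical kernel that takes an integer count, a single-precision factor, a double-precision offset and a 32-bit flag word. The module name `Foreign.CUDA.Runtime.Exec` and the helper `exampleParams` are assumptions made for this example; only the four constructors come from the listing above.

```haskell
import Data.Word (Word32)
-- Assumption: the types documented on this page are exported from this module.
import Foreign.CUDA.Runtime.Exec (FunParam(..))

-- | Hypothetical helper: package the arguments of an imagined kernel
-- in the order the kernel expects to receive them.
exampleParams :: Int -> Float -> Double -> Word32 -> [FunParam]
exampleParams n alpha beta flags =
  [ IArg n      -- integer argument
  , FArg alpha  -- single-precision argument
  , DArg beta   -- double-precision argument
  , VArg flags  -- any other Storable value, here a Word32
  ]
```

The list order matters because the parameters are passed to the kernel positionally.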

data CacheConfig

Cache configuration preference



attributes :: Fun -> IO FunAttributes

Obtain the attributes of the named global device function. This itemises the requirements to successfully launch the given kernel.
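One possible use of attributes is to validate a block shape before launching. The sketch below is only an illustration: the module name `Foreign.CUDA.Runtime.Exec` and the helper `checkBlockSize` are assumptions, while `attributes` and the `maxKernelThreadsPerBlock` field are taken from the signatures on this page.

```haskell
import Control.Monad (when)
-- Assumption: the functions documented on this page are exported from this module.
import Foreign.CUDA.Runtime.Exec (Fun, FunAttributes(..), attributes)

-- | Hypothetical helper: raise an IO error if the requested block shape
-- contains more threads than the kernel reports it can be launched with.
checkBlockSize :: Fun -> (Int, Int, Int) -> IO ()
checkBlockSize fn (tx, ty, tz) = do
  attrs <- attributes fn
  let requested = tx * ty * tz
      limit     = maxKernelThreadsPerBlock attrs
  when (requested > limit) $
    ioError . userError $
      "block of " ++ show requested ++ " threads exceeds the kernel limit of "
        ++ show limit
```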

setConfig
  :: (Int, Int)         -- grid dimensions
  -> (Int, Int, Int)    -- block dimensions
  -> Int64              -- shared memory per block (bytes)
  -> Maybe Stream       -- associated processing stream
  -> IO ()

Specify the grid and block dimensions for a device call. Used in conjunction with setParams, this pushes data onto the execution stack that will be popped when a function is launched.

setParams :: [FunParam] -> IO ()

Set the argument parameters that will be passed to the next kernel invocation. This is used in conjunction with setConfig to control kernel execution.

setCacheConfig :: Fun -> CacheConfig -> IO ()

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function.

Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.

launch :: Fun -> IO ()

Invoke the global kernel function on the device. This must be preceded by a call to setConfig and (if appropriate) setParams.
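The three-step sequence of setConfig, setParams and launch might look like the sketch below. It assumes the functions are exported from `Foreign.CUDA.Runtime.Exec`, that the kernel handle `fn` was obtained elsewhere, and that the kernel takes a single integer element count. The helper `runStaged` and the block size of 256 threads are illustrative choices, not something this page prescribes.

```haskell
-- Assumption: the functions documented on this page are exported from this module.
import Foreign.CUDA.Runtime.Exec (Fun, FunParam(..), setConfig, setParams, launch)

-- | Hypothetical helper: launch a one-dimensional kernel over n elements
-- using the staged configure / set parameters / launch sequence.
runStaged :: Fun -> Int -> IO ()
runStaged fn n = do
  -- 1. Push the launch configuration: grid, block, no dynamic shared
  --    memory, no explicit stream.
  setConfig (blocks, 1) (threads, 1, 1) 0 Nothing
  -- 2. Push the kernel arguments.
  setParams [IArg n]
  -- 3. Launch, which consumes the configuration and arguments set above.
  launch fn
  where
    threads = 256
    -- Round up so that every element is covered by some thread.
    blocks  = (n + threads - 1) `div` threads
```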

launchKernel
  :: Fun                -- Device function symbol
  -> (Int, Int)         -- grid dimensions
  -> (Int, Int, Int)    -- thread block shape
  -> Int64              -- shared memory per block (bytes)
  -> Maybe Stream       -- (optional) execution stream
  -> [FunParam]
  -> IO ()

Invoke a kernel on a (gx * gy) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.
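As an illustration of the argument order, the sketch below launches a hypothetical two-dimensional kernel over a width by height domain. The module name `Foreign.CUDA.Runtime.Exec`, the helper `launch2D`, the 16 x 16 block shape and the assumption that the kernel takes the width and height as its two integer arguments are all choices made for this example.

```haskell
-- Assumption: the functions documented on this page are exported from this module.
import Foreign.CUDA.Runtime.Exec (Fun, FunParam(..), launchKernel)

-- | Hypothetical helper: cover a (width x height) domain with 16 x 16
-- thread blocks and launch the kernel in a single call.
launch2D :: Fun -> Int -> Int -> IO ()
launch2D fn width height =
  launchKernel fn (gx, gy) (tx, ty, 1) 0 Nothing [IArg width, IArg height]
  where
    tx = 16
    ty = 16
    -- Round up so that partial tiles at the edges are still covered.
    gx = (width  + tx - 1) `div` tx
    gy = (height + ty - 1) `div` ty
```

Because the grid is rounded up, a kernel launched this way would normally need to bounds-check its own indices against the width and height it receives.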