cuda-0.6.0.1: FFI binding to the CUDA interface for programming NVIDIA GPUs

Copyright: (c) [2009..2012] Trevor L. McDonell
License: BSD
Safe Haskell: None
Language: Haskell98

Foreign.CUDA.Runtime.Exec

Description

Kernel execution control for the C-for-CUDA runtime interface.

Kernel Execution

type Fun = FunPtr ()

A global device function.

Note that the use of a string naming a function was deprecated in CUDA 4.1 and removed in CUDA 5.0.

data FunAttributes

Constructors

FunAttributes 

Fields

constSizeBytes :: !Int64
 
localSizeBytes :: !Int64
 
sharedSizeBytes :: !Int64
 
maxKernelThreadsPerBlock :: !Int

maximum block size that can be successfully launched (based on register usage)

numRegs :: !Int

number of registers required for each thread

data FunParam where

Kernel function parameters. Doubles will be converted to an internal float representation on devices that do not support doubles natively.

Constructors

IArg :: !Int -> FunParam 
FArg :: !Float -> FunParam 
DArg :: !Double -> FunParam 
VArg :: Storable a => !a -> FunParam 
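A parameter list is an ordinary Haskell list of these constructors. The sketch below builds one for a hypothetical kernel `saxpy(float alpha, float *xs, float *ys, int n)`; the kernel, its signature, and the helper name `saxpyArgs` are assumptions made for illustration. Pointer-like arguments go through VArg, which accepts any Storable value.

```haskell
import Foreign.CUDA.Runtime.Exec (FunParam (..))
import Foreign.Storable (Storable)

-- | Arguments for a hypothetical kernel
--   saxpy(float alpha, float *xs, float *ys, int n).
-- The pointer type p is left abstract; any Storable value can go in VArg.
saxpyArgs :: Storable p => Float -> p -> p -> Int -> [FunParam]
saxpyArgs alpha xs ys n = [FArg alpha, VArg xs, VArg ys, IArg n]
```

The resulting list can be handed to setParams or passed as the final argument of launchKernel.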

data CacheConfig

Cache configuration preference

Constructors

None 
Shared 
L1 
Equal 

attributes :: Fun -> IO FunAttributes

Obtain the attributes of the given global device function. This itemises the requirements to successfully launch the given kernel.
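One possible use of the returned record is to validate a block shape before launching. This is a minimal sketch: the helper names `fitsBlock` and `checkBlock` are hypothetical, and the caller is assumed to already hold a valid Fun.

```haskell
import Foreign.CUDA.Runtime.Exec

-- | True when a (tx, ty, tz) block shape stays within the kernel's
-- reported per-block thread limit.
fitsBlock :: FunAttributes -> (Int, Int, Int) -> Bool
fitsBlock attrs (tx, ty, tz) = tx * ty * tz <= maxKernelThreadsPerBlock attrs

-- | Query the kernel and report whether the requested block shape is usable.
checkBlock :: Fun -> (Int, Int, Int) -> IO Bool
checkBlock kernel shape = do
  attrs <- attributes kernel
  return (fitsBlock attrs shape)
```

Keeping the comparison in a pure function means it can be exercised without a device, while only `checkBlock` touches the runtime.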

setConfig
  :: (Int, Int)        -- grid dimensions
  -> (Int, Int, Int)   -- block dimensions
  -> Int64             -- shared memory per block (bytes)
  -> Maybe Stream      -- associated processing stream
  -> IO ()

Specify the grid and block dimensions for a device call. Used in conjunction with setParams, this pushes data onto the execution stack that will be popped when a function is launched.

setParams :: [FunParam] -> IO ()

Set the argument parameters that will be passed to the next kernel invocation. This is used in conjunction with setConfig to control kernel execution.

setCacheConfig :: Fun -> CacheConfig -> IO ()

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function.

Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.
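As an illustration, a preference could be derived from the kernel's own attributes. The policy below is hypothetical, not a recommendation from the library, and it assumes the Shared and L1 constructors mirror the CUDA runtime's "prefer shared memory" and "prefer L1" settings.

```haskell
import Data.Int (Int64)
import Foreign.CUDA.Runtime.Exec

-- | Hypothetical policy: kernels that report any shared memory usage ask
-- for the larger shared-memory split; all others ask for the larger L1.
preferenceFor :: Int64 -> CacheConfig
preferenceFor sharedBytes
  | sharedBytes > 0 = Shared
  | otherwise       = L1

-- | Look up the kernel's shared memory usage and record the preference.
tuneCache :: Fun -> IO ()
tuneCache kernel = do
  attrs <- attributes kernel
  setCacheConfig kernel (preferenceFor (sharedSizeBytes attrs))
```

Because the setting is only a preference, code using such a helper should not rely on the configuration actually being applied.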

launch :: Fun -> IO ()

Invoke the global kernel function on the device. This must be preceded by a call to setConfig and (if appropriate) setParams.
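The three-step sequence of setConfig, setParams, and launch can be sketched as follows for a one-dimensional problem. The helper names `blocksFor` and `launch1D`, the block size of 256 threads, and the choice of no dynamic shared memory and no explicit stream are all assumptions made for this example.

```haskell
import Foreign.CUDA.Runtime.Exec

-- | Number of blocks needed to cover n elements (rounds up).
blocksFor :: Int -> Int -> Int
blocksFor n threads = (n + threads - 1) `div` threads

-- | Stack-style launch over a 1D range: configure, push arguments, launch.
launch1D :: Fun -> Int -> [FunParam] -> IO ()
launch1D kernel n args = do
  let threads = 256                       -- assumed block size
  setConfig (blocksFor n threads, 1) (threads, 1, 1) 0 Nothing
  setParams args
  launch kernel
```

Since the grid is rounded up, the kernel itself is expected to guard against thread indices at or beyond n.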

launchKernel
  :: Fun               -- Device function symbol
  -> (Int, Int)        -- grid dimensions
  -> (Int, Int, Int)   -- thread block shape
  -> Int64             -- shared memory per block (bytes)
  -> Maybe Stream      -- (optional) execution stream
  -> [FunParam]
  -> IO ()

Invoke a kernel on a (gx * gy) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.
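For a two-dimensional domain, the grid size can be computed from the domain size and block shape and then passed straight to launchKernel. This is a sketch only: `gridFor` and `launch2D` are hypothetical helpers, and the 16 x 16 block shape, zero bytes of shared memory, and absence of a stream are assumed values.

```haskell
import Foreign.CUDA.Runtime.Exec

-- | Grid size (gx, gy) needed to cover a width x height domain with
-- blocks of bx x by threads, rounding up in each dimension.
gridFor :: (Int, Int) -> (Int, Int) -> (Int, Int)
gridFor (w, h) (bx, by) = (ceilDiv w bx, ceilDiv h by)
  where ceilDiv a b = (a + b - 1) `div` b

-- | One-shot launch over a 2D domain with no explicit stream.
launch2D :: Fun -> (Int, Int) -> [FunParam] -> IO ()
launch2D kernel size args =
  launchKernel kernel (gridFor size block) (bx, by, 1) 0 Nothing args
  where
    block@(bx, by) = (16, 16)             -- assumed block shape
```

Compared with the setConfig, setParams, and launch sequence, this form keeps the configuration and arguments in a single call.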