Copyright	[2009..2014] Trevor L. McDonell
License	BSD
Safe Haskell	None
Language	Haskell98

Foreign.CUDA.Runtime.Exec

Contents

Kernel Execution

Description

Kernel execution control for the C-for-CUDA runtime interface.

Kernel Execution

type Fun = FunPtr ()

A global device function.

Note that the use of a string naming a function was deprecated in CUDA 4.1 and removed in CUDA 5.0.

data FunAttributes

Constructors

FunAttributes
Fields

constSizeBytes :: !Int64
localSizeBytes :: !Int64
sharedSizeBytes :: !Int64
maxKernelThreadsPerBlock :: !Int	maximum block size that can be successfully launched (based on register usage)
numRegs :: !Int	number of registers required for each thread

Instances

Show FunAttributes
Storable FunAttributes
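
The maxKernelThreadsPerBlock field can be used to reject a block shape before attempting a launch. The following is a minimal sketch of such a check; blockFits is a hypothetical helper that is not part of this module, and it takes the limit as a plain Int so that it can be tried without a CUDA installation.

```haskell
-- Hypothetical helper; NOT part of Foreign.CUDA.Runtime.Exec.
-- Decide whether a (tx, ty, tz) thread block stays within a kernel's
-- per-block thread limit.
blockFits
  :: Int              -- limit, e.g. the maxKernelThreadsPerBlock field
  -> (Int, Int, Int)  -- proposed block dimensions
  -> Bool
blockFits limit (tx, ty, tz) =
  -- every dimension must be positive, and the product is the number of
  -- threads in one block
  all (> 0) [tx, ty, tz] && tx * ty * tz <= limit
```

With the library in scope, the limit would presumably come from something like maxKernelThreadsPerBlock applied to the result of attributes for the kernel in question.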

data FunParam where

Kernel function parameters. Doubles will be converted to an internal float representation on devices that do not support doubles natively.

Constructors

IArg :: !Int -> FunParam
FArg :: !Float -> FunParam
DArg :: !Double -> FunParam
VArg :: Storable a => !a -> FunParam

data CacheConfig

Cache configuration preference

Constructors

None
Shared
L1
Equal

Instances

Enum CacheConfig
Eq CacheConfig
Show CacheConfig

attributes :: Fun -> IO FunAttributes

Obtain the attributes of the given global device function. This itemises the requirements to successfully launch the kernel.

setConfig

Arguments

:: (Int, Int)	grid dimensions
-> (Int, Int, Int)	block dimensions
-> Int64	shared memory per block (bytes)
-> Maybe Stream	associated processing stream
-> IO ()

Specify the grid and block dimensions for a device call. Used in conjunction with setParams, this pushes data onto the execution stack that will be popped when a function is launched.

setParams :: [FunParam] -> IO ()

Set the argument parameters that will be passed to the next kernel invocation. This is used in conjunction with setConfig to control kernel execution.

setCacheConfig :: Fun -> CacheConfig -> IO ()

On devices where the L1 cache and shared memory use the same hardware resources, this sets the preferred cache configuration for the given device function. This is only a preference; the driver is free to choose a different configuration as required to execute the function.

Switching between configuration modes may insert a device-side synchronisation point for streamed kernel launches.

launch :: Fun -> IO ()

Invoke the global kernel function on the device. This must be preceded by a call to setConfig and (if appropriate) setParams.
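
The required call order (setConfig, then setParams, then launch) can be sketched as below. This is an illustration under stated assumptions, not the library's own code: ExecApi and launchWith are hypothetical names, and the three actions are passed in as a record so that the sketch compiles against base alone. It is kept polymorphic in the monad for the same reason; with this module, m would be IO.

```haskell
import Data.Int (Int64)

-- Hypothetical record bundling the three runtime actions. The field types
-- mirror the signatures of setConfig, setParams and launch documented here,
-- with the concrete Fun, Stream and FunParam types left as parameters.
data ExecApi m fun stream param = ExecApi
  { apiSetConfig :: (Int, Int) -> (Int, Int, Int) -> Int64 -> Maybe stream -> m ()
  , apiSetParams :: [param] -> m ()
  , apiLaunch    :: fun -> m ()
  }

-- Hypothetical helper: configure, push arguments, then launch, in that order.
launchWith
  :: Monad m
  => ExecApi m fun stream param
  -> fun              -- kernel to invoke
  -> (Int, Int)       -- grid dimensions
  -> (Int, Int, Int)  -- block dimensions
  -> [param]          -- kernel arguments
  -> m ()
launchWith api kernel grid block params = do
  -- 0 bytes of shared memory per block and no associated stream
  apiSetConfig api grid block 0 Nothing
  apiSetParams api params
  apiLaunch api kernel
```

Assuming the signatures listed on this page, the record would be filled in as ExecApi setConfig setParams launch.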

launchKernel

Arguments

:: Fun	Device function symbol
-> (Int, Int)	grid dimensions
-> (Int, Int, Int)	thread block shape
-> Int64	shared memory per block (bytes)
-> Maybe Stream	(optional) execution stream
-> [FunParam]
-> IO ()

Invoke a kernel on a (gx * gy) grid of blocks, where each block contains (tx * ty * tz) threads and has access to a given number of bytes of shared memory. The launch may also be associated with a specific Stream.
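
Choosing (gx, gy) usually means rounding the problem size up to a whole number of blocks. A minimal sketch of that arithmetic follows; gridFor and totalThreads are hypothetical helpers that are not part of this module, and the kernel is assumed to guard against indices past the edge of the problem, since the rounded-up grid may contain surplus threads.

```haskell
-- Hypothetical helpers; NOT part of Foreign.CUDA.Runtime.Exec.

-- Smallest (gx, gy) grid whose blocks cover a (width x height) problem,
-- given the (tx, ty, tz) thread block shape. Assumes tx and ty are positive.
gridFor :: (Int, Int) -> (Int, Int, Int) -> (Int, Int)
gridFor (width, height) (tx, ty, _) = (ceilDiv width tx, ceilDiv height ty)
  where
    -- integer division rounding up, for non-negative a and positive b
    ceilDiv a b = (a + b - 1) `div` b

-- Number of threads started by a launch with the given grid and block shape.
totalThreads :: (Int, Int) -> (Int, Int, Int) -> Int
totalThreads (gx, gy) (tx, ty, tz) = gx * gy * tx * ty * tz
```

The pair returned by gridFor has the same shape as the grid dimensions argument of launchKernel, so it could be passed there directly alongside the block shape used to compute it.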