cuda-0.6.6.1: FFI binding to the CUDA interface for programming NVIDIA GPUs

Copyright: [2009..2014] Trevor L. McDonell
License: BSD
Safe Haskell: None
Language: Haskell98

Foreign.CUDA.Runtime.Marshal

Description

Memory management for CUDA devices

Host Allocation

data AllocFlag

Options for host allocation

mallocHostArray :: Storable a => [AllocFlag] -> Int -> IO (HostPtr a)

Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type. The runtime system automatically accelerates calls to functions such as peekArrayAsync and pokeArrayAsync that refer to page-locked memory.

Note that because this reduces the amount of pageable memory available, overall system performance may suffer. It is best used sparingly, to allocate staging areas for data exchange.
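The staging-area pattern described above might look like the following minimal sketch. It assumes the helper withHostPtr and the HostPtr type are exported from Foreign.CUDA.Ptr; the names withStaging and stageAndRead are hypothetical, and the empty flag list is used because the AllocFlag constructors are not listed here.

```haskell
import Control.Exception            (bracket)
import Foreign.CUDA.Ptr             (HostPtr, withHostPtr)  -- assumed exports
import Foreign.CUDA.Runtime.Marshal (mallocHostArray, freeHost)
import qualified Foreign.Marshal.Array as F

-- | Run an action with a page-locked buffer of n Floats, releasing the
--   buffer afterwards even if the action throws (hypothetical helper).
withStaging :: Int -> (HostPtr Float -> IO b) -> IO b
withStaging n = bracket (mallocHostArray [] n) freeHost

-- | Write a list into a page-locked staging buffer and read it back
--   through an ordinary host pointer.
stageAndRead :: [Float] -> IO [Float]
stageAndRead xs =
  withStaging (length xs) $ \hp ->
    withHostPtr hp $ \p -> do
      F.pokeArray p xs
      F.peekArray (length xs) p
```

Wrapping the allocation in bracket keeps the page-locked region short-lived, which fits the advice to use it sparingly.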

freeHost :: HostPtr a -> IO ()

Free page-locked host memory previously allocated with mallocHostArray.

Device Allocation

mallocArray :: Storable a => Int -> IO (DevicePtr a)

Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of a storable type. It is suitably aligned, and not cleared.

allocaArray :: Storable a => Int -> (DevicePtr a -> IO b) -> IO b

Execute a computation, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of storable type. The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this.

Note that kernel launches can be asynchronous, so you may need to add a synchronisation point at the end of the computation.
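A minimal sketch of the scoped-allocation pattern, with an explicit synchronisation point before the buffer goes out of scope. It assumes that sync is exported from Foreign.CUDA.Runtime.Device; the name scratchRoundTrip is hypothetical, and no kernel is actually launched here.

```haskell
import Foreign.CUDA.Runtime.Device (sync)  -- assumed export
import qualified Foreign.CUDA.Runtime.Marshal as CUDA

-- | Copy a list into a temporary device buffer and read it back.
--   The buffer is freed when the inner action returns.
scratchRoundTrip :: [Float] -> IO [Float]
scratchRoundTrip xs =
  let n = length xs
  in  CUDA.allocaArray n $ \d_buf -> do
        CUDA.pokeListArray xs d_buf
        -- A kernel launch that uses d_buf would go here. Because launches
        -- can be asynchronous, wait for the device before leaving the scope.
        sync
        CUDA.peekListArray n d_buf
```

The result is a plain list rather than the device pointer, so nothing refers to the freed memory after the call returns.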

free :: DevicePtr a -> IO ()

Free previously allocated memory on the device.

Unified Memory Allocation

data AttachFlag

Options for unified memory allocations

Constructors

Global 
Host 
Single 

mallocManagedArray :: Storable a => [AttachFlag] -> Int -> IO (DevicePtr a)

Allocate memory that will be automatically managed by the Unified Memory system.

Marshalling

peekArray :: Storable a => Int -> DevicePtr a -> Ptr a -> IO ()

Copy a number of elements from the device to host memory. This is a synchronous operation.

peekArrayAsync :: Storable a => Int -> DevicePtr a -> HostPtr a -> Maybe Stream -> IO ()

Copy memory from the device asynchronously, possibly associated with a particular stream. The destination memory must be page-locked.

peekArray2D

Arguments

:: Storable a
=> Int           -- width to copy (elements)
-> Int           -- height to copy (elements)
-> DevicePtr a   -- source array
-> Int           -- source array width
-> Ptr a         -- destination array
-> Int           -- destination array width
-> IO ()

Copy a 2D memory area from the device to the host. This is a synchronous operation.
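The distinction between the width to copy and the array widths is easiest to see when extracting a sub-block. The sketch below assumes, as the argument descriptions suggest, that the array widths are row strides measured in elements; the name leftColumns is hypothetical.

```haskell
import qualified Foreign.CUDA.Runtime.Marshal as CUDA
import qualified Foreign.Marshal.Array        as F

-- | Copy the leftmost w columns of an h-row matrix, stored row-major on
--   the device with `pitch` elements per row, into a tightly packed list.
leftColumns :: Int -> Int -> Int -> [Float] -> IO [Float]
leftColumns w h pitch xs =
  CUDA.withListArray xs $ \d_src ->
    F.allocaArray (w * h) $ \dst -> do
      -- source rows are `pitch` elements apart, destination rows w apart
      CUDA.peekArray2D w h d_src pitch dst w
      F.peekArray (w * h) dst
```

Under that assumption, leftColumns 2 3 4 [0..11] should yield the first two entries of each of the three rows, that is [0,1,4,5,8,9].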

peekArray2DAsync

Arguments

:: Storable a
=> Int           -- width to copy (elements)
-> Int           -- height to copy (elements)
-> DevicePtr a   -- source array
-> Int           -- source array width
-> HostPtr a     -- destination array
-> Int           -- destination array width
-> Maybe Stream
-> IO ()

Copy a 2D memory area from the device to the host asynchronously, possibly associated with a particular stream. The destination array must be page-locked.

peekListArray :: Storable a => Int -> DevicePtr a -> IO [a]

Copy a number of elements from the device into a new Haskell list. Note that this requires two memory copies: first from the device into a heap-allocated array, and from there marshalled into a list.

pokeArray :: Storable a => Int -> Ptr a -> DevicePtr a -> IO ()

Copy a number of elements onto the device. This is a synchronous operation.
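Several names in this module (peekArray, pokeArray, allocaArray, mallocArray, and others) coincide with those in Foreign.Marshal.Array from base, so qualified imports help when both are needed. The following is a minimal sketch of a synchronous round trip through ordinary host pointers; the name copyThroughDevice is hypothetical, and for brevity the device buffer is not protected against exceptions.

```haskell
import qualified Foreign.CUDA.Runtime.Marshal as CUDA
import qualified Foreign.Marshal.Array        as F

-- | Upload a list to the device and download it again using the
--   synchronous pokeArray / peekArray pair.
copyThroughDevice :: [Double] -> IO [Double]
copyThroughDevice xs = do
  let n = length xs
  d_xs <- CUDA.mallocArray n
  -- host -> device: marshal the list into a temporary host array first
  F.withArray xs $ \src -> CUDA.pokeArray n src d_xs
  -- device -> host: copy into a temporary host array, then read a list
  ys <- F.allocaArray n $ \dst -> do
    CUDA.peekArray n d_xs dst
    F.peekArray n dst
  CUDA.free d_xs
  return ys
```

Note the argument order: in both copies the source precedes the destination, so the device pointer comes last in pokeArray and first (after the count) in peekArray.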

pokeArrayAsync :: Storable a => Int -> HostPtr a -> DevicePtr a -> Maybe Stream -> IO ()

Copy memory onto the device asynchronously, possibly associated with a particular stream. The source memory must be page-locked.
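A minimal sketch of the asynchronous copies using page-locked buffers. It assumes that passing Nothing selects the default stream (so the two copies execute in issue order), and that withHostPtr and sync are exported from Foreign.CUDA.Ptr and Foreign.CUDA.Runtime.Device respectively; the name asyncRoundTrip is hypothetical and the buffers are not protected against exceptions.

```haskell
import Foreign.CUDA.Ptr            (withHostPtr)  -- assumed export
import Foreign.CUDA.Runtime.Device (sync)         -- assumed export
import qualified Foreign.CUDA.Runtime.Marshal as CUDA
import qualified Foreign.Marshal.Array        as F

-- | Send a list to the device and back using asynchronous copies
--   between page-locked host buffers and a device buffer.
asyncRoundTrip :: [Float] -> IO [Float]
asyncRoundTrip xs = do
  let n = length xs
  h_src <- CUDA.mallocHostArray [] n
  h_dst <- CUDA.mallocHostArray [] n
  d_buf <- CUDA.mallocArray n
  withHostPtr h_src $ \p -> F.pokeArray p xs
  CUDA.pokeArrayAsync n h_src d_buf Nothing
  CUDA.peekArrayAsync n d_buf h_dst Nothing
  -- both calls may return before the copies finish; wait before reading
  sync
  ys <- withHostPtr h_dst $ \p -> F.peekArray n p
  CUDA.free d_buf
  CUDA.freeHost h_src
  CUDA.freeHost h_dst
  return ys
```

Reading h_dst before the call to sync would risk observing a partially completed transfer.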

pokeArray2D

Arguments

:: Storable a
=> Int           -- width to copy (elements)
-> Int           -- height to copy (elements)
-> Ptr a         -- source array
-> Int           -- source array width
-> DevicePtr a   -- destination array
-> Int           -- destination array width
-> IO ()

Copy a 2D memory area onto the device. This is a synchronous operation.

pokeArray2DAsync

Arguments

:: Storable a
=> Int           -- width to copy (elements)
-> Int           -- height to copy (elements)
-> HostPtr a     -- source array
-> Int           -- source array width
-> DevicePtr a   -- destination array
-> Int           -- destination array width
-> Maybe Stream
-> IO ()

Copy a 2D memory area onto the device asynchronously, possibly associated with a particular stream. The source array must be page-locked.

pokeListArray :: Storable a => [a] -> DevicePtr a -> IO ()

Write a list of storable elements into a device array. The array must be sufficiently large to hold the entire list. This requires two marshalling operations.

copyArray :: Storable a => Int -> DevicePtr a -> DevicePtr a -> IO ()

Copy the given number of elements from the first device array (source) to the second (destination). The copied areas must not overlap. This operation is asynchronous with respect to the host, but will not overlap other device operations.
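A minimal sketch of a device-to-device copy into a separately allocated (and therefore non-overlapping) buffer. The name duplicateOnDevice is hypothetical, and the buffers are not protected against exceptions.

```haskell
import Data.Int (Int32)
import qualified Foreign.CUDA.Runtime.Marshal as CUDA

-- | Upload a list, duplicate it on the device, and read the copy back.
duplicateOnDevice :: [Int32] -> IO [Int32]
duplicateOnDevice xs = do
  (d_src, n) <- CUDA.newListArrayLen xs
  d_dst      <- CUDA.mallocArray n
  -- source first, destination second
  CUDA.copyArray n d_src d_dst
  -- a device operation issued after the copy, so it observes the result
  ys <- CUDA.peekListArray n d_dst
  CUDA.free d_src
  CUDA.free d_dst
  return ys
```

Because the copy does not overlap other device operations, the subsequent read of d_dst does not need a separate synchronisation call in this sketch.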

copyArrayAsync :: Storable a => Int -> DevicePtr a -> DevicePtr a -> Maybe Stream -> IO ()

Copy the given number of elements from the first device array (source) to the second (destination). The copied areas must not overlap. This operation is asynchronous with respect to the host, and may be associated with a particular stream.

copyArray2D

Arguments

:: Storable a
=> Int           -- width to copy (elements)
-> Int           -- height to copy (elements)
-> DevicePtr a   -- source array
-> Int           -- source array width
-> DevicePtr a   -- destination array
-> Int           -- destination array width
-> IO ()

Copy a 2D memory area from the first device array (source) to the second (destination). The copied areas must not overlap. This operation is asynchronous with respect to the host, but will not overlap other device operations.

copyArray2DAsync

Arguments

:: Storable a
=> Int           -- width to copy (elements)
-> Int           -- height to copy (elements)
-> DevicePtr a   -- source array
-> Int           -- source array width
-> DevicePtr a   -- destination array
-> Int           -- destination array width
-> Maybe Stream
-> IO ()

Copy a 2D memory area from the first device array (source) to the second device array (destination). The copied areas must not overlap. This operation is asynchronous with respect to the host, and may be associated with a particular stream.

Combined Allocation and Marshalling

newListArray :: Storable a => [a] -> IO (DevicePtr a)

Write a list of storable elements into a newly allocated device array. This is newListArrayLen composed with fst.

newListArrayLen :: Storable a => [a] -> IO (DevicePtr a, Int)

Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two copy operations: firstly from a Haskell list into a heap-allocated array, and from there into device memory. The array should be freed when no longer required.

withListArray :: Storable a => [a] -> (DevicePtr a -> IO b) -> IO b

Temporarily store a list of elements in a newly allocated device array. An IO action is applied to the array, the result of which is returned. As with newListArray, this requires two marshalling operations of the data.

As with allocaArray, the memory is freed once the action completes, so you should not return the pointer from the action, and you should ensure that any asynchronous operations (such as kernel execution) have completed before it returns.

withListArrayLen :: Storable a => [a] -> (Int -> DevicePtr a -> IO b) -> IO b

A variant of withListArray which also supplies the number of elements in the array to the applied function.
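A minimal sketch of the scoped list-to-device pattern, using the supplied length to bound a read. The name firstElements is hypothetical.

```haskell
import qualified Foreign.CUDA.Runtime.Marshal as CUDA

-- | Upload a list temporarily and read back at most its first k elements.
--   The length n supplied by withListArrayLen guards against reading
--   past the end of the device array.
firstElements :: Int -> [Float] -> IO [Float]
firstElements k xs =
  CUDA.withListArrayLen xs $ \n d_xs ->
    CUDA.peekListArray (max 0 (min k n)) d_xs
```

Only the resulting list escapes the action; the device pointer itself stays inside the callback, as the note on withListArray requires.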

Utility

memset

Arguments

:: DevicePtr a   -- The device memory
-> Int64         -- Number of bytes
-> Int8          -- Value to set for each byte
-> IO ()

Initialise device memory to a given 8-bit value.
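Unlike the array functions above, memset counts bytes rather than elements, so the element count has to be scaled by the element size. A minimal sketch, in which the name zeros is hypothetical:

```haskell
import Data.Int         (Int32)
import Foreign.Storable (sizeOf)
import qualified Foreign.CUDA.Runtime.Marshal as CUDA

-- | Allocate a temporary device array of n Int32 values, clear every
--   byte to zero, and read the contents back.
zeros :: Int -> IO [Int32]
zeros n =
  CUDA.allocaArray n $ \d_buf -> do
    -- memset takes a byte count: elements * bytes per element
    let bytes = fromIntegral (n * sizeOf (undefined :: Int32))
    CUDA.memset d_buf bytes 0
    CUDA.peekListArray n d_buf
```

Since the value is applied per byte, only patterns whose bytes are all equal (such as zero) can be produced for multi-byte element types this way.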