cuda-0.7.5.2: FFI binding to the CUDA interface for programming NVIDIA GPUs

Copyright[2009..2014] Trevor L. McDonell
LicenseBSD
Safe HaskellNone
LanguageHaskell98

Foreign.CUDA.Runtime.Marshal

Contents

Description

Memory management for CUDA devices

Synopsis

Host Allocation

mallocHostArray :: Storable a => [AllocFlag] -> Int -> IO (HostPtr a) Source #

Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type. The runtime system automatically accelerates calls to functions such as peekArrayAsync and pokeArrayAsync that refer to page-locked memory.

Note that since the amount of pageable memory is thusly reduced, overall system performance may suffer. This is best used sparingly to allocate staging areas for data exchange

freeHost :: HostPtr a -> IO () Source #

Free page-locked host memory previously allocated with mallecHost

Device Allocation

mallocArray :: Storable a => Int -> IO (DevicePtr a) Source #

Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of storable type. It is suitable aligned, and not cleared.

allocaArray :: Storable a => Int -> (DevicePtr a -> IO b) -> IO b Source #

Execute a computation, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of storable type. The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this.

Note that kernel launches can be asynchronous, so you may need to add a synchronisation point at the end of the computation.

free :: DevicePtr a -> IO () Source #

Free previously allocated memory on the device

Unified Memory Allocation

mallocManagedArray :: Storable a => [AttachFlag] -> Int -> IO (DevicePtr a) Source #

Allocates memory that will be automatically managed by the Unified Memory system

Marshalling

peekArray :: Storable a => Int -> DevicePtr a -> Ptr a -> IO () Source #

Copy a number of elements from the device to host memory. This is a synchronous operation.

peekArrayAsync :: Storable a => Int -> DevicePtr a -> HostPtr a -> Maybe Stream -> IO () Source #

Copy memory from the device asynchronously, possibly associated with a particular stream. The destination memory must be page locked.

peekArray2D Source #

Arguments

:: Storable a 
=> Int

width to copy (elements)

-> Int

height to copy (elements)

-> DevicePtr a

source array

-> Int

source array width

-> Ptr a

destination array

-> Int

destination array width

-> IO () 

Copy a 2D memory area from the device to the host. This is a synchronous operation.

peekArray2DAsync Source #

Arguments

:: Storable a 
=> Int

width to copy (elements)

-> Int

height to copy (elements)

-> DevicePtr a

source array

-> Int

source array width

-> HostPtr a

destination array

-> Int

destination array width

-> Maybe Stream 
-> IO () 

Copy a 2D memory area from the device to the host asynchronously, possibly associated with a particular stream. The destination array must be page locked.

peekListArray :: Storable a => Int -> DevicePtr a -> IO [a] Source #

Copy a number of elements from the device into a new Haskell list. Note that this requires two memory copies: firstly from the device into a heap allocated array, and from there marshalled into a list

pokeArray :: Storable a => Int -> Ptr a -> DevicePtr a -> IO () Source #

Copy a number of elements onto the device. This is a synchronous operation.

pokeArrayAsync :: Storable a => Int -> HostPtr a -> DevicePtr a -> Maybe Stream -> IO () Source #

Copy memory onto the device asynchronously, possibly associated with a particular stream. The source memory must be page-locked.

pokeArray2D Source #

Arguments

:: Storable a 
=> Int

width to copy (elements)

-> Int

height to copy (elements)

-> Ptr a

source array

-> Int

source array width

-> DevicePtr a

destination array

-> Int

destination array width

-> IO () 

Copy a 2D memory area onto the device. This is a synchronous operation.

pokeArray2DAsync Source #

Arguments

:: Storable a 
=> Int

width to copy (elements)

-> Int

height to copy (elements)

-> HostPtr a

source array

-> Int

source array width

-> DevicePtr a

destination array

-> Int

destination array width

-> Maybe Stream 
-> IO () 

Copy a 2D memory area onto the device asynchronously, possibly associated with a particular stream. The source array must be page locked.

pokeListArray :: Storable a => [a] -> DevicePtr a -> IO () Source #

Write a list of storable elements into a device array. The array must be sufficiently large to hold the entire list. This requires two marshalling operations

copyArray :: Storable a => Int -> DevicePtr a -> DevicePtr a -> IO () Source #

Copy the given number of elements from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to host, but will not overlap other device operations.

copyArrayAsync :: Storable a => Int -> DevicePtr a -> DevicePtr a -> Maybe Stream -> IO () Source #

Copy the given number of elements from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, and may be associated with a particular stream.

copyArray2D Source #

Arguments

:: Storable a 
=> Int

width to copy (elements)

-> Int

height to copy (elements)

-> DevicePtr a

source array

-> Int

source array width

-> DevicePtr a

destination array

-> Int

destination array width

-> IO () 

Copy a 2D memory area from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will not overlap other device operations.

copyArray2DAsync Source #

Arguments

:: Storable a 
=> Int

width to copy (elements)

-> Int

height to copy (elements)

-> DevicePtr a

source array

-> Int

source array width

-> DevicePtr a

destination array

-> Int

destination array width

-> Maybe Stream 
-> IO () 

Copy a 2D memory area from the first device array (source) to the second device array (destination). The copied areas may not overlay. This operation is asynchronous with respect to the host, and may be associated with a particular stream.

Combined Allocation and Marshalling

newListArray :: Storable a => [a] -> IO (DevicePtr a) Source #

Write a list of storable elements into a newly allocated device array. This is newListArrayLen composed with fst.

newListArrayLen :: Storable a => [a] -> IO (DevicePtr a, Int) Source #

Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two copy operations: firstly from a Haskell list into a heap-allocated array, and from there into device memory. The array should be freed when no longer required.

withListArray :: Storable a => [a] -> (DevicePtr a -> IO b) -> IO b Source #

Temporarily store a list of elements into a newly allocated device array. An IO action is applied to the array, the result of which is returned. Similar to newListArray, this requires two marshalling operations of the data.

As with allocaArray, the memory is freed once the action completes, so you should not return the pointer from the action, and be sure that any asynchronous operations (such as kernel execution) have completed.

withListArrayLen :: Storable a => [a] -> (Int -> DevicePtr a -> IO b) -> IO b Source #

A variant of withListArray which also supplies the number of elements in the array to the applied function

Utility

memset Source #

Arguments

:: DevicePtr a

The device memory

-> Int64

Number of bytes

-> Int8

Value to set for each byte

-> IO () 

Initialise device memory to a given 8-bit value