cuda-0.6.5.0: FFI binding to the CUDA interface for programming NVIDIA GPUs

Copyright: (c) [2009..2012] Trevor L. McDonell
License: BSD
Safe Haskell: None
Language: Haskell98

Foreign.CUDA.Driver.Marshal

Description

Memory management for the low-level driver interface.

Host Allocation

data AllocFlag

Options for host allocation

mallocHostArray :: Storable a => [AllocFlag] -> Int -> IO (HostPtr a)

Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type.

Note that this reduces the amount of pageable memory available, so overall system performance may suffer. This is best used sparingly to allocate staging areas for data exchange.

freeHost :: HostPtr a -> IO ()

Free a section of page-locked host memory

registerArray :: Storable a => [AllocFlag] -> Int -> Ptr a -> IO (HostPtr a)

Page-locks the specified array (on the host) and maps it for the device(s) as specified by the given allocation flags. The device then accesses the memory directly, so it can be read and written with much higher bandwidth than pageable memory that has not been registered. The memory range is added to the same tracking mechanism as mallocHostArray to automatically accelerate calls to functions such as pokeArray.

Note that page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of pageable memory available. This is best used sparingly to allocate staging areas for data exchange.

This function is not yet implemented on Mac OS X. Requires cuda-4.0.

unregisterArray :: HostPtr a -> IO (Ptr a)

Unmaps the memory from the given pointer, and makes it pageable again.

This function is not yet implemented on Mac OS X. Requires cuda-4.0.

Device Allocation

mallocArray :: Storable a => Int -> IO (DevicePtr a)

Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of storable type. It is suitably aligned for any type, and is not cleared.

allocaArray :: Storable a => Int -> (DevicePtr a -> IO b) -> IO b

Execute a computation on the device, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of storable type. The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this.

Note that kernel launches can be asynchronous, so you may want to add a synchronisation point using sync as part of the computation.
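As an illustration of the scoping rule above, here is a minimal sketch that round-trips a list through temporary device memory. It assumes the top-level Foreign.CUDA.Driver module re-exports this module together with initialise, device, create, destroy and sync, that device 0 exists, and that the input list is non-empty; roundTrip is a hypothetical helper name.

```haskell
import           Control.Exception   (bracket)
import qualified Foreign.CUDA.Driver as CUDA

-- Hypothetical helper: copy a list to the device and straight back.
-- The device pointer never escapes the allocaArray callback.
roundTrip :: [Float] -> IO [Float]
roundTrip xs =
  CUDA.allocaArray n $ \dptr -> do
    CUDA.pokeListArray xs dptr
    -- A kernel launch that uses dptr would go here. Because launches
    -- can be asynchronous, synchronise before the memory is released.
    CUDA.sync
    CUDA.peekListArray n dptr
  where
    n = length xs

main :: IO ()
main = do
  CUDA.initialise []
  dev <- CUDA.device 0               -- assumption: at least one device
  bracket (CUDA.create dev []) CUDA.destroy $ \_ ->
    roundTrip [1, 2, 3] >>= print
```

Returning the list rather than the pointer keeps the result valid after the temporary allocation has been freed.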

free :: DevicePtr a -> IO ()

Release a section of device memory

Unified Memory Allocation

data AttachFlag

Options for unified memory allocations

Constructors

Global 
Host 
Single 

mallocManagedArray :: Storable a => [AttachFlag] -> Int -> IO (DevicePtr a)

Allocates memory that will be automatically managed by the Unified Memory system

Marshalling

peekArray :: Storable a => Int -> DevicePtr a -> Ptr a -> IO ()

Copy a number of elements from the device to host memory. This is a synchronous operation.

peekArrayAsync :: Storable a => Int -> DevicePtr a -> HostPtr a -> Maybe Stream -> IO ()

Copy memory from the device asynchronously, possibly associated with a particular stream. The destination host memory must be page-locked.
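A sketch of an asynchronous download might look like the following. It assumes a current context, a caller-supplied stream, and a page-locked destination buffer (for example one obtained from mallocHostArray); it also assumes that Foreign.CUDA.Driver.Stream exports block and that Foreign.CUDA.Ptr exports withHostPtr. The name download is hypothetical.

```haskell
import           Foreign.Marshal.Array       (peekArray)
import           Foreign.CUDA.Ptr            (DevicePtr, HostPtr, withHostPtr)
import           Foreign.CUDA.Driver.Stream  (Stream, block)
import qualified Foreign.CUDA.Driver.Marshal as CUDA

-- Hypothetical helper: start an asynchronous device-to-host copy on the
-- given stream, wait for that stream, then read the pinned buffer.
download :: Int -> DevicePtr Float -> HostPtr Float -> Stream -> IO [Float]
download n src dst stream = do
  CUDA.peekArrayAsync n src dst (Just stream)
  -- The host buffer is only safe to read once the copy has finished.
  block stream
  withHostPtr dst (peekArray n)
```

In real code, other host work would typically be placed between the copy and the call to block, which is where the asynchronous variant pays off.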

peekArray2D
  :: Storable a
  => Int          -- width to copy (elements)
  -> Int          -- height to copy (elements)
  -> DevicePtr a  -- source array
  -> Int          -- source array width
  -> Int          -- source x-coordinate
  -> Int          -- source y-coordinate
  -> Ptr a        -- destination array
  -> Int          -- destination array width
  -> Int          -- destination x-coordinate
  -> Int          -- destination y-coordinate
  -> IO ()

Copy a 2D array from the device to the host.
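To make the region parameters concrete, the sketch below lists which linear element indices of an array a 2D copy would touch. It assumes a row-major layout in which the "array width" argument is the row stride in elements; that is a reading of the parameter names, not something this page states. regionIndices is a hypothetical helper and uses no CUDA calls.

```haskell
-- Hypothetical model of a 2D copy region: the linear (row-major) element
-- indices covered by a w-by-h window whose top-left corner is at (x, y)
-- in an array with 'arrayWidth' elements per row.
regionIndices
  :: Int    -- width to copy (elements)
  -> Int    -- height to copy (elements)
  -> Int    -- array width (elements per row)
  -> Int    -- x-coordinate of the window
  -> Int    -- y-coordinate of the window
  -> [Int]
regionIndices w h arrayWidth x y =
  [ (y + r) * arrayWidth + (x + c)
  | r <- [0 .. h - 1]
  , c <- [0 .. w - 1]
  ]

main :: IO ()
main = print (regionIndices 2 2 4 1 1)
```

Under this model, the same function applied to the source and destination parameters gives the element-by-element correspondence of the copy.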

peekArray2DAsync
  :: Storable a
  => Int           -- width to copy (elements)
  -> Int           -- height to copy (elements)
  -> DevicePtr a   -- source array
  -> Int           -- source array width
  -> Int           -- source x-coordinate
  -> Int           -- source y-coordinate
  -> HostPtr a     -- destination array
  -> Int           -- destination array width
  -> Int           -- destination x-coordinate
  -> Int           -- destination y-coordinate
  -> Maybe Stream  -- stream to associate with
  -> IO ()

Copy a 2D array from the device to the host asynchronously, possibly associated with a particular execution stream. The destination host memory must be page-locked.

peekListArray :: Storable a => Int -> DevicePtr a -> IO [a]

Copy a number of elements from the device into a new Haskell list. Note that this requires two memory copies: first from the device into a heap-allocated array, and from there marshalled into a list.

pokeArray :: Storable a => Int -> Ptr a -> DevicePtr a -> IO ()

Copy a number of elements onto the device. This is a synchronous operation.

pokeArrayAsync :: Storable a => Int -> HostPtr a -> DevicePtr a -> Maybe Stream -> IO ()

Copy memory onto the device asynchronously, possibly associated with a particular stream. The source host memory must be page-locked.
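The page-locked requirement can be met with a staging buffer from mallocHostArray. The following is a minimal sketch, assuming a current context, a destination array large enough for the list, an empty flag list being acceptable, and that Foreign.CUDA.Ptr exports withHostPtr and Foreign.CUDA.Driver.Context exports sync; uploadPinned is a hypothetical name.

```haskell
import           Control.Exception           (bracket)
import           Foreign.Marshal.Array       (pokeArray)
import           Foreign.CUDA.Ptr            (DevicePtr, withHostPtr)
import           Foreign.CUDA.Driver.Context (sync)
import qualified Foreign.CUDA.Driver.Marshal as CUDA

-- Hypothetical helper: stage a list in page-locked host memory and
-- upload it asynchronously on the default stream.
uploadPinned :: [Float] -> DevicePtr Float -> IO ()
uploadPinned xs dst =
  bracket (CUDA.mallocHostArray [] n) CUDA.freeHost $ \hptr -> do
    -- Fill the staging buffer using the ordinary host-side pokeArray.
    withHostPtr hptr $ \p -> pokeArray p xs
    CUDA.pokeArrayAsync n hptr dst Nothing
    -- Wait for the copy before bracket releases the staging buffer.
    sync
  where
    n = length xs
```

A long-lived staging buffer that is reused across uploads would avoid repeatedly page-locking memory, in line with the advice to use such allocations sparingly.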

pokeArray2D
  :: Storable a
  => Int          -- width to copy (elements)
  -> Int          -- height to copy (elements)
  -> Ptr a        -- source array
  -> Int          -- source array width
  -> Int          -- source x-coordinate
  -> Int          -- source y-coordinate
  -> DevicePtr a  -- destination array
  -> Int          -- destination array width
  -> Int          -- destination x-coordinate
  -> Int          -- destination y-coordinate
  -> IO ()

Copy a 2D array from the host to the device.

pokeArray2DAsync
  :: Storable a
  => Int           -- width to copy (elements)
  -> Int           -- height to copy (elements)
  -> HostPtr a     -- source array
  -> Int           -- source array width
  -> Int           -- source x-coordinate
  -> Int           -- source y-coordinate
  -> DevicePtr a   -- destination array
  -> Int           -- destination array width
  -> Int           -- destination x-coordinate
  -> Int           -- destination y-coordinate
  -> Maybe Stream  -- stream to associate with
  -> IO ()

Copy a 2D array from the host to the device asynchronously, possibly associated with a particular execution stream. The source host memory must be page-locked.

pokeListArray :: Storable a => [a] -> DevicePtr a -> IO ()

Write a list of storable elements into a device array. The device array must be sufficiently large to hold the entire list. This requires two marshalling operations.

copyArray :: Storable a => Int -> DevicePtr a -> DevicePtr a -> IO ()

Copy the given number of elements from the first device array (source) to the second device array (destination). The copied areas must not overlap. This operation is asynchronous with respect to the host, but will never overlap with kernel execution.

copyArrayAsync :: Storable a => Int -> DevicePtr a -> DevicePtr a -> Maybe Stream -> IO ()

Copy the given number of elements from the first device array (source) to the second device array (destination). The copied areas must not overlap. The operation is asynchronous with respect to the host, and can be asynchronous to other device operations by associating it with a particular stream.

copyArray2D
  :: Storable a
  => Int          -- width to copy (elements)
  -> Int          -- height to copy (elements)
  -> DevicePtr a  -- source array
  -> Int          -- source array width
  -> Int          -- source x-coordinate
  -> Int          -- source y-coordinate
  -> DevicePtr a  -- destination array
  -> Int          -- destination array width
  -> Int          -- destination x-coordinate
  -> Int          -- destination y-coordinate
  -> IO ()

Copy a 2D array from the first device array (source) to the second device array (destination). The copied areas must not overlap. This operation is asynchronous with respect to the host, but will never overlap with kernel execution.

copyArray2DAsync
  :: Storable a
  => Int           -- width to copy (elements)
  -> Int           -- height to copy (elements)
  -> DevicePtr a   -- source array
  -> Int           -- source array width
  -> Int           -- source x-coordinate
  -> Int           -- source y-coordinate
  -> DevicePtr a   -- destination array
  -> Int           -- destination array width
  -> Int           -- destination x-coordinate
  -> Int           -- destination y-coordinate
  -> Maybe Stream  -- stream to associate with
  -> IO ()

Copy a 2D array from the first device array (source) to the second device array (destination). The copied areas must not overlap. The operation is asynchronous with respect to the host, and can be asynchronous to other device operations by associating it with a particular execution stream.

copyArrayPeer
  :: Storable a
  => Int          -- number of array elements
  -> DevicePtr a  -- source array
  -> Context      -- source context
  -> DevicePtr a  -- destination array
  -> Context      -- destination context
  -> IO ()

Copies an array from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host, but serialised with respect to all pending and future asynchronous work in the source and destination contexts. To avoid this synchronisation, use copyArrayPeerAsync instead.

copyArrayPeerAsync
  :: Storable a
  => Int           -- number of array elements
  -> DevicePtr a   -- source array
  -> Context       -- source context
  -> DevicePtr a   -- destination array
  -> Context       -- destination device context
  -> Maybe Stream  -- stream to associate with
  -> IO ()

Copies from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host and all work in other streams and devices.

Combined Allocation and Marshalling

newListArray :: Storable a => [a] -> IO (DevicePtr a)

Write a list of storable elements into a newly allocated device array. This is newListArrayLen composed with fst.

newListArrayLen :: Storable a => [a] -> IO (DevicePtr a, Int)

Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two memory copies: first from a Haskell list to a heap-allocated array, and from there onto the graphics device. The memory should be freed when no longer required.

withListArray :: Storable a => [a] -> (DevicePtr a -> IO b) -> IO b

Temporarily store a list of elements into a newly allocated device array. An IO action is applied to the array, the result of which is returned. Similar to newListArray, this requires copying the data twice.

As with allocaArray, the memory is freed once the action completes, so you should not return the pointer from the action, and be wary of asynchronous kernel execution.

withListArrayLen :: Storable a => [a] -> (Int -> DevicePtr a -> IO b) -> IO b

A variant of withListArray which also supplies the number of elements in the array to the applied function.
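As a small usage sketch, the element count supplied by withListArrayLen can size a second temporary array and drive a device-to-device copy. This assumes a current context and a non-empty input list; copyOnDevice is a hypothetical name, and the final read relies on peekListArray being a synchronous copy that is ordered after the preceding copyArray.

```haskell
import           Data.Int                    (Int32)
import qualified Foreign.CUDA.Driver.Marshal as CUDA

-- Hypothetical helper: upload a list, duplicate it on the device, and
-- read the duplicate back. Both device arrays are freed on exit, so
-- only the resulting list is returned, never a device pointer.
copyOnDevice :: [Int32] -> IO [Int32]
copyOnDevice xs =
  CUDA.withListArrayLen xs $ \n src ->
    CUDA.allocaArray n $ \dst -> do
      CUDA.copyArray n src dst        -- source first, destination second
      CUDA.peekListArray n dst
```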

Utility

memset :: Storable a => DevicePtr a -> Int -> a -> IO ()

Set a number of data elements to the specified value, which may be 8, 16, or 32 bits wide.
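For instance, a 32-bit element type such as Word32 satisfies the width restriction. The sketch below fills a temporary array and reads it back; it assumes a current context and a positive element count, and filled is a hypothetical name.

```haskell
import           Data.Word                   (Word32)
import qualified Foreign.CUDA.Driver.Marshal as CUDA

-- Hypothetical helper: allocate n 32-bit elements, set each one to the
-- given value, and return the contents as a list.
filled :: Int -> Word32 -> IO [Word32]
filled n v =
  CUDA.allocaArray n $ \dptr -> do
    CUDA.memset dptr n v              -- n is a count of elements, not bytes
    CUDA.peekListArray n dptr
```

A 64-bit type such as Double falls outside the widths listed above, so it is not a suitable element type for this call.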

memsetAsync :: Storable a => DevicePtr a -> Int -> a -> Maybe Stream -> IO ()

Set a number of data elements to the specified value, which may be 8, 16, or 32 bits wide. The operation is asynchronous and may optionally be associated with a stream. Requires cuda-3.2.

getDevicePtr :: [AllocFlag] -> HostPtr a -> IO (DevicePtr a)

Return the device pointer associated with a mapped, pinned host buffer that was allocated with the DeviceMapped option by mallocHostArray.

Currently, no options are supported, so the flag list must be empty.

getBasePtr :: DevicePtr a -> IO (DevicePtr a, Int64)

Return the base address and allocation size of the given device pointer.

getMemInfo :: IO (Int64, Int64)

Return the amounts of free and total memory, in that order, available to the current context (in bytes).
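Because the pair is ordered (free, total), a quick report of memory pressure can be sketched as follows. This assumes a current context; usedFraction and reportMemory are hypothetical helpers.

```haskell
import           Data.Int                    (Int64)
import           Text.Printf                 (printf)
import qualified Foreign.CUDA.Driver.Marshal as CUDA

-- Hypothetical helper: fraction of memory in use, given the free and
-- total byte counts. Returns 0 for a non-positive total.
usedFraction :: Int64 -> Int64 -> Double
usedFraction freeBytes totalBytes
  | totalBytes <= 0 = 0
  | otherwise       = fromIntegral (totalBytes - freeBytes)
                    / fromIntegral totalBytes

-- Hypothetical helper: print a one-line summary for the current context.
reportMemory :: IO ()
reportMemory = do
  (freeBytes, totalBytes) <- CUDA.getMemInfo
  printf "%d of %d bytes free (%.1f%% used)\n"
         freeBytes totalBytes (100 * usedFraction freeBytes totalBytes)
```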