cuda-0.6.0.1: FFI binding to the CUDA interface for programming NVIDIA GPUs

Copyright: (c) [2009..2012] Trevor L. McDonell
License: BSD
Safe Haskell: None
Language: Haskell98

Foreign.CUDA.Driver.Marshal

Description

Memory management for low-level driver interface

Host Allocation

data AllocFlag Source

Options for host allocation

mallocHostArray :: Storable a => [AllocFlag] -> Int -> IO (HostPtr a) Source

Allocate a section of linear memory on the host which is page-locked and directly accessible from the device. The storage is sufficient to hold the given number of elements of a storable type.

Note that, since this reduces the amount of pageable memory available to the system, overall system performance may suffer. It is best used sparingly to allocate staging areas for data exchange.
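For example, a page-locked staging buffer can be managed with bracket so that it is always released. This is a minimal sketch: the helper name withPinned is hypothetical, and it assumes a CUDA context is already current and that these functions and the pointer types are in scope via the usual re-exports from Foreign.CUDA.Driver.

  import Control.Exception (bracket)
  import Foreign.CUDA.Driver

  -- Run an action with a page-locked staging buffer of 'n' Float elements,
  -- releasing it afterwards even if the action throws an exception.
  withPinned :: Int -> (HostPtr Float -> IO b) -> IO b
  withPinned n = bracket (mallocHostArray [] n) freeHost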

freeHost :: HostPtr a -> IO () Source

Free a section of page-locked host memory

registerArray :: Storable a => [AllocFlag] -> Int -> Ptr a -> IO (HostPtr a) Source

Page-locks the specified array (on the host) and maps it for the device(s) as specified by the given allocation flags. Subsequently, the memory is accessed directly by the device, and so can be read and written with much higher bandwidth than pageable memory that has not been registered. The memory range is added to the same tracking mechanism as mallocHostArray to automatically accelerate calls to functions such as pokeArray.

Note that page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of pageable memory available. This is best used sparingly to allocate staging areas for data exchange.

This function is not yet implemented on Mac OS X. Requires cuda-4.0.

unregisterArray :: HostPtr a -> IO (Ptr a) Source

Unmaps and unregisters the memory range associated with the given host pointer, making it pageable again, and returns the underlying host pointer.

This function is not yet implemented on Mac OS X. Requires cuda-4.0.
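As a sketch of the intended usage pattern (the helper name uploadPinned is hypothetical, and a current context plus cuda-4.0 are assumed), an existing host allocation can be registered around a transfer and then returned to pageable status:

  import Foreign.Ptr (Ptr)
  import Foreign.Storable (Storable)
  import Foreign.CUDA.Driver

  -- Upload 'n' elements from an existing (pageable) host allocation,
  -- page-locking it for the duration of the transfer.
  uploadPinned :: Storable a => Int -> Ptr a -> DevicePtr a -> IO ()
  uploadPinned n src dst = do
    pinned <- registerArray [] n src   -- page-lock the existing range
    pokeArray n src dst                -- the copy benefits from the registration
    _ <- unregisterArray pinned        -- make the range pageable again
    return ()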

Device Allocation

mallocArray :: Storable a => Int -> IO (DevicePtr a) Source

Allocate a section of linear memory on the device, and return a reference to it. The memory is sufficient to hold the given number of elements of storable type. It is suitably aligned for any type, and is not cleared.

allocaArray :: Storable a => Int -> (DevicePtr a -> IO b) -> IO b Source

Execute a computation on the device, passing a pointer to a temporarily allocated block of memory sufficient to hold the given number of elements of storable type. The memory is freed when the computation terminates (normally or via an exception), so the pointer must not be used after this.

Note that kernel launches can be asynchronous, so you may want to add a synchronisation point using sync as part of the computation.

free :: DevicePtr a -> IO () Source

Release a section of device memory
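A minimal sketch of scoped device allocation, assuming a current context; the helper name runWithScratch is hypothetical, the kernel launch is elided, and sync is taken to be the context synchronisation from Foreign.CUDA.Driver.Context (imported qualified here):

  import Foreign.CUDA.Driver
  import qualified Foreign.CUDA.Driver.Context as CTX

  runWithScratch :: Int -> IO [Float]
  runWithScratch n =
    allocaArray n $ \scratch -> do
      pokeListArray (replicate n 0) scratch  -- initialise the temporary buffer
      -- ... launch kernels that update 'scratch' here ...
      CTX.sync                               -- wait for asynchronous work
      peekListArray n scratch                -- read back before it is freed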

Unified Memory Allocation

data AttachFlag Source

Options for unified memory allocations

Constructors

Global 
Host 
Single 

mallocManagedArray :: Storable a => [AttachFlag] -> Int -> IO (DevicePtr a) Source

Allocates memory that will be automatically managed by the Unified Memory system
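A minimal sketch, assuming a device and driver that support unified memory; the helper name newManaged is hypothetical, and the Global flag listed above attaches the allocation globally, so it is accessible from the host and from any device:

  import Foreign.CUDA.Driver

  -- Allocate a managed array and populate it; the same pointer can then be
  -- passed to kernels or read back with peekListArray.
  newManaged :: [Float] -> IO (DevicePtr Float)
  newManaged xs = do
    dptr <- mallocManagedArray [Global] (length xs)
    pokeListArray xs dptr
    return dptr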

Marshalling

peekArray :: Storable a => Int -> DevicePtr a -> Ptr a -> IO () Source

Copy a number of elements from the device to host memory. This is a synchronous operation.

peekArrayAsync :: Storable a => Int -> DevicePtr a -> HostPtr a -> Maybe Stream -> IO () Source

Copy memory from the device asynchronously, possibly associated with a particular stream. The destination host memory must be page-locked.
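For instance, an asynchronous download can stage its result in a page-locked buffer allocated with mallocHostArray. This is a minimal sketch: the helper name downloadAsync is hypothetical, Nothing selects the default stream, and sync (cuCtxSynchronize, from Foreign.CUDA.Driver.Context) is assumed as the synchronisation point before the host buffer is read.

  import Foreign.CUDA.Driver
  import qualified Foreign.CUDA.Driver.Context as CTX

  downloadAsync :: Int -> DevicePtr Float -> IO (HostPtr Float)
  downloadAsync n dptr = do
    hptr <- mallocHostArray [] n        -- pinned destination, required for async copies
    peekArrayAsync n dptr hptr Nothing  -- enqueue the copy on the default stream
    CTX.sync                            -- wait until the data has arrived
    return hptr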

peekListArray :: Storable a => Int -> DevicePtr a -> IO [a] Source

Copy a number of elements from the device into a new Haskell list. Note that this requires two copies: first from the device into a heap-allocated array, and then from that array into a list.

pokeArray :: Storable a => Int -> Ptr a -> DevicePtr a -> IO () Source

Copy a number of elements onto the device. This is a synchronous operation.

pokeArrayAsync :: Storable a => Int -> HostPtr a -> DevicePtr a -> Maybe Stream -> IO () Source

Copy memory onto the device asynchronously, possibly associated with a particular stream. The source host memory must be page-locked.

pokeListArray :: Storable a => [a] -> DevicePtr a -> IO () Source

Write a list of storable elements into a device array. The device array must be sufficiently large to hold the entire list. This requires two marshalling operations.
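Taken together, the list-based functions give a simple synchronous round trip. A minimal sketch (the helper name roundTrip is hypothetical, and a current context is assumed):

  import Foreign.CUDA.Driver

  roundTrip :: [Float] -> IO [Float]
  roundTrip xs = do
    let n = length xs
    dptr <- mallocArray n
    pokeListArray xs dptr         -- host -> device
    ys   <- peekListArray n dptr  -- device -> host
    free dptr
    return ys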

copyArrayAsync :: Storable a => Int -> DevicePtr a -> DevicePtr a -> IO () Source

Copy the given number of elements from the first device array (source) to the second (destination). The copied areas may not overlap. This operation is asynchronous with respect to the host, but will never overlap with kernel execution.
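A minimal sketch of duplicating a device array (the helper name duplicate is hypothetical). Because the copy is asynchronous with respect to the host, synchronise, or use a blocking read-back, before relying on the new array from host code:

  import Foreign.Storable (Storable)
  import Foreign.CUDA.Driver

  duplicate :: Storable a => Int -> DevicePtr a -> IO (DevicePtr a)
  duplicate n src = do
    dst <- mallocArray n
    copyArrayAsync n src dst   -- device-to-device, async w.r.t. the host
    return dst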

copyArrayPeer Source

Arguments

:: Storable a
=> Int            number of array elements
-> DevicePtr a
-> Context        source array and context
-> DevicePtr a
-> Context        destination array and context
-> IO ()

Copies an array from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host, but serialised with respect to all pending and future asynchronous work in the source and destination contexts. To avoid this synchronisation, use copyArrayPeerAsync instead.

copyArrayPeerAsync Source

Arguments

:: Storable a
=> Int            number of array elements
-> DevicePtr a
-> Context        source array and context
-> DevicePtr a
-> Context        destination array and context
-> Maybe Stream   stream to associate with
-> IO ()

Copies from device memory in one context to device memory in another context. Note that this function is asynchronous with respect to the host and all work in other streams and devices.
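A minimal sketch of the asynchronous variant; the helper name peerCopyAsync is hypothetical, and the two contexts (and any peer access between their devices) are assumed to have been set up elsewhere via Foreign.CUDA.Driver.Context:

  import Foreign.Storable (Storable)
  import Foreign.CUDA.Driver
  import Foreign.CUDA.Driver.Context (Context)
  import Foreign.CUDA.Driver.Stream (Stream)

  -- Copy 'n' elements from an array owned by 'srcCtx' to one owned by 'dstCtx',
  -- associating the transfer with the given stream.
  peerCopyAsync :: Storable a
                => Int -> DevicePtr a -> Context -> DevicePtr a -> Context
                -> Stream -> IO ()
  peerCopyAsync n src srcCtx dst dstCtx stream =
    copyArrayPeerAsync n src srcCtx dst dstCtx (Just stream)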

Combined Allocation and Marshalling

newListArray :: Storable a => [a] -> IO (DevicePtr a) Source

Write a list of storable elements into a newly allocated device array. This is newListArrayLen composed with fst.

newListArrayLen :: Storable a => [a] -> IO (DevicePtr a, Int) Source

Write a list of storable elements into a newly allocated device array, returning the device pointer together with the number of elements that were written. Note that this requires two memory copies: firstly from a Haskell list to a heap allocated array, and from there onto the graphics device. The memory should be freed when no longer required.

withListArray :: Storable a => [a] -> (DevicePtr a -> IO b) -> IO b Source

Temporarily store a list of elements into a newly allocated device array. An IO action is applied to the array, the result of which is returned. Similar to newListArray, this requires copying the data twice.

As with allocaArray, the memory is freed once the action completes, so you should not return the pointer from the action, and be wary of asynchronous kernel execution.

withListArrayLen :: Storable a => [a] -> (Int -> DevicePtr a -> IO b) -> IO b Source

A variant of withListArray which also supplies the number of elements in the array to the applied function.
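A minimal sketch of staging kernel input with withListArrayLen; the helper name withInput is hypothetical, the kernel launch is elided, and sync from Foreign.CUDA.Driver.Context keeps the buffer alive until asynchronous work using it has finished:

  import Foreign.CUDA.Driver
  import qualified Foreign.CUDA.Driver.Context as CTX

  withInput :: [Float] -> IO [Float]
  withInput xs =
    withListArrayLen xs $ \n input -> do
      -- ... launch a kernel that reads 'input' and writes results in place ...
      CTX.sync                  -- do not let the buffer be freed too early
      peekListArray n input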

Utility

memset :: Storable a => DevicePtr a -> Int -> a -> IO () Source

Set a number of data elements to the specified value, which may be either 8-, 16-, or 32-bits wide.

memsetAsync :: Storable a => DevicePtr a -> Int -> a -> Maybe Stream -> IO () Source

Set a number of data elements to the specified value, which may be either 8-, 16-, or 32-bits wide. The operation is asynchronous and may optionally be associated with a stream. Requires cuda-3.2.
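A minimal sketch of the synchronous variant, clearing a freshly allocated array of 32-bit values (the helper name zeroed is hypothetical):

  import Data.Int (Int32)
  import Foreign.CUDA.Driver

  zeroed :: Int -> IO (DevicePtr Int32)
  zeroed n = do
    dptr <- mallocArray n
    memset dptr n 0   -- 0 :: Int32, a supported 32-bit fill value
    return dptr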

getDevicePtr :: [AllocFlag] -> HostPtr a -> IO (DevicePtr a) Source

Return the device pointer associated with a mapped, pinned host buffer, which was allocated with the DeviceMapped option by mallocHostArray.

Currently, no options are supported and this must be empty.
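A minimal sketch of the zero-copy pattern, assuming DeviceMapped is the AllocFlag referred to above and that the current context supports mapped host memory; the helper name mappedBuffer is hypothetical:

  import Foreign.CUDA.Driver

  -- Allocate a page-locked host buffer that is mapped into the device address
  -- space, and obtain the device pointer that kernels can use to access it.
  mappedBuffer :: Int -> IO (HostPtr Float, DevicePtr Float)
  mappedBuffer n = do
    hptr <- mallocHostArray [DeviceMapped] n
    dptr <- getDevicePtr [] hptr   -- no options are currently supported
    return (hptr, dptr)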

getBasePtr :: DevicePtr a -> IO (DevicePtr a, Int64) Source

Return the base address and allocation size of the given device pointer

getMemInfo :: IO (Int64, Int64) Source

Return the amount of free and total memory available to the current context, in bytes.
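For example (a minimal sketch; the helper name reportMemory is hypothetical):

  import Foreign.CUDA.Driver

  reportMemory :: IO ()
  reportMemory = do
    (freeBytes, totalBytes) <- getMemInfo
    putStrLn $ show freeBytes ++ " of " ++ show totalBytes ++ " bytes free"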