cusparse-0.2.0.0: FFI bindings to the CUDA Sparse BLAS library

Copyright     [2017] Trevor L. McDonell
License       BSD3
Maintainer    Trevor L. McDonell <tmcdonell@cse.unsw.edu.au>
Stability     experimental
Portability   non-portable (GHC extensions)
Safe Haskell  None
Language      Haskell2010

Foreign.CUDA.BLAS.Sparse

Description

The cuSPARSE library is an implementation of Sparse BLAS (Basic Linear Algebra Subprograms) for NVIDIA GPUs. Sparse matrices are those where the majority of elements are zero. Sparse BLAS routines are specifically implemented to take advantage of this sparsity.

To use operations from the cuSPARSE library, the user must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the desired sequence of cuSPARSE functions, then copy the results from the GPU memory space back to the host.

The cuda package can be used to transfer data to and from the GPU.

Example

The following is based on this example from the NVIDIA cuSPARSE documentation:

http://docs.nvidia.com/cuda/cusparse/index.html#appendix-b-cusparse-library-c---example

It assumes basic familiarity with the cuda package, as described in the Foreign.CUDA.Driver module.

>>> import Foreign.CUDA.Driver      as CUDA
>>> import Foreign.CUDA.BLAS.Sparse as Sparse
>>> import Data.Int                 (Int32)
>>> import Data.Word                (Word8)
>>> import Foreign.Storable         (sizeOf)
>>> import Foreign.Marshal          (with)
>>> CUDA.initialise []
>>> dev <- CUDA.device 0
>>> ctx <- CUDA.create dev []

We begin by creating the following matrix in COO format and transferring it to the GPU:

\[ \left(\begin{matrix} 1.0 & & 2.0 & 3.0 \\ & 4.0 & & \\ 5.0 & & 6.0 & 7.0 \\ & 8.0 & & 9.0 \end{matrix}\right) \]
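In COO (coordinate) format each nonzero is stored as a (row, column, value) triplet, so for this matrix we have n = 4 rows and columns and nnz = 9 nonzeros, held in three parallel arrays of row indices, column indices and values: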

>>> let n = 4
>>> let nnz = 9
>>> d_cooRowIdx <- newListArray [ 0,0,0, 1, 2,2,2, 3,3 ]  :: IO (DevicePtr Int32)
>>> d_cooColIdx <- newListArray [ 0,2,3, 1, 0,2,3, 1,3 ]  :: IO (DevicePtr Int32)
>>> d_vals      <- newListArray [ 1..9 ]                  :: IO (DevicePtr Double)

Create a sparse and dense vector:

>>> let nnz_vector = 3
>>> d_xVal <- newListArray [ 100, 200, 400 ] :: IO (DevicePtr Double)
>>> d_xIdx <- newListArray [ 0,   1,   3 ]   :: IO (DevicePtr Int32)
>>> d_y    <- newListArray [ 10, 20 .. 80 ]  :: IO (DevicePtr Double)

Initialise the cuSPARSE library and set up the matrix descriptor:

>>> hdl <- Sparse.create
>>> mat <- Sparse.createMatDescr
>>> Sparse.setMatrixType mat General
>>> Sparse.setIndexBase mat Zero

Exercise the conversion routines to convert from COO to CSR format:

>>> d_csrRowPtr <- CUDA.mallocArray (n+1) :: IO (DevicePtr Int32)
>>> xcoo2csr hdl d_cooRowIdx nnz n d_csrRowPtr Zero
>>> peekListArray (n+1) d_csrRowPtr
[0,3,4,7,9]
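In CSR format the row-pointer array has n+1 entries, and the nonzeros of row i occupy positions csrRowPtr[i] through csrRowPtr[i+1]-1 of the value and column-index arrays. The successive differences 3, 1, 3 and 2 are the number of nonzeros in each row of the matrix above; since the COO data was already sorted by row, the values and column indices are unchanged, so d_vals and d_cooColIdx can be reused as the CSR arrays below.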

Scatter elements from the sparse vector into the dense vector:

>>> dsctr hdl nnz_vector d_xVal d_xIdx (d_y `plusDevPtr` (n * sizeOf (undefined::Double))) Zero
>>> peekListArray 8 d_y
[10.0,20.0,30.0,40.0,100.0,200.0,70.0,400.0]
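The three values of the sparse vector were scattered into the second half of d_y (offset by n elements): positions 0, 1 and 3 of that half received 100, 200 and 400, and the remaining entries are untouched.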

Multiply the matrix in CSR format with the dense vector:

>>> :{
with 2.0 $ \alpha ->
  with 3.0 $ \beta  ->
    dcsrmv hdl N n n nnz alpha mat d_vals d_csrRowPtr d_cooColIdx d_y beta (d_y `plusDevPtr` (n * sizeOf (undefined::Double)))
:}
>>> peekListArray 8 d_y
[10.0,20.0,30.0,40.0,680.0,760.0,1230.0,2240.0]
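Here dcsrmv computes y = alpha * op(A) * x + beta * y, with op(A) = A because the operation N (non-transposed) was requested; x is the first four elements of d_y and y is the last four. For example, the first element of the result is 2 * (1*10 + 2*30 + 3*40) + 3*100 = 680.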

Multiply the matrix in CSR format with a dense matrix:

>>> d_z <- CUDA.mallocArray (2*(n+1)) :: IO (DevicePtr Double)
>>> memset (castDevPtr d_z :: DevicePtr Word8) (2*(n+1)*sizeOf (undefined::Double)) 0
>>> :{
with 5.0 $ \alpha ->
  with 0.0 $ \beta  ->
    dcsrmm hdl N n 2 n nnz alpha mat d_vals d_csrRowPtr d_cooColIdx d_y n beta d_z (n+1)
:}
>>> peekListArray (2*(n+1)) d_z
[950.0,400.0,2550.0,2600.0,0.0,49300.0,15200.0,132300.0,131200.0,0.0]
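dcsrmm computes C = alpha * op(A) * B + beta * C. B is d_y interpreted as a 4x2 column-major matrix with leading dimension n, so its columns are [10,20,30,40] and [680,760,1230,2240]; C is d_z with leading dimension n+1. The routine writes only the first n rows of each column of C, which is why the fifth and tenth entries of d_z keep the zero written by memset.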

Finally, we should free the device memory we allocated, and release the Sparse BLAS context handle:

>>> Sparse.destroy hdl
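The device arrays allocated earlier should also be released. Below is a sketch of the remaining cleanup, using the free and destroy functions of the cuda package:

>>> mapM_ CUDA.free [d_cooRowIdx, d_cooColIdx, d_csrRowPtr, d_xIdx]
>>> mapM_ CUDA.free [d_vals, d_xVal, d_y, d_z]
>>> CUDA.destroy ctx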

Additional information

For more information, see the NVIDIA cuSPARSE documentation:

http://docs.nvidia.com/cuda/cusparse/index.html

Control

newtype Handle Source #

An opaque handle to the cuSPARSE library context, which is passed to all library function calls.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsehandlet

Constructors

Handle (Ptr ()) 

create :: IO Handle Source #

This function initializes the cuSPARSE library and creates a handle on the cuSPARSE context. It must be called before any other cuSPARSE API function is invoked. It allocates hardware resources necessary for accessing the GPU.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsecreate

destroy :: Handle -> IO () Source #

This function releases CPU-side resources used by the cuSPARSE library. The release of GPU-side resources may be deferred until the application shuts down.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsedestroy
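Because every other cuSPARSE call needs the handle, and destroy should run even if an exception escapes, it is convenient to pair create and destroy with bracket. A minimal sketch (the withSparse helper below is not part of this package):

import Control.Exception (bracket)
import qualified Foreign.CUDA.BLAS.Sparse as Sparse

-- Run an action with a freshly created cuSPARSE handle, releasing it afterwards.
withSparse :: (Sparse.Handle -> IO a) -> IO a
withSparse = bracket Sparse.create Sparse.destroy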

setPointerMode :: Handle -> PointerMode -> IO () Source #

Set the pointer mode used by cuSPARSE library functions.

The default mode is for values to be passed by reference from the host.
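In the example above, the scalars alpha and beta were supplied as host pointers (allocated with Foreign.Marshal's with), which matches this default. If the mode is changed to device pointers, those scalar arguments must instead point into GPU memory.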

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsesetpointermode

getPointerMode :: Handle -> IO PointerMode Source #

Get the pointer mode used by cuSPARSE library functions to pass scalar arguments.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsegetpointermode

newtype Info_prune Source #

Constructors

Info_prune 

Fields

newtype Info_csru2csr Source #

Constructors

Info_csru2csr 

Fields

newtype Info_color Source #

Constructors

Info_color 

Fields

newtype Info Source #

An opaque structure holding the information collected in the analysis phase of the solution of the sparse triangular linear system.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsesolveanalysisinfot

Constructors

Info (Ptr ()) 

createInfo :: IO Info Source #

Create and initialise the solve and analysis structure to default values.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsecreatesolveanalysisinfo

destroyInfo :: Info -> IO () Source #

Release memory associated with a matrix solver structure.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsedestroysolveanalysisinfo

cusparseError :: String -> IO a Source #

Raise a CUSparseException in the IO monad.

checkStatus :: CInt -> IO () Source #

Throw an error if the given error code is not CUSPARSE_STATUS_SUCCESS.

data Fill Source #

Indicates whether the upper or lower part of the sparse matrix is stored.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsefillmodet

Constructors

Lower 
Upper 
Instances

Enum Fill Source # (defined in Foreign.CUDA.BLAS.Sparse.Matrix.Descriptor)

Eq Fill Source # (defined in Foreign.CUDA.BLAS.Sparse.Matrix.Descriptor)

Show Fill Source # (defined in Foreign.CUDA.BLAS.Sparse.Matrix.Descriptor)

data Diagonal Source #

Indicates whether the diagonal elements of the matrix are unity. The diagonal elements are always assumed to be present, but if Unit is passed to an API routine, then the routine assumes that all diagonal entries are unity and will not read or modify those entries.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsediagtypet

Constructors

NonUnit 
Unit 

newtype MatrixDescriptor Source #

An opaque type used to describe the shape and properties of a matrix.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsematdescrt

Constructors

MatrixDescriptor (Ptr ()) 

newtype Hybrid Source #

An opaque structure holding the matrix in hybrid (HYB) format.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsehybmatt

Constructors

Hybrid (Ptr ()) 

destroyHYB :: Hybrid -> IO () Source #

Destroy and release any memory associated with a hybrid matrix.

http://docs.nvidia.com/cuda/cusparse/index.html#cusparsedestroyhybmat

Operations