cublas-0.5.0.0: FFI bindings to the CUDA BLAS library

Copyright: [2014..2017] Trevor L. McDonell
License: BSD3
Maintainer: Trevor L. McDonell <tmcdonell@cse.unsw.edu.au>
Stability: experimental
Portability: non-portable (GHC extensions)
Safe Haskell: None
Language: Haskell2010

Foreign.CUDA.BLAS

Description

The cuBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) for NVIDIA GPUs.

To use operations from the cuBLAS library, the user must allocate the required matrices and vectors in the GPU memory space, fill them with data, call the desired sequence of cuBLAS functions, then copy the results from the GPU memory space back to the host.

The cuda package can be used to write data to and retrieve results from the GPU.

Data layout

Unlike many modern BLAS libraries, cuBLAS provides only Fortran-style implementations of the subprograms, using column-major storage and 1-based indexing.

The ?geam operation can be used to perform matrix transposition.
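
For instance, the transpose of an m-by-n column-major device matrix matX could be materialised into an n-by-m result matT roughly as follows. This is a sketch only: it assumes a Handle hdl is in scope, that matT is a hypothetical device allocation of m * n elements, and that dgeam mirrors cublasDgeam, computing \(C = \alpha\,\mathrm{op}(A) + \beta\,\mathrm{op}(B)\):

>>> import Foreign.Marshal
>>> with 1.0 $ \alpha -> with 0.0 $ \beta -> dgeam hdl T N n m alpha matX m beta matT n matT n

With \(\beta = 0\) the B argument is not referenced, so the result buffer matT is passed in that position.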

Example

As a short example, we show how to compute the following matrix-matrix product with dgemm:

\[ \left(\begin{matrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \\ \end{matrix}\right) \cdot \left(\begin{matrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \end{matrix}\right) = \left(\begin{matrix} 9 & 12 & 15 \\ 19 & 26 & 33 \\ 29 & 40 & 51 \\ \end{matrix}\right) \]

We assume that you know how to initialise the CUDA environment, as described in the Foreign.CUDA.Driver module:

>>> import Foreign.CUDA.Driver as CUDA
>>> import Foreign.CUDA.BLAS as BLAS
>>> CUDA.initialise []
>>> dev <- CUDA.device 0
>>> ctx <- CUDA.create dev []

Just as we must create a CUDA execution context with create before interacting with the GPU, we must create a cuBLAS context handle before executing any cuBLAS library operations. The handle is associated with the current device context:

>>> hdl <- BLAS.create

Now, let us generate the matrix data on the GPU. (For simplicity in this example we will just marshal the data via lists, but in a real application with a large amount of data we should of course use some kind of unboxed array):

>>> let rowsA = 3; colsA = 2; sizeA = rowsA * colsA
>>> let rowsB = 2; colsB = 3; sizeB = rowsB * colsB
>>> let sizeC = rowsA * colsB
>>> matA <- CUDA.newListArray (take sizeA [1..])
>>> matB <- CUDA.newListArray (take sizeB [1..])
>>> matC <- CUDA.mallocArray sizeC

Note in the above that we store the data in row-major order, as is the convention in C. However, the cuBLAS library assumes a column-major representation, in the style of Fortran. Luckily, we can make use of the following identity:

\[ B^T \cdot A^T = (A \cdot B)^T \]

and, since the transposed matrix in column-major representation is equivalent to our matrix in row-major representation, we can avoid any actual data manipulation to get things into a form suitable for cuBLAS (phew!).

The final things to take care of are the scaling parameters to the dgemm operation, \(\alpha\) and \(\beta\). By default these values are assumed to reside in host memory, but this setting can be changed with setPointerMode. When set to Device mode, the function withDevicePtr can be used to treat the device memory pointer as a plain pointer to pass to the function.
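
As a brief aside, with Device mode selected the same dgemm call could stage its scalars on the GPU, along these lines (a sketch: the PointerMode constructor names Device and Host are assumptions, mirroring cublasPointerMode_t):

>>> BLAS.setPointerMode hdl Device  -- Device is an assumed constructor name
>>> alpha' <- CUDA.newListArray [1.0]
>>> beta'  <- CUDA.newListArray [0.0]
>>> CUDA.withDevicePtr alpha' $ \alpha -> CUDA.withDevicePtr beta' $ \beta -> dgemm hdl N N colsB rowsA colsA alpha matB colsB matA colsA beta matC colsB
>>> CUDA.free alpha' >> CUDA.free beta'
>>> BLAS.setPointerMode hdl Host    -- restore the default for the example below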

Now, we are ready to piece it all together:

>>> import Foreign.Marshal
>>> with 1.0 $ \alpha -> with 0.0 $ \beta -> dgemm hdl N N colsB rowsA colsA alpha matB colsB matA colsA beta matC colsB

And retrieve the result:

>>> print =<< CUDA.peekListArray sizeC matC
[9.0,12.0,15.0,19.0,26.0,33.0,29.0,40.0,51.0]

Finally, we should free the device memory we allocated, and release the BLAS context handle:

>>> CUDA.free matA
>>> CUDA.free matB
>>> CUDA.free matC
>>> BLAS.destroy hdl

Additional information

For more information, see the NVIDIA cuBLAS documentation:

http://docs.nvidia.com/cuda/cublas/index.html

Control

data MathMode Source #

Enumeration selecting between the default math mode and tensor math mode.

Constructors

DefaultMath 
TensorMath 

newtype Handle Source #

An opaque handle to the cuBLAS library context, which is passed to all library function calls.

http://docs.nvidia.com/cuda/cublas/index.html#cublashandle_t

Constructors

Handle (Ptr ()) 

create :: IO Handle Source #

This function initialises the cuBLAS library and creates a handle to an opaque structure holding the cuBLAS library context. It allocates hardware resources on the host and device, and must be called prior to making any other cuBLAS library calls.

http://docs.nvidia.com/cuda/cublas/index.html#cublascreate

destroy :: Handle -> IO () Source #

This function releases hardware resources used by the cuBLAS library. The release of GPU resources may be deferred until the application exits. This is usually the last call made with a particular handle to the cuBLAS library.

http://docs.nvidia.com/cuda/cublas/index.html#cublasdestroy
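
In application code it can be convenient to pair create and destroy with bracket, so that the handle is released even if an exception is thrown; a minimal sketch:

>>> import Control.Exception (bracket)
>>> mode <- bracket BLAS.create BLAS.destroy BLAS.getMathMode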

setPointerMode :: Handle -> PointerMode -> IO () Source #

Set the pointer mode used by cuBLAS library functions. For example, this controls whether the scaling parameters \(\alpha\) and \(\beta\) of the ?gemm operation are treated as residing in host or device memory.

The default mode is for values to be passed by reference from the host.

http://docs.nvidia.com/cuda/cublas/index.html#cublassetpointermode

getPointerMode :: Handle -> IO PointerMode Source #

Get the pointer mode used by cuBLAS library functions to pass scalar arguments.

http://docs.nvidia.com/cuda/cublas/index.html#cublasgetpointermode
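
For example, a set/get round trip (again assuming the constructors Host and Device):

>>> setPointerMode hdl Device
>>> mode <- getPointerMode hdl
>>> setPointerMode hdl Host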

setAtomicsMode :: Handle -> AtomicsMode -> IO () Source #

Set whether cuBLAS library functions are allowed to use atomic operations where available. Implementations that use atomics are generally faster, but can produce results that are not strictly identical from one run to another.

http://docs.nvidia.com/cuda/cublas/index.html#cublassetatomicsmode

getAtomicsMode :: Handle -> IO AtomicsMode Source #

Determine whether cuBLAS library functions are allowed to use atomic operations.

http://docs.nvidia.com/cuda/cublas/index.html#cublasgetatomicsmode
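
For example (a sketch: the AtomicsMode constructor names Allowed and NotAllowed are assumptions, mirroring cublasAtomicsMode_t):

>>> setAtomicsMode hdl Allowed  -- assumed constructor name
>>> mode <- getAtomicsMode hdl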

setMathMode :: Handle -> MathMode -> IO () Source #

Set whether cuBLAS library functions are allowed to use Tensor Core operations where available.

http://docs.nvidia.com/cuda/cublas/index.html#cublassetmathmode

Since: 0.4.0.0

getMathMode :: Handle -> IO MathMode Source #

Determine whether cuBLAS library functions are allowed to use Tensor Core operations where available.

http://docs.nvidia.com/cuda/cublas/index.html#cublasgetmathmode
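
For example, opting in to Tensor Core operations for subsequent calls on this handle:

>>> setMathMode hdl TensorMath
>>> mode <- getMathMode hdl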

Since: 0.4.0.0

cublasError :: String -> IO a Source #

Raise a CUBLASException in the IO monad.

requireSDK :: Name -> Double -> IO a Source #

Raise a specially formatted error message indicating that the named function requires at least the given CUDA SDK version.

Operations