| Safe Haskell | None |
|---|---|
| Language | Haskell98 |
Data.Repa.Eval.Generic.Par
Description
Generic parallel array computation operators.
- fillChunked :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> Int# -> IO ()
- fillChunkedIO :: Gang -> (Int# -> a -> IO ()) -> (Int# -> IO (Int# -> IO a)) -> Int# -> IO ()
- fillBlock2 :: Elt a => Gang -> (Int# -> a -> IO ()) -> (Int# -> Int# -> a) -> Int# -> Int# -> Int# -> Int# -> Int# -> IO ()
- fillInterleaved :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> Int# -> IO ()
- fillCursoredBlock2 :: Elt a => Gang -> (Int# -> a -> IO ()) -> (Int# -> Int# -> cursor) -> (Int# -> Int# -> cursor -> cursor) -> (cursor -> a) -> Int# -> Int# -> Int# -> Int# -> Int# -> IO ()
- foldAll :: Gang -> (Int# -> a) -> (a -> a -> a) -> a -> Int# -> IO a
- foldInner :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> (a -> a -> a) -> a -> Int# -> Int# -> IO ()
Filling
fillChunked
Arguments
| :: Gang | Gang to run the operation on. |
| -> (Int# -> a -> IO ()) | Update function to write into result buffer. |
| -> (Int# -> a) | Function to get the value at a given index. |
| -> Int# | Number of elements. |
| -> IO () |
Fill something in parallel.
- The array is split into linear chunks, and each thread linearly fills one chunk.
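The chunking described above can be sketched with plain `base` primitives. This is a minimal illustration, not the library's implementation: it uses boxed `Int` instead of `Int#`, replaces the `Gang` with one `forkIO` worker per chunk, and assumes the remainder is handed to the earliest threads. The names `chunkRanges`, `fillChunkedSketch` and `demoChunked` are hypothetical.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import Data.List (sortOn)

-- | Half-open index range [start, end) for each thread.
--   Assumption: earlier threads absorb the remainder, one extra element each.
chunkRanges :: Int -> Int -> [(Int, Int)]
chunkRanges threads len =
  [ (splitIx t, splitIx (t + 1)) | t <- [0 .. threads - 1] ]
  where
    chunkLen = len `quot` threads
    leftover = len `rem` threads
    splitIx t
      | t < leftover = t * (chunkLen + 1)
      | otherwise    = t * chunkLen + leftover

-- | Hypothetical stand-in for fillChunked: one forkIO worker per chunk,
--   each writing its chunk linearly.
fillChunkedSketch :: Int -> (Int -> a -> IO ()) -> (Int -> a) -> Int -> IO ()
fillChunkedSketch threads write get len = do
  dones <- mapM worker (chunkRanges threads len)
  mapM_ takeMVar dones            -- wait for every worker to finish
  where
    worker (start, end) = do
      done <- newEmptyMVar
      _ <- forkIO $ do
        mapM_ (\ix -> write ix (get ix)) [start .. end - 1]
        putMVar done ()
      pure done

-- | Usage: fill ten slots with doubled indices, collected as (index, value).
demoChunked :: IO [(Int, Int)]
demoChunked = do
  ref <- newIORef []
  let write ix x = atomicModifyIORef' ref (\xs -> ((ix, x) : xs, ()))
  fillChunkedSketch 3 write (* 2) 10
  sortOn fst <$> readIORef ref
```

With three threads and ten elements, `chunkRanges 3 10` gives the ranges `(0,4)`, `(4,7)` and `(7,10)`, so no two workers ever write the same index.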
fillChunkedIO
Arguments
| :: Gang | Gang to run the operation on. |
| -> (Int# -> a -> IO ()) | Update function to write into result buffer. |
| -> (Int# -> IO (Int# -> IO a)) | Create a function to get the value at a given index. The first argument is the thread number, so you can do some per-thread initialisation. |
| -> Int# | Number of elements. |
| -> IO () |
Fill something in parallel, using a separate IO action for each thread.
- The array is split into linear chunks, and each thread linearly fills one chunk.
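The per-thread initialisation step can be illustrated as follows. This is a sketch under stated assumptions, not the library's code: it uses boxed `Int` instead of `Int#`, has no `Gang`, and runs the workers one after another rather than in parallel so that the output order is deterministic. The names `chunkOf`, `fillChunkedIOSketch` and `demoChunkedIO` are hypothetical.

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Half-open chunk [start, end) owned by thread t.
--   Assumption: the remainder goes to the earliest threads.
chunkOf :: Int -> Int -> Int -> (Int, Int)
chunkOf threads len t = (splitIx t, splitIx (t + 1))
  where
    (chunkLen, leftover) = len `quotRem` threads
    splitIx i = i * chunkLen + min i leftover

-- | Hypothetical stand-in for fillChunkedIO. Workers run sequentially here;
--   the real operator runs them on the gang's threads.
fillChunkedIOSketch
  :: Int -> (Int -> a -> IO ()) -> (Int -> IO (Int -> IO a)) -> Int -> IO ()
fillChunkedIOSketch threads write mkGet len =
  mapM_ worker [0 .. threads - 1]
  where
    worker t = do
      get <- mkGet t                       -- per-thread initialisation, run once
      let (start, end) = chunkOf threads len t
      mapM_ (\ix -> get ix >>= write ix) [start .. end - 1]

-- | Usage: every thread allocates a private counter during initialisation
--   and tags each element with (thread number * 100 + running count).
demoChunkedIO :: IO [(Int, Int)]
demoChunkedIO = do
  out <- newIORef []
  let mkGet t = do
        counter <- newIORef (0 :: Int)     -- state private to thread t
        pure $ \_ix -> do
          modifyIORef' counter (+ 1)
          n <- readIORef counter
          pure (t * 100 + n)
  fillChunkedIOSketch 2 (\ix x -> modifyIORef' out ((ix, x) :)) mkGet 5
  reverse <$> readIORef out
```

The point of the two-stage type `Int# -> IO (Int# -> IO a)` is visible in `mkGet`: the outer action runs once per thread, and the function it returns is then called once per element.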
fillBlock2
Arguments
| :: Elt a | |
| => Gang | Gang to run the operation on. |
| -> (Int# -> a -> IO ()) | Update function to write into result buffer. |
| -> (Int# -> Int# -> a) | Function to evaluate the element at an (x, y) index. |
| -> Int# | Width of the whole array. |
| -> Int# | x0: x coordinate of the lower left corner of the block to fill. |
| -> Int# | y0: y coordinate of the lower left corner of the block to fill. |
| -> Int# | w0: width of the block to fill. |
| -> Int# | h0: height of the block to fill. |
| -> IO () |
Fill a block in a rank-2 array in parallel.
- Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
- Coordinates given are of the filled edges of the block.
- We divide the block into columns, and give one column to each thread.
- Each column is filled in row major order from top to bottom.
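The column split can be made concrete by listing which linear buffer indices each thread would write, and in what order. This is a pure model under stated assumptions: boxed `Int` instead of `Int#`, a row-major buffer where element (x, y) lives at `x + y * width`, and leftover columns handed to the earliest threads. The name `blockColumns` is hypothetical.

```haskell
-- | Linear buffer indices written by each thread, in write order.
--   Arguments mirror fillBlock2: thread count, array width, x0, y0, w0, h0.
blockColumns :: Int -> Int -> Int -> Int -> Int -> Int -> [[Int]]
blockColumns threads width x0 y0 w0 h0 =
  [ [ x + y * width
    | y <- [y0 .. y0 + h0 - 1]                     -- rows, one after another
    , x <- [colStart t .. colStart (t + 1) - 1]    -- left to right within a row
    ]
  | t <- [0 .. threads - 1] ]
  where
    (colWidth, leftover) = w0 `quotRem` threads
    -- Assumption: the first `leftover` threads get one extra column each.
    colStart t = x0 + t * colWidth + min t leftover
```

For example, `blockColumns 2 4 1 1 3 2` describes a 3 by 2 block at (1, 1) in an array of width 4: the first thread writes indices 5, 6, 9, 10 and the second writes 7, 11.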
fillInterleaved
Arguments
| :: Gang | Gang to run the operation on. |
| -> (Int# -> a -> IO ()) | Update function to write into result buffer. |
| -> (Int# -> a) | Function to get the value at a given index. |
| -> Int# | Number of elements. |
| -> IO () |
Fill something in parallel, using a round-robin order.
- Threads handle elements in row major, round-robin order.
- Using this method helps even out unbalanced workloads.
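To see why round-robin assignment evens out unbalanced workloads, the following pure model compares per-thread cost under interleaved and chunked assignment. It is a sketch, not the library's code: it uses boxed `Int`, and the names `interleavedIndices`, `chunkedIndices` and `loads` are hypothetical.

```haskell
-- | Indices handled by thread t under round-robin assignment:
--   t, t + threads, t + 2 * threads, ...
interleavedIndices :: Int -> Int -> Int -> [Int]
interleavedIndices threads len t = [t, t + threads .. len - 1]

-- | Indices handled by thread t under contiguous chunking, for comparison.
--   Assumption: the remainder goes to the earliest threads.
chunkedIndices :: Int -> Int -> Int -> [Int]
chunkedIndices threads len t = [splitIx t .. splitIx (t + 1) - 1]
  where
    (chunkLen, leftover) = len `quotRem` threads
    splitIx i = i * chunkLen + min i leftover

-- | Total cost per thread for a given assignment and per-element cost.
loads :: (Int -> Int -> Int -> [Int]) -> (Int -> Int) -> Int -> Int -> [Int]
loads assign cost threads len =
  [ sum (map cost (assign threads len t)) | t <- [0 .. threads - 1] ]
```

If element `ix` costs `ix` units of work, then with three threads and nine elements `loads chunkedIndices id 3 9` is `[3, 12, 21]`, while `loads interleavedIndices id 3 9` is `[9, 12, 15]`, so the slowest thread does 15 units of work instead of 21.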
fillCursoredBlock2
Arguments
| :: Elt a | |
| => Gang | Gang to run the operation on. |
| -> (Int# -> a -> IO ()) | Update function to write into result buffer. |
| -> (Int# -> Int# -> cursor) | Make a cursor from an (x, y) index. |
| -> (Int# -> Int# -> cursor -> cursor) | Shift the cursor by an (x, y) offset. |
| -> (cursor -> a) | Function to evaluate the element at an index. |
| -> Int# | Width of the whole array. |
| -> Int# | x0: x coordinate of the lower left corner of the block to fill. |
| -> Int# | y0: y coordinate of the lower left corner of the block to fill. |
| -> Int# | w0: width of the block to fill. |
| -> Int# | h0: height of the block to fill. |
| -> IO () |
Fill a block in a rank-2 array in parallel.
- Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
- Using cursor functions can help to expose inter-element indexing computations to the GHC and LLVM optimisers.
- Coordinates given are of the filled edges of the block.
- We divide the block into columns, and give one column to each thread.
- We need the Elt constraint so that we can use its touch function to provide an order of evaluation amenable to the LLVM optimiser. You should compile your Haskell program with -fllvm -optlo-O3 to enable LLVM's Global Value Numbering optimisation.
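The three cursor functions can be illustrated with a small sequential model. The assumptions here are loud ones: the cursor is taken to be a linear index into a row-major source list, boxed `Int` replaces `Int#`, there is no `Gang` or `Elt` constraint, and the result is returned as a list of (index, value) pairs instead of being written to a buffer. All names (`Cursor`, `makeCursor`, `shiftCursor`, `sum3`, `fillCursoredSketch`, `demoCursored`) are hypothetical.

```haskell
-- | Assumption: a cursor is just a linear index into a row-major source.
type Cursor = Int

-- | Make a cursor from an (x, y) index, for a source of the given width.
makeCursor :: Int -> Int -> Int -> Cursor
makeCursor width x y = x + y * width

-- | Shift a cursor by an (x, y) offset: plain arithmetic, no re-indexing.
shiftCursor :: Int -> Int -> Int -> Cursor -> Cursor
shiftCursor width dx dy cur = cur + dx + dy * width

-- | Evaluate an element as the sum of its left, centre and right neighbours.
sum3 :: Int -> [Int] -> Cursor -> Int
sum3 width src cur =
  sum [ src !! shiftCursor width dx 0 cur | dx <- [-1, 0, 1] ]

-- | Sequential model of filling a block through the cursor functions.
--   Arguments after the three functions: array width, x0, y0, w0, h0.
fillCursoredSketch
  :: (Int -> Int -> cursor)
  -> (Int -> Int -> cursor -> cursor)
  -> (cursor -> a)
  -> Int -> Int -> Int -> Int -> Int
  -> [(Int, a)]
fillCursoredSketch mk shift eval width x0 y0 w0 h0 =
  [ (x0 + dx + (y0 + dy) * width, eval (shift dx dy base))
  | dy <- [0 .. h0 - 1], dx <- [0 .. w0 - 1] ]
  where
    base = mk x0 y0   -- one cursor at the block origin, shifted thereafter

-- | Usage: a 4-wide, 3-high source holding 0..11; fill the 2 by 1 block at (1, 1).
demoCursored :: [(Int, Int)]
demoCursored =
  fillCursoredSketch (makeCursor 4) (shiftCursor 4) (sum3 4 [0 .. 11]) 4 1 1 2 1
```

Because neighbouring elements are reached by shifting an existing cursor, the shared part of the index computation appears once and the offsets are simple additions, which is the kind of structure an optimiser can reuse across elements.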
Reduction
foldAll
Arguments
| :: Gang | Gang to run the operation on. |
| -> (Int# -> a) | Function to get an element from the source. |
| -> (a -> a -> a) | Binary associative combining function. |
| -> a | Starting value. |
| -> Int# | Number of elements. |
| -> IO a |
Parallel tree reduction of an array to a single value. Each thread takes an equally sized chunk of the data and computes a partial sum. The main thread then reduces the array of partial sums to the final result.
We don't require that the initial value be a neutral element, so each thread computes a fold1 on its chunk of the data, and the seed element is only applied in the final reduction step.
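The handling of the seed can be sketched as a pure model: each chunk is reduced with a fold1, and the starting value enters only when the partial results are combined. This is an illustration, not the library's code. It uses boxed `Int`, computes the per-chunk results sequentially, and assumes that empty chunks are skipped. The name `foldAllSketch` is hypothetical.

```haskell
import Data.List (foldl')

-- | Pure model of foldAll: thread count, element getter, combining function,
--   starting value, number of elements.
foldAllSketch :: Int -> (Int -> a) -> (a -> a -> a) -> a -> Int -> a
foldAllSketch threads get f z len = foldl' f z partials   -- seed applied once, here
  where
    (chunkLen, leftover) = len `quotRem` threads
    splitIx t = t * chunkLen + min t leftover
    -- One fold1 per non-empty chunk; the seed is not involved at this stage.
    partials =
      [ foldl1 f (map get [start .. end - 1])
      | t <- [0 .. threads - 1]
      , let start = splitIx t
      , let end   = splitIx (t + 1)
      , start < end ]
```

With a non-neutral seed the difference is observable: `foldAllSketch 3 id (+) 100 10` is 145, the sum of 0 to 9 plus a single 100, whereas seeding every thread's fold with 100 would add it three times.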
foldInner
Arguments
| :: Gang | Gang to run the operation on. |
| -> (Int# -> a -> IO ()) | Function to write into the result buffer. |
| -> (Int# -> a) | Function to get an element from the source. |
| -> (a -> a -> a) | Binary associative combination operator. |
| -> a | Neutral starting value. |
| -> Int# | Total length of source. |
| -> Int# | Inner dimension (length to fold over). |
| -> IO () |
Parallel reduction of a multidimensional array along the innermost dimension. Each output value is computed by a single thread, with the output values distributed evenly amongst the available threads.
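The relationship between the total source length and the inner dimension can be shown with a pure model that returns the outputs as a list instead of writing them to a buffer. This is a sketch under stated assumptions: boxed `Int` instead of `Int#`, no `Gang`, a row-major source, and an inner dimension greater than zero. The name `foldInnerSketch` is hypothetical.

```haskell
import Data.List (foldl')

-- | Pure model of foldInner: element getter, combining operator, neutral
--   starting value, total source length, inner dimension.
foldInnerSketch :: (Int -> a) -> (a -> a -> a) -> a -> Int -> Int -> [a]
foldInnerSketch get f z len n =
  [ foldl' f z [ get (row * n + j) | j <- [0 .. n - 1] ]   -- one output per row
  | row <- [0 .. len `quot` n - 1] ]
```

For a source of length 6 with inner dimension 3, `foldInnerSketch id (+) 0 6 3` yields `[3, 12]`. Each output depends only on its own row, which is why each one can be computed by a single thread with no combining step between threads. The starting value is used once per output here, so it should be neutral for the operator.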