repa-eval-4.2.3.1: Low-level parallel operators on bulk random-accessible arrays.

Data.Repa.Eval.Generic.Par

Description

Generic parallel array computation operators.

# Filling

Arguments

 :: Gang                  -- ^ Gang to run the operation on.
 -> (Int# -> a -> IO ())  -- ^ Update function to write into result buffer.
 -> (Int# -> a)           -- ^ Function to get the value at a given index.
 -> Int#                  -- ^ Number of elements.
 -> IO ()

Fill something in parallel.

• The array is split into linear chunks, and each thread linearly fills one chunk.
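The chunked strategy above can be sketched with `base` alone. This is only an illustrative model, not the library's implementation: `chunkBounds` and `fillChunkedSketch` are hypothetical names, plain `Int` stands in for `Int#`, `forkIO` threads stand in for a `Gang`, and the rule that the leading chunks absorb any leftover elements is an assumption.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import Foreign.Marshal.Array (allocaArray, peekArray)
import Foreign.Storable (pokeElemOff)

-- | Half-open index range [start, end) owned by each of `threads` workers.
-- Assumption: the first (len `rem` threads) chunks get one extra element.
-- Requires threads > 0.
chunkBounds :: Int -> Int -> [(Int, Int)]
chunkBounds threads len =
  [ (start t, start (t + 1)) | t <- [0 .. threads - 1] ]
  where
    (size, extra) = len `quotRem` threads
    start t = t * size + min t extra

-- | Fill a buffer of `len` Ints so that index i holds `get i`,
-- with one forked thread walking linearly over each chunk.
fillChunkedSketch :: Int -> Int -> (Int -> Int) -> IO [Int]
fillChunkedSketch threads len get =
  allocaArray len $ \buf -> do
    dones <- forM (chunkBounds threads len) $ \(lo, hi) -> do
      done <- newEmptyMVar
      _ <- forkIO $ do
        forM_ [lo .. hi - 1] $ \i -> pokeElemOff buf i (get i)
        putMVar done ()            -- signal that this chunk is finished
      pure done
    mapM_ takeMVar dones           -- wait for every worker
    peekArray len buf
```

For example, `chunkBounds 3 10` gives `[(0,4),(4,7),(7,10)]`, so three workers fill 4, 3 and 3 elements respectively.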

Arguments

 :: Gang                         -- ^ Gang to run the operation on.
 -> (Int# -> a -> IO ())         -- ^ Update function to write into result buffer.
 -> (Int# -> IO (Int# -> IO a))  -- ^ Create a function to get the value at a given index.
                                 --   The first argument is the thread number, so you can
                                 --   do some per-thread initialisation.
 -> Int#                         -- ^ Number of elements.
 -> IO ()

Fill something in parallel, using a separate IO action for each thread.

• The array is split into linear chunks, and each thread linearly fills one chunk.
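The per-thread variant can be modelled the same way. In this hedged sketch, `fillPerThreadSketch` and `demoPerThread` are hypothetical names, `Int` replaces `Int#`, and `forkIO` replaces the `Gang`. The point it illustrates is that the getter-producing action runs once per thread, and the getter it returns is then reused for every index in that thread's chunk.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)
import Data.IORef (atomicModifyIORef', newIORef, readIORef)
import Data.List (sort)

-- | Run `mkGet t` once inside worker t, then apply the resulting getter to
-- each index of that worker's chunk and pass the value to `write`.
-- Requires threads > 0.
fillPerThreadSketch
  :: Int -> Int -> (Int -> IO (Int -> IO a)) -> (Int -> a -> IO ()) -> IO ()
fillPerThreadSketch threads len mkGet write = do
  let (size, extra) = len `quotRem` threads
      start t = t * size + min t extra
  dones <- forM [0 .. threads - 1] $ \t -> do
    done <- newEmptyMVar
    _ <- forkIO $ do
      get <- mkGet t               -- per-thread initialisation happens here
      forM_ [start t .. start (t + 1) - 1] $ \i -> get i >>= write i
      putMVar done ()
    pure done
  mapM_ takeMVar dones

-- | Each getter closes over its thread number, so the output shows
-- which worker produced which element.
demoPerThread :: IO [(Int, Int)]
demoPerThread = do
  out <- newIORef []
  fillPerThreadSketch 2 5
    (\t -> pure (\i -> pure (t * 100 + i)))
    (\i x -> atomicModifyIORef' out (\acc -> ((i, x) : acc, ())))
  sort <$> readIORef out
```

`demoPerThread` yields `[(0,0),(1,1),(2,2),(3,103),(4,104)]`: worker 0 produced indices 0 to 2 and worker 1 produced indices 3 and 4.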

Arguments

 :: Elt a
 => Gang
 -> (Int# -> a -> IO ())  -- ^ Update function to write into result buffer.
 -> (Int# -> Int# -> a)   -- ^ Function to evaluate the element at an (x, y) index.
 -> Int#                  -- ^ Width of the whole array.
 -> Int#                  -- ^ x0: lower left corner of block to fill.
 -> Int#                  -- ^ y0
 -> Int#                  -- ^ w0: width of block to fill.
 -> Int#                  -- ^ h0: height of block to fill.
 -> IO ()

Fill a block in a rank-2 array in parallel.

• Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
• Coordinates given are of the filled edges of the block.
• We divide the block into columns, and give one column to each thread.
• Each column is filled in row major order from top to bottom.
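The column split described above can be illustrated with two small pure functions. This is a sketch under stated assumptions rather than the library's code: `columnStrips` and `stripIndices` are hypothetical helpers, `Int` replaces `Int#`, leftover columns are assumed to go to the leading strips, and the linear index of `(x, y)` is assumed to be `x + y * width`.

```haskell
-- | Column strips (first x coordinate, strip width) of a block that starts
-- at x0 and is w0 wide, one strip per thread. Requires threads > 0.
columnStrips :: Int -> Int -> Int -> [(Int, Int)]
columnStrips threads x0 w0 =
  [ (x0 + off t, off (t + 1) - off t) | t <- [0 .. threads - 1] ]
  where
    (size, extra) = w0 `quotRem` threads
    off t = t * size + min t extra

-- | Linear buffer indices that one thread visits for its strip,
-- in row-major order: all of one row of the strip, then the next row.
stripIndices :: Int -> Int -> Int -> (Int, Int) -> [Int]
stripIndices width y0 h0 (xs, w) =
  [ x + y * width | y <- [y0 .. y0 + h0 - 1], x <- [xs .. xs + w - 1] ]
```

With two threads and a block starting at `x0 = 1` that is 5 wide, `columnStrips 2 1 5` is `[(1,3),(4,2)]`. In an array of width 8, the second strip over rows 2 and 3 visits `stripIndices 8 2 2 (4,2)`, which is `[20,21,28,29]`.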

Arguments

 :: Gang                  -- ^ Gang to run the operation on.
 -> (Int# -> a -> IO ())  -- ^ Update function to write into result buffer.
 -> (Int# -> a)           -- ^ Function to get the value at a given index.
 -> Int#                  -- ^ Number of elements.
 -> IO ()

Fill something in parallel, using a round-robin order.

• Threads handle elements in row major, round-robin order.
• Using this method helps even out unbalanced workloads.
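To see why a round-robin order helps with unbalanced work, the following sketch compares which indices each thread would own under the two strategies. All names here are hypothetical, `Int` replaces `Int#`, and the cost of computing element `i` is assumed to be `i` itself purely for illustration.

```haskell
-- | Indices handled by thread t under round-robin assignment:
-- t, t + threads, t + 2 * threads, ... Requires threads > 0.
interleavedIndices :: Int -> Int -> Int -> [Int]
interleavedIndices threads len t = [t, t + threads .. len - 1]

-- | Indices handled by thread t under contiguous chunking
-- (assumed split: leading chunks absorb the leftover elements).
chunkedIndices :: Int -> Int -> Int -> [Int]
chunkedIndices threads len t = [start t .. start (t + 1) - 1]
  where
    (size, extra) = len `quotRem` threads
    start k = k * size + min k extra

-- | Total work per thread when element i costs i units.
workPerThread :: (Int -> Int -> Int -> [Int]) -> Int -> Int -> [Int]
workPerThread split threads len =
  [ sum (split threads len t) | t <- [0 .. threads - 1] ]
```

For 12 elements on 3 threads, `workPerThread chunkedIndices 3 12` is `[6,22,38]`, while `workPerThread interleavedIndices 3 12` is `[18,22,26]`: the expensive high-index elements are spread across all threads instead of landing on the last one.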

Arguments

 :: Elt a
 => Gang                              -- ^ Gang to run the operation on.
 -> (Int# -> a -> IO ())              -- ^ Update function to write into result buffer.
 -> (Int# -> Int# -> cursor)          -- ^ Make a cursor from an (x, y) index.
 -> (Int# -> Int# -> cursor -> cursor) -- ^ Shift the cursor by an (x, y) offset.
 -> (cursor -> a)                     -- ^ Function to evaluate the element at an index.
 -> Int#                              -- ^ Width of the whole array.
 -> Int#                              -- ^ x0: lower left corner of block to fill.
 -> Int#                              -- ^ y0
 -> Int#                              -- ^ w0: width of block to fill.
 -> Int#                              -- ^ h0: height of block to fill.
 -> IO ()

Fill a block in a rank-2 array in parallel.

• Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
• Using cursor functions can help to expose inter-element indexing computations to the GHC and LLVM optimisers.
• Coordinates given are of the filled edges of the block.
• We divide the block into columns, and give one column to each thread.
• We need the Elt constraint so that we can use its touch function to provide an order of evaluation amenable to the LLVM optimiser. You should compile your Haskell program with -fllvm -optlo-O3 to enable LLVM's Global Value Numbering optimisation.
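The role of the three cursor functions can be sketched as follows. This is a simplified illustration, not the library's code: the `Cursor` type, the explicit `width` parameter and `rowOfFour` are hypothetical, and `Int` replaces `Int#`. The idea is that a base cursor is built once from an `(x, y)` index, and neighbouring elements are reached by cheap shifts, so the index arithmetic shared between them is visible to the optimiser.

```haskell
-- | A cursor here is just a linear offset into a row-major buffer.
newtype Cursor = Cursor Int
  deriving (Eq, Show)

-- | Make a cursor from an (x, y) index in an array of the given width.
makeCursor :: Int -> Int -> Int -> Cursor
makeCursor width x y = Cursor (x + y * width)

-- | Shift a cursor by an (x, y) offset without recomputing it from scratch.
shiftCursor :: Int -> Int -> Int -> Cursor -> Cursor
shiftCursor width dx dy (Cursor i) = Cursor (i + dx + dy * width)

-- | Evaluate four horizontally adjacent elements from one base cursor.
rowOfFour :: Int -> (Cursor -> a) -> Int -> Int -> [a]
rowOfFour width load x y =
  map load (take 4 (iterate (shiftCursor width 1 0) (makeCursor width x y)))
```

With a `load` function that simply returns the offset, `rowOfFour 10 (\(Cursor i) -> i) 2 3` is `[32,33,34,35]`.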

# Reduction

Arguments

 :: Gang           -- ^ Gang to run the operation on.
 -> (Int# -> a)    -- ^ Function to get an element from the source.
 -> (a -> a -> a)  -- ^ Binary associative combining function.
 -> a              -- ^ Starting value.
 -> Int#           -- ^ Number of elements.
 -> IO a

Parallel tree reduction of an array to a single value. Each thread takes an equally sized chunk of the data and computes a partial result. The main thread then reduces the array of partial results to the final result.

We don't require that the initial value be a neutral element, so each thread computes a fold1 on its chunk of the data, and the seed element is only applied in the final reduction step.
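The treatment of the starting value can be shown with a sequential model of the reduction. `foldAllSketch` is a hypothetical name, `Int` replaces `Int#`, the per-chunk folds run one after another here rather than on a gang, and the chunk split is an assumption.

```haskell
-- | Model of the two-stage reduction: a fold1 over each non-empty chunk,
-- then a single fold over the partial results that starts from the seed.
-- Requires threads > 0.
foldAllSketch :: Int -> (Int -> a) -> (a -> a -> a) -> a -> Int -> a
foldAllSketch threads get f z len = foldl f z partials
  where
    (size, extra) = len `quotRem` threads
    start t = t * size + min t extra
    partials =
      [ foldl1 f (map get [lo .. hi - 1])   -- seed is NOT used per chunk
      | t <- [0 .. threads - 1]
      , let lo = start t
      , let hi = start (t + 1)
      , lo < hi                              -- skip empty chunks
      ]
```

Summing the indices 0 to 9 on 4 threads with seed 100, `foldAllSketch 4 id (+) 100 10` gives `145`. Had each thread started from the seed, the non-neutral value 100 would have been counted once per chunk.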

Arguments

 :: Gang                  -- ^ Gang to run the operation on.
 -> (Int# -> a -> IO ())  -- ^ Function to write into the result buffer.
 -> (Int# -> a)           -- ^ Function to get an element from the source.
 -> (a -> a -> a)         -- ^ Binary associative combination operator.
 -> a                     -- ^ Neutral starting value.
 -> Int#                  -- ^ Total length of source.
 -> Int#                  -- ^ Inner dimension (length to fold over).
 -> IO ()

Parallel reduction of a multidimensional array along the innermost dimension. Each output value is computed by a single thread, with the output values distributed evenly amongst the available threads.
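A sequential model makes the indexing concrete. `foldInnerSketch` is a hypothetical name, `Int` replaces `Int#`, and the source is assumed to be stored row-major so that each run of `inner` consecutive elements collapses to one output value. In the parallel version each such output would be computed by one thread.

```haskell
-- | Fold each run of `inner` consecutive source elements to one output.
-- `total` is the source length. Requires inner > 0.
foldInnerSketch :: (Int -> a) -> (a -> a -> a) -> a -> Int -> Int -> [a]
foldInnerSketch get f z total inner =
  [ foldl f z (map get [base .. base + inner - 1])
  | row <- [0 .. total `quot` inner - 1]
  , let base = row * inner               -- first source index of this row
  ]
```

For a source of 6 elements holding their own indices and an inner dimension of 3, `foldInnerSketch id (+) 0 6 3` is `[3,12]`.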