repa-eval-4.2.3.1: Low-level parallel operators on bulk random-access arrays.

Safe Haskell: None
Language: Haskell98

Data.Repa.Eval.Generic.Par

Description

Generic parallel array computation operators.

Filling

fillChunked

Arguments

:: Gang

Gang to run the operation on.

-> (Int# -> a -> IO ())

Update function to write into result buffer.

-> (Int# -> a)

Function to get the value at a given index.

-> Int#

Number of elements.

-> IO () 

Fill a result buffer in parallel.

  • The array is split into linear chunks, and each thread linearly fills one chunk.
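
The chunking described above can be sketched as follows. This is a sequential model for illustration only, not the repa-eval implementation: it uses ordinary Int in place of the unboxed Int#, takes the thread count directly instead of a Gang, and the names chunkRanges and fillChunkedModel are hypothetical. It assumes the thread count is greater than zero and that leftover elements go to the lowest-numbered threads.

```haskell
-- Sketch only: a sequential model of chunked filling.
-- chunkRanges and fillChunkedModel are hypothetical names,
-- not part of repa-eval.

-- | Half-open index ranges [start, end), one per thread.
--   The first (n `rem` threads) threads get one extra element.
--   Assumes threads > 0.
chunkRanges :: Int -> Int -> [(Int, Int)]
chunkRanges threads n =
  [ (start t, start (t + 1)) | t <- [0 .. threads - 1] ]
  where
    (len, leftover) = n `quotRem` threads
    start t
      | t < leftover = t * (len + 1)
      | otherwise    = t * len + leftover

-- | Each "thread" walks its own range in increasing index order and
--   calls the update function once per index.
fillChunkedModel
  :: Int                  -- ^ number of threads
  -> (Int -> a -> IO ())  -- ^ update function
  -> (Int -> a)           -- ^ function to get the value at an index
  -> Int                  -- ^ number of elements
  -> IO ()
fillChunkedModel threads write get n =
  mapM_ fillRange (chunkRanges threads n)
  where
    fillRange (lo, hi) = mapM_ (\i -> write i (get i)) [lo .. hi - 1]
```

In GHCi, chunkRanges 3 10 evaluates to [(0,4),(4,7),(7,10)]: ten elements over three threads, with the single leftover element going to thread 0.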

fillChunkedIO

Arguments

:: Gang

Gang to run the operation on.

-> (Int# -> a -> IO ())

Update function to write into result buffer.

-> (Int# -> IO (Int# -> IO a))

Create a function to get the value at a given index. The first argument is the thread number, so you can do some per-thread initialisation.

-> Int#

Number of elements.

-> IO () 

Fill a result buffer in parallel, using a separate IO action for each thread.

  • The array is split into linear chunks, and each thread linearly fills one chunk.
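
The per-thread initialisation step can be illustrated with a sequential model. This is a sketch under assumptions, not the library code: it uses Int rather than Int#, a plain thread count rather than a Gang, and the names fillChunkedIOModel and whoFilledWhat are hypothetical. The point it shows is that the setup action runs once per thread, and the getter it returns is then used for every index in that thread's chunk.

```haskell
import Data.IORef (newIORef, modifyIORef', readIORef)

-- Sketch only: a sequential model, with ordinary Int in place of Int#.
-- fillChunkedIOModel and whoFilledWhat are hypothetical names.

-- | For each "thread", run the setup action once with the thread number
--   to obtain that thread's getter, then fill the thread's chunk.
--   Assumes threads > 0.
fillChunkedIOModel
  :: Int                        -- ^ number of threads
  -> (Int -> a -> IO ())        -- ^ update function
  -> (Int -> IO (Int -> IO a))  -- ^ per-thread setup, given the thread number
  -> Int                        -- ^ number of elements
  -> IO ()
fillChunkedIOModel threads write mkGet n = mapM_ runThread [0 .. threads - 1]
  where
    (len, leftover) = n `quotRem` threads
    start t
      | t < leftover = t * (len + 1)
      | otherwise    = t * len + leftover
    runThread t = do
      get <- mkGet t  -- per-thread initialisation, run once per thread
      mapM_ (\i -> get i >>= write i) [start t .. start (t + 1) - 1]

-- | Example: every element records the number of the thread that
--   produced it. Returns (index, thread) pairs in write order.
whoFilledWhat :: Int -> Int -> IO [(Int, Int)]
whoFilledWhat threads n = do
  out <- newIORef []
  fillChunkedIOModel threads
    (\i t -> modifyIORef' out ((i, t) :))
    (\t -> return (\_ -> return t))
    n
  reverse <$> readIORef out
```

In this model, whoFilledWhat 2 5 yields [(0,0),(1,0),(2,0),(3,1),(4,1)]: thread 0 fills indices 0 to 2 and thread 1 fills indices 3 and 4.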

fillBlock2

Arguments

:: Elt a 
=> Gang 
-> (Int# -> a -> IO ())

Update function to write into result buffer.

-> (Int# -> Int# -> a)

Function to evaluate the element at an (x, y) index.

-> Int#

Width of the whole array.

-> Int#

x0: x coordinate of the lower left corner of the block to fill.

-> Int#

y0: y coordinate of the lower left corner of the block to fill.

-> Int#

w0: width of the block to fill.

-> Int#

h0: height of the block to fill.

-> IO () 

Fill a block in a rank-2 array in parallel.

  • Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
  • Coordinates given are of the filled edges of the block.
  • We divide the block into columns, and give one column to each thread.
  • Each column is filled in row major order from top to bottom.
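
To make the column split concrete, the following sketch lists the linear buffer indices each thread would write, in write order. It is an illustrative model only: it assumes the element at (x, y) lives at linear index y * width + x, that the block is split into vertical strips by the same leftover rule as a chunked fill, and that the thread count is greater than zero. The name blockIndices is hypothetical and not part of repa-eval.

```haskell
-- Sketch only: which linear indices each thread writes for a block fill.
-- blockIndices is a hypothetical name, not part of repa-eval.

-- | For each thread, the linear indices it writes, in write order.
--   The block spans x0 .. x0 + w0 - 1 and y0 .. y0 + h0 - 1.
blockIndices
  :: Int  -- ^ number of threads (assumed > 0)
  -> Int  -- ^ width of the whole array
  -> Int  -- ^ x0
  -> Int  -- ^ y0
  -> Int  -- ^ w0
  -> Int  -- ^ h0
  -> [[Int]]
blockIndices threads width x0 y0 w0 h0 =
  [ [ y * width + x
    | y <- [y0 .. y0 + h0 - 1]   -- rows of the strip, in order
    , x <- [xlo .. xhi - 1] ]    -- row-major within the strip
  | t <- [0 .. threads - 1]
  , let xlo = x0 + start t
        xhi = x0 + start (t + 1) ]
  where
    (len, leftover) = w0 `quotRem` threads
    start t
      | t < leftover = t * (len + 1)
      | otherwise    = t * len + leftover
```

For a 4-wide array and a 2 x 2 block at (1, 1), blockIndices 2 4 1 1 2 2 gives [[5,9],[6,10]]: thread 0 fills column x = 1 and thread 1 fills column x = 2.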

fillInterleaved

Arguments

:: Gang

Gang to run the operation on.

-> (Int# -> a -> IO ())

Update function to write into result buffer.

-> (Int# -> a)

Function to get the value at a given index.

-> Int#

Number of elements.

-> IO () 

Fill a result buffer in parallel, using a round-robin order.

  • Threads handle elements in row major, round-robin order.
  • Using this method helps even out unbalanced workloads.
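
A small model shows why round-robin assignment helps with unbalanced workloads. The sketch below is illustrative only and uses hypothetical names (interleavedIndices, chunkedIndices, workPerThread); it assumes thread t handles indices t, t + threads, t + 2 * threads, and so on, and compares that with a contiguous split when the cost of an element grows with its index.

```haskell
-- Sketch only: compares index assignment strategies.
-- All names here are hypothetical, not part of repa-eval.

-- | Round-robin: thread t handles t, t + threads, t + 2 * threads, ...
interleavedIndices :: Int -> Int -> [[Int]]
interleavedIndices threads n =
  [ [t, t + threads .. n - 1] | t <- [0 .. threads - 1] ]

-- | Contiguous chunks, for comparison. Assumes threads > 0.
chunkedIndices :: Int -> Int -> [[Int]]
chunkedIndices threads n =
  [ [start t .. start (t + 1) - 1] | t <- [0 .. threads - 1] ]
  where
    (len, leftover) = n `quotRem` threads
    start t
      | t < leftover = t * (len + 1)
      | otherwise    = t * len + leftover

-- | Total cost assigned to each thread under a given cost function.
workPerThread :: (Int -> Int) -> [[Int]] -> [Int]
workPerThread cost = map (sum . map cost)
```

With a cost equal to the element index, workPerThread id (chunkedIndices 2 8) is [6,22], while workPerThread id (interleavedIndices 2 8) is [12,16]: the interleaved split leaves the two threads with much closer totals.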

fillCursoredBlock2

Arguments

:: Elt a 
=> Gang

Gang to run the operation on.

-> (Int# -> a -> IO ())

Update function to write into result buffer.

-> (Int# -> Int# -> cursor)

Make a cursor from an (x, y) index.

-> (Int# -> Int# -> cursor -> cursor)

Shift the cursor by an (x, y) offset.

-> (cursor -> a)

Function to evaluate the element at an index.

-> Int#

Width of the whole array.

-> Int#

x0: x coordinate of the lower left corner of the block to fill.

-> Int#

y0: y coordinate of the lower left corner of the block to fill.

-> Int#

w0: width of the block to fill.

-> Int#

h0: height of the block to fill.

-> IO () 

Fill a block in a rank-2 array in parallel.

  • Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
  • Using cursor functions can help to expose inter-element indexing computations to the GHC and LLVM optimisers.
  • Coordinates given are of the filled edges of the block.
  • We divide the block into columns, and give one column to each thread.
  • We need the Elt constraint so that we can use its touch function to provide an order of evaluation amenable to the LLVM optimiser. You should compile your Haskell program with -fllvm -optlo-O3 to enable LLVM's Global Value Numbering optimisation.
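
One possible shape for the three cursor functions is sketched below. It is an assumption made for illustration, not something prescribed by the library: the cursor is a linear index into a row-major source of known width, the names Cursor, makeCursor, shiftCursor and sum3 are hypothetical, and Int stands in for Int#. Partially applying the width gives functions whose argument order matches the make, shift and evaluate arguments listed above.

```haskell
-- Sketch only: a cursor represented as a linear index.
-- Cursor, makeCursor, shiftCursor and sum3 are hypothetical names.

newtype Cursor = Cursor Int
  deriving (Eq, Show)

-- | Make a cursor from an (x, y) index, given the array width.
makeCursor :: Int -> Int -> Int -> Cursor
makeCursor width x y = Cursor (y * width + x)

-- | Shift a cursor by an (x, y) offset. No multiplication by the
--   cursor's own coordinates is needed, only an addition.
shiftCursor :: Int -> Int -> Int -> Cursor -> Cursor
shiftCursor width dx dy (Cursor c) = Cursor (c + dy * width + dx)

-- | Evaluate an element: here, the sum of the source value under the
--   cursor and its left and right neighbours.
sum3 :: Int -> (Int -> Int) -> Cursor -> Int
sum3 width src cur = at (-1) + at 0 + at 1
  where
    at dx = let Cursor i = shiftCursor width dx 0 cur in src i
```

With a 4-wide source whose value at each linear index is the index itself, sum3 4 id (makeCursor 4 1 2) reads indices 8, 9 and 10 and returns 27. All three reads share the same base index, which is the kind of common indexing work the cursor style is meant to expose.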

Reduction

foldAll

Arguments

:: Gang

Gang to run the operation on.

-> (Int# -> a)

Function to get an element from the source.

-> (a -> a -> a)

Binary associative combining function.

-> a

Starting value.

-> Int#

Number of elements.

-> IO a 

Parallel tree reduction of an array to a single value. Each thread takes an equally sized chunk of the data and computes a partial result. The main thread then reduces the array of partial results to the final result.

We don't require that the initial value be a neutral element, so each thread computes a fold1 on its chunk of the data, and the seed element is only applied in the final reduction step.
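
The treatment of the starting value can be checked with a sequential model. This is a sketch, not the library implementation: foldAllModel is a hypothetical name, Int stands in for Int#, the thread count is assumed to be greater than zero, and empty chunks (which occur when there are fewer elements than threads) are simply skipped, which is an assumption of this sketch.

```haskell
-- Sketch only: a sequential model of the two-stage reduction.
-- foldAllModel is a hypothetical name, not part of repa-eval.

foldAllModel
  :: Int            -- ^ number of threads (assumed > 0)
  -> (Int -> a)     -- ^ function to get an element from the source
  -> (a -> a -> a)  -- ^ binary associative combining function
  -> a              -- ^ starting value, applied once at the end
  -> Int            -- ^ number of elements
  -> a
foldAllModel threads get f z n = foldl f z partials
  where
    -- Stage 1: each thread runs a fold1 over its own chunk, no seed.
    partials =
      [ foldl1 f (map get [lo .. hi - 1])
      | t <- [0 .. threads - 1]
      , let lo = start t
            hi = start (t + 1)
      , lo < hi ]  -- assumption: empty chunks contribute nothing
    (len, leftover) = n `quotRem` threads
    start t
      | t < leftover = t * (len + 1)
      | otherwise    = t * len + leftover
```

For example, foldAllModel 4 id (+) 100 10 is 145: the elements 0 to 9 sum to 45 and the seed 100 is added once. Had each of the four threads started from the seed, the result would have been 445.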

foldInner

Arguments

:: Gang

Gang to run the operation on.

-> (Int# -> a -> IO ())

Function to write into the result buffer.

-> (Int# -> a)

Function to get an element from the source.

-> (a -> a -> a)

Binary associative combination operator.

-> a

Neutral starting value.

-> Int#

Total length of source.

-> Int#

Inner dimension (length to fold over).

-> IO () 

Parallel reduction of a multidimensional array along the innermost dimension. Each output value is computed by a single thread, with the output values distributed evenly amongst the available threads.
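
The relationship between the total length, the inner dimension and the outputs can be sketched sequentially. The model below is illustrative only: foldInnerModel is a hypothetical name, Int stands in for Int#, the results are returned as a list rather than written through an update function, and the inner dimension is assumed to be greater than zero and to divide the total length.

```haskell
-- Sketch only: a sequential model of folding along the inner dimension.
-- foldInnerModel is a hypothetical name, not part of repa-eval.

foldInnerModel
  :: (Int -> a)     -- ^ function to get an element from the source
  -> (a -> a -> a)  -- ^ binary associative combination operator
  -> a              -- ^ neutral starting value
  -> Int            -- ^ total length of source
  -> Int            -- ^ inner dimension (assumed > 0)
  -> [a]
foldInnerModel get f z total inner =
  [ foldl f z (map get [row * inner .. (row + 1) * inner - 1])
  | row <- [0 .. total `quot` inner - 1] ]
```

Here foldInnerModel id (+) 0 6 3 gives [3,12]: a source of six elements with inner dimension 3 produces two outputs, one per run of three consecutive elements. Each output depends only on its own run, so the outputs can be divided among threads without any coordination between them.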