Safe Haskell | None |
---|---|
Language | Haskell98 |

Generic parallel array computation operators.

- fillChunked :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> Int# -> IO ()
- fillChunkedIO :: Gang -> (Int# -> a -> IO ()) -> (Int# -> IO (Int# -> IO a)) -> Int# -> IO ()
- fillBlock2 :: Elt a => Gang -> (Int# -> a -> IO ()) -> (Int# -> Int# -> a) -> Int# -> Int# -> Int# -> Int# -> Int# -> IO ()
- fillInterleaved :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> Int# -> IO ()
- fillCursoredBlock2 :: Elt a => Gang -> (Int# -> a -> IO ()) -> (Int# -> Int# -> cursor) -> (Int# -> Int# -> cursor -> cursor) -> (cursor -> a) -> Int# -> Int# -> Int# -> Int# -> Int# -> IO ()
- foldAll :: Gang -> (Int# -> a) -> (a -> a -> a) -> a -> Int# -> IO a
- foldInner :: Gang -> (Int# -> a -> IO ()) -> (Int# -> a) -> (a -> a -> a) -> a -> Int# -> Int# -> IO ()

# Filling

fillChunked

:: Gang | Gang to run the operation on. |

-> (Int# -> a -> IO ()) | Update function to write into result buffer. |

-> (Int# -> a) | Function to get the value at a given index. |

-> Int# | Number of elements. |

-> IO () |

Fill something in parallel.

- The array is split into linear chunks, and each thread linearly fills one chunk.
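The chunking scheme described above can be sketched with plain `Int` indices and `forkIO` threads. This is a minimal model, not the library's implementation: `chunkStart` and `fillChunkedSketch` are hypothetical names, a thread count stands in for the `Gang`, and the rule that the first chunks absorb any leftover elements is an assumption.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Foreign.Marshal.Array (allocaArray, peekArray)
import Foreign.Storable (pokeElemOff)

-- | Start index (inclusive) of the chunk owned by thread t when len elements
--   are split among nThreads threads (nThreads > 0 is assumed).
--   Assumption: the first (len `rem` nThreads) chunks get one extra element.
chunkStart :: Int -> Int -> Int -> Int
chunkStart nThreads len t = t * q + min t r
  where
    (q, r) = len `quotRem` nThreads

-- | Hypothetical stand-in for fillChunked: each thread walks its own
--   contiguous chunk and writes every element through the update function.
fillChunkedSketch :: Int -> (Int -> a -> IO ()) -> (Int -> a) -> Int -> IO ()
fillChunkedSketch nThreads write getElem len = do
  dones <- mapM spawn [0 .. nThreads - 1]
  mapM_ takeMVar dones            -- wait for every worker to finish
  where
    spawn t = do
      done <- newEmptyMVar
      _ <- forkIO $ do
        mapM_ (\i -> write i (getElem i))
              [chunkStart nThreads len t .. chunkStart nThreads len (t + 1) - 1]
        putMVar done ()
      return done

-- | Fill a 10-element buffer with squares using 3 worker threads.
squares :: IO [Int]
squares = allocaArray 10 $ \buf -> do
  fillChunkedSketch 3 (pokeElemOff buf) (\i -> i * i) 10
  peekArray 10 buf
```

With 10 elements and 3 threads, this sketch assigns indices 0 to 3, 4 to 6 and 7 to 9 to the three workers. Because the chunks are disjoint, the workers never write to the same slot.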

fillChunkedIO

:: Gang | Gang to run the operation on. |

-> (Int# -> a -> IO ()) | Update function to write into result buffer. |

-> (Int# -> IO (Int# -> IO a)) | Create a function to get the value at a given index. The first argument is the thread number, so you can do some per-thread initialisation. |

-> Int# | Number of elements. |

-> IO () |

Fill something in parallel, using a separate IO action for each thread.

- The array is split into linear chunks, and each thread linearly fills one chunk.
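The per-thread initialisation step can be illustrated with a small model built on `forkIO`. The names `chunkStart`, `fillChunkedIOSketch` and `taggedCounts` are hypothetical, plain `Int` replaces `Int#`, and a thread count replaces the `Gang`; the chunk boundaries are an assumption rather than the library's documented behaviour.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.Marshal.Array (allocaArray, peekArray)
import Foreign.Storable (pokeElemOff)

-- | Start of thread t's chunk (assumed splitting rule, nThreads > 0).
chunkStart :: Int -> Int -> Int -> Int
chunkStart nThreads len t = t * q + min t r
  where
    (q, r) = len `quotRem` nThreads

-- | Hypothetical stand-in for fillChunkedIO: every thread first runs the
--   setup action with its thread number, then uses the returned getter
--   for each index in its chunk.
fillChunkedIOSketch
  :: Int -> (Int -> a -> IO ()) -> (Int -> IO (Int -> IO a)) -> Int -> IO ()
fillChunkedIOSketch nThreads write mkGetElem len = do
  dones <- mapM spawn [0 .. nThreads - 1]
  mapM_ takeMVar dones
  where
    spawn t = do
      done <- newEmptyMVar
      _ <- forkIO $ do
        getElem <- mkGetElem t      -- per-thread initialisation happens once
        mapM_ (\i -> getElem i >>= write i)
              [chunkStart nThreads len t .. chunkStart nThreads len (t + 1) - 1]
        putMVar done ()
      return done

-- | Each thread owns a private counter, so no locking is needed.
--   Element value = thread number * 100 + elements already produced.
taggedCounts :: IO [Int]
taggedCounts = allocaArray 5 $ \buf -> do
  let mkGetElem t = do
        counter <- newIORef (0 :: Int)
        return $ \_ -> do
          k <- readIORef counter
          writeIORef counter (k + 1)
          return (t * 100 + k)
  fillChunkedIOSketch 2 (pokeElemOff buf) mkGetElem 5
  peekArray 5 buf
```

The private counter is the kind of state the thread-number argument makes possible: it is created once per thread and never shared.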

fillBlock2

:: Elt a | |

=> Gang | Gang to run the operation on. |

-> (Int# -> a -> IO ()) | Update function to write into result buffer. |

-> (Int# -> Int# -> a) | Function to evaluate the element at an (x, y) index. |

-> Int# | Width of the whole array. |

-> Int# | x0: x coordinate of the lower-left corner of the block to fill. |

-> Int# | y0: y coordinate of the lower-left corner of the block to fill. |

-> Int# | w0: width of the block to fill. |

-> Int# | h0: height of the block to fill. |

-> IO () |

Fill a block in a rank-2 array in parallel.

- Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
- Coordinates given are of the filled edges of the block.
- We divide the block into columns, and give one column to each thread.
- Each column is filled in row-major order from top to bottom.
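To make the column split concrete, the following pure sketch lists the linear buffer indices that one thread would write, in write order. It assumes the element at (x, y) is stored at index `x + y * width` and that leftover columns go to the first threads; `columnWrites` is a hypothetical helper, not part of the library.

```haskell
-- | Linear indices written by thread t when a w0-by-h0 block whose corner
--   is (x0, y0) is divided into vertical columns, one per thread.
--   Assumptions: row-major layout (index = x + y * imageWidth) and
--   nThreads > 0.
columnWrites
  :: Int   -- ^ number of threads
  -> Int   -- ^ width of the whole array
  -> Int   -- ^ x0
  -> Int   -- ^ y0
  -> Int   -- ^ w0
  -> Int   -- ^ h0
  -> Int   -- ^ thread number
  -> [Int]
columnWrites nThreads imageWidth x0 y0 w0 h0 t =
  [ x + y * imageWidth
  | y <- [y0 .. y0 + h0 - 1]                              -- one row at a time
  , x <- [x0 + colStart t .. x0 + colStart (t + 1) - 1]   -- within the column
  ]
  where
    (q, r) = w0 `quotRem` nThreads
    colStart i = i * q + min i r
```

For a 3-by-2 block at (1, 1) in an array of width 5, split between two threads, `columnWrites 2 5 1 1 3 2 0` gives `[6, 7, 11, 12]` and `columnWrites 2 5 1 1 3 2 1` gives `[8, 13]`.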

fillInterleaved

:: Gang | Gang to run the operation on. |

-> (Int# -> a -> IO ()) | Update function to write into result buffer. |

-> (Int# -> a) | Function to get the value at a given index. |

-> Int# | Number of elements. |

-> IO () |

Fill something in parallel, using a round-robin order.

- Threads handle elements in row-major, round-robin order.
- Using this method helps even out unbalanced workloads.
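A small pure model shows why a round-robin schedule can balance uneven work better than contiguous chunks. The helper names are hypothetical, and the cost model (element `i` costs `i` units) is invented purely for illustration.

```haskell
-- | Indices handled by thread t under a round-robin schedule:
--   t, t + nThreads, t + 2 * nThreads, ...
interleavedIndices :: Int -> Int -> Int -> [Int]
interleavedIndices nThreads len t = [t, t + nThreads .. len - 1]

-- | Indices handled by thread t under a contiguous-chunk schedule
--   (assumed splitting rule, nThreads > 0).
chunkedIndices :: Int -> Int -> Int -> [Int]
chunkedIndices nThreads len t = [start t .. start (t + 1) - 1]
  where
    (q, r) = len `quotRem` nThreads
    start i = i * q + min i r

-- | Total cost assigned to each thread under a given schedule.
workPerThread
  :: (Int -> Int -> Int -> [Int]) -> Int -> Int -> (Int -> Int) -> [Int]
workPerThread schedule nThreads len cost =
  [ sum (map cost (schedule nThreads len t)) | t <- [0 .. nThreads - 1] ]
```

With 8 elements, 2 threads and cost `id`, `workPerThread chunkedIndices 2 8 id` is `[6, 22]`, while `workPerThread interleavedIndices 2 8 id` is `[12, 16]`.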

fillCursoredBlock2

:: Elt a | |

=> Gang | Gang to run the operation on. |

-> (Int# -> a -> IO ()) | Update function to write into result buffer. |

-> (Int# -> Int# -> cursor) | Make a cursor from an (x, y) index. |

-> (Int# -> Int# -> cursor -> cursor) | Shift the cursor by an (x, y) offset. |

-> (cursor -> a) | Function to evaluate the element at an index. |

-> Int# | Width of the whole array. |

-> Int# | x0: x coordinate of the lower-left corner of the block to fill. |

-> Int# | y0: y coordinate of the lower-left corner of the block to fill. |

-> Int# | w0: width of the block to fill. |

-> Int# | h0: height of the block to fill. |

-> IO () |

Fill a block in a rank-2 array in parallel.

- Blockwise filling can be more cache-efficient than linear filling for rank-2 arrays.
- Using cursor functions can help to expose inter-element indexing computations to the GHC and LLVM optimisers.
- Coordinates given are of the filled edges of the block.
- We divide the block into columns, and give one column to each thread.
- We need the `Elt` constraint so that we can use its `touch` function to provide an order of evaluation amenable to the LLVM optimiser. You should compile your Haskell program with `-fllvm -optlo-O3` to enable LLVM's Global Value Numbering optimisation.
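The three cursor functions can be pictured with a cursor that is simply a linear index into a row-major source. This is a sequential sketch under that assumption: the `Cursor` type and the helpers below are hypothetical, and a list stands in for the source array.

```haskell
-- | Hypothetical cursor: a linear index into a row-major source.
newtype Cursor = Cursor Int

-- | Make a cursor from an (x, y) index, given the array width.
makeCursor :: Int -> Int -> Int -> Cursor
makeCursor width x y = Cursor (x + y * width)

-- | Shift a cursor by an (x, y) offset. Only an addition is needed,
--   so neighbouring elements share most of the index arithmetic.
shiftCursor :: Int -> Int -> Int -> Cursor -> Cursor
shiftCursor width dx dy (Cursor i) = Cursor (i + dx + dy * width)

-- | Evaluate the element a cursor points at.
loadElem :: [a] -> Cursor -> a
loadElem src (Cursor i) = src !! i

-- | Evaluate a w0-by-h0 block at (x0, y0): one cursor is made per row,
--   and the remaining elements of the row are reached by shifting it.
evalBlock :: Int -> [a] -> Int -> Int -> Int -> Int -> [a]
evalBlock width src x0 y0 w0 h0 =
  [ loadElem src (shiftCursor width dx 0 rowStart)
  | y <- [y0 .. y0 + h0 - 1]
  , let rowStart = makeCursor width x0 y
  , dx <- [0 .. w0 - 1]
  ]
```

For a 4-wide source `[0 .. 11]`, `evalBlock 4 [0 .. 11] 1 1 2 2` yields `[5, 6, 9, 10]`.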

# Reduction

foldAll

:: Gang | Gang to run the operation on. |

-> (Int# -> a) | Function to get an element from the source. |

-> (a -> a -> a) | Binary associative combining function. |

-> a | Starting value. |

-> Int# | Number of elements. |

-> IO a |

Parallel tree reduction of an array to a single value. Each thread takes an equally sized chunk of the data and computes a partial sum. The main thread then reduces the array of partial sums to the final result.

We don't require that the initial value be a neutral element, so each thread computes a fold1 on its chunk of the data, and the seed element is only applied in the final reduction step.
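The following sequential model mirrors the two-stage scheme described above: a `fold1` per chunk, then a final fold that applies the seed exactly once. `chunks` and `foldAllSketch` are hypothetical names, lists stand in for the source array, and skipping empty chunks is an assumption made so that the per-chunk `foldl1` is always defined.

```haskell
-- | Split a list into n contiguous chunks (n > 0 assumed); the first
--   (length xs `rem` n) chunks get one extra element.
chunks :: Int -> [a] -> [[a]]
chunks n xs = go 0 xs
  where
    (q, r) = length xs `quotRem` n
    go t rest
      | t >= n    = []
      | otherwise = c : go (t + 1) rest'
      where
        (c, rest') = splitAt (q + fromEnum (t < r)) rest

-- | Sequential model of the reduction: each "thread" computes a fold1 of
--   its chunk, and the seed z enters only in the final combining step.
foldAllSketch :: Int -> (a -> a -> a) -> a -> [a] -> a
foldAllSketch nThreads f z xs = foldl f z partials
  where
    -- Empty chunks (more threads than elements) contribute nothing.
    partials = [ foldl1 f c | c <- chunks nThreads xs, not (null c) ]
```

With a non-neutral seed, `foldAllSketch 3 (+) 100 [1 .. 10]` is 155. Had each of the three chunks started from the seed, the result would have been 355.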

foldInner

:: Gang | Gang to run the operation on. |

-> (Int# -> a -> IO ()) | Function to write into the result buffer. |

-> (Int# -> a) | Function to get an element from the source. |

-> (a -> a -> a) | Binary associative combination operator. |

-> a | Neutral starting value. |

-> Int# | Total length of source. |

-> Int# | Inner dimension (length to fold over). |

-> IO () |

Parallel reduction of a multidimensional array along the innermost dimension. Each output value is computed by a single thread, with the output values distributed evenly amongst the available threads.
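As a sequential sketch of the indexing involved, each output position `o` folds over the source range that starts at `o * inner` and has length `inner`. `foldInnerSketch` is a hypothetical helper that returns the outputs as a list instead of writing them through an update function, and it ignores how outputs are distributed among threads.

```haskell
-- | Sequential model of an inner-dimension fold. Output o combines source
--   elements o * inner .. o * inner + inner - 1, starting from the seed z.
foldInnerSketch
  :: (a -> a -> a)  -- ^ combining operator
  -> a              -- ^ neutral starting value
  -> (Int -> a)     -- ^ function to get an element from the source
  -> Int            -- ^ total length of source
  -> Int            -- ^ inner dimension (length to fold over)
  -> [a]
foldInnerSketch f z getElem total inner
  | inner <= 0 = []   -- nothing to fold over; also avoids division by zero
  | otherwise  =
      [ foldl f z [ getElem (o * inner + j) | j <- [0 .. inner - 1] ]
      | o <- [0 .. total `quot` inner - 1]
      ]

-- | Row sums of a 2-by-3 array stored in row-major order.
rowSums :: [Int]
rowSums = foldInnerSketch (+) 0 (\i -> src !! i) 6 3
  where
    src = [1, 2, 3, 4, 5, 6]
```

Here `rowSums` evaluates to `[6, 15]`. Since each output depends only on its own slice of the source, the outputs can be handed to different threads without any coordination.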