futhark-0.25.24: An optimising compiler for a functional, array-oriented language.
Safe HaskellSafe-Inferred
LanguageGHC2021

Futhark.Optimise.DoubleBuffer

Description

The simplification engine is only willing to hoist allocations out of loops if the memory block resulting from the allocation is dead at the end of the loop. If it is not, we may cause data hazards.

This pass tries to rewrite loops with memory parameters. Specifically, it takes loops of this form:

loop {..., A_mem, ..., A, ...} ... do {
  ...
  let A_out_mem = alloc(...) -- stores A_out
  in {..., A_out_mem, ..., A_out, ...}
}

and turns them into

let A_in_mem = alloc(...)
let A_out_mem = alloc(...)
let A_in = copy A -- in A_in_mem
loop {..., A_in_mem, A_out_mem, ..., A=A_in, ...} ... do {
  ...
  in {..., A_out_mem, A_mem, ..., A_out, ...}
}

The result is essentially "pointer swapping" between the two memory initial blocks A_mem and A_out_mem. The invariant is that the array is always stored in the "first" memory block at the beginning of the loop (and also in the final result). We do need to add an extra element to the pattern, however. The initial copy of A could be elided if A is unique (thus A_in_mem=A_mem). This is because only then is it safe to use A_mem to store loop results. We don't currently do this.

Unfortunately, not all loops fit the pattern above. In particular, a nested loop that has been transformed as such does not! Therefore we also have another double buffering strategy, that turns

loop {..., A_mem, ..., A, ...} ... do {
  ...
  let A_out_mem = alloc(...)
  -- A in A_out_mem
  in {..., A_out_mem, ..., A, ...}
}

into

let A_res_mem = alloc(...)
loop {..., A_mem, ..., A, ...} ... do {
  ...
  let A_out_mem = alloc(...)
  -- A in A_out_mem
  let A' = copy A
  -- A' in A_res_mem
  in {..., A_res_mem, ..., A, ...}
}

The allocation of A_out_mem can then be hoisted out because it is dead at the end of the loop. This always works as long as A_out_mem has a loop-invariant allocation size, but requires a copy per iteration (and an initial one, elided above).

Synopsis

Documentation

doubleBufferGPU :: Pass GPUMem GPUMem Source #

The pass for GPU kernels.

doubleBufferMC :: Pass MCMem MCMem Source #

The pass for multicore