futhark-0.20.8: An optimising compiler for a functional, array-oriented language.
Safe HaskellNone
LanguageHaskell2010

Futhark.CodeGen.ImpGen.GPU.SegHist

Description

Our compilation strategy for SegHist is based around avoiding bin conflicts. We do this by splitting the input into chunks, and for each chunk computing a single subhistogram. Then we combine the subhistograms using an ordinary segmented reduction (SegRed).

There are some branches around to efficiently handle the case where we use only a single subhistogram (because it's large), so that we respect the asymptotics, and do not copy the destination array.

We also use a heuristic strategy for computing subhistograms in local memory when possible. Given:

H: total size of histograms in bytes, including any lock arrays.

G: group size

T: number of bytes of local memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).

LMAX: maximum amount of local memory per workgroup (hard limit).

We wish to compute:

COOP: cooperation level (number of threads per subhistogram)

LH: number of local memory subhistograms

We do this as:

COOP = ceil(H / T) LH = ceil((G*T)/H) if COOP <= G && H <= LMAX then use local memory else use global memory

Synopsis

Documentation

compileSegHist :: Pat GPUMem -> Count NumGroups SubExp -> Count GroupSize SubExp -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen () Source #

Generate code for a segmented histogram called from the host.