futhark-0.16.3: An optimising compiler for a functional, array-oriented language.
Safe HaskellNone



Our compilation strategy for SegHist is based around avoiding bin conflicts. We do this by splitting the input into chunks, and for each chunk computing a single subhistogram. Then we combine the subhistograms using an ordinary segmented reduction (SegRed).

There are some branches around to efficiently handle the case where we use only a single subhistogram (because it's large), so that we respect the asymptotics, and do not copy the destination array.

We also use a heuristic strategy for computing subhistograms in local memory when possible. Given:

H: total size of histograms in bytes, including any lock arrays.

G: group size

T: number of bytes of local memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).

LMAX: maximum amount of local memory per workgroup (hard limit).

We wish to compute:

COOP: cooperation level (number of threads per subhistogram)

LH: number of local memory subhistograms

We do this as:

COOP = ceil(H / T) LH = ceil((G*T)/H) if COOP <= G && H <= LMAX then use local memory else use global memory



compileSegHist :: Pattern KernelsMem -> Count NumGroups SubExp -> Count GroupSize SubExp -> SegSpace -> [HistOp KernelsMem] -> KernelBody KernelsMem -> CallKernelGen () Source #

Generate code for a segmented histogram called from the host.