| Safe Haskell | Safe-Inferred | 
|---|---|
| Language | GHC2021 | 
Futhark.CodeGen.ImpGen.GPU.SegHist
Description
Our compilation strategy for SegHist is based around avoiding
bin conflicts. We do this by splitting the input into chunks, and
for each chunk computing a single subhistogram. Then we combine
the subhistograms using an ordinary segmented reduction (SegRed).
A few special-case branches efficiently handle the case where only a single subhistogram is used (because the histogram is large), so that we respect the asymptotics and do not copy the destination array.
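The chunk-then-combine strategy described above can be modelled sequentially as a minimal sketch. This is not the code the compiler generates: the names `chunksOf`, `subhistogram`, and `histogramByChunks` are hypothetical, the combining operator is assumed to be integer addition, and out-of-range bin indices are simply ignored in this model.

```haskell
import Data.List (foldl')

-- Hypothetical sequential model; not part of the Futhark API.

-- | Split a list into chunks of at most n elements (n must be positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (c, rest) = splitAt n xs in c : chunksOf n rest

-- | Subhistogram of one chunk of bin indices, with numBins bins.
-- Indices outside [0, numBins) match no bin and are ignored.
subhistogram :: Int -> [Int] -> [Int]
subhistogram numBins chunk =
  [length (filter (== b) chunk) | b <- [0 .. numBins - 1]]

-- | Compute one subhistogram per chunk, then combine them bin by bin.
-- The final fold plays the role of the reduction over subhistograms.
histogramByChunks :: Int -> Int -> [Int] -> [Int]
histogramByChunks chunkSize numBins =
  foldl' (zipWith (+)) (replicate numBins 0)
    . map (subhistogram numBins)
    . chunksOf chunkSize
```

For example, `histogramByChunks 2 3 [0, 1, 1, 2, 1, 0]` evaluates to `[2, 3, 1]`. Because each chunk updates only its own subhistogram, no two chunks ever write to the same bin, which is the point of the strategy.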
We also use a heuristic strategy for computing subhistograms in shared memory when possible. Given:
- H: total size of histograms in bytes, including any lock arrays.
- G: block size (number of threads per block).
- T: number of bytes of shared memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).
- LMAX: maximum amount of shared memory per thread block (hard limit).
We wish to compute:
- COOP: cooperation level (number of threads per subhistogram)
- LH: number of shared memory subhistograms
We do this as:

    COOP = ceil(H / T)
    LH = ceil((G*T)/H)
    if COOP <= G && H <= LMAX then
      use shared memory
    else
      use global memory
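The heuristic above can be written out as a small pure function. This is a minimal sketch under stated assumptions: the type `HistPlacement` and the functions `ceilDiv` and `choosePlacement` are hypothetical names introduced for illustration, all quantities are assumed to be positive integers, and the actual code generator works on symbolic expressions rather than concrete values.

```haskell
-- Hypothetical illustration of the heuristic; not part of the Futhark API.

-- | Where the subhistograms are placed.
data HistPlacement
  = SharedMemory Int Int -- ^ COOP (threads per subhistogram) and LH
  | GlobalMemory
  deriving (Eq, Show)

-- | Ceiling of integer division, assuming both operands are positive.
ceilDiv :: Int -> Int -> Int
ceilDiv x y = (x + y - 1) `div` y

-- | Arguments, in order: H (histogram bytes), G (block size),
-- T (shared memory bytes per thread), LMAX (shared memory limit).
choosePlacement :: Int -> Int -> Int -> Int -> HistPlacement
choosePlacement h g t lmax
  | coop <= g && h <= lmax = SharedMemory coop lh
  | otherwise = GlobalMemory
  where
    coop = h `ceilDiv` t -- COOP = ceil(H / T)
    lh = (g * t) `ceilDiv` h -- LH = ceil((G*T)/H)
```

With H = 4096, G = 256, T = 32, and LMAX = 49152, `choosePlacement 4096 256 32 49152` evaluates to `SharedMemory 128 2`: 128 threads cooperate on each of 2 subhistograms. Raising H to 16384 gives COOP = 512, which exceeds G, so the result is `GlobalMemory`.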
Synopsis
- compileSegHist :: Pat LetDecMem -> SegLevel -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen ()
Documentation
compileSegHist :: Pat LetDecMem -> SegLevel -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen ()
Generate code for a segmented histogram called from the host.