Safe Haskell | Safe-Inferred |
---|---|
Language | GHC2021 |
Our compilation strategy for SegHist is based around avoiding bin conflicts. We do this by splitting the input into chunks and computing a single subhistogram for each chunk. Then we combine the subhistograms using an ordinary segmented reduction (SegRed).
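The chunk-and-combine idea can be illustrated with a small sequential model. This is only a sketch on plain lists, not the code that `compileSegHist` generates; the names `chunksOf`, `bump`, `subHistogram`, `combine` and `histogram` are hypothetical, and bins are assumed to count occurrences with addition as the combining operator.

```haskell
import Data.List (foldl')

-- Split a list into chunks of at most n elements (n is assumed positive).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- Increment the bin at index b; out-of-range indices leave the histogram unchanged.
bump :: [Int] -> Int -> [Int]
bump hist b = [if i == b then c + 1 else c | (i, c) <- zip [0 ..] hist]

-- Build a private subhistogram for one chunk. Nothing else writes to it,
-- so there are no bin conflicts between chunks.
subHistogram :: Int -> [Int] -> [Int]
subHistogram bins = foldl' bump (replicate bins 0)

-- Combine the subhistograms bin by bin; this stands in for the reduction step.
combine :: Int -> [[Int]] -> [Int]
combine bins = foldl' (zipWith (+)) (replicate bins 0)

-- Full pipeline: chunk the input, build one subhistogram per chunk, combine.
histogram :: Int -> Int -> [Int] -> [Int]
histogram bins chunkSize =
  combine bins . map (subHistogram bins) . chunksOf chunkSize
```

In GHCi, `histogram 4 3 [0, 1, 1, 3, 3, 3, 2, 0]` evaluates to `[2,2,1,3]`. The result does not depend on the chunk size, which is what allows the chunking to be chosen purely for performance.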
Some extra branches efficiently handle the case where we use only a single subhistogram (because it is large), so that we respect the asymptotics and do not copy the destination array.
We also use a heuristic strategy for computing subhistograms in shared memory when possible. Given:
- H: total size of histograms in bytes, including any lock arrays.
- G: block size (number of threads per threadblock).
- T: number of bytes of shared memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).
- LMAX: maximum amount of shared memory per threadblock (hard limit).
We wish to compute:
- COOP: cooperation level (number of threads per subhistogram)
- LH: number of shared memory subhistograms
We do this as:
    COOP = ceil(H / T)
    LH   = ceil((G*T)/H)
    if COOP <= G && H <= LMAX then
      use shared memory
    else
      use global memory
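As a rough illustration, the heuristic can be written as a pure function over the four inputs. This is a sketch under stated assumptions, not the code used in this module: the `Placement` type and the names `ceilDiv` and `choosePlacement` are hypothetical, and all inputs are assumed to be positive integers.

```haskell
-- Where the subhistograms are kept (hypothetical type for this sketch).
data Placement
  = SharedMemory Int Int -- COOP and LH, in that order
  | GlobalMemory
  deriving (Eq, Show)

-- Ceiling division; both arguments are assumed positive.
ceilDiv :: Int -> Int -> Int
ceilDiv a b = (a + b - 1) `div` b

-- Arguments: H (histogram bytes), G (block size), T (bytes per thread),
-- LMAX (shared memory limit per threadblock).
choosePlacement :: Int -> Int -> Int -> Int -> Placement
choosePlacement h g t lmax
  | coop <= g && h <= lmax = SharedMemory coop lh
  | otherwise = GlobalMemory
  where
    coop = h `ceilDiv` t -- COOP = ceil(H / T)
    lh = (g * t) `ceilDiv` h -- LH = ceil((G*T)/H)
```

For example, with H = 4096, G = 256, T = 32 and LMAX = 49152 (illustrative values only), `choosePlacement 4096 256 32 49152` gives `SharedMemory 128 2`. Raising H to 16384 makes COOP = 512, which exceeds G, so the result is `GlobalMemory`.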
Synopsis
- compileSegHist :: Pat LetDecMem -> SegLevel -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen ()
Documentation
compileSegHist :: Pat LetDecMem -> SegLevel -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen ()
Generate code for a segmented histogram called from the host.