Safe Haskell | None |
---|---|
Language | Haskell2010 |
Our compilation strategy for SegHist
is based around avoiding
bin conflicts. We do this by splitting the input into chunks, and
for each chunk computing a single subhistogram. Then we combine
the subhistograms using an ordinary segmented reduction (SegRed
).
There are some branches around to efficiently handle the case where we use only a single subhistogram (because it's large), so that we respect the asymptotics, and do not copy the destination array.
We also use a heuristic strategy for computing subhistograms in local memory when possible. Given:
H: total size of histograms in bytes, including any lock arrays.
G: group size
T: number of bytes of local memory each thread can be given without impacting occupancy (determined experimentally, e.g. 32).
LMAX: maximum amount of local memory per workgroup (hard limit).
We wish to compute:
COOP: cooperation level (number of threads per subhistogram)
LH: number of local memory subhistograms
We do this as:
COOP = ceil(H / T) LH = ceil((G*T)/H) if COOP <= G && H <= LMAX then use local memory else use global memory
Synopsis
- compileSegHist :: Pat GPUMem -> Count NumGroups SubExp -> Count GroupSize SubExp -> SegSpace -> [HistOp GPUMem] -> KernelBody GPUMem -> CallKernelGen ()