futhark-0.21.10: An optimising compiler for a functional, array-oriented language.
Safe Haskell: None
Language: Haskell2010

Futhark.Optimise.BlkRegTiling

Description

Perform a restricted form of block+register tiling corresponding to the following pattern:

* a redomap is quasi-perfectly nested inside a kernel with at least two parallel dimensions (the perfectly nested restriction is relaxed a bit to allow for SGEMM);
* all streamed arrays of the redomap are one-dimensional;
* all streamed arrays are variant to exactly one of the two innermost parallel dimensions, and conversely, for each of the two innermost parallel dimensions, there is at least one streamed array variant to it;
* the stream's result is a tuple of scalar values, which are also the "thread-in-space" return of the kernel.

We have further restrictions that in principle can be relaxed:

* the redomap has exactly two array inputs;
* the redomap produces one scalar result;
* the kernel produces one scalar result.
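The pattern described above is the shape of a dense matrix multiplication. The following is a minimal list-based sketch of that shape in plain Haskell, not Futhark IR and not code from this module. The names `redomap` and `matmul` are illustrative stand-ins: the two list comprehensions play the role of the two innermost parallel dimensions, and each streamed array (`row`, `col`) is variant to exactly one of them.

```haskell
import Data.List (foldl', transpose)

-- Illustrative stand-in for a redomap: a map fused with a reduction.
-- This is a list model only; it is not the Futhark IR construct.
redomap :: (acc -> acc -> acc) -> acc -> (a -> b -> acc) -> [a] -> [b] -> acc
redomap op ne f xs ys = foldl' op ne (zipWith f xs ys)

-- Two nested "parallel dimensions" (over rows of a and columns of b)
-- with a redomap inside. 'row' varies only with the outer dimension,
-- 'col' only with the inner one, and the result per point is one scalar.
matmul :: [[Int]] -> [[Int]] -> [[Int]]
matmul a b =
  [ [ redomap (+) 0 (*) row col | col <- cols ] | row <- a ]
  where
    cols = transpose b
```

For example, `matmul [[1,2],[3,4]] [[5,6],[7,8]]` evaluates to `[[19,22],[43,50]]`. The sketch only shows the computation shape that the pass looks for; it does not perform any tiling itself.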


Documentation

doRegTiling3D :: Stm GPU -> TileM (Maybe (Stms GPU, Stm GPU))

Expects a kernel statement as argument. CONDITIONS for the 3D tiling optimization to fire are:

1. a) The kernel body can be broken into scalar-code-1 ++ [Redomap stmt] ++ scalar-code-2.
   b) The kernel has a per-thread result, and the result is variant to the 3rd dimension (counted from innermost to outermost).
2. For the Redomap:
   a) the streamed arrays are one-dimensional;
   b) each of the array arguments of the Redomap is variant to exactly one of the three innermost parallel dimensions of the kernel. This condition can be relaxed by interchanging kernel dimensions whenever possible.
3. For scalar-code-1:
   a) each of the statements is a slice that produces one of the streamed arrays.
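Conditions 1a and 3a can be illustrated with a small sketch over a toy statement type. Everything here is hypothetical: `Stmt`, `splitAtRedomap`, `code1Ok`, and `matchesShape` are invented for illustration and are not part of Futhark, whose real check operates on `Stm GPU`. The sketch also assumes that the body contains exactly one redomap, which is one reading of the decomposition above.

```haskell
-- Hypothetical, simplified statement type; NOT Futhark's 'Stm GPU'.
data Stmt
  = Slice String      -- a slice producing the named array
  | Redomap [String]  -- a redomap streaming over the named arrays
  | Scalar String     -- any other scalar statement
  deriving (Eq, Show)

isRedomap :: Stmt -> Bool
isRedomap (Redomap _) = True
isRedomap _ = False

-- Condition 1a: body == code1 ++ [redomap] ++ code2.
-- Assumption: code2 must not contain a second redomap.
splitAtRedomap :: [Stmt] -> Maybe ([Stmt], Stmt, [Stmt])
splitAtRedomap body =
  case break isRedomap body of
    (code1, r : code2)
      | not (any isRedomap code2) -> Just (code1, r, code2)
    _ -> Nothing

-- Condition 3a: every statement before the redomap is a slice that
-- produces one of the redomap's streamed arrays.
code1Ok :: [Stmt] -> Stmt -> Bool
code1Ok code1 (Redomap arrs) = all producesInput code1
  where
    producesInput (Slice name) = name `elem` arrs
    producesInput _ = False
code1Ok _ _ = False

-- Combine both checks into a single shape test.
matchesShape :: [Stmt] -> Bool
matchesShape body =
  case splitAtRedomap body of
    Just (code1, r, _) -> code1Ok code1 r
    Nothing -> False
```

Under these assumptions, a body such as `[Slice "a", Slice "b", Redomap ["a", "b"], Scalar "r"]` passes the check, while a body with a non-slice statement before the redomap does not. The variance conditions (1b and 2b) are not modeled here.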

mmBlkRegTiling :: Stm GPU -> TileM (Maybe (Stms GPU, Stm GPU))

mmBlkRegTiling (Let pat aux (Op (SegOp (SegMap SegThread{} seg_space ts old_kbody))))