Distribution of Segment Descriptors
- mkDUSegdD :: Dist (Vector Int) -> Dist (Vector Int) -> Dist Int -> Dist USegd
- lengthD :: Dist USegd -> Dist Int
- takeLengthsD :: Dist USegd -> Dist (Vector Int)
- takeIndicesD :: Dist USegd -> Dist (Vector Int)
- takeElementsD :: Dist USegd -> Dist Int
- splitSegdOnSegsD :: Gang -> USegd -> Dist USegd
- splitSegdOnElemsD :: Gang -> USegd -> Dist ((USegd, Int), Int)
- splitSD :: Unbox a => Gang -> Dist USegd -> Vector a -> Dist (Vector a)
- joinSegdD :: Gang -> Dist USegd -> USegd
- glueSegdD :: Gang -> Dist ((USegd, Int), Int) -> Dist USegd
mkDUSegdD
  :: Dist (Vector Int)
  -> Dist (Vector Int)
  -> Dist Int             -- number of elements in each chunk
  -> Dist USegd
O(1). Construct a distributed segment descriptor.
O(1). Yield the lengths of the individual segments.
O(1). Yield the segment indices of a segment descriptor.
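As a rough illustration of the three fields these accessors expose (lengths, indices, and element count), here is a minimal list-based model. The names `Segd` and `lengthsToSegd` are hypothetical stand-ins for illustration only; the real `USegd` is built on unboxed vectors, and this sketch makes no claim about its internals.

```haskell
-- Hypothetical list-based model of a segment descriptor.
-- This is NOT the library's USegd type, only a sketch of its fields.
data Segd = Segd
  { segdLengths  :: [Int]  -- length of each segment
  , segdIndices  :: [Int]  -- starting index of each segment in the flat array
  , segdElements :: Int    -- total number of elements over all segments
  } deriving (Eq, Show)

-- | Build a descriptor from segment lengths.
--   The indices are the exclusive prefix sums of the lengths.
lengthsToSegd :: [Int] -> Segd
lengthsToSegd lens = Segd lens (init (scanl (+) 0 lens)) (sum lens)
```

For the lengths `[60, 10, 20, 40, 50]` the model gives indices `[0, 60, 70, 90, 130]` and 180 elements.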
Split a segment descriptor across the gang, segment-wise. Whole segments are placed on each thread, and we try to balance the segments so that each thread has the same number of array elements.
We don't split segments across threads, as this would limit our ability to perform intra-thread fusion of lifted operations. The downside is that if we have few segments with an uneven size distribution, then large segments can cause the gang to become unbalanced.
In the following example the segment with size 100 dominates and unbalances the gang. There is no reason to put any segments on the last thread because we need to wait for the first to finish anyway.
  > pprp $ splitSegdOnSegsD theGang $ lengthsToUSegd $ fromList [100, 10, 20, 40, 50 :: Int]
  DUSegd lengths:   DVector lengths: [1,3,1,0]
                            chunks:  [[100],[10,20,40],[50],[]]
         indices:   DVector lengths: [1,3,1,0]
                            chunks:  [[0],[0,10,30],[0],[]]
         elements:  DInt [100,70,50,0]
NOTE: This splitSegdOnSegsD function isn't currently used.
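The segment-wise policy described above can be sketched on plain lists. This is a hedged model of one greedy strategy that happens to reproduce the example's distribution; it is not taken from the library. It assumes each thread gets an even element budget (remainder to the first threads) and keeps taking whole segments while that budget is still positive. The names `budgets`, `takeSegs`, and `splitOnSegs` are hypothetical.

```haskell
-- Hypothetical list-based model; not the DPH implementation.

-- | Even share of `total` elements for each of `n` threads (n >= 1),
--   with any remainder given to the first threads (an assumption).
budgets :: Int -> Int -> [Int]
budgets n total =
  [ total `div` n + (if i < total `mod` n then 1 else 0) | i <- [0 .. n - 1] ]

-- | Take whole segments while the remaining budget is positive.
--   Zero-length segments are always taken, since they cost nothing.
--   Returns the segments taken and the segments left over.
takeSegs :: Int -> [Int] -> ([Int], [Int])
takeSegs k (m : ms)
  | k > 0 || m == 0 =
      let (mine, rest) = takeSegs (k - m) ms
      in  (m : mine, rest)
takeSegs _ ms = ([], ms)

-- | Distribute whole segments over `n` threads, never splitting a segment.
splitOnSegs :: Int -> [Int] -> [[Int]]
splitOnSegs n lens = go (budgets n (sum lens)) lens
  where
    go []       _    = []
    go (b : bs) segs =
      let (mine, rest) = takeSegs b segs
      in  mine : go bs rest
```

With four threads and lengths `[100, 10, 20, 40, 50]`, each budget is 55. The first thread overshoots its budget with the 100-element segment, and the last thread is left with nothing, which is the imbalance the text describes.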
Split a segment descriptor across the gang, element-wise. We try to put the same number of elements on each thread, which means that segments are sometimes split across threads.
Each thread gets a slice of the segment descriptor, the segment id (segid) of its first slice, and the offset of that first slice within its segment.
Example: In this picture each X represents 5 elements, and we have 5 segments in total.
  segs:    ----------------------- --- ------- --------------- -------------------
  elems:  |X X X X X X X X X|X X X X X X X X X|X X X X X X X X X|X X X X X X X X X|
          |     thread1     |     thread2     |     thread3     |     thread4     |
  segid:  0                 0                 3                 4
  offset: 0                 45                0                 5

  pprp $ splitSegdOnElemsD theGang4 $ lengthsToUSegd $ fromList [60, 10, 20, 40, 50 :: Int]

  segd:    DUSegd lengths:  DVector lengths: [1,3,2,1]
                                    chunks:  [[45],[15,10,20],[40,5],[45]]
                  indices:  DVector lengths: [1,3,2,1]
                                    chunks:  [[0],[0,15,25],[0,40],[0]]
                  elements: DInt [45,45,45,45]
  segids:  DInt [0,0,3,4]     (segment id of first slice on thread)
  offsets: DInt [0,45,0,5]    (offset of that slice in its segment)
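The element-wise split in the example above can be modelled on plain lists by intersecting each thread's range of global element positions with each segment's range. This is a simplified sketch, not the library's algorithm. The name `splitOnElems` is hypothetical, and giving the remainder to the first threads is an assumption. It costs O(threads * segs) and drops zero-length slices.

```haskell
-- Hypothetical list-based model; not the DPH implementation.

-- | For each of `n` threads (n >= 1), return the lengths of the segment
--   slices it owns, the id of the segment its first slice belongs to,
--   and the offset of that slice within its segment.
splitOnElems :: Int -> [Int] -> [([Int], Int, Int)]
splitOnElems n lens = zipWith slice starts counts
  where
    total     = sum lens
    -- Elements per thread; any remainder goes to the first threads (assumed).
    counts    = [ total `div` n + (if i < total `mod` n then 1 else 0)
                | i <- [0 .. n - 1] ]
    -- Global position of the first element on each thread.
    starts    = scanl (+) 0 counts
    -- Global position of the first element of each segment.
    segStarts = scanl (+) 0 lens

    slice start count =
      let end      = start + count
          -- (segment id, offset into segment, slice length) for every
          -- segment that overlaps this thread's range [start, end).
          overlaps = [ (sid, lo - s, hi - lo)
                     | (sid, s, len) <- zip3 [0 ..] segStarts lens
                     , let lo = max start s
                     , let hi = min end (s + len)
                     , hi > lo ]
      in  case overlaps of
            []                 -> ([], 0, 0)
            ((sid, off, _) : _) -> ([ l | (_, _, l) <- overlaps ], sid, off)
```

For four threads and lengths `[60, 10, 20, 40, 50]` this yields the slices `[45]`, `[15,10,20]`, `[40,5]`, `[45]` with segids `0, 0, 3, 4` and offsets `0, 45, 0, 5`, matching the figures in the example.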
O(segs). Join a distributed segment descriptor into a global one. This simply joins the distributed lengths and indices fields, but does not reconstruct the original segment descriptor as it was before splitting.
  > pprp $ joinSegdD theGang4 $ fstD $ fstD $ splitSegdOnElemsD theGang $ lengthsToUSegd $ fromList [60, 10, 20, 40, 50]
  USegd lengths:  [45,15,10,20,40,5,45]
        indices:  [0,45,60,70,90,130,135]
        elements: 180
TODO: sequential runtime is O(segs) due to application of lengthsToUSegd
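To see why joining does not recover the original descriptor, here is a minimal list-based sketch of the join: concatenate the per-thread slice lengths and recompute the indices from them. The name `joinSegs` is hypothetical and the code is a model, not the library's implementation.

```haskell
-- Hypothetical list-based model; not the DPH implementation.

-- | Join per-thread slice lengths into one global descriptor, returned
--   as (lengths, indices, elements). Slices of a segment that was split
--   across threads stay separate, so the original segmentation is lost.
joinSegs :: [[Int]] -> ([Int], [Int], Int)
joinSegs chunks = (lens, init (scanl (+) 0 lens), sum lens)
  where
    lens = concat chunks
```

Applied to the chunks `[[45], [15,10,20], [40,5], [45]]` from the element-wise split, this gives seven segments rather than the original five, as in the output shown for `joinSegdD`.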