Changes between Version 36 and Version 37 of DataParallel/BenchmarkStatus
- Timestamp: 03/09/09 05:09:52
DataParallel/BenchmarkStatus
Lines 83-89 (lines prefixed with `-` were removed in v37, lines prefixed with `+` were added in v37; all other lines are unchanged):
  || DotP, ref C || 100M elements || – || 554 || 277 || 142 || 72 || 37 || 22 || 20 ||
  || SMVM, primitives || 10kx10k @ density 0.1 || 1102/1102 || 1112/1112 || 561/561 || 285/285 || 150/150 || 82/82 || 63/70 || 54/100 ||
- || SMVM, vectorised || 10kx10k @ density 0.1 || 2312/2312 || 15960/15960 || 8192/8192 || 4188/4188 || 2362/2362 || 1538/1538 || 1047/1047 || 950/950 ||
+ || SMVM, vectorised || 10kx10k @ density 0.1 || 1784/1784 || 1810/1810 || 910/910 || 466/466 || 237/237 || 131/131 || 96/96 || 87/87 ||
  || SMVM, ref C || 10kx10k @ density 0.1 || 580 || – || – || – || – || – || – || – ||
  || SMVM, primitives || 100kx100k @ density 0.001 || 1112/1112 || 1299/1299 || 684/684 || 653/653 || 368/368 || 294/294 || 197/197 || 160/160 ||
- || SMVM, vectorised || 100kx100k @ density 0.001 || 2345/2345 || 16110/16110 || 8553/8553 || 4400/4400 || 2572/2572 || 1645/1645 || 1224/1224 || 1005/1005 ||
+ || SMVM, vectorised || 100kx100k @ density 0.001 || 1824/1824 || 2008/2008 || 1048/1048 || 1010/1010 || 545/545 || 426/426 || 269/269 || 258/258 ||
  || SMVM, ref C || 100kx100k @ density 0.001 || 600 || – || – || – || – || – || – || – ||

…

Lines 101-105:
  ==== Comments regarding smvm ====

- As on !LimitingFactor, but it scales much more nicely and keeps improving up to four threads per core. This suggests that memory bandwidth is again a critical factor in this benchmark (this fits well with earlier observations on other architectures). Despite the fusion problem with `dph-par`, the parallel Haskell program, using all 8 cores, still ends up three times faster than the sequential C program.
+ As on !LimitingFactor, but it scales much more nicely and keeps improving up to four threads per core. This suggests that memory bandwidth is again a critical factor in this benchmark (this fits well with earlier observations on other architectures).

  On this machine, "SMVM primitives" also has a quirk going from 2 to 4 threads. This reinforces the suspicion that this is a scheduling problem.
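SMVM multiplies a sparse matrix by a dense vector. The following is a minimal sequential sketch of that kernel. It assumes a hypothetical row-wise representation in which each row lists only its non-zero entries as (column index, value) pairs. It is not the benchmark's DPH code. The names `SparseMatrix`, `smvm` and `denseVector` are illustrative, and the sketch uses plain lists plus `Data.Array` from GHC's `array` package. Each non-zero entry costs only one multiply and one add, but an array-based implementation must first load a column index, a value and a vector element. That ratio of little arithmetic per byte fetched is consistent with the observation above that memory bandwidth is a critical factor.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical sparse-row representation: each row stores only its
-- non-zero entries as (column index, value) pairs.
type SparseRow    = [(Int, Double)]
type SparseMatrix = [SparseRow]

-- Sparse matrix-vector multiplication: each result element is the dot
-- product of one sparse row with the dense vector v. Per non-zero entry
-- the kernel does one multiply and one add, while an array-based
-- implementation has to read a column index, a value and an element of v,
-- i.e. little arithmetic per byte loaded.
smvm :: SparseMatrix -> Array Int Double -> [Double]
smvm m v = [ sum [ x * (v ! i) | (i, x) <- row ] | row <- m ]

-- Helper (illustrative) that builds a zero-indexed dense vector from a list.
denseVector :: [Double] -> Array Int Double
denseVector xs = listArray (0, length xs - 1) xs
```

For example, loading the module into GHCi and evaluating `smvm [[(0, 1), (2, 2)], [], [(1, 3)]] (denseVector [1, 2, 3])` gives `[7.0,0.0,6.0]`. An empty row contributes `sum [] = 0`.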

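You can quantify the effect of the v37 change on the "SMVM, vectorised" rows directly from the table. The sketch below copies the v36 and v37 timings for those rows and keeps one number per column, because each "a/b" entry in these rows has a == b. It computes two things: the per-column ratio old/new, and how the fastest vectorised time compares with the single ref C time for the same input. The column headings are not part of this diff excerpt, so the sketch makes no assumption about what each column measures. All names are hypothetical.

```haskell
-- Timings copied from the "SMVM, vectorised" rows of the table above.
-- Each "a/b" entry in these rows has a == b, so one value per column is kept.
v36Small, v37Small, v36Large, v37Large :: [Double]
v36Small = [2312, 15960, 8192, 4188, 2362, 1538, 1047, 950]   -- 10kx10k, v36
v37Small = [1784, 1810, 910, 466, 237, 131, 96, 87]           -- 10kx10k, v37
v36Large = [2345, 16110, 8553, 4400, 2572, 1645, 1224, 1005]  -- 100kx100k, v36
v37Large = [1824, 2008, 1048, 1010, 545, 426, 269, 258]       -- 100kx100k, v37

-- Per-column factor old/new: values above 1 mean v37 is faster.
improvement :: [Double] -> [Double] -> [Double]
improvement = zipWith (/)

-- Ratio of the ref C time to the fastest time in a row: values above 1
-- mean the fastest vectorised run beats the ref C program.
vsRefC :: Double -> [Double] -> Double
vsRefC refC times = refC / minimum times
```

Evaluated in GHCi, `vsRefC 580 v36Small` is about 0.61 while `vsRefC 580 v37Small` is about 6.7. For the larger matrix, `vsRefC 600 v36Large` and `vsRefC 600 v37Large` are about 0.60 and 2.3. In the second column, v37 is roughly 8.8 times faster than v36 for the 10kx10k input.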