Changes between Version 33 and Version 34 of DataParallel/BenchmarkStatus
Timestamp: 03/08/09 07:04:42
DataParallel/BenchmarkStatus
|| v33 || v34 || Line ||
|| 66 || 66 || There seems to be a fusion problem in DotP with `dph-par` (even if the version of `zipWithSUP` that uses `splitSD/joinSD` is used); hence the much lower runtime for "N=1" than for "sequential". The vectorised version runs out of memory, possibly because we have not yet solved the `bpermute` problem. ||
|| 67 || 67 || ||
|| || 68 || The vectorised version clearly still needs to be improved; the gap is due to an unexploited fusion opportunity. Moreover, "SMVM, primitives" exhibits strange behaviour from 2 to 4 threads with the matrix of density 0.001. This might be a scheduling problem. ||
|| || 69 || ||
|| 68 || 70 || === Execution on greyarea (1x UltraSPARC T2) === ||
|| 69 || 71 || ||
|| … || … || ||
|| 101 || 103 || As on !LimitingFactor, but it scales much better and keeps improving up to four threads per core. This suggests that memory bandwidth is again a critical factor in this benchmark (this fits well with earlier observations on other architectures). Despite the fusion problem with `dph-par`, the parallel Haskell program, using all 8 cores, still ends up three times faster than the sequential C program. ||
|| 102 || 104 || ||
|| 103 || || ||
|| 104 || || ||
|| || 105 || On this machine, "SMVM primitives" also has a quirk from 2 to 4 threads. This reinforces the suspicion that this is a scheduling problem. ||
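
For readers unfamiliar with the two kernels named in these notes, here is a minimal sequential sketch of what DotP and SMVM compute, written with plain Haskell lists. It is not the benchmark code and does not use the `dph-par` library: the names `dotp`, `smvm` and `SparseRow` are hypothetical and chosen only for illustration, as is the sparse-row representation. The indexed lookup `v !! j` stands in for the kind of gather that, as far as we understand it, a `bpermute`-style operation performs in bulk in the vectorised version.

```haskell
module Main where

-- Sequential reference for DotP: the sum of the pairwise products.
dotp :: [Double] -> [Double] -> Double
dotp xs ys = sum (zipWith (*) xs ys)

-- Assumed representation: a sparse row lists only its non-zero entries
-- as (column index, value) pairs.
type SparseRow = [(Int, Double)]

-- Sequential reference for SMVM: each row is multiplied with the dense
-- vector by looking up the vector element for every non-zero column.
-- List indexing is O(n) and is used here only for clarity.
smvm :: [SparseRow] -> [Double] -> [Double]
smvm rows v = [ sum [ x * (v !! j) | (j, x) <- row ] | row <- rows ]

main :: IO ()
main = do
  print (dotp [1, 2, 3] [4, 5, 6])                          -- prints 32.0
  print (smvm [[(0, 2), (2, 1)], [], [(1, 3)]] [1, 2, 3])   -- prints [5.0,0.0,6.0]
```

In `dotp`, the intermediate list built by `zipWith` is consumed immediately by `sum`; producer/consumer pairs of this kind are what fusion aims to collapse into a single loop, which is why a missed fusion opportunity can cost both time and memory.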
