Update to paper in IEEE Visualization 2006

After publishing performance results for our functional implementations of the Marching Cubes algorithm, we discovered two things which individually have a significant impact on the figures.

  • (1) We attempted to exclude time spent in the OpenGL renderer, for both our own code and the VTK baseline, to be sure that we were measuring the MC algorithm itself. Unfortunately, our original solution was inadvertently *too* lazy.

Our choice to count the produced number of vertices, in fact meant that no interpolation calculations were being forced - it was enough for the system to read the length of the vector of edges from the case table, without actually examining any of those edges.

The fault was revealed only when we ran a recently-developed Haskell code-coverage tool (hpc). We have now added into the code pipeline, an extra pass over the output vertices which forces their values to be calculated. On average, this roughly doubled the Haskell computation time for all datasets.

  • (2) The second discovery was that we were not giving our Haskell compiler the best optimisation flags for numeric calculation. One can assume that the VTK package is highly tuned for most common platforms, including the use of vectorised instruction sets where they are available. The comparison would be artificially biased if we neglected to do the same for the Haskell versions.

Thus, we have added the flags

        -fexcess-precision -optc-O3 -optc-mcpu=970

where the cpu-specific setting allows the use of the AltiVec vectorised instruction set on the PowerPC G5. For a Pentium-3 based system, the equivalent flags would be

    -fexcess-precision -optc-O3 -optc-ffast-math -optc-mfpmath=sse -optc-msse2

Using these flags on average halved the Haskell computation time for all datasets.

Putting both effects together, as one can imagine, they roughly cancel each other out. Indeed, the cumulative impact is that our figures do not change by much. However, one important observation from the paper's conclusions must be re-stated. The original figures showed a strong proportionality between the size of the dataset and the time taken. The new figures make it clear that the size of the output surface is also now an important factor in the running time, where this was not the case before. (Indeed this is hardly surprising, since the original tests did not in fact calculate the surface itself, only its size.)

Also, where previously our streaming implementation was significantly faster than VTK on two of the large datasets, it is now slower on one, and only marginally faster on the other.

Array Streaming
dataset published corrected published corrected
silicium 0.626 0.554 0.386 0.923
neghip 1.088 1.055 0.852 0.799
hydrogen 8.638 6.472 6.694 4.048
lobster 25.37 19.06 18.42 16.45
engine 44.51 28.94 28.06 21.35
statueLeg 48.78 33.22 34.54 19.83
aneurism 72.98 51.99 54.44 31.96
skull 79.50 89.79 57.19 71.70
stent8 287.5 156.5 154.9 113.1
vertebra8 703.0 860.9 517.1 819.5