Ticket #594 (closed task: fixed)

Opened 8 years ago

Last modified 3 years ago

Support use of SSE2 in the x86 native code generator

Reported by: simonmar
Owned by: simonmar
Priority: normal
Milestone: 7.0.1
Component: Compiler (NCG)
Version: 6.4.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Difficulty: Moderate (less than a day)
Test Case: N/A
Blocked By:
Blocking:
Related Tickets:

Description

Currently only the x86_64 native code generator supports SSE2, but it would be worthwhile enabling this in the x86 backend too.

Change History

Changed 8 years ago by simonmar

  • difficulty set to Moderate (1 day)
  • os set to Unknown
  • architecture set to Unknown

Changed 7 years ago by igloo

  • testcase set to N/A
  • milestone set to 6.8

Changed 6 years ago by simonmar

  • owner set to simonmar

I'm probably going to do this.

Changed 6 years ago by simonmar

  • owner simonmar deleted
  • milestone changed from 6.8 branch to 6.10 branch

Changed 5 years ago by simonmar

See #1890 for a test case (actually we could put that test into nofib).

Changed 5 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 5 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple

Changed 4 years ago by igloo

  • milestone changed from 6.10 branch to 6.12 branch

Changed 4 years ago by simonmar

  • difficulty changed from Moderate (1 day) to Moderate (less than a day)

Changed 3 years ago by igloo

  • failure set to Runtime performance bug

Changed 3 years ago by simonmar

  • owner set to simonmar
  • status changed from new to assigned
  • milestone changed from 6.12 branch to 6.14.1

I'm on this.

Changed 3 years ago by simonmar

  • status changed from assigned to closed
  • resolution set to fixed

Done:

Thu Feb  4 10:48:49 GMT 2010  Simon Marlow <marlowsd@gmail.com>
  * Implement SSE2 floating-point support in the x86 native code generator (#594)
  
  The new flag -msse2 enables code generation for SSE2 on x86.  It
  results in substantially faster floating-point performance; the main
  reason for doing this was that our x87 code generation is appallingly
  bad, and since we plan to drop -fvia-C soon, we need a way to generate
  half-decent floating-point code.
  
  The catch is that SSE2 is only available on CPUs that support it (P4+,
  AMD K8+).  We'll have to think hard about whether we should enable it
  by default for the libraries we ship.  In the meantime, at least
  -msse2 should be an acceptable replacement for "-fvia-C
  -optc-ffast-math -fexcess-precision".
  
  SSE2 also has the advantage of performing all operations at the
  correct precision, so floating-point results are consistent with other
  platforms.
  
  I also tweaked the x87 code generation a bit while I was here; it's
  now slightly less bad than before.

I measured the FF ray tracer benchmark, and -msse2 seems on par with, or possibly better than, "-fvia-C -optc-O3 -fexcess-precision -ffast-math", although the results are quite variable on the machine I tried it on. I suspect we're suffering from randomly misaligned Doubles on the stack and heap.
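
For anyone who wants to try the flag on something smaller than the ray tracer, here is a minimal sketch of a floating-point-heavy loop. It is not taken from this ticket, #1890, or nofib; the module, the function name basel and the iteration count are made up for illustration. The assumption is that on a 32-bit x86 target you would build it once with plain -O2 and once with -O2 -msse2, then compare run times and printed results; on x86_64 the native code generator already uses SSE2, so no difference is expected there.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical micro-benchmark (not from the ticket): a partial sum of the
-- Basel series 1/k^2, computed entirely in Double so that the inner loop
-- is dominated by floating-point multiply, divide and add.
basel :: Int -> Double
basel n = foldl' step 0 [1 .. n]
  where
    -- One term of the series; fromIntegral converts the Int index to Double.
    step :: Double -> Int -> Double
    step acc k = acc + 1 / (x * x)
      where
        x = fromIntegral k

main :: IO ()
main = print (basel 10000000)
```

Because every operation here is on Double, a build that keeps intermediates at x87 extended precision and a build that uses SSE2 may print results that differ in the last few digits; that is the consistency point made in the commit message, not something measured here.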
