| Version 13 (modified by igloo, 6 months ago) |
|---|
Dynamic by default
Bugs
Performance
Full nofib results showing the effect of switching to dynamic-by-default are available for OS X x86_64, OS X x86, Linux x86_64 and Linux x86. There is also a table of the highlights below. In summary:
- Binary sizes drop by roughly 94-95% on every platform, as we are now dynamically linking to the libraries.
- Things are rosiest on OS X x86_64. On this platform, -fPIC is always on, so using dynamic libraries doesn't mean giving up a register for PIC. Run time is essentially unchanged, while elapsed time and compile times are a few percent better with dynamic by default.
- On OS X x86, the situation is not so nice. x86 is very short on registers, and giving up another for PIC leaves us around 15% down on performance.
- On Linux x86_64 we have more registers, so the effect of giving one up for PIC isn't so pronounced, but we still lose a few percent performance overall.
- For unknown reasons, x86 Linux suffers even worse than x86 OS X, with a performance penalty of around 30%.
| | static -> dynamic on OS X x86_64 | static -> dynamic on OS X x86 | static -> dynamic on Linux x86_64 | static -> dynamic on Linux x86 |
|---|---|---|---|---|
| Binary Sizes | ||||
| -1 s.d. | -95.8% | -95.8% | -95.8% | -95.9% |
| +1 s.d. | -93.1% | -92.8% | -92.6% | -92.4% |
| Average | -94.6% | -94.5% | -94.5% | -94.4% |
| Run Time | ||||
| -1 s.d. | -1.2% | +11.7% | -2.5% | +16.6% |
| +1 s.d. | +1.6% | +20.0% | +9.6% | +40.3% |
| Average | +0.2% | +15.8% | +3.3% | +27.9% |
| Elapsed Time | ||||
| -1 s.d. | -6.9% | +10.3% | -2.5% | +16.6% |
| +1 s.d. | -0.3% | +20.4% | +9.6% | +40.3% |
| Average | -3.7% | +15.2% | +3.3% | +27.9% |
| Mutator Time | ||||
| -1 s.d. | -1.3% | +8.9% | -5.0% | +18.3% |
| +1 s.d. | +1.9% | +18.3% | +7.5% | +46.8% |
| Average | +0.3% | +13.5% | +1.1% | +31.8% |
| Mutator Elapsed Time | ||||
| -1 s.d. | -4.5% | +7.7% | -5.0% | +18.3% |
| +1 s.d. | +0.3% | +18.8% | +7.5% | +46.8% |
| Average | -2.1% | +13.1% | +1.1% | +31.8% |
| GC Time | ||||
| -1 s.d. | -1.4% | +16.3% | +5.6% | +13.4% |
| +1 s.d. | +1.8% | +27.1% | +11.2% | +24.0% |
| Average | +0.2% | +21.6% | +8.4% | +18.6% |
| GC Elapsed Time | ||||
| -1 s.d. | -1.5% | +15.8% | +5.6% | +13.4% |
| +1 s.d. | +1.3% | +25.6% | +11.2% | +24.0% |
| Average | -0.1% | +20.6% | +8.4% | +18.6% |
| Compile Times | ||||
| -1 s.d. | -11.7% | +6.2% | -1.8% | +27.0% |
| +1 s.d. | -0.5% | +18.2% | +7.8% | +37.8% |
| Average | -6.3% | +12.1% | +2.9% | +32.3% |
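
The -1 s.d., +1 s.d. and Average rows summarise per-benchmark ratios (dynamic / static). As a rough sketch of how such rows can be derived, the snippet below assumes the summary is a geometric mean with a one-standard-deviation band computed in log space. That is an assumption about the analysis tool, not something stated on this page, and the input ratios are made up for illustration rather than taken from the nofib results.

```python
import math


def summarise(ratios):
    """Return (-1 s.d., +1 s.d., average) as percentage changes.

    Assumes a geometric mean with a one-standard-deviation band taken in
    log space; `ratios` are new/old measurements, one per benchmark.
    """
    logs = [math.log(r) for r in ratios]
    mean = sum(logs) / len(logs)
    # Population standard deviation of the log-ratios.
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / len(logs))

    def to_pct(log_ratio):
        # Convert a log-ratio back into a percentage change.
        return (math.exp(log_ratio) - 1.0) * 100.0

    return to_pct(mean - sd), to_pct(mean + sd), to_pct(mean)


# Hypothetical run-time ratios (dynamic / static) for four benchmarks.
example_ratios = [1.10, 1.20, 1.15, 1.18]
low, high, avg = summarise(example_ratios)
print(f"-1 s.d. {low:+.1f}%  +1 s.d. {high:+.1f}%  Average {avg:+.1f}%")
```

Working in log space keeps the summary symmetric: a benchmark that doubles in time and one that halves cancel out to an average of 0%.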
OS X x86 vs x86_64
Currently, some people use the x86 version of GHC on OS X for performance reasons. It's not clear for how much longer this will be viable, as other OS X libraries start dropping x86 support.
Full nofib results comparing the two are available both for static by default and for dynamic by default; the highlights are in the table below.
The left-hand column shows the status quo: x86_64 only beats x86 in mutator time, and that is a hollow victory, as the higher GC time means that total run time is worse for x86_64.
The right-hand column shows what the situation would be if we switched to dynamic by default. Allocations, memory use, etc. remain higher because all word-sized things are twice as big. However, the combination of x86_64's performance improving and x86's performance getting worse means that x86_64 is now faster overall.
| | x86 -> x86_64 when static by default | x86 -> x86_64 when dynamic by default |
|---|---|---|
| Binary Sizes | ||
| -1 s.d. | +38.0% | +7.4% |
| +1 s.d. | +38.6% | +30.6% |
| Average | +38.3% | +18.5% |
| Allocations | ||
| -1 s.d. | +63.2% | +63.2% |
| +1 s.d. | +114.4% | +114.4% |
| Average | +87.0% | +87.0% |
| Run Time | ||
| -1 s.d. | -23.5% | -31.6% |
| +1 s.d. | +36.1% | +14.7% |
| Average | +2.1% | -11.4% |
| Elapsed Time | ||
| -1 s.d. | -18.2% | -30.0% |
| +1 s.d. | +40.1% | +17.0% |
| Average | +7.0% | -9.5% |
| Mutator Time | ||
| -1 s.d. | -32.4% | -38.8% |
| +1 s.d. | +20.1% | +3.0% |
| Average | -9.9% | -20.6% |
| Mutator Elapsed Time | ||
| -1 s.d. | -28.7% | -37.9% |
| +1 s.d. | +22.5% | +4.4% |
| Average | -6.6% | -19.5% |
| GC Time | ||
| -1 s.d. | +4.5% | -11.9% |
| +1 s.d. | +74.8% | +54.1% |
| Average | +35.2% | +16.5% |
| GC Elapsed Time | ||
| -1 s.d. | +7.9% | -8.0% |
| +1 s.d. | +75.1% | +56.7% |
| Average | +37.4% | +20.0% |
| Total Memory in use | ||
| -1 s.d. | -1.7% | -1.9% |
| +1 s.d. | +88.9% | +88.9% |
| Average | +36.3% | +36.1% |
| Compile Times | ||
| -1 s.d. | +11.9% | -8.9% |
| +1 s.d. | +21.1% | +2.9% |
| Average | +16.4% | -3.1% |
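
The shift between the two columns can be cross-checked against the static -> dynamic table. As a back-of-the-envelope sketch, assuming the averages compose multiplicatively (which holds only approximately unless they are geometric means over the same set of programs), the x86 -> x86_64 run-time ratio under dynamic linking should be roughly the static ratio, multiplied by the OS X x86_64 change and divided by the OS X x86 change. The helper names are illustrative only.

```python
def pct_to_ratio(pct):
    """Convert a percentage change such as +2.1 into a ratio such as 1.021."""
    return 1.0 + pct / 100.0


def ratio_to_pct(ratio):
    """Convert a ratio such as 0.9 into a percentage change such as -10.0."""
    return (ratio - 1.0) * 100.0


# Run Time averages taken from the tables on this page.
static_x86_to_x86_64 = pct_to_ratio(+2.1)  # x86 -> x86_64, static by default
dyn_change_x86_64 = pct_to_ratio(+0.2)     # static -> dynamic on OS X x86_64
dyn_change_x86 = pct_to_ratio(+15.8)       # static -> dynamic on OS X x86

# Assumed composition: (x86_64 dynamic / x86 dynamic) =
#   (x86_64 static / x86 static) * (x86_64 dyn / x86_64 static)
#   / (x86 dyn / x86 static)
estimate = static_x86_to_x86_64 * dyn_change_x86_64 / dyn_change_x86
print(f"estimated x86 -> x86_64 run time when dynamic: {ratio_to_pct(estimate):+.1f}%")
```

The estimate comes out at roughly -11.7%, close to the -11.4% Run Time average in the right-hand column, which is consistent with the explanation that x86_64 pulls ahead mainly because x86 pays the larger PIC penalty.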
