Room for GHC runtime improvement >~5%, inlining related

I compare running nofib under GHC with the default optimization flags against running it with somewhat extreme settings for inlining aggressiveness.

Many programs are unaffected, but some are improved significantly. Geometric mean: -6.5% allocs, -5.9% runtime.
Some regress significantly, beyond what the obvious code size growth would explain.

Program            Size    Allocs   Runtime   Elapsed  TotalMem
circsim          +0.7%     -4.5%     -0.7%     -0.9%     +2.1%
comp_lab_zift    +0.5%     -0.1%     -3.3%     -3.3%    +28.6%
fulsom          +13.8%     -2.6%     +4.3%     +4.3%    +10.8%
mandel2          +0.4%     +4.5%      0.01      0.01     +0.0%
paraffins        +0.5%     +0.0%     +4.0%     +2.9%     +0.0%
rewrite          +0.4%    +15.9%      0.03      0.03     +0.0%
tak              +0.1%     +4.0%      0.02      0.02     +0.0%
treejoin         -0.1%     -0.0%     +2.2%     +1.7%     +0.0%
wave4main        +1.6%    +29.4%    +19.9%    +19.9%    +30.8%
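
For reference, a minimal sketch of how a geometric-mean summary like the one quoted above can be derived from per-benchmark percentage changes. It assumes the summary is the geometric mean of the per-benchmark ratios (new/old), and it treats the second numeric column of the rows above as allocations, which is an assumption based on the usual nofib-analyse layout. The function name and variable names are made up for illustration. The input is only the highlighted subset, so the result will not match the suite-wide figure.

```python
import math

def geometric_mean_change(percent_changes):
    """Geometric mean of per-benchmark changes, returned as a percentage.

    Each entry is a relative change in percent (e.g. -4.5 means 4.5% less).
    Changes are converted to ratios, averaged in log space, and converted back.
    """
    ratios = [1.0 + p / 100.0 for p in percent_changes]
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(log_mean) - 1.0) * 100.0

# Second numeric column of the highlighted rows above (assumed to be Allocs).
# This is NOT the full nofib suite, only the extract shown in this ticket.
highlight_allocs = [-4.5, -0.1, -2.6, 4.5, 0.0, 15.9, 4.0, -0.0, 29.4]

subset_mean = geometric_mean_change(highlight_allocs)
print(f"geometric mean over highlighted rows: {subset_mean:+.1f}%")
```

The geometric mean is used rather than the arithmetic mean because ratios compose multiplicatively: a benchmark that doubles and one that halves cancel out, so `geometric_mean_change([100.0, -50.0])` is zero up to floating-point error.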

This presents two opportunities:

1. Find better default flag settings (maybe not quite as extreme as those used here) and make them the default.
2. Find the reasons behind the regressions and fix them in GHC. Beyond improving the performance GHC delivers, this should make performance more predictable for users. Simon PJ has told me he expects (paraphrasing) "more inlining should make things better, except for/through code size". That would be a very useful invariant, and the data here clearly show some cases where it does not hold.

Included are a complete report, an extract of the highlights (benchmarks that improved significantly or regressed at all), and the script to reproduce the results given a GHC 7.6 devel2 build.

Trac metadata

Trac field	Value
Version	7.6.1
Type	Bug
TypeOfFailure	OtherFailure
Priority	normal
Resolution	Unresolved
Component	Compiler
Test case
Differential revisions
BlockedBy
Related
Blocking
CC	simonpj@microsoft.com
Operating system
Architecture
