Ticket #7511 (new bug)

Opened 6 months ago

Last modified 2 months ago

Room for GHC runtime improvement >~5%, inlining related

Reported by: danielv
Owned by:
Priority: normal
Milestone: 7.8.1
Component: Compiler
Version: 7.6.1
Keywords:
Cc:
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: Runtime performance bug
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

I compared nofib runs under GHC with the default optimization flags against runs with fairly extreme inlining-aggressiveness settings.

1. Many programs are unaffected, but some are improved significantly. Geometric mean: -6.5% allocs, -5.9% runtime.

2. Some regress significantly, beyond what the obvious code size growth would explain.

Program           Size    Allocs   Runtime   Elapsed  TotalMem
circsim          +0.7%     -4.5%     -0.7%     -0.9%     +2.1%
comp_lab_zift    +0.5%     -0.1%     -3.3%     -3.3%    +28.6%
fulsom          +13.8%     -2.6%     +4.3%     +4.3%    +10.8%
mandel2          +0.4%     +4.5%      0.01      0.01     +0.0%
paraffins        +0.5%     +0.0%     +4.0%     +2.9%     +0.0%
rewrite          +0.4%    +15.9%      0.03      0.03     +0.0%
tak              +0.1%     +4.0%      0.02      0.02     +0.0%
treejoin         -0.1%     -0.0%     +2.2%     +1.7%     +0.0%
wave4main        +1.6%    +29.4%    +19.9%    +19.9%    +30.8%

This presents two opportunities:

1. Find better flag settings (maybe not quite as extreme) and make them the default.

2. Find the reasons behind the regressions, and fix them in GHC. In addition to improving the performance we perceive through GHC, hopefully performance will become more predictable to users: Simon PJ has told me he expects (paraphrasing) "more inlining should make things better, except for/through code size", which would be a very useful invariant; the data here clearly show some cases where it does not hold.

Included are a complete report, an extract of the highlights (significantly improved or any regressed benchmarks) and the script to reproduce given a ghc7.6 devel2 build.
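For readers unfamiliar with how the geometric-mean figures in item 1 are read, here is a minimal sketch. It assumes the usual convention of turning each percentage change into a ratio, taking the geometric mean of the ratios, and converting back to a percentage; the function name geometricMeanChange is hypothetical and this is not the nofib-analyse source. The sample input is only the first numeric column of the nine regressed programs listed above, so its result is not one of the figures quoted in this ticket.

```haskell
module Main where

-- | Geometric mean of a list of percentage changes, returned as a
-- percentage change. Each change p is first converted to the ratio
-- (1 + p/100); the mean is taken in log space to avoid overflow.
-- Hypothetical helper, not taken from nofib-analyse.
geometricMeanChange :: [Double] -> Double
geometricMeanChange [] = 0
geometricMeanChange ps = (exp (sum (map log ratios) / n) - 1) * 100
  where
    ratios = [1 + p / 100 | p <- ps]
    n      = fromIntegral (length ps)

main :: IO ()
main = do
  -- First numeric column of the table in the description
  -- (assumed here to be binary size), regressed programs only.
  let changes = [0.7, 0.5, 13.8, 0.4, 0.5, 0.4, 0.1, -0.1, 1.6]
  print (geometricMeanChange changes)
```

Note that a +100% change and a -50% change cancel exactly under this convention, which is why the geometric mean is preferred over the arithmetic mean for ratios.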

Attachments

compareGhc76RegVsVeryKeen.txt (155.5 KB) - added by danielv 6 months ago.
Full report I excerpted from
nofibKeenessCompare (331 bytes) - added by danielv 6 months ago.
Script to produce logs, use nofib/nofib-analyse/nofib-analyse to produce reports.

Change History

Changed 6 months ago by danielv

Full report I excerpted from

Changed 6 months ago by danielv

Skipping the highlights since the regressions ended up inline.

Changed 6 months ago by danielv

Script to produce logs, use nofib/nofib-analyse/nofib-analyse to produce reports.

Changed 4 months ago by nfrisby

Just FYI, I've recently seen one example where more inlining caused an increase in allocation. Here's an abstraction:

f a b = let-no-escape j = ...
        in ... j ...

g x y = ... case f (...) (...) of ...

f got inlined into g and the j binding got floated out a bit. Thus j was no longer let-no-escape (LNE): its call was no longer a tail call of the let body, because the result of the call was now scrutinized by g's case.

This happened in puzzle; the StateType's == and /= methods were f and g, respectively. I had accidentally awarded a huge result discount to ==. Yell if you'd like more details.
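A hand-written approximation of the shape described above, in source Haskell rather than Core or STG. The bodies of f and j are made up for illustration, and gBefore and gAfter are hypothetical names; the real code in puzzle is different, and what GHC actually produces depends on the version and flags.

```haskell
module Main where

-- Before inlining: every use of j is a saturated tail call in the
-- body of the let, so j is a candidate for let-no-escape.
f :: Int -> Int -> Bool
f a b =
  let j z = z * z > a + b
  in if a > b then j a else j b

gBefore :: Int -> Int -> Int
gBefore x y = case f (x + 1) (y + 1) of
  True  -> 1
  False -> 0

-- After inlining f and floating j outward (written by hand): the calls
-- to j now sit inside the scrutinee of a case, so they are no longer
-- tail calls of the let body and j cannot be let-no-escape.
gAfter :: Int -> Int -> Int
gAfter x y =
  let a = x + 1
      b = y + 1
      j z = z * z > a + b
  in case (if a > b then j a else j b) of
       True  -> 1
       False -> 0

main :: IO ()
main =
  -- The two versions compute the same results; only the position of
  -- the calls to j relative to its binding differs.
  print (and [gBefore x y == gAfter x y | x <- [-3 .. 3], y <- [-3 .. 3]])
```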

Changed 4 months ago by simonpj

  • cc simonpj@… removed
  • difficulty set to Unknown

Right! I think this loss of let-no-escapey-ness is precisely what can cause increased allocation when we inline. Daniel and I found this (hence the birth of this ticket), and I believe that the LNE thing was the sole cause we identified.

One might solve this by making the code generator yet more clever, so that it can avoid allocation for non-escaping functions even if their calls aren't tail calls. But that is quite hard.

More promising, I think, is to float that j function to top level altogether, and that is what Nick is working on. I'm hopeful that this'll solve much of the problem.
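A minimal sketch of what floating j to top level could look like at the source level, assuming it is done by turning j's free variables into extra parameters (lambda lifting). The function bodies and the names gLocal, gLifted and jTop are hypothetical; this is not the transformation as implemented in GHC.

```haskell
module Main where

-- j is a local function that captures a and b. Because its calls are
-- scrutinized by the case, it is not let-no-escape and would
-- ordinarily need a heap-allocated closure.
gLocal :: Int -> Int -> Int
gLocal x y =
  let a = x + 1
      b = y + 1
      j z = z * z > a + b
  in case (if a > b then j a else j b) of
       True  -> 1
       False -> 0

-- j floated to top level: its former free variables a and b are now
-- ordinary parameters, so no closure has to be built for it.
jTop :: Int -> Int -> Int -> Bool
jTop a b z = z * z > a + b

gLifted :: Int -> Int -> Int
gLifted x y =
  let a = x + 1
      b = y + 1
  in case (if a > b then jTop a b a else jTop a b b) of
       True  -> 1
       False -> 0

main :: IO ()
main =
  -- Check that lifting did not change the results on a small grid.
  print (and [gLocal x y == gLifted x y | x <- [-3 .. 3], y <- [-3 .. 3]])
```

The trade-off is that each call now passes a and b explicitly, so the lift pays off only when the extra arguments cost less than the closure they replace.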

See also #5075 which reports a similar difficulty with LNEs becoming non-LNEs.

Changed 2 months ago by igloo

  • milestone set to 7.8.1