Ticket #5996 (patch bug)
fix for CSE
| Reported by: | michalt | Owned by: | simonpj |
|---|---|---|---|
| Priority: | normal | Milestone: | 7.8.1 |
| Component: | Compiler | Version: | 7.5 |
| Keywords: | Cc: | conrad@… | |
| Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
| Type of failure: | Runtime performance bug | Difficulty: | Unknown |
| Test Case: | Blocked By: | ||
| Blocking: | Related Tickets: |
Description
The current version of CSE is slightly broken -- contrary to the comments explaining it, it does not add an additional mapping when it substitutes the RHS of some binding. Taking the example from CSE.lhs
module Test2 where
data C a b = C a b
a = undefined
{-# NOINLINE a #-}
b = undefined
{-# NOINLINE b #-}
x1 = C a b
x2 = C x1 b
y1 = C a b
y2 = C y1 b
We want to detect that y1 is the same as x1 and so rewrite it to y1 = x1. But at this point we also want to add a new substitution that changes y1 to x1. So that we can get y2 = C x1 b and then y2 = x2. This is not done by the current code:
> ghc -O2 -fforce-recomp Test2.hs -ddump-simpl -dsuppress-all -rtsopts [1 of 1] Compiling Test2 ( Test2.hs, Test2.o ) ==================== Tidy Core ==================== Result size = 40 b = undefined a = b x1 = \\ @ a_aaf @ b_aag -> C (a) (b) x2 = \\ @ b_aao @ a_aap @ b1_aaq -> C (x1) (b) y1 = x1 y2 = \\ @ b_aaG @ a_aaH @ b1_aaI -> C (x1) (b)
I wrote a patch to fix that and we get the desired result:
> ~/dev/ghc-work/inplace/bin/ghc-stage2 -O2 -fforce-recomp Test2.hs -ddump-simpl -dsuppress-all -rtsopts [1 of 1] Compiling Test2 ( Test2.hs, Test2.o ) ==================== Tidy Core ==================== Result size = 30 b = undefined a = b x1 = \\ @ a_aal @ b_aam -> C (b) (b) x2 = \\ @ b_aau @ a_aav @ b1_aaw -> C (x1) (b) y1 = x1 y2 = x2
The fix seems quite easy but it made nofib unhappy -- see nofib1 attachment. Apparently there is quite a bit additional alloctation happening in a few benchmarks. I managed to narrow it down to the GHC.IO.Encoding.* modules. Adding a simple GHC_OPTIONS -fno-cse seems to improve the performance quite a bit (and above the current HEAD!) -- see nofib2 attachment. There seems to be a bit more code bloat, but the performance looks worth it. I haven't yet looked into exactly what causes the excessive allocation with CSE in the GHC.IO.Encoding.* modules (and I'm also a bit surprised by that -- I thought the main issue with CSE would be bigger memory usage). So any suggestions are more than welcome. ;)
All the patches are in https://github.com/michalt/ghc (branch "cse") and https://github.com/michalt/packages-base (branch "cse").

