Ticket #2762 (new bug)

Opened 15 months ago

Last modified 10 months ago

Excessive heap usage

Reported by: igloo Owned by:
Component: Compiler Version: 6.11
Keywords: Cc: batterseapower@…
Operating System: Unknown/Multiple
Test Case: Architecture: Unknown/Multiple
Type of failure:

Description

With Main.hs:

module Main (main) where

import InputOutput

main :: IO ()
main = do
          let content1 = concat (replicate 1000000 "1x") ++ "0"
          let i1 = fst $ input content1
          view i1

          let content2 = concat (replicate 1000001 "1y") ++ "0"
          let i2 = fst $ input content2
          view i2

view :: [Char] -> IO ()
view [] = return ()
view (i : is) = i `seq` view is

and InputOutput.hs:

module InputOutput (input) where

class InputOutput a where
    input :: String -> (a, String)

instance InputOutput Char where
    input (x : bs) = (x, bs)

instance InputOutput a => InputOutput [a] where
    input ('0':bs) = ([], bs)
    input ('1':bs) = case input bs of
                     (x, bs') ->
                         case input bs' of
                         ~(xs, bs'') -> (x : xs, bs'')

according to

ghc -O -prof -auto-all --make Main.hs -fforce-recomp
./Main +RTS -h

heap usage goes up to about 20M with the HEAD, but only about 200 bytes with 6.8.2.

This is with 6.11.20081108, but I started investigating with the HEAD when I saw similar problems with more-or-less 6.10.1.

Attachments

ghc_6_8.png Download (23.0 KB) - added by igloo 15 months ago.
ghc_6_11.png Download (13.5 KB) - added by igloo 15 months ago.
ImproveDefArity.Initial.darcs_patch Download (58.9 KB) - added by batterseapower 15 months ago.
Arity analysis whipped together on 29th November
working-A-vs-arity-def-site-B Download (138.0 KB) - added by batterseapower 15 months ago.
Nofib suite results (before vs after analysis added, -O1)
Main.hs.STGsyntax-0 Download (21.5 KB) - added by batterseapower 15 months ago.
Atom STG output for compiler without arity analysis
Main.hs.2.STGsyntax-0 Download (20.5 KB) - added by batterseapower 15 months ago.
Atom STG output for compiler with arity analysis version B (aka "Initial")

Change History

Changed 15 months ago by igloo

Changed 15 months ago by igloo

  Changed 15 months ago by simonmar

Ok, here's what's happening. The input method for InputOutput [a] gets compiled to this (STG syntax, so we can see the closures):

InputOutput.input1 =
    \r srt:(0,*bitmap*) [$dInputOutput_smx]
        let {
          input2_smz =
              \u srt:(0,*bitmap*) [] InputOutput.input1 $dInputOutput_smx; } in
        let {
          sat_snm =
              \r srt:(1,*bitmap*) [ds_smB]
              ...
        } in  sat_snm;

so, sat_snm is the function that does the work, and it has input2_smz as a free variable.

When this function is called, it allocates closures for sat_snm and input2_smz, and proceeds to enter sat_snm. Eventually this call evaluates input2_smz, which is another call to input1, which creates more closures, and so on.

Normally all these closures would be GC'd immediately, but they don't in this example, because the Main module has this CAF:

Main.input :: GHC.Base.String -> ([GHC.Types.Char], GHC.Base.String)
Main.input =
  InputOutput.input1
    @ GHC.Types.Char
    (InputOutput.a `cast` ...)

This CAF evaluates to an unbounded chain of sat_snm closures: remember sat_snm has input2 as a free var, which itself evaluates to another sat_snm closure, and so on). The CAF is retained because it is used by the second call to input.

The solution seems simple: inline input2, then the two closures in input1 disappear. We ought to be able to tell that input2 is a value - perhaps the recursion gets in the way?

  Changed 15 months ago by simonpj

  • cc batterseapower@… added

What we need here is a decent arity analysis, which Max is thinking about. We have

  f = \d. let f1 = f d in
           \y. ...f1...

The leak comes from a shared partial application of f:

  let t = f d in (t e1, t e2)
or
  map (f d) es

This is readily fixed. f looks as if it has arity 1, but a simple fixpoint argument shows that it really has arity 2. When we see that, we see that we can eta-expand it to get

  f = \d.\x.  let f1 = f d in ...f1 d...

Now we can inline f1, and all is good.

I had not previously realised that this arity analysis stuff could fix space leaks.

Simon

follow-up: ↓ 4   Changed 15 months ago by batterseapower

Well, I've had a look at the arity analysis idea. I'll attached a preliminary version of the analysis. It does indeed fix some "bad" arities, including the one in this example, but it's reasonably costly compile-time wise and doesn't have a non-neglible effect on allocations or runtime in Nofib, with two exceptions:

  • Bspt allocates a lot more, because the call to concat in Stdlib.seQuence is saturated by the eta-expansion step. This causes the simplifier to inline concat at that use site (even though the arguments are boring, since concat is a thin wrapper around a function poly_go). This in turn leads to RULES not spotting any instances of concat during the compilation of the Init and BSPT modules, so foldr/build fusion just fails to happen. (NB: you need to add the missing phase activation [~1] to the concat RULE in GHC.List to see this effect)
  • Atoms runtime apparently increases a lot, though there is no apparent difference in the STG generated in the two versions (also attached). I can't work out why this is.

Even if we fix these problems is it really worth introducing this analysis, given that it's effects are so small?

Changed 15 months ago by batterseapower

Arity analysis whipped together on 29th November

Changed 15 months ago by batterseapower

Nofib suite results (before vs after analysis added, -O1)

Changed 15 months ago by batterseapower

Atom STG output for compiler without arity analysis

Changed 15 months ago by batterseapower

Atom STG output for compiler with arity analysis version B (aka "Initial")

in reply to: ↑ 3 ; follow-up: ↓ 5   Changed 15 months ago by igloo

Replying to batterseapower:

Well, I've had a look at the arity analysis idea. I'll attached a preliminary version of the analysis. It does indeed fix some "bad" arities, including the one in this example,

Great, thanks!

but it's reasonably costly compile-time wise Even if we fix these problems is it really worth introducing this analysis, given that it's effects are so small?

Fixing the space leak is important to me. If the performance is an issue, then perhaps it should be an -O2-only optimisation?

in reply to: ↑ 4   Changed 14 months ago by batterseapower

Replying to igloo:

Fixing the space leak is important to me. If the performance is an issue, then perhaps it should be an -O2-only optimisation?

Perhaps. It seems to cost about 9% of the compile time of a -O build, so it's not a /massive/ problem.

Anyway, I looked a bit more at this arity stuff tonight and I've managed to fix a few bugs in the analysis, as well as make it work across several modules - this makes it /slightly/ more effective, and has fixed the problems I mentioned above.

I'm going to continue tuning and extending it, and will talk to Simon PJ when I have something potentially viable for HEAD.

  Changed 12 months ago by igloo

I think I can work around the cases of this in my real program by pulling the instance methods out into recursive functions, e.g.

instance InputOutput a => InputOutput [a] where
    input = inputList

inputList :: InputOutput a => String -> ([a], String)
inputList ('0':bs) = ([], bs)
inputList ('1':bs) = case input bs of
                     (x, bs') ->
                         case inputList bs' of
                         ~(xs, bs'') -> (x : xs, bs'')

In some cases it gets a little uglier, but still tolerable, e.g. with:

data Seq p from to where
    Cons :: p from mid -> Seq p mid to -> Seq p from to
    Nil :: Seq p here here

I need to hide the from and to arguments to stop it leaking space:

data Hide2 t where
    Hide2 :: t a b -> Hide2 t

hide2 :: t a b -> Hide2 t
hide2 x = Hide2 x

unhide2 :: Hide2 t -> t a b
unhide2 (Hide2 x) = unsafeCoerce x

inputSeq :: InputOutput2 p => ByteString -> (Hide2 (Seq p), ByteString)
inputSeq bs = case BS.head bs of
              0 -> (hide2 Nil, BS.tail bs)
              1 -> case input2 (BS.tail bs) of
                   (x, bs') ->
                       case inputSeq bs' of
                       ~(xs, bs'') ->
                           (hide2 (x `Cons` unhide2 xs), bs'')
              _ -> error "InputOutput Seq: Bad value"

So I'd still like this to be fixed, as it would be one less thing to worry about when writing code, but I don't think it's actually blocking me currently.

  Changed 10 months ago by igloo

  • milestone changed from 6.10.2 to 6.12 branch
Note: See TracTickets for help on using tickets.