Ticket #1136 (new compile-time performance bug)

Opened 2 years ago

Last modified 2 months ago

High memory use when compiling many let bindings.

Reported by: igloo Assigned to:
Priority: high Milestone: 6.10.1
Component: Compiler Version: 6.6
Severity: normal Keywords: performance
Cc: Difficulty: Unknown
Test Case: Architecture: Unknown
Operating System: Unknown

Description

Two programs that generate modules with lots of let bindings that GHC struggles to compile. With the first,

ghc -c J.hs

uses >100M of memory which seems a little high. For the second,

ghc -c J2.hs

got past 700M before I killed it, with both 6.6 and HEAD.

Attachments

mkj.hs (446 bytes) - added by igloo on 02/06/07 07:51:20.
mkj2.hs (471 bytes) - added by igloo on 02/06/07 07:51:26.
J_mem_vs_time.png (7.0 kB) - added by igloo on 01/22/08 08:15:25.
J_heap_profile.png (58.3 kB) - added by igloo on 01/22/08 08:15:56.
J.prof (14.0 kB) - added by igloo on 01/22/08 08:17:13.

Change History

02/06/07 07:51:20 changed by igloo

  • attachment mkj.hs added.

02/06/07 07:51:26 changed by igloo

  • attachment mkj2.hs added.

02/06/07 07:52:48 changed by igloo

05/07/07 03:49:21 changed by simonmar

  • priority changed from normal to high.

11/12/07 04:48:24 changed by simonpj

  • milestone changed from 6.8 branch to 6.4.3.

See also #1875, #783

11/12/07 04:48:59 changed by simonpj

  • milestone changed from 6.4.3 to 6.8.3.

12/14/07 07:17:30 changed by igloo

  • keywords set to performance.

01/07/08 02:42:30 changed by simonmar

  • type changed from bug to compile-time performance bug.

01/22/08 08:15:25 changed by igloo

  • attachment J_mem_vs_time.png added.

01/22/08 08:15:56 changed by igloo

  • attachment J_heap_profile.png added.

01/22/08 08:17:13 changed by igloo

  • attachment J.prof added.

01/22/08 08:23:10 changed by igloo

J_mem_vs_time.png shows the memory use against time for J.hs, with different values of numa/numb. It doesn't look like there is a complexity problem, but the constant factor seems high, using about 10k per binding. J_heap_profile.png is the heap profile for the largest case (400/20000), and peak usage works out at about 5k per binding (the difference presumably being due to the copying GC). Typecheck-Rename stands out, but there are also a few other large space users.

01/23/08 03:27:19 changed by simonpj

Interesting. I simplified it still more to a program of form:

a = let a0 = ()
    in let a1 = a0
        ...
    in let a10000 = a9999

and it still takes ages in the type checker, with a residency of 87M for 10,000 bindings. Splitting up the lets eliminates dependency analysis as the problem (in the type checker at least).

Do you think you might do a little bit more profiling to see where the typechecker/renamers costs are coming, for this simpler program?

Simon

Could

02/04/08 08:00:40 changed by igloo

For J2, consider this module:

module J where

data Foo a b c d = Foo a b c d

a = let
    a0 = a3
    a1 = a0
    a2 = a1
    a3 = a2
 in Foo a0 a1 a2 a3

We get (tidied up a bit):

==================== Desugar ====================
Rec {
J.a :: forall t0 t1 t2 t3. J.Foo t0 t1 t2 t3
J.a = \ (@ t0) (@ t1) (@ t2) (@ t3) ->
  letrec {
    res :: J.Foo t0 t1 t2 t3
    res =
      letrec {
        tup :: forall ttup. (ttup, ttup, ttup, ttup)
        tup =
          \ (@ ttup) ->
            letrec {
              tupa1 :: ttup
              tupa1 = tupa0;
              tupa2 :: ttup
              tupa2 = tupa1;
              tupa3 :: ttup
              tupa3 = tupa2;
              tupa0 :: ttup
              tupa0 = tupa3; } in
            (tupa0, tupa3, tupa2, tupa1);
        a0 :: forall ttup. ttup
        a0 = \ (@ ttup) -> case tup @ ttup of _ { (tup0, _, _, _) -> tup0 };
        a3 :: forall ttup. ttup
        a3 = \ (@ ttup) -> case tup @ ttup of _ { (_, tup1, _, _) -> tup1 };
        a2 :: forall ttup. ttup
        a2 = \ (@ ttup) -> case tup @ ttup of _ { (_, _, tup2, _) -> tup2 };
        a1 :: forall ttup. ttup
        a1 = \ (@ ttup) -> case tup @ ttup of _ { (_, _, _, tup3) -> tup3 };
      } in
      J.Foo
        @ t0
        @ t1
        @ t2
        @ t3
        (a0 @ t0)
        (a1 @ t1)
        (a2 @ t2)
        (a3 @ t3); } in
  res
end Rec }

I think that most of the space usage comes from the tuple deconstructors. We should look into desugaring into something like this instead:

==================== Desugar ====================
Rec {
J.a :: forall t0 t1 t2 t3. J.Foo t0 t1 t2 t3
J.a = \ (@ t0) (@ t1) (@ t2) (@ t3) ->
  letrec {
    res :: J.Foo t0 t1 t2 t3
    res =
      letrec {
        a0 :: forall ttup. ttup
        a0 = \ (@ ttup) -> a3;
        a3 :: forall ttup. ttup
        a3 = \ (@ ttup) -> a2;
        a2 :: forall ttup. ttup
        a2 = \ (@ ttup) -> a1;
        a1 :: forall ttup. ttup
        a1 = \ (@ ttup) -> a0;
      } in
      J.Foo
        @ t0
        @ t1
        @ t2
        @ t3
        (a0 @ t0)
        (a1 @ t1)
        (a2 @ t2)
        (a3 @ t3); } in
  res
end Rec }

Simon was worried about losing dictionary sharing if we did so, so we need to check into whether that causes a performance problem, and whether we can get the sharing some other way.

06/09/08 09:01:56 changed by simonpj

This patch should help

Thu Jun  5 05:44:23 PDT 2008  simonpj@microsoft.com
  * Desugar multiple polymorphic bindings more intelligently

But I suspect there is something else going on too in the typechecker, perhaps chains of mutable variables.

But first let's see if there's any improvement.

Simon

06/20/08 13:40:46 changed by igloo

  • milestone changed from 6.8.3 to 6.10.1.