Ticket #1216 (new run-time performance bug)

Opened 2 years ago

Last modified 3 days ago

indexing 2d arrays is slow and leaky. why?

Reported by: claus Assigned to: simonpj
Priority: normal Milestone: 6.10.2
Component: Compiler Version: 6.6
Severity: normal Keywords:
Cc: Bulat.Ziganshin@gmail.com claus.reinke@talk21.com Difficulty: Unknown
Test Case: Architecture: Unknown/Multiple
Operating System: Unknown/Multiple

Description

readArray/writeArray call GHC.Arr.index, which seems inexplicably slow for 2d arrays. inexplicably, because simply copying the default implementation of index from GHC.Arr into the local module can speed things up considerably.

originally raised in this thread: http://www.haskell.org/pipermail/haskell-cafe/2007-March/023394.html

shortened example or matrix/vector-multiplication attached. comment out the first line of myindex to use the local copy. this results in a speedup from 20s to 13s (time ./Index 100000) on my system, not to mention the difference in space usage (a factor of 1000 in allocation, according to +RTS -sstderr..).

Attachments

Index.hs (1.9 kB) - added by claus on 03/12/07 09:58:11.

Change History

03/12/07 09:58:11 changed by claus

  • attachment Index.hs added.

03/13/07 09:00:36 changed by guest

  • cc set to Bulat.Ziganshin@gmail.com.

03/22/07 08:51:42 changed by igloo

  • milestone set to 6.6.2.

We should look into this for 6.6.2.

08/27/07 03:02:59 changed by guest

  • cc changed from Bulat.Ziganshin@gmail.com to Bulat.Ziganshin@gmail.com claus.reinke@talk21.com.

11/08/07 07:20:54 changed by simonmar

  • owner set to simonpj.
  • milestone changed from 6.6.2 to 6.8 branch.

I've investigated 6.8.1's performance on this example. In summary:

  • the performance difference between calling index and using the inline copy still exists
  • there is virtually no different in allocation between the two, i.e. the "leaky" aspect of the original report doesn't exist
  • Specialisation is working properly: we're calling the specialised version of index for (Int,Int).
  • The difference appears to be inlining + case-of-case. Although we're calling the specialised index, it isn't inlined. We can force it to be inlined with -finline-use-threshold=30, but that doesn't help. It looks like case-of-case isn't being applied after inlining.
  • something else is strange: there's a specialised function named $sa_something in the output, but it isn't referenced from anywhere.

Simon: could you take a look? The code is in ~simonmar/scratch/1216.hs.

11/12/07 07:31:59 changed by simonmar

  • milestone changed from 6.8 branch to 6.8.3.

01/25/08 09:19:08 changed by simonpj

A Friday afternoon, so I thought I'd look at this. The problem is that GHC.Arr.$windex1, which is called from the inner loop has this inlining:

$windex1 :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Base.Int
  {- Arity: 3 Strictness: LLL
     Unfolding: (\ ww :: GHC.Prim.Int# ww1 :: GHC.Prim.Int# ww2 :: GHC.Prim.Int# ->
                 case @ GHC.Base.Int GHC.Prim.<=# ww ww2 of wild {
                   GHC.Base.False
                   -> GHC.Arr.indexError2
                        (GHC.Base.I# ww, GHC.Base.I# ww1)
                        (GHC.Base.I# ww2)
                        GHC.Arr.lvl3
                   GHC.Base.True
                   -> case @ GHC.Base.Int GHC.Prim.<=# ww2 ww1 of wild1 {
                        GHC.Base.False
                        -> GHC.Arr.indexError2
                             (GHC.Base.I# ww, GHC.Base.I# ww1)
                             (GHC.Base.I# ww2)
                             GHC.Arr.lvl3
                        GHC.Base.True -> GHC.Base.I# (GHC.Prim.-# ww2 ww) } }) -}

Note the allocation of the I# box! Why? Becuase GHC.Arr.indexError is not marked as a bottoming function. Why not? Because it looks like this:

[Arity 1
 Str: DmdType L]
GHC.Arr.indexError =
  \ (@ a_aRh) (@ b_aRi) ($dShow_aRD [ALWAYS Just L] :: GHC.Show.Show a_aRh) ->
    let {
      shows1_X1Am [ALWAYS Just L] :: a_aRh -> GHC.Show.ShowS
      [Str: DmdType {aRD->U(C(S)AA)}]
      shows1_X1Am = GHC.Show.shows @ a_aRh $dShow_aRD } in
    let {
      shows2_X1Ap [ALWAYS Just L] :: a_aRh -> GHC.Show.ShowS
      [Str: DmdType {aRD->U(C(S)AA)}]
      shows2_X1Ap = GHC.Show.shows @ a_aRh $dShow_aRD } in
    \ (rng_ah3 [ALWAYS Just L] :: (a_aRh, a_aRh))
      (i_ah5 [ALWAYS Just L] :: a_aRh)
      (tp_ah7 [ALWAYS Just S] :: GHC.Base.String) ->
      GHC.Err.error @ b_aRi (...stuff...)

Note the stupid arity and that the strictness analyser is not spotting the bottom result.

I'll think about how to fix this stupidity.

Simon

02/25/08 02:15:11 changed by simonmar

  • type changed from bug to run-time performance bug.

06/20/08 13:35:35 changed by igloo

  • milestone changed from 6.8.3 to 6.10.1.

09/30/08 08:37:18 changed by simonmar

  • architecture changed from Unknown to Unknown/Multiple.

09/30/08 08:51:15 changed by simonmar

  • os changed from Unknown to Unknown/Multiple.

10/04/08 04:53:32 changed by igloo

  • milestone changed from 6.10.1 to 6.10.2.