Ticket #4267 (new bug)

Opened 21 months ago

Last modified 4 months ago

Strictness analyser is to conservative about passing a boxed parameter

Reported by: tibbe Owned by:
Priority: low Milestone: 7.6.1
Component: Compiler Version: 6.13
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Given the following two modules:

Fold.hs:

module Fold (Tree, fold') where

data Tree a = Leaf | Node a !(Tree a) !(Tree a)

-- Strict, pre-order fold.
fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
  where
    go z Leaf = z
    go z (Node a l r) = let z'  = go z l
                            z'' = f z' a
                        in z' `seq` z'' `seq` go z'' r
{-# INLINE fold' #-}

FoldTest.hs:

module FoldTest (sumTree) where

import Fold

sumTree :: Tree Int -> Int
sumTree = fold' (+) 0

I'd expect that the accumulator z used in go to be an unboxed Int#. However, it's boxed:

sumTree1 :: Int
sumTree1 = I# 0

sumTree_go :: Int -> Fold.Tree Int -> Int
sumTree_go =
  \ (z :: Int) (ds_ddX :: Fold.Tree Int) ->
    case ds_ddX of _ {
      Fold.Leaf -> z;
      Fold.Node a l r ->
        case sumTree_go z l of _ { I# z' ->
        case a of _ { I# a# ->
        sumTree_go (I# (+# z' a#)) r
        }
        }
    }

sumTree :: Fold.Tree Int -> Int
sumTree =
  \ (eta1_B1 :: Fold.Tree Int) ->
    sumTree_go sumTree1 eta1_B1

Given this definition of fold'

fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
  where
    go z _ | z `seq` False = undefined
    go z Leaf = z
    go z (Node a l r) = go (f (go z l) a) r
{-# INLINE fold' #-}

I get the core I want. However, this version isn't explicit in that the left branch (i.e. go z l) should be evaluated before f is called on the result. In other words, I think my first definition is the one that correctly expresses the evaluation order, yet it results in worse core.

Change History

  Changed 21 months ago by tibbe

  • version changed from 6.12.1 to 6.13

The problem is present in HEAD as well.

follow-up: ↓ 3   Changed 21 months ago by simonmar

We established (during a conversation on IRC) that the reason for the lack of unboxing was that the function wasn't inlined until after the worker-wrapper transformation. The question is therefore, why wasn't it inlined earlier?

in reply to: ↑ 2   Changed 21 months ago by simonmar

Replying to simonmar:

We established (during a conversation on IRC) that the reason for the lack of unboxing was that the function wasn't inlined until after the worker-wrapper transformation. The question is therefore, why wasn't it inlined earlier?

I think I must have misunderstood the IRC conversation in question, tibbe told me today that the function does indeed get inlined before strictness analysis and worker-wrapper. tibbe: if you could attach the output from -dverbose-core2core to the ticket, that might help.

  Changed 21 months ago by tibbe

Here are the relevant parts of -dverbose-core2core. First, inlining happens:

==================== Simplifier Phase 2 [main] max-iterations=4 ====================
a_soD :: GHC.Types.Int
[LclId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [] 1 2}]
a_soD = GHC.Types.I# 0

FoldTest.sumTree :: Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclIdX,
 Arity=1,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 18 0}]
FoldTest.sumTree =
  \ (eta1_B1 :: Fold.Tree GHC.Types.Int) ->
    letrec {
      go_aaE [Occ=LoopBreaker]
        :: GHC.Types.Int -> Fold.Tree GHC.Types.Int -> GHC.Types.Int
      [LclId,
       Arity=2,
       Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True,
               ConLike=True, Cheap=True, Expandable=True,
               Guidance=IF_ARGS [0 3] 12 0}]
      go_aaE =
        \ (z_aaF :: GHC.Types.Int) (ds_dbh :: Fold.Tree GHC.Types.Int) ->
          case ds_dbh of _ {
            Fold.Leaf -> z_aaF;
            Fold.Node a_aaH l_aaI r_aaJ ->
              case go_aaE z_aaF l_aaI of _ { GHC.Types.I# ipv_soJ ->
              case a_aaH of _ { GHC.Types.I# y_aox ->
              go_aaE (GHC.Types.I# (GHC.Prim.+# ipv_soJ y_aox)) r_aaJ
              }
              }
          }; } in
    go_aaE a_soD eta1_B1

And some time later demand analysis:

==================== Demand analysis ====================
a_soD :: GHC.Types.Int
[LclId,
 Str=DmdType m,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [] 1 2}]
a_soD = GHC.Types.I# 0

FoldTest.sumTree :: Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclIdX,
 Arity=1,
 Str=DmdType S,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 18 0}]
FoldTest.sumTree =
  \ (eta1_B1 [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
    letrec {
      go_aaE [Occ=LoopBreaker]
        :: GHC.Types.Int -> Fold.Tree GHC.Types.Int -> GHC.Types.Int
      [LclId,
       Arity=2,
       Str=DmdType SS,
       Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True,
               ConLike=True, Cheap=True, Expandable=True,
               Guidance=IF_ARGS [0 3] 12 0}]
      go_aaE =
        \ (z_aaF [Dmd=Just S] :: GHC.Types.Int)
          (ds_dbh [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
          case ds_dbh of _ {
            Fold.Leaf -> z_aaF;
            Fold.Node a_aaH [Dmd=Just U(L)]
                      l_aaI [Dmd=Just S]
                      r_aaJ [Dmd=Just S] ->
              case go_aaE z_aaF l_aaI of _ { GHC.Types.I# ipv_soJ [Dmd=Just L] ->
              case a_aaH of _ { GHC.Types.I# y_aox [Dmd=Just L] ->
              go_aaE (GHC.Types.I# (GHC.Prim.+# ipv_soJ y_aox)) r_aaJ
              }
              }
          }; } in
    go_aaE a_soD eta1_B1

and after that worker/wrapper

==================== Worker Wrapper binds ====================
a_soD :: GHC.Types.Int
[LclId,
 Str=DmdType m,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [] 1 2}]
a_soD = GHC.Types.I# 0

FoldTest.sumTree :: Fold.Tree GHC.Types.Int -> GHC.Types.Int
[LclIdX,
 Arity=1,
 Str=DmdType S,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 18 0}]
FoldTest.sumTree =
  \ (eta1_B1 [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
    letrec {
      go_aaE [Occ=LoopBreaker]
        :: GHC.Types.Int -> Fold.Tree GHC.Types.Int -> GHC.Types.Int
      [LclId,
       Arity=2,
       Str=DmdType SS,
       Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True,
               ConLike=True, Cheap=True, Expandable=True,
               Guidance=IF_ARGS [0 3] 12 0}]
      go_aaE =
        \ (z_aaF [Dmd=Just S] :: GHC.Types.Int)
          (ds_dbh [Dmd=Just S] :: Fold.Tree GHC.Types.Int) ->
          case ds_dbh of _ {
            Fold.Leaf -> z_aaF;
            Fold.Node a_aaH [Dmd=Just U(L)]
                      l_aaI [Dmd=Just S]
                      r_aaJ [Dmd=Just S] ->
              case go_aaE z_aaF l_aaI of _ { GHC.Types.I# ipv_soJ [Dmd=Just L] ->
              case a_aaH of _ { GHC.Types.I# y_aox [Dmd=Just L] ->
              go_aaE (GHC.Types.I# (GHC.Prim.+# ipv_soJ y_aox)) r_aaJ
              }
              }
          }; } in
    go_aaE a_soD eta1_B1

  Changed 21 months ago by tibbe

To add to the confusion, using HEAD, this definition of foldl' gets an unboxed accumulator:

fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
 where
   go z Leaf = z
   go z (Node a l r) = z `seq` go (f (go z l) a) r
{-# INLINE fold' #-}

but this one doesn't

fold' :: (a -> b -> a) -> a -> Tree b -> a
fold' f = go
 where
   go z Leaf = z
   go z (Node a l r) = z `seq` go (f (go z l) a) r

Note the missing INLINE pragma.

The strange thing is that GHC happily inlines both versions (since they're not recursive) but the former gets an unboxed accumulator for some reason I don't understand.

  Changed 20 months ago by igloo

  • milestone set to 7.2.1

  Changed 9 months ago by simonpj

  • summary changed from Missing unboxing in pre-order fold over binary tree to Strictness analyser is to conservative about passing a boxed parameter

I took a little look at this. As you say, you get the loop

sumTree_go :: Int -> Fold.Tree Int -> Int
sumTree_go =
  \ (z :: Int) (ds_ddX :: Fold.Tree Int) ->
    case ds_ddX of _ {
      Fold.Leaf -> z;
      Fold.Node a l r ->
        case sumTree_go z l of _ { I# z' ->
        case a of _ { I# a# ->
        sumTree_go (I# (+# z' a#)) r
        } } }

Notice that this loop is strict in z, but does not actually unbox z. The strictness analyser conservatively passes the boxed version under such circumstances, to avoid the possiblity of unboxing it, passing it to the function, which immediately reboxes it.

It turns out that adding a "!" to the defn of go is enough to fix this:

    go z Leaf = z
    go !z (Node a l r)   -- NOTE THE "!"
        = let z'  = go z l
              z'' = f z' a
          in z' `seq` z'' `seq` go z'' r

In fact the seqs are redundant because go is strict in both args, so this gives the same resulting code

    go z Leaf = z
    go !z (Node a l r)   -- NOTE THE "!"
        = go (f z' a) (go z l)

The code (in the sumTree call) is lovely

T4267.$wgo :: GHC.Prim.Int# -> T4267.Tree GHC.Types.Int -> GHC.Prim.Int#
T4267.$wgo =
  \ (ww_sps :: GHC.Prim.Int#) (w_spu :: T4267.Tree GHC.Types.Int) ->
    case w_spu of _ {
      T4267.Leaf -> ww_sps;
      T4267.Node ipv_soP ipv1_soQ ipv2_soR ->
        case T4267.$wgo ww_sps ipv1_soQ of ww1_spx { __DEFAULT ->
        case ipv_soP of _ { GHC.Types.I# y_ap5 ->
        T4267.$wgo (GHC.Prim.+# ww1_spx y_ap5) ipv2_soR
        } } }

So that's a workaround.

Meanwhile I think the strictness analyser should be a bit cleverer, so I'll keep the ticket open for that reason.

  Changed 9 months ago by tibbe

Aside: Does it matter if the bang pattern goes on the first or second equation of go? The first equation is already strict in z so putting the bang pattern on the second equation might be clearer. As a rule I always put bang patterns on the first equation if I want a function to be strict in some parameter.

  Changed 4 months ago by igloo

  • priority changed from normal to low
  • milestone changed from 7.4.1 to 7.6.1
Note: See TracTickets for help on using tickets.