Ticket #5615 (new bug)

Opened 19 months ago

Last modified 4 weeks ago

ghc produces poor code for `div` with constant powers of 2.

Reported by: Lennart Owned by: daniel.is.fischer
Priority: normal Milestone: 7.6.2
Component: Compiler Version: 7.4.1-rc1
Keywords: Cc: dterei, carter.schonwald@…
Operating System: Unknown/Multiple Architecture: x86
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

The code for Int (div x c) where c is a constant of the form 2n should be implemented with an arithmetic right shift, currently it's a call to a function. This optimization should be done for all signed types.

Change History

Changed 19 months ago by simonpj

  • owner set to daniel.is.fischer

Daniel, might you have time to look at this? (Or Ian.)

Simon

Changed 19 months ago by daniel.is.fischer

I'll see if I can find the relevant modules of the compiler.

Changed 19 months ago by Lennart

Also, mod of powers of 2 should be an AND instruction.

Changed 18 months ago by simonmar

From this input:

module Foo where

foo :: Int -> Int
foo x = x `div` 64

With -O2 we get as far as a call to GHC.Base.divInt#:

Foo.foo =
  \ (x_abl :: GHC.Types.Int) ->
    case x_abl of _ { GHC.Types.I# ww_awV ->
    case GHC.Base.divInt# ww_awV 64 of ww1_ax3 { __DEFAULT ->
    GHC.Types.I# ww1_ax3
    }
    }

but divInt# wasn't inlined:

Considering inlining: GHC.Base.divInt#
    arg infos [TrivArg, ValueArg]
    uf arity 2
    interesting continuation ArgCtxt False
    some_benefit True
    is exp: True
    is cheap: True
    guidance IF_ARGS [0 0] 139 0
    discounted size = 109
    ANSWER = NO

And divInt# has this unfolding:

435aa24ba2ad252f2b1992da3e8faa90
  divInt# :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
    {- Arity: 2, HasNoCafRefs, Strictness: LL,
       Unfolding: (\ x# :: GHC.Prim.Int# y# :: GHC.Prim.Int# ->
                   case GHC.Prim.># x# 0 of wild {
                     GHC.Types.False
                     -> case GHC.Prim.<# x# 0 of wild1 {
                          GHC.Types.False -> GHC.Prim.quotInt# x# y#
                          GHC.Types.True
                          -> case GHC.Prim.># y# 0 of wild2 {
                               GHC.Types.False -> GHC.Prim.quotInt# x# y#
                               GHC.Types.True
                               -> case GHC.Prim.quotInt#
                                         (GHC.Prim.+# x# 1)
                                         y# of wild3 { DEFAULT ->
                                  GHC.Prim.-# wild3 1 } } }
                     GHC.Types.True
                     -> case GHC.Prim.<# y# 0 of wild1 {
                          GHC.Types.False
                          -> case GHC.Prim.<# x# 0 of wild2 {
                               GHC.Types.False -> GHC.Prim.quotInt# x# y#
                               GHC.Types.True
                               -> case GHC.Prim.># y# 0 of wild3 {
                                    GHC.Types.False -> GHC.Prim.quotInt# x# y#
                                    GHC.Types.True
                                    -> case GHC.Prim.quotInt#
                                              (GHC.Prim.+# x# 1)
                                              y# of wild4 { DEFAULT ->
                                       GHC.Prim.-# wild4 1 } } }
                          GHC.Types.True
                          -> case GHC.Prim.quotInt#
                                    (GHC.Prim.-# x# 1)
                                    y# of wild2 { DEFAULT ->
                             GHC.Prim.-# wild2 1 } } }) -}

I'm rather surprised it has such a large size, since it is mostly primops and inline cases. I suspect we're counting too much for those cases, they ought to look cheap because they're inline (not evals). And there ought to be a big discount for that constant argument.

Changed 18 months ago by igloo

  • milestone set to 7.6.1

Other than the unpleasant solution of adding rules for

x `div` 2
x `div` 4
x `div` 8
...
x `div` 9223372036854775808

I think we would need a rule in prelude/PrelRules.lhs to do this. But I think this would get us back to the problem of needing to get an Id for unsafeShiftR from somewhere.

Would we be better off moving rule application to a different point in the compiler, when we are in a MonadThings monad?

Changed 18 months ago by dterei

  • cc dterei added

Changed 17 months ago by simonpj

  • difficulty set to Unknown

I don't see the problem here. We can add a rule for divInt# in PrelRules. The rule would replace (x `divInt#` 8) with (x `uncheckedIShiftRA#` 3); except that obviously instead of 8 we'd have any power of 2. We can get the Id for any primop easily.

Daniel might you look at this?

Simon

Changed 17 months ago by simonmar

I think I was hoping that we'd be able to do it without a built-in rule, since the backends already turn constant divisions into shifts (and LLVM probably has a range of more sophisticated translations). If only divInt# was inlined, it would just work.

Changed 17 months ago by simonpj

The trouble is that, as you can see above divInt# has a pretty big unfolding, and inlining that at every call site is perhaps overkill. Mind you, if the denominator is constant, at least some of that unfolding is dead code. But there's also a test on the numerator too; quotInt# is the primop I think (rightly or wrongly).

I wonder whether this numerator test stuff affects the correctness or otherwise of replacing divInt with a shift?

Anyway, adding a RULE woudl be fine I think.

Changed 17 months ago by simonmar

Well, that's a good point. Integer right shift is equivalent to division truncated to negative infinity (i.e. div), so the test on the numerator would not be necessary. That's a good argument for having a built in rule.

For quot some additional checks would be necessary in the rule. I just checked what gcc does, and for x / 2 it generates the clever sequence

        movl    %edi, %eax
        shrl    $31, %eax
        addl    %edi, %eax
        sarl    %eax

which is basically the same as x < 0 ? (x+1) >> 1 : x >> 1. Dividing by 4 instead gives this:

        leal    3(%rdi), %eax
        testl   %edi, %edi
        cmovns  %edi, %eax
        sarl    $2, %eax

which is x < 0 ? (x + 3) >> 2 : x >> 2.

We could do the same tricks with a built-in rule for quot.

Changed 16 months ago by simonmar

  • version changed from 7.2.1 to 7.4.1-rc1

One more thing to add: the backend does have some clever rewriting of quot# to shift, this comment is from cmm/CmmOpt.hs:

                -- shift right is not the same as quot, because it rounds
                -- to minus infinity, whereasq quot rounds toward zero.
                -- To fix this up, we add one less than the divisor to the
                -- dividend if it is a negative number.
                --
                -- to avoid a test/jump, we use the following sequence:
                --      x1 = x >> word_size-1  (all 1s if -ve, all 0s if +ve)
                --      x2 = y & (divisor-1)
                --      result = (x+x2) >>= log2(divisor)
                -- this could be done a bit more simply using conditional moves,
                -- but we're processor independent here.
                --
                -- we optimise the divide by 2 case slightly, generating
                --      x1 = x >> word_size-1  (unsigned)
                --      return = (x + x1) >>= log2(divisor)

Changed 16 months ago by augustss

Does the backend optimization of quot interact nicely with how div is encoded using quot? I want div (with a power of 2) to turn into a shift, because div (unlike quot) is equivalent to a shift.

Changed 16 months ago by simonmar

Unfortunately not, which is why this ticket is still open. I'm just pointing out related pieces we already have in place in the compiler.

Changed 8 months ago by igloo

  • milestone changed from 7.6.1 to 7.6.2

Changed 4 weeks ago by carter

  • cc carter.schonwald@… added
Note: See TracTickets for help on using tickets.