Ticket #5161 (closed bug: fixed)

Opened 2 years ago

Last modified 2 years ago

Poor performance of division; unnecessary branching

Reported by: rtvd Owned by: igloo
Priority: normal Milestone: 7.2.1
Component: libraries/base Version: 7.0.3
Keywords: div quot rem mod performance slow branching Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

In case an Int* is divided by a constant the compiled code contains unnecessary checks whether the value being divided is minBound. This check is necessary only if a value is being divided by -1 so that an exception would be thrown.

The branch can be removed by the GHC's optimiser after a small change: the two parts of the corresponding condition have to be flipped. This provides an opportunity to short-circuit this condition to 'false' without knowing the first argument (patch is attached).

Attachments

Change History

Changed 2 years ago by rtvd

  • status changed from new to patch

Please can someone review and the patch and put it into repository so that it would appear in the next GHC release?

Changed 2 years ago by simonpj

Your changes replace the guard

   | x == minBound && y == (-1) = overflowError 

by

   | y == (-1) && x == minBound = overflowError

At first I couldn't see how this would make any difference. After all, if y is 2 (say), then the overflow case is statically inaccessible, so the test should vanish. Here's why it doesn't. Initially we have something roughly like this:

quot x y = case x of
             -9223372036854775808 -> case y of 
                                        -1 -> overflowError
                                        DEFAULT -> x `quot#` y
             DEFAULT -> x `quot#` y

(I'm omitting the unboxing etc.)

Now if we have the call (p `quot` 2), and we inline quot we get:

  case p of
    -9223372036854775808 -> case 2 of 
                              -1 -> overflowError
                              DEFAULT -> p `quot#` 2
    DEFAULT -> p `quot#` 2

The inner case simplifies, leaving

  case p of
    -9223372036854775808 -> p `quot#` 2
    DEFAULT              -> p `quot#` 2

At this point I thought we'd just have a case with two identical branches, which is eliminated (assuming it doesn't affect strictness, which it doesn't in this case), leaving the desired (p `quot#` 1).

BUT in the minBound branch we know what p is, and GHC cleverly does the division at compile time, giving

  case p of
    -9223372036854775808 -> -4611686018427387904
    DEFAULT              -> p `quot#` 2

Darn! Now the branches aren't identical!

By switching the order of tests you've stopped this happening. But it'll happen instead if you have a constant numerator. The term (2 `quot` q) will turn into

case q of
  -1      -> 0
  DEFAULT -> 2 `quot#` q

Constant denominators are more common than constant numerators, so I think your patch is a good one -- Ian can you apply please? But also can you add this Trac comment into the code with Note [Ordering of tests] or something, and point to it from the test sites?

I wish I could see a good way to make GHC generate the best code regardless of test ordering. I can think of some un-satisfying solutions:

  • Delay constant folding until a later phase
  • Delay case-of-case until a later phase

but neither are great. Phase ordering is very delicate and non-modular.

Changed 2 years ago by igloo

  • owner set to igloo

Changed 2 years ago by igloo

  • milestone set to 7.2.1

Changed 2 years ago by igloo

  • status changed from patch to closed
  • resolution set to fixed

Applied, and note added:

changeset:5dac514410303675fe60985c502c2c2863da616c

changeset:a07504ab09e4c63e7ef25c13e4e97466c36a651a

Note: See TracTickets for help on using tickets.