Ticket #3259 (closed bug: fixed)

Opened 4 years ago

Last modified 4 years ago

A module-local combinator using Control.Parallel.par behaves differently than when it's imported

Reported by: blamario Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.11
Keywords: Cc:
Operating System: Linux Architecture: x86
Type of failure: Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description (last modified by simonpj) (diff)

There's a moderately long thread@haskell-cafe discussing the problem at:

 http://www.haskell.org/pipermail/haskell-cafe/2009-May/061765.html

I will attach the source files and the outputs of compilation with

ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl

on a 32-bit Ubuntu 2009.4.

What appears to be happening is that GHC generates the call to function `parallelize' as though it was strict, even though the interface declares it as lazy, but only when the function is imported.

The only proof of this, apart from the execution time, is this line of difference between the two -ddump-simpl outputs:

> $diff main.simpl imported.simpl
> ...
> 223c232
> < a_s1rs [ALWAYS Just L] :: GHC.Integer.Internals.Integer
> ---
> > a_s1sV [ALWAYS Just S] :: GHC.Integer.Internals.Integer
> ... 

Attachments

primes-test.hs Download (0.5 KB) - added by blamario 4 years ago.
The main test file, with import instead of a local definition
primes-test.2.hs Download (0.5 KB) - added by blamario 4 years ago.
The main file, using the local definition
Parallelizable.hs Download (108 bytes) - added by blamario 4 years ago.
The module exporting the function definition
main.simpl Download (10.0 KB) - added by blamario 4 years ago.
Output of ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl &> main.simpl
imported.simpl Download (10.2 KB) - added by blamario 4 years ago.
Output of ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl &> imported.simpl

Change History

Changed 4 years ago by blamario

The main test file, with import instead of a local definition

Changed 4 years ago by blamario

The main file, using the local definition

Changed 4 years ago by blamario

The module exporting the function definition

Changed 4 years ago by blamario

Output of ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl &> main.simpl

Changed 4 years ago by blamario

Output of ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl &> imported.simpl

Changed 4 years ago by simonpj

  • difficulty set to Unknown
  • description modified (diff)

Well diagnosed! Your report reveals quite a long-standing and serious but, so thank you! It'll generate nasty, insidious loss of parallelism.

Here's brain dump of what is going on, just to record it for posterity. par is defined in GHC.Conc thus:

{-# INLINE par  #-}
par :: a -> b -> b
par  x y = case (par# x) of { _ -> lazy y }

-- The reason for the strange "lazy" call is that
-- it fools the compiler into thinking that pseq  and par are non-strict in
-- their second argument (even if it inlines pseq/par at the call site).
-- If it thinks par is strict in "y", then it often evaluates
-- "y" before "x", which is totally wrong.  

The function lazy is the identity function, but it is inlined only after strictness analysis, and (via some magic) pretends to be lazy. Hence par pretends to be lazy too.

The trouble is that both par and lazy are inlined into your definition of parallelise, so that the unfolding for parallelise (exposed in Parallelise.hi) does not use lazy at all. Then when compiling Main, parallelise is in turn inlined (before strictness analysis), and so the strictness analyser sees too much.

This was all sloppy thinking on my part. Inlining lazy after strictness analysis works fine for the current module, but not for importing modules. My proposed fix is to inline lazy only at the very, very end, and in particular after any unfoldings have been exposed in an interface file. That might mean that we lose some optimisations, but I don't think it'll make much difference.

However, I'm going to leave this until Simon M gets back from holiday to discuss.

Simon

Changed 4 years ago by simonpj

  • status changed from new to closed
  • resolution set to fixed

OK so after talking to Simon I indeed moved the exposure of lazy into CorePrep. That certainly fixes the bug.

Fri May 29 00:20:20 PDT 2009  simonpj@microsoft.com
  * Fix Trac #3259: expose 'lazy' only after generating interface files
  Ignore-this: 3c762bda546981c4b4f01d28b8586ff8

  This patch fixes an insidious and long-standing bug in the way that
  parallelism is handled in GHC.  See Note [lazyId magic] in MkId.
...snip...
  The downside is that a little less optimisation may happen on programs
  that use 'lazy'.  And you'll only see this in the results -ddump-prep
  not in -ddump-simpl.  So KEEP AN EYE OUT (Simon and Satnam especially).
  Still, it should work properly now.  Certainly fixes #3259.

    M ./compiler/basicTypes/MkId.lhs -18 +23
    M ./compiler/coreSyn/CorePrep.lhs -2 +10
    M ./compiler/stranal/WorkWrap.lhs -10 +3

I have not added a test case, because I can't see a simple way to test it. But please do see if things improve.

Do watch out for loss of optimisation as a result. Simon and I do not think it'll be important, but we've been wrong before.

For example, in your program you have way too many lazy calls:

parallelize a b = lazy a `par` (lazy b `pseq` (a, b))

Here the lazy calls are completely unnecessary, and indeed they (now) make the code a bit less efficient. Instead you can say

parallelize a b = a `par` (b `pseq` (a, b))

Thank you for identifying such an insidious bug.

Simon

Note: See TracTickets for help on using tickets.