Ticket #876 (closed bug: fixed)

Opened 7 years ago

Last modified 3 months ago

Length is not a good consumer

Reported by: ariep@… Owned by:
Priority: lowest Milestone: 7.6.2
Component: libraries/base Version: 6.5
Keywords: length Cc: rich.neswold@…, george.colpitts@…
Operating System: Linux Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Difficulty: Unknown
Test Case: perf/should_run/T876 Blocked By:
Blocking: Related Tickets:

Description

The program

module Main where

main = print $ length . filter odd $ [0 .. 999999999]

, compiled with ghc-6.5.20060508 or later with optimisation turned on (-O), overflows the stack when run. It seems that the generated code is not tail-recursive. 6.4.2 and 6.5 snapshots up to 20060507 do not have this problem.

This may be a silly example, but the same thing happens if you replace 'odd' with some more interesting predicate, and let the length of the input list be chosen by the user.

Change History

  Changed 7 years ago by simonmar

  • milestone set to 6.6

Thanks for narrowing this down to a particular day! I'd be willing to bet money that this is the culprit:

 http://www.haskell.org/pipermail/cvs-libraries/2006-May/004996.html

  Changed 7 years ago by igloo

I've confirmed that the patch Simon M indicates is indeed the problem, and I've copied it below for reference.

It makes sense to me that the above behaviour is seen: length is now a good consumer, but it generates 1+(1+(1+(1+... as it is consuming, and this causes a stack overflow. I don't think we can fix this while staying with fold/build fusion, so it looks to me like the patch should be reverted and the whole problem looked at post-6.6.

Do you agree, Simon PJ?

Thanks Ian

[Make length a good consumer
simonpj@microsoft**20060508142726

 Make length into a good consumer.  Fixes Trac bug #707.

 (Before length simply didn't use foldr.)

] {
hunk ./GHC/List.lhs 114
-length                  :: [a] -> Int
-length l                =  len l 0#
-  where
-    len :: [a] -> Int# -> Int
-    len []     a# = I# a#
-    len (_:xs) a# = len xs (a# +# 1#)
+length :: [a] -> Int
+length l = lenAcc 0# l
+
+{-# RULES
+"length" [~1] forall xs. length xs = foldr incL (I# 0#) xs
+"lenAcc" [1]  forall n#. foldr incL (I# n#) = lenAcc n#
+ #-}
+
+incL :: a -> Int -> Int        -- Internal
+{-# NOINLINE [0] incL #-}
+incL x n = n `plusInt` oneInt
+
+lenAcc :: Int# -> [a] -> Int   -- Internal
+lenAcc a# []     = I# a#
+lenAcc a# (_:xs) = lenAcc (a# +# 1#) xs
}

  Changed 7 years ago by simonmar

  • component changed from Compiler to libraries/base
  • milestone changed from 6.6 to 6.6.1

I rolled back the patch for 6.6 - linear stack space is more important than the constant factor speedup from fusion. Let's look at this again for 6.6.1.

  Changed 7 years ago by simonpj

  • summary changed from stack overflow on 'length . filter odd $ [0 .. 999999999]' to Length is not a good consumer

Changing title to reflect current bug

  Changed 7 years ago by simonpj

  • owner set to simonpj
  • milestone changed from 6.6.1 to 6.8

  Changed 7 years ago by simonpj

This will probably get fixed as a result of revising list fusion: task #915

  Changed 7 years ago by igloo

  • testcase set to list003

  Changed 6 years ago by guest

probably this ticket should be joined to the #915?

  Changed 6 years ago by guest

  • cc rich.neswold@… added

  Changed 6 years ago by simonmar

  • milestone changed from 6.8 branch to 6.10 branch

  Changed 4 years ago by igloo

  • milestone changed from 6.10 branch to 6.12 branch

  Changed 3 years ago by igloo

  • milestone changed from 6.12 branch to 6.12.3

  Changed 3 years ago by igloo

  • priority changed from normal to low
  • milestone changed from 6.12.3 to 6.14.1

  Changed 2 years ago by igloo

  • milestone changed from 7.0.1 to 7.0.2

  Changed 2 years ago by igloo

  • milestone changed from 7.0.2 to 7.2.1

  Changed 20 months ago by igloo

  • milestone changed from 7.2.1 to 7.4.1

  Changed 16 months ago by igloo

  • priority changed from low to lowest
  • milestone changed from 7.4.1 to 7.6.1

  Changed 8 months ago by igloo

  • milestone changed from 7.6.1 to 7.6.2

  Changed 8 months ago by george.colpitts

  • keywords length added
  • failure set to None/Unknown

follow-up: ↓ 21   Changed 7 months ago by george.colpitts

  • failure changed from None/Unknown to Runtime performance bug

I don't remember explicitly setting Type of failure to None/Unknown when I added a keyword of length but it seems I did. I have now changed the failure type to runtime performance bug. On ghc 7.4.1 the problem I see is that the space usage of length is O(n) not O(1) in the size of the input, perhaps this is a better summary of the bug. I'm assuming that this is what "length is not a good consumer" means. This of course also slows down execution speed. Given the preceding I'm not sure why priority is lowest.

*Main> length  [0 .. (2^20)]
1048577
it :: Int
(0.03 secs, 42303644 bytes)
*Main> length  [0 .. (2^30)]
1073741825
it :: Int
(23.08 secs, 42950370096 bytes)
*Main> 42950370096 / 42303644
1015.2877160180338

in reply to: ↑ 20   Changed 7 months ago by george.colpitts

  • architecture changed from x86 to Unknown/Multiple

Replying to george.colpitts: Got it, length is not a good consumer, refers to list fusion (deforestation) as explained in e.g. 7.17.4 of the ghc 7.4.1 Users Guide.

As this is on multiple platforms including x86, I have changed the architecture to Multiple/Unknown

  Changed 7 months ago by george.colpitts

  • cc george.colpitts@… added

  Changed 4 months ago by morabbin

  • status changed from new to closed
  • resolution set to invalid

Would have been covered by #915, which is marked closed as invalid; doing the same here.

  Changed 3 months ago by morabbin

  • owner simonpj deleted
  • status changed from closed to new
  • resolution invalid deleted

Premature; reopening as a performance bug. #915 was about the general stream fusion technique, which needs moar research. This is a good place to track an exemplar of the problem.

  Changed 3 months ago by simonpj

OK, so the problem in Ian's comment 6 years ago (ha ha!) was that the naive version of length using foldr does not lead to a tail recursive function. But after a little reflection I realise that what we want is foldl with a strict accumulating parameter; and that can be expressed using foldr. So here is a version that does work (notice the strict apply) in incL:

len :: [a] -> Int
len xs = foldr incL id xs 0

incL :: a -> (Int -> Int) -> Int -> Int
incL = \_ g x -> g $! (x+1)

To demonstrate:

foo :: Int -> Int -> Int
foo p q = len [p..q]

Compile with -O to get this nice tight code

Len.$wfoo :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
Len.$wfoo =
  \ (ww_sgn :: GHC.Prim.Int#) (ww1_sgr :: GHC.Prim.Int#) ->
    case GHC.Prim.># ww_sgn ww1_sgr of _ {
      GHC.Types.False ->
        letrec {
          $wgo_sgA [Occ=LoopBreaker]
            :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
          [LclId, Arity=2, Str=DmdType LL]
          $wgo_sgA =
            \ (w_sga :: GHC.Prim.Int#) (ww2_sgd :: GHC.Prim.Int#) ->
              case GHC.Prim.==# w_sga ww1_sgr of _ {
                GHC.Types.False ->
                  $wgo_sgA (GHC.Prim.+# w_sga 1) (GHC.Prim.+# ww2_sgd 1);
                GHC.Types.True -> GHC.Prim.+# ww2_sgd 1
              }; } in
        $wgo_sgA ww_sgn 0;
      GHC.Types.True -> 0
    }

So I think we can do this for length in the library. I need a clean nofib run to test, though.

Simon

  Changed 3 months ago by simonpj

  • status changed from new to closed
  • testcase changed from list003 to perf/should_run/T876
  • resolution set to fixed

This patch to GHC.List makes length a good consumer, for what it's worth. The nofib numbers didn't budge.

Simon

commit 82f56e5a331f9bbd5c8afda9e8d8ad71059e934b
Author: Simon Peyton Jones <simonpj@microsoft.com>
Date:   Thu Feb 14 08:22:08 2013 +0000

    Make 'length' into a good consumer, fixing Trac #876
    
    Trac #876 is the oldest ticket I have fixed in a long time.
    I finally figured out how to make foldr behave in a non
    space-leaky way for length.
    
    Thanks to Andy for re-opening.

>---------------------------------------------------------------

 GHC/List.lhs |   23 ++++++++++++++++++-----
 1 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/GHC/List.lhs b/GHC/List.lhs index 02d6965..049aa2a 100644
--- a/GHC/List.lhs
+++ b/GHC/List.lhs
@@ -114,12 +114,25 @@ null (_:_)              =  False
 -- | /O(n)/. 'length' returns the length of a finite list as an 'Int'.
 -- It is an instance of the more general 'Data.List.genericLength',
 -- the result type of which may be any kind of number.
+{-# NOINLINE [1] length #-}
 length                  :: [a] -> Int
-length l                =  len l 0#
-  where
-    len :: [a] -> Int# -> Int
-    len []     a# = I# a#
-    len (_:xs) a# = len xs (a# +# 1#)
+length l                =  lenAcc l 0#
+
+lenAcc :: [a] -> Int# -> Int
+lenAcc []     a# = I# a#
+lenAcc (_:xs) a# = lenAcc xs (a# +# 1#)
+
+incLen :: a -> (Int# -> Int) -> Int# -> Int incLen _ g x = g (x +# 1#)
+
+-- These rules make length into a good consumer
+-- Note that we use a higher-order-style use of foldr, so that
+-- the accumulating parameter can be evaluated strictly
+-- See Trac #876 for what goes wrong otherwise {-# RULES
+"length"     [~1] forall xs. length xs = foldr incLen I# xs 0#
+"lengthList" [1]  foldr incLen I# = lenAcc  #-}
 
 -- | 'filter', applied to a predicate and a list, returns the list of
 -- those elements that satisfy the predicate; i.e.,
Note: See TracTickets for help on using tickets.