Ticket #3076 (closed bug: wontfix)

Opened 4 years ago

Last modified 4 years ago

Make genericLength tail-recursive so it doesn't overflow stack

Reported by: Syzygies Owned by:
Priority: normal Milestone:
Component: Compiler Version: 6.10.1
Keywords: genericLength Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Runtime performance bug Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

A likely use for genericLength is to count lists of more than Int elements. However, the source code is not tail-recursive, making a poor "consumer", so genericLength easily overflows stack.

Here is the source code from 6.10.1:

genericLength           :: (Num i) => [b] -> i
genericLength []        =  0
genericLength (_:l)     =  1 + genericLength l

Here is a proposed alternative:

genericLength ∷ (Num i) ⇒ [b] → i
genericLength = len 0 where
  len n [] = n
  len n (_:xt) = len (n+1) xt

In my test application (enumerating the 66,960,965,307 atomic lattices on six atoms) this alternative avoids overflowing the stack.

[This is not the same issue as  http://hackage.haskell.org/trac/ghc/ticket/2962]

Attachments

Main.hs Download (1.6 KB) - added by Syzygies 4 years ago.

Change History

Changed 4 years ago by igloo

  • status changed from new to closed
  • difficulty set to Unknown
  • resolution set to invalid

That isn't an equivalent definition, e.g.:

data Nat = Zero | Succ Nat
    deriving (Show, Eq)

works :: Bool
works = genericLength1 (cycle "foo") > (5 :: Nat)

fails :: Bool
fails = genericLength2 (cycle "foo") > (5 :: Nat)

instance Num Nat where
    fromInteger 0 = Zero
    fromInteger (n + 1) = Succ (fromInteger n)
    Zero + n = n
    Succ m + n = Succ (m + n)

instance Ord Nat where
    compare Zero Zero = EQ
    compare Zero _    = LT
    compare _    Zero = GT
    compare (Succ m) (Succ n) = compare m n

genericLength1           :: (Num i) => [b] -> i
genericLength1 []        =  0
genericLength1 (_:l)     =  1 + genericLength1 l

genericLength2 :: (Num i) => [b] -> i
genericLength2 = len 0 where
  len n [] = n
  len n (_:xt) = len (n+1) xt

works works, but fails doesn't terminate:

*Main> works
True
*Main> fails
^CInterrupted.

If you restrict the type to Int or Integer, then it will work thanks to #2962, though.

Changed 4 years ago by Syzygies

  • keywords genericLength added
  • status changed from closed to reopened
  • resolution invalid deleted

That was enlightening. Thanks.

The library genericLength blows stack on very short lists. My first proposal fails on infinite lists, using Peano arithmetic. So they both fail reasonable tests. However, one can make a logarithmic improvement in the stack usage of the library function, and pass both tests:

  module Main where

  import Data.Int

  data Nat = Zero | Succ Nat
    deriving (Show, Eq)

  instance Num Nat where
    fromInteger 0 = Zero
    fromInteger n = Succ $ fromInteger $ n - 1
    Zero + n = n
    Succ m + n = Succ (m + n)
    Zero * _ = Zero
    Succ m * n = n + m * n
    abs n = n
    signum Zero = Zero
    signum (Succ _) = Succ Zero

  instance Ord Nat where
    compare Zero Zero = EQ
    compare Zero _    = LT
    compare _    Zero = GT
    compare (Succ m) (Succ n) = compare m n

  genericLength1       ∷ (Num i) ⇒ [b] → i
  genericLength1 []    =  0
  genericLength1 (_:l) =  1 + genericLength1 l

  genericLength2 ∷ (Num i) ⇒ [b] → i
  genericLength2 = len 0 where
    len n [] = n
    len n (_:xt) = len (n+1) xt

  genericLength3 ∷ (Num i) ⇒ [b] → i
  genericLength3 = len 1 0 where
    len _ n [] = n
    len m n (_:xt)
      | m == n = n + len (n+n) 1 xt
      | otherwise = len m (n+1) xt

  intLength1, intLength2, intLength3 ∷ [a] → Int64
  intLength1 = genericLength1
  intLength2 = genericLength2
  intLength3 = genericLength3

  natLength1, natLength2, natLength3 ∷ [a] → Nat
  natLength1 = genericLength1
  natLength2 = genericLength2
  natLength3 = genericLength3

  list :: Int64 -> [Bool]
  list 0 = []
  list n = True : list (n-1)

  main ∷ IO ()
  main = do

--  print $ intLength1 $ list $ 2^19 -- fails w/ 8388608 byte stack
    print $ intLength2 $ list $ 2^32
    print $ intLength3 $ list $ 2^32

    print $ natLength1 (repeat 1) > 5
--  print $ natLength2 (repeat 1) > 5 -- fails
    print $ natLength3 (repeat 1) > 5

Getting this to work requires specializing genericLength, to avoid class overhead. This is nevertheless preferable to writing a length function from scratch, which might fall into either of these traps.

Getting this to work also requires a "good producer". I was surprised to discover that in this context, [1..n] isn't a "good producer".

Changed 4 years ago by Syzygies

Changed 4 years ago by Syzygies

I ran a timing test, to compare intLengthN for N=1,2,3 on an example that N=1 (the existing library code) could handle:

print $ sum $ map intLength1 $ take 1000 $ iterate tail $ list $ 2^18

intLength3 runs slower (of course) than the invalid intLength2, but 5.5x faster than the current library code intlength1.

Changed 4 years ago by igloo

  • status changed from reopened to closed
  • resolution set to wontfix

This assumes that (==) terminates, which it might not do, e.g. with some datatypes representing the reals.

Changed 4 years ago by simonmar

  • failure set to Runtime performance bug
Note: See TracTickets for help on using tickets.