Ticket #917 (new bug)

Opened 5 years ago

Last modified 6 weeks ago

-O introduces space leak

Reported by: claus.reinke@… Owned by:
Priority: lowest Milestone: _|_
Component: Compiler Version: 6.5
Keywords: Cc: prokhorenkov@…, pho@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

consider the following variant of a popular space problem

initlast :: (()->[a]) -> ([a], a)
initlast xs = (init (xs ()), last (xs ()))

main = print $ case initlast (\()->[0..1000000000]) of
                 (init, last) -> (length init, last)

the long list has been wrapped in abstractions to avoid sharing, gaining constant space processing rather than the space leak in the original code - see

 http://www.haskell.org/pipermail/haskell-cafe/2006-September/018447.html

 http://www.haskell.org/pipermail/haskell-cafe/2006-September/018464.html

this works nicely if compiled without -O, but unfortunately, -O reintroduces the space leak (apparently lifting and sharing the common and space-expensive subexpression).

1. I didn't see a ticket for this issue, so I wanted to record the example

2. avoiding sharing isn't always possible, so it would be nice if one could "bypass" sharing in such cases. in the old g-machine, that might have been as simple as introducing a pragma that tells the compiler to omit the update after the eval in the code for some subexpression (so the original application node would not be overwritten by the space-expensive result, trading time for space). is there a chance for a similar trick in GHC? so one might write code like this (handwaving:-):

initlast :: [a] -> ([a], a)
initlast xs = (init ({-# COPY #-} xs), last ({-# COPY #-} xs))

main = print $ case initlast [0..1000000000] of
                 (init, last) -> (length init, last)

Change History

Changed 5 years ago by claus.reinke@…

actually, the relevant evals/updates would have been in in init and last, whereas the problem is visible in initlast, where the pragmas should be. so this would have required either a little more thinking if one wanted to do it in the compiler, or perhaps there might still be a simple way to tag thunks for "please don't update" in the runtime system?

Changed 5 years ago by simonpj

  • priority changed from normal to lowest
  • milestone set to _|_

Thanks for recording this. You are right that full laziness can introduce space leaks. GHC provides no way to selectively disable it, but you can use -fno-full-laziness.

It's a good research topic. It'd be great if someone could find a way to solve it... but I doubt we'll do anything to GHC in the short term.

Changed 3 years ago by simonmar

  • architecture changed from Unknown to Unknown/Multiple

Changed 3 years ago by simonmar

  • os changed from Unknown to Unknown/Multiple

Changed 3 years ago by simonpj

See also #3273 for another example of a space leak caused by the full laziness transformation.

Simon

Changed 15 months ago by prokhorenkov

  • cc prokhorenkov@… added
  • failure set to None/Unknown

Changed 14 months ago by igloo

Another example: #4276.

Changed 12 months ago by PHO

  • cc pho@… added

Changed 12 months ago by batterseapower

An interesting example of this occurs with something like:

foo act = forM_ [1..10000] act

Currently full-laziness floats the list out of foo. Not only does this increase space usage, it also disables the fusion that would ordinarily occur between the list and forM_.

I guess the reason this is that the rules matcher sees that applying the foldr/build rule for that shared build might destroy sharing. However, maybe the rules matcher should be allowed to duplicate work as long as it was duplicated in the original source program?

So when we float out, we say "OK, we're going to share this work for now, but if we spot a RULE opportunity at a later stage we're going to undo this sharing.". This would at least fix this case, but it does not seem at all straightforward how to represent information about this "original" lack of sharing.

Changed 12 months ago by rl

The CONLIKE pragma is useful in such cases. We could have a cheapBuild which would be exactly like build but would be marked as CONLIKE. Then, if GHC saw:

xs = cheapBuild f
foo x = foldr g z xs

it would apply the foldr/cheapBuild rule even though this loses sharing. We'd then be able to control which functions are cheap enough for sharing not to matter by using cheapBuild instead of build. This works reasonably well in DPH.

That's not to say that your suggestion doesn't make sense. It just seems rather harder to implement.

Changed 6 weeks ago by judahj

One more example that I ran into this week: #5729.

Note: See TracTickets for help on using tickets.