Ticket #4334 (closed proposal: fixed)

Opened 3 years ago

Last modified 3 years ago

Make lines stricter to fix space leak

Reported by: daniel.is.fischer Owned by: simonmar
Priority: normal Milestone: Not GHC
Component: libraries/base Version: 6.12.3
Keywords: lines, space leak Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Other Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

I propose changing the implementation of lines to be stricter.

The current implementation causes (in GHC) a space leak, so it is better to sacrifice a little laziness to fix that leak.

Period of discussion: Three weeks, until 15th October (because of ICFP).

Current implementation:

lines                   :: String -> [String]
lines ""                =  []
lines s                 =  let (l, s') = break (== '\n') s
                           in  l : case s' of
                                        []      -> []
                                        (_:s'') -> lines s''

The relevant part of the core showing the leak (-O or -O2, similar for -O0 but without inlining break):

        let {
          ds1_sjM :: ([GHC.Types.Char], [GHC.Types.Char])
          LclId
          [Str: DmdType]
          ds1_sjM =
            case GHC.List.$wbreak @ GHC.Types.Char lvl1_rjW wild_B1
            of _ { (# ww1_aj3, ww2_aj4 #) ->
            (ww1_aj3, ww2_aj4)
            } } in
        GHC.Types.:
          @ GHC.Base.String
          (case ds1_sjM of _ { (l_aif, _) -> l_aif })
          (case ds1_sjM of _ { (_, s'_aih) ->
           case s'_aih of _ {
             [] -> GHC.Types.[] @ GHC.Base.String;
             : _ s''_adj -> Lines.lines s''_adj
           }
           })

ds1_sjM keeps a reference to the first line until the second is demanded, preventing it from being GCed.

Current strictness properties:

lines _|_       = _|_
lines (_|_:_|_) = _|_ : _|_

Proposed implementation:

lines                   :: String -> [String]
lines ""                =  []
lines s                 = case break (== '\n') s of
                            (l, s') -> l : case s' of
                                            []      -> []
                                            (_:s'') -> lines s''

Due to the pattern matching on break's result, the pair is never considered a single entity, allowing the first line to be GCed before the second is demanded.

Strictness properties of the proposed implementation:

lines _|_       = _|_
lines (_|_:_|_) = _|_

Generally, a _|_ as the first Char of a line would produce a _|_ instead of the _|_ : _|_ which the current implementation produces. _|_ in any other place would be treated identically.

Attachments

lines.dpatch Download (52.3 KB) - added by daniel.is.fischer 3 years ago.
patch fixing the space leak without changing strictness
T4334.hs Download (511 bytes) - added by daniel.is.fischer 3 years ago.
testcase
T4334.in Download (11 bytes) - added by daniel.is.fischer 3 years ago.
testcase input

Change History

  Changed 3 years ago by igloo

  • milestone set to Not GHC

  Changed 3 years ago by daniel.is.fischer

For GHC from 6.10 through HEAD, we can get non-leaking behaviour without changing the strictness properties of lines:

lines :: String -> [String]
lines "" = []
lines s  = cons $ case break (== '\n') s of
                    (l, s') -> (l, case s' of
                                    []    -> []
                                    _:s'' -> lines s'')

-- uncurry isn't available in Data.List
cons :: (a,[a]) -> [a]
cons ~(x,xs) = x : xs

Since it is not necessary to change the strictness properties, I favour this over the original proposal.

Note however that in jhc-0.7.6 this leaks memory (as does the original implementation).
Also note that the fix is fragile, a change to the garbage collector can break it easily.
Nevertheless, I prefer a fragile temporary fix over no fix at all.

  Changed 3 years ago by simonmar

To prevent breaking it in GHC, we can add it as a regression test.

Changed 3 years ago by daniel.is.fischer

patch fixing the space leak without changing strictness

Changed 3 years ago by daniel.is.fischer

testcase

  Changed 3 years ago by daniel.is.fischer

  • status changed from new to patch

Discussion thread:  http://haskell.org/pipermail/libraries/2010-September/014420.html There were some strong reservations about making lines stricter, since that's against the general spirit of Data.List.
Nevertheless, there was also support for the original proposal. Further, the point was raised that the proposed fix (which no longer changes the strictness) may only temporarily work.

Altogether, it didn't come to an entirely clear conclusion, but since the only point that received real opposition is eliminated, I think we should apply the patch to (at least for the time being) fix the leak.

I've added a testcase programme, to be compiled with optimisations and given T4334.in as input, criterion for success is maximum resident memory less than 100,000 bytes. I don't know how to specify that criterion for the test driver, otherwise I'd have made a patch for the testsuite myself.

Please review.

Changed 3 years ago by daniel.is.fischer

testcase input

follow-up: ↓ 7   Changed 3 years ago by simonmar

  • owner set to simonmar

Thanks, I'll look at incorporating it. Daniel: please could you not include whitespace changes with functional changes in the same patch, it makes it a bit difficult to review your patches. Thanks.

  Changed 3 years ago by simonmar

  • status changed from patch to merge

Applied, thanks!

Wed Oct 20 10:11:11 BST 2010  Daniel Fischer <daniel.is.fischer@web.de>
  * FIX #4334
  Make selector thunks visible to GHC to fix a space leak in lines.
Wed Oct 20 11:39:53 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * add test for #4334 (space leak in Data.List.lines)

in reply to: ↑ 5   Changed 3 years ago by daniel.is.fischer

Replying to simonmar:

Daniel: please could you not include whitespace changes with functional changes in the same patch, it makes it a bit difficult to review your patches. Thanks.

Sorry. I've set my editor to automatically remove trailing whitespace and darcs sometimes threw a tantrum and wouldn't let me darcs send a patch without including all previously made patches, so I gave up and let darcs have its way, record all changes and send the entire shemozzle. I'll try harder to separate whitespace from contents in the future.

  Changed 3 years ago by igloo

  • status changed from merge to closed
  • resolution set to fixed

Both merged.

Note: See TracTickets for help on using tickets.