Ticket #4334 (closed proposal: fixed)
Make lines stricter to fix space leak
| Reported by: | daniel.is.fischer | Owned by: | simonmar |
|---|---|---|---|
| Priority: | normal | Milestone: | Not GHC |
| Component: | libraries/base | Version: | 6.12.3 |
| Keywords: | lines, space leak | Cc: | |
| Operating System: | Unknown/Multiple | Architecture: | Unknown/Multiple |
| Type of failure: | Other | Difficulty: | |
| Test Case: | Blocked By: | ||
| Blocking: | Related Tickets: |
Description
I propose changing the implementation of lines to be stricter.
The current implementation causes (in GHC) a space leak, so it is better to sacrifice a little laziness to fix that leak.
Period of discussion: Three weeks, until 15th October (because of ICFP).
Current implementation:
lines :: String -> [String]
lines "" = []
lines s = let (l, s') = break (== '\n') s
in l : case s' of
[] -> []
(_:s'') -> lines s''
The relevant part of the core showing the leak (-O or -O2, similar for -O0 but without inlining break):
let {
ds1_sjM :: ([GHC.Types.Char], [GHC.Types.Char])
LclId
[Str: DmdType]
ds1_sjM =
case GHC.List.$wbreak @ GHC.Types.Char lvl1_rjW wild_B1
of _ { (# ww1_aj3, ww2_aj4 #) ->
(ww1_aj3, ww2_aj4)
} } in
GHC.Types.:
@ GHC.Base.String
(case ds1_sjM of _ { (l_aif, _) -> l_aif })
(case ds1_sjM of _ { (_, s'_aih) ->
case s'_aih of _ {
[] -> GHC.Types.[] @ GHC.Base.String;
: _ s''_adj -> Lines.lines s''_adj
}
})
ds1_sjM keeps a reference to the first line until the second is demanded, preventing it from being GCed.
Current strictness properties:
lines _|_ = _|_ lines (_|_:_|_) = _|_ : _|_
Proposed implementation:
lines :: String -> [String]
lines "" = []
lines s = case break (== '\n') s of
(l, s') -> l : case s' of
[] -> []
(_:s'') -> lines s''
Due to the pattern matching on break's result, the pair is never considered a single entity, allowing the first line to be GCed before the second is demanded.
Strictness properties of the proposed implementation:
lines _|_ = _|_ lines (_|_:_|_) = _|_
Generally, a _|_ as the first Char of a line would produce a _|_ instead of the _|_ : _|_ which the current implementation produces. _|_ in any other place would be treated identically.

