Space Leak
A Haskell Sore Spot
When a program retains memory longer than the programmer expects,it is said to leak space. [1]
Quoting from Neil Mitchell's 2013 article in ACM Queue:
"[Writers of] compilers for lazy functional languages have been dealing with space leaks for more than 30 years and have developed a number of strategies to help.
...
Despite all the improvements, space leaks remain a thorn in the side of lazy evaluation, producing a significant disadvantage to weigh against the benefits."
"Pinpointing space leaks is a skill that takes practice and perseverance. Better tools could significantly simplify the process."
The Haskell applications programmer is well aware that space leak poses a significant impediment to development. [2] In fact, any large Haskell program, even carefully coded by skilled programmers, will typically exhibit space leak compounded of multiple causes. The most serious cases are catastrophic, requiring urgent measures for development and testing to proceed. Such dire leaks can arise several times per week.
All space leak degrades performance, and production-quality programs should eradicate as much leak behaviour as possible. [3] Difficulty ranges from distracting (minutes) to formidable (days, or worse). This steers development resources away from the application domain, forcing one to resort to bag-of-tricks hackery in attempting to isolate and resolve the more serious leaks as they arise. Understanding what's going on requires thinking about memory at a low level, and the abstractions involved are more complex than traditional block allocation. For space leak debugging, the current best practises constitute a black art. Leak behaviour is also extremely sensitive to build configuration (compiler version, and flags such as optimisation level).
The problem stems from the lazy evaluation model of Haskell [4] which has been discussed at length elsewhere (see Related).
What can we do?
This situation seems pretty hopeless. Can anything be done?Of course! We can keep studying the phenomenon:
- programming practise evolves as leaky patterns are recognised
- compilers get better at optimising leak away
- diagnostic and remedial tool support improves (for instance seqaid)
I heard Haskell will be strict-by-default in future...
Nah. ;) But if it were, you'd then have complementary problems, of overly strict code hurting performance. Making the language implicitly strict is not a panacea (though a pragma to that effect would be welcome; also consider strict-ghc-plugin and seqaid).We will prevail against space leak, and all such things, without forfeiting lazy semantics; through an improved understanding of demand, strictness, and evaluation. The optimal solution will often be a complex fusion of lazy and strict. As better tools emerge, these judgements will be automated, meaning less disruption to your development objectives. You'll enjoy more sustained attention on your domain of interest! [5]
Show me a simple example of a program that leaks
The leaky package contains a small Haskell 98 program known to leak even with -O2 optimisation under GHC 7.8.3. It also contains build flavours which test some new tools (deepseq-bounded and seqaid). Installing seqaid and running seqaid demo is a good way to get started experimenting with leaky.As new GHC releases come out, there may be some lag in case the upgraded compiler automatically optimises the leak out of leaky. This is a natural progress, and reflects the importance of the space leak problem. [6]
-- This leaks with GHC 7.8.3 -O2, strict field notwithstanding. duration = 5000000 -- XXX tune for your hardware period = duration `div` 10 data State = St !Bool toggle (St b) = St (not b) main = do putStrLn "Leaky started." rslt <- duty (St False) 0 (return $! rslt) >>= return putStrLn "Finished." duty :: State -> Int -> IO State duty state i = do let state' = toggle state if 0 == i `mod` period then return () else return () if i < duration then duty state' (1+i) else return state'
And here is the same program with some extras to tinker with.
------------------------------------------------------------------------------- -- So, with the following configuration, -O2 is smart enough -- to produce the correct Bool output, without seeming to -- do any work. (But not with -O0 or even with -O.) --- #define STATS 0 --- #define NO_PERIODIC_BRANCH 1 --- #define STRICT_FIELDS 1 -- -- In fact, -O0, -O and -O2 all have distinct behaviour: -- -O2 returns the result "instantly" -- -O returns the result in a second or two -- -O0 returns only after a long run ------------------------------------------------------------------------------- {-# LANGUAGE CPP #-} -- (or run cpp manually if need be) ------------------------------------------------------------------------------- -- Whether to run getStats computation (and output status lines). -- Note that when STATS is 1, this constitutes a forcing factor: -- Strictness is a bit like QM that way... #define STATS 1 -- If you explicitly performGC at the start of each getStats, you -- get a more regular sequence of status lines; but you are also -- tampering with the natural behaviour of the garbage collector. #define PERFORM_GC 1 -- This switch is offered because, only when set, is there a -- difference between -O and -O2 evident. (-O0 differs in any case.) -- And only when un-set will this program leak at -O2. -- (Only applicable when STATS is 0.) #define NO_PERIODIC_BRANCH 0 -- This leak is not plugged by strict fields! -- (In the case of STATS 0 and NO_PERIODIC_BRANCH 1, the leak is -- not manifest with -O2, but this is not due to the strict field.) #define STRICT_FIELDS 1 -- Superfluous I guess. #define RECORD_SYNTAX 0 ------------------------------------------------------------------------------- #if STATS -- XXX This doesn't work! -- Please GHC-compile with -rtsopts -with-rtsopts=-s -with-rtsopts=-T -- (-s not -T are independent; nice to see -s summary info.) {-# OPTIONS_GHC -rtsopts #-} {- OPTIONS_GHC -with-rtsopts=-s #-} {-# OPTIONS_GHC -with-rtsopts=-T #-} #endif ------------------------------------------------------------------------------- import Control.Exception ( evaluate ) -- (\x -> (return $! x) >>= return) import GHC.Stats import GHC.Int ( Int64 ) import System.Mem ( performGC ) ------------------------------------------------------------------------------- duration = 5000000 -- XXX tune for your hardware report_period = duration `div` 10 --report_period = 500000 -- GHC 7.8.3 -O2 is two smart (when STATS=0) ------------------------------------------------------------------------------- #if RECORD_SYNTAX #if STRICT_FIELDS data State = St { st :: !Bool } #else data State = St { st :: Bool } #endif #else #if STRICT_FIELDS data State = St !Bool #else data State = St Bool #endif #endif toggle (St b) = St (not b) ------------------------------------------------------------------------------- main = do putStrLn "Leaky started." #if STATS putStrLn $ "\n" ++ pad "live heap" ++ " " ++ pad "just alloc'd" #endif rslt <- duty (St False) 0 0 evaluate rslt putStrLn "\nFinished." ------------------------------------------------------------------------------- duty :: State -> Int -> Int64 -> IO State duty state i last_total_bytes_allocated = do #if RECORD_SYNTAX let state' = state { st = not (st state) } #else let state' = toggle state -- let state' = (\(St b)->St (not b)) state -- let state' = let St b = state in St (not b) #endif #if STATS new_total_bytes_allocated <- if 0 == i `mod` report_period then do #if PERFORM_GC performGC #endif stats <- getGCStats let bytes = bytesAllocated stats putStrLn $ pad (show (currentBytesUsed stats)) ++ " " ++ pad (show (bytes - last_total_bytes_allocated)) return $ bytesAllocated stats else return last_total_bytes_allocated #else let new_total_bytes_allocated = last_total_bytes_allocated #if ! NO_PERIODIC_BRANCH if 0 == i `mod` report_period -- if 0 == i `mod` 2 -- GHC 7.8.3 -O2 is two smart then do -- putStrLn "." -- unnecessary return () else return () #endif #endif if i < duration then duty state' (1+i) new_total_bytes_allocated else return state' ------------------------------------------------------------------------------- pad :: String -> String pad s = (take (15 - length s) $ repeat ' ') ++ s -------------------------------------------------------------------------------
These and other variants are available in the leaky source package.
Explanation of Output
If you actually build the leaky package with Cabal, the version built is further elaborated, and demonstrates seqaid. The output is discussed here.Notes
[1] | Given the non-locality of space leak phenomena in lazy languages, it is usually more appropriate to speak of "space leak" as an holistic, abstract quality of a program, rather than to speak of "a space leak" in particular. When I speak of "a leak", this refers to a particular leaking program as a whole, or to a particular run of a leaking program. Sometimes we leave out the word "space" and just speak of "leak", since space leak is the dominant form of resource leak plaguing Haskell. ⇪ |
[2] | Current state of the art, 2014. One indicator of the pervasiveness of the space leak problem is the popularity of the deepseq package, despite its notoriety and ill repute. (It has over 400 reverse dependencies on hackage, making it one of the most popular libraries of all time, coming in well ahead of lens even!) ⇪ |
[3] | One eventually enters the vast and nebulous realm of program optimisation, where the criterion "retains memory longer than expected" ramifies and effervesces. ⇪ |
[4] | At least, the GHC implementation of Haskell is lazy. ⇪ |
[5] | Otherwise, your domain of interest slides gradually, out of necessity or frustration, towards programming language theory and implementation. Not a bad thing of itself, but it's not entirely satisfying to have your goals deflected, even if the shift was toward the noble pursuit of better tools. ⇪ |
[6] | For instance, Haskell's most famous space leak, Anatomy of a Thunk Leak, no longer leaks with GHC 7.8 and -O2. ⇪ |
Related
Wiki Books - Haskell pages: haskellwiki pages:- Memory leak
- Space
- Laziness (different than the above!)
- Strictness
- How to profile a Haskell program
- Accumulating parameter
- Data types