Ticket #698 (closed bug: fixed)

Opened 7 years ago

Last modified 3 years ago

GHC's internal memory allocator never releases memory back to the OS

Reported by: guest
Owned by: igloo
Priority: highest
Milestone: 7.0.1
Component: Runtime System
Version: 6.12.1
Keywords:
Cc: Bulat.Ziganshin@…, barsoap@…, gwern0@…, asklingenberg@…, ndmitchell@…, gale@…, pho@…, rturk@…
Operating System: Unknown/Multiple
Architecture: Unknown/Multiple
Type of failure: None/Unknown
Difficulty: Moderate (less than a day)
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

allocaBytes does not appear to free the memory after the computation has completed. For example, start ghci and run:

allocaBytes (100*1024*1024) $ \_ -> getLine

ghci's virtual memory usage will jump up by 100MB. When you press enter however, it does not drop back down.

restart ghci and try:

bracket (mallocBytes (100*1024*1024)) free $ \_ -> getLine

This time when you press enter, the memory usage will drop back down to its pre mallocBytes usage.

This also happens for compiled programs.

Note: the above test will only alloc virtual memory since nothing is ever read into the malloc'd memory. The following test program will actually force the pages to be mapped. It results in the same broken behaviour with the addition that the resident size of ghci gets stuck high.

module Main where

import Control.Exception 
import Foreign.Marshal.Alloc
import Foreign.Ptr
import Data.Word
import System.IO
import System.Mem

test :: Int -> IO ()
test mb =
    let size = mb * 1024 * 1024 in
        do -- bracket (mallocBytes size) free $ wait size -- this version works
           allocaBytes size $ wait size  -- this version does not free the memory afterwards
           performGC

wait :: Int -> Ptr Word8 -> IO ()
wait size p =    
    bracket (openBinaryFile "/dev/scsi/host0/bus0/target0/lun0/part1" ReadMode) (hClose) (\h -> hGetBuf h p size >> putStrLn "waiting..." >> getLine >> return ()) 

main =
    do test 150
       putStrLn "is it free?"
       getLine
       return ()
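
The program above depends on a specific device file. A minimal sketch of the same experiment that avoids it, assuming fillBytes from Foreign.Marshal.Utils is available (base 4.8 or later), writes to every byte of the buffer so that the pages are actually mapped. The name touchBuffer and the fill value are illustrative, not part of the original report.

```haskell
module Main where

import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Utils (fillBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff)
import System.Mem (performGC)

-- Allocate `size` bytes with allocaBytes, write to every byte so the
-- pages become resident, then read the last byte back.
-- Assumes size > 0.
touchBuffer :: Int -> IO Word8
touchBuffer size =
  allocaBytes size $ \p -> do
    fillBytes (p :: Ptr Word8) 0xAB size
    peekByteOff p (size - 1)

main :: IO ()
main = do
  b <- touchBuffer (150 * 1024 * 1024)
  putStrLn ("last byte written: " ++ show b)
  performGC
  putStrLn "is it free? (check the process size, then press enter)"
  _ <- getLine
  return ()
```

Watching the resident size in top (or a similar tool) before and after the "is it free?" prompt shows whether the memory went back to the OS.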

Change History

  Changed 7 years ago by simonmar

  • priority changed from normal to low
  • component changed from Compiler (FFI) to Runtime System
  • summary changed from allocaBytes does not actually free the memory after the computation to GHC's internal memory allocator never releases memory back to the OS

I've changed $(subject): this bug is actually just an instance of the more general problem that GHC never releases any memory back to the OS. The memory allocated by allocaBytes has actually been "freed", but it is still held by GHC's memory allocator rather than returned to the OS.

Normally this isn't a big problem, because the memory will be re-used by GHC. However, in cases where you have a large spike in memory use or want to temporarily allocate a very large object, it becomes noticeable.
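
One way to see the difference between "freed" and "returned to the OS" from inside a program is to compare live data with the memory the RTS is holding. The sketch below assumes a much newer base than the GHC versions discussed in this ticket (the GHC.Stats API with getRTSStats, base 4.10 or later) and that the program is run with +RTS -T. The helper name report is made up for illustration.

```haskell
module Main where

import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Marshal.Utils (fillBytes)
import GHC.Stats
  ( GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled )
import System.Mem (performGC)

-- Force a major GC, then print live data versus memory held by the RTS
-- as recorded for that GC.
report :: String -> IO ()
report label = do
  performGC
  s <- getRTSStats
  let d = gc s
  putStrLn $ label ++ ": live=" ++ show (gcdetails_live_bytes d)
          ++ " held=" ++ show (gcdetails_mem_in_use_bytes d)

main :: IO ()
main = do
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "run with +RTS -T to enable RTS statistics"
    else do
      report "before"
      allocaBytes size $ \p -> do
        fillBytes p 0 size      -- touch the buffer so it is really mapped
        report "during"
      report "after"
  where
    size = 100 * 1024 * 1024
```

If "held" stays high in the "after" line while "live" drops, the memory has been freed inside the RTS but not handed back. The actual numbers depend on the GHC version and RTS flags.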

  Changed 7 years ago by simonmar

  • difficulty changed from Unknown to Moderate (1 day)
  • architecture changed from x86 to Multiple

  Changed 7 years ago by igloo

  • testcase set to N/A
  • milestone set to 6.8

  Changed 6 years ago by guest

  • cc Bulat.Ziganshin@… added

it becomes important when we are short on physical memory - in this case pages with garbage are written to swapfile :(

it is also important for long-running programs

  Changed 6 years ago by simonmar

  • milestone changed from 6.8 branch to 6.10 branch

  Changed 5 years ago by simonmar

  • architecture changed from Multiple to Unknown/Multiple

  Changed 4 years ago by igloo

  • owner set to igloo
  • milestone changed from 6.10 branch to 6.12 branch

  Changed 4 years ago by ksf

  • cc barsoap@… added

  Changed 4 years ago by guest

  • cc gwern0@… added

I second Bulat's comment. This is *very* important for long-running servers. Consider the Gitit wiki server. There are a few pages like 'Recent changes' which get the entire revision history; each time this happens, the memory usage goes up a bit - even though every last bit of the history will get discarded once the page has been constructed and sent off to the client. (We've checked for memory leaks.) This is especially problematic when Gitit is being used, surprisingly enough, as a web host, since typically this is done on virtualized slices or is otherwise resource-constrained. It looks bad for wiki.darcs.net that an idling gitit takes 31% of RAM.

  Changed 4 years ago by asklinge

  • cc asklingenberg@… added

follow-up: ↓ 12   Changed 4 years ago by zooko

"each time this happens, the memory usage goes up a bit - even though even last bit of the history will get discarded once the page has been constructed and sent off to the client. (We've checked for memory leaks"

If the memory usage continues to grow when a substantially similar task is being performed over and over, then this is a memory leak. That's a separate issue from returning memory to the OS. Personally I don't think returning memory to the OS is worth the effort. The OS will swap it out anyway when it is unused. (Hint: never look at the "virtual memory size" on linux. It is a useless and misleading number -- it doesn't correlate with anything that we actually care about.)

in reply to: ↑ 11   Changed 4 years ago by simonmar

Replying to zooko:

If the memory usage continues to grow when a substantially similar task is being performed over and over, then this is a memory leak. That's a separate issue from returning memory to the OS.

Exactly. If anyone can demonstrate a leak, please create a separate ticket.

Personally I don't think returning memory to the OS is worth the effort. The OS will swap it out anyway when it is unused. (Hint: never look at the "virtual memory size" on linux. It is a useless and misleading number -- it doesn't correlate with anything that we actually care about.)

Which is why it hasn't been addressed so far. But there are people who still care about the absolute virtual size of their processes - perhaps because they want to avoid the swapping, or maybe just because they don't like seeing huge processes in top.

  Changed 4 years ago by JeremyShaw

I believe that virtual memory counts against your quota on most VPSes. On my VPS with a 64MB quota, I have a Haskell-based server which only requires 7MB RSS (and 5MB of that is SHR), but requires 40MB virtual.

In general, the threaded RTS seems to use a lot of virtual memory. On my system, the one-liner:

main = getLine >> return ()

compiled with, ghc --make -O2 -threaded -o main

requires less than 1MB of RSS but 20MB of virtual. If (unused) virtual memory really does count against my quota, then there could be an actual cost savings involved...

  Changed 4 years ago by simonmar

Jeremy - I think what you're describing here is not related to freeing of memory in the RTS. The virtual memory used by the trivial program is mostly shared libraries. Try cat /proc/<pid>/maps - on my x86_64/Linux system here, the trivial program without -threaded needs 16MB of VM, and with -threaded needs 20MB. The shared library mappings account for most of the 16MB, and the extra 4MB with -threaded is due to the OS threads which seem to reserve 2MB each. GHC's RTS has only allocated 1MB in each case, and there is no memory to free back to the OS.
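
To total up what /proc/<pid>/maps reports, a small sketch along these lines could be used. It assumes the usual Linux line format, "start-end perms offset dev inode path" with hexadecimal addresses, and reads /proc/self/maps, so it only runs on Linux. The function names are illustrative.

```haskell
module Main where

import Data.Maybe (mapMaybe)
import Numeric (readHex)

-- Size in bytes of one mapping, given a line in /proc/<pid>/maps format.
-- Returns Nothing if the address range does not parse.
mappingSize :: String -> Maybe Integer
mappingSize line =
  case words line of
    (range : _) ->
      case break (== '-') range of
        (lo, '-' : hi) -> (-) <$> hex hi <*> hex lo
        _              -> Nothing
    [] -> Nothing
  where
    hex :: String -> Maybe Integer
    hex s = case readHex s of
      [(n, "")] -> Just n
      _         -> Nothing

-- Total virtual size of all mappings that parse.
totalMapped :: String -> Integer
totalMapped = sum . mapMaybe mappingSize . lines

main :: IO ()
main = do
  maps <- readFile "/proc/self/maps"
  putStrLn ("total mapped bytes: " ++ show (totalMapped maps))
```

Comparing the total with and without -threaded, and looking at which lines name shared libraries, gives the same breakdown as reading the file by hand.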

  Changed 4 years ago by NeilMitchell

  • cc ndmitchell@… added

This issue is much more important now that we have DLLs. Imagine a program that calls out to a Haskell DLL, which then returns some result. If the intermediate computation uses 200MB of RAM, then after returning it's very easy to have 200MB of heap with virtually nothing in it. The argument that other Haskell bits will soon reuse that heap is also less valid with DLLs, as the main program will not be sharing the same heap.

  Changed 4 years ago by simonmar

  • os changed from Linux to Unknown/Multiple

I'm happy to provide pointers to anyone that wants to work on this.

follow-up: ↓ 22   Changed 4 years ago by crutcher

I want to work on this.

It seems that there's little agreement on what the 'right' behavior is, because there are many different execution models for which different behaviors are preferable. It seems we need a means of setting a memory reclamation policy, and plugging in some number of implementations of that policy, with flags to set it.

Off the top of my head, I see a few obvious ones:

  • Never return free memory (the current behavior)
  • Immediately return free memory (the notional behavior)
  • Return outstanding free memory on 'flush' events (nice for the dll case?)
  • Fixed Buffer - return free memory over X, for some buffer size X.
  • Ratio Buffer - return free memory over R, for some ratio of used memory.

And there's this one, which I'd like to be able to play with, but has numerous knobs.

  • Derivative Ratio Buffer - at time t, estimate the derivative D(t) of memory use, and return free memory over R*D(t+h) for some ratio R and time step h.

Thoughts?
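
As a rough illustration of how some of the policies listed above could be expressed, here is a sketch covering four of them. The type and function names are hypothetical and do not correspond to anything in GHC's RTS. The 'flush' and derivative policies are left out because they need state beyond the current used and free sizes.

```haskell
module Main where

-- Hypothetical policy type; none of these names exist in GHC's RTS.
data ReclaimPolicy
  = NeverReturn            -- keep all free memory (the current behaviour)
  | ReturnImmediately      -- return all free memory
  | FixedBuffer Integer    -- keep at most X bytes of free memory
  | RatioBuffer Double     -- keep at most R * used bytes of free memory
  deriving (Show, Eq)

-- Bytes to hand back to the OS, given bytes in use and bytes free.
bytesToReturn :: ReclaimPolicy -> Integer -> Integer -> Integer
bytesToReturn policy used free =
  case policy of
    NeverReturn       -> 0
    ReturnImmediately -> free
    FixedBuffer x     -> max 0 (free - x)
    RatioBuffer r     -> max 0 (free - floor (r * fromIntegral used))

main :: IO ()
main =
  mapM_ (\p -> print (p, bytesToReturn p 100 300))
    [NeverReturn, ReturnImmediately, FixedBuffer 50, RatioBuffer 0.5]
```

With 100 bytes used and 300 free, FixedBuffer 50 and RatioBuffer 0.5 both keep 50 bytes of slack and return 250.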

follow-up: ↓ 21   Changed 4 years ago by guest

Fixed buffer would be quite acceptable for the Gitit (and most servers) use case, I think.

I originally tried to work around the memory growth for darcs.net by adding the max heap option (the -m option IIRC); turned out that setting an upper bound didn't trigger returns to the OS but segfaults.

-- gwern

  Changed 4 years ago by crutcher

I've been talking to a colleague about this, and it seems that we can host essentially all of the other approaches on the 'fixed buffer' approach, by periodically sparking a process to change the buffer size.

I need to spend some more time mucking about; I'm using this as a ghc hacking starter project. I'd appreciate any suggestions :)

  Changed 4 years ago by crutcher

meh. This page (pointed to by the code and the wiki) is absent: http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/GC

All of this comes down to the need for a process which, given a target X of memory to free, comes as close as reasonably possible to doing so.

The simplest approach would be to release free blocks and megablocks until X is satisfied, or there are no more candidate blocks or megablocks. This will likely give us reasonable behavior in the allocaBytes case, but without some form of compaction, this doesn't really solve the larger problem.

I don't understand the garbage collector well enough yet, but I imagine we could entice it or some of its machinery to clear our candidate blocks for us. I'll keep reading :P

in reply to: ↑ 18   Changed 4 years ago by simonmar

Replying to guest:

I originally tried to work around the memory growth for darcs.net by adding the max heap option (the -m option IIRC); turned out that setting an upper bound didn't trigger returns to the OS but segfaults.

If you see a segfault, please report it!

in reply to: ↑ 17   Changed 4 years ago by simonmar

Replying to crutcher:

I want to work on this.

It seems that there's little agreement on what the 'right' behavior is, because there are many different execution models for which different behaviors are preferable. It seems we need a means of setting a memory reclamation policy, and plugging in some number of implementations of that policy, with flags to set it.

Off the top of my head, I see a few obvious ones:

  • Never return free memory (the current behavior)
  • Immediately return free memory (the notional behavior)
  • Return outstanding free memory on 'flush' events (nice for the dll case?)
  • Fixed Buffer - return free memory over X, for some buffer size X.
  • Ratio Buffer - return free memory over R, for some ratio of used memory.

And there's this one, which I'd like to be able to play with, but has numerous knobs.

  • Derivative Ratio Buffer - at time t, estimate the derivative D(t) of memory use, and return free memory over R*D(t+h) for some ratio R and time step h.

Bear in mind that the actual memory requirements fluctuate over time due to GC activity. When the copying GC is being used, at a major GC we require F*L0+L1 memory, where L0 is the amount of live data at the last GC, L1 is the current live data, and F is the value set by +RTS -F (default 2). In practice we'll need a little bit more than this, because we GC the cycle after the limit has been reached.

So I suggest that after a major GC we

  • estimate the amount of memory required at the next GC, assuming live data remains constant, call this M, and add a constant C
  • release any whole megablocks over this limit

Provide a way to set C, and/or define it as a fraction of M. I imagine that C == M would be a reasonable default: keep double the current requirements around just in case. People who want to be frugal with memory could set C == M/3. Programs with wildly varying memory requirements will suffer a performance hit if C is too low.
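
The arithmetic of this proposal can be sketched as follows. With live data assumed constant, L0 = L1 = L, so M = F*L + L. The sketch takes F as an integer for simplicity (the real -F value can be fractional) and assumes a 1MB megablock. All names are illustrative and are not taken from the RTS sources.

```haskell
module Main where

-- Assumed megablock size of 1MB; treat as illustrative.
megablock :: Integer
megablock = 1024 * 1024

-- Estimated requirement at the next major GC under the copying collector:
-- F*L0 + L1 with L0 = L1 = live (live data assumed constant).
estimateM :: Integer -> Integer -> Integer
estimateM f live = f * live + live

-- Whole megablocks that could be released: everything held above M + C.
releasableMegablocks :: Integer -> Integer -> Integer -> Integer
releasableMegablocks held m c = max 0 (held - (m + c)) `div` megablock

main :: IO ()
main = do
  let live = 10 * megablock
      held = 200 * megablock
      m    = estimateM 2 live            -- 30 megablocks with F = 2
  print (releasableMegablocks held m m)           -- C == M
  print (releasableMegablocks held m (m `div` 3)) -- C == M/3
```

With 10MB live and 200MB held, C == M keeps 60MB and releases 140 megablocks, while C == M/3 keeps 40MB and releases 160.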

Memory could be released between GCs, but the live data value can only be calculated at a major GC, so it makes most sense to release memory at a major GC. Programs compiled with -threaded get an automatic major GC when they're idle (idle time set by +RTS -I); programs compiled without -threaded will have to call System.Mem.performGC to release memory if they intend to go idle.
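
For the non-threaded case, a program that is about to go idle might do something like the following. This is only a sketch of the calling pattern; idleAfterGC is a made-up helper and the summation stands in for real work.

```haskell
module Main where

import System.Mem (performGC)

-- Run a major GC, then perform the blocking action.
idleAfterGC :: IO a -> IO a
idleAfterGC wait = performGC >> wait

main :: IO ()
main = do
  let total = sum [1 .. 1000000 :: Int]   -- stand-in for real work
  print total
  _ <- idleAfterGC getLine                -- major GC, then block on input
  return ()
```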

  Changed 4 years ago by YitzGale

  • cc gale@… added

  Changed 4 years ago by simonmar

  • difficulty changed from Moderate (1 day) to Moderate (less than a day)

  Changed 3 years ago by PHO

  • cc pho@… added
  • failure set to None/Unknown

  Changed 3 years ago by igloo

  • milestone changed from 6.12 branch to 6.12.3

follow-up: ↓ 28   Changed 3 years ago by tinlix

  • version changed from 6.4.1 to 6.12.1

I have the following use case where it is important for the GC to free memory. I have a huge matrix file which needs to be multiplied with a vector. The matrix is 100GB in size, so it's not possible to use matlab to do it.

I used a lazy ByteString, thinking I could use readFile from Data.ByteString.Lazy to read in the matrix one row at a time and do the multiplication with the vector. The whole program just does a simple run over the huge file. It turns out the program runs out of memory as it goes through the big file.

In the end I had to go back to C++/STL, which let me write a program that takes constant memory and solves the problem fine.

IMHO, this bug is serious. What a GC should provide, i.e. constant memory consumption, now becomes O(n). In a sense, this makes functions such as readFile useless for serious use.

in reply to: ↑ 27   Changed 3 years ago by igloo

Replying to tinlix:

I have the following use case where it is important for the GC to free memory. I have a huge matrix file which needs to be multiplied with a vector. The matrix is 100GB in size, so it's not possible to use matlab to do it. I used a lazy ByteString, thinking I could use readFile from Data.ByteString.Lazy to read in the matrix one row at a time and do the multiplication with the vector. The whole program just does a simple run over the huge file. It turns out the program runs out of memory as it goes through the big file. In the end I had to go back to C++/STL, which let me write a program that takes constant memory and solves the problem fine. IMHO, this bug is serious. What a GC should provide, i.e. constant memory consumption, now becomes O(n). In a sense, this makes functions such as readFile useless for serious use.

From your description, it sounds like your program has a space leak. This ticket is not the problem: it is about the memory usage not dropping after a peak, but you expect the memory usage to be constant, so it won't have any peaks.

I suggest you explain the problem, including the code, to the haskell-cafe mailing list.
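
For reference, a row-at-a-time pass over a lazy ByteString can be written so that each row becomes garbage as soon as its result has been produced. The sketch below is not a diagnosis of the program described above, whose code was not posted. It assumes a text format of whitespace-separated integers, one row per line, and a fixed example vector.

```haskell
module Main where

import qualified Data.ByteString.Lazy.Char8 as BL
import Data.List (foldl')
import Data.Maybe (mapMaybe)

-- Dot product of one whitespace-separated row of integers with the vector.
-- Tokens that do not parse as integers are skipped, which would misalign
-- the row; real code should report them instead.
dotRow :: [Int] -> BL.ByteString -> Int
dotRow vec row =
  foldl' (+) 0 (zipWith (*) vec (mapMaybe (fmap fst . BL.readInt) (BL.words row)))

-- One result per row. Rows are consumed lazily, so nothing here holds on
-- to a row after its result has been computed.
multiply :: [Int] -> BL.ByteString -> [Int]
multiply vec = map (dotRow vec) . BL.lines

main :: IO ()
main = do
  contents <- BL.getContents          -- lazy read of standard input
  mapM_ print (multiply [1, 2, 3] contents)
```

The point of the design is that results are printed as they are produced and no reference to the start of the input is kept. Holding on to `contents` elsewhere, or accumulating results lazily, is the kind of thing that would turn this into a space leak.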

  Changed 3 years ago by igloo

  • priority changed from low to normal
  • milestone changed from 6.12.3 to 6.14.1

  Changed 3 years ago by Remi

  • cc rturk@… added

  Changed 3 years ago by zooko

  • cc zooko@… added

I still don't believe that it is important for GHC to release memory back to the operating system during a process, and I'm unsubscribing from this ticket so hopefully I'll hear no more about it.

  Changed 3 years ago by zooko

  • cc zooko@… removed

  Changed 3 years ago by simonmar

  • priority changed from normal to high

  Changed 3 years ago by zooko

Unfortunately I'm still receiving notifications in my email about this ticket. I've already removed my email address from the Cc: line. If anyone who understands the trac config better could take me off this ticket I would appreciate it.

  Changed 3 years ago by zooko

  • cc zooko@… added

Adding myself to Cc: with the intent to then un-add myself again and see if that makes me receive no more mail about this.

  Changed 3 years ago by zooko

  • cc zooko@… removed

Removing myself from Cc:.

  Changed 3 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Fixed:

Fri Aug 13 10:04:02 PDT 2010  Ian Lynagh <igloo@earth.li>
  * Return memory to the OS; trac #698

  Changed 3 years ago by simonmar

  • owner igloo deleted
  • status changed from closed to new
  • resolution fixed deleted

On Win32 we're de-committing memory, but we're not releasing the reserved address space. This is important to some people (Lennart & co. at Standard Chartered need this for example) so I'm re-opening the ticket.

  Changed 3 years ago by simonmar

  • owner set to igloo

  Changed 3 years ago by simonpj

  • priority changed from high to highest

Standard Chartered say it's release critical, and offer to help.

  Changed 3 years ago by zooko

  • cc zooko@… added

  Changed 3 years ago by zooko

  • cc zooko@… removed

  Changed 3 years ago by igloo

  • testcase N/A deleted

  Changed 3 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Fixed by:

Mon Nov  1 16:18:02 GMT 2010  Ian Lynagh <igloo@earth.li>
  * On Windows, when returning memory to the OS, we try to release it
  as well as decommiting it.

in HEAD and 7.0.
