Ticket #4533 (closed bug: fixed)

Opened 2 years ago

Last modified 2 years ago

IO manager can leak MVars if threads are killed

Reported by: bos Owned by: bos
Priority: normal Milestone:
Component: Runtime System Version: 7.0.1
Keywords: Cc: tibbe, pho@…, dastapov@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

If a thread is killed in threadWaitRead and no data ever comes in on the file descriptor it was waiting on, the IO manager will not remove the FdData structure from its queue of waited-for events. This causes the queue to slowly grow in size.

Test case:

-- Compile with: ghc -threaded --make

import Control.Monad
import Control.Concurrent

main = do
  forM_ [0..10000] $ \i -> do
    when (i`mod`100==0) $ print i
    t <- forkIO $ threadWaitRead 0
    threadDelay 5000
    killThread t

This is not hugely likely to affect real programs, but could lead to a (perhaps accidental) denial of service against a long-running application.

Attachments

badness-ghc-6.png Download (24.5 KB) - added by bos 2 years ago.
GHC 6 behaves just fine
badness-ghc-7.png Download (7.8 KB) - added by bos 2 years ago.
GHC 7 shows increasing memory use over time
base-4533.patch Download (3.8 KB) - added by bos 2 years ago.
A patch that, alas, doesn't work
base-4533-1.patch Download (6.0 KB) - added by bos 2 years ago.
A slightly less insane patch ... but it still doesn't work
space-fixed-ghc-7.png Download (21.5 KB) - added by bos 2 years ago.
Memory usage with fix applied - nice and constant
fix-4533.dpatch Download (17.0 KB) - added by bos 2 years ago.
Updated patch

Change History

Changed 2 years ago by bos

GHC 6 behaves just fine

Changed 2 years ago by bos

GHC 7 shows increasing memory use over time

Changed 2 years ago by bos

  • component changed from Compiler to Runtime System

Changed 2 years ago by bos

This isn't going to be entirely trivial to fix.

I think what we need to do is use  System.Mem.Weak to store the callback in an FdData as a weak reference.

We'd also add a finalizer such that, if the callback gets GCed because its thread has died, we'd use unregisterFd to drop the registration.

This should happen within System.Event.Manager rather than System.Event.Thread.

Changed 2 years ago by bos

A patch that, alas, doesn't work

Changed 2 years ago by bos

I spent a little bit of time working on this tonight, and have attached the (not currently working) patch I put together.

This deadlocks on the example program from #4514, so it's clearly seriously ill in some way. I haven't figured out exactly what's going wrong yet, but I think that maybe the callback closure is being GCed because nothing refers to it.

Maybe I'll have to extend the registerFd signature to accept a value (in this case the MVar) that the manager can hold a weak reference to, which the client guarantees to keep alive as long as it is.

Changed 2 years ago by bos

Spent a few more hours on this, but don't have working code to show for it.

Currently, it looks to me like the weak reference must live out in System.Event.Thread if we want to retain the current APIs, which is less than ideal from an encapsulation standpoint.

So threadWait creates an empty MVar, then hands the IO manager a callback that has a reference to the MVar, while the caller blocks using a takeMVar. In order for the MVar to be garbage collected if the caller is killed, the callback must hold only a weak reference to the MVar. The earlier example code was never going to work, in other words, because I wasn't thinking about it correctly.

I have some test code which does this, but it's more complicated than I'd like. Oh, and it doesn't work. I'll attach it anyway, though I'm not sure I'd recommend reading it :-(

Changed 2 years ago by bos

A slightly less insane patch ... but it still doesn't work

Changed 2 years ago by PHO

  • cc pho@… added

Changed 2 years ago by bos

  • owner set to bos

Yikes! Why did I spend most of yesterday messing around with weak pointers, when all I needed was an exception handler? I blame the Thanksgiving turkey making me slow.

Anyway. I've attached a patch that adds exception handlers to the uses of takeMVar in threadWait and threadDelay. If an exception is thrown against the thread while it blocks, they drop the event or timeout registration, then rethrow the exception.

This works with the fix for #4514, and also makes the test case for this bug run in constant space, as it should.

Changed 2 years ago by bos

  • status changed from new to patch

Changed 2 years ago by bos

Memory usage with fix applied - nice and constant

Changed 2 years ago by bos

Updated patch

Changed 2 years ago by simonmar

We need to think about the consequences of changing the base API and whether we can do this in a minor release. Normally we don't make API changes, but if the API change is just an addition and is necessary to fix a bug, then a case could be made.

Quick patch review. Instead of this:

  takeMVar m `catch` \(e::SomeException) ->
     M.unregisterTimeout mgr reg >> throw e

it's better to write

  takeMVar m `onException` M.unregisterTimeout mgr reg

However, I'm a little concerned that you haven't closed the leak fully here. I think you want a mask_:

threadDelay :: Int -> IO ()
threadDelay usecs = mask_ $ do
  Just mgr <- readIORef eventManager
  m <- newEmptyMVar
  _ <- registerTimeout mgr usecs (putMVar m ())
  takeMVar m `onException` M.unregisterTimeout mgr reg

otherwise an exception could strike after registerTimeout but before takeMVar. Perhaps it works without the mask_, but that would only be by accident.

Changed 2 years ago by bos

Thanks, Simon! I didn't know about onException or mask_, so I'll reroll the patch and use those.

The public API changes are actually for #4514, so let's discuss that over there.

Changed 2 years ago by bos

I've attached the final fix to #4514 as  base-4514+4533.dpatch.

Changed 2 years ago by adept

  • cc dastapov@… added

Changed 2 years ago by igloo

  • status changed from patch to closed
  • resolution set to fixed

Applied to HEAD and 7.0 branches, thanks.

Note: See TracTickets for help on using tickets.