Ticket #2560 (closed bug: invalid)

Opened 5 years ago

Last modified 4 years ago

killThread and getChanContents appear to interact strangely

Reported by: batterseapower Owned by:
Priority: normal Milestone: 6.10.2
Component: Runtime System Version: 6.9
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

I'm not sure if this an error in my program or in GHC, but I think the behaviour I'm seeing is strange enough to merit a place on the tracker.

I'm particularly confused by the fact that my correctGetChanContents appears to be able to return an empty list!

To run the test case, compile the file below and run it. It /should/ fail to terminate, with a lot of output like this:

Here we go
In the thread
Got an element
CHUNK CHUNK CHUNK: length = 1
Here we go
In the thread
Got an element
CHUNK CHUNK CHUNK: length = 1
Here we go
In the thread
Got an element

Sometimes this doesn't happen (it's nondeterministic). If it completes succesfully then try it again.

module Main where

import System.IO.Unsafe

import Control.Exception
import Control.Concurrent

import Data.Maybe

import Prelude hiding (catch)


{-

THE PROBLEM
===========

This Haskell program nondeterministically fails to terminate.

EXPLANATION
===========

This example is extracted from a larger program and hence has a number
of interacting parts that conspire to create the bug. They are:

1) Timed-out evaluation. The timeoutList function takes a lazy list computation
   and a number of microseconds and returns as much of the list as could be
   evaluated in that number of microseconds.

2) "Improving" IO. To use this you write a sort of IO action that is able to
   call yieldImprovement at any time. When this is "run" with runImprovingIO
   you get back an action that can be run to actually do the IO computation you
   specified in the first place and a lazy list which represents the sequence
   of "improvements" that will be output by that computation at some future
   point in time.
   
   In this example, the computation part of the Improving IO is spun off to
   be evaluated on another thread, and the lazy list is consumed in chunks of
   as many items as can be read in 10ms by readWithTimeout.
   
   The test makes an Improving IO action that yields "improvements" that are
   just a sequence of numbers, and makes sure that they all come back via
   the lazy list of improvements when it is read in this chunky, timed-out
   manner.

TENTATIVE CAUSE
===============

The getChanContents function seems to not be exception-safe. If the timed-out
evaluation kills its thread before it runs to completion, the channel seems to be
left in an undefined state, causing it to stop yielding values and hanging the
program.

I have demonstrated a version, correctGetChanContents, that tries to fix this problem
by blocking asynchronous exceptions during a readChan call. However, although the
program is indeed more reliable, the problem still sporadically occurs! Argh!

STRANGE THINGS
==============

If you change the definition of correctGetChanContents to include an initial call
to block and the program subsequently enters the buggy state, we get the error:

> Main: reifyList: list finished before a final value arrived

How is that possible?? This even occurs if the second call to block is removed.

-}


main :: IO ()
main = do
    {-
    This doesn't terminate, which proves that block and killThread work:
    tid <- block $ forkIO $ let loop x = loop (x + 1) in loop 0
    killThread tid
    -}
    
    -- I found that the test almost always worked when the input list
    -- went up to 5000, failed 1/3 of the time when it went up to 20000,
    -- and always failed (didn't terminate) with a list of length 100000.
    let input = [1..5000] :: [Int]
    
    -- Kick off improving IO and actually run the IO stuff in another thread
    (improving, action) <- runImprovingIO (mapM_ yieldImprovement input)
    forkIO action
    
    -- We now want to get all of the items from the input that we sent through
    -- the improving IO channel by reading from the channel in time chunks 10ms
    -- in length.
    -- See comments in readWithTimeout to find out about this -1
    output <- readWithTimeout (length input - 1) improving
    
    --print output
    --print (length output)
    
    -- Assert that the output matches
    if input == output
     then putStrLn "OK"
     else putStrLn "Nope"
  where
    readWithTimeout n improving
      | n <= 0    = return []
      | otherwise = do
          -- For the sake of argument use a 10ms time step for the timeout
          chunk <- timeoutList 10000 (allListHeads improving)
          putStrLn $ "CHUNK CHUNK CHUNK: length = " ++ show (length chunk)
          
          -- Now we have a chunk, retrieve the next chunk
          rest <- case listToMaybeLast chunk of
              Nothing             -> do
                  -- This case only appears if the timeout period was insufficient to return even a single
                  -- item from the improving value. It never occurs in practice.
                  readWithTimeout n improving
              Just last_improving -> do
                  -- This is the normal case. We got at least one item, so we should continue reading the
                  -- improving value in chunks. However, we need to bear in mind that the first item in
                  -- the chunk we are working with will actually be the one we started with, so we shouldn't
                  -- include it when working out how many additional items we need to obtain, hence the +1.
                  readWithTimeout (n + 1 - length chunk) last_improving
          
          -- We need to drop the first improving value returned from the recursive call, as it will just be
          -- the one we gave the recursive call initially, hence the drop 1.
          return $ (mapMaybe maybeHead chunk) ++ drop 1 rest


allListHeads :: [a] -> [[a]]
allListHeads list@[]       = [list]
allListHeads list@(_:rest) = list : allListHeads rest

listToMaybeLast :: [a] -> Maybe a
listToMaybeLast = listToMaybe . reverse

maybeHead :: [a] -> Maybe a
maybeHead (x:_) = Just x
maybeHead []    = Nothing


--
-- Evaluation with timeout
--

-- | Evaluates the given list for the given number of microseconds. After the time limit
-- has been reached, a list is returned consisting of the prefix of the list that was
-- successfully evaluated within the time limit.
--
-- This function does /not/ evaluate the elements of the list: it just ensures that the
-- list spine arrives in good order.
timeoutList :: Int -> [a] -> IO [a]
timeoutList timeout improving = do
    -- Create var that will be used to store the known prefix (in reverse order)
    putStrLn "Here we go"
    known_prefix_var <- newMVar []
    
    -- Go off and get as much of that prefix as we can
    thread_id <- forkIO (putStrLn "In the thread" >> go known_prefix_var improving)
    
    -- Wait for it to do its thing, then kill the thread
    threadDelay timeout
    killThread thread_id
    
    -- Return that prefix
    mb_known_prefix <- tryTakeMVar known_prefix_var
    case mb_known_prefix of
        Nothing -> error "timeoutList: bug in threading logic!"
        Just known_prefix -> return (reverse known_prefix)
  where
    go _   [] = putStrLn "Bottom" >> return ()
    go var (x:xs) = do
        putStrLn "Got an element"
        modifyMVar_ var (\current_prefix -> return (x : current_prefix))
        go var xs


--
-- The ImprovingIO monad
--

newtype ImprovingIO i a = IIO { unIIO :: Chan (Maybe i) -> IO a }

instance Monad (ImprovingIO i) where
    return x = IIO (const $ return x)
    ma >>= f = IIO $ \chan -> do
                    a <- unIIO ma chan
                    unIIO (f a) chan

yieldImprovement :: i -> ImprovingIO i ()
yieldImprovement improvement = IIO $ \chan -> writeChan chan (Just improvement)

runImprovingIO :: ImprovingIO i () -> IO ([i], IO ())
runImprovingIO iio = do
    chan <- newChan
    let action = do
            unIIO iio chan
            putStrLn "SIGNALLING LIST END - everything is available!"
            writeChan chan Nothing -- @Nothing@ signals the end of the list
    yielded_improvements <- correctGetChanContents chan
    return (reifyList yielded_improvements, action)

correctGetChanContents :: Chan a -> IO [a]
correctGetChanContents ch
  = block $ unsafeInterleaveIO (block $ do
        x  <- readChan ch
        xs <- correctGetChanContents ch
        return (x:xs)
    )


liftIO :: IO a -> ImprovingIO i a
liftIO io = IIO $ const io

reifyList :: [Maybe i] -> [i]
reifyList (Just x:rest) = x : reifyList rest
reifyList (Nothing:_)   = []
reifyList []            = error "reifyList: list finished before a final value arrived"

Change History

Changed 5 years ago by batterseapower

Forgot to mention: the test case above uses a 5000 element list. For some reason, this causes quite a few failures on my machine. If it doesn't work on yours, try bumping it up to 100000 or so, which should make the failure certain.

Changed 5 years ago by igloo

  • difficulty set to Unknown
  • os changed from MacOS X to Unknown/Multiple
  • architecture changed from x86 to Unknown/Multiple
  • milestone set to 6.10.2

I can reproduce this on amd64/Linux too, and with 6.8.2 as well as the HEAD.

Changed 4 years ago by igloo

OK, I think the problem is that

readChan = modifyMVar ...

and

modifyMVar :: MVar a -> (a -> IO (a,b)) -> IO b
modifyMVar m io =
  block $ do
    a      <- takeMVar m
    (a',b) <- unblock (io a) `onException` putMVar m a
    putMVar m a'
    return b

so although you are calling readChan with exceptions blocked, they are being unblocked by the library.

I'm not sure what the best solution is.

Changed 4 years ago by simonmar

  • status changed from new to closed
  • resolution set to invalid

I've spent a long time staring at this ticket today, and I'm not convinced there's a bug. modifyMVar is exception-safe, and so is getChanContents, as far as I can tell. Ian's point is true, that you can't guarantee to keep exceptions blocked when calling a library function, but I don't think that makes anything unsafe in this case. Exception-safety is all about making sure that MVars get replaced if an exception is raised, and I think that always holds for modifyMVar and getChanContents.

No, I suspect there's a bug in the program logic. It just loops when it gets to a chunk size of 1, and if you look at the definition of readWithTimeout, you can see why. If the chunk size is 1, it just recurses with the same value for n. At least, that's my guess - I don't fully understand what this program is trying to do.

If you disagree, plesae re-open the bug.

Note: See TracTickets for help on using tickets.