Ticket #3160 (closed bug: wontfix)

Opened 4 years ago

Last modified 9 months ago

No exception safety in Control.Concurrent.QSem QSemN and SampleVar

Reported by: ChrisKuklewicz Owned by: simonmar
Priority: normal Milestone: 7.6.1
Component: libraries/base Version: 7.0.2
Keywords: Cc: greenrd@…, ChrisKuklewicz, hackage.haskell.org@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Incorrect result at runtime Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Looking at the code for QSem, QSemN, and SampleVar? shows they all use a "takeMVar" then "putMVar" programming idiom.

None of these are exception safe. An unlucky killThread will leave the MVar empty and cause the rest of the program to malfunction.

The solution is to rewrite them using withMVar and modifyMVar(_) to prevent the MVar from being left empty in the event of an exception.

Note: QSem also needs the bugfix in #3159

Change History

Changed 4 years ago by greenrd

  • cc greenrd@… added

Changed 4 years ago by igloo

  • difficulty set to Unknown

Changed 4 years ago by simonmar

I think rather than trying to get things right with modifyMVar and withMVar, we should try using STM for these abstractions. I suspect that the overhead of modifyMVar and co is comparable to the overhead of using STM (we measured this result for Chan in the original Haskell STM paper), and the STM version is much easier to get right. It would let us supply composable versions of the operations, too.

Changed 4 years ago by igloo

  • milestone set to 6.12.1

Changed 4 years ago by igloo

  • failure set to None/Unknown
  • milestone changed from 6.12.1 to 6.14.1

The simplest way forward might be to remove them from base, and put them in a de-coupled concurrency package.

Changed 2 years ago by igloo

  • milestone changed from 7.0.1 to 7.0.2

Changed 2 years ago by igloo

  • milestone changed from 7.0.2 to 7.2.1

Changed 2 years ago by ChrisKuklewicz

  • cc ChrisKuklewicz added
  • version changed from 6.10.2 to 7.0.2

I have uploaded  SafeSemaphore to hackage. The MSem and MSemQ there can supersede QSem and QSemN. The included TestKillSem?.hs demonstrates the error in QSem & QSemN and the fix in MSem & MSemN.

You can browse TestKillSem? and its results online at the  haskell wiki page.

The API for QSem is improved in MSem to handle negative numbers and is based on Integer instead of Int.

The API for QSemN is greatly expanded in MSemN.

Changed 2 years ago by ChrisKuklewicz

  • failure changed from None/Unknown to Incorrect result at runtime

I have updated SafeSemaphore? to version 0.5.0 which includes a new MSampleVar module. This fixes the problem in SampleVar? where any thread that dies while waiting to read the SampleVar? leaves the SampleVar? in a broken state. In this broken state a future writeSampleVar can block indefinitely trying to replace the current value.

The TestKillSem?.hs in SafeSemaphore? has been updated to also demonstrate that MSampleVar fixes this bug in SampleVar?.

Changed 2 years ago by ChrisKuklewicz

  • status changed from new to patch

I have submitted a set of 6 patches to cvs-ghc mailing list. The first 3 are for the testsuite and the second 3 are against base. All against a recent git pull.

Changed 2 years ago by igloo

  • owner set to simonmar

Patches are in here:  http://www.haskell.org/pipermail/cvs-ghc/2011-April/061461.html

Simon said in that thread he'd take a look, so making him owner.

Changed 21 months ago by igloo

  • milestone changed from 7.2.1 to 7.4.1

Changed 16 months ago by igloo

  • milestone changed from 7.4.1 to 7.6.1

Changed 13 months ago by ChrisKuklewicz

A small proposition: If someone would merge it into GHC then I will update the patch to merge cleanly with the desired trunk.

Changed 13 months ago by simonmar

The problem is that I just don't understand the patches. You sent a long and detailed explanation for which I'm very grateful, but nevertheless it would take a long time for me to understand your algorithms, and I'm not confident that I would be able to spot any bugs if there are any. Furthermore, you have used uninterruptibleMask in at least one place, which makes me very uneasy.

On the other hand, we know the current code is broken.

I'd be much more comfortable with using STM here. The code would be obviously correct, and probably only slightly slower than an optimised MVar implementation (and maybe even faster - all those masks and catches are not cheap).

Changed 13 months ago by ChrisKuklewicz

Ok, I see the difficulty. There are now two things I should do. One is that I should create a STM-based semaphore implementation. The second is that I should make a much clearer version of the MVar based code.

As for the current code, it would be a good thing to add a note to the documentation that the semaphore invariants can be broken when threads are killed.

Changed 13 months ago by simonpj

  • owner simonmar deleted
  • status changed from patch to new

Changed 12 months ago by liyang

  • cc hackage.haskell.org@… added

Changed 12 months ago by ChrisKuklewicz

I have updated SafeSemaphore? to 0.9.0 on hackage and git,  http://github.com/ChrisKuklewicz/SafeSemaphore

I have done several different things with this release:

There is a new Control.Concurrent.SSem module based on STM that is dead simple and replaces QSem and QSemN. I have not made a SampleVar? replacement with STM yet. The raw STM version is in Control.Concurrent.STM.SSem. This does guarantee FIFO operation and so has different starvation possibilities, especially with big and small requests to waitN.

The MSem.hs file is now a literate MSem.lhs file. This has much more elaborate comments discussing how the invariants are maintained. The uninteruptableMask_ is only being used for the quick signal operation to ensure bracket works properly.

The MSemN.hs file is now a literate MSemN.lhs file. This has extensively documented invariants and short "proof" arguments that they are maintained. The uninteruptableMask_ is used both in signal and to recover from interrupted wait operators.

The new MSemN2.hs file is a variant of MSemN.lhs that does NOT use uninteruptableMask_ in the wait operation at all. Instead it makes every wait call check for the possible interruption of the previous wait. The uninteruptableMask_ is not used on every signal call, but is used in the with combinator to guard its call to signal.

For replacing QSem/QSemN/SampleVar I will update the patch with the consensus design.

I have announced this 0.9.0 release to the Haskell-Cafe mailing list and included a link to this ticket.

Changed 12 months ago by simonmar

  • owner set to simonmar

Thanks. At the very least we should import the STM versions for 7.6.1.

Changed 9 months ago by simonmar

  • status changed from new to closed
  • resolution set to wontfix

The versions in base have been deprecated in 7.6.1, and we now recommend that people use the SafeSemaphore package instead.

This leaves the way open for having a concurrent package in the future that would combine a number of useful concurrent data structures with things like async.

Note: See TracTickets for help on using tickets.