Ticket #1391 (closed merge: fixed)

Opened 6 years ago

Last modified 6 years ago

forkProcess() in Schedule.c with -threaded should initialize mutexes in child process (POSIX)

Reported by: thorkilnaur Owned by: igloo
Priority: high Milestone: 6.8.1
Component: Runtime System Version: 6.7
Keywords: Cc:
Operating System: MacOS X Architecture: powerpc
Type of failure: Difficulty: Unknown
Test Case: forkprocess01(ghci) Blocked By:
Blocking: Related Tickets:

Description

forkProcess() in Schedule.c implements System.Posix.Process.forkProcess essentially by fork()'ing. In  http://www.gnu.org/software/libc/manual/html_node/Threads-and-Fork.html we read that

It's not intuitively obvious what should happen when a multi-threaded POSIX process calls fork. ... fork duplicates the whole memory space, including mutexes in their current locking state, but only the calling thread: other threads are not running in the child process. The mutexes are not usable after the fork and must be initialized with pthread_mutex_init in the child process.

Although a lot of things happen in forkProcess() in Schedule.c in the child process after fork() returns, this initialization of mutexes and related delicate matters are not done.

On my PPC Mac OS X 10.4.9, I have observed this to eventually result in the child process getting a SIGSEGV (signal 11; segmentation fault) when run via ghc --interactive. The failing test case forkprocess01(ghci) for PPC Mac OS X is an example of this.

The case of forkprocess01 failing when run via ghc --interactive comes about because ghc itself is linked with -threaded. With a ghc linked without -threaded, the segmentation fault does not happen.

When executing forkprocess01 compiled with --make -threaded, the error again cannot be reproduced. But this seems to be because forkprocess01 itself does not use multiple threads. With a slightly extended version of forkprocess01 called forkprocess03:

-- forkprocess03.hs:
-- Test that we can call exitFailure in a forked process, and have it
-- communicated properly to the parent.
-- Do this (forkprocess01) within a forkIO'ed child process.
import System.Exit
import System.Posix.Process
import Control.Concurrent
main0 = do
  p <- forkProcess $ exitWith (ExitFailure 72)
  r <- getProcessStatus True False p
  print r
main = do
  p <- forkIO $ main0
  threadDelay 10000000
  print p

the erroneous reaction can be observed when compiled with -threaded using a fairly recent ghc HEAD:

$ /Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-for-HughesPJ-wrong-fill-indent-20070506_1304/ghc/compiler/stage2/ghc-inplace --version
The Glorious Glasgow Haskell Compilation System, version 6.7.20070513
$ touch forkprocess03.hs
$ /Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-for-HughesPJ-wrong-fill-indent-20070506_1304/ghc/compiler/stage2/ghc-inplace --make forkprocess03 -threaded
[1 of 1] Compiling Main             ( forkprocess03.hs, forkprocess03.o )
Linking forkprocess03 ...
$ ./forkprocess03
Just (Terminated 11)
ThreadId 4
$

Whereas without -threaded, the program seems to run fine:

$ touch forkprocess03.hs
$ /Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-for-HughesPJ-wrong-fill-indent-20070506_1304/ghc/compiler/stage2/ghc-inplace --make forkprocess03
[1 of 1] Compiling Main             ( forkprocess03.hs, forkprocess03.o )
Linking forkprocess03 ...
$ ./forkprocess03
Just (Exited (ExitFailure 72))
ThreadId 2
$

The repair, however, is not particularly obvious. The above reference suggests using pthread_atfork to set up handlers that lock all mutexes before fork()'ing, then unlock them in the parent and initialize them in the child. But even if this is chosen as the way forward, additional matters need to be clarified to ensure that such handling plays well with the rest of the threaded runtime system and also retains the Windows variant of things. Also, if anything is done about this, #1185 should probably be considered as well.
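For illustration only, the pthread_atfork approach described above might look roughly like the following sketch. It is not the RTS code: demo_lock and the handler and function names are hypothetical stand-ins. It also assumes a platform where calling pthread_mutex_init on a mutex inherited in the locked state is acceptable, which is what the quoted GNU libc text recommends but which POSIX itself does not clearly guarantee.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock standing in for a runtime mutex; not a GHC RTS name. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the forking thread before fork(): take the lock so that no other
   thread can hold it at the moment the address space is copied. */
static void prepare_handler(void) { pthread_mutex_lock(&demo_lock); }

/* Runs in the parent after fork(): release the lock again. */
static void parent_handler(void) { pthread_mutex_unlock(&demo_lock); }

/* Runs in the child after fork(): give the child a fresh mutex instead of
   relying on the copied lock state (assumption: reinitializing is safe here). */
static void child_handler(void) { pthread_mutex_init(&demo_lock, NULL); }

/* Register the three handlers; returns 0 on success. */
int install_fork_handlers(void) {
    return pthread_atfork(prepare_handler, parent_handler, child_handler);
}

/* Fork, take and release the lock in the child, and return the child's
   exit status (0 if the lock was usable), or -1 on error. */
int fork_and_use_lock(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        int rc = pthread_mutex_lock(&demo_lock);
        if (rc == 0) rc = pthread_mutex_unlock(&demo_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would run install_fork_handlers() once at startup and then fork as usual; the handlers fire around every later fork() in the process.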

Attachments

experimental_initialization_of_task_locks_after_fork_1391.dpatch (0.5 KB) - added by guest 6 years ago.
Patch to perform experimental initialization of task locks after fork() (#1391)

Change History

  Changed 6 years ago by igloo

  • priority changed from normal to high
  • milestone set to 6.8

  Changed 6 years ago by igloo

Thanks for the detailed analysis, thorkil!

  Changed 6 years ago by simonmar

The POSIX spec doesn't say anything about initialising mutexes in the child, as far as I can see. The most pertinent section is pthread_atfork:

 http://www.opengroup.org/onlinepubs/009695399/functions/pthread_atfork.html

So this requirement must be specific to GNU libc (or perhaps to MacOS?). The GNU libc documentation no longer includes threading stuff, and the link you supplied above doesn't work. The man page for pthread_atfork on my system here just reproduces the POSIX text.

As far as I can see, there are only two mutexes that might be problematic: sched_mutex and cap->lock for the single Capability (forkProcess isn't supported when there are multiple capabilities). It might be worth just holding these across the call to fork(), and releasing them in both the child and parent (or perhaps, initialising them in the child). Thorkil: could you try this?
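A minimal sketch of the "hold the locks across fork()" idea, assuming two plain pthread mutexes. The names demo_sched_lock and demo_cap_lock are hypothetical stand-ins for sched_mutex and cap->lock, and this is not the actual patch. The child reinitializes the locks rather than unlocking them, on the assumption that reinitializing an inherited, locked mutex is safe on the target platform.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for sched_mutex and cap->lock. */
static pthread_mutex_t demo_sched_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t demo_cap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Take and release a mutex once; returns 0 if both steps succeed. */
static int lock_is_usable(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    if (rc == 0) rc = pthread_mutex_unlock(m);
    return rc;
}

/* Fork while holding both locks. Returns the child's exit status
   (0 if both locks were usable in the child), or -1 on error. */
int fork_holding_locks(void) {
    /* Hold both locks so that no other thread owns them during fork(). */
    pthread_mutex_lock(&demo_sched_lock);
    pthread_mutex_lock(&demo_cap_lock);

    pid_t pid = fork();

    if (pid == 0) {
        /* Child: start from freshly initialized locks. */
        pthread_mutex_init(&demo_sched_lock, NULL);
        pthread_mutex_init(&demo_cap_lock, NULL);
        int ok = lock_is_usable(&demo_sched_lock) == 0 &&
                 lock_is_usable(&demo_cap_lock) == 0;
        _exit(ok ? 0 : 1);
    }

    /* Parent (also on fork failure): release in reverse order. */
    pthread_mutex_unlock(&demo_cap_lock);
    pthread_mutex_unlock(&demo_sched_lock);
    if (pid < 0) return -1;

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```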

follow-up: ↓ 5   Changed 6 years ago by guest

With your patch

Fri Sep 14 16:55:19 CEST 2007  Simon Marlow <simonmar@microsoft.com>
  * attempt to fix #1391, hold locks across fork() and initialize them in the child

the forkprocess01 test still fails:

$ make TEST=forkprocess01 stage=2 WAY=ghci
...
=====> forkprocess01(ghci)
cd . && '/Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-complete-for-pulling-and-copying-20070713_1212/ghc/compiler/stage2/ghc-inplace' -no-recomp -dcore-lint -dcmm-lint -Dpowerpc_apple_darwin  forkprocess01.hs --interactive -v0 -package unix  <forkprocess01.genscript 1>forkprocess01.interp.stdout 2>forkprocess01.interp.stderr
Actual stdout output differs from expected:
--- ./forkprocess01.stdout.normalised   2007-09-26 20:34:18.000000000 +0200
+++ ./forkprocess01.run.stdout.normalised       2007-09-26 20:34:18.000000000 +0200
@@ -1 +1 @@
-Just (Exited (ExitFailure 72))
+Just (Terminated 11)
*** unexpected failure for forkprocess01(ghci)

OVERALL SUMMARY for test run started at Wed Sep 26 20:34:14 CEST 2007
       7 total tests, which gave rise to
      49 test cases, of which
       0 caused framework failures
      48 were skipped

       0 expected passes
       0 expected failures
       0 unexpected passes
       1 unexpected failures

Unexpected failures:
   forkprocess01(ghci)

$

The crash report indicates a problem related to the locks:

**********

Host Name:      Thorkil-Naurs-Computer
Date/Time:      2007-09-26 21:03:17.246 +0200
OS Version:     10.4.10 (Build 8R218)
Report Version: 4

Command: ghc-6.9.20070925
Path:    /Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-complete-for-pulling-and-copying-20070713_1212/ghc/compiler/stage2/ghc-6.9.20070925
Parent:  ghc-6.9.20070925 [13662]

Version: ??? (???)

PID:    13665
Thread: 0

Exception:  EXC_BAD_ACCESS (0x0001)
Codes:      KERN_INVALID_ADDRESS (0x0001) at 0xfffffffc

Thread 0 Crashed:
0   libSystem.B.dylib   0x9002c514 restore_sem_to_pool + 84
1   libSystem.B.dylib   0x90001c94 pthread_mutex_lock + 604
2   ghc-6.9.20070925    0x012af064 waitForReturnCapability + 224 (crt.c:355)
3   ghc-6.9.20070925    0x0117e780 scheduleDoGC + 180 (crt.c:355)
4   ghc-6.9.20070925    0x0117ea88 exitScheduler + 96 (crt.c:355)
5   ghc-6.9.20070925    0x012a52fc hs_exit_ + 112 (crt.c:355)
6   ghc-6.9.20070925    0x012a5428 shutdownHaskellAndExit + 48 (crt.c:355)
7   <<00000000>>        0x041425c4 0 + 68429252
8   ghc-6.9.20070925    0x0117fa30 schedule + 820 (crt.c:355)
9   ghc-6.9.20070925    0x012a37e4 rts_evalStableIO + 72 (crt.c:355)
10  ghc-6.9.20070925    0x0117ec6c forkProcess + 368 (crt.c:355)
11  <<00000000>>        0x0321c580 0 + 52544896
12  ghc-6.9.20070925    0x0117fa30 schedule + 820 (crt.c:355)
13  ghc-6.9.20070925    0x0117ff1c workerStart + 88 (crt.c:355)
14  libSystem.B.dylib   0x9002bd08 _pthread_body + 96

Thread 1:
0   <<00000000>>        0xffff85d8 __spin_lock_relinquish + 24 (cpu_capabilities.h:186)
1   libSystem.B.dylib   0x90001a94 pthread_mutex_lock + 92
2   ghc-6.9.20070925    0x012aec68 releaseCapability_ + 296 (crt.c:355)
3   ghc-6.9.20070925    0x012af224 yieldCapability + 156 (crt.c:355)
4   ghc-6.9.20070925    0x0117e734 scheduleDoGC + 104 (crt.c:355)
5   ghc-6.9.20070925    0x0117f7a0 schedule + 164 (crt.c:355)
6   ghc-6.9.20070925    0x0117ff1c workerStart + 88 (crt.c:355)
7   libSystem.B.dylib   0x9002bd08 _pthread_body + 96

Thread 0 crashed with PPC Thread State 64:
  srr0: 0x000000009002c514 srr1: 0x000000000200f030                        vrsave: 0x0000000000000000
    cr: 0x22000224          xer: 0x0000000020000000   lr: 0x000000009002c4f8  ctr: 0x000000009002c4c0
    r0: 0x0000000000000001   r1: 0x00000000f00fd7d0   r2: 0x00000000ffffffff   r3: 0x00000000a000f014
    r4: 0x0000000000000000   r5: 0x0000000000000000   r6: 0x00000000ffffffff   r7: 0x0000000000000000
    r8: 0x0000000000000000   r9: 0x0000000000000000  r10: 0x00000000a000c4c8  r11: 0x00000000fffffffc
   r12: 0x000000009002c4c0  r13: 0x0000000000000000  r14: 0x0000000003414079  r15: 0x0000000000000048
   r16: 0x00000000035ebf61  r17: 0x00000000035eb6dd  r18: 0x00000000042abb2d  r19: 0x0000000000000000
   r20: 0x0000000000000000  r21: 0x0000000000000000  r22: 0x0000000000000000  r23: 0x00000000035ebf61
   r24: 0x00000000014e0000  r25: 0x0000000012141968  r26: 0x00000000a0001a48  r27: 0x0000000003100510
   r28: 0x00000000a0001fac  r29: 0x000000009fffc4c8  r30: 0x0000000000001003  r31: 0x000000009002c4c8

Binary Images Description:
    0x1000 -  0x142ffff ghc-6.9.20070925        /Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-complete-for-pulling-and-copying-20070713_1212/ghc/compiler/stage2/ghc-6.9.20070925
 0x2705000 -  0x2730fff GMP     /Library/Frameworks/GMP.framework/Versions/A/GMP
0x8fe00000 - 0x8fe52fff dyld 46.12      /usr/lib/dyld
0x90000000 - 0x901bcfff libSystem.B.dylib       /usr/lib/libSystem.B.dylib
0x90214000 - 0x90219fff libmathCommon.A.dylib   /usr/lib/system/libmathCommon.A.dylib

Inspired by some experiments I made when investigating this some time ago, I introduced initialization of additional locks, as sketched in the attached patch. With that change, the segmentation fault disappears and forkprocess01(ghci) succeeds. I should stress that I don't consider this patch in any way the solution to the problem; something additional or different will surely be needed. But I need to leave the matter in this somewhat unresolved state for now.

Best regards Thorkil

Changed 6 years ago by guest

Patch to perform experimental initialization of task locks after fork() (#1391)

in reply to: ↑ 4   Changed 6 years ago by simonmar

Thanks Thorkil. I've pushed another change that should initialize the missing lock - it does essentially the same thing as your patch, but only a single Task is important here, the others are all discarded in the child process. Hopefully this will finally fix it.

  Changed 6 years ago by simonmar

Thorkil - following our conversation at ICFP, I've pushed a patch that calls initMutex for every Task:

Tue Oct  9 13:24:09 BST 2007  Simon Marlow <simonmar@microsoft.com>
  * also call initMutex on every task->lock, see #1391

could you let me know if that works for you?
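As a rough illustration of "call initMutex on every task->lock", the sketch below walks a linked list of task records and reinitializes each lock. DemoTask, reinit_task_locks and demo_reinit_roundtrip are hypothetical names, not the RTS types or functions, and the code again assumes that reinitializing a mutex that was locked at fork time is acceptable on the platform.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a runtime task record; not the RTS Task type. */
typedef struct DemoTask {
    pthread_mutex_t lock;
    struct DemoTask *next;
} DemoTask;

/* Reinitialize the lock of every task in the list; returns how many
   locks were successfully reinitialized. */
int reinit_task_locks(DemoTask *head) {
    int count = 0;
    for (DemoTask *t = head; t != NULL; t = t->next) {
        if (pthread_mutex_init(&t->lock, NULL) == 0) count++;
    }
    return count;
}

/* Self-check: build three tasks, leave one lock held (as it might be at
   fork time), reinitialize all of them, and count how many locks can be
   acquired afterwards. Returns that count, or -1 if reinitializing failed. */
int demo_reinit_roundtrip(void) {
    DemoTask tasks[3];
    for (int i = 0; i < 3; i++) {
        pthread_mutex_init(&tasks[i].lock, NULL);
        tasks[i].next = (i < 2) ? &tasks[i + 1] : NULL;
    }
    pthread_mutex_lock(&tasks[1].lock);

    if (reinit_task_locks(&tasks[0]) != 3) return -1;

    int usable = 0;
    for (int i = 0; i < 3; i++) {
        if (pthread_mutex_trylock(&tasks[i].lock) == 0) {
            pthread_mutex_unlock(&tasks[i].lock);
            usable++;
        }
    }
    return usable;
}
```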

  Changed 6 years ago by thorkilnaur

  • status changed from new to closed
  • resolution set to fixed

That seems to have done it: With your patch, forkprocess01 succeeds every time and I no longer see the segmentation fault.

Thanks and best regards Thorkil

  Changed 6 years ago by thorkilnaur

  • status changed from closed to reopened
  • resolution fixed deleted

Oh, I forgot, you mentioned perhaps pushing this to the stable branch, so I'll just reopen.

  Changed 6 years ago by thorkilnaur

  • owner set to simonmar
  • status changed from reopened to new

  Changed 6 years ago by simonmar

  • owner changed from simonmar to igloo
  • type changed from bug to merge
  • milestone changed from 6.8 branch to 6.8.1

Following patches need to be merged:

Tue Oct  9 13:24:09 BST 2007  Simon Marlow <simonmar@microsoft.com>
  * also call initMutex on every task->lock, see #1391

Thu Sep 27 10:13:31 BST 2007  Simon Marlow <simonmar@microsoft.com>
  * also acquire/release task->lock across fork()

Fri Sep 14 15:55:19 BST 2007  Simon Marlow <simonmar@microsoft.com>
  * attempt to fix #1391, hold locks across fork() and initialize them in the child

  Changed 6 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

All merged
