Ticket #7229 (new bug)

Opened 9 months ago

Last modified 6 months ago

Detecting if a process was killed by a signal is impossible

Reported by: benmachine Owned by:
Priority: high Milestone: 7.8.1
Component: libraries/process Version:
Keywords: Cc: andersk@…, anton.nik@…, dcoutts
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty: Unknown
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Currently there is no good way of detecting if a process was terminated by a signal. We have the following problems:

  • Firstly, the API doesn't make any allowance for it. This may be because the concept doesn't exist on Windows, I'm not sure. But since the concept does exist on POSIX, we do have to make a decision about what to do there (or if we have made one, document it).
  • waitForProcess attempts to give the signal in the exit code, but the underlying C function doesn't use the WTERMSIG macro. It so happens that on my system it works fine anyway, but it has no right to in principle (and in any case, what it does is both sub-optimal (in that getting killed by SIGINT is indistinguishable from returning 3) and undocumented).
  • The C function getProcessExitCode is much worse. When a process has exited due to a signal, it tests WIFSIGNALED, and if true, sets errno = EINTR (huh?) and returns -1. Since it is called from throwErrnoIfMinus1Retry, the result is that it immediately gets called again: but this time the child doesn't exist anymore. The behaviour in the case of ECHILD is, bizarrely, to pretend that there was a child and it exited with status 0, i.e. success. Which is just untrue.

I don't know what the right behaviour is, but the above is both inconsistent and unhelpful. The lack of WTERMSIG is easily fixed, but that's the least of the problems here: we really need to work out what ought to happen before we start making it happen.

Change History

Changed 8 months ago by andersk

  • cc andersk@… added

Changed 8 months ago by lelf

  • cc anton.nik@… added

Changed 8 months ago by simonmar

  • priority changed from normal to high
  • difficulty set to Unknown
  • milestone set to 7.6.2

I have adopted the shell convention of using 128 + signal number as the exit status for a process that died due to a signal. Symbolic constants for the signal names are available from System.Posix.Signals. I've also fixed the slightly strange behaviour of getProcessExitCode - I dug through the logs a bit, but the current code goes back a long way and I couldn't find the reason for fixing EINTR.

Would anyone object to this going into 7.6.2? There is a semantic change, but it is very unlikely that anyone was depending on the old behaviour. We would need a minor version bump to process so that clients can conditionally depend on the new behaviour.

patches to process package:

commit 54038240284b11ad4117ca075fa2292f5069bc45
Author: Simon Marlow <marlowsd@gmail.com>
Date:   Mon Sep 24 09:50:24 2012 +0100

    Use (128+signal) as the exit code when a proc terminates due to a signal (#7229)

commit 9547cf40ac1ddcd471a8ce75f927b382a82c1038
Author: Simon Marlow <marlowsd@gmail.com>
Date:   Mon Sep 24 12:11:14 2012 +0100

    Test for #7229

commit 007fb056f2e77e65196de4bff94ea001e69a12eb
Author: Simon Marlow <marlowsd@gmail.com>
Date:   Mon Sep 24 12:27:43 2012 +0100

    Documentation for signal exit codes (#7229)

Changed 8 months ago by andersk

While that’s a small improvement over the previous undocumented behavior, it’s still not a real solution. In Unix programming, it’s important to be able to reliably distinguish between WIFEXITED(status) (e.g. exit(130)) and WIFSIGNALED(status) (e.g. raise(SIGINT)), so that one can implement WCE (wait and cooperative exit) on SIGINT. The 128 + signal thing that you see in your shell is specific to your shell; real languages do make this distinction.

Really the right answer is to add a constructor to ExitCode. While we’re doing that, we should expose WCOREDUMP(status) for completeness: ExitSignaled {exitSignal :: Int, exitCoreDump :: Bool}.

Changed 8 months ago by benmachine

Thanks Simon, that's certainly an improvement. I'm inclined to agree with andersk that it's not the principled solution.

My man page for WCOREDUMP says

       WCOREDUMP(status)
              returns  true  if  the  child produced a core dump.  This macro
              should only be employed if  WIFSIGNALED  returned  true.   This
              macro  is not specified in POSIX.1-2001 and is not available on
              some UNIX implementations (e.g., AIX, SunOS).   Only  use  this
              enclosed in #ifdef WCOREDUMP ... #endif.

I don't know how worried we are about that.

Changed 8 months ago by simonmar

Ok, points taken. 128+signal is not good, because that overlaps with exit codes.

Unfortunately we can't add the right information to ExitCode, because it is a platform-independent type. The correct type already exists: it is called System.Posix.ProcessStatus.

data ProcessStatus = Exited ExitCode
                   | Terminated Signal
                   | Stopped Signal
		   deriving (Eq, Ord, Show)

Unfortunately whoever wrote this forgot to add a Bool to indicate a core dump in the Terminated constructor. We *could* fix that.

Now, the right thing to do would be to create a new process-unix package containing

module System.Process.Posix where
waitForProcess :: ProcessHandle -> IO ProcessStatus
getProcessStatus :: ProcessHandle -> IO (Maybe ProcessStatus)

It can't be part of the process package because the API of a package cannot differ depending on the platform. It could be part of the unix package, but then we have to move a chunk of code from the process package into the unix package, which is annoying.

I propose as an intermediate solution that we just fix the ExitCode encoding: instead of 128+signal, use (signal << 8) + exit code, with a core dump setting the 0x8000 bit.

Changed 8 months ago by andersk

Waah… that’s going to confuse anyone who knows that the real encoding is (exit code << 8) + signal (which, say, Perl forces you to learn). If that’s what we end up with, I guess that difference ought to be noted in the documentation.

I’m not sure I see the problem with an extra constructor that’s unused on non-Unix, but other than that I don’t really have anything else to propose.

Changed 8 months ago by benmachine

I'm happy with Simon's solution (one aspect that confuses me: if you're terminated by signal 3, you get (3 << 8) + exit code: what's exit code?) but I'd be happier still, I think, with an extra constructor that's unused on Windows. It just seems that in practice that's what people will write in their code anyway, using a suitable ExitCode -> Either Signal Int function (or, well, Either Int Int, to be portable against the case of Signal not existing)

Changed 8 months ago by simonmar

  • cc dcoutts added

Duncan and I discussed this on IRC, and came up with the following API:

 http://community.haskell.org/~simonmar/process-7229/System-Process.html#g:4

The rationale here is:

  • We don't want to change ExitCode for a couple of reasons: first, it is baked into Haskell 2010 and therefore difficult to change; second, it is really intended to be "exit codes you can pass to exitWith", so adding a terminated value wouldn't make much sense there.
  • We want to make it possible to obtain the full platform-specific info (e.g. ProcessStatus) out of the exit status. Making an abstract type for ExitStatus allows us to do this in the future, by building separate platform-specific packages on top of process.

  • Clients can portably ask whether a process was terminated, but this will always return Nothing on non-Unix systems.

I think wasTerminated is a bit clunky, the cases for "exited with an ExitCode?" and "terminated" should really be mutually exclusive. But we'd have to add another type to express that, which is perhaps going too far.

Changed 8 months ago by andersk

Actually, passing a signal to exitWith would make perfect sense as a way to re-raise the same signal. This is what bash does when its last child (or with -e, any child) is terminated, for example. If ExitCode won’t change, a portable exitWithStatus :: ExitStatus -> IO a that’s capable of re-raising the same signal on Unix would be useful.

Can we use CInt instead of Int so that comparing with constants in System.Posix.Signals doesn’t require a cast? Or just arrange for ExitStatus to be the same as System.Posix.Process.ProcessStatus when applicable?

Changed 8 months ago by simonmar

Using exitWith (ExitTerminated signal) as a way to kill the current process with a signal is an attractive idea indeed.

We would have to move the existing ExitCode datatype into the haskell2010 package, and do the corresponding refactorings.

Would we do this in addition to the ExitStatus changes, or instead of them? Another alternative is to provide a unix-specific way to get a ProcessId from a ProcessHandle, and then the unix-specific wait functions can be used to get a ProcessStatus.

Duncan, what do you think?

Changed 6 months ago by igloo

  • milestone changed from 7.6.2 to 7.8.1
Note: See TracTickets for help on using tickets.