Ticket #3790 (closed merge: fixed)

Opened 3 years ago

Last modified 3 years ago

control-C causes segfault, siginfo_t* can be NULL on Solaris

Reported by: duncan Owned by: igloo
Priority: normal Milestone: 6.12.2
Component: Runtime System Version: 6.12.1
Keywords: Cc:
Operating System: Solaris Architecture: sparc
Type of failure: Runtime crash Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

On Sparc Solaris, all ghc-compiled programs (including ghc and ghci) segfault when interrupted with control-C.

This happens with ghc-6.10.4 and 6.12.1. It worked OK in ghc-6.8.3.

The following is a gdb backtrace from a core file produced by running ghc-6.12.1 and hitting control-C.

#0  0xff0b05d4 in memcpy ()
   from /platform/SUNW,SPARC-Enterprise-T5120/lib/libc_psr.so.1
#1  0x01d17040 in generic_handler ()
#2  <signal handler called>
#3  0xff1c4a34 in __lwp_park () from /lib/libc.so.1
#4  0xff1be968 in cond_sleep_queue () from /lib/libc.so.1
#5  0xff1bea84 in cond_wait_queue () from /lib/libc.so.1
#6  0xff1bf004 in cond_wait () from /lib/libc.so.1
#7  0xff1bf040 in pthread_cond_wait () from /lib/libc.so.1
#8  0x01d16ab0 in waitCondition ()
#9  0x01d01058 in yieldCapability ()
#10 0x01d07c08 in schedule ()
#11 0x01d055e4 in real_main ()
#12 0x01d0578c in hs_main ()
#13 0x00515434 in _start ()

The backtrace from a trivial ghc-compiled program looks similar. The only interesting thing this tells us is that the problem occurs in both the single-threaded and the multi-threaded RTS.

Looking at generic_handler in ./posix/Signals.c, in the defined(THREADED_RTS) case:

        StgWord8 buf[sizeof(siginfo_t) + 1];
        int r;

        buf[0] = sig;
        memcpy(buf+1, info, sizeof(siginfo_t));

and in the !defined(THREADED_RTS) case:

        memcpy(next_pending_handler, info, sizeof(siginfo_t));

So it would appear that the siginfo_t *info parameter to generic_handler is NULL.

The Solaris manpage for sigaction indicates that the siginfo_t pointer may be NULL. Presumably it is non-NULL for the kinds of signals that carry extra info, like SIGCHLD, but not for SIGINT. The manpage for siginfo.h lists a number of signals that do receive a siginfo_t, but SIGINT is not among them.

So this would explain why it worked in 6.8.3, since we only started using the siginfo_t * in 6.10.

So I guess we either push a "null"/"empty" siginfo_t down the IO manager pipe, or provide a way to indicate that there is no siginfo_t supplied.

Change History

Changed 3 years ago by simonmar

  • milestone set to 6.12.2

Your analysis looks reasonable. I think it should be ok to just memset(buf+1, 0, sizeof(siginfo_t)) in the case that info == NULL, because looking at the code in libraries/unix/System/Posix/Signals.hsc we use the si_errno field of the siginfo_t, but we only look at the rest if the signal is SIGCHLD. A value of zero for si_errno is a reasonable default.
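
A minimal sketch of that idea, not the actual patch: pack_signal is a hypothetical helper name rather than a function in the RTS, and the buffer layout (one signal byte followed by a siginfo_t) is taken from the THREADED_RTS snippet quoted in the description.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not RTS code).
 * Packs the signal number and its siginfo_t into buf, which must hold
 * at least sizeof(siginfo_t) + 1 bytes. */
static void pack_signal(unsigned char *buf, int sig, const siginfo_t *info)
{
    buf[0] = (unsigned char)sig;
    if (info == NULL) {
        /* Solaris may pass NULL here (e.g. for SIGINT): send an
         * all-zero record so that si_errno reads as 0. */
        memset(buf + 1, 0, sizeof(siginfo_t));
    } else {
        memcpy(buf + 1, info, sizeof(siginfo_t));
    }
}
```

The same NULL check would apply to the memcpy into next_pending_handler in the non-threaded case.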

Would you like to test and submit a patch?

Changed 3 years ago by simonmar

  • owner set to simonmar
  • status changed from new to assigned

Changed 3 years ago by simonmar

  • owner changed from simonmar to igloo
  • status changed from assigned to new
  • type changed from bug to merge

Fixed:

Tue Jan 26 07:54:49 PST 2010  Simon Marlow <marlowsd@gmail.com>
  * Fix signal segfaults on Solaris (#3790)

Changed 3 years ago by igloo

  • status changed from new to closed
  • resolution set to fixed

Merged
