id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc	os	architecture	failure	difficulty	testcase	blockedby	blocking	related
4922	Segfault / Assertion failed in RTS (Compact.c)	dleuschner	simonmar	"Our application terminates with a segfault or an internal RTS error in about 80% of our test runs when we use the following runtime flags:
{{{
+RTS -G4 -H1g -c -I0
}}}
Without them the application runs fine.  We only discovered the problem after making many performance improvements to our code, while running stress tests on fast CPUs with many cores.

We compiled with the debugging runtime and got the following assertion failure:
{{{
SalviaDerivationGateway: internal error: ASSERTION FAILED: file rts/sm/Compact.c, line 171
    (GHC version 7.0.1.20110121 for x86_64_unknown_linux)
    Please report this as a GHC bug:
    http://www.haskell.org/ghc/reportabug
}}}
We're testing with a custom GHC build from the GHC 7.0 branch (including patches up to yesterday).

Without the debugging runtime we sometimes get segfaults and sometimes errors like:
{{{
SalviaDerivationGateway: internal error: scavenge_mark_stack: unimplemented/strange closure type 1970861226 @ 0x7f7578f488f8
    (GHC version 7.0.1.20110121 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
}}}
The last few system calls before a segfault are:
{{{
[pid 30727] rt_sigprocmask(SIG_BLOCK, [HUP INT], [], 8) = 0
[pid 30727] clock_gettime(0xfffffffa /* CLOCK_??? */, {147, 512463346}) = 0
[pid 30727] getrusage(RUSAGE_SELF, {ru_utime={126, 620000}, ru_stime={20, 890000}, ...}) = 0
[pid 30727] mmap(0x7fb643800000, 3145728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb643400000
[pid 30727] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
}}}
We were very concerned about the situation, because an unstable runtime system makes it feel like we would be better off using Java for ""serious"" applications.  It's no longer a problem for us, because we will simply not use the tuned runtime system flags.  It might be a good idea to remove them entirely until they're known to work in busy applications (or at least to include a warning).

I don't understand any of the details, but maybe the problem with retainer profiling (issue #4820) has the same cause.

When testing new releases, it would probably be a good idea to also test various flag combinations (maybe the GHC compiler binary could just choose some random values at startup if none are given ;-).

I hope this information is of some help.  We haven't tried to reproduce the problem with a small test program, as we're in a bit of a hurry with a release.  If there is anything we can do to help find the cause of the problem, please let us know."	bug	closed	high	7.2.1	Runtime System	7.0.1	worksforme		wehr@…	Linux	x86_64 (amd64)	Runtime crash					
