Ticket #2081 (closed bug: worksforme)

Opened 4 years ago

Last modified 3 years ago

GHC reports internal error: stg_ap_v_ret

Reported by: thorkilnaur
Owned by: simonmar
Priority: high
Milestone: 6.10.2
Component: Compiler
Version: 6.11
Keywords:
Cc:
Operating System: MacOS X
Architecture: Unknown/Multiple
Type of failure:
Difficulty: Unknown
Test Case:
Blocked By:
Blocking:
Related Tickets:

Description

On a PPC Mac OS X 10.4 machine, with

while true; do du -k >/dev/null; done

running in a separate process to spend some resources, a GHC compile using a recent HEAD fails in about 50% of cases with this report:

$ /Users/thorkilnaur/tn/GHCDarcsRepository/ghc-HEAD-complete-for-pulling-and-copying-20070713_1212/ghc/compiler/stage2/ghc-inplace --make -fforce-recomp -v0 concio001 -o concio001
ghc-6.9.20080203: internal error: stg_ap_v_ret
    (GHC version 6.9.20080203 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Abort trap

The failure does not seem to depend on what is being compiled.
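To put a number on "about 50% of the cases", the compile can be repeated under the same background load and the failures counted. The following is only a minimal sketch: start_load, stop_load and count_failures are hypothetical helper names, not part of GHC or its testsuite, and a failure is assumed to be any non-zero exit status.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of any GHC tooling.

LOAD_PID=""

start_load() {
    # Same busy loop as in the report, run in the background.
    ( while true; do du -k >/dev/null 2>&1; done ) &
    LOAD_PID=$!
}

stop_load() {
    # Stop the background loop if it is running.
    if [ -n "$LOAD_PID" ]; then
        kill "$LOAD_PID" 2>/dev/null || true
        wait "$LOAD_PID" 2>/dev/null || true
        LOAD_PID=""
    fi
}

count_failures() {
    # Usage: count_failures RUNS COMMAND [ARGS...]
    # Prints how many of RUNS invocations exited with a non-zero status.
    runs=$1; shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, with the stage2 compiler on the PATH as ghc (an assumption), something like: start_load; count_failures 20 ghc --make -fforce-recomp -v0 concio001 -o concio001; stop_load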

Best regards Thorkil

Change History

Changed 4 years ago by igloo

  • difficulty set to Unknown
  • milestone set to 6.10 branch

May not be OS X specific: I just got this on amd64/Linux while validating the HEAD:

====> Running ./boxy/all.T
=====> Base1(normal)
cd ./boxy && '/home/ian/ghc/darcs/val/compiler/stage2/ghc-inplace' -no-recomp -dcore-lint -dcmm-lint -Dx86_64_unknown_linux  -c Base1.hs    >Base1.comp.stderr 2>&1
/bin/sh: line 1: 19152 Aborted                 '/home/ian/ghc/darcs/val/compiler/stage2/ghc-inplace' -no-recomp -dcore-lint -dcmm-lint -Dx86_64_unknown_linux -c Base1.hs >Base1.comp.stderr 2>&1
Compile failed (status 34304) errors were:
ghc-6.9.20080210: internal error: stg_ap_v_ret
    (GHC version 6.9.20080210 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

*** unexpected failure for Base1(normal)

Running the test again a couple of times didn't reproduce the error.

Changed 4 years ago by guest

  • os changed from MacOS X to Multiple
  • architecture changed from powerpc to Multiple
  • summary changed from On a strained PPC Mac OS X 10.4, GHC reports internal error: stg_ap_v_ret to GHC reports internal error: stg_ap_v_ret

The same bug can occur on Intel OS X in the resulting binary. The mkbndl package on Hackage, when compiled and simply run as "mkbndl", outputs the usage string and then:

mkbndl: internal error: stg_ap_v_ret
    (GHC version 6.9.20080920 for i386_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Abort trap

Changed 4 years ago by simonmar

I don't see this, using a build on x86-Linux from yesterday and mkbndl downloaded from Hackage today. Can any Mac users reproduce it?

The error message could be indicative of a wide range of different corruption failures in the runtime, so my guess is that this ticket is actually more than one bug. It's possible the original bug has now been fixed and what we're looking at now is a new bug. But that's just my informed opinion. Let's try to reproduce it anyway.

Changed 4 years ago by judah

I was unable to reproduce this on an Intel iMac with OS X 10.5.4, ghc-6.10.0.20080921 and mkbndl-0.2.1. I downloaded the tarball and did runhaskell Setup [configure,build,install]. Thorkil, did anything differ in how you reproduced the error?

Changed 4 years ago by thorkilnaur

The original problem was reported against a GHC HEAD pulled around 2008-Feb-03. I am warming up my PPC Mac OS X 10.4 GHC HEAD, intending to see whether the problem is still there.

In the meantime, the PPC Mac OS X 10.5 builders, both HEAD (http://darcs.haskell.org/buildbot/all/builders/tnaur%20PPC%20OSX%20head%202/builds/138) and STABLE (http://darcs.haskell.org/buildbot/all/builders/tnaur%20PPC%20OSX%20stable%202/builds/175), currently get a report like

=====> arr001(normal)
cd ./array/should_run && '/Volumes/tn18_HD_1/Users/thorkilnaur/tn/buildbot/ghc/tnaur-ppc-osx-2/tnaur-ppc-osx-head-2/build/ghc/stage2-inplace/ghc' -fforce-recomp -dcore-lint -dcmm-lint -Dpowerpc_apple_darwin  -dno-debug-output -o arr001 arr001.hs    >arr001.comp.stderr 2>&1
timeout: internal error: stg_ap_p_ret
    (GHC version 6.11.20080925 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Compile failed (status -1536) errors were:

*** unexpected failure for arr001(normal)

for all tests, as far as I can tell. The latest head summary reports:

OVERALL SUMMARY for test run started at Thu Sep 25 05:40:29 CEST 2008
    2138 total tests, which gave rise to
    6607 test cases, of which
    1864 caused framework failures
    1057 were skipped

     274 expected passes
     167 expected failures
       1 unexpected passes
    3244 unexpected failures

I can reproduce like this:

$ ./timeout 200 'echo Something' 
timeout: internal error: stg_ap_p_ret
    (GHC version 6.11.20080925 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Something
Abort trap
$ 

I have not attempted to investigate further.

Best regards Thorkil

Changed 4 years ago by thorkilnaur

I am no longer able to produce the error message internal error: stg_ap_p_ret using the method originally described (compiling a program with GHC HEAD while spending resources in a separate process). I have looked a bit more into the timeout case. As far as I can tell, timeout.hs is compiled using the stage1 inplace compiler, and:

$ uname -a
Darwin thorkil-naurs-mac-mini.local 9.4.0 Darwin Kernel Version 9.4.0: Mon Jun  9 19:36:17 PDT 2008; root:xnu-1228.5.20~1/RELEASE_PPC Power Macintosh
$ /Users/thorkilnaur/tn/buildbot/ghc/tnaur-ppc-osx-2/tnaur-ppc-osx-head-2/build/ghc/stage1-inplace/ghc -fforce-recomp --make timeout.hs
[1 of 1] Compiling Main             ( timeout.hs, timeout.o )
Linking timeout ...
$ ./timeout 200 'echo Something'
Something
$ /Users/thorkilnaur/tn/buildbot/ghc/tnaur-ppc-osx-2/tnaur-ppc-osx-head-2/build/ghc/stage1-inplace/ghc -fforce-recomp --make timeout.hs -O2
[1 of 1] Compiling Main             ( timeout.hs, timeout.o )
Linking timeout ...
$ ./timeout 200 'echo Something'
timeout: internal error: stg_ap_p_ret
    (GHC version 6.11.20080927 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Something
Abort trap
$ 

So -O2 makes a difference.
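The comparison above can be scripted so that each build is run and its output checked for an RTS internal error. This is a rough sketch: rts_internal_error and run_and_report are hypothetical helper names, and the pattern only assumes the "PROGRAM: internal error: NAME" line format seen in the transcripts in this ticket.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

rts_internal_error() {
    # Reads program output on stdin and prints the text following
    # "internal error: " on the first matching line, if there is one.
    sed -n 's/^.*: internal error: \(.*\)$/\1/p' | head -n 1
}

run_and_report() {
    # Usage: run_and_report LABEL COMMAND [ARGS...]
    # Runs COMMAND, scans stdout and stderr, and prints one summary line.
    label=$1; shift
    err=$("$@" 2>&1 | rts_internal_error) || true
    if [ -n "$err" ]; then
        echo "$label: internal error: $err"
    else
        echo "$label: ok"
    fi
}
```

After building timeout.hs once without and once with -O2, each binary could be checked with, for instance, run_and_report O2 ./timeout 200 'echo Something'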

Best regards Thorkil

Changed 4 years ago by simonmar

  • milestone changed from 6.10 branch to 6.10.1

I want to get to the bottom of this (the timeout problem that Thorkil reports above) before the release.

Changed 4 years ago by simonmar

  • architecture changed from Multiple to Unknown/Multiple

Changed 4 years ago by simonmar

  • os changed from Multiple to Unknown/Multiple

Changed 4 years ago by thorkilnaur

After discussing this briefly on #ghc the other day, I tried to get back to the version of the HEAD that had the problem, but I failed to reproduce it. However, the two subsequent buildbot builds both have the problem (http://darcs.haskell.org/buildbot/all/builders/tnaur%20PPC%20OSX%20head%202/builds/142 and http://darcs.haskell.org/buildbot/all/builders/tnaur%20PPC%20OSX%20stable%202/builds/179). And I am able to consistently reproduce the problem mentioned above with the timeout program.

So the question is how to proceed. Compiling with -O2 and running with +RTS -DSs produces:

$ ./timeout 200 "echo Something" +RTS -DSs
new task (taskCount: 1)
task exiting
new task (taskCount: 1)
created thread 1, stack size = f1 words
new bound thread (1)
### NEW SCHEDULER LOOP (task: 0x600520, cap: 0x2a1178)
-->> running thread 1 ThreadRunGHC ...
created thread 2, stack size = f1 words
created thread 3, stack size = f1 words
--<< thread 1 (ThreadRunGHC) stopped: is blocked on an MVar @ 0x117f370
-->> running thread 2 ThreadRunGHC ...
--<< thread 2 (ThreadRunGHC) stopped, yielding
-->> running thread 3 ThreadRunGHC ...
thread 3 did a safe foreign call
forking!
new task (taskCount: 2)
task exiting
thread 3: re-entering RTS
timeout: internal error: stg_ap_p_ret
    (GHC version 6.11.20081003 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
waking up thread 3 on cap 0
raising exception in thread 2.
raising exception in thread 1.
discarding task 6292768
created thread 4, stack size = f1 words
new bound thread (4)
### NEW SCHEDULER LOOP (task: 0x6005f0, cap: 0x2a1178)
-->> running thread 4 ThreadRunGHC ...
Something
Abort trap
$

But I am sorely in need of some hints.

Another possibility would be for some seasoned expert to get access to the machinery and take a look.
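One small aid when reading scheduler logs like the one above is to pull out the last event logged before the internal error. The sketch below uses a hypothetical helper name, line_before_error, and assumes that the line order in the captured log reflects the order of events, which may not hold exactly when output is buffered.

```shell
#!/bin/sh
# Sketch only: helper name is illustrative.

line_before_error() {
    # Reads a log on stdin and prints the line immediately preceding
    # the first line that contains "internal error:".
    awk '/internal error:/ { print prev; exit } { prev = $0 }'
}
```

It could be fed directly from a run, for example: ./timeout 200 "echo Something" +RTS -DSs 2>&1 | line_before_error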

Best regards Thorkil

Changed 4 years ago by simonmar

Looks odd - timeout is supposed to be built with -threaded, but that output doesn't look like it was generated by the threaded RTS.

Changed 4 years ago by thorkilnaur

  • version changed from 6.9 to 6.11

You are right, the +RTS -DSs output was certainly not produced by a timeout compiled with -threaded. If I try that, I get:

$ /Users/thorkilnaur/tn/buildbot/ghc/tnaur-ppc-osx-2/tnaur-ppc-osx-head-2/build/ghc/stage1-inplace/ghc --make timeout.hs -fforce-recomp -O2 -debug -threaded
[1 of 1] Compiling Main             ( timeout.hs, timeout.o )
Linking timeout ...
$ ./timeout 200 'echo Something' +RTS -DSs
    a0cfe074: allocated 1 capabilities
    a0cfe074: new task (taskCount: 1)
    a0cfe074: returning; I want capability 0
    a0cfe074: resuming capability 0
    a0cfe074: starting new worker on capability 0
    a0cfe074: new worker task (taskCount: 2)
    a0cfe074: task exiting
    a0cfe074: new task (taskCount: 2)
    a0cfe074: returning; I want capability 0
    f0081000: ### NEW SCHEDULER LOOP (task: 0x600590, cap: 0x2ad30c)
    f0081000: giving up capability 0
    f0081000: passing capability 0 to worker 0xa0cfe074
    a0cfe074: resuming capability 0
    a0cfe074: created thread 1, stack size = f1 words
    a0cfe074: new bound thread (1)
    a0cfe074: ### NEW SCHEDULER LOOP (task: 0x6004d0, cap: 0x2ad30c)
    a0cfe074: ### Running thread 1 in bound thread
    a0cfe074: -->> running thread 1 ThreadRunGHC ...
    a0cfe074: thread 1 did a safe foreign call
    a0cfe074: freeing capability 0
    a0cfe074: thread 1: leaving RTS
    a0cfe074: returning; I want capability 0
    a0cfe074: resuming capability 0
    a0cfe074: thread 1: re-entering RTS
    a0cfe074: created thread 2, stack size = f1 words
    a0cfe074: --++ thread 1 (ThreadComplete) finished
    a0cfe074: bound thread (1) finished
    a0cfe074: passing capability 0 to worker 0xf0081000
    a0cfe074: task exiting
    a0cfe074: new task (taskCount: 2)
    a0cfe074: returning; I want capability 0
    a0cfe074: resuming capability 0
    a0cfe074: created thread 3, stack size = f1 words
    a0cfe074: new bound thread (3)
    a0cfe074: ### NEW SCHEDULER LOOP (task: 0x6004d0, cap: 0x2ad30c)
    f0081000: woken up on capability 0
    a0cfe074: ### this OS thread cannot run thread 2
    f0081000: capability 0 is owned by another task
    a0cfe074: giving up capability 0
    a0cfe074: passing capability 0 to worker 0xf0081000
    f0081000: woken up on capability 0
    f0081000: resuming capability 0
    f0081000: -->> running thread 2 ThreadRunGHC ...
    f0081000: thread 2 did a safe foreign call
    f0081000: passing capability 0 to bound task 0xa0cfe074
    f0081000: thread 2: leaving RTS
    a0cfe074: woken up on capability 0
    a0cfe074: resuming capability 0
    a0cfe074: ### Running thread 3 in bound thread
    a0cfe074: -->> running thread 3 ThreadRunGHC ...
    a0cfe074: created thread 4, stack size = f1 words
    a0cfe074: created thread 5, stack size = f1 words
    a0cfe074: --<< thread 3 (ThreadRunGHC) stopped: blocked
    a0cfe074: giving up capability 0
    a0cfe074: starting new worker on capability 0
    a0cfe074: new worker task (taskCount: 3)
    f0103000: ### NEW SCHEDULER LOOP (task: 0x6006b0, cap: 0x2ad30c)
    f0103000: -->> running thread 4 ThreadRunGHC ...
    f0103000: --<< thread 4 (ThreadRunGHC) stopped: blocked
    f0103000: -->> running thread 5 ThreadRunGHC ...
    f0103000: thread 5 did a safe foreign call
    f0103000: starting new worker on capability 0
    f0103000: new worker task (taskCount: 4)
    f0103000: thread 5: leaving RTS
    f0103000: forking!
    f0103000: new task (taskCount: 5)
    f0185000: ### NEW SCHEDULER LOOP (task: 0x600770, cap: 0x2ad30c)
    f0185000: giving up capability 0
    f0081000: returning; I want capability 0
    f0103000: returning; I want capability 0
    f0185000: passing capability 0 to worker 0xf0081000
    f0081000: resuming capability 0
    f0081000: thread 2: re-entering RTS
    f0081000: --<< thread 2 (ThreadRunGHC) stopped, yielding
    f0081000: giving up capability 0
    f0081000: passing capability 0 to worker 0xf0103000
    f0103000: resuming capability 0
    f0103000: passing capability 0 to worker 0xf0081000
    f0103000: task exiting
    f0103000: returning; I want capability 0
    f0103000: resuming capability 0
    f0103000: thread 5: re-entering RTS
timeout: internal error: stg_ap_p_ret
    (GHC version 6.11.20081003 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
    f0103000: waking up thread 5 on cap 0
    f0103000: raising exception in thread 4.
    f0103000: raising exception in thread 3.
    f0103000: raising exception in thread 2.
    f0103000: discarding task -266842112
    f0103000: discarding task -267374592
    f0103000: discarding task -267907072
    f0103000: discarding task -1596989324
    f0103000: created thread 6, stack size = f1 words
    f0103000: new bound thread (6)
    f0103000: ### NEW SCHEDULER LOOP (task: 0x600830, cap: 0x2ad30c)
    f0103000: ### Running thread 6 in bound thread
    f0103000: -->> running thread 6 ThreadRunGHC ...
    f0081000: woken up on capability 0
    f0081000: capability 0 is owned by another task
Something
Abort trap
$ 
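Comparing the two +RTS -DSs logs in this ticket, the one from the -threaded build prefixes each debug line with a hexadecimal identifier (a0cfe074, f0081000, ...), while the earlier one does not. A crude check for which kind of log one is looking at might look as follows. This is a heuristic based only on the logs shown here; looks_threaded is a hypothetical helper name, and the accepted prefix width (8 to 16 hex digits) is an assumption.

```shell
#!/bin/sh
# Sketch only: heuristic derived from the two logs in this ticket.

looks_threaded() {
    # Succeeds if any line on stdin starts with optional whitespace,
    # 8 to 16 lowercase hex digits, a colon and a space.
    grep -Eq '^[[:space:]]*[0-9a-f]{8,16}: '
}
```

For example: ./timeout 200 'echo Something' +RTS -DSs 2>&1 | looks_threaded && echo "threaded-style log"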

Best regards Thorkil

Changed 4 years ago by simonmar

  • os changed from Unknown/Multiple to MacOS X

Changed 4 years ago by igloo

  • owner set to simonmar
  • priority changed from normal to high

Changed 4 years ago by igloo

  • milestone changed from 6.10.1 to 6.10.2

Changed 3 years ago by simonmar

I've slightly lost track of what this ticket is about. Thorkil: is there still a reproducible bug and, if so, on which platform(s) and with which version(s) of GHC?

Changed 3 years ago by thorkilnaur

I have a particular build of a particular version of the HEAD ( http://darcs.haskell.org/buildbot/all/builders/tnaur%20PPC%20OSX%20head%202/builds/142) with which I can reproduce the problem on PPC Mac OS X 10.5:

$ uname -a
Darwin thorkil-naurs-mac-mini.local 9.5.0 Darwin Kernel Version 9.5.0: Wed Sep  3 11:31:44 PDT 2008; root:xnu-1228.7.58~1/RELEASE_PPC Power Macintosh
$ ../../ghc/stage1-inplace/ghc -fforce-recomp --make timeout -O2
[1 of 1] Compiling Main             ( timeout.hs, timeout.o )
Linking timeout ...
$ ./timeout 200 "echo Something"
timeout: internal error: stg_ap_p_ret
    (GHC version 6.11.20081003 for powerpc_apple_darwin)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
Something
Abort trap
$ 

That particular HEAD build is the latest for which this happened. The latest STABLE build for which it happened is  http://darcs.haskell.org/buildbot/all/builders/tnaur%20PPC%20OSX%20stable%202/builds/180.

Best regards Thorkil

Changed 3 years ago by simonmar

  • status changed from new to closed
  • resolution set to worksforme

Let's close this ticket, and open a new ticket if it re-emerges.
