Ticket #3693 (new feature request)

Opened 4 years ago

Last modified 5 months ago

Show stack traces

Reported by: jpet Owned by:
Priority: normal Milestone: 7.6.2
Component: Runtime System Version: 6.10.4
Keywords: Cc: mihai.maruseac@…, pumpkingod@…, lonetiger@…, hackage.haskell.org@…, anton.nik@…, dterei, bgamari@…, scpmw@…, yi.codeplayer@…, nathanhowell@…
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: None/Unknown Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

Debugging stack overflows can be very difficult, because GHC gives very little information as to exactly what is overflowing. Showing a basic stack dump (on crash, or in the ghci debugger) would be enormously helpful.

(Entered after spending two days trying to determine the cause of a stack overflow, before discovering it was a GHC bug [3677], which would have been apparent immediately if I could have only seen a callstack.)

Attachments

StackTraces.dpatch Download (144.1 KB) - added by batterseapower 3 years ago.
StackTraces-Base.dpatch Download (34.3 KB) - added by batterseapower 3 years ago.
StackTraces.3.dpatch Download (149.7 KB) - added by batterseapower 3 years ago.
Compiler-StackTraces.dpatch Download (83.1 KB) - added by batterseapower 3 years ago.
Testsuite-StackTraces.dpatch Download (84.9 KB) - added by batterseapower 3 years ago.
Base-StackTraces.dpatch Download (48.2 KB) - added by batterseapower 3 years ago.

Change History

  Changed 4 years ago by simonmar

  • milestone set to _|_

Stack traces are difficult to implement, which is why there isn't one. See e.g. ExplicitCallStack.

I've been pondering the use of HPC to generate some tracing information though, possibly for feeding into ThreadScope?.

  Changed 4 years ago by guest

Might it be easy to get non-lexical call stacks? That way you could at least see the chain of "case" statements that forced the thunk producing an error. And some information is better than none at all.

  Changed 4 years ago by jpet

The linked-to proposals and discussion are mostly about generating what the stack "would be" if evaluation order had been strict.

I'm suggesting what the previous comment does, something much simpler (and IMHO much more useful)--I want to see a dump of the actual stack that actually exists in memory. Walking these stacks is easy, well-defined, and portable; the only obvious obstacle seems to be annotating code with source line and file info.

  Changed 4 years ago by jpet

Sorry, I should have said, "walking these stacks for stack traces seems feasible, since the garbage collector already has to do it." I should know better than to use the word "easy" when suggesting a task for someone else.

  Changed 4 years ago by simonmar

Yes you can walk the stack, but the problem is that due to the amount of code transformation that GHC does, it is almost impossible to relate stack frames at runtime to the original code. The code that runs bears little resemblance to the code you wrote in the original program. Case expressions in the compiled code may or may not correspond to case expressions or pattern matches in the original code, and GHC doesn't generally keep track of that information.

  Changed 4 years ago by simonpj

I agree with all you say, but I think there's still a case for saying that any stack trace is better than none. It may not be easily explicable to Joe User, but it would give an expert a lot more to go on. It's a bit like attaching a loudspeaker to the memory access pin of a CPU; you can't explain the noises that come out, but you get to recognise them and they can be very helpful. (Don't laugh -- I worked on an Eliot 803 which did this! Well, I'm not sure which pin it was, but it made noises anyway.)

In short, I think there's a case for a quick-and-dirty stack trace, if such a thing could be done without much effort.

Simon

  Changed 4 years ago by simonmar

Ok, let's explore what could be done. In gdb you can see the stack like this:

(gdb) p16 $rbp
0x2b265a880368: 0x2b265a87f2c0
0x2b265a880360: 0x748d08 <stg_marked_upd_frame_info>
0x2b265a880358: 0x459090 <smN_info>
0x2b265a880350: 0x6aafd0 <s1VW_info>
0x2b265a880348: 0x2b265a87f2d1
0x2b265a880340: 0x58afc0 <s1Ed_info>
0x2b265a880338: 0x2b265a87f2d1
0x2b265a880330: 0xb6757b <base_GHCziStorable_readInt8OffPtr1_closure+3>
0x2b265a880328: 0xaf04d8 <base_GHCziInt_zdfEqInt8_closure>
0x2b265a880320: 0xb22c39 <r1wc_closure+1>
0x2b265a880318: 0x2b265a87f328
0x2b265a880310: 0x59dc60 <s11L_info>
0x2b265a880308: 0x2b2655c4dc18
0x2b265a880300: 0x6abf38 <sOz_info>
0x2b265a8802f8: 0x2b265a87f328
0x2b265a8802f0: 0x748b88 <stg_upd_frame_info>

In the RTS there's also a way to print out stacks, though I don't use it very much. Note this is upside-down compared to the above dump:

(gdb) p printStackChunk( 0x2b265a8802f0,0x2b265a880368)
Object 0x2b265a8802f0 = UPDATE_FRAME(0x748b88,0x2b265a87f328)
RET_SMALL (0x6abf28)
   stk[12] (0x2b265a880308) = Word# 47443647716376
RET_SMALL (0x59dc50)
   stk[10] (0x2b265a880318) = 0x2b265a87f328
   stk[9] (0x2b265a880320) = 0xb22c39
   stk[8] (0x2b265a880328) = 0xaf04d8
   stk[7] (0x2b265a880330) = 0xb6757b
   stk[6] (0x2b265a880338) = 0x2b265a87f2d1
RET_SMALL (0x58afb0)
   stk[4] (0x2b265a880348) = 0x2b265a87f2d1
RET_SMALL (0x6aafc0)
RET_SMALL (0x459080)
Object 0x2b265a880360 = UPDATE_FRAME(0x748d08,0x2b265a87f2c0)

You get less information in that there are no symbol names: the RTS doesn't have access to the binary's symbol table. That information is available in the binary, and it might be possible to extract it using the RTS linker (speculation - I don't know how hard this would be in practice). On the other hand, the RTS knows where the boundaries between stack frames are, and it can recognise an UPDATE_FRAME etc.

But even with symbol names, you couldn't glean a lot from the trace here. What might help is if we knew which module each of those symbols comes from, but that info isn't available, even in the symbol table. We could add the module name to every symbol, but that would make local symbols much bigger. Perhaps that doesn't matter as these are only local symbols and so symbol lookup isn't actually performed at link time, it just makes the strings bigger in the object files. Someone could do that experiment.

Maybe the right thing would be to add some debugging info to the compiled code that doesn't take up any space at runtime, but which the RTS could slurp in if necessary.

  Changed 4 years ago by simonpj

Apart from update/exception frames, the stack mainly consists of frames pushed by 'case'. If we put a string into each return-info-table (much as we do already for constructors, I think), then we could record (a) module, (b) top-level function (c) case binder... and that'd be plenty. It'd bloat our binaries a bit but I'm betting it would not be much.

An alternative could be to do this only if you compile with -debug or -ticky. But that would lose part of the benefit, I'm guessing.

Simon

  Changed 4 years ago by simonmar

Right, yes, we could do that. I didn't consider that kind of thing because I have a feeling it would bloat binaries too much (there are a lot more case expressions than constructors), but someone could do the experiment and find out.

Putting the info in a debug section as I mentioned before would be a good way to avoid the binary bloat, but it makes it more complicated to extract again. If you're going to do that you might as well use the symbol table too, rather than duplicate the info in the debug section.

  Changed 4 years ago by asuffield

For memory and disk space concerns, this is why we use .debug sections (and sometimes place them in second files).

For the difficulty of accessing and dumping it, how about not duplicating the effort? Emit the debugging information in DWARF format as if it was any other language, and either borrow the code from gdb, or (perhaps easier) teach gdb the few extra bits it needs to decipher ghc call stacks.

  Changed 3 years ago by batterseapower

  • patch set to 0

I've written a very simple patch to add support for basic STG stack traces.

Here's how it looks:

$ cat Stacks2.hs
module Main where

main = thunk_use_site (myHead [])

{-# NOINLINE thunk_use_site #-}
thunk_use_site :: Bool -> IO ()
thunk_use_site foo = case foo of True -> putStrLn "Woot!"

{-# NOINLINE myHead #-}
myHead (x:xs) = x

$ ./Stacks2
Stacks2: Stacks2.hs:10:1-17: Non-exhaustive patterns in function myHead

Stacks2: Backtrace:
  main:Main.thunk_use_site1
  base:GHC.TopHandler.runMainIO1

It takes the approach of storing an optional string pointer in return closure info tables. Currently only frames arising from "case" scrutinisation are supported. The string generated from them contains information about the top-level function from which the "case" originated.

There is a new primop, raiseWithStack#, and the catch# primop now supplies a ByteArray?# to the exception handler, which contains an array of string pointers. This is reified into a [String] on the Haskell side, and shown to the user in the Haskell-side TopHandler?.

I changed the simplifier in one place to stop it generating new top-level names like "a", and instead basing those names on the binder the code originated from (if any). We could go through fixing such parts of the compiler on an ad-hoc basis, or we could implement something using HPC information to produce decent stack traces.

This patch is pretty rough and ready and probably buggy, but I thought I'd throw it out there. Areas for improvement:

  • Reify other sorts of stack frame. In particular, allow the RetInfoTables? generated by hand-written .cmm to contain strings, and show update frames. If we do this we should probably use [StackFrame?] on the Haskell side rather than [String], where StackFrame? is an algebraic type with a case for each kind of stack frame.
  • Better information about how the backtrace corresponds to the program the user wrote (e.g. using HPC as discussed above)
  • Common up stack frame strings. Currently, I'll emit the same string several times if there are multiple "case" expressions within one top level binding, which is going to contribute to bloat
  • Test it out. In particular, actually try running the test suite :-)

One note: I repurposed some code from the IA64 branch of Adjustor.c to create my ByteArray?#. However, it looks to me like the original code would not even compile (apparently allocate() has gained a Capability* argument since that code was written). Is the IA64 backend dead? If it's not, I've *certainly* broken it with this patch.

Here are some rough-and-ready code bloat numbers. The base library before (approximately, this is another recent build I had lying around):

$ du .
8.0K	./autogen
116K	./cbits
132K	./Control/Concurrent
112K	./Control/Exception
 36K	./Control/Monad/ST
 80K	./Control/Monad
780K	./Control
 24K	./Data/STRef
2.6M	./Data
 16K	./Debug
464K	./Foreign/C
184K	./Foreign/Marshal
876K	./Foreign
236K	./GHC/IO/Encoding
400K	./GHC/IO/Handle
1.3M	./GHC/IO
5.6M	./GHC
 68K	./System/Console
 44K	./System/IO
 28K	./System/Mem
316K	./System/Posix
660K	./System
124K	./Text/ParserCombinators
144K	./Text/Read
 16K	./Text/Show
404K	./Text
8.0K	./Unsafe
 26M	.

And after:

$ du .
8.0K	./autogen
116K	./cbits
148K	./Control/Concurrent
120K	./Control/Exception
 36K	./Control/Monad/ST
 80K	./Control/Monad
848K	./Control
 24K	./Data/STRef
2.8M	./Data
 16K	./Debug
472K	./Foreign/C
200K	./Foreign/Marshal
920K	./Foreign
244K	./GHC/IO/Encoding
460K	./GHC/IO/Handle
1.5M	./GHC/IO
6.3M	./GHC
 80K	./System/Console
 52K	./System/IO
 32K	./System/Mem
332K	./System/Posix
712K	./System
132K	./Text/ParserCombinators
156K	./Text/Read
 16K	./Text/Show
440K	./Text
8.0K	./Unsafe
 30M	.

So a 15% increase in code size. Not too bad, especially considering there is room for improvement by commoning up some strings, as I outlined above.

Changed 3 years ago by batterseapower

  Changed 3 years ago by batterseapower

By the way, those NOINLINE annotations are actually redundant. You get the same backtrace without them.

Changed 3 years ago by batterseapower

  Changed 3 years ago by igloo

  • patch changed from 0 to 1
  • milestone changed from _|_ to 6.14.1

Changed 3 years ago by batterseapower

  Changed 3 years ago by batterseapower

The new version of the patch is much improved. Backtraces now look like this:

$ ./Stacks2
Stacks2: Stacks2.hs:8:1-17: Non-exhaustive patterns in function myHead

Stacks2: Backtrace:
  main:Main.thunk_use_site1 (Stacks2.hs:6:1-14)
  base:GHC.TopHandler.runMainIO1 (libraries/base/GHC/TopHandler.lhs:54:1-9)

Despite the fact that there is more information in the backtrace than before (the SrcLocs?), I actually managed to reduce binary bloat a little bit (it is now at 8%):

$ du .
8.0K	./autogen
116K	./cbits
148K	./Control/Concurrent
116K	./Control/Exception
 36K	./Control/Monad/ST
 80K	./Control/Monad
832K	./Control
 24K	./Data/STRef
2.7M	./Data
 16K	./Debug
468K	./Foreign/C
192K	./Foreign/Marshal
900K	./Foreign
236K	./GHC/IO/Encoding
424K	./GHC/IO/Handle
1.4M	./GHC/IO
6.0M	./GHC
 72K	./System/Console
 48K	./System/IO
 32K	./System/Mem
332K	./System/Posix
692K	./System
128K	./Text/ParserCombinators
152K	./Text/Read
 16K	./Text/Show
420K	./Text
8.0K	./Unsafe
 28M	.

This was achieved by commoning up the info, and not emitting it if it isn't actually needed.

I also tried to reduce the impact on physical memory of this debug info being in the object file by trying to emit it in a single contiguous section. The idea is that if its all together on one page then the OS's cache manager will simply not load it into physical memory until a stack trace is actually needed - thanks to Simon Marlow for this idea.

RetInfoTables? originating from the RTS now have some primitive backtrace information.

I've modified the tidier so that Internal names get names based on their referrer. This generalises the Simplifier hack from my last patch, so I've got rid of that change.

Remaining work:

  • Special handling for update frames (maybe), and perhaps some other of the non-standard RTS return frames
  • Look to see if I can improve the precision of backtrace info by using some of the context local to the return point
  • Run testsuite (I've been avoiding this, my machine is slow :-)

  Changed 3 years ago by simonmar

This is pretty exciting! I'm looking forward to having panic messages in GHC emit a stack trace by default. I'd be interested to see what a stack trace from the bowels of GHC looks like - have you tried it?

I've been thinking about stripping the debug info - we could do it at link time with a flag to GHC. strip has a flag to strip a particular section, and then we can tell the RTS that the debug info has been stripped in the same way that we tell it about the -rtsopts flag, i.e. with a special compiled-at-link-time C file.

  Changed 3 years ago by batterseapower

Thomas Schilling talked to me about the API I have at the moment. It's quite specialised in that you can only get a list of return frame descriptions, and makes the RTS do a comparatively large amount of work.

I've based this possible new API on his ideas. The idea is to move much more responsibility into the Haskell side of things, and hence allow for more interesting stack trace processors to be written. For example, you could write one which let the user interactively poke at the closures within the stack from using Vacuum or something.

catchWeird :: a -> (forall b. SomeException -> Addr# -> IO b -> (a -> IO b) -> IO b) -> a
retInfoTableDescription# :: Addr# -> Ptr CString
walkStack# :: Addr# -> Addr#

The idea is that the exception handler gets, along with the exception and a pointer to esp, two continuations:

  1. A continuation to use if you are able to recover from the exception and want to return a value. Entering this continuation strips the stack down to the current catch frame and returns that value to the thing above.
  2. A continuation to use if you are not able to recover. Entering this continuation passes control to the next exception handler on the stack.

The Addr# pointer to the stack top points to undefined memory after you enter either continuation, so you need to ensure that you have forced any thunks that close over it before you enter a continuation.

You can use walkStack# to move your stack pointer down the stack, and then unpackClosure# combined with retInfoTableDescription# to obtain a description of the frame.

I think the major obstacle to this API is deciding how to handle stack overflow exceptions. In that case, we cannot assume that there will be enough stack to let the exception handler reify the stack frame before continuing. Presumably, to deal with this we would have to either:

  1. Start a new stack somewhere to run the handler on if we get an asynchronous stack overflow exception
  2. Bump up the stack size temporarily while we run the handler on the original stack

Either way it sounds fiddly. I'm probably not going to implement this API yet, and just stick to the basic story for now.

follow-up: ↓ 18   Changed 3 years ago by batterseapower

Well, the "fast" testsuite basically works, with the exception of some of the GHC performance tests (and ffi005 which I have never seen pass on any of my OS X machines). This is encouraging, but I need to look carefully at the performance tests.

I'm not having much luck with providing a way to strip out the debug info. I played with the suggestion by Simon to have an optionally-strippable section, but:

  • OS X strip has no option to strip sections with a given name
  • If I install GNU strip using Macports, trying to strip anything produces by GHC causes the stripped thing to die immediately upon running with "Illegal instruction". gstrip itself complains about "unable to read unknown load command 0x22". I have no idea what to make of that.

Which is kind of discouraging. Even if I got it all working on every platform, we would never be able to strip the .ghc_debug section from shared libraries, which is sort of annoying.

I guess I could probably stuff the info into a normally-named .debug section, and then just run plain "strip" on the result if you have requested that stack traces be disabled. This might be annoying for those who link in their own .o files with debug info they want to preserve, but maybe that is an acceptable tradeoff. Thoughts?

in reply to: ↑ 17   Changed 3 years ago by simonmar

Replying to batterseapower:

I'm not having much luck with providing a way to strip out the debug info. I played with the suggestion by Simon to have an optionally-strippable section, but: * OS X strip has no option to strip sections with a given name

No big deal - just make it a no-op on OS X, perhaps emitting a warning.

Which is kind of discouraging. Even if I got it all working on every platform, we would never be able to strip the .ghc_debug section from shared libraries, which is sort of annoying.

It isn't such a problem in shared libraries, I'm not too worried about this.

  Changed 3 years ago by igloo

  • status changed from new to patch

Changed 3 years ago by batterseapower

Changed 3 years ago by batterseapower

Changed 3 years ago by batterseapower

  Changed 3 years ago by batterseapower

I've attached some fairly complete patches for stack trace support.

The outstanding issues are:

1. These three tests fail for me on both Windows and Linux after applying the patch, but I'm not sure why some of them do. One is a GHC performance test, so it looks like there is a small perf regression caused by the patches:

   3429(threaded2)
   T1969(normal)
   T3294(normal)

2. These two tests fail on Windows only:

   stacktrace1-ghci(ghci)
   stacktrace1-stripped(normal,optc,hpc,optasm,threaded1,threaded2)

The first failure is harmless. The error is "Failed to load interface for `HC.TopHandler?'", and is solved if you run dos2unix on the .genscript file. I'm not really sure why this is a problem for that particular ghci test and no other one, though!

The second failure is more serious: my guess is that Windows barfs when loading an executable where some symbols point into a stripped section. The exact error I get is "stacktrace1-stripped.exe is not a valid Win32 application", which is not very informative. It would however be straightforward to show a warning when the -strip-stack-info flag is given on Windows, just like we already do if it is used on OS X.

3. Currently I show the stg_ stack frames originating from the RTS, but it's up for debate as to whether we should bother. I think we probably should show them, because it's quite interesting to see where the update frames, catch frames etc are. It's very easy to change this behaviour to turn them off (a 2-line edit in CmmParse?.y).

Other than that I think the patch is pretty much done. I've tested it on 32 bit Intel on all three of Windows, Linux and OS X. Solaris / 64 bit support may possibly have minor bugs in them, but I can't test that. e.g. i'm not sure if I got the solaris assembler syntax correct.

This patch will make throwing an exception slightly slower because it reifies the stack to a depth of 100 when you call stg_raisezh. Other flaws with the design that has been implemented mean that:

  1. It is impossible to re-raise a Haskell-side "StackTrace?" value once you have caught it. This is fixable at the cost of extra allocation (reified stacks should be of type Arr# String rather than a ByteArray?# array of Addr# to CStrings), but that allocation can be done statically by generating a closure in .ghc_debug section.
  2. There is no support for "inner exceptions". I.e. if an exception handler throws an exception, we should pair the original stack trace with the stack trace from the new place it is throwing from. This can be implemented with a bit more effort than 1, but if we did it then it would subsume 1 (we could keep stack traces as ByteArray?#) and would require less effort on the part of the programmer to get informative stack traces.

Make of it what you will :-). I don't want to spend too many more cycles on this problem right now, so if you want to change the design radically it may take me a while to get around to it.

  Changed 3 years ago by batterseapower

I had a look at how you might encode how to walk the STG stack using DWARF. I don't think it's possible with DWARF v2 because the "register rules" which define how the stack walking process updates registers as it walks back up the stack do not include what we need.

Basically we need to update: $eip to (top_closure->info->entry_code) $ebp to (top_closure - size(top_closure))

(where the top_closure pointer can be determined from $eip and $ebp in a standard way)

However, the DWARF2 register rules do not make provision for arbitrary expressions: you can only transfer from other registers or a value stored at some offset from the frame base address.

DWARFv3 adds expression and val_expression transfer rules which we could easily use to implement this stack walking scheme. (Which have apparently been implemented in GDB since 2006:  http://cygwin.ru/ml/gdb-patches/2006-03/msg00076.html).

The work that would need to be done is:

  • Generate CFA and location information for code emitted from GHC
  • Add libdwarf as a GHC dependency
  • Augment the RTS with some code using libdwarf to reify the stack when an exception occurs, passing it to Haskell land as in my existing patch
  • The RTS code must be robust to missing DWARF info (i.e. if the executable was stripped)

The benefit to this approach compared to the one my patch implements is that dropping the debug info can be done with standard tools, and it will be more pleasant to use GDB on GHC-generated executables.

follow-up: ↓ 25   Changed 3 years ago by mihai.maruseac

  • cc mihai.maruseac@… added

So, is this solved? Will we have a stack trace in a future version of GHC?

(I just got free time and was thinking into dwelling into this when I found that there are several patches allready :) )

  Changed 3 years ago by pumpkin

  • cc pumpkingod@… added

  Changed 3 years ago by Phyx-

  • cc lonetiger@… added

in reply to: ↑ 22   Changed 3 years ago by simonmar

Replying to mihai.maruseac:

So, is this solved? Will we have a stack trace in a future version of GHC? (I just got free time and was thinking into dwelling into this when I found that there are several patches allready :) )

Max's patch isn't going into 7.0. There are some unsolved problems with the design (see earlier comments), so we're still looking for a good solution.

  Changed 3 years ago by igloo

  • status changed from patch to new

  Changed 3 years ago by simonpj

Quite a lot of work has been done here, esp by Max, and it'd be a pity not to make push it to completion. But no one "owns" this ticket, so I'm afraid it may languish unloved.

Does anyone want to take it on and push it over the hill? I'm sure Simon and Ian (esp) will act in support. Even enumerating the issues that need to be solved would be helpful.

Simon

  Changed 3 years ago by mihai.maruseac

Hi,

I am trying to do something as a disertation project (to be defended on July) but I don't guarantee that I'll stick to it instead of changing to something else or that I'll get something good done. However, I'll try, so hints, support, issues needing to be solved, anything would be useful.

Mihai Maruseac

PS: don't think I should take this as owner since I've only recently digged into GHC's sources

  Changed 3 years ago by liyang

  • cc hackage.haskell.org@… added

  Changed 3 years ago by igloo

  • milestone changed from 7.0.1 to 7.0.2

  Changed 2 years ago by igloo

  • milestone changed from 7.0.2 to 7.2.1

  Changed 2 years ago by ezyang

Hey Max,

Since we recently moved to Git, there's not a terribly easy way to import these Darcs patches anymore. Do you think you could move your patch series into a Git branch? I'd even volunteer to keep it up to date with the moving master :-)

Cheers, Edward

  Changed 2 years ago by lelf

  • cc anton.nik@… added

  Changed 2 years ago by batterseapower

Edward: I think my patches are useless because the approach I took is a dead end :-(. The Right Thing to do is generate DWARF info, even though it is a considerably more involved solution.

  Changed 2 years ago by dterei

  • cc dterei added

  Changed 21 months ago by igloo

  • milestone changed from 7.2.1 to 7.4.1

  Changed 21 months ago by bgamari

  • cc bgamari@… added

  Changed 20 months ago by scpmw

  • cc scpmw@… added

  Changed 20 months ago by huangyi

  • cc yi.codeplayer@… added

  Changed 19 months ago by nathanhowell

  • cc nathanhowell@… added

  Changed 17 months ago by igloo

  • milestone changed from 7.4.1 to 7.6.1

  Changed 9 months ago by igloo

  • milestone changed from 7.6.1 to 7.6.2

  Changed 5 months ago by morabbin

See also #960.

Note: See TracTickets for help on using tickets.