Ticket #4225 (closed bug: fixed)

Opened 3 years ago

Last modified 3 years ago

RTS -i (heap profiling interval) documentation could be clearer

Reported by: ClaudiusMaximus Owned by: simonmar
Priority: high Milestone: 7.0.1
Component: Profiling Version: 6.12.3
Keywords: Cc:
Operating System: Unknown/Multiple Architecture: Unknown/Multiple
Type of failure: Documentation bug Difficulty:
Test Case: Blocked By:
Blocking: Related Tickets:

Description

As profiling overhead depends on heap size, hp2ps (for example) gives misleading graphs wherein periods with high heap usage appear much longer in time than periods with low heap usage.

If #1420 is implemented, this option will be even more desireable.

Attachments

i1.png Download (4.1 KB) - added by ClaudiusMaximus 3 years ago.
./Main +RTS -h -i1 && cat Main-i1.hp | grep BEGIN | sed "s|.* ||" >i1.dat && gnuplot "i1.dat"

Change History

Changed 3 years ago by malcolm.wallace@…

I believe that the runtime system, when gathering a census for profiling at GC time, disables the timer and re-enables it again after the census is complete. This is specifically to eliminate the overheads of profiling from the resulting graph. If something has changed to break this property, then it should be fixed in the RTS, not post-processed away by hp2ps.

Changed 3 years ago by ClaudiusMaximus

./Main +RTS -h -i1 && cat Main-i1.hp | grep BEGIN | sed "s|.* " >i1.dat && gnuplot "i1.dat"

Changed 3 years ago by ClaudiusMaximus

In the attached graph "i1.png", you can see that the sample times as recorded in the .hp file are not uniform (they seem to measure "real time" or "program time" instead of "non-profiling program time"). So I guess the "removing overheads of profiling" property is indeed broken.

$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.12.3
$ uname -a
Linux zebimus 2.6.32-24-generic #38-Ubuntu SMP Mon Jul 5 09:20:59 UTC 2010 x86_64 GNU/Linux

Changed 3 years ago by ClaudiusMaximus

After some thinking, I've reached these conclusions, which are conjecture but if they are the case they do somewhat explain the behaviour I've been seeing:

1. To get meaningful heap profiles, GC is performed before each census. 2. GC takes an amount of time that depends on total heap usage. 3. Census takes an amount of time that depends on total heap usage. 4. GC affects the behaviour of the program, so IS included in heap profile time stamps. 5. Census does not affect the behaviour of the program, so IS NOT included in heap profile time stamps. 6. GC takes much longer than census.

(1) would imply that the space/time behaviour of a profiled program might be very different from an unprofiled program because GC might be performed much more frequently.

In which case, perhaps it would make sense to have a "tickless" heap profile where census is performed after any "naturally" occuring GC?

I will look for the documentation giving an overview of how the RTS actually does profiling, because chances are that my guesses here are way off the mark.

Changed 3 years ago by malcolm.wallace@…

I was certainly under the impression that if you specify no "-i" RTS option at all, then a heap census is taken only at the naturally-occuring GC breaks. (At least, this is the way it worked in hbc and nhc, the first compilers to support profiling.) The ghc manual (section 5.4.1) is possibly misleading here, since it suggests that the default sampling interval is 0.1 seconds. I believe it ought to say that if the "-i" option is given, but with no argument, then the default sampling interval is as stated. Otherwise, sampling is not on a regular ticking clock, but during GC.

Changed 3 years ago by ClaudiusMaximus

  • failure changed from None/Unknown to Documentation bug
  • type changed from feature request to bug
  • summary changed from RTS option to remove profiling overhead from heap profile time stamps to RTS -i (heap profiling interval) documentation could be clearer

Ok, I've done some brief testing and it seems that not specifying -i at all does give much saner behaviour of census at naturally-occuring GC (though, -s reports around 75 samples and the .hp has fewer, around 57 samples).

Perhaps when the documentation is updated to clarify this point there should be an additional note warning that specifying -i will force additional GC which might affect program behaviour?

Thanks for aiding my understanding of all this :)

PS: I found a link to this page but it does not yet contain any information:  http://hackage.haskell.org/trac/ghc/wiki/Commentary/Profiling/Heap linked from:  http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/Slop (which implies that census requires a freshly GC'd heap)

Changed 3 years ago by igloo

  • priority changed from normal to high
  • milestone set to 6.14.1

Changed 3 years ago by simonmar

To clarify/confirm the points that have come up so far:

  • heap census time is not counted as runtime in the profile
  • GC time is counted as runtime
  • heap profiling triggers a full GC at intervals specified by the -i flag, where the default is 0.1s.
  • There is no way, currently, to say "do a census after every naturally occurring GC". That could be easily implemented, although in GHC it would be less useful than it was in hbc/nhc98, because GHC has a generational GC and full collections are quite infrequent - you would get very few samples.

Eliminating profiling overhead from runtime isn't really possible anyway, because other aspects of profiling (e.g. maintaining the cost centre stacks) add quite a bit of overhead too, and that can't be easily factored out. Perhaps not counting the time for the "extra" GCs as part of the runtime would help in that different profiles would be more easily comparable, though it still won't help when comparing profiling to non-profiling runs.

Changed 3 years ago by simonmar

  • owner set to simonmar

Changed 3 years ago by simonmar

  • status changed from new to closed
  • resolution set to fixed

Fixed; I excluded GC time from the figures reported in the profile. This makes the runtime in the profile mostly insensitive to the -i setting. Also I found I needed to use a more accurate time source to measure CPU time on POSIX systems:

Mon Sep 13 14:38:52 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * Don't include GC time in heap profiles (#4225)

Mon Sep 13 14:38:18 BST 2010  Simon Marlow <marlowsd@gmail.com>
  * Use clock_gettime (if available) to measure the process CPU time
  This is much more accurate than getrusage, which was giving misleading
  results when trying to time very quick operations like a minor GC.
Note: See TracTickets for help on using tickets.