| 54 | | Two new primitive ops have been created: one obtains the address of a closure's info table, and the other obtains the closure's payload (i.e. if it is a value, the arguments of the datacon). |
| 55 | | {{{ |
| 56 | | infoPtr# :: a -> Addr# |
| 57 | | closurePayload# :: a -> (# Array# b, ByteArr# #) |
| 58 | | }}} |
| 59 | | The use of these primitives is encapsulated in the `RtClosureInspect` module, which provides: |
| 60 | | {{{ |
| 61 | | getClosureType :: a -> IO ClosureType |
| 62 | | getInfoTablePtr :: a -> Ptr StgInfoTable |
| 63 | | getClosureData :: a -> IO Closure |
| 64 | | |
| 65 | | data Closure = Closure { tipe :: ClosureType |
| 66 | | , infoTable :: StgInfoTable |
| 67 | | , ptrs :: Array Int HValue |
| 68 | | , nonPtrs :: ByteArray# |
| 69 | | } |
| 70 | | |
| 71 | | data ClosureType = Constr |
| 72 | | | Fun |
| 73 | | | Thunk Int |
| 74 | | | ThunkSelector |
| 75 | | | Blackhole |
| 76 | | | AP |
| 77 | | | PAP |
| 78 | | | Indirection Int |
| 79 | | | Other Int |
| 80 | | deriving (Show, Eq) |
| 81 | | }}} |
| 82 | | |
| 83 | | The implementation of the datacon recovery stuff is scattered around: |
| 84 | | {{{ |
| 85 | | Linker.recoverDataCon :: a -> TcM Name |
| 86 | | |- recoverDCInDynEnv :: a -> IO (Maybe Name) |
| 87 | | |- recoverDCInRTS :: a -> TcM Name |
| 88 | | |- ObjLink.lookupDataCon :: Ptr StgInfoTable -> IO (Maybe String) |
| 89 | | }}} |
| 90 | | First we must make sure that we are dealing with a value in WHNF (i.e. a Constr), as opposed to a thunk, fun, indirection, etc. This information is retrieved from the closure's own info table (StgInfoTable comes with a Storable instance, defined in ByteCodeItbls). From here on I will simply use constr to refer to a Constr closure. |
| 91 | | |
| 92 | | Once we can recover the datacon of a constr, and thus its (possibly polymorphic) type, we can construct its tree representation. The payload of a closure is an ordered set of pointers and non-pointers (words). For a Constr closure, the non-pointers are primitive unboxed values and correspond to leaves of the tree, while the pointers are the so-called subTerms, references to other closures. |
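The tree construction described above can be sketched in plain Haskell. This is only an illustration under stated assumptions: `MockClosure`, `Term`, `Suspension` and `toTerm` are hypothetical names, and the mock closure stands in for the real heap object that `getClosureData` would inspect.

```haskell
import Data.Word (Word)

-- Hypothetical, simplified stand-in for a heap closure. Only the two
-- cases discussed in the text are modelled.
data MockClosure
  = MockConstr String [MockClosure] [Word] -- datacon name, pointers, non-pointer words
  | MockThunk                              -- anything that is not yet in WHNF

-- Hypothetical tree representation: pointers become subterms,
-- non-pointer words become leaves.
data Term
  = Term String [Term] [Word]
  | Suspension
  deriving (Show, Eq)

-- Walk the pointers of a constr recursively; stop at unevaluated closures
-- so that building the tree never forces anything.
toTerm :: MockClosure -> Term
toTerm (MockConstr name ptrs ws) = Term name (map toTerm ptrs) ws
toTerm MockThunk                 = Suspension
```

The point of the `Suspension` case is that a non-constr closure is recorded as-is rather than evaluated.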
| | 50 | (If for some reason you want to check the original solution, browse the history for this wiki page.) |
| 209 | | == `breakpoint` Implementation == |
| 210 | | When compiling to bytecodes, breakpoints are desugared to 'fake' jump functions. These are not defined anywhere; later, in the interactive environment, we link them to something: |
| 211 | | {{{ |
| 212 | | breakpoint => breakpointJump |
| 213 | | breakpointCond => breakpointCondJump |
| 214 | | breakpointAuto => breakpointAutoJump |
| 215 | | }}} |
| 216 | | The types would be: |
| 217 | | {{{ |
| 218 | | data Locals = forall a. Locals a |
| 219 | | |
| 220 | | breakpointAutoJump, breakpointJump :: |
| 221 | | Int -- Address of a StablePtr containing the Ids |
| 222 | | -> [Locals] -- Local bindings list |
| 223 | | -> (String, String, Int) -- Package, Module and site number |
| 224 | | -> String -- Location message (filename + srcSpan) |
| 225 | | -> b -> b |
| 226 | | breakpointCondJump :: Int -> [Locals] -> (String,String,Int) -> String -> Bool -> b -> b |
| 227 | | }}} |
| 228 | | They get filled in by the desugarer with the pointer to the ids in scope, their values, the site, a message, and the wrapped value. Everything is served with the right amounts of unsafeCoerce sauce and TyApp dressing to make sure it core-lints. |
| 229 | | |
| 230 | | This transformation is loosely formalized in GhciDebugger/BreakpointJump. |
| 231 | | |
| 232 | | The site number is relevant only for 'auto' breakpoints, explained later. For the other two types of breakpoints its value should be 0. |
| 233 | | |
| 234 | | The desugarer monad has been extended with an OccEnv of Ids to track the bindings in scope. Of course, this environment is probably too ad hoc to be used for anything else. The monad also carries a mutable table of breakpoint sites for the current module. This table is propagated to the ModGuts. |
| 235 | | |
| 236 | | === Default HValues for the Jump functions === |
| 237 | | The dynamic linker has been modified so that it won't panic if one of the jump functions fails to resolve. |
| 238 | | Now if the dynamic linker fails to find a HValue for a Name, before looking for a static symbol it will ask |
| 239 | | {{{ |
| 240 | | DsBreakpoint.lookupBogusBreakpointVal :: Name -> Maybe HValue |
| 241 | | }}} |
| 242 | | which returns a "just return the wrapped thing" value if the name is one of the Jump names, and Nothing otherwise. |
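As a rough sketch of what such a default amounts to, the functions below ignore every breakpoint argument and return the wrapped value. The names `defaultBreakpointJump`, `defaultBreakpointCondJump`, `isJumpName` and the `Site` synonym are hypothetical; the real lookup works on `Name` and `HValue`, which are replaced here by plain strings and ordinary typed functions.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Same shape as the Locals type shown earlier on this page.
data Locals = forall a. Locals a

-- Hypothetical synonym for (package, module, site number).
type Site = (String, String, Int)

-- Default for breakpointJump / breakpointAutoJump: do nothing,
-- just hand back the wrapped value.
defaultBreakpointJump :: Int -> [Locals] -> Site -> String -> b -> b
defaultBreakpointJump _ids _locals _site _msg wrapped = wrapped

-- Default for breakpointCondJump: the condition is ignored as well.
defaultBreakpointCondJump :: Int -> [Locals] -> Site -> String -> Bool -> b -> b
defaultBreakpointCondJump _ids _locals _site _msg _cond wrapped = wrapped

-- Hypothetical String-keyed test for "is this one of the Jump names?".
isJumpName :: String -> Bool
isJumpName n =
  n `elem` ["breakpointJump", "breakpointCondJump", "breakpointAutoJump"]
```

With defaults like these, spliced code that mentions a jump function still evaluates to the value it wraps.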
| 243 | | |
| 244 | | This is necessary because a TH function might contain a call to a breakpoint function. If the module it lives in is compiled to bytecodes, the breakpoints will be desugared to 'jumps'. Whenever this code is spliced, the linker will fail to find the jump functions unless there is a default. |
| 245 | | |
| 246 | | Why didn't I address the problem by forbidding breakpoints inside TH code? I couldn't find an easy solution for this, considering the user is free to put a manual breakpoint wherever. |
| 247 | | |
| 248 | | Why did I introduce the default as a special case in the linker? |
| 249 | | |
| 250 | | I considered other options: |
| 251 | | * Running TH splices in an extended link env. This would probably scatter breakpoint related code deep in the typechecker, and is ugly. |
| 252 | | * Making the 'jump' functions real, by giving them equations and types, maybe in the GHC.Exts module. This solution seemed fine but I wasn't sure of how this would interact with dynamic linking of 'jumps'. |
| 253 | | |
| 254 | | |
| 255 | | === A note about bindings in scope in a breakpoint === |
| 256 | | While I was trying to get the generated core for a breakpoint to lint, I made the design decision of not making available the things bound in a recursive group in the breakpoint context. This includes lets, wheres, and mdo notation. The latter case however is not enforced: I haven't found the time to work it out yet. |
| 257 | | |
| 258 | | |
| 259 | | = Dynamic Breakpoints = |
| 260 | | The approach followed here has been the well-known 'do the simplest thing that could possibly work'. We instrument the code with 'auto' breakpoints at event ''sites''. Currently, event sites are statements and code locations where names are bound: |
| 261 | | * Binding sites (top level, let/where local bindings, case alternatives, lambda abstractions, etc.) |
| 262 | | * do statements (any variant of them) |
| 263 | | |
| 264 | | The instrumentation is also done in the desugarer, which has been extended accordingly. We distinguish between 'auto' breakpoints, those introduced by the desugarer, and 'normal' breakpoints, those created by the user by calling the `breakpoint` function directly. |
| 265 | | |
| 266 | | |
| 267 | | == Overhead == |
| 268 | | The instrumentation scheme potentially introduces overhead at two stages: compile-time and run-time. Compile-time overhead is unnoticeable for general programs, although there are no benchmarks available to support this claim. Run-time overhead is much more noticeable. |
| 269 | | Run-time overhead was informally measured at between 9x and 25x, depending on the code of the program under consideration. '''This is no longer true.''' After extensive benchmarking and tweaking, overhead is down to 166% on average and 560% in the worst case, measured over the entire nofib suite. |
| 270 | | |
| 271 | | |
| 272 | | With an always-on breakpoints scenario in mind, we do a number of things to mitigate this overhead in the absence of enabled breakpoints. One of these is to allow a ghc-api client to disable auto breakpoints via the ghc-api functions: |
| 273 | | {{{ |
| 274 | | enableAutoBreakpoints :: Session -> IO () |
| 275 | | disableAutoBreakpoints :: Session -> IO () |
| 276 | | }}} |
| 277 | | |
| 278 | | GHCi would keep breakpoints disabled until the user defines the first breakpoint, and thus for normal use we could keep the -fdebugging flag enabled always. |
| 279 | | |
| 280 | | The problem is that, for `disableAutoBreakpoints` (resp. `enableAutoBreakpoints`) to be effective at all, we need to implement it by relinking the `breakpointAutoJump` function to a new "do nothing" lambda (resp. to the user-set bkptHandler). |
| 281 | | |
| 282 | | This would imply a relink, which is quite annoying to a user of GHCi since any top level bindings are lost. This is why this functionality is only a proof of concept and is disabled for now. I wish I had a better understanding of how the dynamic linker and the top level environment in ghci work. |
| 283 | | |
| 284 | | We also try to do some simple breakpoint coalescing. |
| 285 | | |
| 286 | | === Breakpoint coalescing === |
| 287 | | ''.. implemented, to be documented..'' |
| 288 | | |
| 289 | | == Modifications in the renamer == |
| 290 | | This section is easy. There are NO modifications in the renamer, other than removing Lemmih's original code for the `breakpoint` function. All the stuff that we had originally placed here was moved to the desugarer in the final stage of the project. |
| 291 | | |
| 292 | | == Modifications to the desugarer == |
| 293 | | The desugarer has been extended to carry the local scope around. It has also been extended to desugar breakpoint* to breakpoint*Jump, and to produce the dynamic breakpoint instrumentation under -fdebugging. |
| 294 | | |
| 295 | | == Passing the sitelist of a module around == |
| 296 | | After a module has been instrumented with dynamic breakpoints, the list of sites where breakpoints have been injected must be surfaced to the ghc-api. ModGuts has a new field, mg_dbg_sites, and from there the list is stored in ModDetails.md_dbg_sites. |
| 297 | | |
| 298 | | == The `Opt_Debugging` flag == |
| 299 | | This is activated on the command line via `-fdebugging` and can be disabled with `-fno-debugging`. |
| 300 | | This flag simply enables breakpoint instrumentation in the desugarer. |
| 301 | | |
| 302 | | `-fno-debugging` is different from `-fignore-breakpoints` in that, under `-fno-debugging`, user-inserted breakpoints still work. |
| 303 | | |
| 304 | | == Interrupting at exceptions == |
| 305 | | Ideally, a breakpoint that witnesses an exception would stop the execution, no questions asked. Sadly, it seems impossible to 'witness' an exception. Throw and catch are essentially primitives (throw#, throwio# and catch#). We could install an exception handler at every breakpoint site, but that: |
| 306 | | * Would add more overhead |
| 307 | | * Would require serious instrumentation to embed everything in IO, and thus |
| 308 | | * Would alter the evaluation order |
| 309 | | |
| 310 | | So it is not doable via this route. |
| 311 | | |
| 312 | | We could try to use some tricks. For instance, at every 'throw' we spot, we insert a breakpoint conditioned on that throw, and we do the same at every 'assert'. But this would see only user exceptions, missing system exceptions (pattern match failures, for instance), asynchronous exceptions and others, which is not acceptable in my opinion. |
| 313 | | |
| 314 | | I don't know if a satisfactory solution is possible with the current scheme for breakpoints. |
| 315 | | |
| 316 | | == The breakpoints api at ghc-api == |
| 317 | | Once an 'auto' breakpoint, that is, a breakpoint inserted by the desugarer, is hit, an action is taken. There are hooks to customize this behaviour in the ghc-api. The GHC module provides: |
| 318 | | {{{ |
| 319 | | |
| 320 | | setBreakpointHandler :: Session -> BkptHandler Module -> IO () |
| 321 | | |
| 322 | | data BkptHandler a = BkptHandler { |
| 323 | | -- | What to do once an enabled breakpoint is found |
| 324 | | handleBreakpoint :: forall b. Session |
| 325 | | -> [(Id,HValue)] -- * Local bindings and their id's |
| 326 | | -> BkptLocation a -- * Module and Site # |
| 327 | | -> String -- * A SrcLoc string msg |
| 328 | | -> b -- * The arg. to the breakpoint fun |
| 329 | | -> IO b |
| 330 | | -- | Implementors should return True if the breakpoint is enabled |
| 331 | | , isAutoBkptEnabled :: Session |
| 332 | | -> BkptLocation a -- * Module and Site # |
| 333 | | -> IO Bool |
| 334 | | } |
| 335 | | }}} |
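A minimal sketch of how a client might drive such a handler is given below. It is an assumption-laden mock rather than the ghc-api itself: `Handler`, `Location`, `dispatch` and `countingHandler` are hypothetical, and `Session`, `Id` and `HValue` are left out so that the example stays self-contained. The dispatch order (ask whether the site is enabled, then either run the handler or return the wrapped value untouched) is inferred from the field comments above.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for BkptLocation: module name and site number.
type Location = (String, Int)

-- Cut-down analogue of BkptHandler with the same two responsibilities.
data Handler = Handler
  { onBreak        :: forall b. Location -> String -> b -> IO b
  , handlerEnabled :: Location -> IO Bool
  }

-- What could happen when an auto breakpoint is hit: consult the enabled
-- check first, and only then call the handler.
dispatch :: Handler -> Location -> String -> b -> IO b
dispatch h loc msg wrapped = do
  enabled <- handlerEnabled h loc
  if enabled then onBreak h loc msg wrapped else pure wrapped

-- Example handler: counts hits and treats even-numbered sites as enabled.
countingHandler :: IORef Int -> Handler
countingHandler hits = Handler
  { onBreak        = \_loc _msg wrapped -> modifyIORef' hits (+ 1) >> pure wrapped
  , handlerEnabled = \(_modName, site) -> pure (even site)
  }
```

The rank-2 field mirrors the `forall b.` in `handleBreakpoint`: the handler must work for whatever type the breakpoint wraps.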
| 336 | | |
| 337 | | The Ghci debugger is a client of this API as described below. |
| 338 | | |
| 339 | | == The D in Dynamic Breakpoints == |
| 340 | | |
| 341 | | In order to implement the 'isAutoBkptEnabled' field, when a breakpoint is hit GHCi must find out whether that site is enabled or not. GHCi thus stores a boolean matrix of enabled breakpoint sites. This scheme is realized in [[GhcFile(compiler/main/Breakpoints.hs)]]: |
| 342 | | {{{ |
| 343 | | data BkptTable a = BkptTable { |
| 344 | | breakpoints :: Map.Map a (UArray Int Bool) -- *An array of breaks, indexed by site number |
| 345 | | , sites :: Map.Map a [[(SiteNumber, Int)]] -- *A list of lines, each line can have zero or more sites, which are annotated with a column number |
| 346 | | } |
| 347 | | }}} |
| 348 | | |
| 349 | | Since this structure needs to be accessed every time a breakpoint is hit, and is modified far less often in comparison, the goal is to make access as fast as possible. All of the overhead in our debugger is going to be caused by this operation. |
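The read path can be sketched as a map lookup followed by an unboxed array index. This is a simplified illustration, not the contents of Breakpoints.hs: only the `breakpoints` half of the table is modelled, and `newTable`, `isSiteEnabled` and `setEnabled` are hypothetical helper names.

```haskell
import qualified Data.Map as Map
import Data.Array.Unboxed (UArray, bounds, listArray, (!), (//))
import Data.Ix (inRange)

type SiteNumber = Int

-- Only the 'breakpoints' field of the table above is modelled here.
newtype BkptTable a = BkptTable
  { breakpoints :: Map.Map a (UArray Int Bool) }

-- Build a table with every site disabled, given (module, number of sites).
newTable :: Ord a => [(a, Int)] -> BkptTable a
newTable mods = BkptTable (Map.fromList
  [ (m, listArray (0, n - 1) (replicate n False)) | (m, n) <- mods ])

-- Hot path: one map lookup plus one array index.
-- Unknown modules and out-of-range sites count as disabled.
isSiteEnabled :: Ord a => BkptTable a -> a -> SiteNumber -> Bool
isSiteEnabled tbl m site =
  case Map.lookup m (breakpoints tbl) of
    Just arr | inRange (bounds arr) site -> arr ! site
    _                                    -> False

-- Cold path: copying the array on update is acceptable because
-- breakpoints are toggled rarely compared to how often they are hit.
setEnabled :: Ord a => Bool -> a -> SiteNumber -> BkptTable a -> BkptTable a
setEnabled on m site (BkptTable t) = BkptTable (Map.adjust upd m t)
  where
    upd arr
      | inRange (bounds arr) site = arr // [(site, on)]
      | otherwise                 = arr
```

The design favours reads: toggling a site copies one array, while checking a site never allocates.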
| 350 | | |
| 351 | | Alternative designs should be explored. (Using bits instead of Bools in the matrix? Discarding the matrix and using an IORef in every breakpoint? Some clever trick using the FFI?) Suggestions are welcome. |