Is the leaky Example Contrived?

Some words may be in order about where seqaid and leaky came from, and how this all fits together. seqaid was the first of this group of projects, started in June (2014). At that time, it was a GHC API application, and there's been a fair amount of water under the bridge since then. In July I conceived of deepseq-bounded based on considerations while pushing seqaid. August I was on hiatus, then finished deepseq-bounded, but wanted to see if I could plug it in to seqaid and maybe release them together.

Up till then, I'd been using Anatomy as my main space leak example, but it stopped leaking in GHC 7.8.1 (if not sooner). I needed a good, robust space leak example to test my work on.

At the same time, I was dissatisfied with seqaid as a GHC API executable, because I wanted something more seamless to the user, like GHC plugins. So I foolishly went down a very deep rabbit hole to re-write seqaid as a GHC plugin, which finally was only possible by also using Template Haskell and a text-level (regex) pre-processor (GHC -F option). It is seamless to the user, but the code is quite a bit less elegant than the GHC API executable. (Which wasn't elegant to begin with.) I don't regret learning TH and Core, and the several other things learned on this adventure. But I would not recommend programming in Core unless you really need to, or you know (really know) that what you need to do is well-supported that late in the compilation pipeline.

So, I was making adventures into GHC plugins, and I needed a dependable leak example. What I did was take my most recent GHC (7.8.1), and try to write a small program that exhibited a space leak, because I didn't have a single small example, although I have numerous large ones. :) I documented this process (I always keep a play-by-play text file with any project or subproject), but I won't review it now. Suffice it to say, leaky was designed to "resemble a real-world program" (in the choice of data types, and in the "steady-state, long-running" behaviour, with state-to-state evolution). I didn't rest until I had it leaking with GHC 7.8.1 under -O2, even with all strict fields in the data structures. (It still leaks in 7.8.3.)

During that battle of wits with the compiler, I wasn't really thinking about deepseq-bounded or seqaid, so the leak example was not contrived with these tools in mind (although they already existed, particularly deepseq-bounded which was finished).

Now that I had a leak example, I was ready to test my leak-plugging tools. I added a deep list to the state, so that while NFData/force could be used to plug the leak, it incurred an arbitrarily-large performance hit. In this respect, there was some contrivance, as I wanted to show how NFDataN could outperform NFData when used in a similar manner. Then of course I wanted to show how NFDataP could outperform them both. I adorned the state with some large strict blobs, which NFDataN cannot avoid, but NFDataP can. The optimal forcing pattern used with NFDataP was hand-written. To date, the optimiser part of seqaid is planned but still unimplemented.

I think the example is valid (realistic), notwithstanding these contrivances. There hasn't been time yet to put the new tools to work on my real projects. seqaid in particular is just breaking out of its shell. I'll be reporting progress as it becomes feasible.

The following shows the output of seqaid with leaky. It is also a wee bit contrived, as I sweep NFDataN N value to a fixed depth, and then the fixed (hand-optimised) pattern is developed by replaying iterated shrinkPat in reverse. But it does summarise the results nicely.

Using NFDataN.forcen N:

                                                 live      alloc  type
  N  0                                         357828    3350236    TA
  N  1                                         686376    3316512    TA
  N  2                                         909104    4942636    TA
  N  3                                        1121052    4979364    TA
  N  4                                        1301872    5560432    TA
  N  5                                        1609760   53440684    TA
  N  6                                         151460   54431296    TA
  N  7                                         139240   53374284    TA
  N  8                                         129440   53405380    TA

Using NFDataP.forcep P:

                                                 live      alloc  type
  P  .                                         457296    3341600    TA
  P  .{...}                                    698872    5220200    TA
  P  .{.{...}..}                               954252    6063164    TA
  P  .{.{...{.}}..}                           1243156    6572740    TA
  P  .{.{...{.{.#..}}}..{..}}                 1452016    8829248    TA
  P  .{.{..{.}.{.{.{.}#..{.}}}}..{..{.}}}      319744   10577588    TA
  P  .{.{..{.}.{.{.{.}#..{.}}}}..{..{.}}}      159284    8870360    TA
  P  .{.{..{.}.{.{.{.}#..{.}}}}..{..{.}}}      150004    8826904    TA
  P  .{.{..{.}.{.{.{.}#..{.}}}}..{..{.}}}      190012    8748076    TA
  P  .{.{..{.}.{.{.{.}#..{.}}}}..{..{.}}}      128232    8867404    TA

A few remarks: