Is the leaky Example Contrived?

Some words may be in order about where seqaid and leaky came from, and how this all fits together. seqaid was the first of this group of projects, started in June (2014). At that time, it was a GHC API application, and there's been a fair amount of water under the bridge since then. In July I conceived of deepseq-bounded based on considerations while pushing seqaid. August I was on hiatus, then finished deepseq-bounded, but wanted to see if I could plug it in to seqaid and maybe release them together.

Up till then, I'd been using Anatomy as my main space leak example, but it stopped leaking in GHC 7.8.1 (if not earlier). I needed a good, robust space leak example to test my work on.

At the same time, I was dissatisfied with seqaid as a GHC API executable, because I wanted something more seamless to the user, like GHC plugins. So I foolishly went down a very deep rabbit hole to re-write seqaid as a GHC plugin, which finally was only possible by also using Template Haskell and a text-level (regex) pre-processor (GHC -F option). It is seamless to the user, but the code is quite a bit less elegant than the GHC API executable. (Which wasn't elegant to begin with.) I don't regret learning TH and Core, and the several other things learned on this adventure. But I would not recommend programming in Core unless you really need to, or you know (really know) that what you need to do is well-supported that late in the compilation pipeline.

So, I was making adventures into GHC plugins, and I needed a dependable leak example. What I did was take my most recent GHC (7.8.1), and try to write a small program that exhibited a space leak, because I didn't have a single small example, although I have numerous large ones. :) I documented this process (I always keep a play-by-play text file with any project or subproject), but I won't review it now. Suffice it to say, leaky was designed to "resemble a real-world program" (in the choice of data types, and in the "steady-state, long-running" behaviour, with state-to-state evolution). I didn't rest until I had it leaking with GHC 7.8.1 under -O2, even with all strict fields in the data structures. (It still leaks in 7.8.4 and 7.10.1-rc1.)

During that battle of wits with the compiler, I wasn't really thinking about deepseq-bounded or seqaid, so the leak example was not contrived with these tools in mind (although they already existed, particularly deepseq-bounded which was finished).

Now that I had a leak example, I was ready to test my leak-plugging tools. I added a deep list to the state, so that while NFData/force could be used to plug the leak, it incurred an arbitrarily-large performance hit. In this respect, there was some contrivance, as I wanted to show how NFDataN could outperform NFData when used in a similar manner. Then of course I wanted to show how NFDataP could outperform them both. I adorned the state with some large strict blobs, which NFDataN cannot avoid, but NFDataP can. The optimal forcing pattern used with NFDataP was hand-written. To date, the optimiser part of seqaid is planned but still unimplemented.

I think the example is valid (realistic), notwithstanding these contrivances. There hasn't been time yet to put the new tools to work on my real projects. seqaid in particular is just breaking out of its shell. I'll be reporting progress as it becomes feasible.

The following shows the output of seqaid with leaky. It is also a wee bit contrived, as I sweep NFDataN N value to a fixed depth, and then the fixed (hand-optimised) pattern is developed by replaying iterated shrinkPat in reverse. But it does illustrate the sorts of effects possible, once seqaid has an optimiser.

Using NFDataN.forcen N:

                                       live      alloc  type
  N  0                               357828    3350236    TA
  N  1                               686376    3316512    TA
  N  2                               909104    4942636    TA
  N  3                              1121052    4979364    TA
  N  4                              1301872    5560432    TA
  N  5                              1609760   53440684    TA
  N  6                               151460   54431296    TA
  N  7                               139240   53374284    TA
  N  8                               129440   53405380    TA

Using NFDataP.forcep P:

                                       live      alloc  type
 P  .                                227756    1769792    TA
 P  (.                 ..     )      296620    3060904    TA
 P  ((..  .           )..     )      360508    4908612    TA
 P  ((..  (.         ))..     )      421472    5852124    TA
 P  ((..  ((.  ...  ))).(..  ))      521860    9556760    TA
 P  ((.(.)(((.)..(.)))).(.(.)))      607100   11270660    TA
 P  ((!(!)(((!).!(!))))!(!(!)))     1925776    2392040    TA
 P  ((!(!)(((!).!(!))))!(!(!)))     1529652   10297768    TA
 P  ((!(!)(((!).!(!))))!(!(!)))     1249900    2393836    TA
 P  ((!(!)(((!).!(!))))!(!(!)))      690336   13244056    TA
 P  ((!(!)(((!).!(!))))!(!(!)))      212580   13974556    TA
 P  ((!(!)(((!).!(!))))!(!(!)))      216936   11284560    TA

A sort of commentary on the change history is on a separate page.

Remarks