Grammar for Control.DeepSeq.Bounded.Pattern Language

This change is in 0.6.  All related online documents
have been revised to use the new grammar.

The current grammar for the deepseq-bounded Pattern DSL is:

  New Grammar (deepseq-bounded >=0.6)
 
    pat           [ modifiers ] pat'

    pat'        | .
                | !
                | * [ decimalint ]
                | ( { pat } )

    modifiers     zero or one of each of the eight modifier, in any order
    modifier    | =
                | +
                | ^
                | /
                | %
                | : typename { ; typename } :
                | @ decimalint
                | > permutation

    typename      string containing neither : (unless escaped), nor ;
    escaped       \\:
    decimalint    digit string not beginning with zero
    permutation   of an initial part of the lowercase alphabet, e.g. cdba

Here gold and blue are meta-syntax, red is concrete lexical syntax, and black is informal description.

Optional whitespace can go between any two tokens (basically, anyplace there is space shown in the grammar above).

Semicolons never need escaping, because they're already illegal as part of any Haskell type name.
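
To make the grammar concrete, here is a minimal recogniser sketch for the new syntax, written with ReadP from base. It is not the library's parser: the Pat, Node and Modifier types and every function name below are hypothetical, invented for this illustration, and do not mirror PatNode or PatNodeAttrs. It also assumes that the escape is a single backslash followed by a colon in the pattern string itself (spelled "\\:" inside a Haskell string literal), and that trailing whitespace in a type name is not significant.

```haskell
import Data.Char (isAsciiLower, isDigit, isSpace)
import Data.List (nub, sort)
import Text.ParserCombinators.ReadP

-- Hypothetical AST, for illustration only.
data Pat = Pat [Modifier] Node deriving (Eq, Show)

data Node = Dot | Bang | Star (Maybe Int) | Group [Pat]
  deriving (Eq, Show)

data Modifier
  = Flag Char        -- one of = + ^ / %
  | Types [String]   -- : typename { ; typename } :
  | At Int           -- @ decimalint
  | Perm String      -- > permutation
  deriving (Eq, Show)

-- Run a token parser, then skip the optional whitespace after it.
tok :: ReadP a -> ReadP a
tok p = p <* skipSpaces

sym :: Char -> ReadP Char
sym = tok . char

-- Digit string not beginning with zero.
decimalint :: ReadP Int
decimalint = tok $ do
  d  <- satisfy (\c -> isDigit c && c /= '0')
  ds <- munch isDigit
  return (read (d : ds))

-- Anything up to an unescaped ':' or a ';'; "\:" yields a literal colon.
-- Assumption: trailing whitespace is trimmed from the name.
typename :: ReadP String
typename = trimEnd <$> many1 (escaped <++ plain)
  where
    escaped = string "\\:" >> return ':'
    plain   = satisfy (`notElem` ":;")
    trimEnd = reverse . dropWhile isSpace . reverse

modifier :: ReadP Modifier
modifier = choice
  [ Flag  <$> tok (satisfy (`elem` "=+^/%"))
  , Types <$> between (sym ':') (sym ':') (typename `sepBy1` sym ';')
  , At    <$> (sym '@' *> decimalint)
  , Perm  <$> (sym '>' *> tok (munch1 isAsciiLower))
  ]

-- At most one modifier of each kind is allowed.
distinctKinds :: [Modifier] -> Bool
distinctKinds ms = length (nub ks) == length ks
  where
    ks = map kind ms
    kind (Flag c)  = c
    kind (Types _) = ':'
    kind (At _)    = '@'
    kind (Perm _)  = '>'

-- A permutation of an initial part of the lowercase alphabet.
validPerm :: String -> Bool
validPerm s = sort s == take (length s) ['a' .. 'z']

pat :: ReadP Pat
pat = do
  ms <- many modifier
  if distinctKinds ms && and [validPerm p | Perm p <- ms]
    then Pat ms <$> node
    else pfail

node :: ReadP Node
node = choice
  [ Dot   <$  sym '.'
  , Bang  <$  sym '!'
  , Star  <$> (sym '*' *> ((Just <$> decimalint) <++ return Nothing))
  , Group <$> between (sym '(') (sym ')') (many pat)
  ]

-- Accept a string only if it is exactly one complete pattern.
parsePat :: String -> Maybe Pat
parsePat s = case readP_to_S (skipSpaces *> pat <* eof) s of
  [(p, "")] -> Just p
  _         -> Nothing
```

With this sketch, parsePat "=:Maybe Int;Bool:(.!)" accepts a parenthesised node carrying an = modifier and a two-name type constraint, while "==." (repeated modifier), ">bd." (not a permutation of an initial part of the alphabet) and "*0" (decimalint beginning with zero) are all rejected.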

The semantics are given formally in the PatNode and PatNodeAttrs documentation, as well as informally in the examples, and through the project homepage.


Notable changes from 0.5.5 are:

There were also a few things that were almost changed, but this was decided against for 0.6. These may or may not end up in 0.7.

Up until version 0.7, flags are available to revert to the old grammar (or to turn off certain changes only), while allowing you to continue to enjoy some of the code improvements since 0.5.5. However, use of the new grammar is strongly encouraged: the old grammar is deprecated, and support will probably be dropped in 0.7.


The rest of this page details earlier versions of the grammar, and discusses some rationales behind the changes leading to the above, present syntax.

First, we have the grammar for all versions of deepseq-bounded up to and including 0.5.5:

  Old Grammar (deepseq-bounded <0.6)
 
    pat           [ = ] . [ { { pat } } ]
                | ( [ = ] * [ decimalint ] | # )
                | .: ctorname { space ctorname } { { pat } }
                | ( * [ decimalint ] | # ) : typename { space typename } {}

    ctorname      string not containing whitespace
    typename      string not containing whitespace
    decimalint    digit string not beginning with zero
    space         space character, ASCII 0x20 (decimal 32)

And below is a simpler variant of the language that I'm in the process of changing over to, starting at the upcoming deepseq-bounded-0.6.0.0. It has less verbose concrete syntax, so the pattern strings are more concise with higher information density.

The difference is mostly cosmetic, although the new grammar is slightly more expressive, permitting = on any pattern, not only the ones it would seem worthwhile to parallelise.

This could even be useful, since a single node might represent an expensive computation, so if we wanted to force this node, it might pay to parallelise the forcing. And =# might make sense to measure parallelisation overhead. Essentially, there's no use complicating the language by constraining it: there are always possible uses just beyond the horizon of consideration.
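
As a rough illustration of why parallel forcing of a single node can pay off, here is a tiny sketch using par and pseq from GHC.Conc (in base). This is not how the library implements =; the function name forcePair is hypothetical, and the sketch only shows the general idea of sparking one expensive evaluation while the current thread carries on with other work.

```haskell
import GHC.Conc (par, pseq)

-- Hypothetical helper, not part of deepseq-bounded.
-- Spark evaluation of x (to weak head normal form) in parallel,
-- force y on the current thread, then return both.
forcePair :: a -> b -> (a, b)
forcePair x y = x `par` (y `pseq` (x, y))
```

If x is cheap, the spark is pure overhead, which is the kind of cost one might want to measure.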

  Transitional Grammar (unpublished deepseq-bounded version)
 
    pat           [ modifiers ] pat'

    pat'        | #
                | .
                | * [ decimalint ]
                | { { pat } }
                | : ctorname { : ctorname } { { pat } }
                | ( # | * [ decimalint ] ) :: typename { : typename } {}

    ctorname      typename
    typename      string containing neither : (unless escaped), nor {
    escaped       \:
    decimalint    digit string not beginning with zero
    modifiers     zero or one of each of the seven modifier, in any order
    modifier    | =
                | +
                | ^
                | /
                | %
                | @ decimalint
                | > permutation
    permutation   of an initial part of the lowercase alphabet, e.g. cdba

It still requires some thought, whether # should be allowed a type constraint, and maybe a few other wee questions of that nature, before stabilising the language until at least version 1.*. The language may grow (for example, we may add pseq node types as we did for par), but the core presented here should remain valid and effectual.

Note that if we allow merely : instead of :: for the typename-constrained case, there are two possible parses of
   #:(Int,Bool){}
namely, using the old syntax to resolve the ambiguity for the sake of illustration:
   #.:(Int,Bool){}     -- two nodes, # and .
versus
   #:(Int,Bool){}      -- one type-qualified # node
We could also keep the . just in type-qualified contexts (.:), but that would overload . unpleasantly, besides placing the burden of verbosity on the (perhaps) more common TR nodes.

So, we need an extra : in one production alternative, and shed . in another. But the latter is the most frequent pat alternative ({ { pat } }), and with only two characters instead of the three (.{ { pat } }) of the old grammar, the result is noticeably trimmer concrete syntax.

It would be possible to keep single-colon type qualification designators in all cases, if we required * and # to be written *{} and #{}. That is very tempting, except it feels harsh to give up the plain simplicity and symmetry of the bare symbols, and the {}'s are, after all, completely perfunctory and, far from conveying useful meaning, actually falsely suggest we are matching only unary nodes. (In the * case this is not logically a likely idea, but in the # case it can be.) And you have to consider also, what part of the syntax are you making less elegant, in exchange for more elegant type constraint syntax? Because I haven't seen a huge motivation to use type-constrained pattern nodes yet!...

Yet another, comparatively menial consideration which led to a last-minute concrete syntax change today: (Re)using colon as type list separator (was whitespace before). This is preferable to whitespace, for the simple reason that whitespace is more common in type rep strings ("Maybe Int", etc.) than is colon, so the more common symbol (whitespace) should be allowed un-escaped. Colon is the most economical choice (even including whitespace!) since semantically, colon is already being used exclusively to signal the beginning of such a type list!

Note that, in any case (and this wasn't appreciated before), when you're parsing a type list for a constraint, the parsing context is specific to that, and you can treat things differently -- you're only waiting for the stop character (or separator, or escape). So in particular, whitespace could have reserved special meaning within type constraints, and yet be used freely (say for vertical alignment of constant pattern strings for visual HCI convenience) to space the other characters in the pattern.
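
The "only waiting for the separator or escape" idea can be sketched as a small scanner over the body of a type list. This is an illustration under stated assumptions, not the library's code: the name splitTypeList is hypothetical, the separator is passed in as a parameter, and a backslash is treated as an escape only when it immediately precedes the separator. Whitespace inside a type name is kept untouched.

```haskell
-- Hypothetical helper, not part of deepseq-bounded.
-- Split the body of a type list on each unescaped separator.
-- A backslash directly before the separator yields a literal separator.
splitTypeList :: Char -> String -> [String]
splitTypeList sep = go ""
  where
    -- acc holds the current type name, reversed.
    go acc [] = [reverse acc]
    go acc ('\\' : c : rest)
      | c == sep = go (c : acc) rest
    go acc (c : rest)
      | c == sep  = reverse acc : go "" rest
      | otherwise = go (c : acc) rest
```

For example, splitTypeList ':' "Maybe Int:Either a b" gives the two names "Maybe Int" and "Either a b", with the inner spaces preserved and no escaping needed.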

Yes, I'd like to have that whitespace behaviour in.

This does mean the [currently enabled!]  ' '-for-'#'  thing must go... (I do like it faded to light grey in the HTML docs though; keep that.)

The next rendition changed some concrete lexemes, among other things:

  Penultimate Grammar (deepseq-bounded 0.6.0.0 candidate)
 
    pat           [ modifiers ] pat'

    pat'        | .
                | !
                | * [ decimalint ]
                | ( { pat } )
                | : ctorname { ; ctorname } : ( { pat } )
                | ( . | * [ decimalint ] ) :: typename { ; typename } :

    ctorname      typename
    typename      string containing neither : (unless escaped), nor ;
    escaped       \:
    decimalint    digit string not beginning with zero
    modifiers     zero or one of each of the seven modifier, in any order
    modifier    | =
                | +
                | ^
                | /
                | %
                | @ decimalint
                | > permutation
    permutation   of an initial part of the lowercase alphabet, e.g. cdba

And the much trimmer final grammar you find at the top of the page was the result of making type-constraint just another prefix modifier.