Pattern DSL for deepseq-bounded

Grammar for `Control.DeepSeq.Bounded.Pattern` Language

The current grammar for the deepseq-bounded Pattern DSL is:

  New Grammar (deepseq-bounded >=0.6) 
  pat          →  [ modifiers ] pat' 
  pat'         →  |  .  |  !  |  * [ decimalint ]  |  ( { pat } ) 
  modifiers    →  zero or one of each of the eight modifiers, in any order
  modifier     →  |  =  |  +  |  ^  |  /  |  %
                 →|  : typename { ; typename } :
                 →|  @ decimalint
                 →|  > permutation 
  typename     →  string containing neither : (unless escaped), nor ;
  escaped      →  \\:
  decimalint   →  digit string not beginning with zero
  permutation  →  of an initial part of the lowercase alphabet, e.g. cdba

Here gold and blue are meta-syntax, red is concrete lexical syntax, and black is informal description.

{…} means "zero or more repetitions of the enclosed".
[…] means "zero or one occurrence of the enclosed".
…|… signifies a group of two or more alternatives, exactly one of which must be selected. This important metasyntax is more brightly tinted, to help it stand out.
(…) groups meta-expressions, useful for nested alternation.

Optional whitespace can go between any two tokens (basically, anyplace there is space shown in the grammar above).
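To make the core of the grammar concrete, here is a minimal recursive-descent sketch for pat' alone (modifiers are ignored). The type Pat and the functions parsePat, parseGroup and parseWhole are hypothetical names chosen for illustration. They are not part of the deepseq-bounded API, whose real representation is PatNode.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical AST for the unmodified core of pat'; these constructor
-- names are illustrative and are not the library's PatNode type.
data Pat
  = Dot                 -- "."
  | Bang                -- "!"
  | Star (Maybe Int)    -- "*" with an optional decimalint
  | Group [Pat]         -- "(" { pat } ")"
  deriving (Eq, Show)

-- Parse one pat', skipping optional leading whitespace, and return the
-- node together with the unconsumed input.
parsePat :: String -> Maybe (Pat, String)
parsePat s = case dropWhile isSpace s of
  '.' : rest -> Just (Dot, rest)
  '!' : rest -> Just (Bang, rest)
  '(' : rest -> parseGroup rest
  '*' : rest -> case span isDigit (dropWhile isSpace rest) of
    ("", _)      -> Just (Star Nothing, rest)
    ('0' : _, _) -> Nothing  -- decimalint may not begin with zero
    (ds, rest')  -> Just (Star (Just (read ds)), rest')
  _ -> Nothing

-- Parse zero or more patterns up to the closing parenthesis.
parseGroup :: String -> Maybe (Pat, String)
parseGroup = go []
  where
    go acc s = case dropWhile isSpace s of
      ')' : rest -> Just (Group (reverse acc), rest)
      s'         -> do
        (p, rest) <- parsePat s'
        go (p : acc) rest

-- Accept a string only if it is exactly one pattern plus whitespace.
parseWhole :: String -> Maybe Pat
parseWhole s = case parsePat s of
  Just (p, rest) | all isSpace rest -> Just p
  _                                 -> Nothing
```

Under these assumptions, parseWhole "(!.*2)" and parseWhole "( ! . * 2 )" yield the same tree, since whitespace between tokens is skipped.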

Semicolons never need escaping, because they're already illegal as part of any Haskell type name.
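As an illustration of the typename rules, the following sketch scans a list of the form : typename { ; typename } : starting just after the opening colon. The function name scanTypeList is hypothetical, and the sketch assumes the escape is a single backslash followed by a colon. It does not reject empty names.

```haskell
-- Scan a type-name list, starting just after its opening colon.
-- Returns the names and the input remaining after the closing colon.
-- Assumption: an escaped colon is written as backslash, colon.
scanTypeList :: String -> Maybe ([String], String)
scanTypeList = go [] ""
  where
    -- names: finished names (reversed); cur: current name (reversed)
    go names cur input = case input of
      '\\' : ':' : rest -> go names (':' : cur) rest         -- escaped colon
      ':' : rest        -> Just (reverse (reverse cur : names), rest)
      ';' : rest        -> go (reverse cur : names) "" rest  -- separator
      c : rest          -> go names (c : cur) rest
      []                -> Nothing                           -- unterminated list
```

Whitespace is kept as part of a name, so a type such as Maybe Int needs no escaping.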

The semantics are given formally in the PatNode and PatNodeAttrs documentation, as well as informally in the examples, and through the project homepage.

Notable changes from 0.5.5 are:

. became ! (WS pattern nodes)
# became . (WI pattern nodes)
braces ({…}) became parentheses ((…))
semicolon (instead of space) used to separate typenames
() (empty subpattern group, formerly using braces of course) is no longer used to terminate type constraint lists for TI, TN, and TW pattern nodes; rather, a single colon is used
a single colon is now also required to terminate typename lists in TR pattern nodes, which formerly were implicitly closed by the opening brace of the ensuing subpattern group (which must be present)
the language was enriched with seven new, prefix modifiers (called attributes in the API; refer to PatNodeAttrs)
type-constraint was made simply an eighth prefix modifier (formerly postfix; now only depth for *N nodes is postfix).
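The first three changes in this list are purely lexical, so for patterns without type constraints they can be sketched as a character-level rewrite. The function translateCore below is a hypothetical helper, not something the library provides. It assumes the old pattern has no whitespace between . and {, and it does not handle any form containing a colon.

```haskell
-- Rewrite an old-grammar (<0.6) pattern string without type constraints
-- into the new concrete syntax. Hypothetical helper for illustration.
translateCore :: String -> String
translateCore ('.' : '{' : rest) = '(' : translateCore rest  -- .{ opens a group
translateCore ('.' : rest)       = '!' : translateCore rest  -- lone . became !
translateCore ('#' : rest)       = '.' : translateCore rest  -- # became .
translateCore ('}' : rest)       = ')' : translateCore rest  -- } closes a group
translateCore (c : rest)         = c : translateCore rest    -- = * digits unchanged
translateCore []                 = []
```

For instance, under these assumptions the old pattern .{.#*2} becomes (!.*2).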

A few other changes were considered but decided against for 0.6. These may or may not end up in 0.7.

the WS pattern node was almost removed from the language, since it is expressible as *1, but it is too convenient in testing and examples to sacrifice a single-character designation
the T* pattern nodes are probably going to get absorbed by PatNodeAttrs in a way analogous to what happened to = (doSpark)

Up until version 0.7, flags are available to revert to the old grammar (or to turn off certain changes only), while allowing you to continue to enjoy some of the code improvements since 0.5.5. However, use of the new grammar is strongly encouraged: the old grammar is deprecated, and support will probably be dropped in 0.7.

The rest of this page details earlier versions of the grammar, and discusses some rationales behind the changes leading to the above, present syntax.

First, we have the grammar for all versions of deepseq-bounded up to and including 0.5.5:

  Old Grammar (deepseq-bounded <0.6) 
  pat →  [ = ] . [ { { pat } } ]
      |  ( [ = ] * [ decimalint ]  |  # )
      |  .: ctorname { space ctorname } { { pat } }
      |  ( * [ decimalint ]  |  # ) : typename { space typename } {}
  ctorname → string not containing whitespace
  typename → string not containing whitespace
  decimalint → digit string not beginning with zero
  space → space character, ASCII 0x20

And below is a simpler variant of the language that I'm in the process of changing over to, starting at the upcoming deepseq-bounded-0.6.0.0. It has less verbose concrete syntax, so the pattern strings are more concise with higher information density.

The difference is mostly cosmetic, although the new grammar is slightly more expressive, permitting = on any pattern, not only the ones it would seem worthwhile to parallelise.

This could even be useful, since a single node might represent an expensive computation, so if we wanted to force this node, it might pay to parallelise the forcing. And =# might make sense to measure parallelisation overhead. Essentially, there's no use complicating the language by constraining it: there are always possible uses just beyond the horizon of consideration.

  Transitional Grammar (unpublished deepseq-bounded version) 
  pat          →  [ modifiers ] pat' 
  pat'         →  |  #  |  .  |  * [ decimalint ]  |  { { pat } }
                 →|  : ctorname { : ctorname } { { pat } }
                 →|  ( #  |  * [ decimalint ] ) :: typename { : typename } {} 
  ctorname     →  typename
  typename     →  string containing neither : (unless escaped), nor {
  escaped      →  \:
  decimalint   →  digit string not beginning with zero
  modifiers    →  zero or one of each of the seven modifiers, in any order
  modifier     →  |  =  |  +  |  ^  |  /  |  %
                 →|  @ decimalint
                 →|  > permutation
  permutation  →  of an initial part of the lowercase alphabet, e.g. cdba

It still requires some thought, whether # should be allowed a type constraint, and maybe a few other wee questions of that nature, before stabilising the language until at least version 1.*. The language may grow — for example, we may add pseq node types as we did for par — but the core presented here should remain valid and effectual.

Note that if we allow merely : instead of :: for the typename-constrained case, there are two possible parses of

   #:(Int,Bool){}

namely, using the old syntax to resolve the ambiguity for sake of illustration:

   #.:(Int,Bool){}     -- two nodes, # and .

versus

   #:(Int,Bool){}      -- one type-qualified # node

We could also keep the . just in type-qualified contexts .:, but that would overload . unpleasantly, besides placing the burden of verbosity on the (perhaps) more common TR nodes.

So, we need an extra : in one production alternative, and shed . in another. But the latter is the most frequent pat alternative ({ { pat } }), and with only two characters instead of the three (.{ { pat } }) of the old grammar, the result is noticeably trimmer concrete syntax.

It would be possible to keep single-colon type qualification designators in all cases, if we required * and # to be written *{} and #{}. That is very tempting, except it feels harsh to give up the plain simplicity and symmetry of the bare symbols, and the {}'s are, after all, completely perfunctory and, far from conveying useful meaning, actually falsely suggest we are matching only unary nodes. (In the * case that reading is unlikely, but in the # case it is quite possible.) And you also have to consider what part of the syntax you are making less elegant in exchange for more elegant type constraint syntax, because I haven't yet seen a huge motivation to use type-constrained pattern nodes!

Yet another, comparatively menial consideration which led to a last-minute concrete syntax change today: (Re)using colon as type list separator (was whitespace before). This is preferable to whitespace, for the simple reason that whitespace is more common in type rep strings ("Maybe Int", etc.) than is colon, so the more common symbol (whitespace) should be allowed un-escaped. Colon is the most economical choice (even including whitespace!) since semantically, colon is already being used exclusively to signal the beginning of such a type list!

Note that, in any case (and this wasn't appreciated before), when you're parsing a type list for a constraint, the parsing context is specific to that, and you can treat things differently -- you're only waiting for the stop character (or separator, or escape). So in particular, whitespace could have reserved special meaning within type constraints, and yet be used freely (say for vertical alignment of constant pattern strings for visual HCI convenience) to space the other characters in the pattern.

Yes, I'd like to get that whitespace handling in.

This does mean the [currently enabled!] ' '-for-'#' thing must go... (I do like it faded to light grey in the HTML docs though; keep that.)

The next rendition changed some concrete lexemes, among other things:

  Penultimate Grammar (deepseq-bounded 0.6.0.0 candidate) 
  pat          →  [ modifiers ] pat' 
  pat'         →  |  .  |  !  |  * [ decimalint ]  |  ( { pat } )
                 →|  : ctorname { ; ctorname } : ( { pat } )
                 →|  ( .  |  * [ decimalint ] ) :: typename { ; typename } : 
  ctorname     →  typename
  typename     →  string containing neither : (unless escaped), nor ;
  escaped      →  \:
  decimalint   →  digit string not beginning with zero
  modifiers    →  zero or one of each of the seven modifiers, in any order
  modifier     →  |  =  |  +  |  ^  |  /  |  %
                 →|  @ decimalint
                 →|  > permutation
  permutation  →  of an initial part of the lowercase alphabet, e.g. cdba

And the much trimmer final grammar you find at the top of the page was the result of making type-constraint just another prefix modifier.
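One way to see why this simplifies things: with the type constraint treated as a prefix modifier, a single loop can consume all eight modifier kinds before the core pattern is parsed. The sketch below uses a hypothetical function, scanModifiers, which identifies each modifier by its leading character and enforces the at-most-once rule. It is a simplification: it does not honour escaped colons inside the type list or reject a decimalint with a leading zero, and a malformed modifier is simply left in the remaining input.

```haskell
import Data.Char (isDigit, isLower, isSpace)

-- Split off the modifier prefix of a pattern string. Each modifier is
-- tagged by its leading character and paired with its argument text
-- (empty for the five single-character modifiers).
scanModifiers :: String -> Maybe ([(Char, String)], String)
scanModifiers = go []
  where
    sp = dropWhile isSpace
    go seen input = case sp input of
      c : rest
        | c `elem` "=+^/%" -> add seen c "" rest
        | c == '@', (ds@(_ : _), rest') <- span isDigit (sp rest) -> add seen c ds rest'
        | c == '>', (ps@(_ : _), rest') <- span isLower (sp rest) -> add seen c ps rest'
        | c == ':', (ts, ':' : rest') <- break (== ':') rest      -> add seen c ts rest'
      other -> Just (reverse seen, other)  -- no further modifier: prefix ends here
    add seen c arg rest
      | c `elem` map fst seen = Nothing    -- each modifier at most once
      | otherwise             = go ((c, arg) : seen) rest
```

The remaining input returned by scanModifiers is what a parser for pat' would then consume.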
