A module to construct indentation-aware parsers. Many programming languages have indentation-based syntax rules, e.g. Python and Haskell. This module exports combinators to create such parsers. This is a rewrite of the IndentParser package. Some function names have changed, and the underlying code has been simplified.
The input source can be thought of as a list of tokens. Abstractly, each token occurs at a line and a column and has a width. The column number of a token measures its indentation. If t1 and t2 are two tokens, we say that the indentation of t1 is more than that of t2 if the column number at which t1 occurs is greater than that of t2.
Currently this module supports two kinds of indentation-based syntactic structures, which we now describe:
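The token model above can be made concrete with a small sketch. It is not part of this module: the names Token, tokWidth, indentation, moreIndented and tokenize are hypothetical, and the tokeniser simply splits on spaces and does not handle tabs.

```haskell
-- Hypothetical token model; these names are illustrative only.
data Token = Token
  { tokText :: String
  , tokLine :: Int   -- 1-based line of the first character
  , tokCol  :: Int   -- 1-based column of the first character
  } deriving (Show, Eq)

-- Width of a token, taken here to be the length of its text.
tokWidth :: Token -> Int
tokWidth = length . tokText

-- The indentation of a token is its column number.
indentation :: Token -> Int
indentation = tokCol

-- t1 is more indented than t2 when it starts at a greater column.
moreIndented :: Token -> Token -> Bool
moreIndented t1 t2 = indentation t1 > indentation t2

-- Split a source string into space-separated tokens with positions.
-- Assumption: only the space character separates tokens (no tabs).
tokenize :: String -> [Token]
tokenize src = concat (zipWith lineTokens [1 ..] (lines src))
  where
    lineTokens ln = go 1
      where
        go _ [] = []
        go col s@(c : cs)
          | c == ' '  = go (col + 1) cs
          | otherwise =
              let (w, rest) = break (== ' ') s
              in Token w ln col : go (col + length w) rest
```

For the two-line input "f x" followed by "  y", the token y starts at column 3 and is therefore more indented than f, which starts at column 1.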
- Block: A block of indentation c is a sequence of tokens with indentation at least c. An example of a block is a where clause of Haskell with no explicit braces.
- Line fold: A line fold starting at line l and indentation c is a sequence of tokens that starts at line l and possibly continues to subsequent lines as long as the indentation is greater than c. Such a sequence of lines needs to be folded into a single line. An example is MIME headers. Line-folding-based binding separation is used in Haskell as well.
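The two definitions can be read as predicates over a list of positioned tokens. The following is a minimal sketch of that reading, not this module's implementation: Tok, block and lineFold are hypothetical names, and each function returns the matched tokens together with the rest of the input.

```haskell
-- Hypothetical positioned token; not a type exported by this module.
data Tok = Tok
  { tLine :: Int     -- line on which the token starts
  , tCol  :: Int     -- column on which the token starts
  , tText :: String
  } deriving (Show, Eq)

-- A block of indentation c: the longest run of tokens whose column
-- is at least c.
block :: Int -> [Tok] -> ([Tok], [Tok])
block c = span (\t -> tCol t >= c)

-- A line fold starting at line l with indentation c: tokens on line l,
-- then tokens on later lines as long as their column is greater than c.
lineFold :: Int -> Int -> [Tok] -> ([Tok], [Tok])
lineFold l c = span (\t -> tLine t == l || (tLine t > l && tCol t > c))
```

Under this reading, a MIME-style header whose first line starts at column 1 and whose continuation line starts at column 3 forms one line fold with l = 1 and c = 1; the next header, starting again at column 1, ends it.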
For indentation-based grammars, the following should hold:
- Combinators for skipping whitespace/comments should skip spaces and comments no matter what the indentation is.
- All tokenisers of the language should check for indentation. The tokeniser combinator makes its input parser indentation aware. Use it on all tokenisers of the language.
- All tokenisers themselves should skip trailing whitespace and comments, i.e. they should be lexeme parsers. Otherwise there will be a problem matching the next token.
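The three rules above can be illustrated with plain Parsec. This is only a sketch under stated assumptions and does not use or reproduce this module's combinators: skipJunk, indentAware, lexeme, identifier and identifiersBeyond are hypothetical names, the reference indentation is kept in the Parsec user state, a token is accepted only when it starts at a column greater than that reference, and "--" line comments are chosen purely for illustration.

```haskell
import Control.Monad (when)
import Text.Parsec

-- Assumption: the user state is the reference indentation (a column).
type Parser = Parsec String Int

-- Rule 1: skip whitespace and comments regardless of indentation.
skipJunk :: Parser ()
skipJunk = skipMany (skipMany1 space <|> lineComment)
  where
    lineComment = try (string "--") *> skipMany (noneOf "\n")

-- Rule 2: make a parser indentation aware. It fails, without consuming
-- input, unless the token starts beyond the reference column.
indentAware :: Parser a -> Parser a
indentAware p = do
  ref <- getState
  c   <- sourceColumn <$> getPosition
  when (c <= ref) $ parserFail "token is not indented enough"
  p

-- Rule 3: a lexeme parser checks indentation, parses the token, and
-- then skips trailing whitespace and comments.
lexeme :: Parser a -> Parser a
lexeme p = indentAware p <* skipJunk

identifier :: Parser String
identifier = lexeme (many1 letter)

-- Collect identifiers that start at a column greater than ref.
identifiersBeyond :: Int -> String -> Either ParseError [String]
identifiersBeyond ref = runParser (skipJunk *> many identifier) ref "<input>"
```

Because every lexeme skips trailing whitespace and comments, the next token's column is always inspected at its first character. That is why the indentation check can be done inside the tokeniser rather than in the grammar.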
Generating indentation-aware tokenisers can be tricky. One can use the module Text.Parsec.IndentParsec.Token for this.