IndentParser-0.2.1: Combinators for parsing indentation based syntatic structures




A module to construct indentation aware parsers. Many programming language have indentation based syntax rules e.g. python and Haskell. This module exports combinators to create such parsers.

The input source can be thought of as a list of tokens. Abstractly each token occurs at a line and a column and has a width. The column number of a token measures is indentation. If t1 and t2 are two tokens then we say that indentation of t1 is more than t2 if the column number of occurrence of t1 is greater than that of t2.

Currently this module supports two kind of indentation based syntactic structures which we now describe:

A block of indentation c is a sequence of tokens with indentation at least c. Examples for a block is a where clause of Haskell with no explicit braces.
Line fold
A line fold starting at line l and indentation c is a sequence of tokens that start at line l and possibly continue to subsequent lines as long as the indentation is greater than c. Such a sequence of lines need to be folded to a single line. An example is MIME headers. Line folding based binding separation is used in Haskell as well.

The module exports three combinators are indentParser, block and lineFold. To construct parsers for indentation based grammars one typically applies the indentParser. A block can then be parsed using the combinator block and a line fold using lineFold. Generating indentation aware tokenisers could be tricky. Given a language description via the Text.ParserCombinators.Parsec.Language.LanguageDef record use module Text.ParserCombinators.Parsec.IndentParser.Token to generate its tokeiser (this will apply indentParser on all tokenisers and then the user can forget about indentParser combinator).


Internally indentations are implemented using Parser states. If one wants to use parser states as well then use the getState and setState functions exported by this module instead of those exported from the parsec library. Also use the parseTest and runParser function exported from this module instead of the one exported from Parsec.


Parser type.

type IndentParser tok st a = GenParser tok (st, IndentState) aSource

An indentation aware parser.

data IndentMode Source

The mode of the indentation parser.



Ignore indentation


In block mode


In line fold mode


indentParser :: IndentParser tok st a -> IndentParser tok st aSource

The combinator indentParser makes its input parser indentation aware. Usually one would want to make all the tokenisers indentation aware.

noIndent :: IndentParser tok st a -> IndentParser tok st aSource

The parser noIndent p runs p ignoring any indentation based structure. This can be used to parse for example an explicitly braced where clause in Haskell.

block :: IndentParser tok st a -> IndentParser tok st aSource

The parser block p parses a block of p.

lineFold :: IndentParser tok st a -> IndentParser tok st aSource

The parser lineFold p parses a folded line of p.

betweenOrBlock :: IndentParser tok st open -> IndentParser tok st close -> IndentParser tok st a -> IndentParser tok st aSource

The parser betweenOrBlock open close p parses p between open and close. If open is matched p is parsed in NoIndent mode otherwise a block p is parsed in Block mode. For eg. the parser for parsing haskell where clause would look like

 whereClause = do reserved where; betweenOrBlock bindings

betweenOrLineFold :: IndentParser tok st open -> IndentParser tok st close -> IndentParser tok st a -> IndentParser tok st aSource

Similar to betweenOrBlock but uses lineFold instead of block.

User state manipulation.

getState :: IndentParser tok st stSource

Gets the current user state.

setState :: st -> IndentParser tok st ()Source

Sets the user state.

Running and testing.



:: IndentParser tok st a

the parser to be run

-> st

the initial state

-> IndentMode

the indentation mode

-> SourceName

the source file name

-> [tok]

the list of tokens

-> Either ParseError a

the result of parsing

The most generic way to run an IndentParser. Use parseTest for testing your parser instead.



:: IndentParser tok () a

The parser to run

-> SourceName

The name of the source (to report errors)

-> [tok]

The input to the parser

-> Either ParseError a 

Runs the given parser on a given input stream and returns either the result or parse error.



:: IndentCharParser () a

The parser to run

-> SourceName

The file on which to run.

-> IO (Either ParseError a) 

Like parse but use the contents of SourceName as the input tokens.



:: Show a 
=> IndentParser tok () a

The parser to test

-> [tok]

The input to the parser

-> IO () 

Runs the input parser on the given stream and prints the result. Useful for testing parsers.