A module for constructing indentation aware tokeniser that can be used
in conjuction with
. All the
combinator takes a Text.ParserCombinators.Parsec.Token
as its first argument. For every field Text.ParserCombinators.Parsec.Token.TokenParser
foo
of
this module
exports a combinator Text.ParserCombinators.Parsec.Token.TokenParser
foo
. To define a tokeniser for an indentation
based language a user first defines the appropriate
record, applies
the combinator Text.ParserCombinators.Parsec.Language.LanguageDef
to get a Text.ParserCombinators.Parsec.Token.makeTokenParser
record
say Text.ParserCombinators.Parsec.Token.TokenParser
tokP
and then, instead of selecting the field foo
of tokP
,
applies the combinator foo
exported from this module to tokP
. The
semantics of the combinator foo
is essentially same as that of the
field foo
of
but
the returned parsers are indentation aware. Apart from these there are
certain new combinators that are defined specifically for parsing
certain indentation based syntactic constructs. (We have not defined
squares use brackets instead)
Text.ParserCombinators.Parsec.Token.TokenParser
There are two important classes of parser combinator exported by this module:
- Grouping Parser Combinator
- A grouping parser combinator takes as
input a parser say
p
and returns a parser that parsesp
between two grouping delimiters. There are three flavours of grouping parsers:foo
,fooOrBlock
andfooOrLineFold
wherefoo
can be one ofangles
,braces
,parens
,brackets
. To illustrate we takefoo
to bebraces
. The parser
parsesbraces
tokP pp
delimited by '{' and '}'. In this casep
does not care about indentation (i.e. the parserp
is run in
mode). The parserNoIndent
is likebracesOrBlock
tokP pbraces tokP p
but if no explicit delimiting braces are given parsesp
within an indented block. Similarly
parsesbracesOrLineFold
tokP pp
between '{' and '}' and uses line fold when no explicit braces are given. These can be two varients can be defined as follows
bracesOrBlock tokP p = braces tokP p <|> block p bracesOrLineFold tokP p = braces tokP p <|> lineFold p
- Seperator Parser Combinator
- A seperator parser combinator takes as
input a parser say
p
and returns a parser that parses a list ofp
seperated by a seperator. The module exports the combinatorsfooSep
,fooSep1
,fooOrNewLineSep
andfooOrNewLineSep1
, wherefoo
is eithersemi
(in which case the seperator is a semicolon ';') orcomma
(in which case the seperator is a comma ',').
To illustrate the use of this module we now give, as an incomplete example, a parser that parses a where clause in Haskell which illustrates the use of this module.
import qualified Text.ParserCombinators.Parsec.Language as L import qualified Text.ParserCombinators.Parsec.Toke as T import qualified Text.ParserCombinator.Parsec.IndentToken as IT
tokP = T.makeTokenParser L.haskellDef semiOrNewLineSep = IT.semiOrNewLineSep tokP bracesOrBlock = IT.bracesOrBlock tokP identifier = IT.identifier tokP .... symbol = IT.symbol tokP
binding = semiOrNewLineSep bind bind = do id <- identifier symbol (char '=') e <- expr return (id,e) whereClause = do reserved "where"; braceOrBlock binding
- type IndentCharParser st a = IndentParser Char st a
- type TokenParser st = TokenParser (st, IndentState)
- identifier :: TokenParser st -> IndentCharParser st String
- reserved :: TokenParser st -> String -> IndentCharParser st ()
- operator :: TokenParser st -> IndentCharParser st String
- reservedOp :: TokenParser st -> String -> IndentCharParser st ()
- charLiteral :: TokenParser st -> IndentCharParser st Char
- stringLiteral :: TokenParser st -> IndentCharParser st String
- natural :: TokenParser st -> IndentCharParser st Integer
- integer :: TokenParser st -> IndentCharParser st Integer
- float :: TokenParser st -> IndentCharParser st Double
- naturalOrFloat :: TokenParser st -> IndentCharParser st (Either Integer Double)
- decimal :: TokenParser st -> IndentCharParser st Integer
- hexadecimal :: TokenParser st -> IndentCharParser st Integer
- octal :: TokenParser st -> IndentCharParser st Integer
- semi :: TokenParser st -> IndentCharParser st String
- colon :: TokenParser st -> IndentCharParser st String
- dot :: TokenParser st -> IndentCharParser st String
- comma :: TokenParser st -> IndentCharParser st String
- lexeme :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- symbol :: TokenParser st -> String -> IndentCharParser st String
- whiteSpace :: TokenParser st -> IndentCharParser st ()
- semiSep :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- semiSep1 :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- commaSep :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- commaSep1 :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- semiOrNewLineSep :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- semiOrNewLineSep1 :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- commaOrNewLineSep :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- commaOrNewLineSep1 :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]
- parens :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- parensOrBlock :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- parensOrLineFold :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- braces :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- bracesOrBlock :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- bracesOrLineFold :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- angles :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- anglesOrBlock :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- anglesOrLineFold :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- brackets :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- bracketsOrBlock :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
- bracketsOrLineFold :: TokenParser st -> IndentCharParser st a -> IndentCharParser st a
Types
type IndentCharParser st a = IndentParser Char st aSource
type TokenParser st = TokenParser (st, IndentState)Source
Combinators
identifier :: TokenParser st -> IndentCharParser st StringSource
Indentation aware parser to match a valid identifier of the language.
:: TokenParser st | |
-> String | The reserved word to be matched. |
-> IndentCharParser st () |
Indentation aware parser to match a reserved word of the language.
operator :: TokenParser st -> IndentCharParser st StringSource
The parser reserved tokP keyword
parses the reserved word keyword.
The string keyword should have been declared as a reserved word in the
-record.
Text.ParserCombinator.Parserc.Language.LanguageDef
:: TokenParser st | |
-> String | The reserved operator to be matched. Should have
been declared as a reserved operator in the
|
-> IndentCharParser st () |
Indentation aware parser to match a reserved operator of the language.
charLiteral :: TokenParser st -> IndentCharParser st CharSource
Indentation aware parser to match a character literal (the syntax is assumend to be that of Hasekell which matches that of most programming language).
stringLiteral :: TokenParser st -> IndentCharParser st StringSource
Indentation aware parser to match a string literal (the syntax is assumend to be that of Hasekell which matches that of most programming language).
natural :: TokenParser st -> IndentCharParser st IntegerSource
Indentation aware parser to match a natural number.
integer :: TokenParser st -> IndentCharParser st IntegerSource
Indentation aware parser to match an integer.
float :: TokenParser st -> IndentCharParser st DoubleSource
Indentation aware parser to match a floating point number.
naturalOrFloat :: TokenParser st -> IndentCharParser st (Either Integer Double)Source
Indentation aware parser to match either a natural number or Floating point number.
decimal :: TokenParser st -> IndentCharParser st IntegerSource
Indentation aware parser to match an integer in decimal.
hexadecimal :: TokenParser st -> IndentCharParser st IntegerSource
Indentation aware parser to match an integer in hexadecimal.
octal :: TokenParser st -> IndentCharParser st IntegerSource
Indentation aware parser to match an integer in ocatal.
semi :: TokenParser st -> IndentCharParser st StringSource
Matches a semicolon and returns ';'.
colon :: TokenParser st -> IndentCharParser st StringSource
Matches a colon and returns :.
dot :: TokenParser st -> IndentCharParser st StringSource
Matches a dot and returns .
comma :: TokenParser st -> IndentCharParser st StringSource
Matches a comma and returns ,.
:: TokenParser st | |
-> IndentCharParser st a | The input parser. |
-> IndentCharParser st a |
Creates a lexeme parser. The resultant parser skips trailing spaces and is indentation aware.
symbol :: TokenParser st -> String -> IndentCharParser st StringSource
Indentation aware parser that is equvalent to string str.
whiteSpace :: TokenParser st -> IndentCharParser st ()Source
The parser whiteSpace skips spaces and comments. This does not care about indentation as skipping spaces should be done irrespective of the indentation.
Separator parser combinators
semiSep :: TokenParser st -> IndentCharParser st a -> IndentCharParser st [a]Source
Given an indentation aware parser p
as argument semiSep tokP
returns a parser that parses zero or more occurances of p
seperated by semicolon (';')
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st [a] |
Given an indentation aware parser p
as argument semiSep1 tokP
returns a parser that parses one or more occurances of p
seperated
by semicolon (';')
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st [a] |
Given an indentation aware parser p
as argument commaSep tokP
returns a parser that parses zero or more occurances of p
seperated by comma (',')
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st [a] |
Given an indentation aware parser p
as argument commaSep1 tokP
returns a parser that parses one or more occurances of p
seperated
by comma (',')
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st [a] |
Given an indentation aware parser p
as argument semiOrNewLineSep
tokP
returns a parser that parses zero or more occurances of p
seperated by either semicolons (';') or newlines. To seperate multiple
occurance of p
in the same line use an explicit semicolon (';').
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st [a] |
Given an indentation aware parser p
as argument semiOrNewLineSep1
tokP
returns a parser that parses one or more occurances of p
seperated by either semicolons (';') or newline. To seperate multiple
occurance of p
in the same line use an explicit semicolon (';').
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st [a] |
Given an indentation aware parser p
as argument commaOrNewLineSep
tokP
returns a parser that parses zero or more occurances of p
seperated by either comma (',') or newlines. To seperate multiple
occurance of p
in the same line use an explicit comma (',').
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st [a] |
Given an indentation aware parser p
as argument commaOrNewLineSep1
tokP
returns a parser that parses one or more occurances of p
seperated by either comma (',') or newline. To seperate multiple
occurance of p
in the same line use an explicit comma (',').
Grouping parser combinator
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st a |
The parser parens tokP p
parses p
between '(' and ')'. The parser
p
does not care about indentation i.e. p
is run in
mode.
NoIndent
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit '(' and ')' are given,
groups parens
p
by block indentation.
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit '(' and ')' are given,
groups parens
p
by a line fold.
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st a |
The parser braces tokP p
parses p
between '{' and '}'. The parser
p
does not care about indentation i.e. p
is run in
mode.
NoIndent
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit '{' and '}' are given,
groups braces
p
by block indentation.
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit '{' and '}' are given,
groups braces
p
by a line fold.
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st a |
The parser angles tokP p
parses p
between angles. The parser p
does not care about indentation i.e. p
is run in
mode.
NoIndent
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit angles are given, groups
angles
p
by block indentation.
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit angles are given, groups
angles
p
by a line fold.
:: TokenParser st | |
-> IndentCharParser st a | The input Parser |
-> IndentCharParser st a |
The parser brackets tokP p
parses p
between '[' and ']'. The
parser p
does not care about indentation i.e. p
is run in
mode.
NoIndent
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit '[' and ']' are given,
groups brackets
p
by block indentation.
:: TokenParser st | |
-> IndentCharParser st a | The input parser |
-> IndentCharParser st a |
Similar to
but when no explicit '[' and ']' are given,
groups brackets
p
by a line fold.