parsec2-1.0.0: Monadic parser combinators

Portability: non-portable (uses existentially quantified data constructors)
Stability: provisional
Maintainer: Antoine Latter <aslatter@gmail.com>

Text.ParserCombinators.Parsec.Token

Description

A helper module to parse lexical elements (tokens).

Documentation

data LanguageDef st

The LanguageDef type is a record that contains all parameterizable features of the Text.ParserCombinators.Parsec.Token module. The module Text.ParserCombinators.Parsec.Language contains some default definitions.

Constructors

LanguageDef 

Fields

commentStart :: String

Describes the start of a block comment. Use the empty string if the language doesn't support block comments. For example "/*".

commentEnd :: String

Describes the end of a block comment. Use the empty string if the language doesn't support block comments. For example "*/".

commentLine :: String

Describes the start of a line comment. Use the empty string if the language doesn't support line comments. For example "//".

nestedComments :: Bool

Set to True if the language supports nested block comments.

identStart :: CharParser st Char

This parser should accept any start characters of identifiers. For example, letter <|> char '_'.

identLetter :: CharParser st Char

This parser should accept any legal tail characters of identifiers. For example, alphaNum <|> char '_'.

opStart :: CharParser st Char

This parser should accept any start characters of operators. For example, oneOf ":!#$%&*+./<=>?@\\^|-~".

opLetter :: CharParser st Char

This parser should accept any legal tail characters of operators. Note that this parser should be defined even if the language doesn't support user-defined operators; otherwise the reservedOp parser won't work correctly.

reservedNames :: [String]

The list of reserved identifiers.

reservedOpNames :: [String]

The list of reserved operators.

caseSensitive :: Bool

Set to True if the language is case sensitive.
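
As an illustration of the fields above, the following sketch fills in a LanguageDef for a made-up C-like language. The names toyDef and toyLexer and all field values are assumptions chosen for the example; emptyDef is taken from Text.ParserCombinators.Parsec.Language and is overridden field by field.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

-- Hypothetical definition for a small C-like language.
toyDef :: P.LanguageDef st
toyDef = emptyDef
  { P.commentStart    = "/*"
  , P.commentEnd      = "*/"
  , P.commentLine     = "//"
  , P.nestedComments  = False
  , P.identStart      = letter <|> char '_'
  , P.identLetter     = alphaNum <|> char '_'
  , P.opStart         = oneOf "+-*/=<>"
  , P.opLetter        = oneOf "+-*/=<>"
  , P.reservedNames   = ["if", "then", "else"]
  , P.reservedOpNames = ["=", "+"]
  , P.caseSensitive   = True
  }

-- Lexer built from the definition above.
toyLexer :: P.TokenParser st
toyLexer = P.makeTokenParser toyDef
```

With this definition, P.identifier toyLexer should accept foo_1 and reject the reserved word if.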

data TokenParser st

The type of the record that holds lexical parsers.

Constructors

TokenParser 

Fields

identifier :: CharParser st String

This lexeme parser parses a legal identifier. Returns the identifier string. This parser will fail on identifiers that are reserved words. Legal identifier (start) characters and reserved words are defined in the LanguageDef that is passed to makeTokenParser. An identifier is treated as a single token using try.

reserved :: String -> CharParser st ()

The lexeme parser reserved name parses symbol name, but it also checks that the name is not a prefix of a valid identifier. A reserved word is treated as a single token using try.

operator :: CharParser st String

This lexeme parser parses a legal operator. Returns the name of the operator. This parser will fail on any operators that are reserved operators. Legal operator (start) characters and reserved operators are defined in the LanguageDef that is passed to makeTokenParser. An operator is treated as a single token using try.

reservedOp :: String -> CharParser st ()

The lexeme parser reservedOp name parses symbol name, but it also checks that the name is not a prefix of a valid operator. A reservedOp is treated as a single token using try.
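
The interaction between identifier, reserved, operator and reservedOp can be sketched as follows. The reserved lists and the helper accepts are assumptions made for this example, not part of the module.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

-- Lexer with a small, made-up set of reserved words and operators.
lexer :: P.TokenParser ()
lexer = P.makeTokenParser (emptyDef
  { P.reservedNames   = ["let", "in"]
  , P.reservedOpNames = ["=", "->"]
  })

-- Hypothetical helper: True when the parser succeeds on the input.
accepts :: CharParser () a -> String -> Bool
accepts p input = either (const False) (const True) (parse p "" input)

identifierChecks :: [Bool]
identifierChecks =
  [ accepts (P.identifier lexer) "letter"      -- an ordinary identifier
  , accepts (P.identifier lexer) "let"         -- reserved word, rejected
  , accepts (P.reserved lexer "let") "let x"   -- reserved word followed by a space
  , accepts (P.reserved lexer "let") "letter"  -- "let" is only a prefix, rejected
  ]

operatorChecks :: [Bool]
operatorChecks =
  [ accepts (P.operator lexer) "=="            -- an ordinary operator
  , accepts (P.operator lexer) "="             -- reserved operator, rejected
  , accepts (P.reservedOp lexer "=") "= 1"     -- reserved operator on its own
  , accepts (P.reservedOp lexer "=") "=="      -- "=" is only a prefix, rejected
  ]
```

Both lists should evaluate to [True, False, True, False].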

charLiteral :: CharParser st Char

This lexeme parser parses a single literal character. Returns the literal character value. This parser deals correctly with escape sequences. The literal character is parsed according to the grammar rules defined in the Haskell report (which matches most programming languages quite closely).

stringLiteral :: CharParser st String

This lexeme parser parses a literal string. Returns the literal string value. This parser deals correctly with escape sequences and gaps. The literal string is parsed according to the grammar rules defined in the Haskell report (which matches most programming languages quite closely).
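
A minimal sketch of how charLiteral and stringLiteral treat escape sequences and string gaps. The helper run and the sample inputs are assumptions for illustration; note that each input is itself written as a Haskell string, so backslashes and quotes are escaped once more.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellDef

-- Hypothetical helper: Just the result on success, Nothing on failure.
run :: CharParser () a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

charExamples :: [Maybe Char]
charExamples =
  [ run (P.charLiteral lexer) "'a'"     -- source text: 'a'
  , run (P.charLiteral lexer) "'\\t'"   -- source text: '\t' (escaped tab)
  ]

stringExamples :: [Maybe String]
stringExamples =
  [ run (P.stringLiteral lexer) "\"a\\nb\""        -- source text: "a\nb"
  , run (P.stringLiteral lexer) "\"ab\\   \\cd\""  -- source text: "ab\   \cd" (a gap)
  ]
```

The escape \n becomes a newline character in the result, and the gap (backslash, white space, backslash) contributes nothing, so the second string comes back as "abcd".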

natural :: CharParser st Integer

This lexeme parser parses a natural number (a non-negative whole number). Returns the value of the number. The number can be specified in decimal, hexadecimal or octal. The number is parsed according to the grammar rules in the Haskell report.

integer :: CharParser st Integer

This lexeme parser parses an integer (a whole number). This parser is like natural except that it can be prefixed with a sign (i.e. '-' or '+'). Returns the value of the number. The number can be specified in decimal, hexadecimal or octal. The number is parsed according to the grammar rules in the Haskell report.

float :: CharParser st Double

This lexeme parser parses a floating point value. Returns the value of the number. The number is parsed according to the grammar rules defined in the Haskell report.

naturalOrFloat :: CharParser st (Either Integer Double)

This lexeme parser parses either a natural or a float. Returns the value of the number. This parser deals with any overlap in the grammar rules for naturals and floats. The number is parsed according to the grammar rules defined in the Haskell report.

decimal :: CharParser st Integer

Parses a non-negative whole number in the decimal system. Returns the value of the number.

hexadecimal :: CharParser st Integer

Parses a non-negative whole number in the hexadecimal system. The number should be prefixed with "0x" or "0X". Returns the value of the number.

octal :: CharParser st Integer

Parses a non-negative whole number in the octal system. The number should be prefixed with "0o" or "0O". Returns the value of the number.
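
The numeric lexeme parsers can be tried out with a sketch like the one below. The helper run and the sample inputs are assumptions for this example; the floating point inputs were picked so that their values are exactly representable as a Double.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellDef

-- Hypothetical helper: Just the result on success, Nothing on failure.
run :: CharParser () a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

-- Decimal, hexadecimal and octal notation for natural.
naturals :: [Maybe Integer]
naturals = map (run (P.natural lexer)) ["42", "0x1F", "0o17"]

-- integer additionally accepts a leading sign.
signed :: [Maybe Integer]
signed = map (run (P.integer lexer)) ["-42", "+7"]

-- float requires a fraction or an exponent.
floats :: [Maybe Double]
floats = map (run (P.float lexer)) ["2.5", "oops"]

-- naturalOrFloat reports which of the two alternatives was read.
mixed :: [Maybe (Either Integer Double)]
mixed = map (run (P.naturalOrFloat lexer)) ["10", "10.5"]
```

Here naturals should be [Just 42, Just 31, Just 15], signed should be [Just (-42), Just 7], and mixed should be [Just (Left 10), Just (Right 10.5)].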

symbol :: String -> CharParser st String

Lexeme parser symbol s parses string s and skips trailing white space.

lexeme :: forall a. CharParser st a -> CharParser st a

lexeme p first applies parser p and then the whiteSpace parser, returning the value of p. Every lexical token (lexeme) is defined using lexeme; this way every parse starts at a point without white space. Parsers that use lexeme are called lexeme parsers in this document.

The only point where the whiteSpace parser should be called explicitly is the start of the main parser in order to skip any leading white space.

    mainParser  = do{ whiteSpace
                     ; ds <- many (lexeme digit)
                     ; eof
                     ; return (sum (map digitToInt ds))  -- digitToInt comes from Data.Char
                     }

whiteSpace :: CharParser st ()

Parses any white space. White space consists of zero or more occurrences of a space, a line comment or a block (multi line) comment. Block comments may be nested if nestedComments is set to True. How comments are started and ended is defined in the LanguageDef that is passed to makeTokenParser.
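
A small sketch of whiteSpace skipping comments, assuming the haskellDef language definition (block comments {- ... -}, line comments --, nesting enabled). The function skipThenIdent is a name made up for this example.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellDef

-- Skip leading white space and comments, then read a single identifier.
-- The identifier is a lexeme parser, so trailing comments are skipped too.
skipThenIdent :: String -> Either ParseError String
skipThenIdent = parse p ""
  where
    p = do { P.whiteSpace lexer
           ; name <- P.identifier lexer
           ; eof
           ; return name
           }

-- A nested block comment, a newline, an identifier and a line comment.
sample :: String
sample = "{- outer {- inner -} still a comment -}\n  x -- trailing"
```

Applying skipThenIdent to sample should yield Right "x".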

parens :: forall a. CharParser st a -> CharParser st a

Lexeme parser parens p parses p enclosed in parentheses ('(' and ')'), returning the value of p.

braces :: forall a. CharParser st a -> CharParser st a

Lexeme parser braces p parses p enclosed in braces ('{' and '}'), returning the value of p.

angles :: forall a. CharParser st a -> CharParser st a

Lexeme parser angles p parses p enclosed in angle brackets ('<' and '>'), returning the value of p.

brackets :: forall a. CharParser st a -> CharParser st a

Lexeme parser brackets p parses p enclosed in brackets ('[' and ']'), returning the value of p.

squares :: forall a. CharParser st a -> CharParser st a

DEPRECATED: Use brackets.
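
The bracketing parsers can be exercised with a sketch like this one. The helper run and the inputs are assumptions for the example; emptyDef is used only to obtain a lexer.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Hypothetical helper: Just the result on success, Nothing on failure.
run :: CharParser () a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

-- White space after each bracket and after the inner token is skipped.
inParens, inBrackets :: Maybe Integer
inParens   = run (P.parens   lexer (P.natural lexer)) "( 42 )"
inBrackets = run (P.brackets lexer (P.natural lexer)) "[7]"

inBraces, inAngles :: Maybe String
inBraces = run (P.braces lexer (P.identifier lexer)) "{ x }"
inAngles = run (P.angles lexer (P.identifier lexer)) "<y>"

-- A missing closing bracket makes the parse fail.
unclosed :: Maybe Integer
unclosed = run (P.parens lexer (P.natural lexer)) "( 42"
```

Each wrapper returns only the value of the inner parser, so inParens should be Just 42 and inBraces should be Just "x", while unclosed should be Nothing.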

semi :: CharParser st String

Lexeme parser semi parses the character ';' and skips any trailing white space. Returns the string ";".

comma :: CharParser st String

Lexeme parser comma parses the character ',' and skips any trailing white space. Returns the string ",".

colon :: CharParser st String

Lexeme parser colon parses the character ':' and skips any trailing white space. Returns the string ":".

dot :: CharParser st String

Lexeme parser dot parses the character '.' and skips any trailing white space. Returns the string ".".

semiSep :: forall a. CharParser st a -> CharParser st [a]

Lexeme parser semiSep p parses zero or more occurrences of p separated by semi. Returns a list of values returned by p.

semiSep1 :: forall a. CharParser st a -> CharParser st [a]

Lexeme parser semiSep1 p parses one or more occurrences of p separated by semi. Returns a list of values returned by p.

commaSep :: forall a. CharParser st a -> CharParser st [a]

Lexeme parser commaSep p parses zero or more occurrences of p separated by comma. Returns a list of values returned by p.

commaSep1 :: forall a. CharParser st a -> CharParser st [a]

Lexeme parser commaSep1 p parses one or more occurrences of p separated by comma. Returns a list of values returned by p.
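
The separator combinators compose with the bracketing parsers. The sketch below assumes a lexer built from emptyDef; numberList, statements and run are names invented for the example.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (emptyDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Hypothetical helper: Just the result on success, Nothing on failure.
run :: CharParser () a -> String -> Maybe a
run p input = either (const Nothing) Just (parse p "" input)

-- Zero or more naturals, separated by commas, inside square brackets.
numberList :: CharParser () [Integer]
numberList = P.brackets lexer (P.commaSep lexer (P.natural lexer))

-- One or more identifiers separated by semicolons.
statements :: CharParser () [String]
statements = P.semiSep1 lexer (P.identifier lexer)
```

For instance, run numberList "[ 1, 2 , 3 ]" should give Just [1,2,3] and run numberList "[]" should give Just [], whereas run statements "" should give Nothing because semiSep1 needs at least one item.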

makeTokenParser :: LanguageDef st -> TokenParser st

The expression makeTokenParser language creates a TokenParser record that contains lexical parsers that are defined using the definitions in the language record.

The use of this function is quite stylized: one imports the appropriate language definition and selects the lexical parsers that are needed from the resulting TokenParser.

  module Main where

  import Text.ParserCombinators.Parsec
  import qualified Text.ParserCombinators.Parsec.Token as P
  import Text.ParserCombinators.Parsec.Language (haskellDef)

  -- The parser
  ...

  expr  =   parens expr
        <|> identifier
        <|> ...


  -- The lexer
  lexer       = P.makeTokenParser haskellDef

  parens      = P.parens lexer
  braces      = P.braces lexer
  identifier  = P.identifier lexer
  reserved    = P.reserved lexer
  ...
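
For reference, here is one way the outline above might be completed into something that compiles. This is a sketch under assumptions: the grammar (an identifier wrapped in any number of parentheses), the user state type (), and the function parseExpr are all invented for the example.

```haskell
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellDef)

-- The lexer
lexer :: P.TokenParser ()
lexer = P.makeTokenParser haskellDef

parens :: CharParser () a -> CharParser () a
parens = P.parens lexer

identifier :: CharParser () String
identifier = P.identifier lexer

whiteSpace :: CharParser () ()
whiteSpace = P.whiteSpace lexer

-- The parser: an identifier, optionally wrapped in nested parentheses.
expr :: CharParser () String
expr =   parens expr
     <|> identifier

-- Skip leading white space once, then require the whole input to be consumed.
parseExpr :: String -> Either ParseError String
parseExpr = parse (do { whiteSpace; e <- expr; eof; return e }) "(input)"
```

With these definitions, parseExpr " ((  foo )) -- c" should return Right "foo", while parseExpr "(let)" should fail because let is a reserved word in haskellDef.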