megaparsec-4.0.0: Monadic parser combinators

Copyright© 2015 Megaparsec contributors © 2007 Paolo Martini © 1999–2001 Daan Leijen
LicenseBSD3
MaintainerMark Karpov <markkarpov@opmbx.org>
Stabilityexperimental
Portabilityportable
Safe HaskellNone
LanguageHaskell2010

Text.Megaparsec

Contents

Description

This module includes everything you need to get started writing a parser.

By default this module is set up to parse character data. If you'd like to parse the result of your own tokenizer you should start with the following imports:

import Text.Megaparsec.Prim
import Text.Megaparsec.Combinator

Then you can implement your own version of satisfy on top of the token primitive.

Typical import section looks like this:

import Text.Megaparsec
import Text.Megaparsec.String
-- import Text.Megaparsec.ByteString
-- import Text.Megaparsec.ByteString.Lazy
-- import Text.Megaparsec.Text
-- import Text.Megaparsec.Text.Lazy

As you can see the second import depends on data type you want to use as input stream. It just defines useful type-synonym Parser and parseFromFile function.

Megaparsec is capable of a lot. Apart from this standard functionality you can parse permutation phrases with Text.Megaparsec.Perm and even entire languages with Text.Megaparsec.Lexer. These modules should be imported explicitly along with the two modules mentioned above.

Synopsis

Running parser

type Parsec s = ParsecT s Identity Source

Parsec is non-transformer variant of more general ParsecT monad transformer.

data ParsecT s m a Source

ParsecT s m a is a parser with stream type s, underlying monad m and return type a.

runParser :: Stream s t => Parsec s a -> String -> s -> Either ParseError a Source

The most general way to run a parser over the Identity monad. runParser p file input runs parser p on the input list of tokens input, obtained from source file. The file is only used in error messages and may be the empty string. Returns either a ParseError (Left) or a value of type a (Right).

parseFromFile p file = runParser p file <$> readFile file

runParserT :: (Monad m, Stream s t) => ParsecT s m a -> String -> s -> m (Either ParseError a) Source

The most general way to run a parser. runParserT p file input runs parser p on the input list of tokens input, obtained from source file. The file is only used in error messages and may be the empty string. Returns a computation in the underlying monad m that return either a ParseError (Left) or a value of type a (Right).

parse Source

Arguments

:: Stream s t 
=> Parsec s a

Parser to run

-> String

Name of source file, included in error messages

-> s

Input for parser

-> Either ParseError a 

parse p file input runs parser p over Identity (see runParserT if you're using the ParserT monad transformer; parse itself is just a synonym for runParser). It returns either a ParseError (Left) or a value of type a (Right). show or print can be used to turn ParseError into the string representation of the error message. See Text.Megaparsec.Error if you need to do more advanced error analysis.

main = case (parse numbers "" "11, 2, 43") of
         Left err -> print err
         Right xs -> print (sum xs)

numbers = commaSep integer

parseMaybe :: Stream s t => Parsec s a -> s -> Maybe a Source

parseMaybe p input runs parser p on input and returns result inside Just on success and Nothing on failure. This function also parses eof, so if the parser doesn't consume all of its input, it will fail.

The function is supposed to be useful for lightweight parsing, where error messages (and thus file name) are not important and entire input should be parsed. For example it can be used when parsing of single number according to specification of its format is desired.

parseTest :: (Stream s t, Show a) => Parsec s a -> s -> IO () Source

The expression parseTest p input applies a parser p against input input and prints the result to stdout. Used for testing.

Combinators

(<|>) :: Alternative f => forall a. f a -> f a -> f a

An associative binary operation

This combinator implements choice. The parser p <|> q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried.

The parser is called predictive since q is only tried when parser p didn't consume any input (i.e. the look ahead is 1). This non-backtracking behaviour allows for both an efficient implementation of the parser combinators and the generation of good error messages.

many :: Alternative f => forall a. f a -> f [a]

Zero or more.

many p applies the parser p zero or more times. Returns a list of the returned values of p.

identifier = (:) <$> letter <*> many (alphaNum <|> char '_')

some :: Alternative f => forall a. f a -> f [a]

One or more.

some p applies the parser p one or more times. Returns a list of the returned values of p.

word = some letter

optional :: Alternative f => f a -> f (Maybe a)

One or none.

optional p tries to apply parser p. It will parse p or nothing. It only fails if p fails after consuming input. On success result of p is returned inside of Just, on failure Nothing is returned.

unexpected :: MonadParsec s m t => String -> m a Source

The parser unexpected msg always fails with an unexpected error message msg without consuming any input.

The parsers fail, (<?>) and unexpected are the three parsers used to generate error messages. Of these, only (<?>) is commonly used.

(<?>) :: MonadParsec s m t => m a -> String -> m a infix 0 Source

A synonym for label in form of an operator.

label :: MonadParsec s m t => String -> m a -> m a Source

The parser label name p behaves as parser p, but whenever the parser p fails without consuming any input, it replaces names of “expected” tokens with the name name.

hidden :: MonadParsec s m t => m a -> m a Source

hidden p behaves just like parser p, but it doesn't show any “expected” tokens in error message when p fails.

try :: MonadParsec s m t => m a -> m a Source

The parser try p behaves like parser p, except that it pretends that it hasn't consumed any input when an error occurs.

This combinator is used whenever arbitrary look ahead is needed. Since it pretends that it hasn't consumed any input when p fails, the (<|>) combinator will try its second alternative even when the first parser failed while consuming input.

For example, here is a parser that will try (sorry for the pun) to parse word “let” or “lexical”:

>>> parseTest (string "let" <|> string "lexical") "lexical"
parse error at line 1, column 1:
unexpected "lex"
expecting "let"

What happens here? First parser consumes “le” and fails (because it doesn't see a “t”). The second parser, however, isn't tried, since the first parser has already consumed some input! try fixes this behavior and allows backtracking to work:

>>> parseTest (try (string "let") <|> string "lexical") "lexical"
"lexical"

try also improves error messages in case of overlapping alternatives, because Megaparsec's hint system can be used:

>>> parseTest (try (string "let") <|> string "lexical") "le"
parse error at line 1, column 1:
unexpected "le"
expecting "let" or "lexical"

lookAhead :: MonadParsec s m t => m a -> m a Source

lookAhead p parses p without consuming any input.

If p fails and consumes some input, so does lookAhead. Combine with try if this is undesirable.

notFollowedBy :: MonadParsec s m t => m a -> m () Source

notFollowedBy p only succeeds when parser p fails. This parser does not consume any input and can be used to implement the “longest match” rule.

eof :: MonadParsec s m t => m () Source

This parser only succeeds at the end of the input.

token Source

Arguments

:: MonadParsec s m t 
=> (Int -> SourcePos -> t -> SourcePos)

Next position calculating function

-> (t -> Either [Message] a)

Matching function for the token to parse

-> m a 

The parser token nextPos testTok accepts a token t with result x when the function testTok t returns Right x. The position of the next token should be returned when nextPos is called with the tab width, current source position, and the current token.

This is the most primitive combinator for accepting tokens. For example, the char parser could be implemented as:

char c = token updatePosChar testChar
  where testChar x = if x == c
                     then Right x
                     else Left . pure . Unexpected . showToken $ x

tokens Source

Arguments

:: (MonadParsec s m t, Eq t) 
=> (Int -> SourcePos -> [t] -> SourcePos)

Computes position of tokens

-> (t -> t -> Bool)

Predicate to check equality of tokens

-> [t]

List of tokens to parse

-> m [t] 

The parser tokens posFromTok test parses list of tokens and returns it. posFromTok is called with three arguments: tab width, initial position, and collection of tokens to parse. The resulting parser will use showToken to pretty-print the collection of tokens in error messages. Supplied predicate test is used to check equality of given and parsed tokens.

This can be used for example to write string:

string = tokens updatePosString (==)

between :: Applicative m => m open -> m close -> m a -> m a Source

between open close p parses open, followed by p and close. Returns the value returned by p.

braces = between (symbol "{") (symbol "}")

choice :: (Foldable f, Alternative m) => f (m a) -> m a Source

choice ps tries to apply the parsers in the list ps in order, until one of them succeeds. Returns the value of the succeeding parser.

count :: Alternative m => Int -> m a -> m [a] Source

count n p parses n occurrences of p. If n is smaller or equal to zero, the parser equals to return []. Returns a list of n values.

This parser is defined in terms of count', like this:

count n = count' n n

count' :: Alternative m => Int -> Int -> m a -> m [a] Source

count' m n p parses from m to n occurrences of p. If n is not positive or m > n, the parser equals to return []. Returns a list of parsed values.

Please note that m may be negative, in this case effect is the same as if it were equal to zero.

endBy :: Alternative m => m a -> m sep -> m [a] Source

endBy p sep parses zero or more occurrences of p, separated and ended by sep. Returns a list of values returned by p.

cStatements = cStatement `endBy` semicolon

endBy1 :: Alternative m => m a -> m sep -> m [a] Source

endBy1 p sep parses one or more occurrences of p, separated and ended by sep. Returns a list of values returned by p.

manyTill :: Alternative m => m a -> m end -> m [a] Source

manyTill p end applies parser p zero or more times until parser end succeeds. Returns the list of values returned by p. This parser can be used to scan comments:

simpleComment = string "<!--" >> manyTill anyChar (try $ string "-->")

Note that we need to use try since parsers anyChar and string "-->" overlap and string "-->" could consume input before failing.

someTill :: Alternative m => m a -> m end -> m [a] Source

someTill p end works similarly to manyTill p end, but p should succeed at least once.

option :: Alternative m => a -> m a -> m a Source

option x p tries to apply parser p. If p fails without consuming input, it returns the value x, otherwise the value returned by p.

priority = option 0 (digitToInt <$> digitChar)

sepBy :: Alternative m => m a -> m sep -> m [a] Source

sepBy p sep parses zero or more occurrences of p, separated by sep. Returns a list of values returned by p.

commaSep p = p `sepBy` comma

sepBy1 :: Alternative m => m a -> m sep -> m [a] Source

sepBy1 p sep parses one or more occurrences of p, separated by sep. Returns a list of values returned by p.

skipMany :: Alternative m => m a -> m () Source

skipMany p applies the parser p zero or more times, skipping its result.

space = skipMany spaceChar

skipSome :: Alternative m => m a -> m () Source

skipSome p applies the parser p one or more times, skipping its result.

chainl :: Alternative m => m a -> m (a -> a -> a) -> a -> m a Source

Deprecated: Use Text.Megaparsec.Expr instead.

chainl p op x parses zero or more occurrences of p, separated by op. Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. If there are zero occurrences of p, the value x is returned.

chainl1 :: Alternative m => m a -> m (a -> a -> a) -> m a Source

Deprecated: Use Text.Megaparsec.Expr instead.

chainl1 p op parses one or more occurrences of p, separated by op Returns a value obtained by a left associative application of all functions returned by op to the values returned by p. This parser can for example be used to eliminate left recursion which typically occurs in expression grammars.

Consider using Text.Megaparsec.Expr instead.

chainr :: Alternative m => m a -> m (a -> a -> a) -> a -> m a Source

Deprecated: Use Text.Megaparsec.Expr instead.

chainr p op x parses zero or more occurrences of p, separated by op Returns a value obtained by a right associative application of all functions returned by op to the values returned by p. If there are no occurrences of p, the value x is returned.

Consider using Text.Megaparsec.Expr instead.

chainr1 :: Alternative m => m a -> m (a -> a -> a) -> m a Source

Deprecated: Use Text.Megaparsec.Expr instead.

chainr1 p op parses one or more occurrences of p, separated by op. Returns a value obtained by a right associative application of all functions returned by op to the values returned by p.

Consider using Text.Megaparsec.Expr instead.

sepEndBy :: Alternative m => m a -> m sep -> m [a] Source

Deprecated: Use sepBy p sep <* optional sep instead.

sepEndBy p sep parses zero or more occurrences of p, separated and optionally ended by sep. Returns a list of values returned by p.

sepEndBy1 :: Alternative m => m a -> m sep -> m [a] Source

Deprecated: Use sepBy1 p sep <* optional sep instead.

sepEndBy1 p sep parses one or more occurrences of p, separated and optionally ended by sep. Returns a list of values returned by p.

Character parsing

newline :: MonadParsec s m Char => m Char Source

Parses a newline character.

crlf :: MonadParsec s m Char => m String Source

Parses a carriage return character followed by a newline character. Returns sequence of characters parsed.

eol :: MonadParsec s m Char => m String Source

Parses a CRLF (see crlf) or LF (see newline) end of line. Returns the sequence of characters parsed.

eol = (pure <$> newline) <|> crlf

tab :: MonadParsec s m Char => m Char Source

Parses a tab character.

space :: MonadParsec s m Char => m () Source

Skips zero or more white space characters.

See also: skipMany and spaceChar.

controlChar :: MonadParsec s m Char => m Char Source

Parses control characters, which are the non-printing characters of the Latin-1 subset of Unicode.

spaceChar :: MonadParsec s m Char => m Char Source

Parses a Unicode space character, and the control characters: tab, newline, carriage return, form feed, and vertical tab.

upperChar :: MonadParsec s m Char => m Char Source

Parses an upper-case or title-case alphabetic Unicode character. Title case is used by a small number of letter ligatures like the single-character form of Lj.

lowerChar :: MonadParsec s m Char => m Char Source

Parses a lower-case alphabetic Unicode character.

letterChar :: MonadParsec s m Char => m Char Source

Parses alphabetic Unicode characters: lower-case, upper-case and title-case letters, plus letters of case-less scripts and modifiers letters.

alphaNumChar :: MonadParsec s m Char => m Char Source

Parses alphabetic or numeric digit Unicode characters.

Note that numeric digits outside the ASCII range are parsed by this parser but not by digitChar. Such digits may be part of identifiers but are not used by the printer and reader to represent numbers.

printChar :: MonadParsec s m Char => m Char Source

Parses printable Unicode characters: letters, numbers, marks, punctuation, symbols and spaces.

digitChar :: MonadParsec s m Char => m Char Source

Parses an ASCII digit, i.e between “0” and “9”.

octDigitChar :: MonadParsec s m Char => m Char Source

Parses an octal digit, i.e. between “0” and “7”.

hexDigitChar :: MonadParsec s m Char => m Char Source

Parses a hexadecimal digit, i.e. between “0” and “9”, or “a” and “f”, or “A” and “F”.

markChar :: MonadParsec s m Char => m Char Source

Parses Unicode mark characters, for example accents and the like, which combine with preceding characters.

numberChar :: MonadParsec s m Char => m Char Source

Parses Unicode numeric characters, including digits from various scripts, Roman numerals, et cetera.

punctuationChar :: MonadParsec s m Char => m Char Source

Parses Unicode punctuation characters, including various kinds of connectors, brackets and quotes.

symbolChar :: MonadParsec s m Char => m Char Source

Parses Unicode symbol characters, including mathematical and currency symbols.

separatorChar :: MonadParsec s m Char => m Char Source

Parses Unicode space and separator characters.

asciiChar :: MonadParsec s m Char => m Char Source

Parses a character from the first 128 characters of the Unicode character set, corresponding to the ASCII character set.

latin1Char :: MonadParsec s m Char => m Char Source

Parses a character from the first 256 characters of the Unicode character set, corresponding to the ISO 8859-1 (Latin-1) character set.

charCategory :: MonadParsec s m Char => GeneralCategory -> m Char Source

charCategory cat Parses character in Unicode General Category cat, see GeneralCategory.

char :: MonadParsec s m Char => Char -> m Char Source

char c parses a single character c.

semicolon = char ';'

char' :: MonadParsec s m Char => Char -> m Char Source

The same as char but case-insensitive. This parser returns actually parsed character preserving its case.

>>> parseTest (char' 'e') "E"
'E'
>>> parseTest (char' 'e') "G"
parse error at line 1, column 1:
unexpected 'G'
expecting 'E' or 'e'

anyChar :: MonadParsec s m Char => m Char Source

This parser succeeds for any character. Returns the parsed character.

oneOf :: MonadParsec s m Char => String -> m Char Source

oneOf cs succeeds if the current character is in the supplied list of characters cs. Returns the parsed character. Note that this parser doesn't automatically generate “expected” component of error message, so usually you should label it manually with label or (<?>).

See also: satisfy.

digit = oneOf ['0'..'9'] <?> "digit"

oneOf' :: MonadParsec s m Char => String -> m Char Source

The same as oneOf, but case-insensitive. Returns the parsed character preserving its case.

vowel = oneOf' "aeiou" <?> "vowel"

noneOf :: MonadParsec s m Char => String -> m Char Source

As the dual of oneOf, noneOf cs succeeds if the current character not in the supplied list of characters cs. Returns the parsed character.

noneOf' :: MonadParsec s m Char => String -> m Char Source

The same as noneOf, but case-insensitive.

consonant = noneOf' "aeiou" <?> "consonant"

satisfy :: MonadParsec s m Char => (Char -> Bool) -> m Char Source

The parser satisfy f succeeds for any character for which the supplied function f returns True. Returns the character that is actually parsed.

digitChar = satisfy isDigit <?> "digit"
oneOf cs  = satisfy (`elem` cs)

string :: MonadParsec s m Char => String -> m String Source

string s parses a sequence of characters given by s. Returns the parsed string (i.e. s).

divOrMod = string "div" <|> string "mod"

string' :: MonadParsec s m Char => String -> m String Source

The same as string, but case-insensitive. On success returns string cased as actually parsed input.

>>> parseTest (string' "foobar") "foObAr"
"foObAr"

Error messages

data Message Source

This data type represents parse error messages. There are three kinds of messages:

data Message = Unexpected String
             | Expected   String
             | Message    String

The fine distinction between different kinds of parse errors allows the system to generate quite good error messages for the user.

messageString :: Message -> String Source

Extract the message string from an error message.

badMessage :: Message -> Bool Source

Test if message string is empty.

data ParseError Source

The data type ParseError represents parse errors. It provides the source position (SourcePos) of the error and a list of error messages (Message). A ParseError can be returned by the function parse. ParseError is an instance of the Show and Eq type classes.

errorPos :: ParseError -> SourcePos Source

Extract the source position from ParseError.

errorMessages :: ParseError -> [Message] Source

Extract the list of error messages from ParseError.

errorIsUnknown :: ParseError -> Bool Source

Test whether given ParseError has associated collection of error messages. Return True if it has none and False otherwise.

Textual source position

data SourcePos Source

The abstract data type SourcePos represents source positions. It contains the name of the source (i.e. file name), a line number and a column number. SourcePos is an instance of the Show, Eq and Ord class.

sourceName :: SourcePos -> String Source

Extract the name of the source from a source position.

sourceLine :: SourcePos -> Int Source

Extract the line number from a source position.

sourceColumn :: SourcePos -> Int Source

Extract the column number from a source position.

Low-level operations

class (ShowToken t, ShowToken [t]) => Stream s t | s -> t where Source

An instance of Stream s t has stream type s, and token type t determined by the stream.

Methods

uncons :: s -> Maybe (t, s) Source

data Consumed a Source

This data structure represents an aspect of result of parser's work. The two constructors have the following meaning:

  • Consumed is a wrapper for result when some part of input stream was consumed.
  • Empty is a wrapper for result when no input was consumed.

See also: Reply.

Constructors

Consumed a 
Empty !a 

data Reply s a Source

This data structure represents an aspect of result of parser's work. The two constructors have the following meaning:

  • Ok for successfully run parser.
  • Error for failed parser.

See also: Consumed.

Constructors

Ok a !(State s) 
Error ParseError 

data State s Source

This is Megaparsec state, it's parametrized over stream type s.

Constructors

State 

Instances

Eq s => Eq (State s) Source 
Show s => Show (State s) Source 

getInput :: MonadParsec s m t => m s Source

Returns the current input.

setInput :: MonadParsec s m t => s -> m () Source

setInput input continues parsing with input. The getInput and setInput functions can for example be used to deal with #include files.

getPosition :: MonadParsec s m t => m SourcePos Source

Returns the current source position.

See also: SourcePos.

setPosition :: MonadParsec s m t => SourcePos -> m () Source

setPosition pos sets the current source position to pos.

getTabWidth :: MonadParsec s m t => m Int Source

Returns tab width. Default tab width is equal to defaultTabWidth. You can set different tab width with help of setTabWidth.

setTabWidth :: MonadParsec s m t => Int -> m () Source

Set tab width. If argument of the function is not positive number, defaultTabWidth will be used.

getParserState :: MonadParsec s m t => m (State s) Source

Returns the full parser state as a State record.

setParserState :: MonadParsec s m t => State s -> m () Source

setParserState st set the full parser state to st.

updateParserState :: MonadParsec s m t => (State s -> State s) -> m () Source

updateParserState f applies function f to the parser state.