attoparsec-parsec-0.1.3: An Attoparsec compatibility layer for Parsec

Safe HaskellNone

Data.Attoparsec.Text.Parsec

Contents

Description

This module implements Data.Attoparsec.Text in terms of Parsec. It can be used to write parsers that can be compiled against both Attoparsec and Parsec.

Differences from Data.Attoparsec.Text:

  • Incremental input is not supported.
  • satisfyWith, skip, scan, and most of the numeric parsers are not yet implemented. Patches are gladly welcome!
  • Parsec parsers (and hence the parsers provided here) do not automatically backtrack on failing alternatives that consumed input. With careful use of try it is possible to write parsers that behave consistent across Attoparsec and Parsec. Read the next section for more on that.

A simple usage example is here: https://github.com/sol/attoparsec-parsec#readme

Synopsis

Writing parsers that behave consistent across Attoparsec and Parsec

Some care is needed, so that parsers behave consistent across Attoparsec and Parsec in regards to backtracking. Attoparsec parsers always backtrack on failure. In contrast, a Parsec parser that fails after it has consumed input will not automatically backtrack, but it can be turned into backtracking parsers with try.

Here is an example that illustrates the difference. The following parser will fail under Parsec given an input of "for":

string "foo" <|> string "for"

The reason for its failure is that the first branch is a partial match, and will consume the letters 'f' and 'o' before failing. In Attoparsec, the above parser will succeed on that input, because the failed first branch will consume nothing.

The try function can be used to write parsers that behave consistent across Attoparsec and Parsec. Each alternative that may fail after consuming input, has to be prefixed with try. E.g. for the parser above we would write:

try (string "foo") <|> string "for"

For Parsec try enables backtracking, for Attoparsec it's just a type-constrained version of id (see Attoparsec's try).

Parser type

type Parser = Parsec Text ()

Running parsers

Combinators

(<?>)Source

Arguments

:: Parser a 
-> String

the name to use if parsing fails

-> Parser a 

Name the parser, in case failure occurs.

See Parsec's documentation of <?> for detailed semantics.

try :: Parser a -> Parser aSource

Attempt a parse, and if it fails, rewind the input so that no input appears to have been consumed.

See Parsec's documentation of try for detailed semantics.

Parsing individual characters

char :: Char -> Parser CharSource

Match a specific character.

anyChar :: Parser CharSource

Match any character.

notChar :: Char -> Parser CharSource

Match any character except the given one.

satisfy :: (Char -> Bool) -> Parser CharSource

The parser satisfy p succeeds for any character for which the predicate p returns True. Returns the character that is actually parsed.

digit = satisfy isDigit
    where isDigit c = c >= '0' && c <= '9'

Special character parsers

digit :: Parser CharSource

Parse a single digit, as recognised by isDigit.

letter :: Parser CharSource

Parse a letter, as recognised by isAlpha.

space :: Parser CharSource

Parse a space character, as recognised by isSpace.

Character classes

inClass :: String -> Char -> Bool

Match any character in a set.

vowel = inClass "aeiou"

Range notation is supported.

halfAlphabet = inClass "a-nA-N"

To add a literal '-' to a set, place it at the beginning or end of the string.

notInClass :: String -> Char -> Bool

Match any character not in a set.

Efficient string handling

string :: Text -> Parser TextSource

string s parses a sequence of characters that identically match s. Returns the parsed string (i.e. s).

stringCI :: Text -> Parser TextSource

Satisfy a literal string, ignoring case.

Note: No proper case folding is done, yet. Currently stringCI s is just

 char (toLower c) <|> char (toUpper c)

for each character of s. The implementation from Data.Attoparsec.Text tries to do proper case folding, but is actually buggy (see https://github.com/bos/attoparsec/issues/6). As long as you deal with characters from the ASCII range, both implementations should be fine.

skipSpace :: Parser ()Source

Skip over white space.

skipWhile :: (Char -> Bool) -> Parser ()Source

Skip past input for as long as the predicate returns True.

take :: Int -> Parser TextSource

Consume exactly n characters of input.

takeWhile :: (Char -> Bool) -> Parser TextSource

Consume input as long as the predicate returns True, and return the consumed input.

This parser does not fail. It will return an empty string if the predicate returns False on the first character of input.

Note: Because this parser does not fail, do not use it with combinators such as many, because such parsers loop until a failure occurs. Careless use will thus result in an infinite loop.

takeWhile1 :: (Char -> Bool) -> Parser TextSource

Consume input as long as the predicate returns True, and return the consumed input.

This parser requires the predicate to succeed on at least one character of input: it will fail if the predicate never returns True or if there is no input left.

takeTill :: (Char -> Bool) -> Parser TextSource

Consume input as long as the predicate returns False (i.e. until it returns True), and return the consumed input.

This parser does not fail. It will return an empty string if the predicate returns True on the first character of input.

Note: Because this parser does not fail, do not use it with combinators such as many, because such parsers loop until a failure occurs. Careless use will thus result in an infinite loop.

Consume all remaining input

takeText :: Parser TextSource

Consume all remaining input and return it as a single string.

takeLazyText :: Parser TextSource

Consume all remaining input and return it as a single string.

Text parsing

endOfLine :: Parser ()Source

Match either a single newline character '\n', or a carriage return followed by a newline character "\r\n".

isEndOfLine :: Char -> Bool

A predicate that matches either a carriage return '\r' or newline '\n' character.

isHorizontalSpace :: Char -> Bool

A predicate that matches either a space ' ' or horizontal tab '\t' character.

Numeric parsers

decimal :: Integral a => Parser aSource

Parse and decode an unsigned decimal number.

hexadecimal :: Integral a => Parser aSource

Parse and decode an unsigned hexadecimal number.

State observation and manipulation functions

endOfInput :: Parser ()Source

Match only if all input has been consumed.

atEnd :: Parser BoolSource

Return an indication of whether the end of input has been reached.