Safe Haskell | None |
---|
This module implements Data.Attoparsec.Text in terms of Parsec. It can be used to write parsers that can be compiled against both Attoparsec and Parsec.
Differences from Data.Attoparsec.Text:
- Incremental input is not supported.
-
satisfyWith
,skip
,scan
, and most of the numeric parsers are not yet implemented. Patches are gladly welcome! - Parsec parsers (and hence the parsers provided here) do not automatically
backtrack on failing alternatives that consumed input. With careful use
of
try
it is possible to write parsers that behave consistent across Attoparsec and Parsec. Read the next section for more on that.
A simple usage example is here: https://github.com/sol/attoparsec-parsec#readme
- type Parser = Parsec Text ()
- parseOnly :: Parser a -> Text -> Either String a
- (<?>) :: Parser a -> String -> Parser a
- try :: Parser a -> Parser a
- module Data.Attoparsec.Combinator
- char :: Char -> Parser Char
- anyChar :: Parser Char
- notChar :: Char -> Parser Char
- satisfy :: (Char -> Bool) -> Parser Char
- peekChar :: Parser (Maybe Char)
- digit :: Parser Char
- letter :: Parser Char
- space :: Parser Char
- inClass :: String -> Char -> Bool
- notInClass :: String -> Char -> Bool
- string :: Text -> Parser Text
- stringCI :: Text -> Parser Text
- skipSpace :: Parser ()
- skipWhile :: (Char -> Bool) -> Parser ()
- take :: Int -> Parser Text
- takeWhile :: (Char -> Bool) -> Parser Text
- takeWhile1 :: (Char -> Bool) -> Parser Text
- takeTill :: (Char -> Bool) -> Parser Text
- takeText :: Parser Text
- takeLazyText :: Parser Text
- endOfLine :: Parser ()
- isEndOfLine :: Char -> Bool
- isHorizontalSpace :: Char -> Bool
- decimal :: Integral a => Parser a
- hexadecimal :: Integral a => Parser a
- endOfInput :: Parser ()
- atEnd :: Parser Bool
Writing parsers that behave consistent across Attoparsec and Parsec
Some care is needed, so that parsers behave consistent across
Attoparsec and Parsec in regards to backtracking. Attoparsec parsers always
backtrack on failure. In contrast, a Parsec parser that fails after it has
consumed input will not automatically backtrack, but it can be turned into
backtracking parsers with try
.
Here is an example that illustrates the difference. The following parser
will fail under Parsec given an input of "for"
:
string "foo" <|> string "for"
The reason for its failure is that the first branch is a partial match, and
will consume the letters 'f'
and 'o'
before failing. In Attoparsec,
the above parser will succeed on that input, because the failed first
branch will consume nothing.
The try
function can be used to write parsers that behave consistent
across Attoparsec and Parsec. Each alternative that may fail after
consuming input, has to be prefixed with try
. E.g. for the parser above
we would write:
try (string "foo") <|> string "for"
For Parsec try
enables backtracking, for Attoparsec it's just a
type-constrained version of id
(see Attoparsec's try
).
Parser type
Running parsers
Combinators
Name the parser, in case failure occurs.
See Parsec's documentation of <?>
for detailed semantics.
try :: Parser a -> Parser aSource
Attempt a parse, and if it fails, rewind the input so that no input appears to have been consumed.
See Parsec's documentation of try
for detailed semantics.
module Data.Attoparsec.Combinator
Parsing individual characters
satisfy :: (Char -> Bool) -> Parser CharSource
The parser satisfy p
succeeds for any character for which the
predicate p
returns True
. Returns the character that is
actually parsed.
digit = satisfy isDigit where isDigit c = c >= '0' && c <= '9'
Special character parsers
Character classes
inClass :: String -> Char -> Bool
Match any character in a set.
vowel = inClass "aeiou"
Range notation is supported.
halfAlphabet = inClass "a-nA-N"
To add a literal '-'
to a set, place it at the beginning or end
of the string.
notInClass :: String -> Char -> Bool
Match any character not in a set.
Efficient string handling
string :: Text -> Parser TextSource
string s
parses a sequence of characters that identically match
s
. Returns the parsed string (i.e. s
).
stringCI :: Text -> Parser TextSource
Satisfy a literal string, ignoring case.
Note: No proper case folding is done, yet. Currently stringCI s
is just
char (toLower c) <|> char (toUpper c)
for each character of s
. The implementation from Data.Attoparsec.Text
tries to do proper case folding, but is actually buggy (see
https://github.com/bos/attoparsec/issues/6). As long as you deal with
characters from the ASCII range, both implementations should be fine.
skipWhile :: (Char -> Bool) -> Parser ()Source
Skip past input for as long as the predicate returns True
.
takeWhile :: (Char -> Bool) -> Parser TextSource
Consume input as long as the predicate returns True
, and return
the consumed input.
This parser does not fail. It will return an empty string if the
predicate returns False
on the first character of input.
Note: Because this parser does not fail, do not use it with
combinators such as many
, because such parsers loop until a
failure occurs. Careless use will thus result in an infinite loop.
takeTill :: (Char -> Bool) -> Parser TextSource
Consume input as long as the predicate returns False
(i.e. until it returns True
), and return the consumed input.
This parser does not fail. It will return an empty string if the
predicate returns True
on the first character of input.
Note: Because this parser does not fail, do not use it with
combinators such as many
, because such parsers loop until a
failure occurs. Careless use will thus result in an infinite loop.
Consume all remaining input
takeLazyText :: Parser TextSource
Consume all remaining input and return it as a single string.
Text parsing
Match either a single newline character '\n'
, or a carriage
return followed by a newline character "\r\n"
.
isEndOfLine :: Char -> Bool
A predicate that matches either a carriage return '\r'
or
newline '\n'
character.
isHorizontalSpace :: Char -> Bool
A predicate that matches either a space ' '
or horizontal tab
'\t'
character.
Numeric parsers
hexadecimal :: Integral a => Parser aSource
Parse and decode an unsigned hexadecimal number.
State observation and manipulation functions
endOfInput :: Parser ()Source
Match only if all input has been consumed.