License | BSD-3-Clause |
---|---|
Maintainer | Jamie Willis, Gigaparsec Maintainers |
Stability | stable |
Safe Haskell | Safe |
Language | Haskell2010 |
Text.Gigaparsec.Char
Description
This module contains many parsers to do with reading one or more characters. Almost every parser will need something from this module.
In particular, this module contains: combinators that can read specific characters; combinators that represent character classes and their negations; combinators for reading specific strings; as well as a selection of pre-made parsers to parse specific kinds of character, like digits and letters.
Since: 0.1.0.0
Synopsis
- char :: Char -> Parsec Char
- item :: Parsec Char
- satisfy :: (Char -> Bool) -> Parsec Char
- satisfyMap :: (Char -> Maybe a) -> Parsec a
- oneOf :: Set Char -> Parsec Char
- noneOf :: Set Char -> Parsec Char
- string :: String -> Parsec String
- stringOfMany :: (Char -> Bool) -> Parsec String
- stringOfSome :: (Char -> Bool) -> Parsec String
- strings :: Set String -> Parsec String
- trie :: Map String (Parsec a) -> Parsec a
- bit :: Parsec Char
- crlf :: Parsec Char
- digit :: Parsec Char
- endOfLine :: Parsec Char
- hexDigit :: Parsec Char
- letter :: Parsec Char
- letterOrDigit :: Parsec Char
- lower :: Parsec Char
- newline :: Parsec Char
- octDigit :: Parsec Char
- space :: Parsec Char
- tab :: Parsec Char
- upper :: Parsec Char
- whitespace :: Parsec Char
- spaces :: Parsec ()
- whitespaces :: Parsec ()
Core Combinators
These are the most primitive combinators for consuming input capable of any input reading tasks.
Arguments
:: Char | the character to parse, |
-> Parsec Char | a parser that tries to read a single |
This combinator tries to parse a single specific character c
from the input.
Attempts to read the given character c
from the input stream at the current position. If this
character can be found, it is consumed and returned. Otherwise, no input is consumed and this
combinator will fail.
Examples
>>>
parse (char 'a') ""
Failure ..>>>
parse (char 'a') "a"
Success 'a'>>>
parse (char 'a') "ba"
Failure ..
Since: 0.1.0.0
This parser will parse any single character from the input, failing if there is no input remaining.
Since: 0.1.0.0
Arguments
:: (Char -> Bool) | the predicate, |
-> Parsec Char | a parser that tries to read a single character |
This combinator tries to parse a single character from the input that matches the given predicate.
Attempts to read a character from the input and tests it against the predicate pred
. If a character
c
can be read and pred c
is true, then c
is consumed and returned. Otherwise, no input is
consumed and this combinator will fail.
Examples
>>>
parse (satisfy Data.Char.isDigit) ""
Failure ..>>>
parse (satisfy Data.Char.isDigit) "7"
Success '7'>>>
parse (satisfy Data.Char.isDigit) "a5"
Failure ..
Roughly speaking:
char c = satisfy (== c)
Since: 0.1.0.0
Arguments
:: (Char -> Maybe a) | the function to test the next character against and transform it with, should one exist. |
-> Parsec a |
This combinator tries to parse and process a character from the input if it is defined for the given function.
Attempts to read a character from the input and tests to see if it is in the domain of f
. If a character
c
can be read and f c
returns a Just
, then c
is consumed and Just (f c)
is returned. Otherwise,
no input is consumed and this combinator will fail.
Examples
>>>
let digit = satisfyMap (\c -> if isDigit c then Just (digitToInt c) else Nothing)
>>>
parse digit ""
Failure ..>>>
parse digit "7"
Success 7>>>
parse digit "a5"
Failure ..
Since: 0.1.0.0
Character Class Combinators
These combinators allow for working with character classes. This means that a set, or range, of characters can be specified, and the combinator will return a parser that matches one of those characters (or conversely, any character that is not in that set). The parsed character is always returned.
Arguments
:: Set Char | the set of character, |
-> Parsec Char | a parser that parses one of the member of the set |
This combinator tries to parse any character from supplied set of characters cs
, returning it if
successful.
If the next character in the input is a member of the set cs
, it is consumed and returned.
Otherwise, no input is consumed and the combinator fails.
Examples
>>>
let p = oneOf (Set.fromList ['a'..'c'])
>>>
parse p "a"
Success 'a'>>>
parse p "c"
Success 'c'>>>
parse p "xb"
Failure ..
Since: 0.1.0.0
Arguments
:: Set Char | the set, |
-> Parsec Char | a parser that parses one character that is not a member of the set |
This combinator tries to parse any character not from supplied set of characters cs
,
returning it if successful.
If the next character in the input is not a member of the set cs
, it is consumed and returned.
Otherwise, no input is consumed and the combinator fails.
Examples
>>>
let p = noneOf (Set.from ['a'..'c'])
>>>
parse p "a"
Failure ..>>>
parse p "c"
Failure ..>>>
parse p "xb"
Success 'x'>>>
parse p ""
Failure ..
Since: 0.1.0.0
String Combinators
These combinators allow for working with, or building, strings. This means that they can parse
specific strings, specific sets of strings, or can read characters repeatedly to generate
strings. They are united in all returning String
as their result.
Arguments
:: String | the string, |
-> Parsec String | a parser that either parses the string |
This combinator attempts to parse a given string from the input, and fails otherwise.
Attempts to read the given string completely from the input at the current position. If the string is present, then the parser succeeds, and the entire string is consumed from the input. Otherwise, if the input has too few characters remaining, or not all the characters matched, the parser fails. On failure, all the characters that were matched are consumed from the input.
Examples
>>>
parse (string "abc") ""
Failure ..>>>
parse (string "abc") "abcd"
Success "abc">>>
parse (string "abc") "xabc"
Failure ..
Notes
- The error messages generated by
string
do not reflect how far into the input it managed to get: this is because the error being positioned at the start of the string is more natural. However, input will still be consumed for purposes of backtracking.
Since: 0.1.0.0
Arguments
:: (Char -> Bool) | the predicate, |
-> Parsec String | a parser that returns the span of characters satisfying |
This combinator parses characters matching the given predicate zero or more times, collecting the results into a string.
Repeatly reads characters that satisfy the given predicate pred
. When no more characters
can be successfully read, the results are stitched together into a String
and returned.
This combinator can never fail, since satisfy
can never fail having consumed input.
Examples
>>>
let ident = letter <:> stringOfMany isAlphaNum
>>>
parse ident "abdc9d"
Success "abdc9d">>>
parse ident "a"
Success "a">>>
parser ident "9"
Failure ..
Notes
- this acts exactly like
stringOfMany (satisfy pred)
, but may be more efficient. - analogous to the
megaparsec
takeWhileP
combinator.
Since: 0.1.0.0
stringOfSome :: (Char -> Bool) -> Parsec String Source #
This combinator parses characters matching the given predicate one or more times, collecting the results into a string.
Repeatly reads characters that satisfy the given predicate pred
. When no more characters
can be successfully read, the results are stitched together into a String
and returned.
This combinator can never fail having consumed input, since satisfy
can never fail having
consumed input.
Examples
>>>
let ident = stringOfSome isAlpha
>>>
parse ident "abdc9d"
Success "abdc">>>
parse ident "a"
Success "a">>>
parser ident "9"
Failure ..
Notes
- this acts exactly like
stringOfMany (satisfy pred)
, but may be more efficient. - analogous to the
megaparsec
takeWhileP
combinator.
Since: 0.1.0.0
Arguments
:: Set String | the strings to try to parse. |
-> Parsec String | a parser that tries to parse all the given strings returning the longest one that matches. |
This combinator tries to parse each of the strings strs
, until one of them succeeds.
Unlike choice
, this combinator will not necessarily parse the strings in the
order provided. It will avoid strings that have another string as a prefix first, so that it has
Longest Match semantics. It will try to minimise backtracking too, making it a much more efficient
option than choice . map atomic
.
The longest succeeding string will be returned. If no strings match then the combinator fails.
Examples
>>>
let p = strings (Set.fromList ["hell", "hello", "goodbye", "g", "abc"])
>>>
parse p "hell"
Success "hell">>>
parse p "hello"
Success "hello">>>
parse p "good"
Success "g">>>
parse p "goodbye"
Success "goodbye">>>
parse p "a"
Failure ..
Since: 0.1.0.0
Arguments
:: Map String (Parsec a) | the key-value pairs to try to parse. |
-> Parsec a | a parser that tries to parse all the given key-value pairs, returning the (possibly failing) result of the value that corresponds to the longest matching key. |
This combinator tries to parse each of the key-value pairs kvs
, until one of them succeeds.
Each key-value pair in the map provided to this combinator is a string and a parser to perform if that string can be parsed. Keys that are a prefix of another key are avoided, so that the parser overall has Longest Match semantics. It will try and minimise backtracking too, making it an efficient option.
Examples
>>>
let p = trie $ Map.fromList [ ("hell", pure 4)
, ("hello", pure 5) , ("goodbye", pure 7) , ("g", pure 1) , ("abc", pure 3) ]>>>
parse p "hell"
Success 4>>>
parse p "hello"
Success 5>>>
parse p "good"
Success 1>>>
parse p "goodbye"
Success 7>>>
parse p "a"
Failure ..
Notes
- The scope of any backtracking performed is isolated to the key itself, as it is assumed
that once a key parses correctly, the branch has been committed to. Putting an
atomic
around the values will not affect this behaviour.
Since: 0.1.0.0
Specific Character Parsers
These parsers are special cases of satisfy
or char
. They are worth
using, as they are given special error labelling, producing nicer error messages than their
primitive counterparts.
This parser tries to parse a binary digit (bit) and returns it if successful.
A bit is either '0'
or '1'
.
Since: 0.1.0.0
This parser tries to parse a CRLF
newline character pair, returning '\n'
if successful.
A CRLF
character is the pair of carriage return ('\r'
) and line feed ('\n'
). These
two characters will be parsed together or not at all. The parser is made atomic using atomic
.
Since: 0.1.0.0
This parser tries to parse a digit, and returns it if successful.
A digit is one of '0'
to '9'
(inclusive).
Since: 0.1.0.0
endOfLine :: Parsec Char Source #
This parser will parse either a line feed (LF
) or a CRLF
newline, returning '\n'
if successful.
Since: 0.1.0.0
hexDigit :: Parsec Char Source #
This parser tries to parse a hexadecimal digit, and returns it if successful.
A hexadecimal digit is one of (all inclusive ranges):
- the digits
'0'
through'9'
- the letters
'a'
through'f'
- the letters
'A'
through'Z'
Since: 0.1.0.0
letter :: Parsec Char Source #
This parser tries to parse a letter, and returns it if successful.
A letter is a character within the following Unicode General Categories:
- Uppercase Letter (
Lu
) - Lowercase Letter (
Ll
) - Titlecase Letter (
Lt
) - Modifier Letter (
Lm
) - Other Letter (
Lo
)
Since: 0.1.0.0
This parser tries to parse an lowercase letter, and returns it if successful.
A lowercase letter is any character whose Unicode Category Type is Lowercase
Letter (Ll
).
Examples of characters within this category include:
* the Latin letters
through a
* Latin special character such as z
, é
, ß
* Cryillic letters
* Greek letters
* Coptic lettersð
Since: 0.1.0.0
newline :: Parsec Char Source #
This parser tries to parse a line feed newline ('\n'
) character, and returns it if successful.
This parser will not accept a carriage return (CR
) character or CRLF
.
Since: 0.1.0.0
octDigit :: Parsec Char Source #
This parser tries to parse an octal digit, and returns it if successful.
An octal digit is one of '0'
to '7'
(inclusive).
Since: 0.1.0.0
This parser tries to parse a space or tab character, and returns it if successful.
Since: 0.1.0.0
This parser tries to parse a tab ('\t'
) character, and returns it if successful.
This parser does not recognise vertical tabs, only horizontal ones.
Since: 0.1.0.0
This parser tries to parse an uppercase letter, and returns it if successful.
An uppercase letter is any character whose Unicode Category Type is Uppercase Letter (Lu
).
Examples of characters within this category include:
* the Latin letters
through A
* Latin special character such as Z
, Å
, Ç
* Cryillic letters
* Greek letters
* Coptic lettersÕ
Since: 0.1.0.0
whitespace :: Parsec Char Source #
This parser tries to parse a whitespace character, and returns it if successful.
Whitespace is any one of the following:
- a space (
' '
) - a tab (
't'
) - a line feed (
'n'
) - a carriage return (
'r'
) - a form feed (
'f'
) - a vertical tab (
'v'
) - any other character with General Category Space (
Zs
)
Since: 0.1.0.0
Whitespace Skipping Parsers
These parsers are designed to skip chunks of whitespace, for very rudimentary lexing tasks. It is probably better to use the functionality of Gigaparsec.Token.
This parser skips zero or more space characters using space
.
Since: 0.1.0.0
whitespaces :: Parsec () Source #
This parser skips zero or more space characters using whitespace
.
Since: 0.1.0.0