- type Parser a = P (Str Char) a
- run :: Show t => P (Str Char) t -> String -> IO ()
- pa :: Parser String
- pb :: Parser String
- (<++>) :: Parser String -> Parser String -> Parser String
- wfp :: Parser Int
- expr :: Parser Int
- spaces :: Parser String
- munch :: Parser String
- pAnyToken :: [String] -> Parser String
- pIntList :: Parser [Int]
- main :: IO ()
Documentation
type Parser a = P (Str Char) aSource
We start out by defining the type of parser we want; by specifying the type of the state we resolve a lot of overloading
run :: Show t => P (Str Char) t -> String -> IO ()Source
The fuction
runs the parser and shows both the result, and the correcting steps which were taken during the parsing process.
run
Our first two parsers are simple; one recognises a single a
character and the other one a single b
. Since we will use them later we
convert the recognsied character into String so they can be easily combined.
(<++>) :: Parser String -> Parser String -> Parser StringSource
We can now run the parser
on input "a", which succeeds:
pa
run pa "a"
Result: "a"
If we run the parser
on the empty input "", the expected symbol in inserted,
that the position where it was inserted is reported, and
we get information about what was expected at that position:
pa
run pa ""
Result: "a" Correcting steps: Inserted 'a' at position 0 expecting 'a'
Now let's see what happens if we encounter an unexpected symbol, as in:
run pa "b"
Result: "a" Correcting steps: Deleted 'b' at position 0 expecting 'a' Inserted 'a' at position 1 expecting 'a'
The combinator
applies two parsers sequentially to the input and concatenates their results:
<++>
run (pa <++> pa) "aa"@:
Result: "aa"
The function
is overloaded. The type of its argument determines how to interpret the argument. Thus far we have seen single characters,
but we may pass ranges as well as argument:
pSym
run (pList (pSym ('a','z'))) "doaitse"
Result: "doaitse"
An even more general instance of
takes a triple as argument: a predicate,
a string indicating what is expected,
and the value to insert if nothing can be recognised:
pSym
run (pSym (\t -> 'a' <= t && t <= 'z', "'a'..'z'", 'k')) "1"
Result: 'k' Correcting steps: Deleted '1' at position 0 expecting 'a'..'z' Inserted 'k' at position 1 expecting 'a'..'z'
The parser pCount
recognises a sequence of elements, throws away the results of the recognition process ( <$
), and just returns the number of returned elements.
The choice combinator <<|>
indicates that preference is to be given to the left alternative if it can make progress. This enables us to specify greedy strategies:
run (pCount pa) "aaaaa"
Result: 5
The parsers are instance of the class Monad and hence we can use the result of a previous parser to construct a following one:
run (do {l <- pCount pa; pExact l pb}) "aaacabbb"
Result: ["b","b","b","b"] Correcting steps: Deleted 'c' at position 3 expecting one of ['a', 'b'] Inserted 'b' at position 8 expecting 'b'
The function
converts an ambigous parser into one which returns all possible parses:
amb
run (amb ( (++) <$> pa2 <*> pa3 <|> (++) <$> pa3 <*> pa2)) "aaaaa"
Result: ["aaaaa","aaaaa"]
The applicative style makes it very easy to merge recognsition and computing a result.
As an example we parse a sequence of nested well formed parentheses pairs and
compute the maximum nesting depth with
:
wfp
run wfp "((()))()(())"
Result: 3
It is very easy to recognise infix expressions with any number of priorities and operators:
operators = [[('+', (+)), ('-', (-))], [('*' , (*))], [('^', (^))]] same_prio ops = msum [ op <$ pSym c | (c, op) <- ops] expr = foldr pChainl ( pNatural <|> pParens expr) (map same_prio operators) --
which we can call:
run expr "15-3*5+2^5"
Result: 32
Note that also here correction takes place:
run expr "2 + + 3 5"
Result: 37 Correcting steps: Deleted ' ' at position 1 expecting one of ['0'..'9', '^', '*', '-', '+'] Deleted ' ' at position 3 expecting one of ['(', '0'..'9'] Inserted '0' at position 4 expecting '0'..'9' Deleted ' ' at position 5 expecting one of ['(', '0'..'9'] Deleted ' ' at position 7 expecting one of ['0'..'9', '^', '*', '-', '+']
A common case where ambiguity arises is when we e.g. want to recognise identifiers,
but only those which are not keywords.
The combinator micro
inserts steps with a specfied cost in the result
of the parser which can be used to disambiguate:
ident :: Parser String ident = ((:) <$> pSym ('a','z') <*> pMunch (\x -> 'a' <= x && x <= 'z') `micro` 2) <* spaces idents = pList1 ident pKey keyw = pToken keyw `micro` 1 <* spaces spaces :: Parser String spaces = pMunch (==' ') takes_second_alt = pList ident \<|> (\ c t e -> ["IfThenElse"] ++ c ++ t ++ e) \<$ pKey "if" <*> pList_ng ident \<* pKey "then" <*> pList_ng ident \<* pKey "else" <*> pList_ng ident
A keyword is followed by a small cost 1
, which makes sure that
identifiers which have a keyword as a prefix win over the keyword. Identifiers are however
followed by a cost 2
, with as result that in this case the keyword wins.
Note that a limitation of this approach is that keywords are only recognised as such when expected!
test13 = run takes_second_alt "if a then if else c" test14 = run takes_second_alt "ifx a then if else c"
with results for test13
and test14
:
Result: ["IfThenElse","a","if","c"] Result: ["ifx","a","then","if", "else","c"]
A mistake which is made quite often is to construct a parser which can recognise a sequence of elements using one of the
derived combinators (say
), but where the argument parser can recognise the empty string.
The derived combinators check whether this is the case and terminate the parsing process with an error message:
pList
run (pList spaces) "" Result: *** Exception: The combinator pList requires that it's argument cannot recognise the empty string
run (pMaybe spaces) " " Result: *** Exception: The combinator pMaybe requires that it's argument cannot recognise the empty string
For documentation of
and pMerged
see the module Text.ParserCombinators.UU.Merge:
<||>
run (,,) `pMerged` (list_of pDigit <||> list_of pLower <||> list_of pUpper) "1AabCD2D3d"
results in
Result: ("123","abd","ACDD")
The function
munch = pMunch ( `elem` "^=*")
returns the longest prefix of the input obeying the predicate:
run munch "==^^**rest"
Result: "==^^**" Correcting steps: The token 'r' was not consumed by the parsing process. The token 'e' was not consumed by the parsing process. The token 's' was not consumed by the parsing process. The token 't' was not consumed by the parsing process.