- type Parser a = P (Str Char) a
- run :: Show t => P (Str Char) t -> String -> IO ()
- pa :: Parser String
- pb :: Parser String
- (<++>) :: Parser String -> Parser String -> Parser String
- pDigit :: Parser Char
- ident :: Parser String
- spaces :: Parser String
- pParens :: Parser a -> Parser a
- pAnyToken :: [String] -> Parser String
- pIntList :: Parser [Int]
- parseIntString :: Parser String
- main :: IO ()
Documentation
type Parser a = P (Str Char) aSource
We start out by defining the type of parser we want; by specifying the type of the state we resolve a lot of overloading
run :: Show t => P (Str Char) t -> String -> IO ()Source
The fuction
runs the parser and shows both the result, and the correcting steps which were taken during the parsing process.
run
Our first two parsers are simple; one recognises a single a
character and the other one a single b
. Since we will use them later we
convert the recognsied character into String so they can be easily combined.
(<++>) :: Parser String -> Parser String -> Parser StringSource
We can now run the parser
on input "a", which succeeds:
pa
Result: "a"
If we run the parser
on the empty input "", the expected symbol in inserted, that the position where it was inserted is reported, and
we get information about what was expected at that position: pa
run pa ""
Result: "a" Correcting steps: Inserteda
at position 0 expectinga
Now let's see what happens if we encounter an unexpected symbol, as in run pa "b"
:
Result: "a" Correcting steps: Deletedb
at position 0 expectinga
Inserteda
at position 1 expectinga
The combinator <++>
applies two parsers sequentially to the input and concatenates their results: run (pa ++ pa) "aa"
:
Result: "aa"
The function
is overloaded. The type of its argument determines how to interpret the argument. Thus far we have seen single characters,
but we may pass ranges as well as argument: pSym
run (pList (pSym (
a
,z
))) "doaitse"
Result: doaitse
An even more general instance of
takes a triple as argument: a predicate, a string indicating what is expected,
and the value to insert if nothing can be recognised: pSym
run (pSym (t ->
a
<= t && t <= z
, "'a'..'z'", k
)) "v"
Result:k
Correcting steps: Deleted '1' at position 0 expecting 'a'..'z' Insertedk
at position 1 expecting 'a'..'z'
The parser pCount
recognises a sequence of elements, throws away the results of the recognition process ( <$
), and just returns the number of returned elements.
The choice combinator <<|>
indicates that prefernce is to be given to the left alternative if it can make progress. This enables us to specify greedy strategies:
run (pCount pa) "aaaaa"
Result: 5
The parsers are instance of the class Monad and hence we can use the result of a previous parser to construct a following one: run (do {l <- pCount pa; pExact l pb}) "aaacabbb"
Result: ["b","b","b","b"] Correcting steps: Deletedc
at position 3 expecting one of [a
,b
] Insertedb
at position 8 expectingb
The function
converts an ambigous parser into one which returns all possible parses: amb
run (amb ( (++) $ pa2 * pa3 | (++) $ pa3 * pa2)) "aaaaa"
Result: ["aaaaa","aaaaa"]
The applicative style makes it very easy to merge recognsition and computing a result. As an example we parse a sequence of nested well formed parentheses pairs a,d
compute the maximum nesting depth: run ( max $ pParens ((+1) $ wfp) * wfp
opt
0) "((()))()(())"
Result: 3
It is very easy to recognise infix expressions with any number of priorities and operators:
pOp (c, op) = op <$ pSym c sepBy p op = pChainl op p expr = foldr sepBy factor [(pOp (+
, (+)) | pOp (-
, (-))), pOp (*
, (*))] factor = pNatural | pParens expr
| which we can call: run expr "15-3*5"
Result: 0
| Note that also here correction takes place: run expr "2 + + 3 5"
Result: 37 Correcting steps: Deleted ' ' at position 1 expecting one of ['0'..'9',*
,-
,+
] Inserted '0' at position 3 expecting one of ['(', '0'..'9'] Deleted ' ' at position 4 expecting one of ['(', '0'..'9'] Deleted ' ' at position 6 expecting one of ['0'..'9',*
,-
,+
]
A common case where ambiguity arises is when we e.g. want to recognise identifiers, but only those which are not keywords.
The combinator micro
inserts steps with a specfied cost in the result of the parser which can be used to disambiguate:
ident :: Parser String ident = ((:) $ pSym (a
,z
) * pMunch (x ->a
<= x && x <=z
)micro
2) <* spaces idents = pList1 ident pKey keyw = pToken keywmicro
1 <* spaces spaces :: Parser String spaces = pMunch (==' ') takes_second_alt = pList ident <|> ( c t e -> ["IfThenElse"] ++ c ++ t ++ e) <$ pKey "if" <*> pList_ng ident <* pKey "then" <*> pList_ng ident <* pKey "else" <*> pList_ng ident
A keyword is followed by a small cost 1
, which makes sure that identifiers which have a keyword as a prefix win over the keyword. Identifiers are however
followed by a cost 2
, with as result that in this case the keyword wins.
Note that a limitation of this approach is that keywords are only recognised as such when expected!
test13 = run takes_second_alt "if a then if else c" test14 = run takes_second_alt "ifx a then if else c"
with results:
Text.ParserCombinators.UU.Examples> test14 Result: ["IfThenElse","a","if","c"] Text.ParserCombinators.UU.Examples> test14 Result: ["ifx","a","then","if","else","c"]