uu-parsinglib-2.4.0: Online, error-correcting parser combinators; monadic and applicative interfaces




type Parser a = P (Str Char) aSource

We start out by defining the type of parser we want; by specifying the type of the state we resolve a lot of overloading

run :: Show t => P (Str Char) t -> String -> IO ()Source

The fuction run runs the parser and shows both the result, and the correcting steps which were taken during the parsing process.

pa :: Parser StringSource

Our first two parsers are simple; one recognises a single a character and the other one a single b. Since we will use them later we convert the recognsied character into String so they can be easily combined.

(<++>) :: Parser String -> Parser String -> Parser StringSource

We can now run the parser pa on input "a", which succeeds:

   Result: "a"

If we run the parser pa on the empty input "", the expected symbol in inserted, that the position where it was inserted is reported, and we get information about what was expected at that position: run pa ""

 Result: "a"
 Correcting steps: 
     Inserted a at position 0 expecting a

Now let's see what happens if we encounter an unexpected symbol, as in run pa "b":

 Result: "a"
 Correcting steps: 
     Deleted  b at position 0 expecting a
     Inserted a at position 1 expecting a

The combinator <++> applies two parsers sequentially to the input and concatenates their results: run (pa ++ pa) "aa":

 Result: "aa"

pDigit :: Parser CharSource

The function pSym is overloaded. The type of its argument determines how to interpret the argument. Thus far we have seen single characters, but we may pass ranges as well as argument: run (pList (pSym (a,z))) "doaitse"

 Result: doaitse

An even more general instance of pSym takes a triple as argument: a predicate, a string indicating what is expected, and the value to insert if nothing can be recognised: run (pSym (t -> a <= t && t <= z, "'a'..'z'", k)) "v"

 Result: k
 Correcting steps: 
     Deleted  '1' at position 0 expecting 'a'..'z'
     Inserted k at position 1 expecting 'a'..'z'

The parser pCount recognises a sequence of elements, throws away the results of the recognition process ( <$ ), and just returns the number of returned elements. The choice combinator <<|> indicates that prefernce is to be given to the left alternative if it can make progress. This enables us to specify greedy strategies: run (pCount pa) "aaaaa"

 Result: 5

The parsers are instance of the class Monad and hence we can use the result of a previous parser to construct a following one: run (do {l <- pCount pa; pExact l pb}) "aaacabbb"

 Result: ["b","b","b","b"]
 Correcting steps: 
     Deleted  c at position 3 expecting one of [a, b]
     Inserted b at position 8 expecting b

The function amb converts an ambigous parser into one which returns all possible parses: run (amb ( (++) $ pa2 * pa3 | (++) $ pa3 * pa2)) "aaaaa"

 Result: ["aaaaa","aaaaa"]

The applicative style makes it very easy to merge recognsition and computing a result. As an example we parse a sequence of nested well formed parentheses pairs a,d compute the maximum nesting depth: run ( max $ pParens ((+1) $ wfp) * wfp opt 0) "((()))()(())"

 Result: 3

It is very easy to recognise infix expressions with any number of priorities and operators:

 pOp (c, op) = op <$ pSym c
 sepBy p op = pChainl op p
 expr    = foldr sepBy factor [(pOp (+, (+)) | pOp (-, (-))),  pOp (* , (*))] 
 factor  = pNatural | pParens expr

| which we can call: run expr "15-3*5"

 Result: 0

| Note that also here correction takes place: run expr "2 + + 3 5"

 Result: 37
 Correcting steps: 
     Deleted  ' ' at position 1 expecting one of ['0'..'9', *, -, +]
     Inserted '0' at position 3 expecting one of ['(', '0'..'9']
     Deleted  ' ' at position 4 expecting one of ['(', '0'..'9']
     Deleted  ' ' at position 6 expecting one of ['0'..'9', *, -, +]

ident :: Parser StringSource

A common case where ambiguity arises is when we e.g. want to recognise identifiers, but only those which are not keywords. The combinator micro inserts steps with a specfied cost in the result of the parser which can be used to disambiguate:

 ident ::  Parser String
 ident = ((:) $ pSym (a,z) * pMunch (x -> a <= x && x <= z) micro 2) <* spaces
 idents = pList1 ident
 pKey keyw = pToken keyw micro 1 <* spaces
 spaces :: Parser String
 spaces = pMunch (==' ')
 takes_second_alt =   pList ident 
                <|> ( c t e -> ["IfThenElse"] ++  c   ++  t  ++  e) 
                    <$ pKey "if"   <*> pList_ng ident 
                    <* pKey "then" <*> pList_ng ident
                    <* pKey "else" <*> pList_ng ident  

A keyword is followed by a small cost 1, which makes sure that identifiers which have a keyword as a prefix win over the keyword. Identifiers are however followed by a cost 2, with as result that in this case the keyword wins. Note that a limitation of this approach is that keywords are only recognised as such when expected!

 test13 = run takes_second_alt "if a then if else c"
 test14 = run takes_second_alt "ifx a then if else c"

with results:

 Text.ParserCombinators.UU.Examples> test14
 Result: ["IfThenElse","a","if","c"]
 Text.ParserCombinators.UU.Examples> test14
 Result: ["ifx","a","then","if","else","c"]