| Safe Haskell | None |
|---|
Control.Proxy.Attoparsec.Tutorial
Contents
Description
In this tutorial you will learn how to use this library. The Simple example section should be enough to get you going, but you can keep reading if you want to better understand how to deal with complex parsing scenarios.
You may import this module and try the subsequent examples as you go.
- data Name = Name Text
- hello :: Parser Name
- input1 :: [Text]
- input2 :: [Text]
- helloPipe1 :: (Proxy p, Monad m) => () -> Pipe p Text Name m r
- helloPipe2 :: (Proxy p, Monad m) => () -> Pipe p Text Name m r
- helloPipe3 :: (Proxy p, Monad m) => () -> Pipe (EitherP BadInput p) Text Name m r
- helloPipe4 :: (Proxy p, Monad m) => () -> Pipe (EitherP BadInput p) Text Name m r
- helloPipe5 :: (Proxy p, Monad m) => () -> Pipe (EitherP BadInput p) Text Name m r
- helloPipe6 :: (Proxy p, Monad m) => () -> Pipe p Text Name m r
- skipPartialResults :: (Monad m, Proxy p, AttoparsecInput a) => ParserStatus a -> p (ParserStatus a) (ParserSupply a) (ParserStatus a) (ParserSupply a) m r
Simple example
We'll write a simple Parser that turns Text like “Hello John Doe.”
into , and then make a Name "John Doe"Pipe that turns
those Text values flowing downstream into Name values flowing
downstream using that Parser.
In this example we are using Text, but we may as well use
ByteString. Also, the OverloadedStrings language
extension lets us write our parser easily.
{-# LANGUAGE OverloadedStrings #-}
import Control.Proxy
import Control.Proxy.Attoparsec
import Control.Proxy.Trans.Either
import Data.Attoparsec.Text
import Data.Text
data Name = Name Text
deriving (Show)
hello :: Parser Name
hello = fmap Name $ "Hello " .*> takeWhile1 (/='.') <*. "."
We are done with our parser, now lets make a simple parsing Pipe with it.
helloPipe1 :: (Proxy p, Monad m) => () -> Pipe p Text Name m r helloPipe1 = parserInputD >-> parserD hello
As the type indicates, this Pipe receives Text values from upstream and
sends Name values downstream. This Pipe is made of two smaller
two smaller cooperating Proxys:
-
parserInputD: Preparesainput received from upstream to be consumed by a downstream parsingProxy. -
parserD: Repeatedly runs a givenon inputParsera bafrom upstream, and sendsbvalues downstream.
We need some sample input to test our simple helloPipe1.
input1 :: [Text] input1 = [ "Hello Kate." , "Hello Mary.Hello Jef" , "f." , "Hel" , "lo Tom." ]
We'll use as our input source, which sends
downstream one element from the list at a time. We'll call each of these
elements a chunk. So, fromListS input1 sends 5 chunks of
fromListS input1Text downstream.
Notice how some of our chunks are not, by themselves, complete inputs
for our hello Parser. This is fine; we want to be able to feed the
Parser with either partial or complete input as soon as it's
received from upstream. More input will be requested when needed.
Attoparsec's Parser handles partial parsing just fine.
>>>runProxy $ fromListS input1 >-> helloPipe1 >-> printDName "Kate" Name "Mary" Name "Jeff" Name "Tom"
We have accomplished our simple goal: We've made a Pipe that parses
downstream flowing input using our Parser hello.
Parsing control proxies
Handling parser errors
Let's try with some more complex input.
input2 :: [Text] input2 = [ "Hello Amy." , "Hello, Hello Tim." , "Hello Bob." , "Hello James" , "Hello" , "Hello World." , "HexHello Jon." , "H" , "ello Ann" , "." , "Hello Jean-Luc." ]
>>>runProxy $ fromListS input2 >-> helloPipe1 >-> printDName "Amy" Name "Bob" Name "JamesHelloHello World" Name "Ann" Name "Jean-Luc"
The simple helloPipe1 we built skips chunks of input that fail to be
parsed, and then continues parsing new input. That approach might be
enough if you are certain your input is always well-formed, but
sometimes you may prefer to act differently on these extraordinary
situations.
Instead of just using parserInputD and parserD to build our
helloPipe1, we could have used an additional Proxy in between them to
handle these situations. The module Control.Proxy.Attoparsec.Control
exports some useful Proxys that serve this purpose. The default
behavior just mentioned resembles the one provided by
skipMalformedChunks.
Here are some other examples:
retryLeftovers- On parsing failures, keep retrying with any left-over input, skipping individual bits each time. If there are no left-overs, then more input is requests form upstream.
helloPipe2 :: (Proxy p, Monad m) => () -> Pipe p Text Name m r helloPipe2 = parserInputD >-> retryLeftovers >-> parserD hello
>>>runProxy $ fromListS input2 >-> helloPipe2 >-> printDName "Amy" Name "Tim" Name "Bob" Name "JamesHelloHello World" Name "Jon" Name "Ann" Name "Jean-Luc"
throwParsingErrors-
When a parsing error arises, aborts execution by throwing
MalformedInputin theEitherPproxy transformer.
helloPipe3 :: (Proxy p, Monad m) => () -> Pipe (EitherP BadInput p) Text Name m r helloPipe3 = parserInputD >-> throwParsingErrors >-> parserD hello
>>>runProxy . runEitherK $ fromListS input2 >-> helloPipe3 >-> printDName "Amy" Left (MalformedInput {miParserErrror = ParserError {errorContexts = [], errorMessage = "Failed reading: takeWith"}})
limitInputLengthn-
If a
has consumed inputParsera baof length longer thannwithout producing abvalue, and it's still requesting more input, then throwInputTooLongin theEitherPproxy transformer.
helloPipe4 :: (Proxy p, Monad m) => () -> Pipe (EitherP BadInput p) Text Name m r helloPipe4 = parserInputD >-> limitInputLength 10 >-> parserD hello
>>>runProxy . runEitherK $ fromListS input2 >-> helloPipe4 >-> printDName "Amy" Name "Bob" Left (InputTooLong {itlLenght = 11})
Notice that by default, as mentioned earlier, parsing errors are ignored by skipping the malformed chunk. That's why we didn't get any complaint about the malformed input between “Amy” and “Bob”.
Composing
These Proxys that control the parsing behavior can be easily plugged
together with ( to achieve a combined functionality. Keep in
mind that the order in which these >->)Proxys are used is important.
Suppose you don't want to parse inputs of length longer than 10, and on parsing failures, you want to retry feeding the parser with any left-overs. You could achieve that behaviour like this:
helloPipe5 :: (Proxy p, Monad m) => () -> Pipe (EitherP BadInput p) Text Name m r helloPipe5 = parserInputD >-> limitInputLength 10 >-> retryLeftovers >-> parserD hello
>>>runProxy . runEitherK $ fromListS input2 >-> helloPipe5 >-> printDName "Amy" Name "Tim" Name "Bob" Left (InputTooLong {itlLenght = 11})
Custom behavior
In case the Proxys provided by Control.Proxy.Attoparsec.Control are not
enough for your needs, you can create your custom parsing control Proxy.
Through a parsing control Proxy, values flow
upstream and ParserStatus a values flow downstream. A parsing
control ParserSupply aProxy may use, replace, or alter the flow of these values to
achieve its purpose. A value received from downstream
reports the status of a ParserStatus aparserD parsing Proxy, and in exchange,
downstream expects a value.
ParserSupply a
values carry raw input to be parsed and directives on how it
should be used. ParserSupply a is just a type synonym for
ParserSupply a(. The SupplyUse, a)a value is the input chunk, and the SupplyUse value
could be either Resume if the parser currently waiting for input should be
fed, or Start, if a new parser should be started and fed the input,
effectively aborting any parsing activity currently waiting for more input.
See the documentation about ParserStatus and ParserSupply for more details.
Suppose you want to write a parsing control Proxy that never provides
additional input to partial parsing results. Let's first take a look at the
type of this Proxy:
skipPartialResults :: (Monad m, Proxy p, AttoparsecInput a) => ParserStatus a -> p (ParserStatus a) (ParserSupply a) (ParserStatus a) (ParserSupply a) m r
Just like we said, values flow upstream and ParserStatus a values flow downstream.
ParserSupply
a
Now to a simple implementation: If we receive from downstream,
then we know there is a partial parsing result waiting for more input. If we
were to respond to this request with Parsing n(, then the partial parser
would continue consuming input, but if we change our response to Resume, a)(,
then the partial parser would be aborted and a new parser would start
consuming the given input. A simple implementation is quite straightforward:
Start, a)
skipPartialResults = runIdentityK . foreverK $ go
where go x@(Parsing _) = request x >>= \(_, a) -> respond (Start, a)
go x = request x >>= respond
We forward upstream the requests we get from downstream. However, in the case
of a status, we replace with Parsing nStart the first value in the
pair we receive from upstream before responding.
ParserSupply a
Now we can use this parsing control Proxy with some simple input and see it
working.
helloPipe6 :: (Proxy p, Monad m) => () -> Pipe p Text Name m r helloPipe6 = parserInputD >-> skipPartialResults >-> parserD hello
>>>runProxy $ fromListS input1 >-> helloPipe6 >-> printDName "Kate" Name "Mary"
Try for yourself
This module exports the following previous examples so that you can try them.
skipPartialResults :: (Monad m, Proxy p, AttoparsecInput a) => ParserStatus a -> p (ParserStatus a) (ParserSupply a) (ParserStatus a) (ParserSupply a) m rSource