{-| This module provides the tutorial for the @pipes-parse@ library.  This
    tutorial assumes that you have read the @pipes@ tutorial in
    @Control.Proxy.Tutorial@.
-}

module Control.Proxy.Parse.Tutorial (
    -- * Introduction
    -- $introduction

    -- * End of input
    -- $eof

    -- * Compatibility
    -- $compatibility

    -- * Pushback and leftovers
    -- $leftovers

    -- * Diverse leftovers
    -- $diverse

    -- * Isolating leftovers
    -- $mix

    -- * Return value
    -- $return

    -- * Resumable parsing
    -- $resume

    -- * Nesting
    -- $nesting

    -- * Conclusion
    -- $conclusion
    ) where

import Control.Proxy
import Control.Proxy.Parse

{- $introduction
    @pipes-parse@ provides utilities commonly required for parsing streams
    using @pipes@:

    * End-of-input utilities and conventions for the @pipes@ ecosystem

    * Pushback and leftovers support for saving unused input

    * Tools to combine parsing stages with diverse or isolated leftovers
      buffers

    * Ways to delimit parsers to subsets of streams

    Use these utilities to parse and validate streaming input in constant
    memory.
-}

{- $eof
    To guard an input stream against termination, protect it with the 'wrap'
    function:

> wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s

    This wraps all output values in a 'Just' and then protects against
    termination by producing a never-ending stream of 'Nothing' values:

>>> -- Before
>>> runProxy $ enumFromToS 1 3 >-> printD
1
2
3
>>> -- After
>>> runProxy $ wrap . enumFromToS 1 3 >-> printD
Just 1
Just 2
Just 3
Nothing
Nothing
Nothing
Nothing
...

    You can also 'unwrap' streams:

> unwrap :: (Monad m, Proxy p) => x -> p x (Maybe a) x a m ()

    'unwrap' behaves like the inverse of 'wrap'.  Compose 'unwrap' downstream
    of a pipe to unwrap every 'Just' and terminate on the first 'Nothing':

> wrap . p >-> unwrap = p

    You will commonly use 'unwrap' to terminate an otherwise infinite stream:

>>> runProxy $ wrap . enumFromToS 1 3 >-> printD >-> unwrap
Just 1
Just 2
Just 3
Nothing

-}

{- $compatibility
    What if we want to ignore the 'Maybe' machinery entirely and interact
    with the original unwrapped stream?  We can use 'fmapPull' to lift
    existing proxies to ignore all 'Nothing's and only operate on the
    'Just's:

> fmapPull
>     :: (Monad m, Proxy p)
>     => (x -> p x        a  x        b  m r)
>     -> (x -> p x (Maybe a) x (Maybe b) m r)

    We can use this to lift 'printD' to operate on the original stream:

>>> runProxy $ wrap . enumFromToS 1 3 >-> fmapPull printD >-> unwrap
1
2
3

    This lifting cleanly distributes over composition and obeys the following
    laws:

> fmapPull (f >-> g) = fmapPull f >-> fmapPull g
>
> fmapPull pull = pull

    You can navigate even more complicated mixtures of 'Maybe'-aware and
    'Maybe'-oblivious code using 'bindPull' and 'returnPull'.  @pipes-parse@
    requires no buy-in from the rest of the @pipes@ ecosystem thanks to these
    adapter routines, which automatically lift existing pipes to interoperate
    with end-of-input protocols.
-}

{- $leftovers
    To take advantage of leftovers support, just replace your 'request's with
    'draw':

> draw :: (Monad m, Proxy p) => StateP [a] p () (Maybe a) y' y m (Maybe a)

    ... and use 'unDraw' to push back leftovers:

> unDraw :: (Monad m, Proxy p) => a -> StateP [a] p x' x y' y m ()

    These both use a last-in-first-out (LIFO) leftovers buffer of type @[a]@
    stored in a 'StateP' layer.
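    As a rough mental model of that buffer discipline, here is a pipes-free
    sketch in plain Haskell.  The names @Buffer@, @drawModel@, and
    @unDrawModel@ are hypothetical and are not part of the @pipes-parse@ API;
    the buffer is simply a list used as a stack in front of the remaining
    input:

```haskell
-- Hypothetical stand-alone model of the leftovers discipline; these names
-- are illustrative only and do not belong to the pipes-parse API.
type Buffer a = [a]

-- Plays the role of 'draw': consume the buffer first, then fresh input,
-- and report 'Nothing' once both are exhausted.
drawModel :: (Buffer a, [a]) -> (Maybe a, (Buffer a, [a]))
drawModel (b : bs, xs    ) = (Just b , (bs, xs))
drawModel ([]    , x : xs) = (Just x , ([], xs))
drawModel ([]    , []    ) = (Nothing, ([], []))

-- Plays the role of 'unDraw': prepend a leftover, giving LIFO order.
unDrawModel :: a -> (Buffer a, [a]) -> (Buffer a, [a])
unDrawModel a (bs, xs) = (a : bs, xs)

main :: IO ()
main = do
    let s0       = ([], [1 ..]) :: (Buffer Int, [Int])
        (m1, s1) = drawModel s0                       -- Just 1, from input
        s2       = unDrawModel 1 (unDrawModel 99 s1)  -- buffer is now [1, 99]
        (m2, s3) = drawModel s2                       -- Just 1, from buffer
        (m3, _ ) = drawModel s3                       -- Just 99, from buffer
    mapM_ print [m1, m2, m3]
```

    Running @main@ prints @Just 1@, @Just 1@, and @Just 99@, mirroring what
    the real 'draw' and 'unDraw' do.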
    'unDraw' prepends elements to this list of leftovers, and 'draw' consumes
    elements from the head of the leftovers list until it is empty before
    requesting new input from upstream:

> consumer :: (Proxy p) => () -> Consumer (StateP [Int] p) (Maybe Int) IO ()
> consumer () = do
>     ma <- draw
>     lift $ print ma
>     -- You can push back values you never drew
>     unDraw 99
>     -- You can push back more than one value at a time
>     case ma of
>         Nothing -> return ()
>         -- The leftovers buffer only stores unwrapped values
>         Just a  -> unDraw a
>     -- Values come out of the buffer in last-in-first-out (LIFO) order
>     replicateM_ 2 $ do
>         ma <- draw
>         lift $ print ma

    To run the 'StateP' layer, just provide an empty initial state using
    'mempty':

>>> runProxy $ evalStateK mempty $ wrap . enumFromS 1 >-> consumer
Just 1
Just 1
Just 99

-}

{- $diverse
    Why use 'mempty' instead of @[]@?  @pipes-parse@ lets you easily mix
    distinct leftovers buffers into the same 'StateP' layer, and 'mempty'
    will still do the correct thing when you use multiple buffers.

    For example, suppose that we need to compose parsing pipes that have
    different input types and therefore different types of leftovers buffers,
    such as the following two parsers:

> tallyLength
>     :: (Monad m, Proxy p)
>     => () -> Pipe (StateP [String] p) (Maybe String) (Maybe Int) m r
> tallyLength () = loop 0
>   where
>     loop tally = do
>         respond (Just tally)
>         mstr <- draw
>         case mstr of
>             Nothing  -> forever $ respond Nothing
>             Just str -> loop (tally + length str)
>
> adder
>     :: (Monad m, Proxy p)
>     => () -> Consumer (StateP [Int] p) (Maybe Int) m Int
> adder () = fmap sum $ drawAll ()

    We can use 'zoom' to unify these two parsers to share the same 'StateP'
    layer:

> combined
>     :: (Monad m, Proxy p)
>     => () -> Consumer (StateP ([String], [Int]) p) (Maybe String) m Int
>     --                         ^         ^
>     --                         |         |
>     -- Two leftovers buffers --+---------+
> combined = zoom _fst . tallyLength >-> zoom _snd . adder
>
> source :: (Monad m, Proxy p) => () -> Producer p String m ()
> source = fromListS ["One", "Two", "Three"]

    'zoom' takes a @Lens'@ as an argument, which specifies the subset of the
    state that each parser will use.  '_fst' directs the @tallyLength@ parser
    to use the @[String]@ leftovers buffer and '_snd' directs the @adder@
    parser to use the @[Int]@ leftovers buffer.

    Notice that we can still run the mixture of buffers by supplying
    'mempty':

>>> runProxy $ evalStateK mempty $ wrap . source >-> combined
20

    This works because:

> (mempty :: ([String], [Int])) = ([], [])

    Let's study the type of 'zoom' to understand how it works:

> -- zoom's true type is slightly different to avoid a dependency on @lens@
> zoom :: Lens' s1 s2 -> StateP s2 p a' a b' b m r -> StateP s1 p a' a b' b m r

    'zoom' behaves like the function of the same name from the @lens@ package
    and zooms in on a sub-state using the provided lens.  When we give it the
    '_fst' lens, we zoom in on the first element of a tuple:

> _fst :: Lens' (s1, s2) s1
>
> zoom _fst :: StateP s1 p a' a b' b m r -> StateP (s1, s2) p a' a b' b m r

    ... and when we give it the '_snd' lens, we zoom in on the second element
    of a tuple:

> _snd :: Lens' (s1, s2) s2
>
> zoom _snd :: StateP s2 p a' a b' b m r -> StateP (s1, s2) p a' a b' b m r

    '_fst' and '_snd' are like '_1' and '_2' from the @lens@ package, except
    with more monomorphic types.  This ensures that type inference works
    correctly when supplying 'mempty' as the initial state.

    If you want to merge more than one leftovers buffer, you can either nest
    pairs of tuples:

> p = zoom _fst . p1 >-> zoom (_snd . _fst) . p2 >-> zoom (_snd . _snd) . p3

    ... or you can create a data type that holds all your leftovers and
    generate lenses to its fields:

> import Control.Lens hiding (zoom)
>
> data Leftovers = Leftovers
>     { _buf1 :: [String]
>     , _buf2 :: [Int]
>     , _buf3 :: [Double]
>     }
> makeLenses ''Leftovers
> -- Generates:
> -- buf1 :: Lens' Leftovers [String]
> -- buf2 :: Lens' Leftovers [Int]
> -- buf3 :: Lens' Leftovers [Double]
>
> instance Monoid Leftovers where
>     mempty = Leftovers [] [] []
>     mappend (Leftovers as bs cs) (Leftovers as' bs' cs')
>         = Leftovers (as ++ as') (bs ++ bs') (cs ++ cs')
>
> p = zoom buf1 . p1 >-> zoom buf2 . p2 >-> zoom buf3 . p3

    'zoom' works seamlessly with all lenses from the @lens@ package, but you
    don't need a @lens@ dependency to use @pipes-parse@.
-}

{- $mix
    'zoom' isn't the only way to isolate buffers.  Let's say that you want to
    mix the following three @pipes-parse@ utilities:

> -- Transmit up to the specified number of elements
> passUpTo
>     :: (Monad m, Proxy p)
>     => Int -> () -> Pipe (StateP [a] p) (Maybe a) (Maybe a) m r
>
> -- Fold all input into a list
> drawAll :: (Monad m, Proxy p) => () -> StateP [a] p () (Maybe a) y' y m [a]
>
> -- Check if at end of input stream
> isEndOfInput :: (Monad m, Proxy p) => StateP [a] p () (Maybe a) y' y m Bool

    We might expect the following code to yield chunks of three elements at a
    time:

> chunks :: (Monad m, Proxy p) => () -> Pipe (StateP [a] p) (Maybe a) [a] m ()
> chunks () = loop
>   where
>     loop = do
>         as <- (passUpTo 3 >-> drawAll) ()
>         respond as
>         eof <- isEndOfInput
>         unless eof loop

    ... but it doesn't:

>>> runProxy $ evalStateK mempty $ wrap . enumFromToS 1 15 >-> chunks >-> printD
[1,2,3]
[4,5,6,7]
[8,9,10,11]
[12,13,14,15]

    @chunks@ behaves strangely because 'drawAll' shares the same leftovers
    buffer as 'passUpTo' and 'isEndOfInput'.  After the first chunk
    completes, 'isEndOfInput' peeks at the next value, @4@, and immediately
    'unDraw's the value.
    'drawAll' retrieves this undrawn value from the leftovers buffer before
    consulting 'passUpTo', which is why every subsequent list contains an
    extra element.

    We often don't want composed parsing stages like 'drawAll' to share the
    same leftovers buffer as upstream stages, but we also don't want to use
    'zoom' to add yet another permanent buffer to our global leftovers state.
    To solve this, we embed 'drawAll' within a transient 'StateP' layer using
    'evalStateK':

> chunks () = loop
>   where
>     loop = do
>         as <- (passUpTo 3 >-> evalStateK mempty drawAll) ()
>         respond as
>         eof <- isEndOfInput
>         unless eof loop

    This runs 'drawAll' within a fresh temporary buffer so that it does not
    reuse the same buffer as the surrounding pipe:

>>> runProxy $ evalStateK mempty $ wrap . enumFromToS 1 15 >-> chunks >-> printD
[1,2,3]
[4,5,6]
[7,8,9]
[10,11,12]
[13,14,15]

    Conversely, remove the 'evalStateK' if you deliberately want downstream
    parsers to share the same leftovers buffers.
-}

{- $return
    'wrap' allows you to return values directly from parsers because it
    produces a polymorphic return value:

> -- The 's' is polymorphic and will type-check as anything
> wrap :: (Monad m, Proxy p) => p a' a b' b m r -> p a' a b' (Maybe b) m s

    This means that if you compose a parser downstream, the parser can return
    the result directly:

> parser
>     :: (Monad m, Proxy p)
>     => () -> Consumer (StateP [a] p) (Maybe a) m (Maybe a, Maybe a)
> parser () = do
>     mx <- draw
>     my <- draw
>     return (mx, my)  -- Return the result

    The polymorphic return value of 'wrap' will type-check as anything,
    including our parser's result:

> session
>     :: (Monad m, Proxy p)
>     => () -> Session (StateP [Int] p) m (Maybe Int, Maybe Int)
> session = wrap . enumFromToS 0 9 >-> parser

    So we can run this 'Session' and retrieve the result directly from the
    return value:

>>> runProxy $ evalStateK mempty session
(Just 0,Just 1)

-}

{- $resume
    You can save leftovers buffers if you need to interrupt parsing for any
    reason.
    Just replace 'evalStateK' with 'runStateK':

>>> let session = wrap . enumFromS 0 >-> passWhile (< 3) >-> printD >-> unwrap
>>> runProxy $ runStateK mempty session
Just 0
Just 1
Just 2
Nothing
((), [3])

    This returns the leftovers buffers in the result so that you can reuse
    them later on.  In the above example, 'passWhile' pushed back the @3@
    input onto the leftovers buffer, so the result includes the unused @3@.
-}

{- $nesting
    @pipes-parse@ allows you to cleanly delimit the scope of sub-parsers by
    restricting them to a subset of the stream, as the following example
    illustrates:

> import Control.Proxy
> import Control.Proxy.Parse
>
> parser
>     :: (Proxy p)
>     => () -> Consumer (StateP [Int] p) (Maybe Int) IO ([Int], [Int])
> parser () = do
>     lift $ putStrLn "Skip the first three elements"
>     (passUpTo 3 >-> evalStateK mempty skipAll) ()
>     lift $ putStrLn "Restrict subParser to consecutive elements less than 10"
>     (passWhile (< 10) >-> evalStateK mempty subParser) ()
>
> subParser
>     :: (Proxy p)
>     => () -> Consumer (StateP [Int] p) (Maybe Int) IO ([Int], [Int])
> subParser () = do
>     lift $ putStrLn "- Get the next four elements"
>     xs <- (passUpTo 4 >-> evalStateK mempty drawAll) ()
>     lift $ putStrLn "- Get the rest of the input"
>     ys <- drawAll ()
>     return (xs, ys)

    Notice how we use 'evalStateK' each time we subset a parser so that the
    sub-parser uses a fresh and transient leftovers buffer.

>>> runProxy $ evalStateK mempty $ wrap . enumFromS 0 >-> parser
Skip the first three elements
Restrict subParser to consecutive elements less than 10
- Get the next four elements
- Get the rest of the input
([3,4,5,6],[7,8,9])

-}

{- $conclusion
    @pipes-parse@ provides standardized end-of-input and leftovers utilities
    for you to use in your @pipes@-based libraries.
    Unlike other streaming libraries, you can:

    * mix or isolate leftovers buffers in a precise and type-safe way,

    * easily delimit parsers to subsets of the input, and

    * ignore standardization, thanks to compatibility functions like
      'fmapPull'.

    This library is intentionally minimal; datatype-specific parsers belong
    in derived libraries.  This makes @pipes-parse@ a very lightweight and
    stable dependency that you can use in your own projects.

    You can ask any questions about @pipes-parse@ and other @pipes@ libraries
    on the official @pipes@ mailing list at
    <mailto:haskell-pipes@googlegroups.com>.
-}