streamly-0.7.3.1: Beautiful Streaming, Concurrent and Reactive Composition
Copyright(c) 2020 Composewell Technologies
LicenseBSD3
Maintainerstreamly@composewell.com
Stabilityexperimental
PortabilityGHC
Safe HaskellSafe-Inferred
LanguageHaskell2010

Streamly.Internal.Data.Parser.Types

Description

Streaming and backtracking parsers.

Parsers just extend folds. Please read the Fold design notes in Streamly.Internal.Data.Fold.Types for background on the design.

Parser Design

The Parser type or a parsing fold is a generalization of the Fold type. The Fold type always succeeds on each input. Therefore, it does not need to buffer the input. In contrast, a Parser may fail and backtrack to replay the input again to explore another branch of the parser. Therefore, it needs to buffer the input. Therefore, a Parser is a fold with some additional requirements. To summarize, unlike a Fold, a Parser:

  1. may not generate a new value of the accumulator on every input, it may generate a new accumulator only after consuming multiple input elements (e.g. takeEQ).
  2. on success may return some unconsumed input (e.g. takeWhile)
  3. may fail and return all input without consuming it (e.g. satisfy)
  4. backtrack and start inspecting the past input again (e.g. alt)

These use cases require buffering and replaying of input. To facilitate this, the step function of the Fold is augmented to return the next state of the fold along with a command tag using a Step functor, the tag tells the fold driver to manipulate the future input as the parser wishes. The Step functor provides the following commands to the fold driver corresponding to the use cases outlined in the previous para:

  1. Skip: hold (buffer) the input or go back to a previous position in the stream
  2. Yield, Stop: tell how much input is unconsumed
  3. Error: indicates that the parser has failed without a result

How a Parser Works?

A parser is just like a fold, it keeps consuming inputs from the stream and accumulating them in an accumulator. The accumulator of the parser could be a singleton value or it could be a collection of values e.g. a list.

The parser may build a new output value from multiple input items. When it consumes an input item but needs more input to build a complete output item it uses Skip 0 s, yielding the intermediate state s and asking the driver to provide more input. When the parser determines that a new output value is complete it can use a Stop n b to terminate the parser with n items of input unused and the final value of the accumulator returned as b. If at any time the parser determines that the parse has failed it can return Error err.

A parser building a collection of values (e.g. a list) can use the Yield constructor whenever a new item in the output collection is generated. If a parser building a collection of values has yielded at least one value then it considered successful and cannot fail after that. In the current implementation, this is not automatically enforced, there is a rule that the parser MUST use only Stop for termination after the first Yield, it cannot use Error. It may be possible to change the implementation so that this rule is not required, but there may be some performance cost to it.

takeWhile and some combinators are good examples of efficient implementations using all features of this representation. It is possible to idiomatically build a collection of parsed items using a singleton parser and Alternative instance instead of using a multi-yield parser. However, this implementation is amenable to stream fusion and can therefore be much faster.

Error Handling

When a parser's step function is invoked it may iterminate by either a Stop or an Error return value. In an Alternative composition an error return can make the composed parser backtrack and try another parser.

If the stream stops before a parser could terminate then we use the extract function of the parser to retrieve the last yielded value of the parser. If the parser has yielded at least one value then extract MUST return a value without throwing an error, otherwise it uses the ParseError exception to throw an error.

We chose the exception throwing mechanism for extract instead of using an explicit error return via an Either type for keeping the interface simple as most of the time we do not need to catch the error in intermediate layers. Note that we cannot use exception throwing mechanism in step function because of performance reasons. Error constructor in that case allows loop fusion and better performance.

Future Work

It may make sense to move "takeWhile" type of parsers, which cannot fail but need some lookahead, to splitting folds. This will allow such combinators to be accepted where we need an unfailing Fold type.

Based on application requirements it should be possible to design even a richer interface to manipulate the input stream/buffer. For example, we could randomly seek into the stream in the forward or reverse directions or we can even seek to the end or from the end or seek from the beginning.

We can distribute and scan/parse a stream using both folds and parsers and merge the resulting streams using different merge strategies (e.g. interleaving or serial).

Synopsis

Documentation

data Step s b Source #

The return type of a Parser step.

A parser is driven by a parse driver one step at a time, at any time the driver may extract the result of the parser. The parser may ask the driver to backtrack at any point, therefore, the driver holds the input up to a point of no return in a backtracking buffer. The buffer grows or shrinks based on the return values of the parser step execution.

When a parser step is executed it generates a new intermediate state of the parse result along with a command to the driver. The command tells the driver whether to keep the input stream for a potential backtracking later on or drop it, and how much to keep. The constructors of Step represent the commands to the driver.

Internal

Constructors

Yield Int s

Yield offset state indicates that the parser has yielded a new result which is a point of no return. The result can be extracted using extract. The driver drops the buffer except offset elements before the current position in stream. The rule is that if a parser has yielded at least once it cannot return a failure result.

Skip Int s

Skip offset state indicates that the parser has consumed the current input but no new result has been generated. A new state is generated. However, if we use extract on state it will generate a result from the previous Yield. When offset is non-zero it is a backward offset from the current position in the stream from which the driver will feed the next input to the parser. The offset cannot be beyond the latest point of no return created by Yield.

Stop Int b

Stop offset result asks the driver to stop driving the parser because it has reached a fixed point and further input will not change the result. offset is the count of unused elements which includes the element on which Stop occurred.

Error String

An error makes the parser backtrack to the last checkpoint and try another alternative.

Instances

Instances details
Functor (Step s) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

fmap :: (a -> b) -> Step s a -> Step s b #

(<$) :: a -> Step s b -> Step s a #

data Parser m a b Source #

A parser is a fold that can fail and is represented as Parser step initial extract. Before we drive a parser we call the initial action to retrieve the initial state of the fold. The parser driver invokes step with the state returned by the previous step and the next input element. It results into a new state and a command to the driver represented by Step type. The driver keeps invoking the step function until it stops or fails. At any point of time the driver can call extract to inspect the result of the fold. It may result in an error or an output value.

Internal

Constructors

forall s. Parser (s -> a -> m (Step s b)) (m s) (s -> m b) 

Instances

Instances details
MonadCatch m => Alternative (Parser m a) Source #

Alternative instance using alt.

Note: The implementation of <|> is not lazy in the second argument. The following code will fail:

>>> S.parse (PR.satisfy (> 0) <|> undefined) $ S.fromList [1..10]
Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

empty :: Parser m a a0 #

(<|>) :: Parser m a a0 -> Parser m a a0 -> Parser m a a0 #

some :: Parser m a a0 -> Parser m a [a0] #

many :: Parser m a a0 -> Parser m a [a0] #

Monad m => Applicative (Parser m a) Source #

Applicative form of splitWith.

Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

pure :: a0 -> Parser m a a0 #

(<*>) :: Parser m a (a0 -> b) -> Parser m a a0 -> Parser m a b #

liftA2 :: (a0 -> b -> c) -> Parser m a a0 -> Parser m a b -> Parser m a c #

(*>) :: Parser m a a0 -> Parser m a b -> Parser m a b #

(<*) :: Parser m a a0 -> Parser m a b -> Parser m a a0 #

Functor m => Functor (Parser m a) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

fmap :: (a0 -> b) -> Parser m a a0 -> Parser m a b #

(<$) :: a0 -> Parser m a b -> Parser m a a0 #

Monad m => Monad (Parser m a) Source #

Monad composition can be used for lookbehind parsers, we can make the future parses depend on the previously parsed values.

If we have to parse "a9" or "9a" but not "99" or "aa" we can use the following parser:

backtracking :: MonadCatch m => PR.Parser m Char String
backtracking =
    sequence [PR.satisfy isDigit, PR.satisfy isAlpha]
    <|>
    sequence [PR.satisfy isAlpha, PR.satisfy isDigit]

We know that if the first parse resulted in a digit at the first place then the second parse is going to fail. However, we waste that information and parse the first character again in the second parse only to know that it is not an alphabetic char. By using lookbehind in a Monad composition we can avoid redundant work:

data DigitOrAlpha = Digit Char | Alpha Char

lookbehind :: MonadCatch m => PR.Parser m Char String
lookbehind = do
    x1 <-    Digit <$> PR.satisfy isDigit
         <|> Alpha <$> PR.satisfy isAlpha

    -- Note: the parse depends on what we parsed already
    x2 <- case x1 of
        Digit _ -> PR.satisfy isAlpha
        Alpha _ -> PR.satisfy isDigit

    return $ case x1 of
        Digit x -> [x,x2]
        Alpha x -> [x,x2]
Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

(>>=) :: Parser m a a0 -> (a0 -> Parser m a b) -> Parser m a b #

(>>) :: Parser m a a0 -> Parser m a b -> Parser m a b #

return :: a0 -> Parser m a a0 #

MonadCatch m => MonadPlus (Parser m a) Source #

mzero is same as empty, it aborts the parser. mplus is same as <|>, it selects the first succeeding parser.

Internal

Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

mzero :: Parser m a a0 #

mplus :: Parser m a a0 -> Parser m a a0 -> Parser m a a0 #

newtype ParseError Source #

This exception is used for two purposes:

  • When a parser ultimately fails, the user of the parser is intimated via this exception.
  • When the "extract" function of a parser needs to throw an error.

Internal

Constructors

ParseError String 

yield :: Monad m => b -> Parser m a b Source #

A parser that always yields a pure value without consuming any input.

Internal

yieldM :: Monad m => m b -> Parser m a b Source #

A parser that always yields the result of an effectful action without consuming any input.

Internal

splitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #

Sequential application. Apply two parsers sequentially to an input stream. The input is provided to the first parser, when it is done the remaining input is provided to the second parser. If both the parsers succeed their outputs are combined using the supplied function. The operation fails if any of the parsers fail.

This undoes an "append" of two streams, it splits the streams using two parsers and zips the results.

This implementation is strict in the second argument, therefore, the following will fail:

>>> S.parse (PR.satisfy (> 0) *> undefined) $ S.fromList [1]

Internal

die :: MonadThrow m => String -> Parser m a b Source #

A parser that always fails with an error message without consuming any input.

Internal

dieM :: MonadThrow m => m String -> Parser m a b Source #

A parser that always fails with an effectful error message and without consuming any input.

Internal

splitSome :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c Source #

See documentation of some.

Internal

splitMany :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c Source #

See documentation of many.

Internal

alt :: Monad m => Parser m x a -> Parser m x a -> Parser m x a Source #

Sequential alternative. Apply the input to the first parser and return the result if the parser succeeds. If the first parser fails then backtrack and apply the same input to the second parser and return the result.

Note: This implementation is not lazy in the second argument. The following will fail:

>>> S.parse (PR.satisfy (> 0) `PR.alt` undefined) $ S.fromList [1..10]

Internal