streamly-core-0.1.0: Streaming, parsers, arrays and more
Copyright(c) 2020 Composewell Technologies
LicenseBSD-3-Clause
Maintainerstreamly@composewell.com
Stabilityexperimental
PortabilityGHC
Safe HaskellNone
LanguageHaskell2010

Streamly.Internal.Data.Parser.ParserD.Type

Description

Streaming and backtracking parsers.

Parsers just extend folds. Please read the Fold design notes in Streamly.Internal.Data.Fold.Type for background on the design.

Parser Design

The Parser type or a parsing fold is a generalization of the Fold type. The Fold type always succeeds on each input. Therefore, it does not need to buffer the input. In contrast, a Parser may fail and backtrack to replay the input again to explore another branch of the parser. Therefore, it needs to buffer the input. Therefore, a Parser is a fold with some additional requirements. To summarize, unlike a Fold, a Parser:

  1. may not generate a new value of the accumulator on every input, it may generate a new accumulator only after consuming multiple input elements (e.g. takeEQ).
  2. on success may return some unconsumed input (e.g. takeWhile)
  3. may fail and return all input without consuming it (e.g. satisfy)
  4. backtrack and start inspecting the past input again (e.g. alt)

These use cases require buffering and replaying of input. To facilitate this, the step function of the Fold is augmented to return the next state of the fold along with a command tag using a Step functor, the tag tells the fold driver to manipulate the future input as the parser wishes. The Step functor provides the following commands to the fold driver corresponding to the use cases outlined in the previous para:

  1. Continue: buffer the current input and optionally go back to a previous position in the stream
  2. Partial: buffer the current input and optionally go back to a previous position in the stream, drop the buffer before that position.
  3. Done: parser succeeded, returns how much input was leftover
  4. Error: indicates that the parser has failed without a result

How a Parser Works?

A parser is just like a fold, it keeps consuming inputs from the stream and accumulating them in an accumulator. The accumulator of the parser could be a singleton value or it could be a collection of values e.g. a list.

The parser may build a new output value from multiple input items. When it consumes an input item but needs more input to build a complete output item it uses Continue 0 s, yielding the intermediate state s and asking the driver to provide more input. When the parser determines that a new output value is complete it can use a Done n b to terminate the parser with n items of input unused and the final value of the accumulator returned as b. If at any time the parser determines that the parse has failed it can return Error err.

A parser building a collection of values (e.g. a list) can use the Partial constructor whenever a new item in the output collection is generated. If a parser building a collection of values has yielded at least one value then it is considered successful and cannot fail after that. In the current implementation, this is not automatically enforced, there is a rule that the parser MUST use only Done for termination after the first Partial, it cannot use Error. It may be possible to change the implementation so that this rule is not required, but there may be some performance cost to it.

takeWhile and some combinators are good examples of efficient implementations using all features of this representation. It is possible to idiomatically build a collection of parsed items using a singleton parser and Alternative instance instead of using a multi-yield parser. However, this implementation is amenable to stream fusion and can therefore be much faster.

Error Handling

When a parser's step function is invoked it may terminate by either a Done or an Error return value. In an Alternative composition an error return can make the composed parser backtrack and try another parser.

If the stream stops before a parser could terminate then we use the extract function of the parser to retrieve the last yielded value of the parser. If the parser has yielded at least one value then extract MUST return a value without throwing an error, otherwise it uses the ParseError exception to throw an error.

We chose the exception throwing mechanism for extract instead of using an explicit error return via an Either type for keeping the interface simple as most of the time we do not need to catch the error in intermediate layers. Note that we cannot use exception throwing mechanism in step function because of performance reasons. Error constructor in that case allows loop fusion and better performance.

Optimizing backtracking

Applicative Composition

If a parser once returned Partial it can never fail after that. This is used to reduce the buffering. A Partial results in dropping the buffer and we cannot backtrack before that point.

Parsers can be composed using an Alternative, if we are in an alternative composition we may have to backtrack to try the other branch. When we compose two parsers using applicative f $ p1 * p2 we can return a Partial result only after both the parsers have succeeded. While running p1 we have to ensure that the input is not dropped until we have run p2, therefore we have to return a Continue instead of a Partial.

However, if we know they both cannot fail then we know that the composed parser can never fail. For this reason we should have "backtracking folds" as a separate type so that we can compose them in an efficient manner. In p1 itself we can drop the buffer as soon as a Partial result arrives. In fact, there is no Alternative composition for folds because they cannot fail.

Alternative Composition

In p1 | p2 as soon as the parser p1 returns Partial we know that it will not fail and we can immediately drop the buffer.

If we are not using the parser in an alternative composition we can downgrade the parser to a backtracking fold and use the "backtracking fold"'s applicative for more efficient implementation. To downgrade we can translate the Error of parser to an exception. This gives us best of both worlds, the applicative as well as alternative would have optimal backtracking buffer.

The "many" for parsers would be different than "many" for folds. In case of folds an error would be propagated. In case of parsers the error would be ignored.

Implementation Approach

Backtracking folds have an issue with tee style composition because each fold can backtrack independently, we will need independent buffers. Though this may be possible to implement it may not be efficient especially for folds that do not backtrack at all. Three types are possible, optimized for different use cases:

  • Non-backtracking folds: efficient Tee
  • Backtracking folds: efficient applicative
  • Parsers: alternative

Downgrade parsers to backtracking folds for applicative used without alternative. Upgrade backtracking folds to parsers when we have to use them as the last alternative.

Future Work

It may make sense to move "takeWhile" type of parsers, which cannot fail but need some lookahead, to splitting folds. This will allow such combinators to be accepted where we need an unfailing Fold type.

Based on application requirements it should be possible to design even a richer interface to manipulate the input stream/buffer. For example, we could randomly seek into the stream in the forward or reverse directions or we can even seek to the end or from the end or seek from the beginning.

We can distribute and scan/parse a stream using both folds and parsers and merge the resulting streams using different merge strategies (e.g. interleaving or serial).

Naming

As far as possible, try that the names of the combinators in this module are consistent with:

Synopsis

Setup

>>> :m
>>> import Control.Applicative ((<|>))
>>> import Data.Bifunctor (second)
>>> import Data.Char (isSpace)
>>> import qualified Data.Foldable as Foldable
>>> import qualified Data.Maybe as Maybe
>>> import Streamly.Data.Fold (Fold)
>>> import Streamly.Data.Parser (Parser)
>>> import qualified Streamly.Data.Fold as Fold
>>> import qualified Streamly.Data.Parser as Parser
>>> import qualified Streamly.Data.Stream as Stream

For APIs that have not been released yet.

>>> import qualified Streamly.Internal.Data.Fold as Fold
>>> import qualified Streamly.Internal.Data.Parser as Parser

Types

data Initial s b Source #

The type of a Parser's initial action.

Internal

Constructors

IPartial !s

Wait for step function to be called with state s.

IDone !b

Return a result right away without an input.

IError !String

Return an error right away without an input.

Instances

Instances details
Bifunctor Initial Source #

first maps on IPartial and second maps on IDone.

Internal

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

bimap :: (a -> b) -> (c -> d) -> Initial a c -> Initial b d #

first :: (a -> b) -> Initial a c -> Initial b c #

second :: (b -> c) -> Initial a b -> Initial a c #

Functor (Initial s) Source #

Maps a function over the result held by IDone.

>>> fmap = second

Internal

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

fmap :: (a -> b) -> Initial s a -> Initial s b #

(<$) :: a -> Initial s b -> Initial s a #

data Step s b Source #

The return type of a Parser step.

The parse operation feeds the input stream to the parser one element at a time, representing a parse Step. The parser may or may not consume the item and returns a result. If the result is Partial we can either extract the result or feed more input to the parser. If the result is Continue, we must feed more input in order to get a result. If the parser returns Done then the parser can no longer take any more input.

If the result is Continue, the parse operation retains the input in a backtracking buffer, in case the parser may ask to backtrack in future. Whenever a 'Partial n' result is returned we first backtrack by n elements in the input and then release any remaining backtracking buffer. Similarly, 'Continue n' backtracks to n elements before the current position and starts feeding the input from that point for future invocations of the parser.

If parser is not yet done, we can use the extract operation on the state of the parser to extract a result. If the parser has not yet yielded a result, the operation fails with a ParseError exception. If the parser yielded a Partial result in the past the last partial result is returned. Therefore, if a parser yields a partial result once it cannot fail later on.

The parser can never backtrack beyond the position where the last partial result left it at. The parser must ensure that the backtrack position is always after that.

Pre-release

Constructors

Partial !Int !s

Partial count state. The following hold on Partial result:

  1. extract on state would succeed and give a result.
  2. Input stream position is reset to current position - count.
  3. All input before the new position is dropped. The parser can never backtrack beyond this position.
Continue !Int !s

Continue count state. The following hold on a Continue result:

  1. If there was a Partial result in past, extract on state would give that result as Done otherwise it may return Error or Continue.
  2. Input stream position is reset to current position - count.
  3. the input is retained in a backtrack buffer.
Done !Int !b

Done with leftover input count and result.

Done count result means the parser has finished, it will accept no more input, last count elements from the input are unused and the result of the parser is in result.

Error !String

Parser failed without generating any output.

The parsing operation may backtrack to the beginning and try another alternative.

Instances

Instances details
Bifunctor Step Source #

Map first function over the state and second over the result.

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

bimap :: (a -> b) -> (c -> d) -> Step a c -> Step b d #

first :: (a -> b) -> Step a c -> Step b c #

second :: (b -> c) -> Step a b -> Step a c #

Functor (Step s) Source #

fmap = second

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

fmap :: (a -> b) -> Step s a -> Step s b #

(<$) :: a -> Step s b -> Step s a #

extractStep :: Monad m => (s -> m (Step s1 b)) -> Step s b -> m (Step s1 b) Source #

Map an extract function over the state of Step

bimapOverrideCount :: Int -> (s -> s1) -> (b -> b1) -> Step s b -> Step s1 b1 Source #

Bimap discarding the count, and using the supplied count instead.

data Parser a m b Source #

A parser is a fold that can fail and is represented as Parser step initial extract. Before we drive a parser we call the initial action to retrieve the initial state of the fold. The parser driver invokes step with the state returned by the previous step and the next input element. It results into a new state and a command to the driver represented by Step type. The driver keeps invoking the step function until it stops or fails. At any point of time the driver can call extract to inspect the result of the fold. If the parser hits the end of input extract is called. It may result in an error or an output value.

Pre-release

Constructors

forall s. Parser (s -> a -> m (Step s b)) (m (Initial s b)) (s -> m (Step s b)) 

Instances

Instances details
Monad m => Monad (Parser a m) Source #

See documentation of Parser.

Although this implementation allows stream fusion, it has quadratic complexity, making it suitable only for a small number of compositions. As a thumb rule use it for less than 8 compositions, use ParserK otherwise.

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

(>>=) :: Parser a m a0 -> (a0 -> Parser a m b) -> Parser a m b #

(>>) :: Parser a m a0 -> Parser a m b -> Parser a m b #

return :: a0 -> Parser a m a0 #

Functor m => Functor (Parser a m) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

fmap :: (a0 -> b) -> Parser a m a0 -> Parser a m b #

(<$) :: a0 -> Parser a m b -> Parser a m a0 #

Monad m => MonadFail (Parser a m) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

fail :: String -> Parser a m a0 #

Monad m => Applicative (Parser a m) Source #

Applicative form of splitWith.

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

pure :: a0 -> Parser a m a0 #

(<*>) :: Parser a m (a0 -> b) -> Parser a m a0 -> Parser a m b #

liftA2 :: (a0 -> b -> c) -> Parser a m a0 -> Parser a m b -> Parser a m c #

(*>) :: Parser a m a0 -> Parser a m b -> Parser a m b #

(<*) :: Parser a m a0 -> Parser a m b -> Parser a m a0 #

(Monad m, MonadIO m) => MonadIO (Parser a m) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

liftIO :: IO a0 -> Parser a m a0 #

Monad m => Alternative (Parser a m) Source #

Sequential alternative. The input is first passed to the first parser, and if it succeeds, the result is returned. However, if the first parser fails, the parser driver backtracks and tries the same input on the second parser, returning the result if it succeeds.

Note: The implementation of <|> is not lazy in the second argument. The following code will fail:

>>> Stream.parse (Parser.satisfy (> 0) <|> undefined) $ Stream.fromList [1..10]
*** Exception: Prelude.undefined
...

WARNING! this is not suitable for large scale use. As a thumb rule stream fusion works well for less than 8 compositions of this operation, otherwise consider using ParserK. Do not use recursive parser implementations based on this Alternative instance.

Instance details

Defined in Streamly.Internal.Data.Parser.ParserD.Type

Methods

empty :: Parser a m a0 #

(<|>) :: Parser a m a0 -> Parser a m a0 -> Parser a m a0 #

some :: Parser a m a0 -> Parser a m [a0] #

many :: Parser a m a0 -> Parser a m [a0] #

newtype ParseError Source #

This exception is used when a parser ultimately fails, the user of the parser is intimated via this exception.

Pre-release

Constructors

ParseError String 

rmapM :: Monad m => (b -> m c) -> Parser a m b -> Parser a m c Source #

rmapM f parser maps the monadic function f on the output of the parser.

>>> rmap = fmap

Constructors

fromPure :: Monad m => b -> Parser a m b Source #

A parser that always yields a pure value without consuming any input.

fromEffect :: Monad m => m b -> Parser a m b Source #

A parser that always yields the result of an effectful action without consuming any input.

splitWith :: Monad m => (a -> b -> c) -> Parser x m a -> Parser x m b -> Parser x m c Source #

Sequential parser application.

Apply two parsers sequentially to an input stream. The first parser runs and processes the input, the remaining input is then passed to the second parser. If both parsers succeed, their outputs are combined using the supplied function. If either parser fails, the operation fails.

This implementation is strict in the second argument, therefore, the following will fail:

>>> Stream.parse (Parser.splitWith const (Parser.satisfy (> 0)) undefined) $ Stream.fromList [1]
*** Exception: Prelude.undefined
...

Although this implementation allows stream fusion, it has quadratic complexity, making it suitable only for a small number of compositions. As a thumb rule use it for less than 8 compositions, use ParserK otherwise.

Below are some common idioms that can be expressed using splitWith and other parser primitives:

>>> span p f1 f2 = Parser.splitWith (,) (Parser.takeWhile p f1) (Parser.fromFold f2)
>>> spanBy eq f1 f2 = Parser.splitWith (,) (Parser.groupBy eq f1) (Parser.fromFold f2)

Pre-release

split_ :: Monad m => Parser x m a -> Parser x m b -> Parser x m b Source #

Sequential parser application ignoring the output of the first parser. Apply two parsers sequentially to an input stream. The input is provided to the first parser, when it is done the remaining input is provided to the second parser. The output of the parser is the output of the second parser. The operation fails if any of the parsers fail.

This implementation is strict in the second argument, therefore, the following will fail:

>>> Stream.parse (Parser.split_ (Parser.satisfy (> 0)) undefined) $ Stream.fromList [1]
*** Exception: Prelude.undefined
...

Although this implementation allows stream fusion, it has quadratic complexity, making it suitable only for a small number of compositions. As a thumb rule use it for less than 8 compositions, use ParserK otherwise.

Pre-release

die :: Monad m => String -> Parser a m b Source #

A parser that always fails with an error message without consuming any input.

dieM :: Monad m => m String -> Parser a m b Source #

A parser that always fails with an effectful error message and without consuming any input.

Pre-release

splitSome :: Monad m => Parser a m b -> Fold m b c -> Parser a m c Source #

See documentation of some.

Pre-release

splitMany :: Monad m => Parser a m b -> Fold m b c -> Parser a m c Source #

See documentation of many.

Pre-release

splitManyPost :: Monad m => Parser a m b -> Fold m b c -> Parser a m c Source #

Like splitMany, but inner fold emits an output at the end even if no input is received.

Internal

alt :: Monad m => Parser x m a -> Parser x m a -> Parser x m a Source #

Sequential alternative. The input is first passed to the first parser, and if it succeeds, the result is returned. However, if the first parser fails, the parser driver backtracks and tries the same input on the second parser, returning the result if it succeeds.

Note: This implementation is not lazy in the second argument. The following will fail:

> Stream.parse (Parser.satisfy (> 0) `Parser.alt` undefined) $ Stream.fromList [1..10]
  • ** Exception: Prelude.undefined

Although this implementation allows stream fusion, it has quadratic complexity, making it suitable only for a small number of compositions. As a thumb rule use it for less than 8 compositions, use ParserK otherwise.

Time Complexity: O(n^2) where n is the number of compositions.

Pre-release

concatMap :: Monad m => (b -> Parser a m c) -> Parser a m b -> Parser a m c Source #

Map a Parser returning function on the result of a Parser.

Pre-release

Input transformation

lmap :: (a -> b) -> Parser b m r -> Parser a m r Source #

lmap f parser maps the function f on the input of the parser.

>>> Stream.parse (Parser.lmap (\x -> x * x) (Parser.fromFold Fold.sum)) (Stream.enumerateFromTo 1 100)
Right 338350
lmap = Parser.lmapM return

lmapM :: Monad m => (a -> m b) -> Parser b m r -> Parser a m r Source #

lmapM f parser maps the monadic function f on the input of the parser.

filter :: Monad m => (a -> Bool) -> Parser a m b -> Parser a m b Source #

Include only those elements that pass a predicate.

>>> Stream.parse (Parser.filter (> 5) (Parser.fromFold Fold.sum)) $ Stream.fromList [1..10]
Right 40

noErrorUnsafeSplitWith :: Monad m => (a -> b -> c) -> Parser x m a -> Parser x m b -> Parser x m c Source #

Works correctly only if both the parsers are guaranteed to never fail.

noErrorUnsafeSplit_ :: Monad m => Parser x m a -> Parser x m b -> Parser x m b Source #

noErrorUnsafeConcatMap :: Monad m => (b -> Parser a m c) -> Parser a m b -> Parser a m c Source #