Copyright | (c) 2020 Composewell Technologies |
---|---|
License | BSD3 |
Maintainer | streamly@composewell.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | Safe-Inferred |
Language | Haskell2010 |
Streaming and backtracking parsers.
Parsers just extend folds. Please read the Fold
design notes in
Streamly.Internal.Data.Fold.Types for background on the design.
Parser Design
The Parser
type or a parsing fold is a generalization of the Fold
type.
The Fold
type always succeeds on each input. Therefore, it does not need
to buffer the input. In contrast, a Parser
may fail and backtrack to
replay the input again to explore another branch of the parser. Therefore,
it needs to buffer the input. Therefore, a Parser
is a fold with some
additional requirements. To summarize, unlike a Fold
, a Parser
:
- may not generate a new value of the accumulator on every input, it may generate a new accumulator only after consuming multiple input elements (e.g. takeEQ).
- on success may return some unconsumed input (e.g. takeWhile)
- may fail and return all input without consuming it (e.g. satisfy)
- backtrack and start inspecting the past input again (e.g. alt)
These use cases require buffering and replaying of input. To facilitate
this, the step function of the Fold
is augmented to return the next state
of the fold along with a command tag using a Step
functor, the tag tells
the fold driver to manipulate the future input as the parser wishes. The
Step
functor provides the following commands to the fold driver
corresponding to the use cases outlined in the previous para:
Skip
: hold (buffer) the input or go back to a previous position in the streamYield
,Stop
: tell how much input is unconsumedError
: indicates that the parser has failed without a result
How a Parser Works?
A parser is just like a fold, it keeps consuming inputs from the stream and accumulating them in an accumulator. The accumulator of the parser could be a singleton value or it could be a collection of values e.g. a list.
The parser may build a new output value from multiple input items. When it
consumes an input item but needs more input to build a complete output item
it uses Skip 0 s
, yielding the intermediate state s
and asking the
driver to provide more input. When the parser determines that a new output
value is complete it can use a Stop n b
to terminate the parser with n
items of input unused and the final value of the accumulator returned as
b
. If at any time the parser determines that the parse has failed it can
return Error err
.
A parser building a collection of values (e.g. a list) can use the Yield
constructor whenever a new item in the output collection is generated. If a
parser building a collection of values has yielded at least one value then
it considered successful and cannot fail after that. In the current
implementation, this is not automatically enforced, there is a rule that the
parser MUST use only Stop
for termination after the first Yield
, it
cannot use Error
. It may be possible to change the implementation so that
this rule is not required, but there may be some performance cost to it.
takeWhile
and
some
combinators are good examples of
efficient implementations using all features of this representation. It is
possible to idiomatically build a collection of parsed items using a
singleton parser and Alternative
instance instead of using a
multi-yield parser. However, this implementation is amenable to stream
fusion and can therefore be much faster.
Error Handling
When a parser's step
function is invoked it may iterminate by either a
Stop
or an Error
return value. In an Alternative
composition an error
return can make the composed parser backtrack and try another parser.
If the stream stops before a parser could terminate then we use the
extract
function of the parser to retrieve the last yielded value of the
parser. If the parser has yielded at least one value then extract
MUST
return a value without throwing an error, otherwise it uses the ParseError
exception to throw an error.
We chose the exception throwing mechanism for extract
instead of using an
explicit error return via an Either
type for keeping the interface simple
as most of the time we do not need to catch the error in intermediate
layers. Note that we cannot use exception throwing mechanism in step
function because of performance reasons. Error
constructor in that case
allows loop fusion and better performance.
Future Work
It may make sense to move "takeWhile" type of parsers, which cannot fail but need some lookahead, to splitting folds. This will allow such combinators to be accepted where we need an unfailing Fold type.
Based on application requirements it should be possible to design even a richer interface to manipulate the input stream/buffer. For example, we could randomly seek into the stream in the forward or reverse directions or we can even seek to the end or from the end or seek from the beginning.
We can distribute and scan/parse a stream using both folds and parsers and merge the resulting streams using different merge strategies (e.g. interleaving or serial).
Synopsis
- data Step s b
- data Parser m a b = forall s. Parser (s -> a -> m (Step s b)) (m s) (s -> m b)
- newtype ParseError = ParseError String
- yield :: Monad m => b -> Parser m a b
- yieldM :: Monad m => m b -> Parser m a b
- splitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c
- die :: MonadThrow m => String -> Parser m a b
- dieM :: MonadThrow m => m String -> Parser m a b
- splitSome :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c
- splitMany :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c
- alt :: Monad m => Parser m x a -> Parser m x a -> Parser m x a
Documentation
The return type of a Parser
step.
A parser is driven by a parse driver one step at a time, at any time the
driver may extract
the result of the parser. The parser may ask the driver
to backtrack at any point, therefore, the driver holds the input up to a
point of no return in a backtracking buffer. The buffer grows or shrinks
based on the return values of the parser step execution.
When a parser step is executed it generates a new intermediate state of the
parse result along with a command to the driver. The command tells the
driver whether to keep the input stream for a potential backtracking later
on or drop it, and how much to keep. The constructors of Step
represent
the commands to the driver.
Internal
Yield Int s |
|
Skip Int s |
|
Stop Int b |
|
Error String | An error makes the parser backtrack to the last checkpoint and try another alternative. |
A parser is a fold that can fail and is represented as Parser step
initial extract
. Before we drive a parser we call the initial
action to
retrieve the initial state of the fold. The parser driver invokes step
with the state returned by the previous step and the next input element. It
results into a new state and a command to the driver represented by Step
type. The driver keeps invoking the step function until it stops or fails.
At any point of time the driver can call extract
to inspect the result of
the fold. It may result in an error or an output value.
Internal
Instances
MonadCatch m => Alternative (Parser m a) Source # |
Note: The implementation of
|
Monad m => Applicative (Parser m a) Source # |
|
Defined in Streamly.Internal.Data.Parser.Types | |
Functor m => Functor (Parser m a) Source # | |
Monad m => Monad (Parser m a) Source # | Monad composition can be used for lookbehind parsers, we can make the future parses depend on the previously parsed values. If we have to parse "a9" or "9a" but not "99" or "aa" we can use the following parser: backtracking :: MonadCatch m => PR.Parser m Char String
backtracking =
sequence [PR.satisfy isDigit, PR.satisfy isAlpha]
We know that if the first parse resulted in a digit at the first place then
the second parse is going to fail. However, we waste that information and
parse the first character again in the second parse only to know that it is
not an alphabetic char. By using lookbehind in a data DigitOrAlpha = Digit Char | Alpha Char lookbehind :: MonadCatch m => PR.Parser m Char String lookbehind = do x1 <- Digit |
MonadCatch m => MonadPlus (Parser m a) Source # |
Internal |
newtype ParseError Source #
This exception is used for two purposes:
- When a parser ultimately fails, the user of the parser is intimated via this exception.
- When the "extract" function of a parser needs to throw an error.
Internal
Instances
Exception ParseError Source # | |
Defined in Streamly.Internal.Data.Parser.Types toException :: ParseError -> SomeException # fromException :: SomeException -> Maybe ParseError # displayException :: ParseError -> String # | |
Show ParseError Source # | |
Defined in Streamly.Internal.Data.Parser.Types showsPrec :: Int -> ParseError -> ShowS # show :: ParseError -> String # showList :: [ParseError] -> ShowS # |
yield :: Monad m => b -> Parser m a b Source #
A parser that always yields a pure value without consuming any input.
Internal
yieldM :: Monad m => m b -> Parser m a b Source #
A parser that always yields the result of an effectful action without consuming any input.
Internal
splitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #
Sequential application. Apply two parsers sequentially to an input stream. The input is provided to the first parser, when it is done the remaining input is provided to the second parser. If both the parsers succeed their outputs are combined using the supplied function. The operation fails if any of the parsers fail.
This undoes an "append" of two streams, it splits the streams using two parsers and zips the results.
This implementation is strict in the second argument, therefore, the following will fail:
>>>
S.parse (PR.satisfy (> 0) *> undefined) $ S.fromList [1]
Internal
die :: MonadThrow m => String -> Parser m a b Source #
A parser that always fails with an error message without consuming any input.
Internal
dieM :: MonadThrow m => m String -> Parser m a b Source #
A parser that always fails with an effectful error message and without consuming any input.
Internal
splitSome :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c Source #
See documentation of some
.
Internal
splitMany :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c Source #
See documentation of many
.
Internal
alt :: Monad m => Parser m x a -> Parser m x a -> Parser m x a Source #
Sequential alternative. Apply the input to the first parser and return the result if the parser succeeds. If the first parser fails then backtrack and apply the same input to the second parser and return the result.
Note: This implementation is not lazy in the second argument. The following will fail:
>>>
S.parse (PR.satisfy (> 0) `PR.alt` undefined) $ S.fromList [1..10]
Internal