streamly-0.7.3.2: Beautiful Streaming, Concurrent and Reactive Composition
Copyright(c) 2020 Composewell Technologies
LicenseBSD3
Maintainerstreamly@composewell.com
Stabilityexperimental
PortabilityGHC
Safe HaskellSafe-Inferred
LanguageHaskell2010

Streamly.Internal.Data.Parser

Description

Fast streaming parsers.

Applicative and Alternative type class based combinators from the parser-combinators package can also be used with the Parser type. However, there are two important differences between parser-combinators and the equivalent ones provided in this module in terms of performance:

1) parser-combinators use plain Haskell lists to collect the results, in a strict Monad like IO, the results are necessarily buffered before they can be consumed. This may not perform optimally in streaming applications processing large amounts of data. Equivalent combinators in this module can consume the results of parsing using a Fold, thus providing a scalability and a generic consumer.

2) Several combinators in this module can be many times faster because of stream fusion. For example, many combinator in this module is much faster than the many combinator of Alternative type class.

Failing parsers in this module throw the ParseError exception.

Synopsis

Documentation

data Parser m a b Source #

A parser is a fold that can fail and is represented as Parser step initial extract. Before we drive a parser we call the initial action to retrieve the initial state of the fold. The parser driver invokes step with the state returned by the previous step and the next input element. It results into a new state and a command to the driver represented by Step type. The driver keeps invoking the step function until it stops or fails. At any point of time the driver can call extract to inspect the result of the fold. It may result in an error or an output value.

Internal

Constructors

forall s. Parser (s -> a -> m (Step s b)) (m s) (s -> m b) 

Instances

Instances details
MonadCatch m => Alternative (Parser m a) Source #

Alternative instance using alt.

Note: The implementation of <|> is not lazy in the second argument. The following code will fail:

>>> S.parse (PR.satisfy (> 0) <|> undefined) $ S.fromList [1..10]
Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

empty :: Parser m a a0 #

(<|>) :: Parser m a a0 -> Parser m a a0 -> Parser m a a0 #

some :: Parser m a a0 -> Parser m a [a0] #

many :: Parser m a a0 -> Parser m a [a0] #

Monad m => Applicative (Parser m a) Source #

Applicative form of splitWith.

Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

pure :: a0 -> Parser m a a0 #

(<*>) :: Parser m a (a0 -> b) -> Parser m a a0 -> Parser m a b #

liftA2 :: (a0 -> b -> c) -> Parser m a a0 -> Parser m a b -> Parser m a c #

(*>) :: Parser m a a0 -> Parser m a b -> Parser m a b #

(<*) :: Parser m a a0 -> Parser m a b -> Parser m a a0 #

Functor m => Functor (Parser m a) Source # 
Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

fmap :: (a0 -> b) -> Parser m a a0 -> Parser m a b #

(<$) :: a0 -> Parser m a b -> Parser m a a0 #

Monad m => Monad (Parser m a) Source #

Monad composition can be used for lookbehind parsers, we can make the future parses depend on the previously parsed values.

If we have to parse "a9" or "9a" but not "99" or "aa" we can use the following parser:

backtracking :: MonadCatch m => PR.Parser m Char String
backtracking =
    sequence [PR.satisfy isDigit, PR.satisfy isAlpha]
    <|>
    sequence [PR.satisfy isAlpha, PR.satisfy isDigit]

We know that if the first parse resulted in a digit at the first place then the second parse is going to fail. However, we waste that information and parse the first character again in the second parse only to know that it is not an alphabetic char. By using lookbehind in a Monad composition we can avoid redundant work:

data DigitOrAlpha = Digit Char | Alpha Char

lookbehind :: MonadCatch m => PR.Parser m Char String
lookbehind = do
    x1 <-    Digit <$> PR.satisfy isDigit
         <|> Alpha <$> PR.satisfy isAlpha

    -- Note: the parse depends on what we parsed already
    x2 <- case x1 of
        Digit _ -> PR.satisfy isAlpha
        Alpha _ -> PR.satisfy isDigit

    return $ case x1 of
        Digit x -> [x,x2]
        Alpha x -> [x,x2]
Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

(>>=) :: Parser m a a0 -> (a0 -> Parser m a b) -> Parser m a b #

(>>) :: Parser m a a0 -> Parser m a b -> Parser m a b #

return :: a0 -> Parser m a a0 #

MonadCatch m => MonadPlus (Parser m a) Source #

mzero is same as empty, it aborts the parser. mplus is same as <|>, it selects the first succeeding parser.

Internal

Instance details

Defined in Streamly.Internal.Data.Parser.Types

Methods

mzero :: Parser m a a0 #

mplus :: Parser m a a0 -> Parser m a a0 -> Parser m a a0 #

Accumulators

fromFold :: Monad m => Fold m a b -> Parser m a b Source #

The resulting parse never terminates and never errors out.

any :: Monad m => (a -> Bool) -> Parser m a Bool Source #

>>> S.parse (PR.any (== 0)) $ S.fromList [1,0,1]
> Right True

all :: Monad m => (a -> Bool) -> Parser m a Bool Source #

>>> S.parse (PR.all (== 0)) $ S.fromList [1,0,1]
> Right False

yield :: Monad m => b -> Parser m a b Source #

A parser that always yields a pure value without consuming any input.

Internal

yieldM :: Monad m => m b -> Parser m a b Source #

A parser that always yields the result of an effectful action without consuming any input.

Internal

die :: MonadThrow m => String -> Parser m a b Source #

A parser that always fails with an error message without consuming any input.

Internal

dieM :: MonadThrow m => m String -> Parser m a b Source #

A parser that always fails with an effectful error message and without consuming any input.

Internal

Element parsers

peek :: MonadThrow m => Parser m a a Source #

Peek the head element of a stream, without consuming it. Fails if it encounters end of input.

>>> S.parse ((,) <$> PR.peek <*> PR.satisfy (> 0)) $ S.fromList [1]
(1,1)
peek = lookAhead (satisfy True)

Internal

eof :: Monad m => Parser m a () Source #

Succeeds if we are at the end of input, fails otherwise.

>>> S.parse ((,) <$> PR.satisfy (> 0) <*> PR.eof) $ S.fromList [1]
> (1,())

Internal

satisfy :: MonadThrow m => (a -> Bool) -> Parser m a a Source #

Returns the next element if it passes the predicate, fails otherwise.

>>> S.parse (PR.satisfy (== 1)) $ S.fromList [1,0,1]
> 1

Internal

Sequence parsers

take :: Monad m => Int -> Fold m a b -> Parser m a b Source #

Take at most n input elements and fold them using the supplied fold.

Stops after n elements. Never fails.

>>> S.parse (PR.take 1 FL.toList) $ S.fromList [1]
[1]
S.chunksOf n f = S.splitParse (FL.take n f)

Internal

takeEQ :: MonadThrow m => Int -> Fold m a b -> Parser m a b Source #

Stops after taking exactly n input elements.

  • Stops - after n elements.
  • Fails - if the stream ends before it can collect n elements.
>>> S.parse (PR.takeEQ 4 FL.toList) $ S.fromList [1,0,1]
> "takeEQ: Expecting exactly 4 elements, got 3"

Internal

takeGE :: MonadThrow m => Int -> Fold m a b -> Parser m a b Source #

Take at least n input elements, but can collect more.

  • Stops - never.
  • Fails - if the stream ends before producing n elements.
>>> S.parse (PR.takeGE 4 FL.toList) $ S.fromList [1,0,1]
> "takeGE: Expecting at least 4 elements, got only 3"
>>> S.parse (PR.takeGE 4 FL.toList) $ S.fromList [1,0,1,0,1]
> [1,0,1,0,1]

Internal

lookAhead :: MonadThrow m => Parser m a b -> Parser m a b Source #

takeWhile :: Monad m => (a -> Bool) -> Fold m a b -> Parser m a b Source #

Collect stream elements until an element fails the predicate. The element on which the predicate fails is returned back to the input stream.

  • Stops - when the predicate fails.
  • Fails - never.
>>> S.parse (PR.takeWhile (== 0) FL.toList) $ S.fromList [0,0,1,0,1]
> [0,0]

We can implement a breakOn using takeWhile:

breakOn p = takeWhile (not p)

Internal

takeWhile1 :: MonadThrow m => (a -> Bool) -> Fold m a b -> Parser m a b Source #

Like takeWhile but takes at least one element otherwise fails.

Internal

sliceSepBy :: Monad m => (a -> Bool) -> Fold m a b -> Parser m a b Source #

Collect stream elements until an element succeeds the predicate. Drop the element on which the predicate succeeded. The succeeding element is treated as an infix separator which is dropped from the output.

  • Stops - when the predicate succeeds.
  • Fails - never.
>>> S.parse (PR.sliceSepBy (== 1) FL.toList) $ S.fromList [0,0,1,0,1]
> [0,0]

S.splitOn pred f = S.splitParse (PR.sliceSepBy pred f)

>>> S.toList $ S.splitParse (PR.sliceSepBy (== 1) FL.toList) $ S.fromList [0,0,1,0,1]
> [[0,0],[0],[]]

Internal

sliceSepByMax :: Monad m => (a -> Bool) -> Int -> Fold m a b -> Parser m a b Source #

Split using a condition or a count whichever occurs first. This is a hybrid of splitOn and take. The element on which the condition succeeds is dropped.

Internal

sliceEndWith :: (a -> Bool) -> Fold m a b -> Parser m a b Source #

Collect stream elements until an element succeeds the predicate. Also take the element on which the predicate succeeded. The succeeding element is treated as a suffix separator which is kept in the output segement.

  • Stops - when the predicate succeeds.
  • Fails - never.

S.splitWithSuffix pred f = S.splitParse (PR.sliceEndWith pred f)

Unimplemented

sliceBeginWith :: (a -> Bool) -> Fold m a b -> Parser m a b Source #

Collect stream elements until an elements passes the predicate, return the last element on which the predicate succeeded back to the input stream. If the predicate succeeds on the first element itself then it is kept in the stream and we continue collecting. The succeeding element is treated as a prefix separator which is kept in the output segement.

  • Stops - when the predicate succeeds in non-leading position.
  • Fails - never.

S.splitWithPrefix pred f = S.splitParse (PR.sliceBeginWith pred f)

Unimplemented

wordBy :: (a -> Bool) -> Fold m a b -> Parser m a b Source #

Like splitOn but strips leading, trailing, and repeated separators. Therefore, ".a..b." having . as the separator would be parsed as ["a","b"]. In other words, its like parsing words from whitespace separated text.

  • Stops - when it finds a word separator after a non-word element
  • Fails - never.
S.wordsBy pred f = S.splitParse (PR.wordBy pred f)

Unimplemented

groupBy :: (a -> a -> Bool) -> Fold m a b -> Parser m a b Source #

groupBy cmp f $ S.fromList [a,b,c,...] assigns the element a to the first group, then if a `cmp` b is True b is also assigned to the same group. If a `cmp` c is True then c is also assigned to the same group and so on. When the comparison fails a new group is started. Each group is folded using the Fold f and the result of the fold is emitted in the output stream.

  • Stops - when the comparison fails.
  • Fails - never.
S.groupsBy cmp f = S.splitParse (PR.groupBy cmp f)

Unimplemented

eqBy :: MonadThrow m => (a -> a -> Bool) -> [a] -> Parser m a () Source #

Match the given sequence of elements using the given comparison function.

Internal

Binary Combinators

Sequential Applicative

splitWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #

Sequential application. Apply two parsers sequentially to an input stream. The input is provided to the first parser, when it is done the remaining input is provided to the second parser. If both the parsers succeed their outputs are combined using the supplied function. The operation fails if any of the parsers fail.

This undoes an "append" of two streams, it splits the streams using two parsers and zips the results.

This implementation is strict in the second argument, therefore, the following will fail:

>>> S.parse (PR.satisfy (> 0) *> undefined) $ S.fromList [1]

Internal

Parallel Applicatives

teeWith :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #

teeWith f p1 p2 distributes its input to both p1 and p2 until both of them succeed or fail and combines their output using f. The parser succeeds if both the parsers succeed.

Internal

teeWithFst :: Monad m => (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #

Like teeWith but ends parsing and zips the results, if available, whenever the first parser ends.

Internal

teeWithMin :: (a -> b -> c) -> Parser m x a -> Parser m x b -> Parser m x c Source #

Like teeWith but ends parsing and zips the results, if available, whenever any of the parsers ends or fails.

Unimplemented

Sequential Interleaving

deintercalate :: Fold m a y -> Parser m x a -> Fold m b z -> Parser m x b -> Parser m x (y, z) Source #

Apply two parsers alternately to an input stream. The input stream is considered an interleaving of two patterns. The two parsers represent the two patterns.

This undoes a "gintercalate" of two streams.

Unimplemented

Parallel Alternatives

shortest :: Monad m => Parser m x a -> Parser m x a -> Parser m x a Source #

Shortest alternative. Apply both parsers in parallel but choose the result from the one which consumed least input i.e. take the shortest succeeding parse.

Internal

longest :: MonadCatch m => Parser m x a -> Parser m x a -> Parser m x a Source #

Longest alternative. Apply both parsers in parallel but choose the result from the one which consumed more input i.e. take the longest succeeding parse.

Internal

N-ary Combinators

Sequential Collection

sequence :: Fold m b c -> t (Parser m a b) -> Parser m a c Source #

sequence f t collects sequential parses of parsers in the container t using the fold f. Fails if the input ends or any of the parsers fail.

Unimplemented

Sequential Repetition

count :: Int -> Fold m b c -> Parser m a b -> Parser m a c Source #

count n f p collects exactly n sequential parses of parser p using the fold f. Fails if the input ends or the parser fails before n results are collected.

Unimplemented

countBetween :: Int -> Int -> Fold m b c -> Parser m a b -> Parser m a c Source #

countBetween m n f p collects between m and n sequential parses of parser p using the fold f. Stop after collecting n results. Fails if the input ends or the parser fails before m results are collected.

Unimplemented

many :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c Source #

Collect zero or more parses. Apply the parser repeatedly on the input stream, stop when the parser fails, accumulate zero or more parse results using the supplied Fold. This parser never fails, in case the first application of parser fails it returns an empty result.

Compare with many.

Internal

some :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a c Source #

Collect one or more parses. Apply the supplied parser repeatedly on the input stream and accumulate the parse results as long as the parser succeeds, stop when it fails. This parser fails if not even one result is collected.

Compare with some.

Internal

manyTill :: MonadCatch m => Fold m b c -> Parser m a b -> Parser m a x -> Parser m a c Source #

manyTill f collect test tries the parser test on the input, if test fails it backtracks and tries collect, after collect succeeds test is tried again and so on. The parser stops when test succeeds. The output of test is discarded and the output of collect is accumulated by the supplied fold. The parser fails if collect fails.

Internal

Collection of Alternatives

choice :: t (Parser m a b) -> Parser m a b Source #

choice parsers applies the parsers in order and returns the first successful parse.