Copyright	©2019 James Brock
License	BSD2
Maintainer	James Brock <jamesbrock@gmail.com>
Safe Haskell	Safe-Inferred
Language	Haskell2010

Replace.Megaparsec

Contents

Running parser
Parser combinator

Description

Replace.Megaparsec is for finding text patterns, and also replacing or splitting on the found patterns. This activity is traditionally done with regular expressions, but Replace.Megaparsec uses Text.Megaparsec parsers instead for the pattern matching.

Replace.Megaparsec can be used in the same sort of “pattern capture” or “find all” situations in which one would use Python re.findall, or Perl m//, or Unix grep.

Replace.Megaparsec can be used in the same sort of “stream editing” or “search-and-replace” situations in which one would use Python re.sub, or Perl s///, or Unix sed, or awk.

Replace.Megaparsec can be used in the same sort of “string splitting” situations in which one would use Python re.split or Perl split.

See the replace-megaparsec package README for usage examples.

Type constraints

output stream type `Tokens s` = input stream type `s`

All functions in the Running Parser section require the type of the stream of text that is input to be Stream s such that Tokens s ~ s, because we want to output the same type of stream that was input. That requirement is satisfied for all the Stream instances included with Text.Megaparsec:

Custom error type `e` should be `Void`

Megaparsec parsers have a custom error data component e. When writing parsers to be used by this module, the custom error type e should usually be Void, because every function in this module expects a parser failure to occur on every token in a non-matching section of the input stream, so parser failure error descriptions are not returned, and you'll never see the custom error information.

Special fast input types

Functions in this module will be “fast” when the input stream type s is:

We mean “fast” in the same sense as MonadParsec: when returning subsections of the input stream, we return slices of the input stream data, rather than constructing a list of tokens and then building a new stream subsection from that list. This relies on implementation details of the stream representation, so there are specialization re-write rules in this module to make that possible without adding new typeclasses.

Synopsis

breakCap :: forall e s a. (Ord e, Stream s, Tokens s ~ s) => Parsec e s a -> s -> Maybe (s, a, s)
breakCapT :: forall m e s a. (Ord e, Stream s, Tokens s ~ s, Monad m) => ParsecT e s m a -> s -> m (Maybe (s, a, s))
splitCap :: forall e s a. (Ord e, Stream s, Tokens s ~ s) => Parsec e s a -> s -> [Either s a]
splitCapT :: forall e s m a. (Ord e, Stream s, Tokens s ~ s, Monad m) => ParsecT e s m a -> s -> m [Either s a]
streamEdit :: forall e s a. (Ord e, Stream s, Monoid s, Tokens s ~ s) => Parsec e s a -> (a -> s) -> s -> s
streamEditT :: forall e s m a. (Ord e, Stream s, Monad m, Monoid s, Tokens s ~ s) => ParsecT e s m a -> (a -> m s) -> s -> m s
anyTill :: forall e s m a. MonadParsec e s m => m a -> m (Tokens s, a)
sepCap :: forall e s m a. MonadParsec e s m => m a -> m [Either (Tokens s) a]
findAll :: MonadParsec e s m => m a -> m [Either (Tokens s) (Tokens s)]
findAllCap :: MonadParsec e s m => m a -> m [Either (Tokens s) (Tokens s, a)]

Running parser

Functions in this section are ways to run parsers (like runParser). They take as arguments a sep parser and some input, run the parser on the input, and return a result.

breakCap Source #

Arguments

:: forall e s a. (Ord e, Stream s, Tokens s ~ s)
=> Parsec e s a	The pattern matching parser `sep`
-> s	The input stream of text
-> Maybe (s, a, s)	Maybe (prefix, parse_result, suffix)

Break on and capture one pattern

Find the first occurence of a pattern in a text stream, capture the found pattern, and break the input text stream on the found pattern.

The breakCap function is like takeWhile, but can be predicated beyond more than just the next one token. It's also like breakOn, but the needle can be a pattern instead of a constant string.

Be careful not to look too far ahead; if the sep parser looks to the end of the input then breakCap could be O(n²).

The pattern parser sep may match a zero-width pattern (a pattern which consumes no parser input on success).

Output

Nothing when no pattern match was found.
Just (prefix, parse_result, suffix) for the result of parsing the pattern match, and the prefix string before and the suffix string after the pattern match. prefix and suffix may be zero-length strings.

Access the matched section of text

If you want to capture the matched string, then combine the pattern parser sep with match.

With the matched string, we can reconstruct the input string. For all input, sep, if

let (Just (prefix, (infix, _), suffix)) = breakCap (match sep) input

then

input == prefix <> infix <> suffix

breakCapT Source #

Arguments

:: forall m e s a. (Ord e, Stream s, Tokens s ~ s, Monad m)
=> ParsecT e s m a	The pattern matching parser `sep`
-> s	The input stream of text
-> m (Maybe (s, a, s))	Maybe (prefix, parse_result, suffix)

Break on and capture one pattern

Monad transformer version of breakCap.

The parser sep will run in the underlying monad context.

splitCap Source #

Arguments

:: forall e s a. (Ord e, Stream s, Tokens s ~ s)
=> Parsec e s a	The pattern matching parser `sep`
-> s	The input stream of text
-> [Either s a]	List of matching and non-matching input sections.

Split on and capture all patterns

Find all occurences of the pattern sep, split the input string, capture all the patterns and the splits.

The input string will be split on every leftmost non-overlapping occurence of the pattern sep. The output list will contain the parsed result of input string sections which match the sep pattern in Right, and non-matching sections in Left.

splitCap depends on sepCap, see sepCap for more details.

Access the matched section of text

If you want to capture the matched strings, then combine the pattern parser sep with match.

With the matched strings, we can reconstruct the input string. For all input, sep, if

let output = splitCap (match sep) input

then

input == mconcat (second fst <$> output)

splitCapT Source #

Arguments

:: forall e s m a. (Ord e, Stream s, Tokens s ~ s, Monad m)
=> ParsecT e s m a	The pattern matching parser `sep`
-> s	The input stream of text
-> m [Either s a]	List of matching and non-matching input sections.

Split on and capture all patterns

Monad transformer version of splitCap.

The parser sep will run in the underlying monad context.

streamEdit Source #

Arguments

:: forall e s a. (Ord e, Stream s, Monoid s, Tokens s ~ s)
=> Parsec e s a	The pattern matching parser `sep`
-> (a -> s)	The `editor` function. Takes a parsed result of `sep` and returns a new stream section for the replacement.
-> s	The input stream of text to be edited
-> s	The edited input stream

Stream editor

Also known as “find-and-replace”, or “match-and-substitute”. Finds all non-overlapping sections of the stream which match the pattern sep, and replaces them with the result of the editor function.

Access the matched section of text in the `editor`

If you want access to the matched string in the editor function, then combine the pattern parser sep with match. This will effectively change the type of the editor function to (s,a) -> s.

This allows us to write an editor function which can choose to not edit the match and just leave it as it is. If the editor function returns the first item in the tuple, then streamEdit will not change the matched string.

So, for all sep:

streamEdit (match sep) fst ≡ id

streamEditT Source #

Arguments

:: forall e s m a. (Ord e, Stream s, Monad m, Monoid s, Tokens s ~ s)
=> ParsecT e s m a	The pattern matching parser `sep`
-> (a -> m s)	The `editor` function. Takes a parsed result of `sep` and returns a new stream section for the replacement.
-> s	The input stream of text to be edited
-> m s	The edited input stream

Stream editor

Monad transformer version of streamEdit.

Both the parser sep and the editor function will run in the underlying monad context.

If you want to do IO operations in the editor function or the parser sep, then run this in IO.

If you want the editor function or the parser sep to remember some state, then run this in a stateful monad.

Parser combinator

Functions in this section are parser combinators. They take a sep parser for an argument, combine sep with another parser, and return a new parser.

anyTill Source #

Arguments

:: forall e s m a. MonadParsec e s m
=> m a	The pattern matching parser `sep`
-> m (Tokens s, a)	parser

Specialized manyTill_

Parser combinator to consume input until the sep pattern matches, equivalent to manyTill_ anySingle sep. On success, returns the prefix before the pattern match and the parsed match.

sep may be a zero-width parser, it may succeed without consuming any input.

This combinator will produce a parser which acts like takeWhileP but is predicated beyond more than just the next one token. anyTill is also like takeWhileP in that it will be “fast” when applied to an input stream type s for which there are specialization re-write rules.

sepCap Source #

Arguments

:: forall e s m a. MonadParsec e s m
=> m a	The pattern matching parser `sep`
-> m [Either (Tokens s) a]	parser

Separate and capture

Parser combinator to find all of the leftmost non-overlapping occurrences of the pattern parser sep in a text stream. The sepCap parser will always consume its entire input and can never fail.

sepCap is similar to the sep* family of parser combinators found in parser-combinators and parsers, but it returns the parsed result of the sep parser instead of throwing it away.

Output

The input stream is separated and output into a list of sections:

Sections which can parsed by the pattern sep will be parsed and captured as Right.
Non-matching sections of the stream will be captured in Left.

The output list also has these properties:

If the input is "" then the output list will be [].
If there are no pattern matches, then the entire input stream will be returned as one non-matching Left section.
The output list will not contain two consecutive Left sections.

Zero-width matches forbidden

If the pattern matching parser sep would succeed without consuming any input then sepCap will force it to fail. If we allow sep to match a zero-width pattern, then it can match the same zero-width pattern again at the same position on the next iteration, which would result in an infinite number of overlapping pattern matches.

findAll Source #

Arguments

:: MonadParsec e s m
=> m a	The pattern matching parser `sep`
-> m [Either (Tokens s) (Tokens s)]	parser

Deprecated: replace with `findAll sep = (fmap.fmap) (second fst) $ sepCap (match sep)`

Find all occurences

Parser combinator for finding all occurences of a pattern in a stream.

Will call sepCap with the match combinator and return the text which matched the pattern parser sep in the Right sections.

Definition:

findAll sep = (fmap.fmap) (second fst) $ sepCap (match sep)

findAllCap Source #

Arguments

:: MonadParsec e s m
=> m a	The pattern matching parser `sep`
-> m [Either (Tokens s) (Tokens s, a)]	parser

Deprecated: replace with `findAllCap sep = sepCap (match sep)`

Find all occurences, parse and capture pattern matches

Parser combinator for finding all occurences of a pattern in a stream.

Will call sepCap with the match combinator so that the text which matched the pattern parser sep will be returned in the Right sections, along with the result of the parse of sep.

Definition:

findAllCap sep = sepCap (match sep)

Type constraints

output stream type Tokens s = input stream type s

Custom error type e should be Void

Special fast input types

Running parser

Break on and capture one pattern

Output

Access the matched section of text

Break on and capture one pattern

Split on and capture all patterns

Access the matched section of text

Split on and capture all patterns

Stream editor

Access the matched section of text in the editor

Stream editor

Parser combinator

Specialized manyTill_

Separate and capture

Output

Zero-width matches forbidden

Find all occurences

Find all occurences, parse and capture pattern matches

output stream type `Tokens s` = input stream type `s`

Custom error type `e` should be `Void`

Access the matched section of text in the `editor`