parse-replace: Stream editing with parsers

This is a package candidate release! Here you can preview how this package release will appear once published to the main package index (which can be accomplished via the 'maintain' link below). Please note that once a package has been published to the main package index it cannot be undone! Please consult the package uploading documentation for more information.

[maintain] [Publish]

Stream editing and find-and-replace with Megaparsec monadic parser combinators.


[Skip to Readme]

Properties

Versions 1.0.0.0
Change log CHANGELOG.md
Dependencies base, megaparsec [details]
License BSD-3-Clause
Author James Brock
Maintainer jamesbrock@gmail.com
Category Parsing
Home page https://github.com/jamesdbrock/parse-replace
Bug tracker https://github.com/jamesdbrock/parse-replace/issues
Source repo head: git clone https://github.com/jamesdbrock/parse-replace.git
Uploaded by JamesBrock at 2019-08-24T09:57:19Z

Modules

Downloads

Maintainer's Corner

Package maintainers

For package maintainers and hackage trustees


Readme for parse-replace-1.0.0.0

[back to package description]

Parse Replace

This module is for doing “pattern capture” or “stream editing” or “find-and-replace” or “match-and-substitute,” using Megaparsec parsers instead of the more traditional regular expressions.

It can be used in the same sort of “pattern capture” situations in which one would use the Python re.findall or Perl m//, or Unix grep.

This module can be used for “find-and-replace” or “stream editing” in the same sort of situations in which one would use Python re.sub, or Perl s///, or Unix sed, or awk.

Why would we want to do pattern matching and substitution with parsers instead of regular expressions?

Examples

Try the examples with ghci by running cabal v2-repl in the parse-replace/ root directory.

The examples depend on these imports.

import Parsereplace
import Text.Megaparsec
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer

Parsing with sepCap family of parser combinators

The following examples show how to match a pattern to a string of text and deconstruct the string of text by separating it into sections which match the pattern, and sections which don't match.

Pattern-match, capture only the parsed result

Separate the input string into sections which can be parsed as a hexadecimal number with a prefix "0x", and sections which can't.

let hexparser = string "0x" >> hexadecimal :: Parsec Void String Integer
parseTest (sepCap hexparser) "0xA 000 0xFFFF"
[Right 10,Left " 000 ",Right 65535]

Pattern match, capture only the matched text

Just get the strings sections which match the hexadecimal parser, throw away the parsed number.

let hexparser = string "0x" >> hexadecimal :: Parsec Void String Integer
parseTest (findAll hexparser) "0xA 000 0xFFFF"
[Right "0xA",Left " 000 ",Right "0xFFFF"]

Pattern match, capture the matched text and the parsed result

Capture the parsed hexadecimal number, as well as the string section which parses as a hexadecimal number.

let hexparser = string "0x" >> hexadecimal :: Parsec Void String Integer
parseTest (findAllCap hexparser) "0xA 000 0xFFFF"
[Right ("0xA",10),Left " 000 ",Right ("0xFFFF",65535)]

Pattern match, capture only the locations of the matched patterns

Find all of the sections of the stream which match the Text.Megaparsec.Char.space1 parser (a string of whitespace). Print a list of the offsets of the beginning of every pattern match.

import Data.Either
let spaceoffset = getOffset <* space1 :: Parsec Void String Int
parseTest (return . rights =<< sepCap spaceoffset) " a  b  "
[0,2,5]

Edit text strings by running parsers with streamEdit

The following examples show how to search for a pattern in a string of text and then edit the string of text to substitute in some replacement text for the matched patterns.

Pattern match and replace with a constant

Replace all carriage-return-newline instances with newline.

streamEdit crlf (const "\n") "1\r\n\r\n2"
"1\n\n2"

Pattern match and edit the matches

Replace alphabetic characters with the next character in the alphabet.

streamEdit (some letterChar) (fmap succ) "HAL 9000"
"IBM 9000"

Pattern match and edit the matches

Find all of the string sections s which can be parsed as a hexadecimal number r, and if r≤16, then replace s with a decimal number.

let hexparser = string "0x" >> hexadecimal :: Parsec Void String Integer
streamEdit (match hexparser) (\(s,r) -> if r <= 16 then show r else s) "0xA 000 0xFFFF"
"10 000 0xFFFF"

Context-sensitive pattern match and edit the matches

Capitalize the third letter in a string. The capthird parser searches for individual letters, and it needs to remember how many times it has run so that it can match successfully only on the third time that it finds a letter. To enable the parser to remember how many times it has found a letter, we'll compose the parser with a State monad from the mtl package. (Run in ghci with cabal v2-repl -b mtl).

import qualified Control.Monad.State.Strict as MTL
import Control.Monad.State.Strict (get, put, evalState)
import Data.Char (toUpper)

let capthird :: ParsecT Void String (MTL.State Int) String
    capthird = do
        x <- letterChar
        i <- get
        put (i+1)
        if i==3 then return [x] else empty

flip evalState 1 $ streamEditT capthird (return . fmap toUpper) "a a a a a"
"a a A a a"

Alternatives

http://hackage.haskell.org/package/regex

http://hackage.haskell.org/package/pipes-parse