lens-regex-pcre-0.3.1.0: A lensy interface to regular expressions

Copyright(c) Chris Penner 2019
LicenseBSD3
Safe HaskellNone
LanguageHaskell2010

Control.Lens.Regex

Contents

Description

 
Synopsis

Combinators

regex :: Regex -> IndexedTraversal' Int Text (Match Text) Source #

The base combinator for doing regex searches. It's a traversal which selects Matches; you can compose it with match or groups to get the relevant parts of your match.

>>> txt = "raindrops on roses and whiskers on kittens" :: Text

Search

>>> has (regex [rx|whisk|]) txt
True

Get matches

>>> txt ^.. regex [rx|\br\w+|] . match
["raindrops","roses"]

Edit matches

>>> txt & regex [rx|\br\w+|] . match %~ T.intersperse '-' . T.toUpper
"R-A-I-N-D-R-O-P-S on R-O-S-E-S and whiskers on kittens"

Get Groups

>>> txt ^.. regex [rx|(\w+) on (\w+)|] . groups
[["raindrops","roses"],["whiskers","kittens"]]

Edit Groups

>>> txt & regex [rx|(\w+) on (\w+)|] . groups %~ Prelude.reverse
"roses on raindrops and kittens on whiskers"

Get the third match

>>> txt ^? regex [rx|\w+|] . index 2 . match
Just "roses"

Match integers, Read them into ints, then sort them in-place dumping them back into the source text afterwards.

>>> "Monday: 29, Tuesday: 99, Wednesday: 3" & partsOf (regex [rx|\d+|] . match . unpacked . _Show @Int) %~ sort
"Monday: 3, Tuesday: 29, Wednesday: 99"

To alter behaviour of the regex you may wish to pass PCREOptions when compiling it. The default behaviour may seem strange in certain cases; e.g. it operates in 'single-line' mode. You can compile the Regex separately and add any options you like, then pass the resulting Regex into regex; Alternatively can make your own version of the QuasiQuoter with any options you want embedded by using mkRegexQQ.

regexBS :: Regex -> IndexedTraversal' Int ByteString (Match ByteString) Source #

A version of regex which operates directly on ByteStrings. This is more efficient than using regex as it avoids converting back and forth between ByteString and Text.

match :: Monoid text => Traversal' (Match text) text Source #

Traverse each match

Get a match if one exists:

>>> "find a needle in a haystack" ^? regex [rx|n..dle|] . match
Just "needle"

Collect all matches

>>> "one _two_ three _four_" ^.. regex [rx|_\w+_|] . match
["_two_","_four_"]

You can edit the traversal to perform a regex replace/substitution

>>> "one _two_ three _four_" & regex [rx|_\w+_|] . match %~ T.toUpper
"one _TWO_ three _FOUR_"

groups :: Traversal' (Match text) [text] Source #

Access all groups of a match at once.

Note that you can edit the groups through this traversal, Changing the length of the list has behaviour similar to partsOf.

Get all matched groups:

>>> "raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups
[["raindrops","roses"],["whiskers","kittens"]]

You can access a specific group by combining with ix

>>> "raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups .  ix 1
["roses","kittens"]

groups is a traversal; you can mutate matches through it.

>>> "raindrops on roses and whiskers on kittens" & regex [rx|(\w+) on (\w+)|] . groups .  ix 1 %~ T.toUpper
"raindrops on ROSES and whiskers on KITTENS"

Editing the list rearranges groups

>>> "raindrops on roses and whiskers on kittens" & regex [rx|(\w+) on (\w+)|] . groups %~ Prelude.reverse
"roses on raindrops and kittens on whiskers"

You can traverse the list to flatten out all groups

>>> "raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . groups . traversed
["raindrops","roses","whiskers","kittens"]

matchAndGroups :: Monoid text => Getter (Match text) (text, [text]) Source #

Collect both the match text AND all the matching groups

>>> "raindrops on roses and whiskers on kittens" ^.. regex [rx|(\w+) on (\w+)|] . matchAndGroups
[("raindrops on roses",["raindrops","roses"]),("whiskers on kittens",["whiskers","kittens"])]

Compiling regex

rx :: QuasiQuoter Source #

QuasiQuoter for compiling regexes. This is just re re-exported under a different name so as not to conflict with re from Lens

mkRegexQQ :: [PCREOption] -> QuasiQuoter #

Returns a QuasiQuoter like re, but with given PCRE options.

compile :: ByteString -> [PCREOption] -> Regex #

compile

Compile a perl-compatible regular expression stored in a strict bytestring.

An example

let r = compile (pack "^(b+|a){1,2}?bc") []

Or using GHC's -XOverloadedStrings flag, and importing Data.ByteString.Char8, we can avoid the pack:

let r = compile "^(b+|a){1,2}?bc" []

If the regular expression is invalid, an exception is thrown. If this is unsuitable, compileM is availlable, which returns failure in a monad.

To do case insentive matching,

compile "^(b+|a){1,2}?bc" [caseless]

Other flags are documented below.

The resulting abstract regular expression can be passed to match for matching against a subject string.

The arguments are:

  • pat: A ByteString containing the regular expression to be compiled.
  • flags, optional bit flags. If Nothing is provided, defaults are used.

Valid compile-time flags are:

  • anchored - Force pattern anchoring
  • auto_callout - Compile automatic callouts
  • bsr_anycrlf - \R matches only CR, LF, or CRLF
  • bsr_unicode - \R matches all Unicode line endings
  • caseless - Do caseless matching
  • dollar_endonly - $ not to match newline at end
  • dotall - matches anything including NL
  • dupnames - Allow duplicate names for subpatterns
  • extended - Ignore whitespace and # comments
  • extra - PCRE extra features (not much use currently)
  • firstline - Force matching to be before newline
  • multiline - ^ and $ match newlines within data
  • newline_any - Recognize any Unicode newline sequence
  • newline_anycrlf - Recognize CR, LF, and CRLF as newline sequences
  • newline_cr - Set CR as the newline sequence
  • newline_crlf - Set CRLF as the newline sequence
  • newline_lf - Set LF as the newline sequence
  • no_auto_capture - Disable numbered capturing parentheses (named ones available)
  • ungreedy - Invert greediness of quantifiers
  • utf8 - Run in UTF-8 mode
  • no_utf8_check - Do not check the pattern for UTF-8 validity

The regex is allocated via malloc on the C side, and will be deallocated by the runtime when the Haskell value representing it goes out of scope.

See 'man pcreapi for more details.

Caveats: patterns with embedded nulls, such as "0*" seem to be mishandled, as this won't currently match the subject "000".

compileM :: ByteString -> [PCREOption] -> Either String Regex #

compileM A safe version of compile with failure wrapped in an Either.

Examples,

> compileM ".*" [] :: Either String Regex
Right (Regex 0x000000004bb5b980 ".*")
> compileM "*" [] :: Either String Regex
Left "nothing to repeat"

Types

type Match text = [Either text text] Source #

Match represents a whole regex match; you can drill into it using match or groups or matchAndGroups

text is either Text or ByteString depending on whether you use regex or regexBS

Consider this to be internal; don't depend on its representation.

data Regex #

An abstract pointer to a compiled PCRE Regex structure The structure allocated by the PCRE library will be deallocated automatically by the Haskell storage manager.

Instances
Eq Regex 
Instance details

Defined in Text.Regex.PCRE.Light.Base

Methods

(==) :: Regex -> Regex -> Bool #

(/=) :: Regex -> Regex -> Bool #

Ord Regex 
Instance details

Defined in Text.Regex.PCRE.Light.Base

Methods

compare :: Regex -> Regex -> Ordering #

(<) :: Regex -> Regex -> Bool #

(<=) :: Regex -> Regex -> Bool #

(>) :: Regex -> Regex -> Bool #

(>=) :: Regex -> Regex -> Bool #

max :: Regex -> Regex -> Regex #

min :: Regex -> Regex -> Regex #

Show Regex 
Instance details

Defined in Text.Regex.PCRE.Light.Base

Methods

showsPrec :: Int -> Regex -> ShowS #

show :: Regex -> String #

showList :: [Regex] -> ShowS #