rex-0.6.1: A quasi-quoter for typeful results of regex captures.

Copyright(c) Michael Sloan 2011
MaintainerMichael Sloan (mgsloan@gmail.com)
Stabilityunstable
Portabilityunportable
Safe HaskellNone
LanguageHaskell98

Text.Regex.PCRE.Rex

Contents

Description

This module provides a template Haskell quasiquoter for regular expressions, which provides the following features:

1) Compile-time checking that the regular expression is valid.

2) Arity of resulting tuple based on the number of selected capture patterns in the regular expression.

3) Allows for the inline interpolation of mapping functions :: String -> a.

4) Precompiles the regular expression at compile time, by calling into the PCRE library and storing a ByteString literal representation of its state.

5) Compile-time configurable to use different PCRE options, turn off precompilation, use ByteStrings, or set a default mapping expression.

Inspired by Matt Morrow's regexqq package: http://hackage.haskell.org/package/regexqq/docs/Text-Regex-PCRE-QQ.html.

And some code from Erik Charlebois's interpolatedstring-qq package: http://hackage.haskell.org/package/interpolatedstring-qq/.

Synopsis

Language Extensions

Since this is a quasiquoter library that generates code using view patterns, the following extensions are required:

{-# LANGUAGE TemplateHaskell, QuasiQuotes, ViewPatterns #-}

First Example

Here's an example which parses peano numbers of the form Z, S Z, S S Z, etc. The s+ means that it is not sensitive to the quantity or type of separating whitespace. These examples can also be found in Test.hs.

peano :: String -> Maybe Int
peano = [rex|^(?{ length . filter (=='S') } \s* (?:S\s+)*Z)\s*$|]
*Main> peano "Z"
Just 0
*Main> peano "S Z"
Just 1
*Main> peano "S   S Z"
Just 2
*Main> peano "S S S Z"
Just 3
*Main> peano "invalid"
Nothing

The token "(?{" introduces a capture group which has a mapping applied to the result. In this case, it's length . filter (==S). If the ?{ ... } are omitted, then the capture group is not taken as part of the results of the match. If the contents of the ?{ ... } is omitted, then a call to rexView is assumed:

parsePair :: String -> Maybe (String, String)
parsePair = [rex|^<\s* (?{ }[^\s,>]+) \s*,\s* (?{ }[^\s,>]+) \s*>$|]

The rexView exported by this module is just equal to id, so by default no preprocessing is done. However, we can shadow this locally:

parsePair' :: String -> Maybe (Int, Int)
parsePair' = [rex|^<\s* (?{ }[^\s,>]+) \s*,\s* (?{ }[^\s,>]+) \s*>$|]
  where
    rexView = read

Additional shorthands can be added by using rexWithConf and specifying custom values for rexPreprocessExp or rexPreprocessPat.

Second Example

This example is derived from http://www.regular-expressions.info/dates.html

parseDate :: String -> Maybe (Int, Int, Int)
parseDate [rex|^(?{ read -> y }(?:19|20)\d\d)[- /.]
                (?{ read -> m }0[1-9]|1[012])[- /.]
                (?{ read -> d }0[1-9]|[12][0-9]|3[01])$|]
  |  (d > 30 && (m `elem` [4, 6, 9, 11]))
  || (m == 2 &&
      (d == 29 && not (mod y 4 == 0 && (mod y 100 /= 0 || mod y 400 == 0)))
     || (d > 29)) = Nothing
  | otherwise = Just (y, m, d)
parseDate _ = Nothing

The above example makes use of the regex quasi-quoter as a pattern matcher. The interpolated Haskell patterns are used to construct an implicit view pattern out of the inlined ones. The above pattern is expanded to the equivalent:

parseDate ([rex|^(?{ read }(?:19|20)\d\d)[- /.]
                 (?{ read }0[1-9]|1[012])[- /.]
                 (?{ read }0[1-9]|[12][0-9]|3[01])$|]
          -> Just (y, m, d))

ByteStrings vs Strings

Since pcre-light is a wrapper over a C API, the most efficient interface is ByteStrings, as it does not natively speak Haskell lists. The [rex| ... ] quasiquoter implicitely packs the input into a bystestring, and unpacks the results to strings before providing them to your mappers. The brex QuasiQuoter is provided for this purpose. You can also define your own QuasiQuoter - the definitions of the default configurations are as follows:

rex  = rexWithConf $ defaultRexConf
brex = rexWithConf $ defaultRexConf { rexByteString = True }

defaultRexConf = RexConf False True "id" [PCRE.extended] []

The first False specifies to use String rather than ByteString. The True argument specifies to use precompilation. -- The string following is the default mapping expression, used when omitted. Due to GHC staging restrictions, your configuration will need to be in a different module than its usage.

Future Work

There are a few things that could potentially be improved:

1) PCRE captures, unlike .NET regular expressions, yield the last capture made by a particular pattern. So, for example, (...)*, will only yield one match for .... Ideally these would be detected and yield an implicit [a].

2) Patterns with disjunction between captures ((?{f}a) | (?{g}b)) will provide the empty string to one of f / g. In the case of pattern expressions, it would be convenient to be able to map multiple captures into a single variable / pattern, preferring the first non-empty option.

Quasiquoters

rex :: QuasiQuoter Source #

Rex quasiquoter which takes String as input, and uses defaultRexConf for its configuration. Can be used in expressions and patterns.

brex :: QuasiQuoter Source #

Rex quasiquoter which takes ByteString as input, and otherwise uses defaultRexConf for its configuration. Can be used in expressions and patterns.

Configurable QuasiQuoter

rexWithConf :: RexConf -> QuasiQuoter Source #

A configureable regular-expression QuasiQuoter. Takes the options to pass to the PCRE engine, along with Bools to flag ByteString usage and non-compilation respecively. The provided String indicates which mapping function to use, when one is omitted - "(?{} ...)".

data RexConf Source #

Constructors

RexConf 

Fields

  • rexByteString :: Bool

    When True, the input type is a ByteString, otherwise, it's a String.

  • rexCompiled :: Bool

    When True, the regex is precompiled.

  • rexPreprocessExp :: String -> String

    Preprocess the string used in expression antiquotes. defaultRexConf just passes through the string unaltered, unless it just consists of whitespace. When it's all whitespace, "rexView" is used.

  • rexPreprocessPat :: String -> String

    Preprocess the string used in pattern antiquotes. defaultRexConf adds parenthesis around the string, so that view patterns will parse without requiring parenthesis around them.

  • rexViewExp :: Exp

    When a pattern match doesn't have a view pattern, this expression is used to preprocess it before matching. When defaultRexConf is used, perhaps via rex or brex, a reference to rexView is used.

    The rexView exported by this module is id, so by default no preprocessing is done before

  • rexPCREOpts :: [PCREOption]

    Options used when compiling PCRE regular expressions.

  • rexPCREExecOpts :: [PCREExecOption]

    Options used when executing PCRE regular expressions.

defaultRexConf :: RexConf Source #

Default rex configuration, which specifies that the regexes operate on strings, don't post-process the matched patterns, and use extended. This setting causes whitespace to be non-semantic, and ignores # comments.

Utilities

makeQuasiMultiline :: QuasiQuoter -> QuasiQuoter Source #

This is a QuasiQuoter transformer, which allows for a whitespace- sensitive quasi-quoter to be broken over multiple lines. The default rex and brex functions do not need this as they are already whitespace insensitive. However, if you create your own configuration, which omits the extended parameter, then this could be useful. The leading space of each line is ignored, and all newlines removed.

eitherToParseResult :: Either String a -> ParseResult a Source #

Converts Left to ParseFailed noLoc, and a Right to ParseOk.

parseExp :: String -> ParseResult Exp Source #

Parse a Haskell expression into a Template Haskell Exp.

parsePat :: String -> ParseResult Pat Source #

Parse a Haskell pattern match into a Template Haskell Pat.

rexParseMode :: ParseMode Source #

Parse mode used by parseExp and parsePat.

Used by the generated code

rexView :: a -> a Source #

A default view function used when expression antiquotes are empty, or when pattern antiquotes omit a view pattern. See the documentation for rexPreprocessPat and rexPreprocessExp for more details.

You can locally shadow this rexView with your own version, if you wish. One good option is readMay from the safe package: http://hackage.haskell.org/package/safe/docs/Safe.html#v:readMay.

The type of this identity rexView is fully polymorphic so that it can be used with either String or ByteString.