rex-0.1: A quasi-quoter for typeful results of regex captures.

Portabilityunportable
Stabilityunstable
MaintainerMichael Sloan (mgsloan@gmail.com)

Text.Regex.PCRE.Rex

Description

This module provides a template Haskell quasiquoter for regular expressions, which provides the following features:

1) Compile-time checking that the regular expression is valid.

2) Arity of resulting tuple based on the number of selected capture patterns in the regular expression.

3) By default utilizes type inference to determine how to parse capture patterns - uses the read function's return-type polymorphism

4) Allows for the inline interpolation of a mapping function String -> a.

5) Precompiles the regular expression at compile time, by calling into the PCRE library and storing a ByteString literal representation of its state.

Here's a silly example which parses peano numbers of the form Z, S Z, S S Z, etc. The s+ means that it is not sensitive to the quantity or type of seperating whitespace. (these examples can also be found in Test.hs)

 peano :: String -> Maybe Int
 peano = [rex|^(?{ length . filter (=='S') } \s* (?:S\s+)*Z)\s*$|]
 *Main> peano "Z"
 Just 0
 *Main> peano "S Z"
 Just 1
 *Main> peano "S   S Z"
 Just 2
 *Main> peano "S S S Z"
 Just 3
 *Main> peano "invalid"
 Nothing

The token "(?{" introduces a capture group which has a mapping applied to the result - in this case "length . filter (==S)". If an expression is omitted, e.g. "(?{} ... )", then there is an implicit usage of the read function.

If the ?{ ... } are omitted, then the capture group is not taken as part of the results of the match.

 vect2d :: String -> Maybe (Int, Int)
 vect2d = [rex|^<\s* (?{}\d+) \s*,\s* (?{}\d+) \s*>$|]

The following example is derived from http:www.regular-expressions.info/dates.html

 parseDate :: String -> Maybe (Int, Int, Int)
 parseDate [rex|^(?{ y }(?:19|20)\d\d)[- /.]
                 (?{ m }0[1-9]|1[012])[- /.]
                 (?{ d }0[1-9]|[12][0-9]|3[01])$|]
   |  (d > 30 && (m `elem` [4, 6, 9, 11]))
   || (m == 2 &&
        (d == 29 && not (mod y 4 == 0 && (mod y 100 /= 0 || mod y 400 == 0)))
      || (d > 29)) = Nothing
   | otherwise = Just (y, m, d)
 parseDate _ = Nothing

The above example makes use of the regex quasi-quoter as a pattern matcher. The interpolated Haskell patterns are used to construct an implicit view pattern. The above pattern is expanded to the equivalent:

 parseDate ([rex|^(?{}(?:19|20)\d\d)[- /.]
                  (?{}0[1-9]|1[012])[- /.]
                  (?{}0[1-9]|[12][0-9]|3[01])$|]
           -> Just (y, m, d))

In order to provide a capture-mapper along with a pattern, use view-patterns inside the interpolation brackets.

Caveat: Since haskell-src-exts does not support parsing view-patterns, the above is implemented as a relatively naive split on "->". It presumes that the last "->" in the interpolated pattern seperates the pattern from an expression on the left. This allows for lambdas to be present in the expression, but prevents nesting view patterns.

There are a few other inelegances:

1) PCRE captures, unlike .NET regular expressions, yield the last capture made by a particular pattern. So, for example, (...)*, will only yield one match for .... Ideally these would be detected and yield an implicit [a].

2) Patterns with disjunction between captures ((?{f}a) | (?{g}b)) will provide the empty string to one of f / g. In the case of pattern expressions, it would be convenient to be able to map multiple captures into a single variable / pattern, preferring the first non-empty option. The general logic for this is a bit complicated, and postponed for a later release.

Since pcre-light is a wrapper over a C API, the most efficient interface is ByteStrings, as it does not natively speak Haskell lists. The [rex| ... ] quasiquoter implicitely packs the input into a bystestring, and unpacks the results to strings before providing them to your mappers. Use [brex| ... ] to bypass this, and process raw ByteStrings. In order to preserve the same default behavior, "read . unpack" is the default mapper for brex.

Inspired by / copy-modified from Matt Morrow's regexqq package: http://hackage.haskell.org/packages/archive/regexqq/latest/doc/html/src/Text-Regex-PCRE-QQ.html

And code from Erik Charlebois's interpolatedstring-qq package: http://hackage.haskell.org/packages/archive/interpolatedstring-qq/latest/doc/html/Text-InterpolatedString-QQ.html

Synopsis

Documentation

brex :: QuasiQuoterSource

The regular expression quasiquoter for strings.

maybeRead :: Read a => String -> Maybe aSource

A possibly useful utility function - yields Just x when there is a valid parse, and Nothing otherwise.

padRight :: a -> Int -> [a] -> [a]Source

Given a desired list-length, if the passed list is too short, it is padded with the given element. Otherwise, it trims.