Copyright | (c) Michael Sloan 2011 |
---|---|
Maintainer | Michael Sloan (mgsloan@gmail.com) |
Stability | unstable |
Portability | unportable |
Safe Haskell | None |
Language | Haskell2010 |
This module provides a template Haskell quasiquoter for regular expressions, which provides the following features:
1) Compile-time checking that the regular expression is valid.
2) Arity of resulting tuple based on the number of selected capture patterns in the regular expression.
3) Allows for the inline interpolation of mapping functions :: String -> a.
4) Precompiles the regular expression at compile time, by calling into the
PCRE library and storing a ByteString
literal representation of its state.
5) Compile-time configurable to use different PCRE options, turn off
precompilation, use ByteString
s, or set a default mapping expression.
Inspired by Matt Morrow's regexqq package: http://hackage.haskell.org/package/regexqq/docs/Text-Regex-PCRE-QQ.html.
And some code from Erik Charlebois's interpolatedstring-qq package: http://hackage.haskell.org/package/interpolatedstring-qq/.
Synopsis
- rex :: QuasiQuoter
- brex :: QuasiQuoter
- rexWithConf :: RexConf -> QuasiQuoter
- data RexConf = RexConf {
- rexByteString :: Bool
- rexCompiled :: Bool
- rexPreprocessExp :: String -> String
- rexPreprocessPat :: String -> String
- rexViewExp :: Exp
- rexPCREOpts :: [PCREOption]
- rexPCREExecOpts :: [PCREExecOption]
- defaultRexConf :: RexConf
- makeQuasiMultiline :: QuasiQuoter -> QuasiQuoter
- eitherToParseResult :: Either String a -> ParseResult a
- parseExp :: String -> ParseResult Exp
- parsePat :: String -> ParseResult Pat
- rexParseMode :: ParseMode
- rexView :: a -> a
Language Extensions
Since this is a quasiquoter library that generates code using view patterns, the following extensions are required:
{-# LANGUAGE TemplateHaskell, QuasiQuotes, ViewPatterns #-}
First Example
Here's an example which parses peano numbers of the form Z, S Z, S S Z, etc. The s+ means that it is not sensitive to the quantity or type of separating whitespace. These examples can also be found in Test.hs.
peano :: String -> Maybe Int peano = [rex|^(?{ length . filter (=='S') } \s* (?:S\s+)*Z)\s*$|]
*Main> peano "Z" Just 0 *Main> peano "S Z" Just 1 *Main> peano "S S Z" Just 2 *Main> peano "S S S Z" Just 3 *Main> peano "invalid" Nothing
The token "(?{" introduces a capture group which has a mapping applied to
the result. In this case, it's length . filter (==
. If the ?{ ... }
are omitted, then the capture group is not taken as part of the results of
the match. If the contents of the ?{ ... } is omitted, then a call to
S
)rexView
is assumed:
parsePair :: String -> Maybe (String, String) parsePair = [rex|^<\s* (?{ }[^\s,>]+) \s*,\s* (?{ }[^\s,>]+) \s*>$|]
The rexView
exported by this module is just equal to id
, so by default
no preprocessing is done. However, we can shadow this locally:
parsePair' :: String -> Maybe (Int, Int) parsePair' = [rex|^<\s* (?{ }[^\s,>]+) \s*,\s* (?{ }[^\s,>]+) \s*>$|] where rexView = read
Additional shorthands can be added by using rexWithConf
and specifying
custom values for rexPreprocessExp
or rexPreprocessPat
.
Second Example
This example is derived from http://www.regular-expressions.info/dates.html
parseDate :: String -> Maybe (Int, Int, Int) parseDate [rex|^(?{ read -> y }(?:19|20)\d\d)[- /.] (?{ read -> m }0[1-9]|1[012])[- /.] (?{ read -> d }0[1-9]|[12][0-9]|3[01])$|] | (d > 30 && (m `elem` [4, 6, 9, 11])) || (m == 2 && (d == 29 && not (mod y 4 == 0 && (mod y 100 /= 0 || mod y 400 == 0))) || (d > 29)) = Nothing | otherwise = Just (y, m, d) parseDate _ = Nothing
The above example makes use of the regex quasi-quoter as a pattern matcher. The interpolated Haskell patterns are used to construct an implicit view pattern out of the inlined ones. The above pattern is expanded to the equivalent:
parseDate ([rex|^(?{ read }(?:19|20)\d\d)[- /.] (?{ read }0[1-9]|1[012])[- /.] (?{ read }0[1-9]|[12][0-9]|3[01])$|] -> Just (y, m, d))
ByteStrings vs Strings
Since pcre-light is a wrapper over a C API, the most efficient interface is
ByteStrings, as it does not natively speak Haskell lists. The [rex| ... ]
quasiquoter implicitely packs the input into a bystestring, and unpacks the
results to strings before providing them to your mappers. The brex
QuasiQuoter
is provided for this purpose. You can also define your own
QuasiQuoter
- the definitions of the default configurations are as follows:
rex = rexWithConf $ defaultRexConf brex = rexWithConf $ defaultRexConf { rexByteString = True } defaultRexConf = RexConf False True "id" [PCRE.extended] []
The first False
specifies to use String
rather than ByteString
. The
True
argument specifies to use precompilation. -- The
string following is the default mapping expression, used when omitted.
Due to GHC staging restrictions, your configuration will need to be in a
different module than its usage.
Future Work
There are a few things that could potentially be improved:
1) PCRE captures, unlike .NET regular expressions, yield the last capture
made by a particular pattern. So, for example, (...)*, will only yield one
match for ...
. Ideally these would be detected and yield an implicit [a].
2) Patterns with disjunction between captures ((?{f}a) | (?{g}b)) will provide the empty string to one of f / g. In the case of pattern expressions, it would be convenient to be able to map multiple captures into a single variable / pattern, preferring the first non-empty option.
Quasiquoters
rex :: QuasiQuoter Source #
Rex quasiquoter which takes String
as input, and uses defaultRexConf
for its configuration. Can be used in expressions and patterns.
brex :: QuasiQuoter Source #
Rex quasiquoter which takes ByteString
as input, and otherwise uses
defaultRexConf
for its configuration. Can be used in expressions and
patterns.
Configurable QuasiQuoter
rexWithConf :: RexConf -> QuasiQuoter Source #
A configureable regular-expression QuasiQuoter. Takes the options to pass
to the PCRE engine, along with Bool
s to flag ByteString
usage and
non-compilation respecively. The provided String
indicates which mapping
function to use, when one is omitted - "(?{} ...)".
RexConf | |
|
defaultRexConf :: RexConf Source #
Default rex configuration, which specifies that the regexes operate on
strings, don't post-process the matched patterns, and use extended
.
This setting causes whitespace to be non-semantic, and ignores # comments.
Utilities
makeQuasiMultiline :: QuasiQuoter -> QuasiQuoter Source #
This is a QuasiQuoter
transformer, which allows for a whitespace-
sensitive quasi-quoter to be broken over multiple lines. The default rex
and brex
functions do not need this as they are already whitespace
insensitive. However, if you create your own configuration, which omits the
extended
parameter, then this could be useful. The leading space of
each line is ignored, and all newlines removed.
eitherToParseResult :: Either String a -> ParseResult a Source #
Converts Left
to
, and a ParseFailed
noLoc
Right
to
.ParseOk
parseExp :: String -> ParseResult Exp Source #
Parse a Haskell expression into a Template Haskell Exp.
parsePat :: String -> ParseResult Pat Source #
Parse a Haskell pattern match into a Template Haskell Pat.
Used by the generated code
A default view function used when expression antiquotes are empty, or when
pattern antiquotes omit a view pattern. See the documentation for
rexPreprocessPat
and rexPreprocessExp
for more details.
You can locally shadow this rexView
with your own version, if you wish.
One good option is readMay from the safe package:
http://hackage.haskell.org/package/safe/docs/Safe.html#v:readMay.
The type of this identity rexView is fully polymorphic so that it can be
used with either String
or ByteString
.