-- Hoogle documentation, generated by Haddock -- See Hoogle, http://www.haskell.org/hoogle/ -- | A quasi-quoter for typeful results of regex captures. -- -- Provides a quasi-quoter for regular expressions which yields a tuple, -- of appropriate arity and types, representing the results of the -- captures. Allows the user to specify parsers for captures as inline -- Haskell. Can also be used to provide typeful pattern matching in -- function definitions and pattern matches. Also, it precompiles the -- regular expressions into a PCRE compiled byte-array representation, at -- compile time. @package rex @version 0.1 -- | This module provides support for accessing the compiled representation -- of PCRE regular expressions. This byte array encodes a lookup-table -- based representation for the regular expression, and can be safely -- written out and read back by compatible PCRE versions. module Text.Regex.PCRE.Precompile -- | Compiles the given regular expression, and assuming nothing bad -- happens, yields the bytestring filled with PCRE's compiled -- representation. precompile :: ByteString -> [PCREOption] -> IO (Maybe ByteString) -- | Takes a compiled regular expression adn yields its bytestring -- representation. regexToTable :: Regex -> IO (Maybe ByteString) -- | Creates a regular expression regexFromTable :: ByteString -> IO Regex -- | This module provides a template Haskell quasiquoter for regular -- expressions, which provides the following features: -- -- 1) Compile-time checking that the regular expression is valid. -- -- 2) Arity of resulting tuple based on the number of selected capture -- patterns in the regular expression. -- -- 3) By default utilizes type inference to determine how to parse -- capture patterns - uses the read function's return-type -- polymorphism -- -- 4) Allows for the inline interpolation of a mapping function String -- -> a. -- -- 5) Precompiles the regular expression at compile time, by calling into -- the PCRE library and storing a ByteString literal representation of -- its state. -- -- Here's a silly example which parses peano numbers of the form Z, S Z, -- S S Z, etc. The s+ means that it is not sensitive to the quantity or -- type of seperating whitespace. (these examples can also be found in -- Test.hs) -- --
-- peano :: String -> Maybe Int
-- peano = [rex|^(?{ length . filter (=='S') } \s* (?:S\s+)*Z)\s*$|]
--
--
-- -- *Main> peano "Z" -- Just 0 -- *Main> peano "S Z" -- Just 1 -- *Main> peano "S S Z" -- Just 2 -- *Main> peano "S S S Z" -- Just 3 -- *Main> peano "invalid" -- Nothing ---- -- The token "(?{" introduces a capture group which has a mapping applied -- to the result - in this case "length . filter (==S)". If an -- expression is omitted, e.g. "(?{} ... )", then there is an implicit -- usage of the read function. -- -- If the ?{ ... } are omitted, then the capture group is not taken as -- part of the results of the match. -- --
-- vect2d :: String -> Maybe (Int, Int)
-- vect2d = [rex|^<\s* (?{}\d+) \s*,\s* (?{}\d+) \s*>$|]
--
--
-- The following example is derived from
-- http:www.regular-expressions.info/dates.html
--
--
-- parseDate :: String -> Maybe (Int, Int, Int)
-- parseDate [rex|^(?{ y }(?:19|20)\d\d)[- /.]
-- (?{ m }0[1-9]|1[012])[- /.]
-- (?{ d }0[1-9]|[12][0-9]|3[01])$|]
-- | (d > 30 && (m `elem` [4, 6, 9, 11]))
-- || (m == 2 &&
-- (d == 29 && not (mod y 4 == 0 && (mod y 100 /= 0 || mod y 400 == 0)))
-- || (d > 29)) = Nothing
-- | otherwise = Just (y, m, d)
-- parseDate _ = Nothing
--
--
-- The above example makes use of the regex quasi-quoter as a pattern
-- matcher. The interpolated Haskell patterns are used to construct an
-- implicit view pattern. The above pattern is expanded to the
-- equivalent:
--
--
-- parseDate ([rex|^(?{}(?:19|20)\d\d)[- /.]
-- (?{}0[1-9]|1[012])[- /.]
-- (?{}0[1-9]|[12][0-9]|3[01])$|]
-- -> Just (y, m, d))
--
--
-- In order to provide a capture-mapper along with a pattern, use
-- view-patterns inside the interpolation brackets.
--
-- Caveat: Since haskell-src-exts does not support parsing view-patterns,
-- the above is implemented as a relatively naive split on "->". It
-- presumes that the last "->" in the interpolated pattern seperates
-- the pattern from an expression on the left. This allows for lambdas to
-- be present in the expression, but prevents nesting view patterns.
--
-- There are a few other inelegances:
--
-- 1) PCRE captures, unlike .NET regular expressions, yield the last
-- capture made by a particular pattern. So, for example, (...)*, will
-- only yield one match for .... Ideally these would be detected
-- and yield an implicit [a].
--
-- 2) Patterns with disjunction between captures ((?{f}a) | (?{g}b)) will
-- provide the empty string to one of f / g. In the case of pattern
-- expressions, it would be convenient to be able to map multiple
-- captures into a single variable / pattern, preferring the first
-- non-empty option. The general logic for this is a bit complicated, and
-- postponed for a later release.
--
-- Since pcre-light is a wrapper over a C API, the most efficient
-- interface is ByteStrings, as it does not natively speak Haskell lists.
-- The [rex| ... ] quasiquoter implicitely packs the input into a
-- bystestring, and unpacks the results to strings before providing them
-- to your mappers. Use [brex| ... ] to bypass this, and process raw
-- ByteStrings. In order to preserve the same default behavior, "read .
-- unpack" is the default mapper for brex.
--
-- Inspired by / copy-modified from Matt Morrow's regexqq package:
-- http://hackage.haskell.org/packages/archive/regexqq/latest/doc/html/src/Text-Regex-PCRE-QQ.html
--
-- And code from Erik Charlebois's interpolatedstring-qq package:
-- http://hackage.haskell.org/packages/archive/interpolatedstring-qq/latest/doc/html/Text-InterpolatedString-QQ.html
module Text.Regex.PCRE.Rex
rex :: QuasiQuoter
-- | The regular expression quasiquoter for strings.
brex :: QuasiQuoter
-- | A possibly useful utility function - yields Just x when there
-- is a valid parse, and Nothing otherwise.
maybeRead :: Read a => String -> Maybe a
-- | Given a desired list-length, if the passed list is too short, it is
-- padded with the given element. Otherwise, it trims.
padRight :: a -> Int -> [a] -> [a]