regex-applicative-text-0.1.0.1: regex-applicative on text

Copyright: (c) 2015 Oleg Grenrus
License: BSD3
Maintainer: Oleg Grenrus <oleg.grenrus@iki.fi>
Stability: experimental
Safe Haskell: Safe
Language: Haskell2010

Text.Regex.Applicative.Text

Description

Text.Regex.Applicative API specialised to Char and Text.

Types

type RE' a = RE Char a

Convenience alias for RE specialised to Char symbols. Such expressions can be run on Text as well as on String.

data RE s a :: * -> * -> *

Type of regular expressions that recognize symbols of type s and produce a result of type a.

Regular expressions can be built using Functor, Applicative and Alternative instances in the following natural way:

  • f <$> ra matches iff ra matches, and its return value is the result of applying f to the return value of ra.
  • pure x matches the empty string (i.e. it does not consume any symbols), and its return value is x.
  • rf <*> ra matches a string iff it is a concatenation of two strings: one matched by rf and the other matched by ra. The return value is f a, where f and a are the return values of rf and ra respectively.
  • ra <|> rb matches a string which is accepted by either ra or rb. It is left-biased, so if both can match, the result of ra is used.
  • empty is a regular expression which does not match any string.
  • many ra matches a concatenation of zero or more strings matched by ra and returns the list of ra's return values on those strings.
  • some ra matches a concatenation of one or more strings matched by ra and returns the list of ra's return values on those strings.
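
As an illustration of the combinators listed above, here is a minimal sketch that parses a hypothetical "WIDTHxHEIGHT" string and a boolean literal. The names digits, dimensions and bool are made up for this example, and the explicit Control.Applicative import is only a precaution in case some and <|> are not already in scope.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative (some, (<|>))
import Data.Char (isDigit)
import Text.Regex.Applicative.Text

-- One or more digits, converted to an Int with <$>.
-- (Hypothetical helper, not part of the library.)
digits :: RE' Int
digits = read <$> some (psym isDigit)

-- Sequencing with <*>; <* keeps the left result and drops the 'x'.
dimensions :: RE' (Int, Int)
dimensions = (,) <$> digits <* sym 'x' <*> digits

-- Left-biased choice with <|>.
bool :: RE' Bool
bool = True <$ string "true" <|> False <$ string "false"

main :: IO ()
main = do
  print (match dimensions "640x480") -- Just (640,480)
  print (match dimensions "640x")    -- Nothing
  print (match bool "false")         -- Just False
```

Because match requires the whole input to be consumed, "640x" is rejected rather than partially matched.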

Smart constructors

sym :: Char -> RE' Char

Match and return the given symbol.

psym :: (Char -> Bool) -> RE' Char

Match and return a single Char which satisfies the predicate.

msym :: (Char -> Maybe a) -> RE' a

Like psym, but allows returning a computed value instead of the original symbol.

anySym :: RE' Char

Match and return any single symbol.
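
The single-symbol constructors above can be combined into larger matchers. The following sketch, under the assumption that a hexadecimal parser is a useful illustration, uses msym to turn each character into its numeric value while matching. The names hexDigit and hexNumber are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative (some)
import Data.Char (digitToInt, isHexDigit)
import Text.Regex.Applicative.Text

-- msym matches a symbol only when the function returns Just,
-- and the wrapped value becomes the result of the match.
hexDigit :: RE' Int
hexDigit = msym $ \c ->
  if isHexDigit c then Just (digitToInt c) else Nothing

-- Fold the digit values into a single number, most significant first.
hexNumber :: RE' Int
hexNumber = foldl (\acc d -> acc * 16 + d) 0 <$> some hexDigit

main :: IO ()
main = do
  print (match hexNumber "ff")  -- Just 255
  print (match hexNumber "1A")  -- Just 26
  print (match hexNumber "xyz") -- Nothing
```

The same matcher could be written with psym isHexDigit followed by fmap digitToInt; msym simply merges the test and the conversion into one step.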

string :: Text -> RE' Text

Match and return the given Text.

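{-# LANGUAGE OverloadedStrings #-}
-- string and (=~) take Text here, hence OverloadedStrings for the literals
import Text.Regex.Applicative.Text

number = string "one" *> pure 1 <|> string "two" *> pure 2

main = print $ "two" =~ number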

reFoldl :: Greediness -> (b -> a -> b) -> b -> RE' a -> RE' b

Match zero or more instances of the given expression, which are combined using the given folding function.

The Greediness argument controls whether this regular expression matches as many (Greedy) or as few (NonGreedy) instances of the underlying expression as possible.
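
To make the effect of the Greediness argument concrete, the sketch below folds decimal digits into an Int. The names digit and decimal are hypothetical, and the example assumes that the Greediness type and its constructors are exported by this module.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Char (digitToInt, isDigit)
import Text.Regex.Applicative.Text

-- A single decimal digit as a number.
digit :: RE' Int
digit = digitToInt <$> psym isDigit

-- Accumulate digits left to right, starting from 0.
-- (Assumes Greediness(..) is in scope from the import above.)
decimal :: Greediness -> RE' Int
decimal g = reFoldl g (\acc d -> acc * 10 + d) 0 digit

main :: IO ()
main = do
  print (match (decimal Greedy) "2015")              -- Just 2015
  print (findFirstPrefix (decimal Greedy) "2015")    -- Just (2015,"")
  print (findFirstPrefix (decimal NonGreedy) "2015") -- Just (0,"2015")
```

With match the whole input must be consumed, so the greediness only becomes visible with a prefix matcher such as findFirstPrefix, where the non-greedy variant stops after zero instances.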

few :: RE' a -> RE' [a]

Match zero or more instances of the given expression, but as few of them as possible (i.e. non-greedily). The greedy equivalent of few is many.

>>> findFirstPrefix (few anySym <* "b") "ababab"
Just ("a","abab")
>>> findFirstPrefix (many anySym <* "b") "ababab"
Just ("ababa","")

withMatched :: RE' a -> RE' (a, Text)

Return the matched symbols as part of the return value.
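
A small sketch of how withMatched might be used: it pairs a parsed value with the exact Text it was parsed from, which is handy when the conversion loses information. The name number is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative (some)
import Data.Char (isDigit)
import Data.Text (Text)
import Text.Regex.Applicative.Text

-- The Int result drops leading zeros; the Text component keeps them.
number :: RE' (Int, Text)
number = withMatched (read <$> some (psym isDigit))

main :: IO ()
main = do
  print (match number "007") -- Just (7,"007")
  print (match number "abc") -- Nothing
```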

Basic matchers

match :: RE' a -> Text -> Maybe a

Attempt to match a Text against the regular expression. Note that the whole Text (not just some part of it) must be matched.

>>> match (sym 'a' <|> sym 'b') "a"
Just 'a'
>>> match (sym 'a' <|> sym 'b') "ab"
Nothing

(=~) :: Text -> RE' a -> Maybe a infix 2

s =~ a = match a s

replace :: RE' Text -> Text -> Text

Replace matches of the regular expression with its value.

>>> replace ("!" <$ sym 'f' <* some (sym 'o')) "quuxfoofooooofoobarfobar"
"quux!!!bar!bar"

Advanced matchers

findFirstPrefix :: RE' a -> Text -> Maybe (a, Text)

Find a string prefix which is matched by the regular expression.

Of all matching prefixes, pick one using left bias (prefer the left part of <|> to the right part) and greediness.

This is the match which a backtracking engine (such as Perl's) would find first.

If a match is found, the rest of the input is also returned.

>>> findFirstPrefix ("a" <|> "ab") "abc"
Just ("a","bc")
>>> findFirstPrefix ("ab" <|> "a") "abc"
Just ("ab","c")
>>> findFirstPrefix "bc" "abc"
Nothing

findLongestPrefix :: RE' a -> Text -> Maybe (a, Text)

Find the longest string prefix which is matched by the regular expression.

Submatches are still determined using left bias and greediness, so this is different from POSIX semantics.

If a match is found, the rest of the input is also returned.

>>> let keyword = "if"
>>> let identifier = many $ psym isAlpha
>>> let lexeme = (Left <$> keyword) <|> (Right <$> identifier)
>>> findLongestPrefix lexeme "if foo"
Just (Left "if"," foo")
>>> findLongestPrefix lexeme "iffoo"
Just (Right "iffoo","")

findShortestPrefix :: RE' a -> Text -> Maybe (a, Text)

Find the shortest prefix which is matched by the regular expression (analogous to findLongestPrefix).
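
The difference between the shortest and longest prefix can be seen on a simple digit matcher. This is only a sketch; the name digits is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative (some)
import Data.Char (isDigit)
import Text.Regex.Applicative.Text

-- One or more decimal digits, returned as a String.
digits :: RE' String
digits = some (psym isDigit)

main :: IO ()
main = do
  print (findShortestPrefix digits "2015-01") -- Just ("2","015-01")
  print (findLongestPrefix digits "2015-01")  -- Just ("2015","-01")
  print (findShortestPrefix digits "x1")      -- Nothing
```

The shortest prefix stops as soon as the expression can succeed, here after a single digit, while the longest prefix consumes every digit available.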

findFirstInfix :: RE' a -> Text -> Maybe (Text, a, Text)

Find the leftmost substring that is matched by the regular expression. Otherwise behaves like findFirstPrefix. Returns the result together with the prefix and suffix of the string surrounding the match.

findLongestInfix :: RE' a -> Text -> Maybe (Text, a, Text)

Find the leftmost substring that is matched by the regular expression. Otherwise behaves like findLongestPrefix. Returns the result together with the prefix and suffix of the string surrounding the match.

findShortestInfix :: RE' a -> Text -> Maybe (Text, a, Text)

Find the leftmost substring that is matched by the regular expression. Otherwise behaves like findShortestPrefix. Returns the result together with the prefix and suffix of the string surrounding the match.
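
The three infix matchers can be compared on one input. The sketch below reuses the "a" versus "ab" alternative from the findFirstPrefix examples, shifted one character into the input; the name aOrAb is hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Control.Applicative ((<|>))
import Data.Text (Text)
import Text.Regex.Applicative.Text

-- Left-biased alternative: "a" is preferred over "ab".
aOrAb :: RE' Text
aOrAb = string "a" <|> string "ab"

main :: IO ()
main = do
  print (findFirstInfix aOrAb "xabc")    -- Just ("x","a","bc")
  print (findLongestInfix aOrAb "xabc")  -- Just ("x","ab","c")
  print (findShortestInfix aOrAb "xabc") -- Just ("x","a","bc")
  print (findFirstInfix aOrAb "xyz")     -- Nothing
```

All three start at the leftmost position where a match is possible; they differ only in which match at that position is returned.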

Module re-exports