Copyright | (c) 2010 Bryan O'Sullivan |
---|---|
License | BSD-style |
Maintainer | bos@serpentine.com |
Stability | experimental |
Portability | GHC |
Safe Haskell | Safe-Inferred |
Language | Haskell98 |
Regular expression support for Unicode, implemented as bindings to the International Components for Unicode (ICU) libraries.
The syntax and behaviour of ICU regular expressions are Perl-like. For complete details, see the ICU User Guide entry at http://userguide.icu-project.org/strings/regexp.
Note: The functions in this module are not thread safe. For
thread safe use, see clone
below, or use the pure functions in
Data.Text.ICU.
Synopsis
- data MatchOption
- data ParseError
- data Regex
- regex :: [MatchOption] -> Text -> IO Regex
- regex' :: [MatchOption] -> Text -> IO (Either ParseError Regex)
- clone :: Regex -> IO Regex
- setText :: Regex -> Text -> IO ()
- getUTextPtr :: Regex -> IO UTextPtr
- pattern :: Regex -> Text
- find :: Regex -> TextI -> IO Bool
- findNext :: Regex -> IO Bool
- groupCount :: Regex -> IO Int
- start :: Regex -> Int -> IO (Maybe TextI)
- end :: Regex -> Int -> IO (Maybe TextI)
- start_ :: Regex -> Int -> IO TextI
- end_ :: Regex -> Int -> IO TextI
Types
data MatchOption Source #
Options for controlling matching behaviour.
CaseInsensitive | Enable case insensitive matching. |
Comments | Allow comments and white space within patterns. |
DotAll | If set, |
Literal | If set, treat the entire pattern as a literal string. Metacharacters or escape sequences in the input sequence will be given no special meaning. The option |
Multiline | Control behaviour of |
HaskellLines | Haskell-only line endings. When this mode is enabled, only
|
UnicodeWord | Unicode word boundaries. If set, Warning: Unicode word boundaries are quite different from traditional regular expression word boundaries. See http://unicode.org/reports/tr29/#Word_Boundaries. |
ErrorOnUnknownEscapes | Throw an error on unrecognized backslash escapes. If set, fail with an error on patterns that contain backslash-escaped ASCII letters without a known special meaning. If this flag is not set, these escaped letters represent themselves. |
WorkLimit Int | Set a processing limit for match operations. Some patterns, when matching certain strings, can run in exponential time. For practical purposes, the match operation may appear to be in an infinite loop. When a limit is set a match operation will fail with an error if the limit is exceeded. The units of the limit are steps of the match engine. Correspondence with actual processor time will depend on the speed of the processor and the details of the specific pattern, but will typically be on the order of milliseconds. By default, the matching time is not limited. |
StackLimit Int | Set the amount of heap storage available for use by the match backtracking stack. ICU uses a backtracking regular expression engine, with the backtrack stack maintained on the heap. This function sets the limit to the amount of memory that can be used for this purpose. A backtracking stack overflow will result in an error from the match operation that caused it. A limit is desirable because a malicious or poorly designed pattern can use excessive memory, potentially crashing the process. A limit is enabled by default. |
Instances
Show MatchOption Source # | |
Defined in Data.Text.ICU.Regex.Internal showsPrec :: Int -> MatchOption -> ShowS # show :: MatchOption -> String # showList :: [MatchOption] -> ShowS # | |
Eq MatchOption Source # | |
Defined in Data.Text.ICU.Regex.Internal (==) :: MatchOption -> MatchOption -> Bool # (/=) :: MatchOption -> MatchOption -> Bool # |
data ParseError Source #
Detailed information about parsing errors. Used by ICU parsing
engines that parse long rules, patterns, or programs, where the
text being parsed is long enough that more information than an
ICUError
is needed to localize the error.
Instances
Exception ParseError Source # | |
Defined in Data.Text.ICU.Error.Internal toException :: ParseError -> SomeException # fromException :: SomeException -> Maybe ParseError # displayException :: ParseError -> String # | |
Show ParseError Source # | |
Defined in Data.Text.ICU.Error.Internal showsPrec :: Int -> ParseError -> ShowS # show :: ParseError -> String # showList :: [ParseError] -> ShowS # | |
NFData ParseError Source # | |
Defined in Data.Text.ICU.Error.Internal rnf :: ParseError -> () # |
A compiled regular expression.
Regex
values are usually constructed using the regex
or
regex'
functions. This type is also an instance of IsString
,
so if you have the OverloadedStrings
language extension enabled,
you can construct a Regex
by simply writing the pattern in
quotes (though this does not allow you to specify any Option
s).
Functions
Construction
regex :: [MatchOption] -> Text -> IO Regex Source #
Compile a regular expression with the given options. This
function throws a ParseError
if the pattern is invalid.
The Regex
is initialized with empty text to search against.
regex' :: [MatchOption] -> Text -> IO (Either ParseError Regex) Source #
Compile a regular expression with the given options. This is safest to use when the pattern is constructed at run time.
clone :: Regex -> IO Regex Source #
Make a copy of a compiled regular expression. Cloning a regular expression is faster than opening a second instance from the source form of the expression, and requires less memory.
Note that the current input string and the position of any matched text within it are not cloned; only the pattern itself and and the match mode flags are copied.
Cloning can be particularly useful to threaded applications that
perform multiple match operations in parallel. Each concurrent RE
operation requires its own instance of a Regex
.
Managing text to search
setText :: Regex -> Text -> IO () Source #
Set the subject text string upon which the regular expression will look for matches. This function may be called any number of times, allowing the regular expression pattern to be applied to different strings.
getUTextPtr :: Regex -> IO UTextPtr Source #
Get the subject text that is currently associated with this regular expression object.
Inspection
pattern :: Regex -> Text Source #
Return the source form of the pattern used to construct this regular expression or match.
Searching
find :: Regex -> TextI -> IO Bool Source #
Find the first matching substring of the input string that matches the pattern.
If n is non-negative, the search for a match begins at the specified index, and any match region is reset.
If n is -1, the search begins at the start of the input region, or at the start of the full string if no region has been specified.
If a match is found, start
, end
, and group
will provide more
information regarding the match.
Match groups
Capturing groups are numbered starting from zero. Group zero is always the entire matching text. Groups greater than zero contain the text matching each capturing group in a regular expression.
groupCount :: Regex -> IO Int Source #
Return the number of capturing groups in this regular expression's pattern.
start :: Regex -> Int -> IO (Maybe TextI) Source #
Returns the index in the input string of the start of the text
matched by the specified capture group during the previous match
operation. Returns Nothing
if the capture group was not part of
the last match.
end :: Regex -> Int -> IO (Maybe TextI) Source #
Returns the index in the input string of the end of the text
matched by the specified capture group during the previous match
operation. Returns Nothing
if the capture group was not part of
the last match.
start_ :: Regex -> Int -> IO TextI Source #
Returns the index in the input string of the start of the text
matched by the specified capture group during the previous match
operation. Returns -1
if the capture group was not part of the
last match.
end_ :: Regex -> Int -> IO TextI Source #
Returns the index in the input string of the end of the text
matched by the specified capture group during the previous match
operation. Returns -1
if the capture group was not part of
the last match.