| Portability | GHC | 
|---|---|
| Stability | experimental | 
| Maintainer | bos@serpentine.com | 
| Safe Haskell | Safe-Infered | 
Data.Text.ICU.Regex
Description
Regular expression support for Unicode, implemented as bindings to the International Components for Unicode (ICU) libraries.
The syntax and behaviour of ICU regular expressions are Perl-like. For complete details, see the ICU User Guide entry at http://userguide.icu-project.org/strings/regexp.
Note: The functions in this module are not thread safe.  For
 thread safe use, see clone below, or use the pure functions in
 ICU.
- data  MatchOption - = CaseInsensitive
- | Comments
- | DotAll
- | Literal
- | Multiline
- | HaskellLines
- | UnicodeWord
- | ErrorOnUnknownEscapes
- | WorkLimit Int
- | StackLimit Int
 
- data ParseError
- data Regex
- regex :: [MatchOption] -> Text -> IO Regex
- regex' :: [MatchOption] -> Text -> IO (Either ParseError Regex)
- clone :: Regex -> IO Regex
- setText :: Regex -> Text -> IO ()
- getText :: Regex -> IO (ForeignPtr Word16, I16)
- pattern :: Regex -> Text
- find :: Regex -> I16 -> IO Bool
- findNext :: Regex -> IO Bool
- groupCount :: Regex -> IO Int
- start :: Regex -> Int -> IO (Maybe I16)
- end :: Regex -> Int -> IO (Maybe I16)
- start_ :: Regex -> Int -> IO I16
- end_ :: Regex -> Int -> IO I16
Types
data MatchOption Source
Options for controlling matching behaviour.
Constructors
| CaseInsensitive | Enable case insensitive matching. | 
| Comments | Allow comments and white space within patterns. | 
| DotAll | If set,  | 
| Literal | If set, treat the entire pattern as a literal string. Metacharacters or escape sequences in the input sequence will be given no special meaning. The option  | 
| Multiline | Control behaviour of  | 
| HaskellLines | Haskell-only line endings.  When this mode is enabled, only
  | 
| UnicodeWord | Unicode word boundaries.  If set,  Warning: Unicode word boundaries are quite different from traditional regular expression word boundaries. See http://unicode.org/reports/tr29/#Word_Boundaries. | 
| ErrorOnUnknownEscapes | Throw an error on unrecognized backslash escapes. If set, fail with an error on patterns that contain backslash-escaped ASCII letters without a known special meaning. If this flag is not set, these escaped letters represent themselves. | 
| WorkLimit Int | Set a processing limit for match operations. Some patterns, when matching certain strings, can run in exponential time. For practical purposes, the match operation may appear to be in an infinite loop. When a limit is set a match operation will fail with an error if the limit is exceeded. The units of the limit are steps of the match engine. Correspondence with actual processor time will depend on the speed of the processor and the details of the specific pattern, but will typically be on the order of milliseconds. By default, the matching time is not limited. | 
| StackLimit Int | Set the amount of heap storage avaliable for use by the match backtracking stack. ICU uses a backtracking regular expression engine, with the backtrack stack maintained on the heap. This function sets the limit to the amount of memory that can be used for this purpose. A backtracking stack overflow will result in an error from the match operation that caused it. A limit is desirable because a malicious or poorly designed pattern can use excessive memory, potentially crashing the process. A limit is enabled by default. | 
Instances
data ParseError Source
Detailed information about parsing errors.  Used by ICU parsing
 engines that parse long rules, patterns, or programs, where the
 text being parsed is long enough that more information than an
 ICUError is needed to localize the error.
A compiled regular expression.
Regex values are usually constructed using the regex or
 regex' functions.  This type is also an instance of IsString,
 so if you have the OverloadedStrings language extension enabled,
 you can construct a Regex by simply writing the pattern in
 quotes (though this does not allow you to specify any Options).
Functions
Construction
regex :: [MatchOption] -> Text -> IO RegexSource
Compile a regular expression with the given options.  This
 function throws a ParseError if the pattern is invalid.
The Regex is initialized with empty text to search against.
regex' :: [MatchOption] -> Text -> IO (Either ParseError Regex)Source
Compile a regular expression with the given options. This is safest to use when the pattern is constructed at run time.
clone :: Regex -> IO RegexSource
Make a copy of a compiled regular expression. Cloning a regular expression is faster than opening a second instance from the source form of the expression, and requires less memory.
Note that the current input string and the position of any matched text within it are not cloned; only the pattern itself and and the match mode flags are copied.
Cloning can be particularly useful to threaded applications that
 perform multiple match operations in parallel.  Each concurrent RE
 operation requires its own instance of a Regex.
Managing text to search
setText :: Regex -> Text -> IO ()Source
Set the subject text string upon which the regular expression will look for matches. This function may be called any number of times, allowing the regular expression pattern to be applied to different strings.
getText :: Regex -> IO (ForeignPtr Word16, I16)Source
Get the subject text that is currently associated with this regular expression object.
Inspection
pattern :: Regex -> TextSource
Return the source form of the pattern used to construct this regular expression or match.
Searching
find :: Regex -> I16 -> IO BoolSource
Find the first matching substring of the input string that matches the pattern.
If n is non-negative, the search for a match begins at the specified index, and any match region is reset.
If n is -1, the search begins at the start of the input region, or at the start of the full string if no region has been specified.
If a match is found, start, end, and group will provide more
 information regarding the match.
Match groups
Capturing groups are numbered starting from zero. Group zero is always the entire matching text. Groups greater than zero contain the text matching each capturing group in a regular expression.
groupCount :: Regex -> IO IntSource
Return the number of capturing groups in this regular expression's pattern.
start :: Regex -> Int -> IO (Maybe I16)Source
Returns the index in the input string of the start of the text
 matched by the specified capture group during the previous match
 operation.  Returns Nothing if the capture group was not part of
 the last match.
end :: Regex -> Int -> IO (Maybe I16)Source
Returns the index in the input string of the end of the text
 matched by the specified capture group during the previous match
 operation.  Returns Nothing if the capture group was not part of
 the last match.