Portability | non-portable (regex-base needs MPTC+FD) |
---|---|
Stability | experimental |
Maintainer | libraries@haskell.org, textregexlazy@personal.mightyreason.com |
WrapPosix.hsc exports a wrapped version of the ffi imports. To
increase type safety, the flags are newtype'd. The other important
export is a Regex
type that is specific to the Posix library
backend. The flags are documented in Text.Regex.Posix. The
defaultCompOpt
is (compExtended .|. compNewline)
.
The Regex
, CompOption
, and ExecOption
types and their RegexOptions
instance is declared. The =~
and =~~
convenience functions are
defined.
The exported symbols are the same whether 1 is defined, but
when it is not defined then getVersion == Nothing
and all other
exported values will call error or fail.
This module will fail or error only if allocation fails or a nullPtr is passed in.
- data Regex
- type RegOffset = Int64
- (=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> target
- (=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, Monad m) => source1 -> source -> m target
- type WrapError = (ReturnCode, String)
- wrapCompile :: CompOption -> ExecOption -> CString -> IO (Either WrapError Regex)
- wrapTest :: Regex -> CString -> IO (Either WrapError Bool)
- wrapMatch :: Regex -> CString -> IO (Either WrapError (Maybe [(RegOffset, RegOffset)]))
- wrapMatchAll :: Regex -> CString -> IO (Either WrapError [MatchArray])
- wrapCount :: Regex -> CString -> IO (Either WrapError Int)
- unusedRegOffset :: RegOffset
- newtype CompOption = CompOption CInt
- compBlank :: CompOption
- compExtended :: CompOption
- compIgnoreCase :: CompOption
- compNoSub :: CompOption
- compNewline :: CompOption
- newtype ExecOption = ExecOption CInt
- execBlank :: ExecOption
- execNotBOL :: ExecOption
- execNotEOL :: ExecOption
- newtype ReturnCode = ReturnCode CInt
- retBadbr :: ReturnCode
- retBadpat :: ReturnCode
- retBadrpt :: ReturnCode
- retEcollate :: ReturnCode
- retEctype :: ReturnCode
- retEescape :: ReturnCode
- retEsubreg :: ReturnCode
- retEbrack :: ReturnCode
- retEparen :: ReturnCode
- retEbrace :: ReturnCode
- retErange :: ReturnCode
- retEspace :: ReturnCode
High-level API
A compiled regular expression.
RegOffset is typedef int regoff_t on Linux and ultimately typedef long long __int64_t on Max OS X. So rather than saying 2,147,483,647 is all the length you need, I'll take the larger: 9,223,372,036,854,775,807 should be enough bytes for anyone, no need for Integer. The alternative is to compile to different sizes in a platform dependent manner with type RegOffset = (, which I do not want to do.
There is also a special value unusedRegOffset
:: RegOffset
which is
(-1) and as a starting index means that the subgroup capture was
unused. Otherwise the RegOffset indicates a character boundary that
is before the character at that index offset, with the first
character at index offset 0. So starting at 1 and ending at 2 means
to take only the second character.
(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> targetSource
(=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, Monad m) => source1 -> source -> m targetSource
Low-level API
type WrapError = (ReturnCode, String)Source
The return code will be retOk when it is the Haskell wrapper and not the underlying library generating the error message.
:: CompOption | Flags (bitmapped) |
-> ExecOption | Flags (bitmapped) |
-> CString | The regular expression to compile (ASCII only, no null bytes) |
-> IO (Either WrapError Regex) | Returns: the compiled regular expression |
wrapMatch :: Regex -> CString -> IO (Either WrapError (Maybe [(RegOffset, RegOffset)]))Source
wrapMatch returns offsets for the begin and end of each capture. Unused captures have offsets of unusedRegOffset which is (-1)
wrapMatchAll :: Regex -> CString -> IO (Either WrapError [MatchArray])Source
wrapMatchAll returns the offset and length of each capture. Unused captures have an offset of unusedRegOffset which is (-1) and length of 0.
Miscellaneous
Compilation options
newtype CompOption Source
A bitmapped CInt
containing options for compilation of regular
expressions. Option values (and their man 3 regcomp names) are
-
compBlank
which is a completely zero value for all the flags. This is also theblankCompOpt
value. -
compExtended
(REG_EXTENDED) which can be set to use extended instead of basic regular expressions. This is set in thedefaultCompOpt
value. -
compNewline
(REG_NEWLINE) turns on newline sensitivity: The dot (.) and inverted set[^ ]
never match newline, and ^ and $ anchors do match after and before newlines. This is set in thedefaultCompOpt
value. -
compIgnoreCase
(REG_ICASE) which can be set to match ignoring upper and lower distinctions. -
compNoSub
(REG_NOSUB) which turns off all information from matching except whether a match exists.
A completely zero value for all the flags.
This is also the blankCompOpt
value.
Execution options
newtype ExecOption Source
A bitmapped CInt
containing options for execution of compiled
regular expressions. Option values (and their man 3 regexec names) are
-
execBlank
which is a complete zero value for all the flags. This is the blankExecOpt value. -
execNotBOL
(REG_NOTBOL) can be set to prevent ^ from matching at the start of the input. -
execNotEOL
(REG_NOTEOL) can be set to prevent $ from matching at the end of the input (before the terminating NUL).
A completely zero value for all the flags.
This is also the blankExecOpt
value.
Return codes
newtype ReturnCode Source
ReturnCode is an enumerated CInt
, corresponding to the error codes
from man 3 regex
:
-
retBadbr
(REG_BADBR
) invalid repetition count(s) in{ }
-
retBadpat
(REG_BADPAT
) invalid regular expression -
retBadrpt
(REG_BADRPT
)?
,*
, or+
operand invalid -
retEcollate
(REG_ECOLLATE
) invalid collating element -
retEctype
(REG_ECTYPE
) invalid character class -
retEescape
(REG_EESCAPE
)\
applied to unescapable character -
retEsubreg
(REG_ESUBREG
) invalid backreference number -
retEbrack
(REG_EBRACK
) brackets[ ]
not balanced -
retEparen
(REG_EPAREN
) parentheses( )
not balanced -
retEbrace
(REG_EBRACE
) braces{ }
not balanced -
retErange
(REG_ERANGE
) invalid character range in[ ]
-
retEspace
(REG_ESPACE
) ran out of memory -
retNoMatch
(REG_NOMATCH
) The regexec() function failed to match