regex-posix-0.93.1: Replaces/Enhances Text.Regex

Portabilitynon-portable (regex-base needs MPTC+FD)
Stabilityexperimental
Maintainerlibraries@haskell.org, textregexlazy@personal.mightyreason.com

Text.Regex.Posix.Wrap

Contents

Description

WrapPosix.hsc exports a wrapped version of the ffi imports. To increase type safety, the flags are newtype'd. The other important export is a Regex type that is specific to the Posix library backend. The flags are documented in Text.Regex.Posix. The defaultCompOpt is (compExtended .|. compNewline).

The Regex, CompOption, and ExecOption types and their RegexOptions instance is declared. The =~ and =~~ convenience functions are defined.

The exported symbols are the same whether 1 is defined, but when it is not defined then getVersion == Nothing and all other exported values will call error or fail.

This module will fail or error only if allocation fails or a nullPtr is passed in.

Synopsis

High-level API

type RegOffset = Int64Source

RegOffset is typedef int regoff_t on Linux and ultimately typedef long long __int64_t on Max OS X. So rather than saying 2,147,483,647 is all the length you need, I'll take the larger: 9,223,372,036,854,775,807 should be enough bytes for anyone, no need for Integer. The alternative is to compile to different sizes in a platform dependent manner with type RegOffset = (, which I do not want to do.

There is also a special value unusedRegOffset :: RegOffset which is (-1) and as a starting index means that the subgroup capture was unused. Otherwise the RegOffset indicates a character boundary that is before the character at that index offset, with the first character at index offset 0. So starting at 1 and ending at 2 means to take only the second character.

(=~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target) => source1 -> source -> targetSource

(=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext Regex source1 target, Monad m) => source1 -> source -> m targetSource

Low-level API

type WrapError = (ReturnCode, String)Source

The return code will be retOk when it is the Haskell wrapper and not the underlying library generating the error message.

wrapCompileSource

Arguments

:: CompOption

Flags (bitmapped)

-> ExecOption

Flags (bitmapped)

-> CString

The regular expression to compile (ASCII only, no null bytes)

-> IO (Either WrapError Regex)

Returns: the compiled regular expression

wrapMatch :: Regex -> CString -> IO (Either WrapError (Maybe [(RegOffset, RegOffset)]))Source

wrapMatch returns offsets for the begin and end of each capture. Unused captures have offsets of unusedRegOffset which is (-1)

wrapMatchAll :: Regex -> CString -> IO (Either WrapError [MatchArray])Source

wrapMatchAll returns the offset and length of each capture. Unused captures have an offset of unusedRegOffset which is (-1) and length of 0.

Miscellaneous

Compilation options

newtype CompOption Source

A bitmapped CInt containing options for compilation of regular expressions. Option values (and their man 3 regcomp names) are

  • compBlank which is a completely zero value for all the flags. This is also the blankCompOpt value.
  • compExtended (REG_EXTENDED) which can be set to use extended instead of basic regular expressions. This is set in the defaultCompOpt value.
  • compNewline (REG_NEWLINE) turns on newline sensitivity: The dot (.) and inverted set [^ ] never match newline, and ^ and $ anchors do match after and before newlines. This is set in the defaultCompOpt value.
  • compIgnoreCase (REG_ICASE) which can be set to match ignoring upper and lower distinctions.
  • compNoSub (REG_NOSUB) which turns off all information from matching except whether a match exists.

Constructors

CompOption CInt 

compBlank :: CompOptionSource

A completely zero value for all the flags. This is also the blankCompOpt value.

Execution options

newtype ExecOption Source

A bitmapped CInt containing options for execution of compiled regular expressions. Option values (and their man 3 regexec names) are

  • execBlank which is a complete zero value for all the flags. This is the blankExecOpt value.
  • execNotBOL (REG_NOTBOL) can be set to prevent ^ from matching at the start of the input.
  • execNotEOL (REG_NOTEOL) can be set to prevent $ from matching at the end of the input (before the terminating NUL).

Constructors

ExecOption CInt 

execBlank :: ExecOptionSource

A completely zero value for all the flags. This is also the blankExecOpt value.

Return codes

newtype ReturnCode Source

ReturnCode is an enumerated CInt, corresponding to the error codes from man 3 regex:

  • retBadbr (REG_BADBR) invalid repetition count(s) in { }
  • retBadpat (REG_BADPAT) invalid regular expression
  • retBadrpt (REG_BADRPT) ?, *, or + operand invalid
  • retEcollate (REG_ECOLLATE) invalid collating element
  • retEctype (REG_ECTYPE) invalid character class
  • retEescape (REG_EESCAPE) \ applied to unescapable character
  • retEsubreg (REG_ESUBREG) invalid backreference number
  • retEbrack (REG_EBRACK) brackets [ ] not balanced
  • retEparen (REG_EPAREN) parentheses ( ) not balanced
  • retEbrace (REG_EBRACE) braces { } not balanced
  • retErange (REG_ERANGE) invalid character range in [ ]
  • retEspace (REG_ESPACE) ran out of memory
  • retNoMatch (REG_NOMATCH) The regexec() function failed to match

Constructors

ReturnCode CInt