weighted-regexp-0.3.1.1: Weighted Regular Expression Matcher

Stabilityexperimental
MaintainerSebastian Fischer <mailto:sebf@informatik.uni-kiel.de>

Text.RegExp.Internal

Description

This module exports internal data types and matching functions. You do not need to import it unless you want to write your own matching algorithms.

Synopsis

Documentation

newtype RegExp c Source

Regular expressions are represented as values of type RegExp c where c is the character type of the underlying alphabet. Values of type RegExp c can be matched against lists of type [c].

Constructors

RegExp (forall w. Semiring w => RegW w c) 

data RegW w c Source

Constructors

RegW 

Fields

active :: !Bool
 
empty :: !w
 
final_ :: !w
 
reg :: !(Reg w c)
 

Instances

final :: Semiring w => RegW w c -> wSource

data Reg w c Source

Constructors

Eps 
Sym String (c -> w) 
Alt (RegW w c) (RegW w c) 
Seq (RegW w c) (RegW w c) 
Rep (RegW w c) 

Instances

class Semiring w => Weight a b w whereSource

Methods

symWeight :: (a -> w) -> b -> wSource

Instances

Weight c c Bool 
Weight c c Longest 
Num a => Weight c c (Numeric a) 
Weight c (Int, c) Leftmost 
Weight c (Int, c) LeftLong 

defaultSymWeight :: (a -> w) -> a -> wSource

weighted :: Weight a b w => RegW w a -> RegW w bSource

eps :: RegExp cSource

Matches the empty word. eps has no direct string representation but is used to implement other constructs such as optional components like a?.

char :: Char -> RegExp CharSource

Matches the given character.

sym :: (Eq c, Show c) => c -> RegExp cSource

Matches the given symbol.

psym :: String -> (c -> Bool) -> RegExp cSource

Matches a symbol that satisfies the given predicate.

symW :: Semiring w => String -> (c -> w) -> RegW w cSource

anySym :: RegExp cSource

Matches an arbitrary symbol.

noMatch :: RegExp cSource

Does not match anything. noMatch is an identity for alt.

alt :: RegExp c -> RegExp c -> RegExp cSource

Matches either of two regular expressions. For example a+b matches either the character a or the character b.

altW :: Semiring w => RegW w c -> RegW w c -> RegW w cSource

seq_ :: RegExp c -> RegExp c -> RegExp cSource

Matches the sequence of two regular expressions. For example the regular expressions ab matches the word ab.

seqW :: Semiring w => RegW w c -> RegW w c -> RegW w cSource

rep :: RegExp c -> RegExp cSource

Matches zero or more occurrences of the given regular expression. For example a* matches the character a zero or more times.

repW :: Semiring w => RegW w c -> RegW w cSource

rep1 :: RegExp c -> RegExp cSource

Matches one or more occurrences of the given regular expression. For example a+ matches the character a one or more times.

opt :: RegExp c -> RegExp cSource

Matches the given regular expression or the empty word. Optional expressions are usually written a? but could also be written (|a), that is, as alternative between eps and a.

brep :: (Int, Int) -> RegExp c -> RegExp cSource

Matches a regular expression a given number of times. For example, the regular expression a{4,7} matches the character a four to seven times. If the minimal and maximal occurences are identical, one can be left out, that is, a{2} matches two occurrences of the character a.

Numerical bounds are implemented via translation into ordinary regular expressions. For example, a{4,7} is translated into aaaa(a(a(a)?)?)?.

regW :: Semiring w => RegExp c -> RegW w cSource

acceptFull :: RegExp c -> [c] -> BoolSource

Checks whether a regular expression matches the given word. For example, acceptFull (fromString "b|abc") "b" yields True because the first alternative of b|abc matches the string "b".

acceptPartial :: RegExp c -> [c] -> BoolSource

Checks whether a regular expression matches a subword of the given word. For example, acceptPartial (fromString "b") "abc" yields True because "abc" contains the substring "b".

matchingCount :: Num a => RegExp c -> [c] -> aSource

Computes in how many ways a word can be matched against a regular expression.

fullMatch :: Weight a b w => RegExp a -> [b] -> wSource

Matches a regular expression against a word computing a weight in an arbitrary semiring.

The symbols can have associated weights associated by the symWeight function of the Weight class. This function also allows to adjust the type of the used alphabet such that, for example, positional information can be taken into account by zipping the word with positions.

partialMatch :: Weight a b w => RegExp a -> [b] -> wSource

Matches a regular expression against substrings of a word computing a weight in an arbitrary semiring. Similar to fullMatch the Weight class is used to associate weights to the symbols of the regular expression.

matchW :: Semiring w => RegW w c -> [c] -> wSource

shiftW :: Semiring w => w -> RegW w c -> c -> RegW w cSource

shift :: Semiring w => w -> Reg w c -> c -> RegW w cSource