This module defines the data type of internal regular expression pattern, | as well as the partial derivative operations for regular expression patterns.
- data Pat
- strip :: Pat -> RE
- pdPat :: Pat -> Letter -> [Pat]
- type Binder = IntMap [Range]
- toBinder :: Pat -> Binder
- listifyBinder :: Binder -> [(Int, [Range])]
- pdPat0 :: Pat -> Letter -> [(Pat, Int -> Binder -> Binder)]
- pdPat0Sim :: Pat -> Letter -> [(Pat, Int -> Binder -> Binder)]
- nub2 :: Eq a => [(a, b)] -> [(a, b)]
regular expression patterns
|PVar Int [Range] Pat|
pattern without binder
|PPair Pat Pat|
|PChoice Pat Pat GFlag|
|PStar Pat GFlag|
|PPlus Pat Pat|
plus pattern, it is used internally to indicate that it is unrolled from a PStar
empty pattern, it is used intermally to indicate that mkEmpty function has been applied.
The Eq instance for Pat data type NOTE: We ignore the 'consumed word' when comparing patterns (ie we only compare the pattern structure). Essential for later comparisons among patterns.
mainly interested in simplifying epsilon, p --> p could be made more optimal, e.g. (epsilon, epsilon) --> epsilon
pdPat computes the partial derivatives of a pattern w.r.t. a letter.
Integrating non-greedy operator with PStar
For p*, we need to unroll it into a special construct
say PPlus p' p* where p' in p/l.
When we push another label, say l' to PPlus p' p*, and
p' is emptiable, naively, we would do
[ PPlus p'' p* | p'' <- p' l ] ++ [ PPlus (mkE p') (PPlus p''' p*) | (PPlus p''' p*) <- p*l ]
Now the problem here is the shape of the pdpat are infinite, which
breaks the requirement of getting a compilation scheme.
The fix here is to simplify the second component, by combining the binding, of (mkE p') and p'''
since they share the same set of variables.
[ PPlus p'' p* | p'' <- p' l ] ++ [ PPlus p4 p* | (PPlus p''' p*) <- p*l ]
where p4 = combineBinding (mkE p') p'''
For pdPat0 approach, we do not need to do this explicitly, we simply drop
(mkE p') even in the PPair case. see the definitely of pdPat0 below
Binder type denotes a set of (pattern var * range) pairs
type Binder = [(Int, [Range])]
pdPat0 is the
abstracted form of the
It computes a set of pairs. Each pair consists a
shape of the partial derivative, and
an update function which defines the change of the pattern bindings from the
source pattern to
the resulting partial derivative. This is used in the compilation of the regular expression pattern