This module defines the data type of internal regular expression pattern, | as well as the partial derivative operations for regular expression patterns.
- data Pat
- strip :: Pat -> RE
- mkEmpPat :: Pat -> Pat
- pdPat :: Pat -> Letter -> [Pat]
- getBindingsFrom :: Pat -> Pat -> Pat
- type Binder = [(Int, [Range])]
- toBinder :: Pat -> Binder
- updateBinderByIndex :: Int -> Int -> Binder -> Binder
- updateBinderByIndexSub :: Int -> Int -> Binder -> Binder
- pdPat0 :: Pat -> Letter -> [(Pat, Int -> Binder -> Binder)]
- nub2 :: Eq a => [(a, b)] -> [(a, b)]
regular expression patterns
|PVar Int [Range] Pat|
pattern without binder
|PPair Pat Pat|
|PChoice Pat Pat GFlag|
|PStar Pat GFlag|
|PPlus Pat Pat|
plus pattern, it is used internally to indicate that it is unrolled from a PStar
empty pattern, it is used intermally to indicate that mkEmpty function has been applied.
pdPat computes the partial derivatives of a pattern w.r.t. a letter.
Integrating non-greedy operator with PStar
For p*, we need to unroll it into a special construct
say PPlus p' p* where p' in p/l.
When we push another label, say l' to PPlus p' p*, and
p' is emptiable, naively, we would do
[ PPlus p'' p* | p'' <- p' l ] ++ [ PPlus (mkE p') (PPlus p''' p*) | (PPlus p''' p*) <- p*l ]
Now the problem here is the shape of the pdpat are infinite, which
breaks the requirement of getting a compilation scheme.
The fix here is to simplify the second component, by combining the binding, of (mkE p') and p'''
since they share the same set of variables.
[ PPlus p'' p* | p'' <- p' l ] ++ [ PPlus p4 p* | (PPlus p''' p*) <- p*l ]
where p4 = combineBinding (mkE p') p'''
For pdPat0 approach, we do not need to do this explicitly, we simply drop
(mkE p') even in the PPair case. see the definitely of pdPat0 below
getBindingsFrom transfer bindings from p2 to p1
updateBinderByIndex updates a binder given an index to a pattern var
ASSUMPTION: the var index in the pattern is linear. e.g. no ( 0 :: R1, (1 :: R2, 2 :a: R3))
pdPat0 is the
abstracted form of the
It computes a set of pairs. Each pair consists a
shape of the partial derivative, and
an update function which defines the change of the pattern bindings from the
source pattern to
the resulting partial derivative. This is used in the compilation of the regular expression pattern