Safe Haskell | None |
---|
Datatypes used to describe webrexps, and some helper functions.
- data WebRef
- data NodeRange
- data Op
- data ActionExpr
- = ActionExprs [ActionExpr]
- | BinOp Op ActionExpr ActionExpr
- | ARef String
- | CstI Int
- | CstS String
- | NodeReplace ActionExpr
- | OutputAction
- | DeepOutputAction
- | NodeNameOutputAction
- | Call BuiltinFunc [ActionExpr]
- data WebRexp
- = Branch [WebRexp]
- | Unions [WebRexp]
- | List [WebRexp]
- | Star WebRexp
- | Repeat RepeatCount WebRexp
- | Alternative WebRexp WebRexp
- | Unique Int
- | Str String
- | Action ActionExpr
- | Range Int [NodeRange]
- | Ref WebRef
- | DirectChild WebRef
- | ConstrainedRef WebRef ActionExpr
- | DiggLink
- | DumpLink
- | NextSibling
- | PreviousSibling
- | Parent
- data RepeatCount
- data BuiltinFunc
- simplifyNodeRanges :: [NodeRange] -> [NodeRange]
- foldWebRexp :: (a -> WebRexp -> (a, WebRexp)) -> a -> WebRexp -> (a, WebRexp)
- assignWebrexpIndices :: WebRexp -> (Int, Int, WebRexp)
- prettyShowWebRef :: WebRef -> String
- packRefFiltering :: WebRexp -> WebRexp
- isInNodeRange :: Int -> [NodeRange] -> Bool
- isOperatorBoolean :: Op -> Bool
- isActionPredicate :: ActionExpr -> Bool
Types
data WebRef
Represents an element.
data NodeRange
Ranges used to filter nodes by position.
data Op
Definitions of the operators available in the actions of a webrexp.
OpAdd | |
OpSub | |
OpMul | |
OpDiv | |
OpLt | |
OpLe | |
OpGt | |
OpGe | |
OpEq | '=' in webrexp ( |
OpNe | '!=' ( |
OpAnd | '&' ( |
OpOr | '|' ( |
OpMatch | '=~' regexp matching |
OpContain | '~=' contains, as the CSS3 operator. |
OpBegin | '^=' begins with, as the CSS3 operator. |
OpEnd | '$=' ends with, as the CSS3 operator. |
OpSubstring | '^=' substring match (CSS3 spells this operator '*='). |
OpHyphenBegin | '|=' hyphen-separated prefix, as the CSS3 operator. |
OpConcat | ':' concatenate two strings |
data ActionExpr
Represents an action. Each production of the grammar maps more or less to a data constructor of this type.
ActionExprs [ActionExpr] | { ... ; ... ; ... ; ... } A list of actions to execute, each one must return a |
BinOp Op ActionExpr ActionExpr | Basic binary operator application. |
ARef String | Find the value of a given attribute of the current element. |
CstI Int | An integer constant. |
CstS String | A string constant. |
NodeReplace ActionExpr | '$'... operator. Used to put the action value back into the evaluation pipeline. |
OutputAction | the |
DeepOutputAction | Translate a node and all its children into text. |
NodeNameOutputAction | Retrieve the name of a node. |
Call BuiltinFunc [ActionExpr] | funcName(..., ...) |
data WebRexp
Type representation of a web-regexp; this is the main type.
Branch [WebRexp] | ( ... ; ... ; ... ) |
Unions [WebRexp] | ( ... , ... , ... ) |
List [WebRexp] | ... ... (each action followed, no rollback) |
Star WebRexp | ... * |
Repeat RepeatCount WebRexp | ... #{ } |
Alternative WebRexp WebRexp | '|' Represents two alternative paths: if the first fails, the second one is taken. |
Unique Int | '!' Carries a unique index to tell the different uniques apart. Negative values are considered invalid; zero and all positive values are accepted. |
Str String | "..." A string constant in the source expression. |
Action ActionExpr | "{ ... }" |
Range Int [NodeRange] | '[ ... ]' Filtering range. The Int is used as an index for a counter in the DFS evaluator. |
Ref WebRef | every tag/class name |
DirectChild WebRef | Find children that are direct descendants of the current nodes. |
ConstrainedRef WebRef ActionExpr | This constructor is an optimisation: it combines a Ref followed by actions, where every action is a predicate. It helps prune the evaluation tree quickly in DFS evaluation. |
DiggLink | '>>' operator in the language, used to follow a hyperlink. |
DumpLink | '->' operator in the language, used to follow a hyperlink and dump the resulting content to the hard drive (if permitted). |
NextSibling | '+' operator in the language, used to select the next sibling node. |
PreviousSibling | '~' operator in the language, used to select the previous sibling node. |
Parent | '<' operator in the language. Selects the parent node. |
data RepeatCount
data BuiltinFunc
Type used to index built-in functions in actions.
Functions
Transformations
simplifyNodeRanges :: [NodeRange] -> [NodeRange]
Helper function that simplifies the handling of node ranges. After simplification, the ranges are sorted in ascending order and no two ranges overlap.
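The sort-then-merge idea behind such a simplification can be sketched as follows. This is not the library's implementation: the constructors of NodeRange are not shown on this page, so `Rng`, `Single`, `Between` and `simplifyRanges` are hypothetical stand-ins chosen only for illustration.

```haskell
import Data.List (sortOn)

-- Hypothetical stand-in for NodeRange (the real constructors are not
-- documented here).
data Rng
    = Single Int        -- a single position
    | Between Int Int   -- an inclusive interval
    deriving (Eq, Show)

-- View every range as an inclusive (low, high) pair.
bounds :: Rng -> (Int, Int)
bounds (Single i)    = (i, i)
bounds (Between a b) = (min a b, max a b)

-- Rebuild the most compact constructor for a pair of bounds.
fromBounds :: (Int, Int) -> Rng
fromBounds (a, b)
    | a == b    = Single a
    | otherwise = Between a b

-- Sort by lower bound, then fuse every pair of overlapping ranges.
simplifyRanges :: [Rng] -> [Rng]
simplifyRanges = map fromBounds . merge . sortOn fst . map bounds
  where
    merge ((a, b) : (c, d) : rest)
        | c <= b    = merge ((a, max b d) : rest)      -- overlap: fuse
        | otherwise = (a, b) : merge ((c, d) : rest)   -- disjoint: keep
    merge xs = xs
```

With these definitions, `simplifyRanges [Between 5 8, Single 2, Between 1 3, Between 7 10]` evaluates to `[Between 1 3, Between 5 10]`. Adjacent but non-overlapping ranges are left separate in this sketch.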
foldWebRexp :: (a -> WebRexp -> (a, WebRexp)) -> a -> WebRexp -> (a, WebRexp)
This function rewrites a webrexp in a depth-first fashion while carrying an accumulator.
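A minimal sketch of a depth-first rewrite with an accumulator, using the same shape of signature. `Rexp` is a reduced, hypothetical mirror of WebRexp that keeps only a few constructors, and `foldRexp` and `numberUniques` are illustrative names. The sketch rewrites children before their parent, which is an assumption: the traversal order used by the library is not stated on this page.

```haskell
import Data.List (mapAccumL)

-- Reduced mirror of WebRexp, for illustration only.
data Rexp
    = Branch [Rexp]
    | List [Rexp]
    | Star Rexp
    | Alternative Rexp Rexp
    | Unique Int
    | Str String
    deriving (Eq, Show)

-- Depth-first rewrite threading an accumulator. Children are rewritten
-- left to right, then the function is applied to the rebuilt parent.
foldRexp :: (a -> Rexp -> (a, Rexp)) -> a -> Rexp -> (a, Rexp)
foldRexp f acc rexp = case rexp of
    Branch subs -> many Branch subs
    List subs   -> many List subs
    Star sub ->
        let (acc', sub') = foldRexp f acc sub
        in f acc' (Star sub')
    Alternative l r ->
        let (acc1, l') = foldRexp f acc l
            (acc2, r') = foldRexp f acc1 r
        in f acc2 (Alternative l' r')
    leaf -> f acc leaf
  where
    -- Rewrite a list of children, then the node that holds them.
    many ctor subs =
        let (acc', subs') = mapAccumL (foldRexp f) acc subs
        in f acc' (ctor subs')

-- Example use: give every Unique a fresh index, counting from 0.
-- The accumulator ends up holding the number of Unique nodes found.
numberUniques :: Rexp -> (Int, Rexp)
numberUniques = foldRexp step 0
  where
    step n (Unique _) = (n + 1, Unique n)
    step n other      = (n, other)
```

For instance, `numberUniques (List [Unique (-1), Star (Unique (-1)), Str "x"])` evaluates to `(2, List [Unique 0, Star (Unique 1), Str "x"])`. Using `-1` as a placeholder index before numbering is a convention of this example only.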
assignWebrexpIndices :: WebRexp -> (Int, Int, WebRexp)
Preparation function for a webrexp: assigns all the indices used when evaluating it as an automaton.
prettyShowWebRef :: WebRef -> String
Pretty printing for WebRef. The output should be reparsable by the WebRexp parser.
Predicates
isInNodeRange :: Int -> [NodeRange] -> Bool
Helper function to check whether a given index is within the ranges.
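A membership test of this kind can be sketched as below. The constructors of NodeRange are not shown on this page, so `Rng`, `Single` and `Between` are hypothetical. The sketch also assumes that an index is accepted as soon as one range in the list covers it, which is one reading of the description above.

```haskell
-- Hypothetical stand-in for NodeRange.
data Rng
    = Single Int        -- a single position
    | Between Int Int   -- an inclusive interval, low bound first
    deriving (Eq, Show)

-- Does one range cover the index?
covers :: Int -> Rng -> Bool
covers i (Single j)    = i == j
covers i (Between a b) = a <= i && i <= b

-- Assumption: the index passes if at least one range covers it.
isInRanges :: Int -> [Rng] -> Bool
isInRanges i = any (covers i)
```

Here `isInRanges 4 [Single 1, Between 3 6]` is `True` and `isInRanges 2 [Single 1, Between 3 6]` is `False`.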
isOperatorBoolean :: Op -> Bool
Tells whether an action operator returns a boolean. Useful to tell whether an action is a predicate. See isActionPredicate.
isActionPredicate :: ActionExpr -> Bool
Tells whether an action is a predicate that is only used to filter nodes. Expressions can be modified with this information to help prune as early as possible with the DFS evaluator.
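How the two predicates could feed the ConstrainedRef optimisation is sketched below. All of it is an assumption made for illustration: the types are reduced mirrors of the ones on this page, a plain `String` stands in for WebRef, the split between boolean and non-boolean operators is a guess (arithmetic and concatenation are treated as non-boolean), and `opIsBoolean`, `actionIsPredicate` and `packRefs` are hypothetical names, not the library's functions.

```haskell
-- Mirror of the Op constructors listed on this page.
data Op
    = OpAdd | OpSub | OpMul | OpDiv
    | OpLt | OpLe | OpGt | OpGe | OpEq | OpNe
    | OpAnd | OpOr | OpMatch
    | OpContain | OpBegin | OpEnd | OpSubstring | OpHyphenBegin
    | OpConcat
    deriving (Eq, Show)

-- Reduced mirror of ActionExpr.
data ActionExpr
    = BinOp Op ActionExpr ActionExpr
    | ARef String
    | CstI Int
    | CstS String
    | OutputAction
    deriving (Eq, Show)

-- Reduced mirror of WebRexp; String stands in for WebRef.
data WebRexp
    = Ref String
    | Action ActionExpr
    | ConstrainedRef String ActionExpr
    deriving (Eq, Show)

-- Assumption: everything except arithmetic and concatenation
-- yields a boolean.
opIsBoolean :: Op -> Bool
opIsBoolean op = op `notElem` [OpAdd, OpSub, OpMul, OpDiv, OpConcat]

-- Conservative rule: only a boolean binary operation is a predicate.
actionIsPredicate :: ActionExpr -> Bool
actionIsPredicate (BinOp op _ _) = opIsBoolean op
actionIsPredicate _              = False

-- Fuse a Ref directly followed by a predicate action, so an evaluator
-- could reject a node as soon as it is selected.
packRefs :: [WebRexp] -> [WebRexp]
packRefs (Ref r : Action a : rest)
    | actionIsPredicate a = ConstrainedRef r a : packRefs rest
packRefs (x : rest) = x : packRefs rest
packRefs []         = []
```

Given `p = BinOp OpMatch (ARef "href") (CstS "pdf")`, the call `packRefs [Ref "a", Action p, Ref "b", Action OutputAction]` yields `[ConstrainedRef "a" p, Ref "b", Action OutputAction]`: the filtering action is fused with its Ref, while the output action is left in place because it is not a predicate.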