Safe Haskell: None
Scraping (innerHTML/innerText) and modification (node removal) functions.
- class GetInner elem where
- class GetAttribute elem where
- remove :: (Node -> Bool) -> Node -> Node
- removeDepth :: (Node -> Bool) -> Int -> Node -> Node
- removeTags :: [String] -> [Node] -> [Node]
- removeQueries :: [String] -> [Node] -> [Node]
- rmElem :: String -> String -> [String] -> [Node] -> [Node]
- nodeHaving :: (Node -> Bool) -> Node -> Bool
- removeQuery :: String -> [Node] -> [Node]
InnerHTML / InnerText
class GetInner elem where
Type class for getting a lazy text representation of HTML element(s). It can be used with Node, Cursor, [Node], and [Cursor].
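The idea of one class serving both a single node and a list of nodes can be sketched as follows. This is only an illustration: the Node type is a simplified stand-in for the real one, the class name InnerTextSketch and the method textOf are hypothetical, and String is used in place of lazy Text to keep the example dependency-free.

```haskell
-- Simplified stand-in for the library's Node type (an assumption for illustration).
data Node
  = Element String [(String, String)] [Node] -- tag name, attributes, children
  | Content String                           -- text node
  deriving (Eq, Show)

-- Hypothetical class: one text-extraction method shared by several types.
class InnerTextSketch a where
  textOf :: a -> String

-- A single node yields the concatenated text of all its descendants.
instance InnerTextSketch Node where
  textOf (Content t)            = t
  textOf (Element _ _ children) = concatMap textOf children

-- A list of extractable values is handled by concatenating each result.
instance InnerTextSketch a => InnerTextSketch [a] where
  textOf = concatMap textOf

sampleNodes :: [Node]
sampleNodes =
  [ Element "p" [] [Content "Hello, ", Element "b" [] [Content "world"]]
  , Content "!"
  ]
```

With these definitions, textOf accepts both sampleNodes and any single element of it, which mirrors how the class is described as covering Node and [Node].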
Attributes
class GetAttribute elem where
ename :: elem -> Maybe Text
Tag name of element node. Returns Nothing if the node is not an element.
eid :: elem -> Maybe Text
Returns the element id. Returns Nothing if the node is not an element or has no id.
eclass :: elem -> [Text]
Returns the element's classes. Returns an empty list if the node is not an element or has no class.
getMeta :: Text -> elem -> [Text]
Searches for a meta element with the specified name under a cursor and gets its "content" field.
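The return conventions of ename, eid, and eclass (Nothing or an empty list for non-element nodes) can be illustrated on a toy tree. The sketch below is not the library's implementation: Node is a simplified stand-in, the *Sketch names are hypothetical, String replaces Text, and splitting the class attribute on whitespace is an assumption.

```haskell
-- Simplified stand-in for the library's Node type (an assumption for illustration).
data Node
  = Element String [(String, String)] [Node] -- tag name, attributes, children
  | Content String                           -- text node
  deriving (Eq, Show)

-- Tag name, or Nothing for non-element nodes.
enameSketch :: Node -> Maybe String
enameSketch (Element name _ _) = Just name
enameSketch _                  = Nothing

-- Value of the id attribute, or Nothing if absent or not an element.
eidSketch :: Node -> Maybe String
eidSketch (Element _ attrs _) = lookup "id" attrs
eidSketch _                   = Nothing

-- Class names (assumed whitespace-separated), or [] if absent or not an element.
eclassSketch :: Node -> [String]
eclassSketch (Element _ attrs _) = maybe [] words (lookup "class" attrs)
eclassSketch _                   = []

sampleDiv :: Node
sampleDiv = Element "div" [("id", "main"), ("class", "box wide")] []
```

Applied to a text node such as Content "x", the three functions fall through to their second equation and return Nothing, Nothing, and [] respectively.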
Removing descendant nodes
These functions work on Node or [Node].
remove :: (Node -> Bool) -> Node -> Node
Removes descendant nodes that satisfy the predicate and returns the updated Node.
This is a general function that the other remove* functions in this module use internally.
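A minimal sketch of predicate-based removal, assuming a simplified Node type rather than the real one. The names removeSketch and isTag are hypothetical, and the actual implementation may differ; here the root node itself is never tested, only its descendants.

```haskell
-- Simplified stand-in for the library's Node type (an assumption for illustration).
data Node
  = Element String [(String, String)] [Node] -- tag name, attributes, children
  | Content String                           -- text node
  deriving (Eq, Show)

-- Drop every descendant that satisfies the predicate, then recurse into
-- the children that were kept.
removeSketch :: (Node -> Bool) -> Node -> Node
removeSketch p (Element tag attrs children) =
  Element tag attrs [removeSketch p c | c <- children, not (p c)]
removeSketch _ node = node

-- Hypothetical helper predicate: is this node an element with the given tag?
isTag :: String -> Node -> Bool
isTag t (Element name _ _) = name == t
isTag _ _                  = False

sample :: Node
sample =
  Element "div" []
    [ Element "script" [] [Content "x"]
    , Element "p" [] [Content "hi", Element "script" [] []]
    ]

-- Both script elements are gone, at any depth.
cleaned :: Node
cleaned = removeSketch (isTag "script") sample
```

A tag-based variant in the spirit of removeTags could be built on top of this by passing a predicate that checks membership in a list of tag names.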
removeDepth :: (Node -> Bool) -> Int -> Node -> Node
Similar to remove, but with a limit of depth.
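The depth limit can be pictured with the sketch below. How the library counts depth is not stated here, so this example assumes that a depth of 1 examines only direct children and a depth of 0 removes nothing. Node is a simplified stand-in and removeDepthSketch is a hypothetical name.

```haskell
-- Simplified stand-in for the library's Node type (an assumption for illustration).
data Node
  = Element String [(String, String)] [Node] -- tag name, attributes, children
  | Content String                           -- text node
  deriving (Eq, Show)

-- Remove matching descendants, but descend at most `depth` levels.
-- Assumption: depth 1 means "direct children only".
removeDepthSketch :: (Node -> Bool) -> Int -> Node -> Node
removeDepthSketch p depth node@(Element tag attrs children)
  | depth <= 0 = node
  | otherwise  =
      Element tag attrs
        [removeDepthSketch p (depth - 1) c | c <- children, not (p c)]
removeDepthSketch _ _ node = node

isScript :: Node -> Bool
isScript (Element "script" _ _) = True
isScript _                      = False

sample :: Node
sample =
  Element "div" []
    [ Element "script" [] []
    , Element "p" [] [Element "script" [] []]
    ]
```

Under this assumption, a depth of 1 removes only the top-level script and leaves the nested one inside p, while a depth of 2 removes both.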
removeTags :: [String] -> [Node] -> [Node]
Removes all descendant nodes with the specified tag names.
removeQueries :: [String] -> [Node] -> [Node]
Removes all descendant nodes that match any of the query strings. "removeQuery" in ver 0.1 was merged into this.
rmElem :: String -> String -> [String] -> [Node] -> [Node]
Removes descendant nodes that match the specified tag, id, and classes (similar to remove, but more specific).
Passing an empty string for the tag or id disables filtering on that field (read the source code for details).
rmElem "div" "div-id" ["div-class", "div-class2"] nodes = newnodes
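The empty-string convention can be sketched as a predicate in which an empty tag or id always matches. Everything below is an illustration under assumptions: Node is a simplified stand-in, the *Sketch names are hypothetical, an element is assumed to match only if it carries all of the listed classes, and the top-level nodes of the input list are assumed to be kept. The library's source is the authority on the real behaviour.

```haskell
-- Simplified stand-in for the library's Node type (an assumption for illustration).
data Node
  = Element String [(String, String)] [Node] -- tag name, attributes, children
  | Content String                           -- text node
  deriving (Eq, Show)

-- Generic removal of descendants that satisfy a predicate.
removeSketch :: (Node -> Bool) -> Node -> Node
removeSketch p (Element tag attrs children) =
  Element tag attrs [removeSketch p c | c <- children, not (p c)]
removeSketch _ node = node

-- An empty tag or id acts as a wildcard; every listed class must be present
-- (the "all classes" rule is an assumption).
matchesSketch :: String -> String -> [String] -> Node -> Bool
matchesSketch tag ident classes (Element name attrs _) =
     (null tag   || tag == name)
  && (null ident || lookup "id" attrs == Just ident)
  && all (`elem` nodeClasses) classes
  where
    nodeClasses = maybe [] words (lookup "class" attrs)
matchesSketch _ _ _ _ = False

rmElemSketch :: String -> String -> [String] -> [Node] -> [Node]
rmElemSketch tag ident classes =
  map (removeSketch (matchesSketch tag ident classes))

nodes :: [Node]
nodes =
  [ Element "body" []
      [ Element "div" [("id", "ad"), ("class", "box wide")] []
      , Element "div" [("id", "main")] [Content "text"]
      ]
  ]
```

In this sketch, rmElemSketch "div" "ad" ["box"] nodes and rmElemSketch "" "" ["wide"] nodes both remove only the first div, the second call relying on the wildcard behaviour for tag and id.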
Other
nodeHaving :: (Node -> Bool) -> Node -> Bool
Checks whether the node itself or any of its descendants satisfies the predicate. To return False, this function has to traverse every descendant, so it is not efficient.
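The cost described above follows from the shape of the search, which can be sketched as below. Node is a simplified stand-in, and nodeHavingSketch and childrenOf are hypothetical names, not the library's code. The search stops at the first match, but a negative answer requires visiting the whole subtree.

```haskell
-- Simplified stand-in for the library's Node type (an assumption for illustration).
data Node
  = Element String [(String, String)] [Node] -- tag name, attributes, children
  | Content String                           -- text node
  deriving (Eq, Show)

childrenOf :: Node -> [Node]
childrenOf (Element _ _ cs) = cs
childrenOf _                = []

-- True if the node or any descendant satisfies the predicate.
-- (||) and any short-circuit on the first match; a False result
-- means every node in the subtree was examined.
nodeHavingSketch :: (Node -> Bool) -> Node -> Bool
nodeHavingSketch p node =
  p node || any (nodeHavingSketch p) (childrenOf node)

isScript :: Node -> Bool
isScript (Element "script" _ _) = True
isScript _                      = False

tree :: Node
tree = Element "div" [] [Element "p" [] [Content "hi"]]
```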
Deprecated
removeQuery :: String -> [Node] -> [Node]
Deprecated: Use removeQueries instead.
Remove all descendant nodes that match a query string.