Webrexp-1.0: Regexp-like engine to scrap web data




This module defines the interface between the evaluator and the manipulated graph.



class (GraphPath rezPath, Eq a) => GraphWalker a rezPath | a -> rezPath where

The aim of this typeclass is to permit the use of a different HTML/XML parser if the first one turns out to be inadequate. All the logic should go through this interface.

Minimal complete definition: every method.


attribOf :: String -> a -> Maybe String

Get back an attribute of the node if it exists

nameOf :: a -> Maybe String

If the current node is named, return its name, otherwise return Nothing.

childrenOf :: MonadIO m => a -> m [a]

Get all the children of the current node.

valueOf :: a -> String

Retrieve the textual value of the tag.

indirectLinks :: a -> [rezPath]

Retrieve all the content indirectly linked from a node. This can be used for elements such as an HTML link or a linked image/object.

accessGraph :: MonadIO m => Loggers -> rezPath -> m (AccessResult a rezPath)

The idea behind link following: access the resource designated by a path. The graph engine may know the resource under another name, so an updated name can be returned with the result. The Loggers argument supplies the functions used to log information and errors.

isHistoryMutable :: a -> Bool

Tell whether the history associated with the node is fixed. If the history is not fixed and can change (for example, when querying the filesystem), it should return False.


GraphWalker DirectoryNode ResourcePath 
GraphWalker JsonNode ResourcePath 
GraphWalker HaXmLNode ResourcePath 
(PartialGraph a ResourcePath, PartialGraph b ResourcePath) => GraphWalker (UnionNode a b) ResourcePath 
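To illustrate what an instance has to provide, here is a minimal sketch for an in-memory tree. It is not one of the instances listed above: `MiniWalker` is a hypothetical, reduced copy of the class (single parameter, no `rezPath`, no link following) declared only so that the example compiles on its own, and `ToyNode` and `sample` are made up for the illustration.

```haskell
import Control.Monad.IO.Class (MonadIO)

-- Hypothetical, reduced copy of the interface (assumption: the rezPath
-- parameter and the link-related methods are left out for brevity).
class Eq a => MiniWalker a where
    attribOf   :: String -> a -> Maybe String
    nameOf     :: a -> Maybe String
    childrenOf :: MonadIO m => a -> m [a]
    valueOf    :: a -> String

-- Hypothetical in-memory node: a named element or a text leaf.
data ToyNode
    = Element String [(String, String)] [ToyNode]
    | Text String
    deriving (Eq, Show)

instance MiniWalker ToyNode where
    -- Only elements carry attributes.
    attribOf key (Element _ attrs _) = lookup key attrs
    attribOf _   (Text _)            = Nothing

    -- Text leaves are unnamed.
    nameOf (Element n _ _) = Just n
    nameOf (Text _)        = Nothing

    -- No IO is needed here, but the signature leaves room for it.
    childrenOf (Element _ _ kids) = return kids
    childrenOf (Text _)           = return []

    -- The textual value of an element is the text of its subtree.
    valueOf (Text t)           = t
    valueOf (Element _ _ kids) = concatMap valueOf kids

-- A small node to experiment with.
sample :: ToyNode
sample = Element "a" [("href", "index.html")] [Text "home"]
```

With this sketch, `attribOf "href" sample` yields `Just "index.html"` and `nameOf sample` yields `Just "a"`.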

class Show a => GraphPath a where

Represent indirect links or links which necessitate the use of the IO monad to walk around the graph.


(<//>) :: a -> a -> a

Combine two paths together. You can think of the / operator as an equivalent.

importPath :: String -> Maybe a

Conversion used to import a path from attributes or documents (not really well specified).

dumpDataAtPath :: (Monad m, MonadIO m) => Loggers -> a -> m ()

Move semantics: try to dump the resource pointed to by the path into the current folder.

localizePath :: a -> FilePath

Given a graphpath, transform it to a filepath which can be used to store a node.
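A possible reading of these path operations, sketched on a toy string-based path. `MiniPath` is a hypothetical, reduced copy of the class (without `dumpDataAtPath`), and the combination rule chosen here (a relative path is resolved against the directory of the base, an absolute one replaces it) is an assumption, not the documented behaviour of any real instance.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical, reduced copy of the class so the sketch stands alone.
class Show a => MiniPath a where
    (<//>)       :: a -> a -> a
    importPath   :: String -> Maybe a
    localizePath :: a -> FilePath

-- Hypothetical path type: a plain slash-separated string.
newtype ToyPath = ToyPath String
    deriving (Eq, Show)

instance MiniPath ToyPath where
    (ToyPath base) <//> (ToyPath rel)
        -- Assumption: an absolute path replaces the base entirely.
        | "/" `isPrefixOf` rel = ToyPath rel
        -- Otherwise resolve it against the directory of the base.
        | otherwise            = ToyPath (dirOf base ++ rel)
      where
        -- Keep everything up to and including the last '/'.
        dirOf = reverse . dropWhile (/= '/') . reverse

    -- Assumption: only the empty string is rejected.
    importPath "" = Nothing
    importPath s  = Just (ToyPath s)

    -- Flatten the path into a single file name usable for storage.
    localizePath (ToyPath p) = map flatten p
      where
        flatten c = if c == '/' then '_' else c
```

For instance, `ToyPath "docs/a/index.html" <//> ToyPath "img.png"` gives `ToyPath "docs/a/img.png"`.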

Commodity types

data AccessResult a rezPath

Result of an indirect access request.


Result rezPath a

We got a result and parsed it. Its location may have changed, so the location is given back along with it.

DataBlob rezPath ByteString

We got something, but we can't interpret it, so we return a binary blob.


Cannot access the resource.
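Client code typically handles such a result by pattern matching on the three cases. The sketch below uses a local stand-in type, `ToyAccess`, rather than the real `AccessResult`; the constructor name `NoAccess` and the function `describe` are hypothetical and chosen only for this illustration.

```haskell
import qualified Data.ByteString.Char8 as B

-- Local stand-in mirroring the three documented cases.
-- NoAccess is a hypothetical name for the failure case.
data ToyAccess a rezPath
    = Result rezPath a             -- parsed node and its (possibly updated) location
    | DataBlob rezPath B.ByteString -- uninterpreted binary content
    | NoAccess                      -- the resource could not be accessed

-- Summarise an access result as a human-readable message.
describe :: Show rezPath => ToyAccess a rezPath -> String
describe (Result path _)      = "parsed document at " ++ show path
describe (DataBlob path blob) =
    "raw blob of " ++ show (B.length blob) ++ " bytes at " ++ show path
describe NoAccess             = "resource unavailable"
```

A caller can therefore decide to keep walking the graph on a parsed result, dump a blob to disk, or log an error when nothing came back.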

type Logger = String -> IO ()

Type used to propagate the different logging levels across the software.

type Loggers = (Logger, Logger, Logger)

Normal, error and verbose loggers.
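As an illustration, a logger triple could be assembled as follows. The two type synonyms are repeated so that the sketch compiles on its own; `mkLoggers` is a hypothetical helper, and the order of the components (normal, error, verbose) is assumed from the description above.

```haskell
import System.IO (hPutStrLn, stderr)

-- Repeated from the module so the sketch is self-contained.
type Logger  = String -> IO ()
type Loggers = (Logger, Logger, Logger)

-- Hypothetical helper: normal messages go to stdout, errors to stderr,
-- and verbose messages are printed only when the flag is set.
mkLoggers :: Bool -> Loggers
mkLoggers verbose = (putStrLn, hPutStrLn stderr, verboseLog)
  where
    verboseLog :: Logger
    verboseLog
        | verbose   = putStrLn . ("[verbose] " ++)
        | otherwise = const (return ())
```

A function receiving the triple can then destructure it, for example `let (info, err, verb) = mkLoggers True`.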

type NodePath a = [(a, Int)]

Represent the path used to find the node from the starting point of the graph.

Helper functions.

descendants :: (MonadIO m, GraphWalker a r) => a -> m [(a, [(a, Int)])]

Return a list of all the children and linked nodes of a given node. The given node is not included in the list. Each node is returned together with the path taken to reach it.
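The shape of the result can be sketched on a pure rose tree. This is not the library's implementation: `Tree` and `descendantsPure` are hypothetical, no IO or link following is involved, and both the meaning of the `Int` (index of the child taken) and the ordering of the path (nearest ancestor first) are assumptions.

```haskell
-- Hypothetical pure tree used in place of a real GraphWalker instance.
data Tree = Tree String [Tree]
    deriving (Eq, Show)

-- Repeated from the module so the sketch is self-contained.
type NodePath a = [(a, Int)]

-- Depth-first listing of every node below the root, each paired with
-- the (ancestor, child index) steps taken to reach it.
-- The root itself is not part of the result.
descendantsPure :: Tree -> [(Tree, NodePath Tree)]
descendantsPure = go []
  where
    go path node@(Tree _ kids) =
        concat [ (kid, path') : go path' kid
               | (idx, kid) <- zip [0 ..] kids
               , let path' = (node, idx) : path
               ]
```

For a root with children `x` and `a`, where `a` has a single child `b`, the nodes come back in the order `x`, `a`, `b`, and the indices along the path of `b` are `[0, 1]`.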

findNamed :: (Functor m, MonadIO m, GraphWalker a r) => String -> a -> m [(a, [(a, Int)])]

Given a name and a starting node, retrieve the first matching tags in the hierarchy. Each result must carry the list of ancestors, giving access to the path used to find the children.

The returned list must contain the node itself if it matches the name, and all the children bearing that name.

findFirstNamed :: (Functor m, MonadIO m, GraphWalker a r) => String -> [a] -> m (Maybe (a, [(a, Int)]))

Return the first found node if any.
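The relation between the two search helpers can be sketched on a pure tree. `Tree`, `findNamedPure` and `findFirstNamedPure` are hypothetical names; the sketch leaves out the ancestor paths and the monadic context, and it assumes that every matching node in the subtree is collected, which may differ from what the real functions do.

```haskell
import Data.Maybe (listToMaybe)

-- Hypothetical pure tree used in place of a real GraphWalker instance.
data Tree = Tree String [Tree]
    deriving (Eq, Show)

-- Collect the node itself if its name matches, then every match
-- found in its subtree (depth-first, document order).
findNamedPure :: String -> Tree -> [Tree]
findNamedPure name node@(Tree n kids) =
    [node | n == name] ++ concatMap (findNamedPure name) kids

-- Search several starting nodes and keep only the first match, if any.
findFirstNamedPure :: String -> [Tree] -> Maybe Tree
findFirstNamedPure name = listToMaybe . concatMap (findNamedPure name)
```

Thanks to laziness, `findFirstNamedPure` stops exploring as soon as a first match has been produced.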