Webrexp-1.1.2: Regexp-like engine to scrap web data

Safe Haskell: None




Generic module for using Webrexp as a user. The main functions are queryDocument, which performs an in-memory evaluation, and evalWebRexpDepthFirst.


In memory evaluation

data ParseableType

Describes the kinds of content parser that can be used.



Indicates a parser that must be tolerant enough to parse HTML.


Indicates that a rather strict parser can be used.


Do what you want with it for now.

queryDocument :: ParseableType -> ByteString -> WebRexp -> [Either String String]

Query a document in memory and retrieve the results. It can be combined with the quasiquoting facility to embed the webrexp in Haskell:

 {-# LANGUAGE QuasiQuotes #-}
 import Text.Webrexp
 import Text.Webrexp.Quote
 import qualified Data.ByteString.Char8 as B

 main :: IO ()
 main = print $ queryDocument ParseableJson document [webrexpParse| some things [.] |]
     where document = B.pack "{ \"some\": { \"things\": \"a phrase\" } }"

The returned values contain possible errors as Left and real values as Right.
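Since the result is a plain list of Either values, it can be separated with functions from base. The following is a minimal sketch, not part of Webrexp: splitResults and summarize are hypothetical helper names, and the summary wording is only illustrative.

```haskell
import Data.Either (partitionEithers)

-- | Split a result list of the shape returned by queryDocument into
-- (errors, values), preserving the order within each group.
-- Hypothetical helper, not provided by the library.
splitResults :: [Either String String] -> ([String], [String])
splitResults = partitionEithers

-- | Render a short count of values and errors (illustrative wording).
summarize :: [Either String String] -> String
summarize results =
  let (errs, vals) = splitResults results
  in show (length vals) ++ " value(s), " ++ show (length errs) ++ " error(s)"
```

A call such as summarize (queryDocument ParseableJson document expr) would then report how many values were extracted and how many errors occurred.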

queryDocumentM :: forall s. ParseableType -> ByteString -> WebRexp -> ST s [Either String String]

Exactly the same as queryDocument, but in ST.

Default evaluation

evalWebRexp :: String -> IO Bool

Simple evaluation function; the evaluation is breadth-first.

evalWebRexpDepthFirst :: String -> IO Bool

Evaluate a webrexp depth-first, returning a success status that tells whether the evaluation reached the end.
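One way to use the Bool status of evalWebRexp or evalWebRexpDepthFirst is to turn it into a process exit code. The sketch below uses only base; exitCodeFor and runAndExit are hypothetical names, and the choice of exit code 1 for failure is an assumption.

```haskell
import System.Exit (ExitCode (..), exitWith)

-- | Map an evaluation status to a process exit code.
-- Using 1 for failure is an arbitrary choice made for this sketch.
exitCodeFor :: Bool -> ExitCode
exitCodeFor True  = ExitSuccess
exitCodeFor False = ExitFailure 1

-- | Run an evaluation action and terminate the process with the
-- matching exit code.
runAndExit :: IO Bool -> IO a
runAndExit action = action >>= exitWith . exitCodeFor
```

For instance, runAndExit (evalWebRexpDepthFirst expr) would end the program with a non-zero status when the evaluation does not reach the end.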

parseWebRexp :: String -> Maybe WebRexp

Prepare a webrexp. This function is useful if the expression has to be applied many times.

evalParsedWebRexp :: WebRexp -> IO Bool

Evaluation for a pre-parsed webrexp. Best method if a webrexp has to be evaluated many times. Evaluated using the breadth-first method.
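The parse-once, evaluate-many pattern can be sketched as follows. To keep the snippet compilable with base alone, the parser and evaluator are passed in as arguments; evalRepeatedly is a hypothetical helper, and parseWebRexp and evalParsedWebRexp are the intended arguments.

```haskell
import Control.Monad (replicateM)

-- | Parse an expression once, then evaluate the parsed form n times.
-- Returns Nothing when parsing fails, otherwise the status of each run.
-- 'parse' and 'eval' stand in for parseWebRexp and evalParsedWebRexp.
evalRepeatedly
  :: (String -> Maybe rexp)  -- ^ parser, e.g. parseWebRexp
  -> (rexp -> IO Bool)       -- ^ evaluator, e.g. evalParsedWebRexp
  -> Int                     -- ^ number of evaluations
  -> String                  -- ^ expression source
  -> IO (Maybe [Bool])
evalRepeatedly parse eval n src =
  case parse src of
    Nothing   -> return Nothing
    Just rexp -> fmap Just (replicateM n (eval rexp))
```

With the library in scope, evalRepeatedly parseWebRexp evalParsedWebRexp 3 expr would parse expr a single time and run it three times, avoiding a re-parse on every evaluation.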

executeParsedWebRexp :: WebRexp -> IO [Either String String]

Evaluate a webrexp and return all the dumped text as Right and all errors as Left. Evaluated using the depth-first method.

Crawling configuration

evalWebRexpWithConf :: Conf -> IO Bool

Function used in the command-line program.