shpider-0.0.3: Web automation library in Haskell.
Network.Shpider
Description

This module exposes the main functionality of shpider. It allows you to write crawlers quickly and, in simple cases, without even reading the page source. For example:

 runShpider $ do
    download "http://hackage.haskell.org/packages/archive/pkg-list.html"
    l : _ <- getLinksByText "shpider"
    download $ linkAddress l
Synopsis
module Network.Shpider.Code
module Network.Shpider.State
module Network.Shpider.URL
module Network.Shpider.Options
module Network.Shpider.Forms
module Network.Shpider.Links
download :: String -> Shpider (ShpiderCode, Page)
sendForm :: Form -> Shpider (ShpiderCode, Page)
getLinksByText :: String -> Shpider [Link]
getLinksByTextRegex :: String -> Shpider [Link]
getFormsByAction :: String -> Shpider [Form]
currentLinks :: Shpider [Link]
currentForms :: Shpider [Form]
parsePage :: String -> String -> Shpider Page
isAuthorizedDomain :: String -> Shpider Bool
withAuthorizedDomain :: String -> Shpider (ShpiderCode, Page) -> Shpider (ShpiderCode, Page)
haveVisited :: String -> Shpider Bool
Documentation
module Network.Shpider.Code
module Network.Shpider.State
module Network.Shpider.URL
module Network.Shpider.Options
module Network.Shpider.Forms
module Network.Shpider.Links
download :: String -> Shpider (ShpiderCode, Page)
Fetch whatever is at the given address and attempt to parse the content into a Page. Returns the status code together with the parsed page.
sendForm :: Form -> Shpider (ShpiderCode, Page)
Send a form to the URL specified in its action attribute.
getLinksByText :: String -> Shpider [Link]
Get all links whose text matches the given string.
getLinksByTextRegex :: String -> Shpider [Link]
Get all links whose text matches the given regex.
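A minimal sketch of combining a link query with download, assuming linkAddress (used in the module description) yields a link's URL. The helper name followMatching is hypothetical, and the regex dialect accepted by getLinksByTextRegex is not stated on this page.

```haskell
import Network.Shpider

-- Hypothetical helper (not part of shpider): download every link whose
-- text matches the regex. The links are collected before the first
-- download, so later page changes do not affect the list.
followMatching :: String -> Shpider [(ShpiderCode, Page)]
followMatching re = do
  links <- getLinksByTextRegex re
  mapM (download . linkAddress) links
```

Inside a runShpider block this would be called after an initial download, for example followMatching "^shp".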
getFormsByAction :: String -> Shpider [Form]
Get all forms whose action matches the given action.
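The following sketch shows how getFormsByAction and sendForm might be used together. The helper name submitFirst is hypothetical. It submits the form unchanged, because the functions for filling in form fields are not documented on this page.

```haskell
import Network.Shpider

-- Hypothetical helper (not part of shpider): submit the first form whose
-- action matches, or return Nothing when no such form is found.
submitFirst :: String -> Shpider (Maybe (ShpiderCode, Page))
submitFirst act = do
  forms <- getFormsByAction act
  case forms of
    []      -> return Nothing
    (f : _) -> do
      result <- sendForm f
      return (Just result)
```

Using a case expression rather than a pattern such as f : _ <- getFormsByAction act avoids a pattern-match failure when no form matches.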
currentLinks :: Shpider [Link]
Return the links on the current page.
currentForms :: Shpider [Form]
Return the forms on the current page.
parsePage :: String -> String -> Shpider Page
Parse a given URL and HTML source into the Page datatype. This also sets the current page.
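Because parsePage sets the current page, it can be followed by currentLinks to inspect HTML obtained elsewhere. The sketch below rests on several assumptions: that the URL is the first argument and the HTML the second (inferred only from the order in the description), that runShpider has a type like Shpider a -> IO a, and that linkAddress returns a String. The name linksInHtml and the example.com URL are hypothetical.

```haskell
import Network.Shpider

-- Hypothetical helper (not part of shpider): parse HTML we already hold
-- and return the links shpider finds in it.
-- Assumption: parsePage takes the URL first, then the HTML source.
linksInHtml :: String -> String -> Shpider [Link]
linksInHtml url html = do
  _ <- parsePage url html   -- sets the current page as a side effect
  currentLinks

main :: IO ()
main = do
  -- Assumption: runShpider :: Shpider a -> IO a
  links <- runShpider $
    linksInHtml "http://example.com/"
                "<html><body><a href=\"/about\">About</a></body></html>"
  mapM_ (putStrLn . linkAddress) links
```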
isAuthorizedDomain :: String -> Shpider Bool
If stayOnDomain has been set to True, isAuthorizedDomain returns True if the given URL is on the domain and False otherwise. If stayOnDomain has not been set to True, it always returns True.
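As a sketch, isAuthorizedDomain can be used as a monadic predicate to keep only the links the crawler is allowed to follow. The helper name authorizedLinks is hypothetical. How stayOnDomain is set is documented in Network.Shpider.Options and is not shown here.

```haskell
import Control.Monad (filterM)
import Network.Shpider

-- Hypothetical helper (not part of shpider): links on the current page
-- whose address passes the domain check.
authorizedLinks :: Shpider [Link]
authorizedLinks = do
  links <- currentLinks
  filterM (isAuthorizedDomain . linkAddress) links
```

When stayOnDomain has not been set to True, this returns every link on the current page, since the check then always succeeds.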
withAuthorizedDomain :: String -> Shpider (ShpiderCode, Page) -> Shpider (ShpiderCode, Page)
withAuthorizedDomain executes the given action only if the given URL is on an authorized domain. See isAuthorizedDomain.
haveVisited :: String -> Shpider Bool
If keepTrack has been set, haveVisited returns True if the given URL has been visited.
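A crawler would typically consult haveVisited before fetching a page. The sketch below assumes keepTrack has already been set (see Network.Shpider.Options). The helper name downloadOnce is hypothetical.

```haskell
import Network.Shpider

-- Hypothetical helper (not part of shpider): download a URL only if it
-- has not been visited yet; return Nothing when it is skipped.
-- Assumption: keepTrack has been set, otherwise the check is not meaningful.
downloadOnce :: String -> Shpider (Maybe (ShpiderCode, Page))
downloadOnce url = do
  seen <- haveVisited url
  if seen
    then return Nothing
    else do
      result <- download url
      return (Just result)
```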
Produced by Haddock version 2.4.2