http-conduit-browser-2.0.0.1: Browser interface to the http-conduit package

Safe HaskellNone
LanguageHaskell98

Network.HTTP.Conduit.Browser

Contents

Description

This module is designed to work similarly to the Network.Browser module in the HTTP package. The idea is that there are two new types defined: BrowserState and BrowserAction. The purpose of this module is to make it easy to describe a browsing session, including navigating to multiple pages, and have things like cookie jar updates work as expected as you browse around.

BrowserAction is a monad that handles all your browser-related activities. This monad is actually implemented as a specialization of the State monad, over the BrowserState type. The BrowserState type has various bits of information that a web browser keeps, such as a current cookie jar, the number of times to retry a request on failure, HTTP proxy information, etc. In the BrowserAction monad, there is one BrowserState at any given time, and you can modify it by using the convenience functions in this module.

A special kind of modification of the current browser state is the action of making a HTTP request. This will do the request according to the params in the current BrowserState, as well as modifying the current state with, for example, an updated cookie jar and location.

To use this module, you would bind together a series of BrowserActions (This simulates the user clicking on links or using a settings dialogue etc.) to describe your browsing session. When you've described your session, you call browse on your top-level BrowserAction to actually convert your actions into the ResourceT IO monad.

Here is an example program:

{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as LB
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Lazy.Encoding as TLE
import qualified Data.Text.Lazy.IO as TLIO
import           Data.Conduit
import           Network.HTTP.Conduit
import           Network.HTTP.Conduit.Browser

-- The web request to log in to a service
req1 :: IO (Request)
req1 = do
  req <- parseUrl "http://www.myurl.com/login.php"
  return $ urlEncodedBody [ (TE.encodeUtf8 "name", TE.encodeUtf8 "litherum")
                          , (TE.encodeUtf8 "pass", TE.encodeUtf8 "S33kRe7")
                          ] req

-- Once authenticated, run this request
req2 :: IO (Request m')
req2 = parseUrl "http://www.myurl.com/main.php"

-- Bind two BrowserActions together
action :: Request -> Request -> BrowserAction (Response LB.ByteString)
action r1 r2 = do
  _ <- makeRequestLbs r1
  makeRequestLbs r2

main :: IO ()
main = do
  man <- newManager def
  r1 <- req1
  r2 <- req2
  out <- runResourceT $ browse man $ do
    setDefaultHeader "User-Agent" $ Just "A very popular browser"
    action r1 r2
  TLIO.putStrLn $ TLE.decodeUtf8 $ responseBody out

Synopsis

Main

browse :: Monad m => Manager -> GenericBrowserAction m a -> m a Source #

Do the browser action with the given manager

parseRelativeUrl :: MonadThrow m => String -> GenericBrowserAction m Request Source #

Convert an URL relative to current Location into a Request

Will throw InvalidUrlException on parse failures or if your Location is Nothing (e.g. you haven't made any requests before)

makeRequest :: (MonadBaseControl IO m, MonadIO m, MonadResource m) => Request -> GenericBrowserAction m (Response (ResumableSource (ResourceT IO) ByteString)) Source #

Make a request, using all the state in the current BrowserState

makeRequestLbs :: (MonadIO m, MonadResource m, MonadBaseControl IO m) => Request -> GenericBrowserAction m (Response ByteString) Source #

Make a request and pack the result as a lazy bytestring.

Note: Even though this function returns a lazy bytestring, it does not utilize lazy I/O, and therefore the entire response body will live in memory. If you want constant memory usage, you'll need to use the conduit package and makeRequest directly.

Browser state

You can save and restore the state at will

Manager

The active manager, managing the connection pool

Location

The last visited url (similar to the location bar in mainstream browsers). Location is updated on every request.

default: Nothing

Cookies

Cookie jar

Global cookie jar. Cookies in Request's cookieJar are preferred to global cookies if there's a name collision.

default: def

Cookie filter

Each new Set-Cookie the browser encounters will pass through this filter. Only cookies that pass the filter (and are already valid) will be allowed into the cookie jar

default: const $ const $ return True

Proxies

HTTP

An optional proxy to send all requests through if Nothing uses Request's proxy

default: Nothing

Redirects

The number of redirects to allow. if Nothing uses Request's redirectCount

default: Nothing

Retries

The number of times to retry a failed connection

default: 0

Timeout

Number of microseconds to wait for a response. if Nothing uses Request's responseTimeout

default: Nothing

Authorities

A user-provided function that provides optional authorities. This function gets run on all requests before they get sent out. The output of this function is applied to the request.

default: const Nothing

Headers

Default headers

Specifies Headers that should be added to Request, these will be overriden by any headers specified in requestHeaders.

do insertDefaultHeader ("User-Agent", "dog")
   insertDefaultHeader ("Connection", "keep-alive")
   makeRequest def{requestHeaders = [("User-Agent", "kitten"), ("Accept", "x-animal/mouse")]}
> User-Agent: kitten
> Accept: x-animal/mouse
> Connection: keep-alive

default: [("User-Agent", "http-conduit-browser")]

Override headers

Specifies Headers that should be added to Request, these will override Headers already specified in requestHeaders.

do insertOverrideHeader ("User-Agent", "rat")
   insertOverrideHeader ("Connection", "keep-alive")
   makeRequest def{requestHeaders = [("User-Agent", "kitten"), ("Accept", "everything/digestible")]}
> User-Agent: rat
> Accept: everything/digestible
> Connection: keep-alive

default: []

Error handling

Function to check the status code. Note that this will run after all redirects are performed. if Nothing uses Request's checkStatus

default: Nothing