| Safe Haskell | None |
|---|
Network.HTTP.Conduit.Downloader
Description
HTTP downloader tailored for web-crawler needs.
- Handles all possible http-conduit exceptions and returns human-readable error messages.
- Handles some web server bugs (returning deflate data instead of gzip).
- Ignores invalid SSL certificates.
- Receives data in 32k blocks internally to reduce memory fragmentation on many parallel downloads.
- Download timeout.
- Total download size limit.
- Returns HTTP headers for subsequent redownloads and handles Not modified results.
- Can be used with an external DNS resolver (hsdns-cache, for example).
- Keep-alive connection pool (thanks to http-conduit).
Typical workflow in a crawler:
    withDnsCache $ \c -> withDownloader $ \d -> do
        ... -- got URL from queue
        ra <- resolveA c $ hostNameFromUrl url
        case ra of
            Left err -> ... -- uh oh, bad host
            Right ha -> do
                ... -- crawler politeness stuff (rate limits, domain queues)
                dr <- download d url (Just ha) opts
                case dr of
                    DROK dat redownloadOpts ->
                        ... -- analyze data, save redownloadOpts for next download
                    DRRedirect _ -> ...
                    DRNotModified -> ...
                    DRError e -> ...
It is highly recommended to use
http://hackage.haskell.org/package/hsdns-cache
for DNS resolution, since getAddrInfo, which http-conduit uses, can be
buggy and inefficient when it has to resolve many hosts per second over
a long period.
- urlGetContents :: String -> IO ByteString
- urlGetContentsPost :: String -> ByteString -> IO ByteString
- download :: Downloader -> String -> Maybe HostAddress -> DownloadOptions -> IO DownloadResult
- post :: Downloader -> String -> Maybe HostAddress -> ByteString -> IO DownloadResult
- downloadG :: (Request (ResourceT IO) -> ResourceT IO (Request (ResourceT IO))) -> Downloader -> String -> Maybe HostAddress -> DownloadOptions -> IO DownloadResult
- data DownloadResult
- type DownloadOptions = [String]
- data DownloaderSettings = DownloaderSettings {}
- data Downloader
- withDownloader :: (Downloader -> IO a) -> IO a
- withDownloaderSettings :: DownloaderSettings -> (Downloader -> IO a) -> IO a
- newDownloader :: DownloaderSettings -> IO Downloader
- postRequest :: ByteString -> Request a -> Request b
- sinkByteString :: MonadIO m => Int -> Sink ByteString m (Maybe ByteString)
Download operations
urlGetContents :: String -> IO ByteString
Download a single URL with the default DownloaderSettings.
Fails if the result is not DROK.
urlGetContentsPost :: String -> ByteString -> IO ByteString
Post data and download a single URL with the default DownloaderSettings.
Fails if the result is not DROK.
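download :: Downloader -> String -> Maybe HostAddress -> DownloadOptions -> IO DownloadResult
Arguments
| :: Downloader | |
| -> String | URL |
| -> Maybe HostAddress | Optional resolved host address |
| -> DownloadOptions | |
| -> IO DownloadResult | |
Perform a download.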
post :: Downloader -> String -> Maybe HostAddress -> ByteString -> IO DownloadResult
Perform HTTP POST.
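downloadG :: (Request (ResourceT IO) -> ResourceT IO (Request (ResourceT IO))) -> Downloader -> String -> Maybe HostAddress -> DownloadOptions -> IO DownloadResult
Arguments
| :: (Request (ResourceT IO) -> ResourceT IO (Request (ResourceT IO))) | Function to modify the Request |
| -> Downloader | |
| -> String | URL |
| -> Maybe HostAddress | Optional resolved host address |
| -> DownloadOptions | |
| -> IO DownloadResult | |
Generic version of download that can modify the http-conduit Request.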
data DownloadResult
Result of download operation.
Constructors
| DROK ByteString DownloadOptions | Successful download with data and options for next download. |
| DRRedirect String | Redirect URL |
| DRError String | Error |
| DRNotModified | HTTP 304 Not Modified |
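Because a redirect is reported as DRRedirect rather than followed, the caller decides how many hops to allow. The following is a minimal sketch of one way to do that, not part of this module: followRedirects is a hypothetical helper, and the types are mirrored locally only so that the sketch compiles without the package installed. It also assumes the URL carried by DRRedirect can be passed back to the fetch action unchanged.

```haskell
import qualified Data.ByteString.Char8 as B

-- Local mirror of the documented types, for a self-contained sketch.
-- In real code, import them from Network.HTTP.Conduit.Downloader.
type DownloadOptions = [String]

data DownloadResult
    = DROK B.ByteString DownloadOptions
    | DRRedirect String
    | DRError String
    | DRNotModified

-- | Follow redirects up to a hop limit (hypothetical helper).
followRedirects
    :: Monad m
    => Int                           -- ^ maximum number of redirects to follow
    -> (String -> m DownloadResult)  -- ^ fetch action for one URL
    -> String                        -- ^ starting URL
    -> m DownloadResult
followRedirects limit fetch = go limit
  where
    go n url = do
        r <- fetch url
        case r of
            DRRedirect next
                | n > 0     -> go (n - 1) next
                | otherwise ->
                    return (DRError ("too many redirects at " ++ url))
            other -> return other
```

With the real types, the fetch action could be something like \u -> download d u Nothing [], assuming that passing Nothing leaves host resolution to the library.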
type DownloadOptions = [String]
If-None-Match and/or If-Modified-Since headers.
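A crawler typically stores the options returned by DROK next to the cached body and passes them back on the next visit, so that an unchanged page comes back as DRNotModified. The sketch below shows only that bookkeeping. Cached and updateCache are hypothetical names, the types are mirrored locally so the sketch compiles on its own, and the option strings are treated as opaque values.

```haskell
import qualified Data.ByteString.Char8 as B

-- Local mirror of the documented types, for a self-contained sketch.
-- In real code, import them from Network.HTTP.Conduit.Downloader.
type DownloadOptions = [String]

data DownloadResult
    = DROK B.ByteString DownloadOptions
    | DRRedirect String
    | DRError String
    | DRNotModified

-- | What a crawler remembers about one URL between visits (hypothetical).
data Cached = Cached
    { cachedBody :: Maybe B.ByteString  -- ^ last successfully downloaded body
    , cachedOpts :: DownloadOptions     -- ^ options to send on the next visit
    }

-- | Fold one download result into the cache entry.
updateCache :: Cached -> DownloadResult -> Cached
updateCache _   (DROK body opts) = Cached (Just body) opts  -- fresh data
updateCache old DRNotModified    = old                      -- body still valid
updateCache old (DRRedirect _)   = old                      -- caller follows it
updateCache old (DRError _)      = old                      -- keep stale copy
```

Keeping the old entry on DRError is a design choice of this sketch; a crawler may prefer to drop or age out stale entries instead.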
Downloader
data DownloaderSettings
Settings used in downloader.
Constructors
| DownloaderSettings | |
Fields
| |
data Downloader
Holds the http-conduit Manager and the DownloaderSettings.
withDownloader :: (Downloader -> IO a) -> IO a
Create a new Downloader, use it in the provided function,
and then release it.
withDownloaderSettings :: DownloaderSettings -> (Downloader -> IO a) -> IO a
Create a new Downloader with the provided settings,
use it in the provided function, and then release it.
newDownloader :: DownloaderSettings -> IO Downloader
Create a Downloader with the given settings.
Utils
postRequest :: ByteString -> Request a -> Request b
Make an HTTP POST request.
sinkByteString :: MonadIO m => Int -> Sink ByteString m (Maybe ByteString)
Sink data using 32k buffers to reduce memory fragmentation.
Returns Nothing if too much data was downloaded.
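The size-limit behaviour can be illustrated on a plain list of chunks. This is a simplified sketch of the idea only: it is not the conduit-based implementation, it does no 32k buffering, and concatLimited is a hypothetical name. It assumes the limit is a byte count and treats a total exactly equal to the limit as acceptable, which may differ from what sinkByteString does.

```haskell
import qualified Data.ByteString as B

-- | Concatenate chunks, giving up as soon as the running total
-- exceeds the limit (hypothetical helper, list-based for clarity).
concatLimited
    :: Int                  -- ^ maximum total size in bytes
    -> [B.ByteString]       -- ^ incoming chunks, in order
    -> Maybe B.ByteString   -- ^ Nothing if the limit was exceeded
concatLimited limit = go 0 []
  where
    -- All chunks consumed within the limit: join them in arrival order.
    go _ acc [] = Just (B.concat (reverse acc))
    go total acc (c : cs)
        | total' > limit = Nothing                  -- stop early, drop data
        | otherwise      = go total' (c : acc) cs
      where
        total' = total + B.length c
```

Stopping at the first chunk that crosses the limit means an oversized response is abandoned without holding all of it in memory.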