http-conduit-downloader: HTTP downloader tailored for web-crawler needs.

[ bsd3, library, web ]

HTTP/HTTPS downloader built on top of http-conduit and used in the https://bazqux.com crawler.

  • Handles all possible http-conduit exceptions and returns human-readable error messages.

  • Works around some web server bugs (such as returning deflate data instead of gzip, or invalid gzip encoding).

  • Uses OpenSSL instead of the tls package (since tls doesn't handle all sites).

  • Ignores invalid SSL certificates.

  • Receives data in 32k chunks internally to reduce memory fragmentation on many parallel downloads.

  • Download timeout.

  • Total download size limit.

  • Returns HTTP headers for subsequent redownloads and handles '304 Not Modified' responses.

  • Can be used with external DNS resolver (e.g. concurrent-dns-cache).
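The download timeout, total size limit and chunked receiving listed above can be illustrated with a small standalone sketch. This is not this package's API. `collectWithLimit` and `withDownloadTimeout` are hypothetical helpers, and the error strings are invented. The response body is also modelled as a plain list of `ByteString` chunks rather than a conduit stream. Only `System.Timeout.timeout` (from `base`) and `Data.ByteString` are real library calls.

```haskell
import qualified Data.ByteString as B
import System.Timeout (timeout)

-- Hypothetical helper (not part of http-conduit-downloader): concatenate
-- body chunks, failing as soon as the running total exceeds the limit.
-- Chunks after the offending one are never inspected.
collectWithLimit :: Int -> [B.ByteString] -> Either String B.ByteString
collectWithLimit limit = go 0 []
  where
    go :: Int -> [B.ByteString] -> [B.ByteString] -> Either String B.ByteString
    go _ acc [] = Right (B.concat (reverse acc))
    go total acc (c : cs)
      | total' > limit = Left ("size limit of " ++ show limit ++ " bytes exceeded")
      | otherwise      = go total' (c : acc) cs
      where
        total' = total + B.length c

-- Hypothetical wrapper: give up on an action after `secs` seconds.
-- System.Timeout.timeout takes microseconds and returns Nothing on expiry.
withDownloadTimeout :: Int -> IO (Either String a) -> IO (Either String a)
withDownloadTimeout secs act = do
  r <- timeout (secs * 1000000) act
  pure (maybe (Left "download timeout") id r)

main :: IO ()
main = do
  let chunk = B.replicate 32768 0 -- one 32k chunk of zero bytes
  print (B.length <$> collectWithLimit 100000 [chunk, chunk])
  print (B.length <$> collectWithLimit 50000 [chunk, chunk])
  r <- withDownloadTimeout 5 (pure (collectWithLimit 100000 [chunk]))
  print (B.length <$> r)
```

Checking the limit chunk by chunk means an oversized response is rejected after reading at most one chunk past the limit, instead of after the whole body has been buffered.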
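Conditional redownloads of the kind described above rely on standard HTTP validators. The `ETag` and `Last-Modified` values saved from a previous response are sent back as `If-None-Match` and `If-Modified-Since`. A `304 Not Modified` status then means the stored copy is still current. The sketch below shows that logic with plain `String` headers. `conditionalHeaders`, `classify` and `Outcome` are hypothetical names, not part of http-conduit-downloader, and real code would use the header types from http-types.

```haskell
import Data.Char (toLower)
import Data.Maybe (mapMaybe)

-- Simplified header representation (real code would use http-types).
type Header = (String, String)

-- Hypothetical: map validators from a previous response to the request
-- headers of a conditional GET. Header names are compared case-insensitively.
conditionalHeaders :: [Header] -> [Header]
conditionalHeaders = mapMaybe toConditional
  where
    toConditional (name, value) =
      case map toLower name of
        "etag"          -> Just ("If-None-Match", value)
        "last-modified" -> Just ("If-Modified-Since", value)
        _               -> Nothing

-- Hypothetical result type for the follow-up request.
data Outcome
  = Fresh String -- 2xx: new content to store
  | NotModified  -- 304: stored copy is still current
  | Failed Int   -- any other status
  deriving (Show, Eq)

classify :: Int -> String -> Outcome
classify status body
  | status == 304                 = NotModified
  | status >= 200 && status < 300 = Fresh body
  | otherwise                     = Failed status

main :: IO ()
main = do
  let previous =
        [ ("ETag", "\"abc123\"")
        , ("Last-Modified", "Thu, 21 Feb 2019 13:11:36 GMT")
        , ("Content-Type", "text/html")
        ]
  print (conditionalHeaders previous)
  print (classify 304 "")
  print (classify 200 "<html></html>")
```

Sending back the server's own validators, rather than a locally recorded fetch time, leaves the freshness decision to the server and avoids clock-skew issues.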

Versions 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.13, 1.0.14, 1.0.15, 1.0.16, 1.0.17, 1.0.18, 1.0.19, 1.0.20, 1.0.21, 1.0.22, 1.0.23, 1.0.24, 1.0.25, 1.0.30, 1.0.31, 1.0.32, 1.0.33, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5
Dependencies base (>=4 && <5), bytestring, conduit, connection, data-default, HsOpenSSL (>=0.11.2), http-client (>=0.6.1), http-conduit (>=2.3.4), http-types, mtl, network (>=2.6), network-uri (>=2.6), resourcet, text, time (>=1.5.0), zlib
License BSD-3-Clause
Author Vladimir Shabanov <vshabanoff@gmail.com>
Maintainer Vladimir Shabanov <vshabanoff@gmail.com>
Category Web
Home page https://github.com/bazqux/http-conduit-downloader
Source repo head: git clone https://github.com/bazqux/http-conduit-downloader
Uploaded by VladimirShabanov at 2019-02-21T13:11:36Z
Reverse Dependencies 2 direct, 0 indirect
Downloads 22286 total (79 in the last 30 days)
Rating 2.0 (votes: 1) [estimated by Bayesian average]
Status Docs available
Last success reported on 2019-02-21