The shpider package
Shpider is a web automation library for Haskell. It lets you write crawlers quickly, and in simple cases (such as following links) you do not even need to read the page source.
Useful features include turning a page's relative links into absolute links, an option to authorize transactions only on a given domain, and an option to download only HTML documents.
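To make the first two features concrete, here is a minimal, self-contained sketch of what resolving relative links and checking a domain can look like. It is not shpider's implementation or API: `splitOrigin`, `mkAbsolute` and `sameDomain` are hypothetical names invented for this illustration, and the resolution is deliberately simplified (no `..` segments, no protocol-relative `//host` links, no query-string handling).

```haskell
import Data.List (isInfixOf, isPrefixOf)

-- Hypothetical helper: split a string at the first occurrence of a pattern.
breakOn :: String -> String -> Maybe (String, String)
breakOn pat = go []
  where
    go _ [] = Nothing
    go acc s@(c : cs)
      | pat `isPrefixOf` s = Just (reverse acc, drop (length pat) s)
      | otherwise = go (c : acc) cs

-- Split "scheme://host/path" into ("scheme://host", "/path").
-- A URL without "://" yields an empty origin.
splitOrigin :: String -> (String, String)
splitOrigin url =
  case breakOn "://" url of
    Just (scheme, rest) ->
      let (host, path) = break (== '/') rest
       in (scheme ++ "://" ++ host, path)
    Nothing -> ("", url)

-- Resolve a link found on the page at 'base' into an absolute URL.
-- Simplified: absolute links pass through, root-relative links are
-- attached to the origin, anything else is placed in the base directory.
mkAbsolute :: String -> String -> String
mkAbsolute base link
  | "://" `isInfixOf` link = link
  | "/" `isPrefixOf` link = origin ++ link
  | otherwise = origin ++ dir ++ link
  where
    (origin, path) = splitOrigin base
    dir = case path of
      "" -> "/"
      _ -> reverse (dropWhile (/= '/') (reverse path))

-- Two URLs are on the same domain here if scheme and host both match.
sameDomain :: String -> String -> Bool
sameDomain a b = fst (splitOrigin a) == fst (splitOrigin b)
```

A crawler built this way would typically resolve every link with `mkAbsolute` first and then keep only those for which `sameDomain` holds against the start page.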
It also provides a nice syntax for filling out forms.
An example:
    runShpider $ do
        download "http://apage.com"
        theForm : _ <- getFormsByAction "http://anotherpage.com"
        sendForm $ fillOutForm theForm $ pairs $ do
            "occupation" =: "unemployed Haskell programmer"
            "location" =: "mother's house"
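The `name =: value` lines inside a `do` block read like a small embedded language. One common way to build such a syntax is a writer-style monad that accumulates pairs. The sketch below shows that idea using only `base`. It is an assumption about how such a syntax could work, not shpider's source: the `Pairs` type and `collectPairs` are hypothetical names, and `=:` is redefined locally for the illustration.

```haskell
-- A tiny writer-style monad that accumulates (name, value) pairs.
newtype Pairs a = Pairs ([(String, String)], a)

instance Functor Pairs where
  fmap f (Pairs (w, a)) = Pairs (w, f a)

instance Applicative Pairs where
  pure a = Pairs ([], a)
  Pairs (w1, f) <*> Pairs (w2, a) = Pairs (w1 ++ w2, f a)

instance Monad Pairs where
  Pairs (w1, a) >>= k =
    let Pairs (w2, b) = k a
     in Pairs (w1 ++ w2, b)

-- Record a single field name and its value (local stand-in operator).
(=:) :: String -> String -> Pairs ()
name =: value = Pairs ([(name, value)], ())

-- Run the block and return the pairs in the order they were written.
collectPairs :: Pairs a -> [(String, String)]
collectPairs (Pairs (w, _)) = w

-- The form fields from the example above, collected into a list.
formFields :: [(String, String)]
formFields = collectPairs $ do
  "occupation" =: "unemployed Haskell programmer"
  "location" =: "mother's house"
```

The appeal of this design is that each field sits on its own line with no list brackets or commas, while the result is still an ordinary association list that a form-filling function can consume.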
Shpider contains a patched version of the curl package (the original package's garbage collection caused non-deterministic behaviour). The curl licence is therefore distributed with this package.
Properties
| Versions | 0.0.1, 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.1.0, 0.1.1, 0.2, 0.2.1.1 |
|---|---|
| Dependencies | base (<5), bytestring, containers, mtl, regex-posix, tagsoup, tagsoup-parsec, url (≥2) |
| License | BSD3 |
| Author | Johnny Morrice |
| Maintainer | Johnny Morrice <spoon@killersmurf.com> |
| Category | Web |
| Home page | http://www.killersmurf.com/projects/shpider |
| Upload date | Thu Aug 6 16:22:24 UTC 2009 |
| Uploaded by | JohnnyMorrice |
| Built on | ghc-6.10 |
| Build failure | ghc-6.12 (log) |
Modules
Downloads
- shpider-0.0.6.tar.gz (Cabal source package)
- package description (included in the package)