wp-archivebot: Subscribe to a wiki's RSS feed and archive external links
A MediaWiki's RecentChanges or NewPages page links to every new edit or article. This bot polls the corresponding RSS feeds (easier and more reliable than parsing the HTML), follows the links to each new edit or article, and then uses TagSoup to extract every off-wiki link (e.g. to http://cnn.com).
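The link-extraction step is a natural fit for TagSoup's forgiving parser. A minimal sketch, assuming the page's HTML has already been downloaded; the `externalLinks` name and the simple `http(s)://` prefix test are illustrative only (the real bot would also need to exclude absolute links back into the wiki itself):

```haskell
import Data.List (isPrefixOf)
import Text.HTML.TagSoup (parseTags, fromAttrib, isTagOpenName)

-- Collect the href of every <a> tag on a page, keeping only
-- absolute links, which on a rendered wiki page are the off-wiki ones.
externalLinks :: String -> [String]
externalLinks html =
  filter isExternal
    [ fromAttrib "href" tag
    | tag <- parseTags html
    , isTagOpenName "a" tag ]
  where
    isExternal u = "http://" `isPrefixOf` u || "https://" `isPrefixOf` u
```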
With this list of external links, the bot then fires off requests to http://webcitation.org/, which makes a backup of each linked page (similar to the Internet Archive, but on demand).
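Submitting a URL to WebCite is a single HTTP GET. A sketch using the HTTP package, assuming an `archive` endpoint taking `url` and `email` query parameters (the endpoint shape is an assumption here, not taken from the package source):

```haskell
import Network.HTTP (getRequest, getResponseBody, simpleHTTP, urlEncode)

-- Ask WebCite to archive one URL, notifying the given e-mail
-- address about the result of the archiving attempt.
archiveURL :: String -> String -> IO String
archiveURL email url =
  getResponseBody =<< simpleHTTP (getRequest requestURL)
  where
    requestURL = "http://www.webcitation.org/archive?url=" ++ urlEncode url
                 ++ "&email=" ++ urlEncode email
```

The `parallel` dependency suggests these requests are fired off concurrently rather than one at a time, since each WebCite submission can take several seconds.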
Example: to archive links from every article in the English Wikipedia's RecentChanges:
wp-archivebot gwern0@gmail.com 'http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&feed=rss'
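Given such a feed URL, pulling the changed-page links out of the RSS takes only a few lines with the feed package. A sketch; `parseFeedString`'s exact type has varied across feed releases, so treat this as indicative:

```haskell
import Data.Maybe (mapMaybe)
import Text.Feed.Import (parseFeedString)
import Text.Feed.Query (feedItems, getItemLink)

-- From raw RSS text, extract the link of every feed item,
-- i.e. the URL of each newly edited or created page.
itemLinks :: String -> [String]
itemLinks rss = case parseFeedString rss of
  Nothing   -> []
  Just feed -> mapMaybe getItemLink (feedItems feed)
```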
Downloads
- wp-archivebot-0.1.tar.gz (Cabal source package)
- Package description (as included in the package)
| Versions | 0.1 |
|---|---|
| Dependencies | base (>=3 && <4), feed, HTTP, network, parallel, tagsoup |
| Tested with | ghc ==6.10.2 |
| License | BSD-3-Clause |
| Author | Gwern |
| Maintainer | gwern0@gmail.com |
| Category | Network |
| Uploaded | by GwernBranwen at 2009-06-04T16:31:50Z |
| Reverse Dependencies | 1 direct, 0 indirect |
| Executables | wp-archivebot |
| Downloads | 1055 total (1 in the last 30 days) |
| Rating | (no votes yet) |
| Status | Docs not available; all reported builds failed as of 2017-01-01 |