The wp-archivebot package

[Tags: bsd3, program]

A MediaWiki installation's RecentChanges and NewPages pages link to every new edit or article. This bot polls the corresponding RSS feed (easier and more reliable than parsing the HTML), follows each link to the new edit or article, and then uses TagSoup to extract every off-wiki link (e.g. to http://cnn.com).
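The link-extraction step might look roughly like the sketch below. It is not taken from the package's source: the function name externalLinks and the "URL does not contain the wiki's host name" test are assumptions made for illustration, and the only TagSoup pieces used are parseTags and the TagOpen constructor.

```haskell
import Data.List (isInfixOf, isPrefixOf, nub)
import Text.HTML.TagSoup (Tag (TagOpen), parseTags)

-- | Hypothetical helper: collect the absolute http(s) links in a page
-- whose URL does not mention the wiki's own host name.
externalLinks :: String -> String -> [String]
externalLinks wikiHost html =
  nub
    [ url
    | TagOpen "a" attrs <- parseTags html   -- every opening <a ...> tag
    , ("href", url) <- attrs                -- its href attribute, if any
    , "http://" `isPrefixOf` url || "https://" `isPrefixOf` url
    , not (wikiHost `isInfixOf` url)        -- crude off-wiki test (assumption)
    ]
```

Calling externalLinks "en.wikipedia.org" on the HTML of an article keeps a link such as http://cnn.com/ and drops relative links like /wiki/Foo as well as absolute links back to en.wikipedia.org. The substring test is deliberately simple; a stricter version would parse each URL and compare host names.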

With this list of external links, the bot then fires off requests to http://webcitation.org/, which makes a backup of each linked page (similar to the Internet Archive, but on demand).
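As a rough illustration of the archiving step, the sketch below builds one request URL per external link and sends it with the HTTP package. The endpoint path (/archive) and the query parameter names (url, email) are assumptions about WebCite's request format rather than something this page documents, and archiveRequestUrl and requestArchive are hypothetical names.

```haskell
import Network.HTTP (getRequest, simpleHTTP)
import Network.URI (escapeURIString, isUnreserved)

-- | Hypothetical helper: build an archive-request URL for one link.
-- The "/archive?url=...&email=..." shape is an assumption.
archiveRequestUrl :: String -> String -> String
archiveRequestUrl email link =
  "http://www.webcitation.org/archive?url="
    ++ escapeURIString isUnreserved link     -- percent-encode the target URL
    ++ "&email="
    ++ escapeURIString isUnreserved email    -- percent-encode the address

-- | Send the request and ignore the response; failures are not retried.
requestArchive :: String -> String -> IO ()
requestArchive email link = do
  _ <- simpleHTTP (getRequest (archiveRequestUrl email link))
  return ()
```

With these definitions, mapM_ (requestArchive "gwern0@gmail.com") links would submit every extracted link in turn. The email address corresponds to the first command-line argument in the example invocation.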

Example: to archive links from every article in the English Wikipedia's RecentChanges:

 wp-archivebot gwern0@gmail.com 'http://en.wikipedia.org/w/index.php?title=Special:RecentChanges&feed=rss'

Properties

Version: 0.1
Dependencies: base (==3.*), feed, HTTP, network, parallel, tagsoup
License: BSD3
Author: Gwern
Maintainer: gwern0@gmail.com
Stability: Experimental
Category: Network
Executables: wp-archivebot
Upload date: Thu Jun 4 16:31:50 UTC 2009
Uploaded by: GwernBranwen
Downloads: 47 total (5 in last 30 days)
