-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/
-- | Archive supplied URLs in WebCite & Internet Archive
--
-- archiver is a daemon which processes a specified text file, each
-- line of which is a URL, and, one URL at a time in random order,
-- requests that the URLs be archived or spidered by
-- http://www.webcitation.org and http://www.archive.org for future
-- reference. (One may optionally specify an arbitrary sh command,
-- such as wget --page-requisites, to download URLs locally.)
--
-- Because the interface is a simple text file, this can be combined
-- with other scripts; for example, a script using SQLite to extract
-- visited URLs from Firefox, or a program extracting URLs from Pandoc
-- documents. (See http://www.gwern.net/Archiving%20URLs.)
--
-- For explanation of the derivation of the code in
-- Network.URL.Archiver, see
-- http://www.gwern.net/haskell/Wikipedia%20Archive%20Bot.
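--
-- As a rough sketch of the daemon loop described above (an
-- illustration, not the actual implementation: it assumes
-- checkArchive takes the email address first and the URL second, and
-- uses a placeholder file name "urls.txt"):
--
-- > import Control.Concurrent (threadDelay)
-- > import Control.Exception (evaluate)
-- > import Control.Monad (unless)
-- > import System.Random (randomRIO)
-- > import Network.URL.Archiver (checkArchive)
-- >
-- > loop :: FilePath -> String -> IO ()
-- > loop file email = do
-- >   urls <- fmap lines (readFile file)
-- >   n <- evaluate (length urls) -- force the read so the file can be rewritten
-- >   unless (n == 0) $ do
-- >     i <- randomRIO (0, n - 1) -- pick a URL at random
-- >     checkArchive email (urls !! i)
-- >     writeFile file (unlines (take i urls ++ drop (i+1) urls))
-- >     threadDelay (30 * 1000000) -- ~30s between requests; see webciteArchive
-- >     loop file email
-- >
-- > main :: IO ()
-- > main = loop "urls.txt" "user@example.com"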
@package archiver
@version 0.4
module Network.URL.Archiver
-- | Error-check the URL and then archive it using webciteArchive
-- and alexaArchive
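--
-- A minimal sketch of the intended behavior (an assumption: the real
-- check may differ, and the argument order of email before URL is
-- inferred, not documented), validating with Network.URI before
-- archiving:
--
-- > import Control.Monad (when)
-- > import Network.URI (isURI)
-- >
-- > checkArchive :: String -> String -> IO ()
-- > checkArchive email url = when (isURI url) $
-- >                            webciteArchive email url >> alexaArchive url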
checkArchive :: String -> String -> IO ()
-- | Request http://www.webcitation.org to copy a supplied URL;
-- WebCite does on-demand archiving, unlike Alexa/Internet Archive,
-- so in practice this is the most useful function. This function
-- throws away any return status from WebCite (this may change in the
-- future), so it is suggested that one first test with a valid email
-- address. This and alexaArchive ignore any attempt to archive the
-- archives' own existing pages, since that is useless.
--
-- Warning! WebCite has throttling mechanisms; if you request more
-- than 100 URLs per hour, your IP may be banned! It is suggested that
-- one sleep for ~30 seconds between URL requests.
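--
-- A minimal sketch of such a request, assuming WebCite's historical
-- query interface (url and email parameters on /archive); the actual
-- implementation and endpoint may differ, and pingWebCite is a
-- hypothetical helper name:
--
-- > import Network.HTTP (simpleHTTP, getRequest, getResponseBody)
-- > import Network.URI (escapeURIString, isUnreserved)
-- >
-- > pingWebCite :: String -> String -> IO ()
-- > pingWebCite email url = do
-- >   let esc = escapeURIString isUnreserved
-- >   _ <- simpleHTTP (getRequest ("http://www.webcitation.org/archive?url="
-- >                                 ++ esc url ++ "&email=" ++ esc email))
-- >          >>= getResponseBody
-- >   return () -- the return status is discarded, as documented above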
webciteArchive :: String -> String -> IO ()
-- | Request http://www.alexa.com to spider a supplied URL. Alexa
-- supplies the Internet Archive's caches. TODO: possibly broken;
-- Alexa may have changed its pages or taken the endpoint down.
alexaArchive :: String -> IO ()