-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/


-- | Archive supplied URLs in WebCite & Internet Archive
--
--   archiver is a daemon which processes a specified text file, each
--   line of which is a URL, and requests one by one (in random order)
--   that the URLs be archived or spidered by
--   http://www.webcitation.org and http://www.archive.org for
--   future reference. (One may optionally specify an arbitrary sh
--   command, like wget --page-requisites, to download URLs locally.)
--
--   Because the interface is a simple text file, this can be combined
--   with other scripts; for example, a script using Sqlite to extract
--   visited URLs from Firefox, or a program extracting URLs from
--   Pandoc documents. (See http://www.gwern.net/Archiving%20URLs.)
--
--   For an explanation of the derivation of the code in
--   Network.URL.Archiver, see
--   http://www.gwern.net/haskell/Wikipedia%20Archive%20Bot.
@package archiver
@version 0.4

module Network.URL.Archiver

-- | Error-check the URL and then archive it using webciteArchive
--   and alexaArchive
checkArchive :: String -> String -> IO ()

-- | Request http://www.webcitation.org to copy a supplied URL;
--   WebCite does on-demand archiving, unlike Alexa/Internet Archive,
--   and so in practice this is the most useful function. This
--   function throws away any return status from WebCite (which may be
--   changed in the future), so it is suggested that one test with a
--   valid email address. This and alexaArchive ignore any attempt to
--   archive the archive's own existing pages, since that is useless.
--
--   Warning! WebCite has throttling mechanisms; if you request more
--   than 100 URLs per hour, your IP may be banned! It is suggested
--   that one sleep for ~30 seconds between each URL request.
webciteArchive :: String -> String -> IO ()

-- | Request http://www.alexa.com to spider a supplied URL. Alexa
--   supplies the Internet Archive's caches. TODO: currently broken;
--   perhaps Alexa changed its pages, or the service is down?
alexaArchive :: String -> IO ()
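
-- A minimal usage sketch (not part of the package): a driver that
-- archives each URL listed in a text file, matching the one-URL-per-line
-- interface described above. It assumes checkArchive takes the
-- notification email address first and the URL second; the file name
-- "urls.txt" and the email address are placeholders. It sleeps ~30
-- seconds between requests, per the WebCite throttling warning.

import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import Network.URL.Archiver (checkArchive)

main :: IO ()
main = do
  urls <- fmap lines (readFile "urls.txt")  -- one URL per line
  forM_ urls $ \url -> do
    checkArchive "you@example.com" url      -- error-check, then archive
    threadDelay (30 * 1000000)              -- ~30s pause (microseconds)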