-- Hoogle documentation, generated by Haddock
-- See Hoogle, http://www.haskell.org/hoogle/


-- | Archive supplied URLs in WebCite & Internet Archive
--
--   archiver is a daemon which processes a specified text file, each
--   line of which is a URL, and requests one by one (in random order)
--   that the URLs be archived or spidered by
--   http://www.webcitation.org and http://www.archive.org for
--   future reference. (One may optionally specify an arbitrary sh
--   command, like wget --page-requisites, to download URLs locally.)
--
--   Because the interface is a simple text file, this can be combined
--   with other scripts; for example, a script using Sqlite to extract
--   visited URLs from Firefox, or a program extracting URLs from
--   Pandoc documents. (See http://www.gwern.net/Archiving%20URLs.)
--
--   For an explanation of the derivation of the code in
--   Network.URL.Archiver, see
--   http://www.gwern.net/haskell/Wikipedia%20Archive%20Bot.
@package archiver
@version 0.4

module Network.URL.Archiver

-- | Error-check the URL and then archive it using webciteArchive
--   and alexaArchive
checkArchive :: String -> String -> IO ()

-- | Request http://www.webcitation.org to copy a supplied URL;
--   WebCite does on-demand archiving, unlike Alexa/Internet Archive,
--   and so in practice this is the most useful function. This
--   function throws away any return status from WebCite (which may be
--   changed in the future), so it is suggested that one test with a
--   valid email address. This and alexaArchive ignore any attempt to
--   archive the archive's own existing pages, since that is useless.
--
--   Warning! WebCite has throttling mechanisms; if you request more
--   than 100 URLs per hour, your IP may be banned! It is suggested
--   that one sleep for ~30 seconds between each URL request.
webciteArchive :: String -> String -> IO ()

-- | Request http://www.alexa.com to spider a supplied URL. Alexa
--   supplies the Internet Archive's caches. TODO: currently broken;
--   perhaps Alexa changed its pages, or the service is down?
alexaArchive :: String -> IO ()
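
-- A minimal usage sketch (not part of the package): a driver that
-- archives each URL listed in a text file, matching the one-URL-per-line
-- interface described above. It assumes checkArchive takes the
-- notification email address first and the URL second; the file name
-- "urls.txt" and the email address are placeholders. It sleeps ~30
-- seconds between requests, per the WebCite throttling warning.

import Control.Concurrent (threadDelay)
import Control.Monad (forM_)
import Network.URL.Archiver (checkArchive)

main :: IO ()
main = do
  urls <- fmap lines (readFile "urls.txt")  -- one URL per line
  forM_ urls $ \url -> do
    checkArchive "you@example.com" url      -- error-check, then archive
    threadDelay (30 * 1000000)              -- ~30s pause (microseconds)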