Safe Haskell: None
- robotsAddHost :: CrawlerConfig a r -> AddRobotsAction
- robotsDontAddHost :: CrawlerConfig a r -> AddRobotsAction
- robotsDisallow :: Robots -> URI -> Bool
- getURIPart :: (URI -> String) -> URI -> String
- getHost :: URI -> URI
- isRobotsScheme :: URI -> Bool
- robotsGetSpec :: CrawlerConfig a r -> URI -> IO (URI, RobotRestriction)
- getRobotsTxt :: CrawlerConfig c r -> URI -> IO String
- evalRobotsTxt :: String -> String -> RobotRestriction
- enableRobotsTxt :: CrawlerConfig a r -> CrawlerConfig a r
- disableRobotsTxt :: CrawlerConfig a r -> CrawlerConfig a r
Documentation
robotsAddHost :: CrawlerConfig a r -> AddRobotsAction
Add a robots.txt description for a given URI, if it is not already there. This is the first of the two main functions of this module.
robotsDisallow :: Robots -> URI -> Bool
Check whether a robot is not allowed to access a page. This is the second of the two main functions of this module.
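These two functions are typically used together: the crawler first makes sure the robots.txt data for a URI's host is available, and then asks whether the URI may be fetched. The following self-contained sketch only mirrors that pattern with a plain Data.Map keyed by host name; the Robots and RobotRestriction types and the fetch action are simplified stand-ins, not the module's real definitions.

```haskell
import qualified Data.Map as M
import           Network.URI (URI, parseURI, uriAuthority, uriRegName, uriPath)
import           Data.List   (isPrefixOf)
import           Data.Maybe  (fromMaybe)

-- Simplified stand-ins for the module's Robots / RobotRestriction types.
type RobotRestriction = [String]                       -- disallowed path prefixes
type Robots           = M.Map String RobotRestriction  -- keyed by host name

host :: URI -> String
host = maybe "" uriRegName . uriAuthority

-- Mirrors robotsAddHost: fetch and insert a restriction only if the host is unknown.
addHost :: (String -> IO RobotRestriction) -> URI -> Robots -> IO Robots
addHost fetchSpec uri robots
  | h `M.member` robots = return robots                -- already there, nothing to do
  | otherwise           = do spec <- fetchSpec h
                             return (M.insert h spec robots)
  where h = host uri

-- Mirrors robotsDisallow: True if some disallowed prefix matches the URI's path.
disallow :: Robots -> URI -> Bool
disallow robots uri =
    any (`isPrefixOf` uriPath uri) (fromMaybe [] (M.lookup (host uri) robots))

main :: IO ()
main = do
    let Just uri = parseURI "http://example.org/private/page.html"
    robots <- addHost (\_ -> return ["/private"]) uri M.empty
    print (disallow robots uri)                        -- True
```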
isRobotsScheme :: URI -> Bool
robotsGetSpec :: CrawlerConfig a r -> URI -> IO (URI, RobotRestriction)
Access, parse and evaluate a robots.txt file for a given URI
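The caller only supplies the URI of some page on the host; the robots.txt location itself has to be derived from that URI (the getHost / getURIPart helpers in the synopsis presumably cover this step). A small, self-contained sketch of such a derivation, with robotsURI as a hypothetical helper rather than the module's own function:

```haskell
import Network.URI (URI, parseURI, uriScheme, uriAuthority, uriRegName)

-- Hypothetical helper: the robots.txt location for a page's host.
robotsURI :: URI -> Maybe URI
robotsURI uri = do
    auth <- uriAuthority uri
    parseURI (uriScheme uri ++ "//" ++ uriRegName auth ++ "/robots.txt")

main :: IO ()
main = print (robotsURI =<< parseURI "http://example.org/a/b.html")
-- Just http://example.org/robots.txt
```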
getRobotsTxt :: CrawlerConfig c r -> URI -> IO String
Try to get the robots.txt file for a given host. If it does not exist or any error occurs during access, the empty string is returned.
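The "empty string on failure" behaviour is what keeps a crawl alive when a host has no robots.txt or is temporarily unreachable. A minimal sketch of that error-handling pattern, assuming a hypothetical fetch action; this is not the module's actual HTTP code:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception (SomeException, try)

-- Hypothetical wrapper mirroring the documented behaviour: any exception
-- raised while fetching yields the empty string instead of failing the crawl.
fetchOrEmpty :: IO String -> IO String
fetchOrEmpty fetch = do
    r <- try fetch
    case r of
      Left (_ :: SomeException) -> return ""   -- host down, 404, timeout, ...
      Right txt                 -> return txt

main :: IO ()
main = do
    -- Simulate a failing fetch; the result is the empty string.
    txt <- fetchOrEmpty (ioError (userError "connection refused"))
    print (null txt)                           -- True
```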
evalRobotsTxt :: String -> String -> RobotRestriction
Parse the robots.txt, select the crawler-specific parts and build a robots restriction value.
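The exact parsing rules are not shown on this page, so the following is only a much-simplified, hypothetical evaluator in the same spirit: it walks the file line by line, tracks whether the current User-agent section applies to the given crawler name (or to "*"), and collects the Disallow paths of the applicable sections. The argument order and the shape of the real RobotRestriction type are assumptions.

```haskell
import Data.Char (toLower, isSpace)
import Data.List (isPrefixOf)

-- Simplified stand-in for RobotRestriction: just the disallowed path prefixes.
type Restriction = [String]

-- Hypothetical, much-simplified evaluation in the spirit of evalRobotsTxt.
evalTxt :: String -> String -> Restriction
evalTxt agent = go False . map (dropWhile isSpace) . lines
  where
    lc   = map toLower
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

    go _ [] = []
    go active (l : ls)
      | lc "user-agent:" `isPrefixOf` lc l =
          let ua = trim (drop (length "user-agent:") l)
          in  go (ua == "*" || lc ua == lc agent) ls
      | active && lc "disallow:" `isPrefixOf` lc l =
          let p = trim (drop (length "disallow:") l)
          in  (if null p then id else (p :)) (go active ls)
      | otherwise = go active ls

main :: IO ()
main = print (evalTxt "HolumbusBot"
                "User-agent: *\nDisallow: /tmp\n\nUser-agent: other\nDisallow: /")
-- ["/tmp"]
```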
enableRobotsTxt :: CrawlerConfig a r -> CrawlerConfig a r
Enable the evaluation of robots.txt
disableRobotsTxt :: CrawlerConfig a r -> CrawlerConfig a r
Disable the evaluation of robots.txt
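Since both functions map a CrawlerConfig to a CrawlerConfig, they compose with other configuration combinators, and whichever toggle is applied last wins. A self-contained sketch of that style, using a stand-in Config record with hypothetical fields instead of the real CrawlerConfig:

```haskell
-- Stand-in for the relevant slice of CrawlerConfig; the real record has
-- many more fields.
data Config = Config { evalRobots :: Bool, maxDocs :: Int } deriving Show

enableRobots, disableRobots :: Config -> Config
enableRobots  c = c { evalRobots = True  }
disableRobots c = c { evalRobots = False }

withMaxDocs :: Int -> Config -> Config
withMaxDocs n c = c { maxDocs = n }

main :: IO ()
main = print (disableRobots . enableRobots . withMaxDocs 1000 $ Config False 0)
-- Config {evalRobots = False, maxDocs = 1000}: the toggle applied last wins
```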