Holumbus-Searchengine-1.2.1: A search and indexing engine.

Safe Haskell: None

Holumbus.Crawler.Html

Documentation

getHtmlReferences :: ArrowXml a => a XmlTree URI

Collect all HTML references to other documents found within a, frame, and iframe elements.

toAbsRef :: URI -> URI -> URI

Construct an absolute URI from a base URI and a possibly relative URI.
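The resolution step described above can be illustrated with a minimal sketch. It is not the library's implementation: it assumes that URI is a plain String, uses Network.URI from the network-uri package instead of whatever Holumbus uses internally, and the name toAbsRefSketch and the fallback of returning the reference unchanged when parsing fails are hypothetical choices made for this example.

```haskell
import Data.Maybe (fromMaybe)
import Network.URI (parseURI, parseURIReference, relativeTo, uriToString)

-- Hypothetical stand-in for toAbsRef, assuming URIs are plain Strings.
-- Resolves the reference against the base following RFC 3986 rules.
-- Assumption: if either argument cannot be parsed, the reference is
-- returned unchanged.
toAbsRefSketch :: String -> String -> String
toAbsRefSketch base ref = fromMaybe ref $ do
  b <- parseURI base            -- the base must be an absolute URI
  r <- parseURIReference ref    -- the reference may be relative or absolute
  return (uriToString id (r `relativeTo` b) "")
```

Under these assumptions, toAbsRefSketch "http://example.org/docs/index.html" "../img/logo.png" yields "http://example.org/img/logo.png", while a reference that is already absolute is returned as it is.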

computeDocBase :: ArrowXml a => a XmlTree String

Compute the base URI of an HTML page, taking into account a base element in the page's head element if one is present.

Stolen from Uwe Schmidt, http://www.haskell.org/haskellwiki/HXT and then stolen back again by Uwe from Holumbus.Utility

normalizeWS :: String -> String

Normalize whitespace by splitting a text into words and joining them back together with unwords.
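The description above corresponds to a well-known Prelude idiom. The following sketch shows it under a hypothetical name, normalizeWSSketch; it follows the documented behaviour and is not claimed to be the library's source.

```haskell
-- Hypothetical sketch of the documented behaviour: 'words' splits on any
-- run of whitespace (spaces, tabs, newlines) and drops leading and trailing
-- whitespace; 'unwords' joins the words with single spaces.
normalizeWSSketch :: String -> String
normalizeWSSketch = unwords . words
```

For example, normalizeWSSketch "  Hello \n\t world  " evaluates to "Hello world".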

limitLength :: Int -> String -> String

Take the first n characters of a string. If the input is too long, the cut-off is indicated by "..." at the end.
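A minimal sketch of such a truncation follows. The description does not say whether the "..." counts towards the n characters; this sketch assumes it does, so the result is at most n characters long for n >= 3. The name limitLengthSketch is hypothetical, and the library's actual behaviour may differ in this detail.

```haskell
-- Hypothetical sketch, assuming the ellipsis is included in the limit.
-- Only the first n + 1 characters are inspected, so the function also
-- works on very long or infinite strings.
limitLengthSketch :: Int -> String -> String
limitLengthSketch n s
  | length prefix <= n = s                       -- short enough: unchanged
  | otherwise          = take (n - 3) s ++ "..." -- truncated and marked
  where
    prefix = take (n + 1) s
```

With this assumption, limitLengthSketch 10 "Hello, wonderful world" evaluates to "Hello, ...", and strings of at most 10 characters are returned unchanged.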