Text.XML.HXT.Parser.HtmlParser

hxt-filter-8.3.0: A collection of tools for processing XML with Haskell (Filter variant).

Portability	portable
Stability	stable
Maintainer	Uwe Schmidt (uwe@fh-wedel.de)

Description

This parser tries to interpret everything as HTML; no errors are emitted during parsing. If something looks weird, warning messages are inserted into the document tree.

The module contains state filters for easy parsing and error handling; the real work is done in Text.XML.HXT.Parser.HtmlParsec.

Synopsis

getHtmlDoc :: XmlStateFilter state

parseHtmlDoc :: XmlStateFilter a

runHtmlParser :: XmlStateFilter a

substHtmlEntities :: XmlTree -> XmlTrees

module Text.XML.HXT.Parser.HtmlParsec

Documentation

getHtmlDoc :: XmlStateFilter state

Reads a document and parses it with parseHtmlDoc. This is the main entry point of this module.

The input tree must be a root tree, as in Text.XML.HXT.Parser.MainFunctions.getXmlDoc. The content is read with getXmlContents, parsed with parseHtmlDoc, and canonicalized (character references are substituted in content and attributes, but comments are preserved).

See also: Text.XML.HXT.Parser.DTDProcessing.getWellformedDoc
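
The three stages described for getHtmlDoc (read, parse, canonicalize) can be pictured as a chain of list-returning monadic filters. The following is only a toy sketch under stated assumptions: Filter, andThen, readContents, parseStage, canonicalize and getDoc are hypothetical names invented for illustration, the document is modelled as a plain String, and none of this is the HXT implementation or its XmlStateFilter type. The canonicalize stage here only substitutes decimal character references; comments and attributes are not modelled.

```haskell
import Data.Char (chr, isDigit)
import Data.Functor.Identity (Identity, runIdentity)

-- A monadic filter maps one value to a list of results.
-- Toy stand-in only; this is NOT the XmlStateFilter type from HXT.
type Filter m a = a -> m [a]

-- Sequential composition: feed every result of the first filter to the second.
andThen :: Monad m => Filter m a -> Filter m a -> Filter m a
andThen f g x = do
  ys <- f x
  concat <$> mapM g ys

-- Hypothetical stage 1: "read" a document from an in-memory table.
-- An unknown name yields no result, so later stages are skipped.
readContents :: [(String, String)] -> Filter Identity String
readContents table name = pure (maybe [] (: []) (lookup name table))

-- Hypothetical stage 2: a placeholder for parsing; it only drops '\r'.
parseStage :: Filter Identity String
parseStage = pure . (: []) . filter (/= '\r')

-- Hypothetical stage 3: substitute decimal character references such as &#65;.
-- Malformed or out-of-range references are left untouched.
canonicalize :: Filter Identity String
canonicalize = pure . (: []) . go
  where
    go ('&' : '#' : rest)
      | (ds@(_ : _), ';' : rest') <- span isDigit rest
      , let n = read ds :: Integer
      , n <= 0x10FFFF
      = chr (fromInteger n) : go rest'
    go (c : rest) = c : go rest
    go []         = []

-- The whole pipeline: read, then parse, then canonicalize.
getDoc :: [(String, String)] -> Filter Identity String
getDoc table = readContents table `andThen` parseStage `andThen` canonicalize
```

In GHCi, runIdentity (getDoc table "a.html") returns a one-element list for a name present in table and an empty list otherwise, which mirrors how a failing filter stage stops the rest of the chain.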

parseHtmlDoc :: XmlStateFilter a

The HTML parsing filter

The input is parsed with runHtmlParser and everything is interpreted as HTML. If errors occur, the parser tries to take some meaningful action and continues parsing. Afterwards the entity references defined for XHTML are resolved; any unresolved reference is transformed into plain text.

Error messages during parsing and entity resolving are added as warning nodes into the resulting tree.

The warnings are issued if the first parameter, noWarnings, is set to True; afterwards all warning nodes are removed from the resulting tree.
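
The entity handling described above (known XHTML entities are resolved, unresolved references become plain text, and problems are reported as warnings) can be sketched in a few lines. This is a minimal illustration, not the library's code: substEntities and xhtmlEntities are hypothetical names, the input is a plain String instead of an XmlTree, only five of the XHTML entities are listed, and warnings are returned as a list of strings instead of being inserted as nodes in the tree.

```haskell
import Data.Char (isAlphaNum)

-- A small subset of the entities defined for XHTML (name, replacement).
xhtmlEntities :: [(String, String)]
xhtmlEntities =
  [ ("amp", "&"), ("lt", "<"), ("gt", ">"), ("quot", "\""), ("nbsp", "\160") ]

-- Resolve named entity references in a string (hypothetical helper).
-- Known references are replaced by their text. Unknown references are kept
-- verbatim as plain text and reported in the list of warnings.
substEntities :: String -> (String, [String])
substEntities [] = ([], [])
substEntities ('&' : rest)
  | (name@(_ : _), ';' : rest') <- span isAlphaNum rest =
      let (out, warns) = substEntities rest'
      in case lookup name xhtmlEntities of
           Just repl -> (repl ++ out, warns)
           Nothing   -> ( "&" ++ name ++ ";" ++ out
                        , ("unknown entity reference: " ++ name) : warns )
-- A lone '&' that does not start a reference falls through to this clause.
substEntities (c : rest) =
  let (out, warns) = substEntities rest
  in (c : out, warns)
```

Returning the warnings next to the text keeps the sketch pure; a caller can print them or discard them, which loosely corresponds to issuing the warnings and then removing them from the result.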

runHtmlParser :: XmlStateFilter a

The pure HTML parser, usually called via parseHtmlDoc.

substHtmlEntities :: XmlTree -> XmlTrees

module Text.XML.HXT.Parser.HtmlParsec

Produced by Haddock version 2.4.2