hexpat-tagsoup-0.1: Parse (possibly malformed) HTML to hexpat tree



An integration of the tagsoup and hexpat packages that parses HTML into a hexpat tree while tolerating errors in the input.

The real work is done by Neil Mitchell's tagsoup package.



parseTags :: (StringLike s, GenericXMLString text) => s -> UNode text

Parse the input using TagSoup, invoke canonicalizeTags to convert all tag and attribute names to lower case, automatically self-close tags such as img and input, then convert the result to a hexpat tree.
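The steps in this description can be modelled in plain Haskell. The sketch below is a minimal, dependency-free illustration under stated assumptions, not the package's implementation: Tag and Node are hypothetical simplified stand-ins for TagSoup's Tag and hexpat's UNode, the list of empty elements is an assumed subset, and tokenizing the raw HTML string is taken as already done. It returns a list of top-level nodes, whereas the real parseTags returns a single UNode; how the package picks or synthesizes that root is not described here.

```haskell
import Data.Char (toLower)

-- Hypothetical, simplified stand-ins for TagSoup's Tag and hexpat's UNode.
data Tag
  = TagOpen String [(String, String)]
  | TagClose String
  | TagText String
  deriving (Eq, Show)

data Node
  = Element String [(String, String)] [Node]
  | Text String
  deriving (Eq, Show)

-- Step 1: lower-case tag and attribute names (cf. canonicalizeTags).
canonicalize :: [Tag] -> [Tag]
canonicalize = map go
  where
    lower = map toLower
    go (TagOpen n attrs) = TagOpen (lower n) [(lower k, v) | (k, v) <- attrs]
    go (TagClose n)      = TagClose (lower n)
    go t                 = t

-- Step 2: close empty elements immediately. Assumed, illustrative subset.
emptyElements :: [String]
emptyElements = ["area", "base", "br", "col", "hr", "img", "input", "link", "meta"]

selfClose :: [Tag] -> [Tag]
selfClose = concatMap go
  where
    go t@(TagOpen n _) | n `elem` emptyElements = [t, TagClose n]
    go (TagClose n)    | n `elem` emptyElements = []  -- drop explicit </img> etc.
    go t = [t]

-- An open element on the parse stack: name, attributes, children (reversed).
type Frame = (String, [(String, String)], [Node])

-- Step 3: fold the flat tag stream into a tree, tolerating malformed input:
-- unmatched close tags are ignored and unclosed elements are closed at the end.
buildTree :: [Tag] -> [Node]
buildTree = finish . foldl step [("", [], [])]  -- bottom frame is a dummy root
  where
    step :: [Frame] -> Tag -> [Frame]
    step stack (TagText s)       = addChild (Text s) stack
    step stack (TagOpen n attrs) = (n, attrs, []) : stack
    step stack (TagClose n)
      | n `elem` [m | (m, _, _) <- init stack] = closeUntil n stack
      | otherwise                              = stack

    -- Pop frames until the one named n has been closed.
    closeUntil :: String -> [Frame] -> [Frame]
    closeUntil n (top@(m, _, _) : rest)
      | m == n    = closeFrame top rest
      | otherwise = closeUntil n (closeFrame top rest)
    closeUntil _ [] = []

    -- Turn a frame into an Element and append it to its parent frame.
    closeFrame :: Frame -> [Frame] -> [Frame]
    closeFrame (m, attrs, kids) rest = addChild (Element m attrs (reverse kids)) rest

    addChild :: Node -> [Frame] -> [Frame]
    addChild c ((m, attrs, kids) : rest) = (m, attrs, c : kids) : rest
    addChild _ []                        = []

    -- Close whatever is still open, then return the dummy root's children.
    finish :: [Frame] -> [Node]
    finish [(_, _, kids)] = reverse kids
    finish (top : rest)   = finish (closeFrame top rest)
    finish []             = []

-- The three steps, in the order described for parseTags.
toTree :: [Tag] -> [Node]
toTree = buildTree . selfClose . canonicalize
```

For example, toTree [TagOpen "P" [], TagText "hi", TagOpen "IMG" [("SRC", "a.png")], TagText "x"] gives [Element "p" [] [Text "hi", Element "img" [("src", "a.png")] [], Text "x"]]: names are lower-cased, img is closed at once, and the unclosed p is closed at end of input. A stack of open elements is used because it lets a stray close tag be dropped and lets a close tag for an outer element implicitly close any inner ones.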

parseTagsOptions :: (StringLike s, GenericXMLString text) => ParseOptions s -> s -> UNode text

Variant of parseTags that takes a ParseOptions value controlling how TagSoup parses the input.

data ParseOptions str

These options control how TagSoup parses the input.




optTagPosition :: Bool

Whether TagPosition values should be emitted before some items (default=False, fast=False).

optTagWarning :: Bool

Whether TagWarning values should be emitted (default=False, fast=False).

optEntityData :: (str, Bool) -> [Tag str]

How to look up an entity (the Bool indicates whether the entity was terminated by ';').

optEntityAttrib :: (str, Bool) -> (str, [Tag str])

How to look up an entity inside an attribute value (the Bool indicates whether the entity was terminated by ';').

optTagTextMerge :: Bool

Require that no two TagText values are adjacent, merging them where necessary (default=True, fast=False).
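One plausible way the entity-lookup fields and the text-merge flag fit together: expanding an entity inside running text yields extra TagText values, and merging joins them back into one. The sketch below is a minimal illustration under stated assumptions, not TagSoup's actual lookup rules: the Tag type (only TagText and TagWarning), the entity table, and the functions lookupEntity, lookupEntityAttrib and mergeText are all hypothetical, and everything is specialised to String rather than the generic str.

```haskell
-- Hypothetical, simplified stand-in for TagSoup's Tag type.
data Tag = TagText String | TagWarning String
  deriving (Eq, Show)

-- A tiny hypothetical entity table; a real one is far larger.
entities :: [(String, Char)]
entities = [("amp", '&'), ("lt", '<'), ("gt", '>'), ("quot", '"'), ("nbsp", '\160')]

-- Same shape as optEntityData: the Bool says whether a ';' ended the entity.
-- Known entities decode to text; unknown ones are kept literally plus a warning.
lookupEntity :: (String, Bool) -> [Tag]
lookupEntity (name, semi) =
  case lookup name entities of
    Just c  -> [TagText [c]]
    Nothing -> [ TagText ("&" ++ name ++ [';' | semi])
               , TagWarning ("Unknown entity: " ++ name) ]

-- Same shape as optEntityAttrib: decoded text plus any warning tags.
lookupEntityAttrib :: (String, Bool) -> (String, [Tag])
lookupEntityAttrib (name, semi) =
  case lookup name entities of
    Just c  -> ([c], [])
    Nothing -> ("&" ++ name ++ [';' | semi], [TagWarning ("Unknown entity: " ++ name)])

-- What optTagTextMerge = True guarantees: no two TagText values are adjacent.
mergeText :: [Tag] -> [Tag]
mergeText (TagText a : TagText b : rest) = mergeText (TagText (a ++ b) : rest)
mergeText (t : rest)                     = t : mergeText rest
mergeText []                             = []
```

For example, mergeText ([TagText "fish "] ++ lookupEntity ("amp", True) ++ [TagText " chips"]) gives [TagText "fish & chips"]; without the merge step the text would stay split into three pieces.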

parseOptions :: StringLike str => ParseOptions str

The default parse options value, described in ParseOptions.

parseOptionsFast :: StringLike str => ParseOptions str

A ParseOptions value optimised for speed, using the fast setting listed for each field (fast=...).
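The documented defaults can be summarised as a record. The sketch below uses a hypothetical Opts type holding only the three Bool fields whose default and fast values are listed above; the entity-lookup fields are left out because their values are not documented here. Since ParseOptions is a record type, a customised value would presumably be built with the same record update syntax, for example parseTagsOptions parseOptions { optTagTextMerge = False }, assuming the field selectors are exported as listed.

```haskell
-- Hypothetical model of the Bool fields of ParseOptions (not the real type).
data Opts = Opts
  { tagPosition  :: Bool
  , tagWarning   :: Bool
  , tagTextMerge :: Bool
  } deriving (Eq, Show)

-- Mirrors parseOptions: the (default=...) values from the field docs.
defaultOpts :: Opts
defaultOpts = Opts { tagPosition = False, tagWarning = False, tagTextMerge = True }

-- Mirrors parseOptionsFast: the (fast=...) values; only text merging differs.
fastOpts :: Opts
fastOpts = defaultOpts { tagTextMerge = False }

-- Record update changes one setting and keeps the rest of the defaults.
warningOpts :: Opts
warningOpts = defaultOpts { tagWarning = True }
```

Deriving the fast value from the default by record update makes the single documented difference (text merging switched off) explicit.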