tagsoup-0.14: Parsing and extracting information from (possibly malformed) HTML/XML documents

Safe Haskell: None



NOTE: This module is preliminary and may change at a future date.

This module is intended to help convert a list of tags into a tree of tags.



data TagTree str

A tree of Tag values.


TagBranch str [Attribute str] [TagTree str]

A 'TagOpen'/'TagClose' pair with the Tag values in between.

TagLeaf (Tag str)

Any leaf node


Functor TagTree
Eq str => Eq (TagTree str)
Ord str => Ord (TagTree str)
Show str => Show (TagTree str)

tagTree :: Eq str => [Tag str] -> [TagTree str]

Convert a list of tags into a tree. This version is not lazy at all; laziness is saved for version 2.
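
As a small illustration of tagTree, the sketch below feeds it a hand-written, well-formed tag list. The name exampleTree is made up for this example and is not part of the library.

```haskell
import Text.HTML.TagSoup (Tag(..))
import Text.HTML.TagSoup.Tree (TagTree(..), tagTree)

-- A matching TagOpen/TagClose pair becomes a TagBranch;
-- the text in between becomes a TagLeaf inside that branch.
exampleTree :: [TagTree String]
exampleTree = tagTree [TagOpen "p" [], TagText "hi", TagClose "p"]
-- exampleTree == [TagBranch "p" [] [TagLeaf (TagText "hi")]]
```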

parseTree :: StringLike str => str -> [TagTree str]

Build a TagTree from a string.

parseTreeOptions :: StringLike str => ParseOptions str -> str -> [TagTree str]

Build a TagTree from a string, specifying the ParseOptions.
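
A minimal sketch of both entry points, assuming String input. The names defaultTree and fastTree are illustrative only; parseOptionsFast is imported from Text.HTML.TagSoup.

```haskell
import Text.HTML.TagSoup (Tag(..), ParseOptions, parseOptionsFast)
import Text.HTML.TagSoup.Tree (TagTree(..), parseTree, parseTreeOptions)

-- Parse with the default options.
defaultTree :: [TagTree String]
defaultTree = parseTree "<p>hi</p>"
-- defaultTree == [TagBranch "p" [] [TagLeaf (TagText "hi")]]

-- Parse with explicitly chosen options (here the "fast" preset).
-- The annotation fixes the string type of the options to String.
fastTree :: [TagTree String]
fastTree = parseTreeOptions (parseOptionsFast :: ParseOptions String) "<p>hi</p>"
```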

data ParseOptions str

These options control how parseTags works. The ParseOptions type is usually generated by one of parseOptions, parseOptionsFast or parseOptionsEntities, then selected fields may be overridden.

The options optTagPosition and optTagWarning specify whether to generate TagPosition or TagWarning elements respectively. Usually these options should be set to False to simplify future stages, unless you rely on position information or want to give malformed HTML messages to the end user.

The options optEntityData and optEntityAttrib control how entities, for example &nbsp;, are handled. Both take a string and a boolean, where True indicates that the entity ended with a semi-colon ;. Inside normal text optEntityData will be called, and the results will be inserted in the tag stream. Inside a tag attribute optEntityAttrib will be called, and the first component of the result will be used in the attribute, and the second component will be appended after the TagOpen value (usually the second component is []). As an example, to not decode any entities, pass:

    parseOptions{optEntityData=\(str,b) -> [TagText $ "&" ++ str ++ [';' | b]]
                ,optEntityAttrib=\(str,b) -> ("&" ++ str ++ [';' | b], [])}
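
The record update above can be turned into a self-contained sketch as follows, assuming String input. The names keepEntities and rawText are invented for this example; parseOptions and parseTagsOptions come from Text.HTML.TagSoup.

```haskell
import Text.HTML.TagSoup (Tag(..), ParseOptions(..), parseOptions, parseTagsOptions)

-- Options that re-emit each entity verbatim instead of decoding it.
-- The annotation on parseOptions pins the string type to String.
keepEntities :: ParseOptions String
keepEntities = (parseOptions :: ParseOptions String)
    { optEntityData   = \(str, b) -> [TagText $ "&" ++ str ++ [';' | b]]
    , optEntityAttrib = \(str, b) -> ("&" ++ str ++ [';' | b], [])
    }

-- Concatenate all text in the input, with entities left undecoded.
rawText :: String -> String
rawText src = concat [s | TagText s <- parseTagsOptions keepEntities src]
-- rawText "a &amp; b" == "a &amp; b"
```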




optTagPosition :: Bool

Should TagPosition values be given before some items (default=False,fast=False).

optTagWarning :: Bool

Should TagWarning values be given (default=False,fast=False)

optEntityData :: (str, Bool) -> [Tag str]

How to look up an entity (Bool = has ending ';')

optEntityAttrib :: (str, Bool) -> (str, [Tag str])

How to look up an entity in an attribute (Bool = has ending ';')

optTagTextMerge :: Bool

Require no adjacent TagText values (default=True,fast=False)
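
To show how a single field might be overridden, the sketch below turns on optTagPosition and collects the resulting positions from the tree. The names withPositions and positionsWith are hypothetical helpers, and the exact positions reported are not asserted here.

```haskell
import Text.HTML.TagSoup (Tag(..), ParseOptions(..), parseOptions)
import Text.HTML.TagSoup.Tree (TagTree(..), parseTreeOptions, universeTree)

-- Default options with position reporting switched on.
withPositions :: ParseOptions String
withPositions = (parseOptions :: ParseOptions String) { optTagPosition = True }

-- Collect every (row, column) pair that appears as a TagPosition leaf
-- anywhere in the parsed tree.
positionsWith :: ParseOptions String -> String -> [(Int, Int)]
positionsWith opts src =
    [ (row, col)
    | TagLeaf (TagPosition row col) <- universeTree (parseTreeOptions opts src)
    ]
```

With the default options the list is empty, since optTagPosition defaults to False.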

flattenTree :: [TagTree str] -> [Tag str]

Flatten a TagTree back to a list of Tag.
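
For a well-formed tag list, flattening the tree built by tagTree gives back the original list. A small sketch of that round trip, with illustrative names wellFormed and roundTrips:

```haskell
import Text.HTML.TagSoup (Tag(..))
import Text.HTML.TagSoup.Tree (flattenTree, tagTree)

-- A properly nested tag list: <ul><li>one</li></ul>
wellFormed :: [Tag String]
wellFormed =
    [ TagOpen "ul" [], TagOpen "li" [], TagText "one", TagClose "li", TagClose "ul" ]

-- Building the tree and flattening it again restores the input.
roundTrips :: Bool
roundTrips = flattenTree (tagTree wellFormed) == wellFormed
```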

renderTree :: StringLike str => [TagTree str] -> str

Render a TagTree.

renderTreeOptions :: StringLike str => RenderOptions str -> [TagTree str] -> str

Render a TagTree with some RenderOptions.
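
A minimal sketch of rendering a hand-built tree with the default options. The name rendered is illustrative only.

```haskell
import Text.HTML.TagSoup (Tag(..))
import Text.HTML.TagSoup.Tree (TagTree(..), renderTree)

-- Text content is escaped on output, so '<' becomes "&lt;".
rendered :: String
rendered = renderTree [TagBranch "p" [("class", "note")] [TagLeaf (TagText "a < b")]]
-- rendered == "<p class=\"note\">a &lt; b</p>"
```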

data RenderOptions str

These options control how renderTags works.

The strange quirk of only minimizing <br> tags is due to Internet Explorer treating <br></br> as <br><br>.




optEscape :: str -> str

Escape a piece of text (default = escape the four characters &"<>)

optMinimize :: str -> Bool

Minimise <b></b> -> <b/> (default = minimise only <br> tags)

optRawTag :: str -> Bool

Should a tag be output with no escaping (default = true only for script)
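
As an illustration of overriding one of these fields, the sketch below minimises every empty element instead of only <br>. The names minimizeAll, emptyRule, defaultOutput and minimizedOutput are made up for the example; renderOptions is imported from Text.HTML.TagSoup.

```haskell
import Text.HTML.TagSoup (RenderOptions(..), renderOptions)
import Text.HTML.TagSoup.Tree (TagTree(..), renderTree, renderTreeOptions)

-- Default render options, except that every tag name may be minimised.
minimizeAll :: RenderOptions String
minimizeAll = (renderOptions :: RenderOptions String) { optMinimize = const True }

-- An <hr> element with no attributes and no children.
emptyRule :: [TagTree String]
emptyRule = [TagBranch "hr" [] []]

defaultOutput, minimizedOutput :: String
defaultOutput   = renderTree emptyRule                     -- "<hr></hr>"
minimizedOutput = renderTreeOptions minimizeAll emptyRule  -- "<hr />"
```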

transformTree :: (TagTree str -> [TagTree str]) -> [TagTree str] -> [TagTree str]

This operation is based on the Uniplate transform function. Given a list of trees, it applies the function to every tree in a bottom-up manner. This operation is useful for manipulating a tree - for example to make all tag names upper case:

-- Requires: import Data.Char (toUpper)
upperCase :: [TagTree String] -> [TagTree String]
upperCase = transformTree f
  where f (TagBranch name atts inner) = [TagBranch (map toUpper name) atts inner]
        f x = [x]
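
Because the function returns a list, it can also delete a node (return []) or replace it by several nodes. The sketch below, with the made-up name simplify, unwraps every "b" branch into its children and drops comments.

```haskell
import Text.HTML.TagSoup (Tag(..))
import Text.HTML.TagSoup.Tree (TagTree(..), transformTree)

-- Replace every <b> branch by its children and remove comments entirely.
simplify :: [TagTree String] -> [TagTree String]
simplify = transformTree f
  where
    f (TagBranch "b" _ inner)  = inner   -- splice the children in place
    f (TagLeaf (TagComment _)) = []      -- delete the node
    f x                        = [x]     -- keep everything else
```

For instance, a "p" branch containing a "b" branch around the text "x" and a comment should come back as a "p" branch containing only the text.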

universeTree :: [TagTree str] -> [TagTree str]

This operation is based on the Uniplate universe function. Given a list of trees, it returns those trees, and all the children trees at any level. For example:

universeTree
   [TagBranch "a" [("href","url")] [TagBranch "b" [] [TagLeaf (TagText "text")]]]
== [TagBranch "a" [("href","url")] [TagBranch "b" [] [TagLeaf (TagText "text")]]
   ,TagBranch "b" [] [TagLeaf (TagText "text")]
   ,TagLeaf (TagText "text")]

This operation is particularly useful for queries. To collect all "a" tags in a tree, simply do:

[x | x@(TagBranch "a" _ _) <- universeTree tree]
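
The same pattern extends to pulling out attribute values. A small sketch, assuming String trees, with the hypothetical helper name linkTargets:

```haskell
import Text.HTML.TagSoup.Tree (TagTree(..), universeTree)

-- Collect the href value of every "a" branch at any depth.
-- Branches without an href attribute contribute nothing.
linkTargets :: [TagTree String] -> [String]
linkTargets tree =
    [ url
    | TagBranch "a" atts _ <- universeTree tree
    , ("href", url) <- atts
    ]
```

Only TagBranch nodes are matched here, so an "a" tag that was never closed (and therefore stayed a TagLeaf) would not be picked up by this query.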