tagsoup-0.6: Parsing and extracting information from (possibly malformed) HTML documents

Text.HTML.TagSoup.Type

Description

The central type in TagSoup

Data structures and parsing

data Tag

A single HTML element; a document is a list of them, [Tag]. There is no requirement for TagOpen and TagClose to match.

Constructors

TagOpen String [Attribute]

An open tag with Attributes in their original order.

TagClose String

A closing tag

TagText String

A text node, guaranteed not to be the empty string

TagComment String

A comment

TagWarning String

Meta: Marks a syntax error in the input file

TagPosition !Row !Column

Meta: The position of a parsed element
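
As an illustration of the constructors above, the following sketch builds a small document by hand and pattern matches on every constructor. It assumes tagsoup-0.6 is installed; exampleDoc and describe are hypothetical names introduced for this example and are not part of the library.

```haskell
import Text.HTML.TagSoup.Type

-- Hand-written stand-in for the tags a parser might produce for
--   <p class="intro">Hello<br></p>
-- The <br> has no TagClose: the type does not require tags to match.
exampleDoc :: [Tag]
exampleDoc =
  [ TagOpen "p" [("class", "intro")]
  , TagText "Hello"
  , TagOpen "br" []
  , TagClose "p"
  ]

-- Hypothetical helper: a one-line description of each kind of Tag.
describe :: Tag -> String
describe (TagOpen name attrs)  =
  "open <" ++ name ++ "> with " ++ show (length attrs) ++ " attribute(s)"
describe (TagClose name)       = "close </" ++ name ++ ">"
describe (TagText text)        = "text " ++ show text
describe (TagComment text)     = "comment " ++ show text
describe (TagWarning msg)      = "warning " ++ show msg
describe (TagPosition row col) = "position " ++ show row ++ ":" ++ show col
```

Loading this in GHCi and evaluating map describe exampleDoc gives one description per element, in document order.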

type Attribute = (String, String)

An HTML attribute: id="name" is represented as ("id","name")
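
Because the attributes of a TagOpen form an ordinary association list, the Prelude function lookup works on them directly. The sketch below assumes tagsoup-0.6 is installed; lookupAttrib is a hypothetical helper, not a library function. Unlike fromAttrib (documented under Extraction), it distinguishes a missing attribute from an empty one and does not crash on other constructors.

```haskell
import Text.HTML.TagSoup.Type

-- Hypothetical helper: look up an attribute by name.
-- Returns Nothing when the attribute is absent or the Tag is not a TagOpen.
lookupAttrib :: String -> Tag -> Maybe String
lookupAttrib name (TagOpen _ attrs) = lookup name attrs
lookupAttrib _    _                 = Nothing
```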

type Row = Int

Tag identification

isTagOpen :: Tag -> Bool

Test if a Tag is a TagOpen

isTagClose :: Tag -> Bool

Test if a Tag is a TagClose

isTagText :: Tag -> Bool

Test if a Tag is a TagText

isTagOpenName :: String -> Tag -> Bool

Returns True if the Tag is TagOpen and matches the given name

isTagCloseName :: String -> Tag -> Bool

Returns True if the Tag is TagClose and matches the given name
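
The identification predicates combine naturally with the standard list functions. The following is a minimal sketch, assuming tagsoup-0.6 is installed; sampleTags, anchors and titleTags are hypothetical names made up for this example, and the tag names are written in lower case on the assumption that they appear that way in the list.

```haskell
import Text.HTML.TagSoup.Type

-- Hand-written sample input (not produced by a parser).
sampleTags :: [Tag]
sampleTags =
  [ TagOpen "title" [], TagText "Home", TagClose "title"
  , TagOpen "a" [("href", "/a")], TagText "A", TagClose "a"
  , TagOpen "a" [("href", "/b")], TagClose "a"
  ]

-- Keep only the opening <a> tags.
anchors :: [Tag] -> [Tag]
anchors = filter (isTagOpenName "a")

-- Tags strictly between the first <title> and the next </title>.
-- Returns [] if there is no <title> at all.
titleTags :: [Tag] -> [Tag]
titleTags =
  takeWhile (not . isTagCloseName "title")
    . drop 1
    . dropWhile (not . isTagOpenName "title")
```

For sampleTags, anchors keeps the two opening <a> tags and titleTags keeps the single text node inside the title.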

Extraction

fromTagText :: Tag -> String

Extract the string from within a TagText; crashes if the Tag is not a TagText

fromAttrib :: String -> Tag -> String

Extract the value of the named attribute; crashes if the Tag is not a TagOpen. Returns "" if the attribute is not present.

maybeTagText :: Tag -> Maybe String

Extract the string from within TagText, otherwise Nothing

maybeTagWarning :: Tag -> Maybe String

Extract the string from within TagWarning, otherwise Nothing
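
Since fromTagText and fromAttrib crash on the wrong constructor, one way to use them safely is to guard with an identification predicate, or to prefer the Maybe-returning variants. The sketch below assumes tagsoup-0.6 is installed; mixedTags, hrefs, textNodes and warnings are hypothetical names introduced for this example.

```haskell
import Data.Maybe (mapMaybe)
import Text.HTML.TagSoup.Type

-- Hand-written sample input (not produced by a parser).
mixedTags :: [Tag]
mixedTags =
  [ TagOpen "a" [("href", "/a")], TagText "A", TagClose "a"
  , TagWarning "unexpected <"
  , TagOpen "a" [], TagText "B"
  ]

-- href of every <a>. The isTagOpenName guard ensures fromAttrib only
-- ever sees a TagOpen; an <a> without href contributes "".
hrefs :: [Tag] -> [String]
hrefs tags = [fromAttrib "href" t | t <- tags, isTagOpenName "a" t]

-- All text nodes, skipping every other kind of Tag.
textNodes :: [Tag] -> [String]
textNodes = mapMaybe maybeTagText

-- All warning messages, skipping every other kind of Tag.
warnings :: [Tag] -> [String]
warnings = mapMaybe maybeTagWarning
```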

innerText :: [Tag] -> String

Extract all text content from tags (similar to Verbatim found in HaXml)
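
A short sketch of innerText on a hand-built fragment, assuming tagsoup-0.6 is installed; paragraph and plain are hypothetical names used only for this example.

```haskell
import Text.HTML.TagSoup.Type

-- Hand-written stand-in for the tags of: <p>Hello <b>world</b></p>
paragraph :: [Tag]
paragraph =
  [ TagOpen "p" []
  , TagText "Hello "
  , TagOpen "b" []
  , TagText "world"
  , TagClose "b"
  , TagClose "p"
  ]

-- The text nodes concatenated in document order; the markup is dropped.
plain :: String
plain = innerText paragraph
```

Here plain is "Hello world": the two text nodes joined, with no separator inserted between them.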