| Safe Haskell | Safe |
|---|---|
| Language | Haskell2010 |
Text.HTML.Parser
Description
This is a performance-oriented HTML tokenizer aimed at web-crawling applications. It follows the HTML5 parsing specification quite closely, so it behaves reasonably well on ill-formed documents from the open Web.
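
As a rough illustration of the tokenizer in use, the sketch below collects the names of all opening tags in a document. `openTagNames` is a hypothetical helper, not part of this module; it relies only on `parseTokens` and the `TagOpen` constructor exported here, and assumes the `text` package and the `OverloadedStrings` extension are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.HTML.Parser (Token (..), parseTokens)

-- | Hypothetical helper: names of all opening tags, in document order.
-- Closing tags, self-closing tags, text, comments and doctypes are skipped
-- because the pattern only matches 'TagOpen'.
openTagNames :: Text -> [Text]
openTagNames html = [name | TagOpen name _ <- parseTokens html]
```

For example, `openTagNames "<p>Hello <b>world</b></p>"` should yield the names of the `p` and `b` tags.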
Synopsis
- parseTokens :: Text -> [Token]
- parseTokensLazy :: Text -> [Token]
- token :: Parser Token
- data Token
- type TagName = Text
- type AttrName = Text
- type AttrValue = Text
- data Attr = Attr !AttrName !AttrValue
- renderTokens :: [Token] -> Text
- renderToken :: Token -> Text
- renderAttrs :: [Attr] -> Text
- renderAttr :: Attr -> Text
- canonicalizeTokens :: [Token] -> [Token]
Parsing
Types
An HTML token
Constructors
| TagOpen !TagName [Attr] | An opening tag. Attribute ordering is arbitrary. |
| TagSelfClose !TagName [Attr] | A self-closing tag. |
| TagClose !TagName | A closing tag. |
| ContentText !Text | The content between tags. |
| ContentChar !Char | A single character of content. |
| Comment !Builder | Contents of a comment. |
| Doctype !Text | A doctype declaration. |
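
The constructors above can be consumed with ordinary pattern matching. The following is a minimal sketch that pulls `href` values out of opening `a` tags in a token stream; `linkTargets` is a hypothetical name introduced only for illustration, and the code assumes `OverloadedStrings` so that string literals can be used as `Text` patterns.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.HTML.Parser (Attr (..), Token (..))

-- | Hypothetical helper: the values of every @href@ attribute found on an
-- opening @a@ tag. All other tokens and attributes are ignored.
linkTargets :: [Token] -> [Text]
linkTargets tokens =
  [ value
  | TagOpen "a" attrs <- tokens   -- keep only opening <a> tags
  , Attr "href" value <- attrs    -- keep only their href attributes
  ]
```

Because attribute ordering is arbitrary, the sketch searches the whole attribute list rather than assuming `href` comes first.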
Rendering, text canonicalization
renderTokens :: [Token] -> Text
See renderToken.
renderAttrs :: [Attr] -> Text
See renderAttr.
renderAttr :: Attr -> Text
Does not escape quotation marks in attribute values!
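
Since quotation marks are not escaped, callers who render untrusted attribute values may want to escape them first. The sketch below shows one possible approach; `escapeAttr` is a hypothetical helper, not part of this module, and the choice of `&amp;` and `&quot;` as replacements is an assumption about what the consumer of the rendered output expects.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import Text.HTML.Parser (Attr (..))

-- | Hypothetical helper: escape ampersands and double quotes in an
-- attribute value so that it can be placed inside a double-quoted attribute.
-- Ampersands are replaced first so the entities introduced for the quotes
-- are not escaped a second time.
escapeAttr :: Attr -> Attr
escapeAttr (Attr name value) = Attr name (escapeQuotes (escapeAmps value))
  where
    escapeAmps = T.replace "&" "&amp;"
    escapeQuotes = T.replace "\"" "&quot;"
```

The result can then be passed to `renderAttr` as usual, for example `renderAttr (escapeAttr attr)`.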
canonicalizeTokens :: [Token] -> [Token]
Melds neighboring ContentChar and ContentText constructors together and drops empty text
elements.
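
A short sketch of why this is useful: after canonicalization, each run of adjacent text arrives as a single `ContentText`, so extracting text needs only one pattern. `textRuns` is a hypothetical helper introduced for illustration; it assumes the behaviour described above and nothing more.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.HTML.Parser (Token (..), canonicalizeTokens)

-- | Hypothetical helper: the non-empty runs of text in a token stream,
-- with adjacent 'ContentChar' and 'ContentText' tokens merged into one run.
-- Tags, comments and doctypes separate runs and are themselves discarded.
textRuns :: [Token] -> [Text]
textRuns tokens = [t | ContentText t <- canonicalizeTokens tokens]
```

Without the call to `canonicalizeTokens`, the same comprehension would miss any text carried by `ContentChar` tokens.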