tagsoup-0.13.9: Parsing and extracting information from (possibly malformed) HTML/XML documents

Safe Haskell: Safe
Language: Haskell98

Text.HTML.TagSoup.Entity

Description

This module converts between HTML/XML entities (e.g. &amp;) and the characters they represent.

Documentation

lookupEntity :: String -> Maybe String

Look up an entity, using lookupNumericEntity if it starts with # and lookupNamedEntity otherwise.
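
The dispatch rule described above can be sketched as follows. This is an illustration under stated assumptions, not tagsoup's actual source: the name dispatchEntity is hypothetical, and the two lookup functions are passed in as arguments so that the block needs no imports.

```haskell
-- Hypothetical sketch of the '#' dispatch rule; not tagsoup's own code.
dispatchEntity
  :: (String -> Maybe String)  -- handler for numeric entities ('#' already removed)
  -> (String -> Maybe String)  -- handler for named entities
  -> String                    -- entity text without the surrounding '&' and ';'
  -> Maybe String
dispatchEntity numeric _     ('#' : rest) = numeric rest  -- strip '#', go numeric
dispatchEntity _       named name         = named name    -- anything else is a name
```

With stub handlers, dispatchEntity (const (Just "numeric")) (const (Just "named")) "#65" evaluates to Just "numeric", while the same call with "amp" evaluates to Just "named".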

lookupNamedEntity :: String -> Maybe String

Look up a named entity in the htmlEntities table.

lookupNamedEntity "amp" == Just "&"
lookupNamedEntity "haskell" == Nothing
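
Since the entity tables are association lists of type [(String, String)], a named lookup can be approximated with the Prelude's lookup function. The sketch below assumes a hypothetical three-entry table, sampleEntities, standing in for htmlEntities; the real function may be implemented differently.

```haskell
-- Hypothetical stand-in for htmlEntities, reduced to three entries.
sampleEntities :: [(String, String)]
sampleEntities = [("amp", "&"), ("lt", "<"), ("gt", ">")]

-- Assumed behaviour: a plain association-list lookup by entity name.
namedEntitySketch :: String -> Maybe String
namedEntitySketch name = lookup name sampleEntities
```

Here namedEntitySketch "amp" evaluates to Just "&" and namedEntitySketch "haskell" evaluates to Nothing, mirroring the documented examples.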

lookupNumericEntity :: String -> Maybe String

Look up a numeric entity. The leading '#' must already have been removed.

lookupNumericEntity "65" == Just "A"
lookupNumericEntity "x41" == Just "A"
lookupNumericEntity "x4E" == Just "N"
lookupNumericEntity "x4e" == Just "N"
lookupNumericEntity "X4e" == Just "N"
lookupNumericEntity "Haskell" == Nothing
lookupNumericEntity "" == Nothing
lookupNumericEntity "89439085908539082" == Nothing
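
One way to obtain the behaviour shown in these examples is to parse the digits as decimal or, after a leading x or X, as hexadecimal, and to reject values beyond the largest Char. The following is a minimal sketch using only the base library; numericEntitySketch and decodeDigits are hypothetical names, and tagsoup's own implementation may differ.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readDec, readHex)

-- Hypothetical re-implementation of the documented behaviour.
numericEntitySketch :: String -> Maybe String
numericEntitySketch (x : digits)
  | x `elem` "xX" = decodeDigits isHexDigit readHex digits  -- hexadecimal form
numericEntitySketch digits = decodeDigits isDigit readDec digits  -- decimal form

-- Validate the digits, parse them as an Integer (so huge inputs cannot
-- overflow), and accept the result only if it is a valid Char code point.
decodeDigits :: (Char -> Bool) -> ReadS Integer -> String -> Maybe String
decodeDigits valid reader digits
  | null digits || not (all valid digits) = Nothing
  | otherwise =
      case reader digits of
        [(n, "")]
          | n <= toInteger (fromEnum (maxBound :: Char)) ->
              Just [chr (fromInteger n)]
        _ -> Nothing
```

Parsing into Integer rather than Int is a deliberate choice in this sketch: an input such as "89439085908539082" is compared against the upper bound directly instead of silently wrapping around.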

escapeXML :: String -> String

Escape an XML string.

escapeXML "hello world" == "hello world"
escapeXML "hello & world" == "hello &amp; world"
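
Escaping of this kind amounts to replacing each special character with its entity reference and leaving everything else untouched. The sketch below assumes that the escaped characters are &, <, > and the double quote; that set is an assumption made for illustration, and escapeXMLSketch is a hypothetical name rather than the library's function.

```haskell
-- Hypothetical sketch; the set of escaped characters is an assumption.
escapeXMLSketch :: String -> String
escapeXMLSketch = concatMap escapeChar
  where
    escapeChar '&' = "&amp;"
    escapeChar '<' = "&lt;"
    escapeChar '>' = "&gt;"
    escapeChar '"' = "&quot;"
    escapeChar c   = [c]  -- every other character passes through unchanged
```

For example, escapeXMLSketch "hello & world" evaluates to "hello &amp; world", and a string with no special characters is returned unchanged.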

xmlEntities :: [(String, String)]

A table mapping XML entity names to resolved strings. All strings are a single character long. The table does not include apos, because Internet Explorer does not recognize it.

htmlEntities :: [(String, String)]

A table mapping HTML entity names to resolved strings. Most resolved strings are a single character long, but some (e.g. "ngeqq") are two characters long. The list is taken from http://www.w3.org/TR/html5/syntax.html#named-character-references.