xmlhtml: XML parser and renderer with HTML 5 quirks mode


Contains renderers and parsers for both XML and HTML 5 document fragments, which share data structures so that it's easy to work with both. Document fragments are bits of documents, which are not constrained by some of the high-level structure rules (in particular, they may contain more than one root element).

Note that this is not a compliant HTML 5 parser. Rather, it is a parser for HTML 5 compliant documents. It does not implement the HTML 5 parsing algorithm, and should generally be expected to perform correctly only on documents that you trust to conform to HTML 5. This is not a suitable library for implementing web crawlers or other software that will be exposed to documents from outside sources. The result is also not the HTML 5 node structure, but rather something closer to the physical structure. For example, omitted start tags are not inserted (and so, their corresponding end tags must also be omitted).



Versions 0.1, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5
Change log CHANGELOG.md
Dependencies base (>=4.5 && <4.18), blaze-builder (>=0.2 && <0.5), blaze-html (>=0.9 && <0.10), blaze-markup (>=0.8 && <0.9), bytestring (>=0.9 && <0.12), bytestring-builder (>= && <0.11), containers (>=0.3 && <0.7), parsec (>=3.1.2 && <3.2), text (>=0.11 && <2.1), unordered-containers (>=0.1.4 && <0.3)
License BSD-3-Clause
Author Chris Smith <cdsmith@gmail.com>
Maintainer Chris Smith <cdsmith@gmail.com>
Category Text, XML
Home page https://github.com/snapframework/xmlhtml
Source repo head: git clone https://github.com/snapframework/xmlhtml.git
Uploaded by cydparser at 2022-11-14T22:47:13Z



Readme for xmlhtml-


xmlhtml - XML and HTML 5 parsing and rendering

This library implements both parsers and renderers for XML and HTML 5 document fragments. The two share data structures to represent the document tree, so that you can write code to easily work with either XML or HTML 5. Convenience functions are also available to work with the internal data structure in several natural ways.


To get started, just use the parseHTML or parseXML functions from Text.XmlHtml to parse a ByteString into a document tree. On the other side, use render to write the document tree back to a ByteString.
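As a rough illustration of that round trip, here is a minimal sketch. It assumes that parseHTML takes a source name (used only in error messages) followed by the input bytes, and that render produces a blaze-builder Builder which toByteString turns into a strict ByteString. The helper name roundTrip and the file name "example.html" are made up for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Blaze.ByteString.Builder as Builder
import           Data.ByteString          (ByteString)
import qualified Data.ByteString.Char8    as B8
import           Text.XmlHtml             (parseHTML, render)

-- | Parse an HTML 5 fragment and render it straight back to bytes.
-- 'roundTrip' is a hypothetical helper, not part of the library.
roundTrip :: ByteString -> Either String ByteString
roundTrip input = do
    -- The first argument only names the source for error messages.
    doc <- parseHTML "example.html" input
    -- Assumption: 'render' yields a Builder, converted here to a ByteString.
    return (Builder.toByteString (render doc))

main :: IO ()
main =
    case roundTrip "<p class=\"note\">Hello, <em>world</em>!</p>" of
        Left err  -> putStrLn ("parse error: " ++ err)
        Right out -> B8.putStrLn out
```

Swapping parseHTML for parseXML should give the XML behaviour with the same Document type, which is the point of the shared data structures.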

There are two main ways to work with document trees:

  1. Text.XmlHtml exports the document tree types (notably, Document and Node) and functions like getAttribute, setAttribute, tagName, childNodes, etc. for working with them.

  2. Text.XmlHtml.Cursor exports a zipper for node forests, which you can use to navigate and modify the document tree positionally.
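The two approaches can be sketched side by side. This is only an illustration on a small hand-built tree: the names menu, menuId, renamed, itemCount, and markSecond are invented for the example, and it assumes the Cursor module provides fromNode, firstChild, right, modifyNode, and topNode with their usual zipper meanings.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import           Data.Text           (Text)
import           Text.XmlHtml        (Node (..), childNodes, getAttribute,
                                      setAttribute, tagName)
import qualified Text.XmlHtml.Cursor as C

-- A hand-built tree: <ul id="menu"><li>one</li><li>two</li></ul>
menu :: Node
menu = Element "ul" [("id", "menu")]
    [ Element "li" [] [TextNode "one"]
    , Element "li" [] [TextNode "two"]
    ]

-- Way 1: plain functions on Node values.
menuId :: Maybe Text
menuId = getAttribute "id" menu

renamed :: Node
renamed = setAttribute "id" "nav" menu

itemCount :: Int
itemCount = length (childNodes menu)

-- Way 2: the zipper. Walk to the second child, change it in place,
-- then rebuild the whole tree from the cursor.
markSecond :: Node -> Maybe Node
markSecond n = do
    first  <- C.firstChild (C.fromNode n)
    second <- C.right first
    return (C.topNode (C.modifyNode (setAttribute "class" "active") second))

main :: IO ()
main = do
    print menuId
    print (tagName renamed, getAttribute "id" renamed)
    print itemCount
    print (markSecond menu)
```

The plain functions are convenient when the node to change is already in hand; the cursor is the better fit when an edit depends on position, since it keeps enough context to rebuild the surrounding tree.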

That's it, basically. This is hopefully a pretty simple package to use.

TO DO Items:

  1. Do something better with character encodings. For now, they are basically ignored, and we just use the byte order mark to distinguish between the three required encodings. We should implement the encoding sniffing rules for both XML (the declaration) and HTML 5.

  2. Benchmark and improve performance of the parsers and renderers.

  3. Ensure that rendering always gives an error rather than writing an invalid document. (Is this a good idea? It does limit rendering speed.)
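To make the first item concrete, byte-order-mark detection of the kind described there might look roughly like the sketch below. This is not the library's code: the SniffedEncoding type and the sniffBom function are hypothetical, and falling back to UTF-8 when no mark is present is an assumption of this example.

```haskell
module Main where

import           Data.ByteString (ByteString)
import qualified Data.ByteString as B

-- Hypothetical type for the three encodings a byte order mark can identify.
data SniffedEncoding = Utf8 | Utf16BE | Utf16LE
    deriving (Eq, Show)

-- | Guess the encoding from a leading byte order mark and strip the mark.
-- Assumption: input without a mark is treated as UTF-8.
sniffBom :: ByteString -> (SniffedEncoding, ByteString)
sniffBom bs
    | B.pack [0xEF, 0xBB, 0xBF] `B.isPrefixOf` bs = (Utf8,    B.drop 3 bs)
    | B.pack [0xFE, 0xFF]       `B.isPrefixOf` bs = (Utf16BE, B.drop 2 bs)
    | B.pack [0xFF, 0xFE]       `B.isPrefixOf` bs = (Utf16LE, B.drop 2 bs)
    | otherwise                                   = (Utf8,    bs)

main :: IO ()
main = do
    print (fst (sniffBom (B.pack [0xFE, 0xFF, 0x00, 0x3C])))
    print (fst (sniffBom (B.pack [0x3C, 0x70, 0x3E])))
```

A check like this never looks at the XML declaration or an HTML meta element, which is exactly the gap that the proper sniffing rules would close.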