fast-tagsoup: Fast parsing and extracting information from (possibly malformed) HTML/XML documents

Fast TagSoup parser. Parsing speeds of 20-200 MB/s have been observed.

Works only with strict bytestrings.

This library is intended to be used in conjunction with the original tagsoup package:

import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast
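
As a minimal sketch of how the two packages might be combined: the fast `parseTags` produces the tag list from a strict bytestring, and the usual tagsoup helpers (`isTagOpenName`, `fromAttrib`) inspect it. The `linkTargets` helper and the sample document are hypothetical and only serve the illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast (parseTags)

-- | Hypothetical helper: collect the href attribute of every <a> tag.
-- The input must be a strict ByteString. Tag names are matched in
-- lower case, since the parser converts tags to lower case.
linkTargets :: B.ByteString -> [B.ByteString]
linkTargets html =
  [ fromAttrib "href" tag      -- empty string if the attribute is absent
  | tag <- parseTags html
  , isTagOpenName "a" tag
  ]

main :: IO ()
main = do
  -- Made-up sample input for the sketch.
  let html = "<html><body><a href=\"https://example.com/\">Example</a></body></html>"
  mapM_ B.putStrLn (linkTargets html)
```

Only the parsing function is swapped out here; the rest of the code uses the original tagsoup API unchanged, which is why `parseTags` and `renderTags` are hidden from the `Text.HTML.TagSoup` import.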

Besides speed, fast-tagsoup correctly handles HTML <script> and <style> tags, converts tag names to lower case, and can decode non-UTF-8 XML for you.
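
The decoding step could look roughly like the sketch below. It assumes that the module exports `ensureUtf8Xml :: ByteString -> ByteString` and that it is meant to be applied to the raw document before parsing; please check the module documentation before relying on this. The `parseFeed` name is hypothetical.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import Text.HTML.TagSoup (Tag)
import Text.HTML.TagSoup.Fast (ensureUtf8Xml, parseTags)

-- | Hypothetical helper: re-encode an XML document to UTF-8 and then
-- parse it. Assumes ensureUtf8Xml takes and returns a strict ByteString.
parseFeed :: B.ByteString -> [Tag B.ByteString]
parseFeed = parseTags . ensureUtf8Xml

main :: IO ()
main = do
  -- Made-up sample input for the sketch.
  let xml = BC.pack "<?xml version=\"1.0\" encoding=\"utf-8\"?><feed><title>Example</title></feed>"
  print (length (parseFeed xml))
```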

This parser is used in production in the BazQux Reader feed and comment crawler.

Versions	1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.13, 1.0.14
Dependencies	base (>=4 && <5), bytestring, containers, tagsoup (>=0.13.10), text, text-icu
License	BSD-3-Clause
Copyright	Vladimir Shabanov 2011-2017
Author	Vladimir Shabanov <vshabanoff@gmail.com>
Maintainer	Vladimir Shabanov <vshabanoff@gmail.com>
Uploaded	by VladimirShabanov at 2017-07-04T17:36:00Z
Category	XML
Home page	https://github.com/vshabanov/fast-tagsoup
Source repo	head: git clone https://github.com/vshabanov/fast-tagsoup
Distributions	NixOS:1.0.14
Reverse Dependencies	3 direct, 0 indirect
Downloads	12049 total (39 in the last 30 days)
Status	Docs available. Last success reported on 2017-07-04