fast-tagsoup: Fast parsing and extracting information from (possibly malformed) HTML/XML documents

[ bsd3, library, xml ]

Fast TagSoup parser. Speeds of 20-200MB/sec were observed.

Works only with strict bytestrings.

This library is intended to be used in conjunction with the original tagsoup package:

import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast
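
As a rough illustration of how the two packages fit together, the sketch below parses a strict ByteString with the fast parser and then filters the result with ordinary tagsoup helpers. It assumes that parseTags from Text.HTML.TagSoup.Fast has the type ByteString -> [Tag ByteString]; the links function and the sample input are hypothetical and exist only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.ByteString.Char8 as B
-- Helper functions and the Tag type come from the original tagsoup package.
import Text.HTML.TagSoup hiding (parseTags, renderTags)
-- The parser itself comes from fast-tagsoup.
import Text.HTML.TagSoup.Fast (parseTags)

-- | Hypothetical helper: collect the href attribute of every <a> tag.
-- Assumption: parseTags :: B.ByteString -> [Tag B.ByteString].
links :: B.ByteString -> [B.ByteString]
links html =
  [ fromAttrib "href" tag      -- safe here: only TagOpen values pass the filter
  | tag <- parseTags html
  , isTagOpenName "a" tag
  ]

main :: IO ()
main = mapM_ B.putStrLn (links "<p><a href=\"https://example.com\">x</a></p>")
```

Hiding parseTags and renderTags in the tagsoup import avoids a name clash, so the rest of the tagsoup API (isTagOpenName, fromAttrib and so on) can be used unchanged on the tags produced by the fast parser.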

Besides speed, fast-tagsoup correctly handles HTML <script> and <style> tags, converts tags to lower case, and can decode non-UTF-8 XML for you.

This parser is used in production in the BazQux Reader feed and comment crawler.

Versions 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.0.9, 1.0.10, 1.0.11, 1.0.12, 1.0.13, 1.0.14
Dependencies base (==4.*), bytestring, containers, tagsoup (>=0.13.10), text, text-icu
License BSD-3-Clause
Copyright Vladimir Shabanov 2011-2017
Author Vladimir Shabanov <vshabanoff@gmail.com>
Maintainer Vladimir Shabanov <vshabanoff@gmail.com>
Category XML
Home page https://github.com/vshabanov/fast-tagsoup
Source repo head: git clone https://github.com/vshabanov/fast-tagsoup
Uploaded by VladimirShabanov at Tue Jul 4 17:36:00 UTC 2017
Distributions NixOS:1.0.14
Downloads 4291 total (19 in the last 30 days)
Rating (no votes yet) [estimated by rule of succession]
Status Docs available [build log]
Last success reported on 2017-07-04 [all 1 reports]