html-parse: A high-performance HTML tokenizer
This package provides a fast and reasonably robust HTML5 tokenizer built
upon the attoparsec library. The parsing strategy is based upon the HTML5
parsing specification with few deviations.
The package targets similar use-cases to the venerable tagsoup library,
but is significantly more efficient, achieving parsing speeds of over 50
megabytes per second on modern hardware with typical web documents.
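As a rough illustration of the tagsoup-style use case described above, the following minimal sketch tokenizes a small document and lists the names of its opening tags. It assumes the package exposes a `Text.HTML.Parser` module with a `parseTokens :: Text -> [Token]` function and a `TagOpen` constructor carrying the tag name and its attributes; check these names against the documentation for the version you install. The helper `openTagNames` is hypothetical and exists only for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import           Data.Text (Text)
import qualified Data.Text.IO as T
-- Assumption: module, function, and constructor names as published by html-parse.
import           Text.HTML.Parser (Token (..), parseTokens)

-- | Hypothetical helper: collect the names of all opening tags in document
-- order. Tokens that are not 'TagOpen' (text, closing tags, comments,
-- self-closing tags, ...) are skipped by the failing pattern match.
openTagNames :: [Token] -> [Text]
openTagNames tokens = [name | TagOpen name _ <- tokens]

main :: IO ()
main = do
  let doc = "<p>Hello, <b>world</b>!</p>" :: Text
  -- Tokenize the whole document, then print one tag name per line.
  mapM_ T.putStrLn (openTagNames (parseTokens doc))
```

Because the tokenizer yields a flat list of tokens rather than a tree, simple extraction tasks like this one reduce to ordinary list processing.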
Downloads
- html-parse-0.2.0.0.tar.gz [browse] (Cabal source package)
- Package description (as included in the package)
| Property | Value |
|---|---|
| Versions | 0.1.0.0, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.2.1.0, 0.2.2.0 |
| Dependencies | attoparsec (>=0.13 && <0.14), base (>=4.8 && <4.10), deepseq (>=1.4 && <1.5), text (>=1.2 && <1.3) |
| License | BSD-3-Clause |
| Copyright | (c) 2016 Ben Gamari |
| Author | Ben Gamari |
| Maintainer | ben@smart-cactus.org |
| Category | Text |
| Home page | http://github.com/bgamari/html-parse |
| Source repo | head: git clone git://github.com/bgamari/html-parse |
| Uploaded | by BenGamari at 2016-11-23T17:02:09Z |
| Distributions | Arch:0.2.2.0, LTSHaskell:0.2.2.0, NixOS:0.2.2.0, Stackage:0.2.2.0 |
| Reverse Dependencies | 3 direct, 0 indirect |
| Downloads | 4051 total (5 in the last 30 days) |
| Rating | (no votes yet) [estimated by Bayesian average] |
| Status | Docs uploaded by user. All reported builds failed as of 2025-04-18 (2 reports). |