html-parse: A high-performance HTML tokenizer

[ bsd3, library, text ]

This package provides a fast and reasonably robust HTML5 tokenizer built upon the attoparsec library. The parsing strategy is based upon the HTML5 parsing specification with few deviations.

The package targets use cases similar to those of the venerable tagsoup library, but is significantly more efficient, achieving parsing speeds of over 50 megabytes per second on modern hardware with typical web documents.
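The tokenizer described above is built on attoparsec and follows the HTML5 parsing specification. As a rough illustration of what any HTML tokenizer does, the following hypothetical sketch (plain `String`, `base` only) turns markup into a flat stream of open-tag, self-closing-tag, close-tag, text and comment tokens without building a tree. The `Token` type, `tokenize`, `parseAttrs`, `breakOn` and `tagName` are assumptions made up for this example; they are not the API or implementation of `Text.HTML.Parser`. The sketch ignores character references, DOCTYPEs, raw-text elements such as `<script>`, single-quoted attributes, `>` inside attribute values and the specification's error recovery.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- | Hypothetical token type for this sketch (not html-parse's own 'Token').
data Token
  = TagOpen String [(String, String)]      -- <name attr=value ...>
  | TagSelfClose String [(String, String)] -- <name ... />
  | TagClose String                        -- </name>
  | TextChunk String                       -- character data between tags
  | Comment String                         -- <!-- ... -->
  deriving (Show, Eq)

-- | Split a document into tokens in a single left-to-right pass.
tokenize :: String -> [Token]
tokenize [] = []
tokenize s
  | "<!--" `isPrefixOf` s =
      let (body, rest) = breakOn "-->" (drop 4 s)
      in Comment body : tokenize (drop 3 rest)
  | "</" `isPrefixOf` s =
      let (name, rest) = break (== '>') (drop 2 s)
      in TagClose (tagName name) : tokenize (drop 1 rest)
  | "<" `isPrefixOf` s =
      let (inner, rest) = break (== '>') (drop 1 s)
          selfClose = not (null inner) && last inner == '/'
          inner' = if selfClose then init inner else inner
          (name, attrText) = break isSpace inner'
          tok = if selfClose then TagSelfClose else TagOpen
      in tok (tagName name) (parseAttrs attrText) : tokenize (drop 1 rest)
  | otherwise =
      -- Everything up to the next '<' is character data.
      let (txt, rest) = break (== '<') s
      in TextChunk txt : tokenize rest

-- | Tag names are compared case-insensitively, so normalise to lower case.
tagName :: String -> String
tagName = map toLower . takeWhile (not . isSpace) . dropWhile isSpace

-- | Parse name=value, name="value" and bare-name attributes.
parseAttrs :: String -> [(String, String)]
parseAttrs s = case dropWhile isSpace s of
  [] -> []
  s' ->
    let (name, rest) = break (\c -> isSpace c || c == '=') s'
        key = map toLower name
    in case rest of
         '=' : '"' : r ->
           let (v, r') = break (== '"') r in (key, v) : parseAttrs (drop 1 r')
         '=' : r ->
           let (v, r') = break isSpace r in (key, v) : parseAttrs r'
         _ -> (key, "") : parseAttrs rest

-- | Split at the first occurrence of a pattern; the pattern stays in the
-- second component, which is empty if the pattern never occurs.
breakOn :: String -> String -> (String, String)
breakOn pat = go []
  where
    go acc [] = (reverse acc, [])
    go acc r@(c : cs)
      | pat `isPrefixOf` r = (reverse acc, r)
      | otherwise = go (c : acc) cs

main :: IO ()
main = mapM_ print (tokenize "<div><h1 class=widget>Hello World</h1><br/><!-- done --></div>")
-- TagOpen "div" []
-- TagOpen "h1" [("class","widget")]
-- TextChunk "Hello World"
-- TagClose "h1"
-- TagSelfClose "br" []
-- Comment " done "
-- TagClose "div"
```

Like tagsoup's tag lists, the output here is flat: matching open and close tags into a tree is left to the consumer.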

Modules


Text
- HTML
  - Text.HTML.Parser

Downloads

html-parse-0.2.0.0.tar.gz (Cabal source package)
Package description (as included in the package)

Maintainer's Corner

Package maintainers

BenGamari

Candidates

0.2.0.0, 0.2.0.1, 0.2.0.2, 0.2.1.0, 0.2.1.1, 0.2.2.0

Versions	0.1.0.0, 0.2.0.0, 0.2.0.1, 0.2.0.2, 0.2.1.0, 0.2.2.0
Dependencies	attoparsec (>=0.13 && <0.14), base (>=4.8 && <4.10), deepseq (>=1.4 && <1.5), text (>=1.2 && <1.3)
License	BSD-3-Clause
Copyright	(c) 2016 Ben Gamari
Author	Ben Gamari
Maintainer	ben@smart-cactus.org
Uploaded	by BenGamari at 2016-11-23T17:02:09Z
Category	Text
Home page	http://github.com/bgamari/html-parse
Source repo	head: git clone git://github.com/bgamari/html-parse
Distributions	Arch:0.2.2.0, LTSHaskell:0.2.2.0, NixOS:0.2.2.0, Stackage:0.2.2.0
Reverse Dependencies	3 direct, 0 indirect
Downloads	4079 total (11 in the last 30 days)
Rating	(no votes yet)
Status	Docs uploaded by user; all reported builds failed as of 2025-04-18 (2 reports)