The html-parse package

[Tags:benchmark, bsd3, library, test]

This package provides a fast and reasonably robust HTML5 tokenizer built upon the attoparsec library. The parsing strategy is based upon the HTML5 parsing specification with few deviations.

The package targets similar use-cases to the venerable tagsoup library, but is significantly more efficient, achieving parsing speeds of over 50 megabytes per second on modern hardware and typical web documents.

For instance,

>>> parseTokens "<div><h1 class=widget>Hello World</h1><br/>"
[TagOpen "div" [],TagOpen "h1" [Attr "class" "widget"],
ContentText "Hello World",TagClose "h1",TagSelfClose "br" []]
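
Because the package is a tokenizer rather than a tree builder, any structure has to be recovered from the flat token list by the caller. The following is a minimal sketch of how such a list might be consumed. To stay self-contained it does not import the package: the `Token` and `Attr` types below are simplified local stand-ins (an assumption for illustration) covering only the constructors visible in the output above, with `String` fields, and the helper names `textContent`, `attrValues` and `unclosedTags` are hypothetical.

```haskell
-- Simplified local stand-ins for the package's token types (an assumption
-- for illustration): only the constructors shown in the example output,
-- with String fields.
data Attr = Attr String String
  deriving (Show, Eq)

data Token
  = TagOpen String [Attr]
  | TagSelfClose String [Attr]
  | TagClose String
  | ContentText String
  deriving (Show, Eq)

-- The token list from the example above, written out by hand.
exampleTokens :: [Token]
exampleTokens =
  [ TagOpen "div" []
  , TagOpen "h1" [Attr "class" "widget"]
  , ContentText "Hello World"
  , TagClose "h1"
  , TagSelfClose "br" []
  ]

-- Concatenate all text content, ignoring tags.
textContent :: [Token] -> String
textContent ts = concat [t | ContentText t <- ts]

-- Collect the values of a named attribute from every opening or
-- self-closing tag, in document order.
attrValues :: String -> [Token] -> [String]
attrValues name ts = [v | t <- ts, Attr n v <- attrsOf t, n == name]
  where
    attrsOf (TagOpen _ attrs)      = attrs
    attrsOf (TagSelfClose _ attrs) = attrs
    attrsOf _                      = []

-- Names of tags that were opened but never closed, innermost first.
-- A closing tag is only honoured when it matches the innermost open tag;
-- anything else is ignored.
unclosedTags :: [Token] -> [String]
unclosedTags = foldl step []
  where
    step stack (TagOpen n _) = n : stack
    step (top : rest) (TagClose n) | n == top = rest
    step stack _ = stack
```

Loaded into GHCi, `textContent exampleTokens` gives `"Hello World"`, `attrValues "class" exampleTokens` gives `["widget"]`, and `unclosedTags exampleTokens` gives `["div"]`, reflecting that the input never closes its `div`. With the real library the same functions would pattern-match on the package's own types instead of these local copies.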


Dependencies attoparsec (==0.13.*), base (>=4.7 && <4.11), containers (==0.5.*), deepseq (==1.4.*), text (==1.2.*)
License BSD3
Copyright (c) 2016 Ben Gamari
Author Ben Gamari
Category Text
Home page
Source repository head: git clone git://
Uploaded Thu Aug 10 03:52:43 UTC 2017 by BenGamari
Distributions NixOS:
Downloads 166 total (45 in the last 30 days)
Status Docs uploaded by user
Build status unknown [no reports yet]



