The fast-tagsoup-utf8-only package
Fast TagSoup parser. Speeds of 20-200MB/sec were observed.
Works only with strict bytestrings.
This library is intended to be used in conjunction with the original tagsoup package:
    import Text.HTML.TagSoup hiding (parseTags, renderTags)
    import Text.HTML.TagSoup.Fast.Utf8Only
Besides speed, fast-tagsoup correctly handles HTML <script> and <style> tags and converts tags to lower case. This fork purposely removes support for parsing non-UTF-8 documents to avoid a dependency on text-icu. If you need to handle other encodings, refer to the original package: http://hackage.haskell.org/package/fast-tagsoup
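The import pattern above can be sketched in a small program. This is only an illustration, not code from the package: it assumes that `parseTags` in this fork has the type `ByteString -> [Tag ByteString]` on strict bytestrings, as it does in the original fast-tagsoup, so check the module documentation before relying on it. The helper `linksIn` and the sample HTML are hypothetical names invented for the example; `isTagOpenName` and `fromAttrib` come from the regular tagsoup package.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
-- Tag type and helper functions come from the original tagsoup package;
-- only the parser and renderer are replaced by the fast versions.
import Text.HTML.TagSoup hiding (parseTags, renderTags)
import Text.HTML.TagSoup.Fast.Utf8Only

-- | Hypothetical helper: collect the non-empty href attribute of every
-- <a> open tag. It relies only on tagsoup's own functions, so it works
-- on any [Tag ByteString], whichever parser produced it.
linksIn :: [Tag B.ByteString] -> [B.ByteString]
linksIn tags =
  [ href
  | t <- tags
  , isTagOpenName "a" t
  , let href = fromAttrib "href" t   -- empty when the attribute is missing
  , not (B.null href)
  ]

main :: IO ()
main = do
  -- Sample input (made up for this sketch); it must be a strict,
  -- UTF-8 encoded ByteString.
  let html = "<html><body><a href=\"/feed.xml\">Feed</a><a>no link</a></body></html>"
  -- Assumption: parseTags :: ByteString -> [Tag ByteString]
  mapM_ BC.putStrLn (linksIn (parseTags html))
```

Because the result is an ordinary list of tagsoup `Tag` values, existing code written against tagsoup should need little more than the changed imports to switch parsers.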
This parser is used in production in the BazQux Reader feeds and comments crawler.
Properties
| Version | 1.0.4 |
|---|---|
| Dependencies | base (4.*), bytestring, tagsoup, text |
| License | BSD3 |
| Copyright | Vladimir Shabanov 2011-2012 |
| Author | Vladimir Shabanov <vshabanoff@gmail.com> |
| Maintainer | Vladimir Shabanov <vshabanoff@gmail.com> |
| Category | XML |
| Home page | https://github.com/vshabanov/fast-tagsoup |
| Source repository | git clone https://github.com/exbb2/fast-tagsoup |
| Upload date | Sat Feb 9 10:56:39 UTC 2013 |
| Uploaded by | MikhailKuddah |
| Built on | ghc-7.6 |
Modules
- Text
  - HTML
    - TagSoup
      - Fast
        - Text.HTML.TagSoup.Fast.Utf8Only
Downloads
- fast-tagsoup-utf8-only-1.0.4.tar.gz (Cabal source package)
- package description (included in the package)