tagsoup: Parsing and extracting information from (possibly malformed) HTML/XML documents
TagSoup is a library for parsing HTML/XML. It supports the HTML 5 specification and can parse either well-formed XML or unstructured, malformed HTML from the web. The library also provides functions for extracting information from an HTML document, making it well suited to screen-scraping.
Users should start from the Text.HTML.TagSoup module.
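As a rough illustration of the screen-scraping use case described above, the following minimal sketch collects link targets from a deliberately malformed fragment. The names `sampleHtml` and `extractLinks` and the example URLs are hypothetical, introduced only for this illustration. The sketch assumes that `parseTags`, `isTagOpenName`, and `fromAttrib` are exported by Text.HTML.TagSoup with the behaviour noted in the comments; check the module documentation for the version you install.

```haskell
module LinkSketch where

import Text.HTML.TagSoup (fromAttrib, isTagOpenName, parseTags)

-- Hypothetical sample input. The <p> and <b> tags are never closed,
-- which a strict XML parser would reject.
sampleHtml :: String
sampleHtml =
  "<html><body><p>See <a href=\"http://example.com/a\">first</a>"
    ++ "<p><a href=\"/b\">second</a><b>bold</body>"

-- Collect the href attribute of every opening <a> tag.
-- parseTags yields a flat list of tags rather than a tree, so
-- unbalanced markup does not prevent extraction.
-- Assumption: fromAttrib returns the empty string when the
-- attribute is missing.
extractLinks :: String -> [String]
extractLinks html =
  [ fromAttrib "href" tag
  | tag <- parseTags html
  , isTagOpenName "a" tag
  ]
```

Loading this module in GHCi and evaluating `extractLinks sampleHtml` should return the two link targets in document order. Working on a flat tag list keeps the example short; code that needs nesting information would have to track open and close tags itself.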
| Versions | 0.1, 0.4, 0.6, 0.8 |
|---|---|
| Dependencies | base (>=4 && <5), bytestring, containers, deepseq (==1.1.0.0), HTTP, mtl, network, QuickCheck (>=2.1 && <2.2), time |
| License | BSD3 |
| Copyright | Neil Mitchell 2006-2010 |
| Author | Neil Mitchell <ndmitchell@gmail.com> |
| Maintainer | Neil Mitchell <ndmitchell@gmail.com> |
| Category | XML |
| Home page | http://community.haskell.org/~ndm/tagsoup/ |
| Executables | tagsoup |
| Upload date | Thu Jan 7 20:14:58 UTC 2010 |
| Uploaded by | NeilMitchell |
| Built on | ghc-6.10, ghc-6.12 |
| Distributions | Debian: 0.6, Arch: 0.8 |
Modules
- Text
Downloads
- tagsoup-0.8.tar.gz (Cabal source package)
- package description (included in the package)
