tagsoup: Parsing and extracting information from (possibly malformed) HTML documents

[ bsd3, library, xml ]

TagSoup is a library for extracting information out of unstructured HTML code, sometimes known as tag-soup. The HTML does not have to be well formed, or render properly within any particular framework. This library is for situations where the author of the HTML is not cooperating with the person trying to extract the information, but is also not trying to hide the information.

Versions 0.1, 0.4, 0.6, 0.8, 0.9, 0.10, 0.10.1, 0.11, 0.11.1, 0.12, 0.12.1, 0.12.2, 0.12.3, 0.12.4, 0.12.5, 0.12.6, 0.12.7, 0.12.8, 0.13, 0.13.1, 0.13.2, 0.13.3, 0.13.4, 0.13.5, 0.13.6, 0.13.7, 0.13.8, 0.13.9, 0.13.10, 0.14, 0.14.1, 0.14.2, 0.14.3, 0.14.4, 0.14.5, 0.14.6, 0.14.7, 0.14.8
Dependencies base (<4.8), mtl, network
License BSD-3-Clause
Copyright 2006-8, Neil Mitchell
Author Neil Mitchell
Maintainer ndmitchell@gmail.com
Revised Revision 1 made by AdamBergmark at 2015-04-02T15:43:03Z
Category XML
Home page http://www-users.cs.york.ac.uk/~ndm/tagsoup/
Uploaded by NeilMitchell at 2008-01-14T17:57:13Z
Distributions Arch:0.14.8, Debian:0.14.6, Fedora:0.14.8, FreeBSD:0.13.3, LTSHaskell:0.14.8, NixOS:0.14.8, Stackage:0.14.8, openSUSE:0.14.8
Executables tagsoup
Downloads 175762 total (570 in the last 30 days)
Rating 2.5 (votes: 3) [estimated by Bayesian average]
Docs uploaded by user
Build status unknown [no reports yet]

Downloads

Note: This package has metadata revisions in the cabal description newer than included in the tarball. To unpack the package including the revisions, use 'cabal get'.
