HandsomeSoup: Work with HTML more easily in HXT

[ bsd3, library, text ]

See examples and full readme on the Github page: https://github.com/egonSchiele/HandsomeSoup



Versions: 0.1, 0.2, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.4, 0.4.2
Dependencies: base (<5), containers, HTTP, hxt, MaybeT, mtl, network (<2.6), parsec, transformers
License: BSD-3-Clause
Author: Aditya Bhargava
Maintainer: bluemangroupie@gmail.com
Revision: 1, made by AdamBergmark at 2015-05-10T12:35:10Z
Category: Text
Home page: https://github.com/egonSchiele/HandsomeSoup
Uploaded: by AdityaBhargava at 2012-04-24T22:12:15Z
Distributions: LTSHaskell:0.4.2, NixOS:0.4.2, Stackage:0.4.2
Reverse dependencies: 5 direct, 1 indirect
Downloads: 13533 total (36 in the last 30 days)
Rating: (no votes yet)
Status: Docs uploaded by user
Build status: unknown

Readme for HandsomeSoup-0.1


HandsomeSoup

Current Status: very very pre-alpha. Usable but buggy.

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of HXT and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 parser for HXT (it is very close to this right now).

Example

Nokogiri, the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

import Text.XML.HXT.Core
import Text.HandsomeSoup

main :: IO ()
main = do
    let doc = fromUrl "http://www.google.com/search?q=egon+schiele"
    links <- runX $ doc >>> css "h3.r a" ! "href"
    mapM_ putStrLn links

What can HandsomeSoup do for you?

Easily parse an online page using fromUrl

let doc = fromUrl "http://example.com"

Or a local page using parseHtml

contents <- readFile "mypage.html"
let doc = parseHtml contents

Easily extract elements using css

Here are some valid selectors:

doc >>> css "a"
doc >>> css "*"
doc >>> css "a#link1"
doc >>> css "a.foo"
doc >>> css "p > a"
doc >>> css "#container h1"
doc >>> css "img[width]"
doc >>> css "img[width=400]"
doc >>> css "a[class~=bar]"
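Each selector is an ordinary HXT arrow, so it composes with the rest of HXT. A minimal sketch of running one of the selectors above against a local file (the file name `page.html` is a placeholder; `/>` and `getText` are standard HXT combinators for descending to children and extracting text):

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Print the text of every <h1> inside #container in a local file.
main :: IO ()
main = do
    contents <- readFile "page.html"  -- hypothetical local file
    let doc = parseHtml contents
    headings <- runX $ doc >>> css "#container h1" /> getText
    mapM_ putStrLn headings
```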

Easily get attributes using (!)

doc >>> css "img" ! "src"
doc >>> css "a" ! "href"
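Combined with `parseHtml`, this is enough to pull every link out of a document without touching the network. A self-contained sketch (the inline HTML is made up for illustration):

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

main :: IO ()
main = do
    let html = "<html><body>\
               \<a href='http://one.example'>one</a>\
               \<a href='http://two.example'>two</a>\
               \</body></html>"
        doc = parseHtml html
    -- Select every <a> element and read its href attribute.
    hrefs <- runX $ doc >>> css "a" ! "href"
    mapM_ putStrLn hrefs
```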