The HandsomeSoup package

[Tags: bsd3, library]

See examples and the full readme on the GitHub page.



Versions: 0.1, 0.2, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.4, 0.4.2
Change log: None available
Dependencies: base (<5), containers, HTTP, hxt, MaybeT, mtl, network (<2.6), parsec, transformers
Author: Aditya Bhargava
Home page:
Uploaded: Tue Apr 24 22:12:15 UTC 2012 by AdityaBhargava
Updated: Sun May 10 12:35:10 UTC 2015 by AdamBergmark to revision 1
Distributions: LTSHaskell:0.4.2, NixOS:0.4.2, Stackage:0.4.2
Downloads: 3810 total (53 in last 30 days)
Status: Docs uploaded by user
Build status: unknown (no reports yet)




Readme for HandsomeSoup-0.1


Current Status: very very pre-alpha. Usable but buggy.

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of HXT and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 parser for HXT (it is very close to this right now).


Nokogiri, the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

import Text.XML.HXT.Core
import Text.HandsomeSoup

main = do
    doc <- fromUrl ""
    links <- runX $ doc >>> css "h3.r a" ! "href"
    mapM_ putStrLn links

What can HandsomeSoup do for you?

Easily parse an online page using fromUrl

doc <- fromUrl ""

Or a local page using parseHtml

contents <- readFile [filename]
let doc = parseHtml contents
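
As a minimal sketch of parsing markup that is already in memory, the snippet below feeds a string to parseHtml and collects the text of each paragraph. It assumes parseHtml is a pure function that returns an HXT arrow (as in later HandsomeSoup releases), so the result is passed to runX rather than bound in IO. The names sampleHtml and paragraphs are hypothetical and exist only for illustration.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample page, not taken from the original readme.
sampleHtml :: String
sampleHtml =
  "<html><body><h1>Hello</h1><p>First</p><p>Second</p></body></html>"

-- Parse the string, select every <p> element, and return the text
-- nodes that are direct children of each match.
paragraphs :: String -> IO [String]
paragraphs html = runX $ parseHtml html >>> css "p" /> getText
```

In GHCi, `paragraphs sampleHtml` should return the text of the two paragraphs. The `/>` operator comes from HXT and steps from each matched element to its children before applying getText.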

Easily extract elements using css

Here are some valid selectors:

doc >>> css "a"
doc >>> css "*"
doc >>> css "a#link1"
doc >>> css ""
doc >>> css "p > a"
doc >>> css "#container h1"
doc >>> css "img[width]"
doc >>> css "img[width=400]"
doc >>> css "a[class~=bar]"
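
To try selectors like these without fetching a page, one option is a small helper that runs a selector against an HTML string and returns the text inside each match. This is only a sketch: selectorDemoHtml and selectText are hypothetical names, the sample markup is made up, and the library describes itself as pre-alpha, so individual selectors may behave differently across versions.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample document used only to exercise the selectors.
selectorDemoHtml :: String
selectorDemoHtml = concat
  [ "<html><body>"
  , "<div id=\"container\"><h1>Title</h1>"
  , "<p><a id=\"link1\" class=\"foo bar\" href=\"/one\">one</a></p>"
  , "</div>"
  , "<a href=\"/two\">two</a>"
  , "<img src=\"a.png\" width=\"400\">"
  , "</body></html>"
  ]

-- Run a CSS selector over an HTML string and collect every text node
-- found below each matched element.
selectText :: String -> String -> IO [String]
selectText selector html =
  runX $ parseHtml html >>> css selector //> getText
```

For example, `selectText "a#link1" selectorDemoHtml` should pick out only the link whose id is link1, while `selectText "#container h1" selectorDemoHtml` should find the heading nested inside the container div.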

Easily get attributes using (!)

doc >>> css "img" ! "src"
doc >>> css "a" ! "href"
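
The attribute lookups above can be wrapped in a reusable function. The sketch below assumes the same pure parseHtml as before; attrDemoHtml and attrValues are hypothetical names introduced for illustration, and the markup is invented.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample markup with two images and one link.
attrDemoHtml :: String
attrDemoHtml = concat
  [ "<html><body>"
  , "<img src=\"a.png\"><img src=\"b.png\">"
  , "<a href=\"/x\">x</a>"
  , "</body></html>"
  ]

-- Select elements with a CSS selector and return the value of the
-- named attribute for each match.
attrValues :: String -> String -> String -> IO [String]
attrValues selector attr html =
  runX $ parseHtml html >>> css selector ! attr
```

Calling `attrValues "img" "src" attrDemoHtml` should list the two image sources, and `attrValues "a" "href" attrDemoHtml` the single link target.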