The HandsomeSoup package

[Tags: bsd3, library]

See examples and full readme on the Github page:



Versions: 0.1, 0.2, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.4, 0.4.2
Change log: None available
Dependencies: base (>=4.6 && <5), containers, HandsomeSoup, HTTP, hxt, hxt-http, mtl, network, network-uri, parsec, transformers (>=0.3)
Author: Aditya Bhargava
Uploaded: Tue Jun 9 20:16:37 UTC 2015 by AdityaBhargava
Distributions: LTSHaskell:0.4.2, NixOS:0.4.2, Stackage:0.4.2
Downloads: 3811 total (41 in last 30 days)
Status: Docs available. Last success reported on 2015-06-11.




Flags

network-uri: Get Network.URI from the network-uri package. Default: enabled. Type: automatic.
buildexamples: Build examples. Default: disabled. Type: automatic.

Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.



Readme for HandsomeSoup-0.4.2


Current Status: Usable and stable. Requires GHC 7.6 or later (base >= 4.6). Please file bugs!

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of HXT and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 selector parser for HXT.


Install it with cabal:

cabal install HandsomeSoup


Nokogiri, the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

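import Text.XML.HXT.Core
import Text.HandsomeSoup

main :: IO ()
main = do
    -- fromUrl builds an arrow that downloads and parses the page
    let doc = fromUrl ""
    -- select the result links and read the href attribute of each one
    links <- runX $ doc >>> css "h3.r a" ! "href"
    mapM_ putStrLn links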

What can HandsomeSoup do for you?

Easily parse an online page using fromUrl

let doc = fromUrl ""

Or a local page using parseHtml

contents <- readFile [filename]
let doc = parseHtml contents
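
As a rough sketch of how parseHtml, css and (!) fit together, the snippet below parses an in-memory string instead of a file and collects every link target. The names samplePage and linkTargets and the markup itself are made up for illustration; only parseHtml, css, (!) and HXT's runX come from the libraries.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample markup, standing in for the contents of a local file.
samplePage :: String
samplePage =
  "<html><body><p><a href=\"/one\">One</a> <a href=\"/two\">Two</a></p></body></html>"

-- Parse an HTML string and collect the href attribute of every <a> element.
linkTargets :: String -> IO [String]
linkTargets html = runX $ parseHtml html >>> css "a" ! "href"
```

Calling linkTargets samplePage in GHCi should list the two hrefs in document order. Nothing runs until runX is applied, since parseHtml only builds an arrow.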

Easily extract elements using css

Here are some valid selectors:

doc >>> css "a"
doc >>> css "*"
doc >>> css "a#link1"
doc >>> css ""
doc >>> css "p > a"
doc >>> css "p strong"
doc >>> css "#container h1"
doc >>> css "img[width]"
doc >>> css "img[width=400]"
doc >>> css "a[class~=bar]"
doc >>> css "a:first-child"
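
To make a few of these selectors concrete, here is a small sketch that runs a selector against a sample document and returns the text directly inside each match. The helper textOf and the markup in sampleDoc are hypothetical; getChildren and getText are standard HXT arrows.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample document used only for this illustration.
sampleDoc :: String
sampleDoc = concat
  [ "<html><body><div id=\"container\"><h1>Title</h1>"
  , "<p><a id=\"link1\" href=\"/one\">One</a> <strong>bold</strong></p>"
  , "</div></body></html>"
  ]

-- Run a CSS selector and return the text nodes that are direct children
-- of each matched element.
textOf :: String -> String -> IO [String]
textOf selector html =
  runX $ parseHtml html >>> css selector >>> getChildren >>> getText
```

For example, textOf "#container h1" sampleDoc should pick out the heading text, and textOf "p > a" sampleDoc the link text.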

Easily get attributes using (!)

doc >>> css "img" ! "src"
doc >>> css "a" ! "href"
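
One thing to watch for is elements that lack the attribute. The sketch below assumes that (!) behaves like HXT's getAttrValue, which yields an empty string for a missing attribute, and contrasts it with HXT's getAttrValue0, which produces no result in that case. The names sampleImages, imageSources and imageSourcesPresent are hypothetical.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical markup: the second image has no src attribute.
sampleImages :: String
sampleImages =
  "<html><body><img src=\"a.png\" width=\"400\"><img alt=\"no source\"></body></html>"

-- The src of every <img>, using (!) as in the examples above.
imageSources :: String -> IO [String]
imageSources html = runX $ parseHtml html >>> css "img" ! "src"

-- Variant built on HXT's getAttrValue0, which fails (and so yields nothing)
-- for elements where the attribute is absent.
imageSourcesPresent :: String -> IO [String]
imageSourcesPresent html =
  runX $ parseHtml html >>> css "img" >>> getAttrValue0 "src"
```

If empty strings in the result are a problem, the second form is one way to skip images without a src.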


Find Haddock docs on Hackage.

I also wrote The Complete Guide To Parsing HXT With Haskell.


Made by Adit.