HandsomeSoup: Work with HTML more easily in HXT

[ bsd3, library, text ]

See examples and full readme on the Github page: https://github.com/egonSchiele/HandsomeSoup



Automatic Flags

Get Network.URI from the network-uri package


Build examples


Use -f <flag> to enable a flag, or -f -<flag> to disable that flag.


Versions: 0.1, 0.2, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.4, 0.4.2
Dependencies: base (>=4.6 && <5), containers, HandsomeSoup, HTTP, hxt, hxt-http, mtl, network, network-uri, parsec, transformers (>=0.3)
License: BSD-3-Clause
Author: Aditya Bhargava
Maintainer: bluemangroupie@gmail.com
Category: Text
Home page: https://github.com/egonSchiele/HandsomeSoup
Uploaded: by AdityaBhargava at 2015-06-09T20:16:37Z
Distributions: LTSHaskell:0.4.2, NixOS:0.4.2, Stackage:0.4.2
Reverse Dependencies: 5 direct, 1 indirect
Executables: handsomesoup
Downloads: 13559 total (36 in the last 30 days)
Status: Docs available
Last success reported on 2015-06-11

Readme for HandsomeSoup-0.4.2



Current Status: Usable and stable. Needs GHC 7.6. Please file bugs!

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of HXT and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 selector parser for HXT.


cabal install HandsomeSoup


Nokogiri, the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

import Text.XML.HXT.Core
import Text.HandsomeSoup

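main :: IO ()
main = do
    -- fromUrl builds an arrow that fetches and parses the page when run
    let doc = fromUrl "http://www.google.com/search?q=egon+schiele"
    -- select the result links and pull out each href attribute
    links <- runX $ doc >>> css "h3.r a" ! "href"
    mapM_ putStrLn links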

What can HandsomeSoup do for you?

Easily parse an online page using fromUrl

let doc = fromUrl "http://example.com"

Or a local page using parseHtml

contents <- readFile [filename]
let doc = parseHtml contents
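
As a self-contained sketch of the parseHtml path, the snippet below parses an inline string instead of reading a file. The sampleHtml value and the extractHrefs helper are hypothetical names introduced for illustration; parseHtml, css and (!) come from HandsomeSoup, and runX comes from HXT.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample markup, standing in for the contents of a local file.
sampleHtml :: String
sampleHtml =
    "<html><body><a href=\"/one\">One</a><a href=\"/two\">Two</a></body></html>"

-- Hypothetical helper: parse an HTML string and collect every link target.
extractHrefs :: String -> IO [String]
extractHrefs html = runX $ parseHtml html >>> css "a" ! "href"

main :: IO ()
main = extractHrefs sampleHtml >>= mapM_ putStrLn
```

Nothing is parsed until runX executes the arrow, so doc can be defined once and reused with several selectors.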

Easily extract elements using css

Here are some valid selectors:

doc >>> css "a"
doc >>> css "*"
doc >>> css "a#link1"
doc >>> css "a.foo"
doc >>> css "p > a"
doc >>> css "p strong"
doc >>> css "#container h1"
doc >>> css "img[width]"
doc >>> css "img[width=400]"
doc >>> css "a[class~=bar]"
doc >>> css "a:first-child"
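
To try a few of these selectors without a network connection, one option is to run them against a small inline document and count the matches. This is only a sketch: sampleHtml and countMatches are hypothetical names, and the markup is made up for the example.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample markup covering several of the selectors listed above.
sampleHtml :: String
sampleHtml = concat
    [ "<html><body><div id=\"container\"><h1>Title</h1>"
    , "<p><a id=\"link1\" class=\"foo\" href=\"/a\">A</a> <strong>bold</strong></p>"
    , "<img src=\"pic.png\" width=\"400\">"
    , "</div></body></html>"
    ]

-- Hypothetical helper: count the nodes a selector matches in sampleHtml.
countMatches :: String -> IO Int
countMatches sel = fmap length (runX (parseHtml sampleHtml >>> css sel))

main :: IO ()
main = mapM_ report ["a.foo", "p > a", "#container h1", "img[width=400]"]
  where
    -- Print each selector next to its match count.
    report sel = do
        n <- countMatches sel
        putStrLn (sel ++ ": " ++ show n)
```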

Easily get attributes using (!)

doc >>> css "img" ! "src"
doc >>> css "a" ! "href"
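
When an attribute is needed together with other data from the same element, the selected nodes can be passed on to ordinary HXT arrows. The sketch below pairs each link's href with its text using HXT's getAttrValue, deep getText and (&&&); sampleHtml and linkPairs are hypothetical names used only for this example.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample markup with two links.
sampleHtml :: String
sampleHtml =
    "<html><body><a href=\"/one\">One</a><a href=\"/two\">Two</a></body></html>"

-- Hypothetical helper: pair each link's href with its text content.
linkPairs :: String -> IO [(String, String)]
linkPairs html =
    runX $ parseHtml html >>> css "a" >>> (getAttrValue "href" &&& deep getText)

main :: IO ()
main = linkPairs sampleHtml >>= mapM_ print
```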


Find Haddock docs on Hackage.

I also wrote The Complete Guide To Parsing HXT With Haskell.


Made by Adit.