# HandsomeSoup

Current Status: very very pre-alpha. Usable but buggy.

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of [HXT](http://www.fh-wedel.de/~si/HXmlToolbox/) and adds a few functions that make is easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 parser for HXT (it is very close to this right now).

## Example

[Nokogiri](http://nokogiri.org/), the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

    main = do
        doc <- fromUrl "http://www.google.com/search?q=egon+schiele"
        links <- runX $ doc >>> css "h3.r a" ! "href"
        mapM_ putStrLn links

## What can HandsomeSoup do for you?

### Easily parse an online page using `fromUrl`

    doc <- fromUrl "http://example.com"

### Or a local page using `parseHtml`

    contents <- readFile [filename]
    doc <- parseHtml contents

### Easily extract elements using `css`

Here are some valid selectors:

    doc <<< css "a"
    doc <<< css "*"
    doc <<< css "a#link1"
    doc <<< css "a.foo"
    doc <<< css "p > a"
    doc <<< css "#container h1"
    doc <<< css "img[width]"
    doc <<< css "img[width=400]"
    doc <<< css "a[class~=bar]"

### Easily get attributes using `(!)`

    doc <<< css "img" ! "src"
    doc <<< css "a" ! "href"