# HandsomeSoup

Current Status: Usable and stable. **Needs GHC 7.6**. Please file bugs!

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of [HXT](http://www.fh-wedel.de/~si/HXmlToolbox/) and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 selector parser for HXT.

## Install

    cabal install HandsomeSoup

## Example

[Nokogiri](http://nokogiri.org/), the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

    import Text.XML.HXT.Core
    import Text.HandsomeSoup

    main = do
        let doc = fromUrl "http://www.google.com/search?q=egon+schiele"
        links <- runX $ doc >>> css "h3.r a" ! "href"
        mapM_ putStrLn links

## What can HandsomeSoup do for you?

### Easily parse an online page using `fromUrl`

    let doc = fromUrl "http://example.com"

### Or a local page using `parseHtml`

    contents <- readFile [filename]
    let doc = parseHtml contents

### Easily extract elements using `css`

Here are some valid selectors:

    doc >>> css "a"
    doc >>> css "*"
    doc >>> css "a#link1"
    doc >>> css "a.foo"
    doc >>> css "p > a"
    doc >>> css "p strong"
    doc >>> css "#container h1"
    doc >>> css "img[width]"
    doc >>> css "img[width=400]"
    doc >>> css "a[class~=bar]"
    doc >>> css "a:first-child"

### Easily get attributes using `(!)`

    doc >>> css "img" ! "src"
    doc >>> css "a" ! "href"

## Docs

Find [Haddock docs on Hackage](http://hackage.haskell.org/package/HandsomeSoup).

I also wrote [The Complete Guide To Parsing HXT With Haskell](http://adit.io/posts/2012-04-14-working_with_HTML_in_haskell.html).

## Credits

Made by [Adit](http://adit.io).
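
For reference, here is a minimal sketch that combines `parseHtml`, `css` and `(!)` in one program, so you can try the selectors without a network connection. The inline HTML string and the name `sampleHtml` are made up for illustration; everything else is a function already mentioned in this README. It assumes the HandsomeSoup and hxt packages are installed.

```haskell
import Text.XML.HXT.Core
import Text.HandsomeSoup

-- Hypothetical sample document, used only for this illustration.
sampleHtml :: String
sampleHtml =
    "<html><body>"
    ++ "<p><a class=\"foo\" href=\"/one\">One</a></p>"
    ++ "<div><a href=\"/two\">Two</a></div>"
    ++ "</body></html>"

main :: IO ()
main = do
    -- Build an HXT arrow from the in-memory string.
    let doc = parseHtml sampleHtml
    -- Select anchors that are direct children of a paragraph,
    -- then pull out their href attribute.
    links <- runX $ doc >>> css "p > a" ! "href"
    mapM_ putStrLn links
```

Swapping the selector for any of the ones listed under `css` should be a quick way to see what each one matches against a document you control.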