html-conduit: Parse HTML documents using xml-conduit datatypes.

[ conduit, library, mit, text, web ]

This package uses tagstream-conduit for its parser. It automatically balances mismatched tags, so there should not be any parse failures. It does not handle full HTML document rendering, such as adding missing html and head tags. Note that, since version 1.3.1, it uses an inlined copy of tagstream-conduit with entity decoding bugfixes applied.

Versions 0.0.0, 0.0.1, 0.1.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.1.0.4, 1.1.0, 1.1.0.1, 1.1.0.2, 1.1.0.3, 1.1.0.4, 1.1.0.5, 1.1.0.6, 1.1.1, 1.1.1.1, 1.1.1.2, 1.2.0, 1.2.1, 1.2.1.1, 1.2.1.2, 1.3.0, 1.3.1, 1.3.2
Change log ChangeLog.md
Dependencies attoparsec, base (==4.*), bytestring, conduit (>=1.3), conduit-extra, containers, resourcet (>=1.2), text, transformers, xml-conduit (>=1.3), xml-types (==0.3.*)
License MIT
Author Michael Snoyman
Maintainer michael@snoyman.com
Category Web, Text, Conduit
Home page https://github.com/snoyberg/xml
Source repo head: git clone git://github.com/snoyberg/xml.conduit
Uploaded by MichaelSnoyman at Sat Oct 20 17:07:31 UTC 2018
Distributions Arch:1.3.2, Debian:1.3.1, FreeBSD:1.2.0, LTSHaskell:1.3.2, NixOS:1.3.2, Stackage:1.3.2
Downloads 26214 total (129 in the last 30 days)
Status Docs available [build log]
Last success reported on 2018-10-20 [all 1 reports]

Readme for html-conduit-1.3.2

Simple usage example:

#!/usr/bin/env stack
{- stack --install-ghc --resolver lts-6.23 runghc
   --package http-conduit --package html-conduit
-}
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text.IO        as T
import           Network.HTTP.Simple (httpSink)
import           Text.HTML.DOM       (sinkDoc)
import           Text.XML.Cursor     (attributeIs, content, element,
                                      fromDocument, ($//), (&/), (&//))

main :: IO ()
main = do
    doc <- httpSink "http://www.yesodweb.com/book" $ const sinkDoc
    let cursor = fromDocument doc
    T.putStrLn "Chapters in the Yesod book:\n"
    mapM_ T.putStrLn
      $ cursor
      $// attributeIs "class" "main-listing"
      &// element "li"
      &/ element "a"
      &/ content
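
The tag balancing mentioned in the description can also be tried without a network request. The following is a minimal sketch, assuming the bytestring and text packages are available alongside html-conduit and xml-conduit. It feeds a deliberately malformed snippet to parseLBS from Text.HTML.DOM and then queries the result with the same cursor combinators as the example above. The helper name listItemTexts and the input string are made up for illustration and are not part of the package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.ByteString.Lazy as L
import           Data.Text            (Text)
import qualified Data.Text.IO         as T
import           Text.HTML.DOM        (parseLBS)
import           Text.XML.Cursor      (content, element, fromDocument,
                                       ($//), (&/))

-- | Hypothetical helper (not part of html-conduit): collect the text nodes
-- that sit directly inside every <li> element of an in-memory HTML snippet.
listItemTexts :: L.ByteString -> [Text]
listItemTexts html =
    fromDocument (parseLBS html) $// element "li" &/ content

main :: IO ()
main =
    -- Made-up input: the <b> tag and both <li> tags are never closed.
    -- parseLBS is a pure function, so malformed markup still yields a Document.
    mapM_ T.putStrLn (listItemTexts "<ul><li>first <b>bold<li>second</ul>")
```

Because parseLBS returns a Document directly rather than an Either or Maybe, there is no error case to handle here. How the unclosed tags end up nested in the resulting tree depends on the parser's balancing rules, so it is worth inspecting the Document before relying on a particular shape.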