Description

Compound arrows for reading an XML/HTML document or an XML/HTML string.

Version: $Id: ReadDocument.hs,v 1.10 20061124 07:41:37 hxml Exp $
Documentation
| |||||||||||
readDocument is the main document input filter. It is configured by an option list, a value of type Attributes.

All attributes not evaluated by readDocument are stored in the created document root node, so that other modules (e.g. the input/output modules) can easily access the various options.

If the document name is the empty string or a URI of the form "stdin:", the document is read from standard input.

Examples:

    readDocument [ ] "test.xml"

reads and validates the document "test.xml". No namespace propagation is done; only canonicalization is performed.

    readDocument [ (a_validate, "0")
                 , (a_encoding, isoLatin1)
                 ] "test.xml"

reads the document "test.xml" without validation; the default encoding is isoLatin1.

    readDocument [ (a_parse_html, "1")
                 , (a_encoding, isoLatin1)
                 ] ""

reads an HTML document from standard input. No validation is done when parsing HTML; the default encoding is isoLatin1.

    readDocument [ (a_parse_html, "1")
                 , (a_proxy, "www-cache:3128")
                 , (a_curl, "1")
                 , (a_issue_warnings, "0")
                 ] "http://www.haskell.org/"

reads the Haskell home page with the HTML parser, ignoring any warnings, with HTTP access via the external program curl and the proxy "www-cache" at port 3128.

    readDocument [ (a_validate, "1")
                 , (a_check_namespace, "1")
                 , (a_remove_whitespace, "1")
                 , (a_trace, "2")
                 ] "http://www.w3c.org/"

reads the W3C home page (XHTML), validates it and checks namespaces, removes whitespace between tags, and traces activities with level 2.

For minimal complete examples see Text.XML.HXT.Arrow.WriteDocument.writeDocument and runX, the main starting point for running an XML arrow.
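As a rough sketch of how one of these calls fits into a program, the following combines readDocument with writeDocument and runs the result with runX. It assumes the HXT release documented on this page, where the umbrella module Text.XML.HXT.Arrow exports these arrows and the option keys. The helper name copyXml and its arguments are made up for illustration. Later HXT releases configure these arrows through a different option type, so the code may not compile unchanged against them.

```haskell
import Text.XML.HXT.Arrow  -- assumption: umbrella module of the documented HXT release

-- | Hypothetical helper: read src without validation, write the tree to
--   dst, and return how many document trees passed through the arrow.
copyXml :: String -> String -> IO Int
copyXml src dst = do
  trees <- runX ( readDocument [ (a_validate, "0") ] src
                  >>>
                  writeDocument [] dst
                )
  return (length trees)
```

A call such as copyXml "test.xml" "copy.xml" would read one file and write another; both file names are placeholders.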
| |||||||||||
The arrow version of readDocument; the arrow input is the source URI.
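Because the source URI arrives as arrow input, several documents can be read by one arrow. The sketch below assumes this entry is HXT's readFromDocument, taking the same option list as readDocument, and that Text.XML.HXT.Arrow is the umbrella module of the documented release. The helper name rootNames is hypothetical.

```haskell
import Text.XML.HXT.Arrow  -- assumption: umbrella module of the documented HXT release

-- | Hypothetical helper: feed every URI in the list into the reader and
--   return the name of each document's top-level element.
rootNames :: [String] -> IO [String]
rootNames uris =
  runX ( constL uris                              -- one arrow result per URI
         >>>
         readFromDocument [ (a_validate, "0") ]   -- assumed name of this entry
         >>>
         getChildren >>> isElem >>> getName       -- children of the root node
       )
```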
| |||||||||||
readString reads a document that is stored in a normal Haskell String. It is the same function as readDocument, except that the String parameter itself forms the input. All options available for readDocument are applicable to readString.

Default encoding: no encoding conversion is done; the String argument is taken as a Unicode string.
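A minimal sketch of parsing in-memory text with readString follows. It assumes the HXT release documented here, with Text.XML.HXT.Arrow as umbrella module. Validation is switched off because the sample input carries no DTD. The helper name titles and the element name "title" are illustrative only.

```haskell
import Text.XML.HXT.Arrow  -- assumption: umbrella module of the documented HXT release

-- | Hypothetical helper: parse XML text held in memory and collect the
--   text content of every <title> element.
titles :: String -> IO [String]
titles xml =
  runX ( readString [ (a_validate, "0") ] xml
         >>>
         deep (hasName "title")
         >>>
         getChildren >>> getText
       )
```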
| |||||||||||
The arrow version of readString; the arrow input is the string to be parsed.
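The following sketch parses a batch of in-memory documents through one arrow. It assumes this entry is HXT's readFromString, taking the same option list as readString, and that Text.XML.HXT.Arrow is the umbrella module of the documented release. The helper name allText is hypothetical.

```haskell
import Text.XML.HXT.Arrow  -- assumption: umbrella module of the documented HXT release

-- | Hypothetical helper: parse every string as a separate document and
--   return the text nodes found, in document order.
allText :: [String] -> IO [String]
allText docs =
  runX ( constL docs                            -- each string becomes arrow input
         >>>
         readFromString [ (a_validate, "0") ]   -- assumed name of this entry
         >>>
         deep isText >>> getText
       )
```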
| |||||||||||
Parses a string as HTML content, substitutes all HTML entity references and canonicalizes the tree (substitutes character references, ...). Errors are ignored. This is a simpler version of readFromString with less functionality; it does not run in the IO monad.
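Since this arrow needs no IO, it can be run purely. The sketch below assumes the entry is HXT's hread and that runLA, the pure list-arrow runner, is exported by Text.XML.HXT.Arrow in the documented release. The helper name links is made up for illustration.

```haskell
import Text.XML.HXT.Arrow  -- assumption: umbrella module of the documented HXT release

-- | Hypothetical helper: collect the href values of all <a> elements in
--   an HTML fragment, without entering the IO monad.
links :: String -> [String]
links html =
  runLA ( hread                  -- assumed name of this entry
          >>>
          deep (hasName "a")
          >>>
          getAttrValue "href"    -- yields "" when the attribute is absent
        ) html
```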
| |||||||||||
Parses a string as XML content, substitutes all predefined XML entity references and canonicalizes the tree (substitutes character references, ...).
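A short sketch of parsing an XML fragment purely, assuming this entry is HXT's xread and that runLA is exported by Text.XML.HXT.Arrow in the documented release. The helper name topLevelNames is hypothetical.

```haskell
import Text.XML.HXT.Arrow  -- assumption: umbrella module of the documented HXT release

-- | Hypothetical helper: parse an XML fragment and return the names of
--   its top-level elements. Non-element results are filtered out.
topLevelNames :: String -> [String]
topLevelNames = runLA ( xread >>> isElem >>> getName )
```

The isElem filter keeps the example to element nodes only; text nodes and any error nodes produced for malformed input are dropped.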
Produced by Haddock version 2.3.0