Description
Version : $Id: ReadDocument.hs,v 1.10 2006/11/24 07:41:37 hxml Exp $
Compound arrows for reading an XML/HTML document or an XML/HTML string
Synopsis

Documentation

The main document input filter. This filter can be configured by an option list, a value of type Attributes.

All attributes not evaluated by readDocument are stored in the root node of the created document, so that other modules, e.g. the input/output modules, have easy access to the options.

If the document name is the empty string or a URI of the form "stdin:", the document is read from standard input.

Examples:

    readDocument [ ] "test.xml"

reads and validates the document "test.xml". No namespace propagation is done; only canonicalization is performed.

    readDocument [ (a_validate, "0")
                 , (a_encoding, isoLatin1)
                 , (a_parse_by_mimetype, "1")
                 ] "http://localhost/test.php"

reads the document "test.php" and parses it as HTML or XML depending on the mimetype given by the server, but without validation. The default encoding is isoLatin1.

    readDocument [ (a_parse_html, "1")
                 , (a_encoding, isoLatin1)
                 ] ""

reads an HTML document from standard input. No validation is done when parsing HTML, the default encoding is isoLatin1, and parsing is done with the tagsoup parser.

    readDocument [ (a_encoding, isoLatin1)
                 , (a_mime_type, "/etc/mime.types")
                 , (a_tagsoup, "1")
                 ] "test.svg"

reads the SVG document "test.svg" and sets the mime type by looking it up in the system mimetype config file. The default encoding is isoLatin1, and parsing is done with the lightweight tagsoup parser, which implies no validation.

    readDocument [ (a_parse_html, "1")
                 , (a_proxy, "www-cache:3128")
                 , (a_curl, "1")
                 , (a_issue_warnings, "0")
                 ] "http://www.haskell.org/"

reads the Haskell homepage with the HTML parser, ignoring any warnings, with HTTP access via the external program curl and the proxy "www-cache" at port 3128.

    readDocument [ (a_validate, "1")
                 , (a_check_namespace, "1")
                 , (a_remove_whitespace, "1")
                 , (a_trace, "2")
                 ] "http://www.w3c.org/"

reads the w3c home page (xhtml), validates it and checks namespaces, removes whitespace between tags, and traces activities with level 2.

For minimal complete examples see Text.XML.HXT.Arrow.WriteDocument.writeDocument and runX, the main starting point for running an XML arrow.
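
The examples above show only the readDocument call itself. A minimal sketch of a surrounding program might look as follows. It assumes the option-list API described on this page and the umbrella module Text.XML.HXT.Arrow of the HXT releases before 9.x; later releases import Text.XML.HXT.Core and use a different option type. The function name copyDocument and both file names are hypothetical.

```haskell
module CopyDocument where

-- Assumption: the umbrella module of HXT releases before 9.x.
import Text.XML.HXT.Arrow

-- Read a document without validation, write it back indented,
-- and return the highest error level reported by the arrow.
-- (copyDocument is a hypothetical helper, not part of HXT.)
copyDocument :: FilePath -> FilePath -> IO Int
copyDocument src dst = do
  rcs <- runX ( readDocument  [ (a_validate, "0") ] src
                >>> writeDocument [ (a_indent, "1") ] dst
                >>> getErrStatus )
  -- c_ok is the "no error" level; larger values mean warnings or errors
  return (maximum (c_ok : rcs))
```

A caller could compare the result against c_err to decide on the process exit code, for example after `copyDocument "test.xml" "out.xml"`.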

The arrow version of readDocument; the arrow input is the source URI.
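
Taking the URI from the arrow input is useful when the locations are themselves computed by an arrow. The sketch below assumes this function is HXT's readFromDocument (the signature did not survive on this page) and the pre-9.x module Text.XML.HXT.Arrow; rootNames is a hypothetical helper.

```haskell
module RootNames where

-- Assumption: the umbrella module of HXT releases before 9.x.
import Text.XML.HXT.Arrow

-- Feed a list of URIs into the arrow, read each one without
-- validation, and collect the names of the top-level elements.
-- (rootNames is a hypothetical helper, not part of HXT.)
rootNames :: [String] -> IO [String]
rootNames uris =
  runX ( constL uris                                  -- one arrow value per URI
         >>> readFromDocument [ (a_validate, "0") ]  -- the input is the source URI
         >>> getChildren >>> isElem
         >>> getName )
```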

Read a document that is stored in a normal Haskell String. This is the same function as readDocument, except that the string parameter is the document content itself rather than its location. All options available for readDocument are applicable for readString. Default encoding: no encoding conversion is done; the String argument is taken as a Unicode string.
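
As an illustration of parsing in-memory content, the following sketch extracts the text of every title element from a string. It assumes the pre-9.x module Text.XML.HXT.Arrow and the option-list API described on this page; titles is a hypothetical helper.

```haskell
module Titles where

-- Assumption: the umbrella module of HXT releases before 9.x.
import Text.XML.HXT.Arrow

-- Parse the given string as XML without validation and return
-- the text content of each top-most <title> element.
-- (titles is a hypothetical helper, not part of HXT.)
titles :: String -> IO [String]
titles src =
  runX ( readString [ (a_validate, "0") ] src
         >>> deep (isElem >>> hasName "title")
         >>> getChildren >>> getText )
```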

The arrow version of readString; the arrow input is the source string.
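
A sketch of how this arrow might sit in a pipeline, assuming the arrow input carries the document text itself, as readFromString does in the HXT releases this is based on. Here the text comes from an earlier IO step. The module Text.XML.HXT.Arrow is the pre-9.x umbrella module; links is a hypothetical helper.

```haskell
module Links where

-- Assumption: the umbrella module of HXT releases before 9.x.
import Text.XML.HXT.Arrow

-- Load a file inside the arrow, parse its content as HTML with
-- warnings suppressed, and collect the href values of all <a> elements.
-- (links is a hypothetical helper, not part of HXT.)
links :: FilePath -> IO [String]
links path =
  runX ( constA path
         >>> arrIO readFile                           -- produces the document text
         >>> readFromString [ (a_parse_html, "1")
                            , (a_issue_warnings, "0") ]
         >>> multi (isElem >>> hasName "a")
         >>> getAttrValue "href" )
```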

Parse a string as HTML content, substitute all HTML entity refs and canonicalize the tree (substitute char refs, ...). Errors are ignored. A simpler version of readFromString, but with less functionality. Does not run in the IO monad.
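
Because this arrow does not need IO, it can be run as a pure list arrow. The sketch below assumes the function is HXT's hread (the signature did not survive on this page) and the pre-9.x module Text.XML.HXT.Arrow; textOf is a hypothetical helper.

```haskell
module TextOf where

-- Assumption: the umbrella module of HXT releases before 9.x.
import Text.XML.HXT.Arrow

-- Parse an HTML fragment purely and return its text nodes in
-- document order. (textOf is a hypothetical helper, not part of HXT.)
textOf :: String -> [String]
textOf = runLA (hread >>> deep isText >>> getText)
```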

Parse a string as XML content, substitute all predefined XML entity refs and canonicalize the tree (substitute char refs, ...).
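
The XML counterpart can be used the same way. The sketch below assumes the function is HXT's xread (the signature did not survive on this page) and the pre-9.x module Text.XML.HXT.Arrow; attrValues is a hypothetical helper.

```haskell
module AttrValues where

-- Assumption: the umbrella module of HXT releases before 9.x.
import Text.XML.HXT.Arrow

-- Parse an XML fragment purely and return the value of the named
-- attribute for every element that carries it, in document order.
-- (attrValues is a hypothetical helper, not part of HXT.)
attrValues :: String -> String -> [String]
attrValues attr =
  runLA ( xread
          >>> multi (isElem >>> hasAttr attr)
          >>> getAttrValue attr )
```

multi is used rather than deep so that nested elements carrying the attribute are also visited.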
Produced by Haddock version 2.3.0