| Portability | portable | 
|---|---|
| Stability | stable | 
| Maintainer | Uwe Schmidt (uwe@fh-wedel.de) | 
Text.XML.HXT.Arrow.ReadDocument
Description
Compound arrows for reading an XML/HTML document or an XML/HTML string
- readDocument :: SysConfigList -> String -> IOStateArrow s b XmlTree
- readFromDocument :: SysConfigList -> IOStateArrow s String XmlTree
- readString :: SysConfigList -> String -> IOStateArrow s b XmlTree
- readFromString :: SysConfigList -> IOStateArrow s String XmlTree
- hread :: ArrowXml a => a String XmlTree
- xread :: ArrowXml a => a String XmlTree
 
Documentation
readDocument :: SysConfigList -> String -> IOStateArrow s b XmlTree
The main document input filter.
This filter can be configured by a list of configuration options, i.e. a list of values of type Text.XML.HXT.Arrow.XmlState.TypeDefs.SysConfig.
For all available options, see module Text.XML.HXT.Arrow.XmlState.SystemConfig.
- withValidate yes/no: switch DTD validation on/off. This applies only to documents parsed as XML, not to HTML parsing.
- withParseHTML yes/no: switch HTML parsing on/off.
- withParseByMimeType yes/no: select the XML or HTML parser by the document's mime type. text/xml and text/xhtml are parsed as XML, text/html as HTML.
- withCheckNamespaces yes/no: switch namespace propagation and checking on/off.
- withInputEncoding encoding-spec: set the default encoding.
- withTagSoup: use the lightweight, lazy parser based on the tagsoup lib. This is only available when package hxt-tagsoup is installed and Text.XML.HXT.TagSoup is imported.
- withRelaxNG schema.rng: validate the document with Relax NG; the parameter is the schema URI. This implies using the XML parser, no validation against a DTD, and canonicalisation.
- withCurl [curl-option...]: use the libCurl binding for HTTP access. This is only available when package hxt-curl is installed and Text.XML.HXT.Curl is imported.
- withHTTP [http-option...]: use the Haskell HTTP package for HTTP access. This is only available when package hxt-http is installed and Text.XML.HXT.HTTP is imported.
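Because a SysConfigList is an ordinary Haskell list of SysConfig values, an option set can be defined once and extended with ++. The following is a minimal sketch, assuming hxt 9.x and its umbrella module Text.XML.HXT.Core. The names baseConfig, compactConfig and countChildren are hypothetical. It uses readString, documented further down, so that no input file is needed. The program counts the children of the document element with and without whitespace removal.

```haskell
import Text.XML.HXT.Core

-- Options shared by every read in this (hypothetical) program.
baseConfig :: SysConfigList
baseConfig =
  [ withValidate        no   -- no DTD validation
  , withCheckNamespaces yes  -- propagate and check namespaces
  ]

-- The same options, plus removal of whitespace-only text nodes.
compactConfig :: SysConfigList
compactConfig = baseConfig ++ [ withRemoveWS yes ]

-- Count all child nodes (elements and text) of the document element.
countChildren :: SysConfigList -> String -> IO [Int]
countChildren cfg src =
  runX ( readString cfg src
         >>> getChildren           -- the document element <a>
         >>> listA getChildren     -- collect all of its children in one list
         >>> arr length )

main :: IO ()
main = do
  let doc = "<a> <b/> <c/> </a>"
  countChildren baseConfig    doc >>= print  -- whitespace text nodes are kept
  countChildren compactConfig doc >>= print  -- whitespace text nodes are dropped
```

Keeping the shared options in one value avoids repeating them at every readDocument or readString call.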
Examples:
 readDocument [] "test.xml"
Reads and validates the document "test.xml". No namespace propagation is done; only canonicalization is performed.
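The first example can be turned into a complete program, sketched below under the assumption of hxt 9.x and Text.XML.HXT.Core. According to the description above, readDocument [] validates against the DTD. For that reason, the sample document includes a small internal DTD. The program first writes "test.xml" into the current directory, overwriting any existing file with that name. The names sampleXml and greetingText are hypothetical.

```haskell
import Text.XML.HXT.Core

-- A small document with an internal DTD, so that the default
-- DTD validation of readDocument [] has something to check against.
sampleXml :: String
sampleXml = unlines
  [ "<?xml version=\"1.0\"?>"
  , "<!DOCTYPE greeting ["
  , "  <!ELEMENT greeting (#PCDATA)>"
  , "]>"
  , "<greeting>hello</greeting>"
  ]

-- Parse (and validate) the file, then return the text inside <greeting>.
greetingText :: FilePath -> IO [String]
greetingText path =
  runX ( readDocument [] path
         >>> deep (isElem >>> hasName "greeting")
         >>> getChildren
         >>> getText )

main :: IO ()
main = do
  writeFile "test.xml" sampleXml  -- creates test.xml in the current directory
  greetingText "test.xml" >>= print
```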
 ...
 import Text.XML.HXT.Curl
 ...
 readDocument [ withValidate        no
              , withInputEncoding   isoLatin1
              , withParseByMimeType yes
              , withCurl []
              ] "http://localhost/test.php"
Reads the document "test.php" and parses it as HTML or XML, depending on the mime type sent by the server. No validation is done, and the default encoding is isoLatin1.
HTTP access is done via libCurl.
 readDocument [ withParseHTML       yes
              , withInputEncoding   isoLatin1
              ] ""
Reads an HTML document from standard input. No validation is done when parsing HTML, and the default encoding is isoLatin1.
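The HTML parser also accepts markup that is not well-formed XML. For example, the XML parser would reject an unclosed br tag. The sketch below assumes hxt 9.x and Text.XML.HXT.Core. It parses a made-up string with readString instead of standard input, so it runs without piped input. The name brNames is hypothetical.

```haskell
import Text.XML.HXT.Core

-- Names of all <br> elements found by the HTML parser.
brNames :: String -> IO [String]
brNames html =
  runX ( readString [ withParseHTML yes   -- HTML parser instead of the XML parser
                    , withWarnings  no    -- suppress warnings about sloppy markup
                    ] html
         >>> deep (isElem >>> hasName "br")
         >>> getName )

main :: IO ()
main = brNames "<p>first line<br>second line</p>" >>= print
```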
 readDocument [ withInputEncoding  isoLatin1
              , withValidate       no
              , withMimeTypeFile   "/etc/mime.types"
              , withStrictInput    yes
              ] "test.svg"
Reads an SVG document from "test.svg" and sets the mime type by looking it up in the system mime type config file. The default encoding is isoLatin1, and the input is read strictly.
 ...
 import Text.XML.HXT.Curl
 import Text.XML.HXT.TagSoup
 ...
 readDocument [ withParseHTML      yes
              , withTagSoup
              , withProxy          "www-cache:3128"
              , withCurl           []
              , withWarnings       no
              ] "http://www.haskell.org/"
Reads the Haskell homepage with the HTML parser, ignoring any warnings (at the time of writing, the page contained some HTML errors). HTTP access goes through the libCurl interface, using proxy "www-cache" at port 3128, and parsing is done with the tagsoup HTML parser. This requires the packages "hxt-curl" and "hxt-tagsoup" to be installed.
 readDocument [ withValidate          yes
              , withCheckNamespaces   yes
              , withRemoveWS          yes
              , withTrace             2
              , withHTTP              []
              ] "http://www.w3c.org/"
Reads the W3C home page (XHTML), validates it, checks namespaces, removes whitespace between tags, and traces activities at level 2. HTTP access is done with the Haskell HTTP package, which requires package hxt-http and an import of Text.XML.HXT.HTTP.
For minimal complete examples, see Text.XML.HXT.Arrow.WriteDocument.writeDocument and runX, the main starting point for running an XML arrow.
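The following is a minimal sketch of such a complete program, assuming hxt 9.x and Text.XML.HXT.Core. It reads a document, writes it back out with indentation, and uses the highest error level that HXT recorded as the exit status. The file names "in.xml" and "out.xml" and the function name copyXml are hypothetical.

```haskell
import System.Exit (ExitCode (..), exitWith)
import Text.XML.HXT.Core

-- Read src, write it pretty-printed to dst, and return HXT's error status.
copyXml :: FilePath -> FilePath -> IO Int
copyXml src dst = do
  [rc] <- runX ( readDocument  [withValidate no] src   -- parse without DTD validation
                 >>> writeDocument [withIndent yes] dst  -- serialize with indentation
                 >>> getErrStatus )                      -- highest error level seen
  return rc

main :: IO ()
main = do
  writeFile "in.xml" "<doc><p>hello</p></doc>"
  rc <- copyXml "in.xml" "out.xml"
  exitWith (if rc >= c_err then ExitFailure 1 else ExitSuccess)
```

runX returns one error status per result of the arrow. Here there is exactly one document, hence the single-element pattern.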
readFromDocument :: SysConfigList -> IOStateArrow s String XmlTree
The arrow version of readDocument. The arrow input is the source URI.
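Because the URI arrives as arrow input, one arrow can read several documents in turn. The sketch below assumes hxt 9.x and Text.XML.HXT.Core. It feeds a list of URIs in with constL and returns the name of each document element. The files "one.xml" and "two.xml" and the function name rootNames are hypothetical.

```haskell
import Text.XML.HXT.Core

-- Read every document named in the input list and return the
-- name of each document element, in input order.
rootNames :: [String] -> IO [String]
rootNames uris =
  runX ( constL uris                         -- one arrow input per URI
         >>> readFromDocument [withValidate no]
         >>> getChildren >>> isElem          -- the document element
         >>> getName )

main :: IO ()
main = do
  writeFile "one.xml" "<alpha/>"
  writeFile "two.xml" "<beta/>"
  rootNames ["one.xml", "two.xml"] >>= print
```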
readString :: SysConfigList -> String -> IOStateArrow s b XmlTree
Read a document that is stored in a normal Haskell String.
This is the same function as readDocument, except that the String parameter is the document content itself rather than a URI.
All options available for readDocument are also applicable to readString.
Default encoding: no decoding is done. The String argument is taken as a Unicode string.
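The sketch below shows readString on an in-memory document, assuming hxt 9.x and Text.XML.HXT.Core. The non-ASCII character in the second item is meant to illustrate that the String is taken as Unicode, without any decoding step. The function name itemTexts and the sample markup are hypothetical.

```haskell
import Text.XML.HXT.Core

-- Parse XML held in a String and return the text of every <item>.
itemTexts :: String -> IO [String]
itemTexts xml =
  runX ( readString [withValidate no] xml   -- the String itself is the document
         >>> deep (isElem >>> hasName "item")
         >>> getChildren
         >>> getText )

main :: IO ()
main = itemTexts "<list><item>one</item><item>caf\233</item></list>" >>= print
```

print shows the result with Haskell escapes, so the second string is displayed as "caf\233".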
readFromString :: SysConfigList -> IOStateArrow s String XmlTree
The arrow version of readString. The arrow input is the document content itself as a String, not a source URI.
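Each arrow input to readFromString is a complete document text, so a list of strings can be parsed in one run. The sketch below assumes hxt 9.x and Text.XML.HXT.Core. The function name values and the sample documents are hypothetical.

```haskell
import Text.XML.HXT.Core

-- Parse each input string as a separate document and collect all text nodes.
values :: [String] -> IO [String]
values docs =
  runX ( constL docs                        -- one arrow input per document text
         >>> readFromString [withValidate no]
         >>> deep getText )                 -- text content, in document order

main :: IO ()
main = values ["<n>1</n>", "<n>2</n>"] >>= print
```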