This module provides functions to parse an XML document to a tree structure, either strictly or lazily.
The GenericXMLString type class allows you to use any string type. Instances are provided for three string types: String, ByteString and Text.
Here is a complete example to get you started:
    -- | A "hello world" example of hexpat that lazily parses a document, printing
    -- it to standard out.

    import Text.XML.Expat.Tree
    import Text.XML.Expat.Format
    import System.Environment
    import System.Exit
    import System.IO
    import qualified Data.ByteString.Lazy as L

    main = do
        args <- getArgs
        case args of
            [filename] -> process filename
            otherwise  -> do
                hPutStrLn stderr "Usage: helloworld <file.xml>"
                exitWith $ ExitFailure 1

    process :: String -> IO ()
    process filename = do
        inputText <- L.readFile filename
        -- Note: Because we're not using the tree, Haskell can't infer the type of
        -- strings we're using so we need to tell it explicitly with a type signature.
        let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)
        -- Process document before handling error, so we get lazy processing.
        L.hPutStr stdout $ format xml
        putStrLn ""
        case mErr of
            Nothing -> return ()
            Just err -> do
                hPutStrLn stderr $ "XML parse failed: "++show err
                exitWith $ ExitFailure 2
Error handling in strict parses is very straightforward: just check the Either return value. Lazy parses are not so simple. The two working examples below illustrate the ways to handle errors in a lazy parse.
Way no. 1 - Using a Maybe value
    import Text.XML.Expat.Tree
    import qualified Data.ByteString.Lazy as L
    import Data.ByteString.Internal (c2w)

    -- This is the recommended way to handle errors in lazy parses
    main = do
        let (tree, mError) = parse defaultParseOptions
                       (L.pack $ map c2w $ "<top><banana></apple></top>")
        print (tree :: UNode String)

        -- Note: We check the error _after_ we have finished our processing
        -- on the tree.
        case mError of
            Just err -> putStrLn $ "It failed : "++show err
            Nothing -> putStrLn "Success!"
Way no. 2 - Using exceptions
parseThrowing can throw an exception from pure code, which is generally a bad way to handle errors, because Haskell's lazy evaluation means it's hard to predict where it will be thrown from. However, it may be acceptable in situations where it's not expected during normal operation, depending on the design of your program.
    ...
    import Control.Exception.Extensible as E

    -- This is not the recommended way to handle errors.
    main = do
        do
            let tree = parseThrowing defaultParseOptions
                           (L.pack $ map c2w $ "<top><banana></apple></top>")
            print (tree :: UNode String)
            -- Because of lazy evaluation, you should not process the tree outside
            -- the 'do' block, or exceptions could be thrown that won't get caught.
         `E.catch` (\exc ->
            case E.fromException exc of
                Just (XMLParseException err) -> putStrLn $ "It failed : "++show err
                Nothing -> E.throwIO exc)
- type Node tag text = NodeG [] tag text
- data NodeG c tag text
- type UNode text = Node text text
- module Text.XML.Expat.Internal.NodeClass
- type QNode text = Node (QName text) text
- module Text.XML.Expat.Internal.Qualified
- type NNode text = Node (NName text) text
- module Text.XML.Expat.Internal.Namespaced
- data ParseOptions tag text = ParseOptions {
- overrideEncoding :: Maybe Encoding
- entityDecoder :: Maybe (tag -> Maybe text)
- defaultParseOptions :: ParseOptions tag text
- data Encoding
- parse :: (GenericXMLString tag, GenericXMLString text) => ParseOptions tag text -> ByteString -> (Node tag text, Maybe XMLParseError)
- parse' :: (GenericXMLString tag, GenericXMLString text) => ParseOptions tag text -> ByteString -> Either XMLParseError (Node tag text)
- data XMLParseError = XMLParseError String XMLParseLocation
- data XMLParseLocation = XMLParseLocation {}
- parseThrowing :: (GenericXMLString tag, GenericXMLString text) => ParseOptions tag text -> ByteString -> Node tag text
- data XMLParseException = XMLParseException XMLParseError
- saxToTree :: GenericXMLString tag => [SAXEvent tag text] -> (Node tag text, Maybe XMLParseError)
- class (Monoid s, Eq s) => GenericXMLString s where
- gxNullString :: s -> Bool
- gxToString :: s -> String
- gxFromString :: String -> s
- gxFromChar :: Char -> s
- gxHead :: s -> Char
- gxTail :: s -> s
- gxBreakOn :: Char -> s -> (s, s)
- gxFromCStringLen :: CStringLen -> IO s
- gxToByteString :: s -> ByteString
- eAttrs :: Node tag text -> [(tag, text)]
- type Nodes tag text = [Node tag text]
- type UNodes text = Nodes text text
- type QNodes text = [Node (QName text) text]
- type NNodes text = [Node (NName text) text]
- parseTree :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> (Node tag text, Maybe XMLParseError)
- parseTree' :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> Either XMLParseError (Node tag text)
- parseSAX :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [SAXEvent tag text]
- parseSAXLocations :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [(SAXEvent tag text, XMLParseLocation)]
- parseTreeThrowing :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> Node tag text
- parseSAXThrowing :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [SAXEvent tag text]
- parseSAXLocationsThrowing :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [(SAXEvent tag text, XMLParseLocation)]
- type ParserOptions tag text = ParseOptions tag text
- defaultParserOptions :: ParseOptions tag text
Tree structure
type Node tag text = NodeG [] tag text
A pure tree representation that uses a list as its container type.
In the hexpat package, a list of nodes has the type [Node tag text], but note that you can also use the more general type function ListOf to give a list of any node type, using that node's associated list type, e.g. ListOf (UNode Text).
data NodeG c tag text
The tree representation of the XML document.
c is the container type for the element's children, which is [] in the hexpat package, and a monadic list type for hexpat-iteratee.
tag is the tag type, which can either be one of several string types, or a special type from the Text.XML.Expat.Namespaced or Text.XML.Expat.Qualified modules.
text is the string type for text content.
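As a rough illustration of how the c, tag and text parameters fit together, here is a minimal sketch that builds a small UNode String by hand and walks it. It assumes NodeG has the two constructors Element (name, attributes, children) and Text, as in the hexpat package; the constructor listing is not shown in the text above. The names sample, tagNames and allText are made up for this example.

```haskell
import Text.XML.Expat.Tree

-- A hand-built document:
-- <top><banana colour="yellow">ripe</banana><apple/></top>
-- Assumption: Element takes a name, an attribute list and a list of children.
sample :: UNode String
sample =
    Element "top" []
        [ Element "banana" [("colour", "yellow")] [Text "ripe"]
        , Element "apple" [] []
        ]

-- Collect every tag name in document order.
tagNames :: Node tag text -> [tag]
tagNames (Element name _ children) = name : concatMap tagNames children
tagNames (Text _)                  = []

-- Concatenate all text content beneath a node.
allText :: UNode String -> String
allText (Element _ _ children) = concatMap allText children
allText (Text t)               = t

main :: IO ()
main = do
    print (tagNames sample)
    putStrLn (allText sample)
```

Because Node fixes the container type c to a list, ordinary list functions such as concatMap work directly on an element's children.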
type UNode text = Node text text
Type alias for a node with unqualified tag names where tag and text are the same string type.
Generic node manipulation
Qualified nodes
type QNode text = Node (QName text) text
Type alias for a node where qualified names are used for tags.
Namespaced nodes
type NNode text = Node (NName text) text
Type alias for a node where namespaced names are used for tags.
Parse to tree
data ParseOptions tag text
ParseOptions
    overrideEncoding :: Maybe Encoding
    entityDecoder :: Maybe (tag -> Maybe text)
defaultParseOptions :: ParseOptions tag text
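The two ParseOptions fields can be set with record update syntax on defaultParseOptions. The sketch below is only an illustration: the lookup table decodeEntity and the name options are hypothetical, and the Encoding constructor UTF8 is assumed from the hexpat package rather than taken from the text here. It also assumes that the entity decoder is consulted for named entities such as &nbsp; that the document does not define itself.

```haskell
import Text.XML.Expat.Tree
import qualified Data.ByteString.Char8 as B

-- Hypothetical lookup table for named entities.
decodeEntity :: String -> Maybe String
decodeEntity "nbsp" = Just "\160"
decodeEntity "copy" = Just "\169"
decodeEntity _      = Nothing

-- Start from the defaults and override both fields.
-- Assumption: UTF8 is a constructor of Encoding.
options :: ParseOptions String String
options = defaultParseOptions
    { overrideEncoding = Just UTF8
    , entityDecoder    = Just decodeEntity
    }

main :: IO ()
main =
    -- parse' takes a strict ByteString and returns an Either.
    case parse' options (B.pack "<p>a&nbsp;b</p>") of
        Left err   -> putStrLn ("XML parse failed: " ++ show err)
        Right tree -> print (tree :: UNode String)
```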
parse
    :: (GenericXMLString tag, GenericXMLString text)
    => ParseOptions tag text  -- Parse options
    -> ByteString  -- Input text (a lazy ByteString)
    -> (Node tag text, Maybe XMLParseError)
Lazily parse XML to tree. Note that forcing the XMLParseError return value will force the entire parse. Therefore, to ensure lazy operation, don't check the error status until you have processed the tree.
parse'
    :: (GenericXMLString tag, GenericXMLString text)
    => ParseOptions tag text  -- Parse options
    -> ByteString  -- Input text (a strict ByteString)
    -> Either XMLParseError (Node tag text)
Strictly parse XML to tree. Returns error message or valid parsed tree.
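For a strict parse, checking the Either result might look like the following sketch. The helper name describe is made up for this example; the pattern on XMLParseError follows the constructor given in the synopsis (a message String and an XMLParseLocation).

```haskell
import Text.XML.Expat.Tree
import qualified Data.ByteString.Char8 as B

-- Summarise the outcome of a strict parse as a single line of text.
describe :: Either XMLParseError (UNode String) -> String
describe (Left (XMLParseError msg loc)) =
    "failed: " ++ msg ++ " at " ++ show loc
describe (Right _) = "ok"

main :: IO ()
main = do
    -- A well-formed document.
    putStrLn $ describe $
        parse' defaultParseOptions (B.pack "<top><banana/></top>")
    -- A document with a mismatched closing tag.
    putStrLn $ describe $
        parse' defaultParseOptions (B.pack "<top><banana></apple></top>")
```

Since the whole input is parsed before parse' returns, there is no ordering concern here as there is with the lazy parse.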
data XMLParseError
Parse error, consisting of message text and error location
data XMLParseLocation
Specifies a location of an event within the input text
XMLParseLocation
Variant that throws exceptions
parseThrowing
    :: (GenericXMLString tag, GenericXMLString text)
    => ParseOptions tag text  -- Parse options
    -> ByteString  -- Input text (a lazy ByteString)
    -> Node tag text
Lazily parse XML to tree. In the event of an error, throw XMLParseException.
parseThrowing can throw an exception from pure code, which is generally a bad way to handle errors, because Haskell's lazy evaluation means it's hard to predict where it will be thrown from. However, it may be acceptable in situations where it's not expected during normal operation, depending on the design of your program.
data XMLParseException
An exception indicating an XML parse error, used by the ...Throwing variants.
Convert from SAX
saxToTree :: GenericXMLString tag => [SAXEvent tag text] -> (Node tag text, Maybe XMLParseError)
A lower level function that lazily converts a SAX stream into a tree structure.
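To make the conversion concrete, the sketch below feeds saxToTree a hand-written event list. It assumes that SAXEvent and its constructors StartElement, EndElement and CharacterData can be imported from Text.XML.Expat.SAX, which is not stated in the text above; the name events is made up for this example.

```haskell
import Text.XML.Expat.SAX (SAXEvent(..))
import Text.XML.Expat.Tree

-- Events corresponding to:
-- <top><banana colour="yellow">ripe</banana></top>
events :: [SAXEvent String String]
events =
    [ StartElement "top" []
    , StartElement "banana" [("colour", "yellow")]
    , CharacterData "ripe"
    , EndElement "banana"
    , EndElement "top"
    ]

main :: IO ()
main = do
    let (tree, mErr) = saxToTree events
    print (tree :: UNode String)
    -- As with parse, look at the error only after using the tree.
    case mErr of
        Nothing  -> putStrLn "Success!"
        Just err -> putStrLn ("It failed: " ++ show err)
```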
Abstraction of string types
class (Monoid s, Eq s) => GenericXMLString s where
An abstraction for any string type you want to use as XML text (that is, attribute values or element text content). If you want to use a new string type with hexpat, you must make it an instance of GenericXMLString.
gxNullString :: s -> Bool
gxToString :: s -> String
gxFromString :: String -> s
gxFromChar :: Char -> s
gxBreakOn :: Char -> s -> (s, s)
gxFromCStringLen :: CStringLen -> IO s
gxToByteString :: s -> ByteString
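Code written against the class methods works unchanged for every instance. The following sketch defines two small helpers, angle and splitPrefix (both names invented for this example), and uses them at String and at Text. It relies only on methods listed above plus mappend from the Monoid superclass.

```haskell
import Text.XML.Expat.Tree (GenericXMLString(..))
import qualified Data.Text as T

-- Wrap any supported string type in angle brackets.
angle :: GenericXMLString s => s -> s
angle s = gxFromChar '<' `mappend` s `mappend` gxFromChar '>'

-- Split a string at the first colon using only class methods.
splitPrefix :: GenericXMLString s => s -> (s, s)
splitPrefix = gxBreakOn ':'

main :: IO ()
main = do
    -- Used at String.
    putStrLn (angle "banana")
    -- Used at Text, converting back to String for output.
    putStrLn (gxToString (angle (T.pack "banana")))
    print (splitPrefix "xsl:template" :: (String, String))
```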
Deprecated
type Nodes tag text = [Node tag text]
DEPRECATED: Use [Node tag text] instead.
Type alias for nodes.
type UNodes text = Nodes text text
DEPRECATED: Use [UNode text] instead.
Type alias for nodes with unqualified tag names where tag and text are the same string type.
type QNodes text = [Node (QName text) text]
DEPRECATED: Use [QNode text] instead.
Type alias for nodes where qualified names are used for tags.
type NNodes text = [Node (NName text) text]
DEPRECATED: Use [NNode text] instead.
Type alias for nodes where namespaced names are used for tags.
parseTree
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding  -- Optional encoding override
    -> ByteString  -- Input text (a lazy ByteString)
    -> (Node tag text, Maybe XMLParseError)
DEPRECATED: Use parse instead.
Lazily parse XML to tree. Note that forcing the XMLParseError return value will force the entire parse. Therefore, to ensure lazy operation, don't check the error status until you have processed the tree.
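When migrating away from the deprecated function, the encoding argument moves into the overrideEncoding field of ParseOptions. The wrapper below, parseTreeCompat, is a hypothetical name for this sketch; it assumes the deprecated function did nothing beyond passing the encoding through, with no entity decoder.

```haskell
import Text.XML.Expat.Tree
import qualified Data.ByteString.Lazy.Char8 as L

-- A stand-in for the deprecated parseTree, written in terms of parse.
parseTreeCompat
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding
    -> L.ByteString
    -> (Node tag text, Maybe XMLParseError)
parseTreeCompat enc = parse defaultParseOptions { overrideEncoding = enc }

main :: IO ()
main = do
    let (tree, mErr) = parseTreeCompat Nothing (L.pack "<top><banana/></top>")
    -- Process the tree first, then check the error, to stay lazy.
    print (tree :: UNode String)
    print mErr
```

The same pattern applies to the other deprecated functions that take a Maybe Encoding as their first argument.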
parseTree'
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding  -- Optional encoding override
    -> ByteString  -- Input text (a strict ByteString)
    -> Either XMLParseError (Node tag text)
DEPRECATED: Use parse' instead.
Strictly parse XML to tree. Returns error message or valid parsed tree.
parseSAX
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding  -- Optional encoding override
    -> ByteString  -- Input text (a lazy ByteString)
    -> [SAXEvent tag text]
parseSAXLocations
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding  -- Optional encoding override
    -> ByteString  -- Input text (a lazy ByteString)
    -> [(SAXEvent tag text, XMLParseLocation)]
DEPRECATED: Use parseLocations instead.
A variant of parseSAX that gives a document location with each SAX event.
parseTreeThrowing
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding  -- Optional encoding override
    -> ByteString  -- Input text (a lazy ByteString)
    -> Node tag text
DEPRECATED: Use parseThrowing instead.
Lazily parse XML to tree. In the event of an error, throw XMLParseException.
parseSAXThrowing
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding  -- Optional encoding override
    -> ByteString  -- Input text (a lazy ByteString)
    -> [SAXEvent tag text]
DEPRECATED: Use parseThrowing instead.
Lazily parse XML to SAX events. In the event of an error, throw XMLParseException.
parseSAXLocationsThrowing
    :: (GenericXMLString tag, GenericXMLString text)
    => Maybe Encoding  -- Optional encoding override
    -> ByteString  -- Input text (a lazy ByteString)
    -> [(SAXEvent tag text, XMLParseLocation)]
DEPRECATED: Use parseLocationsThrowing instead.
A variant of parseSAX that gives a document location with each SAX event. In the event of an error, throw XMLParseException.
type ParserOptions tag text = ParseOptions tag text
defaultParserOptions :: ParseOptions tag text
DEPRECATED. Renamed to defaultParseOptions.