hexpat-0.7: wrapper for expat, the fast XML parser
Text.XML.Expat.Tree
Contents
Tree structure
Parse to tree
SAX-style parse
Variants that throw exceptions
Abstraction of string types
Description

This module provides functions to parse an XML document to a tree structure, either strictly or lazily, as well as a lazy SAX-style interface.

The GenericXMLString type class allows you to use any string type. Instances are provided for three string types: String, ByteString and Text.

Here is a complete example to get you started:

 -- | A "hello world" example of hexpat that lazily parses a document, printing
 -- it to standard out.

 import Text.XML.Expat.Tree
 import Text.XML.Expat.Format
 import System.Environment
 import System.Exit
 import System.IO
 import qualified Data.ByteString.Lazy as L

 main = do
     args <- getArgs
     case args of
         [filename] -> process filename
         _          -> do
             hPutStrLn stderr "Usage: helloworld <file.xml>"
             exitWith $ ExitFailure 1

 process :: String -> IO ()
 process filename = do
     inputText <- L.readFile filename
     -- Note: Because we're not using the tree, Haskell can't infer the type of
     -- strings we're using so we need to tell it explicitly with a type signature.
     let (xml, mErr) = parseTree Nothing inputText :: (UNode String, Maybe XMLParseError)
     -- Process document before handling error, so we get lazy processing.
     L.hPutStr stdout $ formatTree xml
     putStrLn ""
     case mErr of
         Nothing -> return ()
         Just err -> do
             hPutStrLn stderr $ "XML parse failed: "++show err
             exitWith $ ExitFailure 2

Error handling in strict parses is straightforward: just check the Either return value. Lazy parses are not so simple. The two working examples below illustrate the ways to handle errors in a lazy parse.

Way no. 1 - Using a Maybe value

 import Text.XML.Expat.Tree
 import qualified Data.ByteString.Lazy as L
 import Data.ByteString.Internal (c2w)

 -- This is the recommended way to handle errors in lazy parses
 main = do
     let (tree, mError) = parseTree Nothing (L.pack $ map c2w $ "<top><banana></apple></top>")
     print (tree :: UNode String)
     -- Note: We check the error _after_ we have finished our processing on the tree.
     case mError of
         Just err -> putStrLn $ "It failed : "++show err
         Nothing -> putStrLn "Success!"

Way no. 2 - Using exceptions

Unless exceptions fit in with the design of your program, this way is less preferred.

 ...
 import Control.Exception.Extensible as E

 -- This is not the recommended way to handle errors.
 main = do
     do
         let tree = parseTreeThrowing Nothing (L.pack $ map c2w $ "<top><banana></apple></top>")
         print (tree :: UNode String)
         -- Because of lazy evaluation, you should not process the tree outside the 'do' block,
         -- or exceptions could be thrown that won't get caught.
     `E.catch` (\exc ->
         case E.fromException exc of
             Just (XMLParseException err) -> putStrLn $ "It failed : "++show err
             Nothing -> E.throwIO exc)
Synopsis
data Node tag text
= Element {
eName :: !tag
eAttrs :: ![(tag, text)]
eChildren :: [Node tag text]
}
| Text !text
type Nodes tag text = [Node tag text]
type Attributes tag text = [(tag, text)]
type UNode text = Node text text
type UNodes text = Nodes text text
type UAttributes text = Attributes text text
parseTree :: (Show tag, Show text, GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> (Node tag text, Maybe XMLParseError)
parseTree' :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> Either XMLParseError (Node tag text)
data Encoding
= ASCII
| UTF8
| UTF16
| ISO88591
data XMLParseError = XMLParseError String XMLParseLocation
data XMLParseLocation = XMLParseLocation {
xmlLineNumber :: Int64
xmlColumnNumber :: Int64
xmlByteIndex :: Int64
xmlByteCount :: Int64
}
parseSAX :: (Show tag, Show text, GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [SAXEvent tag text]
data SAXEvent tag text
= StartElement tag [(tag, text)]
| EndElement tag
| CharacterData text
| FailDocument XMLParseError
saxToTree :: GenericXMLString tag => [SAXEvent tag text] -> (Node tag text, Maybe XMLParseError)
parseSAXLocations :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [(SAXEvent tag text, XMLParseLocation)]
data XMLParseException = XMLParseException XMLParseError
parseSAXThrowing :: (Show tag, Show text, GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [SAXEvent tag text]
parseTreeThrowing :: (Show tag, Show text, GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> Node tag text
class (Monoid s, Eq s) => GenericXMLString s where
gxNullString :: s -> Bool
gxToString :: s -> String
gxFromString :: String -> s
gxFromChar :: Char -> s
gxHead :: s -> Char
gxTail :: s -> s
gxBreakOn :: Char -> s -> (s, s)
gxFromCStringLen :: CStringLen -> IO s
gxToByteString :: s -> ByteString
Tree structure
data Node tag text
The tree representation of the XML document.
Constructors
Element
eName :: !tag
eAttrs :: ![(tag, text)]
eChildren :: [Node tag text]
Text !text
Instances
(Eq tag, Eq text) => Eq (Node tag text)
(Show tag, Show text) => Show (Node tag text)
(NFData tag, NFData text) => NFData (Node tag text)
type Nodes tag text = [Node tag text]
Type shortcut for nodes
type Attributes tag text = [(tag, text)]
Type shortcut for attributes
type UNode text = Node text text
Type shortcut for a single node with unqualified tag names where tag and text are the same string type.
type UNodes text = Nodes text text
Type shortcut for nodes with unqualified tag names where tag and text are the same string type.
type UAttributes text = Attributes text text
Type shortcut for attributes with unqualified names where tag and text are the same string type.
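As an illustration of working with the tree structure, here is a minimal sketch of three traversal helpers. It assumes a local copy of the Node type so that it compiles without hexpat installed; in real code you would import Text.XML.Expat.Tree instead. The names textContent, childrenNamed, attribute and sample are hypothetical and are not part of this module.

```haskell
-- Local mirror of the Node type documented above (assumption: declared here
-- only so the sketch is standalone; normally import Text.XML.Expat.Tree).
data Node tag text
    = Element
        { eName     :: !tag
        , eAttrs    :: ![(tag, text)]
        , eChildren :: [Node tag text]
        }
    | Text !text
    deriving (Eq, Show)

type UNode text = Node text text

-- | Concatenate all character data beneath a node, in document order.
-- (Hypothetical helper, not provided by hexpat.)
textContent :: UNode String -> String
textContent (Text t)         = t
textContent (Element _ _ cs) = concatMap textContent cs

-- | Direct child elements with the given tag name.
-- (Hypothetical helper, not provided by hexpat.)
childrenNamed :: String -> UNode String -> [UNode String]
childrenNamed name (Element _ _ cs) = [c | c@(Element n _ _) <- cs, n == name]
childrenNamed _    (Text _)         = []

-- | Look up an attribute value on an element; Text nodes have no attributes.
-- (Hypothetical helper, not provided by hexpat.)
attribute :: String -> UNode String -> Maybe String
attribute key (Element _ attrs _) = lookup key attrs
attribute _   (Text _)            = Nothing

-- | A hand-built tree used only for illustration.
sample :: UNode String
sample =
    Element "top" [("lang", "en")]
        [ Element "greeting" [] [Text "Hello, "]
        , Element "greeting" [] [Text "world"]
        , Text "!"
        ]
```

With these definitions, textContent sample evaluates to "Hello, world!" and attribute "lang" sample to Just "en".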
Parse to tree
parseTree
:: (Show tag, Show text, GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding    -- Optional encoding override
-> ByteString        -- Input text (a lazy ByteString)
-> (Node tag text, Maybe XMLParseError)
Lazily parse XML to tree. Note that forcing the XMLParseError return value will force the entire parse. Therefore, to ensure lazy operation, don't check the error status until you have processed the tree.
parseTree'
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding    -- Optional encoding override
-> ByteString        -- Input text (a strict ByteString)
-> Either XMLParseError (Node tag text)
Strictly parse XML to tree. Returns error message or valid parsed tree.
data Encoding
Encoding types available for the document encoding.
Constructors
ASCII
UTF8
UTF16
ISO88591
data XMLParseError
Parse error, consisting of message text and error location
Constructors
XMLParseError String XMLParseLocation
data XMLParseLocation
Specifies a location of an event within the input text
Constructors
XMLParseLocation
xmlLineNumber :: Int64      -- Line number of the event
xmlColumnNumber :: Int64    -- Column number of the event
xmlByteIndex :: Int64       -- Byte index of event from start of document
xmlByteCount :: Int64       -- The number of bytes in the event
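To show how the location fields might be used when reporting an error, here is a minimal sketch that renders an XMLParseError in a compiler-style "file:line:column: message" form. The two types are local mirrors of the ones documented here so that the sketch compiles on its own. The helper formatParseError and the values in sampleError are hypothetical and do not come from a real parse.

```haskell
import Data.Int (Int64)

-- Local mirrors of the documented types (assumption: declared here only so
-- the sketch is standalone; normally import Text.XML.Expat.Tree).
data XMLParseLocation = XMLParseLocation
    { xmlLineNumber   :: Int64
    , xmlColumnNumber :: Int64
    , xmlByteIndex    :: Int64
    , xmlByteCount    :: Int64
    } deriving (Eq, Show)

data XMLParseError = XMLParseError String XMLParseLocation
    deriving (Eq, Show)

-- | Render an error as "file:line:column: message".
-- (Hypothetical helper, not provided by hexpat.)
formatParseError :: FilePath -> XMLParseError -> String
formatParseError file (XMLParseError msg loc) =
    file ++ ":" ++ show (xmlLineNumber loc)
         ++ ":" ++ show (xmlColumnNumber loc)
         ++ ": " ++ msg

-- | Made-up error value for illustration only.
sampleError :: XMLParseError
sampleError = XMLParseError "mismatched tag" (XMLParseLocation 1 15 15 0)
```

Here formatParseError "in.xml" sampleError yields "in.xml:1:15: mismatched tag".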
SAX-style parse
parseSAX
:: (Show tag, Show text, GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding    -- Optional encoding override
-> ByteString        -- Input text (a lazy ByteString)
-> [SAXEvent tag text]
Lazily parse XML to SAX events. In the event of an error, FailDocument is the last element of the output list.
data SAXEvent tag text
Constructors
StartElement tag [(tag, text)]
EndElement tag
CharacterData text
FailDocument XMLParseError
Instances
(Eq tag, Eq text) => Eq (SAXEvent tag text)
(Show tag, Show text) => Show (SAXEvent tag text)
(NFData tag, NFData text) => NFData (SAXEvent tag text)
saxToTree :: GenericXMLString tag => [SAXEvent tag text] -> (Node tag text, Maybe XMLParseError)
A lower-level function that lazily converts a SAX stream into a tree structure.
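To make the relationship between SAX events and the tree concrete, here is a minimal sketch of one way such a conversion could be written. It is not hexpat's implementation of saxToTree. It assumes simplified local mirrors of the two types (no strictness annotations, and the FailDocument constructor is omitted, so error propagation is out of scope), and the names siblings, eventsToTree and sampleEvents are hypothetical.

```haskell
-- Simplified local mirrors of the documented types (assumption: declared
-- here only so the sketch is standalone and does not need hexpat).
data Node tag text
    = Element tag [(tag, text)] [Node tag text]
    | Text text
    deriving (Eq, Show)

data SAXEvent tag text
    = StartElement tag [(tag, text)]
    | EndElement tag
    | CharacterData text
    deriving (Eq, Show)

-- | Read a run of sibling nodes, stopping at an EndElement or at the end of
-- the input. Returns the siblings together with the unconsumed events.
siblings :: [SAXEvent tag text] -> ([Node tag text], [SAXEvent tag text])
siblings (StartElement name attrs : rest) =
    let (kids, afterKids) = siblings rest
        -- Drop the matching end tag; tolerate its absence.
        afterEnd = case afterKids of
            (EndElement _ : more) -> more
            other                 -> other
        (sibs, remaining) = siblings afterEnd
    in (Element name attrs kids : sibs, remaining)
siblings (CharacterData t : rest) =
    let (sibs, remaining) = siblings rest
    in (Text t : sibs, remaining)
siblings events = ([], events)  -- an EndElement or the empty list

-- | Build a tree from an event stream whose first event starts the root
-- element. Returns Nothing if the stream does not begin with an element.
eventsToTree :: [SAXEvent tag text] -> Maybe (Node tag text)
eventsToTree events =
    case fst (siblings events) of
        (root@(Element _ _ _) : _) -> Just root
        _                          -> Nothing

-- | Hand-written event stream for <top><item id="1">a</item></top>.
sampleEvents :: [SAXEvent String String]
sampleEvents =
    [ StartElement "top" []
    , StartElement "item" [("id", "1")]
    , CharacterData "a"
    , EndElement "item"
    , EndElement "top"
    ]
```

The sketch does not check that each EndElement name matches its StartElement; it only uses the end tags to find where a list of children stops.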
parseSAXLocations
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding    -- Optional encoding override
-> ByteString        -- Input text (a lazy ByteString)
-> [(SAXEvent tag text, XMLParseLocation)]
A variant of parseSAX that gives a document location with each SAX event.
Variants that throw exceptions
data XMLParseException
An exception indicating an XML parse error, used by the ..Throwing variants.
Constructors
XMLParseException XMLParseError
parseSAXThrowing
:: (Show tag, Show text, GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding    -- Optional encoding override
-> ByteString        -- Input text (a lazy ByteString)
-> [SAXEvent tag text]
Lazily parse XML to SAX events. In the event of an error, throw XMLParseException.
parseTreeThrowing
:: (Show tag, Show text, GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding    -- Optional encoding override
-> ByteString        -- Input text (a lazy ByteString)
-> Node tag text
Lazily parse XML to tree. In the event of an error, throw XMLParseException.
Abstraction of string types
class (Monoid s, Eq s) => GenericXMLString s where
An abstraction for any string type you want to use as XML text (that is, attribute values or element text content). If you want to use a new string type with hexpat, you must make it an instance of GenericXMLString.
Methods
gxNullString :: s -> Bool
gxToString :: s -> String
gxFromString :: String -> s
gxFromChar :: Char -> s
gxHead :: s -> Char
gxTail :: s -> s
gxBreakOn :: Char -> s -> (s, s)
gxFromCStringLen :: CStringLen -> IO s
gxToByteString :: s -> ByteString