hexpat-0.12: wrapper for expat, the fast XML parser
Text.XML.Expat.Tree
Contents
Tree structure
Parse to tree
Variant that throws exceptions
SAX-style parse
Abstraction of string types
Deprecated
Description

This module provides functions to parse an XML document to a tree structure, either strictly or lazily.

The GenericXMLString type class allows you to use any string type. Three string types are supported out of the box: String, ByteString and Text.

Here is a complete example to get you started:

 -- | A "hello world" example of hexpat that lazily parses a document, printing
 -- it to standard out.

 import Text.XML.Expat.Tree
 import Text.XML.Expat.Format
 import System.Environment
 import System.Exit
 import System.IO
 import qualified Data.ByteString.Lazy as L

 main = do
     args <- getArgs
     case args of
         [filename] -> process filename
         _          -> do
             hPutStrLn stderr "Usage: helloworld <file.xml>"
             exitWith $ ExitFailure 1

 process :: String -> IO ()
 process filename = do
     inputText <- L.readFile filename
     -- Note: Because we're not using the tree, Haskell can't infer the type of
     -- strings we're using so we need to tell it explicitly with a type signature.
     let (xml, mErr) = parse defaultParserOptions inputText :: (UNode String, Maybe XMLParseError)
     -- Process document before handling error, so we get lazy processing.
     L.hPutStr stdout $ format xml
     putStrLn ""
     case mErr of
         Nothing -> return ()
         Just err -> do
             hPutStrLn stderr $ "XML parse failed: "++show err
             exitWith $ ExitFailure 2

Error handling in strict parses is straightforward: just check the Either return value. Lazy parses are not so simple. Here are two working examples that illustrate the ways to handle errors:

Way no. 1 - Using a Maybe value

 import Text.XML.Expat.Tree
 import qualified Data.ByteString.Lazy as L
 import Data.ByteString.Internal (c2w)

 -- This is the recommended way to handle errors in lazy parses
 main = do
     let (tree, mError) = parse defaultParserOptions
                    (L.pack $ map c2w $ "<top><banana></apple></top>")
     print (tree :: UNode String)

     -- Note: We check the error _after_ we have finished our processing
     -- on the tree.
     case mError of
         Just err -> putStrLn $ "It failed : "++show err
         Nothing -> putStrLn "Success!"

Way no. 2 - Using exceptions

parseThrowing can throw an exception from pure code, which is generally a bad way to handle errors, because Haskell's lazy evaluation means it's hard to predict where it will be thrown from. However, it may be acceptable in situations where it's not expected during normal operation, depending on the design of your program.

 ...
 import Control.Exception.Extensible as E

 -- This is not the recommended way to handle errors.
 main = do
     do
         let tree = parseThrowing defaultParserOptions
                        (L.pack $ map c2w $ "<top><banana></apple></top>")
         print (tree :: UNode String)
         -- Because of lazy evaluation, you should not process the tree outside
         -- the 'do' block, or exceptions could be thrown that won't get caught.
     `E.catch` (\exc ->
         case E.fromException exc of
             Just (XMLParseException err) -> putStrLn $ "It failed : "++show err
             Nothing -> E.throwIO exc)
Synopsis
data Node tag text
    = Element {
        eName :: !tag,
        eAttributes :: ![(tag, text)],
        eChildren :: [Node tag text]
      }
    | Text !text
type Attributes tag text = [(tag, text)]
type UNode text = Node text text
type UAttributes text = Attributes text text
textContent :: (NodeClass n, Monoid text) => n tag text -> text
isElement :: NodeClass n => n tag text -> Bool
isNamed :: (NodeClass n, Eq tag) => tag -> n tag text -> Bool
isText :: NodeClass n => n tag text -> Bool
getName :: (NodeClass n, GenericXMLString tag) => n tag text -> tag
getAttributes :: NodeClass n => n tag text -> [(tag, text)]
getAttribute :: (NodeClass n, GenericXMLString tag) => n tag text -> tag -> Maybe text
getChildren :: NodeClass n => n tag text -> [n tag text]
modifyName :: NodeClass n => (tag -> tag) -> n tag text -> n tag text
modifyAttributes :: NodeClass n => ([(tag, text)] -> [(tag, text)]) -> n tag text -> n tag text
setAttribute :: (Eq tag, NodeClass n, GenericXMLString tag) => tag -> text -> n tag text -> n tag text
deleteAttribute :: (Eq tag, NodeClass n, GenericXMLString tag) => tag -> n tag text -> n tag text
alterAttribute :: (Eq tag, NodeClass n, GenericXMLString tag) => tag -> Maybe text -> n tag text -> n tag text
modifyChildren :: NodeClass n => ([n tag text] -> [n tag text]) -> n tag text -> n tag text
mapAllTags :: NodeClass n => (tag -> tag') -> n tag text -> n tag' text
data ParserOptions tag text = ParserOptions {
    parserEncoding :: Maybe Encoding,
    entityDecoder :: Maybe (tag -> Maybe text)
  }
defaultParserOptions :: ParserOptions tag text
data Encoding
    = ASCII
    | UTF8
    | UTF16
    | ISO88591
parse :: (GenericXMLString tag, GenericXMLString text) => ParserOptions tag text -> ByteString -> (Node tag text, Maybe XMLParseError)
parse' :: (GenericXMLString tag, GenericXMLString text) => ParserOptions tag text -> ByteString -> Either XMLParseError (Node tag text)
data XMLParseError = XMLParseError String XMLParseLocation
data XMLParseLocation = XMLParseLocation {
    xmlLineNumber :: Int64,
    xmlColumnNumber :: Int64,
    xmlByteIndex :: Int64,
    xmlByteCount :: Int64
  }
parseThrowing :: (GenericXMLString tag, GenericXMLString text) => ParserOptions tag text -> ByteString -> Node tag text
data XMLParseException = XMLParseException XMLParseError
data SAXEvent tag text
    = StartElement tag [(tag, text)]
    | EndElement tag
    | CharacterData text
    | FailDocument XMLParseError
saxToTree :: GenericXMLString tag => [SAXEvent tag text] -> (Node tag text, Maybe XMLParseError)
class (Monoid s, Eq s) => GenericXMLString s where
    gxNullString :: s -> Bool
    gxToString :: s -> String
    gxFromString :: String -> s
    gxFromChar :: Char -> s
    gxHead :: s -> Char
    gxTail :: s -> s
    gxBreakOn :: Char -> s -> (s, s)
    gxFromCStringLen :: CStringLen -> IO s
    gxToByteString :: s -> ByteString
eAttrs :: Node tag text -> [(tag, text)]
type Nodes tag text = [Node tag text]
type UNodes text = Nodes text text
parseTree :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> (Node tag text, Maybe XMLParseError)
parseTree' :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> Either XMLParseError (Node tag text)
parseSAX :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [SAXEvent tag text]
parseSAXLocations :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [(SAXEvent tag text, XMLParseLocation)]
parseTreeThrowing :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> Node tag text
parseSAXThrowing :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [SAXEvent tag text]
parseSAXLocationsThrowing :: (GenericXMLString tag, GenericXMLString text) => Maybe Encoding -> ByteString -> [(SAXEvent tag text, XMLParseLocation)]
Tree structure
data Node tag text
The tree representation of the XML document.
Constructors
Element
    eName :: !tag
    eAttributes :: ![(tag, text)]
    eChildren :: [Node tag text]
Text !text
Instances
NodeClass Node
(Eq tag, Eq text) => Eq (Node tag text)
(Show tag, Show text) => Show (Node tag text)
(NFData tag, NFData text) => NFData (Node tag text)
type Attributes tag text = [(tag, text)]
Type shortcut for attributes
type UNode text = Node text text
Type shortcut for a single node with unqualified tag names where tag and text are the same string type.
type UAttributes text = Attributes text text
Type shortcut for attributes with unqualified names where tag and text are the same string type.
textContent :: (NodeClass n, Monoid text) => n tag text -> text
Extract all text content from inside a tag into a single string, including any text contained in children.
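As a rough illustration of what textContent computes, the sketch below re-implements the idea over a local stand-in for the Node type documented above. The names textContentSketch and sampleDoc are hypothetical, the stand-in drops the record fields and strictness annotations, and the library's own function is generalised over NodeClass rather than written like this.

```haskell
-- Local stand-in mirroring the documented Node type (not imported from hexpat).
data Node tag text
    = Element tag [(tag, text)] [Node tag text]
    | Text text
    deriving (Eq, Show)

-- Hypothetical helper: concatenate every Text leaf in document order,
-- descending into child elements.
textContentSketch :: Monoid text => Node tag text -> text
textContentSketch (Text t)           = t
textContentSketch (Element _ _ kids) = mconcat (map textContentSketch kids)

-- <p>Hello, <b>world</b>!</p>
sampleDoc :: Node String String
sampleDoc = Element "p" [] [Text "Hello, ", Element "b" [] [Text "world"], Text "!"]

main :: IO ()
main = putStrLn (textContentSketch sampleDoc)
```

The Monoid constraint is what lets the same traversal work for String, ByteString or Text without knowing which one is in use.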
isElement :: NodeClass n => n tag text -> Bool
Is the given node an element?
isNamed :: (NodeClass n, Eq tag) => tag -> n tag text -> Bool
Is the given node a tag with the given name?
isText :: NodeClass n => n tag text -> Bool
Is the given node text?
getName :: (NodeClass n, GenericXMLString tag) => n tag text -> tag
Get the name of this node if it's an element, return empty string otherwise.
getAttributes :: NodeClass n => n tag text -> [(tag, text)]
Get the attributes of a node if it's an element, return empty list otherwise.
getAttribute :: (NodeClass n, GenericXMLString tag) => n tag text -> tag -> Maybe text
Get the value of the attribute having the specified name.
getChildren :: NodeClass n => n tag text -> [n tag text]
Get children of a node if it's an element, return empty list otherwise.
modifyName :: NodeClass n => (tag -> tag) -> n tag text -> n tag text
Modify name if it's an element, no-op otherwise.
modifyAttributes :: NodeClass n => ([(tag, text)] -> [(tag, text)]) -> n tag text -> n tag text
Modify attributes if it's an element, no-op otherwise.
setAttribute :: (Eq tag, NodeClass n, GenericXMLString tag) => tag -> text -> n tag text -> n tag text
Set the value of the attribute with the specified name to the value, overwriting the first existing attribute with that name if present.
deleteAttribute :: (Eq tag, NodeClass n, GenericXMLString tag) => tag -> n tag text -> n tag text
Delete the first attribute matching the specified name.
alterAttribute :: (Eq tag, NodeClass n, GenericXMLString tag) => tag -> Maybe text -> n tag text -> n tag text
setAttribute if Just, deleteAttribute if Nothing.
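The "first matching attribute" wording of setAttribute, deleteAttribute and alterAttribute can be made concrete with a small sketch over plain attribute lists. All names below (Attrs, setAttrSketch, deleteAttrSketch, alterAttrSketch, sampleAttrs) are hypothetical and this is not the library's code. In particular, replacing the attribute in place and appending a new attribute at the end of the list are assumptions; the descriptions above do not specify positions.

```haskell
-- Attribute list with String names and values, as in UAttributes String.
type Attrs = [(String, String)]

-- Overwrite the first pair named k; if none exists, append a new pair
-- (the append position is an assumption of this sketch).
setAttrSketch :: String -> String -> Attrs -> Attrs
setAttrSketch k v [] = [(k, v)]
setAttrSketch k v (a@(k', _) : rest)
    | k == k'   = (k, v) : rest
    | otherwise = a : setAttrSketch k v rest

-- Remove only the first pair named k; later duplicates are kept.
deleteAttrSketch :: String -> Attrs -> Attrs
deleteAttrSketch _ [] = []
deleteAttrSketch k (a@(k', _) : rest)
    | k == k'   = rest
    | otherwise = a : deleteAttrSketch k rest

-- Just v sets the attribute, Nothing deletes it.
alterAttrSketch :: String -> Maybe String -> Attrs -> Attrs
alterAttrSketch k (Just v) = setAttrSketch k v
alterAttrSketch k Nothing  = deleteAttrSketch k

-- Deliberately contains a duplicate "id" to show first-match behaviour.
sampleAttrs :: Attrs
sampleAttrs = [("id", "a1"), ("class", "x"), ("id", "dup")]

main :: IO ()
main = do
    print (alterAttrSketch "id" (Just "b2") sampleAttrs)
    print (alterAttrSketch "id" Nothing sampleAttrs)
    print (alterAttrSketch "lang" (Just "en") sampleAttrs)
```

Because only the first match is touched, a list that already holds duplicate names keeps its later duplicates after either operation.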
modifyChildren :: NodeClass n => ([n tag text] -> [n tag text]) -> n tag text -> n tag text
Modify children (non-recursively) if it's an element, no-op otherwise.
mapAllTags :: NodeClass n => (tag -> tag') -> n tag text -> n tag' text
Map all tags (both tag names and attribute names) recursively.
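To show what "both tag names and attribute names, recursively" means in practice, here is a minimal sketch over a local stand-in for Node. The names mapAllTagsSketch and sampleTree are hypothetical, and the library's mapAllTags works over NodeClass rather than this concrete type.

```haskell
import Data.Char (toUpper)

-- Local stand-in mirroring the documented Node type (not imported from hexpat).
data Node tag text
    = Element tag [(tag, text)] [Node tag text]
    | Text text
    deriving (Eq, Show)

-- Hypothetical helper: apply f to every element name and attribute name,
-- leaving attribute values and text content untouched.
mapAllTagsSketch :: (tag -> tag') -> Node tag text -> Node tag' text
mapAllTagsSketch _ (Text t) = Text t
mapAllTagsSketch f (Element name attrs kids) =
    Element (f name)
            [ (f k, v) | (k, v) <- attrs ]
            (map (mapAllTagsSketch f) kids)

sampleTree :: Node String String
sampleTree = Element "a" [("href", "x")] [Text "t", Element "b" [] []]

main :: IO ()
main = print (mapAllTagsSketch (map toUpper) sampleTree)
```

Since the tag type may change (tag to tag'), the same shape of function could, for example, convert String names into a dedicated name type.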
Parse to tree
data ParserOptions tag text
Constructors
ParserOptions
    parserEncoding :: Maybe Encoding -- The encoding parameter, if provided, overrides the document's encoding declaration.
    entityDecoder :: Maybe (tag -> Maybe text) -- If provided, entity references (e.g. &nbsp; and friends) will be decoded into text using the supplied lookup function.
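A lookup function of the shape expected by entityDecoder could be built from a simple table, as sketched below for String tags and text. The names entityTable and decodeEntity are hypothetical, and the sketch assumes the decoder receives the bare entity name (for example "nbsp", without the surrounding & and ;), which the description above does not state explicitly.

```haskell
-- A few named entities mapped to their replacement text.
entityTable :: [(String, String)]
entityTable =
    [ ("nbsp",   "\160")  -- U+00A0 no-break space
    , ("copy",   "\169")  -- U+00A9 copyright sign
    , ("eacute", "\233")  -- U+00E9 e with acute accent
    ]

-- Matches the shape tag -> Maybe text with tag = text = String.
-- Returning Nothing leaves the entity unresolved.
decodeEntity :: String -> Maybe String
decodeEntity name = lookup name entityTable

-- With hexpat in scope, this would presumably be supplied as
--   defaultParserOptions { entityDecoder = Just decodeEntity }

main :: IO ()
main = mapM_ (print . decodeEntity) ["nbsp", "copy", "unknown"]
```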
defaultParserOptions :: ParserOptions tag text
data Encoding
Encoding types available for the document encoding.
Constructors
ASCII
UTF8
UTF16
ISO88591
parse
:: (GenericXMLString tag, GenericXMLString text)
=> ParserOptions tag text -- Parser options
-> ByteString -- Input text (a lazy ByteString)
-> (Node tag text, Maybe XMLParseError)
Lazily parse XML to tree. Note that forcing the XMLParseError return value will force the entire parse. Therefore, to ensure lazy operation, don't check the error status until you have processed the tree.
parse'
:: (GenericXMLString tag, GenericXMLString text)
=> ParserOptions tag text -- Parser options
-> ByteString -- Input text (a strict ByteString)
-> Either XMLParseError (Node tag text)
Strictly parse XML to tree. Returns error message or valid parsed tree.
data XMLParseError
Parse error, consisting of message text and error location
Constructors
XMLParseError String XMLParseLocation
data XMLParseLocation
Specifies a location of an event within the input text
Constructors
XMLParseLocation
    xmlLineNumber :: Int64 -- Line number of the event
    xmlColumnNumber :: Int64 -- Column number of the event
    xmlByteIndex :: Int64 -- Byte index of event from start of document
    xmlByteCount :: Int64 -- The number of bytes in the event
Variant that throws exceptions
parseThrowing
:: (GenericXMLString tag, GenericXMLString text)
=> ParserOptions tag text -- Parser options
-> ByteString -- Input text (a lazy ByteString)
-> Node tag text

Lazily parse XML to tree. In the event of an error, throw XMLParseException.

parseThrowing can throw an exception from pure code, which is generally a bad way to handle errors, because Haskell's lazy evaluation means it's hard to predict where it will be thrown from. However, it may be acceptable in situations where it's not expected during normal operation, depending on the design of your program.

data XMLParseException
An exception indicating an XML parse error, used by the ..Throwing variants.
Constructors
XMLParseException XMLParseError
SAX-style parse
data SAXEvent tag text
Constructors
StartElement tag [(tag, text)]
EndElement tag
CharacterData text
FailDocument XMLParseError
Instances
(Eq tag, Eq text) => Eq (SAXEvent tag text)
(Show tag, Show text) => Show (SAXEvent tag text)
(NFData tag, NFData text) => NFData (SAXEvent tag text)
saxToTree :: GenericXMLString tag => [SAXEvent tag text] -> (Node tag text, Maybe XMLParseError)
A lower level function that lazily converts a SAX stream into a tree structure.
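The general idea of turning a flat SAX event stream into a tree can be sketched as follows. This is only an illustration over local stand-in types, not the library's implementation: the names siblings, saxToTreeSketch, goodEvents and badEvents are hypothetical, the error payload is a plain String instead of XMLParseError, end tags are not checked against start tags, and returning an empty-named element for a stream with no root is an assumption.

```haskell
-- Local stand-ins mirroring the documented types (not imported from hexpat).
data Node tag text
    = Element tag [(tag, text)] [Node tag text]
    | Text text
    deriving (Eq, Show)

data SAXEvent tag text
    = StartElement tag [(tag, text)]
    | EndElement tag
    | CharacterData text
    | FailDocument String   -- String stands in for XMLParseError
    deriving (Eq, Show)

-- Read sibling nodes up to the enclosing EndElement (or end of input).
-- Returns the nodes, any error encountered, and the unconsumed events.
siblings :: [SAXEvent tag text]
         -> ([Node tag text], Maybe String, [SAXEvent tag text])
siblings []                       = ([], Nothing, [])
siblings (EndElement _ : rest)    = ([], Nothing, rest)
siblings (FailDocument e : _)     = ([], Just e, [])
siblings (CharacterData t : rest) =
    let (ns, err, rest') = siblings rest
    in  (Text t : ns, err, rest')
siblings (StartElement name attrs : rest) =
    let (kids, kidErr, afterKids) = siblings rest
        -- Stop reading further siblings once the children reported an error.
        (ns, err, rest') = case kidErr of
            Just _  -> ([], kidErr, [])
            Nothing -> siblings afterKids
    in  (Element name attrs kids : ns, err, rest')

-- Take the first top-level node as the root, alongside any error.
saxToTreeSketch :: [SAXEvent String text] -> (Node String text, Maybe String)
saxToTreeSketch events =
    case siblings events of
        (root : _, err, _) -> (root, err)
        ([], err, _)       -> (Element "" [] [], err)

goodEvents, badEvents :: [SAXEvent String String]
goodEvents =
    [ StartElement "a" [], CharacterData "x"
    , StartElement "b" [("k", "v")], EndElement "b", EndElement "a" ]
badEvents = [ StartElement "a" [], CharacterData "x", FailDocument "boom" ]

main :: IO ()
main = do
    print (saxToTreeSketch goodEvents)
    print (saxToTreeSketch badEvents)
```

Because the let bindings are lazy, the root Element is available before its children have been read, and the error component is only computed when inspected. This mirrors the (tree, Maybe error) shape returned by saxToTree and parse.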
Abstraction of string types
class (Monoid s, Eq s) => GenericXMLString s where
An abstraction for any string type you want to use as XML text (that is, attribute values or element text content). If you want to use a new string type with hexpat, you must make it an instance of GenericXMLString.
Methods
gxNullString :: s -> Bool
gxToString :: s -> String
gxFromString :: String -> s
gxFromChar :: Char -> s
gxHead :: s -> Char
gxTail :: s -> s
gxBreakOn :: Char -> s -> (s, s)
gxFromCStringLen :: CStringLen -> IO s
gxToByteString :: s -> ByteString
Deprecated
eAttrs :: Node tag text -> [(tag, text)]
type Nodes tag text = [Node tag text]

DEPRECATED: Use [Node tag text] instead.

Type shortcut for nodes.

type UNodes text = Nodes text text

DEPRECATED: Use [UNode text] instead.

Type shortcut for nodes with unqualified tag names where tag and text are the same string type.

parseTree
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding -- Optional encoding override
-> ByteString -- Input text (a lazy ByteString)
-> (Node tag text, Maybe XMLParseError)

DEPRECATED: Use parse instead.

Lazily parse XML to tree. Note that forcing the XMLParseError return value will force the entire parse. Therefore, to ensure lazy operation, don't check the error status until you have processed the tree.

parseTree'
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding -- Optional encoding override
-> ByteString -- Input text (a strict ByteString)
-> Either XMLParseError (Node tag text)

DEPRECATED: Use parse' instead.

Strictly parse XML to tree. Returns error message or valid parsed tree.

parseSAX
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding -- Optional encoding override
-> ByteString -- Input text (a lazy ByteString)
-> [SAXEvent tag text]

DEPRECATED: Use parse instead.

Lazily parse XML to SAX events. In the event of an error, FailDocument is the last element of the output list.

parseSAXLocations
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding -- Optional encoding override
-> ByteString -- Input text (a lazy ByteString)
-> [(SAXEvent tag text, XMLParseLocation)]

DEPRECATED: Use parseLocations instead.

A variant of parseSAX that gives a document location with each SAX event.

parseTreeThrowing
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding -- Optional encoding override
-> ByteString -- Input text (a lazy ByteString)
-> Node tag text

DEPRECATED: Use parseThrowing instead.

Lazily parse XML to tree. In the event of an error, throw XMLParseException.

parseSAXThrowing
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding -- Optional encoding override
-> ByteString -- Input text (a lazy ByteString)
-> [SAXEvent tag text]

DEPRECATED: Use parseThrowing instead.

Lazily parse XML to SAX events. In the event of an error, throw XMLParseException.

parseSAXLocationsThrowing
:: (GenericXMLString tag, GenericXMLString text)
=> Maybe Encoding -- Optional encoding override
-> ByteString -- Input text (a lazy ByteString)
-> [(SAXEvent tag text, XMLParseLocation)]

DEPRECATED: Use parseLocationsThrowing instead.

A variant of parseSAX that gives a document location with each SAX event. In the event of an error, throw XMLParseException.

Produced by Haddock version 2.6.1