Safe Haskell	None
Language	Haskell98

Text.XML.Expat.Tree

Contents

Tree structure
Generic node manipulation
Qualified nodes
Namespaced nodes
Parse to tree
Variant that throws exceptions
Convert from SAX
Abstraction of string types

Description

This module provides functions to parse an XML document to a tree structure, either strictly or lazily.

The GenericXMLString type class allows you to use any string type. Instances are provided for three string types: String, ByteString and Text.

Here is a complete example to get you started:

-- | A "hello world" example of hexpat that lazily parses a document, printing
-- it to standard out.

import Text.XML.Expat.Tree
import Text.XML.Expat.Format
import System.Environment
import System.Exit
import System.IO
import qualified Data.ByteString.Lazy as L

main :: IO ()
main = do
    args <- getArgs
    case args of
        [filename] -> process filename
        -- Any other number of arguments is a usage error.
        _          -> do
            hPutStrLn stderr "Usage: helloworld <file.xml>"
            exitWith $ ExitFailure 1

process :: String -> IO ()
process filename = do
    inputText <- L.readFile filename
    -- Note: Because we're not using the tree, Haskell can't infer the type of
    -- strings we're using so we need to tell it explicitly with a type signature.
    let (xml, mErr) = parse defaultParseOptions inputText :: (UNode String, Maybe XMLParseError)
    -- Process document before handling error, so we get lazy processing.
    L.hPutStr stdout $ format xml
    putStrLn ""
    case mErr of
        Nothing -> return ()
        Just err -> do
            hPutStrLn stderr $ "XML parse failed: "++show err
            exitWith $ ExitFailure 2

Error handling in strict parses is straightforward: just check the Either return value. Lazy parses are not so simple, because the error status is only known once the entire input has been parsed. The two working examples below illustrate the ways to handle errors.

Way no. 1 - Using a Maybe value

import Text.XML.Expat.Tree
import qualified Data.ByteString.Lazy as L
import Data.ByteString.Internal (c2w)

-- This is the recommended way to handle errors in lazy parses
main = do
    let (tree, mError) = parse defaultParseOptions
                   (L.pack $ map c2w $ "<top><banana></apple></top>")
    print (tree :: UNode String)

    -- Note: We check the error _after_ we have finished our processing
    -- on the tree.
    case mError of
        Just err -> putStrLn $ "It failed : "++show err
        Nothing -> putStrLn "Success!"

Way no. 2 - Using exceptions

parseThrowing can throw an exception from pure code, which is generally a bad way to handle errors, because Haskell's lazy evaluation means it's hard to predict where it will be thrown from. However, it may be acceptable in situations where it's not expected during normal operation, depending on the design of your program.

...
import Control.Exception.Extensible as E

-- This is not the recommended way to handle errors.
main = do
    do
        let tree = parseThrowing defaultParseOptions
                       (L.pack $ map c2w $ "<top><banana></apple></top>")
        print (tree :: UNode String)
        -- Because of lazy evaluation, you should not process the tree outside
        -- the 'do' block, or exceptions could be thrown that won't get caught.
    `E.catch` (\exc ->
        case E.fromException exc of
            Just (XMLParseException err) -> putStrLn $ "It failed : "++show err
            Nothing -> E.throwIO exc)

Tree structure

type Node tag text = NodeG [] tag text

A pure tree representation that uses a list as its container type.

In the hexpat package, a list of nodes has the type [Node tag text], but note that you can also use the more general type function ListOf to give a list of any node type, using that node's associated list type, e.g. ListOf (UNode Text).

data NodeG c tag text

The tree representation of the XML document.

c is the container type for the element's children, which would normally be [], but could potentially be a monadic list type to allow for chunked I/O.

tag is the tag type, which can either be one of several string types, or a special type from the Text.XML.Expat.Namespaced or Text.XML.Expat.Qualified modules.

text is the string type for text content.

Constructors

Element
Fields
    eName :: !tag
    eAttributes :: ![(tag, text)]
    eChildren :: c (NodeG c tag text)
Text !text

Instances

(Functor c, List c) => MkElementClass NodeG c
Methods
    mkElement :: tag -> Attributes tag text -> c (NodeG c tag text) -> NodeG c tag text
(Functor c, List c) => NodeClass NodeG c
Methods
    isElement :: NodeG c tag text -> Bool
    isText :: NodeG c tag text -> Bool
    isCData :: NodeG c tag text -> Bool
    isProcessingInstruction :: NodeG c tag text -> Bool
    isComment :: NodeG c tag text -> Bool
    textContentM :: Monoid text => NodeG c tag text -> ItemM c text
    isNamed :: Eq tag => tag -> NodeG c tag text -> Bool
    getName :: Monoid tag => NodeG c tag text -> tag
    hasTarget :: Eq text => text -> NodeG c tag text -> Bool
    getTarget :: Monoid text => NodeG c tag text -> text
    getAttributes :: NodeG c tag text -> [(tag, text)]
    getChildren :: NodeG c tag text -> c (NodeG c tag text)
    getText :: Monoid text => NodeG c tag text -> text
    modifyName :: (tag -> tag) -> NodeG c tag text -> NodeG c tag text
    modifyAttributes :: ([(tag, text)] -> [(tag, text)]) -> NodeG c tag text -> NodeG c tag text
    modifyChildren :: (c (NodeG c tag text) -> c (NodeG c tag text)) -> NodeG c tag text -> NodeG c tag text
    modifyElement :: ((tag, [(tag, text)], c (NodeG c tag text)) -> (tag', [(tag', text)], c (NodeG c tag' text))) -> NodeG c tag text -> NodeG c tag' text
    mapAllTags :: (tag -> tag') -> NodeG c tag text -> NodeG c tag' text
    mapNodeContainer :: List c' => (forall a. c a -> ItemM c (c' a)) -> NodeG c tag text -> ItemM c (NodeG c' tag text)
    mkText :: text -> NodeG c tag text
(Eq tag, Eq text) => Eq (NodeG [] tag text)
Methods
    (==) :: NodeG [] tag text -> NodeG [] tag text -> Bool
    (/=) :: NodeG [] tag text -> NodeG [] tag text -> Bool
(Show tag, Show text) => Show (NodeG [] tag text)
Methods
    showsPrec :: Int -> NodeG [] tag text -> ShowS
    show :: NodeG [] tag text -> String
    showList :: [NodeG [] tag text] -> ShowS
(NFData tag, NFData text) => NFData (NodeG [] tag text)
Methods
    rnf :: NodeG [] tag text -> ()
type ListOf (NodeG c tag text) = c (NodeG c tag text)

type UNode text = Node text text

Type alias for a node with unqualified tag names where tag and text are the same string type.
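
As a minimal sketch of how the Element and Text constructors fit together, the following builds a small tree by hand and walks it with pattern matching. The names sample, countElements and allText are illustrative only and are not part of hexpat.

```haskell
import Text.XML.Expat.Tree

-- A document built by hand, equivalent to:
--   <greeting lang="en">Hello<b>world</b></greeting>
sample :: UNode String
sample =
    Element "greeting" [("lang", "en")]
        [ Text "Hello"
        , Element "b" [] [Text "world"]
        ]

-- Count the Element nodes in a tree, including the root.
countElements :: Node tag text -> Int
countElements (Element _ _ children) = 1 + sum (map countElements children)
countElements (Text _)               = 0

-- Concatenate all text content in document order.
allText :: Node tag String -> String
allText (Element _ _ children) = concatMap allText children
allText (Text t)               = t
```

Because Node fixes the container type to a list, ordinary list functions such as map and concatMap work directly on eChildren.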

Generic node manipulation

module Text.XML.Expat.Internal.NodeClass
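
The NodeClass methods listed in the instance above can be used instead of pattern matching. The sketch below uses only isElement, isText, getName, getText, getAttributes, getChildren and modifyChildren; the helper names childNames, attr, isBlankText and stripBlank are hypothetical.

```haskell
import Data.Char (isSpace)
import Text.XML.Expat.Tree

-- Names of the direct child elements of a node, skipping text nodes.
childNames :: UNode String -> [String]
childNames node = [ getName child | child <- getChildren node, isElement child ]

-- Look up an attribute value by name in the node's attribute list.
attr :: String -> UNode String -> Maybe String
attr name node = lookup name (getAttributes node)

-- True for text nodes that contain only whitespace.
isBlankText :: UNode String -> Bool
isBlankText node = isText node && all isSpace (getText node)

-- Recursively drop whitespace-only text nodes, e.g. indentation.
stripBlank :: UNode String -> UNode String
stripBlank = modifyChildren (map stripBlank . filter (not . isBlankText))
```

Writing helpers against the class methods rather than the constructors should let them be adapted to other node types that implement NodeClass, at the cost of slightly more general type signatures.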

Qualified nodes

type QNode text = Node (QName text) text

Type alias for a node where qualified names are used for tags

module Text.XML.Expat.Internal.Qualified
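
A minimal sketch of converting an unqualified tree to a qualified one. It assumes that the re-exported Qualified module provides toQualified and a QName record with fields qnPrefix and qnLocalPart; qualify and splitName are hypothetical helper names.

```haskell
import Text.XML.Expat.Tree

-- Split every "prefix:local" tag and attribute name into a QName.
qualify :: UNode String -> QNode String
qualify = toQualified

-- The (prefix, local part) of an element's name; Nothing for text nodes.
splitName :: QNode String -> Maybe (Maybe String, String)
splitName (Element name _ _) = Just (qnPrefix name, qnLocalPart name)
splitName (Text _)           = Nothing
```

For example, in GHCi, splitName (qualify (Element "svg:rect" [] [])) separates the svg prefix from the local part rect.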

Namespaced nodes

type NNode text = Node (NName text) text

Type alias for a node where namespaced names are used for tags

module Text.XML.Expat.Internal.Namespaced
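
A sketch of resolving namespace prefixes, assuming the re-exported Namespaced module provides toNamespaced (which takes a qualified tree) and an NName record with fields nnNamespace and nnLocalPart. The document, the URI urn:example:x and the helper names are made up for illustration.

```haskell
import Text.XML.Expat.Tree

-- Equivalent to: <root xmlns:x="urn:example:x"><x:item/></root>
nsDoc :: UNode String
nsDoc =
    Element "root" [("xmlns:x", "urn:example:x")]
        [Element "x:item" [] []]

-- Qualify the names first, then resolve prefixes against xmlns declarations.
resolve :: UNode String -> NNode String
resolve = toNamespaced . toQualified

-- (namespace URI, local part) for each direct child element.
childTags :: NNode String -> [(Maybe String, String)]
childTags node =
    [ (nnNamespace n, nnLocalPart n) | Element n _ _ <- getChildren node ]
```

Note that this sketch only uses prefixes that are declared in the document; how undeclared prefixes are treated is not covered here.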

Parse to tree

data ParseOptions tag text

Constructors

ParseOptions
Fields
    overrideEncoding :: Maybe Encoding
        The encoding parameter, if provided, overrides the document's encoding declaration.
    entityDecoder :: Maybe (tag -> Maybe text)
        If provided, entity references (i.e. &nbsp; and friends) will be decoded into text using the supplied lookup function.

defaultParseOptions :: ParseOptions tag text
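
A sketch of customising the parse options with record update syntax on defaultParseOptions. The entity table in decodeEntity is a hypothetical example covering only two entities; options and parseWithEntities are likewise illustrative names.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.XML.Expat.Tree

-- Hypothetical lookup table for a couple of HTML-style entities.
decodeEntity :: String -> Maybe String
decodeEntity "nbsp" = Just "\160"
decodeEntity "copy" = Just "\169"
decodeEntity _      = Nothing

-- Force UTF-8 regardless of the document's declaration, and decode
-- the entities known to decodeEntity.
options :: ParseOptions String String
options = defaultParseOptions
    { overrideEncoding = Just UTF8
    , entityDecoder    = Just decodeEntity
    }

-- Strict parse using the customised options.
parseWithEntities :: B.ByteString -> Either XMLParseError (UNode String)
parseWithEntities = parse' options
```

Starting from defaultParseOptions and overriding individual fields keeps the code working if further fields are added to ParseOptions later.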

data Encoding

Constructors

ASCII
UTF8
UTF16
ISO88591

parse

Arguments

:: (GenericXMLString tag, GenericXMLString text)
=> ParseOptions tag text	Parse options
-> ByteString	Input text (a lazy ByteString)
-> (Node tag text, Maybe XMLParseError)

Lazily parse XML to tree. Note that forcing the XMLParseError return value will force the entire parse. Therefore, to ensure lazy operation, don't check the error status until you have processed the tree.

parse'

Arguments

:: (GenericXMLString tag, GenericXMLString text)
=> ParseOptions tag text	Parse options
-> ByteString	Input text (a strict ByteString)
-> Either XMLParseError (Node tag text)

Strictly parse XML to tree. Returns error message or valid parsed tree.
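
A minimal sketch of a strict parse, where error handling is a plain case on the Either result. The names parseStrict and describe are illustrative, and the input is packed with Data.ByteString.Char8 purely for convenience.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.XML.Expat.Tree

-- Strictly parse a document into a tree with String tags and text.
parseStrict :: B.ByteString -> Either XMLParseError (UNode String)
parseStrict = parse' defaultParseOptions

-- Summarise the outcome of a parse as a single line of text.
describe :: B.ByteString -> String
describe input =
    case parseStrict input of
        Left err   -> "parse failed: " ++ show err
        Right tree -> "root element: " ++ getName tree
```

Unlike the lazy parse, the whole document is parsed before any result is returned, so a Right value is a complete, error-free tree.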

parseG

Arguments

:: (GenericXMLString tag, GenericXMLString text, List l)
=> ParseOptions tag text	Parse options
-> l ByteString	Input text as a generalized list of blocks
-> ItemM l (NodeG l tag text)

Parse a generalized list to a tree, ignoring parse errors. This function allows for a parse from an enumerator/iteratee to a "lazy" tree structure using the List-enumerator package.

data XMLParseError

Parse error, consisting of message text and error location

Constructors

XMLParseError String XMLParseLocation

Instances

Eq XMLParseError
Methods
    (==) :: XMLParseError -> XMLParseError -> Bool
    (/=) :: XMLParseError -> XMLParseError -> Bool
Show XMLParseError
Methods
    showsPrec :: Int -> XMLParseError -> ShowS
    show :: XMLParseError -> String
    showList :: [XMLParseError] -> ShowS
NFData XMLParseError
Methods
    rnf :: XMLParseError -> ()

data XMLParseLocation

Specifies a location of an event within the input text

Constructors

XMLParseLocation
Fields
    xmlLineNumber :: Int64
        Line number of the event
    xmlColumnNumber :: Int64
        Column number of the event
    xmlByteIndex :: Int64
        Byte index of event from start of document
    xmlByteCount :: Int64
        The number of bytes in the event

Instances

Eq XMLParseLocation
Methods
    (==) :: XMLParseLocation -> XMLParseLocation -> Bool
    (/=) :: XMLParseLocation -> XMLParseLocation -> Bool
Show XMLParseLocation
Methods
    showsPrec :: Int -> XMLParseLocation -> ShowS
    show :: XMLParseLocation -> String
    showList :: [XMLParseLocation] -> ShowS
NFData XMLParseLocation
Methods
    rnf :: XMLParseLocation -> ()
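
As an illustration of the two types above, this sketch turns an XMLParseError into a compiler-style message using the location fields. formatError is a hypothetical helper, not part of hexpat.

```haskell
import Text.XML.Expat.Tree

-- Render an error as "line L, column C: message".
formatError :: XMLParseError -> String
formatError (XMLParseError msg loc) =
    "line " ++ show (xmlLineNumber loc)
        ++ ", column " ++ show (xmlColumnNumber loc)
        ++ ": " ++ msg
```

This may be preferable to show when the message is intended for end users, since the derived Show output includes constructor names.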

Variant that throws exceptions

parseThrowing

Arguments

:: (GenericXMLString tag, GenericXMLString text)
=> ParseOptions tag text	Parse options
-> ByteString	Input text (a lazy ByteString)
-> Node tag text

Lazily parse XML to tree. In the event of an error, throw XMLParseException.

parseThrowing can throw an exception from pure code, which is generally a bad way to handle errors, because Haskell's lazy evaluation means it's hard to predict where it will be thrown from. However, it may be acceptable in situations where it's not expected during normal operation, depending on the design of your program.
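
One way to make the point at which the exception is thrown predictable is to force the whole tree inside IO before using it. The sketch below does this with force from the deepseq package (relying on the NFData instance for list-based nodes) together with evaluate and try from Control.Exception. It gives up lazy processing, and parseOrError is a hypothetical name.

```haskell
import Control.DeepSeq (force)
import Control.Exception (evaluate, try)
import qualified Data.ByteString.Lazy.Char8 as L
import Text.XML.Expat.Tree

-- Fully evaluate the tree inside IO so that any XMLParseException is
-- raised here, where 'try' can catch it, and not later in pure code.
parseOrError :: L.ByteString -> IO (Either XMLParseError (UNode String))
parseOrError input = do
    result <- try (evaluate (force (parseThrowing defaultParseOptions input)))
    return $ case result of
        Left (XMLParseException err) -> Left err
        Right tree                   -> Right tree
```

If the whole document is going to be forced anyway, the strict parse' achieves the same result without exceptions; this pattern is mainly of interest when an exception-based interface is required by the rest of the program.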

data XMLParseException

An exception indicating an XML parse error, used by the ..Throwing variants.

Constructors

XMLParseException XMLParseError

Instances

Eq XMLParseException
Methods
    (==) :: XMLParseException -> XMLParseException -> Bool
    (/=) :: XMLParseException -> XMLParseException -> Bool
Show XMLParseException
Methods
    showsPrec :: Int -> XMLParseException -> ShowS
    show :: XMLParseException -> String
    showList :: [XMLParseException] -> ShowS
Exception XMLParseException
Methods
    toException :: XMLParseException -> SomeException
    fromException :: SomeException -> Maybe XMLParseException
    displayException :: XMLParseException -> String

Convert from SAX

saxToTree :: GenericXMLString tag => [SAXEvent tag text] -> (Node tag text, Maybe XMLParseError)

A lower level function that lazily converts a SAX stream into a tree structure.
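
A small sketch of saxToTree applied to a hand-written event list. It assumes the SAXEvent constructors StartElement, CharacterData and EndElement from Text.XML.Expat.SAX; only the SAXEvent type is imported from that module, since it also exports its own parse function. The names events and fromEvents are illustrative.

```haskell
import Text.XML.Expat.SAX (SAXEvent (..))
import Text.XML.Expat.Tree

-- The event stream for: <top id="1">hello</top>
events :: [SAXEvent String String]
events =
    [ StartElement "top" [("id", "1")]
    , CharacterData "hello"
    , EndElement "top"
    ]

-- Rebuild the tree; the second component reports a parse error, if any.
fromEvents :: (UNode String, Maybe XMLParseError)
fromEvents = saxToTree events
```

In practice the event list would normally come from the SAX parser, possibly after filtering or rewriting events, which is where this lower level entry point becomes useful.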

saxToTreeG :: forall tag text l. (GenericXMLString tag, List l) => l (SAXEvent tag text) -> ItemM l (NodeG l tag text)

A lower level function that converts a generalized SAX stream into a tree structure. Ignores parse errors.

Abstraction of string types

class (Monoid s, Eq s) => GenericXMLString s where

An abstraction for any string type you want to use as XML text (that is, attribute values or element text content). If you want to use a new string type with hexpat, you must make it an instance of GenericXMLString.

Minimal complete definition

gxNullString, gxToString, gxFromString, gxFromChar, gxHead, gxTail, gxBreakOn, gxFromByteString, gxToByteString

Methods

gxNullString :: s -> Bool

gxToString :: s -> String

gxFromString :: String -> s

gxFromChar :: Char -> s

gxHead :: s -> Char

gxTail :: s -> s

gxBreakOn :: Char -> s -> (s, s)

gxFromByteString :: ByteString -> s

gxToByteString :: s -> ByteString

Instances

GenericXMLString String
Methods
    gxNullString :: String -> Bool
    gxToString :: String -> String
    gxFromString :: String -> String
    gxFromChar :: Char -> String
    gxHead :: String -> Char
    gxTail :: String -> String
    gxBreakOn :: Char -> String -> (String, String)
    gxFromByteString :: ByteString -> String
    gxToByteString :: String -> ByteString
GenericXMLString ByteString
Methods
    gxNullString :: ByteString -> Bool
    gxToString :: ByteString -> String
    gxFromString :: String -> ByteString
    gxFromChar :: Char -> ByteString
    gxHead :: ByteString -> Char
    gxTail :: ByteString -> ByteString
    gxBreakOn :: Char -> ByteString -> (ByteString, ByteString)
    gxFromByteString :: ByteString -> ByteString
    gxToByteString :: ByteString -> ByteString
GenericXMLString Text
Methods
    gxNullString :: Text -> Bool
    gxToString :: Text -> String
    gxFromString :: String -> Text
    gxFromChar :: Char -> Text
    gxHead :: Text -> Char
    gxTail :: Text -> Text
    gxBreakOn :: Char -> Text -> (Text, Text)
    gxFromByteString :: ByteString -> Text
    gxToByteString :: Text -> ByteString
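
Because the parse functions are polymorphic in the string type, the same code can produce trees of String, ByteString or Text simply by changing a type annotation. A minimal sketch, with parseAs and rootName as hypothetical helpers:

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.Text as T
import Text.XML.Expat.Tree

-- Strict parse whose string type is chosen by the caller.
parseAs :: GenericXMLString s => B.ByteString -> Either XMLParseError (UNode s)
parseAs = parse' defaultParseOptions

-- The root tag as a String, whatever string type the tree uses.
rootName :: GenericXMLString s => UNode s -> String
rootName = gxToString . getName

-- The same input parsed into two different string types.
asString :: Either XMLParseError (UNode String)
asString = parseAs (B.pack "<top><banana/></top>")

asText :: Either XMLParseError (UNode T.Text)
asText = parseAs (B.pack "<top><banana/></top>")
```

Helpers written against GenericXMLString, such as rootName here, do not need to be duplicated for each string type.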