Safe Haskell | None |
---|
This module provides both a native Haskell solution for parsing XML documents into a stream of events, and a set of parser combinators for dealing with a stream of events.
As a simple example, if you have the following XML file:
<?xml version="1.0" encoding="utf-8"?> <people> <person age="25">Michael</person> <person age="2">Eliezer</person> </people>
Then this code:
{-# LANGUAGE OverloadedStrings #-} import Control.Monad.Trans.Resource import Data.Conduit (($$)) import Data.Text (Text, unpack) import Text.XML.Stream.Parse data Person = Person Int Text deriving Show parsePerson = tagName "person" (requireAttr "age") $ \age -> do name <- content return $ Person (read $ unpack age) name parsePeople = tagNoAttr "people" $ many parsePerson main = do people <- runResourceT $ parseFile def "people.xml" $$ force "people required" parsePeople print people
will produce:
[Person {age = 25, name = "Michael"},Person {age = 2, name = "Eliezer"}]
Previous versions of this module contained a number of more sophisticated functions written by Aristid Breitkreuz and Dmitry Olshansky. To keep this package simpler, those functions are being moved to a separate package. This note will be updated with the name of the package(s) when available.
- parseBytes :: MonadThrow m => ParseSettings -> Conduit ByteString m Event
- parseBytesPos :: MonadThrow m => ParseSettings -> Conduit ByteString m EventPos
- parseText :: MonadThrow m => ParseSettings -> Conduit Text m EventPos
- detectUtf :: MonadThrow m => Conduit ByteString m Text
- parseFile :: MonadResource m => ParseSettings -> FilePath -> Producer m Event
- parseLBS :: MonadThrow m => ParseSettings -> ByteString -> Producer m Event
- data ParseSettings
- def :: Default a => a
- type DecodeEntities = Text -> Content
- psDecodeEntities :: ParseSettings -> DecodeEntities
- decodeXmlEntities :: DecodeEntities
- decodeHtmlEntities :: DecodeEntities
- tag :: MonadThrow m => (Name -> Maybe a) -> (a -> AttrParser b) -> (b -> ConduitM Event o m c) -> ConduitM Event o m (Maybe c)
- tagPredicate :: MonadThrow m => (Name -> Bool) -> AttrParser a -> (a -> ConduitM Event o m b) -> ConduitM Event o m (Maybe b)
- tagName :: MonadThrow m => Name -> AttrParser a -> (a -> ConduitM Event o m b) -> ConduitM Event o m (Maybe b)
- tagNoAttr :: MonadThrow m => Name -> ConduitM Event o m a -> ConduitM Event o m (Maybe a)
- content :: MonadThrow m => Consumer Event m Text
- contentMaybe :: MonadThrow m => Consumer Event m (Maybe Text)
- data AttrParser a
- requireAttr :: Name -> AttrParser Text
- optionalAttr :: Name -> AttrParser (Maybe Text)
- requireAttrRaw :: String -> ((Name, [Content]) -> Maybe b) -> AttrParser b
- optionalAttrRaw :: ((Name, [Content]) -> Maybe b) -> AttrParser (Maybe b)
- ignoreAttrs :: AttrParser ()
- orE :: Monad m => Consumer Event m (Maybe a) -> Consumer Event m (Maybe a) -> Consumer Event m (Maybe a)
- choose :: Monad m => [Consumer Event m (Maybe a)] -> Consumer Event m (Maybe a)
- many :: Monad m => Consumer Event m (Maybe a) -> Consumer Event m [a]
- force :: MonadThrow m => String -> ConduitM Event o m (Maybe a) -> ConduitM Event o m a
- data XmlException
- = XmlException { }
- | InvalidEndElement Name
- | InvalidEntity Text
- | UnparsedAttributes [(Name, [Content])]
- data PositionRange
- type EventPos = (Maybe PositionRange, Event)
Parsing XML files
parseBytes :: MonadThrow m => ParseSettings -> Conduit ByteString m EventSource
Parses a byte stream into Event
s. This function is implemented fully in
Haskell using attoparsec-text for parsing. The produced error messages do
not give line/column information, so you may prefer to stick with the parser
provided by libxml-enumerator. However, this has the advantage of not
relying on any C libraries.
This relies on detectUtf
to determine character encoding, and parseText
to do the actual parsing.
parseBytesPos :: MonadThrow m => ParseSettings -> Conduit ByteString m EventPosSource
parseText :: MonadThrow m => ParseSettings -> Conduit Text m EventPosSource
Parses a character stream into Event
s. This function is implemented
fully in Haskell using attoparsec-text for parsing. The produced error
messages do not give line/column information, so you may prefer to stick
with the parser provided by libxml-enumerator. However, this has the
advantage of not relying on any C libraries.
detectUtf :: MonadThrow m => Conduit ByteString m TextSource
Automatically determine which UTF variant is being used. This function first checks for BOMs, removing them as necessary, and then check for the equivalent of <?xml for each of UTF-8, UTF-16LEBE, and UTF-32LEBE. It defaults to assuming UTF-8.
parseFile :: MonadResource m => ParseSettings -> FilePath -> Producer m EventSource
A helper function which reads a file from disk using enumFile
, detects
character encoding using detectUtf
, parses the XML using parseBytes
, and
then hands off control to your supplied parser.
parseLBS :: MonadThrow m => ParseSettings -> ByteString -> Producer m EventSource
Parse an event stream from a lazy ByteString
.
Parser settings
type DecodeEntities = Text -> ContentSource
Entity decoding
decodeXmlEntities :: DecodeEntitiesSource
Default implementation of DecodeEntities
: handles numeric entities and
the five standard character entities (lt, gt, amp, quot, apos).
decodeHtmlEntities :: DecodeEntitiesSource
HTML4-compliant entity decoder. Handles numerics, the five standard character entities, and the additional 248 entities defined by HTML 4 and XHTML 1.
Note that HTML 5 introduces a drastically larger number of entities, and this code does not recognize most of them.
Event parsing
tag :: MonadThrow m => (Name -> Maybe a) -> (a -> AttrParser b) -> (b -> ConduitM Event o m c) -> ConduitM Event o m (Maybe c)Source
The most generic way to parse a tag. It takes a predicate for checking if
this is the correct tag name, an AttrParser
for handling attributes, and
then a parser for dealing with content.
This function automatically absorbs its balancing closing tag, and will
throw an exception if not all of the attributes or child elements are
consumed. If you want to allow extra attributes, see ignoreAttrs
.
This function automatically ignores comments, instructions and whitespace.
tagPredicate :: MonadThrow m => (Name -> Bool) -> AttrParser a -> (a -> ConduitM Event o m b) -> ConduitM Event o m (Maybe b)Source
A simplified version of tag
which matches against boolean predicates.
tagName :: MonadThrow m => Name -> AttrParser a -> (a -> ConduitM Event o m b) -> ConduitM Event o m (Maybe b)Source
tagNoAttr :: MonadThrow m => Name -> ConduitM Event o m a -> ConduitM Event o m (Maybe a)Source
A further simplified tag parser, which requires that no attributes exist.
content :: MonadThrow m => Consumer Event m TextSource
Grabs the next piece of content. If none if available, returns empty
.
This is simply a wrapper around contentMaybe
.
contentMaybe :: MonadThrow m => Consumer Event m (Maybe Text)Source
Grabs the next piece of content if available. This function skips over any comments and instructions and concatenates all content until the next start or end tag.
Attribute parsing
data AttrParser a Source
A monad for parsing attributes. By default, it requires you to deal with
all attributes present on an element, and will throw an exception if there
are unhandled attributes. Use the requireAttr
, optionalAttr
et al
functions for handling an attribute, and ignoreAttrs
if you would like to
skip the rest of the attributes on an element.
Alternative
instance behave like First
monoid. It chooses first
parser which doesn't fail.
requireAttr :: Name -> AttrParser TextSource
Require that a certain attribute be present and return its value.
optionalAttr :: Name -> AttrParser (Maybe Text)Source
Return the value for an attribute if present.
requireAttrRaw :: String -> ((Name, [Content]) -> Maybe b) -> AttrParser bSource
optionalAttrRaw :: ((Name, [Content]) -> Maybe b) -> AttrParser (Maybe b)Source
ignoreAttrs :: AttrParser ()Source
Skip the remaining attributes on an element. Since this will clear the
list of attributes, you must call this after any calls to requireAttr
,
optionalAttr
, etc.
Combinators
orE :: Monad m => Consumer Event m (Maybe a) -> Consumer Event m (Maybe a) -> Consumer Event m (Maybe a)Source
many :: Monad m => Consumer Event m (Maybe a) -> Consumer Event m [a]Source
Keep parsing elements as long as the parser returns Just
.
Exceptions
data XmlException Source
Other types
type EventPos = (Maybe PositionRange, Event)Source