| Safe Haskell | None |
|---|---|
| Language | Haskell2010 |
Text.XML.TyDom.Conduit
Contents
- class ToElem a where
- class FromElem a where
- class ToXText a where
- class FromXText a where
- class Conv p q where
- newtype Attr z :: * -> * = Attr {
- unAttr :: z
- newtype Child z :: * -> * = Child {
- unChild :: z
- newtype Content z :: * -> * = Content {
- unContent :: z
- newtype AttrName = AttrName {
- unAttrName :: Name
- newtype ElemName = ElemName {
- unElemName :: Name
- newtype XTextError :: * = XTextError Text
- data Result a
- data Path
- data Cause
- data OptionsElement = OptionsElement {}
- data ReadNodeOrdering :: *
- data ReadLeftovers :: *
- genericToElem :: (Generic z, GToElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> z -> Element
- genericFromElem :: (Generic z, GFromElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> Element -> Result z
- genericConv :: (Generic a, Generic b, GConv (Rep a) (Rep b)) => a -> b
- unAttr :: Attr z -> z
- unChild :: Child z -> z
- unContent :: Content z -> z
- defaultOptionsElement :: OptionsElement
- render :: Element -> Text
- renderFailure :: Result a -> Text
- parse :: Text -> Result Element
- ignoreWSContent :: Element -> Element
- getAttrValue :: FromXText a => AttrName -> Element -> Result a
- xTextErrType :: Text -> Text -> Either XTextError a
Guide
Quick start
xml-tydom is a library for expressing XML representations using Haskell
data types. The serialization to and from XML is done automatically using GHC
Generics and (optionally) some Template Haskell. A good way to illustrate
this is with a quick example.
We start with a Haskell data type that describes the XML structure we want:
{-# LANGUAGE DeriveGeneric #-}
import GHC.Generics (Generic)
data Person = Person
{ id :: Attr Int -- an attribute
, name :: Child Text -- a child element containing text
, comment :: Content Text -- a child text content node
} deriving (Show, Generic)
Then we use GHC Generics to write instances of ToElem and FromElem for
the Person type (you can probably guess what these do):
instanceToElemPerson wheretoElem=genericToElemdefaultOptionsElementinstanceFromElemPerson wherefromElem=genericFromElemdefaultOptionsElement
With these typeclass instances available, we can serialize a value of type
Person to Text containing XML, and also read back the generated Text:
>>> person = Person (Attr42) (Child"Joe") (Content"XML4Joe!") >>> text =render$toElemperson >>> text "<Person id=\"42\"><name>Joe</name>XML4Joe!</Person>" >>> personResult = (parsetext >>=fromElem) ::ResultPersonSuccess(Person { id =Attr{unAttr= 42 } , name =Child{unChild= "Joe" } , comment =Content{unContent= "XML4Joe!" } })
XML text
Textual content in XML documents can appear as either attributes or text content nodes within elements. The conversion of types to and from text is controlled by a pair of typeclasses:
ToXTexta- Converts type
atoText. FromXTexta- Converts
TexttoEither.XTextErrora
It is recommended that instances of these typeclasses should be written manually for most user-defined types.
Available encodings
The following types exist to represent parts of the XML DOM:
{ selectorName =Attra }- A value of type
awill become an attribute of the element, containing the textual representation ofa. The name of the attribute is specified by theselectorName, which must be supplied for the field. { selectorName =Childa }- A value of type
awill become a child element. The name of the child element is specified by theselectorName, which must be supplied for the field. The child element will contain a text node containing the textual representation ofa. { selectorName =Contenta }- A value of type
awill become a text node of the element. TheselectorNameis not used in the encoding to XML, and is optional. { selectorName =a}- Value
awill become a child element. TheselectorNameis optional and is not used in the encoding to XML. There must be an appropriate instance ofToElemand / orFromElemfor the typea.
In addition to these wrappers in their basic form, they can also be combined
with Maybe and lists to create optional and list DOM parts. The following
combinations are supported automatically:
The case of Attr [a] is not supported because there is no obvious
encoding for more than one value of an attribute. Similarly, Content [a]
would be problematic because a list of text content nodes could not be
separated from each other trivially. To encode lists in attributes or text
content, instances of ToXText [a] / FromXText [a] can be supplied
for type a that can handle case-specific encoding.
Newtype aliasing
In addition to the basic encoding types (Attr, Child and Content), it
is possible to alias an entire element using a newtype. An instance for a
newtype created using genericToElem / genericFromElem will use the
encoding for the wrapped type with the name of the newtype constructor.
For example:
data Port = Port { Content Int } deriving (Show, Generic)
newtype InPort = InPort { unInPort :: Port } deriving (Show, Generic)
opt = defaultOptionsElement
instance ToElem Port where toElem = genericToElem opt
instance ToElem InPort where toElem = genericToElem opt
>>> render $ toElem (Port (Content 443))
"<Port>443</Port>"
>>> render $ toElem (InPort (Port (Content 443)))
"<InPort>443</InPort>"
Sum types
The name of an element is always specified by the name of the constructor in
Haskell. Sum types, with multiple constructors, are also supported in a
straightforward way. These can represent cases where one element can be
chosen from a selection of elements (ie. <xsd:choice> in an XML schema).
For example:
data Ref = Id { id :: Attr Int }
| Name { name :: Content Text }
deriving (Show, Generic)
opt = defaultOptionsElement
instance ToElem Ref where toElem = genericToElem opt
instance FromElem Ref where fromElem = genericFromElem opt
>>> text = render $ toElem (Name (Content Martok))
>>> text
"<Name>Martok</Name>"
>>> refResult = (parse text >>= fromElem) :: Result Ref
Success (Name { name = Content { unContent = "Martok" } })
Encoding options
Several options exist for the encoding. These are specified by
OptionsElement, which is passed as an argument to genericToElem and
genericFromElem. The following can be specified:
- Bijections from constructor and selector names to element names and attribute names.
- Whether nodes should be sequential (ie.
<xsd:sequence/>) or can appear in any order when reading. Nodes are always written sequentially. - Whether an error is produced when extra attributes or nodes exist in the XML but not in the Haskell datatype.
The naming bijections are particularly useful because often a particular XML schema will require names that are not directly representable as Haskell constructors or selectors. For example, XML names may start with lowercase characters, or they may require hyphens, or namespaces. In these instances, a function can be provided which must perform a bijective (one-to-one) mapping between the textual representation of an element or attribute name and its required XML name. As a simple example, we may want to drop the first two characters of a selector name:
import Data.Text (Text) import qualified Data.Text as T import qualified Text.XML as XML -- xml-conduit opt =defaultOptionsElement{optAttrName= attrName } attrName = Text ->AttrNameattrName selectorName =AttrName$ XML.Name (T.drop 2 selectorName) Nothing Nothing data Address = Address { adName ::AttrText } deriving (Show, Generic) ---> ^^ - drop these two letters from the attribute name instanceToElemAddress wheretoElem=genericToElemopt >>>render$toElem(Address (Attr"Josephine Citizen")) "<Address Name=\"Josephine Citizen\"/>"
Both AttrName and ElemName are newtype wrappers around XML.Name.
Separating encodings
Specifying the encoding in Haskell is clumsy, because types are littered with
mentions of Attr, Child and Content. From a practical perspective,
these are quite ugly if they appear in the application's data model. They
conflate the concerns of data representation and serialization, which should
be separate.
We can improve this situation by using one type for the application's
own data model and a separate type for the encoding. xml-tydom provides
some Template Haskell support to ease this process. For example:
{-# LANGUAGE TemplateHaskell #-}
import Text.XML.TyDom.Conduit.TH (makeEncoding)
-- Data type for the application (plain; no Attr, Child or Content)
data Address = Address
{ name :: Text
, street :: Text
, city :: Text
, zip :: Int
} deriving (Show, Generic)
-- Data type specifying the encoding. This must have the same form as the
-- application data type, except for mentions of Attr, Child and
-- Content.
data EncAddress = EncAddress
{ encName :: Child Text
, encStreet :: Child Text
, encCity :: Child Text
, encZip :: Child Int
} deriving (Show, Generic)
-- We need to specify both ToElem and FromElem instances for the
-- encoding type (the Template Haskell operation requires both):
instance ToElem EncAddress where
toElem = toElem defaultOptionsElement
instance FromElem EncAddress where
fromElem = fromElem defaultOptionsElement
-- But having done this, we can get Template Haskell to write instances for
-- the application type (Address). Instances are supplied for:
-- - ToElem Address
-- - FromElem Address
-- - Conv Address AddressEnc
-- - Conv AddressEnc Address
$(makeEncoding ''Address ''EncAddress)
If you use this approach, the names of attributes and elements are specified
using the encoding type (EncAddress in the above example), and not
the application data type. Under the hood, to produce XML, the application
data type is first converted to the encoding type (using a Generic
converter), and then the encoding type is converted to XML. The reverse
process is followed to read from XML. Because the encoding (and thus the
OptionsElement) is specified completely by the encoding type, the required
ToElem and FromElem instances for the application type are completely
unambiguous.
Error handling
Reading from XML to a type can fail. The result of reading from XML is the
Result type, which is a disjunction specifying either Success or
Failure. In the event of a Failure, the Path to the failed element from
the document root is recorded, as is a detailed Cause of the failure. If
you want a convenient textual representation of the failure, this can be
achieved with the renderFailure function. For example:
import qualified Data.Text.IO as T (putStr) path =PathItem(ElemName(XML.Name Root Nothing Nothing))PathRootcause =MissingAttribute(AttrName(XML.Name "myAttr" Nothing Nothing)) >>> T.putStr $renderFailure(Failurepath cause) Path: Root Missing attribute [myAttr]
Reading non-sequenced XML
Often, we are faced with reading child elements whose order is not
guaranteed. xml-tydom supports this to the greatest extent that is
feasible. To enable non-sequential reading, optReadChildOrdering must be
set to All in the OptionsElement that is used to generate the FromElem
instance. The handling of different cases can be addressed separately:
- content
- The first text content is accepted.
- optional content
- If no content is present then this becomes
Nothing. - child element
- The first child element which succeeds in
fromElemis accepted. - optional child element
- If no child element succeeds in
fromElemthen this becomesNothing. - list of child elements
- Every child element which succeeds in
fromElembecomes part of the list.
Given these rules, it should become apparent that certain combinations are
not valid for elements that are read as All. For example, while a data
type like the following is OK for Sequence elements, it will fail for All
elements:
-- This will work for aSequenceread, but not anAllread data OnlyOkForSequenced = OnlyOkForSequenced { aWidgets :: [Widgets] , grommit :: Grommit , bWidgets :: [Widgets] } deriving (Show, Generic)
However, similar rules also exist for Sequence reads, although they are
somewhat more obvious:
-- This will FAIL to read! DO NOT ACTUALLY USE IT
-- We can't possibly tell where the list of widget child elements ends, so
-- all of them will be consumed, leaving no remaining widget for the final
-- member of the datatype.
data Bad = Bad
{ aWidgets :: [Widgets]
, widget :: Widget
} deriving (Show, Generic)
Classes
Minimal complete definition
Typeclass for conversion between types p and q.
This typeclass is particularly used for conversion between raw types and
their XML-decorated versions, corresponding to Attr, Child, Content
and CData.
Minimal complete definition
Types
Attribute.
Specifies that a record field of type Attr z should become an XML
attribute. The name of the attribute is specified by the name of the record
selector, while the value is the textual representation of the value of
type z.
Instances
| (KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr z))) | S1 (named) + Attr - record selector for an XML attribute. |
| (KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr (Maybe z)))) | S1 (named) + Attr Maybe - record selector for an optional XML attribute. |
| (KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr z))) | S1 (named) + Attr - record selector for an XML attribute. |
| (KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr (Maybe z)))) | S1 (named) + Attr Maybe - record selector for optional XML attribute. |
| Conv z (Attr z) | |
| Eq z => Eq (Attr z) | |
| Show z => Show (Attr z) | |
| Arbitrary z => Arbitrary (Attr z) | |
| Conv (Attr z) z | |
Child (containing only text).
Specifies that a record field of type Child z should become a child
element of the current element, containing the textual representation of
the value of type z.
Instances
| (KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child z))) | S1 (named) + Child - record selector for a simple child element with text content. |
| (KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child (Maybe z)))) | S1 (named) + Child Maybe - record selector for a simple optional child element with text content. |
| (KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child [z]))) | S1 (named) + [Child] - record selector for a list of child elements with text content. |
| (KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child z))) | S1 (named) + Child - record selector for a simple child element with text. |
| (KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child (Maybe z)))) | S1 (named) + Child Maybe - record selector for an optional simple child element with text. |
| (KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child [z]))) | S1 (named) + [Child] - record selector for a list of simple child elements with text. |
| Conv z (Child z) | |
| Eq z => Eq (Child z) | |
| Show z => Show (Child z) | |
| Arbitrary z => Arbitrary (Child z) | |
| Conv (Child z) z | |
Content node.
Specifies that a record field of type Content z should become a content
node of the current element, containing the textual representation of the
value of type z.
Instances
| FromXText t z => GFromElem e n a t (S1 q (Rec0 (Content z))) | S1 (named or unnamed) + Content - record selector for a content child. |
| FromXText t z => GFromElem e n a t (S1 q (Rec0 (Content (Maybe z)))) | S1 (named or unnamed) + Content Maybe - record selector for an optional content child. |
| ToXText t z => GToElem e n a t (S1 q (Rec0 (Content z))) | S1 (named or unnamed) + Content - record selector for a content node. |
| ToXText t z => GToElem e n a t (S1 q (Rec0 (Content (Maybe z)))) | S1 (named or unnamed) + Content Maybe - record selector for an optional content node. |
| Conv z (Content z) | |
| Eq z => Eq (Content z) | |
| Show z => Show (Content z) | |
| Arbitrary z => Arbitrary (Content z) | |
| Conv (Content z) z | |
Constructors
| AttrName | |
Fields
| |
Constructors
| ElemName | |
Fields
| |
newtype XTextError :: * #
Error which may occur when parsing XML text.
Constructors
| XTextError Text |
Instances
data OptionsElement Source #
Constructors
| OptionsElement | |
Fields | |
data ReadNodeOrdering :: * #
Specifies how child nodes should be treated when reading a type from an element.
Constructors
| Sequence | Child nodes should be read in strict sequence (ie.
|
| All | Child nodes can appear in any order (ie. |
Instances
data ReadLeftovers :: * #
Specifies how any left-over parts of an element should be treated when reading a type from an element.
Constructors
| LeftoversOK | Left-over parts of an element are OK, and not an error. |
| LeftoversError | Left-over parts of an element should produce an error. |
Instances
Generics
genericToElem :: (Generic z, GToElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> z -> Element Source #
Generic producer for ToElem instances.
genericFromElem :: (Generic z, GFromElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> Element -> Result z Source #
Generic producer for FromElem instances.
genericConv :: (Generic a, Generic b, GConv (Rep a) (Rep b)) => a -> b #
Generic producer for a Conv instance.
Functions
renderFailure :: Result a -> Text Source #
ignoreWSContent :: Element -> Element Source #
xTextErrType :: Text -> Text -> Either XTextError a #
Formats an XTextError string and returns it as a Left instance.