Safe Haskell | None |
---|---|
Language | Haskell2010 |
- class ToElem a where
- class FromElem a where
- class ToXText a where
- class FromXText a where
- class Conv p q where
- newtype Attr z :: * -> * = Attr {
- unAttr :: z
- newtype Child z :: * -> * = Child {
- unChild :: z
- newtype Content z :: * -> * = Content {
- unContent :: z
- newtype AttrName = AttrName {
- unAttrName :: Name
- newtype ElemName = ElemName {
- unElemName :: Name
- newtype XTextError :: * = XTextError Text
- data Result a
- data Path
- data Cause
- data OptionsElement = OptionsElement {}
- data ReadNodeOrdering :: *
- data ReadLeftovers :: *
- genericToElem :: (Generic z, GToElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> z -> Element
- genericFromElem :: (Generic z, GFromElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> Element -> Result z
- genericConv :: (Generic a, Generic b, GConv (Rep a) (Rep b)) => a -> b
- unAttr :: Attr z -> z
- unChild :: Child z -> z
- unContent :: Content z -> z
- defaultOptionsElement :: OptionsElement
- render :: Element -> Text
- renderFailure :: Result a -> Text
- parse :: Text -> Result Element
- ignoreWSContent :: Element -> Element
- getAttrValue :: FromXText a => AttrName -> Element -> Result a
- xTextErrType :: Text -> Text -> Either XTextError a
Guide
Quick start
xml-tydom
is a library for expressing XML representations using Haskell
data types. The serialization to and from XML is done automatically using GHC
Generics and (optionally) some Template Haskell. A good way to illustrate
this is with a quick example.
We start with a Haskell data type that describes the XML structure we want:
{-# LANGUAGE DeriveGeneric #-} import GHC.Generics (Generic) data Person = Person { id ::Attr
Int -- an attribute , name ::Child
Text -- a child element containing text , comment ::Content
Text -- a child text content node } deriving (Show, Generic)
Then we use GHC Generics to write instances of ToElem
and FromElem
for
the Person
type (you can probably guess what these do):
instanceToElem
Person wheretoElem
=genericToElem
defaultOptionsElement
instanceFromElem
Person wherefromElem
=genericFromElem
defaultOptionsElement
With these typeclass instances available, we can serialize a value of type
Person
to Text
containing XML, and also read back the generated Text
:
>>> person = Person (Attr
42) (Child
"Joe") (Content
"XML4Joe!") >>> text =render
$toElem
person >>> text "<Person id=\"42\"><name>Joe</name>XML4Joe!</Person>" >>> personResult = (parse
text >>=fromElem
) ::Result
PersonSuccess
(Person { id =Attr
{unAttr
= 42 } , name =Child
{unChild
= "Joe" } , comment =Content
{unContent
= "XML4Joe!" } })
XML text
Textual content in XML documents can appear as either attributes or text content nodes within elements. The conversion of types to and from text is controlled by a pair of typeclasses:
ToXText
a
- Converts type
a
toText
. FromXText
a
- Converts
Text
toEither
.XTextError
a
It is recommended that instances of these typeclasses should be written manually for most user-defined types.
Available encodings
The following types exist to represent parts of the XML DOM:
{ selectorName =
Attr
a }
- A value of type
a
will become an attribute of the element, containing the textual representation ofa
. The name of the attribute is specified by theselectorName
, which must be supplied for the field. { selectorName =
Child
a }
- A value of type
a
will become a child element. The name of the child element is specified by theselectorName
, which must be supplied for the field. The child element will contain a text node containing the textual representation ofa
. { selectorName =
Content
a }
- A value of type
a
will become a text node of the element. TheselectorName
is not used in the encoding to XML, and is optional. { selectorName =
a
}
- Value
a
will become a child element. TheselectorName
is optional and is not used in the encoding to XML. There must be an appropriate instance ofToElem
and / orFromElem
for the typea
.
In addition to these wrappers in their basic form, they can also be combined
with Maybe
and lists to create optional and list DOM parts. The following
combinations are supported automatically:
The case of Attr
[a]
is not supported because there is no obvious
encoding for more than one value of an attribute. Similarly, Content
[a]
would be problematic because a list of text content nodes could not be
separated from each other trivially. To encode lists in attributes or text
content, instances of ToXText
[a]
/ FromXText
[a]
can be supplied
for type a
that can handle case-specific encoding.
Newtype aliasing
In addition to the basic encoding types (Attr
, Child
and Content
), it
is possible to alias an entire element using a newtype
. An instance for a
newtype
created using genericToElem
/ genericFromElem
will use the
encoding for the wrapped type with the name of the newtype
constructor.
For example:
data Port = Port { Content Int } deriving (Show, Generic) newtype InPort = InPort { unInPort :: Port } deriving (Show, Generic) opt =defaultOptionsElement
instanceToElem
Port wheretoElem
=genericToElem
opt instanceToElem
InPort wheretoElem
=genericToElem
opt >>> render $ toElem (Port (Content 443)) "<Port>443</Port>" >>> render $ toElem (InPort (Port (Content 443))) "<InPort>443</InPort>"
Sum types
The name of an element is always specified by the name of the constructor in
Haskell. Sum types, with multiple constructors, are also supported in a
straightforward way. These can represent cases where one element can be
chosen from a selection of elements (ie. <xsd:choice>
in an XML schema).
For example:
data Ref = Id { id ::Attr
Int } | Name { name ::Content
Text } deriving (Show, Generic) opt =defaultOptionsElement
instanceToElem
Ref wheretoElem
=genericToElem
opt instanceFromElem
Ref wherefromElem
=genericFromElem
opt >>> text =render
$toElem
(Name (Content
Martok)) >>> text "<Name>Martok</Name>" >>> refResult = (parse
text >>=fromElem
) ::Result
RefSuccess
(Name { name =Content
{unContent
= "Martok" } })
Encoding options
Several options exist for the encoding. These are specified by
OptionsElement
, which is passed as an argument to genericToElem
and
genericFromElem
. The following can be specified:
- Bijections from constructor and selector names to element names and attribute names.
- Whether nodes should be sequential (ie.
<xsd:sequence/>
) or can appear in any order when reading. Nodes are always written sequentially. - Whether an error is produced when extra attributes or nodes exist in the XML but not in the Haskell datatype.
The naming bijections are particularly useful because often a particular XML schema will require names that are not directly representable as Haskell constructors or selectors. For example, XML names may start with lowercase characters, or they may require hyphens, or namespaces. In these instances, a function can be provided which must perform a bijective (one-to-one) mapping between the textual representation of an element or attribute name and its required XML name. As a simple example, we may want to drop the first two characters of a selector name:
import Data.Text (Text) import qualified Data.Text as T import qualified Text.XML as XML -- xml-conduit opt =defaultOptionsElement
{optAttrName
= attrName } attrName = Text ->AttrName
attrName selectorName =AttrName
$ XML.Name (T.drop 2 selectorName) Nothing Nothing data Address = Address { adName ::Attr
Text } deriving (Show, Generic) ---> ^^ - drop these two letters from the attribute name instanceToElem
Address wheretoElem
=genericToElem
opt >>>render
$toElem
(Address (Attr
"Josephine Citizen")) "<Address Name=\"Josephine Citizen\"/>"
Both AttrName
and ElemName
are newtype
wrappers around XML.Name
.
Separating encodings
Specifying the encoding in Haskell is clumsy, because types are littered with
mentions of Attr
, Child
and Content
. From a practical perspective,
these are quite ugly if they appear in the application's data model. They
conflate the concerns of data representation and serialization, which should
be separate.
We can improve this situation by using one type for the application's
own data model and a separate type for the encoding. xml-tydom
provides
some Template Haskell support to ease this process. For example:
{-# LANGUAGE TemplateHaskell #-} import Text.XML.TyDom.Conduit.TH (makeEncoding
) -- Data type for the application (plain; noAttr
,Child
orContent
) data Address = Address { name :: Text , street :: Text , city :: Text , zip :: Int } deriving (Show, Generic) -- Data type specifying the encoding. This must have the same form as the -- application data type, except for mentions ofAttr
,Child
and --Content
. data EncAddress = EncAddress { encName ::Child
Text , encStreet ::Child
Text , encCity ::Child
Text , encZip ::Child
Int } deriving (Show, Generic) -- We need to specify bothToElem
andFromElem
instances for the -- encoding type (the Template Haskell operation requires both): instanceToElem
EncAddress wheretoElem
=toElem
defaultOptionsElement
instanceFromElem
EncAddress wherefromElem
=fromElem
defaultOptionsElement
-- But having done this, we can get Template Haskell to write instances for -- the application type (Address). Instances are supplied for: -- -ToElem
Address -- -FromElem
Address -- -Conv
Address AddressEnc -- -Conv
AddressEnc Address $(makeEncoding
''Address ''EncAddress)
If you use this approach, the names of attributes and elements are specified
using the encoding type (EncAddress
in the above example), and not
the application data type. Under the hood, to produce XML, the application
data type is first converted to the encoding type (using a Generic
converter), and then the encoding type is converted to XML. The reverse
process is followed to read from XML. Because the encoding (and thus the
OptionsElement
) is specified completely by the encoding type, the required
ToElem
and FromElem
instances for the application type are completely
unambiguous.
Error handling
Reading from XML to a type can fail. The result of reading from XML is the
Result
type, which is a disjunction specifying either Success
or
Failure
. In the event of a Failure
, the Path
to the failed element from
the document root is recorded, as is a detailed Cause
of the failure. If
you want a convenient textual representation of the failure, this can be
achieved with the renderFailure
function. For example:
import qualified Data.Text.IO as T (putStr) path =PathItem
(ElemName
(XML.Name Root Nothing Nothing))PathRoot
cause =MissingAttribute
(AttrName
(XML.Name "myAttr" Nothing Nothing)) >>> T.putStr $renderFailure
(Failure
path cause) Path: Root Missing attribute [myAttr]
Reading non-sequenced XML
Often, we are faced with reading child elements whose order is not
guaranteed. xml-tydom
supports this to the greatest extent that is
feasible. To enable non-sequential reading, optReadChildOrdering
must be
set to All
in the OptionsElement
that is used to generate the FromElem
instance. The handling of different cases can be addressed separately:
- content
- The first text content is accepted.
- optional content
- If no content is present then this becomes
Nothing
. - child element
- The first child element which succeeds in
fromElem
is accepted. - optional child element
- If no child element succeeds in
fromElem
then this becomesNothing
. - list of child elements
- Every child element which succeeds in
fromElem
becomes part of the list.
Given these rules, it should become apparent that certain combinations are
not valid for elements that are read as All
. For example, while a data
type like the following is OK for Sequence
elements, it will fail for All
elements:
-- This will work for aSequence
read, but not anAll
read data OnlyOkForSequenced = OnlyOkForSequenced { aWidgets :: [Widgets] , grommit :: Grommit , bWidgets :: [Widgets] } deriving (Show, Generic)
However, similar rules also exist for Sequence
reads, although they are
somewhat more obvious:
-- This will FAIL to read! DO NOT ACTUALLY USE IT -- We can't possibly tell where the list of widget child elements ends, so -- all of them will be consumed, leaving no remaining widget for the final -- member of the datatype. data Bad = Bad { aWidgets :: [Widgets] , widget :: Widget } deriving (Show, Generic)
Classes
Types
Attribute.
Specifies that a record field of type Attr z
should become an XML
attribute. The name of the attribute is specified by the name of the record
selector, while the value is the textual representation of the value of
type z
.
(KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr z))) | S1 (named) + Attr - record selector for an XML attribute. |
(KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr (Maybe z)))) | S1 (named) + Attr Maybe - record selector for an optional XML attribute. |
(KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr z))) | S1 (named) + Attr - record selector for an XML attribute. |
(KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Attr (Maybe z)))) | S1 (named) + Attr Maybe - record selector for optional XML attribute. |
Conv z (Attr z) | |
Eq z => Eq (Attr z) | |
Show z => Show (Attr z) | |
Arbitrary z => Arbitrary (Attr z) | |
Conv (Attr z) z | |
Child (containing only text).
Specifies that a record field of type Child z
should become a child
element of the current element, containing the textual representation of
the value of type z
.
(KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child z))) | S1 (named) + Child - record selector for a simple child element with text content. |
(KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child (Maybe z)))) | S1 (named) + Child Maybe - record selector for a simple optional child element with text content. |
(KnownSymbol name, FromXText t z) => GFromElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child [z]))) | S1 (named) + [Child] - record selector for a list of child elements with text content. |
(KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child z))) | S1 (named) + Child - record selector for a simple child element with text. |
(KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child (Maybe z)))) | S1 (named) + Child Maybe - record selector for an optional simple child element with text. |
(KnownSymbol name, ToXText t z) => GToElem e n a t (S1 (MetaSel (Just Symbol name) g h i) (Rec0 (Child [z]))) | S1 (named) + [Child] - record selector for a list of simple child elements with text. |
Conv z (Child z) | |
Eq z => Eq (Child z) | |
Show z => Show (Child z) | |
Arbitrary z => Arbitrary (Child z) | |
Conv (Child z) z | |
Content node.
Specifies that a record field of type Content z
should become a content
node of the current element, containing the textual representation of the
value of type z
.
FromXText t z => GFromElem e n a t (S1 q (Rec0 (Content z))) | S1 (named or unnamed) + Content - record selector for a content child. |
FromXText t z => GFromElem e n a t (S1 q (Rec0 (Content (Maybe z)))) | S1 (named or unnamed) + Content Maybe - record selector for an optional content child. |
ToXText t z => GToElem e n a t (S1 q (Rec0 (Content z))) | S1 (named or unnamed) + Content - record selector for a content node. |
ToXText t z => GToElem e n a t (S1 q (Rec0 (Content (Maybe z)))) | S1 (named or unnamed) + Content Maybe - record selector for an optional content node. |
Conv z (Content z) | |
Eq z => Eq (Content z) | |
Show z => Show (Content z) | |
Arbitrary z => Arbitrary (Content z) | |
Conv (Content z) z | |
newtype XTextError :: * #
Error which may occur when parsing XML text.
data OptionsElement Source #
data ReadNodeOrdering :: * #
Specifies how child nodes should be treated when reading a type from an element.
data ReadLeftovers :: * #
Specifies how any left-over parts of an element should be treated when reading a type from an element.
LeftoversOK | Left-over parts of an element are OK, and not an error. |
LeftoversError | Left-over parts of an element should produce an error. |
Generics
genericToElem :: (Generic z, GToElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> z -> Element Source #
Generic producer for ToElem
instances.
genericFromElem :: (Generic z, GFromElem Element ElemName AttrName Text (Rep z)) => OptionsElement -> Element -> Result z Source #
Generic producer for FromElem
instances.
genericConv :: (Generic a, Generic b, GConv (Rep a) (Rep b)) => a -> b #
Generic producer for a Conv
instance.
Functions
renderFailure :: Result a -> Text Source #
ignoreWSContent :: Element -> Element Source #
xTextErrType :: Text -> Text -> Either XTextError a #
Formats an XTextError
string and returns it as a Left
instance.