HaXml
HaXml is a collection of utilities for parsing, filtering,
transforming, and generating XML documents using Haskell. Its basic
facilities include:
- a parser for XML,
- a separate error-correcting parser for HTML,
- an XML validator,
- pretty-printers for XML and HTML.
For processing XML documents, the following components are provided:
- Combinators is a combinator library for generic XML document
processing, including transformation, editing, and generation.
- Haskell2Xml is a replacement class for Haskell's Show/Read
classes: it allows you to read and write ordinary Haskell data as XML
documents. The DrIFT tool (available from
http://repetae.net/~john/computer/haskell/DrIFT/)
can automatically derive this class for you.
- DtdToHaskell is a tool for translating any valid XML DTD
into equivalent Haskell types.
In conjunction with the Xml2Haskell class framework,
this allows you to generate, edit, and transform documents as normal
typed values in programs, and to read and write them as human-readable
XML documents.
- Finally, Xtract is a grep-like tool for XML documents,
loosely based on the XPath and XQL query languages. It can be used
either from the command line or within your own code as part of the
library.
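To make the generic combinator idea concrete, here is a minimal, self-contained Haskell sketch. It does not use HaXml itself: the `Content` type, the `CFilter` synonym, and the combinators `tag`, `attr`, `txt`, `children`, `o`, `with`, `deep`, and `mkElem` are defined locally for illustration, with names chosen to echo the combinator style; consult the Haddock-generated API documentation for HaXml's real types and signatures. The key assumption is that a filter maps one piece of content to a list of results, so an empty list means "no match" and composition is just `concatMap`.

```haskell
-- A self-contained model of generic XML filter combinators.
-- These definitions are illustrative; they are not HaXml's own types.

-- Simplified XML content: an element (name, attributes, children) or text.
data Content
  = Elem String [(String, String)] [Content]
  | Text String
  deriving (Eq, Show)

-- A filter maps one piece of content to zero or more results.
type CFilter = Content -> [Content]

-- Keep only elements with the given tag name.
tag :: String -> CFilter
tag name c@(Elem n _ _) | n == name = [c]
tag _ _ = []

-- Keep only elements that carry the given attribute.
attr :: String -> CFilter
attr name c@(Elem _ attrs _) | name `elem` map fst attrs = [c]
attr _ _ = []

-- Keep only character data.
txt :: CFilter
txt c@(Text _) = [c]
txt _ = []

-- The children of an element; text has none.
children :: CFilter
children (Elem _ _ cs) = cs
children (Text _) = []

-- Composition: apply g, then apply f to every result of g.
o :: CFilter -> CFilter -> CFilter
f `o` g = concatMap f . g

-- Keep those results of f for which g finds at least one match.
with :: CFilter -> CFilter -> CFilter
f `with` g = filter (not . null . g) . f

-- Search the whole tree, returning the topmost matches of f.
deep :: CFilter -> CFilter
deep f c = case f c of
  [] -> concatMap (deep f) (children c)
  rs -> rs

-- Construction: a new element whose children are the filters' results.
mkElem :: String -> [CFilter] -> CFilter
mkElem name fs c = [Elem name [] (concatMap ($ c) fs)]

-- A small sample document.
library :: Content
library =
  Elem "library" []
    [ Elem "book" [("year", "1999")] [Elem "title" [] [Text "Haskell"]]
    , Elem "book" [] [Elem "title" [] [Text "XML"]]
    ]

-- All title text anywhere in the document.
titles :: CFilter
titles = txt `o` children `o` deep (tag "title")

-- Books (direct children of the root) that have a "year" attribute.
datedBooks :: CFilter
datedBooks = (tag "book" `with` attr "year") `o` children
```

Loading this into GHCi and evaluating `titles library` gives `[Text "Haskell",Text "XML"]`, and `mkElem "titles" [titles] library` wraps those results in a new `titles` element. Because every filter has the same type, selection (`tag`, `attr`), navigation (`children`, `deep`), and construction (`mkElem`) all compose with the same operators.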
Detailed documentation of the HaXml APIs
is generated automatically by Haddock directly from the source code.
An introduction to HaXml for people who know more about XML than
about Haskell can be found at
IBM DeveloperWorks.
A paper describing and comparing the generic Combinators with
the typed representation (DtdToHaskell/Xml2Haskell) is available here:
(12 pages of double-column A4)
Some additional info about using the various facilities is here:
Known problems:
- To use -package HaXml interactively with GHCi, you need
at least ghci-5.02.3.
- Haskell2Xml generates Parameter Entity Declarations in the internal
subset of the DTD, which don't conform to the strict well-formedness
conditions of XML. We think the constraint in question is spurious,
and any reasonable XML tool ought to deal adequately with full PEs.
Nevertheless, many standard XML processors reject these auto-generated
DTDs. The solution is easy: just move the DTD into a separate file!
- DtdToHaskell generates the Haskell String type for DTD attributes
that are of Tokenized or Notation types in XML. This is less precise
than it could be: a String admits values that those attribute types reject.
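The attribute-typing limitation is easiest to see on a small example. The following is a hand-written sketch, not actual DtdToHaskell output: the DTD fragment, the `Part` record, and the helper functions `isName`, `isNmtoken`, and `attrsValid` are all hypothetical, and the name-character test is a simplified, ASCII-only approximation of the XML Name and NMTOKEN rules. It shows a String-typed representation together with the check that such a type cannot enforce by itself.

```haskell
-- Hand-written sketch, NOT actual DtdToHaskell output. Hypothetical DTD:
--
--   <!ELEMENT part (#PCDATA)>
--   <!ATTLIST part id    ID      #REQUIRED
--                  units NMTOKEN #IMPLIED>
import Data.Char (isAlpha, isAlphaNum, isAscii)

-- Typed representation: both tokenized attributes become plain Strings.
data Part = Part
  { partId    :: String        -- ID: should be an XML Name, but any String fits
  , partUnits :: Maybe String  -- NMTOKEN, optional because it is #IMPLIED
  , partText  :: String        -- the #PCDATA content
  } deriving (Eq, Show)

-- Simplified, ASCII-only approximation of the XML NameChar production.
isNameChar :: Char -> Bool
isNameChar ch = (isAscii ch && isAlphaNum ch) || ch `elem` "._-:"

-- NMTOKEN: one or more name characters.
isNmtoken :: String -> Bool
isNmtoken s = not (null s) && all isNameChar s

-- Name (the lexical form of an ID): an NMTOKEN whose first character
-- is an ASCII letter, '_' or ':'.
isName :: String -> Bool
isName []        = False
isName s@(c : _) = isNmtoken s && ((isAscii c && isAlpha c) || c `elem` "_:")

-- The check that the String-typed record cannot express on its own.
attrsValid :: Part -> Bool
attrsValid p = isName (partId p) && maybe True isNmtoken (partUnits p)
```

For example, `Part "42" Nothing "nut"` type-checks, yet `attrsValid` returns `False` for it because an ID must be an XML Name and may not start with a digit. A stricter translation could wrap such attributes in newtypes with validating smart constructors, at the cost of more complex generated code.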
Current version:
HaXml-1.13.2, release date 2006.09.08
By HTTP:
.tar.gz,
.zip.
By FTP:
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/
FreeBSD port:
http://freshports.org/textproc/haxml/
Development version:
The development version of HaXml is available separately. See
http://www.cs.york.ac.uk/fp/HaXml-devel/
Older versions:
By FTP:
ftp://ftp.cs.york.ac.uk/pub/haskell/HaXml/
To install HaXml, you must have a Haskell compiler: ghc-5.02
or later, and/or nhc98-1.14/hmake-3.06 or later, and/or
Hugs98 (Sept 2003) or later. Use
./configure [--prefix=...] [--buildwith=...]
make
make install
to configure, build, and install HaXml as a package for your
compiler(s). You need write permission on the library installation
directories of your compiler(s). Afterwards, to gain access to
the HaXml libraries, you only need to add the option -package HaXml
to your compiler command line (no option is required for Hugs).
Various stand-alone tools are also built - DtdToHaskell, Xtract,
Validate, MkOneOf - and copied to the final installation location
specified by the --prefix=... option to configure.
To build/install on a Windows system without the Cygwin shell and
utilities, you can avoid the configure/make steps by simply using the
minimal Build.bat script. Edit it first to set the location
of your compiler, etc.
Graham Klyne has extended the 1.12 version of HaXml significantly, in
particular to ensure that the parser passes a large XML acceptance test
suite, and to deal more correctly with Unicode, namespaces, and parameter
entity expansion. His modifications will eventually be merged
back into the main tree, but in the meantime, you can get his
version here:
http://www.ninebynine.org/Software/HaskellUtils/
The latest stable version (1.13.2) has the following features and fixes:
- Updated to work with ghc-6.6 (removed uses of Data.FiniteMap).
The prior version (1.13.1) has the following features and fixes:
- Bugfix to permit percent character in attribute values.
- Bugfix to parse unquoted attribute values starting with '+' or '#' in HTML.
- Bugfix to keep the original DTD in output of 'processXmlWith'.
Version 1.13 has the following features and fixes:
- Bugfixes to the document validator: no more infinite loops.
- Bugfixes to lexing mixed text and references between quote chars.
- Updated to work with ghc-6.4's new package mechanism.
Complete Changelog
We are interested in hearing your feedback on these XML facilities -
suggestions for improvements, comments, criticisms, bug reports. Please mail
Development of these XML libraries was originally funded by Canon
Research Europe Ltd. Subsequent maintenance and development have
been partially supported by the EPSRC and the University of York.
Licence: The library is Free and Open Source Software,
i.e., the bits we wrote are copyright to us, but freely licensed
for your use, modification, and re-distribution, provided you don't
restrict anyone else's use of it. The HaXml library is distributed
under the GNU Lesser General Public Licence (LGPL) - see file
LICENCE-LGPL for more details. We allow one
special exception to the LGPL - see COPYRIGHT.
The HaXml tools are distributed under the GNU General Public Licence
(GPL) - see LICENCE-GPL. (If you don't
like any of these licensing conditions, please contact us to discuss
your requirements.)
- Joe English has written a more space-efficient parser for XML
in Haskell, called hxml. What is more, it can be used as a simple
drop-in replacement for the HaXml parser!
Available here.
- Uwe Schmidt recently designed another
Haskell XML Toolbox
based on the ideas of HaXml and hxml.
- Some comparisons between functional language approaches to processing
XML can be found in
Bijan Parsia's article on xml.com
- Christian Lindig has written an XML parser in O'Caml:
here.
- Andreas Neumann of the University of Trier has written a
validating XML parser in Standard ML:
here.
- Erik Meijer and Mark Shields have a design for a functional programming
language that treats XML documents as basic data types:
XMLambda.
- Benjamin Pierce and Haruo Hosoya have a different but similar design in
XDuce, which is
also implemented.
- Taking XDuce's approach further is the very cool
CDuce by Véronique Benzaken,
Giuseppe Castagna, and Alain Frisch. The CDuce language does
fully statically typed transformation of XML documents, thus
guaranteeing type-correct output, and what is more, it is also faster
than the untyped XSLT!
- The Xcerpt project uses HaXml
to create another rule-based query and transformation language for XML,
inspired by logic programming, and based on positional selection rather
than navigational selection.
- Ulf Wiger describes an Erlang toolkit for XML:
XMerL
- The Java world has adopted the ideas from DtdToHaskell into
the Java Architecture for XML Binding
(JAXB). JAXB translates
an XML Schema Definition into a set of Java classes, and provides
the runtime machinery (like Xml2Haskell) for reading and
writing objects of those classes to/from XML files.
- There is a comprehensive reading list for XML and web programming in
functional languages here.