HXQ: A Compiler from XQuery to Haskell

Download HXQ-0.12.0.tar.gz

Description

HXQ is a fast and space-efficient translator from XQuery (the standard query language for XML) to embedded Haskell code. The translation is based on Template Haskell. HXQ takes full advantage of Haskell's lazy evaluation to keep in memory only those parts of the XML data needed at each point of evaluation, thus performing stream-based evaluation for forward queries (queries that do not contain backward steps). This results in an implementation that is as fast and space-efficient as any stream-based implementation based on SAX filters or finite state machines. Furthermore, the code is far simpler and more extensible, since it is based on XML trees rather than SAX events.
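The idea of streaming over lazily built trees can be illustrated with a minimal sketch. The Tree type and the child and texts functions below are hypothetical stand-ins, not HXQ's actual XTree type or API. The document is an unbounded list of records, so the query can only terminate if the tree is built on demand, and records that have already been consumed can be garbage collected.

```haskell
-- Hypothetical rose-tree type for illustration; HXQ's real XTree may differ.
data Tree = Elem String [Tree] | Text String

-- Child step: the element children of a node that carry the given tag.
child :: String -> Tree -> [Tree]
child tag (Elem _ ts) = [ t | t@(Elem n _) <- ts, n == tag ]
child _   _           = []

-- text() step: the text nodes directly under a node.
texts :: Tree -> [String]
texts (Elem _ ts) = [ s | Text s <- ts ]
texts _           = []

-- An unbounded document: <db><rec>1</rec><rec>2</rec>...</db>
bigDoc :: Tree
bigDoc = Elem "db" [ Elem "rec" [Text (show i)] | i <- [1 :: Integer ..] ]

-- The forward query /db/rec/text(), of which only three results are demanded.
firstThree :: [String]
firstThree = take 3 (concatMap texts (child "rec" bigDoc))

main :: IO ()
main = print firstThree
```

Because every step is a forward traversal over a lazy list, nothing forces the rest of the document once the three results have been produced.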

For example, the XQuery given below, which runs against the DBLP XML file (420MB), takes 39 seconds on my laptop PC (using 18MB of max heap space). By contrast, Qexo, which compiles XQueries to Java bytecode, took 1 minute 17 seconds (using no less than 1400MB of heap space), and XQilla, which is written in C++, took 1 minute 10 seconds (using 1150MB of heap space). (All measurements were taken on an Intel Core 2 Duo 2.2GHz with 2GB of memory, running ghc-6.8.2 on a Linux 2.6.23 kernel.)

Finally, HXQ can store XML documents in a relational database (currently MySQL or SQLite), by shredding XML into relational tuples, and by translating XQueries over the shredded documents into optimized SQL queries.
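To give a rough idea of what shredding means, here is a minimal sketch that flattens a tree into rows of (node id, parent id, tag, text). The Tree type, the Row layout, and the shred function are assumptions made for illustration only; the relational schema that HXQ actually uses is not described here and is likely different.

```haskell
-- Hypothetical tree type for illustration; not HXQ's own.
data Tree = Elem String [Tree] | Text String

-- One relational tuple per node: (node id, parent id, tag, text).
-- Assumed layout; parent id 0 marks the root.
type Row = (Int, Int, String, String)

-- Number the nodes in document order and emit one row per node.
shred :: Tree -> [Row]
shred t = snd (go 0 1 t)
  where
    -- go parentId nextFreeId tree = (next free id afterwards, rows)
    go :: Int -> Int -> Tree -> (Int, [Row])
    go p i (Text s)    = (i + 1, [(i, p, "#text", s)])
    go p i (Elem n ts) = (next, (i, p, n, "") : rows)
      where
        (next, rows)    = foldl step (i + 1, []) ts
        step (j, acc) c = let (j', rs) = go i j c in (j', acc ++ rs)

sample :: Tree
sample = Elem "paper" [ Elem "title" [Text "HXQ"], Elem "year" [Text "2008"] ]

main :: IO ()
main = mapM_ print (shred sample)
```

With rows of this shape, a child step becomes a join on the parent id column, which is the kind of query an SQL translation could produce.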

Installation Instructions (HXQ without Database Connectivity)

HXQ can be installed on most platforms, but I have only tested it on Linux, Mac OS X, and Windows XP. The simplest installation is without database connectivity (i.e., it can only process XQueries against XML text documents). If you want database connectivity (over MySQL or SQLite relational databases), look at the installation instructions with database connectivity.

First, you need to install the Glasgow Haskell Compiler, ghc. Optionally, if you want to modify the XQuery parser, you also need to install happy, the parser generator for Haskell. Then download HXQ version 0.12.0 and untar it (using tar xfz on Linux or 7z x on Windows). Finally, execute the following commands inside the HXQ directory:

runhaskell Setup.lhs configure
runhaskell Setup.lhs build
runhaskell Setup.lhs install
On Linux, the last command must be run as root. If you get an error during configuration that the readline package is missing, install readline before HXQ. HXQ consists of the executable xquery, which is the XQuery interpreter, and the HXQ library. To use the HXQ library in a Haskell program, simply import Text.XML.HXQ.XQuery.

Installation Instructions for Database Connectivity

Current Status

HXQ supports most essential XQuery features, although some system functions are missing (but are easy to add). To see the list of supported system functions, run xquery -help. HXQ does not have static typechecking; it leaves all checking to Haskell. In addition, the XQuery semantics requires duplicate elimination and sorting by document order for every XPath step, which is very expensive and unnecessary in most cases. This is not currently supported by HXQ but will be addressed in the future (it needs a static analysis to determine when duplicate elimination is necessary). For example, e//*//* may return duplicate elements in HXQ.
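The e//*//* case can be reproduced with a small sketch. The Node type and the descendants function are hypothetical and only model a descendant step applied per input node, with no duplicate elimination; they are not HXQ code. Node ids are assigned in document order so that sorting by id stands in for sorting by document order.

```haskell
import Data.List (nub, sort)

-- Hypothetical node type; ids are assigned in document order.
data Node = Node { nodeId :: Int, tag :: String, kids :: [Node] }

-- The //* step for one node: all proper descendants, in document order.
descendants :: Node -> [Node]
descendants n = concat [ k : descendants k | k <- kids n ]

-- <a><b><c><d/></c></b></a>
doc :: Node
doc = Node 1 "a" [Node 2 "b" [Node 3 "c" [Node 4 "d" []]]]

-- doc//*//* evaluated step by step without duplicate elimination.
naive :: [Int]
naive = map nodeId (concatMap descendants (descendants doc))

-- What the XQuery semantics asks for: distinct nodes in document order.
deduped :: [Int]
deduped = nub (sort naive)

main :: IO ()
main = do
  print naive    -- [3,4,4]
  print deduped  -- [3,4]
```

Node 4 is reached both through node 2 and through node 3, which is why it appears twice in the naive result. The sort and nub pass needs the whole result, which is what makes it expensive for a streaming evaluator.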

HXQ uses the HXML parser for XML (developed by Joe English), which is included in the source. I have also tried hexpat, tagsoup, HXT, and HaXML Xtract, but they all have space leaks.

HXQ has two parsers: one that generates simple rose trees from XML documents, which can be processed by forward queries without space leaks, and another where each tree node has a reference to its parent. Some, but not all, backward axis steps (such as the parent axis /..) are removed from a query using optimization rules. If there are backward axis steps left in the query, then HXQ uses the latter parser, which may result in a performance penalty due to space leaks.
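A tree whose nodes refer to their parents can be sketched as follows. The Rose and PNode types and the annotate function are hypothetical and are not taken from HXQ; they only show one way to build such a structure in Haskell (by tying the knot) and why it works against streaming.

```haskell
import Data.Maybe (maybeToList)

-- Plain rose tree, as a forward-only parser might produce (hypothetical type).
data Rose = Rose String [Rose]

-- Tree in which every node also refers to its parent (hypothetical type).
data PNode = PNode
  { name     :: String
  , parent   :: Maybe PNode
  , children :: [PNode]
  }

-- Build the parent-linked tree lazily: each node is passed to its own
-- children as their parent.
annotate :: Maybe PNode -> Rose -> PNode
annotate p (Rose n ts) = self
  where
    self = PNode n p (map (annotate (Just self)) ts)

-- The parent axis step (/..) for one node.
parentStep :: PNode -> [PNode]
parentStep = maybeToList . parent

sample :: PNode
sample = annotate Nothing (Rose "a" [Rose "b" [], Rose "c" []])

-- a/*/.. evaluated without duplicate elimination.
parentsOfChildren :: [String]
parentsOfChildren = map name (concatMap parentStep (children sample))

main :: IO ()
main = print parentsOfChildren  -- ["a","a"]
```

In this representation any node that is still reachable keeps its parent alive, and through the parent all siblings and ancestors, so the parts of the document already visited cannot be garbage collected. That is the kind of space leak the forward-only rose tree avoids.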

XQuery Documentation

The complete XQuery syntax in HXQ is described in hxq-manual.pdf. Here are some tutorials on XPath and XQuery. Here is a nice course on XML and XQuery.

Using the Compiler

The main functions for embedding XQueries in Haskell are xq and xe, each applied to a query inside a Template Haskell splice such as $(xq query), where query is a string value (a Haskell expression that evaluates to a string at compile time). They both translate the query into Haskell code, which is compiled and optimized into machine code directly. The code that xe generates has type XSeq (a sequence of XML trees of type [XTree]), while the code that xq generates has type (IO XSeq). If the query reads a document (using doc(...)) or calls an external function, then you should use xq, since it requires IO. You can use the value of a Haskell variable v inside a query using $v, as long as v has type XSeq. A function called from a query must be defined in Haskell with type (XSeq,...,XSeq) -> IO XSeq.

Here is an example of a main program:

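{-# LANGUAGE TemplateHaskell #-}
module Main where

import Text.XML.HXQ.XQuery

-- f can be called from a query: it has type (XSeq,XSeq) -> IO XSeq
f(x,y) = $(xq "<article><first>{$x}</first><second>{$y}</second></article>")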

main = do a <- $(xq ("<result>{                                                        "
                 ++"     for $x at $i in doc('data/dblp.xml')//inproceedings           "
                 ++"     where $x/author = 'Leonidas Fegaras'                          "
                 ++"     order by $x/year descending                                   "
                 ++"     return <paper>{ $i, ') ', $x/booktitle/text(),                "
                 ++"                     ': ', $x/title/text()                         "
                 ++"            }</paper>                                              "
                 ++"  }</result>                                                       "))
          putXSeq a
          b <- $(xq " f( $a/paper[10], $a/paper[8] ) ")
          putXSeq b
Another example can be found in Test1.hs. You compile it using ghc --make Test1.hs -o a.out.

You can compile an XQuery file into a Haskell program (Temp.hs) using xquery -c file. Or better, you can use the script compile (on either Unix or Windows) to compile the XQuery file to an executable. For example:

compile data/q1.xq
will compile the XQuery file data/q1.xq into the executable a.out.

Using the Interpreter

The HXQ interpreter is far slower than the compiler; use it only if you need to evaluate ad-hoc XQueries read from input or from files. The main functions are:

The HXQ interpreter doesn't recognize Haskell variables and functions (but you may declare XQuery variables and functions using the XQuery 'declare' syntax). The main HXQ program, called xquery, evaluates an XQuery in a file using the interpreter. For example:
xquery data/q1.xq
Without an argument, it reads and evaluates XQueries and variable/function declarations from input. With xquery -p xpath-query xml-file you evaluate an XPath query against an XML file, e.g., xquery -p "//inproceedings[100]" data/dblp.xml. With xquery -help you get the list of system functions and usage information.


Last modified: 12/06/08 by Leonidas Fegaras