HXQ: A Compiler from XQuery to Haskell

Description

HXQ is a fast and space-efficient translator from XQuery (the standard query language for XML) to embedded Haskell code. The translation is based on Haskell templates. HXQ takes full advantage of Haskell's lazy evaluation to keep in memory only those parts of XML data needed at each point of evaluation, thus performing stream-based evaluation for forward queries (queries that do not contain backward steps). This results to an implementation that is as fast and space-efficient as any stream-based implementation based on SAX filters or finite state machines. Furthermore, the coding is far simpler and extensible since its based on XML trees, rather than SAX events.

For example, the XQuery given at the bottom of this page, which is against the DBLP XML database (420MB), runs in 39 seconds on my laptop (using 64MB of heap space). To contrast this, Qexo, which compiles XQueries to Java bytecode, took 1 minute 17 seconds (using 2048MB of heap space).

HXQ uses the HXML parser for XML (developed by Joe English), which is included in the source. I have also tried hexpat, but I found it too slow for large documents. I haven't tried HaXML yet.

Installation Instructions

First, you need to install the Glasgow Haskell Compiler, ghc, and the parser generator for Haskell, happy. For example, in Fedora Linux, you install both using:

yum install ghc happy

Then download HXQ and untar it. You can either use cabal or make to build it. Then run xquery to test drive it.

Current Status

HXQ supports most essential XQuery features, although some system functions are missing (but are easy to add). To see the list of supported system functions, run xquery -help . HXQ does not have static typechecking; it leaves all checking to Haskell. This means that it distinguishes regular predicates from indexing at run time: if an XPath predicate returns an integer at run time, it is taken as indexing and this index is checked against the current node position. The most important omission is backward step axes, such as /.. (parent). Some, but not all, parent axis steps are removed using optimization rules; all others cause a compilation error.

Using the Compiler

The main functions for embedding XQueries in Haskell are:

$(xe query) :: XSeq
$(xq query) :: IO XSeq

where query is a string value (a Haskell expression that evaluates to a string at compile-time), They both translate the query into Haskell code, which is compiled and optimized into machine code directly. The code that xe generates has type XSeq (a sequence of XML trees of type [XTree]) while the code that xq generates has type (IO XSeq). If the query reads at least one document (using doc(...)), then you should use xq since it requires IO. To define constant XML data or a function body, it is better to use xe. You can use the value of a Haskell variable v inside a query using $v as long as v has type XSeq. To use a function in a query, it should be defined in Haskell with type (XSeq,...,XSeq) -> XSeq.

Here is an example of a main program:

f(x,y) = $(xe "<article><first>{$x}</first><second>{$y}</second></article>")

main = do a <- $(xq ("<result>{                                                        "
                 ++"     for $x at $i in doc('data/dblp.xml')//inproceedings           "
                 ++"     where $x/author = 'Leonidas Fegaras'                          "
                 ++"     order by $x/year descending                                   "
                 ++"     return <paper>{ $i, ') ', $x/booktitle/text(),                "
                 ++"                     ': ', $x/title/text()                         "
                 ++"            }</paper>                                              "
                 ++"  }</result>                                                       "))
          putStrLn (show a)
          b <- $(xq " f( $a/paper[10], $a/paper[8] ) ")
          putStrLn (show b)

Another example, can be found in Test1.hs.

You can compile an XQuery file into a Haskell program (Temp.hs) using xquery -c file. Or better, you can use the Unix shell script compile to compile the XQuery file to an executable. For example:

compile data/q1.xq

will compile the XQuery file data/q1.xq into a.out.

Using the Interpreter

The HXQ interpreter is far more slower than the compiler; use it only if you need to evaluate ad-hoc XQueries read from input or from files. The main functions are:

xquery :: String -> IO XSeq -- Evaluates an XQuery in a string
xfile :: String -> IO XSeq -- Evaluates an XQuery in a file

The HXQ interpreter doesn't recognize Haskell variables and functions (but you may declare XQuery variables and functions using the XQuery 'declare' syntax). The main HXQ program, called xquery, evaluates an XQuery in a file using the interpreter. For example:

xquery data/q1.xq

Without an argument, it reads the XQuery from input. With xquery -help you get the list of system functions.

Extending HXQ

If you want to extend HXQ, you need to read about Template Haskell. Use make ghci to do various translation tests in ghci. You can look at the Abstract Syntax Trees and the generated code from a query by evaluating cq query.

Last modified: 04/11/08 by Leonidas Fegaras