HXQ: A Compiler from XQuery to Haskell

Download HXQ-0.9.0.tar.gz

Description

HXQ is a fast and space-efficient translator from XQuery (the standard query language for XML) to embedded Haskell code. The translation is based on Template Haskell. HXQ takes full advantage of Haskell's lazy evaluation to keep in memory only those parts of the XML data needed at each point of evaluation, thus performing stream-based evaluation for forward queries (queries that do not contain backward steps). This results in an implementation that is as fast and space-efficient as any stream-based implementation based on SAX filters or finite state machines. Furthermore, the code is far simpler and more extensible, since it is based on XML trees rather than SAX events.
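The role of lazy evaluation in a forward query can be illustrated with a minimal sketch. The Tree type and the descendants function below are hypothetical stand-ins and are not HXQ's own XTree or its generated code; the point is only that a descendant step written over ordinary trees produces its results on demand, so a consumer that stops early never forces the rest of the document.

```haskell
-- A hypothetical rose tree for XML (an assumption; not HXQ's XTree type).
data Tree = Elem String [Tree] | Text String

-- All descendant elements with the given tag, in document order.
-- The result list is built lazily, one matching element at a time.
descendants :: String -> Tree -> [Tree]
descendants tag t = case t of
    Text _      -> []
    Elem _ kids -> concatMap visit kids
  where
    visit k@(Elem name _) | name == tag = k : descendants tag k
    visit k = descendants tag k

-- Concatenated text content of a tree.
textOf :: Tree -> String
textOf (Text s)      = s
textOf (Elem _ kids) = concatMap textOf kids

-- A document with an unbounded number of <paper> children,
-- standing in for a very large input file.
bigDoc :: Tree
bigDoc = Elem "dblp"
  [ Elem "paper" [Elem "title" [Text ("Paper " ++ show i)]]
  | i <- [1 :: Integer ..] ]

main :: IO ()
main = mapM_ (putStrLn . textOf) (take 3 (descendants "title" bigDoc))
```

Running main prints the first three titles and terminates, even though bigDoc has no last child, because only the demanded prefix of the tree is ever constructed.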

For example, the XQuery given below, which runs against the DBLP XML database (420MB), takes 39 seconds on my laptop PC (using at most 18MB of heap space). By contrast, Qexo, which compiles XQueries to Java bytecode, took 1 minute 17 seconds (using no less than 1400MB of heap space), and XQilla, which is written in C++, took 1 minute 10 seconds (using 1150MB of heap space). (All measurements were taken on an Intel Core 2 Duo 2.2GHz with 2GB of memory, running ghc-6.8.2 on a Linux 2.6.23 kernel.)

Finally, HXQ can store XML documents in a relational database (currently SQLite), by shredding XML into relational tuples, and by translating XQueries over the shredded documents into optimized SQL queries.

Installation Instructions

HXQ can be installed on most platforms, but I have only tested it on Linux and Windows XP. The simplest installation is without database connectivity (i.e., it can only process XQueries against XML text documents).

First, you need to install the Glasgow Haskell Compiler, ghc. Optionally, if you want to modify the XQuery parser, you need to install the parser generator for Haskell, happy. Then, download HXQ version 0.9.0 and untar it (using tar xfz on Linux or 7z x on Windows). Then configure HXQ either without or with database connectivity:

Without database connectivity:

To configure HXQ, do:

runhaskell Setup.lhs configure

With database connectivity:

For database connectivity, you need to install SQLite. (On Linux, you can install it using yum install sqlite.) Then you need to install the Haskell packages: HDBC 1.1.4 (but not version 1.1.5) and the HDBC-sqlite3 1.1.4 driver to connect to SQLite relational databases. Then you configure HXQ:

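runhaskell Setup.lhs configure -fdb

In either case, then build and install HXQ:

runhaskell Setup.lhs build
runhaskell Setup.lhs install

This installs the xquery executable. To use HXQ in a Haskell program, import Text.XML.HXQ.XQuery.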

Current Status

HXQ supports most essential XQuery features, although some system functions are missing (but are easy to add). To see the list of supported system functions, run xquery -help. HXQ does not have static typechecking; it leaves all checking to Haskell. In addition, the XQuery semantics requires duplicate elimination and sorting by document order for every XPath step, which is very expensive and unnecessary in most cases. HXQ does not currently do this, but it will be addressed in the future (it needs a static analysis to determine when duplicate elimination is necessary). For example, e//*//* may return duplicate elements in HXQ.
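To see why e//*//* can produce duplicates when no duplicate elimination is applied after each step, consider the following small sketch. The Node type, with an explicit integer identity, is an assumption made for this illustration and is not how HXQ represents nodes.

```haskell
import Data.List (nub)

-- Hypothetical node type with an explicit identity (an assumption for
-- this sketch), so that two occurrences of the same node compare equal.
data Node = Node { nodeId :: Int, tag :: String, children :: [Node] }

instance Eq Node where
  a == b = nodeId a == nodeId b

-- The step //* : all proper descendants of a node, in document order.
descendantStep :: Node -> [Node]
descendantStep n = concatMap (\c -> c : descendantStep c) (children n)

-- e//*//* without duplicate elimination: the second step is applied to
-- every node returned by the first step and the results are concatenated.
twoSteps :: Node -> [Node]
twoSteps e = concatMap descendantStep (descendantStep e)

-- The document <a><b><c><d/></c></b></a>
doc :: Node
doc = Node 1 "a" [Node 2 "b" [Node 3 "c" [Node 4 "d" []]]]

main :: IO ()
main = do
  print (map tag (twoSteps doc))        -- ["c","d","d"]
  print (map tag (nub (twoSteps doc)))  -- ["c","d"]
```

The element d is reached once as a descendant of b and once as a descendant of c. Here nub removes the repeat, but it is quadratic in the length of the list and does not by itself restore document order in general, which is part of why doing this after every step is expensive.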

HXQ uses the HXML parser for XML (developed by Joe English), which is included in the source. I have also tried hexpat, tagsoup, HXT, and HaXML Xtract, but they all have space leaks.

HXQ has two parsers: one that generates simple rose trees from XML documents, which can be processed by forward queries without space leaks, and another where each tree node has a reference to its parent. Some, but not all, backward axis steps (such as the parent axis /..) are removed from a query using optimization rules. If any backward axis steps are left in the query, then HXQ uses the latter parser, which may result in a performance penalty due to space leaks.
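The difference between the two tree representations can be sketched as follows. Both types and the link function are hypothetical and are not taken from the HXQ source; they only illustrate what a node with a reference to its parent looks like in Haskell.

```haskell
import Data.Maybe (listToMaybe)

-- Plain rose tree (hypothetical; stands in for the output of the first parser).
data Tree = Tree String [Tree]

-- Tree whose nodes also point to their parent (Nothing at the root).
data PTree = PTree { label :: String, parent :: Maybe PTree, kids :: [PTree] }

-- Build the parent-linked tree lazily: each node is handed to its own children.
link :: Maybe PTree -> Tree -> PTree
link p (Tree l cs) = node
  where node = PTree l p (map (link (Just node)) cs)

-- With parent links, the parent axis (/..) is a field lookup.
parentLabel :: PTree -> Maybe String
parentLabel = fmap label . parent

firstChild :: PTree -> Maybe PTree
firstChild = listToMaybe . kids

main :: IO ()
main = do
  let root = link Nothing (Tree "a" [Tree "b" [Tree "c" []]])
  -- Navigate a/b/c and then step back to the parent of c.
  print (firstChild root >>= firstChild >>= parentLabel)  -- Just "b"
```

In such a structure every node keeps its parent reachable, and through it the whole document, so no part of the tree can be garbage collected while any node is still in use. This is consistent with the space behavior described above for queries that keep backward steps.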

Using the Compiler

The main functions for embedding XQueries in Haskell are:

$(xe query) :: XSeq
$(xq query) :: IO XSeq

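where query is a Haskell string expression that is evaluated at compile-time, and XSeq is a sequence of XML trees, [XTree]. Use xq when the query needs IO, such as reading a document, since its result is an (IO XSeq). A Haskell variable v of type XSeq can be used inside a query as $v, and a Haskell function called from a query must have type (XSeq,...,XSeq) -> XSeq.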

Here is an example of a main program:

f(x,y) = $(xe "<article><first>{$x}</first><second>{$y}</second></article>")

main = do a <- $(xq ("<result>{                                                        "
                 ++"     for $x at $i in doc('data/dblp.xml')//inproceedings           "
                 ++"     where $x/author = 'Leonidas Fegaras'                          "
                 ++"     order by $x/year descending                                   "
                 ++"     return <paper>{ $i, ') ', $x/booktitle/text(),                "
                 ++"                     ': ', $x/title/text()                         "
                 ++"            }</paper>                                              "
                 ++"  }</result>                                                       "))
          putXSeq a
          b <- $(xq " f( $a/paper[10], $a/paper[8] ) ")
          putXSeq b

This program is in Test1.hs. To compile it, run:

ghc --make Test1.hs -o a.out

You can compile an XQuery file into a Haskell program (Temp.hs) using xquery -c file. Or better, you can use the script compile (on either Unix or Windows) to compile the XQuery file to an executable. For example:

compile data/q1.xq

This compiles data/q1.xq into the executable a.out.

Using the Interpreter

The HXQ interpreter is far slower than the compiler; use it only if you need to evaluate ad-hoc XQueries read from input or from files. The main functions are:

xquery :: String -> IO XSeq -- Evaluates an XQuery in a string
xfile :: String -> IO XSeq -- Evaluates an XQuery in a file

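The interpreter is also available from the command line as xquery. To evaluate the XQuery in a file:

xquery data/q1.xq

To evaluate an XPath query against an XML file, use xquery -p xpath-query xml-file. For example:

xquery -p "//inproceedings[100]" data/dblp.xml

For more information, run xquery -help.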

Database Connectivity

HXQ provides an interface to HDBC to query relational data inside an XQuery. For the HXQ compiler, the main function that allows database connectivity is:

$(xqdb query) :: (IConnection conn) => conn -> IO XSeq

do db <- connect "myDB"
   result <- $(xqdb xquery) db

For the HXQ interpreter, the corresponding function is:

xqueryDB :: (IConnection conn) => String -> conn -> IO XSeq

Currently, HXQ works with SQLite only, but it is very easy to make it work with any relational database that supports ODBC: simply install HDBC-odbc and change the file src/withDB/Text/XML/HXQ/DBConnect.hs accordingly.

Querying an Existing Database

An XQuery may contain multiple SQL queries of the form sql(query,args), where query is an SQL query that may contain parameters (denoted by ?), which are bound to the values in args (an XSeq). An example can be found in TestDB.hs. To run this example, you need to install the company database. For example, using the sqlite3 interpreter, you do:

sqlite3 myDB
.read data/company.sql
.quit

Then compile and run TestDB.hs.

Shredding

To store an XML document into a relational database, use the following Haskell function:

shred :: (IConnection conn) => conn -> String -> String -> IO ()
The call shred db file name stores the XML document in file into the database db under the name name. For example:

do db <- connect "myDB"
   shred db "data/cs.xml" "c"
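As a rough illustration of what shredding XML into relational tuples means, the sketch below flattens a tree into one row per node. The Tree type, the Row layout, and the shredRows function are all assumptions made for this example; the relational schema that HXQ actually generates may be quite different.

```haskell
-- Hypothetical tree type and tuple layout (assumptions for this sketch;
-- HXQ's actual relational schema may differ).
data Tree = Elem String [Tree] | Text String

-- One row per node: (node id, parent id, tag, text).
-- 0 is used as the parent id of the root.
type Row = (Int, Int, String, String)

-- Assign ids in document order and emit one row per node.
shredRows :: Tree -> [Row]
shredRows t = snd (go 0 1 t)
  where
    -- go parentId nextId node = (next free id, rows for the subtree)
    go :: Int -> Int -> Tree -> (Int, [Row])
    go p n (Text s) = (n + 1, [(n, p, "#text", s)])
    go p n (Elem tg cs) =
      let step (next, acc) c =
            let (next', rs) = go n next c in (next', acc ++ rs)
          (n', rows) = foldl step (n + 1, []) cs
      in (n', (n, p, tg, "") : rows)

main :: IO ()
main = mapM_ print (shredRows (Elem "dept" [Elem "name" [Text "CS"]]))
```

For the document <dept><name>CS</name></dept> this yields the rows (1,0,"dept",""), (2,1,"name","") and (3,2,"#text","CS"). A query over the document can then be expressed as joins between node ids and parent ids in such a table.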

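The Haskell function

printSchema db name

prints the relational schema of the shredded document stored under name, while

createIndex db name tagname

creates an index on the elements with tag tagname in that document.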

Publishing

You can query a shredded XML document using the XQuery function:

publish(dbname,name)

An example can be found in TestDB2.hs.

Last modified: 08/23/08 by Leonidas Fegaras