HXQ is a fast and space-efficient translator from XQuery to embedded Haskell code (the translation is based on GHC templates). It takes full advantage of Haskell's lazy evaluation to keep in memory only those parts of XML data needed at each point of evaluation, thus performing stream-based evaluation. This results to an implementation that is as fast and space-efficient as any stream-based implementation based on SAX filters or finite state machines. Furthermore, the coding is far simpler and extensible since its based on XML trees, rather than SAX events (both the code generator and the the XQuery parser are about 400 lines each). For example, the XQuery below against the DBLP XML database (420MB) runs in 1.5 minutes on my laptop. My high-performance Java implementation of XQuery, called XQPull, which is stream-based (based on a StAX events), took about the same time.
If you haven't done so, install the Glasgow Haskell Compiler, ghc, and the parser generator for Haskell, happy. For example, in Fedora Linux, you install both using:
yum install ghc happyThen download HXQ and untar it. You can either use cabal or make to build it. Then run xquery to test drive it.
HXQ supports most essential XQuery features, although some system functions are missing (but are easy to add). It does not have static typechecking; it leaves all checking to Haskell. This means that it distinguishes regular predicates from indexing at run time: if an XPath predicate returns an integer at run time, it is indexing and this index is checked against the current node position. The most important omission is backward step axes, such as /.. (parent). Some, but not all, parent axis steps are removed using optimization rules, all others cause a compilation error.
HXQ represents XML data as rose trees:
type Tag = String type AttList = [(String,String)] data XTree = XElem Tag AttList [XTree] | XText String | XInt Int | XFloat Float type XSeq = [XTree]which basically means that one cannot implement backward navigation steps directly. The main functions for embedding XQueries in Haskell are:
Here is an example of a main program:
f(x,y) = $(xe "<article><first>{$x}</first><second>{$y}</second></article>") main = do a <- $(xq ("<result>{ " ++" for $x in doc('data/dblp.xml')//inproceedings at $i " ++" where $x/author = 'Leonidas Fegaras' " ++" orderby $x/year descending " ++" return <paper>{ $i, ') ', $x/booktitle/text(), " ++" ': ', $x/title/text() " ++" }</paper> " ++" }</result> ")) putStrLn (show a) b <- $(xq " f( $a/paper[10], $a/paper[8] ) ") putStrLn (show b)Another example, can be found in Main.hs.
If you want to extend HXQ, you need to read about Template Haskell. Use make test to do various translation tests in ghci. You can look at the Abstract Syntax Trees and the generated code from a query by evaluating cq query. Basically, the way you generate Haskell code is similar to operational semantics (where [| ... |] are syntactic brackets).
Last modified: 03/18/08 by Leonidas Fegaras