HXQ with Database Connectivity

Download HXQ-0.19.0.tar.gz

Installation Instructions (HXQ with database connectivity)

You may use either MySQL or sqlite. The easiest to install is sqlite.

Installation with MySQL

Here is a MySQL on Linux Tutorial. To install the MySQL database server and the MySQL/ODBC driver on Linux you do:

yum install mysql mysql-devel mysql-server unixODBC-devel mysql-connector-odbc
and you may use the following sample top-level file .odbc.ini:
[ODBC Data Sources]
HXQ     = MyODBC 3.51 Driver DSN

[HXQ]
Driver       = /usr/lib/libmyodbc3.so
Description  = Connector/ODBC 3.51 Driver DSN
user=root
password=xxxxx
option=262144
(Make sure that your username/password works and that the Driver has the correct path.) Then, start the mysql server (using service mysqld start as root on Linux) and create a database using the mysql command create database hxq.

The simplest way to install HXQ with MySQL is to use cabal (see HXQ without Database Connectivity):

cabal install HXQ -fmysql
and then compile xquery.hs. Alternatively, you can download and install the packages HDBC and HDBC-odbc. Then you untar HXQ and inside the HXQ directory you do:
runhaskell Setup.lhs configure --user -fmysql
runhaskell Setup.lhs build
runhaskell Setup.lhs install

Installation with sqlite

To use sqlite, you need to install SQLite. On Linux, you can install it using yum install sqlite. The simplest way to install HXQ with sqlite is to use cabal (see HXQ without Database Connectivity):

cabal install HXQ -fsqlite
and then compile xquery.hs. Alternatively, you can download and install the packages HDBC and HDBC-sqlite3. Then you untar HXQ and inside the HXQ directory you do:
runhaskell Setup.lhs configure --user -fsqlite
runhaskell Setup.lhs build
runhaskell Setup.lhs install

Working with Databases

HXQ provides an interface to HDBC to query relational data inside an XQuery. For the HXQ compiler, the main function that allows database connectivity is:

$(xqdb query) :: Connection -> IO XSeq
For example, if the database name is "hxq", then
do db <- connect "hxq"
   result <- $(xqdb xquery) db
For the HXQ interpreter, the function is:
xqueryDB :: String -> Connection -> IO XSeq
The xquery executable can also run XQueries that use a database by specifying the database name using the -db option, eg. xquery -db hxq.

Querying an Existing Database

An XQuery may contain multiple SQL queries in the form sql(query,args), where query is the sql query that may contain parameters (denoted by ?), which are bound to the values in args (an XSeq). An example can be found in tests/TestDB.hs. To run this example, you need to install the company database (using source data/company.sql in mysql or .read data/company.sql in sqlite3) and then compile and run tests/TestDB.hs.

Shredding

To synthesize a relational schema schemaname to store an XML document located at pathname, use the following Haskell function:

genSchema :: Connection -> String -> String -> [String] -> IO ()
genSchema db pathname schemaname excludedtags
for a database db, where the excluded tags are HTML tags to be ignored (skipped) when the XML data are document-centric. HXQ will find a good relational schema (using hybrid inlining) to store the XML data by scanning the document to extract its structural summary and then deriving a good relational schema from the summary. To actually store the data from the XML document into the relational schema, use the following Haskell function:
shred :: Connection -> String -> String -> IO ()
shred db pathname schemaname
For example,
do db <- connect "hxq"
   genSchema db "data/cs.xml" "c" []
   shred db "data/cs.xml" "c"

The Haskell function

printSchema db schemaname
displays the relational schema for the shredded document under the given schemaname, while
createIndex db schemaname tagname
creates a secondary index on tagname for the shredded document.

Publishing

You can query a shredded XML document using the XQuery function:

publish(dbame,schemaname)
where dbname is the database file name and schemaname is the unique schema name assigned to the XML document when was shredded. The translation from XQuery to SQL is done at compile-time, so both dbname and schemaname must be constant strings. HXQ will do its best to push relevant predicates to the generated SQL query (using partial evaluation and code folding), thus deriving an efficient execution. One example is tests/TestDB2.hs.

Updates

The XQuery syntax has been extended with the following expressions in HXQ:

   insert e1 into e2
   delete from e
   replace e2 with e1
where e2 and e are XQuery expressions that return XML elements extracted from shredded documents stored in the relational database. That is, you can not update XML text files or constructed XML elements. In addition, e2 must return a singleton sequence (exactly one XML element) and the returned XML sequence from e1 must be compatible to the document structural summary at the point of the update. These updates are evaluated using SQL updates over the underlying relational database. They all return (). The insert expression makes e1 a new child of e2. The delete expression removes the element e from its parent. The replace expression replaces e2 with e1. These updates are not automatically committed; the programmer must commit them explicitly using the commit db function. (The xquery program though commits at the end of each xquery automatically.)

Examples of updates:

replace publish('hxq','c')//gradstudent[name/lastname="Smith"]/gpa with 3.7
insert <zip>12345</zip> into publish('hxq','c')//gradstudent[name/lastname="Smith"]/address
for $e in publish('hxq','e')//employee return replace $e/salary with $e/salary*1.5

Virtual Views

Virtual views are declared using the declare view syntax. For example:

declare view byAuthor ($x) { publish("hxq","d")//inproceedings[author=$x]/title }
Virtual views are like functions but they are macro-expanded before optimization (so they cannot be recursive),

Example: Installing and Querying the DBLP Database

First download and uncompress dblp.xml.gz from DBLP. To install the DBLP database, compile and execute tests/TestDBLP.hs. Then, you may evaluate queries, such as data/q4.xq, using the HXQ interpreter, xquery -db hxq -t data/q4.xq, which takes about 90 milliseconds with MySQL.

Status


Last modified: 01/09/10 by Leonidas Fegaras