Tuesday, April 15, 2008

XQuery and Native XML databases

I'm in the process of teaching myself XQuery and native XML databases. A simple way of understanding this areas is to draw parallels with the relational database world where XQuery=SQL and Native XML Database=RDBMS, XQuery Modules=Stored Procedures.

http://www.xml.com/pub/a/2002/10/16/xquery.html

The major difference being that instead of tables, columns and rows we have collections of XML documents. The significant advantage here being that an XML document contains all the data that is need to satisfy most queries. There may be need to join to other documents or external data but in general the object selected in an XQuery world is less likely to need a join than in a relational world where data has been normalised out.

The term "native" generally refers to the database being designed from the ground up to hold XML, rather than as something that has been added afterwards (thinks SQL XML). This means that the database has been designed to handle XML in an optimal fashion.

You could of course roll-your-own XML database system, but these products offer some key functionality that would take some time to implement:
  • Support for a w3c standard XQuery
  • holding large datasets (think about DOM models and holding terabytes of data)
  • Xml Schema support, especially in terms of query validation and optimization, (think sorting of date fields)
  • Indexes built on XML structures (using schemas to allow better understanding)and you can see that its a non-trivial product to recreate.
My background means that I used to be SQL mad, modelling everything in SQL and using it to store everything, then about the time of XML (about 10 years ago) I became disillusioned with the whole RDBMS solution in the context of XML aware applications.

If you have a C# application that uses XML serialisation in memento patterns to store state and support undo/command patterns you start to think that putting in an RDBMS is not adding anything to the equation. In fact you have to create data models, write stored procedures and data access layers. Whilst MS have made strides in making this process simpler the whole process is questionable when you know your code already has a XML representation for its classes.

So imagine being able to put those serialised objects as XML into a database that supported it out of the box, that only required you to supply a schema (optional for simple apps) and gave you a general purpose query tool (XQuery) that could do all the things that SQL could do but understood XML.

Now don't get me wrong RDBMS are not dead, if you have fact-type data that looks like you could easily put it in Excel then its probably best in an RDBMS. However if your data looks like an XML document (tree like) or is mostly narrative XML rather than data XML then maybe, just maybe you should look at a native database:

http://www.rpbourret.com/xml/XMLAndDatabases.htm

No comments: