Figment Engine: XQuery


Wednesday, May 21, 2008

XQuery and Unit testing - Part III (xqUnit)

A short post this time - I think that


(: a test definition :)
 let $test :=
  <test>
   <method>
    let $book:= /bib/book
   </method>
   <asserts>
    <assertTrue name="count nodes">
     return count($book) = 4
    </assertTrue>
    <assertTrue name="find by name">
     return $book[1]/title = "TCP/IP Illustrated"
    </assertTrue>
    <assertTrue name="find by author">
     return count($book[author/last = "Stevens"]) = 2
    </assertTrue>
    <assertTrue name="find by author case sensitivity">
     return count($book[author/last = "stevens"]) = 0
    </assertTrue>
   </asserts>
  </test>

looks better from a naming point of view: it gives a better idea of intent and opens up options for other kinds of asserts.

Once we apply a schema to this model, the xqunit:execute function can be locked down to expect an assert rather than any node...
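
As a rough sketch of what that locking down might look like: the snippet below is written in standard XQuery 1.0 syntax (declare function, trailing semicolons) rather than the older MarkLogic dialect used in these posts, and it only builds the code string instead of evaluating it. The name local:build-code is hypothetical. Note that element(assertTrue) only checks the element name; with a real schema imported, schema-element(assertTrue) would probably be the stricter choice.

```xquery
xquery version "1.0";

(: Hypothetical helper: accepts only <method> and <assertTrue> elements,
   so passing any other kind of node is a type error. :)
declare function local:build-code(
  $method as element(method),
  $assert as element(assertTrue)
) as xs:string
{
  (: join the setup code and the assertion code with a newline :)
  string-join((string($method), string($assert)), "&#10;")
};

let $test :=
  <test>
    <method>let $book := /bib/book</method>
    <asserts>
      <assertTrue name="count nodes">return count($book) = 4</assertTrue>
    </asserts>
  </test>
(: only assertTrue elements are selected, so the typed signature is satisfied :)
for $assert in $test/asserts/assertTrue
return
  <code name="{$assert/@name}">{ local:build-code($test/method, $assert) }</code>
```

Adding another kind of assert would then mean adding another element name to the model and widening the parameter type, rather than accepting any node.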

Sunday, May 18, 2008

XQuery and Unit testing - Part II (xqUnit)

So I've started using MarkLogic as the basis for my unit testing framework, the choice being premised on the fact that this is the system I will be coding most of my XQuery against.

In terms of the framework (which I have named xqUnit) I am trying to base it as much as possible on the xUnit approach. I've been reading Gerard Meszaros's book in order to get a better understanding of this, and it is very well written.

Ultimately I want to have a test runner which is web based with a nice progress bar that goes green. I also want my test framework to be testable. However, my first step is to write a spike to see how this problem could be solved:


(: first ever version :)
declare namespace xqunit = "http://xqunit.FigmentEngine.com/v0.0.1"

(: a function to take some method code and some test code, join them together and run them :)
define function xqunit:execute($method as node(), $test as node())
{
 let $code := string-join(($method/text(), $test/text()),
  codepoints-to-string(13))

 let $result :=
  try
  {
   <result> { xdmp:eval($code) } </result>
  }
  catch ($exception)
  {
   <result exception="{$exception/err:format-string}"> { $exception/err:stack/err:frame[1] } </result>
  }

 return <test name="{$test/@name}"> { $result } </test>
}

(: a test definition :)
let $test :=
 <test>
  <method>
   let $book:= /bib/book
  </method>
  <exercise name="count nodes">
   return count($book) = 4
  </exercise>
  <exercise name="find by name">
   return $book[1]/title = "TCP/IP Illustrated"
  </exercise>
  <exercise name="find by author">
   return count($book[author/last = "Stevens"]) = 2
  </exercise>
  <exercise name="find by author case sensitivity">
   return count($book[author/last = "stevens"]) = 0
  </exercise>
 </test>

(: exercise each test and show the results of the run :)
for $exercise in $test/exercise
let $result := xqunit:execute($test/method, $exercise)
return <testrun> { $result } </testrun>

The installation of MarkLogic includes some test data which is based on the XML Query Use Cases so I have used them as the basis for my initial datasets. This code shows my basic approach, which is:

an XML schema (not defined yet) which defines a test with a method and a set of exercises (I need a better name for this)
the framework takes the test method and combines the code in the test method with the code in the exercise
the framework executes the joined code (xdmp:eval is MarkLogic's dynamic evaluation function)
this results in either a result or an exception if the code is invalid (this uses MarkLogic's extension to XQuery in order to catch the exception). Catching exceptions like this is useful for picking up errors in the code, but could also be used for testing (think xqunit:expectException)
this process is repeated for all exercises and output as a set of XML results

In principle this approach works, though a few concerns appear:

the test, method and exercise are badly named; I need to align these better with xUnit
the approach could be inefficient (the method gets re-executed each time)
it looks like method is more of a setup and the exercises are the tests
the exercises have to return true or false for success; this could be more obvious if we had code like "return xqunit:assertEqual(count($book), 4)"
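
To make that last idea concrete, here is a minimal sketch of what an xqunit:assertEqual could look like. This is an assumption about a possible design, not part of the spike: it uses standard XQuery 1.0 syntax rather than the older MarkLogic dialect above, the result element and its status attribute are made up for illustration, and the inline sample is a trimmed two-book version of the Use Cases data so the example stands alone.

```xquery
xquery version "1.0";

declare namespace xqunit = "http://xqunit.FigmentEngine.com/v0.0.1";

(: Hypothetical assertion: compares actual against expected with deep-equal
   and reports both values on failure, instead of a bare true/false. :)
declare function xqunit:assertEqual(
  $actual as item()*,
  $expected as item()*
) as element(result)
{
  if (deep-equal($actual, $expected))
  then <result status="pass"/>
  else
    <result status="fail">
      <expected>{ $expected }</expected>
      <actual>{ $actual }</actual>
    </result>
};

(: trimmed sample data, so the count here is 2 rather than 4 :)
let $bib :=
  <bib>
    <book><title>TCP/IP Illustrated</title></book>
    <book><title>Data on the Web</title></book>
  </bib>
let $book := $bib/book
return xqunit:assertEqual(count($book), 2)
```

The argument order (actual first, expected second) simply follows the call shown in the list above; a failing assert would carry both values in its result, which a bare boolean cannot do.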

Reading the code, it seems obvious what a test is and what it is trying to do - so I am happy with that. The approach needs to consider two kinds of testing:

data testing, where we are testing that the data in the database is correct
function testing, where we are testing that a function works correctly

Some tests may fall into both camps, but I think it may be a useful way of looking at the user stories for the framework.
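
The exercises in the spike are all data tests. For contrast, a function test might look something like the sketch below. The function local:display-name is purely hypothetical (it is not part of xqUnit or MarkLogic), and the syntax is standard XQuery 1.0 rather than the older dialect used in the spike.

```xquery
xquery version "1.0";

(: Hypothetical function under test: formats an author as "first last" :)
declare function local:display-name($author as element(author)) as xs:string
{
  concat($author/first, " ", $author/last)
};

(: the input is built in the test itself, so nothing in the database is touched :)
let $author := <author><last>Stevens</last><first>W.</first></author>
return
  <test name="display name">{
    local:display-name($author) = "W. Stevens"
  }</test>
```

The difference is where the input comes from: a data test reads what is already stored, while a function test supplies its own fixture and checks only the function's behaviour.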

This seems a good spike for now, so I've started the process of building a site at http://xqunit.figmentengine.com/ and hopefully I can start to drive this through a story-based approach.

Sunday, April 27, 2008

XQuery and Unit testing - Part I (xqUnit)

So as part of the process of learning XQuery I wanted to use a unit test approach to document my understanding. Some quick research shows there's not much going on in this area. Bumblebee is the most mentioned, in http://jimfuller.blogspot.com/2007/09/poor-mans-unit-testing-with-xquery.html, however there does not appear to be any information on the referenced site, so I'm not sure it exists anymore.

Another option is mentioned at http://jimfuller.blogspot.com/2007/09/poor-mans-unit-testing-with-xquery.html which is a kind of roll-your-own solution. I think that this may be the way to go, but I want to add a few more things:

Create the test methods on the basis of unit test best practice (if there is such a thing). This would involve some research to see if there is a meta-process for creating unit test systems; maybe xUnit will provide a template for this. I'm guessing a lot of this is premised on the testability of the harness itself (better watch out for a Gödel brick wall on this: http://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems)
Make the functions more useful in an XQuery world, specifically being able to test if an XPath expression is true. This would allow me to test that a query result contains expected data; maybe this could be a set of expressions. This would be more powerful and flexible than just having "expected XML" output.
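
One way the "set of expressions" idea could be sketched is with inline functions. This assumes a processor that supports XQuery 3.0 or later (BaseX or Saxon, for example); the older MarkLogic dialect used elsewhere in these posts may not accept it as written. The check element and its attributes are made up for illustration.

```xquery
xquery version "3.0";

(: the query result we want to make assertions about :)
let $result :=
  <bib>
    <book><title>TCP/IP Illustrated</title></book>
    <book><title>Data on the Web</title></book>
  </bib>

(: each check is an XPath expression wrapped in a function returning a boolean :)
let $checks := (
  function($r as node()) as xs:boolean { count($r/book) = 2 },
  function($r as node()) as xs:boolean { $r/book[1]/title = "TCP/IP Illustrated" }
)

(: apply every check to the same result and report each outcome :)
for $check at $i in $checks
return <check index="{$i}" passed="{$check($result)}"/>
```

Each expression tests one property of the result, so the test keeps passing when unrelated parts of the output change, which is the flexibility that a single "expected XML" comparison lacks.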

Anyway, some research first and then I can post up my first attempt (which will of course be pants).

Tuesday, April 15, 2008

XQuery and Native XML databases

I'm in the process of teaching myself XQuery and native XML databases. A simple way of understanding this area is to draw parallels with the relational database world, where XQuery = SQL, native XML database = RDBMS and XQuery modules = stored procedures.

http://www.xml.com/pub/a/2002/10/16/xquery.html

The major difference is that instead of tables, columns and rows we have collections of XML documents. The significant advantage here is that an XML document contains all the data that is needed to satisfy most queries. There may be a need to join to other documents or external data, but in general the object selected in an XQuery world is less likely to need a join than in a relational world where data has been normalised out.

The term "native" generally refers to the database being designed from the ground up to hold XML, rather than as something that has been added afterwards (think SQL XML). This means that the database has been designed to handle XML in an optimal fashion.
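
A small illustration of the parallel, as a sketch only: the inline sample is a trimmed version of the bib.xml data from the XML Query Use Cases (in a real XML database the books would be stored documents, not a variable), and the SQL in the comment is just a loose analogy.

```xquery
xquery version "1.0";

(: trimmed sample data standing in for documents held in the database :)
let $bib :=
  <bib>
    <book year="1994">
      <title>TCP/IP Illustrated</title>
      <author><last>Stevens</last><first>W.</first></author>
    </book>
    <book year="2000">
      <title>Data on the Web</title>
      <author><last>Abiteboul</last><first>Serge</first></author>
      <author><last>Buneman</last><first>Peter</first></author>
      <author><last>Suciu</last><first>Dan</first></author>
    </book>
  </bib>

(: Loosely: SELECT title FROM book WHERE an author is Stevens ORDER BY title.
   The authors live inside each book, so there is no author table to join to. :)
for $book in $bib/book
where $book/author/last = "Stevens"
order by $book/title
return $book/title
```

In a normalised relational model the same question would typically need a join between a book table and an author table (and probably a link table between them).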

You could of course roll-your-own XML database system, but these products offer some key functionality that would take some time to implement:

Support for W3C-standard XQuery
Holding large datasets (think about DOM models and holding terabytes of data)
XML Schema support, especially in terms of query validation and optimization (think sorting of date fields)
Indexes built on XML structures (using schemas to allow better understanding)

Put these together and you can see that it's a non-trivial product to recreate.

My background means that I used to be SQL mad, modelling everything in SQL and using it to store everything. Then, about the time XML arrived (about 10 years ago), I became disillusioned with the whole RDBMS solution in the context of XML-aware applications.

If you have a C# application that uses XML serialisation in memento patterns to store state and support undo/command patterns, you start to think that putting in an RDBMS is not adding anything to the equation. In fact you have to create data models, write stored procedures and data access layers. Whilst MS have made strides in making this process simpler, the whole process is questionable when you know your code already has an XML representation for its classes.

So imagine being able to put those serialised objects as XML into a database that supported it out of the box, that only required you to supply a schema (optional for simple apps) and gave you a general purpose query tool (XQuery) that could do all the things that SQL could do but understood XML.

Now don't get me wrong, RDBMSs are not dead: if you have fact-type data that looks like you could easily put it in Excel then it's probably best in an RDBMS. However, if your data looks like an XML document (tree-like) or is mostly narrative XML rather than data XML, then maybe, just maybe, you should look at a native database:

http://www.rpbourret.com/xml/XMLAndDatabases.htm