Group By in XQuery 1.0 for MarkLogic Server

August 23, 2011 at 02:12 PM | categories: XQuery, MarkLogic

XQuery 3.0 introduces new syntax for "group by". At this time, MarkLogic Server 4.2 is the latest release, and it doesn't have support for that syntax. So how can we implement "group by" when writing XQuery for MarkLogic?

Let's start with the W3C use cases. First, let's fetch the sample data and put it into MarkLogic. We can do that using cq. I'll leave out the schemas, since we don't need those. I also won't try to optimize every expression in these examples: suffice it to say that there is room for more improvement.

Sorry about the long block of code, but we need those documents. Paste that into cq, evaluate it, and you should get the empty sequence. That means your documents were inserted correctly: you can use the 'explore' link to check.
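As a much smaller stand-in for that block, here is a minimal sketch of the same kind of insert. The document URI, the element names, and the four records are assumptions for illustration, not the full W3C sample data.

```xquery
xquery version "1.0-ml";

(: Hypothetical, heavily abbreviated sample data.
   The URI and element names are assumptions. :)
xdmp:document-insert(
  "sales-records.xml",
  <sales>
    <record>
      <product-name>broiler</product-name>
      <store-number>1</store-number>
      <qty>20</qty>
    </record>
    <record>
      <product-name>toaster</product-name>
      <store-number>2</store-number>
      <qty>100</qty>
    </record>
    <record>
      <product-name>toaster</product-name>
      <store-number>2</store-number>
      <qty>50</qty>
    </record>
    <record>
      <product-name>blender</product-name>
      <store-number>3</store-number>
      <qty>150</qty>
    </record>
  </sales>)
```

xdmp:document-insert returns the empty sequence, which is why an empty result means the insert succeeded.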

[Screenshot: the cq explorer shows the W3C test documents.]

Now we can write some queries. Here is the first use case (Q1).
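For orientation, a "group by" query of this kind has roughly the following shape in XQuery 3.0. This is my own sketch rather than the W3C text: the inline sample data and element names are assumptions, and it will not run on MarkLogic 4.2, which is the whole point of this post.

```xquery
xquery version "3.0";

(: Hypothetical inline sample data; element names are assumptions. :)
let $sales :=
  <sales>
    <record><product-name>broiler</product-name><qty>20</qty></record>
    <record><product-name>toaster</product-name><qty>100</qty></record>
    <record><product-name>toaster</product-name><qty>50</qty></record>
    <record><product-name>blender</product-name><qty>150</qty></record>
  </sales>
return
  <sales-qty-by-product>{
    for $s in $sales/record
    let $pname := string($s/product-name)
    (: After "group by", $s is bound to every record in the group. :)
    group by $pname
    order by $pname
    return <product name="{$pname}">{ sum($s/qty) }</product>
  }</sales-qty-by-product>
```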

And the result should look like this:

We can't write XQuery 3.0 syntax in XQuery 1.0, but we can get the same result using an extra distinct-values step.
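A minimal sketch of the distinct-values approach, assuming sales records with product-name and qty children. The data is inlined here so the query stands alone; against real documents, the predicate inside sum() would become a separate lookup for each product name.

```xquery
xquery version "1.0-ml";

(: Hypothetical inline sample data; element names are assumptions. :)
let $sales :=
  <sales>
    <record><product-name>broiler</product-name><qty>20</qty></record>
    <record><product-name>toaster</product-name><qty>100</qty></record>
    <record><product-name>toaster</product-name><qty>50</qty></record>
    <record><product-name>blender</product-name><qty>150</qty></record>
  </sales>
return
  <sales-qty-by-product>{
    (: First pass: find the distinct grouping keys. :)
    for $pname in distinct-values($sales/record/product-name)
    order by $pname
    return
      <product name="{$pname}">{
        (: Second pass, once per key: select and sum the matching records. :)
        sum($sales/record[product-name eq $pname]/qty)
      }</product>
  }</sales-qty-by-product>
```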

This code is a little awkward, though. Instead of looping through the records once, we have to perform a database lookup on each product name. Normally this would be an unavoidable cost, and perhaps a reason to look forward to XQuery 3.0. But MarkLogic gives us a way to cheat, and use an accumulator model to get the same result more quickly. I'm talking about maps.
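Here is a sketch of the accumulator idea using MarkLogic's map:map, map:put, map:get and map:keys. The inline data and element names are assumptions, and this is one way to arrange the loop rather than the only one.

```xquery
xquery version "1.0-ml";

(: Hypothetical inline sample data; element names are assumptions. :)
let $sales :=
  <sales>
    <record><product-name>broiler</product-name><qty>20</qty></record>
    <record><product-name>toaster</product-name><qty>100</qty></record>
    <record><product-name>toaster</product-name><qty>50</qty></record>
    <record><product-name>blender</product-name><qty>150</qty></record>
  </sales>
let $m := map:map()
(: Single pass: add each record's qty to the running total for its key.
   map:get returns the empty sequence for a key it has not seen yet,
   so sum() then just returns the new qty. :)
let $build :=
  for $r in $sales/record
  let $key := string($r/product-name)
  return map:put($m, $key, sum((map:get($m, $key), xs:integer($r/qty))))
return
  <sales-qty-by-product>{
    for $pname in map:keys($m)
    order by $pname
    return <product name="{$pname}">{ map:get($m, $pname) }</product>
  }</sales-qty-by-product>
```

The map is mutable, so map:put is called purely for its side effect, and $build is never used afterwards.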

This produces the same output, and will scale better than the distinct-values() version would. Of course it is also less portable. But database application developers often have to implement non-portable optimizations, and the less portable code can be segregated into its own library modules.

Now let's look at the next example (Q2).

Expected result:

Here is a solution using maps:

Again, this solution produces the same results. This time we had two elements in the grouping key, and the map key must be a string. So we had to use an old database trick and concatenate the two values with a known delimiter. Naturally we have to be careful in our choice of delimiter: it must never appear in either value, or two different keys could collide or split incorrectly.
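The composite-key trick can be sketched like this. It is not the W3C Q2 query: to keep it short I group by store-number and product-name from the same hypothetical records, and the "|" delimiter is an assumption that only holds if neither value can contain that character.

```xquery
xquery version "1.0-ml";

(: Assumed delimiter: must not occur in either half of the key. :)
let $delim := "|"
(: Hypothetical inline sample data; element names are assumptions. :)
let $sales :=
  <sales>
    <record><product-name>broiler</product-name><store-number>1</store-number><qty>20</qty></record>
    <record><product-name>toaster</product-name><store-number>2</store-number><qty>100</qty></record>
    <record><product-name>toaster</product-name><store-number>2</store-number><qty>50</qty></record>
    <record><product-name>blender</product-name><store-number>3</store-number><qty>150</qty></record>
  </sales>
let $m := map:map()
let $build :=
  for $r in $sales/record
  (: Map keys are strings, so join the two grouping values into one. :)
  let $key := concat(string($r/store-number), $delim, string($r/product-name))
  return map:put($m, $key, sum((map:get($m, $key), xs:integer($r/qty))))
return
  <result>{
    for $key in map:keys($m)
    (: Split the composite key back into its two parts. :)
    let $store := substring-before($key, $delim)
    let $pname := substring-after($key, $delim)
    order by $store, $pname
    return
      <group store-number="{$store}" product-name="{$pname}"
             total-qty="{map:get($m, $key)}"/>
  }</result>
```

I used substring-before and substring-after rather than tokenize so that the delimiter is treated as a literal string and not as a regular expression.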

For the remaining queries, I'll skip the W3C example code and the output XML, and just show my solutions. Again, these return the desired results, but could benefit from more optimization work.

This final use case is kind of odd, because the sample code works if you simply comment out the "group by". In other words, the sample data only contains one group. But I reimplemented it anyway.

That's it. I hope this was worth your time.
