
Rebalancing for CoRB

November 01, 2011 at 08:50 PM | categories: XQuery, MarkLogic

I've written some quick scripts for rebalancing forests in a MarkLogic Server database. This leverages CoRB, and makes the job fairly simple. So if you add more forests to a database, and don't have the luxury of clearing and reloading, I hope this code will help.


MarkLogic 5.0 - First Look

November 01, 2011 at 12:24 PM | categories: XQuery, MarkLogic

In case you have missed the news, MarkLogic Server 5.0-1 is now available. The upgrade went smoothly for me, but this is a major release so it is wise to back up your databases and configuration before upgrading. The on-disk forest version appears to have changed, which will trigger reindexing of all forests. You may want to manually disable reindexing before upgrading, so that you don't have to contend with multiple forests trying to reindex at the same time.
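As a rough sketch of how reindexing could be switched off from a script rather than the admin UI, the Admin API can be used along these lines. The database name "Documents" is only a placeholder for illustration; substitute your own, and re-enable the reindexer once you are ready.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name, not one from this post. :)
let $config := admin:get-configuration()
let $db := xdmp:database("Documents")
return
  (: Turn the reindexer off, then persist the changed configuration. :)
  admin:save-configuration(
    admin:database-set-reindexer-enable($config, $db, fn:false()))
```

Passing fn:true() instead turns reindexing back on for that database.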

This is also a good time to double-check your free disk space, since reindexing uses extra disk space. Some of that space won't be released when reindexing finishes, either. For example, one of my forests looked like this:

This forest is holding on to over 2 GiB of deleted fragments.

You can purge those deleted fragments by forcing a merge of the forest, or of the entire database. After doing this, my forest used less disk space.
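If you would rather start the merge from cq than from the admin UI, a minimal sketch looks like this. It assumes the query runs against the database you want to merge, and it uses no options, so every forest in that database is a candidate.

```xquery
xquery version "1.0-ml";

(: Start a merge of the forests in the current context database.
   Expect heavy disk I/O, and make sure there is free space for the
   merged stands before running this. :)
xdmp:merge()
```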

After the forced merge, the deleted fragments are gone and the forest is smaller.

This new release is stricter about unquoted attributes. With previous releases this would generally work, even though the XQuery 1.0 Recommendation requires quoted attribute values:

<test a={xdmp:random()}/>

Now it throws an XDMP-UNEXPECTED error. Quote the attribute value correctly, and the problem is fixed.

<test a="{xdmp:random()}"/>

I'm looking forward to learning more about the 5.0 release, but so far it looks good.


Yet another search parser - XQYSP

October 24, 2011 at 01:19 PM | categories: XQuery, MarkLogic

If you need something a little more sophisticated than the search parser built into the MarkLogic search API, give XQYSP a try. It supports nested groups, range queries, near queries with distance and ordering, and should be fairly easy to extend.

XQYSP takes a slightly different approach than the Search API or the older lib-parser.xqy, both of which returned cts:query items. Instead, XQYSP returns an abstract syntax tree (AST) as XML. It is up to you, the caller, to transform that AST into a cts:query. That is a little more work for you, but adds a lot of flexibility at the same time. Most of the tasks that used to go into lib-parser-custom.xqy can now be implemented without changing the parser itself. To make it easier to get started, though, I have provided sample code to generate a query from an AST. I hope it is useful.
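To give a feel for the AST-to-query step, here is a small sketch of the general technique. The element names (group, term) and the op attribute are invented for illustration and are not XQYSP's actual output format; check the sample code that ships with XQYSP for the real structure.

```xquery
xquery version "1.0-ml";

(: Walk a HYPOTHETICAL AST and build a cts:query from it.
   The group/term vocabulary here is an assumption, not XQYSP's. :)
declare function local:to-query($n as element()) as cts:query?
{
  typeswitch ($n)
  case element(term) return cts:word-query(string($n))
  case element(group) return
    let $queries := for $c in $n/* return local:to-query($c)
    return
      if ($n/@op eq "or") then cts:or-query($queries)
      else cts:and-query($queries)
  (: Unknown nodes are ignored in this sketch. :)
  default return ()
};

local:to-query(
  <group op="and">
    <term>cat</term>
    <group op="or"><term>dog</term><term>bird</term></group>
  </group>)
```

Because the transformation lives in your own code, adding a new node type means adding one more case clause rather than patching the parser.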


XQUT - Unit Testing in Pure XQuery

September 13, 2011 at 04:06 PM | categories: XQuery, MarkLogic

I was working on a couple of pure XQuery projects that needed unit testing. While I could have integrated with JUnit or another existing framework, I really wanted something simple that I could run directly from cq. Hence XQUT.

XQUT will usually be invoked like this:

The cq app server should point to the code you are testing, so that your test suite can import libraries. The eval root is different: it is the location of the XQUT code, so that you only need one copy of XQUT. The external variable SUITE is an XML test suite. A simple test suite might look like this:

The XML is fairly simple. Under the root suite element we have one or more unit elements, each representing a test. The test XQuery can be defined as the lexical value of the element, or as its expr child. The result can be defined by a result attribute or element.
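Going only by the description above, a suite could be shaped roughly like the following. This is a sketch, not a copy of a real XQUT suite: it omits any namespace that XQUT may require, and the test expressions are made up.

```xquery
xquery version "1.0-ml";

(: A guess at the shape of a simple suite, based on the prose above.
   Any required namespace is unknown here and left out. :)
<suite>
  <!-- test as the lexical value, expected result as an attribute -->
  <unit result="2">1 + 1</unit>
  <!-- test as an expr child, expected result as an element -->
  <unit>
    <expr>fn:string-join(("a", "b"), "-")</expr>
    <result>a-b</result>
  </unit>
</suite>
```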

For more sophisticated tests, you can add xsi:type attributes and sequences of result elements. You can also use an optional environment element to import libraries, define variables, and define namespace prefixes. If you add setup elements, these will be evaluated before any tests. The test suite for XQUT itself contains more examples.


Group By in XQuery 1.0 for MarkLogic Server

August 23, 2011 at 02:12 PM | categories: XQuery, MarkLogic

XQuery 3.0 introduces new syntax for "group by". At this time, MarkLogic Server 4.2 is the latest release, and it doesn't have support for that syntax. So how can we implement "group by" when writing XQuery for MarkLogic?

Let's start with the W3C use cases. First, let's fetch the sample data and put it into MarkLogic. We can do that using cq. I'll leave out the schemas, since we don't need those. I also won't be exhaustive about optimizing every expression in these examples: suffice to say that there is room for even more improvement.

Sorry about the long block of code, but we need those documents. Paste that into cq, evaluate it, and you should get the empty sequence. That means your documents were inserted correctly: you can use the 'explore' link to check.

The cq explorer shows the W3C test documents.

Now we can write some queries. Here is the first use case (Q1).

And the result should look like this:

We can't write XQuery 3.0 syntax in XQuery 1.0, but we can get the same result using an extra distinct-values step.
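A minimal sketch of the distinct-values approach follows. It assumes the sales data was inserted with the URI sales-records.xml, that each record element has one product-name and one qty child, and that the output uses a product element with a name attribute; adjust those names if your copy of the sample data differs.

```xquery
xquery version "1.0-ml";

(: Assumed names: sales-records.xml, record, product-name, qty. :)
<sales>{
  let $records := doc("sales-records.xml")/*/record
  (: First pass: find each distinct grouping key. :)
  for $pname in distinct-values($records/product-name)
  order by $pname
  return
    (: Second pass: look up the records for this key and aggregate. :)
    <product name="{ $pname }">{
      sum($records[product-name eq $pname]/qty)
    }</product>
}</sales>
```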

This code is a little awkward, though. Instead of looping through the records once, we have to perform a database lookup on each product name. Normally this would be an unavoidable cost, and perhaps a reason to look forward to XQuery 3.0. But MarkLogic gives us a way to cheat, and use an accumulator model to get the same result more quickly. I'm talking about maps.
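Here is a sketch of the accumulator idea using the map functions. As before, the document URI and the record, product-name, and qty element names are assumptions about the sample data, and the output shape is illustrative.

```xquery
xquery version "1.0-ml";

let $m := map:map()
(: Single pass: add each record's qty to the running total for its key.
   map:get returns the empty sequence for a new key, so sum() still works. :)
let $_ :=
  for $r in doc("sales-records.xml")/*/record
  let $key := string($r/product-name)
  return map:put($m, $key, sum((map:get($m, $key), $r/qty)))
return
  <sales>{
    for $pname in map:keys($m)
    order by $pname
    return <product name="{ $pname }">{ map:get($m, $pname) }</product>
  }</sales>
```

The map is mutable, which is what lets one loop do the work of the key lookup and the aggregation together.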

This produces the same output, and will scale better than the distinct-values() version would. Of course it is also less portable. But database application developers often have to implement non-portable optimizations, and the less portable code can be segregated into its own library modules.

Now let's look at the next example (Q2).

Expected result:

Here is a solution using maps:

Again, this solution produces the same results. This time we had two elements in the grouping key, and the map key must be a string. So we had to use an old database trick and concatenate the two values with a known delimiter. Naturally we have to be careful in our choice of delimiter.
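The composite-key trick can be sketched on its own with inline data. The records, element names, and the "|" delimiter below are made up for illustration; the only requirement is that the delimiter never occurs in either key value.

```xquery
xquery version "1.0-ml";

(: Illustrative inline data, not the W3C sample documents. :)
let $records := (
  <record><state>CA</state><category>kitchen</category><qty>20</qty></record>,
  <record><state>CA</state><category>kitchen</category><qty>5</qty></record>,
  <record><state>MA</state><category>clothes</category><qty>10</qty></record>
)
let $m := map:map()
let $_ :=
  for $r in $records
  (: Join both parts of the grouping key into one string map key. :)
  let $key := string-join(($r/state, $r/category), "|")
  return map:put($m, $key, sum((map:get($m, $key), $r/qty)))
for $key in map:keys($m)
(: tokenize takes a regular expression, so the pipe must be escaped. :)
let $parts := tokenize($key, "\|")
order by $parts[1], $parts[2]
return
  <group state="{ $parts[1] }" category="{ $parts[2] }">{
    map:get($m, $key)
  }</group>
```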

For the remaining queries, I'll skip the W3C examples and output XML. Here are my solutions. Again, these return the desired results, but could benefit from more optimization work.

This final use-case is kind of odd, because the sample code works if you simply comment out the "group by". In other words, the sample data only contains one group. But I reimplemented it anyway.

That's it. I hope this was worth your time.
