Rebalancing for CoRB
November 01, 2011 at 08:50 PM | categories: XQuery, MarkLogic
MarkLogic 5.0 - First Look
November 01, 2011 at 12:24 PM | categories: XQuery, MarkLogic
This is also a good time to double-check your free disk space, since reindexing uses extra disk space. Some of that space won't be released when reindexing finishes, either. For example, one of my forests looked like this:
You can purge those deleted fragments by forcing a merge of the forest, or of the entire database. After doing this, my forest used less disk space.
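As a minimal sketch of forcing that merge from cq rather than from the admin interface, the built-in xdmp:merge function can be called with no arguments. This assumes the default merge options suit your database and that the query runs against the database you want to merge, with a user who has the necessary privileges.

```xquery
xquery version "1.0-ml";

(: Ask the server to merge the forests of the current database.
   Assumption: the default merge options are acceptable here. :)
xdmp:merge()
```

A forced merge is I/O intensive, so it is probably best run during a quiet period.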
This new release is stricter about unquoted attributes. With previous releases this would generally work, even though the XQuery 1.0 Recommendation requires quoted attribute values:
<test a={xdmp:random()}/>
Now it throws an XDMP-UNEXPECTED error. Quote the attribute value correctly, and the problem is fixed.
<test a="{xdmp:random()}"/>
I'm looking forward to learning more about the 5.0 release, but so far it looks good.
Yet another search parser - XQYSP
October 24, 2011 at 01:19 PM | categories: XQuery, MarkLogic
XQYSP takes a slightly different approach than the Search API or the older lib-parser.xqy, both of which returned cts:query items. Instead, XQYSP returns an abstract syntax tree (AST) as XML. It is up to you, the caller, to transform that AST into a cts:query. That is a little more work for you, but adds a lot of flexibility at the same time. Most of the tasks that used to go into lib-parser-custom.xqy can now be implemented without changing the parser itself. To make it easier to get started, though, I have provided sample code to generate a query from an AST. I hope it is useful.
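To illustrate the general shape of an AST-to-query transform, here is a rough sketch using a recursive typeswitch. The element names (conjunction, disjunction, word) and the function local:to-query are hypothetical, invented for this example. They are not XQYSP's actual output format, and this is not the sample code that ships with XQYSP.

```xquery
xquery version "1.0-ml";

(: Hypothetical AST vocabulary: conjunction, disjunction, and word are
   illustrative names only, not what XQYSP really emits. :)
declare function local:to-query($n as element()) as cts:query
{
  typeswitch ($n)
  case element(conjunction) return
    cts:and-query(for $c in $n/* return local:to-query($c))
  case element(disjunction) return
    cts:or-query(for $c in $n/* return local:to-query($c))
  case element(word) return
    cts:word-query(string($n))
  default return
    error((), 'UNEXPECTED-NODE', xdmp:describe($n))
};

(: Walk a small hand-written tree and build the matching cts:query. :)
local:to-query(
  <conjunction>
    <word>xquery</word>
    <disjunction>
      <word>marklogic</word>
      <word>database</word>
    </disjunction>
  </conjunction>)
```

Because the transform lives in the caller's code, adding a new kind of node would only mean adding another case here, which is the flexibility described above.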
XQUT - Unit Testing in Pure XQuery
September 13, 2011 at 04:06 PM | categories: XQuery, MarkLogic
I was working on a couple of pure XQuery projects that needed unit testing. While I could have integrated with JUnit or another existing framework, I really wanted something simple that I could run directly from cq. Hence XQUT.
XQUT will usually be invoked like this:
The cq app server should point to the code you are testing, so that your test suite can import libraries. The eval root is different: it is the location of the XQUT code, so that you only need one copy of XQUT. The external variable SUITE is an XML test suite. A simple test suite might look like this:
The XML is fairly simple. Under the root suite element we have one or more unit elements, each representing a test. The test XQuery can be defined as the lexical value of the element, or as its expr child. The result can be defined by a result attribute or element.
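Going only by that description, a small suite could be sketched as follows. The element and attribute names (suite, unit, expr, result) come from the text above, but the test expressions are made up, and leaving out any namespace declaration is an assumption: the real XQUT suite format may require one.

```xquery
xquery version "1.0-ml";

(: A sketch of a two-test suite, written as an XML constructor.
   Assumption: no namespace is shown; the real format may need one. :)
<suite>
  <!-- test expression as the element's lexical value, result as an attribute -->
  <unit result="2">1 + 1</unit>
  <!-- test expression as an expr child, result as an element -->
  <unit>
    <expr>string-length('xqut')</expr>
    <result>4</result>
  </unit>
</suite>
```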
For more sophisticated tests, you can add xsi:type attributes and sequences of result elements. You can also use an optional environment element to import libraries, define variables, and define namespace prefixes. If you add setup elements, these will be evaluated before any tests. The test suite for XQUT itself contains more examples.
Group By in XQuery 1.0 for MarkLogic Server
August 23, 2011 at 02:12 PM | categories: XQuery, MarkLogic
XQuery 3.0 introduces new syntax for "group by". At this time, MarkLogic Server 4.2 is the latest release, and it doesn't have support for that syntax. So how can we implement "group by" when writing XQuery for MarkLogic?
Let's start with the W3C use cases. First, let's fetch the sample data and put it into MarkLogic. We can do that using cq. I'll leave out the schemas, since we don't need those. I also won't be exhaustive about optimizing every expression in these examples: suffice to say that there is room for even more improvement.
Sorry about the long block of code, but we need those documents. Paste that into cq, evaluate it, and you should get the empty sequence. That means your documents were inserted correctly: you can use the 'explore' link to check.
Now we can write some queries. Here is the first use case (Q1).
And the result should look like this:
We can't use XQuery 3.0 syntax in XQuery 1.0, but we can get the same result using an extra distinct-values step.
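As a sketch of the distinct-values pattern, the query below groups a few records by product name and totals their quantities. The inline records and their element names (record, product-name, qty) are hypothetical stand-ins so that the example is self-contained. They are not the W3C sample documents, which would be read from the database instead.

```xquery
xquery version "1.0-ml";

(: Hypothetical inline records stand in for the stored sample documents. :)
let $records :=
  <sales>
    <record><product-name>broiler</product-name><qty>20</qty></record>
    <record><product-name>toaster</product-name><qty>100</qty></record>
    <record><product-name>broiler</product-name><qty>5</qty></record>
  </sales>
(: Step 1: find each distinct grouping key. :)
for $name in distinct-values($records/record/product-name)
order by $name
return
  (: Step 2: go back and select the records for that key. :)
  <product name="{$name}"
    total="{sum($records/record[product-name eq $name]/qty)}"/>
```

Note the second pass inside the return clause: every distinct key triggers another selection over the records.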
This code is a little awkward, though. Instead of looping through the records once, we have to perform a database lookup on each product name. Normally this would be an unavoidable cost, and perhaps a reason to look forward to XQuery 3.0. But MarkLogic gives us a way to cheat, and use an accumulator model to get the same result more quickly. I'm talking about maps.
This produces the same output, and will scale better than the distinct-values() version would. Of course it is also less portable. But database application developers often have to implement non-portable optimizations, and the less portable code can be segregated into its own library modules.
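The accumulator idea can be sketched with MarkLogic's map:map, map:put, map:get, and map:keys functions. As before, the inline records are hypothetical sample data rather than the W3C documents, and this is one possible arrangement rather than a definitive implementation.

```xquery
xquery version "1.0-ml";

(: Hypothetical inline records stand in for the stored sample documents. :)
let $records :=
  <sales>
    <record><product-name>broiler</product-name><qty>20</qty></record>
    <record><product-name>toaster</product-name><qty>100</qty></record>
    <record><product-name>broiler</product-name><qty>5</qty></record>
  </sales>
let $m := map:map()
(: Single pass: add each quantity to the running total for its key.
   map:get returns the empty sequence for a new key, which sum ignores. :)
let $build :=
  for $r in $records/record
  let $key := string($r/product-name)
  return map:put($m, $key, sum((map:get($m, $key), xs:integer($r/qty))))
for $key in map:keys($m)
order by $key
return <product name="{$key}" total="{map:get($m, $key)}"/>
```

The records are visited once, and the per-key totals are read back from the map afterwards, so there is no second lookup per key.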
Now let's look at the next example (Q2).
Expected result:
Here is a solution using maps:
Again, this solution produces the same results. This time we had two elements in the grouping key, and the map key must be a string. So we had to use an old database trick and concatenate the two values with a known delimiter. Naturally we have to be careful in our choice of delimiter.
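Here is a sketch of that composite-key trick. The records, the element names (state, category, qty), and the choice of '|' as the delimiter are all assumptions made for illustration. The delimiter only works if it can never appear in either key component.

```xquery
xquery version "1.0-ml";

(: Hypothetical inline records; the element names are illustrative. :)
let $records :=
  <sales>
    <record><state>CA</state><category>clothes</category><qty>10</qty></record>
    <record><state>CA</state><category>kitchen</category><qty>3</qty></record>
    <record><state>CA</state><category>clothes</category><qty>5</qty></record>
    <record><state>MA</state><category>kitchen</category><qty>7</qty></record>
  </sales>
(: Assumption: '|' never occurs in a state or category value. :)
let $delim := '|'
let $m := map:map()
let $build :=
  for $r in $records/record
  (: Map keys must be strings, so join the two grouping values. :)
  let $key := string-join(($r/state, $r/category), $delim)
  return map:put($m, $key, sum((map:get($m, $key), xs:integer($r/qty))))
for $key in map:keys($m)
(: Split the composite key back into its two components.
   The pipe is escaped because tokenize takes a regular expression. :)
let $parts := tokenize($key, '\|')
order by $key
return
  <group state="{$parts[1]}" category="{$parts[2]}"
    total="{map:get($m, $key)}"/>
```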
For the remaining queries, I'll skip the W3C examples and output XML. Here are my solutions. Again, these return the desired results, but could benefit from more optimization work.
This final use-case is kind of odd, because the sample code works if you simply comment out the "group by". In other words, the sample data only contains one group. But I reimplemented it anyway.
That's it. I hope this was worth your time.