<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:wfw="http://wellformedweb.org/CommentAPI/"
     >
  <channel>
    <title>Where am I?</title>
    <link>http://blakeley.com/blogofile</link>
    <description>Performance, scalability, databases, and whatever comes up.</description>
    <pubDate>Fri, 18 May 2012 01:00:51 GMT</pubDate>
    <generator>Blogofile</generator>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>rsyslog and MarkLogic</title>
      <link>http://blakeley.com/blogofile/2012/05/17/rsyslog-and-marklogic</link>
      <pubDate>Thu, 17 May 2012 18:00:01 UTC</pubDate>
      <category><![CDATA[MarkLogic]]></category>
      <category><![CDATA[Linux]]></category>
      <guid isPermaLink="true">http://blakeley.com/blogofile/2012/05/17/rsyslog-and-marklogic</guid>
      <description>rsyslog and MarkLogic</description>
      <content:encoded><![CDATA[<p>You probably know that MarkLogic Server logs important events
to the <code>ErrorLog.txt</code> file. By default it logs events at <code>INFO</code> or higher,
but many development and staging environments change the <code>file-log-level</code>
to <code>DEBUG</code>. These log levels are also available to the <code>xdmp:log</code> function,
and some of your XQuery code might use that for <code>printf</code>-style debugging.</p>
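<p>As a minimal sketch of that style of debugging, here are two calls.
The messages and the <code>my-module</code> prefix are made up for illustration.</p>
<pre><code>xquery version "1.0-ml";
(: Written to ErrorLog.txt only when file-log-level is debug or lower. :)
xdmp:log("my-module: entering update step", "debug"),
(: notice is above the default info threshold, so default settings log this. :)
xdmp:log("my-module: unexpected empty result", "notice")
</code></pre>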
<p>You might even know that MarkLogic also sends important events
to the operating system. On Linux this means <code>syslog</code>, and important events
are those at <code>NOTICE</code> and higher by default.</p>
<p>But are you monitoring these events?</p>
<p>How can you set up your MarkLogic deployment so that it will automatically
alert you to errors, warnings, or other important events?</p>
<p>Most Linux deployments now use <code>rsyslog</code> as their system logging facility.
The <a href="http://www.rsyslog.com/doc/manual.html">full documentation</a> is available,
but this brief tutorial will show you how to set up email alerts for MarkLogic
using <code>rsyslog</code> version 4.2.6.</p>
<p>All configuration happens in <code>/etc/rsyslog.conf</code>.
Here is a sample of what we need for email alerts.
First, at the top of the file you should see several <code>ModLoad</code> declarations.
Check for <code>ommail</code> and add it if needed.</p>
<pre><code>$ModLoad ommail.so  # email support
</code></pre>
<p>Next, add a stanza for MarkLogic somewhere after the <code>ModLoad</code> declaration.</p>
<pre><code># MarkLogic
$template MarkLogicSubject,"Problem with MarkLogic on %hostname%"
$template MarkLogicBody,"rsyslog message from MarkLogic:\r\n[%timestamp%] %app-name% %pri-text%:%msg%"
$ActionMailSMTPServer 127.0.0.1
$ActionMailFrom your-address@your-domain
$ActionMailTo your-address@your-domain
$ActionMailSubject MarkLogicSubject
#$ActionExecOnlyOnceEveryInterval 3600
daemon.notice   :ommail:;MarkLogicBody
</code></pre>
<p>Be sure to replace both instances of <code>your-address@your-domain</code>
with an appropriate value. The <code>ActionMailSMTPServer</code> must be smart enough
to deliver email to that address. I used a default <code>sendmail</code> configuration
on the local host, but you might choose to connect to a different host.</p>
<p>Note that I have commented out the <code>ActionExecOnlyOnceEveryInterval</code> option.
The author of <code>rsyslog</code>, <a href="http://www.gerhards.net/rainer">Rainer Gerhards</a>,
recommends setting this value to a reasonably high number of seconds
so that your email inbox is not flooded with messages.
However, the <code>rsyslog</code> documentation states that excess messages
are discarded, and I did not want to lose any important messages.
What I would really like to do is buffer messages for N seconds at a time,
and merge them together in one email.
But while <code>rsyslog</code> has many features, and does offer buffering,
it does not seem to know how to combine consecutive messages
into a single email.</p>
<p>Getting back to what <code>rsyslog</code> <em>can</em> do,
you can customize the subject and body of the mail message.
With the configuration above, a restart of the server
might send you an email like this one:</p>
<pre><code>Subject: Problem with MarkLogic on myhostname.mydomain

rsyslog message from MarkLogic:
[May 17 23:58:36] MarkLogic daemon.notice&lt;29&gt;: Starting MarkLogic Server 5.0-3 i686 in /opt/MarkLogic with data in /var/opt/MarkLogic
</code></pre>
<p>When making any <code>rsyslog</code> changes, be sure to restart the service:</p>
<pre><code>sudo service rsyslog restart
</code></pre>
<p>At the same time, check your system log for any errors or typos.
This is usually <code>/var/log/messages</code> or <code>/var/log/syslog</code>.
The full documentation for <a href="http://www.rsyslog.com/doc/property_replacer.html">template substitution properties</a>
is online.
You can also read about a wealth of other options available in <code>rsyslog</code>.</p>]]></content:encoded>
    </item>
    <item>
      <title>Directory Assistance</title>
      <link>http://blakeley.com/blogofile/2012/03/19/directory-assistance</link>
      <pubDate>Mon, 19 Mar 2012 12:34:56 UTC</pubDate>
      <category><![CDATA[MarkLogic]]></category>
      <guid isPermaLink="true">http://blakeley.com/blogofile/2012/03/19/directory-assistance</guid>
      <description>Directory Assistance</description>
      <content:encoded><![CDATA[<p>For a long time now, MarkLogic Server has implemented two distinct features
that are both called "directories". This causes confusion, especially since one
of these features scales well and the other often causes scalability problems.
Let's try to distinguish between these two features,
and talk about why they both exist.</p>
<p>Directories were first introduced to accommodate WebDAV.
Since WebDAV clients treat the database as if it were a filesystem,
they expect document URIs with the solidus, or <code>/</code>,
to imply directory structure. That's one feature called "directories":
if you insert a document with the URI <code>/a/b/c.xml</code>, you can call
<code>xdmp:directory('/a/b/', '1')</code> to select that document -
and any other document with the same URI prefix. These URI prefixes
are indexed in much the same way that document URIs and collection URIs
are indexed, so queries are "searchable" and scale well.</p>
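<p>Here is a rough sketch of that sequence, using a throwaway
<code>&lt;test/&gt;</code> document. The semicolon separates two transactions,
so that the second statement can see the insert.</p>
<pre><code>xquery version "1.0-ml";
(: Insert a document under the implied directory /a/b/ :)
xdmp:document-insert("/a/b/c.xml", &lt;test/&gt;);

xquery version "1.0-ml";
(: Depth "1" selects immediate children; "infinity" would select all descendants. :)
xdmp:directory("/a/b/", "1")
</code></pre>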
<p>This "implied directory structure" works with any database configuration.
You do not need <code>directory-creation=automatic</code>
to use the <code>cts:directory-query</code> and <code>xdmp:directory</code> functions.</p>
<script src="https://gist.github.com/2127471.js?file=gistfile1.xq"></script>

<p>This returns a query plan in XML:</p>
<script src="https://gist.github.com/2127484.js?file=gistfile1.xml"></script>

<p>But WebDAV clients expect more than just directory listings.
They also want to lock documents and directories.
It is easy to understand document locking: the idea here is that
a WebDAV-aware editor might lock a document, copy it to the local filesystem
for editing, and copy it back to the server when the editing session ends.
It may be less clear that a WebDAV client sometimes needs to lock directories,
but it does.</p>
<p>Directory locking is implemented using special directory fragments.
These are properties fragments with no associated document,
so they are sometimes called "naked properties."
Here is an example.</p>
<script src="https://gist.github.com/2127498.js?file=gistfile1.xq"></script>

<p>Once this update has committed to the database,
we can query the directory fragment.</p>
<script src="https://gist.github.com/2127503.js?file=gistfile1.xq"></script>

<script src="https://gist.github.com/2127509.js?file=gistfile1.xml"></script>

<p>Once you have a directory fragment, you have something that the database
can lock for WebDAV clients. It's rare for anything else
to use this behavior, but <code>xdmp:lock-acquire</code> is available for custom
content management applications.</p>
<p>Earlier I mentioned that there are two kinds of "directories",
one that scales well and one that sometimes causes problems.
I wrote that queries based on directory URIs scale well,
so you might guess that directory fragments sometimes cause problems.
That's correct, and it results from a database feature called
"automatic directory creation".</p>
<p>When automatic directory creation is enabled - as it is by default -
the database will ensure that directory fragments exist for every
implied directory in the URI for every new or updated document.
The document URI <code>/a/b/c.xml</code> implies a directory fragment
for <code>/</code>, <code>/a/</code>, and <code>/a/b/</code>. So the database will ensure that these exist
whenever a request updates <code>/a/b/c.xml</code>.</p>
<p>So what happens when one request updates <code>/a/b/c.xml</code>
and another request updates <code>/a/b/d.xml</code>?</p>
<p>Both requests try to ensure that there are directory fragments
for <code>/</code>, <code>/a/</code>, and <code>/a/b/</code>. This causes lock contention.
The same problem shows up if another request is updating <code>/fubar.xml</code>,
because every request looks for the <code>/</code> directory fragment.
The situation gets worse as concurrency increases.
It gets even worse if "maintain directory last-modified" is enabled,
because the directory fragments have to be updated too.
But happily that feature is not enabled by default.</p>
<p>The solution to this problem is simple. In my experience
at least 80% of MarkLogic Server customers do not use WebDAV,
so they do not need automatic directory creation. Instead,
they can set directory creation to "manual".
Do this whenever you create a new database,
or script it using <code>admin:database-set-directory-creation</code>.</p>
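<p>A scripted version might look something like this sketch.
The database name <code>Documents</code> is a placeholder, so substitute your own.</p>
<pre><code>xquery version "1.0-ml";
import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: "Documents" is a placeholder database name. :)
let $config := admin:get-configuration()
let $config := admin:database-set-directory-creation(
  $config, xdmp:database("Documents"), "manual")
(: Nothing changes until the modified configuration is saved. :)
return admin:save-configuration($config)
</code></pre>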
<p><img alt="admin UI screen shot" src="/blogofile/images/0109.directory-assistance.admin-UI.png" title="setting directory creation in the admin UI" /></p>
<p>If you do use WebDAV, try to limit its scope. Perhaps you can get by
with a limited number of predefined WebDAV directories,
which you create manually using <code>xdmp:directory-create</code>
as part of your application deployment.
Or perhaps you only use WebDAV for your XQuery modules,
which only amount to a few hundred or at most a few thousand documents.
In that case you can use automatic directory creation without a problem.</p>
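<p>As a sketch of the predefined-directory approach, a deployment script
could create a fixed set of WebDAV directories.
The URIs below are hypothetical.</p>
<pre><code>xquery version "1.0-ml";
(: Hypothetical WebDAV root and subdirectories, parents listed first. :)
for $uri in ("/webdav/", "/webdav/incoming/", "/webdav/reports/")
return xdmp:directory-create($uri)
</code></pre>
<p>I would expect this to fail if one of the directories already exists,
so a real deployment script may want to check for that first.</p>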
<p>Generally speaking, really large databases don't use WebDAV anyway.
"Big content" databases, with hundreds of millions or billions of documents,
tend to be much too large for WebDAV to be useful.
For smaller databases where WebDAV is useful,
automatic directory creation is fine.</p>
<p>Sometimes it is useful to set "directory-creation" to "manual-enforced".
With this configuration you will see an <code>XDMP-PARENTDIR</code> error
whenever your code tries to insert a document
with an implied directory structure
that does not have corresponding directory fragments.
But this feature is rarely used.</p>
<p>To sum up, directory URIs are highly scalable and very useful,
and are always indexed. Your code can call <code>xdmp:directory</code>
with any database settings.
The default "automatic directory creation" feature creates directory fragments,
which can be a bottleneck for large databases.
Most applications are better off with "directory-creation" set to "manual".</p>]]></content:encoded>
    </item>
    <item>
      <title>Let-free Style and Streaming</title>
      <link>http://blakeley.com/blogofile/2012/03/19/let-free-style-and-streaming</link>
      <pubDate>Mon, 19 Mar 2012 12:34:56 UTC</pubDate>
      <category><![CDATA[XQuery]]></category>
      <category><![CDATA[MarkLogic]]></category>
      <guid isPermaLink="true">http://blakeley.com/blogofile/2012/03/19/let-free-style-and-streaming</guid>
      <description>Let-free Style and Streaming</description>
      <content:encoded><![CDATA[<p>If you are familiar with Lisp or Scheme, you know that a function call can
replace a variable binding, and function calls can also replace most loops.
This is also true in XQuery.</p>
<script src="https://gist.github.com/2127325.js?file=gistfile1.xq"></script>

<script src="https://gist.github.com/2127351.js?file=gistfile1.txt"></script>

<p>In XQuery this leads to a style of coding that I call "let-free".
In this style, there are no FLWOR expressions.
Really this is "FLWOR-free", not "let-free",
but that's too much of a mouthful for me.</p>
<p>But why would you write let-free code?</p>
<p>The answer is scalability - you knew it would be, right?
This breaks out into concurrency and streaming.
Let's talk about concurrency first.
In the MarkLogic Server implementation of XQuery,
every <code>let</code> is evaluated in sequence. However, other expressions
are evaluated lazily with concurrency-friendly "future values".
So a performance-critical single-threaded request can sometimes
benefit from let-free style. You can see this technique in use
in some of my code:
the <a href="https://github.com/marklogic/semantic">semantic library</a>
or the <a href="https://github.com/mblakele/task-rebalancer">task-server forest rebalancer</a>.
Both of these projects try to benefit from multi-core CPUs.</p>
<p>The let-free style can also help with query scalability
by allowing the results to stream,
rather than buffering the entire result sequence.
If you need to export large result sets, for example,
this technique can help avoid <code>XDMP-EXPNTREECACHEFULL</code> errors.
Those errors result when your query's working set is too large
to fit in the expanded tree cache, a sort of scratch space for XML trees.
But streaming results don't have to fit into the cache.</p>
<p>For example, let's suppose you need to list every document URI in the database.
But you do not have the URI lexicon enabled,
and you cannot reindex to create it.</p>
<script src="https://gist.github.com/2127363.js?file=gistfile1.xq"></script>

<script src="https://gist.github.com/2127371.js?file=gistfile1.xq"></script>

<p>Note that nested evaluations cannot stream, either. So even a let-free query
may throw <code>XDMP-EXPNTREECACHEFULL</code> in cq or another development tool.
To test this query, use an http module instead.
This is ideal for web service implementations too.</p>
<p>In this example we used function mapping, a MarkLogic extension to XQuery 1.0.
If a function takes a single argument but is called using a sequence,
the evaluator simply maps the sequence to multiple function calls.
This is somewhat faster than a FLWOR, and it can stream.</p>
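<p>To make function mapping concrete, here is a minimal sketch
with a made-up <code>local:label</code> function.
It assumes the default <code>1.0-ml</code> dialect, where function mapping is enabled.</p>
<pre><code>xquery version "1.0-ml";
declare function local:label($n as xs:integer) as xs:string
{
  concat("item-", $n)
};

(: The parameter takes one integer, but the argument is a sequence of three.
   Function mapping makes one call per item, much like
   for $n in (1 to 3) return local:label($n) :)
local:label(1 to 3)
</code></pre>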
<p>Besides using function mapping, let-free style can use XPath steps.
However, this technique only works for sequences of nodes.</p>
<script src="https://gist.github.com/2127388.js?file=gistfile1.xq"></script>

<p>While these techniques are useful, they can make for code that is
hard to read and tricky to debug. Function mapping is especially prone to errors
that are difficult to diagnose. If a function signature specifies an argument
without a quantifier or with the <code>+</code> quantifier,
and the runtime argument is empty, the function will not be called at all.
This is surprising, since normally the function would be called
and would cause a strong typing error.</p>
<script src="https://gist.github.com/2127394.js?file=gistfile1.xq"></script>

<script src="https://gist.github.com/2127403.js?file=gistfile1.xq"></script>

<p>The first expression returns the empty sequence,
while the second throws the expected strong typing error <code>XDMP-AS</code>.
This behavior is annoying, but in some applications
the benefits of function mapping outweigh this drawback.
We can make debugging easier if we weaken the function signature
to <code>document-node()?</code> so that the function will be called
even when the argument is empty. If needed, we can include an explicit check
for empty input too.</p>
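<p>Here is a sketch of a weakened signature with an explicit check.
The function <code>local:root-name</code> and the error name are hypothetical,
and the URI is assumed not to exist in the database.</p>
<pre><code>xquery version "1.0-ml";
declare function local:root-name($doc as document-node()?) as xs:string
{
  (: The ? quantifier accepts the empty sequence, so the function is called
     and the problem is reported instead of being silently skipped. :)
  if (empty($doc))
  then error(xs:QName("local:EMPTY"), "expected a document, got nothing")
  else local-name($doc/*)
};

local:root-name(doc("/no/such/document.xml"))
</code></pre>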
<p>Another let-free trick is to use module variables.
These act much like <code>let</code> bindings, but they can stream.</p>
<script src="https://gist.github.com/2127416.js?file=gistfile1.xq"></script>

<p>This example is a bit contrived, since the module variable doesn't add anything.
But if you find yourself struggling to refactor a <code>let</code> as a function call
or an XPath step, consider using a module variable.
Module variables are also excellent tools for avoiding repeated work,
since the right-hand expression is evaluated lazily and is never
evaluated more than once. If the evaluation does not use the module variable,
then the right-hand expression is never evaluated.
In contrast, the right-hand expression of a <code>let</code> is evaluated
even when the <code>return</code> does not use its value.</p>
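<p>A sketch of that lazy behavior follows. It assumes an http module
and a hypothetical request field named <code>count</code>.</p>
<pre><code>xquery version "1.0-ml";
(: Evaluated at most once, and only if the body refers to it. :)
declare variable $DOC-COUNT := xdmp:estimate(doc());

if (xdmp:get-request-field("count") eq "1")
then $DOC-COUNT
else "count not requested"
</code></pre>
<p>When the request field is missing, the estimate should never run at all.</p>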
<p>As always, do not optimize code unless there is a problem to solve.
There are also some situations where the let-free style isn't appropriate.
Aside from making your code harder to read and more difficult to debug,
let-free style simply doesn't work in situations where your FLWOR
would have an <code>order by</code> clause.
And after all, streaming won't work for that case anyway.
The evaluator can't sort the result set without buffering it first.</p>]]></content:encoded>
    </item>
    <item>
      <title>Conditional Profiling for MarkLogic</title>
      <link>http://blakeley.com/blogofile/2011/12/14/conditional-profiling-for-marklogic</link>
      <pubDate>Wed, 14 Dec 2011 15:16:17 UTC</pubDate>
      <category><![CDATA[XQuery]]></category>
      <category><![CDATA[MarkLogic]]></category>
      <guid isPermaLink="true">http://blakeley.com/blogofile/2011/12/14/conditional-profiling-for-marklogic</guid>
      <description>Conditional Profiling for MarkLogic</description>
      <content:encoded><![CDATA[<p>Today I pushed <a href="https://github.com/mblakele/cprof">cprof</a> to GitHub.
This XQuery library helps application developers
who need to retrofit existing applications with profiling capabilities.
Just replace all your existing calls to
<code>xdmp:eval</code>, <code>xdmp:invoke</code>, <code>xdmp:value</code>,
<code>xdmp:xslt-eval</code>, and <code>xdmp:xslt-invoke</code> with corresponding <code>cprof:</code> calls.
Add a little logic around <code>cprof:enable</code> and <code>cprof:report</code>, and you are done.</p>]]></content:encoded>
    </item>
    <item>
      <title>Before you upgrade to 5.0-1</title>
      <link>http://blakeley.com/blogofile/archives/599</link>
      <pubDate>Thu, 03 Nov 2011 08:47:15 UTC</pubDate>
      <category><![CDATA[MarkLogic]]></category>
      <guid>http://blakeley.com/blogofile/archives/599</guid>
      <description>Before you upgrade to 5.0-1</description>
      <content:encoded><![CDATA[
Thinking about upgrading to <a href="http://developer.marklogic.com/">MarkLogic Server 5.0-1</a>?
<br/><br/>
As usual, back up everything. I haven't seen any data loss myself, but it is your data so be careful.
<br/><br/>
If you have made any changes to Docs (port 8000) or App Services (8002), the app-services portion of the upgrade won't happen (but the rest of the server will be fine). If you want to use the new monitoring services, you want that part of the upgrade to happen.
<br/><br/>
The fix is to revert your changes to ports 8000 and 8002. If you have repurposed either port for <a href="http://github.com/marklogic/cq/">cq</a>, you may want to go into cq and export any *local* sessions before changing anything. Local sessions in cq are tied to local browser storage, which is tied to host and port, so you will lose access to them if you change the cq port. Not many folks seem to use cq's local sessions, but I thought I'd mention it. Whether you use cq on those ports or not, make sure port 8000 has root <code>Docs/</code> and 8002 has root <code>Apps/</code> or <code>Apps/appbuilder/</code> - you can see these checks in <code>Admin/lib/upgrade.xqy</code>, function <code>check-prereqs-50</code>.
<br/><br/>
If <code>upgrade.xqy</code> decides not to upgrade your App Services configuration, it will log a message "Skipping appservices upgrades, prerequisites not met." at level "error". The rest of the server will function correctly, but you won't get the appservices part of 5.0.<br/><br/>
]]></content:encoded>
    </item>
    <item>
      <title>Rebalancing for CoRB</title>
      <link>http://blakeley.com/blogofile/archives/597</link>
      <pubDate>Tue, 01 Nov 2011 20:50:34 UTC</pubDate>
      <category><![CDATA[XQuery]]></category>
      <category><![CDATA[MarkLogic]]></category>
      <guid>http://blakeley.com/blogofile/archives/597</guid>
      <description>Rebalancing for CoRB</description>
      <content:encoded><![CDATA[
I've written some quick scripts for <a href="https://github.com/mblakele/corb-rebalancer">rebalancing forests in a MarkLogic Server database</a>. This leverages CoRB, and makes the job fairly simple. So if you add more forests to a database, and don't have the luxury of clearing and reloading, I hope this code will help.<br/><br/>
]]></content:encoded>
    </item>
    <item>
      <title>MarkLogic 5.0 - First Look</title>
      <link>http://blakeley.com/blogofile/archives/577</link>
      <pubDate>Tue, 01 Nov 2011 12:24:23 UTC</pubDate>
      <category><![CDATA[XQuery]]></category>
      <category><![CDATA[MarkLogic]]></category>
      <guid>http://blakeley.com/blogofile/archives/577</guid>
      <description>MarkLogic 5.0 - First Look</description>
      <content:encoded><![CDATA[
In case you have missed the news, <a href="http://developer.marklogic.com/download">MarkLogic Server 5.0-1</a> is now available. The upgrade went smoothly for me, but this is a major release so it is wise to back up your databases and configuration before upgrading. The on-disk forest version appears to have changed, which will trigger reindexing of all forests. You may want to manually disable reindexing before upgrading, so that you don't have to contend with multiple forests trying to reindex at the same time.
<br/><br/>
This is also a good time to double-check your free disk space, since reindexing uses extra disk space. Some of that space won't be released when reindexing finishes, either. For example, one of my forests looked like this:
<br/><br/>
<div>
<a href="/blogofile/images/wp-content/2011/11/Screen-shot-2011-11-01-at-10.15.11-.png"><img class="size-medium wp-image-580" title="Forest status after reindexing" src="/blogofile/images/wp-content/2011/11/Screen-shot-2011-11-01-at-10.15.11--300x47.png" alt="This forest is holding on to over 2-GiB of deleted fragments." width="300" height="47" /></a>
</div>
<br/><br/>
You can purge those deleted fragments by forcing a merge of the forest, or of the entire database. After doing this, my forest used less disk space.
<br/><br/>
<div>
<a href="/blogofile/images/wp-content/2011/11/Screen-shot-2011-11-01-at-10.24.34-.png"><img class="size-medium wp-image-580" title="Forest status after forced merge" src="/blogofile/images/wp-content/2011/11/Screen-shot-2011-11-01-at-10.24.34--300x32.png" alt="After the forced merge, the deleted fragments are gone and the forest is smaller." width="300" height="32" /></a>
</div>
<br/><br/>
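A forced merge can also be requested from XQuery with <code>xdmp:merge</code>. This is only a sketch, and the forest name <code>my-forest</code> is a placeholder.
<pre><code>xquery version "1.0-ml";
(: Ask the server to merge one forest of the current database. :)
xdmp:merge(
  &lt;options xmlns="xdmp:merge"&gt;
    &lt;forests&gt;
      &lt;forest&gt;{ xdmp:forest("my-forest") }&lt;/forest&gt;
    &lt;/forests&gt;
  &lt;/options&gt;)
</code></pre>
<br/><br/>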
This new release is stricter about unquoted attributes. With previous releases this would generally work, even though the <a href="http://www.w3.org/TR/xquery/#doc-xquery-DirectConstructor">XQuery 1.0 Recommendation</a> requires quoted attribute values:
<p style="padding-left: 30px;"><span style="font-family: monospace;">&lt;test a={xdmp:random()}/&gt;</span></p>
<br/><br/>
Now it throws an <code>XDMP-UNEXPECTED</code> error. Quote the attribute value correctly, and the problem is fixed.
<p style="padding-left: 30px;"><span style="font-family: monospace;">&lt;test a="{xdmp:random()}"/&gt;</span></p>
<br/><br/>
I'm looking forward to learning more about the 5.0 release, but so far it looks good.<br/><br/>
]]></content:encoded>
    </item>
    <item>
      <title>Yet another search parser - XQYSP</title>
      <link>http://blakeley.com/blogofile/archives/588</link>
      <pubDate>Mon, 24 Oct 2011 13:19:59 UTC</pubDate>
      <category><![CDATA[XQuery]]></category>
      <category><![CDATA[MarkLogic]]></category>
      <guid>http://blakeley.com/blogofile/archives/588</guid>
      <description>Yet another search parser - XQYSP</description>
      <content:encoded><![CDATA[
If you need something a little more sophisticated than the search parser built into the MarkLogic search API, give <a href="https://github.com/mblakele/xqysp">XQYSP</a> a try. It supports nested groups, range queries, near queries with distance and ordering, and should be fairly easy to extend.
<br/><br/>
XQYSP takes a slightly different approach than the Search API or the older lib-parser.xqy, both of which returned cts:query items. Instead, XQYSP returns an abstract syntax tree (AST) as XML. It is up to you, the caller, to transform that AST into a cts:query. That is a little more work for you, but adds a lot of flexibility at the same time. Most of the tasks that used to go into lib-parser-custom.xqy can now be implemented without changing the parser itself. To make it easier to get started, though, I have provided sample code to generate a query from an AST. I hope it is useful.<br/><br/>
]]></content:encoded>
    </item>
    <item>
      <title>XQUT - Unit Testing in Pure XQuery</title>
      <link>http://blakeley.com/blogofile/archives/590</link>
      <pubDate>Tue, 13 Sep 2011 16:06:42 UTC</pubDate>
      <category><![CDATA[XQuery]]></category>
      <category><![CDATA[MarkLogic]]></category>
      <guid>http://blakeley.com/blogofile/archives/590</guid>
      <description>XQUT - Unit Testing in Pure XQuery</description>
      <content:encoded><![CDATA[
<p>
I was working on a couple of pure XQuery projects that needed unit
testing. While I could have integrated with JUnit or another existing
framework, I really wanted something simple that I could run directly
from <a href="http://github.com/marklogic/cq">cq</a>.
Hence <a href="http://github.com/mblakele/xqut">XQUT</a>.
</p>
<p>
XQUT will usually be invoked like this:
</p>
<script src="https://gist.github.com/1332346.js?file=xqut-sample.xqy"></script>

<p>
The cq app server should point to the code you are testing,
so that your test suite can import libraries. The eval root is
different: it is the location of the XQUT code, so that you only need
one copy of XQUT. The external variable <code>SUITE</code> is an XML
test suite. A simple test suite might look like this:
</p>
<script src="https://gist.github.com/1332351.js?file=xqut-sample.xml"></script>

<p>
The XML is fairly simple. Under the root <code>suite</code>
element we have one or more <code>unit</code> elements, each
representing a test. The test XQuery can be defined as the lexical
value of the element, or as its <code>expr</code> child. The result
can be defined by a <code>result</code> attribute or element.
</p>
<p>
For more sophisticated tests, you can
add <code>xsi:type</code> attributes and sequences
of <code>result</code> elements. You can also use an
optional <code>environment</code> element to import libraries, define
variables, and define namespace prefixes. If you
add <code>setup</code> elements, these will be evaluated before any
tests. The <a href="https://github.com/mblakele/xqut/blob/master/test/test.xml">test
suite for XQUT</a> itself contains more examples.
</p>
]]></content:encoded>
    </item>
    <item>
      <title>Group By in XQuery 1.0 for MarkLogic Server</title>
      <link>http://blakeley.com/blogofile/archives/560</link>
      <pubDate>Tue, 23 Aug 2011 14:12:17 UTC</pubDate>
      <category><![CDATA[XQuery]]></category>
      <category><![CDATA[MarkLogic]]></category>
      <guid>http://blakeley.com/blogofile/archives/560</guid>
      <description>Group By in XQuery 1.0 for MarkLogic Server</description>
      <content:encoded><![CDATA[
<p>
XQuery 3.0 introduces new syntax for "group by". At this time,
MarkLogic Server 4.2 is the latest release, and it doesn't have
support for that syntax. So how can we implement "group by" when
writing XQuery for MarkLogic?
</p>

<p>
Let's start with the
<a href="http://www.w3.org/TR/xquery-30-use-cases/#groupby-queries-results">W3C
use cases</a>. First, let's fetch the
<a href="http://www.w3.org/TR/xquery-30-use-cases/#dataproducts">sample
data</a> and put it into MarkLogic. We can do that
using <a href="https://github.com/marklogic/cq">cq</a>. I'll leave out
the schemas, since we don't need those. I also won't be exhaustive
about optimizing every expression in these examples: suffice it to say
that there is room for even more improvement.
</p>

<script src="https://gist.github.com/1342270.js"> </script>

<p>
Sorry about the long block of code, but we need those documents. Paste
that into cq, evaluate it, and you should get the empty sequence. That
means your documents were inserted correctly: you can use the
'explore' link to check.
</p>

<a
 href="/blogofile/images/wp-content/2011/08/Screen-shot-2011-08-23-at-13.02.29-.png"><img
 src="/blogofile/images/wp-content/2011/08/Screen-shot-2011-08-23-at-13.02.29--300x218.png"
 alt="The cq explorer shows the W3C test documents."
 title="The cq explorer shows the W3C test documents."
 width="300" height="218"
 class="alignnone size-medium wp-image-562" /></a>

<p>
Now we can write some queries. Here is the first use case (Q1).
</p>

<script src="https://gist.github.com/1342274.js"> </script>

<p>
And the result should look like this:
</p>

<script src="https://gist.github.com/1342278.js"> </script>

<p>
We can't write XQuery 3.0 using XQuery 1.0 &mdash; but we can get the
same result using an extra distinct-values step.
</p>

<script src="https://gist.github.com/1342280.js"> </script>

<p>
This code is a little awkward, though. Instead of looping through the
records once, we have to perform a database lookup on each product
name. Normally this would be an unavoidable cost, and perhaps a reason
to look forward to XQuery 3.0. But MarkLogic gives us a way to cheat,
and use an accumulator model to get the same result more quickly. I'm
talking about
<a href="http://developer.marklogic.com/pubs/4.2/apidocs/map.html">maps</a>.
</p>

<script src="https://gist.github.com/1342282.js"> </script>

<p>
This produces the same output, and will scale better than the
<code>distinct-values()</code> version would. Of course it is also less
portable. But database application developers often have to implement
non-portable optimizations, and the less portable code can be
segregated into its own library modules.
</p>

<p>Now let's look at the next example (Q2).</p>

<script src="https://gist.github.com/1342287.js"> </script>

<p>Expected result:</p>

<script src="https://gist.github.com/1342290.js"> </script>

<p>Here is a solution using maps:</p>

<script src="https://gist.github.com/1342291.js"> </script>

<p>
Again, this solution produces the same results. This time we had two
elements in the grouping key, and the map key must be a string. So we
had to use an old database trick and concatenate the two values with a
known delimiter. Naturally we have to be careful in our choice of
delimiter.
</p>
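<p>
To illustrate the delimiter trick on its own, here is a small sketch
with inline data. The <code>sale</code> elements, their attributes, and
the <code>|</code> delimiter are all made up for this example, and are not
taken from the W3C sample data.
</p>
<pre><code>xquery version "1.0-ml";
let $sales := (
  &lt;sale store="north" product="socks" qty="2"/&gt;,
  &lt;sale store="north" product="socks" qty="3"/&gt;,
  &lt;sale store="south" product="socks" qty="1"/&gt;)
let $totals := map:map()
(: Accumulate one total per composite key, such as "north|socks". :)
let $_ :=
  for $s in $sales
  let $key := string-join(($s/@store, $s/@product), "|")
  return map:put($totals, $key,
    sum((map:get($totals, $key), xs:integer($s/@qty))))
(: Split each key back into its parts when building the output. :)
for $key in map:keys($totals)
let $parts := tokenize($key, "\|")
return
  &lt;group store="{$parts[1]}" product="{$parts[2]}"
    total="{map:get($totals, $key)}"/&gt;
</code></pre>
<p>
This only works if the delimiter can never appear in a store or product
value, which is why the choice of delimiter matters.
</p>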

<p>
For the remaining queries, I'll skip the W3C examples and output
XML. Here are my solutions. Again, these return the desired results,
but could benefit from more optimization work.
</p>

<script src="https://gist.github.com/1342293.js"> </script>

<script src="https://gist.github.com/1342299.js"> </script>

<script src="https://gist.github.com/1342297.js"> </script>

<script src="https://gist.github.com/1342296.js"> </script>

<script src="https://gist.github.com/1342295.js"> </script>

<script src="https://gist.github.com/1342294.js"> </script>

<p>
This final use-case is kind of odd, because the sample code works if
you simply comment out the "group by". In other words, the sample data
only contains one group. But I reimplemented it anyway.
</p>

<p>That's it. I hope this was worth your time.
</p>
]]></content:encoded>
    </item>
  </channel>
</rss>

