Where am I?

Let-free Style and Streaming

March 19, 2012 at 12:34 PM | categories: XQuery, MarkLogic | View Comments

If you are familiar with Lisp or Scheme, you know that a function call can replace a variable binding, and function calls can also replace most loops. This is also true in XQuery.

In XQuery this leads to a style of coding that I call "let-free". In this style, there are no FLWOR expressions. Really this is "FLWOR-free", not "let-free", but that's too much of a mouthful for me.

But why would you write let-free code?

The answer is scalability - you knew it would be, right? This breaks out into concurrency and streaming. Let's talk about concurrency first. In the MarkLogic Server implementation of XQuery, every let is evaluated in sequence. However, other expressions are evaluated lazily with concurrency-friendly "future values". So a performance-critical single-threaded request can sometimes benefit from let-free style. You can see this technique in use in some of my code: the semantic library or the task-server forest rebalancer. Both of these projects try to benefit from multi-core CPUs.

The let-free style can also help with query scalability by allowing the results to stream, rather than buffering the entire result sequence. If you need to export large result sets, for example, this technique can help avoid XDMP-EXPNTREECACHEFULL errors. Those errors result when your query's working set is too large to fit in the expanded tree cache, a sort of scratch space for XML trees. But streaming results don't have to fit into the cache.

For example, let's suppose you need to list every document URI in the database. But you do not have the URI lexicon enabled, and you cannot reindex to create it.

Note that nested evaluations cannot stream, either. So even a let-free query may throw XDMP-EXPNTREECACHEFULL in cq or another development tool. To test this query, use an http module instead. This is ideal for web service implementations too.

In this example we used function mapping, a MarkLogic extension to XQuery 1.0. If a function takes a single argument but is called using a sequence, the evaluator simply maps the sequence to multiple function calls. This is somewhat faster than a FLWOR, and it can stream.

Besides using function mapping, let-free style can use XPath steps. However, this technique only works for sequences of nodes.

While these techniques are useful, they can make for code that is hard to read and tricky to debug. Function mapping is especially prone to errors that are difficult to diagnose. If a function signature specifies an argument without a quantifier or with the + quantifier, and the runtime argument is empty, the function will not be called at all. This is surprising, since normally the function would be called and would cause a strong typing error.

The first expression returns the empty sequence, while the second throws the expected strong typing error XDMP-AS. This behavior is annoying, but in some applications the benefits of function mapping outweigh this drawback. We can make debugging easier if we weaken the function signature to document-node()? so that the function will be called even when the argument is empty. If needed, we can include an explicit check for empty input too.

Another let-free trick is to use module variables. These act much like let bindings, but they can stream.

This example is a bit contrived, since the module variable doesn't add anything. But if you find yourself struggling to refactor a let as a function call or an XPath step, consider using a module variable. Module variables are also excellent tools for avoiding repeated work, since the right-hand expression is evaluated lazily and is never evaluated more than once. If the evaluation does not use the module variable, then the right-hand expression is never evaluated. In contrast, the right-expression of a let is evaluated even when the return does not use its value.

As always, do not optimize code unless there is a problem to solve. There are also some situations where the let-free style isn't appropriate. Aside from making your code harder to read and more difficult to debug, let-free style simply doesn't work in situations where your FLWOR would have an order by clause. And after all, streaming won't work for that case anyway. The evaluator can't sort the result set without buffering it first.

Read and Post Comments

Conditional Profiling for MarkLogic

December 14, 2011 at 03:16 PM | categories: XQuery, MarkLogic | View Comments

Today I pushed cprof to GitHub. This XQuery library helps application developers who need to retrofit existing applications with profiling capabilities. Just replace all your existing calls to xdmp:eval, xdmp:invoke, xdmp:value, xdmp:xslt-eval, and xdmp:xslt-eval with corresponding cprof: calls. Add a little logic around cprof:enable and cprof:report, and you are done.

Read and Post Comments

Before you upgrade to 5.0-1

November 03, 2011 at 08:47 AM | categories: MarkLogic | View Comments

Thinking about upgrading to MarkLogic Server 5.0-1?

As usual, back up everything. I haven't seen any data loss myself, but it is your data so be careful.

If you have made any changes to Docs (port 8000) or App Services (8002), the app-services portion of the upgrade won't happen (but the rest of the server will be fine). If you want to use the new monitoring services, you want that part of the upgrade to happen.

The fix is to revert your changes to ports 8000 and 8002. If you have repurposed either port for cq, you may want to go into cq and export all any *local* sessions before changing anything. Local sessions in cq are tied to local browser storage, which is tied to host and port, so you will lose access to them if you change the cq port. Not many folks seem to use cq's local sessions, but I thought I'd mention it. Whether you use cq on those ports or not, make sure port 8000 has root Docs/ and 8002 has root Apps/ or Apps/appbuilder/ - you can see these checks in Admin/lib/upgrade.xqy, function check-prereqs-50.

If upgrade.xqy decides not to upgrade your App Services configuration, it will log a message "Skipping appservices upgrades, prerequisites not met." at level "error". The rest of the server will function correctly, but you won't get the appservices part of 5.0.

Read and Post Comments

Rebalancing for CoRB

November 01, 2011 at 08:50 PM | categories: XQuery, MarkLogic | View Comments

I've written some quick scripts for rebalancing forests in a MarkLogic Server database. This leverages CoRB, and makes the job fairly simple. So if you add more forests to a database, and don't have the luxury of clearing and reloading, I hope this code will help.

Read and Post Comments

MarkLogic 5.0 - First Look

November 01, 2011 at 12:24 PM | categories: XQuery, MarkLogic | View Comments

In case you have missed the news, MarkLogic Server 5.0-1 is now available. The upgrade went smoothly for me, but this is a major release so it is wise to back up your databases and configuration before upgrading. The on-disk forest version appears to have changed, which will trigger reindexing of all forests. You may want to manually disable reindexing before upgrading, so that you don't have to contend with multiple forests trying to reindex at the same time.

This is also a good time to double-check your free disk space, since reindexing uses extra disk space. Some of that space won't be released when reindexing finishes, either. For example, one of my forests looked like this:

This forest is holding on to over 2-GiB of deleted fragments.

You can purge those deleted fragments by forcing a merge of the forest, or of the entire database. After doing this, my forest used less disk space.

After the forced merge, the deleted fragments are gone and the forest is smaller.

This new release is stricter about unquoted attributes. With previous releases this would generally work, even though the XQuery 1.0 Recommendation requires quoted attribute values:

<test a={xdmp:random()}/>

Now it throws an XDMP-UNEXPECTED error. Quote the attribute value correctly, and the problem is fixed.

<test a="{xdmp:random()}"/>

I'm looking forward to learning more about the 5.0 release, but so far it looks good.

Read and Post Comments