AlbumMixer v1.11

March 30, 2012 at 12:34 PM | categories: iOS

AlbumMixer v1.11 fixes some minor bugs, and should be better behaved with iTunes Match. If you see any problems with this release, please use Settings > Report a Problem from within the app. I will also read comments posted here.

Directory Assistance

March 19, 2012 at 12:34 PM | categories: MarkLogic

For a long time now, MarkLogic Server has implemented two distinct features that are both called "directories". This causes confusion, especially since one of these features scales well and the other often causes scalability problems. Let's try to distinguish between these two features, and talk about why they both exist.

Directories were first introduced to accommodate WebDAV. Since WebDAV clients treat the database as if it were a filesystem, they expect document URIs with the solidus, or /, to imply directory structure. That's one feature called "directories": if you insert a document with the URI /a/b/c.xml, you can call xdmp:directory('/a/b/', '1') to select that document - and any other document with the same URI prefix. These URI prefixes are indexed in much the same way that document URIs and collection URIs are indexed, so queries are "searchable" and scale well.

This "implied directory structure" works with any database configuration. You do not need directory-creation=automatic to use the cts:directory-query and xdmp:directory functions.

Wrapping such a query in xdmp:plan returns a query plan in XML.
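As a minimal sketch, assuming a few documents already exist under the illustrative URI prefix /a/b/, the two directory functions and a query plan request might look like this. Wrapping the cts:search in xdmp:plan is my choice for the illustration, not necessarily the form used originally.

```xquery
xquery version "1.0-ml";

(: Select documents directly under /a/b/ by URI prefix.
   The depth '1' means immediate children only. :)
xdmp:directory('/a/b/', '1'),

(: The same selection expressed as a cts:query, wrapped in xdmp:plan
   so the server returns the query plan as XML. :)
xdmp:plan(
  cts:search(fn:doc(), cts:directory-query('/a/b/', '1')))
```

Neither call depends on directory fragments, so this should behave the same whether directory creation is automatic or manual.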

But WebDAV clients expect more than just directory listings. They also want to lock documents and directories. It is easy to understand document locking: the idea here is that a WebDAV-aware editor might lock a document, copy it to the local filesystem for editing, and copy it back to the server when the editing session ends. It may be less clear that a WebDAV client sometimes needs to lock directories, but it does.

Directory locking is implemented using special directory fragments. These are properties fragments with no associated documents, so they are sometimes called "naked properties". One way to create a directory fragment explicitly is xdmp:directory-create.
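A minimal sketch of creating a directory fragment by hand, assuming the illustrative URI /a/b/ and that no directory fragment exists there yet:

```xquery
xquery version "1.0-ml";

(: Create a directory fragment for the assumed URI /a/b/.
   This is an update, so it takes effect when the request commits.
   Expect an error if the directory already exists. :)
xdmp:directory-create('/a/b/')
```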

Once this update has committed to the database, we can query the directory fragment.
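As a sketch of that follow-up query, again assuming the illustrative URI /a/b/, one option is to read the properties fragment at the directory URI in a separate request:

```xquery
xquery version "1.0-ml";

(: Return the properties fragment stored at the directory URI.
   For a directory this should be a prop:properties element that
   marks the URI as a directory. :)
xdmp:document-properties('/a/b/')
```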

Once you have a directory fragment, you have something that the database can lock for WebDAV clients. It's rare for anything else to use this behavior, but xdmp:lock-acquire is available for custom content management applications.
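For a custom application, a lock request might be sketched as follows. The URI, owner string, and timeout are assumptions chosen for illustration.

```xquery
xquery version "1.0-ml";

(: Take an exclusive lock on the directory and everything beneath it.
   Arguments: URI, scope, depth, owner description, timeout in seconds. :)
xdmp:lock-acquire(
  '/a/b/',
  'exclusive',
  'infinity',
  'illustrative owner string',
  xs:unsignedLong(120))
```

A later request can call xdmp:lock-release('/a/b/') to give the lock back before the timeout expires.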

Earlier I mentioned that there are two kinds of "directories", one that scales well and one that sometimes causes problems. I wrote that queries based on directory URIs scale well, so you might guess that directory fragments sometimes cause problems. That's correct, and it results from a database feature called "automatic directory creation".

When automatic directory creation is enabled - as it is by default - the database will ensure that directory fragments exist for every implied directory in the URI for every new or updated document. The document URI /a/b/c.xml implies a directory fragment for /, /a/, and /a/b/. So the database will ensure that these exist whenever a request updates /a/b/c.xml.

So what happens when one request updates /a/b/c.xml and another request updates /a/b/d.xml?

Both requests try to ensure that there are directory fragments for /, /a/, and /a/b/. This causes lock contention. The same problem shows up if another request is updating /fubar.xml, because both requests look for the / directory fragment. The situation gets worse as concurrency increases. It gets even worse if "maintain directory last-modified" is enabled, because the directory fragments have to be updated too. But happily that feature is not enabled by default.
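To make the overlap concrete, here is a small sketch that lists the directories implied by a document URI. The function local:implied-directories is hypothetical, written only for this illustration. It does not touch the database.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: return every prefix of $uri that ends in a solidus. :)
declare function local:implied-directories($uri as xs:string) as xs:string*
{
  for $i in 1 to fn:string-length($uri)
  where fn:substring($uri, $i, 1) eq '/'
  return fn:substring($uri, 1, $i)
};

local:implied-directories('/a/b/c.xml')
```

For /a/b/c.xml this yields /, /a/, and /a/b/. Calling it with /a/b/d.xml yields the same three URIs, and /fubar.xml yields just /, which is why otherwise unrelated updates can still contend for the root directory fragment.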

The solution to this problem is simple. In my experience at least 80% of MarkLogic Server customers do not use WebDAV, so they do not need automatic directory creation. Instead, they can set directory creation to "manual". Do this whenever you create a new database, or script it using admin:database-set-directory-creation.
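A scripted version might look like the following sketch. The database name my-database is a placeholder, and the request needs to run as a user with admin privileges.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Load the current configuration, switch directory creation to manual
   for the placeholder database, and save the result. :)
admin:save-configuration(
  admin:database-set-directory-creation(
    admin:get-configuration(),
    xdmp:database('my-database'),
    'manual'))
```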

admin UI screen shot

If you do use WebDAV, try to limit its scope. Perhaps you can get by with a limited number of predefined WebDAV directories, which you create manually using xdmp:directory-create as part of your application deployment. Or perhaps you only use WebDAV for your XQuery modules database, which contains only a few hundred or at most a few thousand documents. In that case you can use automatic directory creation without a problem.

Generally speaking, really large databases don't use WebDAV anyway. "Big content" databases, with hundreds of millions or billions of documents, tend to be much too large for WebDAV to be useful. For smaller databases where WebDAV is useful, automatic directory creation is fine.

Sometimes it is useful to set "directory-creation" to "manual-enforced". With this configuration you will see an XDMP-PARENTDIR error whenever your code tries to insert a document with an implied directory structure that does not have corresponding directory fragments. But this feature is rarely used.

To sum up, directory URIs are highly scalable and very useful, and are always indexed. Your code can call xdmp:directory with any database settings. The default "automatic directory creation" feature creates directory fragments, which can be a bottleneck for large databases. Most applications are better off with "directory-creation" set to "manual".

Let-free Style and Streaming

March 19, 2012 at 12:34 PM | categories: XQuery, MarkLogic

If you are familiar with Lisp or Scheme, you know that a function call can replace a variable binding, and function calls can also replace most loops. This is also true in XQuery.

In XQuery this leads to a style of coding that I call "let-free". In this style, there are no FLWOR expressions. Really this is "FLWOR-free", not "let-free", but that's too much of a mouthful for me.

But why would you write let-free code?

The answer is scalability - you knew it would be, right? This breaks out into concurrency and streaming. Let's talk about concurrency first. In the MarkLogic Server implementation of XQuery, every let is evaluated in sequence. However, other expressions are evaluated lazily with concurrency-friendly "future values". So a performance-critical single-threaded request can sometimes benefit from let-free style. You can see this technique in use in some of my code: the semantic library or the task-server forest rebalancer. Both of these projects try to benefit from multi-core CPUs.

The let-free style can also help with query scalability by allowing the results to stream, rather than buffering the entire result sequence. If you need to export large result sets, for example, this technique can help avoid XDMP-EXPNTREECACHEFULL errors. Those errors result when your query's working set is too large to fit in the expanded tree cache, a sort of scratch space for XML trees. But streaming results don't have to fit into the cache.

For example, let's suppose you need to list every document URI in the database. But you do not have the URI lexicon enabled, and you cannot reindex to create it.
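One let-free way to do this can be sketched with a single-argument function called on the whole sequence of documents. The function name local:uri is hypothetical, and this is a reconstruction of the idea rather than the exact original listing.

```xquery
xquery version "1.0-ml";

(: Takes exactly one document node and returns its URI. :)
declare function local:uri($doc as document-node()) as xs:string?
{
  xdmp:node-uri($doc)
};

(: fn:doc() with no arguments selects every document in the database.
   Passing that sequence to a single-argument function relies on
   function mapping, which is on by default in 1.0-ml. :)
local:uri(fn:doc())
```

There is no FLWOR here, so the evaluator is free to emit each URI as it goes instead of binding the whole sequence to a variable first.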

Note that nested evaluations cannot stream. Development tools such as cq evaluate your query inside their own request, so even a let-free query may throw XDMP-EXPNTREECACHEFULL there. To test a streaming query, run it as an HTTP module instead. This is ideal for web service implementations too.

In this example we used function mapping, a MarkLogic extension to XQuery 1.0. If a function takes a single argument but is called using a sequence, the evaluator simply maps the sequence to multiple function calls. This is somewhat faster than a FLWOR, and it can stream.

Besides using function mapping, let-free style can use XPath steps. However, this technique only works for sequences of nodes.
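A sketch of the XPath-step form of the same URI listing, assuming nothing beyond the built-in functions:

```xquery
xquery version "1.0-ml";

(: The final step calls a function once per context node,
   so no FLWOR and no function mapping are needed. :)
fn:doc()/xdmp:node-uri(.)
```

The left-hand side of the step has to be a sequence of nodes, which is why this form is not available for sequences of atomic values.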

While these techniques are useful, they can make for code that is hard to read and tricky to debug. Function mapping is especially prone to errors that are difficult to diagnose. If a function signature specifies an argument without a quantifier or with the + quantifier, and the runtime argument is empty, the function will not be called at all. This is surprising, since normally the function would be called and would cause a strong typing error.
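The surprising case can be sketched like this, using the same hypothetical local:uri function. The commented-out option shows one way to compare against ordinary strong typing.

```xquery
xquery version "1.0-ml";

(: Uncomment the next line to disable function mapping for this module.
   The call at the bottom should then raise XDMP-AS instead. :)
(: declare option xdmp:mapping "false"; :)

declare function local:uri($doc as document-node()) as xs:string?
{
  xdmp:node-uri($doc)
};

(: With function mapping on, an empty argument maps to zero calls,
   so this returns the empty sequence and the body never runs. :)
local:uri(())
```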

With function mapping enabled, a call with an empty argument returns the empty sequence and the function body never runs. With function mapping disabled, the same call throws the expected strong typing error XDMP-AS. This behavior is annoying, but in some applications the benefits of function mapping outweigh this drawback. We can make debugging easier if we weaken the function signature to document-node()? so that the function will be called even when the argument is empty. If needed, we can include an explicit check for empty input too.
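A sketch of the weakened signature with an explicit check follows. The function name and the error message are illustrative choices, not part of any library.

```xquery
xquery version "1.0-ml";

(: The ? quantifier admits the empty sequence, so the function is
   called even when the argument is empty. :)
declare function local:uri($doc as document-node()?) as xs:string?
{
  if (fn:exists($doc)) then xdmp:node-uri($doc)
  else fn:error((), 'local:uri expected a document node but got empty input')
};

(: This now fails loudly instead of silently returning nothing. :)
local:uri(())
```

Whether to raise an error or quietly return the empty sequence depends on the application. The point is that the decision is now visible in the code.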

Another let-free trick is to use module variables. These act much like let bindings, but they can stream.
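As a sketch, the URI listing could bind the document sequence to a module variable instead of a let. The variable name $DOCS and the function local:uri are hypothetical.

```xquery
xquery version "1.0-ml";

(: A module variable takes the place of a let binding. :)
declare variable $DOCS as document-node()* := fn:doc();

declare function local:uri($doc as document-node()) as xs:string?
{
  xdmp:node-uri($doc)
};

(: Function mapping applies local:uri to each document in $DOCS. :)
local:uri($DOCS)
```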

This example is a bit contrived, since the module variable doesn't add anything. But if you find yourself struggling to refactor a let as a function call or an XPath step, consider using a module variable. Module variables are also excellent tools for avoiding repeated work, since the right-hand expression is evaluated lazily and is never evaluated more than once. If the evaluation does not use the module variable, then the right-hand expression is never evaluated. In contrast, the right-hand expression of a let is evaluated even when the return does not use its value.

As always, do not optimize code unless there is a problem to solve. There are also some situations where the let-free style isn't appropriate. Aside from making your code harder to read and more difficult to debug, let-free style simply doesn't work in situations where your FLWOR would have an order by clause. And after all, streaming won't work for that case anyway. The evaluator can't sort the result set without buffering it first.
