Directory Assistance
March 19, 2012 at 12:34 PM | categories: MarkLogic
For a long time now, MarkLogic Server has implemented two distinct features that are both called "directories". This causes confusion, especially since one of these features scales well and the other often causes scalability problems. Let's try to distinguish between these two features, and talk about why they both exist.
Directories were first introduced to accommodate WebDAV. Since WebDAV clients treat the database as if it were a filesystem, they expect document URIs with the solidus, or /, to imply directory structure. That's one feature called "directories": if you insert a document with the URI /a/b/c.xml, you can call xdmp:directory('/a/b/', '1') to select that document - and any other document directly under the same URI prefix. A depth of 'infinity' also selects documents further down the implied tree. These URI prefixes are indexed in much the same way that document URIs and collection URIs are indexed, so queries are "searchable" and scale well.
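As a minimal sketch of both forms, assuming a few documents have been inserted under the /a/b/ prefix (the URIs are illustrative only):

```xquery
xquery version "1.0-ml";

(: Documents whose URIs sit directly under /a/b/, such as /a/b/c.xml. :)
xdmp:directory('/a/b/', '1'),

(: The same idea expressed as a cts query, here covering everything
   below /a/b/ at any depth. :)
cts:search(fn:doc(), cts:directory-query('/a/b/', 'infinity'))
```

The cts:directory-query form is handy when the directory constraint has to be combined with other query terms.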
This "implied directory structure" works with any database configuration. You do not need directory-creation=automatic to use the cts:directory-query and xdmp:directory functions.
To see how a directory lookup is resolved, pass the expression to xdmp:plan, which returns a query plan in XML.
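A minimal sketch of such a call, assuming the /a/b/ prefix used above. The plan output varies by server version and database settings, so none is reproduced here:

```xquery
xquery version "1.0-ml";

(: Ask the server how it would resolve a directory lookup.
   The result is an XML element describing the query plan. :)
xdmp:plan(xdmp:directory('/a/b/', '1'))
```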
But WebDAV clients expect more than just directory listings. They also want to lock documents and directories. It is easy to understand document locking: the idea here is that a WebDAV-aware editor might lock a document, copy it to the local filesystem for editing, and copy it back to the server when the editing session ends. It may be less clear that a WebDAV client sometimes needs to lock directories, but it does.
Directory locking is implemented using special directory fragments. These are properties fragments with no documents associated with them, so they are sometimes called "naked properties." Here is an example.
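A minimal sketch of creating a directory fragment explicitly, assuming the /a/b/ URI from earlier and a database where that directory fragment does not exist yet:

```xquery
xquery version "1.0-ml";

(: Create a directory fragment for /a/b/. No document is involved:
   only a properties fragment is written. :)
xdmp:directory-create('/a/b/')
```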
Once this update has committed to the database, we can query the directory fragment.
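One way to do that, sketched with xdmp:document-properties and assuming the directory fragment for /a/b/ was created in an earlier transaction:

```xquery
xquery version "1.0-ml";

(: Fetch the properties fragment stored at the directory URI. :)
xdmp:document-properties('/a/b/')
```

The properties fragment for a directory normally carries a prop:directory element, which is what marks the URI as a directory.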
Once you have a directory fragment, you have something that the database can lock for WebDAV clients. It's rare for anything else to use this behavior, but xdmp:lock-acquire is available for custom content management applications.
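For a custom application, a lock request might look like the following sketch. The owner string and the timeout are illustrative assumptions, and the example assumes a directory fragment already exists at /a/b/:

```xquery
xquery version "1.0-ml";

(: Take an exclusive lock on the directory and everything below it. :)
xdmp:lock-acquire(
  '/a/b/',
  'exclusive',                  (: scope :)
  'infinity',                   (: depth: include the whole subtree :)
  'editing session for /a/b/',  (: owner: free-form description :)
  xs:unsignedLong(300))         (: timeout, in seconds :)
```

A later request can give the lock back with xdmp:lock-release('/a/b/').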
Earlier I mentioned that there are two kinds of "directories", one that scales well and one that sometimes causes problems. I wrote that queries based on directory URIs scale well, so you might guess that directory fragments sometimes cause problems. That's correct, and it results from a database feature called "automatic directory creation".
When automatic directory creation is enabled - as it is by default - the database will ensure that directory fragments exist for every implied directory in the URI for every new or updated document. The document URI /a/b/c.xml implies a directory fragment for /, /a/, and /a/b/. So the database will ensure that these exist whenever a request updates /a/b/c.xml.
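To make the rule concrete, here is a small sketch that lists the directories implied by a document URI. The function local:implied-directories is a hypothetical helper written for this illustration, not a MarkLogic built-in, and it only mimics the naming rule, not the server's internal behavior:

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: list the directory URIs implied by a document URI. :)
declare function local:implied-directories($uri as xs:string) as xs:string*
{
  let $steps := fn:tokenize($uri, '/')
  (: Skip the last step, which is the document name itself. :)
  for $i in 1 to fn:count($steps) - 1
  return fn:concat(fn:string-join($steps[1 to $i], '/'), '/')
};

(: yields '/', '/a/', '/a/b/' :)
local:implied-directories('/a/b/c.xml')
```

Every one of those URIs is a fragment that an update to /a/b/c.xml has to check for under automatic directory creation.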
So what happens when one request updates /a/b/c.xml and another request updates /a/b/d.xml? Both requests try to ensure that there are directory fragments for /, /a/, and /a/b/. This causes lock contention. The same problem shows up if another request is updating /fubar.xml, because both requests look for the / directory fragment.
The situation gets worse as concurrency increases. It gets even worse if "maintain directory last-modified" is enabled, because the directory fragments have to be updated too. But happily that feature is not enabled by default.
The solution to this problem is simple. In my experience at least 80% of MarkLogic Server customers do not use WebDAV, so they do not need automatic directory creation. Instead, they can set directory creation to "manual". Do this whenever you create a new database, or script it using admin:database-set-directory-creation.
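A scripted version might look like this sketch. The database name "Documents" is a placeholder assumption, so substitute your own:

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

(: Load the current configuration, switch the database to manual
   directory creation, and save the result. :)
let $config := admin:get-configuration()
let $config := admin:database-set-directory-creation(
  $config, xdmp:database('Documents'), 'manual')
return admin:save-configuration($config)
```

The request has to run as a user with sufficient admin privileges to change database settings.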
If you do use WebDAV, try to limit its scope. Perhaps you can get by with a limited number of predefined WebDAV directories, which you create manually using xdmp:directory-create as part of your application deployment. Or perhaps you only use WebDAV for your XQuery modules, which amount to a few hundred or at most a few thousand documents. In that case you can use automatic directory creation without a problem.
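For the predefined-directories approach, a deployment step could be sketched as follows. The /webdav/ URIs are hypothetical names chosen for illustration, and the properties check is one assumed way to keep the step safe to re-run:

```xquery
xquery version "1.0-ml";

(: Hypothetical directory layout for a WebDAV-visible area. :)
for $uri in ('/webdav/', '/webdav/drafts/', '/webdav/published/')
(: Skip any URI that already has a directory fragment. :)
where fn:empty(xdmp:document-properties($uri)//prop:directory)
return xdmp:directory-create($uri)
```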
Generally speaking, really large databases don't use WebDAV anyway. "Big content" databases, with hundreds of millions or billions of documents, tend to be much too large for WebDAV to be useful. For smaller databases where WebDAV is useful, automatic directory creation is fine.
Sometimes it is useful to set "directory-creation" to "manual-enforced". With this configuration you will see an XDMP-PARENTDIR error whenever your code tries to insert a document with an implied directory structure that does not have corresponding directory fragments. But this feature is rarely used.
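If you do run with "manual-enforced", application code may want to catch that error. The following is a sketch under assumptions: the URI /x/y/z.xml and the response elements are made up for illustration, and the error code is read from the standard error:code element.

```xquery
xquery version "1.0-ml";

declare namespace error = "http://marklogic.com/xdmp/error";

try {
  (: Fails under manual-enforced if /x/ or /x/y/ has no directory fragment. :)
  xdmp:document-insert('/x/y/z.xml', <doc/>),
  <inserted/>
} catch ($e) {
  if ($e/error:code eq 'XDMP-PARENTDIR')
  then <missing-parent-directory/>
  else xdmp:rethrow()
}
```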
To sum up, directory URIs are highly scalable and very useful, and are always indexed. Your code can call xdmp:directory with any database settings. The default "automatic directory creation" feature creates directory fragments, which can be a bottleneck for large databases. Most applications are better off with "directory-creation" set to "manual".