Introduction to Multi-Statement Transactions
June 21, 2013 at 12:34 PM | categories: XQuery, MarkLogic | View CommentsIf you are an old hand with MarkLogic, you are used to writing
update queries with implicit commits. Sometimes this
means restructuring your code so that everything can happen in one commit,
with no conflicting updates. In extreme cases you might
even decide to run multiple transactions from one query,
using xdmp:invoke
or semicolons.
Historically this meant giving up atomicity.
Multi-statement transactions, introduced in MarkLogic 6, promise a third way. We can write a transaction that spans multiple statements, with an explicit commit or rollback.
For most updates it's probably best to stick with the old ways and use implicit commits. But let's look at a concrete example of a time when multi-statement transactions are the right tool for the job.
Suppose you are using DLS (Document Library Services) to manage your document versioning. But you have a special case where you want to insert two discrete versions of the same document atomically. That may sound odd, but I ran into that exact problem recently.
First we need to discover that there is a problem. Let's bootstrap the a test document with DLS.
import module namespace dls="http://marklogic.com/xdmp/dls"
at "/MarkLogic/dls.xqy";
try {
dls:document-delete('test', false(), false()) }
catch ($ex) {
if ($ex/error:code ne 'DLS-UNMANAGED') then xdmp:rethrow()
else if (empty(doc('test'))) then ()
else xdmp:document-delete('test') }
;
import module namespace dls="http://marklogic.com/xdmp/dls"
at "/MarkLogic/dls.xqy";
dls:document-insert-and-manage('test', false(), <x id="x1"/>)
Now let's write some XQuery to insert two versions in one update, and see what happens.
import module namespace dls="http://marklogic.com/xdmp/dls"
at "/MarkLogic/dls.xqy";
dls:document-checkout-update-checkin(
'test', <x id="x2"/>, "version two", true()),
dls:document-checkout-update-checkin(
'test', <x id="x3"/>, "version three", true())
This throws an XDMP-CONFLICTINGUPDATES
error, because these calls to DLS
end up trying to update the same nodes twice in the same transaction.
In implicit commit mode, aka "auto" mode, this is difficult to avoid.
We could ask MarkLogic to extend DLS with a new function
designed for this situation. But that is a long-term solution,
and we need to move on with this implementation.
So what can we do? We might read up on xdmp:invoke
, xdmp:eval
, etc.
If we are careful, we can write a top-level read-only query
that invokes one or more update transactions.
(: Entry point - must be a read-only query. :)
xdmp:invoke(
'update.xqy',
(xs:QName('URI'), 'test',
xs:QName('NEW'), <x id="x2"/>,
xs:QName('NOTE'), "version two")),
xdmp:invoke(
'update.xqy',
(xs:QName('URI'), 'test',
xs:QName('NEW'), <x id="x3"/>,
xs:QName('NOTE'), "version three"))
This invokes a module called update.xqy
, which would look like this:
(: update.xqy :)
import module namespace dls="http://marklogic.com/xdmp/dls"
at "/MarkLogic/dls.xqy";
declare variable $NEW as node() external ;
declare variable $NOTE as xs:string external ;
declare variable $URI as xs:string external ;
dls:document-checkout-update-checkin(
$URI, $NEW, $NOTE, true())
This works - at least, it doesn't throw XDMP-CONFLICTINGUPDATES
.
But we have lost atomicity. Each of the two updates runs
as a different transaction. This opens up a potential race
condition, where a second query updates the document
in between our two transactions. That could break our application.
There are ways around this, but they get complicated quickly. They are also difficult to test, so we can never be confident that we have plugged all the potential holes in our process. It would be much more convenient if we could run multiple statements inside one transaction, with each statement able to see the database state of the previous statements.
We can do exactly that using a multi-statement transaction. Let's get our feet wet by looking at a very simple MST.
declare option xdmp:transaction-mode "update";
xdmp:document-insert('temp', <one/>)
;
xdmp:document-insert('temp', <two/>),
xdmp:commit()
There are three important points to this query.
1. The option xdmp:transaction-mode="update"
begins a multi-statment transaction.
1. The semicolon after the first xdmp:document-insert
ends that statement and begins another.
1. The xdmp:commit
ends the multi-statement transaction
by commiting all updates to the database.
This runs without error, and we can verify that doc('temp')
contains <two/>
after it runs.
But how can we prove that all this takes place in a single transaction?
Let's decorate the query with a few more function calls.
declare option xdmp:transaction-mode "update";
xdmp:get-transaction-mode(),
xdmp:transaction(),
doc('temp')/*,
xdmp:document-insert('temp', <one/>)
;
xdmp:get-transaction-mode(),
xdmp:transaction(),
doc('temp')/*,
xdmp:document-insert('temp', <two/>),
xdmp:commit()
This time we return some extra information within each statement: the transaction mode, the transaction id, and the contents of the test doc. The transaction ids will be different every time, but here is one example.
update
17378667561611037626
<two/>
update
17378667561611037626
<one/>
So the document test
started out with the old node <two/>
,
but after the first statement it changed to <one/>
.
Both statements see the same transaction mode and id.
Try changing the xdmp:transaction-mode
declaration to auto
, the default.
You should see the mode change to auto
, and two different transaction-ids.
This tells us that in update
mode we have a multi-statement transaction,
and in auto
mode we have a non-atomic sequence of two different transactions.
Before MarkLogic 6, all update statements ran in auto
mode.
Now let's apply what we've learned about MST to the original problem: inserting two different versions of a managed document in a single transaction.
import module namespace dls="http://marklogic.com/xdmp/dls"
at "/MarkLogic/dls.xqy";
declare option xdmp:transaction-mode "update";
dls:document-checkout-update-checkin(
'test', <x id="x2"/>, "version two", true())
;
import module namespace dls="http://marklogic.com/xdmp/dls"
at "/MarkLogic/dls.xqy";
dls:document-checkout-update-checkin(
'test', <x id="x3"/>, "version three", true()),
xdmp:commit()
As above, this code uses three important features:
1. Set xdmp:transaction-mode="update"
to begin the MST.
1. Use semicolons to end one statement and begin another.
1. Use xdmp:commit
to end the MST and commit all updates.
To abort a multi-statement transaction, use xdmp:rollback
.
So now you have a new tool for situations where implicit commit
is a little too awkward. Try not to overdo it, though.
In most situations, the default xdmp:transaction-mode="auto"
is still the best path.
External Variables (Code Review, Part II)
September 28, 2012 at 12:34 PM | categories: XQuery, MarkLogic | View CommentsRemember when I talked about XQuery Code Review?
The other day I was forwarding that link to a client,
and noticed that I forgot to mention external variables.
I talked about xdmp:eval
and xdmp:value
in the section titled Look_for_injection_paths,
and mentioned that it's usually better to use xdmp:invoke
or xdmp:unpath
,
which are less vulnerable to injection attacks.
But it can be convenient or even necessary to evaluate dynamic XQuery.
That's what xdmp:eval
and xdmp:value
are there for, after all.
I've even written tools like Presta
to help you.
Used properly, dynamic queries can be made safe.
The trick is to never let user data directly into your dynamic queries.
Whenever you see xdmp:eval
or xdmp:value
in XQuery,
ask yourself "Where did this query comes from?"
If any part of it came from user input, flag it for a rewrite.
(: WRONG - This code is vulnerable to an injection attack! :)
xdmp:eval(
concat('doc("', xdmp:get-request-field('uri'), '")'))
Actually there are at least two bugs in this code.
There is a functional problem: what happens if the uri
request field
is fubar-"baz"
? You might not expect a uri to include a quote,
and maybe that will never legitimately happen in your application.
But if that request-field does arrive, xdmp:value
will throw an error:
XDMP-UNEXPECTED: (err:XPST0003) Unexpected token syntax error
That's because you haven't properly escaped the uri in the dynamic XQuery.
And you could escape it. You could even write a function to do that for you.
But if you miss any of the various characters that need escaping,
XDMP-UNEXPECTED
will be there, waiting for you.
So far we've only talked about innocent mistakes. But what if someone out there
is actively hostile? Let's say it's me. If I know that your web service
expects a uri
request-field, I might guess that your code looks something like
the code above, and try an injection attack.
After a little trial and error, I might find that sending
uri=x"),cts:uris(),("
returns a list of all the documents in your database,
whether you want me to see them or not. Then I can send something like
uri=x"),xdmp:document-delete("fubar
. If that document exists,
and security isn't tight... it's gone. Or maybe I will decide to try
xdmp:forest-clear
instead.
In SQL we use bind variables to solve both of these problems. Any user input binds to a variable inside the SQL, and the database driver takes care of escaping for us. We no longer have to worry about obscure syntax errors or injection attacks, as long as we remember to use variable for all externally-supplied parameters. In XQuery these are known as external variables.
(: Always use external variables for user-supplied data. :)
xdmp:eval(
'declare variable $URI as xs:string external ;
doc($URI)',
(xs:QName('URI'), xdmp:get-request-field('uri')))
The syntax is a little odd: that second parameter is a sequence of alternating QName and value. Because XQuery doesn't support nested sequences, this means you can't naively bind a sequence to a value. Instead you can pass in XML or a map, or use a convention like comma-separated values (CSV).
(: Using XML to bind a sequence to an external variable. :)
xdmp:eval(
'declare variable $URI-LIST as element(uri-list) external ;
doc($URI-LIST/uri)',
(xs:QName('URI-LIST'),
element uri-list {
for $uri in xdmp:get-request-field('uri')
return element uri { $uri } }))
Even though these examples all use pure XQuery, this code review principle
also applies to XCC code. If you see a Java or .NET program using AdHocQuery
,
check to make sure that all user input binds to variables.
Remember, the best time to fix a potential security problem is before the code goes live.
AlbumMixer v1.13
June 13, 2012 at 01:26 PM | categories: iOS | View CommentsAlbumMixer v1.13 fixes a minor bug where the player display would look odd when tracks were missing some metadata.
If you see any problems with this release,
please use Settings > Report a Problem
from within the app.
I will also read comments posted here.
rsyslog and MarkLogic
May 17, 2012 at 06:00 PM | categories: MarkLogic, Linux | View CommentsYou probably know that MarkLogic Server logs important events
to the ErrorLog.txt
file. By default it logs events at INFO
or higher,
but many development and staging environments change the file-log-level
to DEBUG
. These log levels are also available to the xdmp:log
function,
and some of your XQuery code might use that for printf
-style debugging.
You might even know that MarkLogic also sends important events
to the operating system. On linux this means syslog
, and important events
are those at NOTICE
and higher by default.
But are you monitoring these events?
How can you set up your MarkLogic deployment so that it will automatically alert you to errors, warnings, or other important events?
Most linux deployments now use rsyslog
as their system logging facility.
The full documentation is available,
but this brief tutorial will show you how to set up email alerts for MarkLogic
using rsyslog
version 4.2.6.
All configuration happens in /etc/rsyslog.conf
.
Here is a sample of what we need for email alerts.
First, at the top of the file you should see several ModLoad
declarations.
Check for ommail
and add it if needed.
$ModLoad ommail.so # email support
Next, add a stanza for MarkLogic somewhere after the ModLoad
declaration.
# MarkLogic
$template MarkLogicSubject,"Problem with MarkLogic on %hostname%"
$template MarkLogicBody,"rsyslog message from MarkLogic:\r\n[%timestamp%] %app-name% %pri-text%:%msg%"
$ActionMailSMTPServer 127.0.0.1
$ActionMailFrom your-address@your-domain
$ActionMailTo your-address@your-domain
$ActionMailSubject MarkLogicSubject
#$ActionExecOnlyOnceEveryInterval 3600
daemon.notice :ommail:;MarkLogicBody
Be sure to replace both instances of your-address@your-domain
with an appropriate value. The ActionMailSMTPServer must be smart enough
to deliver email to that address. I used a default sendmail
configuration
on the local host, but you might choose to connect to a different host.
Note that I have commented out the ActionExecOnlyOnceEveryInterval
option.
The author of rsyslog
, Rainer Gerhards,
recommends setting this value to a reasonably high number of seconds
so that your email inbox is not flooded with messages.
However, the rsyslog
documentation states that excess messages
are discarded, and I did not want to lose any important messages.
What I would really like to do is buffer messages for N seconds at a time,
and merge them together in one email.
But while rsyslog
has many features, and does offer buffering,
it does not seem to know how to combine consecutive messages
into a single email.
Getting back to what rsyslog
can do,
you can customize the subject and body of the mail message.
With the configuration above, a restart of the server
might send you an email like this one:
Subject: Problem with MarkLogic on myhostname.mydomain
rsyslog message from MarkLogic:
[May 17 23:58:36] MarkLogic daemon.notice<29>: Starting MarkLogic Server 5.0-3 i686 in /opt/MarkLogic with data in /var/opt/MarkLogic
When making any rsyslog
changes, be sure to restart the service:
sudo service rsyslog restart
At the same time, check your system log for any errors or typos.
This is usually /var/log/messages
or /var/log/syslog
.
The full documentation for template substitution properties
is online.
You can also read about a wealth of other options available in rsyslog
.
AlbumMixer v1.12
May 16, 2012 at 10:40 AM | categories: iOS | View CommentsAlbumMixer v1.12 fixes a minor bug where the player state would be wrong when returning from background mode.
If you see any problems with this release,
please use Settings > Report a Problem
from within the app.
I will also read comments posted here.