home
about
services
contact

Where am I?

MacBook Air 2010 (13.3/1.8/4/256)

December 07, 2010 at 04:22 PM | categories: OS X | View Comments

Mine arrived around 22 November. It seems to be a very well-made laptop, and I'm enjoying its low mass and long battery life. I'm still getting used to the keyboard: I've remapped capslock to control, but occasionally I'll still try to use the fn key as a control key.

This marks my first attempt in about a decade to use OS X as a full-time environment. I've have a Mac of some sort since 1988, but for the past few years I rarely used the latest one (an iMac). I used it for a little audio work, and for the past year for iOS development. For productivity tasks it was usually easier to use my work laptop instead, which ran debian linux at Oracle and Kubuntu or Ubuntu linux at MarkLogic.

One reason for all this was keyboard accessibility. Apple clearly doesn't make this a priority. Even with Universal Access enabled, I'm constantly finding things that only work with the mouse or trackpad. ClipMenu and SizeUp help, but I'm always looking for more efficient ways to work.

Read and Post Comments
 

DSH - distributed shell for cluster administration

December 07, 2010 at 04:10 PM | categories: Linux | View Comments

I noticed this in the github pull requests for homebrew, and thought it looked interesting. We've all written a script that sends a command-line out to N hosts, right? Dancer's Shell looks like a nice way to automate that. It lets you place hosts in groups, and send commands to a group of hosts. The next time I need to work with a cluster, I'm planning to delve into this and see if it helps.

Read and Post Comments
 

Web-based user interface for searching metadata-driven relational databases

December 06, 2010 at 10:17 PM | categories: Uncategorized | View Comments

With some embarrassment, I concede that my name is now on a software patent assigned to Oracle. Apparently it was granted at the end of last month.

Web-based user interface for searching metadata-driven relational databases

It wouldn't surprise me if there is prior art for this - it's basically a SQL query builder driven by Javascript and an in-page data dictionary. Certainly we had all seen query builders before. But when we built the application, back in 2003, it seemed innovative to feed the query schema into a web page, with JavaScript that then presented the options to the user.

Read and Post Comments
 

XQuery Code Review

December 05, 2010 at 01:57 PM | categories: XQuery, MarkLogic | View Comments

Code review is a crucial part of good development methodology. It can be a pre-checkin activity, or an ongoing background task in a pair-programming environment. What should you look for when reviewing XQuery code?

I'm sure I've forgotten a few things, but hopefully this will get you started.

Use strong typing

You can catch many potential bugs this way. If a function's parameters and return value aren't strongly typed, there should be a comment explaining why that isn't practical. Try to narrow down the types, too: be as specific as possible. If a parameter or return type is node(), ask why it can't be element()? If it's element(), why can't it be element(foo) or element(*, typename) instead? Don't forget about quantifiers and schema-element().

Besides function variables and return types, this also applies to module variables, FLWOR variable bindings, and typeswitch variable bindings.

Look for injection paths

The xdmp:eval and xdmp:value functions are especially vulnerable, and should be avoided as much as possible in favor of xdmp:invoke and xdmp:unpath, respectively. But it's important to check any user input. For example, is the user entitled to see the contents of the document requested via xdmp:get-request-field? It's best if the document is protected by a good security configuration with read permissions, but if security checks are being done in application code, be especially careful to guard against injection attacks.

Are there order-by clauses or expressions with gt or lt operators?

If so, could they be accelerated using range indexes? If range indexes do exist, are they working? Sometimes the FLWOR syntax will be subtly outside the boundaries of what MarkLogic Server can optimize, and sometimes the query collation won't match a string range index collation.

You can check using xdmp:query-trace or xdmp:plan. Often this makes a good unit test, especially for performance-critical functionality.

Does the code rely on effective boolean values?

If you see 'if ($foo)then...', ask what the type of $foo is. If it isn't already a boolean, then the effective boolean value rules come into play. That could cause mysterious bugs or errors. Would it be better to write 'if(exists($foo))...' or perhaps 'if ($foo ne 0)...' instead? The core idea here is to write what you mean, and mean what you write.

This idea also applies to XPath predicates, where-clauses, and any other boolean context.

Review any XPath with '/text()' or '//text()' steps

When I see a query is full of text() steps, my first thought is that the developer may not really understand atomization. Believe it or not, there's often some performance to be gained by eliminating these extra steps in favor of atomization. But more importantly, text() can be dangerous. It doesn't mean "give me the lexical string value of the context nodes". It does mean "give me the text children, if any, of each of the context nodes". So it can return nothing, or one text node, or one million text nodes. These two code examples look similar, but behave in very different ways. As a result, I try to avoid text() as much as possible. When it is needed, I leave comments describing why string() won't work, and what will happen in cases such as empty sequence or multiple nodes.

let $n := <p>nodes <i>can</i> have multiple text children</p>
return concat('testing... ', $n)
=> testing... nodes can have multiple text children
let $n := <p>nodes <i>can</i> have multiple text children</p>
return concat('testing... ', $n/text())
=> XDMP-ARGTYPE: (err:XPTY0004) concat("testing... ", (text{"nodes "},
text{" have multiple text children"})) -- arg2 is not of type xs:anyAtomicType?

Review any XPath with '//' steps

Sometimes these are necessary, and sometimes they even benefit performance. But more often they are a sign of short-sighted laziness, or outright ignorance of the content. Rewriting these expressions as fully-qualified paths can improve code resiliency and also performance.

If the business logic really calls for descendant-or-self::node(), comment the code to explain why.

Review any function calls within XPath predicates

Function calls inside an XPath predicate can be horrible for performance, since the function must be called for every item in the predicate's input sequence. If the result of the function call is static, simply bind the result to a variable. This is also true of operators: even simple math operations like:

$list[$start to $start +$size]
should be rewritten as
$list[$start to $stop]

If you have trouble seeing why this might be a problem, consider a list with 100 items. Now consider this expression:

$list[ xdmp:sleep(100) ]

Evaluation will cost 100-ms per item, or 10 seconds total. Every expression takes a finite amount of time to evaluate, and performance optimization is sometimes a matter of reducing the expression count. Also, most function calls will defeat any index lookups that might otherwise take place.

Review any XPath sub-expressions within XPath predicates

Here I don't mean [ foo ] but rather [ foo/bar ] or [ foo[bar] ]. These sub-expressions can be bad for performance. If the result of the sub-expression is static, bind it to a variable.

Watch for nested if-then-else-if-then-else... sub-expressions

Can it be rewritten as a typeswitch? If so, this is often easier to read, and faster.

Look for opportunities to eliminated unnecessary expressions

For example, I frequently see code resembling this:

...
return if ($flag) then $result else ()

It would be clearer and more XQuery-like to write this as:

...
where $flag
return $result

Another common pattern is:

if ($foo/bar) then $foo/bar else ()

This can be re-written without changing the results, as:

$foo/bar

The trick is to understand that if there are no results for an XPath expression, then the result is (). A similar pattern is:

if ($foo eq $bar) then true() else false()

This can be written as:

($foo eq $bar)

The parenthesis aren't strictly necessary, but make it easier to read expressions like this:

let $baz := ($foo eq $bar)
...

Watch out for function mapping

Function mapping is a powerful friend and a terrible enemy. If the code is a 0.9-to-1.0 port, I recommend disabling function mapping in every module. If it's new 1.0 code, think about how function mapping might help or cause problems in different situations.

Disabling function mapping is easy. In the prologue, simply add:

declare option xdmp:mapping "false";

Your turn

As Mary might ask: comments, questions, or denunciations? Any tips of your own that you would like to share? Let's hear them.

Read and Post Comments
 

My Name Here

November 29, 2010 at 10:38 AM | categories: home, MarkLogic | View Comments

As of 19 November I am no longer employed by MarkLogic. This was an amicable parting of ways: I know it's a cliche, but I really am "leaving to pursue other interests". I continue to believe in MarkLogic's technology and business model, but it is time for me to return to my own consulting business. I expect to work on some projects involving MarkLogic Server and related technologies. I also intend to branch out into other areas.

Thanks to all at MarkLogic, and to the customers and partners that I worked with there. If you want to keep in touch, I'm on LinkedIn.

Read and Post Comments