Archive for the ‘XQuery’ Category

Real XQuery

Sunday, November 19th, 2006

XQuery can have fun applications. No, really! Let’s suppose I want to find a good pub in WC1, London. There’s a local CAMRA web site with useful listings, but it tracks every pub in the area, so several of them don’t serve real ale, are shut, or are otherwise unsuitable. We also want to list the most interesting pubs first, ranked by:

  1. CAMRA awards
  2. number of real ales
  3. historic interiors

First, we have to get the document into MarkLogic Server. Unfortunately the page contains a byte sequence that is not valid UTF-8. Worse, it does not declare which character set it uses. MarkLogic Server has no built-in support for character sets other than UTF-8, but it does expose tidy as a built-in extension function, xdmp:tidy(), and tidy knows a thing or two about other character sets. With that, plus some binary-node and xdmp:quote() magic, we can load the page into MarkLogic Server.

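declare namespace xh = "http://www.w3.org/1999/xhtml";

let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
(: Fetch the page as binary so the server does not reject the bytes as
   invalid UTF-8, then hand the quoted result to xdmp:tidy(), which copes
   with other character sets. xdmp:tidy() returns a status node followed
   by the tidied document; the path step keeps only the XHTML root. :)
let $html :=
  xdmp:tidy(xdmp:quote(xdmp:document-get(
    $uri,
    <options xmlns="xdmp:document-get">
      <format>binary</format>
    </options>
  )))/xh:html
(: Store the XHTML under the same URI as the original page. :)
return xdmp:document-insert($uri, $html)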

That query returns the empty sequence, and puts an XHTML version of the original content into my MarkLogic Server database.
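One quick way to confirm the load is to count the dl listings in the stored document. This sketch assumes the page was stored under the same URI used in the load query. A result of zero suggests that the document is missing or that tidy did not produce the expected XHTML.

```xquery
declare namespace xh = "http://www.w3.org/1999/xhtml";

(: Assumes the tidied page was stored under this URI by the load query. :)
let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
(: Each pub listing is a dl element, so this counts the pubs. :)
return count(doc($uri)//xh:dl)
```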

Now that the listings are in the database, I can generate my own pub guide. The content author made this easy for me by following a very regular structure (maybe the original content was generated from a database or a spreadsheet?). Each pub is a dl element containing dd elements with role-based CSS class names, such as beer, desc, and shut.
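To make that structure concrete, here is a hypothetical listing in the same shape. The class names come from the guide query, but the pubs, beers, and text are invented. The sketch uses standard fn:contains in place of MarkLogic’s cts:contains, so it should run in any XQuery 1.0 processor. Note that fn:contains is a plain substring match, not a word or phrase search, so lower-case() is added to catch "No real ale".

```xquery
xquery version "1.0";
declare default element namespace "http://www.w3.org/1999/xhtml";

(: Hypothetical listings in the shape described above; all content invented. :)
let $listing :=
  <div>
    <dl>
      <dt>The Example Arms</dt>
      <dd class="beer">Example Bitter, Sample Mild; Test Porter</dd>
      <dd class="desc">Grade II listed Victorian interior.</dd>
    </dl>
    <dl>
      <dt>The Closed Tavern</dt>
      <dd class="shut">Closed</dd>
    </dl>
    <dl>
      <dt>The Keg Bar</dt>
      <dd class="beer">No real ale</dd>
    </dl>
  </div>
(: Keep pubs that are not shut and whose beer entry does not say
   "no real ale"; lower-case() makes the substring test case-insensitive. :)
for $pub in $listing/dl
  [ empty(dd[@class eq 'shut']) ]
  [ not(contains(lower-case(string-join(dd[@class eq 'beer'], ' ')), 'no real ale')) ]
return string($pub/dt)
(: returns "The Example Arms" :)
```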

I didn’t bother to fragment the listings, so the searching is fairly “dumb”: the query simply walks every dl in the page. But it works well enough, and the pubs I’m interested in come out at the top.

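declare default element namespace "http://www.w3.org/1999/xhtml";

element html {
  element head { element title { "wc1" } },
  element body {
    (: Each pub is a dl; skip pubs that are shut, have no real ale,
       or are boarded up. Read from the URI the load query used. :)
    for $pub in
      doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')
      /html/body/div/div/dl
      [ empty(dd[@class eq 'shut']) ]
      [ not(cts:contains(dd[ @class eq 'beer' ], 'no real ale')) ]
      [ not(cts:contains(dd[ @class eq 'desc' ], 'boarded up')) ]
    (: Ranking keys: CAMRA award, number of real ales, listed interior. :)
    let $award := cts:contains($pub/dd, 'pub of the year')
    let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
    let $listed := cts:contains($pub/dd[@class eq 'desc'], 'listed')
    (: false sorts before true, so descending puts matches first :)
    order by $award descending, $beers descending, $listed descending
    (: Copy the listing, dropping the tele, link, and updt entries
       (presumably phone, link, and last-updated details). :)
    return element { node-name($pub) } {
      $pub/node()[ not(@class = ('tele', 'link', 'updt')) ]
    }
  }
}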
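One caveat with the $beers count: tokenize() returns a zero-length string after a trailing separator, so a beer list ending in a comma or semicolon counts one beer too many. A slightly more defensive helper trims whitespace around separators and drops empty tokens. It is sketched here as a hypothetical local:beer-count() function in standard XQuery 1.0:

```xquery
xquery version "1.0";

(: Hypothetical helper: counts comma- or semicolon-separated beers,
   ignoring whitespace around separators and any empty tokens. :)
declare function local:beer-count($beers as xs:string?) as xs:integer
{
  count(
    tokenize(normalize-space($beers), '\s*[,;]+\s*')[. ne '']
  )
};

(: The plain tokenize() count includes the empty token after the
   trailing semicolon; the helper does not. :)
count(tokenize('Example Bitter, Sample Mild;', '[,;]+')),  (: 3 :)
local:beer-count('Example Bitter, Sample Mild;'),          (: 2 :)
local:beer-count(())                                       (: 0 :)
```

In the guide query, the function could be declared in the prolog and called as local:beer-count($pub/dd[@class eq 'beer']). Like the original expression, it still expects at most one beer dd per pub.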

Maybe I should use this in the developer course….

Paginated search tutorial

Thursday, September 28th, 2006

I still have a long list of tutorial ideas. Here’s the latest one I’ve written, covering paginated search from an XQuery library module.

XQSync tutorial

Tuesday, August 29th, 2006

The RecordLoader tutorial seems to be popular, so here’s another. This tutorial covers XQSync.

Performance and QA Test Tool

Wednesday, July 12th, 2006

Over at Mark Logic we recently released a new PerformanceMeters tool, which can be handy for benchmark testing, and for unit testing too. I’ve written a short tutorial.

RecordLoader tutorial

Monday, June 26th, 2006

If you’ve tried to use my RecordLoader tool, you’ve probably noticed that it’s lightly documented and somewhat cryptic. This tutorial may help.