Real XQuery

November 19, 2006 at 10:59 PM | categories: XQuery, beer, MarkLogic

XQuery can have fun applications - no, really! Let's suppose I want to find a good pub in WC1, London. There's a local CAMRA web site with useful listings. But it tracks every pub in the area, so several of the listed pubs have no real ale, are shut, or are otherwise unsuitable. I also want to list the most interesting pubs first, ranked by these criteria:

CAMRA awards
number of real ales
historic interiors

First, we have to get the document into MarkLogic Server. Unfortunately, the content contains a character sequence that is illegal in UTF-8. Worse, the document does not declare its character set. MarkLogic Server has no built-in support for any character set other than UTF-8, but it does provide tidy as a built-in extension function, xdmp:tidy(), and tidy knows a thing or two about miscellaneous character sets. We can use this, plus some binary-node and xdmp:quote() magic, to load the page into MarkLogic Server: fetch the page as a binary node, quote it to a string, tidy the string, and insert the resulting XHTML.

declare namespace xh="http://www.w3.org/1999/xhtml"



let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
let $html :=
  xdmp:tidy(xdmp:quote(xdmp:document-get(
    $uri,
    <options xmlns="xdmp:document-get">
      <format>binary</format>
    </options>
  )))/xh:html
return xdmp:document-insert($uri, $html)

That query returns the empty sequence, and puts an XHTML version of the original content into my MarkLogic Server database.
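Since the load query returns nothing, a quick follow-up query can confirm that the insert worked. This is only a sketch: it assumes a later MarkLogic release that accepts the `1.0-ml` dialect (hence the semicolons, which the queries in this post do not use), and it assumes tidy produced the usual html/head/title structure.

```xquery
xquery version "1.0-ml";
declare namespace xh = "http://www.w3.org/1999/xhtml";

(: Same URI that the load query used as the document URI. :)
let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
return (
  (: true if a document now exists at that URI :)
  exists(doc($uri)),
  (: name of the stored root element; html is expected :)
  local-name(doc($uri)/*),
  (: page title, if tidy kept one :)
  string(doc($uri)/xh:html/xh:head/xh:title)
)
```

If the first item is false, the insert did not happen, and the remaining items come back empty.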

Now that the listings are in the database, I can generate my own pub guide. The content author made this easy for me, by following a very regular structure (maybe the original content was generated from a database or a spreadsheet?). Each pub is a dl element, with dd elements that have role-based CSS class names.

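I didn't bother to fragment the listings, so the searching is all fairly "dumb", but it works well enough. The query drops pubs that are shut, have no real ale, or are boarded up, then sorts the rest so that the pubs I'm most interested in are at the top.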

default element namespace = "http://www.w3.org/1999/xhtml"



element html {
  element head { element title { "wc1" } },
  element body {
    for $pub in
      doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')
      /html/body/div/div/dl
      [ empty(dd[@class eq 'shut']) ]
      [ not(cts:contains(dd[ @class eq 'beer' ], 'no real ale')) ]
      [ not(cts:contains(dd[ @class eq 'desc' ], 'boarded up')) ]
    let $award := cts:contains($pub/dd, 'pub of the year')
    let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
    let $listed := cts:contains($pub/dd[@class eq 'desc'], 'listed')
    order by $award descending, $beers descending, $listed descending
    return element { node-name($pub) } {
      $pub/node()[ not(@class = ('tele', 'link', 'updt')) ]
    }
  }
}
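
The filter-and-sort logic can be tried without MarkLogic at all. The sketch below is standard XQuery 1.0 run over three made-up pubs (hypothetical data, not taken from the CAMRA page). It swaps cts:contains for a plain case-insensitive substring test in a hypothetical helper, local:mentions, so there is no stemming or word tokenization, and it leaves out the namespace and the other filters to stay short.

```xquery
xquery version "1.0";

(: Hypothetical sample data: three invented pubs in the dl/dd layout. :)
declare variable $pubs :=
  <div>
    <dl>
      <dt>Pub A</dt>
      <dd class="beer">Bitter, Mild; Porter</dd>
      <dd class="desc">Grade II listed interior.</dd>
    </dl>
    <dl>
      <dt>Pub B</dt>
      <dd class="beer">Bitter</dd>
      <dd class="desc">Local Pub of the Year.</dd>
    </dl>
    <dl>
      <dt>Pub C</dt>
      <dd class="beer">Bitter, Stout</dd>
      <dd class="desc">Boarded up since last year.</dd>
    </dl>
  </div>;

(: Case-insensitive substring test over zero or more elements.
   A simple stand-in for cts:contains, not an equivalent. :)
declare function local:mentions($nodes as element()*, $phrase as xs:string)
  as xs:boolean
{
  some $n in $nodes
  satisfies contains(lower-case(string($n)), lower-case($phrase))
};

(: Drop unsuitable pubs first, then compute one sort key per criterion. :)
for $pub in $pubs/dl[ not(local:mentions(dd[@class eq 'desc'], 'boarded up')) ]
let $award := local:mentions($pub/dd, 'pub of the year')
let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
let $listed := local:mentions($pub/dd[@class eq 'desc'], 'listed')
(: true sorts before false when a boolean key is descending :)
order by $award descending, $beers descending, $listed descending
return string($pub/dt)
```

With this sample data, Pub C is filtered out, and Pub B sorts ahead of Pub A: the award key is compared first, so it outranks Pub A's longer beer list.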
