Real XQuery

November 19, 2006 at 10:59 PM | categories: XQuery, beer, MarkLogic

XQuery can have fun applications - no, really! Suppose I want to find a good pub in WC1, London. There's a local CAMRA web site with useful listings, but it tracks every pub in the area, so several of the entries have no real ale, or are shut, or are otherwise unsuitable. I also want to list the most interesting pubs first, ranked by these criteria, in order of priority:
  1. CAMRA awards
  2. number of real ales
  3. historic interiors
First, we have to get the document into MarkLogic Server. Unfortunately the content has a character sequence that is illegal in UTF-8. Worse, the document does not say what character set it is in. MarkLogic Server does not have built-in support for any character set other than UTF-8, but it does have access to tidy as a built-in extension function: xdmp:tidy(), and tidy does know a thing or two about miscellaneous character sets. We can use this, plus some binary-node and xdmp:quote() magic, to load the page into MarkLogic Server.
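declare namespace xh="http://www.w3.org/1999/xhtml"

let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
let $html :=
  (: quote the binary node to a string, let tidy turn it into XHTML,
     and keep only the html element :)
  xdmp:tidy(
    xdmp:quote(
      (: fetch the page as a binary node, so the bad bytes are not parsed :)
      xdmp:document-get(
        $uri,
        <options xmlns="xdmp:document-get">
          <format>binary</format>
        </options>
      )
    )
  )/xh:html
return xdmp:document-insert($uri, $html)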
That query returns the empty sequence, and puts an XHTML version of the original content into my MarkLogic Server database.

Now that the listings are in the database, I can generate my own pub guide. The content author made this easy for me, by following a very regular structure (maybe the original content was generated from a database or a spreadsheet?). Each pub is a dl element, with dd elements that have role-based CSS class names.

I didn't bother to fragment the listings, so the searching is all fairly "dumb", but it works well enough. The pubs I'm interested in are at the top.
default element namespace = "http://www.w3.org/1999/xhtml"

element html {
  element head { element title { "wc1" } },
  element body {
    (: read the document under the same URI the load query inserted it at,
       and skip pubs that are shut, have no real ale, or are boarded up :)
    for $pub in doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')
      /html/body/div/div/dl
        [ empty(dd[@class eq 'shut']) ]
        [ not(cts:contains(dd[ @class eq 'beer' ], 'no real ale')) ]
        [ not(cts:contains(dd[ @class eq 'desc' ], 'boarded up')) ]
    (: sort keys: CAMRA award, number of real ales, historic interior :)
    let $award := cts:contains($pub/dd, 'pub of the year')
    let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
    let $listed := cts:contains($pub/dd[@class eq 'desc'], 'listed')
    order by $award descending, $beers descending, $listed descending
    (: copy each pub, dropping the 'tele', 'link', and 'updt' entries :)
    return element { node-name($pub) } {
      $pub/node()[ not(@class = ('tele', 'link', 'updt')) ]
    }
  }
}
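The filtering and ranking can also be tried without a MarkLogic database. Here is a minimal sketch in standard XQuery 1.0, with several assumptions: the sample listings and pub names are invented for illustration, the prolog uses the standard `declare ...;` syntax, and `contains()` on lower-cased text stands in for `cts:contains()`. That stand-in is a plain substring test, not a word-based search, so it may match more loosely than the real query.

```xquery
xquery version "1.0";
declare default element namespace "http://www.w3.org/1999/xhtml";

(: Hypothetical sample listings, invented for illustration only. :)
declare variable $sample :=
  <div>
    <dl>
      <dt>Pub A</dt>
      <dd class="beer">Bitter, Mild; Porter</dd>
      <dd class="desc">Grade II listed interior.</dd>
    </dl>
    <dl>
      <dt>Pub B</dt>
      <dd class="beer">No real ale</dd>
      <dd class="desc">Modern bar.</dd>
    </dl>
    <dl>
      <dt>Pub C</dt>
      <dd class="shut">Closed</dd>
      <dd class="beer">Bitter</dd>
    </dl>
    <dl>
      <dt>Pub D</dt>
      <dd class="beer">Bitter</dd>
      <dd class="desc">Local CAMRA Pub of the Year.</dd>
    </dl>
  </div>;

(: Same three filters as the MarkLogic query, using substring tests. :)
for $pub in $sample/dl
  [ empty(dd[@class eq 'shut']) ]
  [ not(contains(lower-case(string-join(dd[@class eq 'beer'], ' ')), 'no real ale')) ]
  [ not(contains(lower-case(string-join(dd[@class eq 'desc'], ' ')), 'boarded up')) ]
(: Same three sort keys: award, number of beers, listed interior. :)
let $award := contains(lower-case(string-join($pub/dd, ' ')), 'pub of the year')
let $beers := count(tokenize(string-join($pub/dd[@class eq 'beer'], ' '), '[,;]+'))
let $listed := contains(lower-case(string-join($pub/dd[@class eq 'desc'], ' ')), 'listed')
order by $award descending, $beers descending, $listed descending
return string($pub/dt)
```

With this sample data, Pub B and Pub C are filtered out, and the result is "Pub D" followed by "Pub A": the award outranks the longer beer list, because the sort keys are applied in order.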
Maybe I should use this in the developer course....


Paginated search tutorial

September 28, 2006 at 03:47 PM | categories: XQuery, MarkLogic

I still have a long list of tutorial ideas. Here's the latest one I've written, covering paginated search from an XQuery library module.


XQSync tutorial

August 29, 2006 at 03:38 PM | categories: XQuery, MarkLogic

The RecordLoader tutorial seems to be popular, so here's another. This tutorial covers XQSync.


Performance and QA Test Tool

July 12, 2006 at 03:41 PM | categories: XQuery, MarkLogic

Over at Mark Logic we recently released a new PerformanceMeters tool, which can be handy for benchmark testing - and unit testing, too. I've written a short tutorial.




RecordLoader tutorial

June 26, 2006 at 03:38 PM | categories: XQuery, MarkLogic

If you've tried to use my RecordLoader tool, you've probably noticed that it's lightly documented and somewhat cryptic. This tutorial may help.
