Real XQuery
November 19, 2006 at 10:59 PM | categories: XQuery, beer, MarkLogic
XQuery can have fun applications. No, really! Let's suppose I wanted to find a good pub in WC1, London. There's a local CAMRA web site with useful listings, but it tracks every pub in the area, so several of the entries don't have real ale, are shut, or are otherwise unsuitable. And we want to list the most interesting pubs first, ranked by:
- CAMRA awards
- number of real ales
- historic interiors
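The guide query further down turns each of these criteria into a sort key. As a minimal sketch of how that ordering behaves, here it is in plain XQuery 1.0 (no MarkLogic functions), with three made-up pubs reduced to hypothetical award, beers and listed attributes. Because false sorts before true, descending order puts award winners first, and each later key only breaks ties left by the earlier ones.

```xquery
xquery version "1.0";

(: Made-up pubs, each reduced to the three ranking keys:
   an award flag, a count of real ales, and a "listed" flag.
   The element and attribute names are hypothetical. :)
let $pubs := (
  <pub name="Pub A" award="false" beers="6" listed="true"/>,
  <pub name="Pub B" award="true"  beers="2" listed="false"/>,
  <pub name="Pub C" award="false" beers="6" listed="false"/>
)
for $p in $pubs
(: false sorts before true, so "descending" puts true first;
   later keys only matter when the earlier ones are equal :)
order by xs:boolean($p/@award) descending,
         xs:integer($p/@beers) descending,
         xs:boolean($p/@listed) descending
return string($p/@name)
```

This returns "Pub B", "Pub A", "Pub C": Pub B wins on the award alone, and the listed flag separates the two six-beer pubs.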
First, though, the listings have to get into the database. MarkLogic Server has xdmp:tidy(), and tidy does know a thing or two about miscellaneous character sets. We can use this, plus some binary-node and xdmp:quote() magic, to load the page into MarkLogic Server.
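```xquery
declare namespace xh="http://www.w3.org/1999/xhtml"

(: Fetch the page as a binary node, quote it, and let xdmp:tidy()
   turn it into XHTML. Tidy returns a status node plus the cleaned
   document; the /xh:html step keeps just the html element. :)
let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
let $html := xdmp:tidy(xdmp:quote(xdmp:document-get(
    $uri,
    <options xmlns="xdmp:document-get">
      <format>binary</format>
    </options>
  )))/xh:html
(: store the XHTML under the page's own URL :)
return xdmp:document-insert($uri, $html)
```

That query returns the empty sequence, and puts an XHTML version of the original content into my MarkLogic Server database.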
Now that the listings are in the database, I can generate my own pub guide. The content author made this easy for me by following a very regular structure (maybe the original content was generated from a database or a spreadsheet?). Each pub is a dl element, with dd elements that have role-based CSS class names such as beer, desc and shut.
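To make that structure concrete, here is a small sketch with invented listings (the pub names and text are made up; only the dl/dd shape and the class names come from the guide query). It applies the same class-based filters and the same beer count. Since it has to run outside MarkLogic, fn:contains on lower-cased text stands in for cts:contains; that is only a rough approximation, with no tokenization or stemming.

```xquery
xquery version "1.0";
declare default element namespace "http://www.w3.org/1999/xhtml";

(: Invented listings in the shape described above: one dl per pub,
   with dd children labelled by class. Names and text are made up. :)
declare variable $listings :=
  <div>
    <dl>
      <dt>Pub A</dt>
      <dd class="beer">Bitter A, Bitter B; Porter C</dd>
      <dd class="desc">Grade II listed interior.</dd>
    </dl>
    <dl>
      <dt>Pub B</dt>
      <dd class="beer">No real ale</dd>
    </dl>
    <dl>
      <dt>Pub C</dt>
      <dd class="shut">Closed</dd>
    </dl>
  </div>;

(: Select by class, as the guide query does; fn:contains on
   lower-cased text is a stand-in for cts:contains here. :)
for $pub in $listings/dl
  [empty(dd[@class eq 'shut'])]
  [not(contains(lower-case(string-join(dd[@class eq 'beer'], ' ')), 'no real ale'))]
return concat($pub/dt, ': ',
              count(tokenize($pub/dd[@class eq 'beer'], '[,;]+')), ' beers')
```

Pub C is dropped for being shut and Pub B for having no real ale, so the result is the single string "Pub A: 3 beers".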
I didn't bother to fragment the listings, so the searching is all fairly "dumb", but it works well enough. The pubs I'm interested in are at the top.
```xquery
default element namespace = "http://www.w3.org/1999/xhtml"

element html {
  element head { element title { "wc1" } },
  element body {
    (: skip pubs that are shut, have no real ale, or are boarded up :)
    for $pub in doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')
        /html/body/div/div/dl
        [ empty(dd[@class eq 'shut']) ]
        [ not(cts:contains(dd[ @class eq 'beer' ], 'no real ale')) ]
        [ not(cts:contains(dd[ @class eq 'desc' ], 'boarded up')) ]
    (: ranking keys: award, number of real ales, listed building :)
    let $award := cts:contains($pub/dd, 'pub of the year')
    let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
    let $listed := cts:contains($pub/dd[@class eq 'desc'], 'listed')
    order by $award descending, $beers descending, $listed descending
    (: copy each dl, minus its tele, link and updt entries :)
    return element { node-name($pub) } {
      $pub/node()[ not(@class = ('tele', 'link', 'updt')) ]
    }
  }
}
```

Maybe I should use this in the developer course...
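The $beers key simply counts whatever sits between commas and semicolons. One detail worth knowing: fn:tokenize() returns a zero-length string after a trailing separator (or before a leading one), so a beer list ending in a stray comma would count one beer too many. A minimal sketch in plain XQuery 1.0, with invented beer lists, comparing the raw count with one that drops blank tokens:

```xquery
xquery version "1.0";

(: Invented beer lists; the second ends with a stray comma. :)
for $s in ('Bitter A, Bitter B; Porter C', 'Bitter A, Bitter B,')
let $tokens := tokenize($s, '[,;]+')
(: raw counts every token; non-empty ignores blank ones :)
return
  <beers raw="{count($tokens)}"
         non-empty="{count($tokens[normalize-space(.)])}"/>
```

The first list gives raw="3" non-empty="3"; the second gives raw="3" non-empty="2". Using the filtered count in the $beers line would be one way to guard against stray separators, if the listings contain any.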