
The Smoke

December 08, 2006 at 04:21 PM | categories: UK, travel, beer, XQuery, MarkLogic

Happy chance puts me in London on the last two nights of East London CAMRA's Pig's Ear festival. I'm here to teach a 5-day session on XQuery and MarkLogic Server, but that doesn't start until Monday. So I have time for another.

Pig's Ear 2006

I'm using Hilton points to stay at the Waldorf, these first two nights. Then I'll move into a cheaper place for work. To get to the Pig's Ear, I walked up to Bloomsbury Square, then took the 55 bus to Hackney Town Hall. The festival was in a place called Ocean, across Mare St from the Town Hall (and from the Hackney Empire) - in other words, quite far from LondonLand. I suspect that I wasn't the only American tourist in the crowd, but it was probably close.

The Pig's Ear has been on hiatus for a few years, but I've been once before, when it was in Stratford (no Avon please - we're Londoners). The new hall is a bit sterile, compared to Stratford Town Hall, but it does have much more room. Stratford was always standing only, while Ocean seems to have enough seating for a much larger crowd. The festival beers still tend toward "winter warmers" - strong stuff. That isn't my favorite style: these days I prefer milds. I did try "Night on Mare Street", and found it dangerously easy to drink. But I managed to try every mild on the program, too - St Augustine's Tower was nice. I also tried several cloudy halves from the cider and perry counter. That's good stuff, and it's impossible to find in the USA. It's impossible to find through most of the UK, too. No, the processed stuff doesn't count. Sorry.

But you didn't come here to listen to me rant - you came for a pint. Cheers!


Real XQuery

November 19, 2006 at 10:59 PM | categories: XQuery, beer, MarkLogic

XQuery can have fun applications - no, really! Suppose I want to find a good pub in WC1, London. There's a local CAMRA web site with useful listings, but it tracks every pub in the area, so several of them don't have real ale, are shut, or are otherwise unsuitable. I also want the most interesting pubs listed first, ranked by:
  1. CAMRA awards
  2. number of real ales
  3. historic interiors
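Those three criteria amount to a sort on three keys, where each later key only breaks ties in the earlier ones. Here is a minimal sketch in plain XQuery 1.0 to show the idea; the pubs A to D and their flags are invented for illustration. Since false sorts before true, `descending` puts award winners and listed interiors first.

```xquery
xquery version "1.0";

(: Hypothetical sample data: names and values are invented. :)
let $pubs := (
  <pub name="A" award="false" beers="3" listed="true"/>,
  <pub name="B" award="true"  beers="1" listed="false"/>,
  <pub name="C" award="false" beers="3" listed="false"/>,
  <pub name="D" award="false" beers="5" listed="false"/>
)
for $p in $pubs
(: award first, then number of real ales, then listed interior :)
order by xs:boolean($p/@award) descending,
         xs:integer($p/@beers) descending,
         xs:boolean($p/@listed) descending
return string($p/@name)
```

This returns B, D, A, C. B wins on the award despite serving only one beer, and A beats C only on the tie-breaking third key.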
First, we have to get the document into MarkLogic Server. Unfortunately the content has a character sequence that is illegal in UTF-8. Worse, the document does not say what character set it is in. MarkLogic Server does not have built-in support for any character set other than UTF-8, but it does have access to tidy as a built-in extension function: xdmp:tidy(), and tidy does know a thing or two about miscellaneous character sets. We can use this, plus some binary-node and xdmp:quote() magic, to load the page into MarkLogic Server.
declare namespace xh="http://www.w3.org/1999/xhtml"

let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
(: fetch the page as a binary node, quote it as a string,
   and let tidy turn that string into XHTML :)
let $html :=
  xdmp:tidy(
    xdmp:quote(
      xdmp:document-get($uri,
        <options xmlns="xdmp:document-get">
          <format>binary</format>
        </options>)))/xh:html
return xdmp:document-insert($uri, $html)
That query returns the empty sequence, and puts an XHTML version of the original content into my MarkLogic Server database.

Now that the listings are in the database, I can generate my own pub guide. The content author made this easy for me, by following a very regular structure (maybe the original content was generated from a database or a spreadsheet?). Each pub is a dl element, with dd elements that have role-based CSS class names.
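To make that structure concrete, here is a sketch in plain XQuery 1.0 that runs without MarkLogic. The markup is hypothetical. The `dt` holding the pub name is an assumption, and the pub and beer names are invented. Only the `dd` class names `beer`, `desc`, and `shut` come from the listings. It swaps MarkLogic's `cts:contains()` for `fn:contains()` on lower-cased text. That is only an approximation: it is a plain substring test, not a word search.

```xquery
xquery version "1.0";
declare default element namespace "http://www.w3.org/1999/xhtml";

(: Hypothetical listings: one dl per pub, dd elements with role-based classes. :)
let $listings :=
  <div>
    <dl>
      <dt>The Example Arms</dt>
      <dd class="beer">Ordinary Bitter, Best Bitter; Dark Mild</dd>
      <dd class="desc">Grade II listed Victorian pub.</dd>
    </dl>
    <dl>
      <dt>The Sample Tavern</dt>
      <dd class="beer">No real ale</dd>
    </dl>
    <dl>
      <dt>The Old Placeholder</dt>
      <dd class="shut">Closed</dd>
    </dl>
  </div>
(: drop pubs that are shut or have no real ale :)
for $pub in $listings/dl
  [empty(dd[@class eq 'shut'])]
  [not(contains(lower-case(string-join(dd[@class eq 'beer'], ' ')), 'no real ale'))]
(: count beers by splitting the beer entry on commas and semicolons :)
let $beers := count(tokenize(string-join($pub/dd[@class eq 'beer'], ' '), '[,;]+'))
return concat($pub/dt, ': ', $beers)
```

Only "The Example Arms: 3" survives. The shut pub and the keg-only pub are filtered out before any ranking happens.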

I didn't bother to fragment the listings, so the searching is all fairly "dumb", but it works well enough. The pubs I'm interested in are at the top.
default element namespace = "http://www.w3.org/1999/xhtml"

element html {
  element head { element title { "wc1" } },
  element body {
    (: skip pubs that are shut, have no real ale, or are boarded up :)
    for $pub in doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')
        /html/body/div/div/dl
        [ empty(dd[@class eq 'shut']) ]
        [ not(cts:contains(dd[@class eq 'beer'], 'no real ale')) ]
        [ not(cts:contains(dd[@class eq 'desc'], 'boarded up')) ]
    let $award := cts:contains($pub/dd, 'pub of the year')
    let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
    let $listed := cts:contains($pub/dd[@class eq 'desc'], 'listed')
    (: rank by CAMRA award, then number of real ales, then listed interior :)
    order by $award descending, $beers descending, $listed descending
    return element { node-name($pub) } {
      (: copy the listing, minus the entries classed tele, link, and updt :)
      $pub/node()[ not(@class = ('tele', 'link', 'updt')) ]
    }
  }
}
Maybe I should use this in the developer course...
