
Querying XML: the book plug

January 19, 2007 at 12:04 PM | categories: XQuery, MarkLogic

Stephen has written a book on XQuery. Unfortunately, I haven't read it yet.




The Smoke

December 08, 2006 at 04:21 PM | categories: UK, travel, beer, XQuery, MarkLogic

Happy chance puts me in London on the last two nights of East London CAMRA's Pig's Ear festival. I'm here to teach a 5-day session on XQuery and MarkLogic Server, but that doesn't start until Monday. So I have time for another.

Pig's Ear 2006

I'm using Hilton points to stay at the Waldorf, these first two nights. Then I'll move into a cheaper place for work. To get to the Pig's Ear, I walked up to Bloomsbury Square, then took the 55 bus to Hackney Town Hall. The festival was in a place called Ocean, across Mare St from the Town Hall (and from the Hackney Empire) - in other words, quite far from LondonLand. I suspect that I wasn't the only American tourist in the crowd, but it was probably close.

The Pig's Ear has been on hiatus for a few years, but I've been once before, when it was in Stratford (no Avon please - we're Londoners). The new hall is a bit sterile, compared to Stratford Town Hall, but it does have much more room. Stratford was always standing only, while Ocean seems to have enough seating for a much larger crowd. The festival beers still tend toward "winter warmers" - strong stuff. That isn't my favorite style: these days I prefer milds. I did try "Night on Mare Street", and found it dangerously easy to drink. But I managed to try every mild on the program, too - St Augustine's Tower was nice. I also tried several cloudy halves from the cider and perry counter. That's good stuff, and it's impossible to find in the USA. It's impossible to find through most of the UK, too. No, the processed stuff doesn't count. Sorry.

But you didn't come here to listen to me rant - you came for a pint. Cheers!


Real XQuery

November 19, 2006 at 10:59 PM | categories: XQuery, beer, MarkLogic

XQuery can have fun applications - no, really! Let's suppose I wanted to find a good pub in WC1, London. There's a local CAMRA web site with useful listings. But they track every pub in the area, so several of them don't have real ale, or are shut, or are otherwise unsuitable. We also want to list the most interesting pubs first, ranked by these criteria:
  1. CAMRA awards
  2. number of real ales
  3. historic interiors
First, we have to get the document into MarkLogic Server. Unfortunately the content has a character sequence that is illegal in UTF-8. Worse, the document does not say what character set it is in. MarkLogic Server does not have built-in support for any character set other than UTF-8, but it does have access to tidy as a built-in extension function: xdmp:tidy(), and tidy does know a thing or two about miscellaneous character sets. We can use this, plus some binary-node and xdmp:quote() magic, to load the page into MarkLogic Server.
declare namespace xh="http://www.w3.org/1999/xhtml"

let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
let $html :=
  xdmp:tidy(
    xdmp:quote(
      xdmp:document-get(
        $uri,
        <options xmlns="xdmp:document-get">
          <format>binary</format>
        </options>
      )
    )
  )/xh:html
return xdmp:document-insert($uri, $html)
That query returns the empty sequence, and puts an XHTML version of the original content into my MarkLogic Server database.
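
One quick way to confirm that the load worked is to count the pub entries in the stored document. This is only a sketch: it assumes the insert above succeeded under the same URI, that tidy put the page in the XHTML namespace, and that each pub ends up as a dl element. It follows the same prolog style as the query above.

```xquery
declare namespace xh="http://www.w3.org/1999/xhtml"

(: Sanity check, assuming the page was inserted under this URI:
   count the dl elements, one per pub listing. :)
count(doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')//xh:dl)
```

A result of zero would suggest that the insert failed or that the elements are not in the namespace I assumed.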

Now that the listings are in the database, I can generate my own pub guide. The content author made this easy for me, by following a very regular structure (maybe the original content was generated from a database or a spreadsheet?). Each pub is a dl element, with dd elements that have role-based CSS class names.

I didn't bother to fragment the listings, so the searching is all fairly "dumb", but it works well enough. The pubs I'm interested in are at the top.
default element namespace = "http://www.w3.org/1999/xhtml"

element html {
  element head { element title { "wc1" } },
  element body {
    (: same URI that was passed to xdmp:document-insert() when loading :)
    for $pub in doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')
      /html/body/div/div/dl
        [ empty(dd[@class eq 'shut']) ]
        [ not(cts:contains(dd[ @class eq 'beer' ], 'no real ale')) ]
        [ not(cts:contains(dd[ @class eq 'desc' ], 'boarded up')) ]
    let $award := cts:contains($pub/dd, 'pub of the year')
    let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
    let $listed := cts:contains($pub/dd[@class eq 'desc'], 'listed')
    order by $award descending, $beers descending, $listed descending
    return element { node-name($pub) } {
      $pub/node()[ not(@class = ('tele', 'link', 'updt')) ]
    }
  }
}
Maybe I should use this in the developer course....
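
The filter-and-sort idea can also be tried without a server. Here is a minimal sketch in portable XQuery 1.0 against hand-made sample data. The pub names, beers, and descriptions are invented for illustration, and plain contains() with lower-case() stands in for cts:contains, which is a MarkLogic extension. It is not the query above, only the same approach on a smaller scale.

```xquery
(: Hypothetical sample listings in the dl/dd layout described above.
   All names and values here are invented. :)
let $pubs :=
  <div>
    <dl>
      <dt>Plain</dt>
      <dd class="beer">Bitter, Mild</dd>
      <dd class="desc">Modern bar</dd>
    </dl>
    <dl>
      <dt>Ornate</dt>
      <dd class="beer">Bitter; Porter</dd>
      <dd class="desc">Grade II listed interior</dd>
    </dl>
    <dl>
      <dt>Dry</dt>
      <dd class="beer">no real ale</dd>
      <dd class="desc">Sports bar</dd>
    </dl>
  </div>
(: Drop pubs without real ale, using a plain substring test. :)
for $pub in $pubs/dl
  [ not(contains(lower-case(dd[@class eq 'beer']), 'no real ale')) ]
(: Count beers by splitting the beer list on commas and semicolons. :)
let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
let $listed := contains(lower-case($pub/dd[@class eq 'desc']), 'listed')
(: More beers first; on a tie, pubs with listed interiors first. :)
order by $beers descending, $listed descending
return <pub name="{ $pub/dt }" beers="{ $beers }" listed="{ $listed }"/>
```

Both surviving sample pubs have two beers, so the tie is broken by the listed flag. Substring matching is cruder than a word search: it would also match "unlisted", for example.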


HP nc6400: 64-bit thrills and chills

October 13, 2006 at 04:04 PM | categories: MarkLogic, Linux

Out goes a three-year-old Dell D600 laptop - in comes a new HP nc6400. So far this has been both good and bad.

The D600 was very, very old for a laptop: in three years we replaced the motherboard twice, the hard drive twice, and the keyboard once. On the bright side, practically everything about the laptop actually worked under kubuntu dapper and edgy. Even the closed-source Broadcom wifi worked reasonably well, thanks to ndiswrapper (the open-source bcm43xx driver was never reliable for me). Hibernate and suspend-to-RAM both worked pretty well.

The new nc6400 looks very nice: from a distance you could mistake it for a Thinkpad. The screen is a downgrade (1280x800 instead of the D600's 1400x1050), but the laptop is noticeably lighter. The real reason to upgrade, though, was the CPU: it has a 2.0-GHz T7200 - one of Intel's new EM64T-capable "merom" or "Core 2 Duo" chips. EM64T is Intel's take on AMD's "AMD64" extensions: under either name, they extend the x86 instruction set to allow a native 64-bit mode.

If you've used MarkLogic Server, you know how much we like AMD Opteron CPUs. Yes, they're very fast - but the real reason is the 64-bit support. Now that Intel is shipping more EM64T chips, there's no excuse for 32-bit address spaces, anymore.

Except that... Windows hasn't caught up yet. And honestly, neither has linux. It's easy to run a 64-bit linux server, and only moderately harder to find all the 64-bit Windows drivers for your server hardware. But desktops are harder: Windows x64 might not support your sound card, for example.

Laptops? Only for the brave. The laptop makers seem to know this, and none of them are hyping their 64-bit support. My nc6400 came with Windows XP 32-bit pre-installed.

There's a good argument behind this. The laptop "only" has 2-GB of RAM, so you could argue that it would work fine in 32-bit mode. For MS Office or Firefox, that's absolutely true. But with MarkLogic Server, memory fragmentation turns out to be at least as important as memory size. In a 32-bit environment, it's pretty easy to chop up the 2-GB or 3-GB address space so badly that the software (or even the OS) has to be restarted. I've never seen this happen in 64-bit mode, simply because the address space is so huge.

So I unpacked the nc6400, booted from a recent kubuntu "edgy" CD for amd64, and got to work. I shrank the XP partition to 20-GB, on an 80-GB disk, and installed linux on the rest. Note to the unwary: ntfsresize rightly insists on a clean unmount of your NTFS partition, before it will do any resizing. Apparently XP doesn't cleanly unmount when it restarts: you must shut down XP, instead.

The install went pretty smoothly. The nc6400 has an Intel 3945 wifi card, so I don't need ndiswrapper anymore. I had some trouble with the widescreen resolution (1280x800), but that's to be expected right now.

So what's the bad news? Well, it looks like HP really messed up the BIOS (F.05) when they added support for the Core 2 Duo CPUs. I noticed this fairly quickly with my new linux install: the CPU temperatures, as reported by /proc/acpi/thermal_zone, were hitting 95-C. The fans seemed to run at three speeds: the medium-speed fan would run until the temp hit 95-C, then a higher-speed fan would take it down to 85-C, and then the temperature would start to climb again. This made the laptop uncomfortably hot, and very loud.

After more investigation, it appeared that the cpufreq module wouldn't load, so the CPU was running both cores at full speed. Naturally, this generates a lot of heat. Reports indicate that the previous BIOS revision (F.03) works fine with cpufreq, but that the older BIOS won't work with my T7200 CPU. So I'm stuck, until and unless HP fixes the problem in a future BIOS update.

Judging by other reports on the net, HP's round of merom-related BIOS releases removed the ACPI-based CPU frequency scaling tables from every laptop product that supports merom CPUs. The same change doesn't seem to affect Windows XP: apparently XP ships with its own CPU frequency scaling tables, instead of expecting the BIOS to supply them. Conspiracy theorists might ask if this was done on purpose, but I think it's just a bug. Time (and HP's next BIOS release) will tell: after all, my old D600 didn't hibernate properly with linux until Dell had released about 10 BIOS updates.

As a workaround, I used the BIOS setup screen to disable the CPU's second core. So I still don't have frequency scaling, and my laptop is only using half of its capacity, but at least it isn't overheating. The CPU temperature is fairly stable at 55-C. The battery life probably suffers, though: I'm only seeing about 90-min from the internal battery.


Paginated search tutorial

September 28, 2006 at 03:47 PM | categories: XQuery, MarkLogic

I still have a long list of tutorial ideas. Here's the latest one I've written, covering paginated search from an XQuery library module.
