Real XQuery
November 19, 2006 at 10:59 PM | categories: XQuery, beer, MarkLogic
XQuery can have fun applications - no, really! Let's suppose I wanted to find a good pub in WC1, London. There's a local CAMRA web site with useful listings, but it tracks every pub in the area, so several of the listings don't serve real ale, are shut, or are otherwise unsuitable. And we want to list the most interesting pubs first.
- CAMRA awards
- number of real ales
- historic interiors
xdmp:tidy(), and tidy does know a thing or two about miscellaneous character sets. We can use this, plus some binary-node and xdmp:quote() magic, to load the page into MarkLogic Server.
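declare namespace xh = "http://www.w3.org/1999/xhtml";

let $uri := 'http://www.camranorthlondon.org.uk/nlpg/wc1.html'
(: fetch the raw page as a binary node, then let xdmp:quote() and
   xdmp:tidy() turn it into XHTML. xdmp:tidy() returns a status node
   followed by the tidied document, so the last step keeps only the
   xh:html root element. :)
let $html := xdmp:tidy(
  xdmp:quote(
    xdmp:document-get(
      $uri,
      <options xmlns="xdmp:document-get">
        <format>binary</format>
      </options>)))/xh:html
(: store the XHTML under the original URL :)
return xdmp:document-insert($uri, $html)

That query returns the empty sequence and puts an XHTML version of the original content into my MarkLogic Server database.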
Now that the listings are in the database, I can generate my own pub guide. The content author made this easy for me by following a very regular structure (maybe the original content was generated from a database or a spreadsheet?). Each pub is a dl element, with dd elements that have role-based CSS class names.
I didn't bother to fragment the listings, so the searching is all fairly "dumb", but it works well enough. The pubs I'm interested in are at the top.
declare default element namespace "http://www.w3.org/1999/xhtml";

element html {
  element head { element title { "wc1" } },
  element body {
    (: skip pubs that are shut, have no real ale, or are boarded up :)
    for $pub in doc('http://www.camranorthlondon.org.uk/nlpg/wc1.html')
        /html/body/div/div/dl
        [ empty(dd[@class eq 'shut']) ]
        [ not(cts:contains(dd[@class eq 'beer'], 'no real ale')) ]
        [ not(cts:contains(dd[@class eq 'desc'], 'boarded up')) ]
    (: ranking keys: CAMRA awards, number of real ales, listed buildings :)
    let $award := cts:contains($pub/dd, 'pub of the year')
    let $beers := count(tokenize($pub/dd[@class eq 'beer'], '[,;]+'))
    let $listed := cts:contains($pub/dd[@class eq 'desc'], 'listed')
    order by $award descending, $beers descending, $listed descending
    (: copy each dl, dropping the 'tele', 'link', and 'updt' entries :)
    return element { node-name($pub) } {
      $pub/node()[ not(@class = ('tele', 'link', 'updt')) ]
    }
  }
}

Maybe I should use this in the developer course....
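One detail of the pub-guide query is easy to miss: $award and $listed are booleans, and XQuery orders false() before true(), so sorting them descending puts the award winners (and the listed buildings) first. Here is a minimal sketch in plain XQuery 1.0, with no MarkLogic functions, showing how the three sort keys interact. The pub elements and their name, award, beers, and listed attributes are hypothetical stand-ins for the real dl listings, made up only for illustration.

```xquery
xquery version "1.0";

(: Hypothetical stand-ins for the dl listings: @award and @listed play the
   roles of $award and $listed, and @beers plays the role of $beers. :)
declare variable $sample-pubs := (
  <pub name="A" award="false" beers="6" listed="true"/>,
  <pub name="B" award="true"  beers="2" listed="false"/>,
  <pub name="C" award="false" beers="6" listed="false"/>,
  <pub name="D" award="false" beers="9" listed="false"/>
);

(: Rank by the same three keys as the pub-guide query.
   false() sorts before true(), so "descending" puts true() first. :)
declare function local:rank($pubs as element(pub)*) as xs:string*
{
  for $p in $pubs
  order by xs:boolean($p/@award) descending,
           xs:integer($p/@beers) descending,
           xs:boolean($p/@listed) descending
  return string($p/@name)
};

local:rank($sample-pubs)
```

This returns B, D, A, C: B has the award, D has the most beers among the rest, and A beats C only on the listed-building tie-breaker.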
HP nc6400: 64-bit thrills and chills
October 13, 2006 at 04:04 PM | categories: MarkLogic, Linux
Out goes a three-year old Dell D600 laptop - in comes a new HP nc6400. So far this has been both good and bad.
The D600 was very, very old for a laptop: in three years we replaced the motherboard twice, the hard drive twice, and the keyboard once. On the bright side, practically everything about the laptop actually worked under kubuntu dapper and edgy. Even the closed-source Broadcom wifi worked reasonably well, thanks to ndiswrapper (the open-source bcm43xx driver was never reliable for me). Hibernate and suspend-to-RAM both worked pretty well.
The new nc6400 looks very nice: from a distance you could mistake it for a ThinkPad. The screen is a downgrade (1280x800 instead of the D600's 1440x1050), but the laptop is noticeably lighter. The real reason to upgrade, though, was the CPU: it has a 2.0-GHz T7200, one of Intel's new EM64T-capable "merom" or "Core 2 Duo" chips. EM64T is Intel's take on AMD's "AMD64" extensions: under either name, they extend the x86 instruction set with a native 64-bit mode.
If you've used MarkLogic Server, you know how much we like AMD Opteron CPUs. Yes, they're very fast - but the real reason is the 64-bit support. Now that Intel is shipping more EM64T chips, there's no excuse for 32-bit address spaces anymore.
Except that... Windows hasn't caught up yet. And honestly, neither has linux. It's easy to run a 64-bit linux server, and only moderately harder to find all the 64-bit Windows drivers for your server hardware. But desktops are harder: Windows x64 might not support your sound card, for example.
Laptops? Only for the brave. The laptop makers seem to know this, and none of them are hyping their 64-bit support. My nc6400 came with Windows XP 32-bit pre-installed.
There's a good argument behind this. The laptop "only" has 2-GB of RAM, so you could argue that it would work fine in 32-bit mode. For MS Office or Firefox, that's absolutely true. But with MarkLogic Server, memory fragmentation turns out to be at least as important as memory size. In a 32-bit environment, it's pretty easy to chop up the 2-GB or 3-GB address space so badly that the software (or even the OS) has to be restarted. I've never seen this happen in 64-bit mode, simply because the address space is so huge.
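To put rough numbers on "so huge": a 32-bit process can address at most 2^32 bytes, of which 32-bit Windows XP normally gives an application 2-GB (3-GB with the /3GB boot switch). The x86-64 CPUs of this generation implement 48-bit virtual addresses, and Linux gives user space the lower half of that, so even a badly fragmented 64-bit heap has orders of magnitude more room in which to find a large contiguous block:

$$
2^{32}\ \text{bytes} = 4\ \text{GiB}, \qquad
2^{47}\ \text{bytes} = 128\ \text{TiB} = 2^{15} \times 4\ \text{GiB}
$$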
So I unpacked the nc6400, booted from a recent kubuntu "edgy" CD for amd64, and got to work. I shrank the XP partition to 20-GB, on an 80-GB disk, and installed linux on the rest. Note to the unwary: ntfsresize rightly insists on a clean unmount of your NTFS partition before it will do any resizing. Apparently XP doesn't cleanly unmount when it restarts: you must shut down XP instead.
The install went pretty smoothly. The nc6400 has an Intel 3945 wifi card, so I don't need ndiswrapper anymore. I had some trouble with the widescreen resolution (1280x800), but that's to be expected right now.
So what's the bad news? Well, it looks like HP really messed up the BIOS (F.05) when they added support for the Core 2 Duo CPUs. I noticed this fairly quickly with my new linux install: the CPU temperatures, as reported by /proc/acpi/thermal_zone, were hitting 95-C. The fans seemed to run at three speeds: the medium-speed fan would run until the temperature hit 95-C, then a higher-speed fan would take it down to 85-C, and then the temperature would start to climb again. This made the laptop uncomfortably hot, and very loud.
After more investigation, it appeared that the cpufreq module wouldn't load, so the CPU was running both cores at full speed. Naturally, this generates a lot of heat. Reports indicate that the previous BIOS revision (F.03) works fine with cpufreq, but that it won't work with my T7200 CPU. So I'm stuck, until and unless HP fixes the problem in a future BIOS update.
Judging by other reports on the net, HP's round of merom-related BIOS releases removed the ACPI-based CPU frequency scaling tables from every laptop product that supports merom CPUs. The same change doesn't seem to affect Windows XP: apparently XP ships with its own CPU frequency scaling tables, instead of expecting the BIOS to supply them. Conspiracy theorists might ask if this was done on purpose, but I think it's just a bug. Time (and HP's next BIOS release) will tell: after all, my old D600 didn't hibernate properly with linux until Dell had released about 10 BIOS updates.
As a workaround, I used the BIOS setup screen to disable the CPU's second core. So I still don't have frequency scaling, and my laptop is only using half of its capacity, but at least it isn't overheating. The CPU temperature is fairly stable at 55-C. The battery life probably suffers, though: I'm only seeing about 90-min from the internal battery.
Paparazzi!
October 01, 2006 at 03:48 PM | categories: home
I've been having trouble with nocturnal visitors, lately: raccoons and skunks. I haven't seen a skunk yet, but I can tell that they're around, somehow.
The raccoons are more obvious, and they show up in gangs of two or three. This one was on the back deck at about 3am this morning. The cat makes lots of noise, but she's safe behind the glass. I wonder what she'd do if I opened the door for her? Probably she'd wind up as a couple pounds of raccoon chow.
But raccoons don't like to have their picture taken. This one muttered something about paparazzi (they're everywhere!), and slunk off into the bushes.
Paginated search tutorial
September 28, 2006 at 03:47 PM | categories: XQuery, MarkLogic
I still have a long list of tutorial ideas. Here's the latest one I've written, covering paginated search from an XQuery library module.