
Month: November 2005

[thx] HAM


flickr_IMG_7139.jpg
Originally uploaded by Andy Cadaver.

I was just emailing with Sarah Carey, and she correctly noted that my weblog has been tending towards the techie-incomprehensible recently. A brief look at the front page confirms this.

So here’s a remedy: a photo of the delicious ham which the lovely C cooked up for Thanksgiving, last Thursday. Just look at that, mmmmm!

When I get back to Ireland, I will be bringing Thanksgiving with me; a holiday based around eating cooked fowl, with no religious baggage whatsoever? I’m so there.


New SpamAssassin Rule Development Tools

Recently, I’ve been working on new systems to develop SpamAssassin rules faster, and with a lower ‘barrier to entry’ to the core ruleset. Some highlights seem bloggable, seeing as it’s all web-based and I can link to it!

The ‘preflight’ BuildBot:

This uses the fantastic BuildBot continuous-integration system to monitor changes to our Subversion repository.

Every time something is checked into SVN, this wakes up and immediately runs mass-checks using that latest code and rules, allowing near-real-time viewing of changes in rule behaviour. (A ‘mass-check’ is a massive run of SpamAssassin across a corpus of hundreds of thousands of emails, en masse, to measure rule hit-rates.)

The corpus it mass-checks is split in a certain way so that results will be available very quickly — typically in under 10 minutes — with increasing quantities of results becoming available as time elapses.

Progress of the mass-checks is visible at the BuildBot here; as they complete, their results become visible on the Rule-QA app (below). (More info, if you’re curious.)
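To make "rule hit-rates" a bit more concrete, here is a minimal Python sketch of the kind of tally a mass-check feeds into. It is not SpamAssassin's own code: the input shape (a list of `(is_spam, rules_hit)` pairs), the function name `hit_frequencies`, and the rule names are all made up for illustration, and the spam-over-overall ratio is my assumption about a useful summary figure.

```python
from collections import Counter


def hit_frequencies(results):
    """Tally per-rule hit-rates from (is_spam, rules_hit) pairs, one per message.

    Returns {rule: (spam_pct, ham_pct, ratio)}, where ratio is
    spam_pct / (spam_pct + ham_pct). The ratio is an assumed summary
    figure for this sketch, not a documented SpamAssassin formula.
    """
    spam_total = ham_total = 0
    spam_hits, ham_hits = Counter(), Counter()

    for is_spam, rules in results:
        unique_rules = set(rules)  # count each rule at most once per message
        if is_spam:
            spam_total += 1
            spam_hits.update(unique_rules)
        else:
            ham_total += 1
            ham_hits.update(unique_rules)

    table = {}
    for rule in set(spam_hits) | set(ham_hits):
        spam_pct = 100.0 * spam_hits[rule] / spam_total if spam_total else 0.0
        ham_pct = 100.0 * ham_hits[rule] / ham_total if ham_total else 0.0
        total = spam_pct + ham_pct
        ratio = spam_pct / total if total else 0.0
        table[rule] = (spam_pct, ham_pct, ratio)
    return table


if __name__ == "__main__":
    # Hypothetical four-message corpus with made-up rule names.
    corpus = [
        (True, ["EXAMPLE_RULE_A", "EXAMPLE_RULE_B"]),
        (True, ["EXAMPLE_RULE_A"]),
        (False, ["EXAMPLE_RULE_B"]),
        (False, []),
    ]
    for rule, row in sorted(hit_frequencies(corpus).items()):
        print(rule, row)
```

A rule that only ever hits spam gets a ratio of 1.0, and one that hits spam and ham equally gets 0.5, which is roughly the signal you want to watch as rules change from one check-in to the next.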

The Rule-QA App:

To date, we’ve used the basic “freqs” table — output from the hit-frequencies command-line script — as the UI for rule QA and evaluation. This is fine for a small number of developers, but it scales badly and (like mass-checks) requires a pretty complex setup on the developer’s machine.

This new component is a web application, which takes the “freqs” table, and “webifies” it — demo.

Some major improvements are also made possible. Most importantly, it can now display ‘freqs’ for multiple revisions during the day, and it keeps historical data for comparison. It also adds several new reports from ‘hit-frequencies’: a score-map, overlaps, a performance measurement, and a boolean ‘promoteability’ measurement.

Finally, a really useful new report is the graph of rule hit-rate as it changes over time. Here’s a cached demo, or see the same data produced ‘live’. This gives a totally new insight into how the rule hits on various people’s corpora and how that changes over time, and allows a whole new type of rule analysis. (In fact, it allows pretty good corpus analysis, too; can you tell which submitters bounce high-scoring spam at receipt time?)

(More info on these.)

Product idea: RAID Backup Enclosures

Cory Doctorow at Boing Boing links to an article at TechCrunch that lists Better and Cheaper Online File Storage as a product that needs to be made. However, Ben Laurie did the sums on online storage as a useful backup medium, and found them not exactly compelling (e.g. 100GB of data will take 75 days to upload over a 128Kbps link).
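The sums are easy enough to redo. Here is a quick back-of-the-envelope check in Python, assuming a perfectly saturated link with no protocol overhead; the helper name `upload_days` is mine, and whether "100GB" means decimal gigabytes or binary gibibytes is a guess either way.

```python
SECONDS_PER_DAY = 86_400


def upload_days(size_bytes, link_bits_per_second):
    """Days needed to push size_bytes over a fully saturated link.

    Ignores protocol overhead, retransmits and idle time, so real
    uploads would take longer than this.
    """
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    link = 128 * 1000  # 128Kbps, taken as 128,000 bits per second
    print(round(upload_days(100 * 10**9, link), 1))  # decimal GB -> 72.3
    print(round(upload_days(100 * 2**30, link), 1))  # binary GiB -> 77.7
```

Both readings land within a few days of the 75-day figure, so the conclusion holds whichever definition of a gigabyte you pick.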

I tend to agree. An online host isn’t great as a backup host, since, in my experience, there are two types of backups required:

  • The important small files (for example: encrypted password lists, my address book, my ~/bin directory)
  • The massive big filesets (for example: MP3s, photos)

The first kind of fileset is amenable to an online backup-storage service, at first glance. However, in my opinion you’re better off going the whole hog for these files and using a distributed, versioned backup method: put them in a good networked revision control system and check them out everywhere, so you can also make changes and check in from any host. Otherwise, you face the perils of syncing up a single backup from multiple “writers” without conflicts. So far, none of the online file storage services offer SVN as an access method, so a shell account at a colo server still seems more useful on that count.

The second kind of fileset, as Ben notes, will take donkey’s years to upload and sync as a backup mechanism; and the economics are hardly compelling for the service provider.

I think I prefer Brad Templeton’s idea to deal with large-data backups —

I propose a software RAID-5, done over a LAN with 3 to 5 drives scattered over several machines on the LAN.

Slow as hell, of course, having to read and write your data out over the LAN even at 100mbits. Gigabit would obviously be better. But what is it we have that’s taking up all this disk space? It’s video, music and photos. Things which, if just being played back, don’t need to be accessed very fast. If you’re not editing video or music, in particular, you can handle having it on a very slow device. (Photos are a bigger issue, as they do sometimes need fast access when building thumbnails etc.)

This could even be done among neighbours over 802.11g, with suitable encryption. In theory.

As a commenter notes, Linux has support for this already, in the form of software RAID and the network block device.
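For anyone wondering why RAID-5 can survive a dead drive, the trick is plain XOR parity. Here is a toy Python sketch of a single stripe; it is not how Linux software RAID is implemented, the function names are my own, and it keeps the parity block in a fixed last position, whereas real RAID-5 rotates parity across the drives.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (kept last in this sketch)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    """Recover the block at lost_index by XORing every surviving block."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    # Three data "drives" and one parity "drive", four bytes per block.
    stripe = make_stripe([b"vide", b"musi", b"phot"])
    # Pretend the second drive has died, then rebuild its block.
    print(rebuild(stripe, 1))
```

Because XOR is its own inverse, any one missing block, data or parity, equals the XOR of all the others. That is also why every write has to touch the parity block, which is where the "slow as hell over a LAN" part comes from.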

So: take an external IDE enclosure, add a GumStix board running Linux with software RAID, LVM, and nbd, and add wifi. Then add DAV, SMB and NFS export of the disk, and some decent UI code to organise the volumes into a single exported RAID volume (hopefully automatically!), and it’d be a pretty compelling product, in my opinion!

(hey Craig! I said GumStix! ;)

Wisdom Teeth — Complete!

On Friday, I got my lower-left wisdom tooth extracted. That’s the last one that should cause any trouble; there’s only one remaining, and it’s fully out so shouldn’t act up. After a few years of on-again-off-again twinges, and lots of irresponsible putting-off of surgery, I’ve finally taken care of it.

The downside: I’m totally zonked on painkillers, so I won’t be doing much for the next few days apart from what’s required for day-to-day day-job stuff.

Urban Dead HUD; added Inventory Sorting

I’ve updated the Urban Dead HUD Greasemonkey userscript; it now offers inventory sorting, inspired by Ikko’s userscript (albeit a little different in implementation). Here’s a screenshot:

Right now, UD is reasonably interesting: our team of plucky survivors have been helping out with the defence of Caiger Mall, a major mall towards the north-west of the city. We’ve repulsed the Church of the Resurrection’s attempts to wipe us out, but that seems to have made us quite a juicy target; there are now no fewer than three separate Zombie groups ganging up on us. For now, we’re still holding out.

Mobile phone repair at Karol Bagh Market

I love these pictures:

I link-blogged that article ages ago, but I keep thinking of it, so it’s worth a proper post in its own right, to expand on that.

These guys work at an Indian mobile phone repair stall in Karol Bagh Market, in Delhi. The blog entry notes:

As in China, many of the mobile phone shops and street kiosks offer mobile phone repair service. Many of these guys can strip and rebuild a mobile phone in minutes. … a lot of the hyperbole surrounding western hacker culture makes me smile compared to what these guys are doing day in day out.

Also, a commenter notes: ‘in india, for about 1$, you can convert a CDMA phone to GSM !! also, they can unlock phones and do a veriety of hacks for little money.’

There are so many lessons I’m taking from it:

  1. I’ve had a shoe resoled in 5 minutes for next to nothing at a stall not too different from that — but this is a mobile phone. It’s amazing to think of that level of hardware hacking taking place every day at a back-street market stall.

  2. Those phones were doubtless planned, as a product, with a ‘ship back to manufacturer’ support plan. That clearly isn’t going to fly without that developed-world luxury, Fedex. So this is the developing-world street finding its own uses for things, and working around the dependencies on systems that are optimised for the developed world.

  3. It’s the flip-side of Joshua Ellis’ grim meathook future, where we’re not facing down the barrel of a New-Orleans-style descent into barbarity if the power suddenly cuts out; tech can go on. It may be a little chunkier, though, and with more duct tape, but hey.

  4. It’s also a beautiful demonstration of how those of us in the developed world who assume that developing-worlders cannot find a use for high tech are talking shit. (cf. Ethan Zuckerman as a good example of someone who gets this, more than almost anyone else I can think of.)

I think this is one of the most important lessons I learned while travelling through India and SE Asia a few years back — the developing world is using high tech, and it’s not using it in the same ways we do — or even the ways we anticipated, and we have plenty to learn from them too.

Found at Jan Chipchase’s site, which is full of great contemplation on this stuff. (The story on Seoul’s selca culture is nuts, too — it’s like Flickr^1000.)

(PS: I have a wisdom tooth extraction scheduled for next Friday… wish me luck. That’s another thing you don’t want to happen in the developing world, although I daresay it’d rock in Bangkok!)

(Update: clarification — my cite of Ethan Z was meant as a compliment ;)

IFSO Seminar In Dublin

Passing this on for readers in Ireland — this sounds like an interesting event. From the FSFE-IE mailing list:

On the morning of Friday November 18th, IFSO is organising an event hosted by MEP Proinsias De Rossa about preventing software patents in the EU. Topics covered will be:

  • An analysis of the software patent directive;
  • a discussion of Free Software and computer security;
  • an introduction to IFSO/FSFE and their work;
  • the future of legislative obstacles to the development and distribution of software.

The event will be held in the European Parliament Office in Ireland, and spaces are limited. Participants are therefore asked to register their intent to attend. See here for more details.