Firefox Download Evening

Download Day

Happy Firefox Download Day — or rather, Firefox Download Evening!

It turns out that the “day” in question has been defined as a 24-hour period starting at 10am Pacific Time; rather than compensating for the effects of timezones around the world, they’ve just picked an arbitrary 24-hour period.

That’s 6pm in Irish time, for example. At least I’m not one of the 57,000 Japanese pledgers, who’d be waiting up until 2am to kick off their download. It seems a little bizarre that there’s little leeway provided for non-US downloaders, who are right now twiddling their thumbs, waiting, while their “day” passes.
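The local start times quoted above can be checked with a short sketch. It assumes the fixed UTC offsets in effect in June 2008 (Pacific Daylight Time at UTC-7, Irish Summer Time at UTC+1, Japan Standard Time at UTC+9); the function and variable names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for June 2008 (daylight saving in effect in the US and Ireland).
PACIFIC = timezone(timedelta(hours=-7), "PDT")
IRELAND = timezone(timedelta(hours=1), "IST")
JAPAN = timezone(timedelta(hours=9), "JST")

# Start of the 24-hour "day": 10am Pacific Time on June 17, 2008.
START = datetime(2008, 6, 17, 10, 0, tzinfo=PACIFIC)


def local_start(zone):
    """Return the Download Day start time as seen in another timezone."""
    return START.astimezone(zone)


for zone in (IRELAND, JAPAN):
    print(zone, local_start(zone).strftime("%Y-%m-%d %H:%M"))
```

Under those assumptions the start lands in the evening in Ireland and in the small hours of the following day in Japan, which is where the "Download Evening" comes from.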

Annoyingly, the main world record page simply says ‘the official date for the launch of Firefox 3 is June 17, 2008’, with no mention of a starting time or official timezone at all!

This is the top thread on their forum right now — in addition to the omission of an entire continent ;)


Comments (37)

The Life of a SpamAssassin Rule

Spam: during a recent discussion on the SpamAssassin dev list, the question came up as to how long a rule could expect to maintain its effectiveness once it was public — the rule secrecy issue.

In order to make a point — that certain types of very successful rules can indeed last a long time — I picked out one rule, MIME_BOUND_DD_DIGITS. Here’s a smartened-up copy of what I found out.

This rule matches a certain format of MIME boundary, one observed in 17.4637% of our spam collection and with 0 nonspam hits. Since we have a massive collection of mails, received between Jan 2004 and May 2005, and a rule with a known history, we can then graph its effectiveness over time.

The rule’s history was:

  • bug 3396: the initial contribution from Bob Menschel, May 15 2004
  • r10692: arrived in SVN: May 16 2004
  • r20178: promoted to ‘MIME_BOUND_DD_DIGITS’: May 20 2004 (funnily enough, with a note speculating about its lifetime from felicity!)
  • released in the SpamAssassin 3.0.0 release: mid-Sep 2004

So, we would expect to see a drop in its effectiveness against spam in late May 2004 and onwards, if the spammers were reacting to SVN changes; or post September 2004, if they react to what’s released.

By graphing the number of hits on mails within each 2-hour window, we can get a good idea of its effectiveness over time:

The red bars are total spam mails in each time period; green bars, the number of spam mails that hit the rule in each period. May 15 2004 and Sep 20 2004 are marked; Jan 2004 is at the left, and May 2005 is at the right-most extreme of the graph. (There’s a massive spike in spam volume at the right — I think this is Sober.Q output, which disappears after a week or so.)

It appears that the rule remains about even in effectiveness in the 4 months it’s in SVN, but unreleased; it declines a little more after it makes it into a SpamAssassin release. However, it trails off very slowly — even in May 2005, it’s still hitting a good portion of spam.

Given this, I suspect that most spammers are not changing structural aspects of their spam in response to SpamAssassin with any particular alacrity, or at least are not capable of doing so.

To speculate on the latter, I think many spammers are using pirated copies of the spamware apps, so cannot get their hands on updated versions through ‘legitimate’ channels.

Speculating on the former: in my opinion there’s a very good chance that SpamAssassin just isn’t a particularly big target for them to evade, compared to the juicy pool of gullible targets behind AOL’s filters, for example. ;)
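For anyone wanting to repeat the graphing exercise on their own corpus, the per-window counting might look something like the sketch below. This is not the script used for the graph in this post; the input format (pairs of received time and a did-the-rule-hit flag) and all names are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def hits_per_window(messages, window=timedelta(hours=2)):
    """Count total spam and rule hits in each fixed-size time window.

    `messages` is an iterable of (received_time, hit_rule) pairs, a
    hypothetical format for this sketch. Returns a dict mapping each
    window's start time to a (total_spam, rule_hits) tuple.
    """
    epoch = datetime(1970, 1, 1)
    totals, hits = Counter(), Counter()
    for received, hit_rule in messages:
        # Round the timestamp down to the start of its window.
        start = epoch + ((received - epoch) // window) * window
        totals[start] += 1
        if hit_rule:
            hits[start] += 1
    return {start: (totals[start], hits[start]) for start in sorted(totals)}


# Tiny made-up sample: three spam mails, two of which hit the rule.
sample = [
    (datetime(2004, 5, 15, 0, 10), True),
    (datetime(2004, 5, 15, 1, 59), False),
    (datetime(2004, 5, 15, 2, 0), True),
]
for start, (total, hit) in hits_per_window(sample).items():
    print(start, total, hit)
```

The two numbers per window correspond to the red (total spam) and green (rule hits) bars described above.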


Comments (3)

Happy Midwinter’s Day!

Antarctic: Happy Midwinter’s Day!

I’ve just finished reading Big Dead Place, Nicholas Johnson’s book about life at McMurdo Base and the US South Pole Station, with anecdotes from his time there in the early years of this decade.

It’s a fantastic book — very illustrative of how life really goes on on a distant research base, once you get beyond romantic notions of exploration of the wild frontiers. (Like many geek kids, I spent my childhood dreaming of space exploration, and Antarctica is the nearest thing you can get to that right now.) A bonus: it’s hilarious, too.

Unfortunately it’s far from all good — as one review notes, it’s like ‘M*A*S*H on ice, a bleak, black comedy.’ There’s story after story of moronic bureaucratic edicts emailed from comparatively-sub-tropical Denver, Colorado, ass-covering emails from management on a massive scale, and injuries and asbestos exposures covered up to avoid spoiling ‘metrics’.

Here’s a sample of such absurdity, from an interview with Norwegian world-record breaking Antarctic explorer, Eirik Sønneland:

BDP: I was working at McMurdo when you arrived in 2001. I remember it well because we were commanded by NSF not to accommodate you in any way, and were forbidden to invite you to our rooms or into any buildings. We were told not to send mail for you, nor to send email messages for you. While you were in the area, NSF was keeping a close eye on you. What did the managers say to you when you arrived?

They asked us what plans we had for getting home. The manager at Scott Base (jm: the New Zealand base) was calm and listened to what we had to say. I must be honest and say that this was not the way we were treated by the U.S. manager. It was like an interrogation. Very unpleasant. He acted arrogant. However, it seemed like he started to realize after a couple of days that we didn’t try to fool anybody. He probably got his orders from people that were not in Antarctica at the time. And, to be honest, today I don’t have bad feelings toward anyone in McMurdo. Bottom line, what did hurt us was that people could not think without using bureaucracy. If people could only try to listen to what we said and stop looking up paragraphs in some kind of standard operating procedures for a short while, a lot could have been solved in a shorter time.

One example: our home office, together with Steven McLachlan and Klaus Pettersen in New Zealand, got a green light from the captain of the cargo ship that would deliver cargo (beer, etc.) to McMurdo, who said he would let us travel for free back to New Zealand if it was okay with his company. At first the company was agreeable, but then NSF told them that the ship would be under their rent until it left McMurdo and was 27 km away. Reason for the 27 km? The cargo ship needed support from the Coast Guard icebreaker to get through the ice. Since, technically, the contract with NSF did not cease until the ship left the ice, NSF could stop us from going on the ship. At which point NSF offered to fly us from McMurdo for US$50,000 each.

He also maintains an excellent website at BigDeadPlace.com, so go there for an idea of the writing. BTW, it appears the UK also maintains an Antarctic base. Here’s hoping they keep the bureaucracy at a saner level over there.


Bayesian learning animation

Spam: via John Graham-Cumming’s excellent anti-spam newsletter this month, comes a very cool animation of the dbacl Bayesian anti-spam filter being trained to classify a mail corpus. Here’s the animation:

And Laird’s explanation:

dbacl computes two scores for each document, a ham score and a spam score. Technically, each score is a kind of distance, and the best category for a document is the lowest scoring one. One way to define the spamminess is to take the numerical difference of these scores.

Each point in the picture is one document, with the ham score on the x-axis and the spam score on the y-axis. If a point falls on the diagonal y=x, then its scores are identical and both categories are equally likely. If the point is below the diagonal, then the classifier must mark it as spam, and above the diagonal it marks it as ham.

The points are colour coded. When a document is learned we draw a square (blue for ham, red for spam). The picture shows the current scores of both the training documents, and the as yet unknown documents in the SA corpus. The unknown documents are either cyan (we know it’s ham but the classifier doesn’t), magenta (spam), or black. Black means that at the current state of learning, the document would be misclassified, because it falls on the wrong side of the diagonal. We don’t distinguish the types of errors. Only we know the point is black, the classifier doesn’t.

At time zero, when nothing has been learned, all the points are on the diagonal, because the two categories are symmetric.

Over time, the points move because the classifier’s probabilities change a little every time training occurs, and the clouds of points give an overall picture of what dbacl thinks of the unknown points. Of course, the more documents are learned, the fewer unknown points are left.

This is an excellent visualisation of the process, and demonstrates nicely what happens when you train a Bayesian spam-filter. You can clearly see the ‘unsure’ classifications becoming more reliable as the training corpus size increases. Very nice work!

It’s interesting to note the effects of an unbalanced corpus early on; a lot of spam training and little ham training results in a noticeable bias towards the classifier returning a spam classification.
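The decision rule in Laird’s explanation is simple enough to sketch in a few lines. This is only an illustration of the rule as described, not dbacl’s own code; the ‘unsure’ label for a tie and the sign convention for spamminess (positive means more spam-like) are assumptions of this sketch.

```python
def classify(ham_score, spam_score):
    """Pick the category with the lower score (scores act like distances)."""
    if spam_score < ham_score:
        return "spam"    # point lies below the diagonal y = x
    if spam_score > ham_score:
        return "ham"     # point lies above the diagonal
    return "unsure"      # on the diagonal: both categories equally likely


def spamminess(ham_score, spam_score):
    """Numerical difference of the scores; positive means closer to spam.

    The explanation only says 'the numerical difference', so the sign
    convention here is an assumption.
    """
    return ham_score - spam_score


# x-axis is the ham score, y-axis the spam score, as in the animation.
print(classify(ham_score=120.0, spam_score=95.5))    # spam
print(classify(ham_score=80.0, spam_score=130.0))    # ham
print(spamminess(ham_score=120.0, spam_score=95.5))  # 24.5
```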


Yet another non-smoking weblog

Life: seeing as yesterday was World No Tobacco Day, it’s worth noting that I gave up smoking last Thursday.

This is the first time I’ve taken the step of quitting with any seriousness. I’ve been smoking since I was 18 or 19, without any real attempts to quit before now. It was a gradual process, but imagining a smoker’s future, with the diseases and reduced life expectancy it involves, makes it quite sensible in the end. So far, it’s going pretty well: lots of occasional pangs, but nothing I can’t say no to… especially with the aid of Liquorice Altoids. Wish me luck!


UBE, not UCE

Spam: About this time last year, German neo-nazis launched a massive worldwide spam run with the aid of the Sober.H worm.

Well, it looks like they’re planning to make this a regular occurrence, because it’s on again, spamming nazi opinions linking to stories on reputable news sites, as well as pages on less reputable right-wing sites. Joe Wein has posted some samples. I’ve already received nearly a thousand since last night.

The good news: here’s a SpamAssassin ruleset that catches these nicely. Thanks, Raymond!


PVR Build Log

TV: I’ve taken a little time to throw up my PVR build log.

If you’re hacking on one yourself, or curious about what it takes, or just like reading cut-and-pasted UNIX command lines — go take a look!


More ways malware damages internet infrastructure: DNS servers

Malware: spotted on NANOG — Six PCs caused BigPond problems:

Disconnecting six compromised personal computers on Tuesday evening eased the difficulties caused by bogus requests which clogged BigPond’s domain name servers (DNS), slowing customer e-mail and Web site access, Telstra said.

A Telstra spokesperson said the carrier had narrowed the list of malware that could have infected the computers to three, adding the problem could have been caused by a combination of those viruses or Trojans. He declined to name the suspects.

He said the PCs generated 95 percent of the bogus requests which caused the problems that evening.

The ‘problems’ in question are described here:

One forum participant (on Aussie forum Whirlpool), who claimed to be a BigPond customer, said on Monday: ‘I’m in Canberra and it’s been almost unusable all afternoon. I’m snowed under at the moment and it is really driving me crazy. Three out of four links fail to load first time and sometimes take eight or nine tries before it does.’

Another said: ‘I am having problems loading Web pages, I get the 404 error. I have to retry five to 10 times to get some places.’

Petri Helenius, in a post to NANOG, notes:

Consumer ISP’s who don’t proactively take care of security/abuse usually end up with harvesting-bots which consume significant amount of DNS resources, typically doing anything from a few dozen to a thousand queries a second. A few hundred of these will seriously hamper an usually provisioned recursive server.

Interesting. It’s been a long time since I’ve relied on an ISP’s recursive DNS servers; in my recent experience (Comcast, Cox.net) they’ve always been overloaded, and take aaaages to give me answers. Maybe this is why.

It makes sense; most Windows machines will indeed use the ISP’s NSes, because that’s what DHCP tells you to do; and setting up a BIND or djbdns instance locally to query the roots directly is still a UNIX-only trick, as far as I know.

The upshot?

  • 1. Yet another good reason why ISPs should proactively disconnect infected customers, as they deny service to other users of the ISP.
  • 2. A good demonstration of yet another way the techie community’s experience of web surfing and internet use differs from that of the unwashed masses in the hinternet, that ‘shanty-town of pop-ups and porn adware’, as Danny O’Brien puts it.
  • 3. Sometime soon, if it hasn’t happened already, someone’s going to bundle up an ‘Internet Accelerator’ lump of shareware that sets up a local recursive NS on Windows which queries the roots, and it’ll become the latest popular Windows download. Then the load on the root servers will really start rising.

(PS, top tip: ever wanted a publicly-queryable recursive nameserver, or a good IP address for pinging, that’s easy to remember? 4.2.2.1 is what you’re after.)


RFID Scan Detector

RFID: Over on Adam Shostack’s weblog, in a comment on an entry regarding the plans to mandate remotely-readable RFID passports, Martin Forssen brings up a great idea:

What I want is a device which beeps every time somebody scans me for RFID-tags. I assume this would be fairly easy to construct since the scanner must send a signal of some strength to activate the chip.

I wonder if that’d work? A keyfob, for example, something similar in size to the dinky Chrysalis Wifi Seeker I have on my keyring, would be perfect. It’d be probably pretty cheap to make, would make a great geek toy, and be quite educational too. ;)


ApacheCon, and cranes falling into the sea

Trips: So I’m just back from ApacheCon 2004, which took place in the lovely Alexis Park building site ;)

Good fun was had — very interesting to meet all the faces behind the names from various mailing lists and blogs, and get the inside track on how the ASF really works… there’s quite a lot you don’t get to understand from the outside, or even from being a committer. So, a useful trip.

Most of the talks were, naturally, very web-oriented — we’ll have to see what we can do about that, next time around! One useful tidbit: I didn’t realise, but found out at the conference, that the ASF ConCom are very generous with paying speakers’ expenses. So maybe next time I’ll join the speaker line-up, too.

A major goal was an impromptu SpamAssassin developer summit: 5 days sitting down together hammering on bugs and plans, with 4 of the main developers present (myself, Daniel, Theo and Michael). That was pretty much achieved, although there were some thorny bugs to deal with… one interesting factor is that we may now be moving towards emulating Apache httpd’s preforking model to deal with a memory/performance issue we’re seeing in 3.0.x.

Finally — this sequence of photos has been cropping up all over the internets. When I saw it, I immediately thought it looked a lot like Ireland — and Roundstone, Co. Galway, in particular. Sure enough, it appears it is! I guess the Connemara landscape of Roundstone’s bay is pretty memorable, after all…


How to turn a stale project site into a useful Wiki

Web: Almost every project and organisation has, at some stage, bemoaned having stale data on their website, and wished there was a better way to keep it up to date; or wished their FAQ was more complete; or wished they had the time to HTML-ize all their know-how and get it up there.

Well, here’s what we did in SpamAssassin to deal with this problem. (Seeing as I’ve talked about this three times in the past month, I’ll write it up here so I can just point at the URL next time!)

First off, we experimented with having the site checked into CVS, FAQ-o-matic, and the Python FAQ software (which was pretty good). All were OK, but very specific in format, using the traditional question-answer FAQ layout — that’s good for FAQs, but not so good for a lot of other stuff — and keeping it updated was still limited to a small group, therefore the info got stale again.

So we moved to a Wiki. Here are my tips for Wiki-izing your website so that the end results are better than what went in.

Use good wiki software: unusable software will be a pain to use, and the info will still go stale. We used Moin Moin - http://moin.sourceforge.net/ - partly because I like Python (it’s nearly perl! ;), it can produce RSS, and it was pretty easy to install.

Don’t worry: people won’t vandalise it (much). It turns out that vandalism and people throwing up crappy info isn’t a serious problem at all. Still, you should raise the barrier a little, in the following ways:

Require user accounts: set the security policy so that a user account must be set up before editing is possible. This means you won’t get wiki-spammed, and also has the side effect of imposing a pretty big barrier to casual vandals.

Send changes to a list: set all changes to be mailed to a mailing list as diffs. This is the most important tip. If you already have a mailing list with the knowledgeable part of the community on it, use that list — because they’re the ones who’ll be able to recognise if erroneous info is put up, and will be annoyed about this enough to bother fixing it. There’s a bonus side-effect of this; even if some people didn’t like the wiki to start with, they’ll eventually be needled into using it by wanting to fix stuff they perceive as wrong. And then they get sucked in ;)

Use diff for the mailed changes: Moin by default will only send out change messages saying ‘something changed on this page!’. That’s not good enough, unfortunately; you want to mail out what the new text looks like, and highlight exactly where the change happened. Moin can do this nicely, with this patch, which adds a mail_commits_address, where all diffs on every page are sent, using the normal diff mechanism.

Ensure the wiki software can revert quickly: If someone does make a bad change, Moin supports one-click reversion of the page to what it was beforehand. That’s great for dealing with spam, or clueless vandalism.

Keep one or two static pages: If you’re worried about some script kiddie thinking that defacing a wiki makes them look cool, then keep one or two of the primary user-facing pages as static data. For example, take a look at the link-bar at the top of http://spamassassin.apache.org/ ; five of the ten links are to static pages, the other five are now wiki-ized. In particular, our front page and our downloads page are both static, but our docs are predominantly Wiki’d.

Publicize Mozex: most techie groups will have techie users, and we hate using browser text-boxes to edit text. Mozex — http://mozex.mozdev.org/ — saves the day here — it’s a godsend.

Shepherd new changes: in the early stages, you want one or two people who tidy up changes from Wiki newbies, as they go in. They need to keep it looking pretty, and perform Refactoring of stuff that could be laid out better or should become multiple pages. Eventually, others will get the hang of that (and do a much better job than you do ;).

That’s the lot. Most of these are to, essentially, migrate aspects of your already-existing and already-working community into this new outlet. In our experience, it’s worked really well — our Wiki is now the most reliable source of info about SpamAssassin, and is extensive and up-to-date.
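To give a flavour of the ‘use diff for the mailed changes’ tip, here’s a rough sketch of building a change mail with Python’s standard difflib and email modules. This is not the Moin patch mentioned above and doesn’t use Moin’s internals; the function name, subject format and list address are invented for the example, and actually sending the message (e.g. via smtplib) is left out.

```python
import difflib
from email.message import EmailMessage


def build_change_mail(page, old_text, new_text, to_addr):
    """Build a mail whose body is a unified diff of one wiki page edit."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=f"{page} (previous)",
        tofile=f"{page} (current)",
        lineterm="",
    )
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = f"[wiki] {page} changed"
    msg.set_content("\n".join(diff))
    return msg


# Hypothetical page edit and list address, for illustration only.
mail = build_change_mail(
    "UpgradeNotes",
    "Read the docs.\nRun the 2.x tools.\n",
    "Read the docs.\nRun the 3.0 tools.\n",
    "wiki-changes@example.org",
)
print(mail.get_content())
```

The point is that list subscribers see the removed and added lines directly in the mail, so a wrong edit is obvious without anyone having to visit the page.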


SpamAssassin 3.0.0 Released!

Spam: SpamAssassin 3.0.0 is now released! w00t! Only 4 months late this time ;) Announcement, techie details, Slashdot. New logo too:

(Note: if you’re running SpamAssassin 2.x and plan to upgrade, this is a new major release cycle — so we’ve taken the chance to break some backwards compatibility. Be sure to read the UPGRADE doc!)


RTE’s Bush Interview

TV: RTE’s ‘Prime Time’ secured a fantastic interview with GWB, with Carole Coleman asking a few very pointed questions. Watch it with RealPlayer, or listen to the audio in MP3 (2.7Mb).

There’s a pretty accurate transcript here:

Let me finish! How many times do I have to tell you how to do your job? See, I gotta insult France at least once. Then I gotta claim ‘merica to be the most generous nation in the whole wide world, even though it’s not true. And listen, let me mention that democracy in Pakistan, too. And guess what? I’m the first president to ever call for a Palestinian state and I’m damn proud of it - just look at the size of my smirk now. Listen, as long as I keep repeating myself and mouthing empty platitudes, you won’t have a chance to call me on any of the bullshit coming out of my mouth.

OK, the official one is here.

It appears that the White House just dropped the ball on this one; reportedly, they had her list of questions three days in advance, but given that they suggested that she ‘ask him a question on the outfit that Taoiseach Bertie Ahern wore to the G8 summit’ (!!!), they weren’t paying attention, and expected some kind of giggling moronic schoolgirl, or something.

Hilariously, the White House has since complained to RTE, the Irish Embassy, the Irish Government, and the reporter herself. Probably God, too. I doubt Prime Time will ever get a White House interview again, but given what they clearly expect from the poodles in the White House press corps, that’s hardly much of a loss.

(I’d love to see what’d happen if he had to deal with Paxman ;)

Also, went to see Fahrenheit 9/11. Fantastic movie, and best of all, incredibly well-attended.

My favourite moment: the reminder of just how easily the US news media sold itself out during the war. Seeing Katie Couric blurting ‘Navy Seals rock!!’ like some kind of starstruck 5-year-old with an Action Man toy, was a classic. It’s good to see that this will be immortalized in celluloid, as it was truly shocking at the time. (Not much has changed; Judith Miller is still writing for the NYT.)


Announcing a new script

Web: Minor software announcement — after some time using HTMLThumbnail, album, and even WebMake to build photo galleries, I finally got peeved enough, and gave in to the temptation of ‘not invented here’. ;)

Presenting Uffizi, a CSS- and template-driven, themable perl script to generate photo galleries. Quoting the POD:

  • it’s very self-contained, apart from dependencies on Image::Size and the ImageMagick convert command
  • fast, efficient incremental rebuilding
  • generates full CSS-styled, templated and valid HTML
  • every part of the generated HTML can be modified through the templates
  • generates reasonably-sized images as well as thumbnails, with a link to the full-sized image
  • secure — all pages are static HTML, so your webserver won’t get r00ted through a silly photo album script

I am, of course, using it on my own photo pages, and I’m very happy with it; it’s been a while since I had to hack it. (I need to get it to thumbnail MPEGs as well, but apart from that it’s teh nifty IMO.)


The ‘humans are 99.84% accurate’ figure

Spam: ‘The spam-classifying accuracy of a human being is 99.84%’. This statement has passed into Slashdot lore as the gospel truth, so it’s time for some debunking.

First off, that’s not what Bill Yerazunis said in the CRM-114 Sparse Binary Polynomial Hashing and the CRM114 Discriminator paper. Here’s the real quote:

the human author’s measured accuracy as an antispam filter is only 99.84% on the first pass

Here’s a copy of the original mail:

I manually classified the same set of 1900 messages twice, and found three errors in my own classifications, hence I have a 99.84% success rate.

(my emphasis). In other words, the author sat down and ran through 1900 messages manually, then ran through them again, and checked to see how many messages in the first batch disagreed with the second.

Let’s consider an alternative situation, where a user is presented with one message, and asked to take their time, give it a full examination and some thought, and then classify the message. I would consider that more likely to be classified correctly, since fatigue will not be an issue (after 1900 messages, I’m pretty tired of eyeballing), and neither will time pressure (taking 20 seconds on each of 1900 mails would require 10.5 hours, and would be excruciatingly boring to boot).
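Both figures are easy to sanity-check. The sketch below just redoes the arithmetic from the numbers quoted above (1900 messages, three errors, 20 seconds per mail); the function names are made up.

```python
def accuracy_percent(messages, errors):
    """Percentage of messages classified without error."""
    return 100 * (1 - errors / messages)


def hours_needed(messages, seconds_each):
    """Total time to eyeball every message, in hours."""
    return messages * seconds_each / 3600


print(f"{accuracy_percent(1900, 3):.2f}%")       # accuracy on the 1900-message run
print(f"{hours_needed(1900, 20):.1f} hours")      # time at 20 seconds per mail
```

Three errors in 1900 does come out at the quoted 99.84%, and 20 seconds per mail works out at a little over ten and a half hours.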

In addition, the study wasn’t clear on exactly how much information from each mail was presented. Too little (just the subject line) or too much (every header and raw HTML), and a human will be more likely to make mistakes than if the mail is rendered fully, and the extraneous header info hidden. In my experience, I’ve never hand-classified 1900 messages purely through either method, because it’s just too tiring, and I know I’ll make quite a few mistakes. The UI for this work is important.

And finally, the figure is derived from a study with one user performing a task once. There’s no way you could use that figure in a serious setting — it’s not valid statistical science. Here’s Henry’s comment:

Yerazunis’ study of “human classification performance” is fundamentally flawed. He did a “user study” where he sat down and re-classified a few thousand of his personal e-mails and wrote down how many mistakes he made. He repeats this experiment once and calls his results “conclusive.” There are several reasons why this is not a sound methodology:
  • a) He has only one test subject (himself). You cannot infer much about the population from a sample size of 1.
  • b) He has already seen the messages before. We have very good associative memory. You will also notice that he makes fewer mistakes on the second run which indicates that a human’s classification accuracy (on the same messages) increases with experience. For this very reason, it is of the utmost importance to test classification performance on unseen data. After all, the problem tends towards “duplicate detection” when you’ve seen the data before hand.
  • c) He evaluates his own performance. When someone’s own ego is on the line, you would expect that it would be very difficult to remain objective.

So, to correct the statement:

‘The spam-classifying accuracy of this one guy, when classifying nearly two thousand mails by hand, was 99.84%, once.’


Making a Bootable CD from a Floppy Image

Tech: Troubleshooters: Making a bootable CD from a bootable floppy image.
Making a note of this for future reference — it should be handy next time I need to do a BIOS or firmware upgrade on my Thinkpad.

I ran into the need for this recently when trying to upgrade the BIOS on my Thinkpad running Linux, so hibernation would work. IBM don’t provide BIOS upgrade tools for Linux, so you have to keep a Windows partition around. (Yes, I pay the Windows Tax — I’ve been bitten by proprietary firmware upgrades requiring it in the past, as in this case.)

Amazingly, however, even after paying the Tax, the ‘non-diskette’ BIOS upgrade (ie. the standalone Windows app) doesn’t work from Windows XP! Instead, you get a hard hang when it tries to bring the machine down from XP to a single-app mode to perform the upgrade. Running from DOS similarly fails, because the BIOS upgrade app is a WIN32 application. Clever.

Eventually, I wound up reformatting my Windows partition and installing Windows 98 (!); running the BIOS upgrade app from that worked fine. But next time around, I should be able to save myself a few hours of MCSE imitation by using this floppy-to-CD trick… here’s hoping. ;) PCs Are Hard.


‘Precision’ bombing, and iTMS Europe

War: A couple of war links, I’ll keep it short. ;)

High-profile air strikes ‘killed only civilians’. ‘The American military launched some 50 air strikes designed to kill specific targets during the Iraq war, it emerged yesterday, but none of them found its mark. Instead the air strikes had a high civilian toll, according to military officials serving at the time.’ Still, it sounded good, like as if CSI were doing all the war strategification and stuff ;)

And: the Pentagon ‘Torture Memos’ took some tips from the torture techniques used in Northern Ireland in the 1970s.

Music: Licensing row mars iTunes launch. UK indie labels report that ‘where Apple has spoken to labels the terms on offer have been commercial suicide’, and as a result, they won’t be selling their tunes via iTMS Europe.

I agree with Mark Twomey on this one — bad move. This (and the prices!) reduce the Euro-iTunes offering to about the usefulness of whatever that one is that Real.com have (you know, the one you can’t even remember the name of) – and nobody in Europe buys major-label music online anyway.


Some history: Unisys and the GIF patent

Patents: I’ve just come across Tim Oren’s page on the Unisys GIF patent furore of 1994-5. Tim used to be VP of ‘Future Technology’ at CompuServe.

The GIF furore, in case you missed it, was one of the most far-ranging software patent debacles to date. Here’s what happened…

CompuServe was one of the biggest online services at the time. In 1987 they’d created GIF, an efficient image file format, for public use, with a very liberal license. As a result, everyone and their dog wrote software to read and write GIF files (including myself ;).

GIF, like many other tools of the time, used the LZW (Lempel-Ziv-Welch) file compression scheme, which had been widely published without any indication that it was considered proprietary. LZW was pretty much the de-facto standard for file compression in the early 90s, in the same way that ‘gzip’ is nowadays.

However, 7 years later, in 1994, Unisys suddenly announced that they had filed for, and eventually received, a patent on the LZW algorithm. As Tim wrote at the time, this was a ‘submarine’ patent. (Unisys had owned that patent since 1985, and pursued hardware licenses, but all and sundry believed that the patent didn’t cover software-only implementations.)

Unisys brought an infringement suit against CompuServe, who had published the GIF standard and implemented it widely in their software. CompuServe had ‘no recourse but to settle’.

(Interestingly, it appears that at the time, Unisys seemed to think that GIF decoders needed licenses as well — popular thinking nowadays is that only GIF encoders need licensing, but Unisys didn’t think so at that stage at least.)

There is a happy ending — thankfully, free software saved the day. ;)

As Tim writes, Thomas Boutell, Jean-loup Gailly and others came up with PNG; Jean-loup and Mark Adler wrote GZIP; and LZW was consigned to the dustbin of unusable technology for most new projects. Old projects, of course, had to go through some redesign pains to achieve the same goal.

BTW, it’s worth noting that, even though the Unisys patent has expired, it’s still not safe to dust off LZW. GNU (and others) believe that there’s another patent filed on the same algorithm independently by (guess who) IBM, which doesn’t expire until 11 August. The thoroughly-competent USPTO strikes again ;)

The lesson: be careful when implementing published standards. Nowadays, the IETF requires that contributors disclose ‘the existence of any proprietary or intellectual property rights in the contribution that are reasonably and personally known to the contributor’. But in this case, the patent was owned by another body, Unisys, and the contributor (CIS) didn’t know that, so that wouldn’t have helped.

So, the real lesson: Just Say No to software patents ;)


Clemens Vasters’ ‘Letter to Aiden’

Open Source: Clemens Vasters: Where do you want to go, Aiden? Sadly, Clemens misses the point dramatically.

Point one: I’ve worked on open-source and proprietary software. I still do. I work on them both simultaneously (or, at least, proprietary 9-5 and open-source outside work hours ;). I have a good few of the things you’re supposed to have ‘by the time you’re 30’.

It’s not an all-or-nothing thing; working on open source doesn’t mean retreating into a garret and staying up all night. Nothing is black-and-white like that, and surely Clemens should be able to recognise that aspect of the real world by now. ;)

Point two: Open source work does found a career. It acts as a fantastic testament to your ability — especially if you’ve written good code or organised a team. I’d be much happier to hire someone who had demonstrated that ability than someone with no open-source dev experience, if I were interviewing candidates in the day job. (In fact, I have in the past. ;)

For one thing, a tar.gz from Sourceforge is a lot easier to verify than some assertion that when you worked for some big company, you were Very Important and did Amazing Things, but sorry, they were all secret and proprietary so you have no proof.

Point three: ‘It doesn’t matter whether you love what you are doing and consider this the hobby you want to spend 110% of your time on: It’s exploitation by companies who are not at all interested in creating stuff. They want to use your stuff for free. That’s why they trick you into doing it.’

This is total FUD — pretty much just shouting ‘it’s an IBM conspiracy!’

For the record, I’ve never even talked to anyone from IBM about open source, as far as I know — aside from when I stood up once at a conference and attempted to ask an IBM manager about their crappy software patent policy and how it conflicted with their avowed support of open-source. (Obviously their payoff cheque was late that month ;)

More good comments on slashdot, believe it or not (with the threshold at 3, that is).

(finally, an aside: I suspect the guy’s name was ‘Aidan’ BTW.)


Lovely Filelight

Linux: Doing my backups — it’s a good feeling to know your data will (probably) be safe if your computer suddenly carks it.

This time around, I have way too much data to actually back up the lot – so I’m being selective. Filelight is very helpful here; I can see exactly where my disk space is going, spot tmp files that I should have cleared up long ago, and so on.

One thing is clear — I have too many MP3s. How am I supposed to listen to all of those?


Aug 14th 2003 Blackout and the Blaster worm

Security: Bruce Schneier points out some interesting angles on the official report into the US power blackout of Aug 14th:

Why the tortured prose? The writers take pains to assure us that the power generation and delivery systems were not affected by MSBlast. But what about the alarm systems? Clearly, they were all affected by something–and all at the same time.

To be honest, it sounds pretty damn close to me, as I’ve said before.


Great article on e-voting issues

E-Voting: Do not miss this fantastic round-up on the e-voting situation in the US. It contains these amazing quotes from the leaked Diebold memos:

‘Over (the past three years) I have become increasingly concerned about the apparent lack of concern over the practice of writing contracts to provide products and services which do not exist and then attempting to build these items on an unreasonable timetable with no written plan, little to no time for testing, and minimal resources. It also seems to be an accepted practice to exaggerate our progress and functionality to our customers and ourselves then make excuses at delivery time when these products and services do not meet expectations.’ (Source: ‘Resignation’, announce.w3archive/200110/msg00001.html, dated 5 October 2001)

‘It does not matter whether we get anything certified or not, if we can’t even get the foundation of Global stable. This company is a mess! We should stop development on all new, and old products and concentrate on making them stable instead of showing vaporware. Selling a new account will only load more crap on an already over burdened entity. … You are taxing the development team beyond what they can handle. … Why is it so hard to get things right! I have never been at any other company that has been so miss managed (sic).’ (Source: ‘Fw: Battery Status & Charging—and too much bull!!’, announce.w3archive/200110/msg00002.html, dated 20 October 2001)

I’m speechless. At least the NEDAP system planned for Ireland isn’t this bad — or is it? We can’t tell.

Support the calls for a Voter-Verifiable Paper Audit Trail. There’s no other way to continue to have a trustworthy democratic system with widespread use of e-voting in place.


WorldChanging.com

Environment: WorldChanging.com. Bruce Sterling writes:

‘Worldchanging’ is very much the same work the Viridian movement has been doing since 1998, only now (thanks God!) it’s being done by a relatively organized team of capable activists instead of by some wacky novelist in his spare time! So go make them famous. Do it now.’

The Viridian movement is Bruce’s baby, best summed up, I reckon, as ‘electronic green’.

Anyway, WorldChanging.com is a full-blown MovableType weblog, RDF and all, frequently updated and smartly written. Sign up!


Real-time DNS blocklist accuracy figures

Spam: DNS blocklists are the oldest means of spam-blocking, and are still exceedingly useful; nowadays, many of these are fully automated systems, using proxy-detection algorithms and sensing patterns in mailer behaviour indicative of spam.

A few months back on the ASRG list, there was a discussion of DNSBL accuracy; I posted some SpamAssassin figures, based on our ‘mass-check’ tests, but noted that they were computed using current DNSBL contents against a corpus of saved mail, so due to the time delta, were not 100% representative.

These figures are a lot better. Since August, I’ve been collecting real-time DNSBL hit data on my mail, as it is delivered at my SpamAssassin installation. In other words, it’s live accuracy data — it’s using just what the DNSBLs had listed at scan time.


Note, however, that it’s still incomplete:

  • some DNSBLs were not measured; these are just the default DNSBL list in SpamAssassin 2.60, excluding RCVD_IN_NJABL_DIALUP (which I had to remove because I can’t parse out accurate data).
  • it’s only 1 person’s hand-classified mail.
  • SpamAssassin tests more than just the ‘delivering’ SMTP relay; it’ll also look backwards through the headers, at earlier relays, to catch spam sent via mailing lists. This is different from what’s used with most traditional DNSBL-supporting systems.
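For anyone who hasn't poked at a DNSBL directly, here's a rough Python sketch of the lookup involved. It relies on the usual DNSBL convention: reverse the octets of the IPv4 address, append the blocklist's zone, and do an ordinary DNS lookup; an answer means "listed", a failed lookup means "not listed". The zone `dnsbl.example.org`, the function names, and the idea of passing in a ready-made list of relay IPs are all assumptions for illustration. This is not SpamAssassin's actual code, and parsing the Received headers is left out entirely.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the blocklist zone answers for this address."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such name: the address is not listed in this zone.
        return False


def listed_relays(relay_ips, zone, resolve=socket.gethostbyname):
    """Check every relay in the chain, not just the delivering one."""
    return [ip for ip in relay_ips if is_listed(ip, zone, resolve)]


if __name__ == "__main__":
    # 192.0.2.99 is a documentation address; the zone is hypothetical.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

The `resolve` parameter is only there so the lookup can be swapped for a stub when testing; in real use the default resolver does the work. Checking the whole relay list rather than one address is the difference from traditional DNSBL-supporting systems noted in the last bullet.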

But the results should still be quite useful.

The time period covered:

  • Thu, 21 Aug 2003 17:11:30 -0700 (PDT)
  • Sat, 25 Oct 2003 23:11:52 -0700 (PDT)

Recap of the fields:

  • OVERALL% = percentage of all messages that hit the test
  • SPAM% = percentage of spam messages that hit the test
  • HAM% = percentage of nonspam (ham) messages that hit the test
  • S/O = Spam/Overall = Bayesian probability of spam
  • RANK = artificial ranking figure, ignore this!
  • SCORE = default SpamAssassin 2.60 score
  • NAME = name of test. Figuring out the exact DNSBL should be pretty obvious ;)

OVERALL%     SPAM%      HAM%     S/O   RANK   SCORE  NAME
   21839      1993     19846   0.091   0.00    0.00  (all messages)
 100.000    9.1259   90.8741   0.091   0.00    0.00  (all messages as %)
   5.989   59.0567    0.6601   0.989   1.00    2.25  RCVD_IN_BL_SPAMCOP_NET
   3.869   37.7822    0.4636   0.988   0.96    1.10  RCVD_IN_DSBL
   0.751    8.2288    0.0000   1.000   0.95    4.30  RCVD_IN_OPM_HTTP
   1.964   20.2709    0.1260   0.994   0.95    1.10  RCVD_IN_NJABL_PROXY
   0.659    7.1751    0.0050   0.999   0.95    0.64  RCVD_IN_NJABL_SPAM
   0.614    0.0000    0.6752   0.000   0.94   -0.10  RCVD_IN_BSP_OTHER
   0.050    0.5519    0.0000   1.000   0.94    4.30  RCVD_IN_OPM_SOCKS
   0.027    0.3011    0.0000   1.000   0.94    4.30  RCVD_IN_OPM_WINGATE
   0.119    0.0000    0.1310   0.000   0.94   -4.30  RCVD_IN_BSP_TRUSTED
   0.939    9.7341    0.0554   0.994   0.94    4.30  RCVD_IN_OPM
   1.081   10.9383    0.0907   0.992   0.93    1.52  RCVD_IN_SORBS_SOCKS
   1.062   10.7376    0.0907   0.992   0.93    1.27  RCVD_IN_SBL
   0.229    2.4084    0.0101   0.996   0.93    1.10  RCVD_IN_SORBS_MISC
   0.618    6.3221    0.0453   0.993   0.93    1.10  RCVD_IN_SORBS_HTTP
   0.595    5.9709    0.0554   0.991   0.92    4.30  RCVD_IN_OPM_HTTP_POST
   0.078    0.7526    0.0101   0.987   0.90    2.60  RCVD_IN_SORBS_ZOMBIE
   0.815    7.5263    0.1411   0.982   0.89    1.39  DNS_FROM_RFCI_DSN
   3.594   24.8369    1.4613   0.944   0.81    2.55  RCVD_IN_DYNABLOCK
   1.685   11.4400    0.7054   0.942   0.78    0.10  RCVD_IN_RFCI
   0.380    2.4586    0.1713   0.935   0.75    1.31  RCVD_IN_NJABL_RELAY
   6.182   33.9689    3.3911   0.909   0.73    0.10  RCVD_IN_NJABL
  10.422   44.4054    7.0090   0.864   0.63    0.10  RCVD_IN_SORBS
   0.037    0.1505    0.0252   0.857   0.54    2.80  RCVD_IN_SORBS_WEB
   2.344    4.1144    2.1667   0.655   0.17    0.00  RCVD_IN_SORBS_SPAM
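
If you want to sanity-check the columns yourself, here's a small Python sketch. It assumes that S/O is computed from the two percentage columns as SPAM% / (SPAM% + HAM%), and that OVERALL% is the two hit rates weighted by the message counts in the first row (1993 spam, 19846 ham). Both formulas are inferred from the figures in the table, which they reproduce to the displayed precision for the rows tried here; they are not taken from the mass-check scripts themselves, and the function names are made up for this example.

```python
SPAM_TOTAL = 1993   # spam messages in the corpus (first table row)
HAM_TOTAL = 19846   # nonspam messages in the corpus (first table row)


def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """S/O as inferred from the table: SPAM% / (SPAM% + HAM%)."""
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("test hit no messages at all")
    return spam_pct / total


def overall_pct(spam_pct: float, ham_pct: float) -> float:
    """OVERALL% as inferred: hit messages as a share of the whole corpus."""
    hits = spam_pct / 100 * SPAM_TOTAL + ham_pct / 100 * HAM_TOTAL
    return 100 * hits / (SPAM_TOTAL + HAM_TOTAL)


if __name__ == "__main__":
    rows = {
        "RCVD_IN_BL_SPAMCOP_NET": (59.0567, 0.6601),
        "RCVD_IN_DYNABLOCK": (24.8369, 1.4613),
        "RCVD_IN_SORBS_SPAM": (4.1144, 2.1667),
    }
    for name, (spam, ham) in rows.items():
        print(f"{name:24s} S/O={spam_over_overall(spam, ham):.3f} "
              f"OVERALL%={overall_pct(spam, ham):.3f}")
```

Because S/O works on percentages rather than raw counts, it isn't skewed by the corpus being roughly 91% nonspam. That's why RCVD_IN_SORBS_SPAM can hit far more ham messages than spam messages in absolute terms and still come out at 0.655.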


Comments (3)

Jody — still going strong

Spam: I just got another Jody spam; 40 points this time, and featuring the very latest in spam fashion, a .biz URL.

It’s amazing! The ‘Jody’ fake testimonial crops up in 9060 results on the web and 78600 results on USENET. The oldest spam Google Groups has with this text was posted back on 26th May 1998, which makes it 5 and a half years old by now. (Check it out for some classic period ASCII art, misspellings, and LOTS OF SHOUTING!!!!)

Last time I posted about it, Ben actually tracked down a ‘Mitchell Wolf M.D., Chicago, Illinois’ — Jody’s supposed spouse. Presumably he’s retired on the ‘USD 147,200.00 every 45 days’ that Jody was amassing from her ‘hobby’, though. ;)


Tim Bray on Dublin

Ireland: ‘The weather is bloody this time of year, the traffic is worse, but it’s a fine town.’ Agreed!

So I met up with SpamAssassin Dan, SpamAssassin Theo, and POPFile author John Graham-Cumming yesterday, down in San Diego — much spam stuff was discussed.

Great to meet up — not so great to miss the last train back to Irvine due to my own inability to correctly read a timetable, and have to drag Dan and Theo out that way. Oops, sorry guys! Not so smart, but at least we got to carry on the discussion for an hour or two more…


Statistical Art

Art: Jason Salavon: Selected projects, 1997 - 2003.

Salavon operates by taking data from various sources (DVDs of late-night talk shows, homes for sale in various states, MTV’s 10 Greatest Music Videos of All Time, Playboy centerfolds, etc.), then statistically combining them and converting that into another image, movie, or whatever.

The results are excellent. Check out Homes for Sale and Every Playboy Centerfold, The Decades (normalized).

I remember somebody asking me what I thought ‘computer art’ (sic) should be like, after I dissed yet another lame pixellated Photoshop/Flash thingy. Now I have something to point at ;) I’m well impressed.


iTrike — the World’s First Solar-Powered Internet Rickshaw

Green: iTrike: the World’s First* Solar-Powered Internet Rickshaw, from wireless.psand.net. Psand.net have done a great job in the past mucking about with wireless at green events in the UK from what I can see — I think I’ve even blogged about ‘em – but they’ve outdone themselves this time. Cool!

PS: mmm, proper cider… yum.


Another bad USPTO software patent

Patents: MS patents ‘phone-home’ failure reporting.

There’s a catch, in that it’s not just plain old ‘phone home’, as seen in probably a hundred products since 1960 — they’ve added a ‘match the reported error messages against a db of known issues on the server side’ step. So that’s vaguely inventive — well, no, it’s totally obvious, but at least nobody I can think of off the top of my head has done that before. (Well, I lie, it sounds a bit like KDE’s crash reporting tool which does a similar search before reporting a bug.)

The notable comment, though, is this:

There is a significant institutional culture issue that has a strong influence on how the Office functions that took root several decades ago and has, regretfully, increased, monotonically, over time. The management attitude, in a nutshell, is that patents aren’t ‘examined’, they are ‘processed’. The examination process is driven by production ‘goals’; to be rated in the key rating category of ‘Production Goal Achievement’ as ‘fully successful’ you must have at least 95%; less than that you are marginal; less then 90% you are ‘unsatisfactory’, meaning your entire rating is ‘unsatisfactory’ meaning a ‘90 day letter’ to get it ‘fully successful’ else you are fired. Also there are other time related requirements to meet, such as no amended application pending more than two months without an action. Persons get fired (yes, this does happen) almost always for low production or exceeding time limits for actions, almost never for improperly allowing claims.

Great.

Tech: It seems it’s stunningly easy to rip off GPRS customers. Another well-designed system, I don’t think.
