JFK Reloaded

Games: OK, JFK Reloaded is very, very weird.

Read the insanely detailed FAQ and boggle at the author’s obsessive research and fetishistic recreation of the events at Dealey Plaza, November 22nd 1963.

Quite worrying, to be honest!

Selves and Others now publishing RSS feeds

News: Selves and Others is a site that cropped up a couple of months ago, tracking the output of many of the left’s strongest voices, for example:

Well, one feature they were missing was RSS feeds, allowing users to track new articles by a specific author as they’re published. They’ve just added it; the good old orange XML button now appears on each author’s page. Excellent!

The ‘humans are 99.84% accurate’ figure

Spam: ‘The spam-classifying accuracy of a human being is 99.84%’. This statement has passed into Slashdot lore as the gospel truth, so it's time for some debunking.

First off, that’s not what Bill Yerazunis said in the CRM114 paper, ‘Sparse Binary Polynomial Hashing and the CRM114 Discriminator’. Here’s the real quote:

the human author’s measured accuracy as an antispam filter is only 99.84% on the first pass

Here’s a copy of the original mail:

I manually classified the same set of 1900 messages twice, and found three errors in my own classifications, hence I have a 99.84% success rate.

(my emphasis). In other words, the author sat down and ran through 1900 messages manually, then ran through them again, and checked to see how many messages in the first batch disagreed with the second.

Let’s consider an alternative situation, where a user is presented with one message, and asked to take their time, give it a full examination and some thought, and then classify it. I would consider that message more likely to be classified correctly, since fatigue will not be an issue (after 1900 messages, I’m pretty tired of eyeballing), and neither will time pressure (taking 20 seconds on each of 1900 mails would require roughly 10.5 hours, and would be excruciatingly boring to boot).
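For anyone who wants to check the sums, here's a quick sketch in Python. The inputs are the figures quoted above (1900 messages, 3 errors, 20 seconds per mail); the function names are just mine, made up for illustration.

```python
# Figures quoted in the post
messages = 1900            # mails classified by hand
errors = 3                 # misclassifications found on the re-check
seconds_per_message = 20   # the hypothetical "careful" pace per mail


def accuracy(total: int, wrong: int) -> float:
    """Fraction of messages classified correctly."""
    return 1 - wrong / total


def hours_needed(total: int, seconds_each: float) -> float:
    """Total time in hours to hand-classify `total` messages."""
    return total * seconds_each / 3600


print(f"accuracy: {accuracy(messages, errors):.2%}")
print(f"time: {hours_needed(messages, seconds_per_message):.1f} hours")
```

That gives 99.84% for the accuracy, and a shade over ten and a half hours for the careful run-through.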

In addition, the study wasn’t clear on exactly how much information from each mail was presented. Too little (just the subject line) or too much (every header and raw HTML), and a human will be more likely to make mistakes than if the mail is rendered fully, and the extraneous header info hidden. In my experience, I’ve never hand-classified 1900 messages purely through either method, because it’s just too tiring, and I know I’ll make quite a few mistakes. The UI for this work is important.

And finally, the figure is derived from a study with one user performing a task once. There’s no way you could use that figure in a serious setting — it’s not valid statistical science. Here’s Henry’s comment:

Yerazunis’ study of “human classification performance” is fundamentally flawed. He did a “user study” where he sat down and re-classified a few thousand of his personal e-mails and wrote down how many mistakes he made. He repeats this experiment once and calls his results “conclusive.” There are several reasons why this is not a sound methodology:
  • a) He has only one test subject (himself). You cannot infer much about the population from a sample size of 1.
  • b) He has already seen the messages before. We have very good associative memory. You will also notice that he makes fewer mistakes on the second run which indicates that a human’s classification accuracy (on the same messages) increases with experience. For this very reason, it is of the utmost importance to test classification performance on unseen data. After all, the problem tends towards “duplicate detection” when you’ve seen the data before hand.
  • c) He evaluates his own performance. When someone’s own ego is on the line, you would expect that it would be very difficult to remain objective.

So, to correct the statement:

‘The spam-classifying accuracy of this one guy, when classifying nearly two thousand mails by hand, was 99.84%, once.’

Witty’s 110 seed hosts

Security: good ;login: preprint article on the ‘Witty’ worm. ‘Conclusion: Witty represents a new generation of malcode: written by a motivated, skilled, and malicious individual. Witty’s author is the first to combine both skill and substantial malice. The author had some motive which lead, for him, to desire a destructive effect. Witty was written by an expert and, unless caught, he could do it again.’

However, there’s one point where I think the authors have slipped up:

The use of previously compromised machines (for seeding) requires that the attacker either obtained access on 110 machines using a different tool, already had access to 110 machines, or took control of these machines from a third party. Thus Witty’s author probably possessed some ties to the attacker underground, to gain these machines in the short timeframe.

IMO, that’s not necessarily the case. Given that current estimates are that 80% of spam emanates via open proxies, and that those in turn are generally insecure machines that have been taken over, I would surmise that someone with access to a reasonable amount of spam and an off-the-shelf Windows vulnerability scanner could quickly amass 110 machines to launch the attack with — simply by scanning for the vulnerabilities those machines were r00ted with in the first place.

Good article otherwise, though…

good interview with Philip Greenspun

Open Source: ITConversations: Doug Kaye and Philip Greenspun (via Tony Bowden).

Very interesting interview overall. Philip notes that he didn’t see weblogs coming because ‘it never occurred to me that relatively minor changes in how you allow people to author would cause such a revolution’. I must admit, I was the same. As far as I could see, it was just another HTML page, being updated frequently. It took me quite a while before I realised that the social aspect, of conversations taking place across a group of weblogs, was making a whole new thing.

Also, there’s a great few paragraphs where he discusses how sensitive to supply-side economics the whole ‘building a business on open source’ thing is. Search for ‘a dollar cheaper and a day faster’ to find it.

Editable Text-to-HTML converters

Web: Dive Into Markdown — a great post from John Gruber about editable-text-to-HTML formats (he’s the author of Markdown):

… my actual workflow looked like this:

  1. Write in BBEdit.
  2. Preview in a browser.
  3. Switch back to BBEdit for revisions.
  4. Repeat until done.
  5. Log into MT, paste the article, publish.

Eventually, it dawned on me: this is madness. The primary advantage to using a computer for writing is the immediacy of editing. Write, read, revise, all in the same window, all in the same mode.

Totally agreed (although note, I’m using my own, very similar, EtText instead of Markdown ;). But this weblog is 100% EtText-driven, instead of HTML — I just throw an email at it, and it publishes it. I don’t think I’ve used the web interface in months.
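To give a flavour of what an editable-text-to-HTML converter does, here's a toy sketch in Python. To be clear, this is not how Markdown or EtText actually work; the `text_to_html` function and the two rules it handles (blank-line-separated paragraphs, and `*asterisks*` for emphasis) are assumptions picked purely for illustration.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Convert a tiny plain-text subset to HTML.

    Handles only blank-line-separated paragraphs and *emphasis*.
    """
    # Split on blank lines and drop empty chunks
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    out = []
    for para in paragraphs:
        # Escape &, < and > first so literal text can't inject markup
        escaped = html.escape(para, quote=False)
        # Then turn *word* into <em>word</em>
        escaped = re.sub(r"\*(.+?)\*", r"<em>\1</em>", escaped)
        out.append(f"<p>{escaped}</p>")
    return "\n".join(out)


print(text_to_html("Write, *read*, revise.\n\nAll in the same window."))
```

The point is that the source stays readable as plain text, which is what makes it pleasant to write in an editor or send by mail.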

Which reminds me: I really should steal some ideas, er, gather inspiration from Markdown for EtText at some stage. ;)

Megalithomania!

History: Megalithomania is an incredible website ‘originally dedicated to Irish megaliths, but now expanded to include all sorts of antiquities that are of importance/interest.’

The author visits sites each week, writes up brief reports, takes photos, and logs the lot on this excellent website; every site is added to a map, and there’s a whole load of ways to find sites: by location, by clicking on a Flash map, by date of visit, etc.

It’s a triumph of usability, very pretty, and who knew there was a kist in Dublin Zoo’s tapir enclosure?

Hope everyone had a good Paddy’s Day! (PS: note: most definitely not ‘Patty’s Day’.)

Tim Bray on Dublin

Ireland: ‘The weather is bloody this time of year, the traffic is worse, but it’s a fine town.’ Agreed!

So I met up with SpamAssassin Dan, SpamAssassin Theo, and POPFile author John Graham-Cumming yesterday, down in San Diego — much spam stuff was discussed.

Great to meet up; not so great to miss the last train back to Irvine due to my own inability to correctly read a timetable, and have to drag Dan and Theo out that way. Oops, sorry guys! Not so smart, but at least we got to carry on the discussion for an hour or two more…

LPR as a general spooling and queueing mechanism

Good article on the use of LPR/LPD as a general-purpose distributed queueing mechanism for non-printing applications.

I maintained PLP (the predecessor of LPRng, which the author uses) for a while, and this kind of thing was one of the main feature sets I wanted to enable.

I know someone in (if I recall correctly) BASF was using it to generate movies, from frame-grabs individually LPR’d by a network of machines. As a result we had to add sub-second accuracy to the queueing; not sure if that made it into LPRng though ;)
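Here's a rough Python sketch of why the sub-second timestamps mattered for that kind of job. It is only an illustration, not PLP or LPRng code: the `Job` class, the frame names and the timestamps are all made up, and I'm assuming a queue that orders jobs purely by submission time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    queued_at: float  # seconds since the epoch, including the fractional part


def order_by_whole_seconds(jobs):
    # Timestamps truncated to whole seconds: jobs within the same second tie,
    # and the stable sort just keeps whatever order they arrived in.
    return sorted(jobs, key=lambda j: int(j.queued_at))


def order_by_subseconds(jobs):
    # Full-resolution timestamps recover the true submission order.
    return sorted(jobs, key=lambda j: j.queued_at)


# Three frames submitted within one second, arriving out of order
jobs = [
    Job("frame-003", 1000.75),
    Job("frame-001", 1000.10),
    Job("frame-002", 1000.40),
]

coarse = [j.name for j in order_by_whole_seconds(jobs)]
fine = [j.name for j in order_by_subseconds(jobs)]
print(coarse)
print(fine)
```

With whole-second timestamps the frames stay in arrival order (003, 001, 002); with sub-second ones they come out as 001, 002, 003, which is what you want if you're stitching them into a movie.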
