
Month: August 2005

Faster string search alternative to Boyer-Moore: BloomAV

An interesting technique from the ClamAV development list: using Bloom filters to speed up string searching. This kind of thing works well when you’ve got one input stream and a multitude of simple patterns that you want to match against it. Bloom filters are a hashing-based technique for performing extremely fast and memory-efficient, but false-positive-prone, yes/no membership lookups.

The mailing list posting (‘Faster string search alternative to Boyer-Moore’) gives some benchmarks from the developers’ testing, along with the core (GPL-licensed) code:

Regular signatures (28,326):

  • Extended Boyer-Moore: 11 MB/s

  • BloomAV 1-byte: 89 MB/s

  • BloomAV 4-bytes: 122 MB/s

Some implementation details:

the (implementation) we chose is a simple bit array of (256 K * 8) bits. The filter is at first initialized to all zeros. Then, for every virus signature we load, we take the first 7 bytes, and hash it with four very fast hash functions. The corresponding four bits in the bloom filter are set to 1s.

Our intuition is that if the filter is small enough to fit in the CPU cache, we should be able to avoid memory accesses that cost around 200 CPU cycles each.
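To make the quoted description a bit more concrete, here's a rough sketch of the idea in Python. This is not the BloomAV code: the filter size and the 7-byte prefix come from the posting, but the four hash functions (seeded FNV-1a variants), the `PrefixBloom` class and the dictionary used to verify candidate hits are all my own stand-ins.

```python
FILTER_BITS = 256 * 1024 * 8   # "(256 K * 8) bits", as in the posting
PREFIX_LEN = 7                 # only the first 7 bytes of each signature are hashed

# Assumption: four seeded FNV-1a style hashes stand in for the
# "four very fast hash functions"; the posting does not say which ones it uses.
SEEDS = (0x811C9DC5, 0x01000193, 0xDEADBEEF, 0x9E3779B9)


def bit_positions(chunk: bytes):
    """Yield the four filter bit positions for a 7-byte chunk."""
    for seed in SEEDS:
        h = seed
        for byte in chunk:
            h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF
        yield h % FILTER_BITS


class PrefixBloom:
    def __init__(self):
        self.bits = bytearray(FILTER_BITS // 8)   # starts as all zeros
        self.by_prefix = {}                       # prefix -> full signatures, for verification

    def add(self, signature: bytes):
        prefix = signature[:PREFIX_LEN]
        if len(prefix) < PREFIX_LEN:
            raise ValueError("signature shorter than the hashed prefix")
        for pos in bit_positions(prefix):
            self.bits[pos >> 3] |= 1 << (pos & 7)
        self.by_prefix.setdefault(prefix, []).append(signature)

    def maybe_contains(self, chunk: bytes) -> bool:
        """False means definitely absent; True means 'possibly present'."""
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in bit_positions(chunk))

    def scan(self, data: bytes):
        """Return (offset, signature) for every verified match in data."""
        hits = []
        for i in range(len(data) - PREFIX_LEN + 1):
            window = data[i:i + PREFIX_LEN]
            if not self.maybe_contains(window):
                continue                          # the common, cheap case
            # Possible false positive: confirm with an exact comparison.
            for sig in self.by_prefix.get(window, ()):
                if data.startswith(sig, i):
                    hits.append((i, sig))
        return hits
```

The point of the design is that almost every window in a clean stream fails the cheap four-bit test, so the expensive exact comparison only runs on the rare candidates.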

Also, in follow-up discussion, a paper describing hardware-level Bloom filters in the Snort IDS was mentioned: S. Dharmapurikar, P. Krishnamurthy, T. Sproull, and J. W. Lockwood, “Deep packet inspection using parallel Bloom filters,” in Hot Interconnects, (Stanford, CA), pp. 44–51, Aug. 2003.

This system is dubbed ‘BloomAV’. Pretty cool. It’s unclear whether the ClamAV developers are keen to incorporate it, but it does point at interesting new techniques for spam signatures.

Tech Camp Ireland

Irish techies, mark your calendars! Various Irish bloggers are proposing a Tech Camp Ireland geek get-together, similar to Bar Camp in approach, for Saturday October 15th.

Ed Byrne and James Corbett are both blogging up a storm already. I’d go, but it’d be a hell of a trip ;)

I would say it needs a little less blog, a little more code, and a little more open source, but it does look very exciting, and it’s great to see the Bar Camp spirit hitting Ireland.

More on ‘Bluetooth As a Laptop Sensor’

Bluetooth As a Laptop Sensor in Cambridge, England.

I link-blogged this yesterday, where it got picked up by Waxy, and thence to Boing Boing — where some readers are reportedly considering it doubtful. Craig also expressed some skepticism. However, I think it’s for real.

Check out the comments section of Schneier’s post; there are a few notable points:

  • Some Bluetooth-equipped laptops will indeed wake from suspend to respond to BT signals.

  • Davi Ottenheimer reports that the current Bluetooth spec offers “always-on discoverability” as a feature. (Obviously the protocol designers let usability triumph over security on that count.)

  • Many cellphones are equipped with Bluetooth, and can therefore be used to detect other ‘discoverable’ BT devices in range.

  • Walking around a UK hotel car park, while pressing buttons on a mobile phone, would be likely to appear innocuous — I know I’ve done it myself on several occasions. ;)

Finally — this isn’t the first time the problem has been noted. The same problem was reported at Disney World, in the US:

Here’s the interesting part: every break-in in the past month (in the Disney parking lots) had involved a laptop with internal bluetooth. Apparently if you just suspend the laptop the bluetooth device will still acknowledge certain requests, allowing the thief to target only cars containing these laptops.

Mind you, perhaps this is a ‘Chinese whispers’ case of the Disney World thefts being amplified. Perhaps it was noted as happening in Disney World, reported in an ‘emerging threats’ forum where the Cambridgeshire cop heard it, and he then picked it up as something worth warning the public about, without knowing for sure that it was happening locally.

Update: aha. An observant commenter on Bruce Schneier’s post has hit on a possibly good reason why laptops implement wake-on-Bluetooth:

On my PowerBook, the default Bluetooth settings were “Discoverable” and “Wake-on-Bluetooth” — the latter so that a Bluetooth keyboard or mouse can wake the computer up after it has gone to sleep.

Emergent Chaos: I’m a Spamateur

Emergent Chaos: I’m a Spamateur:

In private email to Justin “SpamAssassin” Mason, I commented about blog spam and “how to fix it,” then realized that my comments were really dumb. In realizing my stupidity, I termed the word “spamateur,” which is henceforth defined as someone inexperienced enough to think that any simple solution has a hope of fixing the problem.

I think this is my new favourite spam neologism ;)

How convenient does the ‘right thing’ have to be?

Environment: Kung Fu Monkey: Hybrids and Hypotheses. A great discussion of the Toyota Prius:

Kevin Drum recently quoted a study which re-iterated that there’s no “real” advantage to buying a hybrid. It’s only just as convenient — so if you’re driving a hybrid, you’re doing it for some other reason than financial incentive.

That made me think: what a perfect example of just how fucking useless as a society we’ve become. We can’t even bring ourselves to do the right thing when it’s only JUST as convenient as doing the wrong thing. And that’s not even considered odd. Even sadder.

Box Office Patents

Forbes: Box Office Patents.

It’s the kind of plot twist that will send some critics screaming into the aisles: Why not let writers patent their screenplay ideas? The U.S. Patent and Trademark Office already approves patents for software, business methods — remember Amazon.com’s patent on ‘one-click’ Internet orders? — even role-playing games. So why not let writers patent the intricate plot of the next cyberthriller?

So in other words, a law grad called Andrew Knight actually wants to see the world RMS described in his ‘Patent Absurdity’ article for the Guardian, where Les Miserables was unpublishable due to patent infringement. Incredible.

He himself plays the classic lines, familiar to those who followed the EU software patenting debate:

Knight agrees, up to a point. He won’t reveal the exact details of the plots he’s submitted to the Patent Office, other than to say they involve cyberspace. And he says patents would apply only to ideas that are unique and complex. But he worries that without patent protection, some Hollywood sharpies could change ideas like his around and pass them off as their own.

“I’m trying to address a person who comes up with a brand-new form of entertainment who may not be a Poe, may not be a Shakespeare, but still deserves to be paid for his work,” Knight says. “Otherwise, who will create anything?”

A perfect pro-patent hat trick!

Running on WordPress!

I’ve decided to try out the real deal — a ‘proper’ weblogging platform, namely WordPress. Be sure to comment if you spot problems…

Grumpiness and Cigarettes

Meta: My apologies if you wound up running into me online at some stage this week — I’ve been in a lousy mood.

I gave up smoking cigarettes at the end of May, and switched to patches. That went pretty well, dropping from 21mg patches, to 14mg, to 7mg. But this week I finally hit the end of the line, stopped applying a patch every morning, and became fully nicotine-free. Only, ouch — it’s not quite as easy as I thought!

Cigarette addiction is (apparently) composed of two conceptual lumps — the physical addiction to nicotine, and the mental addiction to the ‘idea’ of smoking. Through the patches, I’ve successfully nailed the mental addiction, but I’m now facing the physical withdrawal. I’m sweating, dizzy, can’t focus my eyes, can’t concentrate, my skin is going crazy, and I’m INCREDIBLY grouchy. It’s amazing how much havoc the act of withholding nicotine can cause, especially when you consider that it’s not a required nutrient for the human body — it’s an ‘optional extra’ that I never should have gone near in the first place.

Weirdly, though, I don’t want a cigarette. Instead, I want a patch ;)

Xen and UKUUG 2005

Linux: PingWales’ round-up of UKUUG Linux 2005 Day 3 includes this snippet:

As well as running (Virtual Machines), Xen allows them to be migrated on the fly. If a physical system is overloaded, or showing signs of failure, a virtual machine can be migrated to a spare node. This process takes time, but causes very little interruption to service. The machine state is first copied in its entirety, then the changes are copied repeatedly until there are a small enough number that the machine can be stopped, the remaining changes copied and the new version started. This usually provides a service interruption of under 100ms – a small enough jitter that people playing Quake 3 on a server in a virtual machine did not notice when it was moved to a different node.

Now that is cool.
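The copy-then-recopy loop described in that snippet is easy to play with as a toy model. The sketch below is not Xen code and knows nothing about real page tracking; it only counts pages, and `precopy_migrate`, `dirty_percent` and `stop_threshold` are names and parameters I made up for illustration.

```python
def precopy_migrate(num_pages, dirty_percent, stop_threshold, max_rounds=30):
    """Toy model of iterative pre-copy migration.

    num_pages      -- size of the machine state, in pages
    dirty_percent  -- assumption: for every 100 pages copied, the still-running
                      guest dirties this many pages that must be sent again
    stop_threshold -- once this few pages remain, pause the guest and finish

    Returns (pages_sent_per_live_round, pages_sent_while_paused).
    """
    to_send = num_pages          # first pass: copy the state in its entirety
    live_rounds = []
    for _ in range(max_rounds):
        if to_send <= stop_threshold:
            break                # few enough changes left: time to stop the guest
        live_rounds.append(to_send)
        # While that round was in flight, the guest kept running and dirtied
        # more pages; those become the next round's work.
        to_send = to_send * dirty_percent // 100
    # Whatever remains is copied with the guest paused; this is the part that
    # determines the service interruption.
    return live_rounds, to_send


if __name__ == "__main__":
    rounds, paused = precopy_migrate(262144, 10, 50)   # 1 GiB of 4 KiB pages
    print("live rounds:", rounds)
    print("pages copied while paused:", paused)
```

If the guest dirties pages as fast as they can be copied (`dirty_percent` of 100 or more), the loop never converges, which is why the sketch caps the number of live rounds.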

Jim Winstead’s A9 on foot

Images: Jim Winstead’s walk up Broadway from a few days ago has already garnered a few interested parties, since he’s Creative-Commons-licensed all the photos, and they’re easily findable via Google and on Flickr.

I find this interesting; the collision between open source, photography and cartography is cool. The result is a version of maps.A9.com, where you can actually use the images legally in your own work. More people should do this for other cities.

Where the ‘cursor’ came from

Stuff: So C is a massive antiques nut, and got tickets for the Antiques Roadshow next month in LA. As a result, we’ve been shopping around for interesting stuff for her to bring along.

Here’s what I found at the antiques market last weekend:

Click on the pic to check out my multiplication skills!

The Life of a SpamAssassin Rule

Spam: during a recent discussion on the SpamAssassin dev list, the question came up as to how long a rule could expect to maintain its effectiveness once it was public — the rule secrecy issue.

In order to make a point — that certain types of very successful rules can indeed last a long time — I picked out one rule, MIME_BOUND_DD_DIGITS. Here’s a smartened-up copy of what I found out.

This rule matches a certain format of MIME boundary, one observed in 17.4637% of our spam collection and with 0 nonspam hits. Since we have a massive collection of mails, received between Jan 2004 and May 2005, and a rule with a known history, we can then graph its effectiveness over time.

The rule’s history was:

  • bug 3396: the initial contribution from Bob Menschel, May 15 2004
  • r10692: arrived in SVN: May 16 2004
  • r20178: promoted to ‘MIME_BOUND_DD_DIGITS’: May 20 2004 (funnily enough, with a note speculating about its lifetime from felicity!)
  • released in the SpamAssassin 3.0.0 release: mid-Sep 2004

So, we would expect to see a drop in its effectiveness against spam in late May 2004 and onwards, if the spammers were reacting to SVN changes; or post September 2004, if they react to what’s released.

By graphing the number of hits on mails within each 2-hour window, we can get a good idea of its effectiveness over time:

The red bars are total spam mails in each time period; green bars, the number of spam mails that hit the rule in each period. May 15 2004 and Sep 20 2004 are marked; Jan 2004 is at the left, and May 2005 is at the right-most extreme of the graph. (There’s a massive spike in spam volume at the right — I think this is Sober.Q output, which disappears after a week or so.)
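For anyone who wants to produce the same kind of graph from their own corpus, the bucketing step looks roughly like the sketch below. It's a reconstruction of the idea rather than the script I used; the input format (a Unix timestamp plus the set of rule names that hit each spam mail) and the `bucket_hits` function are assumptions for illustration.

```python
from collections import Counter

WINDOW_SECONDS = 2 * 60 * 60   # the 2-hour windows used for the graph


def bucket_hits(spam_mails, rule_name):
    """Count total spam and rule hits per 2-hour window.

    spam_mails -- iterable of (unix_timestamp, set_of_rule_names_that_hit);
                  this input shape is hypothetical.
    Returns a list of (window_start, total_spam, rule_hits), sorted by time.
    """
    totals = Counter()
    hits = Counter()
    for timestamp, rules in spam_mails:
        window = timestamp - (timestamp % WINDOW_SECONDS)   # round down to window start
        totals[window] += 1
        if rule_name in rules:
            hits[window] += 1
    return [(w, totals[w], hits[w]) for w in sorted(totals)]


if __name__ == "__main__":
    sample = [
        (0, {"MIME_BOUND_DD_DIGITS"}),
        (3600, set()),
        (7200, {"MIME_BOUND_DD_DIGITS"}),
        (7300, {"SOME_OTHER_RULE"}),
    ]
    for window, total, hit in bucket_hits(sample, "MIME_BOUND_DD_DIGITS"):
        print(window, total, hit)
```

The `total_spam` column corresponds to the red bars and `rule_hits` to the green ones.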

It appears that the rule remains about even in effectiveness in the 4 months it’s in SVN, but unreleased; it declines a little more after it makes it into a SpamAssassin release. However, it trails off very slowly — even in May 2005, it’s still hitting a good portion of spam.

Given this, I suspect that most spammers are not changing structural aspects of their spam in response to SpamAssassin with any particular alacrity, or at least are not capable of doing so.

To speculate on the latter, I think many spammers are using pirated copies of the spamware apps, so cannot get their hands on updated versions through ‘legitimate’ channels.

Speculating on the former: in my opinion there’s a very good chance that SpamAssassin just isn’t a particularly big target for them to evade, compared to the juicy pool of gullible targets behind AOL’s filters, for example. ;)

‘Irish EFF’

Ireland: There’s been some discussion about ‘an Irish EFF’ recently, reminding me of the old days of Electronic Frontier Ireland in the 1990s.

I was reminded of this by Danny O’Brien’s article in The Guardian, where he notes an interesting point: half of the effectiveness of the EFF in the US comes from having a few full-time people sitting in an office, answering phone calls. Essentially they act as a human PBX, being the go-to guys connecting journalists to activists and experts.

Now that is something that could really work, and is needed in Ireland, which is in the same boat as the UK in this respect; the journalists don’t know who to ask for a reliable opposing opinion when the BSA, ICT Ireland, or the IRMA put out incorrect statements. It has to be someone who’s always available for a quote at the drop of a hat, over the phone. From experience, this takes dedication — and without getting paid for it, it’s hard to keep the motivation going.

IrelandOffline have done it pretty well for the telecoms issue; ICTE have done a brilliant job, the best I’ve seen in Europe IMO, of grabbing hold of the e-voting issue to the stage where they own it; but for online privacy, software patenting, and other high-tech-meets-society issues, there’s nobody doing it that successfully.

(Update: added ICTE, slipped my mind! Sorry Colm!)

Happy Birthday to the RISKS Forum!

Tech: One of the first online periodicals I started reading regularly, when I first got access to USENET back in 1989 or so, was comp.risks — Peter G. Neumann’s RISKS Forum. Since then, I’ve been reading it religiously, in various formats over the years.

It appears that RISKS has just celebrated its 20th anniversary.

Every couple of weeks it provides a hefty dose of computing reality to counter the dreams of architecture astronauts and the more tech-worshipping members of our society, who fail to realise that just because something uses high technology, that doesn’t necessarily make it safer.

I got to meet PGN a couple of weeks ago at CEAS, and I was happy to be able to give my thanks — RISKS has been very influential on my code and my outlook on computing and technology.

Nowadays, with remote code execution exploits for e-voting machines floating about, and National Cyber-Security Czars, I’d say RISKS is needed more than ever. Long may it continue!

Stupid ‘Ph’ Neologisms Considered Harmful

Words: ‘Pharming’. I recently came across this line in a discussion document:

‘Wait, isn’t this exactly the kind of attack pharmers mount?’

I was under the impression that ‘pharming’ was a transgenics term: ‘In pharming, … genetically modified (transgenic) animals are mostly used to make human proteins that have medicinal value. The protein encoded by the transgene is secreted into the animal’s milk, eggs or blood, and then collected and purified. Livestock such as cattle, sheep, goats, chickens, rabbits and pigs have already been modified in this way to produce several useful proteins and drugs.’

Obviously this wasn’t what was being referred to. So I got googling. It appears the sales and marketing communities of various security/filtering/etc. companies have been getting all het up about various phishing-related dangers.

The earliest article I could find was this — GCN: Is a new ID theft scam in the wings? (2005-01-14):

‘Pharming is a next-generation phishing attack,’ said Scott Chasin, CTO of MX Logic. ‘Pharming is a malicious Web redirect,’ in which a person trying to reach a legitimate commercial site is sent to the phony site without his knowledge. ‘We don’t have any hard evidence that pharming is happening yet,’ Chasin said. ‘What we do know is that all the ingredients to make it happen are in place.’

Oooh scary! The article is short on technical detail (but long on scary), but I think he’s talking about DNS cache poisoning, whereby an attacker implants incorrect data in the victim’s DNS cache, to cause them to visit the wrong IP address when they resolve a name. This Wired article (2005-03-14) seems to confirm this.

But wait! Another meaning is offered by Green Armor Solutions, who use the term to talk about the Panix and Hushmail domain hijacks, where an attacker social-engineered domain transfers from their registrars. There’s no date on the page, but it appears to be post-March 2005.

Finally, yet another meaning is offered in this article at CSO Online: How Can We Stop Phishing and Pharming Scams? (May 2005): ‘The Computing Technology Industry Association has reported that pharming occurrences are up for the third straight year.’ What?! Call Scott Chasin!

Steady on: it appears that the ‘pharming’ CSO Online is talking about has devolved to the stage where it’s simply a pop-up window that attempts to emulate a legit site’s input, with no DNS trickery involved. (This trick has, indeed, been used in phish for years.)

So right there we have three different meanings for ‘pharming’, or four if you count the biotech one.

It may be impossible to get the marketeers to stop referring to ‘pharming’. But please, if you’re a techie, don’t use that term; its lack of clarity renders it useless. Anyway, the biotech people were there first, by several years…

Stunning round-up of alleged election fraud in Ohio

Voting: None Dare Call It Stolen – Ohio, the Election, and America’s Servile Press, by Mark Crispin Miller.

Miller and many others have obviously put a lot of work into chasing down each incident in Ohio since last November, and there are quite a lot of them. It’s impressive the degree to which recounts were evaded, if these allegations are true. There are more shocking cases alleged than I could really fit here, but here are some of the lowest points:

On December 13, 2004, it was reported by Deputy Director of Hocking County Elections Sherole Eaton, that a Triad GSI employee had changed the computer that operated the tabulating machine, and had “advised election officials how to manipulate voting machinery to ensure that preliminary hand recount matched the machine count.” This same Triad employee said he worked on machines in Lorain, Muskingum, Clark, Harrison, and Guernsey counties.

it strongly appears that Triad and its employees engaged in a course of behavior to provide “cheat sheets” to those counting the ballots. The cheat sheets told them how many votes they should find for each candidate, and how many over and under votes they should calculate to match the machine count. In that way, they could avoid doing a full county-wide hand recount mandated by state law.

In Union County, Triad replaced the hard drive on one tabulator. In Monroe County, “after the 3 percent hand count had twice failed to match the machine count, a Triad employee brought in a new machine and took away the old one. (That machine’s count matched the hand count.)”

The willingness to throw away functioning, reliable election systems, and replace them with new, easy-to-subvert ones, is astounding. But on top of that, when concerned parties investigate and find danger signs, those findings are easily buried:

Miller emphasizes that, even after the National Election Data Archive Project, on March 31, 2005, “released its study demonstrating that the exit polls had probably been right, it made news only in the Akron Beacon-Journal,” while “the thesis that the exit polls were flawed had been reported by the Associated Press, the Washington Post, the Chicago Tribune, USA Today, the San Francisco Chronicle, the Columbus Dispatch, CNN.com, MSNBC, and ABC.”

Miller’s conclusion: ‘the press has unilaterally disarmed’.