DSPAM acquired by Sensory Networks

whoa, didn’t see that coming. Quoting Jonathan Zdziarski via jgc’s newsletter:

…The [DSPAM] project had grown to a point where it would take others - with enough free time - to bring DSPAM to the next level as a widely accepted enterprise-class solution, and [I] decided that it would be in the best interest of the project to entrust it to someone with the technical knowhow and dedication to reach these goals. Many of you are aware of my work in the past with Sensory Networks in developing a hardware-accelerated version of DSPAM (capable of supporting multi-megabit speeds in large carrier environments). I’ve spent a considerable amount of time with SN’s team over the past several years and when we initially discussed working together, they had shown to be very excited and motivated about the project.

After careful consideration and many discussions at length, I decided to allow Sensory Networks to acquire the rights to the project, and continue development on it with their own team. SN has displayed a strong commitment to the open source community and has been working closely with other leading projects such as Snort, Clam Antivirus, and SpamAssassin. They assured me that the project will remain open-source and available to all, and at the same time the project will receive exposure in commercial environments it has not seen before, as many of you have been asking for. We’ve now completed the acquisition for the project, and I’d like to encourage you to support them in helping them move forward as it grows into new areas.

More details at zdziarski.com.

Tags: , , , ,

Comments

Slashdot Anti-FUSSP Form, and DSPAM’s FAQ

Spam: Slashdot: This will fail because… Tick the boxes to produce
a generic slashdot comment on a new anti-spam proposal. Very funny.

So, regarding the Noise Reduction probabilistic-classification tokenizer tweak posted on Slashdot yesterday — it does look interesting; basically, it operates by monitoring the ‘noisiness’ of the token stream, and if the current probabilities for the tokens from the stream differs from what’s defined as acceptable for too long, it ‘dubs’ them out. In other words, it ignores those tokens until another sequence of ‘useful’ tokens is encountered. Plus I’m totally down with the Janine ref ;)

However, it’s disappointing to come across this in the DSPAM FAQ list:
Why Should I use DSPAM Instead of SpamAssassin?
– a lovely selection of anti-perl and anti-SpamAssassin FUD, generally overlooking SpamAssassin’s training components (’leaves the end-user with no means of recourse or satisfaction when they receive a spam’), and in general taking a combative tone. Is that really necessary?

BTW, in case you’ve been living in a hole for the last year – SpamAssassin does include a probabilistic classifier, in the form of the BAYES rules. It’s easy to train, uses good tokenizing and combining algorithms to get high accuracy (although doesn’t yet do multi-word windowing until we’ve determined that that works acceptably for the db size increase), and, importantly, has been measured on corpora that are not my own mail.

A story: way back when, in June 2001, the SpamAssassin README boasted of it’s 99.94% accuracy rate. This was true — it was measured on my mail feed over the course of a couple of months. However, once measured on someone else’s mail, that dropped pretty quickly. Measuring a spam filter on the developer’s mail feed, (where presence of HTML is a killer spam-sign!), is a sure-fire way to get (a) great but (b) non-portable accuracy figures.

Tags: , , , , , , , , ,

Comments