W00t! SpamAssassin 3.2.0 has
finally gone gold!
This release is a big one — it’s the first major release since 3.1.0, back in
September 2005, just over a year and a half ago. Here
is the release announcement mail, containing a list of major changes since
version 3.1.8. There are a few major new features that I feel are worth picking
out in more detail and editorialising about:
sa-compile
This is a biggie. This new script takes the active SpamAssassin ruleset, and
uses code contributed by Matt
Sergeant to produce input for
re2c. re2c in turn compiles the ruleset into a
deterministic finite
automaton,
which can match multiple regular expressions in parallel. That’s not all,
though; re2c then compiles that DFA into C code — which is then compiled into
native object code. SpamAssassin will then load that object code and use it
to replace the slower Perl regexp tests, if it’s available at scan-time.
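To give a feel for what "matching multiple patterns in parallel" means, here is a minimal sketch in Python. It is not what sa-compile or re2c actually generate: it handles literal strings only (no regexp syntax), it uses the classic Aho-Corasick construction rather than re2c's, and the rule names (`DEMO_PILLS` and friends) and function names are made up for illustration. The point is just that the text is walked once, one character at a time, no matter how many rules are loaded.

```python
from collections import deque


def build_automaton(patterns):
    """Build an Aho-Corasick automaton from {rule_name: literal_text}.

    Illustrative only: real SpamAssassin body rules are regexps, not literals.
    """
    goto = [{}]     # goto[state][char] -> next state (state 0 is the root)
    fail = [0]      # state to fall back to when no transition matches
    out = [set()]   # rule names recognised on reaching each state

    # Phase 1: build a trie of all the patterns.
    for name, text in patterns.items():
        state = 0
        for ch in text:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(name)

    # Phase 2: breadth-first pass to compute the fallback links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state also reports any rule that ends at its fallback state.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def scan_text(text, automaton):
    """Return the names of all rules whose literal occurs in text."""
    goto, fail, out = automaton
    state, hits = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        hits |= out[state]
    return hits


if __name__ == "__main__":
    demo_rules = {
        "DEMO_PILLS": "pills",
        "DEMO_CHEAP_PILLS": "cheap pills",
        "DEMO_FREE_MONEY": "free money",
    }
    automaton = build_automaton(demo_rules)
    print(sorted(scan_text("cheap pills, free money", automaton)))
```

Adding a rule makes the automaton bigger, but the scan still reads each character of the text only once. That is roughly the property that makes a compiled ruleset attractive compared with running every regexp separately.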
Now, it’s been a long time since SpamAssassin’s ruleset consisted mainly of
rudimentary regular expressions matched against the body text — a good portion
of SpamAssassin’s ruleset these days operates against headers, performs network
lookups, analyzes URLs extracted from the body, uses the more advanced features
supported by Perl’s NFA regexp engine, and so on. But even given that, the effects
of ‘sa-compile’ seem to average between a 15% and 25% speedup in my testing.
That’s good ;)
Many of the commercial versions of SpamAssassin include their own body-rule
speedups — but this is the first time anything similar has made it into the
open source code.
Short-circuiting
Another good one for performance. There are some rules that, in a
well-configured setup, you can reasonably assume will only ever hit spam, or
only ever hit nonspam.
For example, a hit on “ALL_TRUSTED” should mean that the message never
traversed an untrusted network, therefore it cannot be spam, so why bother
applying the expensive tests? It should be reasonable to “short-circuit” and
immediately return a “ham” score for that mail.
This new plugin implements that algorithm — and efficiently, too, which
historically has been the hard part!
I’ve been using this for a while with a ruleset
like this one — in my experience, it’s cut overall CPU time spent scanning
mail by 20%.
It is pretty flexible, too: there’s a lot of tweakage that can be done with
this functionality to suit your own setup.
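The short-circuiting idea can be sketched in a few lines of Python. This is a toy model, not the plugin's code: the `Rule` class, the `scan_message` function, the `EXPENSIVE_BODY_TEST` rule and the score values are all assumptions made up for illustration, and only the `ALL_TRUSTED` name comes from the real ruleset. Rules marked for short-circuiting get a very low priority number so they run first, and a hit returns a fixed verdict straight away, skipping everything after it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    test: Callable[[dict], bool]        # returns True if the rule hits
    score: float
    priority: int = 0                   # lower numbers run earlier
    shortcircuit: Optional[str] = None  # "ham", "spam", or None


def scan_message(message, rules, ham_score=-100.0, spam_score=100.0):
    """Run rules in priority order; stop early on a short-circuit hit.

    ham_score and spam_score are illustrative fixed verdict scores.
    """
    total, fired = 0.0, []
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.test(message):
            continue
        fired.append(rule.name)
        if rule.shortcircuit == "ham":
            return ham_score, fired     # skip all remaining tests
        if rule.shortcircuit == "spam":
            return spam_score, fired
        total += rule.score
    return total, fired


if __name__ == "__main__":
    rules = [
        # Hypothetical slow rule, standing in for body or network tests.
        Rule("EXPENSIVE_BODY_TEST", lambda m: "pills" in m["body"], 2.5),
        Rule("ALL_TRUSTED", lambda m: m["all_trusted"], -1.0,
             priority=-1000, shortcircuit="ham"),
    ]
    print(scan_message({"all_trusted": True, "body": "cheap pills"}, rules))
    print(scan_message({"all_trusted": False, "body": "cheap pills"}, rules))
```

In the first call the trusted message returns immediately and the expensive test never runs. In the second, scanning falls through to the normal scoring path.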
Reduced memory footprint
One aim of this release has been to reduce the memory usage of SpamAssassin;
the core code now uses less RAM than 3.1.x does, when tested with the same
ruleset. (Unfortunately we’ve added lots more rules in the interim, so it’s a
bit of a wash overall. ;)
The VBounce anti-bounce ruleset
Detects spurious bounce messages sent by broken mail systems in response to
spam or viruses. More info about that here.
Apache-spamd
apache-spamd implements spamd as a mod_perl module.
This was contributed by Radoslaw Zielinski, as a Google Summer of Code
project last year. Thanks Radoslaw!
There are plenty more new, useful features and rules — these are just the top ones, in my opinion. Pretty cool stuff!