Real-time DNS blocklist accuracy figures
Spam: DNS blocklists are the oldest means of spam-blocking, and are still exceedingly useful; nowadays, many of these are fully automated systems, using proxy-detection algorithms and sensing patterns in mailer behaviour indicative of spam.
A few months back on the ASRG list, there was a discussion of DNSBL accuracy. I posted some SpamAssassin figures based on our ‘mass-check’ tests, but noted that they were computed by checking current DNSBL contents against a corpus of saved mail; because of that time delta, they were not 100% representative.
These figures are a lot better. Since August, I’ve been collecting real-time DNSBL hit data on my mail, as it is delivered at my SpamAssassin installation. In other words, it’s live accuracy data — it’s using just what the DNSBLs had listed at scan time.
Note, however, that it’s still incomplete:
- some DNSBLs were not measured; only the default DNSBL list in SpamAssassin 2.60 is covered, excluding RCVD_IN_NJABL_DIALUP (which I had to remove because I can’t parse accurate data out of it).
- it’s only 1 person’s hand-classified mail.
- SpamAssassin tests more than just the ‘delivering’ SMTP relay; it’ll also look backwards through the headers, at earlier relays, to catch spam sent via mailing lists. This is different from what’s used with most traditional DNSBL-supporting systems.
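To make the last point concrete, here is a rough sketch. A DNSBL lookup is conventionally made by reversing the octets of an IPv4 address and appending the blocklist's zone. The Python below builds those query names either for every relay found in the Received headers or only for the most recent (delivering) one. The zone `dnsbl.example.org`, the sample headers, and the helper names are all made up for illustration, and the bracketed-IP regex is far cruder than SpamAssassin's real Received parsing.

```python
import re

# Matches a dotted-quad IPv4 address inside square brackets, the way relays
# are commonly recorded in Received headers, e.g. "[192.0.2.99]".
# Simplification: octet ranges are not validated.
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(received_headers):
    """Return relay IPs in header order (most recent hop first)."""
    ips = []
    for header in received_headers:
        ips.extend(BRACKETED_IPV4.findall(header))
    return ips


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed octets, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone


def query_names(received_headers, zone, delivering_only=False):
    """Query names for all relays, or only the delivering relay."""
    ips = relay_ips(received_headers)
    if delivering_only:
        ips = ips[:1]  # what a traditional SMTP-time check would test
    return [dnsbl_query_name(ip, zone) for ip in ips]


# Hypothetical headers: the newest hop is listed first.
SAMPLE_HEADERS = [
    "from list.example.org (list.example.org [198.51.100.7]) by mx.example.net",
    "from unknown (HELO x) ([192.0.2.99]) by list.example.org",
]

for name in query_names(SAMPLE_HEADERS, "dnsbl.example.org"):
    print(name)
```

With `delivering_only=True` only the mailing-list server would be checked, so a listing for the originating relay behind it would go unnoticed.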
But the results should still be quite useful.
The time period covered:
- Thu, 21 Aug 2003 17:11:30 -0700 (PDT)
- Sat, 25 Oct 2003 23:11:52 -0700 (PDT)
Recap of the fields:
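- OVERALL% = percentage of all messages that hit the test
- SPAM% = percentage of spam messages that hit the test
- HAM% = percentage of ham (non-spam) messages that hit the test
- S/O = Spam/Overall = Bayesian probability that a message hitting the test is spam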
- RANK = artificial ranking figure, ignore this!
- SCORE = default SpamAssassin 2.60 score
- NAME = name of test. Figuring out the exact DNSBL should be pretty obvious ;)
OVERALL%     SPAM%      HAM%    S/O   RANK   SCORE  NAME
   21839      1993     19846  0.091   0.00    0.00  (all messages)
 100.000    9.1259   90.8741  0.091   0.00    0.00  (all messages as %)
   5.989   59.0567    0.6601  0.989   1.00    2.25  RCVD_IN_BL_SPAMCOP_NET
   3.869   37.7822    0.4636  0.988   0.96    1.10  RCVD_IN_DSBL
   0.751    8.2288    0.0000  1.000   0.95    4.30  RCVD_IN_OPM_HTTP
   1.964   20.2709    0.1260  0.994   0.95    1.10  RCVD_IN_NJABL_PROXY
   0.659    7.1751    0.0050  0.999   0.95    0.64  RCVD_IN_NJABL_SPAM
   0.614    0.0000    0.6752  0.000   0.94   -0.10  RCVD_IN_BSP_OTHER
   0.050    0.5519    0.0000  1.000   0.94    4.30  RCVD_IN_OPM_SOCKS
   0.027    0.3011    0.0000  1.000   0.94    4.30  RCVD_IN_OPM_WINGATE
   0.119    0.0000    0.1310  0.000   0.94   -4.30  RCVD_IN_BSP_TRUSTED
   0.939    9.7341    0.0554  0.994   0.94    4.30  RCVD_IN_OPM
   1.081   10.9383    0.0907  0.992   0.93    1.52  RCVD_IN_SORBS_SOCKS
   1.062   10.7376    0.0907  0.992   0.93    1.27  RCVD_IN_SBL
   0.229    2.4084    0.0101  0.996   0.93    1.10  RCVD_IN_SORBS_MISC
   0.618    6.3221    0.0453  0.993   0.93    1.10  RCVD_IN_SORBS_HTTP
   0.595    5.9709    0.0554  0.991   0.92    4.30  RCVD_IN_OPM_HTTP_POST
   0.078    0.7526    0.0101  0.987   0.90    2.60  RCVD_IN_SORBS_ZOMBIE
   0.815    7.5263    0.1411  0.982   0.89    1.39  DNS_FROM_RFCI_DSN
   3.594   24.8369    1.4613  0.944   0.81    2.55  RCVD_IN_DYNABLOCK
   1.685   11.4400    0.7054  0.942   0.78    0.10  RCVD_IN_RFCI
   0.380    2.4586    0.1713  0.935   0.75    1.31  RCVD_IN_NJABL_RELAY
   6.182   33.9689    3.3911  0.909   0.73    0.10  RCVD_IN_NJABL
  10.422   44.4054    7.0090  0.864   0.63    0.10  RCVD_IN_SORBS
   0.037    0.1505    0.0252  0.857   0.54    2.80  RCVD_IN_SORBS_WEB
   2.344    4.1144    2.1667  0.655   0.17    0.00  RCVD_IN_SORBS_SPAM
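For anyone wanting to sanity-check the columns, the figures above appear consistent with reading SPAM% and HAM% as the share of spam (or ham) messages that hit a test, with S/O = SPAM% / (SPAM% + HAM%). That relationship is inferred from the numbers rather than stated in the post, so treat this Python as a sketch under that assumption; the function names are made up for illustration.

```python
def s_over_o(spam_pct, ham_pct):
    """S/O under the assumed formula SPAM% / (SPAM% + HAM%).

    Both inputs are per-class hit rates, so the result is not skewed by
    the corpus containing far more ham than spam.
    """
    total = spam_pct + ham_pct
    return spam_pct / total if total else 0.0


def overall_pct(spam_pct, ham_pct, n_spam, n_ham):
    """Rebuild OVERALL% from the per-class hit rates and corpus sizes."""
    hits = spam_pct / 100 * n_spam + ham_pct / 100 * n_ham
    return 100 * hits / (n_spam + n_ham)


# Corpus sizes from the "(all messages)" row.
N_SPAM, N_HAM = 1993, 19846

# (NAME, SPAM%, HAM%) copied from three rows of the table.
ROWS = [
    ("RCVD_IN_BL_SPAMCOP_NET", 59.0567, 0.6601),
    ("RCVD_IN_DYNABLOCK", 24.8369, 1.4613),
    ("RCVD_IN_SORBS_SPAM", 4.1144, 2.1667),
]

for name, spam_pct, ham_pct in ROWS:
    print(
        f"{name:24s} S/O={s_over_o(spam_pct, ham_pct):.3f} "
        f"OVERALL%~{overall_pct(spam_pct, ham_pct, N_SPAM, N_HAM):.2f}"
    )
```

Recomputed OVERALL% values can differ from the table in the last digit, since the published percentages are already rounded.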

Bas Janssen said,
November 5, 2006 @ 1:36 pm
Hi Justin,
How do you extract these statistics? From SA logfiles or from your mailbox? Do you have a script for this you would like to share with the world? If so, I’m very interested…
I work for a small isp in the netherlands, serving 10.000+ mailboxes.. We are looking for ways to maximize SA’s efficiency… Some stats on our situation concerning rbl hits would help a lot!!
Regards, bas janssen Amsterdam, the netherlands
Justin said,
November 6, 2006 @ 5:05 pm
Bas — those are results from SA’s “mass-check” tool.
As SA receives mails, it records the rules hit in the X-Spam-Status header; we “mass-checkers” then move the mails into “ham” or “spam” folders. Months later, when we run “mass-check” on those folders, it knows it can reuse the lookup results to get an idea of the accuracy of those network rules.
See the SA wiki, esp http://wiki.apache.org/spamassassin/MassCheck , for more details.
Note that one key potential issue for you guys would be that you have to capture the mails even if they hit an RBL. SA does this, but most large-scale sites using DNS blocklists cannot afford to do so…
Justin said,
November 6, 2006 @ 5:06 pm
oh — also, these are pretty old. there are newer mass-check results on the SA wiki. search for “DNSBL accuracy”…