more on social whitelisting with OpenID

An interesting post from Simon Willison, noting that he is now publishing a list of “non-spammy” OpenID identities (namely people who posted one or more non-spammy comments to his blog).

I attempted to comment, but my comments haven’t appeared — either they got moderated as irrelevant (I hope not!) or his new anti-comment-spam heuristics are wonky ;) Anyway, I’ll publish here instead.

It’s possible to publish a whitelist in a “secure” fashion — allowing third parties to verify a known identity against it, without explicitly listing the identities it contains. One way is to use Google’s enchash format. Another is to use something like the Bloom-filter algorithm in LOAF.
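
For illustration, here’s a minimal sketch of the hashed-list idea. (This is not Simon’s format, nor the actual enchash or LOAF wire formats; the function names and the choice of SHA-256 are mine.)

```python
import hashlib

def digest(identity: str) -> str:
    """Hash a normalized identity, e.g. an OpenID URL."""
    return hashlib.sha256(identity.strip().lower().encode("utf-8")).hexdigest()

def publish_whitelist(identities):
    """The publisher ships only the digests, never the identities themselves."""
    return sorted(digest(i) for i in identities)

def is_whitelisted(candidate: str, published) -> bool:
    """A third party hashes a known identity and checks membership."""
    return digest(candidate) in set(published)

digests = publish_whitelist(["http://example.com/alice", "http://example.org/bob"])
print(is_whitelisted("http://example.com/alice", digests))  # True
print(is_whitelisted("http://spammer.example/", digests))   # False
```

Since identities are often guessable, a plain hash like this only protects against casual scraping; LOAF’s Bloom-filter approach goes further, since its deliberate false positives give listed identities some deniability.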

Also, a small group of people (myself included) tried social-network-driven whitelisting a few years back, with IP addresses and email, as the Web-o-Trust.

Social-network-driven whitelisting is not as simple as it first appears. Once someone in the web — a friend of a friend — trusts a marginally-spammy identity, and a spam is relayed via that identity, everyone will get the spam, and tracking down the culprit can be hard unless you’ve designed for that in the first place (this happened in our case, and pretty much killed the experiment). I think you need to use a more complex Advogato-style trust algorithm, and multiple “levels” of outbound trust, instead of the simplistic Web-o-Trust model, to avoid this danger.
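
To make the “levels of outbound trust” idea concrete, here’s a toy sketch of capacity-limited trust propagation. (Advogato proper computes this as a network-flow problem; the per-level numbers are purely illustrative, and all names are mine.)

```python
from collections import deque

# Capacity per level of distance from the trust seed: the further out,
# the fewer identities that level is allowed to certify. (Illustrative
# numbers; Advogato computes capacities via network flow.)
LEVEL_CAPACITY = {0: 100, 1: 50, 2: 10, 3: 2}

def trusted_set(seed, outbound):
    """Breadth-first trust propagation with per-level capacity limits.

    `outbound` maps each identity to the identities it vouches for.
    Capping each level bounds the damage one confused friend-of-a-friend
    can do: a marginally-spammy identity deep in the web can only drag
    in a few more identities before its level's budget is spent.
    """
    accepted = {seed}
    frontier = deque([(seed, 0)])
    budget = dict(LEVEL_CAPACITY)
    while frontier:
        node, level = frontier.popleft()
        for peer in outbound.get(node, []):
            nxt = level + 1
            if peer in accepted or budget.get(nxt, 0) <= 0:
                continue
            budget[nxt] -= 1
            accepted.add(peer)
            frontier.append((peer, nxt))
    return accepted
```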

Basically, my gut feeling is that a web of trust for anti-spam is an attractive concept, and possible, but a lot harder than it looks. It’s been suggested repeatedly ever since I started writing SpamAssassin, but nobody’s yet come up with a working one… that’s got to indicate something ;) (Mind you, the main barrier has probably been waiting for workable sender authentication, which is now in place with DK/SPF/DKIM.)

In the meantime, the model of a trusted third party who publishes their assessment of an identity’s reputation — like Dun and Bradstreet, or Spamhaus — works very nicely indeed, and is pretty simple to implement.
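
Spamhaus-style DNSBLs are the canonical implementation of that model, and a lookup is just a DNS query. (The sketch below uses the standard reversed-octet convention; the zone shown is Spamhaus’s real aggregate list, but any DNSBL zone works the same way.)

```python
import socket

def listed_on_dnsbl(ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """Classic DNSBL check: reverse the IPv4 octets, prepend them to the
    list's zone, and see whether the name resolves. Any answer in
    127.0.0.0/8 means the third party has listed the IP."""
    query = ".".join(reversed(ip.split("."))) + "." + zone
    try:
        socket.gethostbyname(query)
        return True
    except socket.gaierror:
        return False
```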



  1. Posted January 23, 2007 at 22:25 | Permalink

    Justin, have you looked at the Advogato algorithm? It seems to be designed to deal with nodes that are not themselves bad, but are “confused” about who the bad nodes are.

  2. Posted January 23, 2007 at 23:20 | Permalink

    Don —

    Yep, that’s good news. I seem to recall Raph talking about revisions to make it more attack-resistant, though… I wonder if that HOWTO includes them…

  3. Posted January 24, 2007 at 21:18 | Permalink

    I think you only have to worry about the pool being polluted if you try to abstract one monolithic whitelist out of all the activity. If you work on the principle that anything (or anyone) you mark as spam remains spam for your list and has no impact on anyone else’s, and that your local list wins if someone elsewhere approves them, there shouldn’t be a problem. You could even record, beside each approval, which whitelists were checked, so that if there was a problem after a while you could turn one of them off.

  4. Posted January 24, 2007 at 21:20 | Permalink

    Oh, and I think it’s different from the web of trust. There you trust individuals and the people that they trust; the model here is to trust site owners and the people they’re prepared to let post onto their sites. It’s way easier to build up trust relationships over time with a hundred prominent sites (and the ten friends you see every day) than with a friend of a friend of a friend…

  5. Posted January 25, 2007 at 11:45 | Permalink

    Thanks for the comments, Tom!

    You may have a point there — instead of using a “web” of trust, with >= 2 levels of depth, keeping it “shallow”, with only one level of links, would make things a lot easier to manage if/when a whitelist makes a mistake.
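
    A quick sketch of that shallow model, combined with the local-list-wins rule from comment 3. (All names here are hypothetical, not any existing implementation.)

    ```python
    class ShallowWhitelist:
        """One level of links: subscribe only to lists published by site
        owners you trust directly, and never import a list's own imports."""

        def __init__(self):
            self.lists = {}          # list name -> set of approved identities
            self.local_spam = set()  # your own spam markings always win

        def subscribe(self, name, identities):
            self.lists[name] = set(identities)

        def unsubscribe(self, name):
            # One misbehaving list can be switched off wholesale.
            self.lists.pop(name, None)

        def approvers(self, identity):
            # Record-keeping: which subscribed lists vouch for this identity,
            # so a bad list can be identified and dropped later.
            return [n for n, ids in self.lists.items() if identity in ids]

        def is_whitelisted(self, identity):
            if identity in self.local_spam:
                return False  # local judgement beats any remote approval
            return bool(self.approvers(identity))
    ```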

  6. Posted January 25, 2007 at 12:39 | Permalink

    All that is required to solve the FoaF problem is for the pool of trusted hosts to be mediated by a system that can differentiate between nodes who themselves trust a given host, and nodes who trust a host because they previously added it from another trusted node. Rather than every node mirroring every list it has ever received at every exchange, each node should only provide the list of hosts it trusts from personal experience. (This also lends itself well to hosts being rated on trustworthiness, and could be extended to include the extent to which a given node trusts a host, i.e. the number of non-spam comments.)

    If a previously trusted host is identified as bad, then any node whose list contains that host can be ignored until the bad host has been removed from its personal list. Trusted hosts no longer supported by an active node can float in the pool unless they are identified as bad, and individual nodes can choose to accept or reject them.
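
    One way to express that separation in code (a sketch; the names are illustrative):

    ```python
    class Node:
        """Keep first-hand trust separate from trust adopted at exchange."""

        def __init__(self):
            self.first_hand = {}    # host -> count of non-spam comments seen
            self.inherited = set()  # hosts adopted from other nodes' lists

        def record_good_comment(self, host):
            self.first_hand[host] = self.first_hand.get(host, 0) + 1

        def adopt(self, host):
            # Trust taken on from a peer's list; never re-shared.
            if host not in self.first_hand:
                self.inherited.add(host)

        def publish(self):
            # Only personal experience is exchanged, so a bad host traces
            # straight back to the nodes that actually vouched for it.
            return dict(self.first_hand)

        def revoke(self, bad_host):
            self.first_hand.pop(bad_host, None)
            self.inherited.discard(bad_host)
    ```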

  7. Posted January 27, 2007 at 15:15 | Permalink

    That’s a bit ironic… I’ve got a heuristic-based comment spam system at the moment, inspired at least in part by SpamAssassin, but so far you’re one of the only people to get caught by it (you triggered the “too many links in that comment” rule, which I’m going to replace with one that compares the percentage of link text to the size of the actual comment).

  8. Posted January 28, 2007 at 21:16 | Permalink

    ha — I remember we had a similar issue with a rule exactly like that ;)