Tridge’s Spam Hashing System

Spam: Andrew ‘tridge’ Tridgell’s junkcode directory really does contain some useful snippets, like he said. Here’s spamsum, a checksum algorithm for hashing spam text:

The core of the spamsum algorithm is a rolling hash similar to the rolling hash used in ‘rsync’. The rolling hash is used to produce a series of ‘reset points’ in the plaintext that depend only on the immediate context (with a default context width of seven characters) and not on the earlier or later parts of the plaintext. A stronger hash based on the FNV algorithm is then used to produce hash values of the areas between two reset points. The resulting signature comes from the concatenation of a single character from the FNV hash per reset point.
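To make that description concrete, here is a toy Python sketch of the idea. It is not a port of tridge's code: the rolling hash (a plain sum plus a position-weighted sum over the last seven bytes), the fixed `block_size`, the standard 32-bit FNV-1 constants, and the names `reset_points` and `toy_signature` are all my own assumptions for illustration, and the real spamsum differs in its details.

```python
WINDOW = 7                 # context width, from the description above
FNV_PRIME = 0x01000193     # standard 32-bit FNV prime
FNV_OFFSET = 0x811C9DC5    # standard 32-bit FNV offset basis (assumed here)
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def reset_points(data: bytes, block_size: int = 16) -> list:
    """Return the indices where the rolling hash triggers a reset.

    The hash at index i depends only on the last WINDOW bytes, so the
    same local context produces the same decision wherever it appears.
    """
    window = [0] * WINDOW
    h1 = 0  # sum of the bytes currently in the window
    h2 = 0  # weighted sum: the newest byte has weight WINDOW, the oldest 1
    points = []
    for i, c in enumerate(data):
        h2 = (h2 - h1 + WINDOW * c) & 0xFFFFFFFF
        h1 = (h1 + c - window[i % WINDOW]) & 0xFFFFFFFF
        window[i % WINDOW] = c
        if (h1 + h2) % block_size == block_size - 1:
            points.append(i)
    return points


def toy_signature(data: bytes, block_size: int = 16) -> str:
    """One base64 character per area between reset points, plus the tail."""
    points = set(reset_points(data, block_size))
    fnv = FNV_OFFSET
    pending = False  # True while bytes have been hashed but not yet emitted
    out = []
    for i, c in enumerate(data):
        fnv = ((fnv * FNV_PRIME) & 0xFFFFFFFF) ^ c  # one FNV-1 step
        pending = True
        if i in points:
            out.append(B64[fnv % 64])
            fnv = FNV_OFFSET
            pending = False
    if pending:
        out.append(B64[fnv % 64])
    return "".join(out)


if __name__ == "__main__":
    print(toy_signature(b"Buy cheap pills now! Limited offer, click here today."))
```

Because the reset decision only looks at the last seven bytes, editing the start of a message leaves the reset points further along unchanged, so the later signature characters survive. That is what makes this kind of signature useful for matching spam that has been lightly mangled.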

Very very nice!

Comments closed