Skip to content

Month: July 2005

SpikeSource, Open Source, and Bongo

Open Source: so I was just looking at OSCON 2005‘s website, and I noticed that it listed Kim Polese, of SpikeSource, as a presenter.

I don’t really pay any attention to what’s happening in Java these days, but it appears that SpikeSource launched last year to provide ‘enterprise support services for open-source software’ with a Java/enterprise slant.

Funnily enough, my last encounter with a Kim-Polese-headed company did indeed have a big effect on me, open-source-wise.

That company was Marimba, and they made an excellent Java GUI builder called Bongo. In those days (nearly ten years ago!), I was working on a product for Iona as a developer, in Java and C++, and we needed to provide a GUI on a number of Java tools. I chose to use Bongo, as it had a great feature set and looked reliable.

Wow, was I wrong! The software was reliable — sadly, the same couldn’t be said about the vendor. What I hadn’t considered was the possibility that the company might decide to discontinue the product, and not offer any migration help to its customers — and that’s exactly what happened, Sometime around 1998, Marimba decided that Bongo wasn’t quite as important as their Castanet ‘push’ product, and dropped it. Despite calls from the Bongo-using community to release the code so that the community could maintain it and avoid code-rot, they never did, and as a result apps using Bongo had to be laboriously rewritten to remove the Bongo dependencies.

I learned an important lesson about writing software — if at all possible, build your products on open source, instead of relying on a fickle commercial software vendor. It’s a lot harder to have the rug pulled out from under you, that way.

Update: Well, it seems it was quite far off the mark about Marimba. Someone who worked at Marimba at the time read the blog entry, and got in touch via email:

I was an employee of Marimba in the early days, and was around when we developed Bongo, and still later, when we discontinued it, and still later, when Bongo *was* released to the open-source community (jm: appears to be around the start of 1999 I think). It was hosted on a site called freebongo.org and continued to be enhanced with new features and a lot of new and cool widgets. It was ultimately discontinued a few years later due to lack of interest.

It was hosted and primarily maintained in the open-source community by one of the original Bongo engineers. Here’s a link from the Java Gazette from the days when it was called Free Bongo.

So don’t go blaming Marimba. We did listen to our users and release the code!

Fair enough — and they deserve a lot more credit than I’d initially assumed. I guess I must have missed this later development after leaving Iona. Apologies, ex-Marimbans!

Patents and Laches

Patents: This has come up twice recently in discussions of software patenting, so it’s worth posting a blog entry as a note.

There’s a common misconception that a patenter does not necessarily need to enforce a patent in the courts, for it to remain valid. This isn’t true in the US at least, where there is the legal doctrine of ‘laches’, defined as follows in the Law.com dictionary:

Laches – the legal doctrine that a legal right or claim will not be enforced or allowed if a long delay in asserting the right or claim has prejudiced the adverse party (hurt the opponent) as a sort of ‘legal ambush’.

The Bohan Mathers law firm have a good paragraph explaining this:

…the patent holder has an obligation to protect and defend the rights granted under patent law. Just as permitting the public to freely cross one’s property may lead to the permanent establishment of a public right of way and the diminishment of one’s property rights, so the knowing failure to enforce one’s patent rights (one legal term for this is laches) against infringement by others may result in the forfeiture of some or all of the rights granted in a particular patent.

See also this and this page for discussion of cases where it was relevant. It seems by no means clear-cut, but the doctrine is there.

CEAS

Spam: back from CEAS. The schedule with links to full papers is up, so anyone can go along and check ’em out, if you’re curious.

Overall, it was pretty good — not as good as last year’s, but still pretty worthwhile. I didn’t find any of the talks to be quite up to the standards of last year’s TCP damping or Chung-Kwei papers; but the ‘hallway track’ was unbeatable ;)

Here’s my notes:

AOL’s introductory talk had some good figures; a Pew study reported that 41% of people check email first thing in morning, 40% have checked in the middle of the night, and 26% don’t go more than 2-3 days without checking mail. It also noted that URLs spimmed (spammed via IM) are not the same as URLs spammed — but the obfuscation techniques are the same; and they’re using 2 learning databases, per-user and global, and the ‘Report as Spam’ button feeds both.

Experiences with Greylisting: John Levine’s talk had some useful data — there are still senders that treat a 4xx SMTP response (temp fail) as 5xx (permanent fail), particularly after end of the DATA phase of the transaction, such as an ‘old version of Lotus Notes’; and there are some legit senders, such as Kodak’s mail-out systems, which regenerate the body in full on each send, even after a temp fail, so the body will look different. He found that less than 4% of real mail from real MTAs is delayed, and overall, 17% of his mail traffic was temp-failed. The 4% of nonspam that was delayed was delayed with peaks at 400 and 900 seconds between first tempfail and eventual delivery.

As usual, there were a variety of ‘antispam via social networks’ talks — there always are. Richard Clayton had a great point about all that: paraphrasing, I trust my friends and relatives on some things, and they are in my social networks — but I don’t trust their judgement of what is and is not spam. (If you’ve ever talked to your mother about how she always considers mails from Amazon to be spam, you’ll know what he means.)

Combating Spam through Legislation: A Comparative Analysis of US and European Approaches:
the EU ‘opt-in’ directive is now transposed everywhere in the EU; EU citizens who are spammed by a citizen from another EU country, the reports should be sent to the antispam authority in the sender’s country; and there’s something called ‘ECNSA’, an EU contact network of spam authorities, which sounds interesting (although ungoogleable).

Searching For John Doe: Finding Spammers and Phishers: MS’ antispam attorney, Aaron Kornblum, had a good talk discussing their recent court cases. Notably, he found one cases where an Austrian domain owner had set up a redirector site which sounded like it was expressly set up for spam use — news to me (and worrying).

A Game Theoretic Model of Spam E-Mailing: Ion Androutsopoulos gave a very interesting talk on a game theoretic approach to anti-spam — it was a little too complex for the time allotted, but I’d say the paper is worth a read.

Understanding How Spammers Steal Your E-Mail Address: An Analysis of the First Six Months of Data from Project Honey Pot: Matthew Prince of Project Honeypot had some excellent data in this talk; recommended. He’s found that there’s an exponential relationship between google Page Rank and spam received at scraped addresses, which matches with my theory of how scrapers work; and that only 3.2% of address-harvesting IPs are in proxy/zombie lists compared to 14% of spam SMTP delivery IPs. (BTW, my theory is that address scraping generally uses Google search results as a seed, which explains the former.)

Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs): this presented some great demonstrations of how a neural network can be used to solve HIPs (aka CAPTCHAs) automatically. However, I’m unsure how useful this data is, given that the NN required 90000 training characters to achieve the accuracy levels noted in the paper; unless the attacker has access to their own copy of the HIP implementation they can run themselves, they’d have to spend months performing HIPs to train it, before an attack is viable.

Throttling Outgoing SPAM for Webmail Services: cites Goodman in ACM E-Commerce 2004 as saying that ESP webmail services are a ‘substantial source of spam’, which was news to me! (less than 1% of spam corpora, I’d guess). It then discusses requiring the submitter of email via an ESP webmail system to perform a hashcash-style proof-of-work before their message is delivered. By using a Bayesian spam filter to classify submitted messages, the ESP can cause spammers to perform more work than non-spammers, thereby reducing their throughput. Didn’t strike me as particularly useful — Yahoo!’s Miles Libbey got right to the heart of the matter, asking if they’d considered a situation where spammers have access to more than one computer; they had not. A better paper for this situation would be Alan Judge’s USENIX LISA 2003 one which discusses more industry-standard rate-limiting techniques.

SMTP Path Analysis: IBM Research’s anti-spam team discuss something very similar to several techniques used in SpamAssassin; our versions have been around for a while, such as the auto-whitelist (which tracks the submitter’s IP address rounded to the nearest /16 boundary), since 2001 or 2002, and the Bayes tweaks we added from bug 2384, back in 2003.

Naive Bayes Spam Filtering Using Word-Position-Based Attributes: an interesting tweak to Bayesian classification using a ‘distance from start’ metric for the tokens in a message. Worth trying out for Bayesian-style filters, I think.

Good Word Attacks on Statistical Spam Filters: not so exciting. A bit of a rehash of several other papers — jgc’s talk at the MIT conference on attacking a Bayesian-style spam filter, the previous year’s CEAS paper on using a selection of good words from the SpamBayes guys, and it entirely missed something we found in our own tech report — that effective attacks will result in poisoned training data, with a significant bias towards false positives. In my opinion, the latter is a big issue that needs more investigation.

Stopping Outgoing Spam by Examining Incoming Server Logs: Richard Clayton’s talk. Well worth a read. It’s an interesting technique for ISPs — detecting outgoing spam by monitoring hits to your MX from your own dialup pools which uses known ratware patterns.

Anonymous remailers being tampered with

Politics: EDRI-gram notes that the Firenze Linux User Group’s server was tampered with last month at its ISP colo:

On Monday 27 June 2005, two members of FLUG (Firenze Linux User Group) visited the data centre of Dada S.p.a., in Milan, where the community server of the group is physically housed, in order to move it to another provider.

When the server was put out of the rack, however, it was discovered that the upper lid of the server case was half-opened. At a closer inspection, it was also discovered that the case lid was scratched, as if it had been put out and reinserted into the rack. Worse, the CD-ROM cable was missing, as were the screws that kept the hard disks in place.

What is particularly worrying is that the server hosted an anonymous remailer, whose keys and anonymity capabilities could have been compromised. Considering what happened to Autistici/Inventati server – which hosted another anonymous remailer – this possibility is not so far fetched. This begs the question whether a co-ordinated attempt at intercepting anonymous/private communications on the Internet has been ongoing in the past weeks and months.

Bizarre goings-on.

looking at the new DKIM draft

The combined DKIM standard, mixing Yahoo!’s DomainKeys and Cisco’s IIM, has been submitted to the IETF as a candidate spec by the MASS ‘pre-working group effort’. I like the idea behind both (a few years back, I, a few other SpamAssassin developers, and several others came up with the roots of a message-signature anti-forgery scheme we called ‘porkhash’, but never really went anywhere with it), so I’m glad to see this one progressing nicely.

Seeing as I never seem to write much about anti-spam here any more, I might as well remedy that now with some comments on the new DKIM draft. ;)

It’s a very good synthesis of the two previous drafts, DomainKeys and IIM, more DK-ish, but taking the nice features from IIM.

The ‘h=’ tag is now listed as REQUIRED. This specifies the list of headers that are to be signed. If I recall correctly, this was added in IIM, modifies the behaviour of DK, and is a good feature — it protects against in-transit corruption by, (a) specifying an order of the headers, to protect against MTAs that reorder them; and (b) allowing sites to protect the ‘important’ headers (From, To, Subject etc.) and ignore possible additions by MTAs down the line (scanner additions, mailing list munging and additions, and so on).

A list of recommended headers to sign is included, with From as a MUST and Subject, Date, Content-Type and Content-Transfer-Encoding as a SHOULD.

Forwarding is, of course, just fine. This one doesn’t suffer from the SPF failure mode, whereby a forwarder will break a signature if it doesn’t rewrite the SMTP MAIL FROM sender address. (Of course, it now has its own new failure modes — the message must be forwarded in a nearly-pristine state.)

The message length to sign can be specified with ‘l=’. This may be useful to protect against the issue where mailing list managers add a footer to a signed message. It recommends that verifiers remove text after the ‘l’ length, if it appears, since that offers a way for spammers to reuse existing signatures. I still have to think about this, but I suspect SpamAssassin could give points for additional text beyond the ‘l=’ point that doesn’t match mailing list footer profiles.

The IIM HTTP-based public-key infrastructure is gone; it’s all DNS, as it was in DK.

The ‘z=’ field, which contains copies of the original headers, is a great feature for filters — we can now pragmatically detect ‘acceptable’ header rewriting if necessary, and handle recovery at the receiver end.

Multiple signatures, unfortunately, couldn’t be supported. I can see why, though, it’s a very hard problem.

The ‘Security Considerations’ section is excellent — 9.1.2 uses a very clever HTML attack.

Looks like development of DKIM-Milter, and an associated library, libdkim, are underway.

Given all that, it looks good. It’s not clear how much we can do with DK, and now DKIM, in SpamAssassin, however — it’s very important in these schemes that the message be entirely unmunged, and in most SpamAssassin installs, the filter doesn’t get to see the message until after the delivering MTA, or the MDA (Message Delivery Agent), has performed some rewriting. This would cause FPs if we’re not very, very careful.

I hope though, that we can find a useful way to trust DKIM results. It appears likely that they’d make an excellent way to provide trustworthy whitelisting — ‘whitelist_from_dkim’ rules, similarly to our new whitelist_from_spf support. (In fact, we could probably just merge both into some new ‘whitelist_from_authenticated’ setting.)

OpenWRT vs Netgear MR814: no contest

Hardware: After a few weeks running OpenWRT on a Linksys WRT54G, here’s a status report.

Things that the new WRT54G running OpenWRT does a whole lot better than the Netgear MR814:

  • Baseline: obviously it doesn’t DDoS the University of Wisconsin, and it doesn’t lose the internet connection regularly, as noted in that prior post. I knew that, so those are not really new wins, though.
  • It’s quite noticeably faster. I’ve seen it spike to double the old throughput rates, and it’s solid, too; less deviation in those rates.
  • It doesn’t break my work VPN. I wasn’t sure if it was the MR814 that was doing this, requiring an average of about 20 reconnects per day — now, I know it for a fact. I’ve had to reconnect my VPN connection about 4 times over the past week.
  • It doesn’t break the Gigafast UIC-741 USB wifi dongle I’m using on the MythTV box. Previously that would periodically disappear from the HAN. Again, I had this pegged as an issue with the driver for that; removing the MR814 from the equation has solved it, too, and it’s now running with 100% uptime so far.
  • It does traffic shaping with Wondershaper, so I can use interactive SSH, VNC, or remote desktop while downloading, even if it’s another machine on the HAN doing the download.
  • It’s running linux — ssh’ing in, using ifconfig, and vi’ing shell scripts on my router is very, very nice.

Man, that MR814 was a piece of crud. ;) I can’t recommend OpenWRT enough…

EU software patents directive is history

Patents: A great outcome! The proposed Directive has been dropped, in the face of massive opposition. Coverage: /., FFII, FT.com, VNUnet, FSFE.

Unfortunately, Rocard’s proposed amendments which would have turned this directive into a major win for us, didn’t pass — but it’s still a good win. Software patents are not explicitly legal throughout Europe; although some jurisdictions do permit them, they’re in a legal grey area, and prosecution is therefore hard (and very expensive for patent holders). This is a much better situation than if the directive as proposed by the Council had passed, since that would have explicitly legalised them throughout the EU.

On top of this win, what I find significant is that we’ve now brought the issue from where it was a few years ago, as a minor concern known only to a few uber-geeks, to a major political issue that made headlines around the globe. Even my local NPR affiliate reported on this decision! That’s a far cry from the mid-90’s, when I had a hard time explaining the point of theLeague for Programming Freedom to my hacker friends in the TCD Maths Department.

A great quote from the VNUnet article:

‘This represents a clear victory for open source,’ said Simon Phipps, chief open source officer at Sun Microsystems. ‘It expresses Parliament’s clear desire to provide a balanced competitive market for software.’

Yes, that’s Sun saying that less software patenting is a good thing. Believe me, that’s a great leap forward. Or check out Irish MEP Kathy Sinnott’s amazing comments. She hits the nail right on the head; I’m very impressed by that speech.

McCreevy seeing anti-globalisation protesters everywhere

Patents: I’m just back from a fantastic holiday weekend, totally offline, hiking through Catalina Island. I’m a little bit sunburnt, my nose is peeling, but it was great fun. I got a fantastic picture of the sun setting over hundreds of boats bobbing at their moorings in Two Harbors, which I must upload at some stage.

Anyway, it seems that over the weekend, the EU software-patents debate has swung back heavily towards the anti-swpat side. Fingers crossed — the vote is this week.

Also, today, EUpolitix.com has an interview with Charlie McCreevy, quoting him as saying:

‘The theme, or the background music, to both of these particular directives (the CII and Services Directives) you could see as part of, anti-globalisation, anti-Americanism, anti-big business protests — in lots of senses, anti-the opening up of markets’

This is standard practice for the Irish government — they did exactly the same thing with the e-voting issue, painting the ICTE as ‘linked to the anti-globalisation movement’. (I have a feeling they think that any group organised online must be ‘anti-globalisation’, at this stage.)

Of course, with these accusations of being anti-free-market, it’s important to remember that a patent is a government-issued monopoly on an invention (or in the software field, on an idea), in a particular local jurisdiction. If anything, being against software patenting is a pro-free-market position, one shared by prominent US libertarians; and nothing gets more pro-free-market than those guys. ;)

CEAS coming up soon…

Spam: if you work in anti-spam, especially in filtering, or even just in working with email in general, it’s well worth going to CEAS 2005, the Conference on Email and Anti-Spam, on Thursday July 21st and Friday 22nd in Stanford:

The organizers of the Conference on Email and Anti-Spam invite you to participate in its second annual meeting. This forum brings together academic and industrial researchers to present new work in all aspects of email, messaging and spam — with papers this year covering fields as diverse as text classification, clustering and visualization of email, social network analysis applied to both email and spam, spam filtering methods including text classification and systems approaches, game theory, data analysis, Human Interactive Proofs, and legal studies, among others. The conference will feature 26 paper presentations, a banquet, and two invited speakers. See http://www.ceas.cc for details of the current program, as well as on-line registration.

Registration runs out on July 10th.

I went last year, and it was excellent — several very interesting papers were presented. I’m going this year, too, along with quite a few SpamAssassin committers, and I’m looking forward to it.