Month: April 2006

SpamAssassin in the Google Summer of Code 2006

Are you a student, and interested in earning $4,500 for contributing to open source, and fighting spam, over the course of the summer?

If so, get thee hence to the Google Summer of Code 2006 site, and propose a project!

Last year, we in SpamAssassin didn’t get it together to mentor SoC projects. This year, however, we have a few prospective mentors (including myself), and a few sample project ideas lined up; we’re all ready to go! Here’s the Student FAQ. Be quick; applications end in a week and a bit.

Here’s hoping we get some interesting submissions ;)

Single-Letter Google Hits

Here’s what happens when you search for single letters on Google:

Interestingly, I got to see the new Google search results page, with the sidebar, once. It must be in the process of rolling out…

Peoplefeeds and Quick Aggregation

peoplefeeds is cool.

I’ve been looking for something that can aggregate my Flickr, WordPress blog, and del.icio.us feeds into one venue where I can look up items by tag, in a single page-load.

Suprglu was my leading contender, but they weren’t there yet: they didn’t seem to support importing my blog posts with tags preserved, so pretty much everything wound up tagged as “uncategorized”. Disappointing. :( So I was waiting for them to fix that.

This post by Richard MacManus pointed at another couple of options: 43Things and Peoplefeeds. I hadn’t actually noticed that 43Things was doing this kind of aggregation too; unfortunately, as far as I can see, they don’t support tag preservation and browsing, so there goes my desired feature. Shame.

However, Peoplefeeds was right on target, offering a ‘Unified Tagspace’ and a ‘Search All-Personal-Content’ mechanism. It works nicely, too. Here’s my personal aggregator, combining my Flickr feed, my weblog feed, and my del.icio.us feed into one — and with a unified tag-space; here’s my ‘hiking’ tag, hitting all 3 feeds. Perfect.

One other use for this: it can be used to make a “private planet”. (I’ve forgotten why I was looking for one of these, but I know I did want one ;) If you have 3 or 4 feeds that you need to combine into one, this provides a very easy way to do that; just set up a userid at Peoplefeeds for that purpose.

Phishing and Inept Banks

John Graham-Cumming asks, ‘Are Citibank crazy?’:

I blogged a while ago about Thunderbird’s phishing filter trapping a seemingly innocent mail. Now, a reader has forwarded to me a genuine email from Citibank that he says was trapped by Thunderbird. I’m not going to reproduce the email here because it contains private details of the user, but it is a valid Citibank message.

Thunderbird thinks it’s a scam because Citibank uses one of the oldest phishing tricks in the book. They have a URL displayed in the message that, when clicked, goes to a totally different URL.

Sadly, this practice has proven to be really quite common in legitimate mail. We’ve investigated using this mismatch as a phish-detection rule in SpamAssassin several times, without much luck. In fact, we’ve had to create a FAQ entry for it: since it’s such a superficially attractive, but ultimately useless, idea, many people have had long discussions about it on our lists!

The companies that produce these false positives in their mails include American Express, Bed Bath & Beyond, Universal Studios, Microsoft, Hilton Hotels — and now Citibank.

A couple of other examples from real mails:

  <a href="http://www65.americanexpress.com/clicktrk/Tracking?
    mid=MESSAGEID&msrc=ENG-ALERTS&url=
    https://www.americanexpress.com/estatement/?12345">
    https://www.americanexpress.com/estatement/?12345</a>

  <A HREF="http://echo.epsilon.com/WebServices/EchoEngine/T.aspx?l=ID">
    https://www.hilton.com/en/ww/email/tab_email_subscriptions.jhtml</A>
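To make the problem concrete, here is a rough sketch of the naive check: compare the host in the visible link text with the host in the href. This is not SpamAssassin's or Thunderbird's actual code; `url_host` and `link_text_mismatch` are made-up names for illustration, and real HTML parsing is skipped entirely.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the host out of an http(s) URL, lowercased.
# Returns undef if the string does not look like a URL at all.
sub url_host {
    my ($url) = @_;
    return $url =~ m{^\s*https?://([^/?#:\s]+)}i ? lc($1) : undef;
}

# Hypothetical check: true (1) if the anchor text looks like a URL whose
# host differs from the host the link really points at.
sub link_text_mismatch {
    my ($href, $text) = @_;
    my $href_host = url_host($href);
    my $text_host = url_host($text);
    return 0 unless defined $href_host && defined $text_host;
    return $href_host ne $text_host ? 1 : 0;
}

# The two legitimate links quoted above, as (href, visible text) pairs.
my @links = (
    [ 'http://www65.americanexpress.com/clicktrk/Tracking?mid=MESSAGEID'
        . '&msrc=ENG-ALERTS&url=https://www.americanexpress.com/estatement/?12345',
      'https://www.americanexpress.com/estatement/?12345' ],
    [ 'http://echo.epsilon.com/WebServices/EchoEngine/T.aspx?l=ID',
      'https://www.hilton.com/en/ww/email/tab_email_subscriptions.jhtml' ],
);

for my $link (@links) {
    my ($href, $text) = @$link;
    printf "%-8s %s\n",
        (link_text_mismatch($href, $text) ? 'MISMATCH' : 'ok'), $text;
}
```

Both genuine links get flagged, which is exactly the false-positive problem: click-tracking redirectors make honest mail look just like the phish.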

By the way, it really is quite impressive for a bank as heavily phished as Citibank to still be making this kind of basic mistake in their mail-outs! It reinforces a point I made in a mailing list posting recently:

As far as I can see, the approach taken by pretty much all banks to their online services is simply too bureaucratic, hide-bound, and fundamentally driven by their marketing departments, to ever cope effectively with phishing. :(

(For what it’s worth, I know Citi have some smart techies working there; but the rest of the company needs to start paying attention to them.)

Optimo vs. Bud Rising

Optimo have a new mix up — the First Hour Mix:

Here’s the fourth in a brief series of mixes where we present something a little different. This mix isn’t really a mix in the conventional sense but rather 17 tracks blended together. To us, the first hour of Optimo, or to be more accurate, the ‘Espacio’ part of Optimo (Espacio) is a vital part of the night. It is our chance to play absolutely what we like without thinking about the dancefloor.

It’s a great mix — certainly not dancy, but some really interesting tracks here. The Optimo guys put together some really great music.

In fact, I went to see them play last Saturday — or, at least, myself and a couple of mates tried to. Supposedly, they were supporting The Juan Maclean at the Bud Rising festival over the weekend, but the show was such a shambles, without anyone having a clue when it started or who was on stage at any time, I’m pretty sure we missed their set entirely.

On top of that, it was EUR20 in, and to add insult to injury, the only lager on sale was Budweiser! I mean, I wouldn’t mind that if the “Bud Rising Festival” deal meant free entrance, but charging 20 squids and then cutting off the supply of decent booze as well, is just a crime.

Ah well, the Filthy Dukes were pretty good at least.

Google Calendar

So I’ve been using this for a few days now — and I’m loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I’ve used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that’s pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn’t go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome…

BT DSL’s Daily Disconnects

Argh! This is what happens every day to my DSL connection, at half past 12:

13 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 2: link down
14 Mon Apr 10 12:26:53 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
15 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 3: link up
26 Tue Apr 11 12:26:46 2006 PP12 -WARN  SNMP TRAP 2: link down
28 Tue Apr 11 12:26:48 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
29 Tue Apr 11 12:26:48 2006 PP12 -WARN  SNMP TRAP 3: link up
38 Wed Apr 12 12:26:56 2006 PP12 -WARN  SNMP TRAP 2: link down
40 Wed Apr 12 12:26:58 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
41 Wed Apr 12 12:26:58 2006 PP12 -WARN  SNMP TRAP 3: link up
50 Thu Apr 13 12:27:00 2006 PP12 -WARN  SNMP TRAP 2: link down
52 Thu Apr 13 12:27:03 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
53 Thu Apr 13 12:27:03 2006 PP12 -WARN  SNMP TRAP 3: link up

Worse than that, it will generally assign a different IP address to the connection when it reconnects! This buggers up any applications that rely on long-lived TCP connections, such as SSH shell logins, tunnels, remote-desktop sessions, and instant messaging; all get disconnected and have to be manually re-set up.

Initially, I thought this might have been a flaky connection. However, it appears not: check out those timestamps. The link drops at almost exactly the same time (12:26 to 12:27) every day, so that’s a scheduled, daily event. Also, there have been no other disconnections apart from those.
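A quick way to check the "scheduled, not flaky" theory is to pull the time of day out of each "link down" line and see how tightly they cluster. This is only a sketch: the regex assumes the log format shown above, and `link_down_seconds` is a name I made up.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# The "link down" lines from the router log above, plus one "link up"
# line to show that other entries are ignored.
my @log = (
    '13 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 2: link down',
    '15 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 3: link up',
    '26 Tue Apr 11 12:26:46 2006 PP12 -WARN  SNMP TRAP 2: link down',
    '38 Wed Apr 12 12:26:56 2006 PP12 -WARN  SNMP TRAP 2: link down',
    '50 Thu Apr 13 12:27:00 2006 PP12 -WARN  SNMP TRAP 2: link down',
);

# Return the time of day, in seconds since midnight, of every
# "link down" entry.
sub link_down_seconds {
    my @secs;
    for my $line (@_) {
        next unless $line =~ /(\d\d):(\d\d):(\d\d) \d{4} .*link down/;
        push @secs, $1 * 3600 + $2 * 60 + $3;
    }
    return @secs;
}

my @secs   = link_down_seconds(@log);
my $spread = max(@secs) - min(@secs);
printf "%d link-down events, spread over %d seconds\n", scalar @secs, $spread;
# prints "4 link-down events, spread over 14 seconds"
```

Four drops on four consecutive days, all inside a 14-second window, is not what line noise looks like.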

A discussion on the IIU mailing list revealed the reason — it seems BT Ireland have a policy of resetting their customers’ connections daily. That could be OK, if they came right back up with the same IP — TCP/IP is designed to cope with that, and generally does — but it does not do that. Instead the IP address is reassigned every single time.

This is turning out to be quite a nuisance. Working over the internet requires quite a few VPN connections, tunnels, and remote logins, and having to set those up again every day is a pain in the neck.

I’m casting around for hacks to get around this. Right now, I have an assortment of jiggery-pokery involving ssh, a shell script ‘while’ loop, and screen(1), but it’s messy and not working out too well. Ideally, I’d set up another VPN (via IPSec or CIPE), and set it up to reconnect on link failure, then route all other VPNs and remote logins out via that — but I don’t have spare routable IPs to do this with. Anyone got any good suggestions?
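For the curious, the loop half of that hack might look roughly like this in perl rather than shell. Treat it as a sketch under assumptions: `keep_running` is a made-up name, the host and forwarded port in the commented-out call are placeholders, and it only restarts the command; it does nothing about sessions that were already in flight.

```perl
use strict;
use warnings;

# Re-run a command every time it exits, pausing between attempts.
# A false max_runs means "loop forever". The 'run' callback is injectable
# so the loop can be tried out without touching the network.
sub keep_running {
    my (%opt) = @_;
    my $run  = $opt{run} || sub { system(@{ $opt{cmd} }) };
    my $runs = 0;
    while (!$opt{max_runs} || $runs < $opt{max_runs}) {
        $run->();
        $runs++;
        sleep $opt{delay} if $opt{delay};
    }
    return $runs;
}

# Dry run with a stand-in for ssh.
my $runs = keep_running(
    run      => sub { print "would (re)connect here\n" },
    max_runs => 2,
);
print "ran $runs times\n";

# Real use, inside screen(1); user, host, and ports are placeholders:
# keep_running(
#     cmd   => [qw(ssh -N -o ServerAliveInterval=30 -L 8080:localhost:80 user@example.com)],
#     delay => 5,
# );
```

The ServerAliveInterval option is there so that ssh notices the dead link and exits, instead of hanging on a connection whose IP address has gone away.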

By the way, it’s worth noting that their FAQ fails to mention this, instead giving some incorrect information about my IP being ‘removed’ when my web browsing session ends:

Is it a fixed IP?

No, the product is set up with dynamic IP Addressing. This means that every time you open your browser you will be allocated a different IP address for the duration of that session. When the session ends the IP Address is removed.

That is incorrect — this has nothing to do with web browsing sessions.

To be honest, I’d prefer not to have to switch ISPs to get away from this brokenness — the rest of the service is quite nice, good pings, good throughput, no other disconnections or outages — but this is quite a problem for someone using BT Broadband for telecommuting purposes. :(

My QuitMeter

I gave up smoking last year on May 26 — that anniversary isn’t too far away. Here’s how much money I’ve saved, courtesy of QuitMeter.com:


QuitMeter Counter courtesy of www.quitmeter.com.

Wow — I could buy myself another iPod! ;)

Software Patenting and “Hot” Fields

Paul Graham’s essay on his experience with software patenting has been making the rounds recently.

Now Kevin Marks has commented. Worth reading, since he demonstrates nicely the kind of crap you see in a ‘hot’ field, such as video (which he worked on with Apple’s QuickTime):

I broadly agree with Paul Graham’s essay on Software Patents, but I do think he underestimates the damage from patent trolls, and from what he calls the mafia-like behaviour of some patent holders. Paul has been lucky in the field he has worked in, but in the Audio and Video area there are many patent thickets. … While I was at Apple on QuickTime, there was a steady stream of patent trolls claiming that Apple should pay them royalties; enough to keep several lawyers busy, and a lot of engineers spending time working on prior art evidence demonstrations. Several potential features were excluded from QuickTime due to patent thickets. The obvious one was the Unisys LZW patent that encumbered GIF, but there were other more subtle pressures that meant adopting open source codecs was discouraged. Working on the patent license agreements for MPEG meant that technology ready to ship was deferred pending legal agreement on more than one occasion.

In my experience, that’s what happens — once a field becomes “hot”, patent trolls and other nuisance “inventors” start appearing en masse, and then you’ve got to waste a lot of time dealing with that crap.

RSS Feeds for Events in Dublin

So, now that I’m back in Dublin, I’ve taken a quick look around for ways to keep up to date on upcoming live gigs — and found that the situation, frankly, sucks. In particular, almost none of the sites are offering RSS or Atom feeds yet.

Having said that, Waxy and Leonard’s Upcoming.org is doing quite nicely for the Dublin metro area:

And lots of credit to the promoter, MCD, who seem to be just about the only Irish listings site offering RSS:

This is fantastic, but — naturally — they don’t cover events put on by their competitors. ;)

Apart from that, it’s pretty shoddy. Lots of late-90s-looking websites out there, and no feeds in sight. Thankfully, Feed43 and some perl scripting are on hand to allow me to take matters into my own hands.

Entertainment Ireland offer a pretty good music news section — but sans feed. Feed43 saves the day:

And, surprisingly, Ticketmaster, of all sites, is turning out to be a great way to find out what’s on in Dublin, listing pretty much all ticketed events in a nice, clean, succinct format. Unfortunately, the highest location resolution it offers for Ireland is the country as a whole. However, this can be worked around by subscribing to individual venues, such as Crawdaddy or The Village. (This has a happy side-effect of narrowing down the types of music — I can skip finding out that The Eagles are playing, since they won’t be playing at Crawdaddy ;)

For some reason, though, Ticketmaster haven’t got around to offering their own RSS feeds. Not a problem — in response I’ve hacked up tm2rss.cgi, a little script which scrapes the venue pages and produces RSS:

For other venues, simply take the numeric venue ID from the venue URL (for example, 198641 in http://www.ticketmaster.ie/venue/198641 for The Village), put it in place of NNNNN in this URL: http://taint.org/scraped/tm2rss.cgi?v=NNNNN , then use the result as the Feed URL in your feed reader.
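If you have a few venues to subscribe to, that substitution is easy to script. A minimal sketch, assuming venue URLs always contain /venue/ followed by the numeric ID, as in the example above; `feed_url_for_venue` is just an illustrative name, not part of tm2rss.cgi.

```perl
use strict;
use warnings;

# Turn a Ticketmaster venue URL into the matching tm2rss.cgi feed URL.
# Returns nothing if no numeric venue ID can be found.
sub feed_url_for_venue {
    my ($venue_url) = @_;
    my ($id) = $venue_url =~ m{/venue/(\d+)} or return;
    return "http://taint.org/scraped/tm2rss.cgi?v=$id";
}

print feed_url_for_venue('http://www.ticketmaster.ie/venue/198641'), "\n";
# prints "http://taint.org/scraped/tm2rss.cgi?v=198641"
```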

A Gotcha With perl’s “each()”

It’s my bi-monthly perl blog entry, to earn my place on planet.perl.org! ;)

Here’s an interesting “gotcha”. Take this code:

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    while(($k,$v)=each %t){print "1: $k\n"; last;}
    while(($k,$v)=each %t){print "2: $k\n";}'

In other words, iterate through all the key-value pairs in %t once, then do it again — but exit early in the first loop.

You would expect to get something like this output:

    1: 1
    2: 1
    2: 3
    2: 2

Instead, you see:

    1: 1
    2: 3
    2: 2

The “1” entry in the second loop is AWOL. Here’s why — as “perldoc -f each” notes:

There is a single iterator for each hash, shared by all “each”, “keys”, and “values” function calls in the program

That’s all “each” calls, throughout the entire codebase, possibly in a different class entirely. Argh.

The workaround: reset the iterator using “keys” between calls to “each”:

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    while(($k,$v)=each %t){print "1: $k\n"; last;}
    keys %t;
    while(($k,$v)=each %t){print "2: $k\n";}'

This got us in SpamAssassin — bug 4829.

To be honest, having to call “keys” after the loop is kludgy — as you can see if you check the patch in bug 4829 there, we had to change from a “return inside loop” pattern to a “set variable and exit loop, reset state, then return” pattern. It’d be nice to have a scoped version of each(), instead of this global scope, so that this would work:

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    { while(($k,$v)=scoped_each %t){print "1: $k\n"; last;} }
    # that each() iterator is now out of scope, so GC'd;
    # the next call uses a new iterator, starting from scratch
    { while(($k,$v)=scoped_each %t){print "2: $k\n";} }'

Scoping, of course, has the benefit of allowing “return early” patterns to work; in my opinion, those are clearer, at the least because they require fewer lines of code ;)
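Something close to this can be faked today with a closure. The sketch below is my own stand-in, not a perl builtin: this `scoped_each` takes a hash reference and returns an iterator sub, so the calling syntax differs from the wished-for version above. It snapshots the keys up front, which costs memory on big hashes and will not see keys added during iteration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a scoped each(): every call returns a fresh,
# private iterator, so abandoning one early cannot affect any other loop.
sub scoped_each {
    my ($hash) = @_;
    my @keys = keys %$hash;    # private snapshot of the keys
    return sub {
        return unless @keys;   # empty list ends the while loop
        my $k = shift @keys;
        return ($k, $hash->{$k});
    };
}

my %t = map { $_ => 1 } qw/1 2 3/;

{
    my $it = scoped_each(\%t);
    while (my ($k, $v) = $it->()) { print "1: $k\n"; last; }
}
# $it is out of scope here, and its state went with it

{
    my $it = scoped_each(\%t);
    while (my ($k, $v) = $it->()) { print "2: $k\n"; }
}
```

The second loop now sees all three keys (in whatever order the hash gives them), and an early "return" from inside the first loop would be just as safe as the "last".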