Month: August 2007

Dublin-area Intro To Open Streetmap

A last-minute notice — the Irish Linux Users’ Group are organising an introduction to Open Streetmap tomorrow:

Open Streetmap : An Intro

The ILUG committee is organising an introduction to the Open Streetmap project on Saturday, 1st September, 2007 in Dublin.

This will include info on how to use your GPS and upload your data to the project, to contribute to a free and open map of the world.

The Hamlet Pub, Balbriggan (N 53.61396 W 6.20608 degrees)

Sat, 1st Sep 2007 2pm ~ 5pm

If you have a GPS and a laptop, please feel free to bring them. Wireless internet is available in the venue.

To register interest, please e-mail chairman-at-linux.ie

Not Cosmo

So, we were all set to name our new arrival Cosmo, assuming it was a boy. We were certain it was going to be a boy. Guess what? It wasn’t… so now we have to narrow down the girl-name shortlist in a hurry!

Isn’t she lovely? Lots more photees at Flickr.

Anyway, I may be hard to get hold of for a while… this lady will be keeping me busy I think ;)

Update: Looks like the name is Beatrice Lily Mason, although there’s still a fair bit of indecision, unfortunately ;)

Update 2: Beatrice Lily Gray Mason. Final answer!

Stupid Unicode Tricks

Cool Unicode trick, via Mantari — cut and paste this character into a Unicode-aware application (like this post’s comment box!), then type something and see what happens:

‫‬‭‮‪‫‬‭‮҉

My Nokia 770

A couple of weeks back, there was quite a bit of buzz in the Irish blogosphere and elsewhere about the Nokia 770; prices for new N770s had dropped from $290ish to a very reasonable $140 / EUR130-ish price-point. I, along with a good few others, bought one.

I bought mine through Expansys, with a free 1GB RS-MMC memory card. They’ve sold out and no longer have any N770s listed; however, Buy.com still seem to have them in stock, so if you’re interested, you can probably still pick one up. (It seems Nokia is trying to sell off their remaining N770 stock, cheap, with plans to drop support for the software platform. I’m fine with this, but it may put other buyers off.)

I’ve now been using it for a while, and am still happy. ;) Here are my recommended top apps:

Slimserver. Originally designed to operate as the backend software for the Squeezebox thin-client MP3 player, this has a fantastic UI built for the N770, and its MP3 stream output works perfectly on the tablet.

This is by far the neatest way to get at a 6000-song music library without a laptop; there was some talk in the GNOME community of making a decent DAAP client, but so far there are no working results there that I could find. :(

maemo-mapper. This is a fantastic mapping app for the tablet; it presents map tiles downloaded from OpenStreetMap or Google Maps in an N770-optimized format, with the usual nice draggable UI. Bonus: it’ll work offline, so you can follow a route while online, then take the tablet along to help navigate.

Tip: once you start maemo-mapper, click the “Download…” button in the “Repository Manager” and it’ll download details for the 5 most useful map repositories, including Google and Virtual Earth.

FBReader. A very nice document reader; much nicer than trying to read long HTML pages in the builtin web browser, especially since it allows you to turn the device on its side.

In general, the Opera Mini browser works fine; be sure to enable JavaScript and set up a swap file on the RS-MMC card first. It does all the basic HTML and rudimentary AJAX; Google Calendar is a no-go, but GMail and even Google Maps work adequately, modulo minor bugs. Plain Old HTML sites like Wikipedia, IMDB and so on all work great.

As long as you’re realistic about the platform, it won’t disappoint — video requires custom transcoding, for example, and proprietary apps like Flash and RealPlayer lag behind their desktop equivalents, but as far as I can tell that’s the case for every embedded platform. (Since I spent a couple of years developing such a platform, I’m quite comfortable with this.)

A really really nifty thing about the N770 is that it’s now entirely hackable: within 30 minutes of powering on, I was able to get a terminal window open with a root prompt, and was adding ext3 partitions to the RS-MMC card. Apps are installed using “apt-get”. The terminal even has a word-completion system optimized for the UNIX command-line – nice ;)

This SomethingAwful thread contains plenty more good tips. I’m happy I bought it — so many of these gadgets can wind up as an overpriced door-stop, but this is easily worth what I paid for it.

Update: this thread at InternetTabletTalk seems pretty chock-full of good advice, too.

Test my auto-generated ruleset

(I posted this to the SA users and dev lists, too.)

I’ve been working on a new way to auto-generate body rules recently (see previous posts). The results are checked into SVN trunk daily in the “rulesrc/sandbox/jm/20_sought.cf” file.

We haven’t had much time to figure out how to produce auto-generated 3.2.x rule updates for our entire ruleset at updates.SpamAssassin.org, so instead of dealing with that, I’ve taken a shortcut around it ;) I’m now making just the “20_sought.cf” ruleset available as a standalone, unofficial sa-update ruleset at sought.rules.yerp.org.

Before using it, you’ll need the GPG key:

  wget http://yerp.org/rules/GPG.KEY
  sudo sa-update --import GPG.KEY

then use this to update:

  sudo sa-update \
        --gpgkey 6C6191E3 --channel sought.rules.yerp.org \
        [...other channels...] \
        --channel updates.spamassassin.org

(similar to how you’d use Daryl’s sa-update version of the SARE rulesets.)

Feel free to run sa-update as frequently as you like.

Please consider it alpha; I may take it down in a few months depending on how it goes, or if we can get it working as part of the core updates. In the meantime though, I’m curious to hear how you get on with it. (In particular, copies of false positives would be very welcome.)

Update: it’s been very successful, so I’d now consider it in production.

The Prime Time Group pump-and-dump

Spamnation.info links to an interesting article by Computerworld’s Gregg Keizer about the massive PRTH.PK spam run.

As usual, there is no shortage of suckers:

The spam blast did drive up Prime Time’s share price from Monday’s low of around 7 cents to Wednesday’s high of 11 cents, a 57% jump. Thursday morning, however, the bottom dropped out, and the stock fell to under 7 cents. Trading volumes peaked Wednesday as well, at around 1.7 million shares, substantially higher than any day in the month prior. “You can actually see the wave of activity in the stock and compare it with the volume of spam that we trapped,” said [Sophos analyst Ron] O’Brien.

But here’s an interesting new tactic by the good guys:

Last Wednesday afternoon, Prime Time announced that it was ordering a Non Objecting Beneficial Owners (NOBO) list to get a clearer picture of who owned its shares. “The NOBO list will be used to determine the naked short positions in Prime Time Group Inc.,” the company said in a statement. “The finding will then be reported to the [National Association of Securities Dealers] to take action against the violators of the naked short regulations.”

“Naked short” is a investment term that refers to selling short, essentially a bet that the price will drop, but with a twist: “naked” means that the investor sells short without first making sure he can borrow the shares from another investor holding a “long” position on the stock.

I hope this works; it’d be great to see the profit mechanism behind pump-and-dump spam killed off.

Spamnation notes:

Incidentally, the greeting card spam that built the botnet used to promote PRTH.PK and CYTV.OB also continues. It has iterated through another couple of generations: the current incarnation tells recipients to collect their custom Musical ecard or custom Movie-quality ecard or other variants on that theme. We’ve seen about 150 of these in the past three days, suggesting that the unknown senders are probably well on their way to building up another botnet for their next stock spam run.

Spreading trojans via greeting-card spam is a trademark of the gigantic Storm botnet, AFAIK: SecureWorks info, MessageLabs info, spam levels causing DDoS for Canadian networks, DDoS threat for EDU sector.

The Haughey 419 returns

A few months back, Blogorrah noted an amazing 419 scam, claiming to be a missive from ex-Taoiseach of Ireland Charlie Haughey’s wife, Maureen. It’s really quite appropriate that Charlie should become the subject of a scam himself, given what he did to this country. But anyway… over the weekend, a new variant on the theme emerged:

From Mrs Maureen Haughey, ROI

My Dear Friend,

I am Maureen Haughey, widow of former Taoiseach of the Republic of Ireland, Charles J. Haughey and daughter of former Taoiseach of the Republic of Ireland and heir to de Valera, Sean F. Lemass.The Press has written a lot about unresolved mysteries and corruption surrounding Charles’s dealings, but I tell you something,my Charlie was a good man. He was human and he did whatever he did.

People marvel why I stuck with Charlie and didn’t speak during the mess that came with the exposure of his affairs with Terry Keane (I just hate to think of her). I had to stand by him through the tribunal times…. it was to do with what I’m doing now. No one knew the details of all Charlie’s financial dealings but me. I remain the only one who knows all who got loans from Charlie and didn’t come back to pay when he was disgraced. I am the only one who knows about these monies and the other Ansbacher accounts.

I write to you, an old weary woman, sick and almost tired of living. My end is near but I will not depart until my final mission is accomplished and I also write this with an unshaken belief in the power of aspirations and dreams of a human being. The Irish government thinks it can shave and reduce me to a poor widow but I have the winning ace. A few years ago, when we weren’t sure if my Charlie would be convicted, he kept some money in trust for me in a Security and Finance company. He did not open the account in our names so it will not be traced to us to enable the past remain the past. The name on the account is Cedric de Vregille. I never thought Charlie would leave me so soon and it never occurred to me to ask if this name were fictitious or not or a name of any of his friends. I have tried to find this man but to no avail. The amount he deposited in this name is 30,000,000 (Thirty Million Euros).

I want an honest person to come forward and lay claims to this amount, moreover to use the funds as instructed by me. I have all the documents needed, I just need a face for the name. I have mapped out 30% of the funds for you, as you will help us (you and I) execute this job.

As soon as I receive your acceptance for this work I shall give you necessary details of my solicitor who will facilitate the release of the funds in your name. Please reply me via my personal email: [email protected]


For my security and the sake of letting sleeping dogs lie, I strongly advice that you keep our dealings confidential. You can read more about my charlie from:

http://www.ireland.com/focus/haughey/ITstories/story11.htm

http://www.teachersparadise.com/ency/en/wikipedia/c/ch/charles_haughey.html

http://www.everything2.com/index.pl?node_id=548983&lastnode_id=0

Thank You.


Message sent using UebiMiau 2.7.2

It was sent via a webmail system at nildram.co.uk, from a proxy in Australia.

The writing is amazingly ornate — ‘I write to you, an old weary woman, sick and almost tired of living’, ‘the Irish government thinks it can shave and reduce me to a poor widow but I have the winning ace’, etc. Very odd stuff. Also, it looks spell-checked. And, once again, poor old cyclist Cedric de Vregille gets dragged into it, too! I wonder what he did to deserve that ;)

If you fancy scambaiting, ‘[email protected]’ is the one to go for. These guys seem to be having a good go of it: ‘The thought of the Irish government trying to shave an old woman has shocked and appauled me, so I will assist in anyway possible.’ Ha!

Rule Discovery Progress Update

Back in March, I wrote a post about a new rule discovery algorithm I’d come up with, based on the BLAST bioinformatics algorithm. I’m still hacking on that; it’s gradually meandering towards production status, as time permits, so here’s an update on that progress.

There have been various tweaks to improve memory efficiency; I won’t go into those here, since they’re all in SVN history anyway. But the results are that the algorithm can now extract rules from 3500 spam and 50000 ham messages without consuming more than 36 MB of RAM, or hitting disk. It can also now generate a SpamAssassin rules file directly, and apply a basic set of QA parameters (required hit rate, required length of pattern, etc.).

On top of this, I’ve come up with a workflow to automatically generate a usable batch of rules, on a daily basis, from a spam and ham corpus. This works as follows:

  • Take a sample of the past 4 days’ traffic from our spamtrap network. Today this was about 3000 messages.

  • Add the hand-vetted spam from my own accounts over the same period (this helps reduce bias, since spamtraps tend to collect a certain type of spam), about 3400 messages.

  • Discard spams that scored over 10 points (to concentrate on the stuff we’re missing).

  • Pass the remaining 3517 spams, and text strings from over 50000 nonspam messages, into the “seek-phrases-in-log” script, specifying a minimum pattern length of 30 characters, and a minimum hitrate of 1% (in today’s corpus, a rule would have to hit at least 34 messages to qualify).

  • That script gronks for a couple of minutes, then produces an output rules file, in this case containing 28 rules, for human vetting. (Since I’ve started this workflow, I’ve only had to remove a couple of rules at this step, and not for false positives; instead, they were leaking spamtrap addresses.)

  • Once I’ve vetted it, I check it into rulesrc/sandbox/jm/20_sought.cf for testing by the SpamAssassin rule QA system.
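
The filtering and QA steps in the list above can be sketched roughly as follows. This is only an illustration, not the “seek-phrases-in-log” script: the BLAST-style extractor is swapped for a naive fixed-length sliding window, and the function names, sample messages and parameter values are all made up for the example.

```python
from collections import Counter


def low_scoring(scored_spams, max_score=10.0):
    """Keep only spam texts that scored at or below max_score."""
    return [text for text, score in scored_spams if score <= max_score]


def candidate_patterns(spams, hams, min_len=30, min_hit_rate=0.01):
    """Return fixed-length substrings that hit enough spams and no ham.

    Hypothetical stand-in for the real extractor, which is not shown here.
    """
    hits = Counter()
    for text in spams:
        # A set ensures each window is counted once per message.
        windows = {text[i:i + min_len] for i in range(len(text) - min_len + 1)}
        hits.update(windows)
    required = min_hit_rate * len(spams)
    return sorted(
        pattern
        for pattern, count in hits.items()
        if count >= required and not any(pattern in ham for ham in hams)
    )


# Toy corpus (invented): (message text, score) pairs plus a few ham messages.
scored_spams = [
    ("Buy Home Networking For Dummies 3rd Edition today", 4.0),
    ("Get Home Networking For Dummies 3rd Edition now!!", 6.5),
    ("Home Networking For Dummies 3rd Edition", 25.0),  # dropped: scored over 10
]
hams = ["Meeting notes about home networking are attached."]

spams = low_scoring(scored_spams)
# With only two spams left, require a pattern to hit both of them.
patterns = candidate_patterns(spams, hams, min_len=30, min_hit_rate=1.0)
```

The output of a step like this would still go to a human for vetting before being checked in, as described above.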

The QA results for the ruleset from yesterday (Aug 3) can be seen here, and give a pretty good idea of how these rules have been performing over the past week or two; out of the nearly 70000 messages hit by the rules, only 2 ham mails are hit — 0.0009%.

In fact, I measured the ruleset’s overall performance in the logs provided by the 4 mass-check contributors who provided up-to-date data in yesterday’s nightly mass-check: bb-jm, jm, daf, dos, and theo (all SpamAssassin committers).

  Contributor    Hits     Spams   Percent
  bb-jm          4249     24996    17.00%
  jm             3450     14994    23.00%
  daf            1236     35563     3.48%
  dos           32867    100223    32.79%
  theo          28077    382562     7.34%

(bb-jm and jm are both me; they scan different subsets of my mail.)

The “Percent” column measures the percentage of their spam collection that is hit by at least one of these rules; it works out to an average of 16.72% across all contributors. This is underestimating the true hitrate on “fresh” spam, too, since the mass-check corpora also include some really old spam collections (daf’s collection, for example, looks like it hasn’t been updated since the start of July).
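
For anyone who wants to check the arithmetic, here is a small sketch that recomputes the figures from the table. The numbers are copied from the table; the variable names are just for illustration. The 16.72% figure matches the plain (unweighted) mean of the five per-contributor percentages, which is a different number from the pooled rate you get by dividing total hits by total spams.

```python
# (hits, spams) per contributor, copied from the table above.
results = {
    "bb-jm": (4249, 24996),
    "jm": (3450, 14994),
    "daf": (1236, 35563),
    "dos": (32867, 100223),
    "theo": (28077, 382562),
}


def percent(hits, total):
    """Percentage of `total` messages hit by at least one rule."""
    return 100.0 * hits / total


per_contributor = {name: percent(h, s) for name, (h, s) in results.items()}

# Plain average of the five percentages (each contributor weighted equally).
unweighted_mean = sum(per_contributor.values()) / len(per_contributor)

# Pooled rate: every message weighted equally, so large corpora dominate.
total_hits = sum(h for h, _ in results.values())
total_spams = sum(s for _, s in results.values())
pooled = percent(total_hits, total_spams)

print(f"unweighted mean: {unweighted_mean:.2f}%")
print(f"pooled rate:     {pooled:.2f}%")
```

The pooled rate comes out lower because theo’s corpus is by far the largest and has one of the lower hit rates.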

Even better, a look at the score-map for these rules shows that they are, indeed, hitting the low-scoring spam that other rules don’t hit.

That’s pretty good going for an entirely-automated ruleset!

The next step is to come up with scores, and publish these for end-user use. I haven’t figured out how this’ll work yet; possibly we could even put them into the default “sa-update” channel, although the automated nature of these rules may mean this isn’t a goer.

If you’re interested, the hits-over-time graph for one of the rules (body JM_SEEK_ICZPZW / Home Networking For Dummies 3rd Edition \$10 /) can be viewed here.
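
To give a feel for what a generated rule like that one matches, here is a rough Python approximation of the pattern. SpamAssassin itself evaluates the rule as a Perl regular expression against its own rendering of the message body, so treat this purely as an illustration; the sample strings are invented.

```python
import re

# The pattern from the rule above; "\$" matches a literal dollar sign,
# and the leading and trailing spaces are part of the pattern.
rule = re.compile(r" Home Networking For Dummies 3rd Edition \$10 ")

spammy = "Get Home Networking For Dummies 3rd Edition $10 today"
innocent = "I borrowed Home Networking For Dummies 3rd Edition from the library"

matches_spammy = rule.search(spammy) is not None
matches_innocent = rule.search(innocent) is not None
```

The long, specific phrase is what keeps the false positive rate down: ordinary mail that mentions the book is unlikely to include the exact price string as well.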