TREC Spam Corpus

Some news from TREC’s Gordon Cormack:

The TREC 2005 Corpus (92,000 messages - 42,000 ham; 50,000 spam) is now available for self-serve download.

TREC Spam Evaluation is a NIST program to develop methods to measure spam filter accuracy and performance. More details here.

The corpus can be picked up at Gordon’s site. As far as I can tell, this should be a pretty solid corpus for spam researchers and developers.
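For anyone planning to use the corpus, the most basic accuracy figures come down to two error rates. The first is the share of real mail (ham) wrongly flagged as spam. The second is the share of spam that slips through as ham. Here's a minimal Python sketch of those two numbers, assuming plain "ham"/"spam" label strings. The function name and label format are illustrative only, not the TREC tools or file format, and the track's own evaluation goes further than this (ROC-based measures, for instance).

```python
# Minimal sketch: two basic spam-filter error rates.
# ASSUMPTION: labels are the plain strings "ham" and "spam"; this is not
# the TREC corpus index format or the official TREC evaluation toolkit.

def misclassification_rates(gold, predicted):
    """Return (ham_misclass_pct, spam_misclass_pct) for paired label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    ham_total = spam_total = ham_wrong = spam_wrong = 0
    for truth, guess in zip(gold, predicted):
        if truth == "ham":
            ham_total += 1
            if guess == "spam":
                ham_wrong += 1  # false positive: real mail lost to the spam folder
        elif truth == "spam":
            spam_total += 1
            if guess == "ham":
                spam_wrong += 1  # false negative: spam reached the inbox
    ham_pct = 100.0 * ham_wrong / ham_total if ham_total else 0.0
    spam_pct = 100.0 * spam_wrong / spam_total if spam_total else 0.0
    return ham_pct, spam_pct


if __name__ == "__main__":
    gold = ["ham", "ham", "ham", "ham", "spam", "spam"]
    predicted = ["ham", "spam", "ham", "ham", "spam", "ham"]
    print(misclassification_rates(gold, predicted))  # (25.0, 50.0)
```

Keeping the two rates separate, rather than reporting one overall "accuracy" number, matters because losing a real mail is usually far more costly than letting one spam through.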

Slurpie

Web: Slurpie - (another) distributed peer-to-peer downloading protocol (via HtP).

This looks pretty interesting: no special file server is required, since Slurpie can download files from an ordinary HTTP/FTP server in a ‘swarming’ fashion similar to BitTorrent.

However, Slurpie does require a central server of its own, which clients need to ‘know about’ somehow in advance, and that server will then know who’s downloading what. I’m not sure how you’d distribute that server’s address effectively; a .torrent-style file containing both the ‘main’ file URL and a URL for the Slurpie server might work better.
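To make that .torrent-style idea concrete, here's a minimal Python sketch of such a descriptor, assuming JSON as the container format. The format, the class name `SwarmDescriptor` and its field names are all hypothetical. Slurpie defines nothing like this, and the example.com URLs are placeholders.

```python
# HYPOTHETICAL descriptor format: not part of Slurpie. It only shows the
# two URLs a client would need bundled together in one small file.
import json
from dataclasses import dataclass, asdict


@dataclass
class SwarmDescriptor:
    file_url: str         # the 'main' file on an ordinary HTTP/FTP server
    coordinator_url: str  # the Slurpie server that tracks who is downloading what

    def to_json(self) -> str:
        """Serialise to a small JSON document (keys sorted for stable output)."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SwarmDescriptor":
        """Parse a descriptor previously written by to_json()."""
        data = json.loads(text)
        return cls(file_url=data["file_url"], coordinator_url=data["coordinator_url"])


if __name__ == "__main__":
    desc = SwarmDescriptor(
        file_url="http://example.com/pub/big.iso",
        coordinator_url="http://slurpie.example.com:8080/",
    )
    text = desc.to_json()
    print(text)
    assert SwarmDescriptor.from_json(text) == desc  # round-trips cleanly
```

A client handed this file would fetch from `file_url` as usual and register with `coordinator_url` to join the swarm, so nobody has to know about the Slurpie server in advance.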

Using Subversion With Fedora Core 1

Linux: If you use Fedora Core 1, here’s a yum stanza to download and install Subversion.

Add these lines to /etc/yum.conf:

  [subversion]
  name=Subversion at Summersoft
  baseurl=http://summersoft.fay.ar.us/pub/subversion/bin/subversion-latest/fedora

Run (as root):

yum install subversion

That’s it! svn will now be kept updated using yum.

Record business protects Irish and British consumers

Music: … from CDWow selling us cheap CDs. Paddy forwards on the news — ‘CDWow.ie will now charge EUR 3 on every CD sold from their Irish site. And they wonder why people download music illegally…’

It seems that IRMA and the BPI joined forces in this case against CDWow, so the decision affects Ireland too. The record industry are very happy: ‘it is not the consumer that will suffer, just CD Wow’s profit margins.’ It’s not entirely clear how the consumer doesn’t suffer from a 3 Euro surcharge, but I’m sure they have it all worked out.

Globalisation where it suits the producers, rather than the consumers, is the name of the game here.

More at The Register.

(Thanks, Paddy!)

Audio Lunchbox

Music: Audio Lunchbox — let’s just quote the key parts of the FAQ:

  • Audio Lunchbox is the premiere digital download destination for the best new independent music.
  • ALL of the music on Audio Lunchbox is DRM-free. There are no technology imposed usage restrictions on the files you download. You can listen to the files you download however you like as long as it’s for your own personal use.
  • Every track on Audio Lunchbox is available in two formats: MP3 and Ogg Vorbis.
  • Browsers known to work with our service include Internet Explorer, Netscape, Mozilla, Opera, Safari, Galeon, Epiphany and Konqueror.
  • Anyone in the world can download tracks from us.

Good answers!

The music isn’t quite there yet — all I can find is current LA favourites, Death Cab for Cutie, but I can wait. For now, it’ll go alongside Epitonic as a good source of decent MP3s; and I hope the selection builds up well…

Getting Postfix to use an SSH tunnel for outgoing SMTP

Given all the fuss over blocking dynamic IPs due to spam, I’ve long sent outgoing SMTP via my server (which lives on a static IP). I download my mail from that server using fetchmail over an SSH tunnel, and have done for a while. It’s very reliable, and that way it really doesn’t matter where I download from, which is quite neat. It also means I don’t have to futz with SMTP AUTH, IMAP/SSL, Certificate Authorities, or any of the other hand-configured, complex PKI machinery required to use SSL for authentication.

However, I’ve been using plain old SMTP for outgoing traffic, by just poking a hole in the access db for the IP I’m on. A bit messy and generally not-nice.

So I decided to make it sensible and deliver using SMTP-in-an-SSH-tunnel. In the same SSH tunnel, in fact ;) With Postfix, it turned out to be very easy. Here’s how to do it:

Add this option to the SSH commandline in the SSH tunneling script (I’m presuming you have one ;):

-L 8025:127.0.0.1:25

That’ll port-forward port 25 on the remote system to port 8025 on localhost, so that if a connection is made to port 8025 on localhost, it’ll talk to port 25 on the remote host. Std SSH tunneling there.
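Before pointing Postfix at the tunnel, it's worth checking that something SMTP-shaped actually answers on local port 8025. Here's a minimal Python sketch, assuming the tunnel is already up. It connects, reads the server's first line, and checks for the 220 reply code an SMTP server sends when a client connects. The function names and the 5-second timeout are illustrative choices, not part of SSH or Postfix.

```python
# Minimal tunnel check. ASSUMPTION: the ssh session with
# "-L 8025:127.0.0.1:25" is running; names and timeout are illustrative.
import socket


def smtp_greeting(host: str = "127.0.0.1", port: int = 8025, timeout: float = 5.0) -> str:
    """Connect, read the server's first line, and return it without CRLF."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            line = reader.readline()
    return line.decode("ascii", errors="replace").rstrip("\r\n")


def tunnel_looks_ok(host: str = "127.0.0.1", port: int = 8025) -> bool:
    """True if the far end greets us with an SMTP 220 reply code."""
    try:
        return smtp_greeting(host, port).startswith("220")
    except OSError:
        return False  # refused or timed out: the tunnel is probably down


if __name__ == "__main__":
    print("tunnel OK" if tunnel_looks_ok() else "nothing answering on port 8025")
```

If it reports nothing answering, the SSH session has most likely dropped, and outgoing mail will queue in Postfix until the tunneling script brings it back.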

Now for Postfix — add this to /etc/postfix/main.cf:

default_transport = smtp:localhost:8025

This means that Postfix will always use SMTP to localhost on port 8025 for any non-local deliveries.

Run service postfix reload (cough, Red Hat-ism) and that’s it! A whole lot easier than I was expecting… Postfix rocks.

EMusic is dead

Music: All good things must come to an end. EMusic has been bought out by some bunch called ‘Dimensional Associates’, and will no longer offer its excellent unlimited-download service; instead you’re limited to a measly 40 MP3s per month. (For context: the last time I downloaded some listening material was on Monday, and I picked up about 80 MP3s in a single sitting.)

They’ve shut down their message boards; third-party discussion groups are filled with wailing and gnashing of teeth; and worst of all, I can’t even download the remaining stuff on ‘My Stash’ (the downloads-to-do list), because their download servers are overrun with rats deserting the sinking ship. (No reflection on the rats; I’m one myself.) Either that, or they’ve just turned the servers off, which is annoying, as I had lots of music lined up to download when I got a chance.

This is very bad news — Apple’s iTunes is full of crappy music, Mac-only, and DRM-crippled; Rhapsody is Windows-only and DRM-crippled; there’s really no other legal MP3-download option.

I guess I’ll just have to go back to buying 1 or 2 CDs every few months when I’m buying stuff from Amazon (which I do nowadays anyway, in addition to EMusic) and just listening to the radio in general instead.

Thanks anyway, EMusic, for introducing me to, helping me get into, or helping me rebuild my collection of such great music as:

  • Ladytron
  • Lemon Jelly
  • Belle and Sebastian
  • TRS-80
  • Yo La Tengo
  • Pepe Deluxe
  • Layo And Bushwacka
  • Asian Dub Foundation
  • The Pixies
  • Stereolab
  • Johnny Cash
  • Future Sound of London
  • Freq Nasty
  • Matmos
  • Cornershop
  • Thievery Corporation
  • Cocteau Twins

It was great while it lasted.

Ah well, I guess I’ll save a tenner a month, which I can put towards the GameFly subscription…

Download Caps: Pay To Receive Viruses

Many broadband providers outside the US impose a download cap: a limit on how much data a customer can download in one month. For some Irish ISPs, it’s 3GB of data per month, with hefty per-MB charges after that.

Well, here’s something. I filter my mail for viruses and spam on my server, and divert the viruses off to a side folder. I just checked, and that folder contains 1 gigabyte of virus data, received since SoBig.F started up last week.

Most users don’t have a colocated server to divert their viruses to, so they would have had to download that 1 gigabyte of virus mail before their virus scanner got a look at it. That’s a hefty third of a 3GB download cap gone, due to a virus.

I wonder if Eircom, Telstra down under, and the other capping ISPs will be giving their customers refunds as a result?

(BTW, by contrast, I only received 10 megs of spam.)
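The cap arithmetic, spelled out as a quick Python sketch. The 3GB cap, the ~1GB of SoBig.F mail and the ~10MB of spam all come from this post. Treating 1GB as 1024MB is an assumption (ISPs don't always count that way), though it doesn't change the "a third" result, since cap and usage are measured in the same units.

```python
# Back-of-the-envelope: share of a monthly download cap eaten by unwanted mail.
# Figures from the post; ASSUMPTION: binary units, 1 GB = 1024 MB.

MB_PER_GB = 1024


def cap_share(used_mb: float, cap_gb: float) -> float:
    """Fraction of the monthly cap consumed by used_mb megabytes."""
    return used_mb / (cap_gb * MB_PER_GB)


if __name__ == "__main__":
    cap_gb = 3
    virus_mb = 1 * MB_PER_GB  # ~1 GB of SoBig.F mail
    spam_mb = 10              # ~10 MB of spam over the same period
    print(f"virus mail: {cap_share(virus_mb, cap_gb):.1%} of the cap")  # 33.3%
    print(f"spam:       {cap_share(spam_mb, cap_gb):.1%} of the cap")   # 0.3%
```

In other words, over the same week the virus traffic outweighed the spam by roughly a hundred to one.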

EMusic again

So I’ve signed up for EMusic. Just my luck — with perfect timing, they’ve instituted a new download policy, whereby one has to use a proprietary download application — and it doesn’t work on Red Hat versions after 7.3; to quote their install instructions:

The Linux version of the Download Manager 2.0 was developed for Red Hat 6.2, 7.3 and Mandrake 8.1. Any flavors of Linux outside of these may not support the EMusic Download Manager 2.0. If you are having issues, we recommend that you switch your Linux flavor or OS in order to download with the EMusic Download Manager 2.0.

There are two workarounds: use the Red Hat 7.3 shared libraries for the system libc and libnss, as described by John Anderson of genehack.org here; or, apparently, use a local proxy, as long as you give the emusicdlm app the proxy’s IP address rather than its hostname.

I’m conflicted now; I was about to go recommending this service to all and sundry, but

  • it really makes the Linux version a hell of a lot harder to run (I hope they fix that, at least). Previously, it was simply ‘right click to download’, which is insanely easy.

  • more worryingly, in my experience this kind of ‘tightening up’ is often symptomatic of a company running out of cash and spiralling ’round the plughole. :(

On the good side, once I downloaded and set up the genehack hack^Wworkaround, it’s now working perfectly.

I’ve just downloaded an album from their service in about 3 minutes (at 400Kb/s), first try, and the tracks are all crystal-clear VBR MP3s. Now that’s nice…

(PS: -1 for whichever glibc genius decided to change the libnss API incompatibly.)

1.4 gigabits per second

Take a look at the BitTorrent bandwidth graphs if you get a chance. The BitTorrent release of Red Hat 9 resulted in a nice smooth ramp up to 1.4 gigabits per second of download traffic, which has been trailing off slowly over the following 20 hours… wow.
