
Month: January 2006

Weblog Spam and Adversarial Classification

Dr. Dave, author of the Spam Karma WordPress antispam plugin, has posted an interesting article about new weblog-spammer tactics:

These spams do not present most of the idiotic traits of their lower colleagues: they do not try cramming hundreds of URLs or inserting hundreds of easily spotted junk keywords in the comment content. Instead, they use only the dedicated name and homepage fields to sneak in spam URL and keywords. The comment content is often perfectly innocuous, sometimes even topical (by copying parts of another comment or a trackbacking post). All in all, these spams could easily be missed by a human moderator who wouldn’t look carefully at the contact name and URL.

(Thanks to Kelson Vibber for the pointer to this.)

In other words, he is noting what we noticed in email anti-spam: what works well one year is likely to degrade over time as spammers attempt to evade it, and one has to keep working to keep up.

The best term for this appears to be adversarial classification. Anti-spam activities fall into this category, and it often means that classic text classification algorithms aren’t suitable — after all, the Reuters-21578 dataset never tried to evade your classifier ;)

In a similar vein, this MS research paper is interesting:

Previous work on adversarial classification has made the unrealistic assumption that the attacker has perfect knowledge of the classifier. …. We present efficient algorithms for reverse engineering linear classifiers with either continuous or Boolean features and demonstrate their effectiveness using real data from the domain of spam filtering.

It’s akin to John Graham-Cumming’s work on how a spammer could get past a Bayesian filter “from the outside”, but with more techniques, and it examines MS’s MaxEnt algorithm, too. PDF here, well worth a read.
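To make the idea concrete, here’s a minimal sketch of reverse-engineering a linear classifier purely by submitting messages and watching the verdicts. It’s a simplified illustration of the general query-based approach, not the algorithm from the paper. It assumes continuous features, a classifier of the form “spam if w·x + b > 0”, and one message the attacker already knows gets through as ham. The function names (crossing_distance, reverse_engineer) are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# The attacker's only access: submit a feature vector, learn the verdict.
Oracle = Callable[[Sequence[float]], bool]  # True means "classified as spam"


def crossing_distance(oracle: Oracle, x: Sequence[float], i: int, direction: int,
                      max_step: float = 1e6, rel_tol: float = 1e-10) -> Optional[float]:
    """Signed distance along feature i from ham point x to the decision boundary.

    Returns None if the verdict never flips within max_step. A linear
    classifier's spam region is a half-space, so along any ray the verdict
    flips at most once, which is what makes bisection valid here.
    """
    def moved(t: float) -> List[float]:
        y = list(x)
        y[i] += direction * t
        return y

    if not oracle(moved(max_step)):
        return None
    lo, hi = 0.0, max_step  # invariant: moved(lo) is ham, moved(hi) is spam
    while hi - lo > rel_tol * hi:
        mid = (lo + hi) / 2
        if oracle(moved(mid)):
            hi = mid
        else:
            lo = mid
    return direction * (lo + hi) / 2


def reverse_engineer(oracle: Oracle, x_ham: Sequence[float],
                     max_step: float = 1e6) -> Tuple[List[float], float]:
    """Recover (w, b), up to a positive scale factor, for 'spam iff w.x + b > 0'."""
    if oracle(x_ham):
        raise ValueError("x_ham must be classified as ham")
    weights = []
    for i in range(len(x_ham)):
        t = crossing_distance(oracle, x_ham, i, +1, max_step)
        if t is None:
            t = crossing_distance(oracle, x_ham, i, -1, max_step)
        # The score at x_ham is some s < 0. Crossing at t means s + w_i * t = 0,
        # so w_i = -s / t; dividing all weights by |s| gives w_i' = 1 / t.
        weights.append(0.0 if t is None else 1.0 / t)
    # After rescaling, the score at x_ham is exactly -1, which pins down the bias.
    bias = -1.0 - sum(w * v for w, v in zip(weights, x_ham))
    return weights, bias
```

Each feature costs a few dozen bisection queries, which is why a filter that answers unlimited probes is effectively leaking its own model. Weights too small to flip the verdict within max_step come back as zero.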

(By the way, I’m in the process of moving house, so if you send me an email, it may take a while for me to reply. That’s likely to be the case for the next few weeks, for what it’s worth. Fun.)

Raw Food Crackpottery

Via RobotWisdom, a review of a new Primrose Hill cafe:

No wheat. No gluten. No sugar. No GMO. No dairy. No yeast. No shoes.

Yep, no shoes. If you want to enjoy the detoxifying glories of London’s first raw-food cafe, then please leave your clod-hoppers at the door, along with your high stress levels and your smart-arse scepticism.

I know of another cafe elsewhere which also offered a largely-raw menu. This one, however, shared a back alleyway with a shop where a friend of mine worked.

He noted that on several occasions, he’d seen rats near, or on, the pallets of plastic-wrapped fruit and vegetables. You see, the raw food was delivered to the kitchen door, where it lay outside for a short while, in the rat-infested alleyway. Rats crawling over your food, naturally, is not a good thing.

There’s a very good reason why some smart stone-age ancestor invented cooking our food — because it kills the germs that’ll make us sick!

Devotees claim that because the enzymes are destroyed when food is heated above 48C, our bodies have to utilise our own enzymes to break down the food, which can result in us feeling tired and run-down.

Yeah, devotees are pretty much talking crap there. ;) If anything, cooked food is easier to digest than raw. And good luck with the whole ‘getting by without using enzymes’ thing!

What a load of quackery.

Happy Spam-Solved Day!

Happy BillG-Scheduled Spam Solved Day!

“Two years from now, spam will be solved,” Microsoft’s Bill Gates said [at the 2004 World Economic Forum in Switzerland].

So is it? Weeeeell…..

To “solve” the problem for consumers in the short run doesn’t require eliminating spam entirely, said Ryan Hamlin, the general manager who oversees [Microsoft]’s anti-spam programs. Rather, he said, the idea is to contain it to the point that its impact on in-boxes is minor.

In that way, Hamlin said, Gates’ prediction has come true for people using the right tactics and advanced filtering technology.

Ha. I am reminded of ‘weapons of mass destruction-related program activities’.

As one slashdotter says, ‘when you fail, try try again; or conversely, change the requirements and make it look like a success, which is exactly what BG has done.’

It’s not washing, though, unsurprisingly. The poll on the same page asks ‘do you agree with Microsoft’s contention that the spam problem has been “solved”?’ Right now, with 1169 votes, it has 7.2% (in other words, the MS employees) agreeing, and a whopping 92.8% not going for it.

SweetheartsConnection.com – Interesting Dating Scam

Here’s an interesting online scam. An anonymous friend, working in anti-spam, writes:

‘I’ve been covertly looking into rumours of a myspace scam and thought you might like to blog it – I don’t want to be attached to this in any way otherwise I’d write about it myself (I have a profile on there that I want to keep around in case other scams show up, but I don’t really want to advertise the profile).

It works like this:

You sign up for a myspace account and fill in your profile details. Then in a couple of days someone contacts you pretending they’re using their friend’s account because they haven’t signed up yet. They say something along the lines of “I saw your profile and thought you were cute, if you’re interested email me at (random)@yahoo”. If you email them, you get a reply back being all bubbly and cute, and a link to a web page that sort of looks like a “My First Homepage” – it even says “I’m taking a course at the community college in HTML”. There are pics on the page of a very cute girl, but at the bottom a teaser saucy picture in lingerie, and an Adult Pass signup to get more pics. Of course the signup is $40.

It’s a subtle scam, but definitely a scam. Here’s an example of the type of site you get sent to:

http://www.honesthost5mb.com/kristenssite/

Note the hosting service. Now delete the /kristenssite/ part and it looks legit, right? Until you click on a few links and realise they have nothing to sell.

Google has no knowledge of honesthost5mb – nobody links to them, so how did Kristen find them?

It’s indeed quite funny that there’s a terribly similar hosting service out there: http://www.jagflyhosting.com/ – yet for some reason all their links seem to work, and they have an accessible phone number. Shock. Horror!

I’m pretty sure the account being (ab)used on myspace is a stolen one – it looks pretty legit, including linked in friends and comments, so I’m suspecting a cracked password.

Anyway, thought you could blog this to warn others about it (feel free to advertise the above link – though I guess that’ll ruin the whole “google doesn’t know” thing ;-) I wish I had the guts to sign up for the extra pics to see what you end up with!’

They also passed on the email content, noting ‘here’s the email sent from yahoo webmail from an AOL account (sadly AOL proxies all web content so I can’t track it any further than New York proxies)’:

Hi [redacted] ! Hey you found me! I was a little worried you wouldn’t be able to :P so, how are you? I’m ok.. I’m sneaking a email in at work before my boss comes back in, so sorry if it’s a little short! I promise to write more later :)

So I promised you some pics:P well I will have to send you some of me when I get home (don’t have the pics here at work). In the meantime you can check out my personal homepage. It’s kind of playground while I’m taking this intro to HTML class, kind of like my blog page. Here is the link: http://www.honesthost5mb.com/kristenssite It’s not much yet but it’s getting there. hehe

So tell me more about yourself, are you a work to live or live to work kinda person? What are you looking for in a girl? Do you like myspace? I think I’ll make a profile soon, it’s free right? and you can add your own HTML? That would be cool.. So how is your 2006 going? Mine is ok, one thing I’m excited about though is that today is exactly 1 week before my birthday. Hey, maybe if we hit it off, we can go on a first date on my birthday, that would be really cool. :)

Anyways, enough with the 20 questions right? oh, I prefer to chat on IM, its more personal you know? Do you have AIM? im kriskat224 on there, msg me sometime ok?

Well I should log off and get some work done.. Write back soon! and take care!

xoxo ~ Kristen

Sure enough, a little further research on Google yields the following examples…

The earliest is this story at Jiveworld.net, of 2004-05-24, noting:

Aaron recently received an e-mail from someone he supposedly chatted with on Match.com:

Aaron: I had actually been chatting with someone I might have met there a LONG time ago. I couldn’t remember, so I gave her the benefit of the doubt. I thought it was SPAM, but hey, even my own e-mails sounds like SPAM sometimes. She sent me a picture in her e-mail, but the mail service she was using didn’t like it. So she sent me the link to her “website.” It initially seemed like a real personal web space until the big ADULT BUREAU logo appeared. Oh yes, very legitimate.

This was a unique experience for me since someone actually wrote a tailored response to my e-mail, responding to specific things I had mentioned. Even though the bulk of the e-mail seemed form generated, this had to have been a time intensive process for damn near no return. Well, after the ADULT thing, I thought my response to her e-mail was inventive. Since I haven’t received another response, it’s obvious she (Or he) took the hint.

Another: a thread at FordPower.net, 2004-09-24, with a link to http://www.4mbwickedweb.com/sites/melissa/ (since expired);

Another: a Fark thread posting, 2005-01-28, scroll down to the posting of ‘2005-01-28 10:42:28 AM’ by ‘XavierCrutch’, linking to http://www.stepstonehost.com/jesshomepage/ (since expired);

Another: this weblog post, scroll down to March 13, 2005, ‘Personal ads and the great porn conspiracy’, where the poster is snared, via IM with AIM user natkat224 this time, and is sent another link to a site using http://adultbureau.sweetheartsconnection.com/ to collect the $40 fee;

Another: another weblog post, 2005-10-28.

A Google search for the AIM username ‘natkat224’ reveals plenty more hits.

So here’s a list of the sites found so far, from those links and via Google:

The common host, at all stages, is ‘SWEETHEARTSCONNECTION.COM’, registered to:

INTERTRANS TRADING OVERSEAS LIMITED
VASILEOS OTHONOS 21, FANEROMENIX COMPLEX, OFFICE 102, 6030 LARNACA
N/A
N/A, CA N/A
CY

Lots more detail here. SweetheartsConnection.com has terms and conditions that appear to prohibit spamming, but it turns out that the company itself has a pretty scary entry at RipoffReport.com anyway, noting:

If you want a free LIFE TIME PASSWORD with Adult Bureau.. you have to apply for a 1 month membership @$39.95 to Sweetheartsconnection.com A DATING SERIVCE ….. charge appears as IT INTERNET SERVICES.

No matter if you request cancellation of service this company will continue to bill you ” it gets better ” then send you to there home made collection company ” Secure debt collections, ” two companies in one both fraud

Phony Notices will be sent to the home demanding final payment of a service NEVER USED. They will contact you, try intimidate you into paying a Balance of $200.00 (Sweetheartsconnecton.com automatically rebills your credit card every month @$39.95.

eek.

This weblog post, of 2005-10-28, is shaping up to be the canonical support group for victims of this scam; the comments there are worth reading.

Quite a scam, and interesting to note the “personal touch” via email and IM.

The C=64-izer

Ever wondered what today’s internet meme images would look like on mid-’80s home computing hardware?

Wonder no longer!

What Works in Software Development

I already posted this to the link-blog yesterday, but it’s so good it’s worth promoting more widely. If you write software for a living, you really ought to read the slides for Michael Schwern’s excellent ‘What Works In Software Development’ talk.

It’s a long presentation (108 slides!), but during the course of that, he covers:

  • effective teamwork
  • dealing with bad customers
  • dealing with bad management
  • classic coding mistakes
  • classic project management mistakes
  • classic design mistakes
  • test-driven development
  • refactoring
  • patterns

It’s a really good synthesis of what I think are the best bits of good OO design, XP, CPAN, and Perl’s design and coding styles, without most of the cruft. I’ll be pointing people at this for years to come, I think…

(Found via yoz.)

Planet Antispam: Beta No More

Planet Antispam has been working pretty nicely for the last couple of weeks — can’t say I’ve noticed any trouble, and its RSS feed is turning out to be a nice aggregation of anti-spam news. On top of that, John Levine was kind enough to set up a CNAME for it at a more appropriate URL — http://planet.spam.abuse.net/.

As a result, it’s now fully-fledged, and fit to lose the ‘beta’ qualifier. Please bookmark, subscribe to the feeds, and pass on the URL to others you think may be interested!

Moving Home — De-Cluttering

I’m moving home.

The flights are booked — Feb 14th, Valentine’s Day, I’ll be leaving Orange County and heading back to Dublin permanently. In the meantime, I’ve been selling stuff, throwing stuff out, decommissioning servers, and making backups.

The server

My erstwhile desktop, later my trusty back-room server, ‘jalapeno’, was sold earlier today. Thankfully, I bought a 250GB hard drive recently, so I actually had the room to back up its 70GB somewhere beforehand.

Being security-conscious, I overwrote its partitions using pseudo-random data before passing it on (‘dd if=/dev/urandom of=/dev/hda9 bs=1024k’). However, being lazy, I did this while the machine was up and running, over an SSH link.

Watching as ‘df’ produced gibberish output, and as later commands started producing nothing but bus errors, was odd: a very strange feeling, actively destroying the disk’s data like that. Here’s hoping the backups worked.

The yard sale

We had one, putting about $1000 worth of IKEA furniture, books, camping equipment, bits of hardware, sports equipment, and a pink xmas tree up for sale.

The local bargain hunters started knocking on the door at 8:15am, despite the sign’s posted start time of 9am. Once we did start bringing items out to the front lawn to sell, there were already about 10 people, which quickly swelled to a mob of 20 by 8:45am. They were keen!

By the end of Saturday, we had sold pretty much all the furniture, all of the sports and camping equipment, most of the hardware that wasn’t total crap, and only 2 of the books. One shopper’s explanation: ‘she didn’t have the time to read books’.

Still, the yard sale has netted $345. Not bad, and a good feeling to de-clutter so successfully.

Music, and iPod Shuffle

I’ve realised I like the endings of songs; whether I like a song or not depends entirely on how it ends.

Apple’s iPod shuffle algorithm is incredible. I’ve been spending quite a bit of time listening to it, and I’m sure it’s not random; I think it’s picking next tracks based partly on the similarity of metadata between the current and candidate tracks, which is quite neat as an automated mixing technique.
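For what it’s worth, here’s a sketch of the kind of picker I suspect is going on. This is pure speculation on my part, not Apple’s algorithm: each next track is drawn at random, but weighted by how many metadata fields (artist, album, genre) it shares with the current one. Track, similarity() and biased_shuffle() are hypothetical names, and the weighting formula 1 + bias × similarity is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Track:
    title: str
    artist: str
    album: str
    genre: str


FIELDS = ("artist", "album", "genre")


def similarity(a: Track, b: Track) -> float:
    """Fraction of the metadata fields in FIELDS that two tracks share (0.0 to 1.0)."""
    return sum(getattr(a, f) == getattr(b, f) for f in FIELDS) / len(FIELDS)


def biased_shuffle(tracks: Sequence[Track], bias: float = 4.0,
                   rng: Optional[random.Random] = None) -> List[Track]:
    """Play every track once, weighting each pick by 1 + bias * similarity(current, candidate).

    With bias=0 every remaining track is equally likely: a plain random shuffle.
    """
    rng = rng or random.Random()
    remaining = list(tracks)
    order: List[Track] = []
    while remaining:
        if order:
            current = order[-1]
            weights = [1.0 + bias * similarity(current, t) for t in remaining]
        else:
            weights = [1.0] * len(remaining)  # the first track is picked uniformly
        idx = rng.choices(range(len(remaining)), weights=weights)[0]
        order.append(remaining.pop(idx))
    return order
```

Setting bias to 0 gives an ordinary uniform shuffle; cranking it up makes tracks from the same artist or album cluster together, which is the “automated mixing” effect I think I’m hearing.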

So is it random? Google says:

  • yes
  • no; a commenter on that article notes the same thing I’m talking about
  • yes
  • no; can’t say I’ve noticed the Beatles getting a push on mine
  • yes
  • and finally, no answer here, but a pretty cool stats experiment
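And if you’d rather test your own player than trust Google, here’s a sketch of one simple check: log the artist of each track as it plays, count how often consecutive tracks share an artist, and compare that with what a uniformly random ordering of the same tracks would give. The play log is something you’d have to record yourself, the function names are hypothetical, and this only tests for artist clustering, not similarity on other metadata.

```python
import random
from collections import Counter
from typing import Optional, Sequence


def same_artist_pairs(artists: Sequence[str]) -> int:
    """Number of consecutive plays by the same artist."""
    return sum(a == b for a, b in zip(artists, artists[1:]))


def expected_same_artist_pairs(artists: Sequence[str]) -> float:
    """Expected same_artist_pairs under a uniformly random ordering of the same tracks.

    Each of the n-1 adjacent slots holds a same-artist pair with probability
    sum_k c_k(c_k - 1) / (n(n - 1)), where c_k counts tracks by artist k,
    so the expectation simplifies to sum_k c_k(c_k - 1) / n.
    """
    n = len(artists)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(artists).values()) / n


def permutation_p_value(artists: Sequence[str], trials: int = 2000,
                        rng: Optional[random.Random] = None) -> float:
    """One-sided p-value: share of random reorderings with at least as many
    same-artist pairs as the observed log (with the usual +1 correction)."""
    rng = rng or random.Random()
    observed = same_artist_pairs(artists)
    shuffled = list(artists)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if same_artist_pairs(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

A small p-value (say, below 0.01) means the log has more back-to-back same-artist runs than chance readily explains. A library dominated by one artist will produce plenty of repeats even under a truly random shuffle, which is what the expected-value calculation accounts for.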

Google DRM and WON Authentication

So, Google have invented their own DRM, apparently. I’m keen to find out more details; so far, Techdirt and Plasticbag.org are the only places in the blogosphere I can find that discuss it in any detail.

One tidbit worth noting from the LA Times coverage:

The Google copy-protection software also imposes a big restriction: The CBS shows, NBA games and other material protected by the software can be watched only on a computer that’s connected to the Internet.

“I think it’s going to be a problem,” said Li, the Forrester analyst, adding that Google executives told her they were trying to fix it.

That’s interesting. Given that quote, I’ll bet Google’s DRM is something similar to the copy-protection systems used for many games since about id’s Quake 3 and Valve’s Half-Life: an online “key server” which validates codes, tracks player IDs, and tracks who’s viewing what, “live”, as the video is cued up and played.

Some more info on the Half-Life WON authentication system can be found in this Gamasutra article (subscription required; if you don’t have one, try viewing this Google cache version with JavaScript off). That’s historical now, of course, since the WON system has been replaced by a new auth protocol as part of Valve’s ‘Steam’ system.

The key factor is the network, separating the dangerous, untrustworthy user machine from the trusted key server. Since the online key server can act as a platform for trusted, known-insubvertable code to run, along with the video server, both being under Google’s control, it’s actually possible to build reasonably solid DRM on this model. That’s as opposed to the usual case, where a reasonably determined teenager can break it in a week of school-nights. ;)
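Here’s a minimal sketch of that split, purely to illustrate the model I’m speculating about; nothing here is known about Google’s actual system. It assumes a hypothetical key server and video server sharing an HMAC secret. The key server checks entitlements and hands out short-lived signed tokens, logging who’s watching what as it goes, and the video server streams only against a fresh, valid token. The class names, the token format and the 60-second lifetime are all made up for illustration.

```python
import hashlib
import hmac
import time
from typing import Dict, List, Optional, Set, Tuple


def _sign(secret: bytes, payload: str) -> str:
    """HMAC-SHA256 over the token payload, hex-encoded."""
    return hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()


class KeyServer:
    """Hypothetical online key server: decides, per request, who may watch what."""

    def __init__(self, secret: bytes, entitlements: Dict[str, Set[str]], ttl: int = 60):
        self._secret = secret              # shared only with the video server
        self._entitlements = entitlements  # user -> set of video ids they may watch
        self._ttl = ttl                    # short expiry forces the client to stay online
        self.viewing_log: List[Tuple[str, str, int]] = []  # who watched what, and when

    def issue_token(self, user: str, video: str, now: Optional[float] = None) -> Optional[str]:
        if "|" in user or "|" in video:
            raise ValueError("ids must not contain '|'")
        now = time.time() if now is None else now
        if video not in self._entitlements.get(user, set()):
            return None
        payload = f"{user}|{video}|{int(now) + self._ttl}"
        self.viewing_log.append((user, video, int(now)))
        return f"{payload}|{_sign(self._secret, payload)}"


class VideoServer:
    """Hypothetical video server: streams only against a fresh, correctly signed token."""

    def __init__(self, secret: bytes):
        self._secret = secret

    def authorize(self, token: str, video: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        parts = token.split("|")
        if len(parts) != 4:
            return False
        user, token_video, expiry, sig = parts
        if not expiry.isdigit():
            return False
        expected = _sign(self._secret, f"{user}|{token_video}|{expiry}")
        return (hmac.compare_digest(sig, expected)
                and token_video == video
                and now < int(expiry))
```

All the decisions happen on machines the user never touches. Note, though, that nothing here protects the video once it has been streamed to the client, which is the weak spot Aristotle points out in the update below.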

Anyway, that’s speculation. It remains to be seen if they’ve come up with something along the lines of WON authentication — and if it’s still easily subvertable or not.

Update: Aristotle Pagaltzis has a pretty good point in the comments:

Watching video, unlike playing a multiplayer game, is not an activity that inherently requires connecting to a server. Playing a multiplayer game, OTOH, inherently is.

So cracking a multiplayer game’s key check is fruitless, because then you can’t play online anymore, which was the whole point of the game in the first place. In contrast, a video player with a cracked key check still fulfills its purpose just fine.

I think he’s right. That’s a key point, demonstrating how WON authentication still can’t help — media playback, as a task, is itself fundamentally crackable.

Wedding Plans

Myself and the lovely C are planning on getting married, hopefully sometime this year. I’ve just come across some details about Japanese weddings, and apparently:

‘If you are attending a Japanese wedding reception, you are expected to bring cash for a gift (called Oshugi). The amount depends on your relationship with the couple and the region, unless the fixed amount is indicated on the invitation card. The average is 30,000yen ($250) for a friend’s wedding. It’s important that the cash is enclosed in a special envelope called Shugi-bukuro and your name is written on the front.’ … ‘It is a grave insult to give less than $200.’

That gives me a great idea… ;)

Planet Antispam

So a few weeks back, I mooted the idea of an anti-spam Planet site, similar to Planet GNOME, Planet Java, Planet Perl et al.

Here’s the results: Planet Antispam.

It’s still got a few rough edges; notably, the URL is not permanent — I’d prefer something at a more spam-themed domain — and the logo is the generic “PlanetPlanet” one. But it’s up and running in a beta-ish fashion.

Feel free to bookmark, subscribe, post the URL on, etc.; and if you’d like to give it a better home with an A record at a spam-themed domain, drop me a line.

Update, Jan 17: Thanks to John Levine, it now has a permanent home at http://planet.spam.abuse.net/. After several weeks of operation, I think it’s turning out to be pretty solid, too!

By the way, it also needs more source feeds. If you know of people with blogs, working on or writing about anti-spam (of the email variety), with RSS feeds that work, include the post text, and permit further redistribution of that text, drop me a line and I’ll add them.
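For the curious, here’s a minimal sketch of what a Planet site does at its core: take each member’s feed, keep the posts, and merge them into one “river of news”, newest first. It’s not PlanetPlanet’s actual code. It assumes RSS 2.0 feeds that carry the full post text in content:encoded (summary-only items are skipped, reflecting the “include the post text” requirement above), takes the feed documents as strings rather than fetching them, and uses hypothetical names throughout. Redistribution permission, of course, can’t be checked by code.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import List, NamedTuple

# Namespace of the RSS content module, which carries full post text.
CONTENT_NS = "{http://purl.org/rss/1.0/modules/content/}"


class Entry(NamedTuple):
    published: datetime
    feed: str
    title: str
    link: str
    body: str


def parse_feed(xml_text: str) -> List[Entry]:
    """Parse an RSS 2.0 document, keeping only dated items with full post text."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document")
    feed_title = channel.findtext("title", default="")
    entries = []
    for item in channel.findall("item"):
        body = item.findtext(CONTENT_NS + "encoded")  # None if the feed only has summaries
        date = item.findtext("pubDate")
        if not body or not date:
            continue
        published = parsedate_to_datetime(date)  # RFC 822 dates, as RSS 2.0 uses
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)  # keep all datetimes comparable
        entries.append(Entry(published, feed_title,
                             item.findtext("title", default=""),
                             item.findtext("link", default=""), body))
    return entries


def merge_feeds(feeds: List[str], limit: int = 50) -> List[Entry]:
    """Planet-style river of news: entries from every feed, newest first."""
    merged = [entry for xml_text in feeds for entry in parse_feed(xml_text)]
    merged.sort(key=lambda entry: entry.published, reverse=True)
    return merged[:limit]
```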

Finally, here’s a picture of a Starbucks SPAM(r) Sandwich. (shudder)