Sup Rocks

For the past 2 years or so, I’ve been using GMail to handle my main mail feed for jmason.org. I’m an absolute convert to its “river of threads”/search-based workflow.

Since starting at Amazon, I’ve had to start dealing with a heavy volume of work mail. Previous jobs have either had low mail volumes, or used Google Apps hosting for their mail, but Amazon’s volumes are high and, obviously, they’re not using Google. ;) For a while, I tried using Thunderbird, but it just didn’t really cut it; I could never keep track of mails I wanted archived, or remember which folder they were in, etc.: the same old problems that GMail solved.

Enter Sup. It’s a console-based *nix email client, with a Mutt-like curses interface, which offers something closely approximating the GMail experience:


Sup is a console-based email client for people with a lot of email. It supports tagging, very fast full-text search, automatic contact-list management, custom code insertion via a hook system, and more. If you’re the type of person who treats email as an extension of your long-term memory, Sup is for you.

Inbox Zero is a daily occurrence for my work email now; I can simply archive pretty much everything, and reliably know the excellent full-text search support will allow me to find it again in an instant when I need it. The new-user guide is well worth a read to get an idea of its feature set and UI.

Setting it up

The process of getting it set up is quite hairy; here are some instructions for Ubuntu, which thoroughly failed to work for me on 9.04. I had a similarly tricky time using some Ruby packages on the Red Hat work desktop, but eventually avoided it by just building vanilla Ruby from source, then using that to install “gem” and from that, “sudo gem install sup”. Much easier…

Next step is to get the mail. From some reading, it appears the most reliable way to deal with an MS Exchange 2007 server is to use offlineimap to sync it to a local set of maildirs, then add those as Sup “sources” using sup-add, one by one. Sup supports this setup very well. Offlineimap is very easy to install on Ubuntu, and can easily be built from source if that’s not an option. My config is pretty much a vanilla copy of the minimal config.

There’s a good Sup hook to run “offlineimap” every poll interval, and rescan synced sources that contain new mail. It works well.

Sup has an interesting approach to mail storage — it doesn’t. Instead, it stores pointers to the messages’ locations in their source storage. This is a great idea, since bugs in Sup therefore cannot lose your mail — just your metadata about your mail. However, it means that if the source changes in a way which moves or removes messages, you need to tell Sup to rescan (using “sup-sync”), but that’s no big deal in practice; in the more usual case, if new mail arrives, it’s automatically rescanned.
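To make the pointer idea concrete, here is a minimal Python sketch. It is not Sup's implementation (Sup is written in Ruby, and its real index format isn't described here); the `PointerIndex` class, its methods, and the one-file-per-message layout are all hypothetical, purely for illustration. The point it shows: throwing away the index loses labels, never mail, and a rescan rebuilds the pointers.

```python
import os
import tempfile


class PointerIndex:
    """Toy index that stores only message locations and labels, never bodies."""

    def __init__(self):
        # message id -> {"path": location in the source, "labels": set of tags}
        self.entries = {}

    def scan(self, source_dir):
        """(Re)scan a directory of message files, adding any new ones."""
        for name in sorted(os.listdir(source_dir)):
            path = os.path.join(source_dir, name)
            entry = self.entries.setdefault(name, {"path": path, "labels": {"inbox"}})
            entry["path"] = path  # refresh the pointer in case the file moved

    def read(self, msg_id):
        """Fetch the message body from the source, via the stored pointer."""
        with open(self.entries[msg_id]["path"]) as f:
            return f.read()


def demo():
    with tempfile.TemporaryDirectory() as source:
        with open(os.path.join(source, "msg1"), "w") as f:
            f.write("Subject: hello\n\nbody text\n")

        index = PointerIndex()
        index.scan(source)
        index.entries["msg1"]["labels"].discard("inbox")  # "archive" it

        # Simulate losing the index: the metadata is gone, the mail is not.
        index = PointerIndex()
        index.scan(source)
        return index.read("msg1"), index.entries["msg1"]["labels"]
```

After the simulated index loss, `demo()` can still read the message, but the "archived" state has reverted to the default label. That is the trade-off described above: metadata is at risk, the mail itself is not.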

I have just under 7000 mail messages in my Sup index, and rescans are speedy and searches super-fast. It’s very nicely done.

Outbound mail is delivered using /usr/sbin/sendmail by default, which should be working on any decent *nix desktop anyway ;)

Recommended Hooks

The Hooks wiki page has a few good hooks that you should install:

  • ~/.sup/hooks/before-poll.rb: the above-mentioned offlineimap poll hook
  • ~/.sup/hooks/mime-decode.rb: ‘uses w3m to translate all HTML attachments that don’t have a text/html alternative.’ Well worth installing.
  • ~/.sup/hooks/before-add-message.rb: essential to filter out cron noise and the like so it doesn’t hit the inbox; unfortunately Sup doesn’t (yet) support GMail’s “filter messages like this” UI.

Bad Points

  • Long URIs: unfortunately, very long URIs are broken by Sup’s renderer, and it doesn’t offer a native way to “activate” URIs and have them displayed in the browser; instead one has to cut and paste them. This is pretty lame. I’ve hacked up a Perl script that will reconstruct the full URLs from the broken rendering, when the text is piped to it, but that’s a horrible hack.

  • Index Corruption: I’ve had the misfortune (once, in the month since I started) of corrupting my search index, causing Ruby exception stack traces when I attempted to run “sup-sync” to scan new mail. The only fix appeared to be to restore my index from a “sup-dump” backup. Thankfully all seems fine now, but it was a definite reminder of the product’s beta status.

  • Calendaring: still as painful as it’s ever been with UNIX command line email.

  • HTML: A good-quality, email-oriented, native HTML renderer would be awesome.

  • MIME: Sup again takes the traditional approach from UNIX command line clients of delegating to the mailcap file and its rules; unfortunately my RHEL5 desktop is too crappy to have a good mailcap setup. So I’ve had to write this from scratch to deal with the usual .docs and .xls’s etc., flying about.

  • Inconsistent Key Mapping: Given that it shares so much UI with GMail in other respects, it’s a little annoying that Sup doesn’t have the same key mapping. Not a big deal, as it took only a couple of hours to get the hang of Sup’s, though.

Overall

If you’re happy enough to spend a day or two getting the damn thing installed, and aren’t afraid of a little dalliance with the bleeding edge, I strongly recommend it. It’s definitely the best *NIX mail reader at the moment.


Comments (6)

Fixing the Gmail Tasks window bug

Hey Gmail users! If you’re using Tasks, there’s a slightly annoying bug in Gmail right now — you may see the “Use this link to open Tasks” tip window appear every time you access the inbox page.

Several other people have reported it, and apparently the Google guys are ‘working to resolve it’ at the moment. In the meantime, here’s a way to work around the issue without losing Tasks (you will, unfortunately, lose the Offline Gmail functionality). Simply disable Offline Gmail (Settings -> Offline -> “Disable Offline Gmail for this computer”), and the bug no longer manifests itself.

You can allow Gmail to keep the stored mail on your computer if you like, which will be handy for when the bug is fixed and Offline can be re-enabled — hopefully sooner rather than later.


Comments (2)

Links for 2008-08-15


Comments (1)

More details on the “GMail forwarding hole”

The INSERT guys who’ve been talking about a GMail security hole allowing spammers to relay spam have released more previously-redacted details here. (Thanks to the MailChannels blog for pointing that out.)

In essence, the attack works by allowing a spammer to set the “forward to” address in GMail to point at a target address, send a spam to the GMail account, then change the “forward to” address to the next target and repeat.

My response:

  1. it’d be trivial for Google to impose stringent rate limits on “forward to” address changes, and I’d be surprised if they haven’t already.

  2. ditto for rate limits on message forwarding for each GMail account.

  3. as they say in the paper — if Google required up-front confirmation of the target address before forwarding any mail, that would also cut this out neatly.

  4. It’s worth noting that while GMail’s outbound servers may be whitelisted by some recipient sites, others are treating them negatively; word on the anti-spam “street” is that GMail is becoming a festering pit of 419 scammers these days.
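Points 1 and 2 above both come down to per-account rate limiting. As a rough illustration, here is a sliding-window limiter in Python. This is a generic sketch, not anything Google is known to run; the class name and the "2 changes per day" policy are made up for the example.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_events` per `window_seconds` for one account."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        """Return True and record the event if it fits within the limit."""
        # Drop events that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_events:
            return False
        self.timestamps.append(now)
        return True


# Hypothetical policy: 2 "forward to" address changes per day per account.
limiter = SlidingWindowLimiter(max_events=2, window_seconds=86400)
results = [limiter.allow(now=t) for t in (0, 10, 20, 86400)]
```

With that policy, the third change within the same day is refused, and a change a full day after the first is accepted again. An attacker cycling through target addresses one spam at a time would be slowed to a crawl.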


Comments (10)

Google’s CAPTCHA – not entirely broken after all?

A couple of weeks ago, WebSense posted this article with details of a spammer’s attack on Google’s CAPTCHA puzzle, using web services running on two centralized servers:

[...] It is observed that two separate hosts active on same domain are contacted during the entire process. These two hosts work collaboratively during the CAPTCHA break process. [...]

Why [use 2 hosts]? Because of variations included in the Google CAPTCHA image, chances are that host 1 may fail breaking the code. Hence, the spammers have a backup or second CAPTCHA-learning host 2 that tries to learn and break the CAPTCHA code. However, it is possible that spammers also use these two hosts to check the efficiency and accuracy of both hosts involved in breaking one CAPTCHA code at a time, with the ultimate goal of having a successful CAPTCHA breaking process.

To be specific, host 1 has a similar concept that was used to attack Live mail CAPTCHA. This involved extracting an image from a victim’s machine in the form of a bitmap file, bearing BM.. file headers and breaking the code. Host 2 uses an entirely different concept wherein the CAPTCHA image is broken into segments and then sent as a portable image / graphic file bearing PV..X file headers as requests. [...]

While it doesn’t say so explicitly, some have read the post to mean that Google’s CAPTCHA has been solved algorithmically. I’m pretty sure this isn’t the case. Here’s why.

Firstly, the FAQ text that appears on “host 1” (thanks Alex for the improved translation!):


FAQ

If you cannot recognize the image or if it doesn’t load (a black or empty image gets displayed), just press Enter.

Whatever happens, do not enter random characters!!!

If there is a delay in loading images, exit from your account, refresh the page, and log in again.

The system was tested in the following browsers: Internet Explorer, Mozilla Firefox

Before each payment, recognized images are checked by the admin. We pay only for correctly recognized images!!!

Payment is made once per 24 hours. The minimum payment amount is $3. To request payment, send your request to the admin by ICQ. If the admin is free, your request will be processed within 10-15 minutes, and if he is busy, it will be processed as soon as possible.

If you have any problems (questions), ICQ the admin.

That reads to me a lot like instructions to human “CAPTCHA farmers”, working as a distributed team via a web interface.

Secondly, take a look at the timestamps in this packet trace:

[image: packet trace]

The interesting point is that there’s a 40-second gap between the invocation on “Captcha breaking host 1” and the invocation on “Captcha breaking host 2”. There is then a short gap of 5 seconds before the invocations occur on the Gmail websites.

Here’s my theory: “host 1” is a web service gateway, proxying for a farm of human CAPTCHA solvers. “host 2”, however, is an algorithm-driven server, with no humans involved. A human may take 40 seconds to solve a CAPTCHA, but pure code should be a lot speedier.

Interesting to note that they’re running both systems in parallel, on the same data. By doing this, the attackers can

  1. collect training data for a machine-learning algorithm (this is implied by the ‘do not enter random characters!’ warning from the FAQ — they don’t want useless training data)

  2. collect test cases for test-driven development of improvements to the algorithm

  3. measure success/failure rates of their algorithms, “live”, as the attack progresses
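The third point is easy to picture in code. A rough Python sketch, assuming the human answers are treated as ground truth; the function name and the sample data are invented for illustration, and nothing here is taken from the attackers' actual system.

```python
def success_rate(human_answers, algorithm_answers):
    """Fraction of CAPTCHAs where the algorithm matched the human answer.

    CAPTCHAs the humans skipped (answer is None) are excluded, which is
    why random guesses from the human side would poison the comparison.
    """
    scored = [
        (human, algo)
        for human, algo in zip(human_answers, algorithm_answers)
        if human is not None
    ]
    if not scored:
        return 0.0
    matches = sum(1 for human, algo in scored if human == algo)
    return matches / len(scored)


# Made-up sample data, for illustration only.
humans = ["xkqpd", "mmrta", None, "wploz", "hhtbe", "qzrnc"]
algorithm = ["xkqpd", "mnrta", "abcde", "wpl0z", "hbtbe", "qzmc"]
rate = success_rate(humans, algorithm)
```

The same pairs of (image, human answer) double as training data and test cases, which covers points 1 and 2 as well.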

Worth noting this, too:

Observation*: On average, only 1 in every 5 CAPTCHA breaking requests are successfully including both algorithms used by the bot, approximating a success rate of 20%. The second algorithm (segmentation) has very poor performance that sometimes totally fails and returns garbage or incorrect answers.

So their algorithm is unreliable, and hasn’t yet caught up with the human farmers. Good news for Google — and for the CAPTCHA farmers of Romania ;)

Update: here’s the NYTimes’ take, with broadly agreeing comments from Brad Taylor of Google. (The Register coverage is off-base, however.)


Comments (5)

Spambots stealing GMail and Hotmail passwords?

I just received this mail from a friend:

Dear friend

Welcome to stwoxy.com ! We are one of the largest electronic distributors and wholesalers in Beijing China. We offer qualified digital products: Motorcycles?TVs, Notebooks, phones. PSP, projectors, GPS, DVD, DV, DC, MP3/4 and so on, which are of world famous brands, such as Sony, IBM, PHILIPS, NOKIA, DELL and so on. All our items are brand new from the manufactures and they come with 1-3 years’ after service. These days we are expanding our overseas market, and every item is sold in extremely low price. Such chances should never be missed, ladies and gentlemen, do come to stwoxy.com! you will surely have a big surprise! We are looking forward to hearing from you!

It was sent from an HTTP connection into GMail, and was delivered from there with valid DKIM and DomainKeys signatures and a passing SPF check. In addition, it was sent to all the addresses in his address book. In other words, this was no run-of-the-mill impersonation spam: for this one, the spammer obtained my friend’s username and password somehow, logged into GMail, scraped the address book, and then sent spam via GMail that way.

My friend says he didn’t access GMail using a desktop mail client, but did have his Google password saved in his web browser (a pretty typical configuration). My theory is that some virus/malware has infected his desktop machine, captured the saved-passwords file from the web browser configuration, and used that to log into GMail. Alternatively, it could also be a guessable username and password which was picked up via dictionary attack, I guess…

This is the first case I’ve heard of where spammers are actively stealing user account authentication tokens, in order to take over the accounts for spamming. (We’d long predicted it, of course, since it’s a natural response to “pay for mail” schemes… but since there’s no widely-used pay-for-mail system available yet, it’s premature!)

It seems this is not just a GMail thing, btw. Here’s a report of the same thing happening to some French guy via HotMail last month (or in English). I don’t speak Dutch, but this forum post looks like it might be the same situation.

If you’re curious, here’s a copy of the spam, delivered to a Yahoo! group; it appears these spammers aren’t too sophisticated in terms of the text they’re sending, since they haven’t morphed that text, HTML, or even the domain in the link yet. It’s just the malware that’s sophisticated, at this stage.


Comments (44)

GNOME, Google and the UNIX user interface

Recently, after a flurry of annoying user interface issues, I’ve switched my RSS reader from Liferea to Google Reader. Interestingly, Google Reader turns out to fit better with the traditional UNIX user interface concept.

What triggered this was an upgrade from Liferea 1.0.x to 1.4.4 as part of Ubuntu Gutsy; this brought with it a lot of changed behaviours, such as ‘drag-and-drop of feed URL to HTML view no longer subscribes’, and one crucial UI issue, ‘”Skim through articles” only works with ctrl+space’.

I’ve been a long-time UNIX user, dating back to the days when curses-based interfaces were the norm. As such, I tend to drive commonly-used applications using keyboard commands where possible. (This isn’t a purely UNIX thing; Windows has the phenomenon of the keyboard-wielding “power user”, too.)

Liferea was attractive, since it offered the ability to skim through articles quickly by just pressing the “Space” key; simply press space to page down, or to skip to the next unread article if at the end of the current one. Unfortunately, Liferea 1.4.x breaks this, and it wasn’t going to be fixed, since apparently a GNOME app shouldn’t behave this way:

GTK explicitely does implement <Space> as a key binding for several of it’s widgets. Rebinding <Space> means to break the default behaviour for such widgets (tree views, buttons, input fields). [....] Liferea as a web-browsing application should behave like any other web browser and like every other GNOME/GTK application as much as possible.

Now, I don’t know if it’s GNOME’s fault, or what, but for a UNIX desktop app to break with UNIX UI conventions, that’s a bad move in my opinion. I gave it a bit of argument in the bug tracker, but eventually gave up as I clearly wasn’t getting anywhere. :(

Instead, based on recommendation from friends, I gave Google Reader a try, and quickly figured out its extensive collection of keyboard shortcuts. Now, I’m skimming through my feeds in even less time than it took with Liferea, simply by hitting “ga” to go to my “all unread items” list, then “j”, “j”, “j” to skip through the postings one by one. Sweet!

It’s interesting to note that other Google web apps use the same concepts; Gmail also has a hefty set, and can be driven using them in a manner very reminiscent of the classic UNIX mailreader, Mutt. So, despite being designed with end-users in mind by extremely clever professional user experience designers, these apps still find space for power-user keyboard operation. Take note, GNOME.

Anyway, I’m not too bothered. Google Reader brings other benefits, such as fixing this bug: ‘please add ability to go to previous entry in Unread feed’, avoiding ‘constant memory leak requires daily restarts’, and, of course, the utility of being able to track the same set of feeds and keep track of which items I’ve read in two places (work and home).

If only it was open source ;)


Comments (4)

Searching GMail with a Firefox Smart Keyword

Here’s a Firefox Smart Keyword to search your GMail:

https://mail.google.com/mail/?search=query&view=tl&q=%s

Usage example, assuming you use ‘mail’ as the keyword: (CTRL-L) mail whatever
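For anyone curious what the browser does with that `%s`, here is a small Python sketch of the substitution. It only approximates Firefox's behaviour (the exact escaping Firefox applies may differ), and `expand_keyword` is a name made up for this example.

```python
from urllib.parse import quote

SMART_KEYWORD_URL = "https://mail.google.com/mail/?search=query&view=tl&q=%s"


def expand_keyword(template, query):
    """Substitute the URL-escaped query for the %s placeholder."""
    return template.replace("%s", quote(query, safe=""))


url = expand_keyword(SMART_KEYWORD_URL, "from:alice lunch")
```

So typing `mail from:alice lunch` in the location bar ends up as a search URL with the query percent-encoded into the `q` parameter.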


Comments (2)

Hacking Netflix

Movies: Hacking Netflix, via torrez.

Jason Kottke points out a great quote on a Friendster cross-site scripting attack: ‘We have a policy that we are not being hacked.’

He also speculates that Google used the GMail invite-network data for whitelisting — but whitelisting based on email address alone is trivially exploitable, so I’d doubt it.

I’m just back from a trip over to Cape Cod to meet family (halfway between here and Ireland, y’see ;) — lots and lots of luvverly lobster and sundry shellfish — and after a 6 day trip, had 5000 spams and a couple of thousand nonspam mails to deal with. Thankfully SpamAssassin dealt with the spams (only about 5 false negatives, no false positives I could spot) – but I’m going to have to do something about that volume of mail. drowning in the stuff. argh.


Comments

GMail Invites

Mail: GMail users, check your mail; if mine was anything to go by, you should have three new invites to give out.


Comments (1)

More Thoughts on GMail

Mail: I’ve been playing around with GMail a bit more recently. They’ve fixed the issues they had with Firefox and keyboard control, and it is nice.

Threading: since I plan to bother a few open-source MUA developers ;), I’ve written up a thorough analysis of their ‘conversation’ model, with its ‘collapsable history’, archive-not-delete approach, etc. Take a look, if you’re curious.

HTML: one feature that no-one’s commented on is that GMail does not create HTML mail; all mail composed through their composer is sent as text/plain only.

This is very interesting, because it suits me just fine. HTML mail causes so many more problems than it solves, especially when full-featured web browser components are used to display it, IMO. I get to see the security exploits this enables, every day in my anti-spam work.

But it’s also very significant that nobody else has commented on it – nobody misses it!

Phantom Labels: another interesting thing I’ve noted: sometimes a mail will appear in your Inbox with a ‘spam’ label, even though you’ve never defined one. It’s not in the ‘Spam’ folder; it’s in your inbox.

Aaron has a good theory on what this is, and I think he’s right: he suggests it’s when ‘the two emails are in a conversation (same subject); one is marked as spam, one isn’t. So the conversation (which is what appears in your inbox) gets two tags: Spam, and Inbox. So when viewing the list it looks like it gets the Spam tag.’
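Aaron's theory boils down to "a conversation shows the union of its messages' labels". A tiny Python sketch of that idea; this is a guess at the behaviour, not GMail's actual code, and the data layout is made up.

```python
def conversation_labels(messages):
    """Labels shown for a conversation: the union of its messages' labels."""
    labels = set()
    for message in messages:
        labels |= message["labels"]
    return labels


# Two mails with the same subject; only one was classified as spam.
thread = [
    {"subject": "Re: lunch?", "labels": {"Inbox"}},
    {"subject": "Re: lunch?", "labels": {"Spam"}},
]
shown = conversation_labels(thread)
```

Under that model the conversation is listed in the inbox (because one message is there) while also carrying the Spam label (because the other message has it).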

Also, while I’m here — details on LiveJournal’s distributed filesystem, MogileFS, which apparently ‘will be open source’. Link via acme.


Comments

GMail and Anne

Spam: Anne Mitchell on GMail’s spam filtering — sounds like her results are actually worse than mine were. But the ads worked well:

… just today, in an email from Mrs. Nwakama Ani, the wife of the late James Ani, a farmer in ZImbabwe, asking me to please help her to export $50million dollars which her late husband amassed, Gmail’s Adsense very thoughtfully offered me ‘Cheap airline tickets from the USA to Zimbabwe’. You know, just in case I want to go over there and help her personally.

Anne’s spam weblog looks like good stuff — I’ve added it to the blogroll…


Comments

Email Usability List updated in light of GMail, given new home

Mail: I’ve dusted off my old e-mail usability wishlist, made a couple of changes to reflect the current situation now that GMail has implemented some of them, and Wikified the page.

There’s still a couple that I think would be valuable, so anyone looking at new usability ideas for email is welcome to take a look ;)


Comments

Some stats on GMail’s spam filter

Update: greetings, visitors from 2006! Please pay no attention to these figures, they’re from 2004, and both GMail and SpamAssassin have undergone major changes since those days. Historical interest only.

So, I set up a .forward to forward all my personal mail to GMail to see how it coped with my spam load, and compared it against the personal SpamAssassin install I’m running these days. Here’s the results:

  • test start: Mon Apr 12 15:50:39 PDT 2004
  • test end: Tue Apr 13 18:26:45 PDT 2004
  • total spam messages received by both during the test: 210
  • total ham messages received by both during the test: 528

The SpamAssassin results:

  • true positives: 189
  • false positives: 0
  • false negatives: 21
  • true negatives: 528
  • FP%: 0.00%
  • FN%: 10.00%

The GMail results:

  • true positives: 144
  • false positives: 7
  • false negatives: 66
  • true negatives: 521
  • FP%: 1.32%
  • FN%: 31.42%
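For reference, the percentages in both lists can be reproduced from the raw counts: FP% is the share of ham wrongly flagged, and FN% is the share of spam that got through. A short Python sketch (the `rates` helper is just for this illustration):

```python
def rates(tp, fp, fn, tn):
    """Return (FP%, FN%) as percentages of ham and spam respectively."""
    fp_pct = 100.0 * fp / (fp + tn)  # share of ham wrongly flagged as spam
    fn_pct = 100.0 * fn / (fn + tp)  # share of spam that reached the inbox
    return fp_pct, fn_pct


spamassassin = rates(tp=189, fp=0, fn=21, tn=528)
gmail = rates(tp=144, fp=7, fn=66, tn=521)
```

Note that the GMail figures in the list appear to be truncated rather than rounded: 7/528 is about 1.3258% and 66/210 is about 31.4286%.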

So, not too hot. But there are extenuating circumstances! ;)

  • The GMail false positives were not ‘typical’ mail, whatever that is – all of them were Mailman ‘administration required’ messages regarding spam in Mailman mailing list queues. I’d only be annoyed if I was a GMail user administrating Mailman lists. And it turns out there’s a bug in current dev SpamAssassin that now does the same thing…
  • presumably, GMail allows some element of per-user probabilistic classifier training — if so, some ‘move to Inbox’ might also sort those out quite quickly, I’d guess.
  • GMail seems to be a four-phase classification system. Messages can either go into: 1. the inbox, 2. the spam box, 3. the inbox with a little green ‘Spam’ indicator, or 4. the spam box with a little green ‘Inbox’ indicator. Not sure what the latter two do, but they may indicate some level of ‘unsure’ as per spambayes; worth noting that most of the FNs in the Inbox did not get the green ‘Spam’ indicator beside them, though.
  • I used a .forward to bounce the traffic over. So if GMail includes spam-evasion at the SMTP level, along with whatever content-filtering and probabilistic classification they’re using, they wouldn’t get the benefits of that.
  • SpamAssassin has the benefit of some user configuration; I’d got a couple of my spamtrap addresses blacklisted in the SpamAssassin config, and my Bayes databases have been trained using SpamAssassin’s autolearning.
  • this is all really unscientific, and it’s a really small sample ;)

Surprisingly, all the SpamAssassin mailing list traffic discussing spam, throwing around spammy URLs and phrases, didn’t get caught, however; probably because the volume of spammy phrases in those is less than in the Mailman admin stuff.


Comments (5)

GMail Usability

Web: Check out GMail’s ‘thread history’ built into the message display, dubbed ‘collapsable history’ and ‘cards’. Very, very nice email usability!

More at Kevin Fox’ weblog, fury.com.


Comments