
Month: March 2008

“What’s New” archaeology

jwz has, incredibly, resurrected home.mcom.com, the WWW site of the Mosaic Communications Corporation, as it was circa Oct 1994.

Edmund Roche-Kelly was kind enough to get in touch and note this link — http://home.mcom.com/home/whatsnew/whats_new_0993.html:

September 3, 1993

IONA Technologies (whose product, Orbix, is the first full and complete implementation of the Object Management Group’s Common Object Request Broker Architecture, or CORBA) is now running a Web server.

An online pamphlet on the Church of the SubGenius is now available.

Guess who was responsible for those two ;)

I was, indeed, running the IONA web server — it was set up in June 1993, and ran Plexus, an HTTP server written in Perl. IONA’s server was somewhere around public web server number 70, world-wide.

The SubGenius pamphlet is still intact, btw, although at a more modern, “hyplan”-less URL these days. It’ll be 15 years old in 6 months… how time flies!

Sharing, not consuming, news

The New York Times yesterday had a great article about modern news consumption:

According to interviews and recent surveys, younger voters tend to be not just consumers of news and current events but conduits as well — sending out e-mailed links and videos to friends and their social networks. And in turn, they rely on friends and online connections for news to come to them. In essence, they are replacing the professional filter — reading The Washington Post, clicking on CNN.com — with a social one.

“There are lots of times where I’ll read an interesting story online and send the URL to 10 friends,” said Lauren Wolfe, 25, the president of College Democrats of America. “I’d rather read an e-mail from a friend with an attached story than search through a newspaper to find the story.”

[Jane Buckingham, the founder of the Intelligence Group, a market research company] recalled conducting a focus group where one of her subjects, a college student, said, “If the news is that important, it will find me.”

In other words, as Techdirt put it, this generation of news readers now focuses on sharing the news, rather than just consuming it — and if you want to share a news story, there’s no point passing on a subscription-only URL that your friends and contacts cannot read.

What newspapers need to do to remain relevant to this generation of news consumers is stop hiding their content behind paywalls and registration-required screens. The Guardian got their heads around this a few years back, and have come on in leaps and bounds since then. I wonder if the Irish Times is listening?

converting TAP output to JUnit-style XML

Here’s a perl script that may prove useful: tap-to-junit-xml

NAME

tap-to-junit-xml – convert perl-style TAP test output to JUnit-style XML

SYNOPSIS

tap-to-junit-xml "test suite name" [ outputprefix ] < tap_output.log

DESCRIPTION

Parse test suite output in TAP (Test Anything Protocol) format, and produce XML output in a similar format to that produced by the <junit> ant task. This is useful for consumption by continuous-integration systems like Hudson.

Written in perl; requires TAP::Parser and XML::Generator. It's based on junit_xml.pl by Matisse Enzer, although pretty much entirely rewritten.
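For a flavour of how the conversion works, here’s a stripped-down sketch of the same idea. It is not the real script (it skips timings, the output-prefix argument, and most error handling), but it shows the TAP::Parser / XML::Generator combination doing the work:

#!/usr/bin/perl -w
# sketch only: read TAP on stdin, write JUnit-style XML to stdout
use strict;
use TAP::Parser;
use XML::Generator;

my $suitename = shift @ARGV || 'test suite';
my $tap = do { local $/; <STDIN> };            # slurp the whole TAP log
my $parser = TAP::Parser->new({ tap => $tap });
my $xg = XML::Generator->new(':pretty');

my (@cases, $failures);
while (my $result = $parser->next) {
  next unless $result->is_test;
  my $name = $result->description || 'test '.$result->number;
  if ($result->is_ok) {
    push @cases, $xg->testcase({ name => $name });
  } else {
    $failures++;
    push @cases, $xg->testcase({ name => $name },
      $xg->failure({ type => 'TAPTestFailed' }, $result->as_string));
  }
}

print $xg->testsuite(
  { name => $suitename, tests => scalar @cases, failures => $failures || 0 },
  @cases
), "\n";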

Pulseaudio ate my wifi

I’ve just spent a rather frustrating morning attempting to debug major performance problems with my home wireless network; one of my machines couldn’t associate with the AP at all anymore, and the laptop (which was upstairs in the home office, for a change) was getting horrific, sub-dialup speeds.

I did lots of moving of Linksys APs and tweaking of “txpower” settings, without much in the way of results. Cue tearing hair out etc.

Eventually, I logged into the OpenWRT AP over SSH, ran iftop to see what clients were using the wifi, and saw that right at the top, chewing up all the available bandwidth, was a multicast group called 224.0.0.56. The culprit! There was nothing wrong with the wifi setup after all — the problem was massive bandwidth consumption, crowding out all other traffic.

You see, “pulseaudio”, the new Linux sound server, has a very nifty feature — streaming of music to any number of listeners, over RTP. This is great. What’s not so great is that this seems to have magically turned itself on, and was broadcasting UDP traffic over multicast on my wifi network, which didn’t have enough bandwidth to host it.

Here’s how to turn this off without killing “pulseaudio”. Start “paman”, the PulseAudio Manager, and open the “Devices” tab:

[screenshot: paman’s “Devices” tab]

Select the “RTP Monitor Stream” in the “Sources” list, and open its “Properties” dialog.

Hit the “Kill” button, and your network is back to normal. Phew.

Another (quicker) way to do this is to use the command-line “pacmd” tool:

echo kill-source-output 0 | pacmd
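If “0” doesn’t do the trick, the RTP stream may have a different index on your machine; the 0 above is simply what it happened to be here. Listing the active source outputs in the same style will show which index to kill:

echo list-source-outputs | pacmd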

It’s a mystery where this is coming from, btw. “paman” reports the stream as belonging to “module-rtp-send”:

But I don’t seem to have an active ‘module-rtp-send’ line in my configuration:

: jm 98...; grep module-rtp-send /etc/pulse/* /home/jm/.pulse*
/etc/pulse/default.pa:#load-module module-rtp-send source=rtp.monitor

Curious. And irritating.

Update: it turns out there’s another source of configuration — GConf. “paprefs” can be used to examine that, and that’s where the setting had been set, undoubtedly by me hacking about at some stage. :(
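For the record, you can inspect that GConf store from the command line too. Something like the following should dump the relevant keys, though the “/system/pulseaudio” path is an assumption from memory rather than gospel:

gconftool-2 -R /system/pulseaudio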

more crap from St. Petersburg

Noted with alarm in this comment regarding the horrific privacy-invading adware that is Phorm:

Their programmers are mostly Saint Petersburg-based, home to the Russian Business Network. Their servers are kept only in Saint Petersburg and China, so no ISP customer data is ever stored in the UK. Any personally identifying information they obtain about UK citizens can never be seen or purged using existing UK Data Protection Laws.

St. Petersburg is turning out to be quite a source of online nastiness — the new Boca Raton.

Evading Audible Magic’s Copysense filtering

As I noted on Monday, the Irish branches of several major record companies have brought a case against Eircom, demanding in part that the ISP install Audible Magic’s Copysense anti-filesharing appliances on their network infrastructure.

I thought I’d do a quick bit of research online into how they do their filtering. Here’s what the EFF had to say:

Audible Magic’s technology can easily be defeated by using one-time session key encryption (e.g., SSL) or by modifying the behavior of the network stack to ignore RST packets.

It’s interesting to see that they use RST packets — this is the same mechanism used by the “Great Firewall of China” to censor the internet:

the keyword detection is not actually being done in large routers on the borders of the Chinese networks, but in nearby subsidiary machines. When these machines detect the keyword, they do not actually prevent the packet containing the keyword from passing through the main router (this would be horribly complicated to achieve and still allow the router to run at the necessary speed). Instead, these subsidiary machines generate a series of TCP reset packets, which are sent to each end of the connection. When the resets arrive, the end-points assume they are genuine requests from the other end to close the connection — and obey. Hence the censorship occurs.

But there’s a very easy way to avoid this, according to that blog post:

However, because the original packets are passed through the firewall unscathed, if both of the endpoints were to completely ignore the firewall’s reset packets, then the connection will proceed unhindered! We’ve done some real experiments on this — and it works just fine!! Think of it as the Harry Potter approach to the Great Firewall — just shut your eyes and walk onto Platform 9¾.

Clayton, Murdoch, and Watson’s paper on this technique provides the Linux and FreeBSD firewall commands they used to do this. Here’s Linux:

   iptables -A INPUT -p tcp --tcp-flags RST RST -j DROP

For FreeBSD, the command is:

   ipfw add 1000 drop tcp from any to me tcpflags rst in

So assuming Copysense haven’t changed their approach yet, it’s trivial to evade their filtering, as long as both ends are running Linux or BSD. (Bear in mind that these rules drop all inbound RSTs, legitimate ones included, so aborted connections will linger until they time out.) I predict that if Copysense becomes widespread, someone will patch the Windows TCP stack to do the same.

I love Audible Magic’s response:

The current appliance happens to use the TCP Reset to accomplish this today. There are many other technical methods of blocking transfers. Again, we have strategies to deal with them should they ever prove necessary. This is why we recommend our customers purchase a software support agreement which provides for these enhancements that keep their purchase up-to-date and protect their investment.

In other words: “hey customers! if you don’t have a support contract, you’re shit out of luck when the p2p guys get around our filters!” Nice. ;)

Vim hanging while running VMWare: fixed

I’ve just fixed a bug on my linux desktop which had been annoying me for a while. Since there seems to be little online written about it, here’s a blog post to help future Googlers.

Here are the symptoms: while you’re running VMWare, your Vim editing sessions freeze up for 20 seconds or so, roughly every 5 minutes. The editor is entirely hung.

If you strace -p the process ID before the hang occurs, you’ll see something like this:

select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
_llseek(7, 4096, [4096], SEEK_SET)      = 0
write(7, "tp\21\0\377\0\0\0\2\0\0\0|\0\0\0\1\0\0\0\1\0\0\0\6\0\0"..., 4096) = 4096
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
_llseek(7, 20480, [20480], SEEK_SET)    = 0
write(7, "ad\0\0\245\4\0\0\341\5\0\0\0\20\0\0J\0\0\0\250\17\0\0\247"..., 4096) = 4096
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost -isig -icanon -echo ...}) = 0
select(6, [0 3 5], NULL, [0 3], {0, 0}) = 0 (Timeout)
fsync(

In other words, the hung process is sitting in an fsync() call, attempting to flush changed data for the current file to disk.

Investigation threw up the following: a kerneltrap thread about disk activity, poor responsiveness with Firefox 3.0b3 on linux, and a VIM bug report regarding this feature interfering with laptop-mode and spun-down hard disks.

VMWare must be issuing lots of unsynced I/O, so when Vim issues its fsync() or sync() call, it needs to wait for the VMWare I/O to complete before it can return — even though the machine is otherwise idle. A bit of a Linux kernel (or specifically, ext3) misfeature, it seems.

Synthesising details from those threads comes up with this fix: edit your ~/.vimrc and add the following lines —

set swapsync=
set nofsync

This inhibits Vim’s use of both fsync() and sync(), and the problem is avoided nicely (at the cost of slightly weaker swap-file crash-safety, so bear that in mind).

Update: one of the Firefox developers discusses how this affects FF 3.0.

Irish ISPs in record company crosshairs

RTE reports that four record companies, EMI, Sony BMG, Universal Music and Warner Music, have brought a High Court action to compel Eircom — Ireland’s largest ISP — to prevent its networks being used for the illegal downloading of music:

Willie Kavanagh, Managing Director of EMI Ireland and chairman of IRMA, said because of illegal downloading and other factors, the Irish music industry was experiencing a “dramatic and accelerating decline” in income. He said sales in the Irish market dropped 30% in the six years up to 2007.

EMI and the other companies are challenging Eircom’s refusal to use filtering technology or other measures to voluntarily block or filter illegally downloaded material. Last October Eircom told the companies it was not in a position to use the filtering software.

(I wonder if those dropping sales in the Irish market comprise only CDs sold by Irish shops? 2001 to 2007 is also the time period when physical sales have given way to online shopping on a gigantic scale, especially for music.)

The Irish Times coverage includes another interesting factoid, which appears in a lot of press regarding this case:

Latest figures available, for 2006, indicate that 20 billion music files were illegally downloaded worldwide that year. The music industry estimates that for every single legal download, there are 20 illegal ones.

A little research reveals that that figure comes from the IFPI Digital Music Report 2008. I’d have a totally different take on it, however. In my opinion, the figure is probably correct, but not for the reasons the IFPI wants it to be; a number of factors are at play.

There’s more commentary on the 20-to-1 figure here.

The IFPI Digital Music Report 2008 also notes:

“2007 was the year ISP responsibility started to become an accepted principle. 2008 must be the year it becomes reality”

Governments are starting to accept that Internet Service Providers (ISPs) should take a far bigger role in protecting music on the internet, but urgent action is needed to translate this into reality, a new report from the international music industry says today.

ISP cooperation, via systematic disconnection of infringers and the use of filtering technologies, is the most effective way copyright theft can be controlled. Independent estimates say up to 80 per cent of ISP traffic comprises distribution of copyright-infringing files.

The IFPI Digital Music Report 2008 points to French President Sarkozy’s November 2007 plan for ISP cooperation in fighting piracy as a groundbreaking example internationally. Momentum is also gathering in the UK, Sweden and Belgium. The report calls for legislative action by the European Union and other governments where existing discussions between the music industry and ISPs fail to progress.

So it seems Ireland is in the vanguard of an international effort by IFPI members to force ISPs to install filtering, worldwide. The same happened in Belgium last year — and I reckon there’ll be similar cases elsewhere soon.

Either way, I doubt this will be good for Irish internet users.

(PS: while I’m talking about buying MP3s online — a quick plug for 7digital. Last time I used them, I had a pretty crappy experience, but the situation is a lot better nowadays. They now have a great website that works perfectly in Firefox on Linux; they sell brand new releases like the Hercules and Love Affair album as 320kbps DRM-free MP3s; they support PayPal payments; and downloads are fast and simple — right click, “Save As”. hooray!)

Some other blog coverage: Lex Ferenda with some details about the legal situation, and Jim Carroll.

Update: EMI Ireland seem to be singing from a different hymn-sheet than their head office… interesting.

Update 2: I’ve taken a look at the Copysense filtering technology, and how it can be evaded.

Announcing IrishPulse

As I previously threatened, I’ve gone ahead and created a “Microplanet” for Irish twitterers, similar to Portland’s Pulse of PDX — an aggregator of the “stream of consciousness” that comes out of our local Twitter community: IrishPulse.

Here’s what you can do:

Add yourself: if you’re an Irish Twitter user, follow the user ‘irishpulse’. This will add you to the sources list.

Publicise it: feel free to pass on the URL to other Irish Twitter users, and blog about it.

Read it: bookmark and take a look now and again!

In terms of implementation, it’s just a (slightly patched) copy of Venus and a perl script using Net::Twitter to generate an OPML file of the Twitter followers. Here’s the source. I’d love to see more “Pulse” sites using this…
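For the curious, the follower-list-to-OPML step really is only a screenful of perl. Here’s a rough sketch of the idea; it’s not the actual IrishPulse script (the hard-coded account name, the per-user RSS URL format, and the lack of follower-list paging are all simplifying assumptions):

#!/usr/bin/perl -w
# sketch: turn a Twitter account's followers into an OPML feed list
# that Venus can aggregate. Illustrative only; see the real source above.
use strict;
use Net::Twitter;

my $nt = Net::Twitter->new(
  username => 'irishpulse',          # assumption: the aggregator account
  password => $ENV{TWITTER_PASS},    # basic auth, as the 2008-era API used
);
my $followers = $nt->followers()
  or die "followers() call failed\n";

print qq{<?xml version="1.0"?>\n<opml version="1.1">\n<body>\n};
foreach my $u (@$followers) {
  my $sn = $u->{screen_name};        # screen names are [A-Za-z0-9_], so XML-safe
  print qq{  <outline text="$sn" xmlUrl="http://twitter.com/statuses/user_timeline/$sn.rss"/>\n};
}
print qq{</body>\n</opml>\n};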

Google’s CAPTCHA – not entirely broken after all?

A couple of weeks ago, WebSense posted this article with details of a spammer’s attack on Google’s CAPTCHA puzzle, using web services running on two centralized servers:

[…] It is observed that two separate hosts active on same domain are contacted during the entire process. These two hosts work collaboratively during the CAPTCHA break process. […]

Why [use 2 hosts]? Because of variations included in the Google CAPTCHA image, chances are that host 1 may fail breaking the code. Hence, the spammers have a backup or second CAPTCHA-learning host 2 that tries to learn and break the CAPTCHA code. However, it is possible that spammers also use these two hosts to check the efficiency and accuracy of both hosts involved in breaking one CAPTCHA code at a time, with the ultimate goal of having a successful CAPTCHA breaking process.

To be specific, host 1 has a similar concept that was used to attack Live mail CAPTCHA. This involved extracting an image from a victim’s machine in the form of a bitmap file, bearing BM.. file headers and breaking the code. Host 2 uses an entirely different concept wherein the CAPTCHA image is broken into segments and then sent as a portable image / graphic file bearing PV..X file headers as requests. […]

While it doesn’t say so explicitly, some have read the post to mean that Google’s CAPTCHA has been solved algorithmically. I’m pretty sure this isn’t the case. Here’s why.

Firstly, the FAQ text that appears on “host 1” (thanks Alex for the improved translation!):


FAQ

If you cannot recognize the image or if it doesn’t load (a black or empty image gets displayed), just press Enter.

Whatever happens, do not enter random characters!!!

If there is a delay in loading images, exit from your account, refresh the page, and log in again.

The system was tested in the following browsers: Internet Explorer and Mozilla Firefox.

Before each payment, recognized images are checked by the admin. We pay only for correctly recognized images!!!

Payment is made once per 24 hours. The minimum payment amount is $3. To request payment, send your request to the admin by ICQ. If the admin is free, your request will be processed within 10-15 minutes, and if he is busy, it will be processed as soon as possible.

If you have any problems (questions), ICQ the admin.

That reads to me a lot like instructions to human “CAPTCHA farmers”, working as a distributed team via a web interface.

Secondly, take a look at the timestamps in this packet trace:

[screenshot: packet trace showing request timestamps]

The interesting point is that there’s a 40-second gap between the invocation on “Captcha breaking host 1” and the invocation on “Captcha breaking host 2”. There is then a short gap of 5 seconds before the invocations occur on the Gmail websites.

Here’s my theory: “host 1” is a web service gateway, proxying for a farm of human CAPTCHA solvers. “host 2”, however, is an algorithm-driven server, with no humans involved. A human may take 40 seconds to solve a CAPTCHA, but pure code should be a lot speedier.

Interesting to note that they’re running both systems in parallel, on the same data. By doing this, the attackers can

  1. collect training data for a machine-learning algorithm (this is implied by the ‘do not enter random characters!’ warning from the FAQ — they don’t want useless training data)

  2. collect test cases for test-driven development of improvements to the algorithm

  3. measure success/failure rates of their algorithms, “live”, as the attack progresses

Worth noting this, too:

Observation: On average, only 1 in every 5 CAPTCHA-breaking requests are successful, counting both algorithms used by the bot, approximating a success rate of 20%. The second algorithm (segmentation) has very poor performance, sometimes failing totally and returning garbage or incorrect answers.

So their algorithm is unreliable, and hasn’t yet caught up with the human farmers. Good news for Google — and for the CAPTCHA farmers of Romania ;)

Update: here’s the NYTimes’ take, with broadly agreeing comments from Brad Taylor of Google. (The Register coverage is off-base, however.)