
Month: February 2004

Getting into KDE 3.2

Linux: I’m really getting into KDE 3.2. I’ve been looking for a music player that handles large collections of MP3s better than the venerable XMMS, without much luck:

iTunes is, of course, the ‘gold standard’, but is Mac/Windows only, so that’s not going to work on my Linux machine.

Rhythmbox is getting there as an iTunes clone, but right now is woefully incomplete. It fails to play lots of my music and has serious interface shortcomings: you can rate songs, but then there’s no way to use those ratings, and you cannot edit any of the tag metadata in the released version.

JuK is the new KDE music player app. Initially, I wrote it off — it uses the clunky interface of ‘one big list’, at first glance.

But after Rhythmbox managed to confuse itself sufficiently so that it would only open as a 3-pixel-high window (seriously!), I gave JuK another try. Summary: it kicks ass.

It turns out that the multi-pane ‘artists, albums, and tracks’ mode of iTunes and Rhythmbox isn’t actually necessary, since JuK improves on it using a very nifty dynamic ‘Tree View’ mode.

Another nice feature is the MusicBrainz integration; it has built-in support for querying MB’s servers to get correct tag data for your music. In fact, its tagging support is fantastic — this is unsurprising, as it looks like it started off as a tagging app.

Being a well-written KDE app, it exposes some nifty scripting support via DCOP, and a quick look-over with KDCOP reveals a nice set of APIs. For example, running dcop juk Player playingString tells me the name of the track and artist playing right now. I’m not sure if there’s a way to register for callbacks on events like ‘track change’ just yet; here’s hoping…
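Until callbacks turn up, polling is the obvious fallback. Here is a rough Python sketch of that idea. Only the dcop juk Player playingString call comes from the paragraph above; the function names and the two-second interval are made up for illustration, and it assumes dcop is on the PATH and JuK is running.

```python
import subprocess
import time


def playing_string():
    """Ask JuK for the current track via the dcop command-line tool."""
    result = subprocess.run(
        ["dcop", "juk", "Player", "playingString"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def track_changes(samples):
    """Yield each value from samples that differs from the one before it."""
    previous = None
    for current in samples:
        if current != previous:
            yield current
            previous = current


def poll(fetch=playing_string, interval=2.0):
    """Sample the player forever; the interval (seconds) is an arbitrary choice."""
    while True:
        yield fetch()
        time.sleep(interval)


# Usage (needs a running JuK):
#     for title in track_changes(poll()):
#         print("Now playing:", title)
```

Keeping track_changes separate from the dcop call means the change detection can be tried out on a plain list of strings, with no player running.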

No sign of rating support just yet, though; my dream player would allow me to rate my tracks, and then make a dynamic playlist which selects tracks by rating, playing the top-rated ones more often and never playing the bottom-rated ones. Here’s hoping it’s in the pipeline ;)
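For what it’s worth, the selection logic I have in mind is tiny. This is a minimal Python sketch, not anything JuK or Rhythmbox actually does: the 0 to 5 rating scale, the function names, and the choice to use the rating directly as a weight are all my own assumptions.

```python
import random


def pick_track(ratings, rng=random):
    """Pick one track at random, weighted by its rating.

    ratings maps a track name to a rating from 0 to 5 (a made-up scale).
    Tracks rated 0 are dropped first, so they are never chosen.
    """
    candidates = [(track, rating) for track, rating in ratings.items() if rating > 0]
    if not candidates:
        raise ValueError("no track has a rating above zero")
    tracks, weights = zip(*candidates)
    return rng.choices(tracks, weights=weights, k=1)[0]


def dynamic_playlist(ratings, length, seed=None):
    """Build a playlist of the given length; repeats are allowed."""
    rng = random.Random(seed)
    return [pick_track(ratings, rng) for _ in range(length)]
```

With this weighting a 5-rated track comes up about two and a half times as often as a 2-rated one, and anything rated 0 never plays.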

All in all, though, it looks like I’ll be giving JuK a try.

Using social-networking services to filter spam

Spam: filster: Linking reputations networks to email whitelists. Very interesting — a tool to use the social network data from Orkut, FOAFweb, Reputation Research Network, and CPAN to whitelist email senders in SpamAssassin. Only problems I can see:

  • needs an anti-forging mechanism like SPF to avoid spammers forging their way through your whitelist — but the author does cover that.
  • some of the site terms of service may prohibit scraping — Orkut’s, for example, is very strict.

Still, a very nifty idea, and one worth more investigation… the combination of FOAF and SPF in particular, given that tribe.net (if I recall correctly?) will be generating FOAF data, is quite cool.

Radio Tivo

Radio: Community Projects at Moertel Consulting: My new Radio VCR. That is so cool.

Interesting tidbits:

He records using Speex, the open-source speech-recording codec, in real-time. I wonder how well it’d work with a more music-oriented codec, like Ogg Vorbis. The bit rate used is 16 kbps, which seems to be pretty reasonable according to the Speex folks.

The resulting output is 10 MB per hour. That works out as 1.4 years of radio time on one $95.00 hard disk, which strikes me as pretty excellent buffering room ;)
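The sums are easy to check. A quick Python sketch follows, with one loud assumption: the disk size isn’t given above, so the 120 GB figure used in the usage note is my guess at what a $95.00 disk holds, picked because it makes the 1.4 years come out. Units are decimal (1 MB = 10^6 bytes).

```python
HOURS_PER_YEAR = 24 * 365.25


def raw_mb_per_hour(kbps):
    """Size of one hour of audio at a constant bit rate, in MB (10**6 bytes)."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 3600 / 1e6


def years_of_audio(disk_gb, mb_per_hour):
    """How many years of continuous recording fit on a disk of disk_gb GB."""
    hours = disk_gb * 1000 / mb_per_hour
    return hours / HOURS_PER_YEAR
```

raw_mb_per_hour(16) comes to 7.2 MB for the bare 16 kbps stream, a little under the quoted 10 MB per hour, and years_of_audio(120, 10) lands just under 1.4 years.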

Next step: Retroactive Radio Recording.

However, I’m thinking a really nifty application of this would be a single drop-in Knoppix CD-ROM for radio stations to stream their output without paying up the big bucks to You Know Who and Those Other Guys.

Silly: The Moaning Goat Meter, by xiph.org — a load meter written in a proper programming language, and with an inexplicably spinning fish that stares at you.

READY…

Jeff Minter reminisces:

  * COMMODORE BASIC *

  7167 BYTES FREE

  READY...

7k free. Hard to imagine these days; even my watch has more than that.

‘Goblin-fancier’?

Insults: Tom takes issue with my assumption that ‘anyone not living in a hole would know that SpamAssassin includes a probabilistic classifier’. Hmm. OK, I should have made it clear I meant anyone following anti-spam filter development. Henceforth I’ll over-qualify every statement on this weblog accordingly.

But at least I know that badgers are CLEARLY down, since they do live in a hole. DO YOUR RESEARCH, FARRELL.

Thermal Depolymerization

Green: There’s been a bit of chat on the intarweb recently about a new high-tech fuel source that avoids the fossil-fuel trap, namely thermal depolymerization. Here’s a couple of links that are relevant:

Sounds possibly useful, although (a) is there enough biomass around to produce fuel in useful quantities, and (b) I bet it stinks downwind of that. ;)

Craigslist genius

Funny: Craigslist: wanted: web designer (why this phrase may get your ass beat)
. ‘sneakily trying to advertise for a web designer to make you a porn site is weak. just say in your ad that you want to show naked pictures of women fucking dogs so i can decide, before i apply, if i want to see that sort of thing, and not AFTER you’ve sent me a mentally and emotionally scarring photo of a maybe-blonde (it was hard to tell, at that angle) and a great dane, and THEN ask me if i am comfortable with that kind of content.’ (via swhackit!)

Slashdot Anti-FUSSP Form, and DSPAM’s FAQ

Spam: Slashdot: This will fail because… Tick the boxes to produce a generic Slashdot comment on a new anti-spam proposal. Very funny.

So, regarding the Noise Reduction probabilistic-classification tokenizer tweak posted on Slashdot yesterday: it does look interesting. Basically, it operates by monitoring the ‘noisiness’ of the token stream, and if the current probabilities for the tokens from the stream differ from what’s defined as acceptable for too long, it ‘dubs’ them out. In other words, it ignores those tokens until another sequence of ‘useful’ tokens is encountered. Plus I’m totally down with the Janine ref ;)

However, it’s disappointing to come across this in the DSPAM FAQ list: ‘Why Should I use DSPAM Instead of SpamAssassin?’ It’s a lovely selection of anti-Perl and anti-SpamAssassin FUD, generally overlooking SpamAssassin’s training components (‘leaves the end-user with no means of recourse or satisfaction when they receive a spam’), and in general taking a combative tone. Is that really necessary?

BTW, in case you’ve been living in a hole for the last year: SpamAssassin does include a probabilistic classifier, in the form of the BAYES rules. It’s easy to train, uses good tokenizing and combining algorithms to get high accuracy (although it won’t do multi-word windowing until we’ve determined that that works well enough to justify the db size increase), and, importantly, has been measured on corpora that are not my own mail.

A story: way back when, in June 2001, the SpamAssassin README boasted of its 99.94% accuracy rate. This was true: it was measured on my mail feed over the course of a couple of months. However, once measured on someone else’s mail, that dropped pretty quickly. Measuring a spam filter on the developer’s mail feed (where presence of HTML is a killer spam-sign!) is a sure-fire way to get (a) great but (b) non-portable accuracy figures.

sleep(1) in Berkeley DB?

Code: Berkeley DB, the de facto standard for open-source high-performance database files on UNIX, is displaying some odd behaviour: it appears to be sleeping for 1 second inside the database library code, under load, for some versions of libdb. If you’re curious, there’s more info here.

‘Social networks’ spam filtering technique

Spam: /.: New Method of Spam Filtering: ‘A simple and easily implemented scheme for combating e-mail spam has been devised by two researchers in the United States. P. Oscar Boykin and Vwani Roychowdhury of the University of California, Los Angeles use their method to exploit the structure of social networks to quickly determine whether a given message comes from a friend or a spammer. The method works for only about half of all e-mails received – but in all of those cases, it sorts the mail into the right category.’

Abstract here. It appears it classifies 53% of the emails and leaves the other 47% as undiagnosed.

The problem with this scheme is that it relies on the data in the To, From, and CC fields being accurate. Currently, there’s no means to stop spammers faking those addresses.

A trivial way to get around this filter, as with the other filters that trust the From address, is for a spammer to send a message using your address in both the From and To fields. Most people would include themselves in their web of trust, hence the spam would get through.

A more resilient method uses IP addresses from the Received headers in conjunction with the From address. Once you do this, you can no longer use To and CC data, and the scheme becomes pretty much the same as SpamAssassin’s auto-whitelist.
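To make the ‘From address plus IP’ idea concrete, here is a small Python sketch. It is not SpamAssassin’s code, just an illustration of the keying. The class name, the two-octet network prefix, and the ‘three hams, no spams’ threshold are all assumptions of mine.

```python
from collections import defaultdict


def sender_key(from_addr, relay_ip):
    """Combine the From address with the first two octets of the relay IP."""
    prefix = ".".join(relay_ip.split(".")[:2])
    return (from_addr.strip().lower(), prefix)


class SenderHistory:
    """Tracks, per (address, network) key, how many messages were ham or spam."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"ham": 0, "spam": 0})

    def record(self, from_addr, relay_ip, is_spam):
        entry = self.counts[sender_key(from_addr, relay_ip)]
        entry["spam" if is_spam else "ham"] += 1

    def is_trusted(self, from_addr, relay_ip, min_ham=3):
        """Trust only keys with at least min_ham ham messages and no spam."""
        entry = self.counts.get(sender_key(from_addr, relay_ip))
        return bool(entry) and entry["ham"] >= min_ham and entry["spam"] == 0
```

A spammer forging your own address in the From field gets nowhere with this unless the message also arrives from the network your real mail comes from, which is the weakness of trusting the From field alone.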

Life Hacks

Work: Life Hacks: Tech Secrets of Overprolific Alpha Geeks, Danny O’Brien’s ETech talk.

Amazingly, despite not being an alpha geek ;), I already use all these things:

  • a todo.txt file (anything else is inconvenient).
  • everything incoming comes through email, including RSS (thanks to rss2email). Again, anything else is inconvenient; I couldn’t be bothered with another desktop app.
  • I hack scripts for every repetitive task I run into
  • I sync instead of backup; everything has a CVS repository running on a remote server, even my home dir
  • I have a nasty tendency to web-scrape data

These tips definitely are good advice, although I have a feeling the result is optimised for a weblogging UNIX geek who spends hours hacking perl/python scripts. ;)

I’m looking forward to LifeHacks.com when it does eventually go live… should be interesting.

BitTorrent

Net: Great NYTimes article interviewing Bram Cohen about BitTorrent (u: sitescooper p: sitescooper). Good to see that it landed him a job with Valve, but let’s hope that’s not the last piece of free software from Bram…

One of the best things about the article, BTW, is that it does take notice that BT isn’t a tool for piracy. Refreshing, given how these things are often covered.

Future Firefox Features

Web: More on the Firefox crappy-movie-now-web-browser thing, from Chris Blizzard:

  • A mind-controlled UI: but it only works if you think in Russian!
  • Flashback mode: whenever you hear a helicopter overhead the browser will redirect all page loads to web.archive.org, circa 5 years ago.
  • Stealth mode: using specially malformed headers, Firefox will load your web pages and web servers will be unable to log your visits.
  • Mach 6 Technology: advanced compression algorithms will make the web faster than it’s ever been before!
  • Arctic compliant: you can land Firefox on an ice floe in the middle of the north Atlantic. Not sure why you would need this, but hey, we had some extra bandwidth.

Lovely Filelight

Linux: Doing my backups — it’s a good feeling to know your data will (probably) be safe if your computer suddenly carks it.

This time around, I have way too much data to actually back up the lot — so I’m being selective. Filelight is very helpful here; I can see exactly where my disk space is going, spot tmp files that I should have cleared up long ago, and so on.

One thing is clear — I have too many MP3s. How am I supposed to listen to all of those?

Firebird now Firefox

Web: Donncha notes that Mozilla Firebird has been renamed ‘Firefox’. Retro cruddy 80’s Cold War movie reference? check!

I like it. In fact, I’m looking forward to Linux kernel 2.6.2 ‘Red Dawn’.

BTW, my current favourite Firebird^H^H^H^Hfox extension: Session Saver. Load and save the current list of open tabs, and have them automatically saved when you quit the browser. Given that I often have a few tabs on stuff I’m researching, leaving them until I’m a bit less busy (which can take days!), this fits perfectly with my modus operandi.

Funny: This is GREAT!

And if that’s too much product placement for you, there’s Students for an Orwellian Society: ‘Because 2004 is 20 years too late.’

How To Increase Voter Turnout With New Technology – The Right Way

eVoting: One of the desired features for new voting mechanisms is that they will increase voter ‘turnout’, encouraging people to vote who are too busy (or too unmotivated) to visit a polling station.

This has been used to suggest internet voting (see the fiasco that was the now-scrapped SERVE project) and voting-by-phone. Both offer a scary number of vote-fixing opportunities and possible failure modes, and are fundamentally a bad idea.

However, it turns out there is a great system to implement absentee voting securely, reliably, conveniently (for the voter) and even cheaply! A comment on Bruce Schneier’s Crypto-Gram newsletter (scroll down to comment number 3) details this.

I’ve copied the entire mail here, since it’s hard to link to in the other location, and is well worth a page to itself:

From: Fred Heutte

Thanks for your cogent thoughts on ballot security. I almost completely agree and was one of the first signers of David Dill’s petition. I am also involved professionally in voter data — from the campaign side, with voter files, not directly with voting equipment — but we’re close enough to the vote counting process to see how it actually works.

I would only disagree slightly in one area. Absentee voting is quite secure when looking at the overall approach and assessing the risks in every part of the process. As long as reasonable precautions like signature checking are done, it would be difficult and expensive to change the results of mail voting significantly.

For example, in Oregon, ballots are returned in an inside security envelope which is sealed by the voter. The outside envelope has a signature area on the back side. This is compared to the voter’s signature on file at the elections office. The larger counties actually do a digitized comparison, and back that up with a manual comparison with a stratified random sample (to validate machine results on an ongoing basis), as well as a final determination for any questionable matches.

Certainly it is possible to forge a signature. However, this authentication process would greatly raise the cost of forged mail ballots, absent consent of the voter. In turn, interference or coercion with absentee voting would require much higher travel costs (at least) than doing so at a polling place, for a given change in the outcome.

It is true that precincts have poll watchers, and absentee voters do not. But consider this. Ballot boxes, which are often delivered by temporary poll workers from the precinct to the elections office, are occasionally stolen, but mail ballots are handled within a vast stream of other mail by employees with paychecks and pensions at stake. The relatively low level of mail fraud inside the postal system is a testament to its relative security, and the points where ballots are aggregated for delivery to the elections office are usually on public property and can also be watched by outside observers if need be.

Oregon has had some elections with 100% ‘vote by mail’ since 1996, and all elections since 1999. So far, no verifiable evidence of voter fraud has emerged, despite many checks and some predictions by those with a political axe to grind that we would be engulfed in a wave of election fixing.

The reality is that Oregon’s system, which is based on some common-sense security principles, has proven to be robust. The one lingering problem has been the need of some counties to make their voters use punch cards at home because of their antiquated vote counting equipment. But while this is a vote integrity issue — since state statistics show a much higher undervote and spoiled ballot total for punch cards as compared to mark-sense ballots — it is not a security issue per se. And with Help America Vote Act (HAVA) funding to convert to more modern vote counting systems, the Oregon chad remains in only one county and will go extinct after 2004.

The mark-sense (‘fill in the ovals’) ballots we have work well, and have low rates of over-votes and under-votes, despite the lack of automated machine checking that is possible in well-designed precinct voting systems. This suggests that reasonable visual design and human-friendly paper and pencil/pen home voting is a very reliable and secure system. When aided by automated counting equipment, we even have the additional benefit of very fast initial counts.

The increase in voter participation in Oregon since the advent of vote-by-mail — 10 to 30 percentage points above national averages, depending on the kind of election — leads to the only other issue, which is slow machine counts on election night after the polls close due to the surge of late ballots received at drop-off locations around the state. Oregon in fact isn’t really ‘vote by mail,’ it’s vote-at-home, with a paper ballot that can be mailed or left at any official drop-off point in the state, including county election offices, many schools and libraries, malls, town squares, etc.

The great advantage of the Oregon system is that it relies on the principle that if you appeal to the best instincts of the citizen, the overwhelming majority will ‘do our part’ to ensure the integrity of the democratic voting process, whether it is full consideration of the candidates and issues before voting, watching to make sure all ballots are securely transferred and counted, or favoring those laws and policies that insure that everyone eligible can vote, that their votes are counted, and that the candidates and measures with the most votes win.

The system is also cheaper than running traditional precinct elections. What’s not to like?

It’s so simple, and so sensible. Next time someone suggests ‘i-voting’ or ‘m-voting’ or whatever, you know what to point to…

Firebird Extension Idea

Web: I watched a hilarious Rob Corddry segment from The Daily Show last night, repeated from earlier in the week. Having not seen The Daily Show in a while, since dropping everything but basic cable, I went looking through The Daily Show video archives to see if I could find a few more good ones — with no luck.

Every link on the Video page links to something like this:

javascript:openMediaPop('/multimedia/tds/cord/cord_8065.html','','SRM','high');

Which opens a popup with this page. Now, the interesting thing is that I do have Real Player installed — but for some reason, Firebird hasn’t figured this out. If I could just get through the twisty-turny maze of Javascript ‘detection’ code, I could get the URL for the .ram file directly from the server and play it.

So this is where my idea for a new extension comes in. It should do this:

  • intercept Javascript calls to navigator.userAgent, navigator.plugins et al, and allow the user to select what plugins to report;
  • add a context (right-click) menu item to list the URIs used in data attributes of object tags, and allow those to be cut and pasted — or launched in any helper apps registered for that filename extension. Alternatively, it could just replace the object with a link to open that file in the helper app.

The first allows the user to choose which plugins are reported as installed, and navigate their way past broken ‘detection’ scripts like Comedy Central’s and The BBC Radio Player’s.

The second then allows the user to get hold of the URL for future use, or pop it up in an external viewer.
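A real extension would be JavaScript living inside the browser, but the second item is simple enough to sketch outside it. This Python sketch uses only the standard library to pull the data attributes of object tags out of a page and resolve them against the page URL. The class and function names are hypothetical, and the example.com addresses in the usage note are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ObjectDataCollector(HTMLParser):
    """Collects the data attribute of every <object> tag in a page."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.uris = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands over tag and attribute names in lower case.
        if tag != "object":
            return
        for name, value in attrs:
            if name == "data" and value:
                self.uris.append(urljoin(self.base_url, value))


def object_uris(html, base_url=""):
    """Return the absolute URIs referenced by <object data=...> tags."""
    collector = ObjectDataCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.uris
```

Calling object_uris(page_source, "http://example.com/video/") gives a list of URLs that could then be pasted elsewhere or handed to a helper app.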

David Hasselhoff’s role in ending the Cold War

Funny: The Beeb reports that ‘Baywatch star David Hasselhoff is griping that his role in reuniting East and West Germany has been overlooked.’

Speaking to Germany’s TV Spielfilm magazine, the 51-year-old carped about how his pivotal role in harmonising relations between the two sides of the divide had been overlooked.

‘I find it a bit sad that there is no photo of me hanging on the walls in the Berlin Museum at Checkpoint Charlie,’ he told the magazine.

Hating ABIs

Software: OK, one of my current UNIX pet peeves, perfectly illustrated by the new RPMs for KDE 3.2.

  : jm 1015...; sudo rpm -Uvh *.rpm 
  Password:
  error: Failed dependencies:
      libiw.so.26 is needed by kdenetwork-3.2.0-0.1

I don’t have a wireless card in this machine.

WHY does kdenetwork, a network configuration applet, link with a shared library component of the wireless-tools package? Why is this not simply a shell script, or even an optional binary command? Have the UNIX desktop environments forgotten all about the UNIX way in their rush to implement ‘components’? To quote Doug McIlroy:

This is the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

(my emphasis added.)

Hint: if you don’t intend to call some third-party code over and over again, several times a second (in other words, where performance is essential), you do not need to link against it as a shared library. Calling it as a command, with fork and exec, will work just fine and avoids this kind of ‘DLL hell’.
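To show how little is needed, here is a Python sketch of the ‘call it as a command’ approach: text goes in on stdin, text comes back on stdout, and no ABI is shared with the caller. On UNIX, subprocess does roughly the fork and exec for you. The helper here is just a second Python process standing in for whatever real tool you would call, and the function names are mine.

```python
import subprocess
import sys


def run_helper(argv, text_in=""):
    """Run an external command, feed it text on stdin, return its stdout."""
    result = subprocess.run(
        argv, input=text_in, capture_output=True, text=True, check=True
    )
    return result.stdout


def shout(text):
    """Use a second Python process as a stand-in for a real helper command."""
    helper = [sys.executable, "-c",
              "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    return run_helper(helper, text)
```

If the helper is missing, run_helper raises an error at the moment it is called, which the caller can catch and handle, instead of the whole package refusing to install.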

A related issue is how this emphasis on binary or component ABIs impacts scriptability and plugins. Ever since Netscape came up with their plugins, we’ve had this new model that third-party application extensibility meant linking shared libraries into the app (with ABI issues), or calling out to components over distributed-object transports like CORBA or MCOP (with API issues), instead of the traditional ‘helper app’ model.

As a result, generally, when I install a new version of Mozilla, I have to try and remember what plugins I had in the last one, track them down, download the latest version to work around ABI changes, and hope they work in this version of the browser.

Inevitably, they don’t — I haven’t found a working Java plugin in over a year. On the other hand, I can always click on a .ram link to listen to a RealAudio stream, because it doesn’t really matter if the browser and realplayer were built with different compilers in the ‘helper app’ case.

In addition, and paradoxically, scriptability is becoming less of an option in the modern UNIX GUI apps. Let’s say I want to be able to do the kind of thing Windows has had for years with its ‘Send To’ menu: put a simple shell script into an ‘actions’ directory, and it’ll appear in the right-click context menu, so that I can right-click on a file and select ‘Run frobnicator’ to frobnicate it. (Something similar is possible from MS Internet Explorer.)

Is it possible in Firebird? Not a hope. But you can write an extension — 100KB of undocumented Javascript. Great.

In fairness, the file managers have the right idea — GNOME’s Nautilus does support this nicely, and so does Konqueror. But there’s an ongoing tendency to adopt the ABI dynamic-linking model, or the distributed-object model, in places where it’s just not necessary, and a simple UNIX pipe or command API — the ‘helper app’ model — would work beautifully.

hmm. </rant> ;)

More interesting bits on ‘rscheearch at Cmabrigde Uinervtisy’

Spam: Gary Schrock on the SpamAssassin-talk list notes:

… that study that’s being talked about in an email doesn’t exist. There’s something in the Trends in Cognitive Science journal about it, that discusses why that email is actually as readable as it is. I’d try to pass on the knowledge, but while I may work in a lab that does psycholinguistics, that doesn’t mean I understand it enough to pass it on. But the short story is there’s no such research at Cambridge.

(The irony here is that this was being talked about in the lab where I work earlier today, and when I mentioned this email someone in the lab was able to hand me the article from Trends. Unfortunately the journal is only available online with subscription.)

No Longer Possible To Spoil Votes In Ireland?

eVoting: ‘Spoiling your vote’, e.g. writing in ‘none of the above’ on a ballot paper, is a legally-permitted response to a ballot in Ireland and many other countries. Secrecy in how you vote is constitutionally required.

Aengus Lawlor on the ICTE list points out that it appears the new e-voting system in Ireland will no longer permit spoiling to take place in secrecy.

Indeed, in the 7 constituencies where e-voting machines were trialed in the 2002 Nice Referendum, no spoiled votes were cast. Compare:

  • Carlow-Kilkenny: turnout 47,192, Spoiled Votes 244 (that’s 0.51%)
  • Cork North-West: 29,056, 144 (0.49%)
  • Dublin Central: 28,880, 115 (0.39%)
  • Dublin North-Central: 36,532, 93 (0.25%)

with the e-voting constituencies:

  • Dublin South: 51,229, 0
  • Dublin South-West: 31,336, 0
  • Dublin West: 25,659, 0
  • Dun Laoghaire: 50,070, 0

A pretty notable anomaly there, ignoring the wishes of 0.5% of the electorate.
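To put a rough number on it, here is a Python sketch that pools the four paper-ballot constituencies listed above and applies their spoil rate to the four e-voting ones. The big assumption, and it is only an assumption, is that voters in the e-voting constituencies would have spoiled at the same rate given the chance. Only four of the seven e-voting constituencies are listed, so only those are counted.

```python
# (turnout, spoiled votes) for the paper-ballot constituencies listed above
PAPER = {
    "Carlow-Kilkenny": (47192, 244),
    "Cork North-West": (29056, 144),
    "Dublin Central": (28880, 115),
    "Dublin North-Central": (36532, 93),
}
# turnout for the e-voting constituencies listed above
EVOTING_TURNOUT = {
    "Dublin South": 51229,
    "Dublin South-West": 31336,
    "Dublin West": 25659,
    "Dun Laoghaire": 50070,
}


def pooled_spoil_rate(results):
    """Spoiled votes as a fraction of turnout, pooled over all constituencies."""
    turnout = sum(t for t, _ in results.values())
    spoiled = sum(s for _, s in results.values())
    return spoiled / turnout


def expected_spoiled(turnouts, rate):
    """Spoiled votes we would expect if voters spoiled at the given rate."""
    return sum(turnouts.values()) * rate
```

The pooled paper rate works out at a little over 0.4%, so expected_spoiled(EVOTING_TURNOUT, pooled_spoil_rate(PAPER)) gives several hundred spoiled votes that simply never showed up in the e-voting counts.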

On a separate issue — let’s hope the Powervote systems aren’t as bad as the Diebold ones. Here’s the RABA Technologies’ assessment of Diebold AccuVote-TS Voting System security (PDF, 167KB), noting locks picked in 10 seconds, default passwords used to re-encode a voter card as a supervisor card, etc etc.

BUA Training — clueless interview

Media: Ever wondered why SCO is being targeted by the MyDoom virus?

Wonder no more. Apparently, according to William Campbell of BUA Training in this hilariously off-the-wall interview with RTE’s Morning Ireland radio show, it’s because of the Browser Wars and ‘Open System Software’. He goes on to explain:

‘if you go to a website, such as openoffice.org, you can download a free copy of what is the competitor for Microsoft Office, an equivalent of Microsoft Word, and equivalent of Microsoft Excel, which probably most of you have on their computers.’ ‘These competitors, they don’t really exists as companies, although there are some companies such as Open Office.org and eh, Star Office and lynux, but em, Microsoft has put all the commercial competition out of business, or they bought them up or whatever.’

Complete transcript here.

Sounds like Morning Ireland needs some new ‘computer experts’ ;)