
Author: Justin

Justin Mason, the author of this weblog.

Moving House

Bit of a meta update.

This blog has been at taint.org for a long time, but that’s got to change…

When I started the blog, in March 2000 (!), “taint” had two primary meanings; one was (arguably) a technical term, referring to Perl’s “taint checking” feature, which allowed dataflow tracing of “tainted” externally-sourced data as it is processed through a Perl program. The second meaning was the more common, less technical one: “a trace of a bad or undesirable substance or quality.” The applicability of this to the first meaning is clear enough.

Both of those fit quite nicely for my intentions for a blog, with perl, computer security, and the odd trace of bad or undesirable substances. Perfect.

However. There was a third meaning, which was pretty obscure slang at the time… for the perineum. The bad news is that in the intervening 23 years this has become by far the primary meaning of the term, and everyone’s entirely forgotten the computer-nerdy meanings.

I finally have to admit I’ve lost the battle on this one!

From now on, the blog’s primary site will be the sensible-but-boring jmason.ie; I’ll keep a mirror at taint.org, and all RSS URLs on that site will still work fine, but the canonical address for the site has moved. Change is inevitable!

Comments closed

An Irish Web Pioneer!

I’m happy to announce that I’m now listed on TechArchives.Irish as one of the pioneers of the Irish web!

After extensive interviewing and collaboration with John Sterne, my testimony and timeline of those early days of the Irish web is now up at TechArchives.

It’s been a good opportunity to reflect on the differences between the tech scene then and now. I was very idealistic 30 years ago about the possibilities that the web and internet technologies had to offer; nowadays, I’m a bit more grizzled and pragmatic. But I still have hope, particularly if we can apply this tech in ways that help address climate change. Here’s to the next 30 years!

Anyway, I hope writing this down helps record the history of those great early years of the web. Please take a look.

Comments closed

DynamoDB-local on Apple Silicon

DynamoDB Local is one of the best features of AWS DynamoDB. It allows you to run a local instance of the data store, and is perfect for use in unit tests to validate correctness of your DynamoDB client code without calling out to the real service “in the cloud” and involving all sorts of authentication trickiness.

Unfortunately, if you’re using one of the new MacBooks with M1 Apple silicon, you may run into trouble:

11:08:56.893 [DEBUG] [TestEventLogger]          DynamoDB > Feb 04, 2022 11:08:56 AM com.almworks.sqlite4java.Internal log
11:08:56.893 [DEBUG] [TestEventLogger]          DynamoDB > SEVERE: [sqlite] SQLiteQueue[]: error running job queue
11:08:56.893 [DEBUG] [TestEventLogger]          DynamoDB > com.almworks.sqlite4java.SQLiteException: [-91] cannot load library: java.lang.UnsatisfiedLinkError: /.../DynamoDBLocal_lib/libsqlite4java-osx.dylib: dlopen(/.../DynamoDBLocal_lib/libsqlite4java-osx.dylib, 0x0001): tried: '/.../DynamoDBLocal_lib/libsqlite4java-osx.dylib' (fat file, but missing compatible architecture (have 'i386,x86_64', need 'arm64e')), '/usr/lib/libsqlite4java-osx.dylib' (no such file)
11:08:56.893 [DEBUG] [TestEventLogger]          DynamoDB >      at com.almworks.sqlite4java.SQLite.loadLibrary(SQLite.java:97)
11:08:56.893 [DEBUG] [TestEventLogger]          DynamoDB >      at com.almworks.sqlite4java.SQLiteConnection.open0(SQLiteConnection.java:1441)
11:08:56.893 [DEBUG] [TestEventLogger]          DynamoDB >      at com.almworks.sqlite4java.SQLiteConnection.open(SQLiteConnection.java:282)
11:08:56.894 [DEBUG] [TestEventLogger]          DynamoDB >      at com.almworks.sqlite4java.SQLiteConnection.open(SQLiteConnection.java:293)

It’s possible to invoke it via Rosetta, Apple’s x86-64-to-ARM64 binary translation layer, like so:

arch -x86_64 /path/to/openjdk/bin/java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar

But if you don’t have control over the invocation of the Java command, or just don’t want to involve emulation, this is a bit hacky. Here’s a better way to make it work.

First, download dynamodb_local_latest.tar.gz from the DynamoDB downloads page, and extract it.

The DynamoDBLocal_lib/libsqlite4java-osx.dylib file in this tarball is the problem. It’s OSX x86 only, and will not run with an ARM64 JVM. However, the same lib is available for ARM64 in the libsqlite4java artifacts list, so this will work:

wget -O libsqlite4java-osx.dylib.arm64 'https://search.maven.org/remotecontent?filepath=io/github/ganadist/sqlite4java/libsqlite4java-osx-arm64/1.0.392/libsqlite4java-osx-arm64-1.0.392.dylib'
mv DynamoDBLocal_lib/libsqlite4java-osx.dylib libsqlite4java-osx.dylib.x86_64
lipo -create -output libsqlite4java-osx.dylib.fat libsqlite4java-osx.dylib.x86_64 libsqlite4java-osx.dylib.arm64
mv libsqlite4java-osx.dylib.fat DynamoDBLocal_lib/libsqlite4java-osx.dylib

This is now a “fat” lib which supports both ARM64 and x86 hardware. Hey presto, you can now invoke DynamoDBLocal in the normal Rosetta-free manner, and it’ll all work — on both hardware platforms.
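To double-check the merged file, `lipo -info DynamoDBLocal_lib/libsqlite4java-osx.dylib` should list both architectures. If you would rather check from the JVM side, here is a minimal stdlib sketch (the `FatArchs` class is hypothetical) that reads the big-endian fat Mach-O header and lists its slices, and also prints `os.arch`, which is typically `aarch64` for a native Apple Silicon JVM and `x86_64` for an Intel or Rosetta-translated one. It assumes the classic 32-bit fat header (`0xCAFEBABE`) that `lipo -create` normally writes; the CPU-type constants are the `<mach/machine.h>` values as I understand them, and 64-bit fat headers and thin (single-architecture) files are deliberately rejected rather than parsed.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the CPU slices in a 32-bit fat (universal) Mach-O file. */
public class FatArchs {
    static final int FAT_MAGIC = 0xCAFEBABE;       // fat header magic, stored big-endian
    static final int CPU_TYPE_I386 = 7;            // CPU_TYPE_X86
    static final int CPU_TYPE_X86_64 = 0x01000007; // CPU_TYPE_X86 | CPU_ARCH_ABI64
    static final int CPU_TYPE_ARM64 = 0x0100000C;  // CPU_TYPE_ARM | CPU_ARCH_ABI64

    static String name(int cpuType) {
        switch (cpuType) {
            case CPU_TYPE_I386:   return "i386";
            case CPU_TYPE_X86_64: return "x86_64";
            case CPU_TYPE_ARM64:  return "arm64";
            default:              return String.format("unknown(0x%08x)", cpuType);
        }
    }

    /** Reads the fat header and returns one architecture name per slice, in file order. */
    static List<String> archs(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in); // readInt() is big-endian, like the fat header
        if (data.readInt() != FAT_MAGIC) {
            throw new IOException("not a 32-bit fat Mach-O file (thin or fat64 files are not handled)");
        }
        int sliceCount = data.readInt();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < sliceCount; i++) {
            int cpuType = data.readInt();
            data.readInt(); // cpusubtype
            data.readInt(); // offset of the slice within the file
            data.readInt(); // size of the slice
            data.readInt(); // alignment, as a power of two
            result.add(name(cpuType));
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java FatArchs.java DynamoDBLocal_lib/libsqlite4java-osx.dylib
        System.out.println("JVM os.arch: " + System.getProperty("os.arch"));
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println("slices: " + archs(in));
        }
    }
}
```

For the merged library built above you would expect both `x86_64` and `arm64` in the slice list; if only `x86_64` shows up, the `lipo -create` step didn’t take.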

(This post is correct as of version 2022-1-10 (1.18.0) of DynamoDB-Local — let me know by mail, or at @jmason on Twitter, if things break in future, and I’ll update it.)

Comments closed

Richard J. Hayes, Ireland’s WWII cryptographer and polymath

This is new to me. Thanks to David Mee for the pointer.

‘During WWII, one of Nazi Germany’s most notorious communication codes was broken by a mild mannered librarian and family man from West Limerick, Richard Hayes. His day-job was as Director of the National Library of Ireland – but during wartime, he secretly led a team of cryptanalysts as they worked feverishly on the infamous “Görtz Cipher” – a fiendish Nazi code that had stumped some of the greatest code breaking minds at Bletchley Park, the centre of British wartime cryptography.

But who was Richard Hayes? He was a man of many lives. An academic, an aesthete, a loving father and one of World War Two’s most prolific Nazi Codebreakers.

At the outbreak of WWII, Hayes, being highly regarded for his mathematical and linguistic expertise, was approached by the head of Irish Military Intelligence (G2), Colonel Dan Bryan, with a Top Secret mission. At the behest of Taoiseach Éamon de Valera, Hayes was given an office and three lieutenants to decode wireless messages being covertly transmitted via Morse code from a house in north Dublin owned by the German Embassy. The coded messages posed a huge threat to Irish national security and the wider war effort. As Hayes team worked to break the code, it was all academic until he met his greatest challenge yet. The man who was to be his nemesis, Dr. Herman Görtz, a German agent who parachuted into Ireland in 1940 in full Luftwaffe uniform in an attempt to spy and transmit his own coded messages back to Berlin. […] The events that transpired were a battle of wits between the mild mannered genius librarian and his nemesis, the flamboyant Nazi spy.

Hayes has been referred to by MI5 as Ireland’s “greatest unsung hero” and by the American Office of Strategic Services as “a colossus of a man”, yet due to the secret nature of his work he is virtually unheard of in his own country.’

Hayes was our lead code-breaker, director of the National Library of Ireland, and then director of the Chester Beatty Library; he was the first to discover the German use of microdots to hide secret messages; and MI5 credited him with a “whole series of ciphers that couldn’t have been solved without [his] input”. Quite the polymath!

The book is apparently well worth a read: Code Breaker, by Marc McMenamin, and I can strongly recommend this RTE radio documentary. It’s full of amazing details, such as the process of feeding Hermann Görtz false information while he was in prison, in order to mislead the Nazis.

After the war, he fruitlessly warned the Irish government not to use a “Swedish cipher machine”, presumably one made by Boris Hagelin, who went on to found Crypto AG, which later proved to be providing backdoors in its machines to the CIA and BND.

Quite a towering figure in the history of Irish cryptography and cryptanalysis!

Comments closed

Peer-to-peer COVID-19 contact tracing without the surveillance

Maciej Ceglowski asks for a massive surveillance program to defeat COVID-19.

However, as I mentioned on twitter — there IS an alternative, privacy-preserving approach, which is what is being done in Singapore with their TraceTogether app.

In summary, everyone carries a phone running an app which has an anonymized random ID, scans local Bluetooth periodically for other people’s apps with their random IDs, and records them locally (not uploading to a server). If you find out you have COVID-19, you then trigger an upload of your contact history to a central server. That server then broadcasts out the list of IDs, and everyone you’ve been in contact with will then get a ping on their app to get tested, self-isolate, etc.
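A minimal sketch of that flow in stdlib Java, just to show where each piece of data lives. The class and method names are hypothetical, and it leaves out everything a real app needs (rotating IDs, Bluetooth signal-strength and duration thresholds, authenticated uploads). The important property is that the exposure check runs on each phone against the broadcast list, and the server only ever receives random IDs, and only from people who test positive.

```java
import java.security.SecureRandom;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of decentralised, ID-based contact matching. */
public class ContactTracingSketch {

    /** One phone: a random ID plus a purely local log of IDs heard over Bluetooth. */
    static final class Phone {
        private static final SecureRandom RANDOM = new SecureRandom();
        final String id;
        private final Set<String> seenIds = new HashSet<>();

        Phone() {
            byte[] raw = new byte[16];
            RANDOM.nextBytes(raw);
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) hex.append(String.format("%02x", b));
            this.id = hex.toString(); // carries no name, number or location
        }

        /** Called when a periodic scan hears another phone; stored only on this device. */
        void recordNearby(Phone other) { seenIds.add(other.id); }

        /** Only called after a positive diagnosis: hands over the local contact log. */
        Set<String> uploadContactHistory() { return Collections.unmodifiableSet(new HashSet<>(seenIds)); }

        /** The match happens on the phone itself: is my own ID in the broadcast list? */
        boolean shouldGetTested(Set<String> broadcastIds) { return broadcastIds.contains(id); }
    }

    /** The server collects uploaded IDs and re-broadcasts them; it never sees locations. */
    static final class Server {
        private final Set<String> broadcast = new HashSet<>();
        void receive(Set<String> contactHistory) { broadcast.addAll(contactHistory); }
        Set<String> broadcastList() { return Collections.unmodifiableSet(broadcast); }
    }

    public static void main(String[] args) {
        Phone alice = new Phone(), bob = new Phone(), carol = new Phone();
        alice.recordNearby(bob); // Alice and Bob were near each other
        bob.recordNearby(alice);
        Server server = new Server();
        server.receive(alice.uploadContactHistory()); // Alice tests positive
        Set<String> list = server.broadcastList();
        System.out.println("Bob should get tested: " + bob.shouldGetTested(list));     // true
        System.out.println("Carol should get tested: " + carol.shouldGetTested(list)); // false
    }
}
```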

No central surveillance, no creepy big brother watching your location.

My pinboard has a few more write-ups on basically the same idea from various other places, including MIT. This is similar to what China’s app does, but (as far as I can tell) with more privacy.

It looks like the Singaporean government digital services team behind TraceTogether is putting together an open source version, at Bluetrace.io.

IMO we have to do this or we will never get out of COVID-19 lockdown before 2021. I am massively in favour of adopting this approach in Ireland and across the world.

2 Comments

Fixing echoing sound effects with Huawei Histen

Here’s a quick tip for people using Huawei or Honor phones.

Huawei recently released EMUI version 9.1.0.326 as an OTA update, which I applied once it was offered as an upgrade option.

Once I installed that OS upgrade, however, I noticed that whenever I listened to music or podcasts using a Bluetooth headset or stereo speakers, there was a new and very noticeable ‘echoing’ effect on the audio.

It appears this was due to the addition of Huawei Histen, a 3D audio/equaliser feature, which apparently adds 3D audio effects when listening on various kinds of wired headphones; however, this is supposed to be disabled on Bluetooth devices.

I spent several days fruitlessly googling how to disable Histen, but with no luck. Eventually, through trial and error, I discovered a workaround — simply plug in a pair of wired headphones, go into Settings -> Sounds -> Huawei Histen sound effects, and choose “Natural sound”. Hey presto, next time you use Bluetooth headphones, it should no longer have the echo.

1 Comment

Recipe: clara con limón granizado

I came across this cocktail in Pals, in Catalonia, in 30 degree heat, a few weeks back — I saw it on the menu at the cafe in the square of the old town, and had to give it a go. It’s incredible. Basically, it’s lager mixed with a lemon granita — like a beer slushy. Nothing is better at thirst quenching on a hot day, and best of all it’s quite low in alcohol so no worries about lorrying into it during the daytime :)

This year at Groovefest, our yearly get together/mini-festival, I got to serve up a few, with great results — they were quite popular. So here’s the recipe!

First off, a day or two in advance, make a batch of lemon granita. I based mine on this recipe which I’ll copy here just in case the original goes away:

Lemon Granita

Serves: about 8

Ingredients:

  • 3-4 lemons
  • 1L water
  • 150g of sugar

Method:

  • Zest the lemons and set the zest aside. Juice the lemons until you have 150ml juice (you may not need all of them).

  • Add the water and sugar to a large pan and bring to the boil. Reduce to a simmer and cook for 2 minutes, stirring to dissolve the sugar.

  • Add the lemon juice and zest, remove from the heat and cover. Set aside to cool for 20 minutes.

  • Strain the mixture into 2 containers that will fit in your freezer and leave to cool to room temperature.

  • Freeze until the mixture is partially frozen, which should take several hours. (I just left them overnight)

  • Remove the granita from the freezer and leave at room temperature until you can break it into chunks with a large spoon or fork.

  • Either transfer to a blender or food processor and blitz, or break it up with a fork. It doesn’t need to be perfectly smooth and snowy — a slushy texture is just right for this drink.

  • Store in the freezer. Take out 30 minutes before serving and break it up again with a fork.

Clara Con Limón Granizado

To serve: half-fill a half-pint glass with the lemon granita. Pour the beer on top to fill the glass. Stir once or twice to mix. Enjoy!

PS: I think — not sure as my Catalan is pretty terrible — it may be a clara granitzada in Catalonia…

Comments closed

Elsewhere….

It’s been a while since I wrote a long-form blog post here, but this post on the Swrve Engineering blog is worth a read; it describes how we use SSD caching on our EC2 instances to greatly improve EBS throughput.

Comments closed

Don’t use Timers with exponentially-decaying reservoirs in Graphite

A common error when using the Metrics library is to record Timer metrics on things like API calls, using the default settings, then to publish those to a time-series store like Graphite. Here’s why this is a problem.

By default, a Timer uses an Exponentially Decaying Reservoir. The docs say:

‘A histogram with an exponentially decaying reservoir produces quantiles which are representative of (roughly) the last five minutes of data. It does so by using a forward-decaying priority reservoir with an exponential weighting towards newer data. Unlike the uniform reservoir, an exponentially decaying reservoir represents recent data, allowing you to know very quickly if the distribution of the data has changed.’

This is more-or-less correct — but the key phrase is ‘roughly’. In reality, if the frequency of updates to such a timer drops off, it could take a lot longer, and if you stop updating a timer which uses this reservoir type, it’ll never decay at all. The GraphiteReporter will dutifully capture the percentiles, min, max, etc. from that timer’s reservoir every minute thereafter, and record those to Graphite using the current timestamp — even though the data it was derived from is becoming more and more ancient.
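To make the “never decays” point concrete, here is a heavily simplified, hypothetical sketch of a forward-decaying priority reservoir in stdlib Java. It is not the Metrics source. It assumes the same broad shape (each sample keyed by priority = exp(alpha × age) / random, the lowest-priority sample evicted when full, all bookkeeping done in `update()`), omits the periodic rescaling the real class does, and computes unweighted percentiles for brevity. The thing to notice is that the read path takes no clock reading at all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

/**
 * Hypothetical, heavily simplified forward-decaying priority reservoir.
 * Not the Metrics source: no periodic rescaling, and percentiles are unweighted.
 */
public class DecayingReservoirSketch {
    private final int size;
    private final double alpha;          // decay rate, per second
    private final long landmarkSeconds;  // fixed; exp() overflows after ~13h at alpha = 0.015
    private final Random random = new Random(42);
    private final TreeMap<Double, Long> byPriority = new TreeMap<>(); // priority -> value

    DecayingReservoirSketch(int size, double alpha, long landmarkSeconds) {
        this.size = size;
        this.alpha = alpha;
        this.landmarkSeconds = landmarkSeconds;
    }

    /** All the "decay" happens here: newer samples get exponentially larger priorities. */
    void update(long value, long nowSeconds) {
        double weight = Math.exp(alpha * (nowSeconds - landmarkSeconds));
        double priority = weight / (1.0 - random.nextDouble()); // divisor is in (0, 1]
        if (byPriority.size() < size) {
            byPriority.put(priority, value);
        } else if (priority > byPriority.firstKey()) {
            byPriority.put(priority, value);
            byPriority.pollFirstEntry(); // evict the lowest-priority sample
        }
    }

    /** No clock on the read path: with no new updates, old samples are reported forever. */
    long percentile(double q) {
        if (byPriority.isEmpty()) {
            return 0;
        }
        List<Long> values = new ArrayList<>(byPriority.values());
        Collections.sort(values);
        int index = (int) Math.ceil(q * values.size()) - 1;
        return values.get(Math.max(0, index));
    }

    public static void main(String[] args) {
        DecayingReservoirSketch timer = new DecayingReservoirSketch(1028, 0.015, 0);
        timer.update(800, 0); // one slow call at t = 0, then nothing overnight
        for (int minutes = 0; minutes <= 8 * 60; minutes += 4 * 60) {
            // A reporter polling at these times gets the same p99, stamped with "now".
            System.out.println("t+" + minutes + "min p99=" + timer.percentile(0.99));
        }
    }
}
```

Run as-is, it prints `p99=800` at t+0, t+240 and t+480 minutes. With a steady stream of updates the reservoir fills and the oldest, lowest-priority samples get pushed out; it is only when updates stop or slow down that this goes wrong.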

Here’s a demo. Note the long stretch of 800ms 99th-percentile latencies on the green line in the middle of this chart:

However, the blue line displays the number of events. As you can see, there were no calls to this API for that 8-hour period — this one was a test system, and the user population was safely at home, in bed. So while Graphite is claiming that there’s an 800ms latency at 7am, in reality the 800ms-latency event occurred 8 hours previously.

I observed the same thing in our production systems for various APIs which suffered variable invocation rates; if rates dropped off during normal operation, the high-percentile latencies hung around for far longer than they should have. This is quite misleading when you’re looking at a graph for 10pm and seeing a high 99th-percentile latency, when the actual high-latency event occurred hours earlier. On several occasions, this caused lots of user confusion and FUD with our production monitoring, so we needed to fix it.

Here are some potential fixes.

  • Modify ExponentiallyDecayingReservoir to also call rescaleIfNeeded() inside getSnapshot() — but based on this discussion, it appears the current behaviour is intended (at least for the mean measurement), so that may not be acceptable. Another risk of this is that it leaves us in a position where the percentiles displayed for time T may actually have occurred several minutes prior to that, which is still misleading (albeit less so).

  • Switch to sliding time window reservoirs, but those are unbounded in size — so a timer on an unexpectedly-popular API could create GC pressure and out-of-memory scenarios. It’s also the slowest reservoir type, according to the docs. That made it too risky for us to adopt in our production code as a general-purpose Timer implementation.

  • Update, Dec 2017: as of version 3.2.3 of Dropwizard Metrics, there is a new SlidingTimeWindowArrayReservoir reservoir implementation, which is a drop-in replacement for SlidingTimeWindowReservoir, with much more acceptable memory footprint and GC impact. It costs roughly 128 bits per stored measurement, and is therefore judged to be ‘comparable with ExponentiallyDecayingReservoir in terms of GC overhead and performance’. (thanks to Bogdan Storozhuk for the tip)

  • What we eventually did in our code was to use this Reporter class instead of GraphiteReporter; it clears all Timer metrics’ reservoirs after each write to Graphite. This is dumb and dirty, reaching across logical class boundaries, but at the same time it’s simple and comprehensible behaviour: with this, we can guarantee that the percentile/min/max data recorded at timestamp T is measuring events in that timestamp’s 1-minute window — not any time before that. This is exactly what you want to see in a time-series graph like those in Graphite, so is a very valuable feature for our metrics, and one that others have noted to be important in comparable scenarios elsewhere.
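The sliding-time-window option from the bullets above differs from the decaying reservoir in one important way: the read path takes the current time and throws away anything outside the window. Here is a minimal, hypothetical stdlib sketch of that idea with a fixed window (not the Dropwizard implementation; names are illustrative).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical sliding-time-window reservoir: reads evict anything older than the window. */
public class SlidingWindowSketch {
    private final long windowMillis;
    private final Deque<long[]> samples = new ArrayDeque<>(); // {timestampMillis, value}, oldest first

    SlidingWindowSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    void update(long value, long nowMillis) {
        samples.addLast(new long[] {nowMillis, value});
        trim(nowMillis);
    }

    /** Unlike the decaying reservoir, reading takes the current time and trims first. */
    long percentile(double q, long nowMillis) {
        trim(nowMillis);
        if (samples.isEmpty()) {
            return 0;
        }
        List<Long> values = new ArrayList<>();
        for (long[] sample : samples) values.add(sample[1]);
        Collections.sort(values);
        int index = (int) Math.ceil(q * values.size()) - 1;
        return values.get(Math.max(0, index));
    }

    /** Drops samples whose timestamp has fallen out of the window. */
    private void trim(long nowMillis) {
        while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMillis - windowMillis) {
            samples.removeFirst();
        }
    }
}
```

Every measurement inside the window is retained, which is exactly the unbounded-memory concern the bullets raise; per the December 2017 update, SlidingTimeWindowArrayReservoir keeps this behaviour with a much smaller per-sample footprint.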

Here’s an example of what a graph like the above should look like (captured from our current staging stack):

Note that when there are no invocations, the reported 99th-percentile latency is 0, and each measurement doesn’t stick around after its 1-minute slot.
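The clear-after-each-report behaviour can be sketched without the Metrics library at all. This hypothetical `ClearingTimer` collects durations for the current reporting interval, and the reporter’s `snapshotAndReset()` swaps in a fresh buffer, so each reported p99/max describes only that interval and an idle minute reports 0. It illustrates the idea only; it is not the author’s Reporter class, it uses a single lock for simplicity, and it keeps every sample within an interval, so a very busy timer would need a bounded structure instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical timer whose samples are discarded every time they are reported. */
public class ClearingTimer {
    /** Immutable summary of one reporting interval. */
    static final class IntervalStats {
        final int count;
        final long p99;
        final long max;

        IntervalStats(int count, long p99, long max) {
            this.count = count;
            this.p99 = p99;
            this.max = max;
        }
    }

    private List<Long> current = new ArrayList<>();

    synchronized void update(long durationMillis) {
        current.add(durationMillis);
    }

    /** Called once per reporting interval (e.g. every minute) just before writing to Graphite. */
    IntervalStats snapshotAndReset() {
        List<Long> interval;
        synchronized (this) {
            interval = current;
            current = new ArrayList<>(); // the next interval starts empty
        }
        if (interval.isEmpty()) {
            return new IntervalStats(0, 0, 0); // no calls: report 0, not stale data
        }
        Collections.sort(interval);
        int index = (int) Math.ceil(0.99 * interval.size()) - 1;
        return new IntervalStats(interval.size(), interval.get(Math.max(0, index)),
                interval.get(interval.size() - 1));
    }
}
```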

Another potential fix for a related issue would be to add support to Metrics so that it can use Gil Tene’s LatencyUtils package, and its HdrHistogram class, as a reservoir. (Update: however, I don’t think this would address the “old data leaking into newer datapoints” problem as fully as clearing the reservoirs does.) This would address some other bugs in the Exponentially Decaying Reservoir, as Gil describes:

‘In your example of a system logging 10K operations/sec with the histogram being sampled every second, you’ll be missing 9 out of each 10 actual outliers. You can have an outlier every second and think you have one roughly every 10. You can have a huge business affecting outlier happening every hour, and think that they are only occurring once a day.’

Eek.

Comments closed

the coming world of automated mass anti-terror false positives

Man sues RMV after driver’s license mistakenly revoked by automated anti-terror false positive:

John H. Gass hadn’t had a traffic ticket in years, so the Natick resident was surprised this spring when he received a letter from the Massachusetts Registry of Motor Vehicles informing him to cease driving because his license had been revoked. […] After frantic calls and a hearing with Registry officials, Gass learned the problem: An antiterrorism computerized facial recognition system that scans a database of millions of state driver’s license images had picked his as a possible fraud. “We send out 1,500 suspension letters every day,” said Registrar Rachel Kaprielian. […] “There are mistakes that can be made.”

See also this New Scientist story. This story notes that the system’s pretty widespread:

Massachusetts bought the system with a $1.5 million grant from the Department of Homeland Security. At least 34 states use such systems, which law enforcement officials say help prevent identity theft and ID fraud.

In my opinion, this kind of thing (trial by inaccurate, false-positive-prone algorithm) is one of the most worrying things about the post-PRISM world.

When we created SpamAssassin, we were well aware of the risk of automated misclassification. Any machine-learning classifier will always make mistakes. The key is to carefully calibrate the expected false-positive/false-negative ratio so that the damage done by each kind of misclassification is acceptable at the rate it is expected to occur.

These anti-terrorism machine learning systems are calibrated to catch as many potential cases as possible, but by aiming to reduce false negatives to this degree, they become wildly prone to false positives. And when they’re applied as a dragnet across all citizens’ interactions with the state — or even in the case of PRISM, all citizens’ interactions that can be surveilled en masse — it’s going to create buckets of bureaucratic false-positive horror stories, as random innocent citizens are incorrectly tagged as criminals due to software bugs and poor calibration.
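The base-rate arithmetic behind that worry is easy to sketch. The numbers here are purely hypothetical, chosen only to show the shape of the problem: screen 1,000,000 people for something only 100 of them are actually doing, using a classifier that catches 99% of real cases and wrongly flags just 1% of innocent people.

```java
/** Hypothetical base-rate calculation for a dragnet classifier. */
public class DragnetArithmetic {
    /** Returns {expected true positives, expected false positives}. */
    static double[] expectedFlags(long population, long actualCases,
                                  double truePositiveRate, double falsePositiveRate) {
        double truePositives = actualCases * truePositiveRate;
        double falsePositives = (population - actualCases) * falsePositiveRate;
        return new double[] {truePositives, falsePositives};
    }

    /** Fraction of flagged people who are actually real cases (precision). */
    static double precision(double[] flags) {
        return flags[0] / (flags[0] + flags[1]);
    }

    public static void main(String[] args) {
        double[] flags = expectedFlags(1_000_000, 100, 0.99, 0.01); // made-up inputs
        System.out.printf("true positives:  %.0f%n", flags[0]);
        System.out.printf("false positives: %.0f%n", flags[1]);
        System.out.printf("chance a flagged person is a real case: %.2f%%%n", 100 * precision(flags));
    }
}
```

With these made-up inputs it prints 99 true positives against 9,999 false positives, so fewer than one flagged person in a hundred is a real case, even though the classifier sounds “99% accurate”.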

Comments closed

The easy way to find JMX metrics in the field using jmxsh

(oh look, a proper blog post!)

JMX is the de-facto standard in the Java and JVM-based world for exposing service metrics, and feeds nicely to tools like Graphite using JMXTrans and others. However, it’s pretty obtuse and over-complex, and it can be hard to figure out what path the JMX metrics will show up under once deployed.

Unfortunately, once a JVM-based service is deployed to EC2, it becomes very difficult to use jconsole to connect to it, due to deficiencies and crappy design in the JMX RMI protocol (I love the way they reinvented the broken parts of IIOP in that respect). Don’t even bother; instead, use jmxsh: https://code.google.com/p/jmxsh/ .

To use this, you need to modify the service process’ command line to include the following JVM args, so that the remote JMX API is exposed:

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=16660 -Dcom.sun.management.jmxremote.local.only=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Change the port number if another process is already listening on that port. Ensure the port isn’t accessible from off-host; in EC2, this should be safe enough as long as that port number is not opened in the EC2 security group.

Go to https://code.google.com/p/jmxsh/downloads/list and download the latest jmxsh-FOO.jar; e.g. ‘wget https://jmxsh.googlecode.com/files/jmxsh-R5.jar’. Then on the host, as the UID the service is running under, run: ‘java -jar jmxsh-R5.jar -h 127.0.0.1 -p 16660’. You can then hit “Enter” to go into “Browse Mode”, and you’ll get text menus like this:

 ====================================================

  Attribute List:

        1. -r- long        MaxFileDescriptorCount
        2. -r- long        OpenFileDescriptorCount
        3. -r- long        CommittedVirtualMemorySize
        4. -r- long        FreePhysicalMemorySize
        5. -r- long        FreeSwapSpaceSize
        6. -r- long        ProcessCpuTime
        7. -r- long        TotalPhysicalMemorySize
        8. -r- long        TotalSwapSpaceSize
        9. -r- String      Name
       10. -r- int         AvailableProcessors
       11. -r- String      Arch
       12. -r- double      SystemLoadAverage
       13. -r- String      Version

   SERVER: service:jmx:rmi:///jndi/rmi://127.0.0.1:16660/jmxrmi
   DOMAIN: java.lang
   MBEAN:  java.lang:type=OperatingSystem

 ====================================================

Navigate through the MBean tree looking for good Attributes which would make good metrics (5 in the list above, for example). Note the MBean and the Attribute names.
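Once you have the MBean and attribute names, the same values can be read programmatically with the JDK’s built-in `javax.management.remote` client, e.g. to check what JMXTrans should be configured to collect. A minimal sketch (the `JmxPeek` class is hypothetical); it assumes the JVM args above, i.e. an unauthenticated, non-SSL connector on 127.0.0.1:16660, so it has to run on the same host. `FreeSwapSpaceSize` is entry 5 from the menu above.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper: list an MBean's attributes and read one, as found via jmxsh. */
public class JmxPeek {
    /** Roughly what jmxsh's "Attribute List" menu shows for one MBean. */
    static List<String> attributeNames(MBeanServerConnection conn, String mbean) throws Exception {
        List<String> names = new ArrayList<>();
        for (MBeanAttributeInfo info : conn.getMBeanInfo(new ObjectName(mbean)).getAttributes()) {
            names.add(info.getName());
        }
        return names;
    }

    static Object readAttribute(MBeanServerConnection conn, String mbean, String attribute) throws Exception {
        return conn.getAttribute(new ObjectName(mbean), attribute);
    }

    public static void main(String[] args) throws Exception {
        // Same SERVER URL that jmxsh reports; port 16660 as in the JVM args above.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:16660/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            System.out.println(attributeNames(conn, "java.lang:type=OperatingSystem"));
            System.out.println(readAttribute(conn, "java.lang:type=OperatingSystem", "FreeSwapSpaceSize"));
        }
    }
}
```

The two helpers take any `MBeanServerConnection`, so they can also be pointed at the local platform MBean server (`ManagementFactory.getPlatformMBeanServer()`) while experimenting.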

Comments closed

Leaving Amazon

So, after just over 3 and a half years, I’m leaving Amazon.

It’s been great fun — I can honestly say, even with my code being used by hundreds of millions of users in SpamAssassin and elsewhere, I hadn’t really had to come to grips with the distributed systems problems that an Amazon-scale service involves.

During my time at Amazon, I’ve had the pleasure of building out a brand-new, groundbreaking, innovative internal service, from scratch to its current status where it’s deployed in production datacenters worldwide. It’s a low-latency service, used to monitor Amazon’s internal networks using massive quantities of measurement data and machine learning algorithms. It’s really very nifty, and I’m quite proud of what we’ve achieved. I was lucky to work closely with some very smart people during this, too; Amazon has some top-notch engineers.

But time to move on! In a week’s time, I’ll be joining Swrve to work on the server-side architecture of their system. Swrve have a very interesting product, extending the A/B-testing model into gaming, and a great team; and it’ll be nice to get back into startup-land once again, for a welcome change. (It’s not all roses working for a big company. ;) I’m looking forward to it. Who knows, I may even start blogging here again…

Pity about losing those 12 phone tool icons though!

5 Comments

Flood of posts

Sorry for the flood of recent posts — turns out my cron job to gateway from Pinboard had stopped running due to cron fail. (I should really set up some monitoring someday ;)

Comments closed

Telegraph spam in 1864

Here’s a letter to the editor of The Times, dated 1st June 1864:

TO THE EDITOR OF THE TIMES.
Sir, — On my arrival home late yesterday evening a “telegram,” by “London District Telegraph,” addressed in full to me, was put into my hands. It was as follows :–
“Messrs. Gabriel, dentists, 27, Harley-street, Cavendish-square. Until October Messrs. Gabriel’s professional attendance at 27, Harley-street, will be 10 till 5.”
I have never had any dealings with Messrs. Gabriel, and beg to ask by what right do they disturb me by a telegram which is evidently simply the medium of advertisement? A word from you would, I feel sure, put a stop to this intolerable nuisance. I enclose the telegram, and am,
Your faithful servant,
M.P.
Upper Grosvenor-street, May 30.

(thanks to Tony Finch for the forward)

Comments closed

In Dublin? Hear me talk about AWS network monitoring!

Reminder to Dublin-based readers — next week, Amazon (my employers) will be putting on Under the Hood at Amazon, billed as ‘A night of Beer, Pizza and Cloud Computing for Software Developers’. I’ll be speaking at it.

It’s partially a recruiting event, but even if you’re not looking for a new job, please come along. It’s also useful for us to talk about some details of what we’ve been doing in Dublin, since we’ve been operating to date with a pretty low profile, and in reality there’s some very interesting stuff going on here… particularly the product I’ll be talking about, naturally.

Also, there’ll be free beer and some Kindles to be won ;)

It’s next Thursday night, in our offices in Kilmainham. More info on this Facebook page.

Comments closed

temporary Hackerspace at MindField

This sounds very cool! Nice one, hackerspace ppl.

Ireland’s Hackerspaces and Makerspaces (091 Labs – Galway, Belfast Hackerspace, MilkLabs – Limerick, Nexus Cork and TOG – Dublin) have been asked to build and man a temporary hackerspace during the MindField – International Festival of Ideas (http://www.mindfield.ie/). MindField will take place over the weekend of 29 April – 1 May in Merrion Square.

During MindField our temporary hackerspace will provide a range of events where festival participants can learn about diybio, 3D printing, basic electronics and micro controllers, electronic fashion/crafting and open data. These events are included in the festival schedule (http://mindfield.ie/festival-schedul/).

In parallel with these events we have an opportunity to run a Hardware Hacking Challenge. In this challenge we will try to engage a group of willing hackers, makers and festival participants to create or construct interesting or innovative projects out of recycled hardware. We are trying to source interesting materials, electronic devices or equipment that can be used as the basis for projects or as sources of components.

We are particularly interested in devices that contain various types of transducers which can then be hooked up to micro controllers and computers. We’re not looking for normal computer equipment or servers (we’ve got lots of that), but more unusual stuff that people have lying around.

If you think you’ve got something they might like, contact Robert Fitzsimons.

Comments closed

My Problem With Norris

I’m uncomfortable voting for David Norris for President. Here’s why.

In November last year, he was a key voice in a Senate debate on the topic of “Protection of Intellectual Property Rights”, where he quoted heavily from the flawed judgement by Mr. Justice Peter Charleton in the Warner, Universal, Sony BMG and EMI vs UPC case. (There are allegations that he called the debate after speaking to Paul McGuinness (U2’s manager) and Niall Stokes (of Hot Press).)

In the debate, Norris quotes Mr Justice Charleton, saying:

‘In failing to provide legislative provision for blocking, diverting and interrupting internet copyright theft, Ireland is not yet fully in compliance with its obligations under European law.’ Norris then says: ‘Irish law could be brought into alignment with the intention of the European directive through a simple statutory instrument.’ [1]

Now, let me clarify my position — I’m in favour of some means of resolving the level of piracy of music and movies which is widespread nowadays, and I believe there’s a mutually agreeable way to do this. But what Norris and Mr Justice Charleton propose is not it. Here are the problems as I see them.

It Lets The Internet Filtering Genie Out Of The Bottle

The big one.

The problem is that any infrastructure for ‘blocking, diverting and interrupting internet copyright theft’ is effectively infrastructure for ‘blocking, diverting and interrupting’ any communication on the net. We have to be very careful about how this is permitted, as it’ll very quickly suffer “feature creep” and become a general-purpose censorship system — the Great Firewall Of Ireland. As Damien Mulley put it:

‘first they’ll start with the Pirate Bay. Then comes Mininova, IsoHunt, then comes YouTube (they have dodgy stuff, right?), how long before we have Boards.ie because someone quoted a newspaper article or a section of a book? And don’t think they’ll stop there too, any site that links to The Pirate Bay and the others on the hate list will probably be added to the list too…’

In Australia, the anti-child-porn filtering system was quickly used to block gambling websites, gay and straight porn sites, political parties, Wikipedia entries, Christian sites, Wikileaks, and a dentist; in Thailand, a similar system was used to block criticism of the royal family.

Will It Help? I Don’t Think So

Norris:

‘As long as Irish law is deficient, Mr. Justice Charleton has found that all creative Irish industries are losing money.’

This is quite a hilariously overblown and sweeping statement. ALL creative Irish industries? What qualifies as a ‘creative’ industry? I suspect some in this country have been involved in industrial acts of creation that made money. ;)

While they’re not Irish, the well-known indie label Beggars Banquet has gone on the record as stating the opposite where the current music situation is concerned:

“There’s fewer gatekeepers now. We don’t have to knock on a TV station’s door or a radio station’s door and it’s made us far more competitive. […] There’s a wide highway in front of us we can go speeding down, and it wasn’t there even two years ago. It means the majors are looking at a world where only 35 Gold Albums a year are certified compared to ten times that recently. But going above Gold in the US is not a problem for us.”

So it appears a ‘creative’ industry (albeit in the UK) is finding things not quite so bad.

Norris again:

‘the facts were established in the judgment of Mr. Justice Charleton in which he stated: “Between 2005 and 2009 the recording companies experienced a reduction of 40% in the Irish market for the legal sale of recorded music.” That is a devastating blow. […] He went on to state: “Some 675,000 people are likely to be engaged in some form of illegal downloading from time to time.”’

Without quite lining up one statement with the other, this reinforces the impression that the only reason the recording companies have seen these drops in revenues is due to internet-borne piracy. However, quoting the brilliant Mumblin’ Deaf Ro on the topic of lies, damn lies, and music biz statistics:

‘The drop in the value of Irish retail music sales was 11.7% between 2008 and 2009, which is significantly less than the 18% overall drop in retail sales for the economy that year. Digital album sales have increased by 30% since 2007 both in terms of volume and market value.’

So in other words, between 2008 and 2009, Irish retail music sales outperformed the retail sales economy as a whole!

In addition, Ro provides the following BPI figures for UK market volumes over the 2005-2009 period:

    Year  Albums  Singles
    2005  159.0m   47.9m
    2006  154.7m   66.9m
    2007  138.1m   86.6m
    2008  133.6m  115.1m
    2009  128.9m  152.7m

It’s clear that singles sales went through the roof, more than tripling. Album sales did drop, but by roughly 19%, nowhere near 40%, and that drop coincided with the general downturn in the global economy around that time. He also notes that, on a number of metrics, digital sales went through the roof in 2009, in the UK and globally.
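As a quick sanity check on those claims, here’s the arithmetic over the BPI figures above in Python. It uses only the numbers in the table, so it says nothing about the Irish market.

```python
# BPI UK market volumes in millions of units, as quoted above: year -> (albums, singles).
volumes = {
    2005: (159.0, 47.9),
    2006: (154.7, 66.9),
    2007: (138.1, 86.6),
    2008: (133.6, 115.1),
    2009: (128.9, 152.7),
}


def pct_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


albums_2005, singles_2005 = volumes[2005]
albums_2009, singles_2009 = volumes[2009]

album_change = pct_change(albums_2005, albums_2009)
singles_ratio = singles_2009 / singles_2005

print(f"Albums 2005-2009: {album_change:+.1f}%")
print(f"Singles 2009 vs 2005: {singles_ratio:.2f}x")
```

This prints an album drop of about 18.9% and a singles ratio of about 3.19, which is where the “roughly 19%” and “more than tripling” figures come from.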

While this does not provide figures for the Irish market, I’m at a loss as to how it could be radically different — Irish and UK consumers have pretty similar musical tastes and consumption habits, I would guess.

Here’s a theory: perhaps the issue could be that “Irish” music sales are associated with bricks-and-mortar music shops selling the physical product, whereas digital music sales are associated with online services based outside Ireland, and an Irish buyer buying an album at 7digital.co.uk, or on iTunes, isn’t counted as an “Irish retail sale”? Could the problem be that we don’t have any significant Irish shops selling music online, I wonder?

Bricks-and-mortar music shops, such as “Celtic Note”, owned by ex-Senator Donie Cassidy (who, coincidentally, was quite vociferous in that Seanad debate), are indeed hurting in this new model of music consumption, and that’s a problem. But given that good, working digital music sales systems are in operation, going by these figures the decline doesn’t necessarily appear to be due to massive volumes of internet-borne piracy.

Essentially, internet piracy is a convenient bogeyman, especially for the technophobic old guard, but may have little bearing on the current woes of the Irish record industry and bricks-and-mortar music shops.

(Update: a couple of days after this was posted, a pair of economists at the LSE have said basically the same thing.)

Audible Magic Won’t Work For Long Anyway

Audible Magic, which Norris suggests is IRMA’s favoured filtering system, received the following verdict from the EFF back in 2004:

‘Should Audible Magic’s technology be widely adopted, it is likely that P2P file-sharing applications would be revised to implement encryption. Accordingly, network administrators will want to ask Audible Magic tough questions before investing in the company’s technology, lest the investment be rendered worthless by the next P2P “upgrade.”‘

Naturally, encryption is widespread nowadays, so this may already be the case.
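To make the EFF’s point concrete, here’s a toy Python sketch of a payload-matching filter. It is not how Audible Magic actually works (its real system is a content-fingerprinting service and far more sophisticated than this); the signature, the key and the SHA-256 keystream are all hypothetical stand-ins. It only shows that once the bytes on the wire are encrypted, a filter looking for known content in them has nothing to match.

```python
import hashlib
from typing import List


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 over key + counter). Illustration only, not a vetted cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_crypt(key: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; applying it twice gives back the original."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def filter_flags(payload: bytes, signatures: List[bytes]) -> bool:
    """Naive payload filter: flag any transfer containing a known byte signature."""
    return any(sig in payload for sig in signatures)


# Hypothetical signature for a known track, and a transfer that contains it.
SIGNATURES = [b"KNOWN-TRACK-SIGNATURE-0001"]
plaintext = b"header..." + SIGNATURES[0] + b"...rest of the file"
ciphertext = xor_crypt(b"hypothetical-session-key", plaintext)

print(filter_flags(plaintext, SIGNATURES))   # True: the unencrypted transfer is caught
print(filter_flags(ciphertext, SIGNATURES))  # False: the encrypted copy no longer matches
```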

Internet Censorship Harms Our Global Image

As Adrian Weckler points out:

‘do we really want to send out the message that, digitally, we’re the new France? Come to think of it, do we want to tell Google, Facebook, Apple and Twitter that, digitally, we’re the new Britain?’

Right now, more than ever, we need to put out an image that we’re ready to do business on our end of the internet. Mandatory censorship systems don’t exactly support this.

In Summary

So in summary, I would hope to see a more balanced approach to the issue from Norris. Most of the problematic statements in his speech were sourced directly from Mr. Justice Charleton’s flawed judgement, but I would have thought some critical thinking was vital. The fact that this was lacking, especially given the allegations of heavy music-biz lobbying beforehand, leaves me less inclined to vote for him than I was before, particularly since I haven’t heard any clarification on these issues.

([1]: Funnily enough, an SI similar to this was nearly sneaked through a couple of weeks ago, according to reports.)

Against The Use Of Programming Languages in Configuration Files

It’s pretty common for apps to require “configuration” — external files which can contain settings to customise their behaviour. Ideally, apps shouldn’t require configuration, and this is always a good aim. But in some situations, it’s unavoidable.

In the abstract, it may seem attractive to use a fully-fledged programming language as the language to express configuration in. However, I think this is not a good idea. Here are some reasons why configuration files should not be expressed in a programming language (and yes, I include “Ruby without parentheses” in that bucket):

Provability

If a configuration language is Turing-incomplete, configuration files written in it can be validated “offline”, i.e. without executing the program it configures. General-purpose programming languages are Turing-complete, which means that, in general, a configuration file written in one can’t be fully validated without executing it.

Offline validation is a useful feature for operational usability, as we’ve found with “spamassassin --lint”.
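Here’s a minimal Python sketch of offline validation for a simple line-oriented config format, in the spirit of a lint mode. The directive table is hypothetical (the names are borrowed from SpamAssassin, but the argument rules are made up for the example). Nothing in the file is executed; each line is parsed and checked against the table.

```python
from typing import List

# Hypothetical directive table: directive name -> minimum number of arguments.
KNOWN_DIRECTIVES = {
    "required_score": 1,
    "whitelist_from": 1,
    "report_safe": 1,
}


def lint(text: str) -> List[str]:
    """Return error messages for a config file; an empty list means it is valid.

    The file is only parsed, never executed, so this can run before deployment.
    """
    errors = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        name, *args = line.split()
        if name not in KNOWN_DIRECTIVES:
            errors.append(f"line {lineno}: unknown directive {name!r}")
        elif len(args) < KNOWN_DIRECTIVES[name]:
            errors.append(f"line {lineno}: {name} needs at least "
                          f"{KNOWN_DIRECTIVES[name]} argument(s)")
    return errors


sample = """
required_score 5.0
whitelist_from *@example.com
report_saef 1      # typo
required_score
"""

for message in lint(sample):
    print(message)
```

Both mistakes in the sample (the misspelt directive on line 4 and the missing argument on line 5) are reported without running anything.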

Security

Some configuration settings may be insecure in certain circumstances; for example, in SpamAssassin, we allow certain classes of settings like whitelists/blacklists to be set in a user’s ~/.spamassassin/user_prefs file, while disallowing rule definitions (which can cause poor performance if poorly written).

If your configuration file is simply an evaluated chunk of code, it becomes more difficult to protect against an attacker introspecting the interpreter and overriding the security limitations. It’s not impossible, since you can, for instance, use a sandboxed interpreter, but this is typically not particularly easy to implement.
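When the configuration is plain data, enforcing that kind of restriction is just a lookup. Here’s a minimal Python sketch, assuming a hypothetical split between per-user and admin-only settings; the directive names are modelled loosely on SpamAssassin’s, but this is not SpamAssassin’s actual code or policy.

```python
from typing import Dict, List

# Hypothetical policy: list-style settings are fine per-user, rule definitions are admin-only.
USER_ALLOWED = {"whitelist_from", "blacklist_from", "required_score"}
ADMIN_ONLY = {"header", "body", "rawbody", "meta"}


def load_user_prefs(text: str) -> Dict[str, List[str]]:
    """Parse a per-user prefs file, refusing anything not on the allowlist.

    Because the file is data rather than code, the check is a set-membership
    test; there is no interpreter for an attacker to introspect or subvert.
    """
    settings: Dict[str, List[str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if name in ADMIN_ONLY:
            raise PermissionError(f"line {lineno}: {name!r} is not permitted in user prefs")
        if name not in USER_ALLOWED:
            raise ValueError(f"line {lineno}: unknown setting {name!r}")
        settings.setdefault(name, []).append(value)
    return settings


prefs = load_user_prefs("whitelist_from friend@example.com\nrequired_score 4.0\n")
print(prefs)  # {'whitelist_from': ['friend@example.com'], 'required_score': ['4.0']}

try:
    load_user_prefs("body MY_RULE /foo/\n")
except PermissionError as err:
    print(err)  # line 1: 'body' is not permitted in user prefs
```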

Usability

Here’s a rather hairy configuration file I’ve concocted.

    #! /usr/bin/somelanguage
    !$ app.status load html
    !c = []
    ;c['sources'] = < >
    ;c['sources'].append(
        NewConfigurationThingy("foo_bar",
            baz="flargle"))
    ;c['builders'] = < >
    ;c['bots'] = < >
    !$ app.steps load source, shell
    ;bf_mc_generic = factory.SomethingFactory( <
        woo(source.SVN, svnurl="http://example.com/foo/bar"),
        woo(shell.Configure, command="/bar/baz start"),
        woo(shell.Test, command="/bar/baz test"),
        woo(shell.Configure, command="/bar/baz stop")
        > );
    ;b1 = < "name": "mc-fast", "slavename": "mc-fast",
                 "builddir": "mc-fast", "factory": ;bf_mc_generic >
    ;c['builders'].append(;b1)
    ;SomethingOrOther = ;c

This isn’t entirely concocted from thin air; it’s made up of bits of our BuildBot configuration file, from before we switched to using Hudson. I’ve replaced the familiar Python syntax with deliberately unfamiliar made-up syntax, to emulate the user experience I had attempting to configure BuildBot with no pre-existing Python knowledge. ;)

Compare with this re-stating of the same configuration data in a simplified, “configuration-oriented” imaginary DSL:

    add_source NewConfigurationThingy foo_bar baz=flargle

    buildfactory bf_mc_generic source.SVN http://example.com/foo/bar
    buildfactory bf_mc_generic shell.Configure /bar/baz start
    buildfactory bf_mc_generic shell.Test /bar/baz test
    buildfactory bf_mc_generic shell.Configure /bar/baz stop

    add_builder name=mc-fast slavename=mc-fast
         builddir=mc-fast factory=bf_mc_generic

Essentially, I’ve extracted the useful configuration data from the hairy example, discarded the symbols used to indicate types, function calls and data-structure construction, and let the configuration domain knowledge imply what’s necessary. Not only is this easier for the casual reader to comprehend, it also reduces the risk of syntax errors, simply by minimising the number of syntactical components.
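A DSL like that is also cheap to support. Here’s a minimal Python parser for it; since the DSL is imaginary, the parsing rules (indented lines continue the previous directive, key=value pairs become settings) and the output dict layout are assumptions for this sketch, not BuildBot’s real configuration structure.

```python
import shlex

# The imaginary DSL from above, verbatim.
DSL = """\
add_source NewConfigurationThingy foo_bar baz=flargle

buildfactory bf_mc_generic source.SVN http://example.com/foo/bar
buildfactory bf_mc_generic shell.Configure /bar/baz start
buildfactory bf_mc_generic shell.Test /bar/baz test
buildfactory bf_mc_generic shell.Configure /bar/baz stop

add_builder name=mc-fast slavename=mc-fast
     builddir=mc-fast factory=bf_mc_generic
"""


def logical_lines(text):
    """Yield one directive per item, joining indented continuation lines."""
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # blank lines separate groups but carry no meaning
        if raw[0].isspace() and current is not None:
            current += " " + raw.strip()
        else:
            if current is not None:
                yield current
            current = raw.strip()
    if current is not None:
        yield current


def key_values(pairs):
    """Turn ["a=1", "b=2"] into {"a": "1", "b": "2"}."""
    return dict(pair.split("=", 1) for pair in pairs)


def parse(text):
    config = {"sources": [], "factories": {}, "builders": []}
    for line in logical_lines(text):
        directive, *args = shlex.split(line)
        if directive == "add_source":
            kind, name, *opts = args
            config["sources"].append({"type": kind, "name": name, **key_values(opts)})
        elif directive == "buildfactory":
            factory, step, *step_args = args
            config["factories"].setdefault(factory, []).append((step, " ".join(step_args)))
        elif directive == "add_builder":
            config["builders"].append(key_values(args))
        else:
            raise ValueError(f"unknown directive: {directive!r}")
    # Domain knowledge the hairy version left implicit: a builder must name a known factory.
    for builder in config["builders"]:
        if builder.get("factory") not in config["factories"]:
            raise ValueError(f"builder {builder.get('name')!r} uses an undefined factory")
    return config


config = parse(DSL)
print(config["builders"])
```

The parser, not the user, is responsible for knowing that a builder refers to a factory by name, which is the point of letting domain knowledge imply the structure.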

See Also

The Wikipedia page on DSLs is quite good on the topic, with a succinct list of pros and cons.

This StackOverflow thread has some good comments — I particularly like this point:

When you need your application to be very “configurable” in ways that you cannot imagine today, then what you really need is a plugins system. You need to develop your application in a way that someone else can code a new plugin and hook it into your application in the future.

+1.
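A plugin system needn’t be elaborate. Here’s a minimal Python sketch of the idea, with hypothetical names throughout: plugins register themselves in a table, and the configuration only lists which ones to enable, so it stays plain data while the open-ended behaviour lives in code.

```python
from typing import Callable, Dict, List

# Registry mapping a plugin name to the function that implements it.
PLUGINS: Dict[str, Callable[[str], str]] = {}


def plugin(name: str):
    """Decorator used by plugin authors to register a new hook under a name."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        PLUGINS[name] = func
        return func
    return register


# A third party can add behaviour without touching the app or its config format.
@plugin("shout")
def shout(text: str) -> str:
    return text.upper()


@plugin("reverse")
def reverse(text: str) -> str:
    return text[::-1]


def run_pipeline(text: str, enabled: List[str]) -> str:
    """Apply each enabled plugin in order; the config only supplies the names."""
    for name in enabled:
        text = PLUGINS[name](text)
    return text


print(run_pipeline("hello", ["shout"]))             # HELLO
print(run_pipeline("hello", ["shout", "reverse"]))  # OLLEH
```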

This seems to be a controversial topic — as you can see, that page has people on both sides of the issue. Maybe it fundamentally comes down to a matter of taste. Anyway — my $.02.

Update: discussions elsewhere: HackerNews

Another Update, 2012-04-06: Robey Pointer wrote a post called Why Config?, in which he describes a Scala-based configuration language in use at Twitter, which uses Scala’s runtime code evaluation, and a Scala trait, to express configuration succinctly in a Scala source file and load it at runtime. The downside? It’s a Scala source file, executed at runtime, containing configuration. :(

However, this comment in the comments section is worth a read:

At Netli (now part of Akamai) we had a configuration framework very similar in spirit and appearance to Configgy. It was in early 2000-s, we open sourced it since. (http://ncnf.sourceforge.net/). It would provide on-the-fly reload for the C-based programs (the ncnf if a C library). It also had some perks like attribute inheritance and a concept of block references. Most importantly though, it contained a separate schema language and a validator to allow configuration be checked before pushing in production. At Netli we used it to configure 1200 services on over 400 hardware boxes, the configuration becoming about 20+mb in length (assembled from several pieces by the CPP, then M4 templating library).

Naturally, it wasn’t Netli’s first attempt at doing configuration. One of the first attempts failed since it was Turing-complete. That approach was to specify the configuration as a Perl data specification. In a very short time the lure of unused expressiveness of such Turing-complete environment prevailed and people started to write for-loops around data pieces and doing other tricks to remove redundancy from the configuration. It turned out to be a disaster in the end, with configuration becoming unmaintainable and flaky.

One principle I got out out of that exercise is that configuration shall not be Turing-complete. We’ve got burned specifically by that property far too many times. Yet I do agree with you that a validation facility is a must-have, which is something not usually part of the simple text-based frameworks. C-based NCNF had it almost from the very beginning though, and it proved to be a very useful harness.

+1. There’s lots more info on that system at this post at lionet.livejournal.com.

Another Update, 2017-05-09: casio_juarez on Twitter:

Also related: The Configuration Complexity Clock.

(Image credit: Turn The Dial by VERY URGENT Photography)

I made a sled

Facing yet another day of being snowed in, with Dublin’s icy roads and footpaths driving us all stir crazy, I came up with this:

More pics, vid — fun!

Science Gallery Xmas Cards

The Dublin Science Gallery Greeting Cards are excellent!

Get ’em here, or pick up one of the great gadgets and gifts they have in stock.

(disclaimer: I am mates with the designer and the guy who runs the shop — but I still think they’re great work, regardless ;)

Name-checked in the Seanad

So, after I posted this post about Aslan’s imaginary illegal downloads, someone on Twitter linked to this comment by Senator Paschal Mooney (Fianna Fail), in the Seanad the next day, repeating the incorrect Aslan factoid:

Sen. Paschal Mooney (Fianna Fail): There is a perception that the big five record companies, all international companies, have been ripping off the consumer for many years. I do not want to be seen as an apologist for the music industry, but at the lower level I can give a specific example to highlight the impact of illegal downloading on Aslan, an Irish band. It has sold 6,000 copies of its current album, but there have been 22,000 illegal downloads. […] Why must we wait for a High Court judgment to be made before we introduce relevant legislation?

It appears a few people, Adam Beecher for one, got in touch with the Senator by email. To my surprise, a couple of days later, I got some Twitter messages telling me that I’d been mentioned in the Seanad! Indeed, here it is:

Sen. Paschal Mooney (Fianna Fail): Last week on the Order of Business I raised an issue relating to illegal downloading of music on the Internet which followed on a court case which the major international record companies had lost that had been taken the previous day. I asked the Leader what possible legislation could be introduced to address this gap, and I am repeating the request. I have had quite a significant amount of response to the comments I made last week, specifically from persons who state that the figures quoted in my report, and also the figures quoted in the court case to defend the record companies’ position, are inaccurate, and I was asked by a number of those who emailed me to correct the record. Having investigated this further – I recommend to the House that those who are interested log on to taint.org – there is no doubt that the figures that have been quoted to support the court case, which was subsequently lost, are not accurate. It related to the group Aslan. I do not want to delay the House on this other than to correct the record in that I put the figures as I had received them in good faith and such has been the response to the comments I made in the House last week that I feel obliged to correct the record and state that there is no doubt but that the figures that have been used are, at best, suspect.

It would be important if the Leader could have the Minister for Enterprise, Trade and Innovation, Deputy Batt O’Keeffe, come to the House to give some indication of his proposals because the music industry is currently lobbying in this House and in the other House to have legislation changed to benefit it. However, there is a wider view that illegal downloading will continue irrespective of what happens, the record companies are now on the defensive and there are other alternatives that could be brought forward such as licensing those who wish to download. In that context, I would be interested in the Leader’s response.

A few comments in response:

  • Credit is due to Senator Mooney in that he admitted that he’d been misled, and corrected the record in that regard.

  • It’s amazing to see that the democratic process has opened up to this degree. I would never have expected to have this much input to our elected representatives without going through more traditional channels (face-to-face meetings, etc.).

  • Finally: ‘The music industry is currently lobbying in this House and in the other House to have legislation changed to benefit it’. That is very, very worrying. Indeed, suzybie noted on Twitter:

@jmason not sure if you caught it but I saw Willie K and his mates entering Dáíl last Wednesday evening. FF backbenchers were being met

McGarr solicitors have been in touch with the relevant Ministers requesting that Digital Rights Ireland be included in any discussions regarding legislative change. This will be one to keep an eye on.

Irish Times Letter re EMI v UPC

Submitted via email to their letters page. This may be a bit too long for the format, but hey. Enjoy.

Madam, — Commentary in this paper and elsewhere has given the impression that Mr. Justice Charleton’s judgement on the EMI v. UPC case was a poor result for EMI and the other record companies represented. This is not necessarily the case. While UPC may not yet have to implement “three strikes”, there are many things to worry the Irish internet user in the judgement.

Mr. Justice Charleton states that he is satisfied that the business of the recording companies is being devastated by piracy, based entirely on evidence submitted by the record companies and IRMA. One of these assertions was that over 20,000 illegal downloads of an “Aslan” album had been “traced”, but no details of the methodology of this “tracing” have been produced.

Third-party attempts to reproduce this figure indicate that it is probable that an extremely naive approach was taken in this testing — the putative copies of the album available to download, and their large download figures, are in reality a lure used by criminals to persuade unwitting victims to provide their credit card details to fraudulent websites.

Worryingly, this flawed evidence has already been represented as fact in the Seanad by FF senator Paschal Mooney.

Other studies cited in the judgement have been criticised widely elsewhere, including by the US Government Accountability Office in its April 2010 report to the US Congress.

Mr. Justice Charleton goes on to suggest that all internet access from UPC (and presumably other ISPs) be filtered through a piracy-detection system. One wonders what the many companies who currently run internet-based services from Ireland would make of this proposal.

The government now seems keen to rush in and implement the filtering and blocking systems requested by IRMA and the music companies, as Mr. Justice Charleton recommends, or possibly even to give hand-outs to the music industry to compensate them, as IRMA demands. One hopes that more technical expertise will be brought to bear on the supposed “evidence” before this happens.

Yours, etc., Justin Mason

Aslan’s hard times, from the UPC judgement

Oh dear. Quoting Mr Justice Charleton’s judgement in favour of UPC vs. EMI, Sony, et al:

‘This scourge of internet piracy strongly affects Irish musicians, most of whom pay tax in Ireland. ‘Aslan’ is a distinguished Irish group which has a loyal fan base; but not all of them believe in paying for music. Previous sales of their albums were excellent, about 35,000 per album, and in respect of one called “Platinum Collection”, a three CD box set, 50,000 copies were sold. More recently, an album called “Uncased” was released and only 6,000 copies were sold. Perhaps, it might be thought, the album was not popular and did not sell well? In contrast, a search was made to see how many illegal downloads had been made on the internet from that album, and 22,000 were traced.’

Aslan, eh?

So, that would be about the same figure as EMI quoted in a press statement in July 2009, which ‘Gambra’ on the thumped.com boards thoroughly debunked at the time:

‘I’ve just been listening to the first minute or two of this and have done a mere 10 minutes of googling to try verify the claim of 25,000 downloads. The EMI press statement mentioned that they’ve tracked that amount of downloads “through Torrents Nova and Pirate Bay alone.” The first problem with that is that there’s no such site as Torrents Nova (I presumed they meant mininova but Aslan gets zero hits over there) but never mind, we’ll carry on. Next I search for ever possible permutation for downloads of the new Aslan album and I kept getting the same result which is “Aslan – Uncase’d (2009) KompletlyWyred Dhz.inc” which was uploaded to thepiratebay. However this file only has a grand total of 9 seeders and 6 leechers and has been alive since the 26th of June. There’s no way of telling how many times it’s been leeched exactly but even if it was 6 new leechers every day it’d be a total of 108 downloads. It is fair to assume that only 9 of these bothered to seed back so I’d say the total is right.

Wondering still where the hell they got their mystical 25,000 total from I just searched for “Aslan Uncased” and was surprised to see 5 links to torrents of the album in the first two pages of results. However 4 of the 5 just link back to the one on TPB with 9 seeders. The 5th is where I think they got their mystical 25,000 total from:

http://www.nowtorrents.com/torrents/aslan-uncased.html

This is the 7th result you get on google for the album title and when you click it you actually get “No Matches were found” but up at the top are FAKE results that are actually just ad links. You could search for anything and you’ll get those exact same four ad results.

http://www.nowtorrents.com/torrents/gambra-thumped.html

If you refresh the totals change each time so it’s safe to say they found this link by googling the name, added up the total of listed downloads they got (which is totally random) and are using that to moan about their loss of sales. Incredible.’

Indeed, according to the site, an album called ‘Justin Mason on the nose flute’ has been downloaded 24,752 times — I never knew! Where’s my cheque?

Some quality facts and figures from EMI there, I suspect.

E-mail Address Validating Regular Expressions – a Warning

This page has been floating around in links over the past couple of weeks, as a collection of test cases to compare e-mail address validating regular expressions. However, watch out: it’s wrong.

RFC822/2822 defines an email address with a bare IP address domain part as using:

  domain-literal  =       [CFWS] "[" *([FWS] dcontent) [FWS] "]" [CFWS]

In other words, this test case is not valid at all:

  IPInsteadOfDomain@127.0.0.1

Instead, it should be:

  IPInsteadOfDomain@[127.0.0.1]

Ditto for the other addresses using IP addresses in the domain part. They’re rare, but the non-bracketed form is definitely not legal and should not be treated as valid in the test cases.
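Here’s a minimal Python sketch of a check that applies that rule for the IPv4 case, using the standard `ipaddress` module. It’s deliberately narrow: it only looks at whether an IP address in the domain part is bracketed, and it does not validate hostnames, local parts, quoted local parts containing “@”, IPv6 literals, or comments/folding whitespace (CFWS). It is not a full RFC 2822 validator.

```python
import ipaddress


def domain_part_ok(address: str) -> bool:
    """Simplified check of the domain part of an e-mail address (IPv4 case only)."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    if domain.startswith("[") and domain.endswith("]"):
        # Domain literal: the bracketed form is how an IP address must be written.
        try:
            ipaddress.IPv4Address(domain[1:-1])
            return True
        except ValueError:
            return False  # this sketch only accepts IPv4 inside the brackets
    try:
        ipaddress.IPv4Address(domain)
    except ValueError:
        return True  # not a bare IP; hostname syntax is not checked in this sketch
    return False  # a bare, unbracketed IP address


for addr in ("IPInsteadOfDomain@[127.0.0.1]",
             "IPInsteadOfDomain@127.0.0.1",
             "user@example.com"):
    print(addr, domain_part_ok(addr))
```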

I emailed the author a few days ago but haven’t had a response, hence this post.

Travel Insurance that works, even with ash clouds about?

Lazyweb request. In a few weeks I’ll be taking a flight, along with the wife and kids, for some holidays.

The trip was booked before the whole ash-cloud thing and I used Ace Travel Insurance, a typical low-cost travel insurance agency, winding up with an ‘ACE Travel Single Trip Travel HealthCover+ Insurance Policy’.

Looking at the policy doc now, it expressly excludes cover for ‘a Public Conveyance being cancelled or curtailed because of adverse weather, industrial action, or mechanical breakdown or derangement’ if ‘an aircraft, sea vessel or train is withdrawn from service on the orders of the recognised regulatory authority in any country’ — which is exactly what’s been happening in Ireland in the face of the Eyjafjallajoekull ash cloud.

That’s pretty useless, isn’t it? I’m considering booking another additional policy to cover the ‘ash case’. Anyone got any tips on single-trip policies that don’t use a similar exclusion?

what Colmcille really said

Mr. Justice Peter Charleton, in the course of his judgement on EMI Records & Ors -v- Eircom Ltd is quoted as having said the following:

‘ There is fundamental right to copyright in Irish Law. This has existed as part of Irish legal tradition since the time of Saint Colmcille. He is often quoted for his aphorism: le gach bó a buinín agus le gach leabhar a chóip (to each cow its calf and to every book its copy).’

As many have already noted, Colmcille didn’t say that at all; it was the judgement given against him by High King Diarmaid. If anything, Colmcille invented copyleft.

Manus O’Donnell’s account:

Do inneis Finden a sceila art us don righ, ass ed adubhairt ris: “Do scrib C.C. mo leabhur gan fhis damh fen,”ar se, “aderim corub lim fen mac mo leabhur.”

“Aderim-se,” ar C.C., “nach mesde lebhur Findein ar scrib me ass, nach coir na neiche diadha do bi sa lebhur ud do muchadh no a bacudh dim fein no do duine eli a scribhadh no a leghadh no a siludh fan a cinedachaib; fos aderim ma do bi tarba dam-sa ina scribhadh, corb ail lium a chur a tarba do no poiplechaibh, gan dighbail Fhindein no a lebhair do techt ass, cor cedaigthe dam a scribudh.”

Is ansin ruc Diarmaid an breth oirrdearc .i. “le gach boin a boinin” .i. laugh “le gach lebhur a leabrán.”

Or, translated to English by A. O’ Kelleher and G. Schoepperle:

Finnen first told [High King Diarmaid] his story and he said “Colmcille hath copied my book without my knowing,” saith he “and I contend that the son of the book belongs to me.”

“I contend,” saith Colmcille, “that the book of Finnen is none the worse for my copying it, and it is not right that the divine words in that book should perish, or that I or any other should be hindered from writing them or reading them or spreading them among the tribes. And further I declare that it was right for me to copy it, seeing there was profit to me from doing in this wise, and seeing it was my desire to give the profit thereof to all peoples, with no harm therefore to Finnen or his book.”

Then it was that Diarmaid gave the famous judgement: “To every cow her young cow, that is, her calf, and to every book its transcript. And therefore to Finnen belongeth the book thou hast written, O Colmcille.”

Soon thereafter, of course, 3000 died in the Battle of the Book at Cooldrumman, bringing a rather literal meaning to the modern term “copyfight”. ‘Colmcille and the Battle of the Book: Technology, Law and Access to Knowledge in 6th Century Ireland’ is recommended for more background.

Guinness vs independent breweries

Guinness’s latest product, Guinness Black Lager, gets a panning in the Irish Times today.

I’m not a fan of Guinness. It’s a good beer, but monotonous when it’s the only thing available. This, from the old Dublin Brewing Company website, makes some interesting allegations as to why that may be the case:

In 1996 the Dublin Brewing Company was set up in Smithfield, in the old James Crean soap factory. As the only other brewery in Dublin to Guiness, Dublin Brewing Company represented a small but real challenge to the Guinness monopoly. Initially [Guinness’] reaction was “it won’t work because, Irish people were brand loyal” and wouldn’t change to anything new.” However by November 1997 Guinness could see an increasing threat from a number of new microbreweries which were opening up around Ireland; it built its own microbrewery called St. James’s Gate Beers. In the words of their Weekly News No: 44 “the four unique and distinctive draught beers are designed to meet perceived demand amongst ale and lager drinkers over the age of 28 for a wider choice of tastier draught beers.”

The project team had spent 18 months conducting exhaustive R & D into the Irish drinking palette before the launch. This research included taking samples of Beckett’s and D’Arcy’s from public houses in Temple Bar and returning it to their citadel of brewing science for further analysis. Just exactly how do those “Fun Lovin Brewers” in Smithfield make beer? The code word for this return to basic brewing was affectionately known among company staff as “Operation Wolf”.

The Dublin Brewing Company, amongst other small breweries was going to be lambs for slaughter. Of course, when you have a virtual monopoly on tap space in most bars, it’s no problem launching no less than four beers in twenty pubs in Dublin overnight. Luckily drinkers in this country know what they want, and if they want a real beer they support the increasing number of microbreweries in Ireland, not a monopoly brewer masquerading as a small producer. The attempt at what was called “full taste” beers turned out to be a disaster. By October 1998 the operation was quietly closed down. However, now that St. James Gate is no more (£3-5m expenditure), we have its latest treat, Breo, being launched with the usual bravado Guinness display on these occasions – 10/15 kegs of beer free for every publican that takes it in. The pub gets the higher number of kegs if they take something else out. As the only other brewery in town, the Dublin Brewing Company is back on the firing line. The Dublin Brewing Company would like to dedicate D’Arcy’s Dublin Stout to the memory of those old Dublin breweries.

Sadly, whether due to Guinness’ tactics or not, the DBC appears to be no more. There are a few microbreweries around Ireland, but generally, the pub taps in this country are dominated by low-quality lagers, and Guinness. At least Paulaner is becoming widely available on tap, imported by Heineken…

spamass-milter != SpamAssassin

Just heading this one off before it gets too much further…

A couple of weeks ago, a researcher found a bug in the spamass-milter project, an open-source milter to integrate SpamAssassin filtering into an MTA. Here are the exploit details.

This H-Online story covered it:

Security vulnerability in SpamAssassin filter module

The SpamAssassin Milter plug-in which plugs in to Milter and calls SpamAssassin, contains a security vulnerability which can be exploited by attackers using a crafted email to inject and execute code on a mail server. The SpamAssassin Milter plug-in is frequently used to run SpamAssassin on Postfix servers.

(I think this is the source article on Heise.de.)

That was more-or-less accurate — but the problem is the “chinese whispers” effect, where a news story on another site builds on misreadings of another news article. eSecurityPlanet:

Security Flaw Found in SpamAssassin Plug-in

The SpamAssassin Milter plug-in has been found to contain a security vulnerability. […]

sigh.

To clarify: spamass-milter is not a part of SpamAssassin. It’s a third-party product which allows Sendmail/Postfix users to integrate SpamAssassin into their message flows as a milter.

SAY2K10 Doh

Happy new year! Or maybe not. Doh.

Over a year ago, Lee Maguire noticed that a contributed SpamAssassin rule, FH_DATE_PAST_20XX, was naively written — simply to match any date in the year 2010 or later — and would start to false-positive on all mail in 14 months. We made the trivial fix to avoid this (for at least 10 years, by which point the rule would have obsoleted itself through normal means), and I committed it to SVN.

Problem solved, right? Nope. I’d committed to trunk, but in a moment of inattention had forgotten to backport the fix to the stable release branch, 3.2.x, as well. Nobody else noticed the mistake, and several months later, boom:

Bugger.

Annoyingly, the GA had assigned this rule 3.5 points in the 3.2.0 rescoring run. Since every mail dated 2010 or later hit the rule, this effectively lowered the default threshold from 5.0 points to 1.5 for all new mail, which produced a 2% false positive rate during the first 13 hours of the new year.
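Here’s a small Python illustration of both halves of the problem. The regular expressions are assumptions for illustration (a year-matching pattern of the kind described above, and a variant pushed out by a decade, as the fix was), not copied from the actual ruleset; the 3.5 and 5.0 figures come from the text.

```python
import re

# Illustrative patterns only (assumed, not copied from the SpamAssassin ruleset).
NAIVE = re.compile(r"20[1-9][0-9]")  # matches any year from 2010 onwards
FIXED = re.compile(r"20[2-9][0-9]")  # pushes the problem out to 2020

RULE_SCORE = 3.5         # score the rule was given in the 3.2.0 rescoring run
DEFAULT_THRESHOLD = 5.0  # default spam threshold (required_score)

dated_2009 = "Thu, 31 Dec 2009 23:59:00 +0000"
dated_2010 = "Fri, 01 Jan 2010 00:01:00 +0000"

for date in (dated_2009, dated_2010):
    hit = bool(NAIVE.search(date))
    # Points the remaining rules must contribute before the mail is marked as spam.
    remaining = DEFAULT_THRESHOLD - (RULE_SCORE if hit else 0.0)
    print(f"{date}: naive hit={hit}, fixed hit={bool(FIXED.search(date))}, "
          f"other rules need {remaining:.1f} points")
```

On the 2009 date, other rules still need the full 5.0 points; from 1 January 2010 the naive pattern fires on every message and only 1.5 more points are needed.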

After that point, the fix was pushed to the sa-update channel, and anyone who runs sa-update regularly (as they should!) was brought back to normal filtering behaviour.

The rule is superfluous anyway, since it overlaps with a better-written “eval” rule, DATE_IN_FUTURE_96_XX. Accordingly, the most likely outcome is that it’ll be removed.

Personally, I see a few lessons from this:

  • Obviously, I need to pay more attention. This is easier said than done though, since SpamAssassin has nothing to do with my day job anymore; it’s a spare-time thing nowadays, and that’s a rare resource, unfortunately. :( But still, a chastening result, and I’m very sorry for my part in this screwup.

  • We need more active committers on Apache SpamAssassin. If we’d had more eyes, the fact that I’d forgotten to backport the fix might have been spotted. We’re definitely in a better situation now in this regard than we were 6 months ago, so that’s good.

  • IMO, this is a good demonstration of how too many simple rules are risky; without careful vetting and moderation, it’s easy for a bad one to slip past. Perhaps we need to move more towards a DNSBL/network-rule driven approach, although this has its downsides too. Still thinking about this.

  • It’d be good to fix the GA so that it wouldn’t assign such high points to simple rules like this, without some indication that a human has vetted them and believes them trustworthy.

Daryl posted a good comment on /.:

Clearly we dropped the ball on this one. As far as I know it’s our first big rule screw up in the project’s 10 years. If you’re going to screw up you might as well do it well.

+1 to that!

And to everyone who had to clean up the fallout and spend a holiday recovering lost mails from spam folders… sorry :(

Sup Rocks

For the past 2 years or so, I’ve been using GMail to handle my main mail feed for jmason.org. I’m an absolute convert to its “river of threads”/search-based workflow.

Since starting at Amazon, I’ve had to start dealing with a heavy volume of work mail. Previous jobs have either had low mail volumes, or used Google Apps hosting for their mail, but Amazon’s volumes are high and — obviously — they’re not using Google. ;) For a while, I tried using Thunderbird, but it just didn’t really cut it; I could never keep track of mails I wanted archived, or remember which folder they were in, etc. — the same old problems that GMail solved.

Enter Sup. It’s a console-based *nix email client, with a Mutt-like curses interface, which offers something closely approximating the GMail experience:


Sup is a console-based email client for people with a lot of email. It supports tagging, very fast full-text search, automatic contact-list management, custom code insertion via a hook system, and more. If you’re the type of person who treats email as an extension of your long-term memory, Sup is for you.

Inbox Zero is a daily occurrence for my work email now; I can simply archive pretty much everything, and reliably know the excellent full-text search support will allow me to find it again in an instant when I need it. The new-user guide is well worth a read to get an idea of its featureset and UI.

Setting it up

The process of getting it set up is quite hairy; here are some instructions for Ubuntu, which thoroughly failed to work for me on 9.04. I had a similarly tricky time with some of the Ruby packages on the Red Hat work desktop, but eventually sidestepped the problem by building vanilla Ruby from source, using that to install RubyGems (“gem”), and then running “sudo gem install sup”. Much easier…

Next step is to get the mail. From some reading, it appears the most reliable way to deal with a MS Exchange 2007 server is to use offlineimap to sync it to a local set of maildirs, then add those as Sup “sources” using sup-add, one by one. This is very well supported in Sup, and works well. Offlineimap is very easy to install on Ubuntu, and can easily be built from source if that’s not an option. My config is pretty much a vanilla copy of the minimal config.

There’s a good Sup hook to run “offlineimap” every poll interval, and rescan synced sources that contain new mail. It works well.
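The real hook is Ruby and comes from the Sup wiki. The rough Python sketch below illustrates the idea it implements: sync, then rescan only the sources that gained or lost messages. It assumes offlineimap writes one maildir per folder somewhere under a root such as ~/Maildir. The function names and the offlineimap flags are assumptions. The sync step is injectable, so the logic can be tried without a server:

```python
import mailbox
import os
import subprocess


def find_maildirs(root):
    """Yield every maildir folder (a directory with cur/, new/ and tmp/) under root."""
    for dirpath, dirnames, _ in os.walk(root):
        if {"cur", "new", "tmp"} <= set(dirnames):
            yield dirpath


def snapshot(root):
    """Map each maildir folder to the set of message keys it currently holds."""
    return {
        path: set(mailbox.Maildir(path, factory=None, create=False).keys())
        for path in find_maildirs(root)
    }


def run_offlineimap():
    # Assumed invocation: -o syncs once and exits, -u quiet suppresses the UI.
    subprocess.run(["offlineimap", "-o", "-u", "quiet"], check=True)


def poll(root, sync=run_offlineimap):
    """Sync, then return the folders whose messages were added or removed."""
    before = snapshot(root)
    sync()
    after = snapshot(root)
    return sorted(p for p in after if after[p] != before.get(p, set()))
```

Maildir keys survive a message moving from new/ to cur/, so only real additions and deletions trigger a rescan.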

Sup has an interesting approach to mail storage — it doesn’t. Instead, it stores pointers to the messages’ locations in their source storage. This is a great idea, since bugs in Sup therefore cannot lose your mail — just your metadata about your mail. However, it means that if the source changes in a way which moves or removes messages, you need to tell Sup to rescan (using “sup-sync”), but that’s no big deal in practice; in the more usual case, if new mail arrives, it’s automatically rescanned.
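Here is a toy sketch of that pointer-based design. It is written in Python rather than Sup's Ruby and is not based on Sup's actual code. The index keeps only a pointer (maildir path plus message key) and a little metadata per message, and it reads the message back from the source on demand. If the source changes, the index goes stale until it is rescanned. The class and method names are hypothetical:

```python
import mailbox


class PointerIndex:
    """Toy Sup-style index: stores pointers to messages, never the messages.

    Hypothetical illustration only. Each entry maps a Message-ID to
    (maildir path, maildir key, subject); bodies stay in the source maildir.
    """

    def __init__(self):
        self.entries = {}

    def scan(self, path):
        """(Re)scan one maildir source, replacing any stale pointers into it."""
        self.entries = {mid: e for mid, e in self.entries.items() if e[0] != path}
        box = mailbox.Maildir(path, factory=None, create=False)
        for key, msg in box.iteritems():
            self.entries[msg["Message-ID"]] = (path, key, msg["Subject"] or "")

    def search(self, word):
        """Return Message-IDs whose subject contains `word` (case-insensitive)."""
        word = word.lower()
        return [mid for mid, (_, _, subject) in self.entries.items()
                if word in subject.lower()]

    def load(self, message_id):
        """Follow a pointer back to the source; raises KeyError if the message is gone."""
        path, key, _ = self.entries[message_id]
        return mailbox.Maildir(path, factory=None, create=False).get_message(key)
```

If a message file is deleted behind the index's back, `load()` raises `KeyError` until `scan()` is run again. That is the toy equivalent of running sup-sync. The index only ever reads from the maildir, so a bug in it can't touch the mail itself.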

I have just under 7000 mail messages in my Sup index, and rescans are speedy and searches super-fast. It’s very nicely done.

Outbound mail is delivered using /usr/sbin/sendmail by default, which should be working on any decent *nix desktop anyway ;)

Recommended Hooks

The Hooks wiki page has a few good hooks that you should install:

  • ~/.sup/hooks/before-poll.rb: the above-mentioned offlineimap poll hook
  • ~/.sup/hooks/mime-decode.rb: ‘uses w3m to translate all HTML attachments that don’t have a text/html alternative.’ Well worth installing.
  • ~/.sup/hooks/before-add-message.rb: essential to filter out cron noise and the like so it doesn’t hit the inbox; unfortunately Sup doesn’t (yet) support GMail’s “filter messages like this” UI.
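Whatever language the hook is written in, the filtering decision is just a predicate over the message headers. The Python sketch below shows that logic. It assumes cron mail can be spotted by a "Cron Daemon" sender or a "Cron <user@host> ..." subject, which is what Vixie cron typically sends. The real hook is Ruby and uses Sup's own message API, which this doesn't attempt to reproduce:

```python
import email


def should_skip_inbox(msg):
    """Return True for automated noise that should be archived, not shown in the inbox.

    Assumed heuristics: cron output usually comes from a "Cron Daemon" sender
    with a subject like "Cron <user@host> command". Adjust for your own noise.
    """
    sender = msg.get("From", "")
    subject = msg.get("Subject", "")
    return "Cron Daemon" in sender or subject.startswith("Cron <")


cron_mail = email.message_from_string(
    "From: root (Cron Daemon)\nSubject: Cron <root@box> /usr/local/bin/backup\n\nok\n"
)
human_mail = email.message_from_string(
    "From: Alice <alice@example.com>\nSubject: Lunch?\n\nhi\n"
)
print(should_skip_inbox(cron_mail), should_skip_inbox(human_mail))
```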

Bad Points

  • Long URIs: unfortunately, very long URIs are broken by Sup’s renderer, and it doesn’t offer a native way to “activate” URIs and have them displayed in the browser; instead one has to cut and paste them. This is pretty lame. I’ve hacked up a perl script that will reconstruct the full URLs from the broken rendering, when the text is piped to it, but that’s a horrible hack.

  • Index Corruption: I’ve had the misfortune (once, in the month since I started) of corrupting my search index, causing Ruby exception stack traces when I attempted to run “sup-sync” to scan new mail. The only fix appeared to be to restore my index from a “sup-dump” backup. Thankfully all seems fine now, but it was a definite reminder of the product’s beta status.

  • Calendaring: still as painful as it’s ever been with UNIX command line email.

  • HTML: A good-quality, email-oriented, native HTML renderer would be awesome.

  • MIME: Sup again takes the traditional approach of UNIX command line clients, delegating to the mailcap file and its rules; unfortunately my RHEL5 desktop is too crappy to have a good mailcap setup, so I’ve had to write one from scratch to deal with the usual .doc and .xls files flying about.

  • Inconsistent Key Mapping: Given that it shares so much UI with GMail in other respects, it’s a little annoying that Sup doesn’t have the same key mapping. Not a big deal, as it took only a couple of hours to get the hang of Sup’s, though.
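The Perl script mentioned above for the long-URI problem isn't reproduced here, but a rough Python equivalent is easy to sketch. It assumes the renderer wraps a long URI by breaking it onto following lines, with no spaces inside the fragments. A URI that runs to the end of a line is therefore glued together with each following single-token line. That heuristic is a guess about the rendering. It will occasionally swallow a one-word line that follows an unwrapped URL:

```python
import re

URL_AT_LINE_END = re.compile(r"https?://\S+$")


def rejoin_urls(text):
    """Reassemble URLs that a renderer has hard-wrapped across several lines.

    Assumed heuristic: a wrapped URL runs to the end of its line, and each
    following line that is a single space-free token is a continuation of it.
    """
    lines = text.splitlines()
    urls = []
    i = 0
    while i < len(lines):
        match = URL_AT_LINE_END.search(lines[i].rstrip())
        i += 1
        if not match:
            continue
        url = match.group(0)
        # Absorb continuation lines until a blank line or a line with spaces.
        while i < len(lines) and re.fullmatch(r"\S+", lines[i].strip()):
            url += lines[i].strip()
            i += 1
        urls.append(url)
    return urls


sample = "Report: http://example.com/very/long/pa\nth/to/report?id=12\n\nCheers"
print(rejoin_urls(sample))
```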

Overall

If you’re happy enough to spend a day or two getting the damn thing installed, and aren’t afraid of a little dalliance with the bleeding edge, I strongly recommend it. It’s definitely the best *NIX mail reader at the moment.

Met iPhone

Irish iPhone users — you may find this useful. I’ve written a web scraper which takes a couple of the more useful pages on Met Eireann’s website — the regional forecast and the rainfall radar page — and reformats them in an iPhone-optimised style. Enjoy:

(updated: supports all the provincial forecasts now)

Lest we forget

Regarding Google Wave’s similarity to Lotus Notes, which is a meme I’ve heard from several angles — David Jones hits the nail on the head:

Well, I used Notes from 1994 to 1999. It did have a database backend for e-mail and a rich collaborative editing model. But it didn’t have realtime shared editing, or instant annotation.

And it was shit. No-one in their right minds would have wanted the future of the web to have been Notes. Even though, and I completely agree, it did things that the web is now only just getting round to.

+1 to that!

n+30 Days

Colm’s “n+1” post reminded me that I’d forgotten to write about this.

On July 27th, I started at Amazon, in a new Dublin-based software dev team working on infrastructure automation. It’s now (just over) a month later, and I’m enjoying it immensely.

Needless to say, this company does some very interesting web-scale technology, and getting to look inside the AWS sausage factory is really enjoyable, believe it or not ;)

(I should also post a pic of my glorious screen real-estate. The hardware is a massive improvement over the previous gig, thankfully.)

Unfortunately, however, this has coincided with a lack of free time to blog and keep up with interweb-based leisure pursuits, including SpamAssassin. Really though, this is more due to looking after two wonderful little girls under 2 years of age, rather than the job — but still, I need to remedy my neglect of this site…

In SpamAssassin news: we’ve been putting out some alpha releases of 3.3.0, and are planning to do a mass-check for score-generation in the next couple of days. Hopefully we can drive 3.3.0 to a GA release in a few weeks.

Also — we’re still looking for more people in the Amazon team, and hiring aggressively. If you’re looking for an interesting software dev role in Dublin, get in touch!

PS: it was Bea’s second birthday last weekend. Check out the awesome Very Hungry Caterpillar cupcake cake made by the missus for the occasion:

Embedded software development

Found in an Ivan Krstic post about Sugar and the OLPC:

In truth, the XO ships a pretty shitty operating system, and this fact has very little to do with Sugar the GUI. It has a lot to do with the choice of incompetent hardware vendors that provided half-assedly built, unsupported and unsupportable components with broken closed-source firmware blobs that OLPC could neither examine nor fix. […]

We had an embedded controller that blocks keyboard events and stops machine suspend, and to which we — after a long battle — received the source, under strict NDA, only to find a jungle of nested if statements, twelve levels deep, and no code history. (The company that wrote the code doesn’t use version control, see. They put dates into code comments when they make changes, and the developers mail each other zip files with new versions.)

Haha. Been there, done that. Sometimes it’s great not to have to work with custom hardware anymore…

YA link-blog aggregator

Alex Payne writing about “Fever”, a new link-blog aggregator app:

Fever’s proposition is straightforward: supply it with the feeds you always want to read, and supplement those with feeds that you only want to read the juicy bits of. Fever will then show you a sort of personal Techmeme or Google News, pulling together stories that reference common URLs.

Fever is commercial software, costing $30. Alternatively, I’ve been doing something very similar for the past few years using SpicyLinks, which is free (if a great deal less pretty on the UI end).

It’s nice to see the idea getting some polish, though. ;)
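The core of the idea Alex describes is simple enough to sketch: stories get floated up because several feeds reference the same URL. The toy Python version below assumes the feeds have already been fetched and each entry reduced to its HTML body. Neither Fever nor SpicyLinks necessarily works this way, and the feed data is invented:

```python
import re
from collections import defaultdict

HREF = re.compile(r'href="(https?://[^"]+)"')


def hot_links(feeds, min_feeds=2):
    """Rank URLs by how many distinct feeds link to them.

    `feeds` maps a feed name to a list of entry HTML bodies.
    Returns (url, feed_count) pairs, most widely linked first.
    """
    linked_from = defaultdict(set)
    for feed, entries in feeds.items():
        for html in entries:
            for url in HREF.findall(html):
                linked_from[url].add(feed)
    ranked = [(url, len(srcs)) for url, srcs in linked_from.items()
              if len(srcs) >= min_feeds]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


feeds = {
    "a": ['<a href="http://x.com/1">x</a>', '<a href="http://y.com/">y</a>'],
    "b": ['<a href="http://x.com/1">again</a>'],
    "c": ['<a href="http://x.com/1">x</a> <a href="http://y.com/">y</a>'],
}
print(hot_links(feeds))
```

The sketch also shows why the approach depends on blogs linking out. With no shared URLs, there is nothing to count.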

Alex does raise an interesting point towards the end:

Fever is just fine for floating good techie content to the top, but poor for most any other subject. I’d love it if Fever could find me good posts from the set of minimal techno or cocktail blogs I subscribe to, but link blogs — and, indeed, linking outside one’s own site — just aren’t as prevalent in those communities.

True.

Eircom’s “DDOS”, or not

I woke up this morning to hear speculation on RTE Radio as to how Eircom’s DDOS woes were possibly being caused by the Russian mob, of all things. This absurd speculation is not helped by lines in statements like this:

‘The company blamed the problems on “an unusual and irregular volume of internet traffic” directed at its website, which affected the systems and servers that provide access to the internet for its customers.’

I’m speculating, too, but it seems a lot more likely to me that this isn’t just a DDOS, and someone — possibly just a lone Irish teenager — is running an attempted DNS cache-poisoning attack. Here’s why.

Last week’s reports described two features of the attack: DDOS levels of traffic, and incorrect pages coming up for some popular websites. Operating a Kaminsky DNS cache-poisoning attack requires buckets of packets, easily perceivable as DDOS levels. This level of traffic would be the first noticeable symptom on Eircom’s network management consoles, so it’d be easy to jump to the conclusion that a simple DDOS attack was the root cause.

This week, there’s just the DDOS levels of traffic. No cache poisoning effects have been reported. This would be consistent with Eircom’s engineers getting the finger out over the weekend, and upgrading the NSes to a non-vulnerable version. ;)

Once the attacker(s) realise this, they’ll probably stop the attack.
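Why would a cache-poisoning attempt look like a DDOS, and why would upgrading the nameservers stop it working? A Kaminsky-style attacker races the real nameserver with forged replies that each guess the 16-bit DNS transaction ID. Whenever a race is lost, the attacker retries with a fresh random subdomain. Patched resolvers also randomise the UDP source port, which multiplies the space to be guessed. Below is a back-of-envelope Python sketch. The 64,000 usable ports, 100 forged replies per race and 10,000 packets per second are illustrative assumptions, not figures from Eircom's network:

```python
TXID_SPACE = 2 ** 16    # DNS transaction IDs are 16 bits wide
PORT_SPACE = 64_000     # assumption: source ports a patched resolver picks from
PACKET_RATE = 10_000    # assumption: forged packets per second delivered


def expected_forged_packets(search_space, replies_per_race=100):
    """Rough expected number of forged replies sent before one is accepted.

    Each race lands `replies_per_race` distinct guesses, so it succeeds with
    probability replies_per_race / search_space. The number of races is then
    geometric with mean search_space / replies_per_race, so the total comes
    to roughly search_space packets whatever the batch size.
    """
    expected_races = search_space / replies_per_race
    return expected_races * replies_per_race


for label, space in [("unpatched", TXID_SPACE),
                     ("port-randomised", TXID_SPACE * PORT_SPACE)]:
    packets = expected_forged_packets(space)
    seconds = packets / PACKET_RATE
    print(f"{label}: ~{packets:,.0f} packets, ~{seconds:,.0f} s at {PACKET_RATE:,} pps")
```

Under those assumptions, an unpatched resolver falls to something like 65,000 forged packets, which is seconds of flooding. A port-randomised resolver needs over four billion, which is days of sustained traffic.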

It’s not even a good attack for a bad guy to make, by the way. Given the timing, right after major press about a North Korean DDOS on US servers, it’s extremely high-profile, and made the news in several national newspapers (albeit in rather inept fashion). If someone wanted to make money from an attack, a massive-scale packet flood indistinguishable from a DDOS against the nation’s largest ISP is not exactly a subtle way to do it.

In the meantime, apparently OpenDNS have really seen the effects, with mass switchover of Eircom’s customers to the OpenDNS resolvers. Probably just as well…
