Links for 2013-06-17

  • Atelier olschinsky – “Cities III 05”

    Fine Art Print on Hahnemuehle Photo Rag Bright White, 310g: 40x50cm up to 70x100cm. Some great art based on decayed urban landscape shots, from a Vienna-based design studio. See also http://english.mashkulture.net/2011/10/17/atelier-olschinsky-cities-iii/, http://www.mascontext.com/tag/atelier-olschinsky/

    (tags: olschinsky cities urban decay landscape art prints want)

  • Possible ban on ‘factory food’ in French restaurants

    I am very much in favour of this in Ireland, too. The pre-prepared food thing makes for crappy food:

    In an attempt to crack down on the proliferation of restaurants serving boil-in-a-bag or microwave-ready meals, which could harm France’s reputation for good food, MP Daniel Fasquelle is putting a new law to parliament this month. […] The proposed law would limit the right to use the word “restaurant” to eateries where food is prepared on site using raw ingredients, either fresh or frozen. Exceptions would be made for some prepared products, such as bread, charcuterie and ice cream.

    (tags: restaurants food france cuisine boil-in-the-bag microwave cooking daniel-fasquelle)

  • On Scala

    great, comprehensive review of the language, its pros and misfeatures, from Bill de hOra

    (tags: scala languages coding fp reviews)

  • Introducing Kale « Code as Craft

    Etsy have implemented a tool to perform auto-correlation of service metrics, and detection of deviation from historic norms:

    at Etsy, we really love to make graphs. We graph everything! Anywhere we can slap a StatsD call, we do. As a result, we’ve found ourselves with over a quarter million distinct metrics. That’s far too many graphs for a team of 150 engineers to watch all day long! And even if you group metrics into dashboards, that’s still an awful lot of dashboards if you want complete coverage. Of course, if a graph isn’t being watched, it might misbehave and no one would know about it. And even if someone caught it, lots of other graphs might be misbehaving in similar ways, and chances are low that folks would make the connection. We’d like to introduce you to the Kale stack, which is our attempt to fix both of these problems. It consists of two parts: Skyline and Oculus. We first use Skyline to detect anomalous metrics. Then, we search for that metric in Oculus, to see if any other metrics look similar. At that point, we can make an informed diagnosis and hopefully fix the problem.
    It’ll be interesting to see if they can get this working well. I’ve found this kind of detection tricky to get working with a low false-positive rate unless there’s massive volume to “smooth out” spikes caused by normal activity. Amazon had one particularly successful version driving severity-1 order-drop alarms, but it relied on massive event volumes and still had periodic false positives. Skyline looks like it will alarm on a single anomalous data point, and in the comments Abe notes “our algorithms err on the side of noise and so alerting would be very noisy.”
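
    This isn’t Etsy’s Skyline, just a toy Python sketch of the general “deviation from historic norms” idea: flag the latest sample of a metric if it sits more than three standard deviations from its recent history. The function and parameter names are invented for illustration.

        import statistics

        def is_anomalous(history, latest, window=1440, sigmas=3.0):
            """Flag `latest` if it is more than `sigmas` standard deviations
            away from the mean of the last `window` samples of the metric."""
            recent = history[-window:]
            mean = statistics.fmean(recent)
            stdev = statistics.pstdev(recent)
            if stdev == 0:
                # a completely flat metric: treat any change as anomalous
                return latest != mean
            return abs(latest - mean) > sigmas * stdev

        # a metric that hovered around 100 and suddenly jumps to 500
        print(is_anomalous([100 + (i % 5) for i in range(1440)], 500))   # True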

    (tags: etsy monitoring service-metrics alarming deviation correlation data search graphs oculus skyline kale false-positives)

  • Paper: “Root Cause Detection in a Service-Oriented Architecture” [pdf]

    LinkedIn have implemented an automated root-cause detection system:

    This paper introduces MonitorRank, an algorithm that can reduce the time, domain knowledge, and human effort required to find the root causes of anomalies in such service-oriented architectures. In the event of an anomaly, MonitorRank provides a ranked order list of possible root causes for monitoring teams to investigate. MonitorRank uses the historical and current time-series metrics of each sensor as its input, along with the call graph generated between sensors to build an unsupervised model for ranking. Experiments on real production outage data from LinkedIn, one of the largest online social networks, shows a 26% to 51% improvement in mean average precision in finding root causes compared to baseline and current state-of-the-art methods.
    This is a topic close to my heart after working on something similar for three years in Amazon! Looks interesting, although (a) I would have liked to see more case studies and examples of “real world” outages it helped with; and (b) it’s very much a machine-learning paper rather than a systems one, with no discussion of fault tolerance in the design of the detection system, which leaves me worried that in a large-scale outage the detection system itself would disappear just when its help is most vital. (This was a major design influence on our team’s work.) Overall, particularly given those two issues, I suspect it’s not in production yet. Ours certainly was ;)
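
    Not the MonitorRank algorithm itself (the paper builds an unsupervised model over the call graph), but a toy Python sketch of the underlying intuition: given an anomalous frontend metric, rank the services it calls by how strongly their recent time series correlate with it. The function name and the use of numpy are my own choices for illustration.

        import numpy as np

        def rank_root_cause_candidates(anomalous_series, callee_series):
            """anomalous_series: 1-D array of the anomalous frontend metric.
            callee_series: dict of downstream service name -> 1-D array of
            the same length. Returns (service, score), most suspicious first."""
            scores = {}
            for name, series in callee_series.items():
                r = np.corrcoef(anomalous_series, series)[0, 1]
                scores[name] = 0.0 if np.isnan(r) else abs(r)
            return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)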

    (tags: linkedin soa root-cause alarming correlation service-metrics machine-learning graphs monitoring)

  • Announcing Zuul: Edge Service in the Cloud

    Netflix’s library for implementing “edge services”, i.e. a front end to their API, web servers, and streaming servers. Some interesting features: dynamic filtering using Groovy scripts; Hystrix for software load balancing, fault tolerance, and error handling for originated HTTP requests; fine-grained service metrics; Archaius for configuration; and canary requests to detect overload risks. Pretty complex, though.

    (tags: edge-services api netflix zuul archaius canary-requests http groovy hystrix load-balancing fault-tolerance error-handling configuration)

  • CloudFlare, PRISM, and Securing SSL Ciphers

    Matthew Prince of CloudFlare has an interesting theory on the NSA’s capabilities:

    It is not inconceivable that the NSA has data centers full of specialized hardware optimized for SSL key breaking. According to data shared with us from a survey of SSL keys used by various websites, the majority of web companies were using 1024-bit SSL ciphers and RSA-based encryption through 2012. Given enough specialized hardware, it is within the realm of possibility that the NSA could within a reasonable period of time reverse engineer 1024-bit SSL keys for certain web companies. If they’d been recording the traffic to these web companies, they could then use the broken key to go back and decrypt all the transactions.

    While this seems like a compelling theory, ultimately, we remain skeptical this is how the PRISM program described in the slides actually works. Cracking 1024-bit keys would be a big deal and likely involve some cutting-edge cryptography and computational power, even for the NSA. The largest SSL key that is known to have been broken to date is 768 bits long. While that was 4 years ago, and the NSA undoubtedly has some of the best cryptographers in the world, it’s still a considerable distance from 768 bits to 1024 bits — especially given the slide suggests Microsoft’s key would have to had been broken back in 2007.

    Moreover, the slide showing the dates on which “collection began” for various companies also puts the cost of the program at $20M/year. That may sound like a lot of money, but it is not for an undertaking like this. Just the power necessary to run the server farm needed to break a 1024-bit key would likely cost in excess of $20M/year. While the NSA may have broken 1024-bit SSL keys as part of some other program, if the slide is accurate and complete, we think it’s highly unlikely they did so as part of the PRISM program.

    A not particularly glamorous alternative theory is that the NSA didn’t break the SSL key but instead just cajoled rogue employees at firms with access to the private keys — whether the companies themselves, partners they’d shared the keys with, or the certificate authorities who issued the keys in the first place — to turn them over. That very well may be possible on a budget of $20M/year. [….]

    Google is a notable anomaly. The company uses a 1024-bit key, but, unlike all the other companies listed above, rather than using a default cipher suite based on the RSA encryption algorithm, they instead prefer the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) cipher suites. Without going into the technical details, a key difference of ECDHE is that they use a different private key for each user’s session. This means that if the NSA, or anyone else, is recording encrypted traffic, they cannot break one private key and read all historical transactions with Google. The NSA would have to break the private key generated for each session, which, in Google’s case, is unique to each user and regenerated for each user at least every 28-hours.

    While ECDHE arguably already puts Google at the head of the pack for web transaction security, to further augment security Google has publicly announced that they will be increasing their key length to 2048-bit by the end of 2013. Assuming the company continues to prefer the ECDHE cipher suites, this will put Google at the cutting edge of web transaction security.
    ECDHE with 2048-bit keys sounds like the way to go, and CloudFlare now support that too.
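
    A quick way to see what key exchange a server actually negotiates, using Python’s ssl module (nothing CloudFlare-specific; the host and sample output below are just examples). An “ECDHE-...” cipher name means a fresh ephemeral key per session, so recorded traffic can’t later be decrypted by breaking one long-term key.

        import socket
        import ssl

        def negotiated_cipher(host, port=443):
            """Connect over TLS and return the negotiated cipher suite."""
            ctx = ssl.create_default_context()
            with socket.create_connection((host, port), timeout=5) as sock:
                with ctx.wrap_socket(sock, server_hostname=host) as tls:
                    return tls.cipher()   # (cipher name, protocol, secret bits)

        print(negotiated_cipher("www.google.com"))
        # e.g. ('ECDHE-RSA-AES128-GCM-SHA256', 'TLSv1.2', 128)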

    (tags: prism security nsa cloudflare ssl tls ecdhe elliptic-curve crypto rsa key-lengths)

  • Record companies to target 20 more pirate sites after court ruling – Independent.ie

    Looks like IRMA are following the lead of the UK’s BPI, by chasing the proxy sites next:

    Up to 20 internet sites are to be targeted by an organisation representing record companies in a move to stamp out the illegal pirating of music and other copyright material. The Irish Recorded Music Association (IRMA) said it would be immediately moving against the 20 “worst offenders” to “take out” internet sites involved in the illegal downloading of copyright work.
    However, looks like this will involve more court time:
    Last night IRMA director general Dick Doyle said the High Court ruling was only the first step in “taking out” many internet sites involved in illegally downloading music. “We will be back in court very shortly to take out five to 10 other sites. We have already selected a total of 20 of the worst offender sites and we will go after the next five in the very near future,” he said.
    That’s not going to be cheap!

    (tags: courts ireland law irma piracy pirate-bay bpi proxies filesharing copyright)

  • Building a Modern Website for Scale (QCon NY 2013) [slides]

    some great scalability ideas from LinkedIn. Particularly interesting are the best practices suggested for scaling web services:

    1. store client-call timeouts and SLAs in Zookeeper for each REST endpoint;
    2. isolate backend calls using async/threadpools;
    3. cancel work on failures;
    4. avoid sending requests to GC’ing hosts;
    5. rate-limit on the server.

    #4 is particularly cool. They do this using a “GC scout” request before every “real” request: a cheap TCP request to a dedicated “scout” Netty port, which replies near-instantly. If it comes back with a 1-packet response within 1 millisecond, send the real request; otherwise fail over immediately to the next host in the failover set.

    There’s still a potential race condition where the “GC scout” succeeds quickly and then a GC starts just before the “real” request is issued, but the incidence of a GC blocking a request is probably massively reduced. The scheme also helps against packet loss on the rack or server host: losing any one of the scout’s TCP packets means waiting for a TCP retransmit timeout, which will certainly be longer than 1ms, so the deadline is missed. (UDP would probably work just as well, for this reason.) However, in the case of packet loss in the client’s network vicinity, it will be vital to still attempt the request against the final host in the failover set regardless of a GC-scout failure, otherwise all requests may be skipped. The GC-scout system also helps shift request load off heavily-loaded hosts, or hosts performing poorly for other reasons: they’ll miss their 1 msec deadline and the request will be shunted elsewhere. For service APIs with really low latency requirements, this is a great idea.
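
    A rough Python sketch of the scout-then-send pattern described above (not LinkedIn’s Netty implementation; the port number, probe payload and helper names are invented):

        import socket
        import time

        SCOUT_PORT = 7777        # hypothetical dedicated "scout" port
        SCOUT_DEADLINE = 0.001   # 1 millisecond

        def scout_ok(host):
            """True if the host's scout port answers within the deadline.
            (A real implementation would keep a persistent connection open;
            a fresh TCP connect per request can blow the 1 ms budget alone.)"""
            try:
                start = time.monotonic()
                with socket.create_connection((host, SCOUT_PORT),
                                              timeout=SCOUT_DEADLINE) as s:
                    s.settimeout(SCOUT_DEADLINE)
                    s.sendall(b"?")
                    s.recv(1)    # expect a near-instant one-packet reply
                return (time.monotonic() - start) <= SCOUT_DEADLINE
            except OSError:
                return False

        def call_with_failover(hosts, send_real_request):
            """Skip hosts that miss the scout deadline, but always try the
            final host in the failover set so that packet loss near the
            client can't cause every host to be skipped."""
            for i, host in enumerate(hosts):
                if scout_ok(host) or i == len(hosts) - 1:
                    return send_real_request(host)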

    (tags: gc-scout gc java scaling scalability linkedin qcon async threadpools rest slas timeouts networking distcomp netty tcp udp failover fault-tolerance packet-loss)

  • Why I won’t give the European Parliament the data protection analysis it wanted

    Holy crap. Simon Davies rips into the EU data-protection reform disaster with gusto:

    The situation was an utter disgrace. The advertising industry even gave an award to an Irish Minister for destroying some of the rights in the regulation while the UK managed to force a provision that would make the direct marketing industry a “legitimate” processing operation in its own right, putting it on the same level of lawful processing as fraud prevention. Things got to the point where even the most senior data protection officials in Europe stopped trying to influence events and had told me “let the chips fall as they may”. […] But let’s take a step back for a moment from this travesty. Out on the streets – while most may not know what data protection is – people certainly know what it is supposed to protect. People value their privacy and they will be vocal about attempts to destroy it. I had said as much to the joint parliamentary meeting, observing “the one element that has been left out of all these efforts is the public”. However, as the months rolled on, the only message being sent to the public was that data protection is an anachronism stitched together with self interest and impracticality. […] I wasn’t aware at the time that there was a vast stitch-up to kill the reforms. I cannot bring myself to present a temperate report with measured wording that pretends this is all just normal business. It isn’t normal business, and it should never be normal business in any civilized society. How does one talk in measured tones about such endemic hypocrisy and deception? If you want to know who the real enemy of privacy is, don’t just look to the American agencies. The real enemy is right here in the European Parliament in the guise of MEPs who have knowingly sold our rights away to maintain powerful relationships. I’d like to say they were merely hoodwinked into supporting the vandalism, but many are smart people who knew exactly what they were doing.
    Nice work, Irish presidency! His bottom line:
    Is there a way forward? I believe so. First, governments should yield to common decency and scrap the illegitimate and poisoned Irish Council draft and hand the task to the Lithuanian Presidency that commences next month. Second, the Irish and British governments should be infinitely more transparent about their cooperation with intrusive interests that fuelled the deception.

    (tags: ireland eu europe reform law data-protection privacy simon-davies meps iab)

  • Persuading David Simon (Pinboard Blog)

    Maciej Ceglowski with a strongly-argued rebuttal of David Simon’s post about the NSA’s PRISM. This point in particular is key:

    The point is, you don’t need human investigators to find leads, you can have the algorithms do it [based on the call graph or network of who-calls-who]. They will find people of interest, assemble the watch lists, and flag whomever you like for further tracking. And since the number of actual terrorists is very, very, very small, the output of these algorithms will consist overwhelmingly of false positives.
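
    Back-of-the-envelope arithmetic showing why; every figure below is an assumption for illustration, not a number from Maciej’s post:

        population  = 300_000_000   # assumed surveilled population
        terrorists  = 1_000         # assumed number of genuine targets
        sensitivity = 0.99          # assumed P(flagged | genuine target)
        specificity = 0.999         # assumed P(not flagged | innocent)

        true_pos  = terrorists * sensitivity
        false_pos = (population - terrorists) * (1 - specificity)
        precision = true_pos / (true_pos + false_pos)

        print(f"{true_pos + false_pos:,.0f} flagged; {precision:.2%} genuine")
        # roughly 301,000 people flagged, of whom only ~0.33% are genuine targets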

    (tags: false-positives maciej privacy security nsa prism david-simon accuracy big-data filtering anti-spam)

  • Schneier on Security: Blowback from the NSA Surveillance

    Unintended consequences for US-focused governance of the internet and cloud computing:

    Writing about the new Internet nationalism, I talked about the ITU meeting in Dubai last fall, and the attempt of some countries to wrest control of the Internet from the US. That movement just got a huge PR boost. Now, when countries like Russia and Iran say the US is simply too untrustworthy to manage the Internet, no one will be able to argue. We can’t fight for Internet freedom around the world, then turn around and destroy it back home. Even if we don’t see the contradiction, the rest of the world does.

    (tags: internet freedom cloud-computing amazon google hosting usa us-politics prism nsa surveillance)
