Stephanie Dean on Amazon’s approach to CMs (change management). This is solid-gold advice for any company planning to institute a sensible technical change-management process.
I asked around my ex-Amazon mates on Twitter about good docs on incident response practices outside the “iron curtain”, and they pointed me at this blog (which I didn’t realise existed). Stephanie Dean was the front-line ops manager for Amazon for many years, during the period when they basically *fixed* their availability problems. She has since moved on to Facebook, Demonware, and Twitter. She really knows her stuff, and this blog is FULL of great details of how they ran (and still run) front-line ops teams in Amazon.
Carlos Baquero presents several operation-based and state-based CRDTs for use in AP systems like Voldemort and Riak.
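To make “state-based” concrete, here is a minimal Python sketch of one classic state-based CRDT, the PN-Counter: replicas ship their whole state around and merge it with a pointwise max. This is a generic textbook construction, not code from Baquero's presentation; the class and method names are my own.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PNCounter:
    """State-based PN-Counter: per-replica tallies of increments and
    decrements; merging takes the per-replica maximum of each tally."""
    replica_id: str
    incs: Dict[str, int] = field(default_factory=dict)
    decs: Dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("use decrement() for negative amounts")
        self.incs[self.replica_id] = self.incs.get(self.replica_id, 0) + n

    def decrement(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("use increment() for negative amounts")
        self.decs[self.replica_id] = self.decs.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other: "PNCounter") -> None:
        # Pointwise max is commutative, associative and idempotent, so replicas
        # converge whatever order (or however often) states are exchanged.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)
```

Usage: replica "a" increments by 3, replica "b" increments by 2 and decrements by 1; after each merges the other's state, both report a value of 4, and merging the same state again changes nothing.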
Applications can saturate, i.e. become unable to serve users in a timely manner. Some users may experience high latencies, while others may not receive any service at all. The authors argue that it is better to downgrade the user experience and continue serving a larger number of clients with reasonable latency. “We define a cloud application as brownout compliant if it can gradually downgrade user experience to avoid saturation.” This is actually very reminiscent of circuit breakers, as described in Nygard’s ‘Release It!’ and popularized by Netflix. If you’re already designing with circuit breakers, you’ve probably got all the pieces you need to add brownout support to your application relatively easily. “Our work borrows from the concept of brownout in electrical grids. Brownouts are an intentional voltage drop often used to prevent blackouts through load reduction in case of emergency. In such a situation, incandescent light bulbs dim, hence originating the term.” “To lower the maintenance effort, brownouts should be automatically triggered. This enables cloud applications to rapidly and robustly avoid saturation due to unexpected environmental changes, lowering the burden on human operators.” In fact, it feels to me like a variation on the Circuit Breaker pattern, driven by measured latencies of operations/requests. See also http://blog.acolyer.org/2014/10/27/improving-cloud-service-resilience-using-brownout-aware-load-balancing/ .
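A rough Python sketch of the automatic-brownout idea: a value between 0 and 1 (I'll call it a dimmer) sets the probability that a request gets the optional, expensive part of the response, and it is nudged down when measured latency exceeds a target and back up when there's headroom. This is not the authors' controller; the class name, the simple proportional update and the `gain` value are all my own assumptions.

```python
import random


class BrownoutDimmer:
    """Hypothetical brownout controller: lowers the fraction of requests that
    get optional content while measured latency is above a target."""

    def __init__(self, target_latency_s: float, gain: float = 0.5) -> None:
        self.target = target_latency_s
        self.gain = gain      # hypothetical tuning parameter
        self.dimmer = 1.0     # 1.0 = full experience, 0.0 = mandatory content only

    def observe(self, latency_s: float) -> float:
        # Relative error: positive when too slow, negative when there is headroom.
        error = (latency_s - self.target) / self.target
        self.dimmer = min(1.0, max(0.0, self.dimmer - self.gain * error))
        return self.dimmer

    def serve_optional(self, rng=random) -> bool:
        # Serve the optional part (e.g. recommendations) with probability `dimmer`.
        return rng.random() < self.dimmer
```

Unlike a circuit breaker's open/closed state, the dimmer degrades gradually: with a 200 ms target, a 400 ms observation halves it to 0.5, a following 100 ms observation raises it to 0.75, and a pathological spike clamps it to 0, at which point only mandatory content is served.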
“Slow-motion Chernobyl”, as Greenpeace are calling it. You thought legacy code was a problem? Try legacy Magnox fuel rods.
Previously unseen pictures of two storage ponds containing hundreds of highly radioactive fuel rods at the Sellafield nuclear plant show cracked concrete, seagulls bathing in the water and weeds growing around derelict machinery. But a spokesman for owners Sellafield Ltd said the 60-year-old ponds will not be cleaned up for decades, despite concern that they are in a dangerous state and could cause a large release of radioactive material if they are allowed to deteriorate further. “The concrete is in dreadful condition, degraded and fractured, and if the ponds drain, the Magnox fuel will ignite and that would lead to a massive release of radioactive material,” nuclear safety expert John Large told the Ecologist magazine. “I am very disturbed at the run-down condition of the structures and support services. In my opinion there is a significant risk that the system could fail.”
An interview with Richard Bartle, who co-created MUD back in 1978.
Perceiving the different ways in which players approached the game led Bartle to consider whether MMO players could be classified according to type. “A group of admins was having an argument about what people wanted out of a MUD in about 1990,” he recalls. “This began a 200-long email chain over a period of six months. Eventually I went through everybody’s answers and categorised them. I discovered there were four types of MMO player. I published some short versions of them then, when the journal of MUD research came out I wrote it up as a paper.” The so-called Bartle test, which classifies MMO players as Achievers, Explorers, Socialisers or Killers (or a mixture thereof) according to their play-style remains in widespread use today. Bartle believes that you need a healthy mix of all dominant types in order to maintain a successful MMO ecosystem. “If you have a game full of Achievers (players for whom advancement through a game is the primary goal) the people who arrive at the bottom level won’t continue to play because everyone is better than them,” he explains. “This removes the bottom tier and, over time, all of the bottom tiers leave through irritation. But if you have Socialisers in the mix they don’t care about levelling up and all of that. So the lowest Achievers can look down on the Socialisers and the Socialisers don’t care. If you’re just making the game for Achievers it will corrode from the bottom. All MMOs have this insulating layer, even if the developers don’t understand why it’s there.”
Redis uses forking to perform persistence flushes, which means that once every 30 minutes it performs like crap (and kills the 99th-percentile latency). Given this, various Redis people have been benchmarking fork() times on various Xen platforms, since Xen has a crappy fork() implementation.
Jay Rosen interviews his 17-year-old daughter. It’s pretty eye-opening. Got to start them early!
Carbon is a great idea, but fundamentally, Twisted doesn’t do what carbon-relay or carbon-aggregator were built to do when hit with sustained and heavy throughput. Much to my chagrin, concurrency isn’t one of Python’s core competencies. +1, sadly. We are patching around the edges with half-released third-party C rewrites in our graphite setup, as we exceed the scale Carbon can support.
MOST of the page view attempts will experience the 99%’ile server response time in modern web applications. You didn’t read that wrong.
The next Pub Standards, on Thursday 13th November, will be the last one. When I started Pub Standards in August 2010, there weren’t very many meetups for people who build apps, interfaces and businesses. These days, there are loads! I don’t feel that Pub Standards is needed anymore. It served its purpose: other meetups were formed, startups were founded, projects were created and people got hired. We had a good run :)
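The arithmetic behind that claim, as a quick Python sketch: if a page view fans out into N requests and each one independently has a 1% chance of landing at or beyond the 99th percentile, the chance that the page sees at least one such slow response is 1 − 0.99^N. Treating requests as independent is a simplifying assumption on my part, and the function names are mine.

```python
import math


def p_page_sees_tail(requests_per_page: int, percentile: float = 0.99) -> float:
    """Probability that at least one of N independent requests is at or
    beyond the given percentile of server response time."""
    return 1.0 - percentile ** requests_per_page


def requests_for_majority(percentile: float = 0.99) -> int:
    """Smallest N for which more than half of page views hit the tail."""
    return math.floor(math.log(0.5) / math.log(percentile)) + 1
```

With these assumptions, 69 requests per page is enough for more than half of all page views to include a p99-or-worse response, and at 100 requests per page it is roughly 63%.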
$50 print (plus shipping of course), 16″ x 16″
Good, thought-provoking post on good client library approaches for complex client-server systems, particularly distributed stores like Voldemort or Riak. Personally, I’m of the opinion that a smart client lib is unavoidable, and in fact essential, since the clients are part of the distributed system.
Dublin had its own time zone, Dublin Mean Time, 25 minutes (and 21 seconds) behind GMT, until 1916.
A Riak-based clone of Roshi, the CRDT server built on top of Redis. Some day I’ll write up the CRDT we use on top of Voldemort in $work. Comments: https://lobste.rs/s/tim5xc
‘Verizon Wireless is monitoring users’ mobile internet traffic, using a token slapped onto web requests, to facilitate targeted advertising even if a user has opted out. The unique identifier token header (UIDH) was launched two years ago, and has caused an uproar in tech circles after it was re-discovered Thursday by Electronic Frontier Foundation staffer Jacob Hoffman-Andrews. The Relevant Mobile Advertising program, under which the UIDH was used, allowed a restaurant to advertise to locals only or for retail websites to promote to previous visitors, according to Verizon Wireless.’
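As I understand it, Roshi is built around an LWW-element-set CRDT, so here's a minimal Python sketch of that data type for illustration. It isn't Roshi's code or storage layout; timestamps are assumed to be supplied by the client, and ties are resolved in favour of adds, which is an arbitrary but deterministic choice of mine.

```python
from typing import Dict, Hashable


class LWWElementSet:
    """Last-writer-wins element set: each element keeps the timestamp of its
    latest add and latest remove, and the later of the two wins."""

    def __init__(self) -> None:
        self.adds: Dict[Hashable, float] = {}
        self.removes: Dict[Hashable, float] = {}

    def add(self, elem: Hashable, ts: float) -> None:
        if ts > self.adds.get(elem, float("-inf")):
            self.adds[elem] = ts

    def remove(self, elem: Hashable, ts: float) -> None:
        if ts > self.removes.get(elem, float("-inf")):
            self.removes[elem] = ts

    def __contains__(self, elem: Hashable) -> bool:
        # Ties go to the add (an arbitrary but deterministic bias).
        return elem in self.adds and self.adds[elem] >= self.removes.get(elem, float("-inf"))

    def merge(self, other: "LWWElementSet") -> None:
        # Keeping the max timestamp per element is order-insensitive and idempotent.
        for elem, ts in other.adds.items():
            self.add(elem, ts)
        for elem, ts in other.removes.items():
            self.remove(elem, ts)
```

The catch, and the reason Roshi-style systems care about clocks, is that “latest” is decided purely by the supplied timestamps: a replica with a fast clock wins every conflict.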
‘In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend “standard” Bloom filters to support deletion all degrade either space or performance. We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters out-perform previous data structures that extend Bloom filters to support deletions substantially in both time and space.’
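Here's a toy Python sketch of the partial-key cuckoo hashing idea behind the cuckoo filter: each item is stored only as a short fingerprint in one of two buckets, and the second bucket is computed from the first bucket index XORed with a hash of the fingerprint, so entries can be relocated without the original item. It uses plain Python lists rather than the packed tables the paper describes, and the parameter defaults (1024 buckets of 4 slots, 12-bit fingerprints, 500 kicks) are my own assumptions.

```python
import hashlib
import random
from typing import List, Tuple


class CuckooFilter:
    """Toy cuckoo filter: fingerprints live in one of two candidate buckets."""

    def __init__(self, num_buckets: int = 1024, bucket_size: int = 4,
                 fp_bits: int = 12, max_kicks: int = 500, seed: int = 0) -> None:
        if num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        if not 1 <= fp_bits <= 32:
            raise ValueError("fp_bits must be between 1 and 32")
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets: List[List[int]] = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _locate(self, item: bytes) -> Tuple[int, int]:
        # Derive the fingerprint and primary bucket from different digest bytes.
        digest = hashlib.sha256(item).digest()
        index = int.from_bytes(digest[:8], "big") & self.mask
        fp = int.from_bytes(digest[8:12], "big") & self.fp_mask
        return fp, index

    def _alt_index(self, index: int, fp: int) -> int:
        # XOR with a hash of the fingerprint: applying it twice gives back the
        # original index, so a displaced entry can always find its other bucket.
        h = int.from_bytes(hashlib.sha256(fp.to_bytes(4, "big")).digest()[:8], "big")
        return (index ^ h) & self.mask

    def insert(self, item: bytes) -> bool:
        fp, i1 = self._locate(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random resident and move it to its other bucket.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Table considered full. NB: the last evicted fingerprint is dropped here;
        # a real implementation would keep it in a victim stash.
        return False

    def __contains__(self, item: bytes) -> bool:
        fp, i1 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, item: bytes) -> bool:
        # Only delete items that were actually inserted, or another item's
        # matching fingerprint may be removed instead.
        fp, i1 = self._locate(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Deletion is the headline feature over a plain Bloom filter: removing one copy of a fingerprint from its bucket pair never causes a false negative for any other inserted item.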
ISDS (investor-state dispute settlement) has _already_ been used to trump national law. As Simon McGarr noted at https://twitter.com/Tupp_Ed/statuses/526103760041680898 : ‘Philip Morris initiated a dispute under the Australia-Hong Kong Bilateral Investment Treaty to force #plainpacks repeal and compensation’. “Plain packs” anti-smoking legislation is being bitterly fought at the moment here in Ireland. More from the US point of view: http://www.washingtonpost.com/opinions/harold-meyerson-allowing-foreign-firms-to-sue-nations-hurts-trade-deals/2014/10/01/4b3725b0-4964-11e4-891d-713f052086a0_story.html : ‘The Obama administration’s insistence on ISDS may please Wall Street, but it threatens to undermine some of the president’s landmark achievements in curbing pollution and fighting global warming, not to mention his commitment to a single standard of justice. It’s not worthy of the president, and he should join Europe in scrapping it.’
We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node, and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them for others. Excellent post. Game days are a great idea. Also: massive Redis clustering fail.
“Everybody hits the wall, generally between three and five months,” says a former YouTube content moderator I’ll call Rob. “You just think, ‘Holy shit, what am I spending my day doing? This is awful.’”
Perhaps simply by virtue of being a part of that bundle, the strings utility tries to leverage the common libbfd infrastructure to detect supported executable formats and “optimize” the process by extracting text only from specific sections of the file. Unfortunately, the underlying library can hardly be described as safe: a quick pass with afl (and probably with any other competent fuzzer) quickly reveals a range of troubling and likely exploitable out-of-bounds crashes due to very limited range checking.
This Java library can route paths to targets and create paths from targets and params (reverse routing). This library is tiny, without additional dependencies, and is intended for use together with an HTTP server side library. If you want to use it with Netty, see netty-router.
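To show what “reverse routing” means, here's a toy Python sketch; it is not this library's API (which is Java), and the `Router` class, its methods and the `:param` pattern syntax are all hypothetical. The same route table matches an incoming path to a target plus params, and builds a path from a target plus params.

```python
from typing import Dict, List, Optional, Tuple


class Router:
    """Toy router: patterns like '/articles/:id' map to a target name."""

    def __init__(self) -> None:
        self.routes: List[Tuple[List[str], str]] = []

    def add(self, pattern: str, target: str) -> "Router":
        self.routes.append((pattern.strip("/").split("/"), target))
        return self

    def route(self, path: str) -> Optional[Tuple[str, Dict[str, str]]]:
        """Forward routing: path -> (target, params), or None if nothing matches."""
        parts = path.strip("/").split("/")
        for tokens, target in self.routes:
            if len(tokens) != len(parts):
                continue
            params: Dict[str, str] = {}
            for tok, part in zip(tokens, parts):
                if tok.startswith(":"):
                    params[tok[1:]] = part      # capture a path parameter
                elif tok != part:
                    break                       # literal segment mismatch
            else:
                return target, params
        return None

    def path(self, target: str, **params: str) -> Optional[str]:
        """Reverse routing: (target, params) -> path, using the same table."""
        for tokens, t in self.routes:
            names = {tok[1:] for tok in tokens if tok.startswith(":")}
            if t == target and names == set(params):
                return "/" + "/".join(
                    str(params[tok[1:]]) if tok.startswith(":") else tok for tok in tokens)
        return None
```

Keeping both directions in one table is the point: links generated with `path()` can't drift out of sync with what `route()` accepts.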
classic replication paper, via aphyr: ‘This paper presents an updated version of Viewstamped Replication, a replication technique that handles failures in which nodes crash. It describes how client requests are handled, how the group reorganizes when a replica fails, and how a failed replica is able to rejoin the group. The paper also describes a number of important optimizations and presents a protocol for handling reconfigurations that can change both the group membership and the number of failures the group is able to handle.’
Holy shit we are living in the future.
BioBrick parts are DNA sequences which conform to a restriction-enzyme assembly standard. These Lego-like building blocks are used to design and assemble synthetic biological circuits, which would then be incorporated into living cells such as Escherichia coli cells to construct new biological systems. Examples of BioBrick parts include promoters, ribosomal binding sites (RBS), coding sequences and terminators. (via Soren)
I have to agree with this assessment — there are a lot of loose ends still for production use of Docker in a SOA stack environment:
From my point of view, Docker is probably the best thing I’ve seen in ages to automate a build. It allows you to pre-build and reuse shared dependencies, ensuring they’re up to date and reducing your build time. It saves you from either polluting your Jenkins environment or booting a costly and slow VirtualBox virtual machine using Vagrant. But I don’t feel like it’s production ready in a complex environment, because it adds too much complexity. And I’m not even sure that’s what it was designed for.
This is a very solid benchmarking post, examining Kafka in good detail. Nicely done. Bottom line:
I basically spend 2/3 of my work time torture testing and operationalizing distributed systems in production. There’s some that I’m not so pleased with (posts pending in draft forever) and some that have attributes that I really love. Kafka is one of those systems that I pretty much enjoy every bit of, and the fact that it performs predictably well is only a symptom of the reason and not the reason itself: the authors really know what they’re doing. Nothing about this software is an accident. Performance, everything in this post, is only a fraction of what’s important to me and what matters when you run these systems for real. Kafka represents everything I think good distributed systems are about: that thorough and explicit design decisions win.
wow, an actually quite-good cheapo Android tablet from Tesco for UKP65 of Clubcard vouchers, recommended by conoro. Good for the kids.
Major improvements for Kafka consistency coming in 0.8.2: replication to multiple in-sync replicas, controlled by a new “min.isr” setting.
I have repeatedly been confounded to discover just how many mistakes in both test and application code stem from misunderstandings or misconceptions about time. By this I mean both the interesting way in which computers handle time, and the fundamental gotchas inherent in how we humans have constructed our calendar, daylight savings being just the tip of the iceberg. In fact I have seen so many of these misconceptions crop up in other people’s (and my own) programs that I thought it would be worthwhile to collect a list of the more common problems here. See also the follow-up: http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time-wisdom (via Marc)
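Two of the classic calendar falsehoods, “a year is 365 days” and “every date exists next year”, in a small Python sketch using only the standard library. The clamp-to-28-February policy is just one choice (some systems roll over to 1 March instead), and the function names are mine.

```python
from datetime import date, timedelta


def naive_add_year(d: date) -> date:
    # Assumes every date exists in every year: raises ValueError for 29 February.
    return d.replace(year=d.year + 1)


def add_year(d: date) -> date:
    """Same month and day next year, clamping 29 Feb to 28 Feb (one policy of several)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


# "A year is 365 days" also fails in a leap year:
# date(2024, 1, 1) + timedelta(days=365) is 31 Dec 2024, not 1 Jan 2025.
```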
What an utter fuckup. Business as usual for Irish Water:
However the spokeswoman said application packs for rented dwellings would be addressed to the landlord, at the landlord’s residence, and it would be the landlord’s responsibility to ensure the tenant received the application pack. Bills are to be issued quarterly, but as Irish Water will have the tenant’s PPS number, the utility firm will be able to pursue the tenant for any arrears and even apply any arrears to new accounts, when the tenant moves to a new address. Last week landlords had expressed concern over potential arrears, the liability for them and the possibility of being used as collection agents by Irish Water.
ugh, what a mess….
* Every rental unit in the State is to get a pack addressed personally to the occupant. If Irish Water does not have details of a tenant, the pack will be addressed to ‘The Occupier’
* Packs will only be issued to individual rental properties in so far as Irish Water is aware of them
* Landlords can contact Irish Water to advise they have let a property
* Application Packs are issued relative to the information on the Irish Water mailing list. If this is incorrect or out of date, landlords can contact Irish Water to have the information adjusted
* Irish Water will contact known landlords after the initial customer application campaign, to advise of properties for which no application has been received
* Irish Water said that when a household is occupied the tenant is liable and when vacant the owner is liable. Both should advise Irish Water of change of status to the property – the tenant to cease liability, the landlord to take it up. Either party may take a reading and provide it to Irish Water, alternatively Irish Water will bill on average consumption, based on the date of change.
Like, say, the Christian right, which came together through the social media of its day — little-watched television broadcasts, church bulletins, newsletters—or the Tea Party, which found its way through self-selection on social media and through back channels, Gamergate, in the main, comprises an assortment of agitators who sense which way the winds are blowing and feel left out. It has found a mobilizing event, elicited response from the established press, and run a successful enough public relations campaign that it’s begun attracting visible advocates who agree with the broad talking points and respectful-enough coverage from the mainstream press. If there is a ground war being waged, as the movement’s increasingly militaristic rhetoric suggests, Gamergate is fighting largely unopposed. A more important resemblance to the Tea Party, though, is in the way in which it’s focused the anger of people who realize the world is changing, and not necessarily to their benefit.
There are several reasons that the ID cards have proved so easy to steal: Identity numbers started to be issued in the 1960s and still follow the same pattern. The first few digits are the user’s birth date, followed by either a one for male or two for female; Their usage across different sectors makes them master keys for hackers, say experts; If details are leaked, citizens are unable to change them. (via Tony Finch)
looks great, around the corner from Cineworld on King’s Inn St, D1
with Neil McKenzie, Nov 9-16 2014, in the Natural History Museum in Dublin: ‘These six helmets/viewing devices start off by exploring physical conditions of viewing: if we have two eyes, then why is our vision so limited? Why do we have so little perception of depth? Why don’t our two eyes offer us two different, complementary views of the world around us? Why can’t they extend from our body so we can see over or around things? Why don’t they allow us to look behind and in front at the same time, or sideways in both directions? Why can’t our two eyes simultaneously focus on two different tasks? Looking through Michael Land’s defining work Animal Eyes, we see that nature has indeed explored all of these possibilities: a Hammerhead Shark has hyper-stereo vision; a horse sees 350° around itself; a chameleon has separately rotatable eyes… The series of Meta-Perceptual Helmets do indeed explore these zoological typologies: proposing to humans the hyper-stereo vision of the hammerhead shark; or the wide peripheral vision of the horse; or the backward/forward vision of the chameleon… but they also take us into the unnatural world of mythology and literature: the Cheshire Cat Helmet is so called because of the strange lingering effect of dominating visual information such as a smile or the eyes; the Cyclops allows one large central eye to take in the world around while a second tiny hidden eye focuses on a close up task (why has the creature never evolved that can focus on denitting without constantly having to glance around?).’ (via Emma)
The figures show that, between 2004 and 2013, an average of 71.7 per cent of students at TCD graduated with either a 1st or a 2.1. DCU and UCC had the next highest rate of such awards (64.3 per cent and 64.2 per cent respectively), followed by UCD (55.8 per cent), NUI Galway (54.7 per cent), Maynooth University (53.7 per cent) and University of Limerick (50.2 per cent).
Last year we interviewed Oleg Moskalenko and presented the rfc5766-turn-server project, which is a free, open-source and extremely popular implementation of a TURN and STUN server. A few months later we even discovered Amazon is using this project to power its Mayday service. Since then, a number of features beyond the original RFC 5766 have been defined at the IETF and a new open-source project was born: the coTURN project.
Today we are publishing details of a vulnerability in the design of SSL version 3.0. This vulnerability allows the plaintext of secure connections to be calculated by a network attacker. Ouch.
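This is the POODLE attack, and the standard mitigation is to stop speaking SSLv3 at all. As a sketch of what that looks like in client code today, using Python's standard `ssl` module (the TLS 1.2 floor is my own choice; servers such as Apache or nginx need the equivalent protocol change in their own config):

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client-side TLS context that refuses SSLv3 (and TLS 1.0/1.1)."""
    ctx = ssl.create_default_context()  # certificate and hostname checks enabled
    # A protocol floor means a downgrade can never land on SSLv3.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

Pass the returned context to `ssl.SSLContext.wrap_socket` or any library that accepts an `SSLContext`.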
It’s been a while since I wrote a long-form blog post here, but this post on the Swrve Engineering blog is worth a read; it describes how we use SSD caching on our EC2 instances to greatly improve EBS throughput.
“O Cormac, grandson of Conn”, said Carbery, “What is the worst pleading and arguing?” “Not hard to tell”, said Cormac. “Contending against knowledge, contending without proofs, taking refuge in bad language, a stiff delivery, a muttering speech, hair-splitting, uncertain proofs, despising books, turning against custom, shifting one’s pleading, inciting the mob, blowing one’s own trumpet, shouting at the top of one’s voice.”
a simple, lightweight HTTP server for storing and distributing custom Debian packages around your organisation. It is designed to make it as easy as possible to use Debian packages for code deployments and to ease other system administration tasks.
ZDNet’s Steven J. Vaughan-Nichols on the systemd mess (via Kragen)
Criminal complaints have been filed in the UK against Gamma for “acting as an accessory to Bahrain’s illegal targeting of activists” using the FinFisher spyware.
Meritocracy is a myth. And our belief in it is holding back the tech industry from getting better.
“It’s completely insane. It’s insane that you even have to say out loud that sending death threats to people who disagree with your opinion of video games is wrong. Yet here we are: Apparently, it needs to be said.”
#Gamergate, as they have treated myself and peers in our industry, is a hate group. This word, again, should not lend them any mystique or credence. Rather it should illuminate the fact that even the most nebulous and inconsistent ideas can proliferate wildly if strung onto the organizational framework of the hate group, which additionally gains a startling amount of power online. #Gamergate is a hate group, and they are all the more dismissible for it. And the longer we treat them otherwise, the longer I fear for our industry’s growth.
A group representing frontline emergency staff has warned lives will be lost unless the Government reverses its decision on a new national postcode system due to be rolled out next spring. John Kidd, chairman of the Irish Fire and Emergency Services Association, said the “mainly random nature” of the Eircode system would mean errors by users would go unnoticed, as well as cause confusion and may be “catastrophic” in terms of sending services to the wrong location. [....] Neil McDonnell, general manager of the Freight Transport Association Ireland, said he understood Mr Kidd’s concerns. “Take, for example, two adjacent houses in Glasnevin, Dublin,” said Mr McDonnell. “One could be D11 ZXQ8, the other one D11 67TR. The four-character unique identifier is completely random, with no sequence or algorithm linking one house to the other.”
Two types of people own homes in Vancouver: wealthy foreigners who are looking for a place to park their money, and long-time Vancouver residents who have benefited from skyrocketing equity, through no actual effort of their own. There is a simple problem with these people being the primary homeowners in any city: they don’t actually create much value for the place they live in. A very large percentage of wealthy foreigners who “park” their money here don’t actually live in Vancouver. Take a drive around most expensive areas and you’ll realize the homes are empty. At most, they send their kids to live in Vancouver, learn English/go to school, and then return to their country (usually to Hong Kong). For some reason this is okay with people who live here. The amount of value added to a city from this sort of activity approaches zero. In fact, I’d argue that these people actually leech off of the system more than anything else.
Welp, that’s the end of my reading The Escapist. this is fucked up. ‘these people say that this is a hate movement, but let’s see what these white supremacists and serial harassers have to say’
hmason: TIL that the phrase software “patch” is from a physical patch applied to Mark 1 paper tape to modify the program. It’s amazing how a term like that can become so divorced from its original meaning so effectively. History!
I analyzed several chunks of The Ultimate Player’s Guide using the Flesch-Kincaid Reading Ease scale, and they scored from grade 8 to grade 11. Yet in my neighborhood they’re being devoured by kids in the early phases of elementary school. Games, it seems, can motivate kids to read, and to read way above their level. This is what Constance Steinkuehler, a games researcher at the University of Wisconsin-Madison, discovered. She asked middle and high school students who were struggling readers (one 11th-grade student read at a 6th-grade level) to choose a game topic they were interested in, and then she picked texts from game sites for them to read, some as difficult as first-year-college language. The kids devoured them with no help and nearly perfect accuracy. How could they do this? “Because they’re really, really motivated,” Steinkuehler tells me. It wasn’t just that the students knew the domain well; there were plenty of unfamiliar words. But they persisted more because they cared about the task. “It’s situated knowledge. They see a piece of language, a turn of phrase, and they figure it out.” When my kids are playing Minecraft, there’s a constant stream of “how do you spell X?” as they craft nametags for their pets. It’s great!
Niall Heery belatedly follows up Small Engine Repair, his 2006 mumblecore critical hit, with a slightly less off-centre comedy that makes imaginative use of a smashing cast. The story skirts tragedy on its leisurely passage from mishap to misadventure, but Gold remains the sort of picture you want to hug indulgently to a welcoming bosom. It gives humanism a good name. Go Niall! It’s a great movie, go see it.
Web service API for Dublin Bikes data (and other similar bike-sharing services run by JCDecaux):
Two kinds of data are delivered by the platform: Static data provides stable information like station position, number of bike stands, payment terminal availability, etc. Dynamic data provides station state, number of available bikes, number of free bike stands, etc. Static data can be downloaded manually in file format or accessed through the API. Dynamic data are refreshed every minute and can be accessed only through the API. Ruby API: https://github.com/oisin/bikes
my coworker JK’s favourite games of 2013: Gone Home, The Last of Us, Proteus, Papers, Please, etc. I really want to play these, since they’re all totally my bag too.
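A small Python sketch of consuming the dynamic station data. I haven't checked this against the current docs: the field names (`name`, `available_bikes`, `available_bike_stands`) are as I recall them from the JCDecaux JSON, and the sample payload is made up, so treat both as assumptions. Since the dynamic data only refreshes once a minute, there's no point polling faster than that.

```python
import json
from typing import List, NamedTuple


class StationState(NamedTuple):
    name: str
    available_bikes: int
    free_stands: int


def parse_stations(payload: str) -> List[StationState]:
    # Field names are assumed, not verified against the API docs.
    return [
        StationState(s["name"], int(s["available_bikes"]), int(s["available_bike_stands"]))
        for s in json.loads(payload)
    ]


def stations_with_bikes(stations: List[StationState], minimum: int = 1) -> List[str]:
    return [s.name for s in stations if s.available_bikes >= minimum]


# Made-up sample in the assumed shape, standing in for a real API response.
SAMPLE = json.dumps([
    {"name": "EXAMPLE STATION A", "available_bikes": 0, "available_bike_stands": 20},
    {"name": "EXAMPLE STATION B", "available_bikes": 7, "available_bike_stands": 13},
])
```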
Amazon has perhaps 1% of the US retail market by value. Should it stop entering new categories and markets and instead take profit, and by extension leave those segments and markets for other companies? Or should it keep investing to sweep them into the platform? Jeff Bezos’s view is pretty clear: keep investing, because to take profit out of the business would be to waste the opportunity. He seems very happy to keep seizing new opportunities, creating new businesses, and using every last penny to do it.
Massive improvement over plain old Hadoop. This blog post goes into really solid techie reasons why, including:
First and foremost, in Spark 1.1 we introduced a new shuffle implementation called sort-based shuffle (SPARK-2045). The previous Spark shuffle implementation was hash-based and required maintaining P (the number of reduce partitions) concurrent buffers in memory. In sort-based shuffle, at any given point only a single buffer is required. This has led to substantial memory overhead reduction during shuffle and can support workloads with hundreds of thousands of tasks in a single stage (our PB sort used 250,000 tasks). Also: use of Timsort, an external shuffle service to offload from the JVM, Netty, and EC2 SR-IOV.
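A toy, in-memory Python sketch of the sort-based idea; this is my own simplification, not Spark's code (Spark sorts within a bounded buffer and spills to disk), and the function names are made up. Each map task emits one run sorted by reduce-partition id plus an offset index, instead of keeping P separate buffers open:

```python
from typing import Callable, Hashable, Iterable, List, Tuple

Record = Tuple[Hashable, object]


def sort_shuffle_write(records: Iterable[Record], num_partitions: int,
                       partitioner: Callable[[Hashable], int]) -> Tuple[List[Record], List[int]]:
    """Map side: ONE output run sorted by reduce-partition id, plus an index
    so that partition p is data[index[p]:index[p + 1]]."""
    tagged = [(partitioner(k), k, v) for k, v in records]
    # list.sort() is stable (and happens to be Timsort), so records keep their
    # original relative order within each partition.
    tagged.sort(key=lambda t: t[0])
    data = [(k, v) for _, k, v in tagged]
    index = [0] * (num_partitions + 1)
    for p, _, _ in tagged:
        if not 0 <= p < num_partitions:
            raise ValueError(f"partition {p} out of range")
        index[p + 1] += 1
    for p in range(num_partitions):
        index[p + 1] += index[p]  # turn per-partition counts into offsets
    return data, index


def read_partition(data: List[Record], index: List[int], p: int) -> List[Record]:
    """Reduce side: fetch only the contiguous slice for partition p."""
    return data[index[p]:index[p + 1]]
```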
During the 1970s, when Northern Ireland was gripped by near-civil-war, British military intelligence staged the evidence of “black masses” in order to create a Satanism panic among the “superstitious” Irish to discredit the paramilitaries. The secret history of imaginary Irish Satanism is documented in Black Magic and Bogeymen: Fear, Rumour and Popular Belief in the North of Ireland 1972-74, a new book from Sheffield University’s Richard Jenkins, who interviewed Captain Colin Wallace, the former head of British Army “black operations” for Northern Ireland.
Interesting: I hadn’t heard of this being an official practice anywhere before (although we actually did it ourselves this week)…
If a build has made it [past the 'integration test' phase], it is ready to be deployed to one or more internal environments for user-acceptance testing. Users could be UI developers implementing a new feature using the API, UI Testers performing end-to-end testing or automated UI regression tests. As far as possible, we strive to not have user-acceptance tests be a gating factor for our deployments. We do this by wrapping functionality in Feature Flags so that it is turned off in Production while testing is happening in other environments.
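A minimal Python sketch of the feature-flag approach described above: the same build goes to every environment, and a per-environment flag table decides whether the new code path runs. The `FeatureFlags` class, the flag name and the environment names are hypothetical; real setups usually read flags from a config service rather than a dict.

```python
from typing import Dict, Mapping


class FeatureFlags:
    """Hypothetical per-environment flag lookup: unknown flags are off."""

    def __init__(self, environment: str, config: Mapping[str, Mapping[str, bool]]) -> None:
        self.environment = environment
        self.config = config

    def enabled(self, flag: str) -> bool:
        return bool(self.config.get(self.environment, {}).get(flag, False))


FLAG_CONFIG: Dict[str, Dict[str, bool]] = {
    "uat": {"new_checkout": True},  # testers exercise the new code path
    "production": {},               # same build, feature stays dark
}


def checkout_page(flags: FeatureFlags) -> str:
    # The new feature ships in every build but only runs where the flag is on.
    return "new checkout" if flags.enabled("new_checkout") else "old checkout"
```

Defaulting unknown flags to off is the safety property that lets the build go to Production before user-acceptance testing has finished.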
Felix says: ‘Like I said, I’d like to move it to a more general / non-personal repo in the future, but haven’t had the time yet. Anyway, you can still browse the code there for now. It is not a big code base so not that hard to wrap one’s mind around it. It is Apache licensed and both Kafka and Voldemort are using it so I would say it is pretty self-contained (although Kafka has not moved to Tehuti proper, it is essentially the same code they’re using, minus a few small fixes missing that we added). Tehuti is a bit lower level than CodaHale (i.e.: you need to choose exactly which stats you want to measure and the boundaries of your histograms), but this is the type of stuff you would build a wrapper for and then re-use within your code base. For example: the Voldemort RequestCounter class.’
Great presentation about Github dev culture and building software without breakage, but still with real progress.
Syncthing is becoming Ind.ie Pulse. Pulse replaces proprietary sync and cloud services with something open, trustworthy and decentralised. Your data is your data alone and you deserve to choose where it is stored, if it is shared with some third party, and how it’s transmitted over the Internet.
This is a harrowing post from Kathy Sierra, full of valid observations:
You’re probably more likely to win the lottery than to get any law enforcement agency in the United States to take action when you are harassed online, no matter how visciously and explicitly. Local agencies lack the resources, federal agencies won’t bother.That to the power of ten in Ireland, too, I’d suspect. Fuck this. Troll culture is way out of control….
An embryonic metrics library for Java/Scala from Felix GV at LinkedIn, extracted from Kafka’s metric implementation and in the new Voldemort release. It fixes the major known problems with the Meter/Timer implementations in Coda-Hale/Dropwizard/Yammer Metrics. ‘Regarding Tehuti: it has been extracted from Kafka’s metric implementation. The code was originally written by Jay Kreps, and then maintained improved by some Kafka and Voldemort devs, so it definitely is not the work of just one person. It is in my repo at the moment but I’d like to put it in a more generally available (git and maven) repo in the future. I just haven’t had the time yet… As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn’t like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics (a lack of incoming data points makes old measurements linger longer than they should, and there’s also a fairly high possiblity of losing interesting outlier data points). This makes the exp decaying implementation robust in high throughput fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason with and with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.’ More background at the kafka-dev thread: http://mail-archives.apache.org/mod_mbox/kafka-dev/201402.mbox/%3C131A7649-ED57-45CB-B4D6-F34063267664@linkedin.com%3E
‘Chiranjeeb Buragohain and Subhash Suri: “Quantiles on Streams” in Encyclopedia of Database Systems, Springer, pp 2235–2240, 2009. ISBN: 978-0-387-35544-3’, cited by Martin Kleppmann in http://mail-archives.apache.org/mod_mbox/kafka-dev/201402.mbox/%3C131A7649-ED57-45CB-B4D6-F34063267664@linkedin.com%3E as a good, short literature survey re estimating percentiles with a small memory footprint.
Many Belkin routers attempt to determine if they’re connected to the internet by pinging ‘heartbeat.belkin.com’, in a classic amateur fail move. Good reason not to run Belkin firmware if that’s the level of code quality to expect.
An _extremely_ detailed resource about the bash bug
Brilliant. Nice use of an anime avatar, to boot…. ‘Consulting for men who have better things to do than educate themselves about feminism. Got a question for a feminist? I would be happy to educate you! Below are my rates.’
by Rami Rosen — extremely detailed presentation into the state of Linux containers, LXC, Docker, namespaces, cgroups, and checkpoint/restore in userspace (via lusis)
Reddit forces all remote workers (about half the workforce, in SLC and NYC) to move to SF, provoking a shitstorm:
In a tweet confirming the move, Reddit’s CEO justified his treatment of non-San Francisco workers with a push for Optimal Teamwork to drive the New And $50M Improved Reddit forward. I shit you not. That was the actual term! (I added the New & Improved fan fiction here). So let’s leave aside the debate over whether working remotely is as efficient as being in the same office all the time. Let’s just focus on the size of the middle finger given to the people who work at Reddit outside the Bay Area, given the choice of forced, express relocation or a pink slip. How optimal do you think these employees will feel about leadership and the rest of the team going forward? Do you think they’ll just show up at the new, apparently-not-even-in-San-Francisco-proper office with a smile from ear to ear, ready to begin in earnest on Optimal Teamwork, left-behind former colleagues be damned?
‘I designed this jacket as a tribute to the continuing legacy of American spaceflight. I wanted it to embody everything I loved about the space program, and to eventually serve as an actual flight jacket for present-day astronauts on missions to the ISS (International Space Station). There are other “replica” flight jackets made for space enthusiasts, but I decided to come up with something boldly different, yet also completely wearable and well-suited for space.’
Kevin Marks has a pretty good point here:
Your tweet could win the fame lottery, and everyone on the Internet who thinks you are wrong could tell you about it. Or one of the “verified” could call you out to be the tribute for your community and fight in their Hunger Games. Say something about feminism, or race, or sea lions and you’d find yourself inundated by the same trite responses from multitudes. Complain about it, and they turn nasty, abusing you, calling in their friends to join in. Your phone becomes useless under the weight of notifications; you can’t see your friends’ support amongst the flood. The limited tools available – blocking, muting, going private – do not match well with these floods. Twitter’s abuse reporting form takes far longer than a tweet, and is explicitly ignored if friends try to help.
A common “trick” is to claim: ‘We assume network partitions can’t happen. Therefore, our system is CA according to the CAP theorem.’ This is a nice little twist. By asserting network partitions cannot happen, you just made your system into one which is not distributed. Hence the CAP theorem doesn’t even apply to your case and anything can happen. Your system may be linearizable. Your system might have good availability. But the CAP theorem doesn’t apply. [...] In fact, any well-behaved system will be “CA” as long as there are no partitions. This makes the statement of a system being “CA” very weak, because it doesn’t put honesty first. It tries to avoid the hard question, which is how the system operates under failure. By assuming no network partitions, you assume perfect information knowledge in a distributed system. This isn’t the physical reality.
Spark Streaming has been getting some attention lately as a real-time data processing tool, often mentioned alongside Apache Storm. [...] I added an example Spark Streaming application to kafka-storm-starter that demonstrates how to read from Kafka and write to Kafka, using Avro as the data format and Twitter Bijection for handling the data serialization. In this post I will explain this Spark Streaming example in further detail and also shed some light on the current state of Kafka integration in Spark Streaming. All this with the disclaimer that this happens to be my first experiment with Spark Streaming.
‘a system for allowing servers with encrypted root file systems to reboot unattended and/or remotely.’ (via Tony Finch)
‘a set of command line tools for managing Route53 DNS for an AWS infrastructure. It intelligently uses tags and other metadata to automatically create the associated DNS records.’
‘The work, Inspeqtor which is hosted at GitHub, is far from a “clean-room” implementation. This is basically a rewrite of Monit in Go, even using the same configuration language that is used in Monit, verbatim. a. [private] himself admits that Inspeqtor is “heavily influenced“ by Monit https://github.com/mperham/inspeqtor/wiki/Other-Solutions. b. This tweet by [private] demonstrates intent. https://twitter.com/mperham/status/452160352940064768 “OSS nerds: redesign and build monit in Go. Sell it commercially. Make $$$$. I will be your first customer.”’ IANAL, but using the same config language does not demonstrate copyright infringement…
So what is #gamergate? #gamergate is a mob with torches aloft, hunting for any combustible dwelling and calling it a monster’s lair. #gamergate is a rage train, and everyone with an axe to grind wants a ride. Its fuel is a sour mash of entitlement, insecurity, arrogance and alienation. #gamergate is a vindication quest for political intolerance. #gamergate is revenge for every imagined slight. #gamergate is Viz’s Meddlesome Ratbag.
An interview with the scientist who was part of the team which discovered the Ebola virus in 1976:
Other samples from the nun, who had since died, arrived from Kinshasa. When we were just about able to begin examining the virus under an electron microscope, the World Health Organisation instructed us to send all of our samples to a high-security lab in England. But my boss at the time wanted to bring our work to conclusion no matter what. He grabbed a vial containing virus material to examine it, but his hand was shaking and he dropped it on a colleague’s foot. The vial shattered. My only thought was: “Oh, shit!” We immediately disinfected everything, and luckily our colleague was wearing thick leather shoes. Nothing happened to any of us.
This is the downside of publicly-funded labs selling patent-licensing rights to private companies:
Given the urgency, it’s inexplicable that one of the candidate vaccines, developed at the Public Health Agency of Canada (PHAC) in Winnipeg, has yet to go in the first volunteer’s arm, says virologist Heinz Feldmann, who helped develop the vaccine while at PHAC. “It’s a farce; these doses are lying around there while people are dying in Africa,” says Feldmann, who now works at the Rocky Mountain Laboratories of the U.S. National Institute of Allergy and Infectious Diseases (NIAID) in Hamilton, Montana. At the center of the controversy is NewLink Genetics, a small company in Ames, Iowa, that bought a license to the vaccine’s commercialization from the Canadian government in 2010, and is now suddenly caught up in what WHO calls “the most severe acute public health emergency seen in modern times.” Becker and others say the company has been dragging its feet the past 2 months because it is worried about losing control over the development of the vaccine.
“A command-line power tool for Twitter.” It really is — much better timeline searchability than the “real” Twitter UI, for example
We’ve had almost 40 years to develop, test and stockpile an Ebola vaccine. That has not happened because big pharma has been entirely focused on shareholder value and profits over safety and survival from a deadly virus. For the better part of Ebola’s 38 years, big pharma has been asleep. The question ahead is what virus or superbug will wake them up?
a “firehose of emails that are just going out at 2:45 in the morning” and “if you forwarded something to one of your people at 1 o’clock in the morning and they didn’t reply promptly, you got a little annoyed at them.” Fuck. That.
“Snopes for Twitter”. great idea
FB are using a Blu-Ray robot library
That the company’s consistent, nearly frozen posture of disingenuous smirking means that the most perceptible “Uber problem” is almost always how it frames things, rather than how it actually operates, whether it’s systematically sabotaging competitors or using its quarter-billion-dollar war chest to relentlessly cut fares and driver pay to unsustainable levels in order to undercut existing transit systems, is remarkable in its way, though. If your company’s trying to conquer the world, in the end, being a dick might be the best PR strategy of all.
The sql! macro will validate that its string literal argument parses as a valid Postgres query. Based on https://pganalyze.com/blog/parse-postgresql-queries-in-ruby.html , which links the PostgreSQL server code directly into a C extension. Mad stuff, Ted! (via Rob Clancy)
right down the road from my house! how convenient
‘a distribution of long-living [distributed] transactions where steps may interleave, each with associated compensating transactions providing a compensation path across databases in the occurrence of a fault that may or may not compensate the entire chain back to the originator.’
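This is what is usually called a saga. A minimal in-process sketch, assuming each step is a pair of plain callables (`run_saga` and the booking steps are hypothetical): run the local transactions in order and, if one fails, run the compensations of the steps that already committed, in reverse order. A real saga coordinator would also persist this log so it can resume after a crash, and would need idempotent compensations; this sketch always unwinds the whole chain, whereas the definition above also allows partial compensation.

```python
from typing import Callable, List, Tuple

# (name, action, compensation): each action is a local transaction in one
# service or database; its compensation semantically undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse.
    Returns True if every step succeeded."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            # Walk the compensation path back towards the originator.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated {done_name}")
            return False
        completed.append(step)
        log.append(f"{name} ok")
    return True


def _no_cars() -> None:
    raise RuntimeError("no cars left")


log: List[str] = []
ok = run_saga(
    [
        ("book_flight", lambda: None, lambda: None),
        ("book_hotel", lambda: None, lambda: None),
        ("book_car", _no_cars, lambda: None),
    ],
    log,
)
print(ok, log)
```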
this is nuts. 99 cents per month for a super-cheap host — I’m sure there’s a use case for this (via Elliot)
Prototype is a brand new festival of play and interaction. This is your chance to experience the world from a new perspective with removable camera eyes, to jostle and joust to a Bach soundtrack whilst trying to disarm an opponent, to throw shapes as you figure out who got an invite to the silent disco, to duel with foam pool noodles, and play chase in the dark with flashlights. A unique festival that incites new types of social interaction, involving technology and the city, Prototype is a series of performances, workshops, talks, and games that spill across the city, alongside an adult playground in the heart of Temple Bar. Project Arts Centre, 17-18 October. Looks nifty.
I want to tell you about when violent campaigns against harmless bloggers weren’t any halfway decent troll’s idea of a good time — even the then-malicious would’ve found it too easy to be fun. When the punches went up, not down. Before the best players quit or went criminal or were changed by too long a time being angry. When there was cruelty, yes, and palpable strains of sexism and racism and every kind of phobia, sure, but when these things had the character of adolescents pushing the boundaries of cheap shock, disagreeable like that but not criminal. Not because that time was defensible — it wasn’t, not really — but because it was calmer and the rage wasn’t there yet. Because trolling still meant getting a rise for a laugh, not making helpless people fear for their lives because they’re threatening some Redditor’s self-proclaimed monopoly on reason. I want to tell you about it because I want to make sense of how it is now and why it changed.
Paul Hickey’s gite near Toulouse, available for rent! ‘a beautifully converted barn on 5 acres, wonderfully located in the French countryside. 4 Bedrooms, sleeps 2-10, Large Pool, Tennis Court, Large Trampoline, Broadband Internet, 30 Mins Toulouse/Albi, 65 Mins Carcassonne, 90 Mins Rodez’
The Ello founders are positioning it as an alternative to other social networks — they won’t sell your data or show you ads. “You are not the product.” If they were independently-funded and run as some sort of co-op, bootstrapped until profitable, maybe that’s plausible. Hard, but possible. But VCs don’t give money out of goodwill, and taking VC funding — even seed funding — creates outside pressures that shape the inevitable direction of a company.
With the increasing size and complexity of Hadoop deployments, being able to locate and understand performance is key to running an efficient platform. Inviso provides a convenient view of the inner workings of jobs and platform. By simply overlaying a new view on existing infrastructure, Inviso can operate inside any Hadoop environment with a small footprint and provide easy access and insight. This sounds pretty useful.
‘Linux is becoming the thing that we adopted Linux to get away from.’ Great post on the horrible complexity of systemd. It reminds me of nothing more than mid-90s AIX, which I had the displeasure of opsing for a while — the Linux distros have taken a very wrong turn here.
this is truly heinous. Given that any CGI which invokes popen()/system() on a Linux system where /bin/sh is a link to bash is vulnerable, there will be a lot of vulnerable services out there (via Elliot)
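The widely circulated probe for this bug (CVE-2014-6271) sets an environment variable to a function definition with a command tacked on the end, then checks whether bash runs that command on startup. A small Python wrapper, sketched under the assumption that `bash` is on `PATH`; `bash_is_vulnerable` is a hypothetical helper name. The variable name `HTTP_USER_AGENT` is chosen because a CGI server copies request headers into the environment before the script's `popen()`/`system()` spawns `/bin/sh`, which is exactly why CGIs are exposed: the attacker controls that value.

```python
import shutil
import subprocess
from typing import Optional

# The classic probe: looks like an exported function, plus a trailing command.
PROBE = "() { :;}; echo vulnerable"


def bash_is_vulnerable(bash: str = "bash") -> Optional[bool]:
    """True if this bash runs code trailing an imported function definition,
    False if not, None if no such executable is found."""
    path = shutil.which(bash)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-c", "echo test"],
        # Mimic a CGI environment: a request header copied into an env var.
        env={"HTTP_USER_AGENT": PROBE},
        capture_output=True,
        text=True,
    )
    return "vulnerable" in result.stdout


print(bash_is_vulnerable())
```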
Some common problems which arise using Chef with ASGs in EC2, and how these guys avoided them: they stopped using Chef for service provisioning, and instead baked AMIs when a new version was released. ASGs using pre-baked AMIs definitely works well so this makes good sense IMO.
Mark “ONEList” Fletcher’s back, and he’s reinventing the email group! awesome.
email groups (the modern version of mailing lists) have stagnated over the past decade. Yahoo Groups and Google Groups both exude the dank air of benign neglect. Google Groups hasn’t been updated in years, and some of Yahoo’s recent changes have actually made Yahoo Groups worse! And yet, millions of people put up with this uncertainty and neglect, because email groups are still one of the best ways to communicate with groups of people. And I have a plan to make them even better. So today I’m launching Groups.io in beta, to bring email groups into the 21st Century. At launch, we have many features that those other services don’t have, including:
- Integration with other services, including: Github, Google Hangouts, Dropbox, Instagram, Facebook Pages, and the ability to import Feeds into your groups.
- Businesses and organizations can have their own private groups on their own subdomain.
- Better archive organization, using hashtags.
- Many more email delivery options.
- The ability to mute threads or hashtags.
- Fully searchable archives, including searching within attachments.
One other feature that Groups.io has that Yahoo and Google don’t, is a business model that’s not based on showing ads to you. Public groups are completely free on Groups.io. Private groups and organizations are very reasonably priced.
SFU announces award for students who demonstrate excellence in contributing to an Open Source project
‘provides citizens, public sector workers and companies with real-time information, time-series indicator data, and interactive maps about all aspects of the city. It enables users to gain detailed, up to date intelligence about the city that aids everyday decision making and fosters evidence-informed analysis.’
New from Facebook engineering:
Last year, at the Data@Scale event and at the USENIX Networked Systems Design and Implementation conference, we spoke about turning caches into distributed systems using software we developed called mcrouter (pronounced “mick-router”). Mcrouter is a memcached protocol router that is used at Facebook to handle all traffic to, from, and between thousands of cache servers across dozens of clusters distributed in our data centers around the world. It is proven at massive scale — at peak, mcrouter handles close to 5 billion requests per second. Mcrouter was also proven to work as a standalone binary in an Amazon Web Services setup when Instagram used it last year before fully transitioning to Facebook’s infrastructure. Today, we are excited to announce that we are releasing mcrouter’s code under an open-source BSD license. We believe it will help many sites scale more easily by leveraging Facebook’s knowledge about large-scale systems in an easy-to-understand and easy-to-deploy package. This is pretty crazy: it basically turns a memcached cluster into a much more usable clustered-storage system, with features like shadowing production traffic, cold cache warmup, online reconfiguration, automatic failover, prefix-based routing, replicated pools, etc. Lots of good features.
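To illustrate just one of those features, prefix-based routing, here is a toy Python sketch: pick the pool whose key prefix is the longest match, then hash the key onto one server in that pool. The class, the prefixes and the host names are all hypothetical, and this is not mcrouter's configuration format or hashing scheme; a production router would use consistent hashing so that resizing a pool remaps fewer keys than the simple modulo used here.

```python
import hashlib
from typing import Dict, List


class PrefixRouter:
    """Toy prefix router (hypothetical; not mcrouter's config or API)."""

    def __init__(self, routes: Dict[str, List[str]], default: List[str]) -> None:
        # Longest prefixes first, so "user:session:" wins over "user:".
        self.routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)
        self.default = default

    def pool_for(self, key: str) -> List[str]:
        for prefix, pool in self.routes:
            if key.startswith(prefix):
                return pool
        return self.default

    def server_for(self, key: str) -> str:
        pool = self.pool_for(key)
        # Stable hash: Python's built-in hash() is randomised per process.
        digest = hashlib.md5(key.encode()).digest()
        return pool[int.from_bytes(digest[:8], "big") % len(pool)]


router = PrefixRouter(
    {"user:": ["cache-a1:11211", "cache-a2:11211"],
     "user:session:": ["cache-s1:11211"]},
    default=["cache-d1:11211"],
)
print(router.server_for("user:session:42"))  # cache-s1:11211
print(router.server_for("feed:7"))           # cache-d1:11211
```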
Where you have obtained contact details in the context of the sale of a product or service, you may only use these details for direct marketing by electronic mail if the following conditions are met:
- the product or service you are marketing is of a kind similar to that which you sold to the customer at the time you obtained their contact details;
- at the time you collected the details, you gave the customer the opportunity to object, in an easy manner and without charge, to their use for marketing purposes;
- each time you send a marketing message, you give the customer the right to object to receipt of further messages;
- the sale of the product or service occurred not more than twelve months prior to the sending of the electronic marketing communication or, where applicable, the contact details were used for the sending of an electronic marketing communication in that twelve month period.
This algorithm, which Bob Boyer and I invented in 1980, decides which element of a sequence is in the majority, provided there is such an element.
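The algorithm is short enough to show in full. A Python sketch of the standard formulation (function names are illustrative): one pass keeps a single candidate and a counter, pairing off each occurrence of the candidate against a different element. Because that pass only yields the right answer when a majority actually exists, a second pass confirms the candidate.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def majority_candidate(seq: Sequence[T]) -> Optional[T]:
    """Boyer-Moore vote: one pass, O(1) extra space. Only guaranteed to be
    the majority element if one exists."""
    candidate, count = None, 0
    for item in seq:
        if count == 0:
            candidate, count = item, 1
        elif item == candidate:
            count += 1
        else:
            count -= 1
    return candidate


def majority(seq: Sequence[T]) -> Optional[T]:
    """Second pass: confirm the candidate occurs more than len(seq)/2 times."""
    candidate = majority_candidate(seq)
    if candidate is not None and 2 * sum(1 for x in seq if x == candidate) > len(seq):
        return candidate
    return None


print(majority("AAACCBBCCCBCC"))  # C
print(majority([1, 2, 3]))        # None: no element is in the majority
```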
tinystat is used to compare two or more sets of measurements (e.g., multiple runs of benchmarks of two possible implementations) and determine if they are statistically different, using Student’s t-test. It’s inspired largely by FreeBSD’s ministat (written by Poul-Henning Kamp).
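The core calculation is small. A Python sketch of a two-sample Student's t-test with pooled variance, plus a 95% two-sided significance check against a hardcoded table of critical values for up to 10 degrees of freedom. This is not tinystat's or ministat's code; the helper names, the benchmark numbers and the table cutoff are assumptions for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple

# Two-sided 95% critical values of Student's t, for 1..10 degrees of freedom.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def students_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Pooled-variance two-sample t statistic and its degrees of freedom
    (assumes both sets have roughly equal variance)."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("need at least two measurements per set")
    df = na + nb - 2
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    if se == 0:
        raise ValueError("both sets have zero variance")
    return (statistics.mean(a) - statistics.mean(b)) / se, df


def differ_at_95(a: Sequence[float], b: Sequence[float]) -> bool:
    t, df = students_t(a, b)
    if df not in T95:
        raise ValueError("extend T95, e.g. with scipy.stats.t.ppf(0.975, df)")
    return abs(t) > T95[df]


old_impl = [1.0, 2.0, 3.0]  # hypothetical benchmark timings
new_impl = [4.0, 5.0, 6.0]
print(differ_at_95(old_impl, new_impl))  # True: |t| is about 3.67 > 2.776
```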
The relationship between this Dark Tetrad [of narcissism, Machiavellianism, psychopathy, and sadism] and trolling is so significant, that the authors write the following in their paper: “… the associations between sadism and GAIT (Global Assessment of Internet Trolling) scores were so strong that it might be said that online trolls are prototypical everyday sadists.” [emphasis added] Trolls truly enjoy making you feel bad. To quote the authors once more (because this is a truly quotable article): “Both trolls and sadists feel sadistic glee at the distress of others. Sadists just want to have fun … and the Internet is their playground!” Bloody hell.
Great runbook for C* ops
get page cache statistics for files.
A common question when tuning databases and other IO-intensive applications is, “is Linux caching my data or not?” pcstat gets that information for you using the mincore(2) syscall. I wrote this so that Apache Cassandra users can see if SSTables are being cached.
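pcstat itself is written in Go; as a rough illustration of the same mincore(2) trick, here is a Python sketch that uses ctypes to call libc's `mmap` and `mincore` directly, since Python's `mmap` module doesn't expose `mincore`. It assumes Linux (or another Unix with the same signatures) and a 64-bit `off_t`; `cached_pages` is a hypothetical helper name. Mapping the file doesn't read it, and the low bit of each byte in the result vector says whether that page is resident in the page cache.

```python
import ctypes
import mmap
import os
from typing import Tuple

# Assumption: Unix libc symbols reachable via dlopen(NULL), 64-bit off_t.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.mmap.restype = ctypes.c_void_p
_libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                       ctypes.c_int, ctypes.c_int, ctypes.c_long]
_libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
_libc.mincore.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                          ctypes.POINTER(ctypes.c_ubyte)]
MAP_FAILED = ctypes.c_void_p(-1).value


def _raise_errno() -> None:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))


def cached_pages(path: str) -> Tuple[int, int]:
    """Return (pages resident in the page cache, total pages) for a file."""
    size = os.path.getsize(path)
    if size == 0:
        return 0, 0
    pages = (size + mmap.PAGESIZE - 1) // mmap.PAGESIZE
    fd = os.open(path, os.O_RDONLY)
    try:
        # Mapping does not fault pages in, so the check doesn't warm the cache.
        addr = _libc.mmap(None, size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0)
        if addr is None or addr == MAP_FAILED:
            _raise_errno()
        try:
            vec = (ctypes.c_ubyte * pages)()
            if _libc.mincore(addr, size, vec) != 0:
                _raise_errno()
            # Bit 0 of each byte is set if that page is resident.
            return sum(b & 1 for b in vec), pages
        finally:
            _libc.munmap(addr, size)
    finally:
        os.close(fd)
```

Usage is just `cached, total = cached_pages(path)`; a freshly written file will usually report most or all of its pages as cached.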
One could read the success of Go as an indictment of contemporary PLT, but I prefer to see it as a reminder of just how much language tooling matters. Perhaps even more critical, Go’s lean syntax, selective semantics, and cautiously-chosen feature set demonstrate the importance of a strong editorial voice in a language’s design and evolution. Having co-authored a book on Scala, it’s been painful to see systems programmers in my community express frustration with the ambitious hybrid language. I’ve watched them abandon ship and swim back to the familiar shores of Java, or alternately into the uncharted waters of Clojure, Go, and Rust. A pity, but not entirely surprising if we’re being honest with ourselves. Unlike Go, Scala has struggled with tooling from its inception. More than that, Scala has had a growing editorial problem. Every shop I know that’s been successful with Scala has limited itself to some subset of the language. Meanwhile, in pursuit of enterprise developers, its surface area has expanded in seemingly every direction. The folks behind Scala have, thankfully, taken notice: upcoming releases are promised to focus on simplicity, clarity, and better tooling.
“The First Amendment of the U.S. Constitution is similarly suspicious of prior restraints,” wrote Justice Lehrmann in the decision highlighting a cornerstone that has “been reaffirmed time and again by the Supreme Court, this Court, Texas courts of appeals, legal treatises, and even popular culture.” That last reference to popular culture contained an interesting footnote citing none other than Walter Sobchak, a character in ['The Big Lebowski'].
Ben Hughes on twitter: “JSON is fine for config files, if you don’t want to comment your config file. Which is a way of saying, it isn’t fine for config files.”
Peter Bailis complaining about the horrors of modern transactional databases and their unserializability, which no one seems to be paying attention to: ‘As you’re probably aware, there’s an ongoing and often lively debate between transactional adherents and more recent “NoSQL” upstarts about related issues of usability, data corruption, and performance. But, in contrast, many of these transactional adherents and the research community as a whole have effectively ignored weak isolation, even in a single server setting and despite the fact that literally millions of businesses today depend on weak isolation and that many of these isolation levels have been around for almost three decades.’ ‘Despite the ubiquity of weak isolation, I haven’t found a database architect, researcher, or user who’s been able to offer an explanation of when, and, probably more importantly, why isolation models such as Read Committed are sufficient for correct execution. It’s reasonably well known that these weak isolation models represent “ACID in practice,” but I don’t think we have any real understanding of how so many applications are seemingly (!?) okay running under them. (If you haven’t seen these models before, they’re a little weird. For example, Read Committed isolation generally prevents users from reading uncommitted or non-final writes but allows a number of bad things to happen, like lost updates during concurrent read-modify-write operations. Why is this apparently okay for many applications?)’
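The lost-update anomaly mentioned at the end is easy to reproduce on paper. Below is a toy, single-threaded Python model (not a real database) of two transactions interleaving under Read Committed: every read sees only committed data, as Read Committed guarantees, yet one increment disappears. In SQL the usual fixes are to make the write atomic (`UPDATE ... SET balance = balance + 10`), lock the row on read (`SELECT ... FOR UPDATE`), or run at a stronger isolation level.

```python
from typing import Dict


def lost_update_demo() -> int:
    """Two read-modify-write transactions on one row, interleaved."""
    committed: Dict[str, int] = {"balance": 100}

    # T1 and T2 both read before either commits: both see committed 100.
    t1_read = committed["balance"]
    t2_read = committed["balance"]

    # Each computes its new value from its own, now stale, read.
    committed["balance"] = t1_read + 10  # T1 commits 110
    committed["balance"] = t2_read + 10  # T2 commits 110, overwriting T1

    return committed["balance"]


print(lost_update_demo())  # 110, although two +10 deposits should give 120
```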
‘In this paper, we describe a generic concurrency control technique with Blocking write operations and Wait-Free Population Oblivious read operations, which we named the Left-Right technique. It is of particular interest for real-time applications with dedicated Reader threads, due to its wait-free property that gives strong latency guarantees and, in addition, there is no need for automatic Garbage Collection. The Left-Right pattern can be applied to any data structure, allowing concurrent access to it similarly to a Reader-Writer lock, but in a non-blocking manner for reads. We present several variations of the Left-Right technique, with different versioning mechanisms and state machines. In addition, we constructed an optimistic approach that can reduce synchronization for reads.’ See also http://concurrencyfreaks.blogspot.ie/2013/12/left-right-concurrency-control.html for java implementation code.
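A rough Python sketch of the Left-Right protocol (an illustrative reading of the technique, not the authors' Java code): two copies of the data, a `left_right` flag telling readers which copy to use, and two read indicators selected by `version_index`, so the writer can wait for in-flight readers to drain before touching the copy they might still hold. Names are illustrative. Two caveats: CPython has no lock-free atomic counters, so the read indicators use a tiny lock and this sketch shows the protocol rather than the wait-free latency guarantee; and because every mutation is applied once to each copy, it must be deterministic.

```python
import threading
import time
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class _ReadIndicator:
    """Counts readers currently inside a read. A tiny lock stands in for
    the atomic counter a native implementation would use."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._count = 0

    def arrive(self) -> None:
        with self._lock:
            self._count += 1

    def depart(self) -> None:
        with self._lock:
            self._count -= 1

    def is_empty(self) -> bool:
        with self._lock:
            return self._count == 0


class LeftRight(Generic[T]):
    def __init__(self, make: Callable[[], T]) -> None:
        self._instances: List[T] = [make(), make()]
        self._left_right = 0     # which copy new readers should use
        self._version_index = 0  # which read indicator new readers arrive on
        self._indicators = [_ReadIndicator(), _ReadIndicator()]
        self._writer = threading.Lock()  # writers block each other

    def read(self, fn: Callable[[T], R]) -> R:
        vi = self._version_index
        self._indicators[vi].arrive()
        try:
            return fn(self._instances[self._left_right])
        finally:
            self._indicators[vi].depart()

    def write(self, mutate: Callable[[T], None]) -> None:
        with self._writer:
            lr = self._left_right
            mutate(self._instances[1 - lr])  # readers are all on the other copy
            self._left_right = 1 - lr        # new readers see the updated copy
            prev_vi = self._version_index
            next_vi = 1 - prev_vi
            self._wait_until_empty(next_vi)
            self._version_index = next_vi
            self._wait_until_empty(prev_vi)  # readers of the old copy are gone
            mutate(self._instances[lr])      # bring the old copy up to date

    def _wait_until_empty(self, vi: int) -> None:
        while not self._indicators[vi].is_empty():
            time.sleep(0)  # yield to readers; a native version would spin


store = LeftRight(dict)
store.write(lambda d: d.__setitem__("k", 1))
print(store.read(lambda d: d.get("k")))  # 1
```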
‘bring your .bashrc, .vimrc, etc. with you when you ssh’. A really nice implementation of this idea (much nicer than my own version!)
remotely trigger GCs, finalization, heap dumps etc. Handy
We appealed this decision, but in June 2014 the Upper Tribunal agreed with the First-tier Tribunal, cancelling our monetary penalty notice against Niebel and McNeish, and largely rendering our power to issue fines for breaches of PECR involving spam texts redundant. This is pretty terrible. The UK appears to have the weakest anti-spam regime in Europe due to the lack of powers given to the ICO.
A nice curl/wget replacement which supports multi-TCP-connection downloads of HTTP/FTP resources. packaged for most Linux variants and OSX via brew
Linux users familiar with other filesystems or ZFS users from other platforms will often ask whether ZFS on Linux (ZoL) is “stable”. The short answer is yes, depending on your definition of stable. The term stable itself is somewhat ambiguous. Oh dear, that’s not a good start. Good reference page, though.
“This is rule No. 1: There are no screens in the bedroom. Period. Ever.”
How can we measure the number of additional clicks or sales that an AdWords campaign generated? How can we estimate the impact of a new feature on app downloads? How do we compare the effectiveness of publicity across countries? In principle, all of these questions can be answered through causal inference. In practice, estimating a causal effect accurately is hard, especially when a randomised experiment is not available. One approach we’ve been developing at Google is based on Bayesian structural time-series models. We use these models to construct a synthetic control — what would have happened to our outcome metric in the absence of the intervention. This approach makes it possible to estimate the causal effect that can be attributed to the intervention, as well as its evolution over time. We’ve been testing and applying structural time-series models for some time at Google. For example, we’ve used them to better understand the effectiveness of advertising campaigns and work out their return on investment. We’ve also applied the models to settings where a randomised experiment was available, to check how similar our effect estimates would have been without an experimental control. Today, we’re excited to announce the release of CausalImpact, an open-source R package that makes causal analyses simple and fast. With its release, all of our advertisers and users will be able to use the same powerful methods for estimating causal effects that we’ve been using ourselves. Our main motivation behind creating the package has been to find a better way of measuring the impact of ad campaigns on outcomes. However, the CausalImpact package could be used for many other applications involving causal inference. Examples include problems found in economics, epidemiology, or the political and social sciences.
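The synthetic-control idea can be shown with a deliberately crude stand-in: fit the outcome against an unaffected control series over the pre-intervention period using ordinary least squares, predict the post-period counterfactual, and read the effect off as actual minus predicted. This is plain OLS, not a Bayesian structural time-series model, and it gives no credible intervals, which is a large part of what CausalImpact adds; `estimate_effect` and the sample data are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def estimate_effect(control: Sequence[float], outcome: Sequence[float],
                    t0: int) -> List[float]:
    """Crude synthetic control: regress `outcome` on `control` over the
    pre-period [0, t0), predict the counterfactual for t >= t0, and return
    actual minus predicted for each post-period step."""
    xs, ys = control[:t0], outcome[:t0]
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return [y - (intercept + slope * x)
            for x, y in zip(control[t0:], outcome[t0:])]


# Hypothetical data: the outcome tracks 2*control + 1 until t0 = 5, then an
# intervention adds 5 per step.
control = [10, 12, 11, 13, 12, 14, 13, 15]
outcome = [2 * c + 1 for c in control[:5]] + [2 * c + 6 for c in control[5:]]
print([round(e, 6) for e in estimate_effect(control, outcome, 5)])  # [5.0, 5.0, 5.0]
```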
Shamefully, I haven’t visited most of these!
Now a series of decisions from lower courts is starting to bring the ruling’s practical consequences into focus. And the results have been ugly for fans of software patents. By my count there have been 11 court rulings on the patentability of software since the Supreme Court’s decision — including six that were decided this month. Every single one of them has led to the patent being invalidated. This doesn’t necessarily mean that all software patents are in danger — these are mostly patents that are particularly vulnerable to challenge under the new Alice precedent. But it does mean that the pendulum of patent law is now clearly swinging in an anti-patent direction. Every time a patent gets invalidated, it strengthens the bargaining position of every defendant facing a lawsuit from a patent troll.
A practical demo of “differential privacy” — allowing public data dumps to happen without leaking privacy, using Laplace noise addition
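The Laplace mechanism itself fits in a few lines. A Python sketch for a counting query, assuming sensitivity 1 (adding or removing one person changes the count by at most one), so noise drawn from Laplace(0, 1/epsilon) gives epsilon-differential privacy for that one release. It uses the fact that the difference of two independent exponential draws with mean b is Laplace(0, b). Function names are illustrative; each further release of the same data spends more privacy budget, and a production system would also need care with floating-point subtleties in the noise.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale): the difference of two independent exponential
    draws with mean `scale` has exactly this distribution."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query with sensitivity 1:
    scale = 1 / epsilon gives epsilon-differential privacy for one release."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(2014)
print(private_count(42, epsilon=0.5, rng=rng))  # 42 plus noise of scale 2
```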
I’m ambivalent about Microsoft acquiring Mojang. Will they Embrace and Extend Minecraft as they’ve done with other categories? Let’s hope not. On the other hand, some adult supervision and a Plugin API would be welcome. Mojang have the financial resources but lack the will and focus needed to publish and support a Plugin API. Perhaps Mojang themselves don’t realise just how important their little game has become.
Dublin, 24th September 2014, hosted by Enterprise Ireland. Hosted by former Ubuntu counsel (via gcarr)
Even with buffered streams the application must be able to instruct the OS to forward all pending data when the stream has been flushed for optimal performance. The application does not know where packet boundaries reside, hence buffer flushes might not align on packet boundaries. TCP_CORK can pack data more effectively, because it has direct access to the TCP/IP layer. [..] If you do use an application buffering and streaming mechanism (as does Apache), I highly recommend applying the TCP_NODELAY socket option which disables Nagle’s algorithm. All calls to write() will then result in immediate transfer of data.
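Both approaches are ordinary socket options. A Python sketch of each, assuming Linux for `TCP_CORK` (Python only exposes `socket.TCP_CORK` where the platform has it, hence the `hasattr` guard): cork while the pieces of one response are written, then uncork to flush them together; or, for an application that already does its own buffering, set `TCP_NODELAY` so every write goes out immediately. The helper names are hypothetical.

```python
import socket


def send_corked(sock: socket.socket, header: bytes, body: bytes) -> None:
    """Hold back partial frames while one response is written, then uncork
    so the kernel can pack header and body into as few packets as possible."""
    corkable = hasattr(socket, "TCP_CORK")  # Linux-specific option
    if corkable:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 1)
    try:
        sock.sendall(header)
        sock.sendall(body)
    finally:
        if corkable:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CORK, 0)  # flush


def disable_nagle(sock: socket.socket) -> None:
    """For apps that buffer and flush their own writes: turn off Nagle's
    algorithm so each write() is transmitted immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```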
relatively-new Japanese place in the North Strand — delivers, too. Comes recommended by JK. Must try it out soon!
Actual scientific research showing that antibiotic use may be implicated in allergies: ‘Nagler’s team first confirmed that mice given antibiotics early in life were far more susceptible to peanut sensitization, a model of human peanut allergy. Then, they introduced a solution containing Clostridia, a common class of bacteria that’s naturally found in the mammalian gut, into the rodents’ mouths and stomachs. The animals’ food allergen sensitization disappeared, the team reports online today in the Proceedings of the National Academy of Sciences. When the scientists instead introduced another common kind of healthy bacteria, called Bacteroides, into similarly allergy-prone mice, they didn’t see the same effect. Studying the rodents more carefully, the researchers determined that Clostridia were having a surprising effect on the mouse gut: Acting through certain immune cells, the bacteria helped keep peanut proteins that can cause allergic reactions out of the bloodstream. “The bacteria are maintaining the integrity of the [intestinal] barrier,” Nagler says.’
ah, memories. This is the bug that caused me to have to run a fleet-wide upgrade across the EC2 substrate. Thanks, boost::asio!