The report’s revelations, based on a survey of nearly 800 writers worldwide, are alarming. Concern about surveillance is now nearly as high among writers living in democracies (75%) as among those living in non-democracies (80%). The levels of self-censorship reported by writers living in democratic countries are approaching the levels reported by writers living in authoritarian or semi-democratic countries.
the urgency of repealing the Irish blasphemy legislation cannot now be overstated. The same cartoons that saw their authors murdered for blasphemy recently, would see Irish authors hauled before our courts. The same nations that execute their citizens for blasphemy, wish to promote the wording of the Irish blasphemy legislation through the UN, in order to expand such provisions to more countries. Ireland is the only European country to recently introduce a new blasphemy law. Following the horrific recent events in Paris, let us be the next country to repeal our blasphemy laws.
If you haven’t heard about it, it is a compulsory database of the personal information of children, including PPS numbers, ethnicity, race and language skills, to be held for decades and shared across State agencies.
What if Silicon Valley had emerged from a racially integrated community? Would the technology industry be different? Would we? And what can the technology industry do now to avoid repeating the mistakes of the past?Amazing article — this is the best thing I’ve ever read on TechCrunch: the political history of race in Silicon Valley and East Palo Alto.
All of our assets loaded via the CDN [to our client in Australia] in just under 5 seconds. It only took ~2.7s to get those same assets to our friends down under with SPDY. The performance with no CDN blew the CDN performance out of the water. It is just no comparison. In our case, it really seems that the advantages of SPDY greatly outweigh that of a CDN when it comes to speed.
Excellent “In Focus” this week — ‘The continued massive growth of connected mobile devices is shaping not only how we communicate with each other, but how we look, behave, and experience the world around us. Smartphones and other handheld devices have become indispensable tools, appendages held at arm’s length to record a scene or to snap a selfie. Recent news photos show refugees fleeing war-torn regions holding up their phones as prized possessions to be saved, and relatives of victims lost to a disaster holding up their smartphones to show images of their loved ones to the press. Celebrity selfies, people alone in a crowd with their phones, events obscured by the very devices used to record that event, the brightly lit faces of those bent over their small screens, these are some of the scenes depicted below.’
‘Unlike existing alternatives, such as stream processing, that favor the execution of arbitrary application code, we want to capture much of the processing logic as a set of known operations over specialized Computational CRDTs, with particular semantics and invariants, such as min/max/average/median registers, accumulators, top-N sets, sorted sets/maps, and so on. Keeping state also allows the system to decrease the amount of propagated information. Preliminary results obtained in a single example show that Titan has an higher throughput when compared with state of the art stream processing systems.’
‘Turn websites into structured APIs from your browser in seconds’ — next-generation web scraping, recommended by conoro
as one insider told me, it feels like “Lab126 is in the doghouse” and that “Jeff is taking out his frustration with the failure of the Fire Phone” on upper management.
a conceptual model, with accompanying XML schema, that may be used to quantify and exchange complex uncertainties in data. The interoperable model can be used to describe uncertainty in a variety of ways including: Samples Statistics including mean, variance, standard deviation and quantile Probability distributions including marginal and joint distributions and mixture models
How to secure SSH, disabling insecure ciphers etc. (via Padraig)
Make “Paste and Match Style” the default, as it should be
Twitter open-sources an anomaly-spotting R package:
Early detection of anomalies plays a key role in ensuring high-fidelity data is available to our own product teams and those of our data partners. This package helps us monitor spikes in user engagement on the platform surrounding holidays, major sporting events or during breaking news. Beyond surges in social engagement, exogenic factors – such as bots or spammers – may cause an anomaly in number of favorites or followers. The package can be used to find such bots or spam, as well as detect anomalies in system metrics after a new software release. We’re open-sourcing AnomalyDetection because we’d like the public community to evolve the package and learn from it as we have.
Rx/reactive in style, autoscaling, support for queue/broker-based strong consistency as well as TCP-based lossy delivery
‘I now a man with a wooden leg named sea what was the name of the other leg SAND’
Fergal Crehan’s new gig — good idea!
The Hit Team helps you fight back against leaked photos and videos, internet targeting and revenge porn.
Beyond the interesting-enough stuff about scalability in a distributed SQL store, there’s this really nifty point about avoiding the horrors of the SQL/ORM impedance mismatch:
At Google, Protocol Buffers are ubiquitous for data storage and interchange between applications. When we still had a MySQL schema, users often had to write tedious and error-prone transformations between database rows and in-memory data structures. Putting protocol buffers in the schema removes this impedance mismatch and gives users a universal data structure they can use both in the database and in application code…. Protocol Buffer columns are more natural and reduce semantic complexity for users, who can now read and write their logical business objects as atomic units, without having to think about materializing them using joins across several tables.This is something that pretty much any store can already adopt. Go protobufs. (or Avro, etc.) Also, I find this really neat, and I hope this idea is implemented elsewhere soon: asynchronous schema updates:
Schema changes are applied asynchronously on multiple F1 servers. Anomalies are prevented by the use of a schema leasing mechanism with support for only current and next schema versions; and by subdividing schema changes into multiple phases where consecutive pairs of changes are mutually compatible and cannot cause anomalies.
This is a really excellent post on the topic, rebutting Paul Graham’s Bay-Area-centric thoughts on the topic very effectively. I’ve worked in both distributed and non-distributed, as well as effective and ineffective teams ;), and Avleen’s thoughts are very much on target.
I’ve been involved in the New York start up scene since I joined Etsy in 2010. Since that time, I’ve seen more and more companies there embrace having distributed teams. Two companies I know which have risen to the top while doing this have been Etsy and DigitalOcean. Both have exceptional engineering teams working on high profile products used by many, many people around the world. There are certainly others outside New York, including Automattic, GitHub, Chef Inc, Puppet… the list goes on. So how did this happen? And why do people continue to insist that distributed teams lower performance, and are a bad idea? Partly because we’ve done a poor job of showing our industry how to be successful at it, and partly because it’s hard. Having successful distributed teams requires special skills from management, which arent’t easily learned until you have to manage a distributed team. Catch 22.
As used in Cassandra ( http://grokbase.com/t/hbase/dev/13bf9kezes/about-xx-threadprioritypolicy-42 )!
if you just set the “ThreadPriorityPolicy” to something else than the legal values 0 or 1, [...] a slight logic bug in Sun’s JVM code kicks in, and thus sets the policy to be as if running with root – thus you get exactly what one desire. The operating system, Linux, won’t allow priorities to be heightened above “Normal” (negative nice value), and thus just ignores those requests (setting it to normal instead, nice value 0) – but it lets through the requests to set it lower (setting the nice value to some positive value).
Enigma is a Linux based alternative to the default Spark operating system on these boxes. Enigma is a more customisable OS and provides the ability to add plugins which can accomplish many tasks enabling users to have a box which might look and perform like a Sky box, giving a 7 day EPG and an alternative to series link.Looks like a pretty solid hacker community…
William Hague, the leader of the House of Commons, has responded to concerns raised by an MP about the security of parliamentary data stored on Microsoft’s Cloud-based servers in Europe. “The relevant servers are situated in the Republic of Ireland and the Netherlands, both being territories covered by the EC Data Protection Directive,” William Hague wrote in a letter to John Hemming, MP for Birmingham Yardley. “Any access by US authorities to such data would have to be by way of mutual legal assistance arrangements with those countries.” [...] John Hemming MP told Computer Weekly Hague’s reassurances carried little weight in the face of aggressive legal action by the US government. “The Microsoft case makes it clear that, in the end, the fact that Microsoft is a US company legally trumps the European Data Protection Directive [...] and where [the letter says] the US authorities could not exercise a right of search and seizure on an extraterritorial basis, well, they are doing that, in America, today.”Sounds like they didn’t think that through…
Nearly half the EU-wide average.
Sweden has also created 12,600 safer pedestrian crossings with features such as bridges, flashing lights, and speed bumps. That’s estimated to have halved pedestrian deaths over the past five years. The country has lowered speed limits in urban, crowded areas and built barriers to protect bikers from incoming traffic. A crackdown on drunk driving has also helped.
Formats the year based on ISO week numbering, which often is not what you want. Both have been responsible for high-profile production bugs (in Apple and Android).
Spectacularly inept. Pretty much every UGC site there is
Wow, where has this person been for the past 20 years that they haven’t had to encounter this? I can only imagine having a private office, tbh.
my personal performance at work has hit an all-time low. Each day, my associates and I are seated at a table staring at each other, having an ongoing 12-person conversation from 9 a.m. to 5 p.m. It’s like being in middle school with a bunch of adults. Those who have worked in private offices for decades have proven to be the most vociferous and rowdy. They haven’t had to consider how their loud habits affect others, so they shout ideas at each other across the table and rehash jokes of yore. As a result, I can only work effectively during times when no one else is around, or if I isolate myself in one of the small, constantly sought-after, glass-windowed meeting rooms around the perimeter.
‘The fee [airline pricing] model comes with systematic costs that are not immediately obvious. Here’s the thing: in order for fees to work, there needs be something worth paying to avoid. That necessitates, at some level, a strategy that can be described as “calculated misery.” Basic service, without fees, must be sufficiently degraded in order to make people want to pay to escape it. And that’s where the suffering begins.’
‘Ádám was trying his hand at a problem in Excel, but the official rules prohibit the use of Excel macros. In a daze, he came up with one of the most clever uses of Excel: building an assembly interpreter with the most popular spreadsheet program. This is a virtual Harvard architecture machine without writable RAM; the stack is only lots and lots of IFs.’
A causal profiler for C++.
Causal profiling is a novel technique to measure optimization potential. This measurement matches developers’ assumptions about profilers: that optimizing highly-ranked code will have the greatest impact on performance. Causal profiling measures optimization potential for serial, parallel, and asynchronous programs without instrumentation of special handling for library calls and concurrency primitives. Instead, a causal profiler uses performance experiments to predict the effect of optimizations. This allows the profiler to establish causality: “optimizing function X will have effect Y,” exactly the measurement developers had assumed they were getting all along.I can see this being a good technique to stochastically discover race conditions and concurrency bugs, too.
This is the version with the superfast petabyte-sort record:
Spark 1.2 includes several cross-cutting optimizations focused on performance for large scale workloads. Two new features Databricks developed for our world record petabyte sort with Spark are turned on by default in Spark 1.2. The first is a re-architected network transfer subsystem that exploits Netty 4’s zero-copy IO and off heap buffer management. The second is Spark’s sort based shuffle implementation, which we’ve now made the default after significant testing in Spark 1.1. Together, we’ve seen these features give as much as 5X performance improvement for workloads with very large shuffles.
As the 1 January deadline gallops towards the EU, microbusinesses desperate to stay open without breaking the law try to find out, “Can I email stuff out instead?” Well… Yes. – No – It depends – and simultaneously yes AND no, according to Schrödinger’s VAT. So that’s clear, then.
Nice work, EU
I keep forgetting about sshuttle. It’s by far the easiest way to get a cheapo IP-over-SSH VPN working with an OSX client, particularly since it’s in homebrew
Things hotting up in TOR-land.
Until I have had the time and information available to review the situation, I am strongly recommending my mirrors are not used under any circumstances. If they come back online without a PGP signed message from myself to further explain the situation, exercise extreme caution and treat even any items delivered over TLS to be potentially hostile.
$14.99 ebook, recommended by Steve Vinoski, looks good
Google made a change in Android 4.4 which allows operators to know when users are using tethering and conveniently block tethered devices from accessing internet. This can be fixed permanently using the following procedure.Well this is stupid. (via Tony Finch)
It’s now over a year since Edward Snowden went public with evidence of mass surveillance and extensive abuses by the NSA, GCHQ and other intelligence agencies. In other countries these revelations prompted parliamentary inquiries, diplomatic representations and legislation. In Ireland the only response was a promise [..] to help extradite Mr Snowden should he land here.
For the record
To demonstrate that hackers have no interest in suppressing speech, quashing controversy, or being intimidated by vague threats, we ask that Sony allow the hacker community to distribute “The Interview” for them on the 25th of December. Now, we’re aware that Sony may refer to this distribution method as piracy, but in this particular case, it may well prove to be the salvation of the motion picture industry. By freely offering the film online, millions of people will get to see it and decide for themselves if it has any redeeming qualities whatsoever – as opposed to nobody seeing it and the studios writing it off as a total loss. Theaters would be free from panic as our servers would become the target of any future vague threats (and we believe Hollywood will be most impressed with how resilient peer-to-peer distribution can be in the face of attacks). Most importantly, we would be defying intimidation, something the motion picture industry doesn’t quite have a handle on, which is surprising considering how much they’ve relied upon it in the past.
Need to keep an eye out for a few of these — will probably be a little more than $30 given the whole import/export carry-on of course
In CAP terms, ZooKeeper is CP, meaning that it’s consistent in the face of partitions, not available. For many things that ZooKeeper does, this is a necessary trade-off. Since ZooKeeper is first and foremost a coordination service, having an eventually consistent design (being AP) would be a horrible design decision. Its core consensus algorithm, Zab, is therefore all about consistency. For coordination, that’s great. But for service discovery it’s better to have information that may contain falsehoods than to have no information at all. It is much better to know what servers were available for a given service five minutes ago than to have no idea what things looked like due to a transient network partition. The guarantees that ZooKeeper makes for coordination are the wrong ones for service discovery, and it hurts you to have them.Yes! I’ve been saying this for months — good to see others concurring.
omg, Die Gute Fabrik’s game collection featuring the AMAZING Johann Sebastian Joust — now available on Mac, Linux and (missing JSJ) Windows. Time to buy an assload of Move controllers!
Nice worked-through Lambda example
Oh god yes. This is absolutely spot on, as you would expect from a Google paper — at this stage they probably have accumulated more real-world ML-at-scale experience than anywhere else. ‘Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns. [....] ‘In this paper, we focus on the system-level interaction between machine learning code and larger systems as an area where hidden technical debt may rapidly accumulate. At a system-level, a machine learning model may subtly erode abstraction boundaries. It may be tempting to re-use input signals in ways that create unintended tight coupling of otherwise disjoint systems. Machine learning packages may often be treated as black boxes, resulting in large masses of “glue code” or calibration layers that can lock in assumptions. Changes in the external world may make models or input signals change behavior in unintended ways, ratcheting up maintenance cost and the burden of any debt. Even monitoring that the system as a whole is operating as intended may be difficult without careful design. Indeed, a remarkable portion of real-world “machine learning” work is devoted to tackling issues of this form. Paying down technical debt may initially appear less glamorous than research results usually reported in academic ML conferences. But it is critical for long-term system health and enables algorithmic advances and other cutting-edge improvements.’
Since Operation Torpedo [use of a Metasploit side project], there’s evidence the FBI’s anti-Tor capabilities have been rapidly advancing. Torpedo was in November 2012. In late July 2013, computer security experts detected a similar attack through Dark Net websites hosted by a shady ISP called Freedom Hosting—court records have since confirmed it was another FBI operation. For this one, the bureau used custom attack code that exploited a relatively fresh Firefox vulnerability—the hacking equivalent of moving from a bow-and-arrow to a 9-mm pistol. In addition to the IP address, which identifies a household, this code collected the MAC address of the particular computer that infected by the malware. “In the course of nine months they went from off the shelf Flash techniques that simply took advantage of the lack of proxy protection, to custom-built browser exploits,” says Soghoian. “That’s a pretty amazing growth … The arms race is going to get really nasty, really fast.”
Microsoft -v- USA is an important ongoing case, currently listed for hearing in 2015 before the US Federal Court of Appeal of the 2nd Circuit. However, as the case centres around the means by which NY law enforcement are seeking to access data of an email account which resides in Dublin, it is also crucially significant to Ireland and the rest of the EU. For that reason, Digital Rights Ireland instructed us to file an Amicus Brief in the US case, in conjunction with the global law firm of White & Case, who have acted pro bono in their representation. Given the significance of the case for the wider EU, both Liberty and the Open Rights Group in the UK have joined Digital Rights Ireland as amici on this brief. We hope it will be of aid to the US court in assessing the significance of the order being appealed by Microsoft for EU citizens and European states, in the light of the existing US and EU Mutual Legal Assistance Treaty.
Hey look, PID 1 segfaulting! I haven’t seen that happen since we managed to corrupt /bin/sh on Ultrix in 1992. Nice work Fedora
GCHQ maintains a huge repository named MUTANT BROTH that stores billions of these intercepted cookies, which it uses to correlate with IP addresses to determine the identity of a person. GCHQ refers to cookies internally as “target detection identifiers.”
Generate graphs/flowcharts from text a la Markdown. Pretty much identical to graphviz surely?
Very impressive. I particularly like the use of Tester Dojos to get through a backlog of unwritten tests — we had a similar problem recently…
From 7-8pm on Friday, [RepricerExpress] software, used by third-party sellers to ensure their products are the cheapest on the market, went a bit haywire and reduced prices to as little as 1p.
Wow, this looks cool. $159
littleBits and Korg have demystified a traditional analog synthesizer, making it super easy for novices and experts alike to create music. connects to speakers, computers and headphones. can be used to make your own instruments. fits into the littleBits modular system for infinite combos of audio, visual and sensory experiences
‘Find the best cold & flu meds for your symptoms’ — actually pretty useful, although of course the US-only brandnames aren’t available over here…
This sounds really excellent — the dimensionality problem it deals with is a familiar one, particularly with red/black deployments, autoscaling, and so on creating trees of metrics when new transient servers appear and disappear. Looking forward to Netflix open sourcing enough to make it usable for outsiders
Hello! I love satellite imagery and topographic maps so I made several wallpapers with those gorgeous pictures. All wallpapers are iPhone 6 Plus optimized with 1242×2208 pixels, but you can resize for any device you have. The original sources are listed below.(Via This Is Colossal)
I’ve been bitten by poor key distribution in tests in the past, so this is spot on: ‘I’d run it with Zipfian, Pareto, and Dirac delta distributions, and I’d choose read-modify-write transactions.’ And of course, a dataset bigger than all combined RAM. Also: http://smalldatum.blogspot.ie/2014/04/biebermarks.html — the “Biebermark”, where just a single row out of the entire db is contended on in a read/modify/write transaction: “the inspiration for this is maintaining counts for [highly contended] popular entities like Justin Bieber and One Direction.”
“AWS Key Management Service (AWS KMS) provides cryptographic keys and operations scaled for the cloud. AWS KMS keys and functionality are used by other AWS cloud services, and you can use them to protect user data in your applications that use AWS. This white paper provides details on the cryptographic operations that are executed within AWS when you use AWS KMS.”
some good details of Aurora innards
ex-Percona MySQL wizard Baron Schwartz, noting that MVCC as implemented in common SQL databases is not all that simple or reliable compared to big bad NoSQL Eventual Consistency:
Since I am not ready to assert that there’s a distributed system I know to be better and simpler than eventually consistent datastores, and since I certainly know that InnoDB’s MVCC implementation is full of complexities, for right now I am probably in the same position most of my readers are: the two viable choices seem to be single-node MVCC and multi-node eventual consistency. And I don’t think MVCC is the simpler paradigm of the two.
This is quite interesting/weird — Stripe’s protocol for mass-CCing email as they scale up the company, based around http://en.wikipedia.org/wiki/Civil_inattention
If their interests were better serving the world, using technology as a force for social justice, and equitably distributing technology wealth to enrich society … sure, they’d be acting against their interests. But the reality is that tech companies centralize power and wealth in a small group of privileged white men. When that’s the goal, then exploiting the labor of marginalized people and denying them access to power and wealth is 100 percent in line with the endgame. A more diverse tech industry would be better for its workers and everyone else, but it would be worse for the privileged white men at the top of it, because it would mean they would have to give up their monopoly on money and power. And they will fight that with everything they’ve got, which is why we see barriers to equality at every level of the industry.
Awesome! I was completely unaware this was coming down the pipeline.
A new, transactionally updated Ubuntu for the cloud. Ubuntu Core is a new rendition of Ubuntu for the cloud with transactional updates. Ubuntu Core is a minimal server image with the same libraries as today’s Ubuntu, but applications are provided through a simpler mechanism. The snappy approach is faster, more reliable, and lets us provide stronger security guarantees for apps and users — that’s why we call them “snappy” applications. Snappy apps and Ubuntu Core itself can be upgraded atomically and rolled back if needed — a bulletproof approach to systems management that is perfect for container deployments. It’s called “transactional” or “image-based” systems management, and we’re delighted to make it available on every Ubuntu certified cloud.
Ouch. I think Amazon did a better job of the Technical Track concept than this, at least
“git for operating system binaries”. OSTree is a tool for managing bootable, immutable, versioned filesystem trees. It is not a package system; nor is it a tool for managing full disk images. Instead, it sits between those levels, offering a blend of the advantages (and disadvantages) of both. You can use any build system you like to place content into it on a build server, then export an OSTree repository via static HTTP. On each client system, “ostree admin upgrade” can incrementally replicate that content, creating a new root for the next reboot. This provides fully atomic upgrades. Any changes made to /etc are propagated forwards, and all local state in /var is shared. A key goal of the project is to complement existing package systems like RPM and Debian packages, and help further their evolution. In particular for example, RPM-OSTree (linked below) has as a goal a hybrid tree/package model, where you replicate a base tree via OSTree, and then add packages on top.
Well, this stinks.
Foreign law enforcement agencies will be allowed to tap Irish phone calls and intercept emails under a statutory instrument signed into law by Minister for Justice Frances Fitzgerald. Companies that object or refuse to comply with an intercept order could be brought before a private “in camera” court. The legislation, which took effect on Monday, was signed into law without fanfare on November 26th, the day after documents emerged in a German newspaper indicating the British spy agency General Communications Headquarters (GCHQ) had directly tapped undersea communications cables between Ireland and Britain for years.
Your tax dollars at work: Spying on people just because they demand that the government’s agents stop killing black people. [...] Anonymous has released a video featuring what appear to be Chicago police radio transmissions revealing police wiretapping of organizers’ phones at the protests last night the day after Thanksgiving, perhaps using a stingray. The transmissions pointing to real-time wiretapping involve the local DHS-funded spy ‘fusion’ center.
Very good article around the privacy implications of derived and inferred aggregate metadata from Ben Goldacre.
We are entering an age – which we should welcome with open arms – when patients will finally have access to their own full medical records online. So suddenly we have a new problem. One day, you log in to your medical records, and there’s a new entry on your file: “Likely to die in the next year.” We spend a lot of time teaching medical students to be skilful around breaking bad news. A box ticked on your medical records is not empathic communication. Would we hide the box? Is that ethical? Or are “derived variables” such as these, on a medical record, something doctors should share like anything else?
Prof. Mazières’s research indicated some risk that consensus could fail, though we were nor certain if the required circumstances for such a failure were realistic. This week, we discovered the first instance of a consensus failure. On Tuesday night, the nodes on the network began to disagree and caused a fork of the ledger. The majority of the network was on ledger chain A. At some point, the network decided to switch to ledger chain B. This caused the roll back of a few hours of transactions that had only been recorded on chain A. We were able to replay most of these rolled back transactions on chain B to minimize the impact. However, in cases where an account had already sent a transaction on chain B the replay wasn’t possible.
In July 1967, astronomers at the Cavendish Laboratory in Cambridge, observed an unidentified radio signal from interstellar space, which flashed periodically every 1.33730 seconds. This object flashed with such regularity that it was accurate enough to be used as a clock and only be off by one part in a hundred million. It was eventually determined that this was the first discovery of a pulsar, CP-1919. This is an object that has about the same mass as the Sun, but is the size of the San Francisco Bay at its widest (~20 kilometers) that is rotating so fast that its emitting a beam of light towards Earth like a strobing light house! Pulsars are neutron stars that are formed from the remnants of a massive star when it experiences stellar death. A hand drawn graph plotted in the style of a waterfall plot, in the Cambridge Encyclopedia of Astronomy, later became renown for its use on the cover of the album “Unknown Pleasures” by 1970s English band Joy Division.The entire blog at http://intothecontinuum.tumblr.com/ is pretty great. Lots of nice mathematical animated GIFs, accompanied by Mathematica source and related ponderings.
‘Pubs & Bars With Raging Fires in Dublin’. This is important!
‘Anurag@AWS posts a quite interesting comment on Aurora failover: We asynchronously write to 6 copies and ack the write when we see four completions. So, traditional 4/6 quorums with synchrony as you surmised. Now, each log record can end up with a independent quorum from any other log record, which helps with jitter, but introduces some sophistication in recovery protocols. We peer to peer to fill in holes. We also will repair bad segments in the background, and downgrade to a 3/4 quorum if unable to place in an AZ for any extended period. You need a pretty bad failure to get a write outage.’ (via High Scalability)
Nice list — lots of random toy services
actual stats and data on how programming languages affect coding work
Whoa, trouble at mill in Dockerland!
When Docker was first introduced to us in early 2013, the idea of a “standard container” was striking and immediately attractive: a simple component, a composable unit, that could be used in a variety of systems. The Docker repository included a manifesto of what a standard container should be. This was a rally cry to the industry, and we quickly followed. Brandon Philips, co-founder/CTO of CoreOS, became a top Docker contributor, and now serves on the Docker governance board. CoreOS is one of the most widely used platforms for Docker containers, and ships releases to the community hours after they happen upstream. We thought Docker would become a simple unit that we can all agree on. Unfortunately, a simple re-usable component is not how things are playing out. Docker now is building tools for launching cloud servers, systems for clustering, and a wide range of functions: building images, running images, uploading, downloading, and eventually even overlay networking, all compiled into one monolithic binary running primarily as root on your server. The standard container manifesto was removed. We should stop talking about Docker containers, and start talking about the Docker Platform. It is not becoming the simple composable building block we had envisioned.
excellent guide (via JK)
isn’t that curious.
Reading between the lines, it looks like Rails 4 is waaay slower than 3….
Good Docker info from Bridget Kromhout, on their production and dev usage of Docker at DramaFever. lots of good real-world tips
Two years later, he heard from Lisa S., an assistant set designer on [the movie] Stuart Little. She had bought the painting for $500 from an antiques store in Pasadena specifically for the movie because she thought its cool elegance was perfectly suited for the Little’s New York City apartment. Lisa S. had tracked it down in another warehouse and purchased it from Sony just because she liked it so much. When she contacted Barki, she had no idea of the history of the painting hanging on her bedroom wall. After Barki visited the painting in person and confirmed its identity, Lisa sold it to a private collector. That collector has now been persuaded to sell it in Hungary. It will go up for auction at the Virag Judit Art Gallery in Budapest on December 13th with a starting price of 110,000 euros ($160,000). Gergely Barki won’t make a dime off of his discovery, but he will have a great story to tell in his biography of the artist.
wow, that is too much effort for a 7-year-old’s Minecraft server ;) Very impressive
How Rust avoids GC overhead using it’s “borrow” system:
Rust achieves memory safety without GC by using a sophiscated borrow system. For any resource (stack memory, heap memory, file handle and so on), there is exactly one owner which takes care of its resource deallocation, if needed. You may create new bindings to refer to the resource using & or &mut, which is called a borrow or mutable borrow. The compiler ensures all owners and borrowers behave correctly.
Actually, I’m really agreeing with a lot of this. Particularly this part:
Programmers will cringe at writing some kind of command dispatch list: if command = “up”: up() elif command = “status”: status() elif command = “revert”: revert() … so they’ll go off and write some introspecting auto-dispatch cleverness, but that takes longer to write and will surely confuse future readers who’ll wonder how the heck revert() ever gets called. Yet the programmer will incorrectly feel as though he saved himself time. This is the trap of the dynamic language. It feels like you’re being more productive, but aside from the first 10 minutes of a new program, you’re not. Just write the stupid dispatch manually and get on with the real work.I’ve also gone right off dynamic languages for any kind of non-toy work. Mind you he needs to get around to ditching Vim for a proper IDE. That’s the key thing that makes coding in a statically-typed language really pleasant — when graphical refactoring becomes easy and usable, and errors are visible as you type them…
whoa, this is incredibly in-depth
“ping foo.bar” will not append the “search” domains configured in /etc/resolv.conf. Apparently this has been broken since OS X Lion, no sign of a fix. Nice work Apple
a catastrophic TCP throughput collapse that occurs as the number of storage servers sending data to a client increases past the ability of an Ethernet switch to buffer packets. In a clustered file system, for example, a client application requests a data block striped across several storage servers, issuing the next data block request only when all servers have responded with their portion (Figure 1). This synchronized request workload can result in packets overfilling the buffers on the client’s port on the switch, resulting in many losses. Under severe packet loss, TCP can experience a timeout that lasts a minimum of 200ms, determined by the TCP minimum retransmission timeout (RTOmin).
Excellent real-world war story from Facebook — a long-running mystery bug was eventually revealed to be a combination of edge-case behaviours across all the layers of the networking stack, from L2 link aggregation at the agg-router level, up to the L7 behaviour of the MySQL client connection pool.
Facebook collocates many of a user’s nodes and edges in the social graph. That means that when somebody logs in after a while and their data isn’t in the cache, we might suddenly perform 50 or 100 database queries to a single database to load their data. This starts a race among those queries. The queries that go over a congested link will lose the race reliably, even if only by a few milliseconds. That loss makes them the most recently used when they are put back in the pool. The effect is that during a query burst we stack the deck against ourselves, putting all of the congested connections at the top of the deck.
Macaroons are an excellent fit for NoSQL data storage for several reasons. First, they enable an application developer to enforce security policies at very fine granularity, per object. Gone are the clunky security policies based on the IP address of the client, or the per-table access controls of RDBMSs that force you to split up your data across many tables. Second, macaroons ensure that a client compromise does not lead to loss of the entire database. Third, macaroons are very flexible and expressive, able to incorporate information from external systems and third-party databases into authorization decisions. Finally, macaroons scale well and are incredibly efficient, because they avoid public-key cryptography and instead rely solely on fast hash functions.
Cable listed as owned by Eircom and Cable and Wireless (now Vodafone?)
[Hermitage is] a test suite for databases which probes for a variety of concurrency issues, and thus allows a fair and accurate comparison of isolation levels. Each test case simulates a particular kind of race condition that can happen when two or more transactions concurrently access the same data. Each test can pass (if the database’s implementation of isolation prevents the race condition from occurring) or fail (if the race condition does occur).
Hootsuite used Consul for distributed configuration, specifically dark-launch feature flags, with great results: ‘Trying out bleeding edge software can be a risky proposition, but in the case of Consul, we’ve found it to be a solid system that works basically as described and was easy to get up and running. We managed to go from initial investigations to production within a month. The value was immediately obvious after looking into the key-value store combined with the events system and it’s DNS features and each of these has worked how we expected. Overall it has been fun to work with and has worked well and based on the initial work we have done with the Dark Launching system we’re feeling confident in Consul’s operation and are looking forward to expanding the scope of it’s use.’
excellent case study of production-scale usage of Docker
UNIX system service [jmason: ie a sidecar] that collects events and reliably delivers them to kafka, relieving other services on the same system from having to do so. Journals events through bolt-db so that in the event of an kafka outage, events can still be accepted, and will be delivered when kafka becomes available.
Excellent advice from Tomasz Nurkiewicz’ blog for anyone using java.util.concurrent.ExecutorService regularly. The whole blog is full of great posts btw
Good advice for Hadoop optimization
Nice work by Andrew Spyker — this should be an official feature of the re:Invent website, really
Excellent data on current EBS performance characteristics
A desktop app for finding and inserting GIFs into any conversationOh yes.
The researchers have no doubt that Regin is a nation-state tool and are calling it the most sophisticated espionage machine uncovered to date—more complex even than the massive Flame platform, uncovered by Kaspersky and Symantec in 2012 and crafted by the same team who created Stuxnet. “In the world of malware threats, only a few rare examples can truly be considered groundbreaking and almost peerless,” writes Symantec in its report about Regin. Though no one is willing to speculate on the record about Regin’s source, news reports about the Belgacom and Quisquater hacks pointed a finger at GCHQ and the NSA. Kaspersky confirms that Quisqater was infected with Regin, and other researchers familiar with the Belgacom attack have told WIRED that the description of Regin fits the malware that targeted the telecom, though the malicious files used in that attack were given a different name, based on something investigators found inside the platform’s main file.
Charted is a tool for automatically visualizing data, created by the Product Science team at Medium. Give it the link to a data file and Charted returns a beautiful, shareable chart of the data.Nice, but it’s no graphite — pretty basic.
The Uber controversy is not just—or even mainly—a technology story, it’s fundamentally a deregulation story; the story of a uniquely American fundamentalist free-market worldview being sold to us in the name of “car-sharing” and innovation.
‘Design your own beer labels’
This is classic. I love the “Rouge”:
We also wanted the Rouge to actually look like a stealth-oriented make-up artist, but our 3D artist thought the goat looked ridiculous with a pink wig and a Gucci bag, so we remade the Rouge to actually look like a Rogue.
Finally after all traditional means of infection were covered; IT started looking into other possibilities. They finally asked the Executive, “Have there been any changes in your life recently”? The executive answer “Well yes, I quit smoking two weeks ago and switched to e-cigarettes”. And that was the answer they were looking for, the made in china e-cigarette had malware hard coded into the charger and when plugged into a computer’s USB port the malware phoned home and infected the system. Moral of the story is have you ever question the legitimacy of the $5 dollar EBay made in China USB item that you just plugged into your computer? Because you should, you damn well should.(Via Elliot)
As part of a performance update to Azure Storage, an issue was discovered that resulted in reduced capacity across services utilizing Azure Storage, including Virtual Machines, Visual Studio Online, Websites, Search and other Microsoft services. Prior to applying the performance update, it had been tested over several weeks in a subset of our customer-facing storage service for Azure Tables. We typically call this “flighting,” as we work to identify issues before we broadly deploy any updates. The flighting test demonstrated a notable performance improvement and we proceeded to deploy the update across the storage service. During the rollout we discovered an issue that resulted in storage blob front ends going into an infinite loop, which had gone undetected during flighting. The net result was an inability for the front ends to take on further traffic, which in turn caused other services built on top to experience issues.I’m really surprised MS deployment procedures allow a change to be rolled out globally across multiple regions on a single day. I suspect they soon won’t.
This is a really solid talk — not surprising, alv@ is one of the speakers!
“AWS and Amazon.com operate some of the world’s largest distributed systems infrastructure and applications. In our past 18 years of operating this infrastructure, we have come to realize that building such large distributed systems to meet the durability, reliability, scalability, and performance needs of AWS requires us to build our services using a few common distributed systems primitives. Examples of these primitives include a reliable method to build consensus in a distributed system, reliable and scalable key-value store, infrastructure for a transactional logging system, scalable database query layers using both NoSQL and SQL APIs, and a system for scalable and elastic compute infrastructure. In this session, we discuss some of the solutions that we employ in building these primitives and our lessons in operating these systems. We also cover the history of some of these primitives — DHTs, transactional logging, materialized views and various other deep distributed systems concepts; how their design evolved over time; and how we continue to scale them to AWS. “Slides: http://www.slideshare.net/AmazonWebServices/spot302-under-the-covers-of-aws-core-distributed-systems-primitives-that-power-our-platform-aws-reinvent-2014
“SCE to off?” someone said. The switch was so obscure that neither of his bosses knew what he was talking about. “What the hell’s that,” blurted out Gerald Carr, who was in charge of communicating with the capsule. The rookie flight director, Gerry Griffin, didn’t know either. Sixty seconds had passed since the initial lightning strike. No one else knew what to do. The call to abort was fast approaching. Finally, Carr reluctantly gave the order in a voice far cooler than the moment. “Apollo 12, Houston, try SCE to Auxiliary, over.”
Twitter’s new massive-scale twitter search backend. Sharding galore
‘I went over to Japan right around the time Takeshi was deciding which emoji were going to make it into the first cut of Gmail emoji. The [PILE_OF_POO emoji] was absolutely one of the necessary emoji that Takeshi said we have to have. There was actually conflict because there were people back at headquarters who had no idea what emoji were, and thought that having an animated [turd] in their Gmail was offensive.’ ‘[The poop emoji] got very popular when a comic called “Dr. Slump” was broadcast in Japan back to the ‘90s. Such poop was not an object to be disliked, but it had a funny meaning. This was a very popular comedy animation where a girl played a trick on other people using the poop. The poop was this funny object to play with. It was never serious.’ ‘In Japanese that’s called “unchi.” It’s a child word with a benign meaning. ‘
We plan to send an unmanned robotic landing module to the South Pole of the Moon – an area unexplored by previous missions. We’re going to use pioneering technology to drill down to a depth of at least 20m – 10 times deeper than has ever been drilled before – and potentially as deep as 100m. By doing this, we will access lunar rock dating back up to 4.5 billion years to discover the geological composition of the Moon, the ancient relationship it shares with our planet and the effects of asteroid bombardment. Ultimately, the project will improve scientific understanding of the early solar system, the formation of our planet and the Moon, and the conditions that initiated life on Earth.Kickstarter-funded — UKP 600k goal. Just in time for xmas!
Unlike the (excellent) Typescript, it’ll infer types:
An extremely good explanation from Marc Brooker that exactly-once delivery in a distributed system is very hard.
And so on. There’s always a place to slot in one more turtle. The bad news is that I’m not aware of a nice solution to the general problem for all side effects, and I suspect that no such solution exists. On the bright side, there are some very nice solutions that work really well in practice. The simplest is idempotence. This is a very simple idea: we make the tasks have the same effect no matter how many times they are executed.
This topographic map represents Ireland. It is designed for “hillwalking”. The contour lines are extracted from SRTM public data provided by NASA. These files contain a digitized ground represented by points. The sample rate defines a grid resolution for Ireland around 90m in northing and 60m in easting. In major cases, digitized points do not correspond with summits. Carrauntoohil (1039m, the highest summit of Ireland) does not appear in SRTM data. The altitude reaches only 1018m. Data were obtain from space with a radar. Because of the relative position between the radar and the earth, a shadow appears in some conditions (along ridges, behind summits…). This shadow matches with a gap in data (Imagine you with a flashlight in a dark room. It is hard to see what is in shadow). To close these gaps, you need other data or you can do interpolation. The second solution is applied in our case. There is one square degree per SRTM file with a sample rate of 1200×1200 points/square degree at Ireland latitude. [...] All in all you obtain contour lines pretty sufficient for walking.
Marcus Ramberg says: ‘If you have a chromecast and you’re not using castnow, I don’t know what is wrong with you.’
John Allspaw with an interesting assertion that we need to ask “how”, not “why” in five-whys postmortems:
“Why?” is the wrong question. In order to learn (which should be the goal of any retrospective or post-hoc investigation) you want multiple and diverse perspectives. You get these by asking people for their own narratives. Effectively, you’re asking “how?“ Asking “why?” too easily gets you to an answer to the question “who?” (which in almost every case is irrelevant) or “takes you to the ‘mysterious’ incentives and motivations people bring into the workplace.” Asking “how?” gets you to describe (at least some) of the conditions that allowed an event to take place, and provides rich operational data.
hooray, no more uberjars or monster commandlines!
‘From 19 Nov, 2014 00:52 to 05:50 UTC a subset of customers using Storage, Virtual Machines, SQL Geo-Restore, SQL Import/export, Websites, Azure Search, Azure Cache, Management Portal, Service Bus, Event Hubs, Visual Studio, Machine Learning, HDInsights, Automation, Virtual Network, Stream Analytics, Active Directory, StorSimple and Azure Backup Services in West US and West Europe experienced connectivity issues. This incident has now been mitigated.’ There was knock-on impact until 11:00 UTC (storage in N Europe), 11:45 UTC (websites, West Europe), and 09:15 UTC (storage, West Europe), from the looks of things. Should be an interesting postmortem.
Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps. Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, they can be hundreds of times faster and they often offer significantly better compression. Roaring bitmaps are used in Apache Lucene (as of version 5.0 using an independent implementation) and Apache Spark (as of version 1.2).
‘Unsupervised anomaly detection is the process of finding outliers in data sets without prior training. In this paper, a histogram-based outlier detection (HBOS) algorithm is presented, which scores records in linear time. It assumes independence of the features making it much faster than multivariate approaches at the cost of less precision. A comparative evaluation on three UCI data sets and 10 standard algorithms show, that it can detect global outliers as reliable as state-of-the-art algorithms, but it performs poor on local outlier problems. HBOS is in our experiments up to 5 times faster than clustering based algorithms and up to 7 times faster than nearest-neighbor based methods.’
iPad On A Face by Cheryl Wu is a telepresence robot, except it’s a human with an iPad on his or her face.