love it when things like this show up
She thought they were a normal couple until she found a passport in a glovebox – and then her world shattered. Now she is finally getting compensation and a police apology for that surreal, state-sponsored deception. But she still lies awake and wonders: did he ever really love me?
I can’t believe this was going on in the 2000s!
Using SHA-1 [to generate random numbers] in this way, with a random seed and a counter, is just building a (perfectly sound) CSPRNG with, I believe, an 80-bit security level. If you trust the source of the random seed, e.g. /dev/urandom, you may as well just use /dev/urandom itself. If you don’t, you’re already in trouble. And if you somehow need a userspace PRNG, the usual advice about not rolling your own crypto unless you know what you’re doing applies. (Especially for database IDs, the risk of collisions should be considered a security problem, ergo this should be considered crypto, until proven otherwise.) In this case, using BLAKE2 instead of SHA-1 would get you a higher security level and faster hashing. Or, in tptacek’s words: http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/
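Here's roughly what "a hash, a random seed and a counter" looks like with BLAKE2 swapped in for SHA-1. This is a minimal sketch, not a recommendation. The class name is made up, and it has no reseeding or fork safety, which is rather the point of the advice above. `os.urandom()` or the `secrets` module is the sane default.

```python
import hashlib
import os


class Blake2CounterDRBG:
    """Toy userspace PRNG: BLAKE2b keyed with a secret seed, run in counter mode.

    Sketch only: no reseeding, no fork safety, no backtracking resistance.
    For real database IDs, os.urandom() or the secrets module is simpler.
    """

    def __init__(self, seed: bytes):
        if not 16 <= len(seed) <= 64:  # BLAKE2b keys are at most 64 bytes
            raise ValueError("seed must be 16..64 bytes")
        self._seed = seed
        self._counter = 0

    def random_bytes(self, n: int) -> bytes:
        out = bytearray()
        while len(out) < n:
            # Each 64-byte block is BLAKE2b(key=seed, message=counter).
            block = hashlib.blake2b(
                self._counter.to_bytes(16, "big"), key=self._seed
            ).digest()
            self._counter += 1
            out += block
        # Leftover bytes of the last block are discarded, never reused.
        return bytes(out[:n])


# Seeding from the OS CSPRNG, which (as the quote says) you may as well use directly.
gen = Blake2CounterDRBG(os.urandom(32))
record_id = gen.random_bytes(16).hex()
```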
Nice approach to package authentication UX using Keybase/PGP.
When you go to install a package, Sandstorm verifies that the package is correctly signed by the Ed25519 key. It looks for a PGP signature in the metadata, and verifies that the PGP-signed assertion is for the correct app ID and the email address specified in the metadata. It queries the Keybase API to see what accounts the packager has proven ownership of, and lists them with their links on the app install page.
Floating car data (FCD), also known as floating cellular data, is a method to determine the traffic speed on the road network. It is based on the collection of localization data, speed, direction of travel and time information from mobile phones in vehicles that are being driven. These data are the essential source for traffic information and for most intelligent transportation systems (ITS). This means that every vehicle with an active mobile phone acts as a sensor for the road network. Based on these data, traffic congestion can be identified, travel times can be calculated, and traffic reports can be rapidly generated. In contrast to traffic cameras, number plate recognition systems, and induction loops embedded in the roadway, no additional hardware on the road network is necessary.
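Here's a minimal sketch of the core computation. It assumes each probe is a (device, road segment, timestamp, lat, lon) tuple that has already been matched to a segment. The field layout and names are hypothetical, and real FCD pipelines also do map matching and outlier filtering, which this skips.

```python
import math
from collections import defaultdict
from statistics import mean

EARTH_RADIUS_M = 6_371_000


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds_kmh(probes):
    """probes: iterable of (device_id, segment_id, unix_time_s, lat, lon).

    Returns {segment_id: mean speed in km/h}, computed from consecutive fixes
    of the same device on the same segment.
    """
    last = {}
    speeds = defaultdict(list)
    for dev, seg, t, lat, lon in sorted(probes, key=lambda p: (p[0], p[2])):
        prev = last.get(dev)
        if prev and prev[0] == seg and t > prev[1]:
            metres = haversine_m(prev[2], prev[3], lat, lon)
            speeds[seg].append(metres / (t - prev[1]) * 3.6)  # m/s -> km/h
        last[dev] = (seg, t, lat, lon)
    return {seg: mean(v) for seg, v in speeds.items()}
```

A segment whose probes average well below its free-flow speed is a congestion candidate. That's the "every phone is a sensor" idea in practice.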
A lovely cite from @conor. Turns out the sheer size of an OO class is itself a solid fault-proneness metric
Orcas Island, WA. impressive stuff
“Whether your personal information has been stolen or not, your best protection against someone opening new credit accounts in your name is the security freeze (also known as the credit freeze), not the often-offered, under-achieving credit monitoring. Paid credit monitoring services in particular are not necessary because federal law requires each of the three major credit bureaus to provide a free credit report every year to all customers who request one. You can use those free reports as a form of do-it-yourself credit monitoring.”
ugh, quite a long list of LastPass security issues
News emerging from Paris — as well as evidence from a Belgian ISIS raid in January — suggests that the ISIS terror networks involved were communicating in the clear, and that the data on their smartphones was not encrypted.
Netflix’ CD platform, post-Asgard. looks interesting
by reordering items to optimize locality. Via aphyr’s dad!
lol. Nice work, Forbes
Ugh. Queue tracking using secret MAC address tracking in Dublin Airport:
“I think the fundamental issue is one of consent. Dublin Airport have been tracking individual MAC addresses since 2012 and there doesn’t appear to be anywhere in the airport where they warn passengers that this is occurring. “If they have to signpost CCTV, then mobile phone tracking should at a very minimum be sign-posted for passengers,” he continues.
And how long are MAC addresses retained for, I wonder?
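To show how little is needed to do this, here's a hypothetical sketch of queue timing from two Wi-Fi sensors, one at the queue entry and one at the exit. It uses an HMAC with a secret salt instead of storing raw MACs. The sensor names and record layout are assumptions, not anything Dublin Airport has described. A plain unsalted hash would be little protection, because the MAC address space is small enough to enumerate. That's why the salt has to stay secret and be rotated.

```python
import hashlib
import hmac
import os
from statistics import median

# Secret per-day salt (an assumption): rotating and discarding it stops
# pseudonyms from being linked across days.
DAILY_SALT = os.urandom(16)


def pseudonymise(mac: str, salt: bytes = DAILY_SALT) -> str:
    """Keyed hash of a MAC address, so raw MACs never need to be stored."""
    return hmac.new(salt, mac.lower().encode(), hashlib.sha256).hexdigest()[:16]


def queue_times(sightings, salt: bytes = DAILY_SALT):
    """sightings: iterable of (mac, sensor, unix_time_s), sensor being "entry" or "exit".

    Returns queue durations in seconds: the first "entry" sighting to the next
    "exit" sighting of the same device.
    """
    entered = {}
    durations = []
    for mac, sensor, t in sorted(sightings, key=lambda s: s[2]):
        key = pseudonymise(mac, salt)
        if sensor == "entry":
            entered.setdefault(key, t)
        elif sensor == "exit" and key in entered:
            durations.append(t - entered.pop(key))
    return durations


sample = [
    ("AA:BB:CC:00:00:01", "entry", 0), ("AA:BB:CC:00:00:02", "entry", 100),
    ("AA:BB:CC:00:00:02", "exit", 400), ("aa:bb:cc:00:00:01", "exit", 600),
]
typical_wait_s = median(queue_times(sample))  # 450.0 for this sample
```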
Maciej Ceglowski went to an O’Reilly SV-boosterish conference and produced these excellent tweets
Kim Stanley Robinson on the feasibility of interstellar colonization: ‘There is no Planet B! Earth is our only possible home!’
In this paper, we have assessed the impact of Docker containers technology on the performance of genomic pipelines, showing that container “virtualization” has a negligible overhead on pipeline performance when it is composed of medium/long running tasks, which is the most common scenario in computational genomic pipelines. Interestingly for these tasks the observed standard deviation is smaller when running with Docker. This suggests that the execution with containers is more “homogeneous,” presumably due to the isolation provided by the container environment. The performance degradation is more significant for pipelines where most of the tasks have a fine or very fine granularity (a few seconds or milliseconds). In this case, the container instantiation time, though small, cannot be ignored and produces a perceptible loss of performance.
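Here's some rough back-of-envelope arithmetic for why granularity matters. If every task pays a fixed container start-up cost, the overhead fraction is startup / (startup + task). The 0.5 s start-up figure below is an illustrative assumption, not a number from the paper.

```python
def container_overhead(task_seconds: float, startup_seconds: float) -> float:
    """Fraction of wall-clock time spent on container start-up for one task."""
    return startup_seconds / (startup_seconds + task_seconds)


# Assumed start-up cost per container (illustrative only).
STARTUP = 0.5
for label, task in [("very fine (10 ms)", 0.01), ("fine (2 s)", 2.0),
                    ("medium (10 min)", 600.0), ("long (2 h)", 7200.0)]:
    print(f"{label:>18}: {container_overhead(task, STARTUP):.2%} overhead")
```

A fixed cost that is negligible next to a ten-minute task dominates a millisecond task, which matches the paper's fine/very-fine granularity finding.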
The astonishing figures come two months after computer scientists in the UK warned that thousands of cars – including high-end brands such as Porsches and Maseratis – are at risk of electronic hacking. Their research was suppressed for two years by a court injunction for fear it would help thieves steal vehicles to order. The kit required to carry out such “mouse jacking”, as the French have coined the practice, can be freely purchased on the internet for around £700 and the theft of a range of models can be pulled off “within minutes,” motor experts warn.
Awesome new mock DynamoDB implementation:
An implementation of Amazon’s DynamoDB, focussed on correctness and performance, and built on LevelDB (well, @rvagg’s awesome LevelUP to be precise). This project aims to match the live DynamoDB instances as closely as possible (and is tested against them in various regions), including all limits and error messages. Why not Amazon’s DynamoDB Local? Because it’s too buggy! And it differs too much from the live instances in a number of key areas.
We use DynamoDBLocal in our tests — the availability of that tool is one of the key reasons we have adopted Dynamo so heavily, since we can safely test our code properly with it. This looks even better.
Imagine you are an operator in a nuclear power control room. An accident has started to unfold. During the first few minutes, more than 100 alarms go off, and there is no system for suppressing the unimportant signals so that you can concentrate on the significant alarms. Information is not presented clearly; for example, although the pressure and temperature within the reactor coolant system are shown, there is no direct indication that the combination of pressure and temperature mean that the cooling water is turning into steam. There are over 50 alarms lit in the control room, and the computer printer registering alarms is running more than 2 hours behind the events. This was the basic scenario facing the control room operators during the Three Mile Island (TMI) partial nuclear meltdown in 1979. The Report of the President’s Commission stated that, “Overall, little attention had been paid to the interaction between human beings and machines under the rapidly changing and confusing circumstances of an accident” (p. 11). The TMI control room operator on the day, Craig Faust, recalled for the Commission his reaction to the incessant alarms: “I would have liked to have thrown away the alarm panel. It wasn’t giving us any useful information”. It was the first major illustration of the alarm problem, and the accident triggered a flurry of human factors/ergonomics (HF/E) activity.
A familiar topic for this ex-member of the Amazon network monitoring team…
We observed that the vast majority of the re-shipped packages end up in the Moscow, Russia area, and that the goods purchased with stolen credit cards span multiple categories, from expensive electronics such as Apple products, to designer clothes, to DSLR cameras and even weapon accessories. Given the amount of goods shipped by the reshipping mule sites that we analysed, the annual revenue generated from such operations can span between 1.8 and 7.3 million US dollars. The overall losses are much higher though: the online merchant loses an expensive item from its inventory and typically has to refund the owner of the stolen credit card. In addition, the rogue goods typically travel labeled as “second hand goods” and therefore customs taxes are also evaded. Once the items purchased with stolen credit cards reach their destination they will be sold on the black market by cybercriminals. [...] When applying for the job, people are usually required to send the operator copies of their ID cards and passport. After they are hired, mules are promised to be paid at the end of their first month of employment. However, from our data it is clear that mules are usually never paid. After their first month expires, they are never contacted back by the operator, who just moves on and hires new mules. In other words, the mules become victims of this scam themselves, by never seeing a penny. Moreover, because they sent copies of their documents to the criminals, mules can potentially become victims of identity theft.
That’s a lesson that Spruce Manor Special Care Home in Saskatchewan had to learn the hard way (as surprising as that might sound). As a trustee with custody of personal health information, Spruce Manor was required under section 17(2) of the Saskatchewan Health Information Protection Act to dispose of its patient records in a way that protected patient privacy. So, when Spruce Manor chose a chicken farm for the job, it found itself the subject of an investigation by the Saskatchewan Information and Privacy Commissioner. In what is probably one of the least surprising findings ever, the commissioner wrote in his final report that “I recommend that Spruce Manor […] no longer use [a] chicken farm to destroy records”, and then for good measure added “I find using a chicken farm to destroy records unacceptable.”
‘Caffeine is a Java 8 rewrite of Guava’s cache. In this version we focused on improving the hit rate by evaluating alternatives to the classic least-recently-used (LRU) eviction policy. In collaboration with researchers at Israel’s Technion, we developed a new algorithm that matches or exceeds the hit rate of the best alternatives (ARC, LIRS). A paper of our work is being prepared for publication.’ Specifically:
W-TinyLfu uses a small admission LRU that evicts to a large Segmented LRU if accepted by the TinyLfu admission policy. TinyLfu relies on a frequency sketch to probabilistically estimate the historic usage of an entry. The window allows the policy to have a high hit rate when entries exhibit a high temporal / low frequency access pattern which would otherwise be rejected. The configuration enables the cache to estimate the frequency and recency of an entry with low overhead. This implementation uses a 4-bit CountMinSketch, growing at 8 bytes per cache entry to be accurate. Unlike ARC and LIRS, this policy does not retain non-resident keys.
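Here's a heavily simplified Python sketch of the W-TinyLfu idea as described above. It has a count-min sketch with 4-bit counters (saturating at 15) that are periodically halved, a small window LRU, and a frequency-based admission check against the main region's LRU victim. Assumptions: the main region is a plain LRU rather than a segmented LRU, the 1% window and the sketch sizes are illustrative, and none of this is Caffeine's actual code.

```python
import hashlib
from collections import OrderedDict


class FrequencySketch:
    """Count-min sketch with 4-bit saturating counters and periodic halving (aging)."""

    def __init__(self, width: int = 1024, depth: int = 4, sample_size: int = 10_000):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]
        self.additions, self.sample_size = 0, sample_size

    def _indexes(self, key):
        # One 8-byte slice of a single BLAKE2b digest per row (keys hashed via repr).
        digest = hashlib.blake2b(repr(key).encode(), digest_size=8 * self.depth).digest()
        for i in range(self.depth):
            yield i, int.from_bytes(digest[8 * i: 8 * i + 8], "little") % self.width

    def increment(self, key):
        for i, j in self._indexes(key):
            self.rows[i][j] = min(15, self.rows[i][j] + 1)  # 4-bit counter caps at 15
        self.additions += 1
        if self.additions >= self.sample_size:  # aging: halve every counter
            self.rows = [[c >> 1 for c in row] for row in self.rows]
            self.additions //= 2

    def estimate(self, key) -> int:
        return min(self.rows[i][j] for i, j in self._indexes(key))


class WTinyLfuCache:
    """Simplified W-TinyLFU: small window LRU -> TinyLFU admission -> main LRU."""

    def __init__(self, capacity: int):
        self.window_cap = max(1, capacity // 100)
        self.main_cap = max(1, capacity - self.window_cap)
        self.window, self.main = OrderedDict(), OrderedDict()
        self.sketch = FrequencySketch()

    def get(self, key):
        self.sketch.increment(key)
        for region in (self.window, self.main):
            if key in region:
                region.move_to_end(key)
                return region[key]
        return None

    def put(self, key, value):
        self.sketch.increment(key)
        for region in (self.window, self.main):
            if key in region:
                region[key] = value
                region.move_to_end(key)
                return
        self.window[key] = value
        if len(self.window) <= self.window_cap:
            return
        cand_key, cand_val = self.window.popitem(last=False)  # window's LRU victim
        if len(self.main) < self.main_cap:
            self.main[cand_key] = cand_val
            return
        victim_key = next(iter(self.main))  # main region's LRU victim
        # TinyLFU admission: the candidate only gets in if it is estimated to be hotter.
        if self.sketch.estimate(cand_key) > self.sketch.estimate(victim_key):
            del self.main[victim_key]
            self.main[cand_key] = cand_val
```

The admission check is what gives scan resistance. A burst of one-off keys loses the frequency comparison against an entry that has been hit many times.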
The ever-shitty Java serialization creates a security hole
Danish glassware artist making wonderful Wunderkammers — cabinets of curiosities — entirely from glass. Seeing as one of his works sold for UKP50,000 last year, I suspect these are a bit out of my league, sadly
If it goes ahead, people’s progress across the structure would be tracked by monitors detecting the Wi-Fi signals from their phones, which show up the device’s MAC address, or unique identifying code. The Garden Bridge Trust says it will not store any of this data and is only tracking phones to count numbers and prevent overcrowding.
The Anderson Report to the House of Lords in the UK on RIPA introduces a concept of a “red line”:
“Firm limits must also be written into the law: not merely safeguards, but red lines that may not be crossed.” … “Some might find comfort in a world in which our every interaction and movement could be recorded, viewed in real time and indefinitely retained for possible future use by the authorities. Crime fighting, security, safety or public health justifications are never hard to find.” [13.19] The Report then gives examples, such as a perpetual video feed from every room in every house, the police undertaking to view the record only on receipt of a complaint; blanket drone-based surveillance; licensed service providers, required as a condition of the licence to retain within the jurisdiction a complete plain-text version of every communication to be made available to the authorities on request; a constant data feed from vehicles, domestic appliances and health-monitoring personal devices; fitting of facial recognition software to every CCTV camera and the insertion of a location-tracking chip under every individual’s skin. It goes on: “The impact of such powers on the innocent could be mitigated by the usual apparatus of safeguards, regulators and Codes of Practice. But a country constructed on such a basis would surely be intolerable to many of its inhabitants. A state that enjoyed all those powers would be truly totalitarian, even if the authorities had the best interests of its people at heart.” [13.20] … “The crucial objection is that of principle. Such a society would have gone beyond Bentham’s Panopticon (whose inmates did not know they were being watched) into a world where constant surveillance was a certainty and quiescence the inevitable result. There must surely come a point (though it comes at different places for different people) where the escalation of intrusive powers becomes too high a price to pay for a safer and more law abiding environment.” [13.21]
Comparable to Copenhagen or Amsterdam, albeit without sufficient cycling/public-transport infrastructural investment
I’m tired of this shit. Full stop tired. It’s 2015 and these turds who grope their way around conferences and the like can make allegations like this, get a hand wave and an, “Oh, that’s just crazy Raymond!” Fuck that. Fuck it from here to hell and back. Here’s a man who really hasn’t done anything all that special, is a totally crazy gun-toting misogynist of the highest order and, yet, he remains mostly unchallenged after the tempest dies down, time after time. [...] I’m sure ESR will still be haunting conferences when your daughters reach their professional years unless you get serious about outing the assholes like him and making the community a lot less toxic than it is now.
Amen to that.
An app from Drugs.com, meanwhile, sent the medical search terms “herpes” and “interferon” to five domains, including doubleclick.net, googlesyndication.com, intellitxt.com, quantserve.com, and scorecardresearch.com, although those domains didn’t receive other personal information.
Is this the first case of tech debt costing $18 billion?
“Perhaps the engineers told themselves that the cheat was a stopgap, and they’d address it later. If so, they didn’t.”
Graphite has a place in our current monitoring stack, and together with StatsD will always have a special place in the hearts of DevOps practitioners everywhere, but it’s not representative of state-of-the-art in the last few years. Graphite is where the puck was in 2010. If you’re skating there, you’re missing the benefits of modern monitoring infrastructure. The future I foresee is one where time series capabilities (the raw power needed, which I described in my time series requirements blog post, for example) are within everyone’s reach. That will be considered table stakes, whereas now it’s pretty revolutionary.
Like I’ve been saying — we need Time Series As A Service! This should be undifferentiated heavy lifting.
PICO-8 is a fantasy console for making, sharing and playing tiny games and other computer programs. When you turn it on, the machine greets you with a shell for typing in Lua commands and provides simple built-in tools for creating your own cartridges.
So cute! See also Voxatron, something similar for voxel-oriented 3D gaming
So is that kind of thriving food-truck scene something the city should work to encourage? Theresa Hernandez, one of the owners of K Chido Mexico, thinks so. “There’s a whole market there for a new culture,” she says. “There’s no doubt about it, the appetite is there. It’s just a matter for somebody who is innovative enough in Dublin City Council to say: ‘Right, let’s do this.’”
Amen to that.
Facebook’s open-source implementation of the CoDel queue management algorithm applied to server request-handling capacity in their C++ service bootstrap library, Wangle.
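Facebook's ACM Queue paper describes the Controlled Delay variant roughly like this: if the queue hasn't been empty at any point in the last N ms, new requests get a short M ms timeout, otherwise they get a long N ms timeout. Here's a minimal sketch of that rule using a simple deque-based queue. The class is hypothetical rather than Wangle's C++ API, and the 5 ms / 100 ms defaults are illustrative.

```python
import time
from collections import deque


class ControlledDelayQueue:
    """CoDel-inspired request queue: short timeouts only while a standing queue exists."""

    def __init__(self, target_s=0.005, interval_s=0.100, clock=time.monotonic):
        self.target_s, self.interval_s, self.clock = target_s, interval_s, clock
        self.items = deque()          # (request, deadline)
        self.last_empty = clock()

    def enqueue(self, request):
        now = self.clock()
        if not self.items:
            self.last_empty = now
        # Not empty for a whole interval => standing queue => aggressive timeout.
        standing = self.last_empty < now - self.interval_s
        timeout = self.target_s if standing else self.interval_s
        self.items.append((request, now + timeout))

    def dequeue(self):
        """Return the next request still within its deadline, dropping expired ones."""
        while self.items:
            request, deadline = self.items.popleft()
            if not self.items:
                self.last_empty = self.clock()
            if self.clock() <= deadline:
                return request
            # Expired: a real server would fail this request fast here.
        self.last_empty = self.clock()
        return None
```

The appeal of this rule is that a short burst still gets generous timeouts. Only a queue that never drains gets shed aggressively.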
Despite its overarching abstractions, it is semantically non-uniform and its complicated transaction and job scheduling heuristics ordered around a dependently networked object system create pathological failure cases with little debugging context that would otherwise not necessarily occur on systems with fewer layers of indirection. The use of bus APIs complicates communication with the service manager and leads to duplication of the object model for little gain. Further, the unit file options often carry implicit state or are not sufficiently expressive. There is an imbalance with regard to features of an eager service manager and that of a lazy loading service manager, having rusty edge cases of both with non-generic, manager-specific facilities. The approach to logging and the circularly dependent architecture seem to imply that lots of prior art has been ignored or understudied.
Great paper from Ben Maurer of Facebook in ACM Queue.
A “move-fast” mentality does not have to be at odds with reliability. To make these philosophies compatible, Facebook’s infrastructure provides safety valves.
This is full of interesting techniques.
* Rapidly deployed configuration changes: Make everybody use a common configuration system; Statically validate configuration changes; Run a canary; Hold on to good configurations; Make it easy to revert.
* Hard dependencies on core services: Cache data from core services. Provide hardened APIs. Run fire drills.
* Increased latency and resource exhaustion: Controlled Delay (based on the anti-bufferbloat CoDel algorithm — this is really cool); Adaptive LIFO (last-in, first-out) for queue busting; Concurrency Control (essentially a form of circuit breaker).
* Tools that Help Diagnose Failures: High-Density Dashboards with Cubism (horizon charts); What just changed?
* Learning from Failure: the DERP (!) methodology.
(tags: ben-maurer facebook reliability algorithms codel circuit-breakers derp failure ops cubism horizon-charts charts dependencies soa microservices uptime deployment configuration change-management)
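Here's Adaptive LIFO, from the list above, in miniature. It serves first-in-first-out normally, but flips to last-in-first-out once a backlog builds, since the newest requests are the ones whose callers are most likely still waiting. The class name and threshold are hypothetical tuning choices, not Facebook's implementation.

```python
from collections import deque


class AdaptiveLifoQueue:
    """FIFO while healthy; LIFO once the backlog exceeds a threshold."""

    def __init__(self, lifo_threshold: int = 100):
        self.lifo_threshold = lifo_threshold
        self.items = deque()

    def put(self, request):
        self.items.append(request)

    def get(self):
        if not self.items:
            return None
        if len(self.items) > self.lifo_threshold:
            return self.items.pop()      # overloaded: newest request first
        return self.items.popleft()      # normal: oldest request first
```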
This is really impressive, but also a little scary. Drivers driving the Tesla Model S are “phoning home” training data as they drive:
A Model S owner by the username Khatsalano kept a count of how many times he had to “rescue” (meaning taking control after an alert) his Model S while using the Autopilot on his daily commute. He counted 6 “rescues” on his first day; by the fourth day of using the system on his 23.5-mile commute, he only had to take control once. Musk said that Model S owners could add ~1 million miles of new data every day, which is helping the company create “high precision maps”.
Wonder if the data protection/privacy implications have been considered for EU use.
For requesting a copy of an article that was legally obtained by a colleague from a paywalled source, Pazsowski found himself hit with around US$10,000-worth of damages. This completely disproportionate punishment for what is at most a minor case of copyright infringement is a perfect demonstration of where the anti-circumvention madness leads.
Add another one to the “yay for DST” pile. (also yay for AWS using PST/PDT as default internal timezone instead of UTC…)
GCE’s LB product is pretty nice — HTTP/2 support, and a built-in URL mapping feature (presumably based on how Google approach that problem internally). I’m hoping AWS are taking notes for the next generation of ELB, if that ever happens
a Lambda emulator in Python, suitable for unit testing lambdas
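Even without an emulator, the basic trick for unit-testing a Python Lambda is to call the handler directly with a hand-built event and a stand-in context object. Here's a minimal sketch. The `handler` and the event shape (an API Gateway-style `queryStringParameters`) are hypothetical, and `FakeContext` only mimics the few context attributes shown. Run it with `python -m unittest`.

```python
import json
import unittest


def handler(event, context):
    """Hypothetical Lambda function under test: returns a greeting."""
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"hello {name}"})}


class FakeContext:
    """Minimal stand-in for the Lambda context object."""
    function_name = "hello-test"
    aws_request_id = "00000000-0000-0000-0000-000000000000"

    def get_remaining_time_in_millis(self):
        return 3000


class HandlerTest(unittest.TestCase):
    def test_default_name(self):
        resp = handler({}, FakeContext())
        self.assertEqual(resp["statusCode"], 200)
        self.assertEqual(json.loads(resp["body"]), {"message": "hello world"})

    def test_query_param(self):
        event = {"queryStringParameters": {"name": "dublin"}}
        resp = handler(event, FakeContext())
        self.assertEqual(json.loads(resp["body"])["message"], "hello dublin")
```

An emulator adds things this can't, such as timeouts and a realistic runtime environment. But the direct-call approach covers most handler logic.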
Symantec are getting a crash course in how to conduct an incident post-mortem to boot:
More immediately, we are requesting of Symantec that they further update their public incident report with: A post-mortem analysis that details why they did not detect the additional certificates that we found. Details of each of the failures to uphold the relevant Baseline Requirements and EV Guidelines and what they believe the individual root cause was for each failure. We are also requesting that Symantec provide us with a detailed set of steps they will take to correct and prevent each of the identified failures, as well as a timeline for when they expect to complete such work. Symantec may consider this latter information to be confidential and so we are not requesting that this be made public.
google now mirroring Maven Central.
In the new design, we use Hierarchical Timing Wheels for the timeout timer and a DelayQueue of timer buckets to advance the clock on demand. Completed requests are removed from the timer queue immediately with O(1) cost. The buckets remain in the delay queue; however, the number of buckets is bounded. And, in a healthy system, most of the requests are satisfied before timeout, and many of the buckets become empty before being pulled out of the delay queue. Thus, the timer should rarely have the buckets of the lower interval. The advantage of this design is that the number of requests in the timer queue is exactly the number of pending requests at any time. This allows us to estimate the number of requests that need to be purged. We can avoid unnecessary purge operations on the watcher lists. As a result, we achieve higher scalability in terms of request rate, with much better CPU usage.
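Here's a simplified Python sketch of a hierarchical timing wheel along the lines described above. Each wheel has `wheel_size` buckets of `tick` units, timers too far out go to a coarser overflow wheel, and advancing the clock cascades overflow buckets back down. Assumptions: the caller drives time with `advance(now)` instead of a DelayQueue of buckets, and the class names are hypothetical, not Kafka's. Completed requests are cancelled in O(1) via a task-to-bucket index, so `pending()` is exactly the number of outstanding requests.

```python
class TimingWheel:
    """One level of a hierarchical timing wheel."""

    def __init__(self, tick: int, wheel_size: int, start: int = 0):
        self.tick, self.wheel_size = tick, wheel_size
        self.interval = tick * wheel_size
        self.current_time = start - start % tick
        self.buckets = [{} for _ in range(wheel_size)]  # task_id -> (expiry, callback)
        self.overflow = None                            # coarser wheel, created lazily

    def add(self, task_id, expiry, callback):
        """Return the bucket now holding the task, or None if it is already due."""
        if expiry < self.current_time + self.tick:
            return None
        if expiry < self.current_time + self.interval:
            bucket = self.buckets[(expiry // self.tick) % self.wheel_size]
            bucket[task_id] = (expiry, callback)
            return bucket
        if self.overflow is None:
            self.overflow = TimingWheel(self.interval, self.wheel_size, self.current_time)
        return self.overflow.add(task_id, expiry, callback)

    def advance(self, now):
        """Move the clock to `now`; return (task_id, expiry, callback) from each bucket passed."""
        flushed = []
        while self.current_time + self.tick <= now:
            self.current_time += self.tick
            bucket = self.buckets[(self.current_time // self.tick) % self.wheel_size]
            flushed.extend((tid, exp, cb) for tid, (exp, cb) in bucket.items())
            bucket.clear()
        if self.overflow is not None:
            flushed.extend(self.overflow.advance(now))
        return flushed


class HierarchicalTimer:
    """Facade with O(1) cancel via a task -> bucket index, and an exact pending count."""

    def __init__(self, tick: int = 1, wheel_size: int = 20, start: int = 0):
        self.wheel = TimingWheel(tick, wheel_size, start)
        self.where = {}

    def add(self, task_id, expiry, callback):
        bucket = self.wheel.add(task_id, expiry, callback)
        if bucket is None:
            callback()  # already expired: time it out immediately
        else:
            self.where[task_id] = bucket

    def cancel(self, task_id):
        """Remove a request that completed before its timeout."""
        bucket = self.where.pop(task_id, None)
        if bucket is not None:
            bucket.pop(task_id, None)

    def pending(self) -> int:
        return len(self.where)

    def advance(self, now):
        # Due tasks fire; tasks flushed from coarser wheels cascade into finer buckets.
        for task_id, expiry, callback in self.wheel.advance(now):
            self.where.pop(task_id, None)
            self.add(task_id, expiry, callback)
```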
a new LinkedIn open source data store, for write-once/read-mainly side data, java, Apache licensed. RocksDB discussion: https://www.facebook.com/groups/rocksdb.dev/permalink/834956096602906/
“The computer can recognize faces, a feature that comes in handy if somebody is trying to get an illegal ID. It apparently is not programmed to detect twins.” As Hilary Mason put it: “You do not want to be an edge case in this future we are building.”
‘By Bordne’s account, at the height of the Cuban Missile Crisis, Air Force crews on Okinawa were ordered to launch 32 missiles, each carrying a large nuclear warhead. Only caution and the common sense and decisive action of the line personnel receiving those orders prevented the launches—and averted the nuclear war that most likely would have ensued.’
super-basic ECS tutorial, using a docker-compose.yml to create a new ECS-managed service fleet
In the end, sheer political fatigue may have played a major part in undermining net neutrality in the EU. However, the battle is not quite over. As Anne Jellema, CEO of the Web Foundation, which was established by Berners-Lee in 2009, notes in her response to today’s EU vote: “The European Parliament is essentially tossing a hot potato to the Body of European Regulators, national regulators and the courts, who will have to decide how these spectacularly unclear rules will be implemented. The onus is now on these groups to heed the call of hundreds of thousands of concerned citizens and prevent a two-speed Internet.”
Swrve’s own Dave Brodigan on game user-data analysis techniques:
The goal is to give the audience a roadmap for analysing user data using Python-friendly tools. I will touch on many aspects of the data science pipeline, from data cleansing to building predictive data products at scale. I will start gently with pandas and dataframes, then discuss some machine learning techniques like k-means and random forests in scikit-learn, and then introduce Spark for doing it at scale. I will focus more on the use cases rather than detailed implementation. The talk will be informed by my experience and focus on user behaviour in games and mobile apps.
fast, modern, zero-conf load balancing HTTP(S) router managed by consul; serves 15k reqs/sec, in Go, from eBay
pretty conventional HTTP/1.1, WebSockets and HTTP/2 front-end services with modern Netty practices
One of the best things about working at Amazon was having a clear, well-defined career progression, and it’s something that’s always been absent in startups. Career growth, levelling, and tech management are important, and a ladder also helps in hiring by providing clear levels. This is the engineering ladder from RentTheRunway (Camille Fournier’s team), which they open-sourced back in March 2015
The stolen cards were still considered evidence, so the researchers couldn’t do a full tear-down or run any tests that would alter the data on the card, so they used X-ray scans to look at where the chip cards had been tampered with. They also analyzed the way the chips distributed electricity when in use and used read-only programs to see what information the cards sent to a Point of Sale (POS) terminal. According to the paper, the fraudsters were able to perform a man-in-the-middle attack by programming a second hobbyist chip called a FUN card to accept any PIN entry, and soldering that chip onto the card’s original chip. This increased the thickness of the chip from 0.4mm to 0.7mm, “making insertion into a PoS somewhat uneasy but perfectly feasible,” the researchers write. [...] The researchers explain that a typical EMV transaction involves three steps: card authentication, cardholder verification, and then transaction authorization. During a transaction using one of the altered cards, the original chip was allowed to respond with the card authentication as normal. Then, during cardholder verification, the POS system would ask for a user’s PIN, the thief would respond with any PIN, and the FUN card would step in and send the POS the code indicating that it was ok to proceed with the transaction because the PIN checked out. During the final transaction authorization phase, the FUN card would relay the transaction data between the POS and the original chip, sending the issuing bank an authorization request cryptogram which the card issuer uses to tell the POS system whether to accept the transaction or not.
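Here's a toy model of the relay logic as the article describes it. The genuine chip handles card authentication and the cryptogram, while the shim answers PIN verification with the ISO 7816 success status word without ever consulting the real chip. Everything here is hypothetical and enormously simplified, including the class names and the string stand-ins for authentication data and cryptograms. Real EMV involves APDUs, terminal and issuer checks, and card-side records of whether a PIN was verified, and none of that is modelled.

```python
SW_OK = b"\x90\x00"          # ISO 7816 status word: success
SW_WRONG_PIN = b"\x63\xc2"   # verification failed, 2 tries remaining


class GenuineChip:
    def __init__(self, pin: str):
        self.pin = pin

    def authenticate(self) -> str:
        return "card-auth-ok"                      # placeholder for real auth data

    def verify_pin(self, pin: str) -> bytes:
        return SW_OK if pin == self.pin else SW_WRONG_PIN

    def generate_cryptogram(self, tx: str) -> str:
        return f"ARQC({tx})"                       # placeholder cryptogram


class FunCardShim:
    """The soldered-on chip: relays everything except PIN verification."""

    def __init__(self, genuine: GenuineChip):
        self.genuine = genuine

    def authenticate(self) -> str:
        return self.genuine.authenticate()         # step 1: relayed as normal

    def verify_pin(self, pin: str) -> bytes:
        return SW_OK                               # step 2: always "PIN OK"

    def generate_cryptogram(self, tx: str) -> str:
        return self.genuine.generate_cryptogram(tx)  # step 3: relayed


def terminal_transaction(card, entered_pin: str, tx: str) -> str:
    """A naive terminal that trusts the status word from PIN verification."""
    if card.authenticate() != "card-auth-ok":
        return "declined: card auth"
    if card.verify_pin(entered_pin) != SW_OK:
        return "declined: wrong PIN"
    return card.generate_cryptogram(tx)
```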
using Spark, Tesseract, HBase, Solr and Leptonica. Actually pretty feasible
The metric is termed φ(P)-consistency, and is actually very simple. A read for the same data is sent to all replicas in P, and φ(P)-consistency is defined as the frequency with which that read returns the same result from all replicas. φ(G)-consistency applies this metric globally, and φ(R)-consistency applies it within a region (cluster). Facebook have been tracking this metric in production since 2012.
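The metric is easy to compute from sampled reads. Here's a minimal sketch. It assumes each sample is a mapping from replica name to the value that replica returned for the same key. The function names and the replica-to-region mapping are hypothetical.

```python
from typing import Dict, Hashable, List

Sample = Dict[str, Hashable]  # replica name -> value returned for one read


def phi_consistency(samples: List[Sample]) -> float:
    """Fraction of sampled reads on which every replica returned the same value."""
    if not samples:
        raise ValueError("need at least one sample")
    agreeing = sum(1 for s in samples if len(set(s.values())) <= 1)
    return agreeing / len(samples)


def regional_phi(samples: List[Sample], region_of: Dict[str, str]) -> Dict[str, float]:
    """phi(R): the same metric, restricted to each region's replicas."""
    regions = sorted({region_of[r] for s in samples for r in s})
    return {
        region: phi_consistency(
            [{r: v for r, v in s.items() if region_of[r] == region} for s in samples]
        )
        for region in regions
    }
```

Running `phi_consistency` over all replicas gives φ(G). `regional_phi` gives one φ(R) per region, which will typically be at least as high, since fewer replicas have to agree.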
How FB push config changes from Git (where it is code reviewed, version controlled, and history tracked with strong auth) to Zeus (their Zookeeper fork) and from there to live production servers.
a high-performance multiple regex matching library. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.
Via Tony Finch
Hologram exposes an imitation of the EC2 instance metadata service on developer workstations that supports the [IAM Roles] temporary credentials workflow. It is accessible via the same HTTP endpoint to calling SDKs, so your code can use the same process in both development and production. The keys that Hologram provisions are temporary, so EC2 access can be centrally controlled without direct administrative access to developer workstations.
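To make the "imitation metadata service" idea concrete, here's a tiny stand-in that serves the IAM role credential paths of the instance metadata API from localhost, plus the two-step lookup an SDK performs. Assumptions: the real service answers at 169.254.169.254, and a real tool would obtain genuine temporary credentials (presumably via STS) rather than the placeholders here. The role name is hypothetical, and only the original token-less request flow is shown.

```python
import json
import threading
import urllib.request
from datetime import datetime, timedelta, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

CREDS_PATH = "/latest/meta-data/iam/security-credentials/"
ROLE = "developer"  # hypothetical role name


def placeholder_credentials():
    """Stand-in values; a real tool would hand out genuine temporary keys."""
    now = datetime.now(timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "Code": "Success",
        "LastUpdated": now.strftime(fmt),
        "Type": "AWS-HMAC",
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "example-secret",
        "Token": "example-session-token",
        "Expiration": (now + timedelta(hours=1)).strftime(fmt),
    }


class FakeMetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == CREDS_PATH:
            body, ctype = ROLE, "text/plain"  # role listing
        elif self.path == CREDS_PATH + ROLE:
            body, ctype = json.dumps(placeholder_credentials()), "application/json"
        else:
            self.send_error(404)
            return
        data = body.encode()
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):  # silence per-request logging
        pass


def serve(port: int = 0) -> HTTPServer:
    """Start the fake service on localhost (port 0 picks a free port)."""
    server = HTTPServer(("127.0.0.1", port), FakeMetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_role_credentials(base_url: str) -> dict:
    """Client side, as an SDK does it: list the role, then fetch its credentials."""
    with urllib.request.urlopen(base_url + CREDS_PATH) as resp:
        role = resp.read().decode().splitlines()[0]
    with urllib.request.urlopen(base_url + CREDS_PATH + role) as resp:
        return json.loads(resp.read())
```

Usage: `server = serve()`, then `fetch_role_credentials(f"http://127.0.0.1:{server.server_address[1]}")`. Because the client code only knows the URL layout, the same lookup works against the real endpoint in production.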
Andrew Spyker’s roundup:
my quick index of all re:Invent sessions. Please wait for a few days and I’ll keep running the tool to fill in the index. It usually takes Amazon a few weeks to fully upload all the videos and slideshares.
Pretty definitive, full-text descriptions of all sessions (and there are an awful lot of ‘em).
Describing PlayOn! Sports’ Lambda setup. Sounds pretty productionizable
Familial DNA searching has massive false positives, but is being used to tag suspects:
The bewildered Usry soon learned that he was a suspect in the 1996 murder of an Idaho Falls teenager named Angie Dodge. Though a man had been convicted of that crime after giving an iffy confession, his DNA didn’t match what was found at the crime scene. Detectives had focused on Usry after running a familial DNA search, a technique that allows investigators to identify suspects who don’t have DNA in a law enforcement database but whose close relatives have had their genetic profiles cataloged. In Usry’s case the crime scene DNA bore numerous similarities to that of Usry’s father, who years earlier had donated a DNA sample to a genealogy project through his Mormon church in Mississippi. That project’s database was later purchased by Ancestry, which made it publicly searchable—a decision that didn’t take into account the possibility that cops might someday use it to hunt for genetic leads. Usry, whose story was first reported in The New Orleans Advocate, was finally cleared after a nerve-racking 33-day wait — the DNA extracted from his cheek cells didn’t match that of Dodge’s killer, whom detectives still seek. But the fact that he fell under suspicion in the first place is the latest sign that it’s time to set ground rules for familial DNA searching, before misuse of the imperfect technology starts ruining lives.
ScyllaDB (the C* clone in C++) is now actually looking promising — still need more reassurance about its consistency/reliability side though
As we will see below, there has long been ample evidence that errors in spreadsheets are pandemic. Spreadsheets, even after careful development, contain errors in one percent or more of all formula cells. In large spreadsheets with thousands of formulas, there will be dozens of undetected errors. Even significant errors may go undetected because formal testing in spreadsheet development is rare and because even serious errors may not be apparent.
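Here's a quick back-of-envelope check of that claim. It assumes a 1% per-formula error rate and (unrealistically) independent errors:

```python
def expected_errors(formula_cells: int, error_rate: float = 0.01) -> float:
    """Expected number of faulty formula cells, assuming independent errors."""
    return formula_cells * error_rate


def p_at_least_one_error(formula_cells: int, error_rate: float = 0.01) -> float:
    """Probability that at least one formula cell is wrong: 1 - (1 - r)^n."""
    return 1 - (1 - error_rate) ** formula_cells


for n in (100, 1_000, 5_000):
    print(f"{n:>5} formulas: ~{expected_errors(n):.0f} errors expected, "
          f"P(any error) = {p_at_least_one_error(n):.4f}")
```

With thousands of formulas, that's tens of expected errors, which fits the quote's "dozens". It also means a clean spreadsheet of that size is vanishingly unlikely.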
great post from Ross Duggan on avoiding developer burnout
If a client and server are speaking Diffie-Hellman, they first need to agree on a large prime number with a particular form. There seemed to be no reason why everyone couldn’t just use the same prime, and, in fact, many applications tend to use standardized or hard-coded primes. But there was a very important detail that got lost in translation between the mathematicians and the practitioners: an adversary can perform a single enormous computation to “crack” a particular prime, then easily break any individual connection that uses that prime. How enormous a computation, you ask? Possibly a technical feat on a scale (relative to the state of computing at the time) not seen since the Enigma cryptanalysis during World War II. Even estimating the difficulty is tricky, due to the complexity of the algorithm involved, but our paper gives some conservative estimates. For the most common strength of Diffie-Hellman (1024 bits), it would cost a few hundred million dollars to build a machine, based on special purpose hardware, that would be able to crack one Diffie-Hellman prime every year. Would this be worth it for an intelligence agency? Since a handful of primes are so widely reused, the payoff, in terms of connections they could decrypt, would be enormous. Breaking a single, common 1024-bit prime would allow NSA to passively decrypt connections to two-thirds of VPNs and a quarter of all SSH servers globally. Breaking a second 1024-bit prime would allow passive eavesdropping on connections to nearly 20% of the top million HTTPS websites. In other words, a one-time investment in massive computation would make it possible to eavesdrop on trillions of encrypted connections.
(via Eric)
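Here's a toy illustration of why a shared prime is a single point of failure. With a tiny prime, the "enormous computation" shrinks to building a discrete-log table once. After that, every connection using that prime falls to a table lookup. Assumptions: the 2039/7 parameters are toy values, and a lookup table stands in for the number field sieve precomputation the real attack uses.

```python
import secrets

# Toy group: P = 2039 is a safe prime ((P - 1) / 2 = 1019 is also prime) and
# G = 7 generates the whole multiplicative group mod P. Real DH uses >= 2048 bits.
P = 2039
G = 7


def keypair(p=P, g=G):
    private = secrets.randbelow(p - 2) + 1
    return private, pow(g, private, p)


def precompute_discrete_logs(p=P, g=G):
    """The one-off 'crack the prime' step, in miniature: g^x -> x for every x."""
    table, value = {}, 1
    for exponent in range(p - 1):
        table[value] = exponent
        value = value * g % p
    return table


def eavesdrop(table, alice_public, bob_public, p=P):
    """Per-connection cost after precomputation: one lookup plus one modexp."""
    alice_private = table[alice_public]
    return pow(bob_public, alice_private, p)
```

The table is built once per prime, not once per connection. That's why a handful of widely reused primes makes the investment pay off.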
Coursera are running user-submitted code in ECS! Interesting stuff about how they use Docker security/resource-limiting features, plus a fork of the ecs-agent code, to run it. :O
At Twitter, a team had an unusual failure where corrupt data ended up in memcache. The root cause appears to have been a switch that was corrupting packets. Most packets were being dropped and the throughput was much lower than normal, but some were still making it through. The hypothesis is that occasionally the corrupt packets had valid TCP and Ethernet checksums. One “lucky” packet stored corrupt data in memcache. Even after the switch was replaced, the errors continued until the cache was cleared.
Yet another occurrence of this bug. When it happens, it tends to _really_ screw things up, because it’s so rare. We had monitoring for this in Amazon, and when it occurred, it overwhelmingly occurred due to host-level kernel/libc/RAM issues rather than stuff in the network. Amazon design principles were to add app-level checksumming throughout, which of course catches the lot.
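Here's a minimal sketch of app-level checksumming for cache values. It assumes a plain dict stands in for the memcache client, and it uses CRC32 from Python's zlib. CRC32 catches random corruption like the switch case above, but it's no defence against deliberate tampering (an HMAC would be needed for that). The `wrap`/`unwrap` helper names are hypothetical.

```python
import struct
import zlib

class CorruptValueError(Exception):
    """Raised when a stored value fails its checksum."""

def wrap(payload: bytes) -> bytes:
    # Prefix the payload with a 4-byte big-endian CRC32 of its contents.
    return struct.pack(">I", zlib.crc32(payload)) + payload

def unwrap(blob: bytes) -> bytes:
    # Verify the checksum before handing the payload back to the caller.
    if len(blob) < 4:
        raise CorruptValueError("value too short to contain a checksum")
    (expected,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    if zlib.crc32(payload) != expected:
        raise CorruptValueError("checksum mismatch")
    return payload

cache: dict[str, bytes] = {}  # stand-in for a memcache client
cache["user:42"] = wrap(b'{"name": "alice"}')

# Simulate a "lucky" packet that flipped one bit of the payload in transit.
damaged = bytearray(cache["user:42"])
damaged[10] ^= 0x01
cache["user:42"] = bytes(damaged)

try:
    unwrap(cache["user:42"])
except CorruptValueError as err:
    # Treat as a cache miss: delete the key and refetch from the source of truth.
    print("rejected:", err)
```

If the mismatch is treated as a miss and the key evicted, a bad value can't stick around until someone flushes the whole cache, which is what happened in the Twitter case.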
How Spotify use nginx as a frontline for their sites and services
Supports Spotify — totally getting one of these
The sounds came first — as experiments in vocalization — and parents adopted them as pet names for themselves. If you open your mouth and make a sound, it will probably be an open vowel like /a/ unless you move your tongue or lips. The easiest consonants are perhaps the bilabials /m/, /p/, and /b/, requiring no movement of the tongue, followed by consonants made by raising the front of the tongue: /d/, /t/, and /n/. Add a dash of reduplication, and you get mama, papa, baba, dada, tata, nana. That such words refer to people (typically parents or other guardians) is something we have imposed on the sounds and incorporated into our languages and cultures; the meanings don’t inhere in the sounds as uttered by babies, which are more likely calls for food or attention.
‘A fast build system for Docker images’, open source, in Go, hooks into GitHub
All 11 terabytes of our LIDAR data (that’s roughly equivalent to 2,750,000 MP3 songs) will eventually be available through our new Open LIDAR portal under an Open Government Licence, allowing it to be used for any purpose. We hope that by giving free access to our data, businesses and local communities will develop innovative solutions to benefit the environment, grow our thriving rural economy, and boost our world-leading food and farming industry. The possibilities are endless and we hope that making LIDAR data open will be a catalyst for new ideas and innovation.
Are you reading, Ordnance Survey Ireland?
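Here's a quick sanity check on that comparison, assuming decimal units (1 TB = 10^12 bytes). The figure works out to 4 MB per song, which is a plausible size for a few minutes of MP3:

```latex
\frac{11 \times 10^{12}\ \text{bytes}}{2.75 \times 10^{6}\ \text{songs}} = 4 \times 10^{6}\ \text{bytes} = 4\ \text{MB per song}
```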
Another sorry tale of Storm issues:
Storm has been successful at Librato, but we experienced many of the limitations cited in the Twitter Heron: Stream Processing at Scale paper and outlined here by Adrian Colyer, including: Inability to isolate, reason about, or debug performance issues due to the worker/executor/task paradigm. This led to building and configuring clusters specifically designed to attempt to mitigate these problems (i.e., separate clusters per topology, running only one worker per server), which added complexity to development and operations and also led to over-provisioning. The ability of tasks to move around led to difficult-to-trace performance problems. Storm’s work provisioning logic led to some tasks serving more Kafka partitions than others. This in turn created latency and performance issues that were difficult to reason about. The initial solution was to over-provision in an attempt to get a better hashing/balancing of work, but eventually we just replaced the work allocation logic. Due to Storm’s architecture, it was very difficult to get a stack trace or heap dump because the processes that managed workers (Storm supervisor) would often forcefully kill a Java process while it was being investigated in this way. The propensity for unexpected and subsequently unhandled exceptions to take down an entire worker led to additional defensive, verbose error handling everywhere. This nasty bug, STORM-404, coupled with the aforementioned fact that a single exception can take down a worker, led to several cascading failures in production, taking down entire topologies until we upgraded to 0.9.4. Additionally, we found the performance we were getting from Storm for the amount of money we were spending on infrastructure was not in line with our expectations. Much of this is due to the fact that, depending upon how your topology is designed, a single tuple may make multiple hops across JVMs, and this is very expensive.
For example, in our time series aggregation topologies a single tuple may be serialized/deserialized and shipped across the wire 3-4 times as it progresses through the processing pipeline.
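Librato say they eventually replaced Storm's work allocation logic, but the excerpt doesn't say what they replaced it with. As a hypothetical illustration of the imbalance, the sketch below compares two ways of spreading partitions across tasks. Hashing partitions onto tasks can leave some tasks with more partitions than others. A round-robin deal keeps every task within one partition of the rest. The partition and task counts are made up.

```python
import zlib
from collections import Counter

def hash_assign(partitions: int, tasks: int) -> dict[int, int]:
    # Hash each partition id onto a task; per-task counts can drift apart.
    return {p: zlib.crc32(f"partition-{p}".encode()) % tasks for p in range(partitions)}

def round_robin_assign(partitions: int, tasks: int) -> dict[int, int]:
    # Deal partitions out like cards: per-task counts differ by at most one.
    return {p: p % tasks for p in range(partitions)}

def load(assignment: dict[int, int], tasks: int) -> list[int]:
    # Number of partitions served by each task, including tasks with none.
    counts = Counter(assignment.values())
    return [counts.get(t, 0) for t in range(tasks)]

PARTITIONS, TASKS = 32, 6  # hypothetical sizes
print("hashed:     ", load(hash_assign(PARTITIONS, TASKS), TASKS))
print("round-robin:", load(round_robin_assign(PARTITIONS, TASKS), TASKS))
# round-robin: [6, 6, 5, 5, 5, 5]
```

Even partition counts only give even load if the partitions carry similar traffic. If some partitions are much hotter than others, a smarter allocator would need to weight them by throughput.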
Librato’s service discovery library using ZooKeeper (so strongly consistent, but with the ZK downside that an AZ outage can stall service discovery updates region-wide)
“Big companies didn’t only rely on safe harbour: they also rely on binding corporate rules and standard contractual clauses. But it’s interesting that the court decided the case on fundamental rights grounds: so it doesn’t matter remotely what ground you transfer on, if that process is still illegal under [Articles] 7 and 8 of [the] charter, it can’t be done.”
Also:
“Ireland has no interest in doing its job, and will continue not to, forever. Clearly it’s an investment issue – but overall the policy is: we don’t regulate companies here. The cost of challenging any of this in the courts is prohibitive. And the people don’t seem to care.”
:(
Sounds like the CJEU’s Bara decision may cause problems for the Irish government’s wilful data-sharing:
Articles 10, 11 and 13 of Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995, on the protection of individuals with regard to the processing of personal data and on the free movement of such data, must be interpreted as precluding national measures, such as those at issue in the main proceedings, which allow a public administrative body of a Member State to transfer personal data to another public administrative body and their subsequent processing, without the data subjects having been informed of that transfer or processing.
uses the techniques invented by the authors of Paris-traceroute to enumerate the paths of ECMP flow-based load balancing, but introduces a new technique for NAT detection.
Handy. Written by AWS SDE Andrea Barberio!
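The Paris-traceroute idea works like this. A per-flow load balancer picks the next hop by hashing the flow identifier, so probes that keep the 5-tuple fixed all follow one path. Varying one field on purpose, here the UDP source port, enumerates the other equal-cost paths. The router below is a hypothetical model: real devices use vendor-specific hash functions and field selections, and CRC32 is just a stand-in. The NAT-detection part isn't modelled.

```python
import zlib

# Hypothetical router: four equal-cost next hops, picked by hashing the flow 5-tuple.
# CRC32 stands in for whatever vendor-specific hash a real device uses.
NEXT_HOPS = ["hop-a", "hop-b", "hop-c", "hop-d"]

def ecmp_next_hop(src_ip: str, dst_ip: str, proto: int, sport: int, dport: int) -> str:
    flow_id = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
    return NEXT_HOPS[zlib.crc32(flow_id) % len(NEXT_HOPS)]

SRC, DST, UDP = "192.0.2.10", "198.51.100.7", 17  # documentation-range addresses

# Classic UDP traceroute bumps the destination port on every probe, changing the flow id,
# so successive probes may be hashed onto different paths:
classic = {ecmp_next_hop(SRC, DST, UDP, 33000, 33434 + i) for i in range(16)}

# Paris-style probing holds the flow id constant, so every probe follows one path:
paris = {ecmp_next_hop(SRC, DST, UDP, 33000, 33434) for _ in range(16)}

# Deliberately varying one field (here the source port) enumerates the equal-cost paths:
explored = {ecmp_next_hop(SRC, DST, UDP, sport, 33434) for sport in range(33000, 33064)}

print("classic: ", sorted(classic))
print("paris:   ", sorted(paris))  # always exactly one next hop
print("explored:", sorted(explored))
```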
‘Seekable and Splittable Gzip’, from eBay
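The title alone doesn't say how eBay do it. One common way to make gzip seekable and splittable, which I'm assuming is close in spirit, is to compress fixed-size chunks as separate gzip members. RFC 1952 allows concatenated members, so standard tools still read the result as one file. You then keep an index of member offsets, so a reader can jump straight to a single chunk. The 4096-byte chunk size and the helper names are hypothetical.

```python
import gzip
import io

def write_indexed(data: bytes, chunk_size: int) -> tuple[bytes, list[int]]:
    """Compress each chunk as its own gzip member; return the file and member offsets."""
    out = io.BytesIO()
    offsets = []
    for start in range(0, len(data), chunk_size):
        offsets.append(out.tell())
        out.write(gzip.compress(data[start:start + chunk_size]))
    return out.getvalue(), offsets

def read_chunk(blob: bytes, offsets: list[int], i: int) -> bytes:
    """Decompress only member i, without touching the members before it."""
    end = offsets[i + 1] if i + 1 < len(offsets) else len(blob)
    return gzip.decompress(blob[offsets[i]:end])

data = b"".join(f"line {n}\n".encode() for n in range(10_000))
blob, offsets = write_indexed(data, chunk_size=4096)

assert gzip.decompress(blob) == data  # still one valid multi-member gzip stream
assert read_chunk(blob, offsets, 3) == data[3 * 4096:4 * 4096]  # random access to chunk 3
```

The trade-off is compression ratio. Each member starts with a fresh dictionary, so smaller chunks give better seek granularity but compress worse. The same offsets also give a splitter natural boundaries for handing members out to parallel workers.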
There was a breakdown in communication between the developer who requested the index migration and the database operator who deleted the old index. Instead of working on the migration together, they communicated in an implicit way through flawed tooling. The dashboard that surfaced the migration request was missing important context: the reason for the requested deletion, the dependency on another index’s creation, and the criticality of the index for API traffic. Indeed, the database operator didn’t have a way to check whether the index had recently been used for a query.
Good demo of how the Etsy-style chatops deployment approach would have helped avoid this risk.
Wendy Grossman on where the Safe Harbor decision is leading.
One clause would require European companies to tell their relevant data protection authorities if they are being compelled to turn over data – even if they have been forbidden to disclose this under US law. Sounds nice, but doesn’t mobilize the rock or soften the hard place, since companies will still have to pick a law to violate. I imagine the internal discussions there revolving around two questions: which violation is less likely to land the CEO in jail and which set of fines can we afford? (via Simon McGarr)
bookmarking as a potential future addition to the back garden
Good writeup of current best practices for a production AWS architecture
notable mainly for the details of Terraform support for Lambda: that’s a significant improvement to Lambda’s production-readiness
The court based its reasoning on the fact that, although an isolated gene such as BRCA1 was “a product of human action, it was the existence of the information stored in the relevant sequences that was an essential element of the invention as claimed.” Since the information stored in the DNA as a sequence of nucleotides was a product of nature, it did not require human action to bring it into existence, and therefore could not be patented.
Via Tony Finch.
client-side ‘service discovery and routing system for microservices’ — another Smartstack, then
ugh, quite a bit of complexity here
Good intro to fuzz-testing a distributed system; I’ve had great results using similar approaches in unit tests
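Here's a minimal sketch of the same idea at unit-test scale. It assumes a toy replicated grow-only set, a hypothetical stand-in for a real system. A seeded random "network" duplicates and reorders messages, and the test checks that every replica still converges. Seeding the RNG makes any failing schedule reproducible from its seed.

```python
import random

class Replica:
    """Toy grow-only set replica; applying an item is set insertion, so order and duplicates don't matter."""

    def __init__(self) -> None:
        self.items: set[int] = set()

    def receive(self, item: int) -> None:
        self.items.add(item)

def run_schedule(seed: int, n_replicas: int = 3, n_items: int = 20) -> list[set[int]]:
    rng = random.Random(seed)
    replicas = [Replica() for _ in range(n_replicas)]
    # Every item is broadcast to every replica; the fuzzer duplicates some messages...
    messages = [(r, item) for item in range(n_items) for r in range(n_replicas)]
    messages += rng.sample(messages, k=len(messages) // 2)
    # ...and delivers everything in a random order.
    rng.shuffle(messages)
    for r, item in messages:
        replicas[r].receive(item)
    return [rep.items for rep in replicas]

for seed in range(200):
    states = run_schedule(seed)
    assert all(s == states[0] for s in states), f"replicas diverged with seed {seed}"
print("200 random schedules converged")
```

A real harness would also drop messages and exercise the system's retry path. The invariant would be whatever the protocol actually promises, not just convergence.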
you can now launch Spot instances that will run continuously for a finite duration (1 to 6 hours). Pricing is based on the requested duration and the available capacity, and is typically 30% to 45% less than On-Demand.
Very perceptive post on the next steps for safe harbor, post-Schrems.
And behind that elephant there are other elephants: if US surveillance and surveillance law is a problem, then what about UK surveillance? Is GCHQ any less intrusive than the NSA? It does not seem so – and this puts even more pressure on the current reviews of UK surveillance law taking place. If, as many predict, the forthcoming Investigatory Powers Bill will be even more intrusive and extensive than current UK surveillance laws this will put the UK in a position that could rapidly become untenable. If the UK decides to leave the EU, will that mean that the UK is not considered a safe place for European data? Right now that seems the only logical conclusion – but the ramifications for UK businesses could be huge. [....] What happens next, therefore, is hard to foresee. What cannot be done, however, is to ignore the elephant in the room. The issue of surveillance has to be taken on. The conflict between that surveillance and fundamental human rights is not a merely semantic one, or one for lawyers and academics, it’s a real one. In the words of historian and philosopher Quentin Skinner “the current situation seems to me untenable in a democratic society.” The conflict over Safe Harbor is in many ways just a symptom of that far bigger problem. The biggest elephant of all.
The only current way to comply with EU law, the judgment indicates, is to keep EU data within the EU. Whether those data can be safely managed within facilities run by US companies will not be determined until the US rules on an ongoing Microsoft case. Microsoft stands in contempt of court right now for refusing to hand over to US authorities emails held in its Irish data centre. This case will surely go to the Supreme Court and will be an extremely important determination for the cloud business, and any company or individual using data centre storage. If Microsoft loses, US multinationals will be left scrambling to somehow legally firewall off their EU-based data centres from US government reach.
(cough, Amazon)
“@alexbfree @ThijsFeryn [ElasticSearch is] fine as long as data loss is acceptable. https://aphyr.com/posts/317-call-me-maybe-elasticsearch . We lose ~1% of all writes on average.”
Many organisations I’ve spoken to have had the cunning plan of adopting model contract clauses as their fall back position to replace their reliance on Safe Harbor. [....] The best that can be said for Model Clauses is that they haven’t been struck down by the CJEU. Yet.
Reacting to the ruling, the [EC] stressed that data transfers between the U.S. and Europe can continue on the basis of other legal mechanisms. A lot rides on what steps the Commission and national data protection supervisors take in response. “It is crucial for legal certainty that the EC sends a clear signal,” said Nauwelaerts. That could involve providing a timeline for concluding an agreement with U.S. authorities, together with a commitment from national data protection authorities not to block data transfers while negotiations are on-going, he explained.
The new engine has similarities with LSM Trees (like LevelDB and Cassandra’s underlying storage). It has a write ahead log, index files that are read only, and it occasionally performs compactions to combine index files. We’re calling it a Time Structured Merge Tree because the index files keep contiguous blocks of time and the compactions merge those blocks into larger blocks of time. Compression of the data improves as the index files are compacted. Once a shard becomes cold for writes it will be compacted into as few files as possible, which yields the best compression.
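Here's a rough sketch of what a compaction step does. It is not InfluxDB's actual code or file format. It assumes each index file holds a run of (timestamp, value) points sorted by time, and it uses `heapq.merge` to combine several runs into one larger time-ordered block. Letting the newer file win on duplicate timestamps is my assumption.

```python
import heapq
from typing import Iterator

Point = tuple[int, float]  # (timestamp, value)

def _tag(run: list[Point], age: int) -> Iterator[tuple[int, int, float]]:
    # Attach the file's age so that, at equal timestamps, newer files sort later.
    for ts, val in run:
        yield ts, age, val

def compact(files: list[list[Point]]) -> list[Point]:
    """Merge time-sorted runs (oldest file first) into one time-sorted run."""
    merged = heapq.merge(*(_tag(run, age) for age, run in enumerate(files)))
    out: list[Point] = []
    for ts, _age, val in merged:
        if out and out[-1][0] == ts:
            out[-1] = (ts, val)  # assumed rule: the newer file's point wins
        else:
            out.append((ts, val))
    return out

older = [(1, 1.0), (3, 3.0), (5, 5.0)]
newer = [(2, 2.0), (3, 30.0), (6, 6.0)]
print(compact([older, newer]))
# [(1, 1.0), (2, 2.0), (3, 30.0), (5, 5.0), (6, 6.0)]
```

A real engine would also re-encode the merged points into compressed blocks. Longer contiguous runs of time compress better, which is the point the excerpt makes about cold shards.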
new Dublin delivery service takes Bitcoin?!
interesting new data structure from Tony Finch. “Some simple benchmarks say qp tries have about 1/3 less memory overhead and are about 10% faster than crit-bit tries.”
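Much of the memory saving in qp tries ("quadbit popcount" tries) comes from how branch nodes store their children. A node doesn't keep a 16-slot array. It keeps a 16-bit bitmap of which nibble values are present, plus a packed array holding only those children, and it finds a child's slot by counting the set bits below that nibble. The sketch below shows just that indexing trick. `SparseNode` is a hypothetical name, and this is not Tony Finch's C implementation: a full trie also needs key-offset bookkeeping and leaves.

```python
class SparseNode:
    """Packed child array indexed by nibble (0-15) via a 16-bit presence bitmap."""

    def __init__(self) -> None:
        self.bitmap = 0                    # bit n set => a child exists for nibble n
        self.children: list[object] = []   # only the present children, in nibble order

    def _slot(self, nibble: int) -> int:
        # Count of present children with a smaller nibble = position in the packed array.
        return bin(self.bitmap & ((1 << nibble) - 1)).count("1")

    def get(self, nibble: int):
        if not self.bitmap & (1 << nibble):
            return None
        return self.children[self._slot(nibble)]

    def put(self, nibble: int, child: object) -> None:
        slot = self._slot(nibble)
        if self.bitmap & (1 << nibble):
            self.children[slot] = child        # replace an existing child
        else:
            self.children.insert(slot, child)  # keep the array packed and ordered
            self.bitmap |= 1 << nibble

def nibbles(key: bytes):
    # Split each byte into its high and low 4-bit halves, the units a qp trie branches on.
    for b in key:
        yield b >> 4
        yield b & 0x0F

node = SparseNode()
for n in (12, 3, 7):
    node.put(n, f"child-{n}")
print(node.children)             # ['child-3', 'child-7', 'child-12']
print(node.get(7), node.get(5))  # child-7 None
print(list(nibbles(b"A")))       # [4, 1]
```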
When we talk about surveillance, we tend to concentrate on the problems of data collection: CCTV cameras, tagged photos, purchasing habits, our writings on sites like Facebook and Twitter. We think much less about data analysis. But effective and pervasive surveillance is just as much about analysis. It’s sustained by a combination of cheap and ubiquitous cameras, tagged photo databases, commercial databases of our actions that reveal our habits and personalities, and – most of all – fast and accurate face recognition software. Don’t expect to have access to this technology for yourself anytime soon. This is not facial recognition for all. It’s just for those who can either demand or pay for access to the required technologies – most importantly, the tagged photo databases. And while we can easily imagine how this might be misused in a totalitarian country, there are dangers in free societies as well. Without meaningful regulation, we’re moving into a world where governments and corporations will be able to identify people both in real time and backwards in time, remotely and in secret, without consent or recourse. Despite protests from industry, we need to regulate this budding industry. We need limitations on how our images can be collected without our knowledge or consent, and on how they can be used. The technologies aren’t going away, and we can’t uninvent these capabilities. But we can ensure that they’re used ethically and responsibly, and not just as a mechanism to increase police and corporate power over us.
China just introduced a universal credit score, where everybody is measured as a number between 350 and 950. But this credit score isn’t just affected by how well you manage credit – it also reflects how well your political opinions are in line with Chinese official opinions, and whether your friends’ are, too.
Measuring using online mass surveillance, naturally. This may be the most dystopian thing I’ve heard in a while….
YESSSS. Joe and Brian have delivered — going to be giving a lot of copies of this for xmas ;)
your command line environment in the [Google] Cloud. This feature enables you to connect to a shell environment on a virtual machine, pre-loaded with the tools you need to easily run commands to develop, deploy and manage your projects. Currently, Cloud Shell is an f1-micro Google Compute Engine machine that exposes a Debian-based development environment. You are also assigned 5 GB of standard persistent disk space as the home disk so you can store files between sessions.
It’s also free. This is a great idea: handy both for beginners getting to grips with GoogCloud and for experts looking for a quick dev env to hack with. I wish AWS had something similar.
“A Neapolitan-American friend of mine, who’s in his mid-fifties, fondly remembers how his mother used to serve him an espresso with Fernet Branca and an egg yolk every morning before he went off to elementary school.”
come recommended by http://gearmoose.com/the-ten-best-minimalist-wallets-a-recap/ , looks pretty nice
Below is a list of some lessons I’ve learned as a startup engineering manager that are worth being told to a new manager. Some are subtle, and some are surprising, and this being human beings, some are inevitably controversial. This list is for the new head of engineering to guide their thinking about the job they are taking on. It’s not comprehensive, but it’s a good beginning. The best characteristic of this list is that it focuses on social problems with little discussion of technical problems a manager may run into. The social stuff is usually the hardest part of any software developer’s job, and of course this goes triply for engineering managers.
Some bookmarks around post-mortem activity
Han Sung is bizarrely located in the back of an Asian supermarket just off the Millennium Walk on Great Strand Street. [...] You’d see this a lot in Korea, I ask, a restaurant in the back of a supermarket? Not really, no, he says.
“Spex in the City”, “Fidler on the Tooth”, “Sight For Four Eyes”, “Fried Egg I’m In Love”, “Lice Knowing You” and many more
this is quite nice. PipelineDB allows direct hookup of a Kafka stream, and will ingest durably and reliably, and provide SQL views computed over a sliding window of the stream.
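PipelineDB expresses this as SQL continuous views. The Python below is not its API, just a hypothetical sketch of what a sliding-window view computes: an aggregate over only the events whose timestamps fall within the last N seconds, with older events evicted as time moves on. It assumes events arrive in timestamp order, as they would from a single Kafka partition.

```python
from collections import deque

class SlidingWindowSum:
    """Running count and sum over events from the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.events: deque = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, ts: float, value: float) -> None:
        # Assumes in-order arrival; out-of-order events would need buffering.
        self.events.append((ts, value))
        self.total += value
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop events that have slid out of the window, updating the aggregate incrementally.
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.total -= old

    def snapshot(self, now: float) -> tuple[int, float]:
        self._evict(now)
        return len(self.events), self.total

w = SlidingWindowSum(window=60)
for ts, v in [(0, 1.0), (30, 2.0), (59, 3.0), (61, 4.0)]:
    w.add(ts, v)
print(w.snapshot(now=61))  # (3, 9.0): the event at t=0 has expired
```

The incremental add/subtract works for sums and counts. Aggregates like max or distinct counts can't be "un-added" this way and need different bookkeeping.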
Ireland leading the pack with a drop in funding of 20% :(
recommended by Paul Hickey
First of all, banks could be chopped up into units that can safely go bust – meaning they could never blackmail us again. Banks should not have multiple activities going on under one roof with inherent conflicts of interest. Banks should not be allowed to build, sell or own overly complex financial products – clients should be able to comprehend what they buy and investors understand the balance sheet. Finally, the penalty should land on the same head as the bonus, meaning nobody should have more reason to lie awake at night worrying over the risks to the bank’s capital or reputation than the bankers themselves. You might expect all major political parties to have come out by now with their vision of a stable and productive financial sector. But this is not what has happened.
So the fact is that our experience of the world will increasingly come to reflect our experience of our computers and of the internet itself (not surprisingly, as it’ll be infused with both). Just as any user feels their computer to be a fairly unpredictable device full of programs they’ve never installed doing unknown things to which they’ve never agreed to benefit companies they’ve never heard of, inefficiently at best and actively malignant at worst (but how would you know?), cars, street lights, and even buildings will behave in the same vaguely suspicious way. Is your self-driving car deliberately slowing down to give priority to the higher-priced models? Is your green A/C really less efficient with a thermostat from a different company, or is it just not trying as hard? And your tv is supposed to only use its camera to follow your gestural commands, but it’s a bit suspicious how it always offers Disney downloads when your children are sitting in front of it. None of those things are likely to be legal, but they are going to be profitable, and, with objects working actively to hide them from the government, not to mention from you, they’ll be hard to catch.
“MAPS.ME is an open source cross-platform offline maps application, built on top of crowd-sourced OpenStreetMap data. It was publicly released for iOS and Android.”