‘This document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming. If you have taken a class in machine learning, or built or worked on a machine-learned model, then you have the necessary background to read this document.’ Full of good tips, if you wind up using ML in a production service.
Dictator-friendly censorship tools? no probs!
This is fascinating, re “authenticity” of food:
The objection that curry house food was inauthentic was true, but also unfair. It’s worth asking what “authenticity” really means in this context, given that people in India – like humans everywhere – do not themselves eat a perfectly “authentic” diet. When I asked dozens of people, while on a recent visit to India, about their favourite comfort food, most of them – whether from Delhi, Bangalore or Mumbai – told me that what they really loved to eat, especially when drinking beer, was something called Indian-Chinese food. It is nothing a Chinese person would recognise, consisting of gloopy dishes of meat and noodles, thick with cornflour and soy sauce, but spiced with green chillis and vinegar to please the national palate. Indian-Chinese food – just like British curry house food – offers a salty night away from the usual home cooking. The difference is that Indian people accept Indian-Chinese food for the ersatz joy that it is, whereas many British curry house customers seem to have believed that the recipe for their Bombay potatoes really did come from Bombay, and felt affronted to discover that it did not.
We raised the issue of discrimination in 2011 with one of the banks and with the Commission for Racial Equality, but as no-one was keeping records, nothing could be proved, until today. How can this discrimination happen? Well, UK rules give banks a lot of discretion to decide whether to refund a victim, and the first responders often don’t know the full story. If your HSBC card was compromised by a skimmer on a Tesco ATM, there’s no guarantee that Tesco will have told anyone (unlike in America, where the law forces Tesco to tell you). And the fraud pattern might be something entirely new. So bank staff end up making judgement calls like “Is this customer telling the truth?” and “How much is their business worth to us?” This in turn sets the stage for biases and prejudices to kick in, however subconsciously. Add management pressure to cut costs, sometimes even bonuses for cutting them, and here we are.
Agreed, this is a big issue.
If artificial intelligence takes over our lives, it probably won’t involve humans battling an army of robots that relentlessly apply Spock-like logic as they physically enslave us. Instead, the machine-learning algorithms that already let AI programs recommend a movie you’d like or recognize your friend’s face in a photo will likely be the same ones that one day deny you a loan, lead the police to your neighborhood or tell your doctor you need to go on a diet. And since humans create these algorithms, they’re just as prone to biases that could lead to bad decisions—and worse outcomes. These biases create some immediate concerns about our increasing reliance on artificially intelligent technology, as any AI system designed by humans to be absolutely “neutral” could still reinforce humans’ prejudicial thinking instead of seeing through it.
Much of my professional work for the last 10+ years has revolved around handling, importing and exporting CSV files. CSV files are frustratingly misunderstood, abused, and most of all underspecified. While RFC4180 exists, it is far from definitive and goes largely ignored. Partially as a companion piece to my recent post about how CSV is an encoding nightmare, and partially an expression of frustration, I’ve decided to make a list of falsehoods programmers believe about CSVs. I recommend my previous post for a more in-depth coverage of the pains of CSV encodings and how the default tooling (Excel) will ruin your day. (via Tony Finch)
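To make the underspecification concrete, here is a minimal Python sketch of two easy assumptions that break: that splitting on commas separates fields, and that one line is one record. The sample data is made up for illustration. The standard library's `csv` module follows RFC 4180-style quoting, so it copes with an embedded comma, a doubled quote and a newline inside a quoted field; it does nothing about the encoding problems mentioned above.

```python
import csv
import io

# Hypothetical sample: a quoted comma, escaped ("doubled") quotes,
# and a newline inside a quoted field.
SAMPLE = 'id,comment\n1,"Hello, world"\n2,"She said ""hi""\nthen left"\n'


def naive_parse(text):
    # Assumes one record per line and no commas inside fields: both false here.
    return [line.split(",") for line in text.splitlines()]


def proper_parse(text):
    # csv.reader understands quoting; newline="" stops newline translation
    # from mangling line breaks embedded in quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    print(naive_parse(SAMPLE))   # 4 "rows", with the first comment split in two
    print(proper_parse(SAMPLE))  # 3 rows, fields intact
```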
Pretty amazing, particularly for this revelation:
Tetsuya Nomura (Character and battle visual director, Square Japan): OK, so maybe I did kill Aerith. But if I hadn’t stopped you, in the second half of the game, you were planning to kill everyone off but the final three characters the player chooses!
Yoshinori Kitase (Director, Square Japan): No way! I wrote that? Where?
Tetsuya Nomura (Character and battle visual director, Square Japan): In the scene where they parachute into Midgar. You wanted everyone to die there!
In 1977, Jet Propulsion Lab (JPL) scientists packed a Reed-Solomon encoder in each Voyager, hardware designed to add error-correcting bits to all data beamed back at a rate of efficiency 80 percent higher than an older method also included with Voyager. Where did the hope come in? When the Voyager probes were launched with Reed-Solomon encoders on board, no Reed-Solomon decoders existed on Earth.
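Encoding is the easy half of Reed-Solomon. Systematic encoding treats the message bytes as coefficients of a polynomial over the finite field GF(2^8) and appends the remainder after dividing by a generator polynomial. The Python sketch below shows only that encoding step plus a syndrome check, where every syndrome is zero for an intact codeword. The field polynomial (0x11d) and the 32 parity symbols are assumptions chosen for illustration. 32 check symbols matches the common RS(255,223) configuration, but nothing here claims to reproduce Voyager's actual hardware. Locating and correcting errors needs further algorithms (for example Berlekamp–Massey and Forney) that are not shown.

```python
# Minimal systematic Reed-Solomon encoder over GF(2^8).
# Assumption: field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11d) and generator
# roots alpha^0 .. alpha^(nsym-1). Illustrative only, not Voyager's parameters.
PRIM = 0x11D

# Exponent/log tables for the multiplicative group of GF(256).
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def poly_mul(p, q):
    # Polynomials are lists of coefficients, highest degree first.
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] ^= gf_mul(a, b)  # addition in GF(2^8) is XOR
    return out


def generator_poly(nsym):
    g = [1]
    for i in range(nsym):
        g = poly_mul(g, [1, EXP[i]])  # multiply by (x - alpha^i); minus == plus here
    return g


def rs_encode(msg, nsym=32):
    """Return msg followed by nsym parity bytes (systematic encoding)."""
    if len(msg) + nsym > 255:
        raise ValueError("codeword longer than 255 symbols")
    gen = generator_poly(nsym)
    buf = list(msg) + [0] * nsym
    # Synthetic division of msg(x) * x^nsym by the monic generator polynomial.
    for i in range(len(msg)):
        coef = buf[i]
        if coef != 0:
            for j in range(1, len(gen)):
                buf[i + j] ^= gf_mul(gen[j], coef)
    return list(msg) + buf[len(msg):]  # message + remainder


def poly_eval(p, x):
    y = p[0]
    for c in p[1:]:
        y = gf_mul(y, x) ^ c  # Horner's rule
    return y


def syndromes(codeword, nsym=32):
    # A valid codeword is divisible by the generator, so it vanishes at every root.
    return [poly_eval(codeword, EXP[i]) for i in range(nsym)]
```

Flipping any single byte of the codeword makes at least one syndrome non-zero, which is how a receiver detects damage before attempting correction.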
Using jemalloc to instrument the contents of the native heap and record stack traces of each chunk’s allocators, so that leakers can be quickly identified (GZIPInputStream in this case). See also https://gdstechnology.blog.gov.uk/2015/12/11/using-jemalloc-to-get-to-the-bottom-of-a-memory-leak/ .
If you’ve always loved Hello Kitty but wish she also came with a deep well of rage, Sanrio has introduced just the character for you: Aggretsuko. An adorable 25-year-old red panda who works as an office associate, Aggretsuko is constantly taken advantage of and bothered by her boss and co-workers. So she deals with it by pounding beers and screaming death-metal karaoke.
This documentation covers parts of the PagerDuty Incident Response process. It is a cut-down version of our internal documentation, used at PagerDuty for any major incidents, and to prepare new employees for on-call responsibilities. It provides information not only on preparing for an incident, but also what to do during and after. It is intended to be used by on-call practitioners and those involved in an operational incident response process (or those wishing to enact a formal incident response process). This is a really good set of processes — quite similar to what we used in Amazon for high-severity outage response.
Dr. Kelly, desperate to become intoxicated while maintaining The Pledge, realized that not only could ether vapors be inhaled, but liquid ether could be swallowed. Around 1845 he began consuming tiny glasses of ether, and then started dispensing these to his patients and friends as a nonalcoholic libation. It wasn’t long before it became a popular beverage, with one priest going so far as to declare that ether was “a liquor on which a man could get drunk with a clean conscience.” In some respects ingesting ether is less damaging to the system than severe alcohol intoxication. Its volatility – ether is a liquid at room temperature but a gas at body temperature – dramatically speeds its effects. Dr. Ernest Hart wrote that “the immediate effects of drinking ether are similar to those produced by alcohol, but everything takes place more rapidly; the stages of excitement, mental confusion, loss of muscular control, and loss of consciousness follow each other so quickly that they cannot be clearly separated.” Recovery is similarly rapid. Not only were ether drunks who were picked up by the police on the street often completely sober by the time they reached the station, but they suffered no hangovers. Ether drinking spread rapidly throughout Ireland, particularly in the North, and the substance soon could be purchased from grocers, druggists, publicans, and even traveling salesmen. Because ether was produced in bulk for certain industrial uses, it could also be obtained quite inexpensively. Its low price and rapid action meant that even the poorest could afford to get drunk several times a day on it. By the 1880s ether, distilled in England or Scotland, was being imported and widely distributed to even the smallest villages. Many Irish market towns would “reek of the mawkish fumes of the drug” on fair days when “its odor seems to cling to the very hedges and houses for some time.”
Can’t help feeling danah boyd is hitting the nail on the head here:
The Internet has long been used for gaslighting, and trolls have long targeted adversaries. What has shifted recently is the scale of the operation, the coordination of the attacks, and the strategic agenda of some of the players. For many who are learning these techniques, it’s no longer simply about fun, nor is it even about the lulz. It has now become about acquiring power. A new form of information manipulation is unfolding in front of our eyes. It is political. It is global. And it is populist in nature. The news media is being played like a fiddle, while decentralized networks of people are leveraging the ever-evolving networked tools around them to hack the attention economy.
per Difford’s Guide — Amaretto Sour, Margarita, Bramble, Espresso Martini, Old-Fashioned, Negroni, White Lady and Manhattan up there.
Instead of discussing recent site visits or photographs we’ll be looking at a recent controversy sparked by comments about the reconstruction of Newgrange and, in particular, three claims made in the media by an Irish archaeologist:
1. That the “roof-box” at Newgrange may not be an original feature, instead it was “fabricated” and has “not a shred of authenticity”;
2. That two vitally important structural stones, both decorated with megalithic art, from Newgrange were lost after the excavation; and
3. That the photographic evidence that backs up the existing restoration is either inaccessible or never existed at all.
I hope to show why we can be sure none of these claims are sustainable and that in fact the winter solstice phenomenon at Newgrange is an original and central feature of the tomb.
Google offers public NTP service with leap smearing — I didn’t realise! (thanks Keith)
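Leap smearing avoids a repeated or backwards-stepping second. Instead, clocks run very slightly slow across a window around the leap, so the extra second is absorbed gradually. Below is a minimal Python sketch of a linear smear. The window length and its centring on the leap instant are parameters here. The 24-hour default is an assumption for illustration, not a statement of what any particular NTP service uses.

```python
def smear_fraction(t, leap_at, window=86_400.0):
    """Fraction of a positive leap second absorbed by time t (Unix seconds).

    Assumes a linear smear of length `window` centred on `leap_at`.
    During the window a smeared clock runs slow by 1/window, which for a
    24-hour window is about 11.6 parts per million.
    """
    start = leap_at - window / 2
    if t <= start:
        return 0.0
    if t >= start + window:
        return 1.0
    return (t - start) / window


# 2017-01-01T00:00:00Z, just after the leap second inserted at the end of 2016.
LEAP = 1_483_228_800
```

At `LEAP` itself, half of the extra second has been absorbed. A smeared clock is therefore half a second behind an unsmeared one at that moment, and no reading ever repeats.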
The root cause of the bug that affected our DNS service was the belief that time cannot go backwards. In our case, some code assumed that the difference between two times would always be, at worst, zero. RRDNS is written in Go and uses Go’s time.Now() function to get the time. Unfortunately, this function does not guarantee monotonicity. Go currently doesn’t offer a monotonic time source. So the clock went “backwards”, s1 – s2 returned < 0, and the code couldn't handle it (because it's a little-known and infrequent failure case). Part of the root cause here is cultural -- Google has solved the leap-second problem internally through leap smearing, and Go seems to be fundamentally a Google product at heart. The easiest fix in general in the "outside world" is to use "ntpd -x" to do a form of smearing. It looks like AWS are leap smearing internally (https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/), but it is a shame they aren't making this a standard part of services running on top of AWS and a feature of the AWS NTP fleet.
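The defensive pattern is to measure durations with a monotonic clock. Where wall-clock differences are unavoidable, clamp them rather than assume they are non-negative. In Python, `time.monotonic()` is documented as never going backwards, unlike `time.time()`. Go later added monotonic clock readings to `time.Now()` in Go 1.9. The helper names below are hypothetical.

```python
import time


def safe_elapsed(start, end):
    """Difference between two wall-clock readings, clamped at zero.

    time.time() follows the system clock, which can step backwards
    (leap second handling, NTP corrections), so end - start may be < 0.
    """
    return max(0.0, end - start)


def timed(fn, clock=time.monotonic):
    """Run fn() and return (result, duration) measured on a monotonic clock."""
    start = clock()
    result = fn()
    return result, clock() - start


if __name__ == "__main__":
    # Simulated backwards step: the "end" reading is earlier than the "start".
    print(safe_elapsed(1000.0, 999.25))  # 0.0 rather than -0.75
    print(timed(lambda: sum(range(1000))))
```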
via twitter: “interesting conversation between author of a parenting book and the guy who introduced the concept of ‘flow’” — summary: family life is interrupt-driven (via nagging) and fundamentally hard to align with “flow”
The recent movement to get all traffic encrypted has of course been great for the Internet. But the use of encryption in these protocols is different than in TLS. In TLS, the goal was to ensure the privacy and integrity of the payload. It’s almost axiomatic that third parties should not be able to read or modify the web page you’re loading over HTTPS. QUIC and TOU go further. They encrypt the control information, not just the payload. This provides no meaningful privacy or security benefits. Instead the apparent goal is to break the back of middleboxes. The idea is that TCP can’t evolve due to middleboxes and is pretty much fully ossified. They interfere with connections in all kinds of ways, like stripping away unknown TCP options or dropping packets with unknown TCP options or with specific rare TCP flags set. The possibilities for breakage are endless, and any protocol extensions have to jump through a lot of hoops to try to minimize the damage.
Paper from Google describing one of their internal building block services:
A general purpose sharding service. I normally think of sharding as something that happens within a (typically data) service, not as a general purpose infrastructure service. What exactly is Slicer then? It has two key components: a data plane that acts as an affinity-aware load balancer, with affinity managed based on application-specified keys; and a control plane that monitors load and instructs application processes as to which keys they should be serving at any one point in time. In this way, the decisions regarding how to balance keys across application instances can be outsourced to the Slicer service rather than building this logic over and over again for each individual back-end service. Slicer is focused exclusively on the problem of balancing load across a given set of backend tasks; other systems are responsible for adding and removing tasks. Interesting.
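To make the data-plane/control-plane split concrete, here is a toy Python sketch built on stated assumptions. Keys are hashed into a 64-bit space cut into equal-width slices. The data plane routes a key by binary search over slice boundaries. The control plane reassigns slices greedily from measured per-slice load. The class and function names are hypothetical. This is not Slicer's API or its actual balancing algorithm.

```python
import bisect
import hashlib

KEYSPACE = 2**64


def key_hash(key: str) -> int:
    # Map an application key onto a 64-bit hash space.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Assignment:
    """Data-plane view: contiguous hash-space slices mapped to tasks."""

    def __init__(self, slices):
        # slices: list of (exclusive_end, task), sorted by exclusive_end
        self.ends = [end for end, _ in slices]
        self.tasks = [task for _, task in slices]

    def task_for(self, key: str) -> str:
        # First slice whose exclusive end is greater than the key's hash.
        return self.tasks[bisect.bisect_right(self.ends, key_hash(key))]


def rebalance(slice_loads, tasks):
    """Control-plane sketch: heaviest slices first, each to the least-loaded task."""
    n = len(slice_loads)
    width = KEYSPACE // n
    load = {t: 0.0 for t in tasks}
    owner = [None] * n
    for i in sorted(range(n), key=lambda i: slice_loads[i], reverse=True):
        target = min(tasks, key=lambda t: load[t])
        owner[i] = target
        load[target] += slice_loads[i]
    ends = [(i + 1) * width for i in range(n - 1)] + [KEYSPACE]
    return Assignment(list(zip(ends, owner))), load
```

Because routing depends only on the published assignment, the backends never need to agree among themselves. Moving a hot slice is a control-plane decision that the data plane simply picks up.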
a competing-consumer messaging queue that is durable, fault-tolerant, highly available and scalable. We achieve durability and fault-tolerance by replicating messages across storage hosts, and high availability by leveraging the append-only property of messaging queues and choosing eventual consistency as our basic model. Cherami is also scalable, as the design does not have a single bottleneck. […] Cherami is completely written in Go, a language that makes building highly performant and concurrent system software a lot of fun. Additionally, Cherami uses several libraries that Uber has already open sourced: TChannel for RPC and Ringpop for health checking and group membership. Cherami depends on several third-party open source technologies: Cassandra for metadata storage, RocksDB for message storage, and many other third-party Go packages that are available on GitHub. We plan to open source Cherami in the near future.
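“Competing consumer” means each message is handed to just one of several consumers. In practice, durability also implies redelivery when a consumer dies before acknowledging a message. Below is a minimal in-memory Python sketch of those delivery semantics. It takes an explicit `now` argument so it can be tested deterministically. It is a hypothetical illustration, not Cherami's design, which replicates messages across storage hosts and is written in Go.

```python
from collections import deque


class CompetingConsumerQueue:
    """Toy queue: each message goes to one consumer at a time and is
    redelivered if not acknowledged within ack_timeout (at-least-once)."""

    def __init__(self, ack_timeout):
        self.ack_timeout = ack_timeout
        self.ready = deque()   # (msg_id, message) waiting for a consumer
        self.in_flight = {}    # msg_id -> (message, redelivery deadline)
        self.next_id = 0

    def publish(self, message):
        self.ready.append((self.next_id, message))
        self.next_id += 1

    def receive(self, now):
        """Called by any consumer; returns (msg_id, message) or None."""
        self._requeue_expired(now)
        if not self.ready:
            return None
        msg_id, message = self.ready.popleft()
        self.in_flight[msg_id] = (message, now + self.ack_timeout)
        return msg_id, message

    def ack(self, msg_id):
        self.in_flight.pop(msg_id, None)

    def _requeue_expired(self, now):
        expired = sorted(m for m, (_, deadline) in self.in_flight.items() if deadline <= now)
        for msg_id in expired:
            message, _ = self.in_flight.pop(msg_id)
            self.ready.append((msg_id, message))
```

Redelivery means a slow consumer and its replacement can both see the same message, so consumers of such a queue generally need to be idempotent.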
This is scary shit. It’s amazing how Russia has weaponised transparency, but I guess it’s not new to observers of “kompromat”: https://en.wikipedia.org/wiki/Kompromat
good preso from Percona Live 2015 on the messiness of MySQL vs UTF-8 and utf8mb4
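The core of the mess is a byte limit. MySQL's legacy `utf8` character set (also called `utf8mb3`) stores at most three bytes per character, so it only covers the Basic Multilingual Plane. Emoji and other supplementary-plane characters need four bytes in UTF-8, and therefore need `utf8mb4`. A small Python pre-flight check along those lines might look like this (function names hypothetical):

```python
def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8.

    That is the limit of MySQL's legacy `utf8` / `utf8mb3` character set;
    anything longer (emoji, some CJK extension characters) needs utf8mb4.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


def first_four_byte_char(text: str):
    """Return (index, char) of the first character that needs utf8mb4, or None."""
    for i, ch in enumerate(text):
        if len(ch.encode("utf-8")) == 4:
            return i, ch
    return None
```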
A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. The t-digest algorithm is also very parallel friendly making it useful in map-reduce and parallel streaming applications. The t-digest construction algorithm uses a variant of 1-dimensional k-means clustering to produce a data structure that is related to the Q-digest. This t-digest data structure can be used to estimate quantiles or compute other rank statistics. The advantage of the t-digest over the Q-digest is that the t-digest can handle floating point values while the Q-digest is limited to integers. With small changes, the t-digest can handle any values from any ordered set that has something akin to a mean. The accuracy of quantile estimates produced by t-digests can be orders of magnitude more accurate than those produced by Q-digests in spite of the fact that t-digests are more compact when stored on disk. A super-nice feature is that it’s mergeable, so amenable to parallel usage across multiple hosts if required. Java implementation, ASL licensing.
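Here is a heavily simplified Python sketch of the idea. Values are summarised as weighted centroids. Adjacent centroids are merged only while the merged weight stays under a limit of roughly 4·N·δ·q(1−q), so centroids stay small near the tails, where q is close to 0 or 1. Quantiles are interpolated between centroid centres, and two digests merge by recompressing their combined centroids. The δ value and the class name are assumptions. This is not the Java implementation, and it omits that implementation's buffering and scale-function refinements.

```python
class SimpleTDigest:
    """Toy t-digest: sorted centroids [mean, weight] with a q-dependent size limit."""

    def __init__(self, delta=0.01):
        self.delta = delta      # compression parameter (assumed default)
        self.centroids = []     # sorted by mean: [mean, weight]
        self.count = 0

    def add_all(self, values):
        values = [float(v) for v in values]
        self.count += len(values)
        self._compress(self.centroids + [[v, 1] for v in values])

    def merge(self, other):
        # Mergeability: combine both centroid lists and recompress.
        self.count += other.count
        self._compress(self.centroids + [c[:] for c in other.centroids])

    def _compress(self, points):
        points.sort(key=lambda c: c[0])
        out = []
        seen = 0  # total weight of centroids already closed off
        for mean, weight in points:
            if out:
                cur = out[-1]
                merged = cur[1] + weight
                q = (seen + merged / 2) / self.count  # approx. quantile of the merged centroid
                if merged <= 4 * self.count * self.delta * q * (1 - q):
                    cur[0] += (mean - cur[0]) * weight / merged  # running weighted mean
                    cur[1] = merged
                    continue
                seen += cur[1]
            out.append([mean, weight])
        self.centroids = out

    def quantile(self, q):
        if not self.centroids:
            raise ValueError("empty digest")
        target = q * self.count
        centres, cum = [], 0.0
        for _, weight in self.centroids:
            centres.append(cum + weight / 2)  # rank position of each centroid's centre
            cum += weight
        if target <= centres[0]:
            return self.centroids[0][0]
        for i in range(1, len(centres)):
            if target <= centres[i]:
                lo, hi = centres[i - 1], centres[i]
                frac = (target - lo) / (hi - lo)
                m0, m1 = self.centroids[i - 1][0], self.centroids[i][0]
                return m0 + frac * (m1 - m0)
        return self.centroids[-1][0]
```

Two hosts can each build a digest over their own data and ship only the centroids. Calling `merge` then gives quantiles over the union, which is the parallel-friendly property noted above.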
good hardware recommendations
Good advice — let’s hope it doesn’t come to this. Example: ‘17. Watch out for the paramilitaries: When the men with guns who have always claimed to be against the system start wearing uniforms and marching around with torches and pictures of a Leader, the end is nigh. When the pro-Leader paramilitary and the official police and military intermingle, the game is over.’
Why doesn’t Irish tech startup activity show up in EU-wide comparisons? Turns out we tend to transition to a US-based model, with US-based management and EU-based operations and engineering, like $work does:
Successful Irish tech companies have a skewed geographic profile. This presents a data gathering problem for the data companies but it’s also a strong indicator of the market reality for Irish startups. The size of the local market and a focus on software business in particular means many Irish startups are transitioning to the US (some earlier and with more commitment than others), and getting backed by a spectrum of local and international VCs. Correcting for this put Ireland’s tech venture investment in the second half of 2014 at $125m, midway between Sweden and Finland, 8th in Europe overall.
ooh, Lascaux 4 is finally opening:
St-Cyr added: “It’s impossible for anyone to see the original now, but this is the next best thing. What is lost in not having the real thing is balanced by the fact people can see so much more of the detail of the wonderful paintings and engravings.”
Johanson said it’s possible to use an RFID “gate antenna” — two electronic readers spanning a doorway, similar to the anti-theft gates in retail stores — to scan the credit cards of people passing through. With enough high-powered gates installed at key doorways in a city or across the country, someone could collect comprehensive information on people’s movements, buying habits and social patterns. “These days you can buy a $500 antenna to mount in doorways that can read every card that goes through it,” Johanson said. Amazingly, these seem to be rife with holes — they still use the legacy EMV protocol, do not require online verification with backend systems, and allow replay attacks. A Journal.ie article today claims that attackers are sniffing EMV data, then replaying it against card readers in shops in Dublin; while that claim may not be true, the attack certainly seems viable…
rather dramatic differences
Donald Trump’s media strategy as a form of Surkovian control via post-truth ‘destabilised perception’, through deliberate flooding with fake news:
By attacking the very notion of shared reality, the president-elect is making normal democratic politics impossible. When the truth is little more than an arbitrary personal decision, there is no common ground to be reached and no incentive to look for it. To men like Surkov, that is exactly as it should be. Government policy should not be set through democratic oversight; instead, the government should “manage” democracy, ensuring that people can express themselves without having any influence over the machinations of the state. According to a 2011 openDemocracy article by Richard Sakwa, a professor of Russian and European politics at the University of Kent, Surkov is “considered the main architect of what is colloquially known as ‘managed democracy,’ the administrative management of party and electoral politics.” “Surkov’s philosophy is that there is no real freedom in the world, and that all democracies are managed democracies, so the key to success is to influence people, to give them the illusion that they are free, whereas in fact they are managed,” writes Sakwa. “In his view, the only freedom is ‘artistic freedom.’”
remove RFID from a payment card with a single drilled hole
Nice comparison of a counting Bloom filter and a Cuckoo Filter, implemented in Python:
This post provides an update by exploring Cuckoo filters, a new probabilistic data structure that improves upon the standard Bloom filter. The Cuckoo filter provides a few advantages: 1) it enables dynamic deletion and addition of items 2) it can be easily implemented compared to Bloom filter variants with similar capabilities, and 3) for similar space constraints, the Cuckoo filter provides lower false positives, particularly at lower capacities. We provide a python implementation of the Cuckoo filter here, and compare it to a counting Bloom filter (a Bloom filter variant).
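Here is a compact Python sketch of the standard cuckoo filter scheme, partial-key cuckoo hashing. Each item stores a small fingerprint in one of two buckets. The alternate bucket is the current index XORed with a hash of the fingerprint, so fingerprints can be relocated without knowing the original item. The bucket count, bucket size, 8-bit fingerprints and SHA-256 as the hash are assumptions for illustration. This is not the implementation linked from the post.

```python
import hashlib
import random


class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR alternate index in range
        # and makes alt(alt(i)) == i.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        # Non-zero 8-bit fingerprint (0 is often reserved for "empty" in array layouts).
        return self._hash(b"fp" + item.encode()) % 255 + 1

    def _alt_index(self, i: int, fp: int) -> int:
        return (i ^ self._hash(bytes([fp]))) % self.num_buckets

    def _indices(self, item: str):
        fp = self._fingerprint(item)
        i1 = self._hash(item.encode()) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            # Evict a random fingerprint and move it to its alternate bucket.
            j = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Considered full; as in the basic algorithm, the last evicted fingerprint is dropped.
        return False

    def contains(self, item: str) -> bool:
        fp, i1, i2 = self._indices(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Deletion is what a plain Bloom filter cannot do. Here it only removes one copy of a fingerprint from one of the item's two buckets, so other items are unaffected. Deleting something that was never inserted can still remove a colliding item's fingerprint.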
Football Manager includes what is effectively a parallel universe, so they modelled the effects of Brexit on the UK Premier League: ‘In my own current “save”, Brexit kicked in at the end of season three. Unfortunately I got one of the hard options, where all non-homegrown players are now going through a work permit system, albeit one that’s slightly relaxed. It means I can no longer bring in that 19-year-old Italian keeper I’d been eyeing up as one for the future. Instead I have to wait for him to break into the Italian squad, and play 30% of their fixtures over the next two years. Then he’ll be mine. Meanwhile, my TV revenue has just dropped by a few million. Let’s hope that doesn’t continue, or I won’t even be able to afford him.’
It was recently discovered that some surprising operations on Rust’s standard hash table types could go quadratic. Quite a nice unexpected accidental detour into O(n^2).
This is intriguing — using Jupyter notebooks to embody data analysis work, and ensure it’s reproducible, which brings rigour to analysis in much the same way that unit tests do for code. I must try this.
Reproducibility makes data science at Stripe feel like working on GitHub, where anyone can obtain and extend others’ work. Instead of islands of analysis, we share our research in a central repository of knowledge. This makes it dramatically easier for anyone on our team to work with our data science research, encouraging independent exploration. We approach our analyses with the same rigor we apply to production code: our reports feel more like finished products, research is fleshed out and easy to understand, and there are clear programmatic steps from start to finish for every analysis.
neat — aggregation of histograms for Datadog statsd
auditd -> go-audit -> elasticsearch at Slack
Eir ship vulnerable firmware images AGAIN. ffs
Amazing virtuoso performance — be sure to scroll up all the way to Chapter 1
good call — new EMR feature
LMAX’s approach to acceptance/system-testing time-dependent code. We are doing something similar in Swrve too, so finding that LMAX have taken a similar approach is a great indicator that we’re on the right track.
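The general technique, sketched in Python, is to route every read of “now” through an injectable clock. Tests can then substitute a fake one and move time forward explicitly instead of sleeping. The `Session` class and its TTL are hypothetical stand-ins for time-dependent code. This is the common injected-clock pattern, not LMAX's or Swrve's actual test harness.

```python
import time


class SystemClock:
    def now(self) -> float:
        return time.time()


class FakeClock:
    """Test double: time only moves when the test advances it."""

    def __init__(self, start: float = 0.0):
        self._now = start

    def now(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        self._now += seconds


class Session:
    """Hypothetical time-dependent component; it never calls time.time() directly."""

    def __init__(self, clock, ttl_seconds: float):
        self.clock = clock
        self.expires_at = clock.now() + ttl_seconds

    def is_expired(self) -> bool:
        return self.clock.now() >= self.expires_at
```

Production code builds `Session(SystemClock(), 3600)`. A test builds it with a `FakeClock` and can jump an hour ahead in microseconds, which keeps time-based acceptance tests fast and deterministic.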
scumbags. Attempting to pass off their pissy beer under alternative names to con consumers into buying it! ‘There will be no sanctions against Heineken for passing off non-craft beer as “locally produced”, the Food Safety Authority of Ireland (FSAI) has said. The FSAI and HSE launched a joint investigation last month after it emerged that Heineken Ireland had sold some of its products, including Foster’s lager, under craft-type names such as Blasket Blonde and Beanntrai Bru. Two well-known stouts, Beamish and Murphy’s, were also sold under craft-type names by the international brewing giant. C&C, a Tipperary-based drinks company, was also investigated after it admitted selling its Clonmel 1650 lager under a different name, Pana Cork, in Cork.’
great, I’ve looked for this so many times. Only tricky limit I can spot is the 300 tps limit, and it’s US-East/US-West only for now
good intro to Airflow usage preso
by John Allspaw, Morgan Evans and Daniel Schauenberg; the Etsy blameless postmortem style crystallized into a detailed 27-page PDF ebook
‘bike-shedding’, or needless arguing about trivial issues, actually dates back to 1957 as C. Northcote Parkinson’s ‘law of triviality’
simple usage of Docker, blue/green deploys, and AWS ALBs
ICRs are the perfect material for blackmail, which makes them valuable in a way that traditional telephone records are not. And where potentially large sums of money are involved, corruption is sure to follow. Even if ICR databases are secured with the best available technology, they are still vulnerable to subversion by individuals whose jobs give them ready access. This is no theoretical risk. Just one day ago, it emerged that corrupt insiders at offshore call centres used by Australian telecoms were offering to sell phone records, home addresses, and other private details of customers. Significantly, the price requested was more if the target was an Australian “VIP, politician, police [or] celebrity.”
a low-cost online vendor in Ireland, recommended by @irldexter on ITS (along with webdoctor.ie): ‘For basic consultations I halved the cost €55 to engage a GP with https://www.webdoctor.ie/ down to €25 (for limited domains) and after paying €8.48 and €9.48 respectively for a Ventolin inhaler, I now get them for €3.50 at http://www.purepharmacy.ie/ (closer to mainland EU costs). I also benchmarked my parents’ medicine costs which worked out 40% cheaper too.’
“The scale of the challenge here remains depressing,” says the report. “It has never been viable to build apartment blocks in the vast majority of this country.” […] The report notes that the rise in living costs of almost three quarters in less than five years is “a symptom of strong demand for housing” as economic recovery continues and the population grows. “But there is nothing inevitable about housing costs rising with demand,” it says. “That only happens when supply fails to respond, and the complete absence of any meaningful level of construction over the past five years is a systemic failure in desperate need of policy solutions. “There is no more urgent task facing the Minister for Housing, his department and advisers, and the Housing Agency, than understanding why the costs of building, and building apartments in particular, is so dramatically out of line with our own incomes and indeed with the cost in other countries.”
I’m not remotely interested in shockingly good graphics, in murder simulators, in guns and knives and swords. I’m not that interested in adrenaline. My own life is thrilling enough. There is enough fear and hatred in the world to get my heart pounding. My Facebook feed and Twitter feed are enough for that. Walking outside in summer clothing is enough for that. I’m interested in care, in characters, in creation, in finding a path forward inside games that helps me find my path forward in life. I am interested in compassion and understanding. I’m interested in connecting. As Miranda July said, “all I ever wanted to know is how other people are making it through life.” I want to make games that help other people understand life. We are all overwhelmed with shock, with information, with change. The degree of interactivity in our lives is amazing and wonderful and I wouldn’t exchange it for anything, but it is also shocking and overwhelming and it’s causing us to dig in and try to find some peace by shutting each other out. On all sides of the political spectrum we’ve stopped listening to each other and I fear we are all leaning toward fascist thinking. We should be using this medium to help us adapt to our new, interactive lives. This is how we become relevant.
“Any financial loss that results from this fraudulent activity will be borne by the bank,” Mr Higgins said. “Customers are not at financial risk.” Well, that would be surprising…
Hooray for nuclear power. (via Ossian Smyth)
Ivan’s Childhood, Andrei Rublev, Solaris, The Mirror, and Stalker — all viewable for free on YouTube thanks to Mosfilm. Quality not great, though…
This page contains lecture notes and other course materials for various algorithms classes I have taught at the University of Illinois, Urbana-Champaign. The notes are numbered in the order I cover the material in a typical undergraduate class, with notes on more advanced material (indicated by the symbol ?) interspersed appropriately. […] In addition to the algorithms notes I have been maintaining since 1999, this page also contains new notes on “Models of Computation”, which cover a small subset of the material normally taught in undergraduate courses in formal languages and automata. I wrote these notes for a new junior-level course on “Algorithms and Models of Computation” that Lenny Pitt and I developed, which is now required for all undergraduate computer science and computer engineering majors at UIUC. (via Tony Finch)
“I started the site for a easy way to make money,” said a 17-year-old who runs a site [from Veles] with four other people. “In Macedonia the economy is very weak and teenagers are not allowed to work, so we need to find creative ways to make some money. I’m a musician but I can’t afford music gear. Here in Macedonia the revenue from a small site is enough to afford many things.”
‘an antagonistic GSM base station [disguised] in the form of an innocuous office printer. It brings the covert design practice of disguising cellular infrastructure as other things – like trees and lamp-posts – indoors, while mimicking technology used by police and intelligence agencies to surveil mobile phone users.’
wow, Docker Swarm looks like a turkey right now if performance is important. Only “host” gives reasonable perf numbers
Subreddit devoted to becoming a software developer in Ireland, with a decent wiki
In short, the answer to the question “is this what it would look like if I was there?” is almost always no, but that is true of every photograph. The photos taken from space cameras are no more fake or false than the photos taken from any camera. Like all photos they are a visual interpretation using color to display data. Most space photos have information online about how they were created, what filters were used, and all kinds of interesting details about processing. The discussion about whether a space photo is real or fake is meaningless. There’s no distinction between photoshopped and not. It’s a nuanced view but the nature of the situation demands it.
LOL as DST bug uncovers spurious automated noise complaints:
In January last year the airport unearthed a scheme whereby campaigners were using automated software to generate complaints against the airport. Officials caught out the set-up when the two anti-Heathrow enthusiasts forgot to take into account the hour going back in October, and began complaining about flights that had not yet taken off or arrived.
Well, this is amazingly awful:
The Guardian claims to have further details of the kind of tell-tale signs that Admiral’s algorithmic analysis would have looked out for in Facebook posts. Good traits include “writing in short concrete sentences, using lists, and arranging to meet friends at a set time and place, rather than just ‘tonight’.” On the other hand, “evidence that the Facebook user might be overconfident—such as the use of exclamation marks and the frequent use of ‘always’ or ‘never’ rather than ‘maybe’—will count against them.” The future is shitty.
An improved hashing algorithm called optimistic cuckoo hashing, and a CLOCK-based eviction algorithm that works in tandem with it. They are evaluated in the context of Memcached, where combined they give up to a 30% memory usage reduction and up to a 3x improvement in queries per second as compared to the default Memcached implementation on read-heavy workloads with small objects (as is typified by Facebook workloads).
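CLOCK approximates LRU with one reference bit per entry. A hit just sets the bit, which avoids updating an LRU list on every read. Eviction sweeps a circular “hand” over the slots, clearing set bits (a second chance) and evicting the first entry whose bit is already clear. Below is a minimal single-threaded Python sketch of plain CLOCK. It does not attempt MemC3's optimistic cuckoo hashing or its concurrency, and the class name is hypothetical.

```python
class ClockCache:
    """Fixed-capacity cache with CLOCK (second-chance) eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [None] * capacity      # key stored in each slot
        self.ref = [False] * capacity      # one reference bit per slot
        self.index = {}                    # key -> (slot, value)
        self.hand = 0

    def get(self, key):
        entry = self.index.get(key)
        if entry is None:
            return None
        slot, value = entry
        self.ref[slot] = True  # a hit only sets a bit; nothing is moved
        return value

    def put(self, key, value):
        if key in self.index:
            slot, _ = self.index[key]
            self.index[key] = (slot, value)
            self.ref[slot] = True
            return
        slot = self._claim_slot()
        self.keys[slot] = key
        self.ref[slot] = False  # this sketch starts new entries unreferenced
        self.index[key] = (slot, value)

    def _claim_slot(self):
        while True:
            slot = self.hand
            self.hand = (self.hand + 1) % self.capacity
            if self.keys[slot] is None:
                return slot  # free slot
            if self.ref[slot]:
                self.ref[slot] = False  # second chance
            else:
                del self.index[self.keys[slot]]  # evict
                return slot
```

For example, with capacity 3, inserting `a`, `b` and `c`, reading `a` and then inserting `d` evicts `b`. `a` survives because its bit was set, and the hand clears that bit as it passes.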