‘We suck at dealing with abuse and trolls on the platform and we’ve sucked at it for years. It’s no secret and the rest of the world talks about it every day. We lose core user after core user by not addressing simple trolling issues that they face every day. I’m frankly ashamed of how poorly we’ve dealt with this issue during my tenure as CEO. It’s absurd. There’s no excuse for it. I take full responsibility for not being more aggressive on this front. It’s nobody else’s fault but mine, and it’s embarrassing. We’re going to start kicking these people off right and left and making sure that when they issue their ridiculous attacks, nobody hears them. Everybody on the leadership team knows this is vital.’
More like this!
nice deep-dive from Adrian Colyer
Excellent post — Delta sounds like a very well-designed product
Today sees the publication of a report I [Ross Anderson] helped to write for the Nuffield Bioethics Council on what happens to medical ethics in a world of cloud-based medical records and pervasive genomics. As the information we gave to our doctors in private to help them treat us is now collected and treated as an industrial raw material, there has been scandal after scandal. From failures of anonymisation through unethical sales to the care.data catastrophe, things just seem to get worse. Where is it all going, and what must a medical data user do to behave ethically? We put forward four principles. First, respect persons; do not treat their confidential data as if it were coal or bauxite. Second, respect established human-rights and data-protection law, rather than trying to find ways round it. Third, consult people who’ll be affected or who have morally relevant interests. And fourth, tell them what you’ve done – including errors and security breaches.
A good overview — I like the summary table. tl;dr:
If you are light on DevOps and not latency sensitive use SQS for job management and Kinesis for event stream processing. If latency is an issue, use ELB or 2 RabbitMQs (or 2 beanstalkds) for job management and Redis for event stream processing.
Al Tobey does some trial runs of -XX:+AlwaysPreTouch and -XX:+UseHugePages
ahh, interesting! This looks much easier (via JBaruch)
Marc Brooker: ‘When it comes to building working software in the long term, the emotional pursuit of craft is not as important as the human pursuit of teamwork, or the intellectual pursuit of correctness. Patterns is one of the most powerful ideas we have. The critics may be right that it devalues the craft, but we would all do well to remember that the craft of software is a means, not an end.’
Via Walter, the best description of the appeal of Minecraft I’ve read:
Minecraft is exceptionally good at intrinsic narrative. It recognises, preserves and rewards everything you do. It presses you to play frontiersman. A Minecraft world ends up dotted with torchlit paths, menhirs, landmarks, emergency caches. Here’s the hole where you dug stone for your first house. Here’s the causeway you built from your spawn point to a handy woodland. Here’s the crater in the landscape where the exploding monster took out you and your wheatfield at once. And, of course, here’s your enormous castle above a waterfall. There’s no utility in building anything bigger than a hut, but the temptations of architecture are irresistible. Minecraft isn’t so much a world generator as a screenshot-generator and a war-story generator. This is what will get the game the bulk of its critical attention, and deservedly so. That’s why I want to call attention to the extrinsic narrative. It’s minimal, implicit, accidental and very powerful. It’s this: you wake alone beside an endless sea in a pristine, infinite wilderness. The world is yours. You can literally sculpt mountains, with time and effort. You’ll die and be reborn on the beach where you woke first. You’ll walk across the world forever and never see another face. You can build a whole empire of roads and palaces and beacon towers, and the population of that empire will only ever be you. When you leave, your towers will stand empty forever. I haven’t seen that surfaced in a game before. It’s strong wine.
whoa, this is pretty excellent. The major improvement over a graphite-based system would be the multi-dimensional tagging of metrics, which we currently have to fake by expanding the graphite metric’s name to encompass all those dimensions and then searching across those names at query time, which is inefficient.
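The name-expansion workaround versus a real tag index, as a minimal Python sketch. The dimension order, series and tag names below are made up for illustration, and the segment-wise glob only approximates Graphite's own matching rules. The point is the query cost: the Graphite-style path scans every series name on each query, while the tag index intersects small per-tag sets.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical dimension order used to flatten tags into a Graphite-style name.
DIMENSION_ORDER = ("region", "host", "service")


def graphite_name(metric, tags):
    """Flatten tags into one dotted name, e.g. 'us-east.web01.http.requests'."""
    return ".".join([tags[d] for d in DIMENSION_ORDER] + [metric])


def graphite_glob(names, pattern):
    """Scan every name; '*' matches within a single dot-separated segment."""
    want = pattern.split(".")
    hits = []
    for name in names:  # linear in the number of series, on every query
        parts = name.split(".")
        if len(parts) == len(want) and all(
            fnmatchcase(p, w) for p, w in zip(parts, want)
        ):
            hits.append(name)
    return sorted(hits)


class TagIndex:
    """Inverted index from (tag, value) pairs to the series carrying them."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, series_key, tags):
        for tag, value in tags.items():
            self._index[(tag, value)].add(series_key)

    def query(self, **tags):
        # Intersect one posting set per requested tag; no scan over all series.
        sets = [self._index.get((t, v), set()) for t, v in tags.items()]
        return sorted(set.intersection(*sets)) if sets else []


series = [
    ("requests", {"region": "us-east", "host": "web01", "service": "http"}),
    ("requests", {"region": "us-east", "host": "web02", "service": "http"}),
    ("requests", {"region": "eu-west", "host": "web03", "service": "http"}),
]
names = [graphite_name(metric, tags) for metric, tags in series]
index = TagIndex()
for name, (metric, tags) in zip(names, series):
    index.add(name, dict(tags, metric=metric))

print(graphite_glob(names, "us-east.*.*.requests"))
print(index.query(region="us-east", metric="requests"))
```

Both queries return the same two us-east series; only the amount of work done to find them differs.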
Good example of a clean Java OSS release, from SoundCloud. Will be copying bits of this myself soon…
A good set of basic, controversy-free guidelines for clean Java code style
from 1946 to present
According to a major new study in the journal ‘Pediatrics’, trying to [persuade anti-vaxxers to vaccinate] may actually make the problem worse. The paper tested the effectiveness of four separate pro-vaccine messages, three of which were based very closely on how the Centers for Disease Control and Prevention (CDC) itself talks about vaccines. The results can only be called grim: Not a single one of the messages was successful when it came to increasing parents’ professed intent to vaccinate their children. And in several cases the messages actually backfired, either increasing the ill-founded belief that vaccines cause autism or even, in one case, apparently reducing parents’ intent to vaccinate.
“dysaguria” is the perfect noun, and “dysagurian” is the perfect adjective, to describe the eponymous company in Dave Eggers’ The Circle. It’s not in the same league as Orwell, or Huxley, or Bradbury, or Burgess. But it does raise very important questions about what could possibly go wrong if one company controlled all the world’s information. In the novel, the company operates according to the motto “all that happens must be known”; and one of its bosses, Eamon Bailey, encourages everywoman employee Mae Holland to live an always-on (clear, transparent) life according to the maxims “secrets are lies”, “sharing is caring”, and “privacy is theft”. Eggers’s debts to dystopian fiction are apparent. But, whereas writers like Orwell, Huxley, Bradbury, and Burgess were concerned with totalitarian states, Eggers is concerned with a totalitarian company. However, the noun “dystopia” and the adjective “dystopian” – perfect though they are for the terror of military/security authoritarianism in 1984, or Brave New World, or Fahrenheit 451, or A Clockwork Orange – do not, to my mind, encapsulate the nightmare of industrial/corporate tyranny in The Circle. On the other hand, “dysaguria” as a noun and “dysagurian” as an adjective, in my view, really do capture the essence of that “frightening company”.
Via negatendo: ‘I would like to share my excitement about the fact that after almost a year of development, an instance of my NetHack bot has finally managed to ascend a game for the first time without human interventions, wizard mode cheats or bones stuffing, and did so at the public server at acehack.de.’ The bot is written in Clojure. Apparently ‘pudding farming’ did the trick…
League of Legends has set up private network links to a variety of major US ISPs to avoid internet weather (via Nelson)
Because there exists no method known to man more terribly suited to expose the cosmic meaninglessness of existence than pairing the words of H.P. Lovecraft with seemingly delightful and charming pictures of adorable kittens.
These are very good — bookmarking for the next time I’m using gdb, probably about 3 years from now
For years, we’ve been working on a strategy to end mass surveillance of digital communications of innocent people worldwide. Today we’re laying out the plan, so you can understand how all the pieces fit together—that is, how U.S. advocacy and policy efforts connect to the international fight and vice versa. Decide for yourself where you can get involved to make the biggest difference. This plan isn’t for the next two weeks or three months. It’s a multi-year battle that may need to be revised many times as we better understand the tools and authorities of entities engaged in mass surveillance and as more disclosures by whistleblowers help shine light on surveillance abuses.
This group aims to consolidate opposition, give clear information and support letter writing and information awareness against the Dept. of Education’s Primary Online Database.
Fraud in Apple Pay will, in time, come to be managed – but the fact that easily available PII can waylay best-in-class protection should give us all pause.
Fred Logue notes how this failed Mayo TD Michelle Mulherin:
From recent reports it now appears that the Department of Education is discussing anonymisation of the Primary Online Database with the Data Protection Commissioner. Well, someone should ask Mayo TD Michelle Mulherin how anonymisation is working for her. The Sunday Times reports that Ms Mulherin was the only TD in the Irish parliament on the dates when expensive phone calls were made to a mobile number in Kenya. The details of the calls were released under the Freedom of Information Act in an “anonymised” database. While it must be said that the fact that Ms Mulherin was the only TD present on those occasions does not prove she made the calls – the reporting in the press is now raising the possibility that it was her. From a data protection point of view this is a perfect example of the difficulty with anonymisation. Data protection rules apply to personal data which is defined as data relating to a living individual who is or can be identified from the data or from the data in conjunction with other information. Anonymisation is often cited as a means for processing data outside the scope of data protection law but as Ms Mulherin has discovered individuals can be identified using supposedly anonymised data when analysed in conjunction with other data. In the case of the mysterious calls to Kenya even though the released information was “anonymised” to protect the privacy of public representatives, the phone log used in combination with the attendance record of public representatives and information on social media was sufficient to identify individuals and at least raise evidence of association between individuals and certain phone calls. While this may be well and good in terms of accounting for abuses of the phone service it also has worrying implications for the ability of public representatives to conduct their business in private. The bottom line is that anonymisation is very difficult if not impossible as Ms Mulherin has learned to her cost.
It certainly is a lot more complex than simply removing names and other identifying features from a single dataset. The more data that there is and the more diverse the sources the greater the risk that individuals can be identified from supposedly anonymised datasets.
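A toy illustration of the kind of linkage described above, in Python. Everything here is hypothetical (the pseudonymous line IDs, the day labels, the member names); it just shows how intersecting a pseudonymised call log with a public attendance record can narrow a pseudonym down to one person, without any name ever appearing in the call log.

```python
# Hypothetical "anonymised" release: (day, pseudonymous line ID, destination).
CALL_LOG = [
    ("day1", "ext-17", "KE"),
    ("day2", "ext-17", "KE"),
    ("day3", "ext-17", "KE"),
    ("day2", "ext-02", "FR"),
]

# Hypothetical public attendance record: day -> members present.
ATTENDANCE = {
    "day1": {"member_A", "member_B", "member_C"},
    "day2": {"member_A", "member_C", "member_D"},
    "day3": {"member_A", "member_D"},
}


def candidates(call_log, attendance, line_id):
    """Members present on every day the given pseudonymous line was used."""
    days = sorted({day for day, lid, _ in call_log if lid == line_id})
    if not days:
        return set()
    # Each extra day of calls can only shrink the candidate set.
    return set.intersection(*(attendance.get(day, set()) for day in days))


print(candidates(CALL_LOG, ATTENDANCE, "ext-17"))  # {'member_A'}: re-identified
print(candidates(CALL_LOG, ATTENDANCE, "ext-02"))  # three candidates remain
```

The more days (and the more auxiliary datasets) you can intersect, the smaller the candidate set gets, which is exactly the "more data, more diverse sources, greater risk" point.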
Nice wrapper for ‘tc’ and ‘netem’, for network latency/packet loss emulation
ohhhh this is very nice indeed. Great viz!
538 apply their number-crunching skills to the BoardGameGeek ratings index
Pretty amazing specs for a 33 quid SBC.
* Amlogic ARM® Cortex®-A5 (ARMv7) 1.5GHz quad core CPUs
* Mali™-450 MP2 GPU (OpenGL ES 2.0/1.1 enabled for Linux and Android)
* 1Gbyte DDR3 SDRAM
* Gigabit Ethernet
* 40pin GPIOs
* eMMC4.5 HS200 Flash Storage slot / UHS-1 SDR50 MicroSD Card slot
* USB 2.0 Host x 4, USB OTG x 1
* Infrared (IR) Receiver
* Uses Ubuntu 14.04 or Android KitKat operating systems
Includes HDMI out. (via Conor O’Neill)
good description of the process
A bot created by a group of artists spent the last few months selecting items at random from a Silk Road-style darknet marketplace, buying them with Bitcoin, and having them shipped to a gallery in Switzerland. After it bought some ecstasy pills and a counterfeit passport, we asked: How will authorities deal with the complex legal and moral issue of a piece of artificial intelligence breaking the law? It turns out, the answer was simple: just arrest the computer.
Java Concurrency Tools for the JVM. This project aims to offer some concurrent data structures currently missing from the JDK:
* Bounded lock free queues
* SPSC/MPSC/SPMC/MPMC variations for concurrent queues
* Alternative interfaces for queues (experimental)
* Offheap concurrent ring buffer for ITC/IPC purposes (experimental)
* Executor (planned)
Good, and very accessible even for FP noobs like myself ;)
Great slide deck from Elasticsearch on JVM/dist-sys performance optimization
Nice trick — wrap servers with a libc wrapper to intercept bind(2) and accept(2) calls, so that transparent restarts become possible
This is spot on —
By flooding the system with false positives, big-data approaches to counterterrorism might actually make it harder to identify real terrorists before they act. Two years before the Boston Marathon bombing, Tamerlan Tsarnaev, the older of the two brothers alleged to have committed the attack, was assessed by the city’s Joint Terrorism Task Force. They determined that he was not a threat. This was one of about a thousand assessments that the Boston J.T.T.F. conducted that year, a number that had nearly doubled in the previous two years, according to the Boston F.B.I. As of 2013, the Justice Department has trained nearly three hundred thousand law-enforcement officers in how to file “suspicious-activity reports.” In 2010, a central database held about three thousand of these reports; by 2012 it had grown to almost twenty-eight thousand. “The bigger haystack makes it harder to find the needle,” Sensenbrenner told me. Thomas Drake, a former N.S.A. executive and whistle-blower who has become one of the agency’s most vocal critics, told me, “If you target everything, there’s no target.”
‘All deleted tweets from politicians’. Great idea
The Youtube music service was introduced to me as a win win and they don’t understand why I don’t see it that way. “We are trying to create a new revenue stream on top of the platform that exists today.” A lot of people in the music industry talk about Google as evil. I don’t think they are evil. I think they, like other tech companies, are just idealistic in a way that works best for them. I think this because I used to be one of them. The people who work at Google, Facebook, etc can’t imagine how everything they make is not, like, totally awesome. If it’s not awesome for you it’s because you just don’t understand it yet and you’ll come around. They can’t imagine scenarios outside their reality and that is how they inadvertently unleash things like the algorithmic cruelty of Facebook’s yearly review (which showed me a picture I had posted after a doctor told me my husband had 6-8 weeks to live).
Jacobin Magazine on the revolutionary political allegory in “Snowpiercer”: ‘If Snowpiercer had merely told the tale of an oppressed working class rising up to seize power from an evil overlord, it would already have been an improvement over most of the political messages in mainstream cinema. There are all sorts of nice touches in its portrayal of a declining capitalism that can maintain its ideological legitimacy even when it literally has no more bullets in its guns. But the story Bong tells goes beyond that. It’s about the limitations of a revolution which merely takes over the existing social machinery rather than attempting to transcend it.’
A great resource bookmark from Falkvinge.
There are at least four good reasons to reject this argument solidly and uncompromisingly: The rules may change, it’s not you who determine if you’re guilty, laws must be broken for society to progress, and privacy is a basic human need.
‘Reasons abound for international entrepreneurs and top technical talent to stay away from Silicon Valley and build their startup somewhere else.’ Strongly agreed. This factoid is particularly nuts: ‘As Balaji Srinivasan of a16z has observed, roughly 50%+ of the capital allocated for early stage tech investments is actually flowing into Bay Area real estate, directly through office rentals and indirectly via home rentals as a primary driver of skyrocketing salaries.’
A much better carbon-relay, written in C rather than Python. Linking as we’ve been using it in production for quite a while with no problems.
The main reason to build a replacement is performance and configurability. Carbon is single threaded, and sending metrics to multiple consistent-hash clusters requires chaining of relays. This project provides a multithreaded relay which can address multiple targets and clusters for each and every metric based on pattern matches.
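Roughly what "multiple targets and clusters for each and every metric based on pattern matches" means, as a single-threaded Python sketch. The rule format, cluster names and addresses are hypothetical, and the md5-based ring is a generic consistent-hashing scheme, not necessarily the hashing carbon-c-relay itself uses, so don't expect it to pick the same destinations. It also leaves out the multithreading, which is the other half of the performance story.

```python
import bisect
import hashlib
import re


class HashRing:
    """Generic consistent-hash ring with virtual nodes (md5-based here)."""

    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, metric):
        # First ring position clockwise from the metric's hash, wrapping around.
        i = bisect.bisect(self._keys, self._hash(metric)) % len(self._ring)
        return self._ring[i][1]


# Hypothetical routing rules: (regex, cluster); every matching rule fires.
ROUTES = [
    (re.compile(r"^sys\."), "system"),
    (re.compile(r"^app\."), "application"),
    (re.compile(r".*"), "archive"),
]

CLUSTERS = {
    "system": HashRing(["sys-a:2003", "sys-b:2003"]),
    "application": HashRing(["app-a:2003", "app-b:2003", "app-c:2003"]),
    "archive": HashRing(["archive-a:2003"]),
}


def route(metric):
    """Return (cluster, destination) for every cluster whose pattern matches."""
    return [
        (cluster, CLUSTERS[cluster].node_for(metric))
        for pattern, cluster in ROUTES
        if pattern.search(metric)
    ]


print(route("app.web01.requests"))
print(route("sys.cpu.load"))
```

A metric like app.web01.requests matches both the application and archive rules, so it gets one copy per cluster, and always lands on the same node within each cluster; with a single-threaded, single-target relay you'd need a chain of relays to get the same fan-out.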
Blanket surveillance of social media is not the solution to combating terrorism and the rights of the individual to privacy must be protected, Data Protection Minister Dara Murphy said on Monday. [He] said Ireland and the European Union must protect the privacy rights of individuals on social media. “Freedom of expression, freedom of movement, and the protection of privacy are core tenets of the European Union, which must be upheld.”
‘Here’s a story for you. I’m not a party to any of this. I’ve done nothing wrong, I’ve never been suspected of doing anything wrong, and I don’t know anyone who has done anything wrong. I don’t even mean that in the sense of “I pissed off the wrong people but technically haven’t been charged.” I mean that I am a vanilla, average, 9-5 working man of no interest to anybody. My geographical location is an accident of my birth. Even still, I wasn’t accidentally born in a high-conflict area, and my government is not at war. I’m a sysadmin at a legitimate ISP and my job is to keep the internet up and running smoothly. This agency has stalked me in my personal life, undermined my ability to trust my friends attempting to connect with me on LinkedIn, and infected my family’s computer. They did this because they wanted to bypass legal channels and spy on a customer who pays for services from my employer. Wait, no, they wanted the ability to potentially spy on future customers. Actually, that is still not accurate – they wanted to spy on everybody in case there was a potentially bad person interacting with a customer. After seeing their complete disregard for anybody else, their immense resources, and their extremely sophisticated exploits and backdoors – knowing they will stop at nothing, and knowing that I was personally targeted – I’ll be damned if I can ever trust any electronic device I own ever again. You all rationalize this by telling me that it “isn’t surprising”, and that I don’t live in the [USA,UK] and therefore I have no rights. I just have one question. Are you people even human?’
‘Broadly, they are satisfied with what we are doing’ versus: ‘We have deep concerns about the Eircode initiative… We want to state clearly that we are not at all ‘satisfied’ with the postcode that has been designed or the implementation proposals.’
The young women interns [in one story in this post] worked in a very different way. As I explored their notes, I noticed that ideas were expanded upon, not abandoned. Challenges were identified, but the male language so often heard in Silicon Valley conference rooms – “Well, let me tell you what the problem with that idea is….” – was not in the room. These young women, without men to define the “appropriate business behavior,” used different behaviors and came up with a startling and valuable solution. They showed many of the values that exist outside of dominance-based leadership: strategic thinking, intuition, nurturing and relationship building, values-based decision-making and acceptance of others’ input. Women need space to be themselves at work. Until people who have created their success by worshipping at the temple of male behavior, like Sheryl Sandberg, learn to value alternate behaviors, the working world will remain a foreign and hostile culture to women. And if we do not continuously work to build corporate cultures where there is room for other behaviors, women will be cast from or abandoned in a world not of our making, where we continuously “just do not fit in,” but where we still must go to earn our livings.
Heh, nice trolling.
Here are two helpful guidelines (for largely disjoint populations): If you are going to use a big data system for yourself, see if it is faster than your laptop. If you are going to build a big data system for others, see that it is faster than my laptop. [...] We think everyone should have to do this, because it leads to better systems and better research.
Give them the power, they’ll use that power. ‘A document obtained under Freedom of Information legislation confirms the BBC’s use of RIPA in Northern Ireland. It states: “The BBC may, in certain circumstances, authorise under the Regulation of Investigatory Powers Act 2000 and Regulation of Investigatory Powers (British Broadcasting Corporation) Order 2001 the lawful use of detection equipment to detect unlicensed use of television receivers… the BBC has used detection authorised under this legislation in Northern Ireland.”’
Researchers are warned off [discussing] 512-bits-plus key lengths, systems “designed or modified to perform cryptanalytic functions”, or “designed or modified to use ‘quantum cryptography’”. [....] “an email to a fellow academic could land you a 10-year prison sentence”.
https://twitter.com/_miw/status/556023024009224192 notes ‘the DSGL 5A002 defines it as >512bit RSA, >512bit DH, >112 bit ECC and >56 bit symmetric ciphers; weak as fuck i say.’
I drive a Toyota, and this is scary stuff. Critical software systems need to be coded with care, and this isn’t it — they don’t even have a bug tracking system!
Investigations into potential causes of Unintended Acceleration (UA) for Toyota vehicles have made news several times in the past few years. Some blame has been placed on floor mats and sticky throttle pedals. But, a jury trial verdict was based on expert opinions that defects in Toyota’s Electronic Throttle Control System (ETCS) software and safety architecture caused a fatal mishap. This talk will outline key events in the still-ongoing Toyota UA litigation process, and pull together the technical issues that were discovered by NASA and other experts. The results paint a picture that should inform future designers of safety critical software in automobiles and other systems.
“We have spoken to the National Consumer Agency, logistics companies and Digital Rights Ireland, with which we have had an in-depth conversation to see if there is anything in the proposal that might be considered to have an impact on anyone’s privacy. Broadly, they are satisfied with what we are doing,” [Patricia Cronin, head of the Department of Communications’ postcodes division] told the committee. However, in his letter, [DRI's] O’Lachtnain said the group “want to state clearly that we are not at all ‘satisfied’ with the postcode that has been designed or the implementation proposals”.
Some nerve!
Today, 23andMe announced what Forbes reports is only the first of ten deals with big biotech companies: Genentech will pay up to $60 million for access to 23andMe’s data to study Parkinson’s. You think 23andMe was about selling fun DNA spit tests for $99 a pop? Nope, it’s been about selling your data all along.
Really nice time series dashboarding app. Might consider replacing graphitus with this…
This is pretty incredible.
Balzer downloaded a free software program called InVesalius, developed by a research center in Brazil to convert MRI and CT scan data to 3D images. He used it to create a 3D volume rendering from Scott’s DICOM images, which allowed him to look at the tumor from any angle. Then he uploaded the files to Sketchfab and shared them with neurosurgeons around the country in the hope of finding one who was willing to try a new type of procedure. Perhaps unsurprisingly, he found the doctor he was looking for at UPMC, where Scott had her thyroid removed. A neurosurgeon there agreed to consider a minimally invasive operation in which he would access the tumor through Scott’s left eyelid and remove it using a micro drill. Balzer had adapted the volume renderings for 3D printing and produced a few full-size models of the front section of Scott’s skull on his MakerBot. To help the surgeon vet his micro drilling idea and plan the procedure, Balzer packed up one of the models and shipped it off to Pittsburgh.
Some good advice and guidelines (although some are just silly).
The researchers started with 86,000 subjects who had filled out the 100-question personality profile – and this, of course, was done as another app on Facebook – and whose personality scores had been matched by algorithms with their Facebook likes. They then found 17,000 who were willing to have a friend or family member take the personality test on their behalf, trying to predict the answers they would give. The results, from most humans, were stunningly inaccurate. Friends, family and co-workers were all less able to predict how someone would fill out a personality test than the algorithms that had been primed with the subject’s Facebook likes. With only 10 likes to work on, the computer was more accurate than a work colleague would be. With 150 likes, it described the subject’s personality better than a parent or sibling could. And with 300 likes to work on, it was more accurate than a spouse.
One insider at a major US technology firm told the Guardian that “politicians are fond of asking why it is that tech companies don’t base themselves in the UK” … “I think if you’re saying that encryption is the problem, at a time when consumers and businesses see encryption as a very necessary part of trust online, that’s a very indicative point of view.”
ffs Apple. (Via Tony Finch)
A good reference URL to cut-and-paste when “scanning internet traffic for terrorist plots” rears its head:
This unrealistically accurate system will generate 1 billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999 percent and you’re still chasing 2,750 false alarms per day — but that will inevitably raise your false negatives, and you’re going to miss some of those 10 real plots.
Also, Ben Goldacre saying the same thing: http://www.badscience.net/2009/02/datamining-would-be-lovely-if-it-worked/
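The arithmetic behind those numbers, as a quick Python check. The inputs are my assumption rather than part of the quote: roughly a trillion monitored events a year, 10 real plots a year, and a 1-in-100 false-positive rate for the "unrealistically accurate" system. Those inputs reproduce the quoted figures (the 2,750/day is a rounded ~2,740). It also applies the false-positive rate to every event, ignoring the handful of real ones.

```python
from fractions import Fraction

# Assumed inputs (not stated in the quote): ~1 trillion events/year, 10 real plots/year.
EVENTS_PER_YEAR = 10**12
REAL_PLOTS_PER_YEAR = 10


def false_alarm_load(false_positive_rate):
    """Return (false alarms per year, per real plot, per day).

    Approximation: applies the rate to every event, ignoring the few real plots.
    Fractions keep the arithmetic exact.
    """
    per_year = EVENTS_PER_YEAR * false_positive_rate
    return per_year, per_year / REAL_PLOTS_PER_YEAR, per_year / 365


for accuracy, rate in [("99%", Fraction(1, 100)), ("99.9999%", Fraction(1, 10**6))]:
    per_year, per_plot, per_day = false_alarm_load(rate)
    print(f"{accuracy}: {int(per_year):,} alarms/year, "
          f"{int(per_plot):,} per real plot, ~{round(per_day):,} per day")
```

This prints about 27.4 million false alarms a day at 99%, and still about 2,740 a day at 99.9999%: the base rate dominates no matter how good the classifier is.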
The Prime Minister said today that he would stop the use of methods of communication that cannot be read by the security services even if they have a warrant. He said: “In our country, do we want to allow a means of communication between people which […] we cannot read?” He made the connection between encrypted communications tools and letters and phone conversations, both of which can be read by security services in extreme situations and with a warrant from the home secretary.
Is this key escrow for the UK?
this is a great shopping list ;)
Lots and lots of good detail into the Spotify C* setup (via Bill de hOra)
The report’s revelations, based on a survey of nearly 800 writers worldwide, are alarming. Concern about surveillance is now nearly as high among writers living in democracies (75%) as among those living in non-democracies (80%). The levels of self-censorship reported by writers living in democratic countries are approaching the levels reported by writers living in authoritarian or semi-democratic countries.
the urgency of repealing the Irish blasphemy legislation cannot now be overstated. The same cartoons that saw their authors murdered for blasphemy recently, would see Irish authors hauled before our courts. The same nations that execute their citizens for blasphemy, wish to promote the wording of the Irish blasphemy legislation through the UN, in order to expand such provisions to more countries. Ireland is the only European country to recently introduce a new blasphemy law. Following the horrific recent events in Paris, let us be the next country to repeal our blasphemy laws.
If you haven’t heard about it, it is a compulsory database of the personal information of children, including PPS numbers, ethnicity, race and language skills, to be held for decades and shared across State agencies.
What if Silicon Valley had emerged from a racially integrated community? Would the technology industry be different? Would we? And what can the technology industry do now to avoid repeating the mistakes of the past?
Amazing article — this is the best thing I’ve ever read on TechCrunch: the political history of race in Silicon Valley and East Palo Alto.
All of our assets loaded via the CDN [to our client in Australia] in just under 5 seconds. It only took ~2.7s to get those same assets to our friends down under with SPDY. The performance with no CDN blew the CDN performance out of the water. It is just no comparison. In our case, it really seems that the advantages of SPDY greatly outweigh that of a CDN when it comes to speed.
Excellent “In Focus” this week — ‘The continued massive growth of connected mobile devices is shaping not only how we communicate with each other, but how we look, behave, and experience the world around us. Smartphones and other handheld devices have become indispensable tools, appendages held at arm’s length to record a scene or to snap a selfie. Recent news photos show refugees fleeing war-torn regions holding up their phones as prized possessions to be saved, and relatives of victims lost to a disaster holding up their smartphones to show images of their loved ones to the press. Celebrity selfies, people alone in a crowd with their phones, events obscured by the very devices used to record that event, the brightly lit faces of those bent over their small screens, these are some of the scenes depicted below.’
‘Unlike existing alternatives, such as stream processing, that favor the execution of arbitrary application code, we want to capture much of the processing logic as a set of known operations over specialized Computational CRDTs, with particular semantics and invariants, such as min/max/average/median registers, accumulators, top-N sets, sorted sets/maps, and so on. Keeping state also allows the system to decrease the amount of propagated information. Preliminary results obtained in a single example show that Titan has a higher throughput when compared with state-of-the-art stream processing systems.’
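For anyone who hasn't met state-based CRDTs: each replica keeps a small piece of state, and replicas combine by a merge that is commutative, associative and idempotent, so it doesn't matter in what order (or how many times) state is exchanged. A minimal Python sketch of two of the types the abstract lists, a max register and a top-N set; this is a generic illustration, not Titan's design or API.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class MaxRegister:
    """Max register: merge takes the larger value."""
    value: float = float("-inf")

    def update(self, x):
        return MaxRegister(max(self.value, x))

    def merge(self, other):
        return MaxRegister(max(self.value, other.value))


@dataclass(frozen=True)
class TopN:
    """Top-N set of (score, key) pairs: merge keeps the n largest of the union.

    Assumes all replicas use the same n.
    """
    n: int
    items: frozenset = frozenset()

    def _keep_top(self, items):
        return TopN(self.n, frozenset(sorted(items, reverse=True)[: self.n]))

    def add(self, score, key):
        return self._keep_top(self.items | {(score, key)})

    def merge(self, other):
        return self._keep_top(self.items | other.items)

    def ranking(self):
        return sorted(self.items, reverse=True)


# Three replicas each see a different slice of the stream.
r1 = TopN(2).add(5, "a").add(1, "b")
r2 = TopN(2).add(7, "c")
r3 = TopN(2).add(3, "d").add(5, "a")

# Merge order doesn't matter, and re-merging is harmless.
merged = r1.merge(r2).merge(r3)
assert merged == r3.merge(r1).merge(r2) == merged.merge(merged)
print(merged.ranking())  # [(7, 'c'), (5, 'a')]

peak = reduce(MaxRegister.merge, [MaxRegister().update(v) for v in (3, 9, 4)])
print(peak.value)  # 9
```

Note that each replica only ever ships n pairs, not its whole input, which is the "keeping state decreases propagated information" point from the abstract.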
‘Turn websites into structured APIs from your browser in seconds’ — next-generation web scraping, recommended by conoro
as one insider told me, it feels like “Lab126 is in the doghouse” and that “Jeff is taking out his frustration with the failure of the Fire Phone” on upper management.
a conceptual model, with accompanying XML schema, that may be used to quantify and exchange complex uncertainties in data. The interoperable model can be used to describe uncertainty in a variety of ways including:
* Samples
* Statistics including mean, variance, standard deviation and quantile
* Probability distributions including marginal and joint distributions and mixture models
How to secure SSH, disabling insecure ciphers etc. (via Padraig)
Make “Paste and Match Style” the default, as it should be
Twitter open-sources an anomaly-spotting R package:
Early detection of anomalies plays a key role in ensuring high-fidelity data is available to our own product teams and those of our data partners. This package helps us monitor spikes in user engagement on the platform surrounding holidays, major sporting events or during breaking news. Beyond surges in social engagement, exogenic factors – such as bots or spammers – may cause an anomaly in number of favorites or followers. The package can be used to find such bots or spam, as well as detect anomalies in system metrics after a new software release. We’re open-sourcing AnomalyDetection because we’d like the public community to evolve the package and learn from it as we have.
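The package itself implements Twitter's Seasonal Hybrid ESD (S-H-ESD) algorithm in R. As a much cruder stand-in, here is a Python sketch of the basic idea of robust spike detection using a median/MAD "modified z-score" with the commonly used 3.5 cutoff. It does no seasonal decomposition, so unlike S-H-ESD it will happily flag ordinary daily peaks in a strongly cyclical metric.

```python
from statistics import median


def robust_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. Median and MAD are barely moved by the spikes
    themselves, unlike mean and standard deviation.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(x - med) for x in values)
    if mad == 0:
        return []  # flat series: no spread to measure against
    return [i for i, x in enumerate(values)
            if 0.6745 * abs(x - med) / mad > threshold]


# e.g. favorites per minute, with one burst (say, a bot) at index 5
print(robust_anomalies([10, 12, 11, 13, 12, 95, 11, 10]))  # [5]
```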
Rx/reactive in style, autoscaling, support for queue/broker-based strong consistency as well as TCP-based lossy delivery
‘I know a man with a wooden leg named sea what was the name of the other leg SAND’
Fergal Crehan’s new gig — good idea!
The Hit Team helps you fight back against leaked photos and videos, internet targeting and revenge porn.
Beyond the interesting-enough stuff about scalability in a distributed SQL store, there’s this really nifty point about avoiding the horrors of the SQL/ORM impedance mismatch:
At Google, Protocol Buffers are ubiquitous for data storage and interchange between applications. When we still had a MySQL schema, users often had to write tedious and error-prone transformations between database rows and in-memory data structures. Putting protocol buffers in the schema removes this impedance mismatch and gives users a universal data structure they can use both in the database and in application code…. Protocol Buffer columns are more natural and reduce semantic complexity for users, who can now read and write their logical business objects as atomic units, without having to think about materializing them using joins across several tables.
This is something that pretty much any store can already adopt. Go protobufs (or Avro, etc.)! Also, I find this really neat, and I hope this idea is implemented elsewhere soon: asynchronous schema updates:
Schema changes are applied asynchronously on multiple F1 servers. Anomalies are prevented by the use of a schema leasing mechanism with support for only current and next schema versions; and by subdividing schema changes into multiple phases where consecutive pairs of changes are mutually compatible and cannot cause anomalies.
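Here's a toy Python model of the leasing rule as described: a coordinator only publishes schema version N+1 once every server has renewed its lease onto version N, so at most two adjacent versions are ever live at once. `SchemaCoordinator` and its methods are hypothetical names; real leases also expire on a timer, which this sketch leaves out, and it says nothing about how each change gets split into mutually compatible phases.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SchemaCoordinator:
    """Toy model of 'current and next schema version only' leasing (hypothetical)."""

    current: int = 0
    # server name -> schema version that server currently holds a lease on
    leases: Dict[str, int] = field(default_factory=dict)

    def register(self, server: str) -> None:
        self.leases[server] = self.current

    def renew_lease(self, server: str) -> int:
        # Renewing a lease moves the server onto the coordinator's current version.
        self.leases[server] = self.current
        return self.current

    def live_versions(self) -> Set[int]:
        return set(self.leases.values()) | {self.current}

    def try_advance(self) -> bool:
        # Publish current + 1 only once every server has caught up to current,
        # so no server is ever more than one version behind the newest schema.
        if all(v == self.current for v in self.leases.values()):
            self.current += 1
            return True
        return False
```

The point of the "only two versions" constraint is that each schema change then only has to be safe against its immediate neighbour, rather than against every version some lagging server might still be running.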
This is a really excellent post on the topic, rebutting Paul Graham’s Bay-Area-centric thoughts on the topic very effectively. I’ve worked in both distributed and non-distributed, as well as effective and ineffective teams ;), and Avleen’s thoughts are very much on target.
I’ve been involved in the New York start up scene since I joined Etsy in 2010. Since that time, I’ve seen more and more companies there embrace having distributed teams. Two companies I know which have risen to the top while doing this have been Etsy and DigitalOcean. Both have exceptional engineering teams working on high profile products used by many, many people around the world. There are certainly others outside New York, including Automattic, GitHub, Chef Inc, Puppet… the list goes on. So how did this happen? And why do people continue to insist that distributed teams lower performance, and are a bad idea? Partly because we’ve done a poor job of showing our industry how to be successful at it, and partly because it’s hard. Having successful distributed teams requires special skills from management, which aren’t easily learned until you have to manage a distributed team. Catch 22.
As used in Cassandra ( http://grokbase.com/t/hbase/dev/13bf9kezes/about-xx-threadprioritypolicy-42 )!
if you just set the “ThreadPriorityPolicy” to something other than the legal values 0 or 1, [...] a slight logic bug in Sun’s JVM code kicks in, and thus sets the policy to be as if running with root – thus you get exactly what one desires. The operating system, Linux, won’t allow priorities to be heightened above “Normal” (negative nice value), and thus just ignores those requests (setting it to normal instead, nice value 0) – but it lets through the requests to set it lower (setting the nice value to some positive value).
Enigma is a Linux based alternative to the default Spark operating system on these boxes. Enigma is a more customisable OS and provides the ability to add plugins which can accomplish many tasks enabling users to have a box which might look and perform like a Sky box, giving a 7 day EPG and an alternative to series link.
Looks like a pretty solid hacker community…
William Hague, the leader of the House of Commons, has responded to concerns raised by an MP about the security of parliamentary data stored on Microsoft’s Cloud-based servers in Europe. “The relevant servers are situated in the Republic of Ireland and the Netherlands, both being territories covered by the EC Data Protection Directive,” William Hague wrote in a letter to John Hemming, MP for Birmingham Yardley. “Any access by US authorities to such data would have to be by way of mutual legal assistance arrangements with those countries.” [...] John Hemming MP told Computer Weekly Hague’s reassurances carried little weight in the face of aggressive legal action by the US government. “The Microsoft case makes it clear that, in the end, the fact that Microsoft is a US company legally trumps the European Data Protection Directive [...] and where [the letter says] the US authorities could not exercise a right of search and seizure on an extraterritorial basis, well, they are doing that, in America, today.”
Sounds like they didn’t think that through…
Nearly half the EU-wide average.
Sweden has also created 12,600 safer pedestrian crossings with features such as bridges, flashing lights, and speed bumps. That’s estimated to have halved pedestrian deaths over the past five years. The country has lowered speed limits in urban, crowded areas and built barriers to protect bikers from incoming traffic. A crackdown on drunk driving has also helped.
Formats the year based on ISO week numbering, which often is not what you want. Both have been responsible for high-profile production bugs (in Apple and Android).
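A quick Python illustration of why this bites at the end of December: the ISO week-based year of a late-December date can already be the following year. `iso_week_year` is just a helper name for this sketch. Python's `strftime` exposes the same value as `%G` (since 3.6), which is just as easy to confuse with the calendar-year `%Y`.

```python
from datetime import date


def iso_week_year(d: date) -> int:
    """Year of the ISO 8601 week containing d (what week-year format tokens print)."""
    # isocalendar() returns (ISO year, ISO week number, ISO weekday)
    return d.isocalendar()[0]


for d in (date(2014, 12, 28), date(2014, 12, 29)):
    print(d.isoformat(), d.strftime("%Y"), iso_week_year(d))
# 2014-12-28 2014 2014
# 2014-12-29 2014 2015
```

1 January 2015 was a Thursday, so ISO week 1 of 2015 starts on Monday 29 December 2014. Any code formatting "the year" with the week-year token silently jumps a year early for those last few days, and passes every test run the rest of the year.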
Spectacularly inept. Pretty much every UGC site there is
Wow, where has this person been for the past 20 years that they haven’t had to encounter this? I can only imagine having a private office, tbh.
my personal performance at work has hit an all-time low. Each day, my associates and I are seated at a table staring at each other, having an ongoing 12-person conversation from 9 a.m. to 5 p.m. It’s like being in middle school with a bunch of adults. Those who have worked in private offices for decades have proven to be the most vociferous and rowdy. They haven’t had to consider how their loud habits affect others, so they shout ideas at each other across the table and rehash jokes of yore. As a result, I can only work effectively during times when no one else is around, or if I isolate myself in one of the small, constantly sought-after, glass-windowed meeting rooms around the perimeter.
‘The fee [airline pricing] model comes with systematic costs that are not immediately obvious. Here’s the thing: in order for fees to work, there needs to be something worth paying to avoid. That necessitates, at some level, a strategy that can be described as “calculated misery.” Basic service, without fees, must be sufficiently degraded in order to make people want to pay to escape it. And that’s where the suffering begins.’
‘Ádám was trying his hand at a problem in Excel, but the official rules prohibit the use of Excel macros. In a daze, he came up with one of the most clever uses of Excel: building an assembly interpreter with the most popular spreadsheet program. This is a virtual Harvard architecture machine without writable RAM; the stack is only lots and lots of IFs.’
A causal profiler for C++.
Causal profiling is a novel technique to measure optimization potential. This measurement matches developers’ assumptions about profilers: that optimizing highly-ranked code will have the greatest impact on performance. Causal profiling measures optimization potential for serial, parallel, and asynchronous programs without instrumentation or special handling for library calls and concurrency primitives. Instead, a causal profiler uses performance experiments to predict the effect of optimizations. This allows the profiler to establish causality: “optimizing function X will have effect Y,” exactly the measurement developers had assumed they were getting all along.
I can see this being a good technique to stochastically discover race conditions and concurrency bugs, too.
This is the version with the superfast petabyte-sort record:
Spark 1.2 includes several cross-cutting optimizations focused on performance for large scale workloads. Two new features Databricks developed for our world record petabyte sort with Spark are turned on by default in Spark 1.2. The first is a re-architected network transfer subsystem that exploits Netty 4’s zero-copy IO and off heap buffer management. The second is Spark’s sort based shuffle implementation, which we’ve now made the default after significant testing in Spark 1.1. Together, we’ve seen these features give as much as 5X performance improvement for workloads with very large shuffles.
As the 1 January deadline gallops towards the EU, microbusinesses desperate to stay open without breaking the law try to find out, “Can I email stuff out instead?” Well… Yes. – No – It depends – and simultaneously yes AND no, according to Schrödinger’s VAT. So that’s clear, then.
Nice work, EU
I keep forgetting about sshuttle. It’s by far the easiest way to get a cheapo IP-over-SSH VPN working with an OSX client, particularly since it’s in homebrew
Things hotting up in TOR-land.
Until I have had the time and information available to review the situation, I am strongly recommending my mirrors are not used under any circumstances. If they come back online without a PGP signed message from myself to further explain the situation, exercise extreme caution and treat even any items delivered over TLS to be potentially hostile.
$14.99 ebook, recommended by Steve Vinoski, looks good
Google made a change in Android 4.4 which allows operators to know when users are using tethering and conveniently block tethered devices from accessing internet. This can be fixed permanently using the following procedure.
Well this is stupid. (via Tony Finch)
It’s now over a year since Edward Snowden went public with evidence of mass surveillance and extensive abuses by the NSA, GCHQ and other intelligence agencies. In other countries these revelations prompted parliamentary inquiries, diplomatic representations and legislation. In Ireland the only response was a promise [..] to help extradite Mr Snowden should he land here.
For the record
To demonstrate that hackers have no interest in suppressing speech, quashing controversy, or being intimidated by vague threats, we ask that Sony allow the hacker community to distribute “The Interview” for them on the 25th of December. Now, we’re aware that Sony may refer to this distribution method as piracy, but in this particular case, it may well prove to be the salvation of the motion picture industry. By freely offering the film online, millions of people will get to see it and decide for themselves if it has any redeeming qualities whatsoever – as opposed to nobody seeing it and the studios writing it off as a total loss. Theaters would be free from panic as our servers would become the target of any future vague threats (and we believe Hollywood will be most impressed with how resilient peer-to-peer distribution can be in the face of attacks). Most importantly, we would be defying intimidation, something the motion picture industry doesn’t quite have a handle on, which is surprising considering how much they’ve relied upon it in the past.
Need to keep an eye out for a few of these — will probably be a little more than $30 given the whole import/export carry-on of course
In CAP terms, ZooKeeper is CP, meaning that it’s consistent in the face of partitions, not available. For many things that ZooKeeper does, this is a necessary trade-off. Since ZooKeeper is first and foremost a coordination service, having an eventually consistent design (being AP) would be a horrible design decision. Its core consensus algorithm, Zab, is therefore all about consistency. For coordination, that’s great. But for service discovery it’s better to have information that may contain falsehoods than to have no information at all. It is much better to know what servers were available for a given service five minutes ago than to have no idea what things looked like due to a transient network partition. The guarantees that ZooKeeper makes for coordination are the wrong ones for service discovery, and it hurts you to have them.
Yes! I’ve been saying this for months — good to see others concurring.
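A minimal sketch of the "stale beats nothing" client behaviour argued for above, assuming a hypothetical `fetch` callable that queries whatever registry you use and raises when it can't be reached. On failure the client serves its last-known endpoint list together with its age, so callers can decide how far to trust it. `StaleTolerantDiscovery` is an illustrative name, not a real library.

```python
import time
from typing import Callable, List, Optional, Tuple


class StaleTolerantDiscovery:
    """Hypothetical AP-style discovery client: prefers stale data over no data."""

    def __init__(
        self,
        fetch: Callable[[], List[str]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical registry query; raises when unreachable
        self._clock = clock
        self._cached: Optional[List[str]] = None
        self._fetched_at: float = 0.0

    def endpoints(self) -> Tuple[List[str], float]:
        """Return (endpoints, age in seconds); an age of 0.0 means freshly fetched."""
        try:
            result = self._fetch()
        except Exception:  # a real client would catch only transport/registry errors
            if self._cached is None:
                raise  # never seen any data: nothing better to offer
            return list(self._cached), self._clock() - self._fetched_at
        self._cached = list(result)
        self._fetched_at = self._clock()
        return list(self._cached), 0.0
```

During a partition the caller gets yesterday's news rather than an exception, which for picking a backend to talk to is nearly always the better failure mode; a health check on the chosen endpoint catches the cases where the stale list turns out to be wrong.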
omg, Die Gute Fabrik’s game collection featuring the AMAZING Johann Sebastian Joust — now available on Mac, Linux and (missing JSJ) Windows. Time to buy an assload of Move controllers!
Nice worked-through Lambda example
Oh god yes. This is absolutely spot on, as you would expect from a Google paper — at this stage they probably have accumulated more real-world ML-at-scale experience than anywhere else. ‘Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns. [....] ‘In this paper, we focus on the system-level interaction between machine learning code and larger systems as an area where hidden technical debt may rapidly accumulate. At a system-level, a machine learning model may subtly erode abstraction boundaries. It may be tempting to re-use input signals in ways that create unintended tight coupling of otherwise disjoint systems. Machine learning packages may often be treated as black boxes, resulting in large masses of “glue code” or calibration layers that can lock in assumptions. Changes in the external world may make models or input signals change behavior in unintended ways, ratcheting up maintenance cost and the burden of any debt. Even monitoring that the system as a whole is operating as intended may be difficult without careful design. Indeed, a remarkable portion of real-world “machine learning” work is devoted to tackling issues of this form. Paying down technical debt may initially appear less glamorous than research results usually reported in academic ML conferences. 
But it is critical for long-term system health and enables algorithmic advances and other cutting-edge improvements.’
Since Operation Torpedo [use of a Metasploit side project], there’s evidence the FBI’s anti-Tor capabilities have been rapidly advancing. Torpedo was in November 2012. In late July 2013, computer security experts detected a similar attack through Dark Net websites hosted by a shady ISP called Freedom Hosting—court records have since confirmed it was another FBI operation. For this one, the bureau used custom attack code that exploited a relatively fresh Firefox vulnerability—the hacking equivalent of moving from a bow-and-arrow to a 9-mm pistol. In addition to the IP address, which identifies a household, this code collected the MAC address of the particular computer that was infected by the malware. “In the course of nine months they went from off the shelf Flash techniques that simply took advantage of the lack of proxy protection, to custom-built browser exploits,” says Soghoian. “That’s a pretty amazing growth … The arms race is going to get really nasty, really fast.”
Microsoft -v- USA is an important ongoing case, currently listed for hearing in 2015 before the US Federal Court of Appeal of the 2nd Circuit. However, as the case centres around the means by which NY law enforcement are seeking to access data of an email account which resides in Dublin, it is also crucially significant to Ireland and the rest of the EU. For that reason, Digital Rights Ireland instructed us to file an Amicus Brief in the US case, in conjunction with the global law firm of White & Case, who have acted pro bono in their representation. Given the significance of the case for the wider EU, both Liberty and the Open Rights Group in the UK have joined Digital Rights Ireland as amici on this brief. We hope it will be of aid to the US court in assessing the significance of the order being appealed by Microsoft for EU citizens and European states, in the light of the existing US and EU Mutual Legal Assistance Treaty.
Hey look, PID 1 segfaulting! I haven’t seen that happen since we managed to corrupt /bin/sh on Ultrix in 1992. Nice work Fedora
GCHQ maintains a huge repository named MUTANT BROTH that stores billions of these intercepted cookies, which it uses to correlate with IP addresses to determine the identity of a person. GCHQ refers to cookies internally as “target detection identifiers.”
Generate graphs/flowcharts from text à la Markdown. Pretty much identical to Graphviz, surely?
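For comparison, the core of the text-to-diagram trick is small. A hypothetical Python sketch (the `edges_to_dot` name and its one-edge-per-line input syntax are my own) that turns `A -> B` lines into Graphviz DOT, with no escaping of quotes in node names and no shapes or labels:

```python
from typing import List


def edges_to_dot(text: str, name: str = "G") -> str:
    """Convert lines of the form 'A -> B' into a Graphviz DOT digraph."""
    out: List[str] = [f"digraph {name} {{"]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        src, sep, dst = line.partition("->")
        if not sep:
            raise ValueError(f"expected 'A -> B', got {line!r}")
        # Node names are quoted but not escaped: a name containing '"' will break.
        out.append(f'  "{src.strip()}" -> "{dst.strip()}";')
    out.append("}")
    return "\n".join(out)
```

Pipe the result into `dot -Tpng` and you have most of the basic workflow; the value in the dedicated tools is mainly the friendlier syntax and the in-browser rendering.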
Very impressive. I particularly like the use of Tester Dojos to get through a backlog of unwritten tests — we had a similar problem recently…
From 7-8pm on Friday, [RepricerExpress] software, used by third-party sellers to ensure their products are the cheapest on the market, went a bit haywire and reduced prices to as little as 1p.
Wow, this looks cool. $159
littleBits and Korg have demystified a traditional analog synthesizer, making it super easy for novices and experts alike to create music. Connects to speakers, computers and headphones. Can be used to make your own instruments. Fits into the littleBits modular system for infinite combos of audio, visual and sensory experiences.