The intuition behind Hydra is something like this: “I have a lot of data, and there are a lot of things I could try to learn about it — so many that I’m not even sure what I want to know.” It’s about the curse of dimensionality — more dimensions mean exponentially more cost for exhaustive analysis. Hydra tries to make it easy to reduce the number of dimensions, or the cost of watching them (via probabilistic data structures), to just the right point where everything runs quickly but can still answer almost any question you think you might care about. Code: https://github.com/addthis/hydra Getting Started blog post: https://www.addthis.com/blog/2014/02/18/getting-started-with-hydra/
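To make that concrete: a minimal sketch (not code from the Hydra project) of one such probabilistic data structure, a count-min sketch, which tracks approximate per-key frequencies in fixed memory regardless of how many distinct keys flow through it:

```python
import hashlib

class CountMinSketch:
    """Approximate per-key frequency counts in fixed memory.

    Illustrates the trick referred to above: probabilistic data
    structures trade exact answers for a small, bounded footprint.
    (A sketch of the general idea, not code from the Hydra project.)
    """
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, key):
        # one independent hash position per row
        for i in range(self.depth):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield i, int(digest, 16) % self.width

    def add(self, key, count=1):
        for i, b in self._buckets(key):
            self.rows[i][b] += count

    def estimate(self, key):
        # never under-counts; over-counts only on hash collisions
        return min(self.rows[i][b] for i, b in self._buckets(key))
```

Memory stays at width × depth counters no matter how many dimensions you watch, which is the cost/fidelity dial described above.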
a Cisco fail.
It looks like there’s a firewall in the middle that’s doing additional TCP sequence randomisation — which was once a good thing, but the weakness it mitigated has since been fixed in all current operating systems. Unfortunately, it seems that firewall doesn’t understand TCP SACK, which, when coupled with a small amount of packet loss and a stateful host firewall that blocks invalid packets, results in TCP connections that stall randomly. A little digging revealed that firewall to be the Cisco Firewall Services Module on our Canterbury network border. (via Tony Finch)
‘Having the private keys inaccessible is a good defense in depth move. For this patch to work you have to make sure all sensitive values are stored in the secure area, not just check that the area looks inaccessible. You can’t do that by keeping the private key in the same process. A review by a security engineer would have prevented a false sense of security. A version where the private key and the calculations are in a separate process would be more secure. If you decide to write that version, I’ll gladly see if I can break that too.’ Akamai’s response: https://blogs.akamai.com/2014/04/heartbleed-update-v3.html — to their credit, they recognise that they need to take further action. (via Tony Finch)
Colm MacCarthaigh writes about a simple sharding/load-balancing algorithm which uses randomized instance selection and optional additional compartmentalization. See also: consistent hashing, and http://aphyr.com/posts/278-timelike-2-everything-fails-all-the-time
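The shuffle-sharding idea can be sketched in a few lines; this is an illustrative reconstruction, not Colm's actual code, and the hash choice and shard size are assumptions:

```python
import hashlib
import random

def shard_for(key, instances, shard_size=4):
    """Deterministically pick a small pseudo-random subset of instances
    for `key`. Two different keys land on mostly-disjoint subsets, so a
    poison request only takes down the instances in its own shard.
    (An illustrative sketch of shuffle sharding, not the post's code.)"""
    seed = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return random.Random(seed).sample(instances, shard_size)
```

A request for a given key is then routed to any healthy member of `shard_for(key, instances)`, compartmentalizing failures per-key rather than fleet-wide.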
Phase I, a source code audit by iSEC Partners, is now complete. Bruce Schneier says: “I’m still using it”.
In the PNAS paper, Brad Bushman and colleagues looked at 107 couples over 21 days and found that people experiencing uncharacteristically low blood sugar were more likely to display anger toward their spouse. (The researchers measured this by having subjects stick needles into voodoo dolls representing their significant others.)
Where it is not possible to avoid reversing, it is ESB policy that staff driving on behalf of the company or anybody on company premises should reverse into car spaces/bays, allowing them to drive out subsequently. BUT WHYYYYYYYYYY
from nginx. ‘Based on the findings, we recommend everyone reissue + revoke their private keys.’
Fastmail.FM nearly had their domain stolen through an attack exploiting incomplete 2FA protection at Gandi.
An important lesson learned is that just because a provider has a checkbox labelled “2 factor authentication” in their feature list, the two factors may not be protecting everything – and they may not even realise that fact themselves. Security risks always come on the unexpected paths – the “off label” uses that you didn’t think about, and the subtle interaction of multiple features which are useful and correct in isolation.
Steve Marquess of the OpenSSL Foundation on their funding, and lack thereof:
I stand in awe of their talent and dedication, that of Stephen Henson in particular. It takes nerves of steel to work for many years on hundreds of thousands of lines of very complex code, with every line of code you touch visible to the world, knowing that code is used by banks, firewalls, weapons systems, web sites, smart phones, industry, government, everywhere. Knowing that you’ll be ignored and unappreciated until something goes wrong. The combination of the personality to handle that kind of pressure with the relevant technical skills and experience to effectively work on such software is a rare commodity, and those who have it are likely to already be a valued, well-rewarded, and jealously guarded resource of some company or worthy cause. For those reasons OpenSSL will always be undermanned, but the present situation can and should be improved. There should be at least a half dozen full time OpenSSL team members, not just one, able to concentrate on the care and feeding of OpenSSL without having to hustle commercial work. If you’re a corporate or government decision maker in a position to do something about it, give it some thought. Please. I’m getting old and weary and I’d like to retire someday.
a system for building agents that perform automated tasks for you online. They can read the web, watch for events, and take actions on your behalf. Huginn’s Agents create and consume events, propagating them along a directed event flow graph. Think of it as Yahoo! Pipes plus IFTTT on your own server. You always know who has your data. You do. MIT-licensed open source, built on Rails.
Poul-Henning Kamp details why Varnish doesn’t do SSL — basically due to the quality and complexity of open-source SSL implementations:
There is no other way we can guarantee that secret krypto-bits do not leak anywhere they should not, than by fencing in the code that deals with them in a child process, so the bulk of varnish never gets anywhere near the certificates, not even during a core-dump. Now looking pretty smart, post-Heartbleed.
Tiered storage is turning out to be a pretty practical trick to take advantage of SSDs:
The justification for two types/speeds of storage arrays is simple. leveldb is extremely write intensive in its lower levels. The write intensity drops off as the level number increases. Similarly, current and frequently updated data tends to be in lower levels while archival data tends to be in higher levels. These leveldb characteristics create a desire to have faster, more expensive storage arrays for the high intensity lower levels. This branch allows the high intensity lower levels to be on expensive storage arrays while slower, less expensive storage arrays hold the higher level data to reduce costs.
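The placement policy described above boils down to a very small decision function; this is an illustrative sketch, with the split level and round-robin layout as assumptions, not the actual branch's configuration scheme:

```python
def array_for_level(level, fast_arrays, slow_arrays, split_level=3):
    """Send the write-hot low leveldb levels to fast (SSD) arrays and
    the cold high levels to cheap (HDD) ones. The split level and
    round-robin placement here are illustrative assumptions."""
    arrays = fast_arrays if level < split_level else slow_arrays
    return arrays[level % len(arrays)]
```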
This is a great point:
Obviously, those tending to the security protocols that support the rest of the Web need better infrastructure and more funding. “Large portions of the software infrastructure of the Internet are built and maintained by volunteers, who get little reward when their code works well but are blamed, and sometimes savagely derided, when it fails,” writes Foster in the New Yorker. [...] “money and support still tend to flow to the newest and sexiest projects, while boring but essential elements like OpenSSL limp along as volunteer efforts,” he writes. “It’s easy to take open-source software for granted, and to forget that the Internet we use every day depends in part on the freely donated work of thousands of programmers.” We need to find ways to pay for work that is currently essentially donated freely. One promising project is Bithub, from Whisper Systems, where people who make valuable contributions to open source projects are rewarded (with Bitcoin of course). But the pool of Bitcoin is still donation based. The Internet has helped create a culture of free, but what we may need to recognize is that we get what we pay for. Well-funded companies pulling critical code from open source projects for their sites should have formal fee arrangements, rather than the volunteer group simply hoping these users will pony up some Benjamins for “prominent logo placement” on a website most people had never heard of before Heartbleed.
An excellent list of aspects of the Heartbleed OpenSSL bug which need to be thought about/talked about/considered
‘Yes, clients are vulnerable to attack. A malicious server can use the Heartbleed vulnerability to compromise an affected client.’ Ouch.
pricing announced, and available in most regions
Great new distcomp protocol work from Peter Bailis et al:
We’ve developed three new algorithms—called Read Atomic Multi-Partition (RAMP) Transactions—for ensuring atomic visibility in partitioned (sharded) databases: either all of a transaction’s updates are observed, or none are. [...] How they work: RAMP transactions allow readers and writers to proceed concurrently. Operations race, but readers autonomously detect the races and repair any non-atomic reads. The write protocol ensures readers never stall waiting for writes to arrive. Why they scale: Clients can’t cause other clients to stall (via synchronization independence) and clients only have to contact the servers responsible for items in their transactions (via partition independence). As a consequence, there’s no mutual exclusion or synchronous coordination across servers. The end result: RAMP transactions outperform existing approaches across a variety of workloads, and, for a workload of 95% reads, RAMP transactions scale to over 7 million ops/second on 100 servers at less than 5% overhead.
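A toy, single-process sketch of the atomic-visibility idea (not the paper's actual RAMP-Fast protocol; real RAMP runs the two rounds as RPCs against partitioned servers, and its metadata and versioning are more careful than this):

```python
# Each write carries its transaction timestamp plus the set of sibling
# keys written with it, so a reader can detect a half-visible
# transaction and repair its own read.

store = {}     # key -> (ts, value, siblings): latest visible write
versions = {}  # (key, ts) -> value: version store used for repair

def write_tx(ts, updates):
    """Install all of `updates` at timestamp `ts`, tagging each item
    with the other keys written in the same transaction."""
    sibs = frozenset(updates)
    for k, v in updates.items():
        versions[(k, ts)] = v
        if k not in store or store[k][0] < ts:
            store[k] = (ts, v, sibs)

def read_atomic(keys):
    """Round 1 reads the latest items; sibling metadata reveals the
    highest timestamp each key *should* reflect; round 2 fetches any
    missing versions so the result is all-or-nothing per transaction."""
    first = {k: store.get(k) for k in keys}
    expected = {k: 0 for k in keys}
    for item in first.values():
        if item is None:
            continue
        ts, _, sibs = item
        for s in sibs:
            if s in expected:
                expected[s] = max(expected[s], ts)
    out = {}
    for k in keys:
        ts = first[k][0] if first[k] else 0
        if expected[k] > ts:
            out[k] = versions[(k, expected[k])]  # second-round fetch
        else:
            out[k] = first[k][1] if first[k] else None
    return out
```

Note how readers never block on writers: a racing read simply notices the newer sibling metadata and fetches the matching version itself.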
wow, WFMU’s “Beware of the Blog” did a really good job of assembling these classic 1960s Cambodian psych mp3s, featuring Sinn Sisamouth and Ros Sereysothea among others. I still love the start of Ma Pi Naok….
Very interesting new approach to building a scalable in-memory K/V store. As Rajiv Kurian notes on the mechanical-sympathy list: ‘The basic idea is that each core is responsible for a portion of the key-space and requests are forwarded to the right core, avoiding multiple-writer scenarios. This is opposed to designs like memcache which uses locks and shared memory. Some of the things I found interesting: The single writer design is taken to an extreme. Clients assist the partitioning of requests, by calculating hashes before submitting GET requests. It uses Intel DPDK instead of sockets to forward packets to the right core, without processing the packet on any core. Each core is paired with a dedicated RX/TX queue. The design for a lossy cache is simple but interesting. It does things like replacing a hash slot (instead of chaining) etc. to take advantage of the lossy nature of caches. There is a lossless design too. A bunch of tricks to optimize for memory performance. This includes pre-allocation, design of the hash indexes, prefetching tricks etc. There are some other concurrency tricks that were interesting. Handling dangling pointers was one of them.’ Source code here: https://github.com/efficient/mica
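The client-assisted partitioning idea reduces to something like this sketch (illustrative only: MICA's real design uses its own hashing plus Intel DPDK to steer packets to per-core RX/TX queues; CRC32 is just a stand-in):

```python
import binascii

def core_for_key(key, num_cores):
    """Clients hash the key themselves, so each request can be steered
    straight to the single core that owns it: the single-writer idea
    described above, with no locks or shared-memory contention."""
    return binascii.crc32(key.encode()) % num_cores

class PartitionedStore:
    """Each core owns one partition exclusively, so writes to a
    partition come from exactly one writer."""
    def __init__(self, num_cores):
        self.parts = [{} for _ in range(num_cores)]

    def put(self, key, value):
        self.parts[core_for_key(key, len(self.parts))][key] = value

    def get(self, key):
        return self.parts[core_for_key(key, len(self.parts))].get(key)
```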
Open Bidder traditionally used Jetty as an embedded webserver, for the critical tasks of accepting connections, processing HTTP requests, managing service threads, etc. Jetty is a robust but traditional stack that carries the weight and tradeoffs of the Servlet API’s 15-year-old design. For a maximum-performance RTB agent that must combine very large request concurrency with very low latencies, and which often also benefits from low-level control over the transport, memory management and other issues, a different webserver stack was required. Open Bidder now supports Netty, an asynchronous, event-driven, high-performance webserver stack. For existing code, the most important impact is that Netty is not compatible with the Servlet API. Its own internal APIs are often too low-level, not to mention proprietary to Netty; so Open Bidder v0.5 introduces some new, stack-neutral APIs for things like HTTP requests and responses, cookies, request handlers, and even simple HTML templating based on Mustache. These APIs will work with both Netty and Jetty. This means you don’t need to change any code to switch between Jetty and Netty; on the other hand, it also means that existing code written for Open Bidder 0.4 may need some changes even if you plan to keep using Jetty. [....] Netty’s superior efficiency is very significant; it supports 50% more traffic on the same hardware, and it maintains a perfect latency distribution even at the peak of its supported load. This doc is noteworthy on a couple of grounds: 1. the use of Netty in a public API/library, and the additional layer in place to add a friendlier API on top of that. I hope they might consider releasing that part as OSS at some point. 2. I also find it interesting that their API uses protobufs to marshal the message, and they plan in a future release to serialize those to JSON documents — that makes a lot of sense.
Students, staff and alumni put pressure on Provost to reconsider changes to Trinity College Dublin’s name and coat of arms.
alumni scholars from 2004 and 1994 who had been invited back for the dinner shouted ‘Dublin’ after the Provost welcomed them back to “Trinity College”.
We shouldn’t think of “the web” as only what renders in web browsers. We should think of the web as anything transmitted using HTTP and HTTPS. Apps and websites are peers, not competitors. They’re all just clients to the same services. +1. Finally, a Daring Fireball post I agree with! ;)
A good demonstration of what it looks like when network-level packet corruption occurs on a TCP connection
The Court has found that data retention “entails a wide-ranging and particularly serious interference with the fundamental rights to respect for private life and to the protection of personal data” and that it “entails an interference with the fundamental rights of practically the entire European population”. TJ McIntyre, Chairman of Digital Rights Ireland, said that “This is the first assessment of mass surveillance by a supreme court since the Snowden revelations. The ECJ’s judgement finds that untargeted monitoring of the entire population is unacceptable in a democratic society.” [...] Though the Directive has now been struck down, the issue will remain live in all the countries who have passed domestic law to implement the data retention mass surveillance regime. Digital Rights Ireland’s challenge to the Irish data retention system will return to the High Court in Dublin for the next phase of litigation.
a cron job monitoring tool that keeps an eye on your periodic processes and notifies you when something doesn’t happen. Daily backups, monthly emails, or cron jobs you need to monitor? Dead Man’s Snitch has you covered. Know immediately when one of these processes doesn’t work. Via Marc.
the Apollo 11 ALO and LM Descent Monitoring charts are tidied up and downloadable
LinkedIn talk about the GC opts they used to optimize the Feed. good detail
‘In the grand scheme of things, insurance trumps consistency.’ Love it.
neat hack. Pity it returns a 403 error code due to the misuse of the ErrorDocument feature though
Tracking Irish News Stories Over Time; Irish NewsDiffs archives changes in articles after publication. Currently, we track rte.ie and irishtimes.com.
Jeff Atwood’s persuasive argument that remote working needs to be the norm in tech work:
There’s an elephant in the room in the form of an implied clause: Always hire the best people… who are willing to live in San Francisco. Substitute Mountain View, New York, Boston, Chicago, or any other city. The problem is the same. We pay lip service to the idea of hiring the best people in the world — but in reality, we’re only hiring the best people who happen to be close by.
in excruciating detail
For EVERY youtube video, I always open the video and then immediately punch the slider bar to about 30 percent. For example, in this video, it should have just started at :40. Everything before :40 was a waste. This holds true for nearly every video in the universe. Via Joe Drumgoole — it’s true!
Advice from a developer who helped rebuild Walmart.ca with Scala and Play. This is really good advice.
open source, system-level exploration: capture system state and activity from a running Linux instance, then save, filter and analyze. Think of it as strace + tcpdump + lsof + awesome sauce. With a little Lua cherry on top. This sounds excellent. Linux-based, GPLv2.
Impressive, but a fierce whiff of “NIH” off of this
European fans of the open internet can breathe a sigh of relief: the European parliament has passed a major package of telecoms law reform, complete with amendments that properly define and protect net neutrality. The amendments were introduced by the Socialist, Liberal, Green and Left blocs in the European Parliament after the final committee to tweak the package – the industry committee – left in a bunch of loopholes that would have allowed telcos to start classifying web services of their choice as “specialized services” that they can treat differently. [...] Now the whole package gets passed through to the next Parliament (elections are coming up in May), then the representatives of European countries for final approval.
Finally got around to migrating this old CPAN module to github
an easily embeddable, decentralized, k-ordered unique ID generator. It can use the same encoded ID format as Twitter’s Snowflake or Boundary’s Flake implementations as well as any other customized encoding without too much effort. The fauxflake-core module has no external dependencies and is meant to be about as light as possible while still delivering useful functionality. Essentially, if you want to be able to generate a unique identifier across your infrastructure with reasonable assurances about collisions, then you might find this useful. From the same guy as the excellent guava-retrying library; Java, ASL2-licensed open source.
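For illustration, here is a minimal Snowflake-style encoder in Python: a sketch of the 41/10/12-bit layout fauxflake can emit, not fauxflake's own code (the epoch constant is Twitter's):

```python
import threading
import time

class FlakeId:
    """Minimal Snowflake-style k-ordered id generator: 41 bits of
    milliseconds since a custom epoch, 10 bits of worker id, 12 bits
    of per-millisecond sequence. Ids sort roughly by creation time."""
    def __init__(self, worker_id, epoch_ms=1288834974657):
        assert 0 <= worker_id < 1024
        self.worker_id, self.epoch_ms = worker_id, epoch_ms
        self.last_ms, self.seq = -1, 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:
                    # sequence exhausted: spin until the next millisecond
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - self.epoch_ms) << 22) | (self.worker_id << 12) | self.seq
```

The "reasonable assurances about collisions" come from the worker-id bits: as long as no two nodes share a worker id, no coordination is needed at generation time.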
Another cool library from Ray Holder: ‘an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.’ Similar to his guava-retrying Java lib, but using a decorator.
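The decorator pattern it uses looks roughly like this minimal sketch; the real `retrying` package has a much richer API (wait strategies, stop conditions, retry-on-result, etc.), so treat the names and parameters here as illustrative:

```python
import functools
import time

def retry(times=3, delay=0.01, exceptions=(Exception,)):
    """A minimal retry decorator in the spirit of the library: re-call
    the wrapped function on failure, re-raising after `times` attempts."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator

calls = []

@retry(times=3)
def flaky():
    """Fails twice, then succeeds: the decorator hides the failures."""
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transient")
    return "ok"
```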
good comment thread on HN, discussing hlld and bloomd as well
This is a brilliant feature. It just sent a warning to a friend about an old account he was no longer using
from Pól Ó Conghaile’s “Secret Dublin”. great stuff
must give this a try. performance will be the issue I suspect
the inside story of the great work done by Paul Buchheit, Kevin Fox, and Sanjeev Singh to reinvent email in 2004
this HN thread on the age-old UDP vs TCP question is way better than the original post — lots of salient comments
‘a.k.a. Faster == Better’. Slides from a talk given at Facebook on 19th March 2014 by Norman Maurer
Switching over to daylight saving time, and losing one hour of sleep, raised the risk of having a heart attack the following Monday by 25 per cent, compared to other Mondays during the year, according to a new US study released today. [...] The study found that heart attack risk fell 21 per cent later in the year, on the Tuesday after the clock was returned to standard time, and people got an extra hour’s sleep. One clear answer: we need 25-hour days. More details: http://www.sciencedaily.com/releases/2014/03/140329175108.htm
Researchers used Michigan’s BMC2 database, which collects data from all non-federal hospitals across the state, to identify admissions for heart attacks requiring percutaneous coronary intervention from Jan. 1, 2010 through Sept. 15, 2013. A total of 42,060 hospital admissions occurring over 1,354 days were included in the analysis. Total daily admissions were adjusted for seasonal and weekday variation, as the rate of heart attacks peaks in the winter and is lowest in the summer and is also greater on Mondays and lower over the weekend. The hospitals included in this study admit an average of 32 patients having a heart attack on any given Monday. But on the Monday immediately after springing ahead there were on average an additional eight heart attacks. There was no difference in the total weekly number of percutaneous coronary interventions performed for either the fall or spring time changes compared to the weeks before and after the time change.
Deep-packet inspection and rewriting on DNS packets for Google and OpenDNS servers. VPNs and DNSSEC up next!
a reasonable workaround for Ruby’s GC problems in web apps
actually sounds quite nice
This is a couple of years old, but I like this:
Turbo Boyer-Moore is disappointing; its name doesn’t do it justice. In academia constant overhead doesn’t matter, but here we see that it matters a lot in practice. Turbo Boyer-Moore’s inner loop is so complex that we think we’re better off using the original Boyer-Moore. A good demo of how large values of O(n) can be slower than small values of O(mn).
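For comparison, Boyer-Moore-Horspool takes the simplification even further: a worse worst case, O(mn), but a very tight inner loop. A small illustration of the constant-factor point (illustrative code, not the benchmarked implementations from the post):

```python
def horspool(text, pattern):
    """Boyer-Moore-Horspool substring search: returns the index of the
    first match, or -1. Only a single bad-character shift table, so the
    inner loop stays simple, which is exactly the trade-off discussed."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # shift table built from all but the last pattern character
    shift = {pattern[i]: m - 1 - i for i in range(m - 1)}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        # jump by the shift for the last character of the window
        i += shift.get(text[i + m - 1], m)
    return -1
```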
in a field as critical and competitive as smartphones, Google’s R&D strategy was being dictated, not by the company’s board, or by its shareholders, but by a desire not to anger the CEO of a rival company. This is utterly bananas and anti-competitive. (via Des Traynor)
Interesting — Netty has imported an optimized ASL2-licensed MPSC queue implementation from Akka (presumably for performance raisins)
disastrous GC bugs in Ruby, requiring horrible kludgy workarounds
[via Boing Boing:] A new, exhaustive report from Human Rights Watch details the way the young state of modern Ethiopia has become a kind of pilot program for the abuse of “off-the-shelf” surveillance, availing itself of commercial products from the US, the UK, France, Italy and China in order to establish an abusive surveillance regime that violates human rights and suppresses legitimate political opposition under the guise of an anti-terrorism law that’s so broadly interpreted as to be meaningless. The 137-page report [from Human Rights Watch] details the technologies the Ethiopian government has acquired from several countries and uses to facilitate surveillance of perceived political opponents inside the country and among the diaspora. The government’s surveillance practices violate the rights to freedom of expression, association, and access to information. The government’s monopoly over all mobile and Internet services through its sole, state-owned telecom operator, Ethio Telecom, facilitates abuse of surveillance powers.
The street finds its own uses for things, in this case Stingray/IMSI-catcher-type fake mobile-phone base stations:
Fake base stations are becoming a particularly popular modus operandi. Often concealed in a van or car, they are driven through city streets to spread their messages. The professional spammer in question charged 1,000 yuan (£100) to spam thousands of users in a radius of a few hundred metres. The pseudo-base station used could send out around 6,000 messages in just half an hour, the report said. Often such spammers are hired by local businessmen to promote their wares.(via Bernard Tyers)
The most grave issue is that each recording likely amounted to a serious criminal offence. Under Irish law, the recording of a telephone conversation on a public network without the consent of at least one party to the call amounts to an “interception”, a criminal offence carrying a possible term of imprisonment of up to five years. [...] Consequently, unless gardai were notified that their calls might be recorded then a large number of criminal offences are likely to have been committed by and within the Garda Siochana itself.
A cool-looking new debugging tool for C/C++ from Mozilla.
Many, many people have noticed that if we had a way to reliably record program execution and replay it later, with the ability to debug the replay, we could largely tame the nondeterminism problem. This would also allow us to deliberately introduce nondeterminism so tests can explore more of the possible execution space, without impacting debuggability. Many record and replay systems have been built in pursuit of this vision. (I built one myself.) For various reasons these systems have not seen wide adoption. So, a few years ago we at Mozilla started a project to create a new record-and-replay tool that would overcome the obstacles blocking adoption. We call this tool rr. Low runtime overhead; easy deployability; targeted at 32-bit (?!) Linux; OSS. (via Bryan O’Sullivan)
AIB now have a dedicated customer-support forum on Boards.ie. That is a *great* idea
A great reaction to Martin Fowler’s “microservices” coinage, from Arnon Rotem-Gal-Oz: ‘I guess it is easier to use a new name (Microservices) rather than say that this is what SOA actually meant’; ‘these are the very principles of SOA before vendors pushed the [ESB] into the middle.’ Others have also chosen to define microservices slightly differently, as a service written in 10-100 LOC. Arnon’s reaction: “Nanoservice is an antipattern where a service is too fine-grained. A nanoservice is a service whose overhead (communications, maintenance, and so on) outweighs its utility.” Having dealt with maintaining an over-fine-grained SOA stack in Amazon, I can only agree with this definition; it’s easy to make things too fine-grained and create a raft of distributed-computing bugs and deployment/management complexity where there is no need to do so.
slightly ruined by the inclusion of some “deliberately Turing-complete” systems
The next step in the Turkish twitter-block arms race.
Bridge relays (or “bridges” for short) are Tor relays that aren’t listed in the main Tor directory. Since there is no complete public list of them, even if your ISP is filtering connections to all the known Tor relays, they probably won’t be able to block all the bridges. If you suspect your access to the Tor network is being blocked, you may want to use the bridge feature of Tor. The addition of bridges to Tor is a step forward in the blocking resistance race. It is perfectly possible that even if your ISP filters the Internet, you do not require a bridge to use Tor. So you should try to use Tor without bridges first, since it might work.
The detailed summaries of outages from cloud vendors are comprehensive and the response to each highlights many lessons in how to build robust distributed systems. For outages that significantly affected Netflix, the Netflix techblog report gives insight into how to effectively build reliable services on top of AWS. [....] I plan to collect reports here over time, and welcome links to other write-ups of outages and how to survive them.
This looks like an excellent new feature for parents:
A supervised user is a special type of Chrome user who can browse the web with guidance. Under the supervision of the manager, a supervised user can browse the web and sign in to websites. Supervised users don’t need a Google Account or an email address because the manager creates a profile for the supervised user through the manager’s Google Account. As a manager of a supervised user, you can see the user’s browsing history, block specific sites, and approve which sites the user can see, all from the supervised users dashboard that is accessible from any browser.
This WWW page is intended to serve as a comprehensive collection of algorithm implementations for over seventy of the most fundamental problems in combinatorial algorithms. The problem taxonomy, implementations, and supporting material are all drawn from my [ie. Steven Skiena's] book ‘The Algorithm Design Manual’. Since the practical person is more often looking for a program than an algorithm, we provide pointers to solid implementations of useful algorithms, when they are available.
There is a big difference between avoiding major hazards and making every decision with the primary goal of optimizing child safety (or enrichment, or happiness). We can no more create the perfect environment for our children than we can create perfect children. To believe otherwise is a delusion, and a harmful one; remind yourself of that every time the panic rises.
an empty 204 response to an HTTP PUT will trigger this. See also https://code.google.com/p/android/issues/detail?id=24672, ‘”java.io.IOException: unexpected end of stream” on HttpURLConnection HEAD call’.
‘The European election will take place between 22 and 25 May 2014. Citizens, promise to vote for candidates that have signed a 10-point charter of digital rights! Show candidates that they need to earn your vote by signing our charter!’
‘Microsoft went through a blogger’s private Hotmail account in order to trace the identity of a source who allegedly leaked trade secrets.’ Bear in mind that the alleged violation which MS claims allows them to read the email was a breach of the terms of service, which also ban distribution of content that ‘incites, advocates, or expresses pornography, obscenity, vulgarity, [or] profanity’. So no dirty jokes on Hotmail!
Y! is moving to Dublin to evade GCHQ spying on its users. And what is the UK response?
“There are concerns in the Home Office about how Ripa will apply to Yahoo once it has moved its headquarters to Dublin,” said a Whitehall source. “The home secretary asked to see officials from Yahoo because in Dublin they don’t have equivalent laws to Ripa. This could particularly affect investigations led by Scotland Yard and the national crime agency. They regard this as a very serious issue.” There’s priorities for you!
Very nice, Airbnb!
Netflix CEO Reed Hastings blogs about the need for Net Neutrality:
Interestingly, there is one special case where no-fee interconnection is embraced by the big ISPs — when they are connecting among themselves. They argue this is because roughly the same amount of data comes and goes between their networks. But when we ask them if we too would qualify for no-fee interconnect if we changed our service to upload as much data as we download** — thus filling their upstream networks and nearly doubling our total traffic — there is an uncomfortable silence. That’s because the ISP argument isn’t sensible. Big ISPs aren’t paying money to services like online backup that generate more upstream than downstream traffic. Data direction, in other words, has nothing to do with costs. ISPs around the world are investing in high-speed Internet and most already practice strong net neutrality. With strong net neutrality, new services requiring high-speed Internet can emerge and become popular, spurring even more demand for the lucrative high-speed packages ISPs offer. With strong net neutrality, everyone avoids the kind of brinkmanship over blackouts that plagues the cable industry and harms consumers. As the Wall Street Journal chart shows, we’re already getting to the brownout stage. Consumers deserve better.
pinning threads to CPUs to reduce jitter and latency. Lots of graphs and measurements from Peter Lawrey
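A minimal Linux sketch of the idea via `os.sched_setaffinity` (the post itself is Java-oriented; this just shows the underlying affinity call from Python):

```python
import os

def pin_to_cpu(cpu):
    """Restrict the calling process to a single CPU so the scheduler
    can't migrate it between cores, which is a source of the cache-miss
    jitter measured in posts like this one. Linux-only; the pid
    argument 0 means "the calling process"."""
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)
```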
“in 1979, no-one died. in 1980, some one died. in 1981, no-one died. in 1982, no-one died. … I could go on”
Many aspects of the story already look like a caricature of journalism gone awry. The man Goodman fingered as being worth $400 million or more is just as modest as his house suggests. He’s had a stroke and struggles with other health issues. Unemployed since 2001, he strives to take care of basic needs for himself and his 93-year-old mother, according to a reddit post by his brother Arthur Nakamoto (whom Goodman quoted as calling his brother an “asshole”). If Goodman has mystery evidence supporting the Dorian Nakamoto theory, it should have been revealed days ago. Otherwise, Newsweek and Goodman are delaying an inevitable comeuppance and doubling down on past mistakes. Nakamoto’s multiple denials on the record have changed the dynamic of the story. Standing by the story, at this point, is an attack on him and his credibility. The Dorian Nakamoto story is a “Dewey beats Truman” moment for the Internet age, with all of the hubris and none of the humor. It shouldn’t be allowed to end in the mists of “he said, she said.” Whether or not a lawsuit gets filed, Nakamoto v. Newsweek faces an imminent verdict in the court of public opinion: either the man is lying or the magazine is wrong.
While going through her papa’s old belongings, a young girl discovered something incredible – a mind-bogglingly intricate maze that her father had drawn by hand 30 years ago. While working as a school janitor it had taken him 7 years to produce the piece, only for it to be forgotten about… until now. 34″ x 24″ print, $40
Lonely Planet and Dr Foster Intelligence both make heavy use of ETL in their products, and both organisations have applied the principles of Continuous Delivery to their delivery process. Some of the Continuous Delivery norms need to be adapted in the context of ETL, and some interesting patterns emerge, such as running Continuous Integration against data, as well as code.
‘On March 9th a group posted a data leak, which included the trading history of all MtGox users from April 2011 to November 2013. The graphs below explore the trade behaviors of the 500 highest volume MtGox users from the leaked data set. These are the Bitcoin barons, wealthy speculators, dueling algorithms, greater fools, and many more who took bitcoin to the moon.’
hats off to Rockstar — GTA V has a great mystery mural with clues dotted throughout the game, and it’s as-yet unsolved
Creating your own 3-D printer filament from old used milk jugs is exponentially cheaper, and uses considerably less energy, than buying new filament, according to new research from Michigan Technological University. [...] The savings are really quite impressive — 99 cents on the dollar, in addition to the reduced use of energy. Interestingly (but again not surprisingly), the amount of energy used to ‘recycle’ the old milk jugs yourself is considerably less than that used in recycling such jugs conventionally.
This is a really good post on governmental computing, open data, and so on:
The fact that I can go months hearing about “open data” without a single mention of ETL is a problem. ETL is the pipes of your house: it’s how you open data.
as TJ McIntyre noted: ‘€100 fine for a repeat spammer. Data Protection Commissioner calls this “strong protection”. With a straight face.’ Next will doubtless fork over the 100 Euros out of the petty cash drawer, then carry on regardless. This isn’t a useful fine. What a farce…
The mass surveillance methods employed in [the UK, USA, and India], many of them exposed by NSA whistleblower Edward Snowden, are all the more intolerable because they will be used and indeed are already being used by authoritarian countries such as Iran, China, Turkmenistan, Saudi Arabia and Bahrain to justify their own violations of freedom of information. How will so-called democratic countries be able to press for the protection of journalists if they adopt the very practices they are criticizing authoritarian regimes for? This is utterly jaw-dropping — throughout the world, real-time mass-monitoring infrastructure is silently being dropped into place. France and India are particularly pervasive
“Microservices” seems to be yet another term for SOA; small, decoupled, independently-deployed services, with well-defined public HTTP APIs. Pretty much all the services I’ve worked on over the past few years have been built in this style. Still, let’s keep an eye on this concept anyway. Another definition seems to be a more FP-style one: http://www.slideshare.net/michaelneale/microservices-and-functional-programming — where the “microservice” does one narrowly-defined thing, and that alone.
Great essay on sexism in tech, “brogrammer” culture, “clubhouse chemistry”, outsiders, weird nerds and exclusion:
Every group, including the excluded and disadvantaged, creates cultural capital and behaves in ways that simultaneously create a sense of belonging for its members in their existing social circle while also potentially denying them entry into another one, often at the expense of economic capital. It’s easy to see that wearing baggy, sagging pants to a job interview, or having large and visible tattoos in a corporate setting, might limit someone’s access. These are some of the markers of belonging used in social groups that are often denied opportunities. By embracing these markers, members of the group create real barriers to acceptance outside their circle even as they deepen their peer relationships. The group chooses to adopt values that are rejected by the society that’s rejecting them. And that’s what happens to “weird nerd” men as well—they create ways of being that allow for internal bonding against a largely exclusionary backdrop. (via Bryan O’Sullivan)
some nice graphs and data on CMS performance, with/without -XX:ParGCCardsPerStrideChunk
these are fantastic
a utility to perform parallel, pipelined execution of a single HTTP GET. htcat is intended for the purpose of incantations like: htcat https://host.net/file.tar.gz | tar -zx It is tuned (and only really useful) for faster interconnects: [....] 109MB/s on a gigabit network, between an AWS EC2 instance and S3. This represents 91% use of the theoretical maximum of gigabit (119.2 MiB/s).
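htcat itself is written in Go; the heart of the trick is splitting the object into byte ranges, fetching them in parallel with HTTP `Range:` headers, and writing them out in order. A sketch of the range-splitting step (function name is mine):

```python
def byte_ranges(length: int, parts: int) -> list[str]:
    """Split a resource of `length` bytes into up to `parts` contiguous
    HTTP Range header values ("bytes=start-end", inclusive ends)."""
    chunk = -(-length // parts)  # ceiling division
    return [
        f"bytes={start}-{min(start + chunk, length) - 1}"
        for start in range(0, length, chunk)
    ]
```

Each range then becomes one concurrent GET carrying a `Range:` header, and the responses are streamed to stdout in order, so the pipe into `tar` still sees sequential bytes.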
Abe Stanway crunches the stats on Citibike usage in NYC, compared to the weather data from Wunderground.
Storing them in a 30-day rolling buffer, allowing retrospective targeting weeks after the call. 100% of all voice calls in that country, although it’s unclear which country that is
a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS X. S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.
good explanation of all the new features — I’m really looking forward to fixing up all the crappy over-verbose interface-as-lambdas we have scattered throughout our code
a compressed full-text substring index based on the Burrows-Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini, who describe it as an opportunistic data structure as it allows compression of the input text while still permitting fast substring queries. The name stands for ‘Full-text index in Minute space’. It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as locate the position of each occurrence. Both the query time and storage space requirements are sublinear with respect to the size of the input data. kragen notes ‘gene sequencing is using [them] in production’.
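As a toy illustration of the backward-search query at the core of the FM-index (this naive version rebuilds the BWT from rotations and scans it linearly for each Occ() lookup, so it comes nowhere near the sublinear bounds above; real implementations use suffix arrays, sampled occurrence tables and wavelet trees):

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations (fine for small
    inputs; real indexes derive this from a suffix array instead)."""
    text += "$"  # unique terminator, lexicographically smallest
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)

def fm_count(text: str, pattern: str) -> int:
    """Count occurrences of `pattern` in `text` by backward search
    over the BWT -- the core FM-index query."""
    last = bwt(text)
    first = sorted(last)
    # C[c]: number of characters in the text strictly smaller than c
    C = {c: first.index(c) for c in set(last)}
    occ = lambda c, i: last[:i].count(c)  # naive Occ(c, i) scan
    lo, hi = 0, len(last)
    for c in reversed(pattern):  # match the pattern right-to-left
        if c not in C:
            return 0
        lo, hi = C[c] + occ(c, lo), C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo  # size of the matching suffix-array interval
```

For example, `fm_count("banana", "ana")` finds both overlapping occurrences without ever decompressing or rescanning the original text.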
some good tips
Nice Irish Times article on the first 3 web servers in Ireland — including the one I set up at Iona Technologies. 21 years ago!
‘Light Blue Touchpaper’ notes:
Three NGOs have lodged a formal complaint to the Information Commissioner about the fact that PA Consulting uploaded over a decade of UK hospital records to a US-based cloud service. This appears to have involved serious breaches of the UK Data Protection Act 1998 and of multiple NHS regulations about the security of personal health information. Let’s see if ICO can ever do anything useful… not holding my breath
It’s admittedly hard for outsiders to analyze Google Flu Trends, because the company doesn’t make public the specific search terms it uses as raw data, or the particular algorithm it uses to convert the frequency of these terms into flu assessments. But the researchers did their best to infer the terms by using Google Correlate, a service that allows you to look at the rates of particular search terms over time. When the researchers did this for a variety of flu-related queries over the past few years, they found that a couple key searches (those for flu treatments, and those asking how to differentiate the flu from the cold) tracked more closely with Google Flu Trends’ estimates than with actual flu rates, especially when Google overestimated the prevalence of the ailment. These particular searches, it seems, could be a huge part of the inaccuracy problem. There’s another good reason to suspect this might be the case. In 2011, as part of one of its regular search algorithm tweaks, Google began recommending related search terms for many queries (including listing a search for flu treatments after someone Googled many flu-related terms) and in 2012, the company began providing potential diagnoses in response to symptoms in searches (including listing both “flu” and “cold” after a search that included the phrase “sore throat,” for instance, perhaps prompting a user to search for how to distinguish between the two). These tweaks, the researchers argue, likely artificially drove up the rates of the searches they identified as responsible for Google’s overestimates. (via Boing Boing)
clever hack — shellcode in a format string
The UK government’s failure to deal with spam law in a consumer-friendly way escalates further: UCAS, the university admissions service, is operating as a mass-mailer of direct marketing on behalf of Vodafone, O2, Microsoft, Red Bull and others, without even a way to later opt out from that spam without missing important admissions-related mail as a side effect. ‘Teenagers using Ucas Progress must explicitly opt in to mailings from the organisation and advertisers, though the organisation’s privacy statement says: “We do encourage you to tick the box as it helps us to help you.”’ Their website also carries advertising, and the details of parents are sold on to advertisers as well. Needless to say, the toothless ICO say they ‘did not appear to breach marketing rules under the privacy and electronic communications regulations’, as usual. Typical ICO fail.
I’ve often had to explain this key feature verbosely, and it’s hard to do without handwaving. Great to have a solid, well-explained URL to point to
Allegations of fixing to fit the stack-ranking curve: ‘someone at Google always had to get a low score “of 2.9”, so the unit could match the bell curve. She said senior staff “calibrated” the ratings supplied by line managers to ensure conformity with the template and these calibrations could reduce a line manager’s assessment of an employee, in effect giving them the poisoned score of less than three.’
According to our calculation about €40bn or over 40% of Irish services exports of €90bn in 2012 and related national output, resulted from global tax avoidance schemes. It is true that Ireland gains little from tax cheating but at some point, the US tax system will be reformed and a territorial system where companies are only liable in the US on US profits, would only be viable if there was a disincentive to shift profits to non-tax or low tax countries. The risk for Ireland is that a minimum foreign tax would be introduced that would be greater than the Irish headline rate of 12.5%. It’s also likely that US investment in Ireland would not have been jeopardized if Irish politicians had not been so eager as supplicants to doff the cap. Nevertheless today it would be taboo to admit the reality of participation in massive tax avoidance and the Captain Renaults of Merrion Street will continue with their version of the Dance of the Seven Veils.
TimBL backing the “web we want” campaign — https://webwewant.org/
Via jgc, the search for the downed Air France flight was optimized using this technique: ‘Metron’s approach to this search planning problem is rooted in classical Bayesian inference, which allows organization of available data with associated uncertainties and computation of the Probability Distribution Function (PDF) for target location given these data. In following this approach, the first step was to gather the available information about the location of the impact site of the aircraft. This information was sometimes contradictory and filled with ambiguities and uncertainties. Using a Bayesian approach we organized this material into consistent scenarios, quantified the uncertainties with probability distributions, weighted the relative likelihood of each scenario, and performed a simulation to produce a prior PDF for the location of the wreck.’
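Metron’s software isn’t public, but the core update after each unsuccessful sweep is textbook Bayes: downweight the searched cells by the probability the sensors would have detected the wreck there, then renormalise. A minimal sketch under those assumptions (all names mine):

```python
def bayes_update_after_failed_search(prior, searched, p_detect):
    """Posterior over grid cells after an unsuccessful search.

    prior:    dict mapping cell -> prior probability the wreck is there
    searched: set of cells covered by this sweep
    p_detect: probability of detecting the wreck if the right cell is searched
    """
    posterior = {}
    for cell, p in prior.items():
        # Likelihood of observing "not found" given the wreck is in this cell
        likelihood = (1.0 - p_detect) if cell in searched else 1.0
        posterior[cell] = p * likelihood
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}
```

Repeating this after each failed sweep concentrates probability mass in the unsearched (and poorly searched) cells, which is how later sweeps get prioritised; the AF447 wreck was found within a week of a search planned from the resulting posterior.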
The implants being deployed were once reserved for a few hundred hard-to-reach targets, whose communications could not be monitored through traditional wiretaps. But the documents analyzed by The Intercept show how the NSA has aggressively accelerated its hacking initiatives in the past decade by computerizing some processes previously handled by humans. The automated system – codenamed TURBINE – is designed to “allow the current implant network to scale to large size (millions of implants) by creating a system that does automated control implants by groups instead of individually.” In a top-secret presentation, dated August 2009, the NSA describes a pre-programmed part of the covert infrastructure called the “Expert System,” which is designed to operate “like the brain.” Great. Automated malware deployment to millions of random victims. See also the “I hunt sysadmins” section further down…
Burrito Justice nerds out on ‘Goodnight Moon’. ‘Maybe the bunny and the old lady are actually in a space elevator, getting closer to the moon as he gets into bed? Or as suggested by @transitmaps, the bunny can bend space and time? I do not have a good answer to this conundrum, but that is what the comments are for.’
An exceptionally well-researched and thorough disassembly of ‘Public Health Investigation of Epidemiological data on Disease and Mortality in Ireland related to Water Fluoridation and Fluoride Exposure’ by Declan Waugh, which appears to be going around currently
Hilariously, “The Girl Against Fluoride” and other anti-fluoridation campaigners now allege he’s an undercover agent of Alcoa and/or GlaxoSmithKline, rather than dealing with any awkwardly hostile realities
fantastic piece of C=64 history — the “Ocean fast loader” by Paul Hughes, which allowed Commodore 64 games to load from tape at 4000 baud, far faster than the built-in system implementation, and with graphics and music at the same time
This, IMO, would be a really good reason to upgrade to the payware version of IDEA – Chronon looks cool.
Chronon is a revolutionary new tool that keeps track of running Java programs, recording their execution for later analysis; helpful when you need to thoroughly retrace your steps while dealing with complicated bugs.
Google paper describing the infrastructure they’ve built for cross-service request tracing (ie. “tracer requests”). Features: low code changes required (since they’ve built it into the internal protobuf libs), low performance impact, sampling, deployment across the ~entire production fleet, output visibility in minutes, and has been live in production for over 2 years. Excellent read
‘a Japanese term that means “mistake-proofing”. A poka-yoke is any mechanism in a lean manufacturing process that helps an equipment operator avoid (yokeru) mistakes (poka). Its purpose is to eliminate product defects by preventing, correcting, or drawing attention to human errors as they occur.’
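The same idea ports directly to software: design the interface so the mistake cannot be expressed in the first place. A small, entirely hypothetical Python sketch:

```python
from enum import Enum

class Mode(Enum):
    READ = "r"
    WRITE = "w"

def open_log(path: str, mode: Mode) -> tuple[str, str]:
    """A software poka-yoke: accepting only a Mode member rather than a
    free-form string makes invalid modes unrepresentable, so a typo
    fails loudly at the call site instead of silently corrupting a file."""
    if not isinstance(mode, Mode):
        raise TypeError("mode must be a Mode member, not a raw string")
    return (path, mode.value)
```

Here the guard corrects the error as it occurs (the raise) while the Enum prevents most of them from being written at all — the “preventing, correcting, or drawing attention to” triad from the definition.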
lovely New Yorker writeup on Tove Jansson, author of those beautiful children’s books
I am very heartened by Minister of State for Research and Innovation Sean Sherlock’s recent announcement of a review of the costs and benefits of Ireland’s membership of international research organisations including CERN. I disagreed with the conclusion of the last review which suggested that costs outweighed the benefits to Ireland. I think it was an extreme oversight not to be a part of the engineering phase of the Collider during the period 1998-2008 – but it’s not too late. CERN will celebrate its 60th anniversary in 2014. There is no public scientific institution its equal in terms of the scale and complexity of problems being analysed and solved. No longer excluding young Irish people from being a part of this, from learning and growing from it, can only help Ireland. Also, spot my name in lights ;)
Mining Arscoins, dogecoins and litecoins — CPU/GPU mining apps and how to run ‘em
‘a fucking nightmare’:
Cascading requires a compilation step, yet since you’re writing Ruby code, you get none of the benefits of static type checking. It was standard to discover a type issue only after kicking off a job on, oh, 10 EC2 machines, only to have it fail because of a type mismatch. And user code embedded in strings would regularly fail to compile – which you again wouldn’t discover until after your job was running. Each of these was bad individually; together, they were a fucking nightmare. The interaction between the code in strings and the type system was the worst of all possible worlds. No type checking, yet incredibly brittle, finicky and incomprehensible type errors at run time. I will never forget when one of my friends at Etsy was learning Cascading.JRuby and he couldn’t get a type cast to work. I happened to know what would work: a triple cast. You had to cast the value to the type you wanted, not once, not twice, but THREE times.
Attempting to cash out of Bitcoins turns out to be absurdly difficult:
Trying to sell the coins in person, and basically saying he either wants cash, or a cashier’s check (since it can be handed over right then and there), has apparently been a hilarious clusterfuck. Today he met some guy in front of his bank, and apparently as soon as he mentioned that he needs to get the cash checked to make sure it is not counterfeit, the guy freaked out and basically walked away. Stuff like this has been happening all week, and he apparently so far has only sold a single coin of several hundred.
Harris is the leading maker of [IMSI catchers aka "stingrays"] in the U.S., and the ACLU has long suspected that the company has been loaning the devices to police departments throughout the state for product testing and promotional purposes. As the court document notes in the 2008 case, “the Tallahassee Police Department is not the owner of the equipment.” The ACLU now suspects these police departments may have all signed non-disclosure agreements with the vendor and used the agreement to avoid disclosing their use of the equipment to courts. “The police seem to have interpreted the agreement to bar them even from revealing their use of Stingrays to judges, who we usually rely on to provide oversight of police investigations,” the ACLU writes.
At the core of the redesign is a Dynamic Scripting Platform which provides us the ability to inject code into a running Java application at any time. This means we can alter the behavior of the application without a full scale deployment. As you can imagine, this powerful capability is useful in many scenarios. The API Server is one use case, and in this post, we describe how we use this platform to support a distributed development model at Netflix. Holy crap.
essentially decoupling the client services from ZK using a local daemon on each client host; very similar to Airbnb’s Smartstack. This is a bit of an indictment of ZK’s usability though
Good post on the ‘FOI costs too much’ talking point.
I realise if you’re a councillor, tea and biscuits sounds much more appealing than transparency and being held accountable and actually having to answer to voters, but those things are what you signed up to when you stood for election.
Good to know:
‘As far as I understand (this was true as of 2013, when I last looked into this issue) there’s at least one Apache ZooKeeper znode per topic in Kafka. While there is no hard limitation in Kafka itself (Kafka is linearly scalable), it does mean that the maximum number of znodes comfortably supported by ZooKeeper (on the order of about ten thousand) is the upper limit of Kafka’s scalability as far as the number of topics goes.’
There are people in my profession who think they can ignore this problem. Some are murmuring that this mess is like MMR, a public misunderstanding to be corrected with better PR. They are wrong: it’s like nuclear power. Medical data, rarefied and condensed, presents huge power to do good, but it also presents huge risks. When leaked, it cannot be unleaked; when lost, public trust will take decades to regain. This breaks my heart. I love big medical datasets, I work on them in my day job, and I can think of a hundred life-saving uses for better ones. But patients’ medical records contain secrets, and we owe them our highest protection. Where we use them – and we have used them, as researchers, for decades without a leak – this must be done safely, accountably, and transparently. New primary legislation, governing who has access to what, must be written: but that’s not enough. We also need vicious penalties for anyone leaking medical records; and HSCIC needs to regain trust, by releasing all documentation on all past releases, urgently. Care.data needs to work: in medicine, data saves lives.
bookmarking as a future reference
Nice bit of marketing from the day job:
The group of gamers responsible for half of all in-game revenue in mobile titles is frightening because it is so narrow, according to a survey by Swrve, an established analytics and app marketing firm. About 0.15 percent of mobile gamers contribute 50 percent of all of the in-app purchases generated in free-to-play games. This means it may be even more important than game companies realized in the past to find and retain the users that fall into the category of big spenders, or “whales.” The vast majority of users never spend any money, despite the clever tactics that game publishers have developed to incentivize people to spend money in their favorite games.
‘EAT CELEBRITY MEAT! BiteLabs grows meat from celebrity tissue samples and uses it to make artisanal salami.’ Genius. (via John Looney)
‘A system that proactively detects and avoids bad neighbouring VMs without significantly penalizing node instantiation [in EC2]. With Bobtail, common [datacenter] communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.’ Excellent stuff — another conclusion they come to is that it’s not the network’s fault, it’s the Xen hosts themselves. The EC2 networking team will be happy about that ;)
Charlie Stross on GCHQ’s 1984-esque webcam spying
These problems can be circumvented, but they must be dealt with, publicly and soberly, if the NHS really does want to win public confidence. The NHS should approach selling the scheme to the public as if it was opt-in, not opt-out, then work to convince us to join it. Tell us how sharing our data can help, but tell us the risks too. Let us decide if that balance is worth it. If it’s found wanting, the NHS must go back to the drawing board and retool the scheme until it is. It’s just too important to get wrong.
“Computer says no”, taken to the next level.
Even if an algorithmic prisoner knows he is in a prison, he may not know who his jailer is. Is he unable to get a loan because of a corrupted file at Experian or Equifax? Or could it be TransUnion? His bank could even have its own algorithms to determine a consumer’s creditworthiness. Just think of the needle-in-a-haystack effort consumers must undertake if they are forced to investigate dozens of consumer-reporting companies, looking for the one that threw them behind algorithmic bars. Now imagine a future that contains hundreds of such companies. A prisoner might not have any idea as to what type of behavior got him sentenced to a jail term. Is he on an enhanced screening list at an airport because of a trip he made to an unstable country, a post on his Facebook page, or a phone call to a friend who has a suspected terrorist friend?
as we all know by now, a misplaced “goto fail” caused a critical, huge security flaw in versions of iOS and OS X SSL since late 2012. Lessons: 1. unit test the failure cases, particularly for critical security code! 2. use braces. 3. dead-code analysis would have caught this. I’m not buying the “goto considered harmful” line, though, since any kind of control flow structure would have had the same problem.
in a world where Netflix and Yahoo connect directly to residential ISPs, every Internet company will have its own separate pipe. And policing whether different pipes are equally good is a much harder problem than requiring that all of the traffic in a single pipe be treated the same. If it wanted to ensure a level playing field, the FCC would be forced to become intimately involved in interconnection disputes, overseeing who Verizon interconnects with, how fast the connections are and how much they can charge to do it.
nice piece of classic graph design
With Cogent and Verizon fighting, [peering capacity] upgrades are happening at a glacial pace, according to Schaeffer. “Once a port hits about 85 percent throughput, you’re going to begin to start to drop packets,” he said. “Clearly when a port is at 120 or 130 percent [as the Cogent/Verizon ones are] the packet loss is material.” The congestion isn’t only happening at peak times, he said. “These ports are so over-congested that they’re running in this packet dropping state 22, 24 hours a day. Maybe at four in the morning on Tuesday or something there might be a little bit of headroom,” he said.
The 274-page report describes the NHS Hospital Episode Statistics as a “valuable data source in developing pricing assumptions for ‘critical illness’ cover.” It says that by combining hospital data with socio-economic profiles, experts were able to better calculate the likelihood of conditions, with “amazingly” clear forecasts possible for certain diseases, in particular lung cancer. Phil Booth, from privacy campaign group medConfidential, said: “The language in the document is extraordinary; this isn’t about patients, this is about exploiting a market. Of course any commercial organisation will focus on making a profit – the question is why is the NHS prepared to hand this data over?”