These companies and their technologies are built on data, and the data is us. If we are to have any faith in the Internet, we have to trust them to protect it. That’s a relationship dynamic that will become only more intertwined as the Internet finds its way into more aspects of our daily existences, from phones that talk to us to cars that drive themselves. The US’s surveillance programs threaten to destroy that trust permanently. America’s tech companies must stand up to this pervasive and corrosive surveillance system. They must ask that difficult question: “Is it worth it?”
‘a service discovery and orchestration tool that is decentralized, highly available, and fault tolerant. Serf runs on every major platform: Linux, Mac OS X, and Windows. It is extremely lightweight: it uses 5 to 10 MB of resident memory and primarily communicates using infrequent UDP messages [and an] efficient gossip protocol.’
Skew is prevalent in many data sources such as IP traffic streams.To continually summarize the distribution of such data, a high-biased set of quantiles (e.g., 50th, 90th and 99th percentiles) with finer error guarantees at higher ranks (e.g., errors of 5, 1 and 0.1 percent, respectively) is more useful than uniformly distributed quantiles (e.g., 25th, 50th and 75th percentiles) with uniform error guarantees. In this paper, we address the following two prob-lems. First, can we compute quantiles with finer error guarantees for the higher ranks of the data distribution effectively, using less space and computation time than computing all quantiles uniformly at the finest error? Second, if specific quantiles and their error bounds are requested a priori, can the necessary space usage and computation time be reduced? We answer both questions in the affirmative by formalizing them as the “high-biased” quantiles and the “targeted” quantiles problems, respectively, and presenting algorithms with provable guarantees, that perform significantly better than previously known solutions for these problems. We implemented our algorithms in the Gigascope data stream management system, and evaluated alternate approaches for maintaining the relevant summary structures.Our experimental results on real and synthetic IP data streams complement our theoretical analyses, and highlight the importance of lightweight, non-blocking implementations when maintaining summary structures over high-speed data streams.
Implemented as a timer-histogram storage system in http://armon.github.io/statsite/ .
A C reimplementation of Etsy’s statsd, with some interesting memory optimizations.
Statsite is designed to be both highly performant, and very flexible. To achieve this, it implements the stats collection and aggregation in pure C, using libev to be extremely fast. This allows it to handle hundreds of connections, and millions of metrics. After each flush interval expires, statsite performs a fork/exec to start a new stream handler invoking a specified application. Statsite then streams the aggregated metrics over stdin to the application, which is free to handle the metrics as it sees fit. This allows statsite to aggregate metrics and then ship metrics to any number of sinks (Graphite, SQL databases, etc). There is an included Python script that ships metrics to graphite.
if Linnane’s and Cronin’s are anything to go by, these will be worth a visit
A fax machine called my #twilio voice number, this is how @twilio transcribed it…. http://pic.twitter.com/RYh19Pg2pGThis is amazing. Machine talking to machine, with hilarious results