April 10, 2013 - Justin's Linklog

Minister Rabbitte welcomes EU agreement on re-use of Public Sector Information

Lots of talk about “charging regimes”, “income-generating public sector bodies” etc., but not a single mention of open data or free access. Terrible stuff. :( (via conoro)

(tags: via:conoro open-access government public-sector ireland eu open-data public free)
Compression in Kafka: GZIP or Snappy?

With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, decompressed and compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP.

No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The higher compression savings in this test are due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
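
A quick sanity check of the "150% higher" and "228% higher" figures quoted above, as a minimal Python sketch. The helper name `percent_higher` is my own and not from the Kafka post; the only inputs are the MB/s numbers from the quote.

```python
def percent_higher(new: float, baseline: float) -> float:
    """Return how much higher `new` is than `baseline`, as a percentage."""
    return (new / baseline - 1.0) * 100.0


# Producer throughput figures (MB/s) as quoted: Snappy first, GZIP second.
with_ack = percent_higher(22.3, 8.9)
no_ack = percent_higher(60.8, 18.5)

print(f"with ack: Snappy is {with_ack:.1f}% higher than GZIP")
print(f"no ack:   Snappy is {no_ack:.1f}% higher than GZIP")
```

Both results land within a point of the quoted percentages, so the post's figures appear to be the relative increase over GZIP rather than the raw ratio (which would be roughly 2.5x and 3.3x).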

(tags: gzip snappy compression kafka streaming ops)
The Bw-Tree: A B-tree for New Hardware – Microsoft Research

The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree, achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.

(tags: bw-trees database paper toread research algorithms microsoft sql sql-server b-trees data-structures storage cache-friendly mechanical-sympathy)
Boundary Techtalk – Large-scale OLAP with Kobayashi

Boundary on their TSD-on-Riak store.
Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false-starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database. The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.

(tags: video boundary tsd riak eventual-consistency storage kobayashi olap time-series)
Adding Insult to Plagiary?

A few days old, but already an instant Streisand-Effect classic:
Sometimes people borrow [Colin Purrington’s free guide about making scientific posters] without giving him credit. This happens fairly regularly, and when he finds out about it, he sends an e-mail asking them to take it down. Usually they do. But when he sent an e-mail to the Consortium for Plant Biotechnology Research, asking that a roughly 1,200-word, near-verbatim, uncredited chunk from his guide be removed from the consortium’s materials, the response was unexpected. Rather than apologise, a lawyer sent him a cease-and-desist letter accusing him of plagiarizing the consortium’s materials and demanding that he take down his guide or face a lawsuit seeking damages up to $150,000.

(tags: streisand-effect lawsuits law infringement copyright cpbr bullying science posters)
Kafka 0.8 Producer Performance

Great benchmarking from Piotr Kozikowski of the LiveRamp team on the performance of the upcoming Kafka 0.8 release.

(tags: performance kafka apache benchmarks ops queueing)
Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node

An excellent writeup on Kafka 0.8’s use and operation, including details of the new replication features.

(tags: kafka replication queueing distributed ops)
Ah Here (To Coin A Phrase)

‘A €10 silver coin being offered for sale to the public in honour of James Joyce by the Central Bank tomorrow contains a misquote from the author. The line used on the coin from Chapter 3 of Ulysses includes a superfluous conjunction – a rogue ‘that’.’ [..] The coin reads:
“Ineluctable modality of the visible: at least that if no more, thought through my eyes. Signatures of all things *that* I am here to read.”
(Incorrect ‘that’ emphasised)

(tags: for:robotwisdom james-joyce typos funny fail central-bank ireland coins minting errors ulysses)

Links for 2013-04-10