Links for 2014-03-18

  • Analyzing Citibike Usage

    Abe Stanway crunches the stats on Citibike usage in NYC, compared to the weather data from Wunderground.

    (tags: data correlation statistics citibike cycling nyc data-science weather)

  • NSA surveillance recording every single voice call in at least 1 country

    The calls are stored in a 30-day rolling buffer, allowing retrospective targeting weeks after the call. This covers 100% of voice calls in that country, although it’s unclear which country that is.

    (tags: nsa surveillance gchq telephones phone bugging)

  • S3QL

    a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.

    (tags: s3 s3ql backup aws filesystems linux freebsd osx ops)

  • What’s New in Java 8

    good explanation of all the new features; I’m really looking forward to fixing up all the crappy over-verbose interface-as-lambdas we have scattered throughout our code.

    (tags: java java8 lambdas fp functional-programming currying joda-time)

  • FM-index

    a compressed full-text substring index based on the Burrows-Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini, who describe it as an opportunistic data structure as it allows compression of the input text while still permitting fast substring queries. The name stands for ‘Full-text index in Minute space’. It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as locate the position of each occurrence. Both the query time and storage space requirements are sublinear with respect to the size of the input data.
    kragen notes ‘gene sequencing is using [them] in production’.

    (tags: sequencing bioinformatics algorithms bowtie fm-index indexing compression search burrows-wheeler bwt full-text-search)
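
To make the FM-index description above a bit more concrete, here is a toy sketch of the counting and locating queries using backward search over the Burrows-Wheeler transform. It is not how a production index is built: the function names (build_fm_index, count, locate) and the "$" sentinel are assumptions made for this illustration, the suffix array is built naively, and it is kept in full and uncompressed rather than sampled, so none of the space savings are shown.

```python
def build_fm_index(text, sentinel="$"):
    """Build a toy, uncompressed FM-index (illustrative names, not a real library API)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # BWT: the character that precedes each suffix in sorted order.
    bwt = "".join(s[i - 1] for i in sa)
    alphabet = sorted(set(s))
    # first[c] = number of characters in s that sort strictly before c.
    first, total = {}, 0
    for ch in alphabet:
        first[ch] = total
        total += s.count(ch)
    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {ch: [0] for ch in alphabet}
    for b in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return {"sa": sa, "bwt": bwt, "first": first, "occ": occ}


def _row_range(index, pattern):
    """Backward search: half-open range of sorted-suffix rows prefixed by pattern."""
    lo, hi = 0, len(index["bwt"])
    for ch in reversed(pattern):
        if ch not in index["first"]:
            return 0, 0
        lo = index["first"][ch] + index["occ"][ch][lo]
        hi = index["first"][ch] + index["occ"][ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def count(index, pattern):
    """Number of occurrences of a non-empty pattern in the indexed text."""
    lo, hi = _row_range(index, pattern)
    return hi - lo


def locate(index, pattern):
    """Sorted start positions of a non-empty pattern in the indexed text."""
    lo, hi = _row_range(index, pattern)
    return sorted(index["sa"][lo:hi])


index = build_fm_index("abracadabra")
print(index["bwt"])            # ard$rcaaaabb
print(count(index, "abra"))    # 2
print(locate(index, "abra"))   # [0, 7]
```

The search loop takes one step per pattern character and never rescans the text. A real implementation would compress the BWT, store the occurrence table in a succinct form, and sample the suffix array instead of keeping it whole.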
