Links for 2022-09-06

  • What is Fog Data Science?

    An EFF post on a data broker whose location data US police are using for warrantless “dragnet” surveillance:

    Fog claims that their product is made of [location] data willingly given by people. But people did not hand their geolocation data over to Fog or the police, willingly or even knowingly. Rather, they gave it over, for example, to a weather app so that they could see if it will rain in their town today. When they downloaded the app, they may have clicked a box purporting to grant various so-called “consents,” but no reasonable person expects this will result in the app tracking all their movements, the app developer selling this sensitive information to a data broker, and police ultimately buying it.
    And this is why the GDPR is so valuable.

    (tags: fog police surveillance gdpr privacy data-privacy location)

  • How a tool to map computer viruses came to power biology research

    This is some fantastic symmetry! Years ago, I took the BLAST bioinformatics algorithm, normally used to spot correlations between DNA/RNA sequences, and applied it to correlate and detect spam. And now here’s UMAP, an algorithm used to correlate and detect malware and viruses, going in the opposite direction!

    When mathematicians Leland McInnes and John Healy walked into their work’s annual “Big Dig” — a sort of classified hackathon for Canada’s version of the National Security Agency — in 2017, they were not thinking about biology at all. They wanted to find a way to quickly spot the differences between computer viruses. They ended up creating a tool to simplify datasets and visualize the data points in them: an algorithm they named Uniform Manifold Approximation and Projection, or UMAP. They published a paper on it in 2018. To their great surprise, in fewer than five years, it has become one of the most ubiquitous tools in modern biology research. UMAP has now been used to study everything from forecasting rain in the Alps to identifying the many-hued pigments in a Gauguin artwork to modeling how Covid-19 tweets are disseminated. And, of course, scientists have applied UMAP to studying the actual virus itself. The technique is now the method of choice for most computational biologists who want to see what, exactly, is going on in a dataset.

    (tags: dna rna sequences matching correlation spam antispam malware umap blast algorithms)