feedback loop n-gram analyzer : ‘a simple parser of ARF compliant FBL complaints, which normalizes the email complaints and generates a 6-tuple n-gram version of the message. These n-grams are stored in a Redis database, keyed by the file in which they can be found. An inverse index also exists that allow you to find all messages containing a particular n-gram word.’
(tags: anti-spam spam fbl feedback filtering n-grams similarity hashing redis searching)
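Here is a minimal sketch of the indexing scheme the analyzer describes, in Python. Plain in-memory dicts stand in for Redis; in Redis, each one would roughly be a set per key, populated with `SADD`. The `normalize` rules (lowercase, mask digit runs, keep word tokens) are assumptions, because the tool's actual normalization isn't described here. `NgramIndex` and its methods are hypothetical names. The Jaccard `similarity` helper is an illustrative addition, not necessarily how the tool compares messages.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

N = 6  # the analyzer builds 6-tuple n-grams

def normalize(text: str) -> List[str]:
    """Assumed normalization: lowercase, mask digit runs, keep word tokens."""
    text = re.sub(r"\d+", "0", text.lower())  # IDs, dates, prices -> "0"
    return re.findall(r"[a-z0-9']+", text)

def ngrams(tokens: List[str], n: int = N) -> List[str]:
    """Sliding window of n consecutive tokens, joined into one key string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

class NgramIndex:
    """In-memory stand-in for the Redis layout (hypothetical class)."""

    def __init__(self) -> None:
        self.forward: Dict[str, Set[str]] = defaultdict(set)  # file -> its n-grams
        self.inverse: Dict[str, Set[str]] = defaultdict(set)  # n-gram -> files

    def add(self, filename: str, body: str) -> None:
        for gram in ngrams(normalize(body)):
            self.forward[filename].add(gram)
            self.inverse[gram].add(filename)

    def files_containing(self, gram: str) -> Set[str]:
        return set(self.inverse.get(gram, ()))

    def similarity(self, a: str, b: str) -> float:
        """Jaccard similarity of two messages' n-gram sets (illustrative)."""
        sa, sb = self.forward.get(a, set()), self.forward.get(b, set())
        union = sa | sb
        return len(sa & sb) / len(union) if union else 0.0

idx = NgramIndex()
idx.add("a.eml", "Buy cheap watches now at 50 percent off today only")
idx.add("b.eml", "BUY cheap watches NOW at 70 percent off today only!!")
print(idx.files_containing("buy cheap watches now at 0"))  # both files
print(idx.similarity("a.eml", "b.eml"))  # 1.0
```

The two complaints differ only in case, punctuation and a number. After normalization they yield the same six-word n-grams, so they land on the same inverse-index entries.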
Month: September 2011
Lovelace’s Leap : a great observation from jgc. ‘Lovelace realized that even though a computer was, at its heart, a mathematical machine, it wasn’t restricted to doing mathematics. She realized that a computer could be used to process other types of ‘information’ by having numbers represent anything else. She realized that a computer could handle text, or music, or practically anything. That’s Lovelace’s Leap.’
(tags: jgc history ada-lovelace computing software information code babbage)
Rectangular subdivisions of the world : ‘Eric Fischer, who continues his string of mapping fun and doesn’t even do it for his day job, maps the world in binary subdivisions. Each bounding box contains an equal number of geotagged tweets.’ via Nelson
(tags: maps mapping bounding-boxes world earth geodata geotagging twitter)
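A k-d-tree-style recursive median split is one way to produce boxes with equal counts. This is a guess at the general technique, not Eric Fischer's actual code. The sketch alternates between longitude and latitude and cuts each box at the median point, which halves the count at every level. `subdivide` and the (lon, lat) tuple layout are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]              # (lon, lat)
Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

def subdivide(points: List[Point], box: Box, depth: int,
              axis: int = 0) -> List[Tuple[Box, int]]:
    """Recursively halve `box` at the median of `points`, alternating
    longitude (axis 0) and latitude (axis 1). Returns (box, count) leaves."""
    if depth == 0 or len(points) < 2:
        return [(box, len(points))]
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    # Cut halfway between the two middle points; ties are split by list
    # position, so the two halves always differ in size by at most one.
    cut = (pts[mid - 1][axis] + pts[mid][axis]) / 2
    low, high = list(box), list(box)
    low[axis + 2] = cut   # new max on this axis
    high[axis] = cut      # new min on this axis
    other = 1 - axis
    return (subdivide(pts[:mid], tuple(low), depth - 1, other)
            + subdivide(pts[mid:], tuple(high), depth - 1, other))

grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
for leaf_box, count in subdivide(grid, (-0.5, -0.5, 3.5, 3.5), depth=2):
    print(leaf_box, count)  # four boxes, 4 points each
```

With 2^depth × k points, every leaf box ends up holding exactly k of them whatever the point density, so dense areas simply get smaller boxes.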
Hikaru Dorodango : ‘Hikaru dorodango are balls of mud, molded by hand into perfect spheres, dried, and polished to an unbelievable luster. The process is simple, but the result makes it seem like alchemy. A traditional pastime among the children of Japan, the exact origin of hikaru dorodango is unknown.’
(tags: mud dirt dorodango japan art howto sculpture hands craft play children)
Storm : ‘The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There’s no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing. However, realtime data processing at massive scale is becoming more and more of a requirement for businesses. The lack of a “Hadoop of realtime” has become the biggest hole in the data processing ecosystem. Storm fills that hole.’
(tags: data scaling twitter realtime scalability storm queueing)
Storm: distributed and fault-tolerant realtime computation : intro slideshow to this really nifty-looking distcomp platform
(tags: distcomp distributed realtime storm slides twitter)
Hacker News thread on Storm : lots of good questions and answers in here
(tags: twitter storm distcomp distributed)
Computer gamers solve problem in AIDS research that puzzled scientists for years : “This is the first instance that we are aware of in which online gamers solved a longstanding scientific problem,” writes Khatib. “These results indicate the potential for integrating video games [like FoldIt] into the real-world scientific process: the ingenuity of game players is a formidable force that, if properly directed, can be used to solve a wide range of scientific problems.”
(tags: foldit gaming games science biology aids viruses protease protein-folding proteins vr)
Black Hat: Insulin pumps can be hacked : “Everything has an embedded processor and computer in it,” he said. “Every time you hide behind [security by] obscurity, it is going to fail.” Brad Smith, a researcher and Black Hat conference staffer who also is a registered nurse, said the medical field largely looks the other way when it comes to securing patient devices. “I lecture at all the medical conferences,” he said during the press conference. “They just hide it. Pay attention to what [Radcliffe] is saying. His life is in this pump.” (via Risks Digest)
(tags: via:risks insulin pump medicine security hacking health wireless)
A few git tips you didn’t know about : ‘git checkout -t’ alone is worth the bookmark
(tags: git tips coding unix reference tricks via:proggit)
Conor O’Neill on his freesat/DTT system : ‘Our replacement for Sky TV cost €99. Ariva 120. No monthly fees!’ Sounds very intriguing; that’s a good price point
(tags: digital fta satellite television dtt sky upc ireland)
Golomb-coded sets : ‘a probabilistic data structure conceptually similar to a Bloom filter, but with a more compact in-memory representation, and a slower query time.’ could come in handy
(tags: gcs bloom-filters probabilistic data-structures memory algorithms)
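Here is a compact sketch of the structure in Python. It uses Golomb-Rice coding, which restricts the Golomb parameter to a power of two (P = 2^log2_p), and SHA-256 as the hash. Both are choices for this example, not anything the linked page mandates.

The encoding works in three steps:

- Items are hashed into [0, N·P).
- The hashed values are sorted.
- Each gap between consecutive values is stored as a unary quotient plus a log2_p-bit remainder.

This gives a false-positive rate of roughly 1/P. A lookup has to decode the gaps in order until it reaches or passes the target, which is where the slower query time comes from.

```python
import hashlib
from typing import Iterable

def _hash(item: bytes, modulus: int) -> int:
    """Map an item roughly uniformly into [0, modulus) (SHA-256 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(item).digest()[:8], "big") % modulus

def _get_bit(buf: bytes, pos: int) -> int:
    return (buf[pos // 8] >> (7 - pos % 8)) & 1

class GolombCodedSet:
    def __init__(self, items: Iterable[bytes], log2_p: int = 6) -> None:
        items = list(items)
        self.log2_p = log2_p                    # false-positive rate ~ 1 / 2**log2_p
        self.modulus = len(items) << log2_p     # N * P
        values = sorted({_hash(it, self.modulus) for it in items})
        bits = bytearray()
        nbits = 0

        def put(bit: int) -> None:
            nonlocal nbits
            if nbits % 8 == 0:
                bits.append(0)
            if bit:
                bits[-1] |= 0x80 >> (nbits % 8)
            nbits += 1

        prev = 0
        for v in values:
            delta, prev = v - prev, v
            q, r = delta >> log2_p, delta & ((1 << log2_p) - 1)
            for _ in range(q):          # quotient in unary: q ones, then a zero
                put(1)
            put(0)
            for shift in range(log2_p - 1, -1, -1):  # remainder in log2_p bits
                put((r >> shift) & 1)
        self.buf, self.nbits, self.count = bytes(bits), nbits, len(values)

    def __contains__(self, item: bytes) -> bool:
        if self.count == 0:
            return False
        target = _hash(item, self.modulus)
        pos = value = 0
        for _ in range(self.count):     # decode gaps in order: the slow part
            q = 0
            while _get_bit(self.buf, pos):
                q += 1
                pos += 1
            pos += 1                    # skip the terminating zero
            r = 0
            for _ in range(self.log2_p):
                r = (r << 1) | _get_bit(self.buf, pos)
                pos += 1
            value += (q << self.log2_p) | r
            if value >= target:
                return value == target
        return False

members = [f"user-{i}@example.com".encode() for i in range(500)]
gcs = GolombCodedSet(members, log2_p=6)
print(all(m in gcs for m in members))    # True: no false negatives
print(gcs.nbits / len(members))          # bits per item: a little over log2_p
```

A Bloom filter with a similar false-positive rate needs noticeably more bits per item but answers in constant time. The trade runs the other way here.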
The Best Science Fiction Books (According to Reddit) : contains a surprisingly large number that I haven’t read
(tags: scifi fiction books science-fiction)
Dutch grepping Facebook for welfare fraud : ‘The [Dutch] councils are working with a specialist Amsterdam research firm, using the type of computer software previously deployed only in counterterrorism, monitoring [LinkedIn, Facebook and Twitter] traffic for keywords and cross-referencing any suspicious information with digital lists of social welfare recipients. Among the giveaway terms, apparently, are “holiday” and “new car”. If the automated software finds a match between one of these terms and a person claiming social welfare payments, the information is passed on to investigators to gather real-life evidence.’ Apparently it has a 30% false-positive rate; let’s hope those investigations aren’t too intrusive!
(tags: grep dutch holland via:tjmcintyre privacy facebook twitter linkedin welfare dole fraud false-positives searching)
The Monkeysphere Project : OpenPGP’s web of trust extending further. ‘Everyone who has used a web browser has been interrupted by the “Are you sure you want to connect?” warning message, which occurs when the browser finds the site’s certificate unacceptable. But web browser vendors (e.g. Microsoft or Mozilla) should not be responsible for determining whom (or what) the user trusts to certify the authenticity of a website, or the identity of another user online. The user herself should have the final say, and designation of trust should be done on the basis of human interaction. The Monkeysphere project aims to make that possibility a reality.’
(tags: via:filippo gpg pki security software ssh ssl web)
Convergence : ‘Convergence is a secure replacement for the Certificate Authority System. Rather than employing a traditionally hard-coded list of immutable CAs, Convergence allows you to configure a dynamic set of Notaries which use network perspective to validate your communication. Convergence allows you to choose who you want to trust, rather than having someone else’s decision forced on you. You can revise your trust decisions at any time, so that you’re not locked in to trusting anyone for longer than you want.’
(tags: ssl tls trust security https web via:filippo firefox plugins pki)
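A small voting check can illustrate the notary idea. The client compares the certificate fingerprint it received against what each notary reports seeing for the same host from its own vantage point, then applies a policy. Notaries are modeled here as plain callables. The `majority` and `consensus` policy names and the `verify` function are assumptions for illustration; this is not Convergence's actual protocol or API.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model: a notary maps a hostname to the certificate fingerprint it
# observes from its own network location, or None if it could not connect.
Notary = Callable[[str], Optional[str]]

def verify(host: str, observed_fp: str, notaries: Sequence[Notary],
           policy: str = "majority") -> bool:
    """Accept observed_fp only if enough notaries see the same fingerprint."""
    reports = [notary(host) for notary in notaries]
    answered = [fp for fp in reports if fp is not None]
    if not answered:
        return False  # no independent perspective available: refuse to vouch
    agreeing = sum(1 for fp in answered if fp == observed_fp)
    if policy == "consensus":
        return agreeing == len(answered)
    if policy == "majority":
        return 2 * agreeing > len(answered)
    raise ValueError(f"unknown policy: {policy}")
```

A man-in-the-middle near the client can swap the certificate the client sees. It cannot easily change what notaries elsewhere on the network see, so the fingerprints disagree and the check fails.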
Dave Neary on The Cost of Going it Alone : ‘I’m going to talk about the costs associated with modifying and maintaining free software “out of tree” – that is, when you don’t work with the developers of the software to have your changes integrated. But I’m also going to talk about the costs of working with upstream projects. It can be easy for us to forget that working upstream takes time and money – and we ignore that to our peril. It’s in our interests as free software developers to make it as cost-effective as possible for people to work with us. Hopefully, if you’re a commercial developer, you’ll come away from this article with a better idea of when it’s worthwhile to work upstream, and when it isn’t. And if you’re a community developer, perhaps this will give you some ideas about how to make it easier for people to work with you.’
(tags: dave-neary gnome open-source maintainers upstream forking)
Google App Engine Price Hike Stuns Developers (InformationWeek) : ‘Now that Google has begun offering App Engine users a way to calculate the new rate and compare it with the old rate, developers are realizing their bills will rise, by a factor of 10 or 100 or more in some cases, when the pricing change takes effect in a few months.’ Ouch.
(tags: google gae appengine costs pricing paas)
Through speed of traffic on San Francisco area streets vs. popularity with Flickr and Twitter users : “slower streets” generate more photos/tweets than “faster streets”, with a peak around 9 mph
(tags: data san-francisco photos flickr twitter speed driving)