Links for 2012-06-28

  • “Machine Learning That Matters” [paper, PDF] : Great paper. This point particularly resonates: “It is easy to sit in your office and run a Weka algorithm on a data set you downloaded from the web. It is very hard to identify a problem for which machine learning may offer a solution, determine what data should be collected, select or extract relevant features, choose an appropriate learning method, select an evaluation method, interpret the results, involve domain experts, publicize the results to the relevant scientific community, persuade users to adopt the technique, and (only then) to truly have made a difference (see Figure 1). An ML researcher might well feel fatigued or daunted just contemplating this list of activities. However, each one is a necessary component of any research program that seeks to have a real impact on the world outside of machine learning.”
  • Massive identity-theft breach in South Korea results in calls for national ID system to be abandoned : In South Korea, web users are required to provide their national ID number for “virtually every type of Internet activity, not only for encrypted communications like e-commerce, online banking and e-government services but also casual tasks like e-mail and blogging”, apparently in an attempt to “curb cyber-bullying”. The result is obvious: those ID numbers are collected in giant databases at companies like “SK Communications, which runs top social networking service Cyworld and search site Nate”, and those giant databases are tasty targets for black-hats. Now: “In Korea’s biggest-ever case of data theft the recent hacking attack at SK Communications, which runs top social networking service Cyworld and search site Nate, breached 35 million accounts, a mind-boggling total for a country that has about 50 million people and an economically-active population of 25 million. The compromised information includes names, passwords, phone numbers, e-mail addresses, and most alarmingly, resident registration numbers, the country’s equivalent to social security numbers.” This is an identity-fraudster’s dream: “In the hands of criminals, resident registration numbers could become master keys that open every door, allowing them to construct an entire identity based on the quality and breadth of data involved.”
