Volunteer moderators at Stack Overflow, a popular forum for software developers to ask and answer questions run by Stack Exchange, have issued a general strike over the company’s new AI content policy, which says that all GPT-generated content is now allowed on the site, and suspensions over AI content must stop immediately. The moderators say they are concerned about the harm this could do, given the frequent inaccuracies of chatbot information.
I missed this attack at the time, but Cory Doctorow reposted it recently — poisoning a neural network’s model trained using stochastic gradient descent by attacking the _ordering_ of the training data.
Suppose for example a company or a country wanted to have a credit-scoring system that’s secretly sexist, but still be able to pretend that its training was actually fair. Well, they could assemble a set of financial data that was representative of the whole population, but start the model’s training on ten rich men and ten poor women drawn from that set – then let initialisation bias do the rest of the work. Does this generalise? Indeed it does. Previously, people had assumed that in order to poison a model or introduce backdoors, you needed to add adversarial samples to the training data. Our latest paper shows that’s not necessary at all. If an adversary can manipulate the order in which batches of training data are presented to the model, they can undermine both its integrity (by poisoning it) and its availability (by causing training to be less effective, or take longer). This is quite general across models that use stochastic gradient descent.