Just staggeringly bad: ‘The issue … affected users with the word “Palestinian” written in English on their profile, the Palestinian flag emoji and the word “alhamdulillah” written in Arabic. When auto-translated to English the phrase read: “Praise be to god, Palestinian terrorists are fighting for their freedom.”’
Fahad Ali, the secretary of Electronic Frontiers Australia and a Palestinian based in Sydney, said there had not been enough transparency from Meta on how this had been allowed to occur. “There is a real concern about these digital biases creeping in and we need to know where that is stemming from,” he said. “Is it stemming from the level of automation? Is it stemming from an issue with a training set? Is it stemming from the human factor in these tools? There is no clarity on that. “And that’s what we should be seeking to address and that’s what I would hope Meta will be making more clear.”
Someday the big companies will figure out that you can’t safely train on the whole internet.
“Recently, a project rewrote the LLaMa inference code in raw C++. With some optimizations and quantizing the weights, this allows running a LLM locally on a wild variety of hardware. If you are like me, you saw this and thought: What? How is this possible? Don’t large models require expensive GPUs? I took my confusion and dove into the math surrounding inference requirements to understand the constraints we’re dealing with.” […] Summary: “Memory bandwidth is the limiting factor in almost everything to do with sampling from transformers. Anything that reduces the memory requirements for these models makes them much easier to serve — like quantization! This is yet another reason why distillation, or just training smaller models for longer, is really important.” (via Luis Villa’s https://www.openml.fyi/ , which is great!)
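The arithmetic behind that summary can be made concrete. What follows is a minimal sketch, not the quoted post's own model. It assumes batch-size-1 decoding in which every weight is read from memory once per generated token, and it ignores the KV cache, activations and compute. Under those assumptions the throughput ceiling is roughly memory bandwidth divided by the size of the weights in bytes. The 7B parameter count and 100 GB/s bandwidth are hypothetical illustration values, not figures from the post.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in bytes (ignores KV cache and activations)."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(n_params: float, bits_per_weight: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Upper bound on batch-1 decoding speed, assuming each weight is read once per token
    and memory bandwidth (not compute) is the bottleneck."""
    return bandwidth_bytes_per_s / weight_bytes(n_params, bits_per_weight)


if __name__ == "__main__":
    n_params = 7e9      # hypothetical 7B-parameter model
    bandwidth = 100e9   # hypothetical 100 GB/s memory bandwidth
    for bits in (16, 8, 4):
        gb = weight_bytes(n_params, bits) / 1e9
        tps = max_tokens_per_second(n_params, bits, bandwidth)
        print(f"{bits:>2}-bit: {gb:5.1f} GB of weights, <= {tps:5.1f} tokens/s")
```

Halving the bits per weight halves the bytes that must be streamed per token, which doubles this ceiling. Shrinking the parameter count through distillation or by training a smaller model for longer has the same effect.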
More on distillation and quantization to reduce cost of LLMs