Volunteer moderators at Stack Overflow, a popular forum for software developers to ask and answer questions run by Stack Exchange, have issued a general strike over the company’s new AI content policy, which says that all GPT-generated content is now allowed on the site, and suspensions over AI content must stop immediately. The moderators say they are concerned about the harm this could do, given the frequent inaccuracies of chatbot information.
I missed this attack at the time, but Cory Doctorow reposted it recently — poisoning a neural network’s model trained using stochastic gradient descent by attacking the _ordering_ of the training data.
Suppose for example a company or a country wanted to have a credit-scoring system that’s secretly sexist, but still be able to pretend that its training was actually fair. Well, they could assemble a set of financial data that was representative of the whole population, but start the model’s training on ten rich men and ten poor women drawn from that set – then let initialisation bias do the rest of the work. Does this generalise? Indeed it does. Previously, people had assumed that in order to poison a model or introduce backdoors, you needed to add adversarial samples to the training data. Our latest paper shows that’s not necessary at all. If an adversary can manipulate the order in which batches of training data are presented to the model, they can undermine both its integrity (by poisoning it) and its availability (by causing training to be less effective, or take longer). This is quite general across models that use stochastic gradient descent.
Justin Mason's Weblog Posts
Nice exploit of LLM confabulation: ask LLM for coding advice, get a nonexistent package, then register that package and exploit other coders attempting to follow the LLM’s terrible advice
There’s actually some fantastic ideas in here!
A fascinating queueing theory phenomenon:
In public transport, bus bunching, clumping, convoying, piggybacking or platooning is a phenomenon whereby two or more [buses] which were scheduled at regular intervals along a common route instead bunch together and form a platoon. This occurs when leading vehicles are unable to keep their schedule and fall behind to such an extent that trailing vehicles catch up to them. […] A bus that is running slightly late will, in addition to its normal load, pick up passengers who would have taken the next bus if the first bus had not been late. These extra passengers delay the first bus even further. In contrast, the bus behind the late bus has a lighter passenger load than it otherwise would have, and may therefore run ahead of schedule.There are several proposed corrective measures — the most interesting to me is to “abandon the idea of a schedule and keep buses equally spaced by strategically delaying them at designated stops.” This has been implemented as a system called BusGenius, for example in Northern Arizona University — https://news.nau.edu/nau-bus-schedules/
An important aspect in developing language models that interact with humans is aligning their behavior to be useful and unharmful for their human users. This is usually achieved by tuning the model in a way that enhances desired behaviors and inhibits undesired ones, a process referred to as alignment. In this paper, we propose a theoretical approach called Behavior Expectation Bounds (BEB) which allows us to formally investigate several inherent characteristics and limitations of alignment in large language models. Importantly, we prove that for any behavior that has a finite probability of being exhibited by the model, there exist prompts that can trigger the model into outputting this behavior, with probability that increases with the length of the prompt. This implies that any alignment process that attenuates undesired behavior but does not remove it altogether, is not safe against adversarial prompting attacks. Furthermore, our framework hints at the mechanism by which leading alignment approaches such as reinforcement learning from human feedback increase the LLM’s proneness to being prompted into the undesired behaviors. Moreover, we include the notion of personas in our BEB framework, and find that behaviors which are generally very unlikely to be exhibited by the model can be brought to the front by prompting the model to behave as specific persona. This theoretical result is being experimentally demonstrated in large scale by the so called contemporary “chatGPT jailbreaks”, where adversarial users trick the LLM into breaking its alignment guardrails by triggering it into acting as a malicious persona. Our results expose fundamental limitations in alignment of LLMs and bring to the forefront the need to devise reliable mechanisms for ensuring AI safety.(via Remmelt Ellen)
A protein powder made from renewable electricity, requiring virtually no land, with a tiny carbon footprint, and resilient to climate or ecosystem shocks, unlike conventional agriculture. Apparently the resulting powder tastes nutty and a little like turmeric. Basically it ferments a type of airborne microbe, in a process that is 20x more efficient than photosynthesis, and 200x more than meat protein. They claim it to be “highly nutritious, vegan, and catering to every diet around. The macronutrient composition of the cells is very similar to that of dried soy or algae, but it is more versatile since it has pleasant note of umami flavor and mild aroma.” Also ideal for space! (Via Hannah Daly)
“From “Heavy Purchasers” of Pregnancy Tests to the Depression-Prone: We Found 650,000 Ways Advertisers Label You” – The Markup:
If you spend any time online, you probably have some idea that the digital ad industry is constantly collecting data about you, including a lot of personal information, and sorting you into specialized categories so you’re more likely to buy the things they advertise to you. But in a rare look at just how deep—and weird—the rabbit hole of targeted advertising gets, The Markup has analyzed a database of 650,000 of these audience segments, newly unearthed on the website of Microsoft’s ad platform Xandr. The trove of data indicates that advertisers could also target people based on sensitive information like being “heavy purchasers” of pregnancy test kits, having an interest in brain tumors, being prone to depression, visiting places of worship, or feeling “easily deflated” or that they “get a raw deal out of life.”(Via Johnny Ryan)
much better than Atkinson’s bullshit-soaked spiel about EVs. Don’t listen to washed-out comedians when you need science
“a place where those of us in the Restarters community with experience and skills in mending appliances and gadgets can share them with those who are starting out, or whose own knowledge lies in different areas.” Lots of good tips on general appliance repair and maintenance.
I love this paper! I’ve been saying this for years:
Deployed AI systems often do not work. They can be constructed haphazardly, deployed indiscriminately, and promoted deceptively. However, despite this reality, scholars, the press, and policymakers pay too little attention to functionality. This leads to technical and policy solutions focused on “ethical” or value-aligned deployments, often skipping over the prior question of whether a given system functions, or provides any benefits at all. To describe the harms of various types of functionality failures, we analyze a set of case studies to create a taxonomy of known AI functionality issues. We then point to policy and organizational responses that are often overlooked and become more readily available once functionality is drawn into focus. We argue that functionality is a meaningful AI policy challenge, operating as a necessary first step towards protecting affected communities from algorithmic harm.One mastodon user notes: “My favorite (sarcasm) example of this was police departments buying ML for identifying gunshots. The models were all trained for earthquakes, and the vendor basically repurposed earthquake detection as gunshot detection, made bank, and left departments with a flood of false positives.”
On 3 June 1980, at 2:26am EDT, “warning displays at the Strategic Air Command suddenly indicated that a Soviet SLBM attack on the United States was underway, first showing 2 and then, 18 seconds later, 200 inbound missiles. SAC ordered all alert air crews to start their engines.” “A subsequent investigation traced the cause to a defective 46¢ integrated circuit in a NORAD communications multiplexer, which sent test messages on dedicated lines from NORAD to other command posts. The test messages were designed to confirm those lines were functioning properly 24/7, and they were formatted to resemble an actual missile attack warning, including its size. The false alarm was triggered when the defective circuit randomly inserted 2’s in place of 0’s.” I wonder how many other near-armageddon incidents were barely averted…
“The Carbon Aware KEDA Operator was announced by Microsoft in April this year; … The operator builds on top of KEDA (Kubernetes Event Driven Autoscaling). Temporal shifting is a form of carbon aware scheduling to run workloads at different times depending on how much renewable energy is available.”
They are dubbing it “Triangulation”:
We believe that the main reason for this incident is the proprietary nature of iOS. This operating system is a “black box” in which spyware like Triangulation can hide for years. Detecting and analyzing such threats is made more difficult by Apple’s monopoly of research tools, making it the perfect haven for spyware. In other words, as I have said more than once, users are given the illusion of security associated with the complete opacity of the system. What actually happens in iOS is unknown to the cybersecurity experts.
Sucralose, as used in Splenda, is genotoxic. big yikes
The FTC have proposed a judgement against Amazon/Ring: “FTC says Ring employees illegally surveilled customers, failed to stop hackers from taking control of users’ cameras. Under proposed order, Ring will be prohibited from profiting from unlawfully collected consumer videos, pay $5.8M in consumer refunds.” Meredith Whittaker on Twitter, responding: “Speaking of real AI regulation grounded in reality! The part about Amazon being “prohibited from profiting from unlawfully collected consumer videos” is huge. Data protection IS AI regulation. & in this case will likely mean undoing datasets, retraining/disposing of models, etc.” Retraining/discarding datasets is a HUGE deal for AI/ML companies. This is the big stick for regulators. I hope the EU DPCs are paying attention to this judgement.
New fast food frankenstein dish just dropped:
a fast food dish created in 2003 in the Dutch city of Rotterdam, consisting of a layer of french fries placed into a disposable metal take-away tray, topped with döner or gyro meat, covered with slices of Gouda cheese, and heated in an oven until the cheese melts. Then a layer of shredded iceberg lettuce is added, dressed with garlic sauce and sambal, a hot sauce from Indonesia .. The term kapsalon is Dutch for “hairdressing salon” or barber shop, alluding to one of the inventors of the dish who worked as a hairdresser.This sounds delicious.
“The Story of Mel” is a legendary USENET story of “Mel”, a Real Programmer from back in the day, performing a truly impressive piece of optimization; a “paean to seat-of-the-pants machine coding”, as Micheal puts it. This site is a little shrine to Mel’s life and history from a MeFi user. (Via Meehawl)
Excellent “AI for good” idea from the Bulletin of the Atomic Scientists:
Investments in and development of technologies for autonomous demining operations, post war, are long overdue and consistent with the White House’s push for a Blueprint for an AI Bill of Rights, which vows to use autonomy for the public good. Alas, while the Defense Department has pursued autonomous systems for the battlefield and the unincentivized private sector has focused on producing dancing robotic dogs, efforts to develop autonomous demining technology have stagnated. The United States should provide funding to energize those efforts, regardless of what decision is made in regard to sending cluster bombs to Kiev.
The AI enshittification continues:
Job seekers may virtually interview with or be prescreened by an artificial-intelligence program such as HireVue, Harver, or Plum. After someone applies to a job at a company that uses this software, they may receive an automated survey asking them to answer inane personality-assessment questions like “Which statement describes you best? (a) I love debating academic theories or (b) I adopt a future emphasis.” […] And these AI-moderated processes might not be fair, either. Researchers at the University of California, Berkeley, say that AI decision-making systems could have a 44% chance of being embedded with gender bias, a 26% chance of displaying both gender and race bias, and may also be prone to screening out applicants with disabilities. In one notorious case, an audit of an AI screening tool found that it prioritized candidates who played high-school lacrosse or were named “Jared.”
A very neat trick via Marc Brooker to improve tail latencies using erasure coding: ‘Say I have an in-memory cache of objects. I can keep any object in the cache once, and always go looking for it in that one place (e.g. with consistent hashing). If that place is slow, overloaded, experiencing packet loss, or whatever, I’ll see high latency for all attempts to get that object. With hedging I can avoid that, if I store the object in two places rather than one, at the cost of doubling the size of my cache. But what if I wanted to avoid the slowness and not double the size of my cache? Instead of storing everything twice, I could break it into (for example) 5 pieces .. encoded in such a way that I could reassemble it from any four pieces .. . Then, when I fetch, I send five get requests, and have the whole object as soon as four have returned. The overhead here on requests is 5x, on bandwidth is worst-case 20%, and on storage is 20%. The effect on tail latency can be considerable.’
Some lovely details in this writeup of a new system in AWS Lambda, via Marc Brooker:
This system gets performance by doing as little work as possible (deduplication, caching, lazy loading), and then gets resilience by doing slightly more work than needed (erasure coding, salted deduplication, etc). This is a tension worth paying attention to in all system designs.
tl;dr: vaccination of kids is worth it to protect against Long Covid and hospitalisation. “A Methodological Framework for Assessing the Benefit of SARS-CoV-2 Vaccination following Previous Infection: Case Study of Five- to Eleven-Year-Olds”, Christina Pagel et al.:
We present a novel methodological framework for estimating the potential benefits of COVID-19 vaccination in previously infected children aged five to eleven, accounting for waning. We apply this framework to the UK context and for two adverse outcomes: hospitalisation related to SARS-CoV-2 infection and Long Covid. We show that the most important drivers of benefit are: the degree of protection provided by previous infection; the protection provided by vaccination; the time since previous infection; and future attack rates. Vaccination can be very beneficial for previously infected children if future attack rates are high and several months have elapsed since the previous major wave in this group. Benefits are generally larger for Long Covid than hospitalisation, because Long Covid is both more common than hospitalisation and previous infection offers less protection against it. Our framework provides a structure for policy makers to explore the additional benefit of vaccination across a range of adverse outcomes and different parameter assumptions. It can be easily updated as new evidence emerges.
The EDPB finally had to step in and override the pet regulator, our DPC. Here’s the big problem though:
Meta also has until November 12 to delete or move back to the EU the personal data of European Facebook users transferred and stored in the U.S. since 2020 and until a new EU-U.S. deal is reached.This is going to be technically infeasible given Meta’s architecture, so the next question is, what happens when they fail to do it…
“dark testing”, live in production, to a separate test domain. Great way to gather some real-world data. Latencies are appreciably better, particularly for low-quality connections
One of the reasons so many people suddenly care about artificial intelligence is that we love panicking about things we don’t understand. Misunderstanding allows us to project spectacular dangers on to the future. Many of the very people responsible for developing these models (who have enriched themselves) warn us about artificial intelligence systems achieving some sort of sentience and taking control of important areas of life. Others warn of massive job displacement from these systems. All of these predictions assume that the commercial deployment of artificial intelligence actually would work as designed. Fortunately, most things don’t. That does not mean we should ignore present and serious dangers of poorly designed and deployed systems. For years predictive modeling has distorted police work and sentencing procedures in American criminal justice, surveilling and punishing Black people disproportionately. Machine learning systems are at work in insurance and health care, mostly without transparency, accountability, oversight or regulation. We are committing two grave errors at the same time. We are hiding from and eluding artificial intelligence because it seems too mysterious and complicated, rendering the current, harmful uses of it invisible and undiscussed. And we are fretting about future worst-case scenarios that resemble the movie The Matrix more than any world we would actually create for ourselves. Both of these habits allow the companies that irresponsibly deploy these systems to exploit us. We can do better. I will do my part by teaching better in the future, but not by ignoring these systems and their presence in our lives.
Synchronize multiple Pi-hole instances; it basically runs the standard backup API on the primary instance, then restores that config to the secondary, ensuring it constantly stays in sync
“A repo of links to articles, papers, conference talks, and tooling related to load management in software services: loadshedding, circuitbreaking, quota management and throttling. PRs welcome.” (via Niall Murphy)
Reading between the lines: Ubuntu unattended-upgrades were left enabled, and as a result a fix which required a full reboot was rolled out swiftly and globally, including a key fleet of network control hosts, regardless of any normal deployment phasing rules. This broke all regions and AZs within a 1 hour period. whoopsie
I don’t use either service, but this is actually an excellent writeup of some high-end performance optimization on modern Linux EC2-based systems with NVMe SSDs, and the benchmarking of same
Something new to worry about — giving an AI the keys to the nukes:
Any country that inserts AI into its [nuclear] command and control will motivate others to follow suit, if only to maintain a credible deterrent. Michael Klare, a peace-and-world-security-studies professor at Hampshire College, has warned that if multiple countries automate launch decisions, there could be a “flash war” analogous to a Wall Street “flash crash.” Imagine that an American AI misinterprets acoustic surveillance of submarines in the South China Sea as movements presaging a nuclear attack. Its counterstrike preparations would be noticed by China’s own AI, which would actually begin to ready its launch platforms, setting off a series of escalations that would culminate in a major nuclear exchange.
Common misconceptions about swap memory on Linux systems:
Swap is a useful tool to allow equality of reclamation of memory pages, but its purpose is frequently misunderstood, leading to its negative perception across the industry. If you use swap in the spirit intended, though – as a method of increasing equality of reclamation – you’ll find that it’s a useful tool instead of a hindrance. Disabling swap does not prevent disk I/O from becoming a problem under memory contention, it simply shifts the disk I/O thrashing from anonymous pages to file pages. Not only may this be less efficient, as we have a smaller pool of pages to select from for reclaim, but it may also contribute to getting into this high contention state in the first place.(via valen)
handy web tool to figure out if a quote for a domestic solar PV install in Ireland is cheap, on the money, or too pricey
in one year — Sixty. Five. Million. Dollars.
$42 for 20GB of 5G/4G LTE data, and can provide a mobile hotspot for other devices. Looks like a decent enough deal for EU travellers visiting the US, where low-cost data roaming isn’t available (via ITC Slack)
“Magical shell history”:
Atuin replaces your existing shell history with a SQLite database, and records additional context for your commands. Additionally, it provides optional and fully encrypted synchronisation of your history between machines, via an Atuin server.(via Nelson)
Debunking this common misconception around e-cigarettes
Great stuff from Ted Chiang:
A former McKinsey employee has described the company as “capital’s willing executioners”: if you want something done but don’t want to get your hands dirty, McKinsey will do it for you. That escape from accountability is one of the most valuable services that management consultancies provide. Bosses have certain goals, but don’t want to be blamed for doing what’s necessary to achieve those goals; by hiring consultants, management can say that they were just following independent, expert advice. Even in its current rudimentary form, A.I. has become a way for a company to evade responsibility by saying that it’s just doing what “the algorithm” says, even though it was the company that commissioned the algorithm in the first place. The question we should be asking is: as A.I. becomes more powerful and flexible, is there any way to keep it from being another version of McKinsey?
‘F3 (Fight Flash Fraud or Fight Fake Flash) tests the full capacity of a flash card (flash drive, flash disk, pendrive). It writes to the card and then checks if it can read it. It will assure you haven’t been sold a card with a smaller capacity than stated.’
As you encounter these ideologies [Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, and Longtermism] in the wild, you might use the TESCREAL lens, and its alignment with Eurasianism and Putin’s agenda, to evaluate them, and ask whether they tend to undermine or enhance the project of liberal democracy. TESCREAL ideologies tend to advance an illiberal agenda and authoritarian tendencies, and it’s worth turning a very critical eye towards them, especially in cases where that’s demonstrably true. Clearly there are countless well-meaning people trying to use technology and reason to improve the world, but that should never come at the expense of democratic, inclusive, fair, patient, and just governance. The biggest risk AI poses right now is that alarmists will use the fears surrounding it as a cudgel to enact sweeping policy reforms. We should resist those efforts. Now more than ever, we should be guided by expertise, facts, and evidence as we seek to use technology in ways that benefit everyone.
More evidence of a “substantially increased risk of developing a diverse spectrum of new-onset autoimmune diseases”:
Previously we knew there were many features of autoimmunity engendered by Covid, but the link to manifesting important autoimmune diseases has not been established. There are still many dots not connected—it’s fuzzy. We need to better understand how the dysregulation of our immune system that can occur from a Covid infection (or even more rarely from a vaccine) can be linked with a serious autoimmune condition. While we’ve fully recognized that people with autoimmune diseases are more vulnerable to Covid and adverse outcomes, the flip of that — that Covid can make some people vulnerable to autoimmune diseases — is what’s new.(from the always excellent Eric Topol.)
In a new Nature Neuroscience paper published Monday, Huth and a team of researchers from the University of Texas at Austin introduced a new “brain decoder” enabled by GPT-1, an earlier version of the artificial neural network technology that underpins ChatGPT. After digesting several hours of training data, the new tool was able to describe the gist of stories the three participants in the proof-of-concept experiment listened to — just by looking at their functional MRI scans.Very cool stuff. And I am happy to see the ethical considerations have been considered:
“It is important to constantly evaluate what the implications are of new brain decoders for mental privacy,” said Jerry Tang, a Ph.D. candidate in Huth’s lab and lead author on the paper, in a press briefing. In devising ways to protect privacy, the authors asked participants to try to prevent the decoder from reconstructing the words they were hearing several different ways. Particularly effective methods included mentally listing off animals, and telling a different story at the same time the podcast was playing were particularly effective at stopping the decoder, said Tang. The authors also found that the decoder had to be trained on each subject’s data and wasn’t effective when used on another person. Between these findings and the fact that any movement would make the fMRI scans worse, the authors concluded that it’s not currently possible for a brain decoder to be used on someone against their will.
“A High School Teacher’s Free Image Database Powers AI Unicorns”:
To build LAION, founders scraped visual data from companies such as Pinterest, Shopify and Amazon Web Services — which did not comment on whether LAION’s use of their content violates their terms of service — as well as YouTube thumbnails, images from portfolio platforms like DeviantArt and EyeEm, photos from government websites including the US Department of Defense, and content from news sites such as The Daily Mail and The Sun. If you ask Schuhmann, he says that anything freely available online is fair game. But there is currently no AI regulation in the European Union, and the forthcoming AI Act, whose language will be finalized early this summer, will not rule on whether copyrighted materials can be included in big data sets. Rather, lawmakers are discussing whether to include a provision requiring the companies behind AI generators to disclose what materials went into the data sets their products were trained on, thus giving the creators of those materials the option of taking action. […] “It has become a tradition within the field to just assume you don’t need consent or you don’t need to inform people, or they don’t even have to be aware of it. There is a sense of entitlement that whatever is on the web, you can just crawl it and put it in a data set,” said Abeba Birhane, a Senior Fellow in Trustworthy AI at Mozilla Foundation.
Fantastic thread of hackers scratching their own itch (via SimonW)
“some people understand immediately when i try to explain what it was like to be fully in the grip of the yudkowskian AI risk stuff and some people it doesn’t seem to land at all, which is probably good for them and i wish i had been so lucky”. Bananas…
Impressively, when these models are trained on programming languages, they can adeptly transform code into natural language explanations. […] Code Insight is a new feature based on Sec-PaLM, one of the generative AI models hosted on Google Cloud AI. What sets this functionality apart is its ability to generate natural language summaries from the point of view of an AI collaborator specialized in cybersecurity and malware. This provides security professionals and analysts with a powerful tool to figure out what the code is up to. At present, this new functionality is deployed to analyze a subset of PowerShell files uploaded to VirusTotal. The system excludes files that are highly similar to those previously processed, as well as files that are excessively large. This approach allows for the efficient use of analysis resources, ensuring that only the most relevant files (such as PS1 files) are subjected to scrutiny. In the coming days, additional file formats will be added to the list of supported files, broadening the scope of this functionality even further.(via Julie on ITC Slack)
This is fascinating history:
An establishment with a legacy such as [The Lahore Gymkhana Club, founded in 1878 under British rule] needed to continue revamping itself and serve exclusive dishes for its high-end clientele. And the club, along with restaurants aspiring to serve continental food, was bolstered by a growing taste for a new ingredient in town: processed cheese. “Sandwiches gradually started becoming popular in the 1980s because of the [wider] availability of cheese and mushrooms,” says Chaudhry. Until the 1980s, processed cheese was largely imported, and its use was limited to the rich, who would frequent establishments such as the Gymkhana. As Lahori taste buds adapted to and appreciated cheese, production was initiated locally. Demand for cheeseburgers and sandwiches skyrocketed in the 1990s, with a growing number of Pakistanis who’d traveled to the U.S. aspiring to re-create offerings from various popular American chains. One of these is exceptionally familiar. Even today, online food groups in Pakistan are peppered with people asking the community where they can find a cheesesteak in Lahore “like the one at Pat’s.” Many of them post images of the cheesesteaks from the original shop at 9th and Passyunk.
Charlie Stross visits the Advanced Gas-cooled Reactors at Torness nuclear power station:
The AGRs at Torness [in the UK] are not ordinary civil [nuclear] power reactors. Designed in the 1970’s, they were the UK’s bid to build an export-earning civil nuclear power system. They’re sensitive thoroughbreds, able to reach a peak conversion efficiency of 43% — that is, able to turn up to 43% of their energy output into electricity. By comparison, a PWR peaks at 31-32%. However, the PWRs have won the race for commercial success: they’re much, much, simpler. AGRs are like Concorde — technological marvels, extremely sophisticated and efficient, and just too damned expensive and complex for their own good. (You want complexity? Torness was opened in 1989. For many years thereafter, its roughly fifty thousand kilometres of aluminium plumbing made it the most complex and demanding piece of pipework in Europe. You want size? The multi-thousand ton reactor core of an AGR is bigger than the entire plant at some PWR installations.) It’s a weird experience, crawling over the guts of one of the marvels of the atomic age, smelling the thing (mostly machine oil and steam, and a hint of ozone near the transformers), all the while knowing that although it’s one of the safest and most energy-efficient civilian power reactors ever built it’s a a technological dead-end, that there won’t be any more of them, and that when it shuts down in thirty or forty years’ time this colossal collision between space age physics and victorian plumbing will be relegated to a footnote in the history books. “Energy too cheap to meter” it ain’t, but as a symbol of what we can achieve through engineering it’s hard to beat.
“This plaque was commemorated on October 10, 2018, commemorate its own commemoration. Plaques like this one are an integral part of the campaign to support more plaques like this one. By reading this plaque, you have made a valuable addition to the number of people who have read this plaque. To this day and up to the end of this sentence, this plaque continues to be read by people like yourself. Heritage Toronto 2018”
This is a really atrocious idea:
Palantir also isn’t selling a military-specific AI or large language model (LLM) here, it’s offering to integrate existing systems into a controlled environment. The AIP demo shows the software supporting different open-source LLMs, including FLAN-T5 XL, a fine-tuned version of GPT-NeoX-20B, and Dolly-v2-12b, as well as several custom plug-ins. Even fine-tuned AI systems off the shelf have plenty of known issues that could make asking them what to do in a warzone a nightmare. For example, they’re prone to simply making things up, or “hallucinating.” GPT-NeoX-20B in particular is an open-source alternative to GPT-3, a previous version of OpenAI’s language model, created by a startup called EleutherAI. One of EleutherAI’s open-source models — fine-tuned by another startup called Chai — recently convinced a Belgian man who spoke to it for six weeks to kill himself. What Palantir is offering is the illusion of safety and control for the Pentagon as it begins to adopt AI. […] What AIP does not do is walk through how it plans to deal with the various pernicious problems of LLMs and what the consequences might be in a military context. AIP does not appear to offer solutions to those problems beyond “frameworks” and “guardrails” it promises will make the use of military AI “ethical” and “legal.”
More on yesterday’s img2dataset failure to support opt-in:
It isn’t “effective altruism” if you have to force people to comply with you.
“The staffers who are responsible for the safety and ethical implications of new products have been told not to get in the way or to try to kill any of the generative AI tools in development,” employees told Bloomberg. The ethics team is now “disempowered and demoralized,” according to former and current staffers. Before OpenAI launched ChatGPT in November 2022, Google’s approach to AI was more cautious and less consumer-facing, often working in the background of tools like Search and Maps. But since ChatGPT’s enormous popularity prompted a “code red” from executives, Google’s threshold for safe product releases has been lowered in an effort to keep up with its AI competitors.
The author of this popular AI training data scraping tool doesn’t seem to understand consent and opt-in:
Letting a small minority [ie web publishers] prevent the large majority [AI users] from sharing their images and from having the benefit of last gen AI tool would definitely be unethical yes. Consent is obviously not unethical. You can give your consent for anything if you wish. It seems you’re trying to decide for million of other people without asking them for their consent.In other words, “scraping your content without opt-in is better than denying access to your content for millions of potential future AI users”. An issue to implement robots.txt support has been languishing since 2021. Good arguments for blocking the img2dataset user agent in general…
Aside from the weirdness of Mumsnet, I didn’t know about the influence of the mid-2000s skeptics movement:
While claiming to be the country’s foremost critical thinkers, the group was riddled with anti-humanities bias and a fetish for a certain kind of “science” that it held to reveal a set of immutable principles upon which the world was built with almost no regard whatsoever for interpretative analysis based on social or historical factors. Part of this mode of thinking was an especially reductivist biologism: the idea that there are immutable realities to be found in our DNA, and if we just paid enough attention to Science and stopped trying to split hairs and discover meaning over in the superfluous disciplines of the humanities, then everything would be much simpler. It’s precisely this kind of biological essentialism — which skirts dangerously close to eugenics — that leads people to think they can “debunk” a person’s claim to their gender identity, or that it should be subjected to rigorous testing by someone in a lab coat before we can believe the subject is who they say they are.
Ed Yong is back writing again!
Most Americans simply aren’t thinking about COVID with the same acuity they once did; the White House long ago zeroed in on hospitalizations and deaths as the measures to worry most about. And what was once outright denial of long COVID’s existence has morphed into something subtler: a creeping conviction, seeded by academics and journalists and now common on social media, that long COVID is less common and severe than it has been portrayed—a tragedy for a small group of very sick people, but not a cause for societal concern. This line of thinking points to the absence of disability claims, the inconsistency of biochemical signatures, and the relatively small proportion of severe cases as evidence that long COVID has been overblown. “There’s a shift from ‘Is it real?’ to ‘It is real, but …,’” Lekshmi Santhosh, the medical director of a long-COVID clinic at UC San Francisco, told me. Yet long COVID is a substantial and ongoing crisis—one that affects millions of people. However inconvenient that fact might be to the current “mission accomplished” rhetoric, the accumulated evidence, alongside the experience of long haulers, makes it clear that the coronavirus is still exacting a heavy societal toll.
The company could have saved itself a giant headache by building in robust data record-keeping from the start, she says. Instead, it is common in the AI industry to build data sets for AI models by scraping the web indiscriminately and then outsourcing the work of removing duplicates or irrelevant data points, filtering unwanted things, and fixing typos. These methods, and the sheer size of the data set, mean tech companies tend to have a very limited understanding of what has gone into training their models.
she really gets it. Lots of interesting thoughts
“Scheduling compute workloads to chase green energy can be counter-productive” — Adrian Cockroft:
I suggest that the best policy is to optimize your workloads so that they can run on fewer more highly utilized instances, minimize your total footprint in Asia where possible, and to use the spot market price as a guide for when to run workloads.
TIL you can run ethernet over coax at 2.5gbps. Long gone are the days of vampire taps and 10BASE2
Good roundup from Simon Willison on this brave new world of exploits. ‘Any time you see anyone demonstrating a new application built on top of LLMs, join me in being the squeaky wheel that asks “how are you taking prompt injection into account?”’
Jaysus this is a litany of failure.
Abstract. This paper presents a security review of the mobile apps provided by the UK’s leading banks; we focus on the connections the apps make, and the way in which TLS is used. We apply existing TLS testing methods to the apps which only find errors in legacy apps. We then go on to look at extensions of these methods and find five of the apps have serious vulnerabilities. In particular, we find an app that pins a TLS root CA certificate, but do not verify the hostname. In this case, the use of certificate pinning means that all existing test methods would miss detecting the hostname verification flaw. We also find one app that doesn’t check the certificate hostname, but bypasses proxy settings, resulting in failed detection by pentesting tools. We find that three apps load adverts over insecure connections, which could be exploited for in-app phishing attacks. Some of the apps used the users’ PIN as authentication, for which PCI guidelines require extra security, so these apps use an additional cryptographic protocol; we study the underlying protocol of one banking app in detail and show that it provides little additional protection, meaning that an active man-in-the-middle attacker can retrieve the user’s credentials, login to the bank and perform every operation the legitimate user could.See also: https://www.synopsys.com/blogs/software-security/ineffective-certificate-pinning-implementations/
Wow, DuckDB is very impressive — I had no idea it could handle SELECTs against Parquet data in S3:
A common pattern to ingest streaming data and store it in S3 is to use Kinesis Data Firehose Delivery Streams, which can write the incoming stream data as batched parquet files to S3. You can use custom S3 prefixes with it when using Lambda processing functions, but by default, you can only partition the data by the timestamp (the timestamp the event reached the Kinesis Data Stream, not the event timestamp!). So, a few common use cases for data repartitioning could include: Repartitioning the written data for the real event timestamp if it’s included in the incoming data; Repartitioning the data for other query patterns, e.g. to support query filter pushdown and optimize query speeds and costs; Aggregation of raw or preprocessed data, and storing them in an optimized manner to support analytical queries.
Couldn’t agree more with Timnit Gebru’s comments here:
What is your appeal to policymakers? What would you want Congress and regulators to do now to address the concerns you outline in the open letter? Congress needs to focus on regulating corporations and their practices, rather than playing into their hype of “powerful digital minds.” This, by design, ascribes agency to the products rather than the organizations building them. This language obfuscates the amount of data that is being collected — and the amount of worker exploitation involved with those who are labeling and supplying the datasets, and moderating model outputs. Congress needs to ensure corporations are not using people’s data without their consent, and hold them responsible for the synthetic media they produce — whether it is text or media spewing disinformation, hate speech or other types of harmful content. Regulations need to put the onus on corporations, rather than understaffed agencies. There are probably existing regulations these organizations are breaking. There are mundane “AI” systems being used daily; we just heard about another Black man being wrongfully arrested because of the use of automated facial analysis systems. But that’s not what we’re talking about, because of the hype.
This is amazing — using GPT-3.5 to convert a natural-language query into SQL applied to a specific dataset, in these examples, San Francisco city data and US public census data:
With CensusGPT, you can ask any question related to census data in natural language. These natural language questions get converted to SQL using GPT-3.5 and are then used to query the census database. Here are some examples: – Five cities with a population over 100,000 and lowest crime – 10 highest income areas in california Here is a similar example from sfGPT: – Which four neighborhoods had the most crime in San Francisco in 2021?
This is a major difference between vanilla MySQL and Amazon Aurora (and a potentially major risk!):
because Aurora MySQL primary and replica instances share a storage layer, they share a set of undo logs. This means that, for a REPEATABLE READ isolation level, the storage instance must maintain undo logs at least as far back as could be required to satisfy transactional guarantees for the primary or any read replica instance. Long-running replica transactions can negatively impact writer performance in Aurora MySQL—finally, an explanation for the incident that spawned this investigation. The same scenario plays out differently in vanilla MySQL because of its different model for undo logs. Vanilla MYSQL: there are two undo logs – one on the writer, and one on the reader. The performance impact of an operation that prevents the garbage collection of undo log records will be isolated to either the writer or the reader. Aurora MySQL: there is a single undo log that is shared between the writer and reader. The performance impact of an operation that prevents the garbage collection of undo log records will affect the entire cluster.
Comparison site for electric cars; actually has a realistic model of genuine range for each EV. Full details on charging connectors, charge curves (for charging speed), etc.
Some fascinating details of low-level Java performance optimization, particularly with JIT applied to OO method dispatch:
Programming languages like Java provide the facilities for subtyping/polymorphism as one of the ways to construct modular and reusable software. This language choice naturally comes at a price, since there is no hardware support for virtual calls, and therefore runtimes have to emulate this behavior. In many, many cases the performance of method dispatch is not important. Actually, in a vast majority of cases, the low-level performance concerns are not the real concerns. However, there are cases when method dispatch performance is important, and there you need to understand how dispatch works, what runtimes optimize for you, and what you can do to cheat and/or emulate similar behavior in your code. For example, in the course of String Compression work, we were faced with the problem of selecting the coder for a given String. The obvious and highly maintainable approach of creating a Coder interface, a few implementations, and dispatching the virtual calls over it, had met some performance problems on the very tiny benchmarks. Therefore, we needed to contemplate something better. After a few experiments, this post was born as a reference for others who might try to do the same. This post also tangentially touches the inlining of virtual calls, as the natural thing during the optimization.Discovered via this amazing commit: https://github.com/quarkusio/quarkus/commit/65dd4d43e2644db1c87726139280f9704140167c
Oof. Looks like the commercial company behind MariaDB is going south quickly:
Monty, the creator of MySQL and MariaDB founder, hasn’t been at a company meeting for over a year and a half. The relationship between Monty and the CEO, Michael Howard, is extremely rocky. At a company all-hands meeting Monty and Michael Howard were shouting at each other while up on stage in the auditorium in front of the entire staff. Monty made his position perfectly clear as he shouted his last words before he walked out: “You’re killing my fu&#@$! company!!!” Monty was subsequently voted off the board in July of 2022 solidifying the hostile takeover by Michael Howard. Buyer beware, Monty and his group of founders and database experts are no longer at the company.At least the open-source product is still trustworthy, though.
Contractors say they have a set amount of time to complete each task, like review a prompt, and the time they’re allotted for tasks can vary wildly — from as little as 60 seconds to more than several minutes. Still, raters said it’s difficult to rate a response when they are not well-versed in a topic the chatbot is talking about, including technical topics like blockchain for example. Because each assigned task represents billable time, some workers say they will complete the tasks even if they realize they cannot accurately assess the chatbot responses. “Some people are going to say that’s still 60 seconds of work, and I can’t recoup this time having sat here and figured out I don’t know enough about this, so I’m just going to give it my best guess so I can keep that pay and keep working,” one rater said.
Because the AI-enhanced virtual assistants scrape text and images off the web, they are open to a type of attack called indirect prompt injection, in which a third party alters a website by adding hidden text that is meant to change the AI’s behavior. Attackers could use social media or email to direct users to websites with these secret prompts. Once that happens, the AI system could be manipulated to let the attacker try to extract people’s credit card information, for example. Malicious actors could also send someone an email with a hidden prompt injection in it. If the receiver happened to use an AI virtual assistant, the attacker might be able to manipulate it into sending the attacker personal information from the victim’s emails, or even emailing people in the victim’s contacts list on the attacker’s behalf.