They are dubbing it “Triangulation”:
We believe that the main reason for this incident is the proprietary nature of iOS. This operating system is a “black box” in which spyware like Triangulation can hide for years. Detecting and analyzing such threats is made more difficult by Apple’s monopoly of research tools, making it the perfect haven for spyware. In other words, as I have said more than once, users are given the illusion of security associated with the complete opacity of the system. What actually happens in iOS is unknown to the cybersecurity experts.
Justin Mason's Weblog Posts
Sucralose, as used in Splenda, is genotoxic. big yikes
The FTC have proposed a judgement against Amazon/Ring: “FTC says Ring employees illegally surveilled customers, failed to stop hackers from taking control of users’ cameras. Under proposed order, Ring will be prohibited from profiting from unlawfully collected consumer videos, pay $5.8M in consumer refunds.” Meredith Whittaker on Twitter, responding: “Speaking of real AI regulation grounded in reality! The part about Amazon being “prohibited from profiting from unlawfully collected consumer videos” is huge. Data protection IS AI regulation. & in this case will likely mean undoing datasets, retraining/disposing of models, etc.” Retraining/discarding datasets is a HUGE deal for AI/ML companies. This is the big stick for regulators. I hope the EU DPCs are paying attention to this judgement.
New fast food frankenstein dish just dropped:
a fast food dish created in 2003 in the Dutch city of Rotterdam, consisting of a layer of french fries placed into a disposable metal take-away tray, topped with döner or gyro meat, covered with slices of Gouda cheese, and heated in an oven until the cheese melts. Then a layer of shredded iceberg lettuce is added, dressed with garlic sauce and sambal, a hot sauce from Indonesia .. The term kapsalon is Dutch for “hairdressing salon” or barber shop, alluding to one of the inventors of the dish who worked as a hairdresser.This sounds delicious.
“The Story of Mel” is a legendary USENET story of “Mel”, a Real Programmer from back in the day, performing a truly impressive piece of optimization; a “paean to seat-of-the-pants machine coding”, as Micheal puts it. This site is a little shrine to Mel’s life and history from a MeFi user. (Via Meehawl)
Excellent “AI for good” idea from the Bulletin of the Atomic Scientists:
Investments in and development of technologies for autonomous demining operations, post war, are long overdue and consistent with the White House’s push for a Blueprint for an AI Bill of Rights, which vows to use autonomy for the public good. Alas, while the Defense Department has pursued autonomous systems for the battlefield and the unincentivized private sector has focused on producing dancing robotic dogs, efforts to develop autonomous demining technology have stagnated. The United States should provide funding to energize those efforts, regardless of what decision is made in regard to sending cluster bombs to Kiev.
The AI enshittification continues:
Job seekers may virtually interview with or be prescreened by an artificial-intelligence program such as HireVue, Harver, or Plum. After someone applies to a job at a company that uses this software, they may receive an automated survey asking them to answer inane personality-assessment questions like “Which statement describes you best? (a) I love debating academic theories or (b) I adopt a future emphasis.” […] And these AI-moderated processes might not be fair, either. Researchers at the University of California, Berkeley, say that AI decision-making systems could have a 44% chance of being embedded with gender bias, a 26% chance of displaying both gender and race bias, and may also be prone to screening out applicants with disabilities. In one notorious case, an audit of an AI screening tool found that it prioritized candidates who played high-school lacrosse or were named “Jared.”
A very neat trick via Marc Brooker to improve tail latencies using erasure coding: ‘Say I have an in-memory cache of objects. I can keep any object in the cache once, and always go looking for it in that one place (e.g. with consistent hashing). If that place is slow, overloaded, experiencing packet loss, or whatever, I’ll see high latency for all attempts to get that object. With hedging I can avoid that, if I store the object in two places rather than one, at the cost of doubling the size of my cache. But what if I wanted to avoid the slowness and not double the size of my cache? Instead of storing everything twice, I could break it into (for example) 5 pieces .. encoded in such a way that I could reassemble it from any four pieces .. . Then, when I fetch, I send five get requests, and have the whole object as soon as four have returned. The overhead here on requests is 5x, on bandwidth is worst-case 20%, and on storage is 20%. The effect on tail latency can be considerable.’
Some lovely details in this writeup of a new system in AWS Lambda, via Marc Brooker:
This system gets performance by doing as little work as possible (deduplication, caching, lazy loading), and then gets resilience by doing slightly more work than needed (erasure coding, salted deduplication, etc). This is a tension worth paying attention to in all system designs.
tl;dr: vaccination of kids is worth it to protect against Long Covid and hospitalisation. “A Methodological Framework for Assessing the Benefit of SARS-CoV-2 Vaccination following Previous Infection: Case Study of Five- to Eleven-Year-Olds”, Christina Pagel et al.:
We present a novel methodological framework for estimating the potential benefits of COVID-19 vaccination in previously infected children aged five to eleven, accounting for waning. We apply this framework to the UK context and for two adverse outcomes: hospitalisation related to SARS-CoV-2 infection and Long Covid. We show that the most important drivers of benefit are: the degree of protection provided by previous infection; the protection provided by vaccination; the time since previous infection; and future attack rates. Vaccination can be very beneficial for previously infected children if future attack rates are high and several months have elapsed since the previous major wave in this group. Benefits are generally larger for Long Covid than hospitalisation, because Long Covid is both more common than hospitalisation and previous infection offers less protection against it. Our framework provides a structure for policy makers to explore the additional benefit of vaccination across a range of adverse outcomes and different parameter assumptions. It can be easily updated as new evidence emerges.
The EDPB finally had to step in and override the pet regulator, our DPC. Here’s the big problem though:
Meta also has until November 12 to delete or move back to the EU the personal data of European Facebook users transferred and stored in the U.S. since 2020 and until a new EU-U.S. deal is reached.This is going to be technically infeasible given Meta’s architecture, so the next question is, what happens when they fail to do it…
“dark testing”, live in production, to a separate test domain. Great way to gather some real-world data. Latencies are appreciably better, particularly for low-quality connections
One of the reasons so many people suddenly care about artificial intelligence is that we love panicking about things we don’t understand. Misunderstanding allows us to project spectacular dangers on to the future. Many of the very people responsible for developing these models (who have enriched themselves) warn us about artificial intelligence systems achieving some sort of sentience and taking control of important areas of life. Others warn of massive job displacement from these systems. All of these predictions assume that the commercial deployment of artificial intelligence actually would work as designed. Fortunately, most things don’t. That does not mean we should ignore present and serious dangers of poorly designed and deployed systems. For years predictive modeling has distorted police work and sentencing procedures in American criminal justice, surveilling and punishing Black people disproportionately. Machine learning systems are at work in insurance and health care, mostly without transparency, accountability, oversight or regulation. We are committing two grave errors at the same time. We are hiding from and eluding artificial intelligence because it seems too mysterious and complicated, rendering the current, harmful uses of it invisible and undiscussed. And we are fretting about future worst-case scenarios that resemble the movie The Matrix more than any world we would actually create for ourselves. Both of these habits allow the companies that irresponsibly deploy these systems to exploit us. We can do better. I will do my part by teaching better in the future, but not by ignoring these systems and their presence in our lives.
Synchronize multiple Pi-hole instances; it basically runs the standard backup API on the primary instance, then restores that config to the secondary, ensuring it constantly stays in sync
“A repo of links to articles, papers, conference talks, and tooling related to load management in software services: loadshedding, circuitbreaking, quota management and throttling. PRs welcome.” (via Niall Murphy)
Reading between the lines: Ubuntu unattended-upgrades were left enabled, and as a result a fix which required a full reboot was rolled out swiftly and globally, including a key fleet of network control hosts, regardless of any normal deployment phasing rules. This broke all regions and AZs within a 1 hour period. whoopsie
I don’t use either service, but this is actually an excellent writeup of some high-end performance optimization on modern Linux EC2-based systems with NVMe SSDs, and the benchmarking of same
Something new to worry about — giving an AI the keys to the nukes:
Any country that inserts AI into its [nuclear] command and control will motivate others to follow suit, if only to maintain a credible deterrent. Michael Klare, a peace-and-world-security-studies professor at Hampshire College, has warned that if multiple countries automate launch decisions, there could be a “flash war” analogous to a Wall Street “flash crash.” Imagine that an American AI misinterprets acoustic surveillance of submarines in the South China Sea as movements presaging a nuclear attack. Its counterstrike preparations would be noticed by China’s own AI, which would actually begin to ready its launch platforms, setting off a series of escalations that would culminate in a major nuclear exchange.
Common misconceptions about swap memory on Linux systems:
Swap is a useful tool to allow equality of reclamation of memory pages, but its purpose is frequently misunderstood, leading to its negative perception across the industry. If you use swap in the spirit intended, though – as a method of increasing equality of reclamation – you’ll find that it’s a useful tool instead of a hindrance. Disabling swap does not prevent disk I/O from becoming a problem under memory contention, it simply shifts the disk I/O thrashing from anonymous pages to file pages. Not only may this be less efficient, as we have a smaller pool of pages to select from for reclaim, but it may also contribute to getting into this high contention state in the first place.(via valen)
handy web tool to figure out if a quote for a domestic solar PV install in Ireland is cheap, on the money, or too pricey
in one year — Sixty. Five. Million. Dollars.
$42 for 20GB of 5G/4G LTE data, and can provide a mobile hotspot for other devices. Looks like a decent enough deal for EU travellers visiting the US, where low-cost data roaming isn’t available (via ITC Slack)
“Magical shell history”:
Atuin replaces your existing shell history with a SQLite database, and records additional context for your commands. Additionally, it provides optional and fully encrypted synchronisation of your history between machines, via an Atuin server.(via Nelson)
Debunking this common misconception around e-cigarettes
Great stuff from Ted Chiang:
A former McKinsey employee has described the company as “capital’s willing executioners”: if you want something done but don’t want to get your hands dirty, McKinsey will do it for you. That escape from accountability is one of the most valuable services that management consultancies provide. Bosses have certain goals, but don’t want to be blamed for doing what’s necessary to achieve those goals; by hiring consultants, management can say that they were just following independent, expert advice. Even in its current rudimentary form, A.I. has become a way for a company to evade responsibility by saying that it’s just doing what “the algorithm” says, even though it was the company that commissioned the algorithm in the first place. The question we should be asking is: as A.I. becomes more powerful and flexible, is there any way to keep it from being another version of McKinsey?
‘F3 (Fight Flash Fraud or Fight Fake Flash) tests the full capacity of a flash card (flash drive, flash disk, pendrive). It writes to the card and then checks if it can read it. It will assure you haven’t been sold a card with a smaller capacity than stated.’
As you encounter these ideologies [Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, and Longtermism] in the wild, you might use the TESCREAL lens, and its alignment with Eurasianism and Putin’s agenda, to evaluate them, and ask whether they tend to undermine or enhance the project of liberal democracy. TESCREAL ideologies tend to advance an illiberal agenda and authoritarian tendencies, and it’s worth turning a very critical eye towards them, especially in cases where that’s demonstrably true. Clearly there are countless well-meaning people trying to use technology and reason to improve the world, but that should never come at the expense of democratic, inclusive, fair, patient, and just governance. The biggest risk AI poses right now is that alarmists will use the fears surrounding it as a cudgel to enact sweeping policy reforms. We should resist those efforts. Now more than ever, we should be guided by expertise, facts, and evidence as we seek to use technology in ways that benefit everyone.
More evidence of a “substantially increased risk of developing a diverse spectrum of new-onset autoimmune diseases”:
Previously we knew there were many features of autoimmunity engendered by Covid, but the link to manifesting important autoimmune diseases has not been established. There are still many dots not connected—it’s fuzzy. We need to better understand how the dysregulation of our immune system that can occur from a Covid infection (or even more rarely from a vaccine) can be linked with a serious autoimmune condition. While we’ve fully recognized that people with autoimmune diseases are more vulnerable to Covid and adverse outcomes, the flip of that — that Covid can make some people vulnerable to autoimmune diseases — is what’s new.(from the always excellent Eric Topol.)
In a new Nature Neuroscience paper published Monday, Huth and a team of researchers from the University of Texas at Austin introduced a new “brain decoder” enabled by GPT-1, an earlier version of the artificial neural network technology that underpins ChatGPT. After digesting several hours of training data, the new tool was able to describe the gist of stories the three participants in the proof-of-concept experiment listened to — just by looking at their functional MRI scans.Very cool stuff. And I am happy to see the ethical considerations have been considered:
“It is important to constantly evaluate what the implications are of new brain decoders for mental privacy,” said Jerry Tang, a Ph.D. candidate in Huth’s lab and lead author on the paper, in a press briefing. In devising ways to protect privacy, the authors asked participants to try to prevent the decoder from reconstructing the words they were hearing several different ways. Particularly effective methods included mentally listing off animals, and telling a different story at the same time the podcast was playing were particularly effective at stopping the decoder, said Tang. The authors also found that the decoder had to be trained on each subject’s data and wasn’t effective when used on another person. Between these findings and the fact that any movement would make the fMRI scans worse, the authors concluded that it’s not currently possible for a brain decoder to be used on someone against their will.
“A High School Teacher’s Free Image Database Powers AI Unicorns”:
To build LAION, founders scraped visual data from companies such as Pinterest, Shopify and Amazon Web Services — which did not comment on whether LAION’s use of their content violates their terms of service — as well as YouTube thumbnails, images from portfolio platforms like DeviantArt and EyeEm, photos from government websites including the US Department of Defense, and content from news sites such as The Daily Mail and The Sun. If you ask Schuhmann, he says that anything freely available online is fair game. But there is currently no AI regulation in the European Union, and the forthcoming AI Act, whose language will be finalized early this summer, will not rule on whether copyrighted materials can be included in big data sets. Rather, lawmakers are discussing whether to include a provision requiring the companies behind AI generators to disclose what materials went into the data sets their products were trained on, thus giving the creators of those materials the option of taking action. […] “It has become a tradition within the field to just assume you don’t need consent or you don’t need to inform people, or they don’t even have to be aware of it. There is a sense of entitlement that whatever is on the web, you can just crawl it and put it in a data set,” said Abeba Birhane, a Senior Fellow in Trustworthy AI at Mozilla Foundation.
Fantastic thread of hackers scratching their own itch (via SimonW)
“some people understand immediately when i try to explain what it was like to be fully in the grip of the yudkowskian AI risk stuff and some people it doesn’t seem to land at all, which is probably good for them and i wish i had been so lucky”. Bananas…
Impressively, when these models are trained on programming languages, they can adeptly transform code into natural language explanations. […] Code Insight is a new feature based on Sec-PaLM, one of the generative AI models hosted on Google Cloud AI. What sets this functionality apart is its ability to generate natural language summaries from the point of view of an AI collaborator specialized in cybersecurity and malware. This provides security professionals and analysts with a powerful tool to figure out what the code is up to. At present, this new functionality is deployed to analyze a subset of PowerShell files uploaded to VirusTotal. The system excludes files that are highly similar to those previously processed, as well as files that are excessively large. This approach allows for the efficient use of analysis resources, ensuring that only the most relevant files (such as PS1 files) are subjected to scrutiny. In the coming days, additional file formats will be added to the list of supported files, broadening the scope of this functionality even further.(via Julie on ITC Slack)
This is fascinating history:
An establishment with a legacy such as [The Lahore Gymkhana Club, founded in 1878 under British rule] needed to continue revamping itself and serve exclusive dishes for its high-end clientele. And the club, along with restaurants aspiring to serve continental food, was bolstered by a growing taste for a new ingredient in town: processed cheese. “Sandwiches gradually started becoming popular in the 1980s because of the [wider] availability of cheese and mushrooms,” says Chaudhry. Until the 1980s, processed cheese was largely imported, and its use was limited to the rich, who would frequent establishments such as the Gymkhana. As Lahori taste buds adapted to and appreciated cheese, production was initiated locally. Demand for cheeseburgers and sandwiches skyrocketed in the 1990s, with a growing number of Pakistanis who’d traveled to the U.S. aspiring to re-create offerings from various popular American chains. One of these is exceptionally familiar. Even today, online food groups in Pakistan are peppered with people asking the community where they can find a cheesesteak in Lahore “like the one at Pat’s.” Many of them post images of the cheesesteaks from the original shop at 9th and Passyunk.
Charlie Stross visits the Advanced Gas-cooled Reactors at Torness nuclear power station:
The AGRs at Torness [in the UK] are not ordinary civil [nuclear] power reactors. Designed in the 1970’s, they were the UK’s bid to build an export-earning civil nuclear power system. They’re sensitive thoroughbreds, able to reach a peak conversion efficiency of 43% — that is, able to turn up to 43% of their energy output into electricity. By comparison, a PWR peaks at 31-32%. However, the PWRs have won the race for commercial success: they’re much, much, simpler. AGRs are like Concorde — technological marvels, extremely sophisticated and efficient, and just too damned expensive and complex for their own good. (You want complexity? Torness was opened in 1989. For many years thereafter, its roughly fifty thousand kilometres of aluminium plumbing made it the most complex and demanding piece of pipework in Europe. You want size? The multi-thousand ton reactor core of an AGR is bigger than the entire plant at some PWR installations.) It’s a weird experience, crawling over the guts of one of the marvels of the atomic age, smelling the thing (mostly machine oil and steam, and a hint of ozone near the transformers), all the while knowing that although it’s one of the safest and most energy-efficient civilian power reactors ever built it’s a a technological dead-end, that there won’t be any more of them, and that when it shuts down in thirty or forty years’ time this colossal collision between space age physics and victorian plumbing will be relegated to a footnote in the history books. “Energy too cheap to meter” it ain’t, but as a symbol of what we can achieve through engineering it’s hard to beat.
“This plaque was commemorated on October 10, 2018, commemorate its own commemoration. Plaques like this one are an integral part of the campaign to support more plaques like this one. By reading this plaque, you have made a valuable addition to the number of people who have read this plaque. To this day and up to the end of this sentence, this plaque continues to be read by people like yourself. Heritage Toronto 2018”
This is a really atrocious idea:
Palantir also isn’t selling a military-specific AI or large language model (LLM) here, it’s offering to integrate existing systems into a controlled environment. The AIP demo shows the software supporting different open-source LLMs, including FLAN-T5 XL, a fine-tuned version of GPT-NeoX-20B, and Dolly-v2-12b, as well as several custom plug-ins. Even fine-tuned AI systems off the shelf have plenty of known issues that could make asking them what to do in a warzone a nightmare. For example, they’re prone to simply making things up, or “hallucinating.” GPT-NeoX-20B in particular is an open-source alternative to GPT-3, a previous version of OpenAI’s language model, created by a startup called EleutherAI. One of EleutherAI’s open-source models — fine-tuned by another startup called Chai — recently convinced a Belgian man who spoke to it for six weeks to kill himself. What Palantir is offering is the illusion of safety and control for the Pentagon as it begins to adopt AI. […] What AIP does not do is walk through how it plans to deal with the various pernicious problems of LLMs and what the consequences might be in a military context. AIP does not appear to offer solutions to those problems beyond “frameworks” and “guardrails” it promises will make the use of military AI “ethical” and “legal.”
More on yesterday’s img2dataset failure to support opt-in:
It isn’t “effective altruism” if you have to force people to comply with you.
“The staffers who are responsible for the safety and ethical implications of new products have been told not to get in the way or to try to kill any of the generative AI tools in development,” employees told Bloomberg. The ethics team is now “disempowered and demoralized,” according to former and current staffers. Before OpenAI launched ChatGPT in November 2022, Google’s approach to AI was more cautious and less consumer-facing, often working in the background of tools like Search and Maps. But since ChatGPT’s enormous popularity prompted a “code red” from executives, Google’s threshold for safe product releases has been lowered in an effort to keep up with its AI competitors.
The author of this popular AI training data scraping tool doesn’t seem to understand consent and opt-in:
Letting a small minority [ie web publishers] prevent the large majority [AI users] from sharing their images and from having the benefit of last gen AI tool would definitely be unethical yes. Consent is obviously not unethical. You can give your consent for anything if you wish. It seems you’re trying to decide for million of other people without asking them for their consent.In other words, “scraping your content without opt-in is better than denying access to your content for millions of potential future AI users”. An issue to implement robots.txt support has been languishing since 2021. Good arguments for blocking the img2dataset user agent in general…
Aside from the weirdness of Mumsnet, I didn’t know about the influence of the mid-2000s skeptics movement:
While claiming to be the country’s foremost critical thinkers, the group was riddled with anti-humanities bias and a fetish for a certain kind of “science” that it held to reveal a set of immutable principles upon which the world was built with almost no regard whatsoever for interpretative analysis based on social or historical factors. Part of this mode of thinking was an especially reductivist biologism: the idea that there are immutable realities to be found in our DNA, and if we just paid enough attention to Science and stopped trying to split hairs and discover meaning over in the superfluous disciplines of the humanities, then everything would be much simpler. It’s precisely this kind of biological essentialism — which skirts dangerously close to eugenics — that leads people to think they can “debunk” a person’s claim to their gender identity, or that it should be subjected to rigorous testing by someone in a lab coat before we can believe the subject is who they say they are.
Ed Yong is back writing again!
Most Americans simply aren’t thinking about COVID with the same acuity they once did; the White House long ago zeroed in on hospitalizations and deaths as the measures to worry most about. And what was once outright denial of long COVID’s existence has morphed into something subtler: a creeping conviction, seeded by academics and journalists and now common on social media, that long COVID is less common and severe than it has been portrayed—a tragedy for a small group of very sick people, but not a cause for societal concern. This line of thinking points to the absence of disability claims, the inconsistency of biochemical signatures, and the relatively small proportion of severe cases as evidence that long COVID has been overblown. “There’s a shift from ‘Is it real?’ to ‘It is real, but …,’” Lekshmi Santhosh, the medical director of a long-COVID clinic at UC San Francisco, told me. Yet long COVID is a substantial and ongoing crisis—one that affects millions of people. However inconvenient that fact might be to the current “mission accomplished” rhetoric, the accumulated evidence, alongside the experience of long haulers, makes it clear that the coronavirus is still exacting a heavy societal toll.
The company could have saved itself a giant headache by building in robust data record-keeping from the start, she says. Instead, it is common in the AI industry to build data sets for AI models by scraping the web indiscriminately and then outsourcing the work of removing duplicates or irrelevant data points, filtering unwanted things, and fixing typos. These methods, and the sheer size of the data set, mean tech companies tend to have a very limited understanding of what has gone into training their models.
she really gets it. Lots of interesting thoughts
“Scheduling compute workloads to chase green energy can be counter-productive” — Adrian Cockroft:
I suggest that the best policy is to optimize your workloads so that they can run on fewer more highly utilized instances, minimize your total footprint in Asia where possible, and to use the spot market price as a guide for when to run workloads.
TIL you can run ethernet over coax at 2.5gbps. Long gone are the days of vampire taps and 10BASE2
Good roundup from Simon Willison on this brave new world of exploits. ‘Any time you see anyone demonstrating a new application built on top of LLMs, join me in being the squeaky wheel that asks “how are you taking prompt injection into account?”’
Jaysus this is a litany of failure.
Abstract. This paper presents a security review of the mobile apps provided by the UK’s leading banks; we focus on the connections the apps make, and the way in which TLS is used. We apply existing TLS testing methods to the apps which only find errors in legacy apps. We then go on to look at extensions of these methods and find five of the apps have serious vulnerabilities. In particular, we find an app that pins a TLS root CA certificate, but do not verify the hostname. In this case, the use of certificate pinning means that all existing test methods would miss detecting the hostname verification flaw. We also find one app that doesn’t check the certificate hostname, but bypasses proxy settings, resulting in failed detection by pentesting tools. We find that three apps load adverts over insecure connections, which could be exploited for in-app phishing attacks. Some of the apps used the users’ PIN as authentication, for which PCI guidelines require extra security, so these apps use an additional cryptographic protocol; we study the underlying protocol of one banking app in detail and show that it provides little additional protection, meaning that an active man-in-the-middle attacker can retrieve the user’s credentials, login to the bank and perform every operation the legitimate user could.See also: https://www.synopsys.com/blogs/software-security/ineffective-certificate-pinning-implementations/
Wow, DuckDB is very impressive — I had no idea it could handle SELECTs against Parquet data in S3:
A common pattern to ingest streaming data and store it in S3 is to use Kinesis Data Firehose Delivery Streams, which can write the incoming stream data as batched parquet files to S3. You can use custom S3 prefixes with it when using Lambda processing functions, but by default, you can only partition the data by the timestamp (the timestamp the event reached the Kinesis Data Stream, not the event timestamp!). So, a few common use cases for data repartitioning could include: Repartitioning the written data for the real event timestamp if it’s included in the incoming data; Repartitioning the data for other query patterns, e.g. to support query filter pushdown and optimize query speeds and costs; Aggregation of raw or preprocessed data, and storing them in an optimized manner to support analytical queries.
Couldn’t agree more with Timnit Gebru’s comments here:
What is your appeal to policymakers? What would you want Congress and regulators to do now to address the concerns you outline in the open letter? Congress needs to focus on regulating corporations and their practices, rather than playing into their hype of “powerful digital minds.” This, by design, ascribes agency to the products rather than the organizations building them. This language obfuscates the amount of data that is being collected — and the amount of worker exploitation involved with those who are labeling and supplying the datasets, and moderating model outputs. Congress needs to ensure corporations are not using people’s data without their consent, and hold them responsible for the synthetic media they produce — whether it is text or media spewing disinformation, hate speech or other types of harmful content. Regulations need to put the onus on corporations, rather than understaffed agencies. There are probably existing regulations these organizations are breaking. There are mundane “AI” systems being used daily; we just heard about another Black man being wrongfully arrested because of the use of automated facial analysis systems. But that’s not what we’re talking about, because of the hype.
This is amazing — using GPT-3.5 to convert a natural-language query into SQL applied to a specific dataset, in these examples, San Francisco city data and US public census data:
With CensusGPT, you can ask any question related to census data in natural language. These natural language questions get converted to SQL using GPT-3.5 and are then used to query the census database. Here are some examples: – Five cities with a population over 100,000 and lowest crime – 10 highest income areas in california Here is a similar example from sfGPT: – Which four neighborhoods had the most crime in San Francisco in 2021?
This is a major difference between vanilla MySQL and Amazon Aurora (and a potentially major risk!):
because Aurora MySQL primary and replica instances share a storage layer, they share a set of undo logs. This means that, for a REPEATABLE READ isolation level, the storage instance must maintain undo logs at least as far back as could be required to satisfy transactional guarantees for the primary or any read replica instance. Long-running replica transactions can negatively impact writer performance in Aurora MySQL—finally, an explanation for the incident that spawned this investigation. The same scenario plays out differently in vanilla MySQL because of its different model for undo logs. Vanilla MYSQL: there are two undo logs – one on the writer, and one on the reader. The performance impact of an operation that prevents the garbage collection of undo log records will be isolated to either the writer or the reader. Aurora MySQL: there is a single undo log that is shared between the writer and reader. The performance impact of an operation that prevents the garbage collection of undo log records will affect the entire cluster.
Comparison site for electric cars; actually has a realistic model of genuine range for each EV. Full details on charging connectors, charge curves (for charging speed), etc.
Some fascinating details of low-level Java performance optimization, particularly with JIT applied to OO method dispatch:
Programming languages like Java provide the facilities for subtyping/polymorphism as one of the ways to construct modular and reusable software. This language choice naturally comes at a price, since there is no hardware support for virtual calls, and therefore runtimes have to emulate this behavior. In many, many cases the performance of method dispatch is not important. Actually, in a vast majority of cases, the low-level performance concerns are not the real concerns. However, there are cases when method dispatch performance is important, and there you need to understand how dispatch works, what runtimes optimize for you, and what you can do to cheat and/or emulate similar behavior in your code. For example, in the course of String Compression work, we were faced with the problem of selecting the coder for a given String. The obvious and highly maintainable approach of creating a Coder interface, a few implementations, and dispatching the virtual calls over it, had met some performance problems on the very tiny benchmarks. Therefore, we needed to contemplate something better. After a few experiments, this post was born as a reference for others who might try to do the same. This post also tangentially touches the inlining of virtual calls, as the natural thing during the optimization.Discovered via this amazing commit: https://github.com/quarkusio/quarkus/commit/65dd4d43e2644db1c87726139280f9704140167c
Oof. Looks like the commercial company behind MariaDB is going south quickly:
Monty, the creator of MySQL and MariaDB founder, hasn’t been at a company meeting for over a year and a half. The relationship between Monty and the CEO, Michael Howard, is extremely rocky. At a company all-hands meeting Monty and Michael Howard were shouting at each other while up on stage in the auditorium in front of the entire staff. Monty made his position perfectly clear as he shouted his last words before he walked out: “You’re killing my fu&#@$! company!!!” Monty was subsequently voted off the board in July of 2022 solidifying the hostile takeover by Michael Howard. Buyer beware, Monty and his group of founders and database experts are no longer at the company.At least the open-source product is still trustworthy, though.
Contractors say they have a set amount of time to complete each task, like review a prompt, and the time they’re allotted for tasks can vary wildly — from as little as 60 seconds to more than several minutes. Still, raters said it’s difficult to rate a response when they are not well-versed in a topic the chatbot is talking about, including technical topics like blockchain for example. Because each assigned task represents billable time, some workers say they will complete the tasks even if they realize they cannot accurately assess the chatbot responses. “Some people are going to say that’s still 60 seconds of work, and I can’t recoup this time having sat here and figured out I don’t know enough about this, so I’m just going to give it my best guess so I can keep that pay and keep working,” one rater said.
Because the AI-enhanced virtual assistants scrape text and images off the web, they are open to a type of attack called indirect prompt injection, in which a third party alters a website by adding hidden text that is meant to change the AI’s behavior. Attackers could use social media or email to direct users to websites with these secret prompts. Once that happens, the AI system could be manipulated to let the attacker try to extract people’s credit card information, for example. Malicious actors could also send someone an email with a hidden prompt injection in it. If the receiver happened to use an AI virtual assistant, the attacker might be able to manipulate it into sending the attacker personal information from the victim’s emails, or even emailing people in the victim’s contacts list on the attacker’s behalf.
I’m not sure who is advising GitHub, but the suggestion that the unauthorized use of “publicly available data is consistent with global copyright laws” is a fantastical claim, for any number of reasons, and that’s even before addressing the ridiculous notion that machines learn “much as humans have done throughout history.”
well, looks like I won’t ever buy another HP printer
This essay is spot on about the recent AI open letter from the Future of Life Institute, asking for “a 6-month pause on training language models “more powerful than” GPT-4”:
Over 1,000 researchers, technologists, and public figures have already signed the letter. The letter raises alarm about many AI risks: “Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?” We agree that misinformation, impact on labor, and safety are three of the main risks of AI. Unfortunately, in each case, the letter presents a speculative, futuristic risk, ignoring the version of the problem that is already harming people. It distracts from the real issues and makes it harder to address them. The letter has a containment mindset analogous to nuclear risk, but that’s a poor fit for AI. It plays right into the hands of the companies it seeks to regulate.Couldn’t agree more.
There are 18 images in the Reddit slideshow [a series of Midjourney-generated images of “selfies through history”] and they all feature the same recurring composition and facial expression. For some, this sequence of smiling faces elicits a sense of warmth and joyousness, comprising a visual narrative of some sort of shared humanity […] But what immediately jumped out at me is that these AI-generated images were beaming a secret message hidden in plain sight. A steganographic deception within the pixels, perfectly legible to your brain yet without the conscious awareness that it’s being conned. Like other AI “hallucinations,” these algorithmic extrusions were telling a made up story with a straight face — or, as the story turns out, with a lying smile. […] How we smile, when we smile, why we smile, and what it means is deeply culturally contextual.
“Social media and newspapers are flooded with myths about heat pumps. Let’s take them one by one in this post.”
Grim. This is the downside of LLM-based chatbots with ineffective guardrails against toxic output.
“Without these conversations with the chatbot, my husband would still be here,” the man’s widow has said, according to La Libre. She and her late husband were both in their thirties, lived a comfortable life and had two young children. However, about two years ago, the first signs of trouble started to appear. The man became very eco-anxious and found refuge with ELIZA, the name given to a chatbot that uses GPT-J, an open-source artificial intelligence language model developed by EleutherAI. After six weeks of intensive exchanges, he took his own life.There’s a transcript of the last conversation with the bot here: https://news.ycombinator.com/item?id=35344418 .
Adding an “Idempotency-Key:” header to HTTP to control idempotent operation on REST APIs. (via Tomasz Nurkiewicz)
Excellent thread from Dr. Michael Mina:
Ive written SARS-CoV-2 is a “textbook virus” • Textbook does NOT mean mild; • Textbook viruses kill people; • Textbook viruses harm long-term immunity; • Textbook viruses cause dizzying amounts of poorly understood debilitating problems I explain w examples here!
This is a great essay on GPT and LLMs:
Roy Amara, who died on the last day of 2007, was the president of a Palo Alto based think tank, the Institute for the future, and is credited with saying what is now known as Amara’s Law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” This has been a common problem with Artificial Intelligence, and indeed of all of computing. In particular, since I first became conscious of the possibility of Artificial Intelligence around 1963 (and as an eight year old proceeded to try to build my own physical and intelligent computers, and have been at it ever since), I have seen these overestimates many many times.and:
I think that GPTs will give rise to a new aphorism (where the last word might vary over an array of synonymous variations): “If you are interacting with the output of a GPT system and didn’t explicitly decide to use a GPT then you’re the product being hoodwinked.” I am not saying everything about GPTs is bad. I am saying that, especially given the explicit warnings from OpenAI, that you need to be aware that you are using an unreliable system. Using an unreliable system sounds awfully unreliable, but in August 2021 I had a revelation at TED in Monterey, California, when Chris Anderson (the TED Chris), was interviewing Greg Brockman, the Chairman of Open AI about an early version of GPT. He said that he regularly asked it questions about code he wanted to write and it very quickly gave him ideas for libraries to use, and that was enough to get him started on his project. GPT did not need to be fully accurate, just to get him into the right ballpark, much faster than without its help, and then he could take it from there. Chris Anderson (the 3D robotics one, not the TED one) has likewise opined (as have responders to some of my tweets about GPT) that using ChatGPT will get him the basic outline of a software stack, in a well tread area of capabilities, and he is many many times more productive than with out it. So there, where a smart person is in the loop, unreliable advice is better than no advice, and the advice comes much more explicitly than from carrying out a conventional search with a search engine. The opposite of useful can also occur, but again it pays to have a smart human in the loop. Here is a report from the editor of a science fiction magazine which pays contributors. He says that from late 2022 through February of 2023 the number of submissions to the magazine increased by almost two orders of magnitude, and he was able to determine that the vast majority of them were generated by chatbots. He was the person in the loop filtering out the signal he wanted, human written science fiction, from vast volumes of noise of GPT written science fiction. Why should he care? Because GPT is an auto-completer and so it is generating variations on well worked themes. But, but, but, I hear people screaming at me. With more work GPTs will be able to generate original stuff. Yes, but it will be some other sort of engine attached to them which produces that originality. No matter how big, and how many parameters, GPTs are not going to to do that themselves. When no person is in the loop to filter, tweak, or manage the flow of information GPTs will be completely bad. That will be good for people who want to manipulate others without having revealed that the vast amount of persuasive evidence they are seeing has all been made up by a GPT. It will be bad for the people being manipulated. And it will be bad if you try to connect a robot to GPT. GPTs have no understanding of the words they use, no way to connect those words, those symbols, to the real world. A robot needs to be connected to the real world and its commands need to be coherent with the real world. Classically it is known as the “symbol grounding problem”. GPT+robot is only ungrounded symbols. It would be like you hearing Klingon spoken, without any knowledge other than the Klingon sound stream (even in Star Trek you knew they had human form and it was easy to ground aspects of their world). A GPT telling a robot stuff will be just like the robot hearing Klingonese. My argument here is that GPTs might be useful, and well enough boxed, when there is an active person in the loop, but dangerous when the person in the loop doesn’t know they are supposed to be in the loop. [This will be the case for all young children.] Their intelligence, applied with strong intellect, is a key component of making any GPT be successful.
How unsurprising is this? And needless to say, a bunch of that is being reused for training:
In a recent report, data security service Cyberhaven detected and blocked requests to input data into ChatGPT from 4.2% of the 1.6 million workers at its client companies because of the risk of leaking confidential information, client data, source code, or regulated information to the LLM. In one case, an executive cut and pasted the firm’s 2023 strategy document into ChatGPT and asked it to create a PowerPoint deck. In another case, a doctor input his patient’s name and their medical condition and asked ChatGPT to craft a letter to the patient’s insurance company.
GitHub Copilot is also based on a large language model. What does indirect prompt injection do to it? Again, we demonstrate that, as long as an attacker controls part of the context window, the answer is: pretty much anything. Attackers only have to manipulate the documentation of a target package or function. As you reference and use them, this documentation is loaded into the context window based on complex and ever-changing heuristics. We show […] how importing a synthetic library can lead Copilot to introduce subtle or not-so-subtle vulnerabilities into the code generated for you.
many reports from Proxmox users across Ireland — seems there’s a bug in systemd timezone code when handling daylight savings in the Europe/Dublin timezone (which is unique because it causes “mktime moving backward for change to “summer time” […] as for them the summer time is the standard time”. (via Kiall)