Skip to content

Justin Mason's Weblog Posts

Links for 2023-05-15

  • Kafka vs Redpanda Performance

    I don’t use either service, but this is actually an excellent writeup of some high-end performance optimization on modern Linux EC2-based systems with NVMe SSDs, and the benchmarking of same

    (tags: kafka redpanda benchmarks ec2 aws ssd optimization performance ops)

  • Never Give Artificial Intelligence the Nuclear Codes

    Something new to worry about — giving an AI the keys to the nukes:

    Any country that inserts AI into its [nuclear] command and control will motivate others to follow suit, if only to maintain a credible deterrent. Michael Klare, a peace-and-world-security-studies professor at Hampshire College, has warned that if multiple countries automate launch decisions, there could be a “flash war” analogous to a Wall Street “flash crash.” Imagine that an American AI misinterprets acoustic surveillance of submarines in the South China Sea as movements presaging a nuclear attack. Its counterstrike preparations would be noticed by China’s own AI, which would actually begin to ready its launch platforms, setting off a series of escalations that would culminate in a major nuclear exchange.

    (tags: ai command-and-control nuclear-war nuclear flash-war)

Comments closed

Links for 2023-05-11

  • In defence of swap

    Common misconceptions about swap memory on Linux systems:

    Swap is a useful tool to allow equality of reclamation of memory pages, but its purpose is frequently misunderstood, leading to its negative perception across the industry. If you use swap in the spirit intended, though – as a method of increasing equality of reclamation – you’ll find that it’s a useful tool instead of a hindrance. Disabling swap does not prevent disk I/O from becoming a problem under memory contention, it simply shifts the disk I/O thrashing from anonymous pages to file pages. Not only may this be less efficient, as we have a smaller pool of pages to select from for reclaim, but it may also contribute to getting into this high contention state in the first place.
    (via valen)

    (tags: linux memory performance swap vm oom)

  • Solar Quote Analyser

    handy web tool to figure out if a quote for a domestic solar PV install in Ireland is cheap, on the money, or too pricey

    (tags: quotes solar-pv solar home money finance)

  • Coinbase spent $65M on Datadog

    in one year — Sixty. Five. Million. Dollars.

    (tags: datadog saas coinbase fail lol money)

Comments closed

Links for 2023-05-08

Comments closed

Links for 2023-05-05

  • Will A.I. Become the New McKinsey?

    Great stuff from Ted Chiang:

    A former McKinsey employee has described the company as “capital’s willing executioners”: if you want something done but don’t want to get your hands dirty, McKinsey will do it for you. That escape from accountability is one of the most valuable services that management consultancies provide. Bosses have certain goals, but don’t want to be blamed for doing what’s necessary to achieve those goals; by hiring consultants, management can say that they were just following independent, expert advice. Even in its current rudimentary form, A.I. has become a way for a company to evade responsibility by saying that it’s just doing what “the algorithm” says, even though it was the company that commissioned the algorithm in the first place. The question we should be asking is: as A.I. becomes more powerful and flexible, is there any way to keep it from being another version of McKinsey?

    (tags: ai capitalism mckinsey future politics ted-chiang)

Comments closed

Links for 2023-05-03

  • The Wide Angle: Understanding TESCREAL — Silicon Valley’s Rightward Turn

    As you encounter these ideologies [Transhumanism, Extropianism, Singularitarianism, Cosmism, Rationalism, Effective Altruism, and Longtermism] in the wild, you might use the TESCREAL lens, and its alignment with Eurasianism and Putin’s agenda, to evaluate them, and ask whether they tend to undermine or enhance the project of liberal democracy. TESCREAL ideologies tend to advance an illiberal agenda and authoritarian tendencies, and it’s worth turning a very critical eye towards them, especially in cases where that’s demonstrably true. Clearly there are countless well-meaning people trying to use technology and reason to improve the world, but that should never come at the expense of democratic, inclusive, fair, patient, and just governance. The biggest risk AI poses right now is that alarmists will use the fears surrounding it as a cudgel to enact sweeping policy reforms. We should resist those efforts. Now more than ever, we should be guided by expertise, facts, and evidence as we seek to use technology in ways that benefit everyone.

    (tags: ideology future tescreal ea longtermism ai politics silicon-valley)

  • heightened risk of autoimmune diseases after Covid

    More evidence of a “substantially increased risk of developing a diverse spectrum of new-onset autoimmune diseases”:

    Previously we knew there were many features of autoimmunity engendered by Covid, but the link to manifesting important autoimmune diseases has not been established. There are still many dots not connected—it’s fuzzy. We need to better understand how the dysregulation of our immune system that can occur from a Covid infection (or even more rarely from a vaccine) can be linked with a serious autoimmune condition. While we’ve fully recognized that people with autoimmune diseases are more vulnerable to Covid and adverse outcomes, the flip of that — that Covid can make some people vulnerable to autoimmune diseases — is what’s new.
    (from the always excellent Eric Topol.)

    (tags: covid-19 long-covid pasc autoimmune diseases health medicine research eric-topol)

Comments closed

Links for 2023-05-02

  • In a small study, an AI ‘brain decoder’ inches toward reading minds

    In a new Nature Neuroscience paper published Monday, Huth and a team of researchers from the University of Texas at Austin introduced a new “brain decoder” enabled by GPT-1, an earlier version of the artificial neural network technology that underpins ChatGPT. After digesting several hours of training data, the new tool was able to describe the gist of stories the three participants in the proof-of-concept experiment listened to — just by looking at their functional MRI scans.
    Very cool stuff. And I am happy to see the ethical considerations have been considered:
    “It is important to constantly evaluate what the implications are of new brain decoders for mental privacy,” said Jerry Tang, a Ph.D. candidate in Huth’s lab and lead author on the paper, in a press briefing. In devising ways to protect privacy, the authors asked participants to try to prevent the decoder from reconstructing the words they were hearing several different ways. Particularly effective methods included mentally listing off animals, and telling a different story at the same time the podcast was playing were particularly effective at stopping the decoder, said Tang. The authors also found that the decoder had to be trained on each subject’s data and wasn’t effective when used on another person. Between these findings and the fact that any movement would make the fMRI scans worse, the authors concluded that it’s not currently possible for a brain decoder to be used on someone against their will.

    (tags: fmri scanning brain mri mindreading gpt podcasts)

Comments closed

Links for 2023-04-28

  • Inside LAION

    “A High School Teacher’s Free Image Database Powers AI Unicorns”:

    To build LAION, founders scraped visual data from companies such as Pinterest, Shopify and Amazon Web Services — which did not comment on whether LAION’s use of their content violates their terms of service — as well as YouTube thumbnails, images from portfolio platforms like DeviantArt and EyeEm, photos from government websites including the US Department of Defense, and content from news sites such as The Daily Mail and The Sun. If you ask Schuhmann, he says that anything freely available online is fair game. But there is currently no AI regulation in the European Union, and the forthcoming AI Act, whose language will be finalized early this summer, will not rule on whether copyrighted materials can be included in big data sets. Rather, lawmakers are discussing whether to include a provision requiring the companies behind AI generators to disclose what materials went into the data sets their products were trained on, thus giving the creators of those materials the option of taking action. […] “It has become a tradition within the field to just assume you don’t need consent or you don’t need to inform people, or they don’t even have to be aware of it. There is a sense of entitlement that whatever is on the web, you can just crawl it and put it in a data set,” said Abeba Birhane, a Senior Fellow in Trustworthy AI at Mozilla Foundation.

    (tags: consent opt-in web ai ml laion training-data scraping)

  • Ask HN: Most interesting tech you built for just yourself?

    Fantastic thread of hackers scratching their own itch (via SimonW)

    (tags: via:simonw hacking projects hn hacks open-source)

  • informative Twitter thread on the LessWrong/rationalist/”AI risk”/effective altruism cult

    “some people understand immediately when i try to explain what it was like to be fully in the grip of the yudkowskian AI risk stuff and some people it doesn’t seem to land at all, which is probably good for them and i wish i had been so lucky”. Bananas…

    (tags: cults ai-risk yudkowski future rokos-basilisk lesswrong effective-altruism)

Comments closed

Links for 2023-04-27

  • Introducing VirusTotal Code Insight: Empowering threat analysis with generative AI

    Impressively, when these models are trained on programming languages, they can adeptly transform code into natural language explanations. […] Code Insight is a new feature based on Sec-PaLM, one of the generative AI models hosted on Google Cloud AI. What sets this functionality apart is its ability to generate natural language summaries from the point of view of an AI collaborator specialized in cybersecurity and malware. This provides security professionals and analysts with a powerful tool to figure out what the code is up to.  At present, this new functionality is deployed to analyze a subset of PowerShell files uploaded to VirusTotal. The system excludes files that are highly similar to those previously processed, as well as files that are excessively large. This approach allows for the efficient use of analysis resources, ensuring that only the most relevant files (such as PS1 files) are subjected to scrutiny. In the coming days, additional file formats will be added to the list of supported files, broadening the scope of this functionality even further.
    (via Julie on ITC Slack)

    (tags: virustotal analysis malware code reverse-engineering infosec security)

  • How Philly Cheesesteaks Became a Big Deal in Lahore, Pakistan

    This is fascinating history:

    An establishment with a legacy such as [The Lahore Gymkhana Club, founded in 1878 under British rule] needed to continue revamping itself and serve exclusive dishes for its high-end clientele. And the club, along with restaurants aspiring to serve continental food, was bolstered by a growing taste for a new ingredient in town: processed cheese. “Sandwiches gradually started becoming popular in the 1980s because of the [wider] availability of cheese and mushrooms,” says Chaudhry. Until the 1980s, processed cheese was largely imported, and its use was limited to the rich, who would frequent establishments such as the Gymkhana. As Lahori taste buds adapted to and appreciated cheese, production was initiated locally. Demand for cheeseburgers and sandwiches skyrocketed in the 1990s, with a growing number of Pakistanis who’d traveled to the U.S. aspiring to re-create offerings from various popular American chains. One of these is exceptionally familiar. Even today, online food groups in Pakistan are peppered with people asking the community where they can find a cheese­steak in Lahore “like the one at Pat’s.” Many of them post images of the cheese­steaks from the original shop at 9th and Passyunk.

    (tags: food cheesesteaks philadelphia history pakistan lahore sandwiches)

Comments closed

Links for 2023-04-26

  • “Nothing like this will be built again”

    Charlie Stross visits the Advanced Gas-cooled Reactors at Torness nuclear power station:

    The AGRs at Torness [in the UK] are not ordinary civil [nuclear] power reactors. Designed in the 1970’s, they were the UK’s bid to build an export-earning civil nuclear power system. They’re sensitive thoroughbreds, able to reach a peak conversion efficiency of 43% — that is, able to turn up to 43% of their energy output into electricity. By comparison, a PWR peaks at 31-32%. However, the PWRs have won the race for commercial success: they’re much, much, simpler. AGRs are like Concorde — technological marvels, extremely sophisticated and efficient, and just too damned expensive and complex for their own good. (You want complexity? Torness was opened in 1989. For many years thereafter, its roughly fifty thousand kilometres of aluminium plumbing made it the most complex and demanding piece of pipework in Europe. You want size? The multi-thousand ton reactor core of an AGR is bigger than the entire plant at some PWR installations.) It’s a weird experience, crawling over the guts of one of the marvels of the atomic age, smelling the thing (mostly machine oil and steam, and a hint of ozone near the transformers), all the while knowing that although it’s one of the safest and most energy-efficient civilian power reactors ever built it’s a a technological dead-end, that there won’t be any more of them, and that when it shuts down in thirty or forty years’ time this colossal collision between space age physics and victorian plumbing will be relegated to a footnote in the history books. “Energy too cheap to meter” it ain’t, but as a symbol of what we can achieve through engineering it’s hard to beat.

    (tags: engineering nuclear-power agr history uk torness power plumbing)

  • The Toronto Recursive History Project

    “This plaque was commemorated on October 10, 2018, commemorate its own commemoration. Plaques like this one are an integral part of the campaign to support more plaques like this one. By reading this plaque, you have made a valuable addition to the number of people who have read this plaque. To this day and up to the end of this sentence, this plaque continues to be read by people like yourself. Heritage Toronto 2018”

    (tags: heritage toronto recursive plaque commemoration funny)

  • Palantir Demos AI to Fight Wars But Says It Will Be Totally Ethical Don’t Worry About It

    This is a really atrocious idea:

    Palantir also isn’t selling a military-specific AI or large language model (LLM) here, it’s offering to integrate existing systems into a controlled environment. The AIP demo shows the software supporting different open-source LLMs, including  FLAN-T5 XL, a fine-tuned version of GPT-NeoX-20B, and Dolly-v2-12b, as well as several custom plug-ins. Even fine-tuned AI systems off the shelf have plenty of known issues that could make asking them what to do in a warzone a nightmare. For example, they’re prone to simply making things up, or “hallucinating.” GPT-NeoX-20B in particular is an open-source alternative to GPT-3, a previous version of OpenAI’s language model, created by a startup called EleutherAI. One of EleutherAI’s open-source models — fine-tuned by another startup called Chai — recently convinced a Belgian man who spoke to it for six weeks to kill himself.  What Palantir is offering is the illusion of safety and control for the Pentagon as it begins to adopt AI. […] What AIP does not do is walk through how it plans to deal with the various pernicious problems of LLMs and what the consequences might be in a military context. AIP does not appear to offer solutions to those problems beyond “frameworks” and “guardrails” it promises will make the use of military AI “ethical” and “legal.”

    (tags: palantir grim-meathook-future war llm aip military ai ethics)

Comments closed

Links for 2023-04-25

  • Silence Isn’t Consent

    More on yesterday’s img2dataset failure to support opt-in:

    It isn’t “effective altruism” if you have to force people to comply with you.

    (tags: img2dataset ai scraping web consent opt-in)

  • Google Launched Bard Despite Major Ethical Concerns From Its Employees

    “The staffers who are responsible for the safety and ethical implications of new products have been told not to get in the way or to try to kill any of the generative AI tools in development,” employees told Bloomberg. The ethics team is now “disempowered and demoralized,” according to former and current staffers. Before OpenAI launched ChatGPT in November 2022, Google’s approach to AI was more cautious and less consumer-facing, often working in the background of tools like Search and Maps. But since ChatGPT’s enormous popularity prompted a “code red” from executives, Google’s threshold for safe product releases has been lowered in an effort to keep up with its AI competitors.

    (tags: google ai safety chatgpt bard corporate-responsibility)

Comments closed

Links for 2023-04-24

  • Shitty behaviour around the img2dataset AI scraper

    The author of this popular AI training data scraping tool doesn’t seem to understand consent and opt-in:

    Letting a small minority [ie web publishers] prevent the large majority [AI users] from sharing their images and from having the benefit of last gen AI tool would definitely be unethical yes. Consent is obviously not unethical. You can give your consent for anything if you wish. It seems you’re trying to decide for million of other people without asking them for their consent.
    In other words, “scraping your content without opt-in is better than denying access to your content for millions of potential future AI users”. An issue to implement robots.txt support has been languishing since 2021. Good arguments for blocking the img2dataset user agent in general…

    (tags: opt-in consent ai ml bad-behaviour scraping robots)

  • Why is British media so transphobic?

    Aside from the weirdness of Mumsnet, I didn’t know about the influence of the mid-2000s skeptics movement:

    While claiming to be the country’s foremost critical thinkers, the group was riddled with anti-humanities bias and a fetish for a certain kind of “science” that it held to reveal a set of immutable principles upon which the world was built with almost no regard whatsoever for interpretative analysis based on social or historical factors. Part of this mode of thinking was an especially reductivist biologism: the idea that there are immutable realities to be found in our DNA, and if we just paid enough attention to Science and stopped trying to split hairs and discover meaning over in the superfluous disciplines of the humanities, then everything would be much simpler. It’s precisely this kind of biological essentialism — which skirts dangerously close to eugenics — that leads people to think they can “debunk” a person’s claim to their gender identity, or that it should be subjected to rigorous testing by someone in a lab coat before we can believe the subject is who they say they are.

    (tags: debunking scepticism skeptics history terfs uk uk-politics gender)

Comments closed

Links for 2023-04-20

  • Long COVID Is Being Erased — Again – The Atlantic

    Ed Yong is back writing again!

    Most Americans simply aren’t thinking about COVID with the same acuity they once did; the White House long ago zeroed in on hospitalizations and deaths as the measures to worry most about. And what was once outright denial of long COVID’s existence has morphed into something subtler: a creeping conviction, seeded by academics and journalists and now common on social media, that long COVID is less common and severe than it has been portrayed—a tragedy for a small group of very sick people, but not a cause for societal concern. This line of thinking points to the absence of disability claims, the inconsistency of biochemical signatures, and the relatively small proportion of severe cases as evidence that long COVID has been overblown. “There’s a shift from ‘Is it real?’ to ‘It is real, but …,’” Lekshmi Santhosh, the medical director of a long-COVID clinic at UC San Francisco, told me. Yet long COVID is a substantial and ongoing crisis—one that affects millions of people. However inconvenient that fact might be to the current “mission accomplished” rhetoric, the accumulated evidence, alongside the experience of long haulers, makes it clear that the coronavirus is still exacting a heavy societal toll.

    (tags: long-covid ed-yong covid-19 health medicine society healthcare)

  • OpenAI’s hunger for data is coming back to bite it

    Spot on:

    The company could have saved itself a giant headache by building in robust data record-keeping from the start, she says. Instead, it is common in the AI industry to build data sets for AI models by scraping the web indiscriminately and then outsourcing the work of removing duplicates or irrelevant data points, filtering unwanted things, and fixing typos. These methods, and the sheer size of the data set, mean tech companies tend to have a very limited understanding of what has gone into training their models. 

    (tags: training data provenance ai ml common-crawl openai chatgpt data-protection privacy)

  • Holly Herndon on AI music

    she really gets it. Lots of interesting thoughts

    (tags: holly-herndon ai music ml future tech sampling spawning)

Comments closed

Links for 2023-04-17

Comments closed

Links for 2023-04-14

  • “Why Banker Bob (still) Can’t Get TLS Right: A Security Analysis of TLS in Leading UK Banking Apps”

    Jaysus this is a litany of failure.

    Abstract. This paper presents a security review of the mobile apps provided by the UK’s leading banks; we focus on the connections the apps make, and the way in which TLS is used. We apply existing TLS testing methods to the apps which only find errors in legacy apps. We then go on to look at extensions of these methods and find five of the apps have serious vulnerabilities. In particular, we find an app that pins a TLS root CA certificate, but do not verify the hostname. In this case, the use of certificate pinning means that all existing test methods would miss detecting the hostname verification flaw. We also find one app that doesn’t check the certificate hostname, but bypasses proxy settings, resulting in failed detection by pentesting tools. We find that three apps load adverts over insecure connections, which could be exploited for in-app phishing attacks. Some of the apps used the users’ PIN as authentication, for which PCI guidelines require extra security, so these apps use an additional cryptographic protocol; we study the underlying protocol of one banking app in detail and show that it provides little additional protection, meaning that an active man-in-the-middle attacker can retrieve the user’s credentials, login to the bank and perform every operation the legitimate user could.
    See also: https://www.synopsys.com/blogs/software-security/ineffective-certificate-pinning-implementations/

    (tags: ssl tls certificates certificate-pinning security infosec banking apps uk pci mobile)

  • Using DuckDB to repartition parquet data in S3

    Wow, DuckDB is very impressive — I had no idea it could handle SELECTs against Parquet data in S3:

    A common pattern to ingest streaming data and store it in S3 is to use Kinesis Data Firehose Delivery Streams, which can write the incoming stream data as batched parquet files to S3. You can use custom S3 prefixes with it when using Lambda processing functions, but by default, you can only partition the data by the timestamp (the timestamp the event reached the Kinesis Data Stream, not the event timestamp!). So, a few common use cases for data repartitioning could include: Repartitioning the written data for the real event timestamp if it’s included in the incoming data; Repartitioning the data for other query patterns, e.g. to support query filter pushdown and optimize query speeds and costs; Aggregation of raw or preprocessed data, and storing them in an optimized manner to support analytical queries.

    (tags: duckdb repartitioning s3 parquet orc hive kinesis firehose)

  • Timnit Gebru’s anti-‘AI pause’

    Couldn’t agree more with Timnit Gebru’s comments here:

    What is your appeal to policymakers? What would you want Congress and regulators to do now to address the concerns you outline in the open letter? Congress needs to focus on regulating corporations and their practices, rather than playing into their hype of “powerful digital minds.” This, by design, ascribes agency to the products rather than the organizations building them. This language obfuscates the amount of data that is being collected — and the amount of worker exploitation involved with those who are labeling and supplying the datasets, and moderating model outputs. Congress needs to ensure corporations are not using people’s data without their consent, and hold them responsible for the synthetic media they produce — whether it is text or media spewing disinformation, hate speech or other types of harmful content. Regulations need to put the onus on corporations, rather than understaffed agencies. There are probably existing regulations these organizations are breaking. There are mundane “AI” systems being used daily; we just heard about another Black man being wrongfully arrested because of the use of automated facial analysis systems. But that’s not what we’re talking about, because of the hype.

    (tags: data privacy ai ml openai monopoly)

Comments closed

Links for 2023-04-13

  • caesarHQ/textSQL

    This is amazing — using GPT-3.5 to convert a natural-language query into SQL applied to a specific dataset, in these examples, San Francisco city data and US public census data:

    With CensusGPT, you can ask any question related to census data in natural language. These natural language questions get converted to SQL using GPT-3.5 and are then used to query the census database. Here are some examples: – Five cities with a population over 100,000 and lowest crime – 10 highest income areas in california Here is a similar example from sfGPT: – Which four neighborhoods had the most crime in San Francisco in 2021?

    (tags: sfgpt censusgpt textsql natural-language gpt-3.5 sql querying search open-source)

Comments closed

Links for 2023-04-12

  • Exploring performance differences between Amazon Aurora and vanilla MySQL | Plaid

    This is a major difference between vanilla MySQL and Amazon Aurora (and a potentially major risk!):

    because Aurora MySQL primary and replica instances share a storage layer, they share a set of undo logs. This means that, for a REPEATABLE READ isolation level, the storage instance must maintain undo logs at least as far back as could be required to satisfy transactional guarantees for the primary or any read replica instance. Long-running replica transactions can negatively impact writer performance in Aurora MySQL—finally, an explanation for the incident that spawned this investigation. The same scenario plays out differently in vanilla MySQL because of its different model for undo logs. Vanilla MYSQL: there are two undo logs – one on the writer, and one on the reader. The performance impact of an operation that prevents the garbage collection of undo log records will be isolated to either the writer or the reader. Aurora MySQL: there is a single undo log that is shared between the writer and reader. The performance impact of an operation that prevents the garbage collection of undo log records will affect the entire cluster.

    (tags: aurora aws mysql performance databases isolation-levels)

Comments closed

Links for 2023-04-11

  • EV Database

    Comparison site for electric cars; actually has a realistic model of genuine range for each EV. Full details on charging connectors, charge curves (for charging speed), etc.

    (tags: ev driving cars vehicles)

  • The Black Magic of (Java) Method Dispatch

    Some fascinating details of low-level Java performance optimization, particularly with JIT applied to OO method dispatch:

    Programming languages like Java provide the facilities for subtyping/polymorphism as one of the ways to construct modular and reusable software. This language choice naturally comes at a price, since there is no hardware support for virtual calls, and therefore runtimes have to emulate this behavior. In many, many cases the performance of method dispatch is not important. Actually, in a vast majority of cases, the low-level performance concerns are not the real concerns. However, there are cases when method dispatch performance is important, and there you need to understand how dispatch works, what runtimes optimize for you, and what you can do to cheat and/or emulate similar behavior in your code. For example, in the course of String Compression work, we were faced with the problem of selecting the coder for a given String. The obvious and highly maintainable approach of creating a Coder interface, a few implementations, and dispatching the virtual calls over it, had met some performance problems on the very tiny benchmarks. Therefore, we needed to contemplate something better. After a few experiments, this post was born as a reference for others who might try to do the same. This post also tangentially touches the inlining of virtual calls, as the natural thing during the optimization.
    Discovered via this amazing commit: https://github.com/quarkusio/quarkus/commit/65dd4d43e2644db1c87726139280f9704140167c

    (tags: optimization performance java oo jit coding polymorphism)

Comments closed

Links for 2023-04-07

  • MariaDB.com is dead, long live MariaDB.org

    Oof. Looks like the commercial company behind MariaDB is going south quickly:

    Monty, the creator of MySQL and MariaDB founder, hasn’t been at a company meeting for over a year and a half. The relationship between Monty and the CEO, Michael Howard, is extremely rocky. At a company all-hands meeting Monty and Michael Howard were shouting at each other while up on stage in the auditorium in front of the entire staff. Monty made his position perfectly clear as he shouted his last words before he walked out: “You’re killing my fu&#@$! company!!!” Monty was subsequently voted off the board in July of 2022 solidifying the hostile takeover by Michael Howard. Buyer beware, Monty and his group of founders and database experts are no longer at the company.
    At least the open-source product is still trustworthy, though.

    (tags: databases storage mariadb software open-source companies)

Comments closed

Links for 2023-04-06

  • Google “raters” say they don’t have enough time to verify correct answers from Bard

    Contractors say they have a set amount of time to complete each task, like review a prompt, and the time they’re allotted for tasks can vary wildly — from as little as 60 seconds to more than several minutes. Still, raters said it’s difficult to rate a response when they are not well-versed in a topic the chatbot is talking about, including technical topics like blockchain for example.  Because each assigned task represents billable time, some workers say they will complete the tasks even if they realize they cannot accurately assess the chatbot responses.  “Some people are going to say that’s still 60 seconds of work, and I can’t recoup this time having sat here and figured out I don’t know enough about this, so I’m just going to give it my best guess so I can keep that pay and keep working,” one rater said.

    (tags: google raters contractors fact-checking verification llms bard facts)

Comments closed

Links for 2023-04-03

Comments closed

Links for 2023-04-01

Comments closed

Links for 2023-03-31

  • A misleading open letter about sci-fi AI dangers ignores the real risks

    This essay is spot on about the recent AI open letter from the Future of Life Institute, asking for “a 6-month pause on training language models “more powerful than” GPT-4”:

    Over 1,000 researchers, technologists, and public figures have already signed the letter. The letter raises alarm about many AI risks: “Should we let machines flood our information channels with propaganda and untruth? Should we automate away all the jobs, including the fulfilling ones? Should we develop nonhuman minds that might eventually outnumber, outsmart, obsolete and replace us? Should we risk loss of control of our civilization?” We agree that misinformation, impact on labor, and safety are three of the main risks of AI. Unfortunately, in each case, the letter presents a speculative, futuristic risk, ignoring the version of the problem that is already harming people. It distracts from the real issues and makes it harder to address them. The letter has a containment mindset analogous to nuclear risk, but that’s a poor fit for AI. It plays right into the hands of the companies it seeks to regulate.
    Couldn’t agree more.

    (tags: ai scifi future risks gpt-4 regulation)

Comments closed

Links for 2023-03-30

  • AI and the American Smile. How AI misrepresents culture through a facial expression

    There are 18 images in the Reddit slideshow [a series of Midjourney-generated images of “selfies through history”] and they all feature the same recurring composition and facial expression. For some, this sequence of smiling faces elicits a sense of warmth and joyousness, comprising a visual narrative of some sort of shared humanity […] But what immediately jumped out at me is that these AI-generated images were beaming a secret message hidden in plain sight. A steganographic deception within the pixels, perfectly legible to your brain yet without the conscious awareness that it’s being conned. Like other AI “hallucinations,” these algorithmic extrusions were telling a made up story with a straight face — or, as the story turns out, with a lying smile. […] How we smile, when we smile, why we smile, and what it means is deeply culturally contextual.

    (tags: ai america culture photography midjourney smiling smiles context history)

  • Heat pump myths

    “Social media and newspapers are flooded with myths about heat pumps. Let’s take them one by one in this post.”

    (tags: myths mythbusting heat-pumps heating house home)

  • Belgian man dies by suicide following exchanges with chatbot

    Grim. This is the downside of LLM-based chatbots with ineffective guardrails against toxic output.

    “Without these conversations with the chatbot, my husband would still be here,” the man’s widow has said, according to La Libre. She and her late husband were both in their thirties, lived a comfortable life and had two young children. However, about two years ago, the first signs of trouble started to appear. The man became very eco-anxious and found refuge with ELIZA, the name given to a chatbot that uses GPT-J, an open-source artificial intelligence language model developed by EleutherAI. After six weeks of intensive exchanges, he took his own life.
    There’s a transcript of the last conversation with the bot here: https://news.ycombinator.com/item?id=35344418 .

    (tags: bots chatbots ai gpt gpt-j grim future grim-meathook-future)

Comments closed

Links for 2023-03-28

Comments closed

Links for 2023-03-27

  • What Will Transformers Transform? – Rodney Brooks

    This is a great essay on GPT and LLMs:

    Roy Amara, who died on the last day of 2007, was the president of a Palo Alto based think tank, the Institute for the future, and is credited with saying what is now known as Amara’s Law: “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” This has been a common problem with Artificial Intelligence, and indeed of all of computing. In particular, since I first became conscious of the possibility of Artificial Intelligence around 1963 (and as an eight year old proceeded to try to build my own physical and intelligent computers, and have been at it ever since), I have seen these overestimates many many times.
    and:
    I think that GPTs will give rise to a new aphorism (where the last word might vary over an array of synonymous variations): “If you are interacting with the output of a GPT system and didn’t explicitly decide to use a GPT then you’re the product being hoodwinked.” I am not saying everything about GPTs is bad. I am saying that, especially given the explicit warnings from OpenAI, that you need to be aware that you are using an unreliable system. Using an unreliable system sounds awfully unreliable, but in August 2021 I had a revelation at TED in Monterey, California, when Chris Anderson (the TED Chris), was interviewing Greg Brockman, the Chairman of Open AI about an early version of GPT. He said that he regularly asked it questions about code he wanted to write and it very quickly gave him ideas for libraries to use, and that was enough to get him started on his project. GPT did not need to be fully accurate, just to get him into the right ballpark, much faster than without its help, and then he could take it from there. Chris Anderson (the 3D robotics one, not the TED one) has likewise opined (as have responders to some of my tweets about GPT) that using ChatGPT will get him the basic outline of a software stack, in a well tread area of capabilities, and he is many many times more productive than with out it. So there, where a smart person is in the loop, unreliable advice is better than no advice, and the advice comes much more explicitly than from carrying out a conventional search with a search engine. The opposite of useful can also occur, but again it pays to have a smart human in the loop. Here is a report from the editor of a science fiction magazine which pays contributors. He says that from late 2022 through February of 2023 the number of submissions to the magazine increased by almost two orders of magnitude, and he was able to determine that the vast majority of them were generated by chatbots. He was the person in the loop filtering out the signal he wanted, human written science fiction, from vast volumes of noise of GPT written science fiction. Why should he care? Because GPT is an auto-completer and so it is generating variations on well worked themes. But, but, but, I hear people screaming at me. With more work GPTs will be able to generate original stuff. Yes, but it will be some other sort of engine attached to them which produces that originality. No matter how big, and how many parameters, GPTs are not going to to do that themselves. When no person is in the loop to filter, tweak, or manage the flow of information GPTs will be completely bad. That will be good for people who want to manipulate others without having revealed that the vast amount of persuasive evidence they are seeing has all been made up by a GPT. It will be bad for the people being manipulated. And it will be bad if you try to connect a robot to GPT. GPTs have no understanding of the words they use, no way to connect those words, those symbols, to the real world. A robot needs to be connected to the real world and its commands need to be coherent with the real world. Classically it is known as the “symbol grounding problem”. GPT+robot is only ungrounded symbols. It would be like you hearing Klingon spoken, without any knowledge other than the Klingon sound stream (even in Star Trek you knew they had human form and it was easy to ground aspects of their world). A GPT telling a robot stuff will be just like the robot hearing Klingonese. My argument here is that GPTs might be useful, and well enough boxed, when there is an active person in the loop, but dangerous when the person in the loop doesn’t know they are supposed to be in the loop. [This will be the case for all young children.] Their intelligence, applied with strong intellect, is a key component of making any GPT be successful.

    (tags: gpts rodney-brooks ai ml amaras-law hype technology llms future)

  • Employees Are Feeding Sensitive Business Data to ChatGPT

    How unsurprising is this? And needless to say, a bunch of that is being reused for training:

    In a recent report, data security service Cyberhaven detected and blocked requests to input data into ChatGPT from 4.2% of the 1.6 million workers at its client companies because of the risk of leaking confidential information, client data, source code, or regulated information to the LLM.  In one case, an executive cut and pasted the firm’s 2023 strategy document into ChatGPT and asked it to create a PowerPoint deck. In another case, a doctor input his patient’s name and their medical condition and asked ChatGPT to craft a letter to the patient’s insurance company.

    (tags: chatgpt openai ip privacy data-protection security)

  • GitHub Copilot is open to remote prompt-injection attacks

    GitHub Copilot is also based on a large language model. What does indirect prompt injection do to it? Again, we demonstrate that, as long as an attacker controls part of the context window, the answer is: pretty much anything. Attackers only have to manipulate the documentation of a target package or function. As you reference and use them, this documentation is loaded into the context window based on complex and ever-changing heuristics. We show […] how importing a synthetic library can lead Copilot to introduce subtle or not-so-subtle vulnerabilities into the code generated for you.

    (tags: injection copilot security exploits github llms chatgpt)

Comments closed

Links for 2023-03-26

Comments closed

Links for 2023-03-24

  • Google and Microsoft’s chatbots are already citing one another in a misinformation shitshow

    What we have here is an early sign we’re stumbling into a massive game of AI misinformation telephone, in which chatbots are unable to gauge reliable news sources, misread stories about themselves, and misreport on their own capabilities. In this case, the whole thing started because of a single joke comment on Hacker News. Imagine what you could do if you wanted these systems to fail. It’s a laughable situation but one with potentially serious consequences. Given the inability of AI language models to reliably sort fact from fiction, their launch online threatens to unleash a rotten trail of misinformation and mistrust across the web, a miasma that is impossible to map completely or debunk authoritatively. All because Microsoft, Google, and OpenAI have decided that market share is more important than safety.

    (tags: google ai ml microsoft openai chatgpt trust spam misinformation disinformation)

Comments closed

Links for 2023-03-23

  • Vatican flag SVG on Wikimedia Commons was incorrect for 5 years, and widely copied

    In 2017 a Wikimedia Commons user changed the inside of the tiara to red because that’s how it appears on the Vatican Coat of Arms. But this assumption turned out to be faulty, because the official flag spec sheet uses different colors than the Coat of Arms. The mistake was quickly noticed by an anonymous IP who wrote an extensive and well-researched explanation of the error on the file’s talk page. Unfortunately, nobody read it, and the mistake lived on for 5 years before another user noticed it and reverted the file.

    (tags: wikipedia wikimedia commons vatican flags oops)

  • ThumbHash

    “A very compact representation of an image placeholder. Store it inline with your data and show it while the real image is loading for a smoother loading experience.”

    (tags: graphics images webdev compression lossy thumbnails)

Comments closed

Links for 2023-03-22

  • new LFP batteries will unlock cheaper electric vehicles

    Lithium ferrous phosphate (LFP) batteries, the type to be produced at the new [Ford] plant are a lower-cost alternative to the nickel- and cobalt-containing batteries used in most electric vehicles in the US and Europe today. While the technology has grown in popularity in China, Ford’s factory, developed in partnership with the Chinese battery giant CATL, marks a milestone in the West. By cutting costs while also boosting charging speed and extending lifetime, LFP batteries could help expand EV options for drivers. 

    (tags: lfp technology ev cars batteries renewable-energy)

  • You Broke Reddit: The Pi-Day Outage : RedditEng

    Quality post-mortem writeup of last week’s Reddit outage. tl;dr: an in-place Kubernetes upgrade broke it. We use blue/green deployments — with two separate parallel k8s clusters — in order to avoid this risk, as k8s upgrades are very very risky in our experience; tiny “minor” changes often seem to cause breakage.

    (tags: k8s kubernetes outages reddit ops post-mortems)

  • Superb thread on effective AI regulation

    from Baldur Bjarnason:

    First, you clarify that for the purposes of Section 230 protection (or similar), whoever provides the AI as a service is responsible for its output as a publisher. If Bing Chat says something offensive then Microsoft would be as liable as if it were an employee; You’d set a law requiring tools that integrate generative AI to attach disclosures to the content. Gmail/Outlook should pop up a notice when you get an email that their AI generated. Word/Docs should have metadata fields and notices when you open files that have used built-in AI capabilities. AI chatbots have to disclose that they are bots. Copilot should add a machine-parsable code comment. You could always remove the metadata, but doing so would establish an intent to deceive; Finally, you’d mandate that all training data sets be made opt-in (or that all of its contents are released under a permissive license) and public. Heavy fines for non-disclosure. Heavy fines for violating opt-in. Even heavier fines for lying about your training data set. Make every AI model a “vegan” model. Remove every ethical and social concern about the provenance and rights regarding the training data.
    I think #3 in particular is the most important of all.

    (tags: ai regulation data-privacy training llm ethics)

  • Bing Chat is still vulnerable to hidden prompt injection attacks

    happily parses hidden text in webpages, acting on information there that isn’t visible to human viewers. Related: https://twitter.com/matteosonoioo/status/1630941926454185992/photo/1 , where Matteo Contrini demonstrated an attack to turn it into a scammer with prompt injection.

    (tags: bing-chat bing chatgpt openai prompt-injection exploits attacks hidden-text)

Comments closed

Links for 2023-03-20

  • Pop Culture Pulsar: Origin Story of Joy Division’s Unknown Pleasures Album Cover

    Great dig into the CP1919 pulsar signal plot that was used for “Unknown Pleasures”:

    This plotting of sequences like this, it started just a little bit earlier when we were looking at potentially drifting subpulses within the major pulse itself. So, the thought was, well, is there something like this peak here, which on the next pulse moves over here, and then moves over here, and over there. Actually, would be moving this way in that case – either way. I think Frank Drake and I published a paper in Science Magazine on exactly that issue – suggesting there might be drifting subpulses within the major pulse, which would then get back to the physics of what was causing the emission in the first place. So, then the thought was, well let’s plot out a whole array of pulses, and see if we can see particular patterns in there. So that’s why, this one was the first I did – CP1919 – and you can pick out patterns in there if you really work at it. But I think the answer is, there weren’t any that were real obvious anyway. I don’t really recall, but my bet is that the first one of these that I did, I didn’t bother to block out the stuff, and I found that it was just too confusing. So then, I wrote the program so that I would block out when a hill here was high enough, then the stuff behind it would stay hidden. And it was pretty easy to do from a computer perspective.

    (tags: design joy-division music science physics pulsars astronomy cp1919 dataviz)

  • moyix/gpt-wpre: Whole-Program Reverse Engineering with GPT-3

    This is a little toy prototype of a tool that attempts to summarize a whole binary using GPT-3 (specifically the text-davinci-003 model), based on decompiled code provided by Ghidra. However, today’s language models can only fit a small amount of text into their context window at once (4096 tokens for text-davinci-003, a couple hundred lines of code at most) — most programs (and even some functions) are too big to fit all at once. GPT-WPRE attempts to work around this by recursively creating natural language summaries of a function’s dependencies and then providing those as context for the function itself. It’s pretty neat when it works! I have tested it on exactly one program, so YMMV.

    (tags: gpt-3 reverse-engineering ghidra decompilation reversing llm)

Comments closed

Links for 2023-03-16

Comments closed

Links for 2023-03-15

  • Cat6a FTP Tool-Less Keystone Module

    For future use — CAT6A cable endpoints which don’t require tricky crimping: “no crimp tool required at all, very much worth the extra cost, and they clip into the wall sockets or a patch panel … you can do them with your fingers and a flush snips to get rid of the ends after you push the wires in” says Adam C on ITC Slack, at https://irishtechcommunity.slack.com/archives/C11BG27L2/p1678841261913069

    (tags: cat6a wiring home networking cables via:itc)

Comments closed

Links for 2023-03-14

  • Infra-Red, In Situ (IRIS) Inspection of Silicon

    Cool:

    This post introduces a technique I call “Infra-Red, In Situ” (IRIS) inspection. It is founded on two insights: first, that silicon is transparent to infra-red light; second, that a digital camera can be modified to “see” in infra-red, thus effectively “seeing through” silicon chips. We can use these insights to inspect an increasingly popular family of chip packages known as Wafer Level Chip Scale Packages (WLCSPs) by shining infrared light through the back side of the package and detecting reflections from the lowest layers of metal using a digital camera. This technique works even after the chip has been assembled into a finished product. However, the resolution of the imaging method is limited to micron-scale features.

    (tags: electronics hardware reversing bunnie-huang infrared x-ray-vision silicon)

Comments closed