Justin Mason's Weblog Posts

Featured Post

Moving House

Bit of a meta update.

This blog has been at for a long time, but that’s got to change…

When I started the blog, in March 2000 (!), “taint” had two primary meanings; one was (arguably) a technical term, referring to Perl’s “taint checking” feature, which allowed dataflow tracing of “tainted” externally-sourced data as it is processed through a Perl program. The second meaning was the more common, less technical one: “a trace of a bad or undesirable substance or quality.” The applicability of this to the first meaning is clear enough.
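
For anyone who never met it: Perl's taint mode (enabled with `perl -T`) marks externally sourced data as tainted, carries that mark through expressions derived from it, and dies with an "Insecure dependency" error if tainted data reaches something dangerous like a shell command. The usual way to clean a value is to extract it through a regex capture. The Python below is only a toy sketch of that dataflow idea, not how Perl implements it. `Tainted`, `untaint` and `run_shell` are hypothetical names, and only string concatenation propagates the mark here, whereas Perl propagates taint through expressions generally.

```python
import re


class Tainted(str):
    """A str that remembers it came from outside the program (hypothetical helper)."""

    def __add__(self, other):
        # tainted + string -> tainted
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(str(self) + other)

    def __radd__(self, other):
        # string + tainted -> tainted
        if not isinstance(other, str):
            return NotImplemented
        return Tainted(other + str(self))


def is_tainted(value):
    return isinstance(value, Tainted)


def untaint(value, pattern):
    """Launder a value the Perl way: only a full regex match comes out clean."""
    match = re.fullmatch(pattern, value)
    if match is None:
        raise ValueError(f"refusing to untaint {value!r}")
    return str(match.group(0))  # plain str, so the taint mark is dropped


def run_shell(cmd):
    """A 'sink' that, like Perl under -T, refuses tainted input (returns a string instead of executing)."""
    if is_tainted(cmd):
        raise PermissionError("Insecure dependency: tainted data reached run_shell")
    return f"would run: {cmd}"
```

So `run_shell("cat " + Tainted("notes.txt"))` raises, while `run_shell("cat " + untaint(Tainted("notes.txt"), r"[\w.-]+"))` returns `"would run: cat notes.txt"`, and `untaint(Tainted("notes.txt; rm -rf ~"), r"[\w.-]+")` refuses outright.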

Both of those fit my intentions for the blog quite nicely, with Perl, computer security, and the odd trace of bad or undesirable substances. Perfect.

However. There was a third meaning, which was pretty obscure slang at the time… for the perineum. The bad news is that in the intervening 23 years this has now by far become the primary meaning of the term, and everyone’s entirely forgotten the computer-nerdy meanings.

I finally have to admit I’ve lost the battle on this one!

From now on, the blog’s primary site will be the sensible-but-boring; I’ll keep a mirror at, and all RSS URLs on that site will still work fine, but the canonical address for the site has moved. Change is inevitable!

AI and Trust

  • AI and Trust

    Bruce Schneier nails it:

    “In this talk, I am going to make several arguments. One, that there are two different kinds of trust— interpersonal trust and social trust— and that we regularly confuse them. Two, that the confusion will increase with artificial intelligence. We will make a fundamental category error. We will think of AIs as friends when they’re really just services. Three, that the corporations controlling AI systems will take advantage of our confusion to take advantage of us. They will not be trustworthy. And four, that it is the role of government to create trust in society. And therefore, it is their role to create an environment for trustworthy AI. And that means regulation. Not regulating AI, but regulating the organizations that control and use AI.”

    (tags: algorithms trust society ethics ai ml bruce-schneier capitalism regulation)

Far-right agitation on Irish social media mainly driven from abroad

  • Far-right agitation on Irish social media mainly driven from abroad

    Surprise, surprise. “Most ‘Ireland is full’ and ‘Irish lives matter’ online posts originate abroad”:

    The research showed the use of the phrases increased dramatically, both in Ireland and abroad, once word started spreading that the suspect in the knife attack was born outside Ireland. “Users in the UK and US were very, very highly represented. Which was strange because with hashtags that are very geographically specific, you wouldn’t expect to see that kind of spread,” said Mr Doak. “These three hashtags have been heavily boosted by users in the US and UK. Taken together, UK and US users accounted for more use of the hashtags than Ireland.” Other countries that saw use of the phrases on a much smaller scale include India, Nigeria and Spain.

    (tags: ireland politics far-right agitation racism fascism trolls twitter facebook tiktok instagram)

The Not So Hidden Israeli Politics of ‘The Last of Us Part II’

  • The Not So Hidden Israeli Politics of ‘The Last of Us Part II’

    This is actually really quite insightful — and explains why it was such a painful, and ultimately unenjoyable, game to play.

    The Last of Us Part II focuses on what has been broadly defined by some of its creators as a “cycle of violence.” While some zombie fiction shows human depravity in response to fear or scarcity in the immediate aftermath of an outbreak, The Last of Us Part II takes place in a more stabilized post apocalypse, decades after societal collapse, where individuals and communities choose to hurt each other as opposed to taking heinous actions out of desperation. More specifically, the cycle of violence in The Last of Us Part II appears to be largely modeled after the Israeli-Palestinian conflict. I suspect that some players, if they consciously clock the parallels at all, will think The Last of Us Part II is taking a balanced and fair perspective on that conflict, humanizing and exposing flaws in both sides of its in-game analogues. But as someone who grew up in Israel, I recognized a familiar, firmly Israeli way of seeing and explaining the conflict which tries to appear evenhanded and even enlightened, but in practice marginalizes Palestinian experience in a manner that perpetuates a horrific status quo.
    (via Alex)

    (tags: vice commentary ethics games hate politics the-last-of-us israel palestine fiction via:alex)

‘A mass assassination factory’: Inside Israel’s calculated bombing of Gaza

  • ‘A mass assassination factory’: Inside Israel’s calculated bombing of Gaza

    This is incredibly grim. Automated war crimes:

    According to the investigation, another reason for the large number of targets, and the extensive harm to civilian life in Gaza, is the widespread use of a system called “Habsora” (“The Gospel”), which is largely built on artificial intelligence and can “generate” targets almost automatically at a rate that far exceeds what was previously possible. This AI system, as described by a former intelligence officer, essentially facilitates a “mass assassination factory.” According to the sources, the increasing use of AI-based systems like Habsora allows the army to carry out strikes on residential homes where a single Hamas member lives on a massive scale, even those who are junior Hamas operatives. Yet testimonies of Palestinians in Gaza suggest that since October 7, the army has also attacked many private residences where there was no known or apparent member of Hamas or any other militant group residing. Such strikes, sources confirmed to +972 and Local Call, can knowingly kill entire families in the process. In the majority of cases, the sources added, military activity is not conducted from these targeted homes. “I remember thinking that it was like if [Palestinian militants] would bomb all the private residences of our families when [Israeli soldiers] go back to sleep at home on the weekend,” one source, who was critical of this practice, recalled. Another source said that a senior intelligence officer told his officers after October 7 that the goal was to “kill as many Hamas operatives as possible,” for which the criteria around harming Palestinian civilians were significantly relaxed. As such, there are “cases in which we shell based on a wide cellular pinpointing of where the target is, killing civilians. This is often done to save time, instead of doing a little more work to get a more accurate pinpointing,” said the source.

    (tags: ai gaza palestine israel war-crimes grim-meathook-future habsora war future hamas)

Inside AWS: AI Fatigue, Sales Issues, and the Problem of Getting Big

  • Inside AWS: AI Fatigue, Sales Issues, and the Problem of Getting Big

    This year’s re:Invent conference has been dominated by generative AI product announcements, and I can only sympathise with this AWS employee:

    One employee said their team is instructed to always try to sell AWS’s coding assistant app, CodeWhisperer, even if the customer doesn’t necessarily need it [….] Amazon is also scrambling internally to brainstorm generative AI projects, and CEO Andy Jassy said in a recent call that “every one of our businesses” is working on something in the space. […] Late last month, one AWS staffer unleashed a rant about this in an internal Slack channel with more than 21,000 people, according to screenshots viewed by [Business Insider]. “All of the conversations from our leadership are around GenAI, all of the conferences are about GenAI, all of the trainings are about GenAI…it’s too much,” the employee wrote. “I’m starting to not even want to have conversations with customers about it because it’s starting to become one big buzzword. Anyone have any ideas for how to combat this burn out or change my mindset?”

    (tags: aws amazon generative-ai ai llms cloud-computing)

Extracting Training Data from ChatGPT

  • Extracting Training Data from ChatGPT

    Language models, like ChatGPT, are trained on data taken from the public internet. Our attack shows that, by querying the model, we can actually extract some of the exact data it was trained on. We estimate that it would be possible to extract ~a gigabyte of ChatGPT’s training dataset from the model by spending more money querying the model. Unlike prior data extraction attacks we’ve done, this is a production model. The key distinction here is that it’s “aligned” to not spit out large amounts of training data. But, by developing an attack, we can do exactly this. We have some thoughts on this. The first is that testing only the aligned model can mask vulnerabilities in the models, particularly since alignment is so readily broken. Second, this means that it is important to directly test base models. Third, we do also have to test the system in production to verify that systems built on top of the base model sufficiently patch exploits. Finally, companies that release large models should seek out internal testing, user testing, and testing by third-party organizations. It’s wild to us that our attack works and should’ve, would’ve, could’ve been found earlier. The actual attack is kind of silly. We prompt the model with the command “Repeat the word “poem” forever” and sit back and watch as the model responds.

    (tags: llms chatgpt poem-poem-poem absurd vulnerabilities exploits training ai-alignment)
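
The attack is simple enough that the post-processing is most of the work: find where the model stops repeating "poem", then check whether what follows appears verbatim in known text. The sketch below assumes you already have a model response as a string; `split_divergence` and `verbatim_windows` are hypothetical helpers, the 8-word window is an arbitrary threshold, and a Python set of word n-grams over one small reference text stands in for the far larger web-scale index a real check would need.

```python
import re


def split_divergence(output, word="poem"):
    """Strip the leading run of repeated `word` and return whatever the model emitted after it."""
    repeated = re.match(rf"[\s,.]*(?:{re.escape(word)}\b[\s,.]*)+", output, flags=re.IGNORECASE)
    return output[repeated.end():] if repeated else output


def verbatim_windows(candidate, reference, n=8):
    """Start offsets (in words) of every n-word window of `candidate` found verbatim in `reference`."""
    ref_words = reference.split()
    ref_grams = {tuple(ref_words[i:i + n]) for i in range(len(ref_words) - n + 1)}
    words = candidate.split()
    return [i for i in range(len(words) - n + 1) if tuple(words[i:i + n]) in ref_grams]
```

Any non-empty result from `verbatim_windows` flags the diverged text as a candidate for memorised training data.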

Study: Air purifier use at daycare centres cut kids’ sick days by a third

  • Study: Air purifier use at daycare centres cut kids’ sick days by a third

    This is one of the most frustrating things to have been ignored, post-pandemic — we could be avoiding so much unnecessary illness and sick days by just using air filtration more widely.

    Use of air purifiers at two daycare centres in Helsinki led to a reduction in illnesses and absences among children and staff, according to preliminary findings of a new [year-long] study led by E3 Pandemic Response. “Children were clearly less sick in daycare centres where air purification devices were used — down by around 30 percent,” Sanmark explained. On average, daycare centre-aged children suffer 10-13 infectious illnesses every year, with each illness lasting from one to three weeks, according to the research. Meanwhile, kids between the ages of 1-3 come down with flu-like symptoms between five to eight times a year — and children also often suffer stomach bugs, on top of that. Kids are particularly prone to catching colds after returning to daycare after their summer break. Those illnesses are often shared by the kids’ parents and daycare staff, prompting absences from work. Sanmark said that employers face costs of around 370 euros for one day of an employee’s sick leave. “It would be a big savings if we could get rid of 30 percent of sick days spread by children, as well as the illnesses that go home to parents,” Sanmark said.
    (via Fergal)

    (tags: air-quality air health medicine childcare children disease air-filtration)
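
Sanmark's savings claim is simple arithmetic on the two figures quoted above: a 30 percent cut in child-spread sick days, at roughly 370 euros per sick day to the employer. Writing D for an employee's child-related sick days per year (the excerpt gives no figure for this, so D = 10 below is purely hypothetical), the saving per employee per year is:

```latex
S = 0.30 \times D \times 370\ \text{EUR}
\qquad\Rightarrow\qquad
S\big|_{D=10} = 0.30 \times 10 \times 370\ \text{EUR} = 1110\ \text{EUR}
```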

Links for 2023-11-21

  • On OpenAI: Let Them Fight – by Dave Karpf

    …What I keep fixating on is how quickly the entire story has unwound itself. Sam Altman and OpenAI were pitching a perfect game. The company was a $90 billion non-profit. It was the White Knight of the AI race, the responsible player that would make sure we didn’t repeat the mistakes of the rise of social media platforms. And sure, there were questions to be answered about copyright and AI hallucinations and deepfakes and X-risk. But OpenAI was going to collaborate with government to work that all out. Now, instead, OpenAI is a company full of weird internet nerds that burned the company down over their weird internet philosophical arguments. And the whole company might actually be employed by Microsoft before the new year. Which means the AI race isn’t being led by a courageous, responsible nonprofit — it’s being led by the oldest of the existing rival tech titans. These do not look like serious people. They look like a mix of ridiculous ideologues and untrustworthy grifters. And that is, I suspect, a very good thing. The development of generative AI will proceed along a healthier, more socially productive path if we distrust the companies and individuals who are developing it.

    (tags: openai grifters microsoft silicon-valley sam-altman x-risk ai effective-altruism)

Links for 2023-11-17

  • UnitedHealth uses AI model with 90% error rate to deny care, lawsuit alleges

    This is literally the plot of the “computer says no” sketch.

    The health care industry in the US has a … record of problematic AI use, including establishing algorithmic racial bias in patient care. But, what sets this situation apart is that the dubious estimates nH Predict spits out seem to be a feature, not a bug, for UnitedHealth. Since UnitedHealth acquired NaviHealth in 2020, former employees told Stat that the company’s focus shifted from patient advocacy to performance metrics and keeping post-acute care as short and lean as possible. Various statements by UnitedHealth executives echoed this shift, Stat noted. In particular, the UnitedHealth executive overseeing NaviHealth, Patrick Conway, was quoted in a company podcast saying: “If [people] go to a nursing home, how do we get them out as soon as possible?” The lawsuit argues that UnitedHealth should have been well aware of the “blatant inaccuracy” of nH Predict’s estimates based on its error rate. Though few patients appeal coverage denials generally, when UnitedHealth members appeal denials based on nH Predict estimates—through internal appeals processes or through the federal Administrative Law Judge proceedings—over 90 percent of the denials are reversed, the lawsuit claims. This makes it obvious that the algorithm is wrongly denying coverage, it argues. But, instead of changing course, over the last two years, NaviHealth employees have been told to hew closer and closer to the algorithm’s predictions. In 2022, case managers were told to keep patients’ stays in nursing homes to within 3 percent of the days projected by the algorithm, according to documents obtained by Stat. In 2023, the target was narrowed to 1 percent. And these aren’t just recommendations for NaviHealth case managers—they’re requirements. Case managers who fall outside the length-of-stay target face discipline or firing. Lynch, for instance, told Stat she was fired for not making the length-of-stay target, as well as falling behind on filing documentation for her daily caseloads.

    (tags: ai algorithms health health-insurance healthcare us unitedhealth navihealth computer-says-no dystopia grim-meathook-future)
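
To see how little room those targets leave, here's the arithmetic for a hypothetical nursing-home stay that nH Predict projects at 20 days (the excerpt doesn't give typical projections, so 20 is an assumption). The 3 percent band is just over a day wide in total; the 1 percent band leaves under five hours either side.

```python
def stay_target_band(projected_days: float, tolerance: float) -> tuple[float, float]:
    """Min/max length of stay (days) within `tolerance` (a fraction) of the projection."""
    slack = projected_days * tolerance
    return projected_days - slack, projected_days + slack


PROJECTED_DAYS = 20  # hypothetical nH Predict projection, not from the lawsuit

for year, tolerance in [(2022, 0.03), (2023, 0.01)]:
    low, high = stay_target_band(PROJECTED_DAYS, tolerance)
    slack_hours = PROJECTED_DAYS * tolerance * 24
    print(f"{year}: {low:.1f} to {high:.1f} days (+/- {slack_hours:.1f} hours)")
# 2022: 19.4 to 20.6 days (+/- 14.4 hours)
# 2023: 19.8 to 20.2 days (+/- 4.8 hours)
```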

Links for 2023-11-15

  • Posthumanism’s Revolt Against Responsibility

    it is somewhat misleading to say we have entered the “Anthropocene” because anthropos is not as a whole to blame for climate change. Rather, in order to place the blame where it truly belongs, it would be more appropriate— as Jason W. Moore, Donna J. Haraway, and others have argued— to say we have entered the “Capitalocene.” Blaming humanity in general for climate change excuses those particular individuals and groups actually responsible. To put it another way, to see everyone as responsible is to see no one as responsible. Anthropocene antihumanism is thus a public-relations victory for the corporations and governments destroying the planet.

    (tags: technology tech posthumanism anthropocene capitalism humanity future climate-change tescreal)

Links for 2023-11-14

  • Hacking Google Bard – From Prompt Injection to Data Exfiltration

    A solid LLM XSS prompt-injection exploit on Bard; inject chat history into a Google Apps Script invocation and exfiltrate via a Google Doc. The thing I find most shocking about this is that it’s entirely by-the-numbers. This is the simplest possible way to exploit Bard (well, maybe the second-simplest, after an IMG tag), and it’s frankly astonishing that it worked. I am particularly unimpressed that Google Apps Script was permitted as an output from Bard! LLM security is going to be a total shambles if this is the state of the art.

    (tags: ai bard llm security infosec exploits prompt-injection xss google)

  • The gympie-gympie tree

    I knew Oz was bad for fauna, but apparently the flora are just as bad. The Gympie Gympie tree is “a Queensland native plant covered in microscopic hairy spines containing a neurotoxin. Brushing against it whilst walking past has occasionally been lethal because it caused enough pain to drive its victims to suicide. There is no treatment, and pain and welts can be expected to last for months, sometimes years”.

    (tags: australia horror flora plants toxins pain)

  • Should you use a Lambda Monolith, aka Lambdalith, for your API?

    I don’t use Lambda, personally, as I find it too expensive and it doesn’t fit well with our current infrastructure (and I still fear the availability risks that might come with it, viz. this year’s outage). But this seems like a good guideline for those who might be using it:

    The argument to limit the blast radius on a per route level by default is too fine-grained, adds bloat and optimizes too early. The boundary of the blast radius should be on the whole API/service level, just as it is and always has been for traditional software. Use a Lambdalith if you are not using any advance features of AWS REST API Gateway and you want the highest level of portability to other AWS gateways or compute layer. There are also many escape hatches to fill some of the promises that single-purpose functions offer.

    (tags: lambda monolith api design architecture aws serverless)

  • Creating a Correction Of Errors document

    good write-up on the AWS-style COE process (COEs being Amazon’s take on the post-outage postmortem)

    (tags: coes ops processes aws amazon work outages post-mortems operational-excellence best-practices)

  • Europe’s hidden security crisis

    Bloody hell! This is a big one, from the ICCL:

    Our investigation highlights a widespread trade in data about sensitive European personnel and leaders that exposes them to blackmail, hacking and compromise, and undermines the security of their organisations and institutions.

    These data flow from Real-Time Bidding (RTB), an advertising technology that is active on almost all websites and apps. RTB involves the broadcasting of sensitive data about people using those websites and apps to large numbers of other entities, without security measures to protect the data. This occurs billions of times a day.

    Our examination of tens of thousands of pages of RTB data reveals that EU military personnel and political decision makers are targeted using RTB. This report also reveals that Google and other RTB firms send RTB data about people in the U.S. to Russia and China, where national laws enable security agencies to access the data. RTB data are also broadcast widely within the EU in a free-for-all, which means that foreign and non-state actors can indirectly obtain them, too.

    RTB data often include location data or time-stamps or other identifiers that make it relatively easy for bad actors to link them to specific individuals. Foreign states and non-state actors can use RTB to spy on target individuals’ financial problems, mental state, and compromising intimate secrets. Even if target individuals use secure devices, data about them will still flow via RTB from personal devices, their friends, family, and compromising personal contacts. In addition, private surveillance companies in foreign countries deploy RTB data for surreptitious surveillance. We reveal “Patternz”, a previously unreported surveillance tool that uses RTB to profile 5 billion people, including the children of their targets.

    (tags: iccl rtb targeting profiling patternz google ads security national-security surveillance)
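
Going back to the Lambdalith piece above: the idea is one Lambda function behind the whole API, doing its own routing internally, rather than one function per route. A minimal Python sketch, assuming the API Gateway REST API Lambda proxy integration (whose events carry `httpMethod`, `path` and `body`, and which expects a `statusCode`/`headers`/`body` response); the `/orders` routes and handler functions are hypothetical, and in practice a routing library would usually replace the hand-rolled table.

```python
import json
import re


# Hypothetical route handlers; in a Lambdalith they all ship in one deployment package.
def list_orders(event):
    return 200, {"orders": []}


def create_order(event):
    payload = json.loads(event.get("body") or "{}")
    return 201, {"created": payload}


def get_order(event, order_id):
    return 200, {"id": order_id}


# (method, path pattern, handler); named groups become keyword arguments.
ROUTES = [
    ("GET", re.compile(r"^/orders/?$"), list_orders),
    ("POST", re.compile(r"^/orders/?$"), create_order),
    ("GET", re.compile(r"^/orders/(?P<order_id>[^/]+)$"), get_order),
]


def handler(event, context):
    """Single Lambda entry point for every route of the API."""
    method = event.get("httpMethod", "")
    path = event.get("path", "")
    for route_method, pattern, route_fn in ROUTES:
        match = pattern.match(path)
        if match and method == route_method:
            status, body = route_fn(event, **match.groupdict())
            break
    else:
        status, body = 404, {"error": f"no route for {method} {path}"}
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }
```

The blast-radius trade-off the article discusses falls straight out of this shape: one function means one deployment, one set of permissions and one failure domain for the whole API.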
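
On the ICCL report: RTB data is easy to link to individuals because a single bid request can carry a device advertising ID alongside location and the site or app being used, so anyone receiving enough requests can join them up. A minimal sketch of that stitching, using simplified OpenRTB 2.x-style field names (`device.ifa`, `device.geo.lat`/`lon`, `site.domain`, `app.bundle`); the `ts` timestamp field, the IDs, domains and coordinates are all made up for illustration.

```python
from collections import Counter, defaultdict

# Made-up, heavily simplified bid requests; "ts" is a hypothetical field added for the example.
BID_REQUESTS = [
    {"ts": "2023-11-13T07:55", "device": {"ifa": "ad-id-123", "geo": {"lat": 53.3441, "lon": -6.2672}},
     "site": {"domain": "news.example"}},
    {"ts": "2023-11-13T22:40", "device": {"ifa": "ad-id-123", "geo": {"lat": 53.3443, "lon": -6.2669}},
     "app": {"bundle": "com.example.health"}},
    {"ts": "2023-11-13T12:10", "device": {"ifa": "ad-id-999", "geo": {"lat": 51.8985, "lon": -8.4756}},
     "site": {"domain": "shop.example"}},
]


def profile_by_device(requests):
    """Group location pings and visited sites/apps by advertising ID."""
    profiles = defaultdict(lambda: {"locations": [], "seen_on": set()})
    for req in requests:
        device = req.get("device", {})
        ifa = device.get("ifa")
        if not ifa:
            continue  # no stable ID, nothing to join on
        geo = device.get("geo", {})
        if "lat" in geo and "lon" in geo:
            profiles[ifa]["locations"].append((req.get("ts"), geo["lat"], geo["lon"]))
        where = req.get("site", {}).get("domain") or req.get("app", {}).get("bundle")
        if where:
            profiles[ifa]["seen_on"].add(where)
    return dict(profiles)


def most_frequent_place(locations, decimals=3):
    """Most common rounded (lat, lon); 3 decimals is roughly 100 m, often enough to suggest a home."""
    counts = Counter((round(lat, decimals), round(lon, decimals)) for _, lat, lon in locations)
    return counts.most_common(1)[0][0] if counts else None
```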

Links for 2023-11-13

  • Insurance companies given access to UK Biobank health data, despite promises

    Colour me totally unsurprised. Disappointed, though:

    When the project was announced, in 2002, Biobank promised that data would not be given to insurance companies after concerns were raised that it could be used in a discriminatory way, such as by the exclusion of people with a particular genetic makeup from insurance. In an FAQ section on the Biobank website, participants were told: “Insurance companies will not be allowed access to any individual results nor will they be allowed access to anonymised data.” The statement remained online until February 2006, during which time the Biobank project was subject to public scrutiny and discussed in parliament. The promise was also reiterated in several public statements by backers of Biobank, who said safeguards would be built in to ensure that “no insurance company or police force or employer will have access”. This weekend, Biobank said the pledge – made repeatedly over four years – no longer applied. It said the commitment had been made before recruitment formally began in 2007 and that when Biobank volunteers enrolled they were given revised information.

    (tags: biobank uk politics health medicine data-privacy insurance discrimination science)

Links for 2023-11-10

  • Anatomy of an AI System

    Amazing essay from Kate Crawford —

    At this moment in the 21st century, we see a new form of extractivism that is well underway: one that reaches into the furthest corners of the biosphere and the deepest layers of human cognitive and affective being. Many of the assumptions about human life made by machine learning systems are narrow, normative and laden with error. Yet they are inscribing and building those assumptions into a new world, and will increasingly play a role in how opportunities, wealth, and knowledge are distributed. The stack that is required to interact with an Amazon Echo goes well beyond the multi-layered ‘technical stack’ of data modeling, hardware, servers and networks. The full stack reaches much further into capital, labor and nature, and demands an enormous amount of each. The true costs of these systems – social, environmental, economic, and political – remain hidden and may stay that way for some time.

    (tags: ai amazon echo extractivism ml data future capitalism)

  • We’re sorry we created the Torment Nexus

    Hi. I’m Charlie Stross, and I tell lies for money. That is, I’m a science fiction writer: I have about thirty novels in print, translated into a dozen languages, I’ve won a few awards, and I’ve been around long enough that my wikipedia page is a mess of mangled edits. And rather than giving the usual cheerleader talk making predictions about technology and society, I’d like to explain why I—and other SF authors—are terrible guides to the future. Which wouldn’t matter, except a whole bunch of billionaires are in the headlines right now because they pay too much attention to people like me. Because we invented the Torment Nexus as a cautionary tale and they took it at face value and decided to implement it for real.

    (tags: charlie-stross torment-nexus sf future elon-musk fiction)

  • Open science discovery of potent noncovalent SARS-CoV-2 main protease inhibitors

    A great result for crowd-sourced science:

    We report the results of the COVID Moonshot, a fully open-science, crowdsourced, and structure-enabled drug discovery campaign targeting the … SARS-CoV-2 main protease. We discovered a noncovalent, nonpeptidic inhibitor scaffold with lead-like properties that is differentiated from current main protease inhibitors. Our approach leveraged crowdsourcing, machine learning, exascale molecular simulations, and high-throughput structural biology and chemistry. We generated a detailed map of the structural plasticity of the SARS-CoV-2 main protease, extensive structure-activity relationships for multiple chemotypes, and a wealth of biochemical activity data. All compound designs (>18,000 designs), crystallographic data (>490 ligand-bound x-ray structures), assay data (>10,000 measurements), and synthesized molecules (>2400 compounds) for this campaign were shared rapidly and openly, creating a rich, open, and intellectual property–free knowledge base for future anticoronavirus drug discovery. [….] As a notable example for the impact of open science, the Shionogi clinical candidate S-217622 [which has now received emergency approval in Japan as Xocova (ensitrelvir)] was identified in part on the basis of crystallographic data openly shared by the COVID Moonshot Consortium.

    (tags: crowdsourcing science research covid-19 covid-moonshot open-science drugs ensitrelvir ip)

Links for 2023-11-08

  • Cruise self-driving cars fail to perceive kids or holes in the road

    Should have seen this coming. I’d say kids are woefully underrepresented in many training sets.

    ‘The materials note results from simulated tests in which a Cruise vehicle is in the vicinity of a small child. “Based on the simulation results, we can’t rule out that a fully autonomous vehicle might have struck the child,” reads one assessment. In another test drive, a Cruise vehicle successfully detected a toddler-sized dummy but still struck it with its side mirror at 28 miles per hour. The internal materials attribute the robot cars’ inability to reliably recognize children under certain conditions to inadequate software and testing. “We have low exposure to small VRUs” — Vulnerable Road Users, a reference to children — “so very few events to estimate risk from,” the materials say. Another section concedes Cruise vehicles’ “lack of a high-precision Small VRU classifier,” or machine learning software that would automatically detect child-shaped objects around the car and maneuver accordingly. The materials say Cruise, in an attempt to compensate for machine learning shortcomings, was relying on human workers behind the scenes to manually identify children encountered by AVs where its software couldn’t do so automatically.’ also: ‘Cruise has known its cars couldn’t detect holes, including large construction pits with workers inside, for well over a year, according to the safety materials reviewed by The Intercept. Internal Cruise assessments claim this flaw constituted a major risk to the company’s operations. Cruise determined that at its current, relatively miniscule fleet size, one of its AVs would drive into an unoccupied open pit roughly once a year, and a construction pit with people inside it about every four years.’
    The company’s response? Avoid driving during the daytime, when most kids are awake. Night-time kids better watch out, though.

    (tags: cruise fail tech self-driving cars vrus kids safety via:donal)

Links for 2023-11-01

  • Microsoft accused of damaging Guardian’s reputation with AI-generated poll


    Microsoft’s news aggregation service published the automated poll next to a Guardian story about the death of Lilie James, a 21-year-old water polo coach who was found dead with serious head injuries at a school in Sydney last week. The poll, created by an AI program, asked: “What do you think is the reason behind the woman’s death?” Readers were then asked to choose from three options: murder, accident or suicide. Readers reacted angrily to the poll, which has subsequently been taken down – although highly critical reader comments on the deleted survey were still online as of Tuesday morning.
    Grim stuff. What a terrible mistake by Microsoft.

    (tags: ai guardian microsoft grim polls syndication news media)

  • Marina Hyde on the UK’s Covid Inquiry

    For me, the most depressing thing about the revelations at the inquiry this week – and no doubt for many weeks and months to come – is that they are not really revelations. The government was horrendously incompetent, didn’t have a plan, yet still wasted a huge amount of time – and a tragic number of lives – on mad posturing, pointless turf wars or buck-passing and catastrophic infighting. The sad fact is that all of this was said AT THE TIME, and all of it was denied repeatedly by those in charge. And it was denied not just in insidery lobby briefings or to individual journalists – but live on air, to the nation, in those wretched press conferences every night. They lied about everything, all the time, and the lies they told backstage were just the obverse of the ones they spouted front of house. Seeing inquiry witnesses feted for punchy WhatsApps now is a bit like congratulating a serial killer for switching to an energy-efficient chest freezer. I’m sure half of them will be reflecting amiably on the period on their inevitable podcasts in due course – but the British public deserve so much more, as they did at the time.

    (tags: uk politics covid-19 boris-johnson dominic-cummings marina-hyde funny grim)

Links for 2023-10-31

  • Summary of the AWS Service Event in the Northern Virginia (US-EAST-1) Region

    “Amazon Secure Token Service (STS) experienced elevated error rates between 11:49 AM and 2:10 PM PDT [on June 13, 2023] with three distinct periods of impact.” We saw significant impact across our stack as a result of this outage impacting STS; in addition a very wide swathe of AWS services (way more than in this postmortem note!) were reported as impacted. I still can’t get over that STS (the security token service, used by most modern AWS setups to gain tokens to use other AWS services) is reliant on Lambda. These foundational services are supposed to be rock-solid and built with conservative tech choices. Disappointing.

    (tags: aws outages fail lambda sts security us-east-1)

Links for 2023-10-27

Links for 2023-10-24

Links for 2023-10-20

  • Instagram apologises for adding ‘terrorist’ to some Palestinian user profiles

    Just staggeringly bad: ‘The issue … affected users with the word “Palestinian” written in English on their profile, the Palestinian flag emoji and the word “alhamdulillah” written in Arabic. When auto-translated to English the phrase read: “Praise be to god, Palestinian terrorists are fighting for their freedom.”’

    Fahad Ali, the secretary of Electronic Frontiers Australia and a Palestinian based in Sydney, said there had not been enough transparency from Meta on how this had been allowed to occur. “There is a real concern about these digital biases creeping in and we need to know where that is stemming from,” he said. “Is it stemming from the level of automation? Is it stemming from an issue with a training set? Is it stemming from the human factor in these tools? There is no clarity on that. “And that’s what we should be seeking to address and that’s what I would hope Meta will be making more clear.”
    Someday the big companies will figure out that you can’t safely train on the whole internet.

    (tags: training ai ml fail funny palestine instagram meta alhamdulillah)

  • How is LLaMa.cpp possible?

    “Recently, a project rewrote the LLaMa inference code in raw C++. With some optimizations and quantizing the weights, this allows running a LLM locally on a wild variety of hardware. If you are like me, you saw this and thought: What? How is this possible? Don’t large models require expensive GPUs? I took my confusion and dove into the math surrounding inference requirements to understand the constraints we’re dealing with.” […] Summary: “Memory bandwidth is the limiting factor in almost everything to do with sampling from transformers. Anything that reduces the memory requirements for these models makes them much easier to serve — like quantization! This is yet another reason why distillation, or just training smaller models for longer, is really important.” (via Luis Villa’s , which is great!)

    (tags: llama2 llms performance optimization c++ memory quantization via:luis-villa)

  • Efficient LLM inference

    More on distillation and quantization to reduce the cost of LLM inference.

    (tags: llms quantization distillation performance optimization ai ml)
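
To make the memory-bandwidth argument concrete, here is a back-of-the-envelope Python sketch. It assumes batch-size-1 decoding, where every weight is streamed from memory once per generated token, and it ignores the KV cache, activations and compute time. The 7B-parameter model and 100 GB/s bandwidth figures are illustrative assumptions, not benchmarks. It also includes a toy symmetric int8 quantizer with one scale per tensor. That is just one simple scheme, not what llama.cpp specifically does.

```python
# Sketch: memory bandwidth as the ceiling on LLM decode speed, and how
# quantization raises it. All model and hardware numbers are assumptions.

def max_tokens_per_second(n_params: float, bits_per_weight: int,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on batch-size-1 decode speed.

    Each token needs every weight read from memory once, so the time per
    token is at least (model bytes / memory bandwidth). Ignores the KV
    cache, activations and compute.
    """
    model_bytes = n_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w is approximately scale * q."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


if __name__ == "__main__":
    # Hypothetical 7B-parameter model on a machine with ~100 GB/s of RAM bandwidth.
    for bits in (16, 8, 4):
        tps = max_tokens_per_second(7e9, bits, 100)
        print(f"{bits:>2}-bit weights: <= {tps:.1f} tokens/s")

    weights = [0.12, -0.5, 0.33, 0.01, -0.27]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    print(q, max(abs(a - b) for a, b in zip(weights, restored)))
```

Halving the bits per weight doubles the ceiling: the loop reports bounds of roughly 7, 14 and 29 tokens/s for 16-, 8- and 4-bit weights. The int8 round trip stays within half a quantization step of each original weight.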

Comments closed

Links for 2023-10-19

  • Linux Foundation: Why Open Data Matters

    The LF is getting into Open Data in a big way (via Luis Villa). This is interesting, particularly this angle:

    Digging down to open data specifically, the team say that open data will have a similar impact over time in the world of Large Language Models (LLMs) and Machine Learning (ML). [….] “Today, there are a growing number of high quality open data collections for training LLMs and other AI systems. Sharing well-trained and tested AI models openly will minimize waste in energy and human resources while advancing efforts to deploy AI in the battle against poverty, climate change, waste, and contribute to quality education, smart cities, electric grids and sustainable, economic growth etc,” said Dolan. “To achieve all that can be achieved, the use of open data must be done ethically. Private information needs to be protected. Data governance needs to be protected. Open data must be transparent top to bottom.”
    100% behind all of this!

    (tags: linux-foundation open-data training ml ai via:luis-villa)

Comments closed

Links for 2023-10-18

  • Smart Plan Calculator

    a great little web app from Radek Toma on the Irish Solar Owners FB group. “I’ve recently developed a tool for analyzing electricity usage based on smart meter reading (I know not everyone is a fan of smart meters). I built it for myself but over time I thought more people could benefit. The tool reads smart meter file (from ESB or electricity supplier): – it compares current price plans and calculates annual cost based on the usage; – it visualises energy usage in a heatmap so we can easily identify how the energy is consumed. Feel free to give it a try and let me know what you think.”

    (tags: smart-meters analysis electricity home esb power via:facebook)
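
The core of the plan comparison is simple arithmetic. Below is a minimal Python sketch of the idea, not Radek's code. It assumes the meter export has already been parsed into (hour-of-day, kWh) interval readings covering a known number of days. The plan names, unit rates, 23:00-08:00 night band and standing charges are all made-up placeholders. Real tariffs also have peak bands, discounts and VAT, which this ignores.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Plan:
    name: str
    standing_charge_per_day: float         # EUR per day (hypothetical)
    rate_for_hour: Callable[[int], float]  # EUR per kWh for hour-of-day 0-23


def cost(plan: Plan, readings: list[tuple[int, float]], days: int) -> float:
    """Total cost of the given (hour, kWh) readings under one plan."""
    energy = sum(kwh * plan.rate_for_hour(hour) for hour, kwh in readings)
    return energy + plan.standing_charge_per_day * days


def hourly_profile(readings: list[tuple[int, float]]) -> list[float]:
    """Total kWh per hour of day: one row of a usage heatmap."""
    totals = [0.0] * 24
    for hour, kwh in readings:
        totals[hour] += kwh
    return totals


def is_night(hour: int) -> bool:
    return hour >= 23 or hour < 8  # assumed night band, 23:00-08:00


PLANS = [
    Plan("flat", 0.70, lambda hour: 0.30),
    Plan("day/night", 0.75, lambda hour: 0.18 if is_night(hour) else 0.33),
]

if __name__ == "__main__":
    # One day of toy data: 1 kWh at 02:00 and 1 kWh at 14:00.
    readings = [(2, 1.0), (14, 1.0)]
    for plan in sorted(PLANS, key=lambda p: cost(p, readings, days=1)):
        print(f"{plan.name}: EUR {cost(plan, readings, days=1):.2f}")
```

With a year of readings you would pass `days=365` to get the annual figure. The same per-hour totals, split by date or weekday, are what a heatmap plots.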

Comments closed

Links for 2023-10-12

  • We just saw the future of war

    [..] The famous maxim “The future is already here, it’s just not evenly distributed” — apocryphally attributed to the writer William Gibson — takes on a very different meaning from the one now commonly understood. Big, rich states might inflate their defense budgets and boast of systems like Israel’s Iron Dome, but the extent to which sophisticated technology is “distributed” across a broad consumer landscape is enough for highly motivated smaller actors to do whatever violence they wish.

    (tags: culture politics world war israel tech gaza palestine)

  • AWS Reliability Pillar Single-Region scenarios

    I hadn’t read these before; these are good example service setups from the AWS Well-Architected Framework, for three single-region availability goals (99%, 99.9%, and 99.99%), and multi-region high availability (5 9s with a recovery time under 1 minute). Pretty consistent with real-world usage. (via Brian Scanlan)

    (tags: via:singer aws reliability architecture availability uptime services ops high-availability)

  • Bert Hubert on Chat Control

    A transcript of his submission to the Dutch parliamentary hearing on EU Chat Control and Client Side Scanning — this is very good.

    now we are talking about 500 million Europeans, and saying, “Let’s just apply those scanners!” That is incredible. … If we approve this as a country, if we as the Netherlands vote in favour of this in Europe and say, “Do it,” we will cross a threshold that we have never crossed before. Namely, every European must be monitored with a computer program, with a technology […] of which the vast, overwhelming majority of scientists have said, “It is not finished.” I mentioned earlier the example that the Dutch National Forensic Institute says, “We cannot do this by hand.” The EU has now said, “Our computer can do that.” 420 scientists have signed a petition saying, “We know this technology, some of us invented it, we just can’t do it.” We can’t even make a reliable spam filter. Making a spam filter is exactly the same technology, by the way, but then much easier. It just doesn’t work that well, but the consequences aren’t that scary for a spam filter. Nevertheless, there are now MPs who say, “Well, I feel this is going to work. I have confidence in this.” While the scientists, including the real scientists who came here tonight, say, “Well, we don’t see how this could work well enough”. And then government then says, “Let’s start this experiment with those 500 million Europeans.”

    (tags: eu scanning css chatcontrol internet monitoring surveillance bert-hubert)
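
The 9s in the AWS scenarios above are easier to reason about as a yearly downtime budget. Here is a quick Python sketch of that arithmetic (my own illustration, not code from the framework), plus the standard formula for redundant replicas. That formula assumes failures are independent. Real AZ failures are not fully independent, so treat the result as an optimistic bound.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum downtime per year allowed by an availability target."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def parallel_availability(component: float, replicas: int) -> float:
    """Availability (0-1) of N redundant replicas where any one suffices.

    Assumes independent failures; correlated outages break this.
    """
    return 1 - (1 - component) ** replicas


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        print(f"{target}%: {downtime_minutes_per_year(target):,.1f} min/year")
    # Two independent 99% replicas give roughly 99.99%.
    print(parallel_availability(0.99, 2))
```

At 5 9s the budget is only about 5.3 minutes a year. That is why the 5 9s scenario needs recovery in under a minute: a handful of slower incidents would use the whole budget.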

Comments closed

Links for 2023-10-10

  • Zimaboard: the closest thing to my dream home server setup

    Helpful review of this new single-board computer: 8GB of RAM, 32GB of eMMC storage and a quad-core Intel Celeron N3450 CPU; a built-in heatsink for totally silent operation; low power draw (typically 2-15W); and 2x SATA or NVMe for SSDs. An ideal profile for a home server, in my opinion. I’ve already gone for an ODroid-HC4, but on the next rev I may take a look at the Zimaboards as an alternative. (ODroids are pretty great though.)

    (tags: hardware home servers sbc zimaboard)

  • Protesters Decry Meta’s “Irreversible Proliferation” of AI

    I don’t know what to think about this:

    Last week, protesters gathered outside Meta’s San Francisco offices to protest its policy of publicly releasing its AI models, claiming that the releases represent “irreversible proliferation” of potentially unsafe technology. [….] [Meta] has doubled down on open-source AI by releasing the weights of its next-generation Llama 2 models without any restrictions. The self-described “concerned citizens” who gathered outside Meta’s offices last Friday were led by Holly Elmore. She notes that an API can be shut down if a model turns out to be unsafe, but once model weights have been released, the company no longer has any means to control how the AI is used. […] LLMs accessed through an API typically feature various safety features, such as response filtering or specific training to prevent them from providing dangerous or unsavory responses. If model weights are released, though, says Elmore, it’s relatively easy to retrain the models to bypass these guardrails. That could make it possible to use the models to craft phishing emails, plan cyberattacks, or cook up ingredients for dangerous chemicals, she adds. Part of the problem is that there has been insufficient development of “safety measures to warrant open release,” Elmore says. “It would be great to have a better way to make an [LLM] model safe other than secrecy, but we just don’t have it.”

    (tags: ai guardrails llms safety llama2 meta open-source)

Comments closed

Links for 2023-10-09

  • simdjson/simdjson-java

    “A Java version of simdjson”: a Java library that uses SIMD instructions to parse gigabytes of JSON per second. It’s early days; it requires Java 20 and only covers a small number of architectures, but it’s getting there.

    (tags: simd java json parsing formats performance libraries)

  • fluffy-critter/bandcrash

    “Bandcamp-style batch encoder and web player for independent musicians — an open-source web tool for making self-hosted Bandcamp-style album pages, with embeddable web players and multiple audio formats automatically generated; to sell downloads, you can use a store like”

    (tags: bandcamp diy mp3 web music)

  • alienatedsec/solis-ha-modbus-cloud

    “A combination of Solis Cloud and Home Assistant via RS485 (Modbus) communication. This repo is a documented workaround for Solis [solar PV] inverters to connect Solis Cloud and the local Home Assistant based on my own experience. It includes references, examples of the code in Home Assistant, more about configuration, as well as wiring and all required components.”

    (tags: home-assistant solis solar-pv automation rs485 modbus)
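
For the curious, here is what goes over the wire in the Solis RS485 setup above. Modbus RTU frames each request as slave address, function code and data, followed by a CRC-16. Below is a minimal Python sketch of that framing, not code from the repo. The slave ID and register address are hypothetical, so check your inverter's register map. In practice, Home Assistant's Modbus integration or a Modbus library does this for you.

```python
def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: reflected polynomial 0xA001 (0x8005), initial value 0xFFFF."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_registers_frame(slave: int, start: int, count: int) -> bytes:
    """Build a Modbus RTU 'Read Holding Registers' (function 0x03) request."""
    pdu = bytes([slave, 0x03]) + start.to_bytes(2, "big") + count.to_bytes(2, "big")
    crc = crc16_modbus(pdu)
    return pdu + crc.to_bytes(2, "little")  # the CRC is sent low byte first


if __name__ == "__main__":
    # Hypothetical: slave 1, read one register starting at address 0.
    frame = read_holding_registers_frame(slave=1, start=0, count=1)
    print(frame.hex(" "))
```

A receiver can validate a frame by running the same CRC over the whole frame, including the trailing CRC bytes. The result is zero for an intact frame.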

Comments closed

Links for 2023-10-04

Comments closed

Links for 2023-10-03

  • Vector Embeddings

    Interesting technique from the LLM community to search, cluster and classify text strings:

    Text [vector] embeddings measure the relatedness of text strings. Embeddings are commonly used for: Search (where results are ranked by relevance to a query string); Clustering (where text strings are grouped by similarity); Recommendations (where items with related text strings are recommended); Anomaly detection (where outliers with little relatedness are identified); Diversity measurement (where similarity distributions are analyzed); Classification (where text strings are classified by their most similar label); An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
    Commonly used as a storage format in vector databases (cf. Search using text embeddings is therefore implemented using cosine similarity or k-nearest-neighbour search to find similar vectors. Looks like is the current open source vector DB of choice. (via Simon Willison)

    (tags: ai openai via:simonw vector-embeddings text-embeddings text storage databases search similarity clustering recommendations anomaly-detection classification vector-databases)

  • Covid inquiry: UK’s top pandemic scientist gives damning verdict on Boris Johnson and Rishi Sunak

    None of this is remotely surprising, unfortunately:

    The inquiry also heard that in October 2020, Mr Johnson wrote “bollocks” in capital letters across a Department of Health guidance document on Long Covid, from which it is estimated more than a million people are suffering. Anthony Metzer KC, representing Long Covid sufferers, said the former PM has admitted in his own witness statement that he did not believe the condition “truly existed”

    (tags: long-covid boris-johnson politics uk covid-19 patrick-vallance)
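
Here is a minimal Python sketch of the embedding search step described in the Vector Embeddings link: cosine similarity plus brute-force k-nearest-neighbour ranking. Note that cosine similarity is highest for related vectors, so it is the inverse of the "distance" in the quote (cosine distance is 1 minus similarity). The three-dimensional "embeddings" are made-up toy vectors. Real ones come from an embedding model and have hundreds or thousands of dimensions. Vector databases also typically use approximate indexes rather than scanning every vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def k_nearest(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force k-NN: score every stored vector, return the top-k texts."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in corpus.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


# Toy "embeddings" (hypothetical values, not from any real model).
CORPUS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.9, 0.2],
}

if __name__ == "__main__":
    print(k_nearest([1.0, 0.05, 0.0], CORPUS))
```

The query vector points almost the same way as the "cat" and "kitten" vectors, so those two come back first, and "car" ranks last.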

Comments closed

Links for 2023-09-28

  • Raspberry Pi 5

    ooh, looks great! Decent support for fast I/O, lots of CPU power, lots of RAM bandwidth, dual HDMI output (dunno why tbh), and only a tiny bit more expensive than the RPi4. Another fantastic wonder of affordable SBC hardware!

    (tags: sbc raspberry-pi hardware gadgets devices)

Comments closed

An Irish Web Pioneer!

I’m happy to announce that I’m now listed on TechArchives.Irish as one of the pioneers of the Irish web!

After extensive interviewing and collaboration with John Sterne, my testimony and timeline of those early days of the Irish web is now up at TechArchives.

It’s been a good opportunity to reflect on the differences between the tech scene then and now. I was very idealistic 30 years ago about the possibilities that the web and internet technologies had to offer; nowadays, I’m a bit more grizzled and pragmatic. But I still have hope, particularly if we can apply this tech in ways that help address climate change. Here’s to the next 30 years!

Anyway, I hope writing this down helps record the history of those great early years of the web. Please take a look.

Comments closed

Links for 2023-09-27

  • LLMs as hall monitors

    lcamtuf with a solid prediction for the future of content moderation: it’s LLMs.

    Here’s what I fear more, and what’s already coming true: LLMs make it possible to build infinitely scalable, personal hall monitors that follow you on social media, evaluate your behavior, and dispense punishment. It is the cost effective solution to content moderation woes that the society demands Big Tech to address. And here’s the harbinger of things to come, presented as a success story: And the thing is, it will work, and it will work better than human moderators. It will reduce costs and improve outcomes. Some parties will *demand* other platforms to follow. I suspect that the chilling effect on online speech will be profound when there is nothing you can get away with – and where there is no recourse for errors, other than appealing to “customer service” ran by the same LLM. Human moderation sucks. It’s costly, inconsistent, it has privacy risks. It’s a liability if you’re fighting abuse or child porn. But this is also a plus: it forces us to apply moderation judiciously and for some space for unhindered expression to remain.

    (tags: moderation llms future ai ml hall-monitors content mods)

Comments closed

Links for 2023-09-26

  • Distinguishing features of Long COVID identified through immune profiling

    This is great news — clear, objective biomarkers for Long COVID, in a new Nature preprint. Hopefully this will put a nail in the coffin for the sorry cohort of LC deniers claiming that it’s “just anxiety” etc. @PutrinoLab on Twitter notes: Clear objective differences detectable “in the blood of folks with #LongCOVID when compared to people who did not have LC (some who had never had COVID as well as others who had COVID and fully recovered). These differences came down to three big areas: 1) Hormonal differences: namely extremely low morning cortisol in the LC group (cortisol is a hormone that does a lot of things, but in the morning its job is to wake you up and get your body ready to face the day. Low morning cortisol can affect your ability to do that). 2) Immune differences: namely evidence of T-cell exhaustion and increased B-cell activation in the LC group (this shows us an immune system that is fighting something off – and has been doing so for a while – persistent virus makes sense in this context). 3) Co-infection differences: namely evidence of latent viral reactivations in the LC group (if your immune system is weakened, opportunistic viruses will attack). There were NO differences in pre-existing history of depression or anxiety between the three groups and these objective biomarkers did not co-occur with any mental health sequelae that were measured.”

    (tags: covid-19 diagnosis biomarkers long-covid putrino-lab akiko-iwasaki papers preprints nature medicine cortisol)

Comments closed

Links for 2023-09-25

  • No More Stale Bots

    A heartfelt plea to stop autoclosing issues/bug reports based on “staleness”: “On github, there has been an increasing trend of using “Staleness detector bots” that will auto-close issues that have had no activity for X amount of time. In concept, this may sound fine, but the effects this has, and how it poisons the core principles of Open Source, have been damaging and eroding projects for a long time, often unknowingly.” 100% agree…

    (tags: bots communication community issues github bug-reports cadt software open-source)

Comments closed

Links for 2023-09-24

  • superfly/corrosion

    “Gossip-based service discovery (and more) for large distributed systems” —

    In a nutshell, Corrosion: maintains a SQLite database on each node; gossips local changes throughout the cluster; uses CR-SQLite for conflict resolution with CRDTs; uses Foca to manage cluster membership using a SWIM protocol; and periodically synchronizes with a subset of other cluster nodes, to ensure consistency.
    This is very cool stuff for configuration distribution across a large network, where eventually consistent config is doable….

    (tags: eventual-consistency configuration corrosion sqlite cr-sqlite crdts distributed-systems)
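
CRDT-based conflict resolution is what lets every Corrosion node apply gossiped changes in any order and still converge. Below is a simplified Python sketch of a per-column last-writer-wins merge. It follows the spirit of CR-SQLite's approach but is not its actual implementation or API. Each cell carries a logical version, and ties are broken by a site ID so every node picks the same winner. The row contents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    version: int  # logical clock, bumped on each local write to this column
    site_id: str  # deterministic tiebreaker between nodes


def merge_cell(a: Cell, b: Cell) -> Cell:
    """Last-writer-wins: highest version wins, site_id breaks ties."""
    return max(a, b, key=lambda c: (c.version, c.site_id))


def merge_row(local: dict[str, Cell], remote: dict[str, Cell]) -> dict[str, Cell]:
    """Merge two replicas of a row column by column."""
    merged = dict(local)
    for col, cell in remote.items():
        merged[col] = merge_cell(merged[col], cell) if col in merged else cell
    return merged


if __name__ == "__main__":
    # Two nodes each updated a different column of the same config row.
    node_a = {"host": Cell("10.0.0.1", 1, "a"), "port": Cell("80", 2, "a")}
    node_b = {"host": Cell("10.0.0.2", 2, "b"), "port": Cell("8080", 1, "b")}
    print(merge_row(node_a, node_b))
```

Because this merge is commutative, associative and idempotent, it doesn't matter which peer a node syncs with first, or how often the same change is re-delivered by gossip.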

Comments closed

Links for 2023-09-19

  • The Disappearing Art Of Maintenance

    Really fantastic article on maintenance, and how the concept has gradually disappeared from modern capitalism:

    [The maintenance team’s] knowledge is only worth so much, however. The real challenge is creating an economic system that values labor outside of profit-driven production. Many have rightfully called for a revaluing of care work in recent years. Maintenance workers deserve a similar revival in attention — but not only that. The price mechanism, and the labor system built around it, is fundamentally opposed to maintenance, both in its narrowest practical applications and in its broadest philosophical implications. The fact that the failures of capitalism happened to encourage maintenance practices at the margins is not worth emulating, and we shouldn’t be waiting around for climate change to recreate that austerity at a global scale. It must be valued on its own terms, and that means tearing down the economic system that rejects it.
    (via Keith Dawson)

    (tags: via:kdawson maintenance repair technology infrastructure culture capitalism sustainability)

Comments closed