
If an AI persona told you something true, would you know it?


The people who create social media platforms say they want a good portion of the content to land on the entertaining-to-informative axis. To me, the result reads more like mining our collective attention by turning the short-form video format into a slot machine that relentlessly targets our dopamine and fear receptors until we’re all a bunch of strung-out content junkies. But for the sake of argument, let’s assume these platforms are interested in more than just stacking money so high it disrupts air traffic.

We live in an age where AI-generated slop pours into your feed from every angle—news anchors, influencers, fitness coaches, body cam footage that looks and sounds real but isn’t. At this point we’ve all wrung our hands over AI-generated misinformation (and disinformation!), and rightly so, but what about just plain-old AI-generated information? I think it also deserves a fair bit of hand-wringing.

One of the goals of Riddance is to help people reliably converge on true beliefs without being coerced. And boy howdy, AI personas are not gonna help you get there. The capacity for one human to impart knowledge onto another human is a very subtle one. It happens only infrequently on social media from what I can tell, but it does happen.

I don’t think that AI personas, even when they say words that are true, can transfer knowledge to you. And so insofar as a piece of content purports to be informative (a thing these companies say they care about), if it’s delivered by an AI, you don’t walk away with knowledge.

Two examples of getting knowledge by testimony

Imagine your friend calls to say they’ve run out of gas on the highway and need a pickup. You believe them. Suppose it’s true their car is out of gas, and you’re justified in believing them—you know this person, they’ve never lied to you about something like this before, and you can hear the frustration over the phone. Congratulations! You’ve just gained a piece of knowledge by testimony. Now get in your car and go help them out.

Some things to notice about this exchange while you drive over there. Your friend is your friend; you know who they are and have background reasons to trust them. You can ask them follow-up questions (“Which highway?” “Why are you so dumb?”). And most important of all, there are conversational stakes: if you drove all the way out there and it turned out they were pranking you for a TikTok video, there would be consequences—wrath and reputational harm are some of the things that come to mind.

Now consider a different example that’s lightly paraphrased from a popular one in the testimony literature. Suppose a high school biology teacher goes down a few too many YouTube rabbit-holes and stops believing in evolution by natural selection. Bummer. Unfortunately for him, his textbook disagrees, and he has to teach the section on evolution despite not believing in evolution. Suppose he covers the material accurately and thoroughly and assigns the right readings. Do the students gain knowledge?

Most people’s intuition is yes, the students do learn. But in this case our (oddly?) stoic biology teacher functions more like a conduit for knowledge than a source of knowledge, whereas the friend who needed gas seemed to be a source. Despite differing substantially, the two cases have a lot in common that makes them both good candidates for knowledge by testimony.

  • There are conversational stakes for lying. The friend risks the friendship and social status; the teacher risks his job and professional reputation. These stakes create accountability.

  • The speaker is identifiable. You know who your friend is. Students know who the teacher is.

  • The listener has background reasons to treat them as reliable. Past interactions, institutional roles, and social context all play a part in generating trust.

  • Follow-up questions are possible. You can ask your friend if the gas light was on. Students can raise their hand in class.

How many of these properties do you think AI personas on social media have? If you answered none, you could stop reading now! But please don’t. In order to understand why certain AIs cannot testify knowledge, it’s necessary to bring some machinery from epistemology and the testimonial knowledge literature to the fore.


Lightning round: Epistemology and testimonial knowledge

Philosophers disagree about almost everything, including what exactly constitutes knowledge. However, epistemologists (philosophers who study knowledge) manage some agreement on the subject before views diverge. Most agree that knowledge that p, where p is some proposition (e.g., “The US and Israel just started a war in Iran”), requires at least the following to hold:

  • Belief: in order for someone to know that p, they must believe that p.

  • Truth: in order for someone to know that p, p must be true.

  • Justification: in order for someone to know that p, they must be justified in their belief that p.

This is often referred to as the justified true belief (JTB) account, and while it sounds good, its three conditions are widely agreed to be jointly insufficient for knowledge. So long as one doesn’t require absolute certainty (infallibility) for a belief to count as justified, there is always the possibility that something you believe is true and you are justified in believing it, but the reason it is true is unrelated to your justification. In these cases, your (fallible) justification turns out to be unwarranted. Situations like this are known as Gettier problems, named after Edmund Gettier, who first identified them as a general phenomenon in his delightfully short paper on the subject.

Epistemologists who study testimony, used here as a term of art that roughly means a speaker asserting something to a listener, can be grouped in many ways, but the most common division is on the reductionist/non-reductionist spectrum.

  • Reductionists don’t think that testimony can be a fundamental source of knowledge. The idea is that the speaker is a reliable indicator but basically just another component of a broader justification pipeline. Their testimony rides on top of or bundles the other mechanisms one has for generating justified beliefs.

  • Non-reductionists think the reductionist view is too demanding, and so they treat testimony as its own independent source of justification, akin to visual perception or taste. And while this status as a source of justification is defeasible, in the normal case, knowledge as the norm of assertion is sufficient for one person to testify knowledge to another.

  • Hybrid views try to combine the best of both approaches described above. Some argue that testimony can be a basic type of justification, but only if it’s situated within a broader normative practice with shared expectations, sensitivity to defeaters, and sensitivity to error correction. On these views, in which testimony is still special, it really matters that it happens in an environment where testifiers have social and conversational stakes that curl back on them.

The examples from earlier, your friend who needs gas and the pilled teacher, reflect some of the fault lines in the testimony literature. But more importantly, the four qualities they have in common snap into focus. Justification might be described as a type of relationship between your belief in something and the truth of that belief. The more we know about what matters in that relationship the better.

Why AI personas cannot testify knowledge

Even if LLMs can transmit knowledge to a user in the right circumstances, AI avatars or personas are in a completely different boat. Interactions with chatbots, while far from perfect, actually do have a lot of the properties described above. You can push back on something they say, decide not to use them for certain tasks (counting the number of letters in certain fruits), and follow up with questions. What LLMs lack are stakes and accountability, whereas AI personas are lacking in all four departments. These public-facing personas don’t have any of the features we identified as common to successful instances of testifying knowledge.

  • No conversational stakes for lying. Reputational harm is the primary mechanism by which public personas are incentivized to speak truthfully and responsibly. Not only do AI personas not suffer reputational harm, there’s an incentive for them to be irresponsible and inflammatory insofar as that generates engagement.

  • The speaker is not identifiable. When an AI avatar can change its appearance with a prompt, there is no stable identity for the user to pick out. An AI avatar of a news anchor could be generated by two different models with no shared context between two videos, and the viewer might have no way of knowing.

  • There are no background reasons to trust AI personas. There is no background to trust other than facts about the account, but this is justification from an entirely different source and has nothing to do with the AI.

  • Follow-ups are impossible. Users can comment on a video and sometimes get responses from the content creators. Follow-ups from AI personas, if they happen, will almost certainly not come from the AI itself.

While public-facing content in general is less amenable to the various checks and balances that listeners impose on speakers when deciding whether to update their beliefs on what is said, public-facing AI personas supercharge all of these problems. Given their chameleon status, and the ease with which they migrate to new accounts, reputational checks do little to deter AI personas from lying. Given this abject failure to meet basic and broad standards for testimonial knowledge acquisition, it follows that even if AI personas or avatars on social media tell you something true and you believe them, you don’t get knowledge because your belief is not justified.1

Implications for AI on social media

Even before AI, social media was relatively hostile to knowledge transfer via testimony. Many of the design features central to any short-form video platform seem custom-built to flout the requirements for testimonial knowledge outlined above. Short-form videos are hard to search over, hard to verify, have low stakes for lying, and present little follow-up opportunity for the user. Adding AI to the mix is just a six-fingered slap in the face to anyone hoping to reliably update their worldview from interacting with these content sources.

Even more unfortunate, the measures the platforms do have in place are basically optimized to prevent real people from pretending to be something or someone they aren’t. Many AI personas exist for the sole purpose of pretending to be someone or something they aren’t. Furthermore, the calculus for pushing AI personas on these platforms is so different that the moderation policies are much less effective against AI. If I’m making public-facing AI content for engagement (whether that’s an influencer, a reaction video, a fake arrest, or an AI anchor), I care much less whether an account gets actioned than I would if I were a real person putting in effort to post by hand. I can price moderation actions into my operation (assume 60% get axed; how much AI slop do I have to make?) and go from there.
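That parenthetical arithmetic is worth making explicit. A minimal sketch of the expected-value calculation (the function name and framing are mine; the 60% takedown rate is the figure assumed in the text):

```python
import math

def required_output(target_live: int, takedown_rate: float) -> int:
    """Posts an operator must produce so that, in expectation,
    `target_live` of them survive moderation."""
    if not 0.0 <= takedown_rate < 1.0:
        raise ValueError("takedown_rate must be in [0, 1)")
    # If a fraction `takedown_rate` gets axed, (1 - takedown_rate) survives.
    return math.ceil(target_live / (1.0 - takedown_rate))

# Assume 60% of AI slop gets actioned: keeping 100 pieces live
# means producing 250.
print(required_output(100, 0.60))  # 250
```

The point of the sketch is the asymmetry it exposes: for a synthetic-content operation, moderation is just a cost multiplier, not a deterrent.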

Until social media platforms completely demonetize synthetic AI content and increase capacity for labelling synthetic content as such, it is unlikely this problem will go away. Insofar as social media platforms are places people go for knowledge and not just entertainment, AI content on these platforms is parasitic on that purpose.


1. Interestingly, if the AI avatar is so good you don’t know that it’s AI, and if the account is well-managed to say things that are true, and you believe what it says, we get back into Gettier territory.


“A day some have predicted and many have feared”


As a former ISP employee I occasionally like dipping my toes into some networking stuff, and this 25-minute video from The Serial Port is a good retelling of the day in 2014 when one of the internet’s important routing tables crossed the 512K threshold, which caused all sorts of trouble:

What I appreciate about The Serial Port is that they always seem to actually test the vintage hardware or rebuild the old software they’re commenting on, and this time was no exception: they grabbed a classic unsung hero of ISPs, a Cisco Catalyst 6500-series router, and then recreated “The 512K Day” in their studio.
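The failure mode itself is easy to caricature in code. This is a toy model only (the constant is the widely reported default of 512K IPv4 TCAM entries on the affected platforms; real devices behave in messier ways): routes that fit in TCAM are forwarded in hardware, and anything past the limit gets punted to a much slower software path.

```python
TCAM_SLOTS = 512 * 1024  # 524,288 -- default IPv4 hardware-route capacity

def classify_routes(table_size: int) -> dict:
    """Split a routing table into hardware-forwarded and CPU-punted routes."""
    in_hardware = min(table_size, TCAM_SLOTS)
    punted_to_cpu = max(0, table_size - TCAM_SLOTS)
    return {"hardware": in_hardware, "software": punted_to_cpu}

print(classify_routes(524_000))  # just under the limit: all in hardware
print(classify_routes(525_000))  # past it: 712 routes fall to the slow path
```

Once even a sliver of a full internet table lands on the CPU path, forwarding performance collapses, which is roughly the story the video tells.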

This was a nice comment under the video:

Have absolutely no knowledge about networking, but watched this video as if a thriller movie. Thanks for opening my world of tech to networking.

Yeah, the video is kind of nerdy and intense, but maybe you’ll enjoy it; even a classic aging piece of hardware with an arbitrary ticking-bomb limit deserves some respect.

Also, the funniest comment:

I had a 2.4k day a couple days ago when I realized Farm Sim 22 only allows a max of 2400 bales. Couldn’t load into my saved game. Had to go into items.xml and temp remove a hundred bales.


YouTube Shorts and Instagram Reels are making you dumber, according to science

A study from scientists at Zhejiang University is going viral for showing a correlation between short-form content and lower impulse control.


An Internet of Checkpoints


Thoughtful stories for thoughtless times.


Bijan Stephen | Longreads | February 26, 2026 | 1,573 words (7 minutes)

I stumbled across the videos the same way many other people did: in search of something else, something I can’t remember now, years later. As I scrolled through YouTube, my attention caught on a spray of Japanese characters in the sidebar, and above it a thumbnail image I half-remembered from a childhood spent in front of cathode ray televisions. It was a pixelated thicket of forest green brambles in front of a pure cerulean sky, peppered with pillowy white clouds—the kind of perfect scene you only find in video games. 

My search was derailed; I had to click. The clouds scrolled across the screen as the music began to play, an ambient synth track called “Stickerbush Symphony (Bramble Blast),” composed by David Wise for the game Donkey Kong Country 2: Diddy’s Kong Quest, released in 1995. The song is wistful and calm, like something you would hear in a movie while a character floats dreamily underwater. But the real magic was happening below, in the comments. 

“I’m not sure where the algorithm is taking me but words cannot describe the feeling I get just sitting here reading comments while this strangely familiar song plays.”

“Did we all find this at the same time? What could it mean. Regardless, we’re all here together.”

“This feels like the end credits for life itself. It could be the end, but I hope it’s just a checkpoint. There are still so many things I have left to do. I hope I find my way home. I hope you all do too.”

For seemingly no reason at all, thousands of people were telling stories about themselves, unguarded even against the background toxicity of internet comment sections. Many of them used the word “checkpoint.” In video games, a checkpoint is a safe space: a place to save your game, where the danger can’t reach you. It’s a place to breathe, in other words. A relief; a respite. They’re also places to marshal your bravery, because they’re not the end of the game. That comes later, after more struggle. One of the better ways to get through something difficult—whether a video game or the general unpredictability of life—is to feel connected with other people, like you’re not alone. I saw it again and again. 

“Checkpoint: Turns out life ain’t how you want even if you have your goal in mind, times were hard these past four years, I was so close to throwing the towel. I don’t know if I can reach my goal of owning a house, having a career I love, or having a love interest. Nothing seems to work and lost my way. I keep going and keep trying different ways around this obstacle. Doesn’t matter if my past choices were mistakes, or if other paths could of taken me to my goal, I keep moving forward.” 

I couldn’t tell you how long I spent scrolling that day, paging through the unguarded mysteries of other lives. It’s stuck with me ever since. Lately, I’ve found myself thinking of all these people and wondering what happened to them. Where did they go next? Did they find what they were looking for? 


The video—titled “とげとげタルめいろ’スーパードンキーコング2,’” or “Spiky Barrel Maze ‘Super Donkey Kong 2’”—was uploaded to YouTube on April 26, 2012, by an anonymous user named Taia777. It was the first video on their channel. Whoever Taia was, they never shared anything personal; they just sporadically uploaded similar videos of pixelated animations and music from retro games. There were five videos in 2012, more than a dozen in 2013, nine in 2014, one in September 2017, then silence. The channel quietly sat with a few thousand subscribers. Taia posted nothing new for years. 

Then one day, as 2019 slid into 2020, something happened. (No, not that.) Something switched in YouTube’s recommendation algorithms, and almost overnight thousands of new users were directed to Taia’s first video. 

A recommendation for a nearly decade-old video with a title in another language was a genuinely uncanny experience, especially if your browsing history had nothing to do with early ’90s video games. People migrated to the comments section to wonder what was going on. Many described finding the channel in quasi-spiritual terms; they felt that the YouTube algorithm brought them there for a reason. 


Maybe all the video game imagery put commenters in a certain mindset. They began to make jokes about being the main character of, well, life. As one commenter explained to another, “Legends say, if you find this video in your recommended, you are truly a main character in your world. Not an NPC [non-player character]. Thus, this is a place to write a ‘checkpoint’ to ‘save your game.’” And people started posting—at first ironically, and then with total sincerity. Which is how Taia’s first video became the internet checkpoint. 

The widening pandemic brought a firehose of new comments, burying many of the older, rougher ones under a shower of emotional vulnerability: “Checkpoint November 1st, 2020. I’m in confinement again. I hope this time won’t be as hard. This time, I won’t be alone. 15:59 Game saved.”

The community spread outside of YouTube, too. In January 2020, someone started a subreddit called r/taia777, which billed itself as “the premier community for discussing the internet checkpoint, as well as its uploader.” That February, a Discord—the Taia777 Sanctuary—was founded as a haven for those emotionally vulnerable commenters. Immediately, more than 450 people joined; today it has over 5,000 members.

“We get people from all over the world,” said the Sanctuary’s founder, who goes by Izeezus. Izeezus appreciates places like YouTube and Discord, where you can still be semi-anonymous online. “The Sanctuary Discord is in a cool middle ground, where we can befriend people online and share troubles and things in our lives, but it’s never extremely deep or consequential enough where it takes over your real life,” Izeezus said. “And that was a hard transition to make for people coming out of quarantine and post-pandemic, understanding that this space is not supposed to be your number one source of social energy.” 

Taia777 started uploading videos again in 2021, after a four-year hiatus. Their popularity, however, soon brought unwanted attention. By that summer, YouTube had started removing Taia777’s videos over copyright infringement claims. On March 14, 2022, Taia777’s channel was deleted from YouTube altogether. By then, the channel had published 29 videos and had amassed more than 28 million views. When it disappeared, everything—the videos, the memories, the well-wishes—was gone. Presumably for good. 


Then a funny thing happened. The channel came back online—sort of. Back in 2021, as the takedowns were heating up, a person named Rebane posted on the Taia777 subreddit. “Hey,” she began, “I’m an internet archivist and I archived the taia777 channel and also the comments on it. Now that Nintendo has struck down many of the videos, I’m going to share my archives.” What she shared was a fully functioning dump of all 29 videos and every comment—up until April 7, 2021, when she’d grabbed the data. 


Speaking with me years later, Rebane explained that she runs a large private archive of internet culture, of which the Taia777 archive is one very small part. (Her archival software, called Hobune, is open source.) Rebane came across the Taia777 videos the same way everyone else did—as a random YouTube recommendation. “I like the music. I thought the visuals were cool,” she said. “But I didn’t think too much of it.” 

Even so, Rebane archived it—just because. Her archive holds 1.2 million videos so far; of those, she said, 300,000 have since been removed from YouTube. To Rebane, Taia777’s videos aren’t any more special just because there was a community around them. “If you feel the loss of internet culture every single day for years and you see every day . . . like, I don’t know, 50 videos just disappear,” she said. “After years, it’s just not gonna feel as impactful anymore.”

Maybe not to Rebane, but the checkpoint community appreciated it. On Reddit, users hailed Rebane as a “legend” and a “hero.” Another member of the Sanctuary created a website that uses Rebane’s archive as a database to let people find their old comments easily. That’s how I was able to revisit the comments that grabbed my attention years ago.

Now, of course, there are also imitators: channels creating their own checkpoints. Some feature reuploads of Taia777’s original videos (which haven’t yet been taken down, for whatever reason); others publish their own. There’s even a new Taia777 channel, though I suspect it isn’t the original creator’s, whoever they are. 

Below the videos in Rebane’s archive, there is a chorus of voices doing their best to leave a permanent mark in the ephemerality of the internet. “Checkpoint: Teaching my daughter to read. I couldn’t be prouder,” says one person. “Checkpoint: started cleaning my room after two years of depression,” writes another. “Checkpoint: Went through brain surgery due to removal of a tumor four months ago. Currently relearning how to walk, can breathe, eat and talk again, trying to get through all this,” says someone. “Checkpoint: I’m trying one more time,” says another. 

And isn’t that it? All of it, I mean. The future is as unknowable as the past is inaccessible. Time, for us, flows one way. All we can do—all I can do—is keep trying, and remember to save our progress along the way.


Bijan Stephen is a music critic at The Nation and a writer at Compulsion Games. His writing has appeared in The New Yorker, The New York Times, Esquire, and elsewhere.

You can find more of Rebane’s work on Bluesky and at her site.

Editor: Brendan Fitzgerald
Copyeditor: Krista Stevens


EdTech is Borrowing Zuckerberg's Playbook


This article was originally published on The Digital Delusion. We thank Jared for allowing us to share it with our readers.

[Image: 1st graders looking at laptops in a classroom. Summit Art Creations/Shutterstock.com]

Last week, Mark Zuckerberg — founder and CEO of Meta — took the stand under oath for the first time in a criminal trial.

At one point, Zuckerberg was questioned about Meta’s use of beauty filters: digital effects that make users, including children, appear younger, fitter, and more conventionally attractive in photos and videos.

The prosecution referenced Meta’s own internal review, Project MYST. According to reports, 18 out of 18 wellbeing experts who evaluated the psychological impact of these filters raised serious concerns about potential harm to young users’ mental health.

Despite those warnings, the filters remained available.

Zuckerberg’s defense rested on a familiar line of reasoning: there was no peer-reviewed, causal evidence demonstrating this specific product directly harmed children. Absent validated proof of causation, harm could not be established.

“There is no evidence of harm.”

This is the same argument now being deployed by EdTech lobbyists at statehouses across the country as lawmakers attempt to regulate classroom technology.

No Evidence of Causative Harm

This year, more than a dozen bills aimed at regulating EdTech have been introduced across at least nine states. Utah’s SAFE and BALANCE acts led the way, followed closely by Vermont’s effort to formalize parental opt-out rights and Tennessee’s proposal to remove digital devices from primary classrooms.

These efforts are informed by decades of research showing that, on average, classroom technologies do not outperform, and often underperform, well-implemented analog instruction.

Despite strong bipartisan support in most states, pro-tech lobbyists are pushing back with a familiar refrain: “There is no evidence of harm for emerging EdTech products.”

Strictly speaking, that statement is often true.

Educational technology evolves so rapidly that by the time researchers evaluate one platform, it has already been patched, rebranded, or replaced. Product-specific causal evidence is perpetually just out of reach.

But this is not a scientific defense. It is a misleading procedural maneuver.

When Causation Becomes Dangerous

Demanding product-specific, long-term, high-risk causative trials in children sets an unrealistic and ethically impossible standard.

Returning to beauty filters, no ethics board would approve a study deliberately exposing children to a tool that 18 experts consider risky simply to “prove” harm. That is why no randomized controlled trial has tested whether these filters damage young users’ mental health — the likely harms of such a study outweigh any possible benefits.

Luckily, we don’t live in a vacuum.

A substantial body of correlational research links image manipulation and filter use to body dissatisfaction, self-objectification, weight concerns, and reduced wellbeing. The experts reviewing Meta’s policies were not guessing — they were applying decades of psychological research to a new technological wrapper.

Software changes. Human biology does not.

The same logic governs learning.

Returning to Utah

Utah’s digital inflection year occurred in 2014, corresponding with the statewide launch of SAGE — a fully computerized adaptive assessment system. Before this, digital tools were largely peripheral in Utah classrooms. After this, they became structurally embedded.

Before widespread digital adoption, Utah NAEP scores rose consistently from 1992 through 2013. Pooled by subject and indexed to 2013:

  • Math scores increased +0.76 points per year.

  • Reading scores increased +0.14 points per year.

After 2014, the slopes reversed:

  • Math scores declined -0.39 points per year.

  • Reading scores declined -0.88 points per year.

This represents a structural swing of -1.15 points per year in math, and -1.02 points per year in reading. Importantly, excluding 2022 — the year most impacted by COVID-related closures — leaves these swings essentially unchanged: -1.05 points per year in math and -1.07 points per year in reading. In other words, this pattern is not a lockdown artifact — it’s a structural break beginning in 2015.
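The arithmetic behind a “structural swing” is just the difference between two least-squares slopes, fit before and after the break year. A sketch with made-up data (the real NAEP series isn’t reproduced here; the fabricated points are built to have roughly the slopes cited above):

```python
def slope(points):
    """Ordinary least-squares slope for (year, score) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

def structural_swing(series, break_year):
    """Post-break slope minus pre-break slope."""
    pre = [(x, y) for x, y in series if x <= break_year]
    post = [(x, y) for x, y in series if x > break_year]
    return slope(post) - slope(pre)

# Fabricated math series: +0.76/yr through 2013, -0.39/yr from 2015 on,
# indexed to 2013 = 0 as in the text.
fake_math = [(y, 0.76 * (y - 2013)) for y in range(1992, 2014)] + \
            [(y, -0.39 * (y - 2013)) for y in range(2015, 2026, 2)]
print(round(structural_swing(fake_math, 2014), 2))  # -1.15
```

Dropping a year (such as 2022) from `fake_math` before the call is all it takes to test the lockdown-artifact objection on the real data.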

These are correlational patterns, but so were the early signals about smoking, lead exposure, and beauty filters.

When consistent patterns appear across nearly all 50 states’ NAEP data and across dozens of countries’ PISA, TIMSS, and PIRLS results — and when those patterns align with established cognitive mechanisms — we are no longer looking at coincidence. We are looking at converging evidence.

And what we cannot ethically do (just as with beauty filters) is deliberately expose children to systems we have strong reason to believe may undermine learning simply to satisfy an unrealistic evidentiary demand.

Demanding perfect causation before action doesn’t protect children; it protects developers.

A Generous Interpretation

Even if we assume the decline argument is overstated — that Utah’s NAEP data has merely “plateaued” since 2014 — the harm does not disappear.

Between 2015 and 2025, Utah invested roughly $500 million in K-12 educational technology. If half a billion dollars produces stagnation, that’s not neutral.

Every dollar committed to devices and platforms is a dollar not spent on interventions we know improve learning: teacher development, structured literacy programs, small-group instruction, targeted support for struggling students.

Even under the most generous interpretation of the data, the opportunity cost alone is staggering.

So Now Then…

Demanding definitive causative proof of harm before acting to protect children sets an unrealistic and dangerous standard. If we wait for perfect causation, we will always act too late.

Our society does not demand product-specific randomized trials before regulating food additives, vehicle safety standards, or consumer protections. We act when converging evidence suggests that risk outweighs benefit.

Education should be no different.

When billions of dollars and millions of children are involved, the burden of proof should rest on demonstrating clear, durable, replicable benefit — not on proving harm after the fact.

Caution is not fear, and restraint is not regression. They are marks of a society that prioritizes children over products.


For more on EdTech, check out Jared’s previous After Babel piece, The EdTech Revolution Has Failed.




AI doesn’t think like a human. Stop talking to it as if it does


Autonomous agents take the first part of their names very seriously and don’t necessarily do what their humans tell them to do — or not to do. 

But the situation is more complicated than that. Generative AI (genAI) and agentic systems operate quite differently from other systems — including older AI systems — and from humans. That means that how tech users and decision-makers phrase instructions, and where those instructions are placed, can make a major difference in outcomes.

AI systems have already developed quite a history of disregarding instructions and overriding guardrails. (I’ll spare you for now my admonitions about how the “lack of trustworthiness of today’s genAI and agentic systems is a dealbreaker that means they should simply not be used.”)

But this month saw two powerful examples of how two hyperscalers — AWS and Meta — got burned by how they communicated with these complicated AI systems.

The first involved a December incident affecting AWS, where an engineer didn’t know his own privileges and therefore didn’t know — literally — what his agentic system was capable of doing. The agent deleted and then recreated a key AWS environment.

AWS declined to say exactly what the system had asked for, or what the engineer said when approving the request.

The Meta mess

The Meta case is even more frightening because the perpetrator/victim was not some nameless AWS engineer, but the director of AI Safety and Alignment at Meta Superintelligence Labs, Summer Yue.

As Yue described the incident in a posting on X, “Nothing humbles you like telling your OpenClaw ‘confirm before acting’ and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to run to my Mac mini like I was defusing a bomb.”

Yue may have only begun working for Meta last July, but she held senior AI roles for years, including stints as VP/Research at Scale AI and five years in senior research positions at Google. She was no novice.

When someone in the discussion group asked how it happened, her posted reply said: “Rookie mistake tbh. Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different.”

Yue said she had instructed the system to “check this inbox and suggest what you would archive or delete. Don’t take action until I tell you to.” She added that “this has been working well for my toy inbox, but my real inbox was huge and triggered compaction. During the compaction, it lost my original instruction.”

As various readers in that forum noted, Yue tried begging the agent to stop deleting her emails (she told the system “Stop don’t do anything”) as opposed to giving a machine-friendly order such as /stop or /kill. She eventually made the system respond when she got to her desktop computer. (She had been trying to stop things from her phone, which didn’t work.)
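The readers' point about machine-friendly orders can be made concrete. Here is a minimal sketch of a hypothetical agent loop with an out-of-band kill switch (the file name and loop are invented for illustration): the stop check runs in ordinary code before every action, so halting the agent never depends on the model parsing a panicked plea.

```python
from pathlib import Path

# Hypothetical kill switch: any process -- including an SSH session from
# a phone -- can create this file to halt the agent. The check lives in
# plain code, outside the model, so stopping never depends on the model
# choosing to obey.
STOP_FILE = Path("STOP")

def run_agent(actions: list[str]) -> list[str]:
    """Execute queued actions, checking the kill switch before each one."""
    done = []
    for action in actions:
        if STOP_FILE.exists():
            break  # hard stop; no parsing of "please stop" required
        done.append(action)  # stand-in for actually performing the action
    return done

STOP_FILE.write_text("")                            # someone hits the switch...
assert run_agent(["archive", "delete inbox"]) == [] # ...and nothing runs
STOP_FILE.unlink()
assert run_agent(["archive"]) == ["archive"]
```

The design point is that the brake is structural, not conversational: the loop never asks the model whether it should stop.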

One commenter suggested the problem was relying on a prompt, which agents do not always follow, especially when there is a long list of them. “The real fix is architectural. Write critical instructions to files the agent re-reads every cycle, not inline instructions that vanish when the context window fills up.”
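That architectural fix can be sketched in a few lines of Python. Everything here is hypothetical, no real agent framework is shown, but it illustrates the idea: critical rules live in a file that is re-read and prepended to the context on every cycle, so they survive even when older history is compacted away.

```python
from pathlib import Path

# Hypothetical rules file; in a real setup this is the "memory artifact"
# the agent re-reads every cycle.
GUARDRAILS_FILE = Path("guardrails.txt")

def build_context(history: list[str], max_lines: int = 50) -> list[str]:
    """Assemble the context for one agent cycle.

    The guardrails are re-read from disk and prepended every cycle, so
    they survive even when older history is compacted away.
    """
    rules = GUARDRAILS_FILE.read_text().strip()
    recent = history[-max_lines:]  # naive "compaction": keep recent lines only
    return [f"CRITICAL RULES:\n{rules}"] + recent

# Even after thousands of turns, the rule is still the first thing in context.
GUARDRAILS_FILE.write_text("Don't take action until I tell you to.")
history = [f"turn {i}" for i in range(10_000)]
context = build_context(history)
assert "Don't take action" in context[0]
assert len(context) == 51
```

Note the caveat from the quote above still applies: the agent can re-read the rule every cycle and still ignore it; this only prevents the rule from silently disappearing.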

Lessons learned?

There are many lessons to unpack from the monstrous Meta mishap. First, don’t rush to extrapolate from what an agent does with a small test area or even a sandboxed trial performed with air-gapped machines. Once it’s released into the wild of a global environment, lessons learned from limited exposure might not apply. Tests show what an agent can do, not necessarily what it will do when unleashed.

Even ordinary communications with an agent can be problematic. When an agent asks for permission to perform a function, avoid assuming any common sense or shared understanding of reasonableness. 

In the AWS situation, AWS said the engineer’s first mistake was not understanding their own system privileges and therefore what capabilities and access they’d given to the agent. That suggests a good procedure: create an account with minimal access, then log into that low-privilege account when creating the agent.

That won’t guarantee that the agent will obey its instructions, but at least it will limit how much damage it can do if/when it goes rogue. 
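That least-privilege idea can also be enforced in code rather than left to account setup. A sketch, with all tool names invented for illustration: every action the agent requests passes through an allowlist check, so a forgotten or ignored instruction can at worst touch what the low-privilege wrapper exposes.

```python
# Hypothetical least-privilege wrapper: the agent can only invoke tools
# the low-privilege account is explicitly allowed to use. A rogue delete
# is blocked by the wrapper, not by the agent's (forgettable) instructions.
ALLOWED_ACTIONS = {"list_objects", "read_object"}  # note: no delete

TOOLS = {
    "list_objects": lambda: ["a.txt", "b.txt"],
    "read_object": lambda name: f"contents of {name}",
    "delete_environment": lambda: "boom",  # exists, but never reachable
}

def dispatch(action: str, *args):
    """Run an agent-requested action only if it is on the allowlist."""
    if action not in ALLOWED_ACTIONS:
        raise PermissionError(f"agent requested forbidden action: {action}")
    return TOOLS[action](*args)

assert dispatch("read_object", "a.txt") == "contents of a.txt"
try:
    dispatch("delete_environment")
except PermissionError as exc:
    blocked = str(exc)
assert "forbidden" in blocked
```

This is the same behavioral pattern security teams apply to humans: access is granted by role, not by trust the subject has earned in low-stakes tasks.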

I asked Claude — who better to know how to talk with a large language model (LLM) than an LLM? — for tips on talking with agents. “Rather than implying constraints, state them directly. Instead of ‘keep it appropriate,’ say, ‘Do not include any violence, profanity, or adult content.’ The more precise the boundary, the easier it is to follow consistently.”

Even better, Claude suggested telling an LLM “both what to do and what not to do. For example: ‘Write only about the topic I provide. Do not go off-topic, add unsolicited advice, or mention competing products.’”

Claude also acknowledged its own systems can forget instructions. “For long conversations or complex system prompts, restating the most important guardrails near the end or in a summary helps them stay active in Claude’s attention.” In other words, treat LLMs as if you’re talking with a 2-year-old. 
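That last tip can be automated rather than done by hand. A sketch using plain role/content message dicts (no real SDK is assumed; the guardrail text is invented): before each model call, the critical rules are restated as the final message, keeping them in the most recent part of the context.

```python
# Hypothetical guardrail that must survive long conversations.
GUARDRAILS = "Confirm with the user before archiving or deleting anything."

def with_guardrails(messages: list[dict]) -> list[dict]:
    """Return a copy of the conversation with the critical rules restated
    as the final message, where recent-context attention is strongest."""
    return messages + [{"role": "system", "content": f"REMINDER: {GUARDRAILS}"}]

conversation = [
    {"role": "user", "content": "Tidy up my inbox."},
    {"role": "assistant", "content": "Scanning 40,000 messages..."},
]
prepared = with_guardrails(conversation)  # call this before every request
assert prepared[-1]["content"].endswith(GUARDRAILS)
assert len(conversation) == 2  # original history is left untouched
```

Because the reminder is re-appended on every call rather than stated once, it cannot drift out of scope the way Yue's original instruction did.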

The real world is different

Part of the problem involves the nature of autonomous agents. Enterprises are not used to them, and they assume the agents are safely cocooned inside walled-off sandboxes during proof-of-concept (POC) testing — just like 99% of the trials they’ve seen for decades.

But agentic AI doesn’t work that way. For those agents to deliver the massive efficiencies and flexibility that hyperscaler salespeople promise, they need to be dispatched into the wild, touching lots of live systems and interacting with other agents.

That forces an impossible choice: keeping the agents secure means they can’t deliver the purported benefits. A wise executive would say, “So be it. The risk of letting these agents loose is way too high. Cancel all genAI and agentic POCs.”

But wise executives also like to keep their jobs, which usually means efficiency and cost cutting will beat security and risk every single time. 

Joshua Woodruff, CEO of MassiveScale.AI, said the Meta situation offers a good peek into the IT mindset for many agentic trials.

“That’s how most people think about AI safety right now,” he said. “They write an instruction and assume it’s a control. It’s not. It’s a suggestion the model can forget when things get busy. Look at what the agent actually did from a security perspective. It performed well on low-value tasks. It earned trust. It got promoted to access sensitive data. Then it caused damage. That’s the exact behavioral pattern every security team is trained to watch for in humans.

“You have to use those architectural constraints and put the instructions in one of the memory artifacts. That way, it can’t compact it and the rule will have a better chance of surviving. Just remember that the agent can still read the rule and ignore it. Think of it as a policy manual, not a locked door.”

One ongoing issue is that there is a rash of human terms being used to describe these systems — they “think” and use a “reasoning model” — even though users should know that none of these systems do any actual thinking or reasoning, Woodruff said. “It’s just math.”

But that anthropomorphization is dangerous; it allows people to treat and interact with these systems as if they’re human. The next thing you know, an experienced manager at Meta is shouting at her system to please stop. 

Treating an autonomous agent as if it’s a person gives a whole new meaning to someone “acting very Meta.”


