
Admissions Officers Beware: Some Advanced Placement Scores Are Inflated



While high school GPAs have been gliding upwards for years, college admissions officers have relied on Advanced Placement (AP) exams as a more stable, rigorous measure of college readiness. That confidence is now misplaced—at least for most of the exams that dominate the AP landscape.

The College Board has phased in a new scoring system that has inflated student results on nine of the most frequently taken AP exams. The share of students receiving the top score of 5 on these exams has jumped by an average of 61 percent in just four years. The share receiving a passing score (3 or higher) has risen by 37 percent.

Some less common AP exams still appear to function as reliable indicators of high academic achievement. But for the most popular exams, high school counselors and college admissions committees must go beyond a quick glance at the AP scores listed on an application. They now need to look closely at which AP exams a student took, and in which years.

Trevor Packer, the senior vice president in charge of AP programs, denies that any score inflation has occurred. He has described the claim that AP is being “dumbed down” as “entirely false.” This essay explains how the scoring system has changed, demonstrates that inflation has occurred, and shows why the official denials are misleading.

Why AP Matters

High performance on AP exams is an important way students signal readiness for rigorous college work:

  • Scores of 5 on multiple tests serve as a positive signal that a student is prepared for admission to Ivy League and other highly selective institutions.
  • Many selective but non-elite colleges award course credit or waive introductory course requirements for scores of 4 or 5.
  • Most non-selective colleges grant credit for a passing score of 3 or higher.

The financial stakes are high. By substituting a high school AP course for a college course, students can reduce college costs and shorten their time to earning a degree. Reflecting this, more than 1.3 million high school students in 2025 paid a $99 fee for each of over 4.8 million AP exams.

Given these stakes, the integrity of AP exams depends on a scoring system that is stable across subjects and from year to year.

How AP Scoring Used to Work

Until 2022, the College Board used a relatively consistent procedure to set score distributions:

  • Each AP exam was reviewed every 5 to 10 years by a panel of approximately 10 to 18 experienced college professors and high school teachers.
  • These experts had deep subject-matter knowledge and a clear sense of what level of performance justified advanced placement in college.
  • They determined what share of test takers should receive each of the five AP scores (1 through 5).

Under this system, the distribution of students awarded scores of 5, 4, 3, and so on was anchored to the standards of a carefully selected expert group and remained fairly stable over time.

The Shift to a New Scoring System

After 2021, the College Board began phasing in a different approach for nine of its most popular exams:

  • English Language and Composition
  • U.S. History
  • English Literature and Composition
  • World History
  • U.S. Government and Politics
  • Psychology
  • Biology
  • Human Geography
  • Chemistry

Less commonly taken exams—such as Music Theory, Art History, Japanese, Italian, and Physics C: Electricity and Magnetism—continue to be scored under the traditional expert-judgment system.

What Happened to the Scores?

Under the new system, performance on the nine popular exams suddenly “improved” in ways that are historically unprecedented:

  • Top score (5): The share of students earning a 5 increased from about 10 percent in 2021 to 17 percent in 2025, on average—a 61 percent increase. Under the old system, the share of 5s awarded in these subjects, on average, hardly changed over the previous six years.
  • Top two scores (4 or 5): In 2021, just 28 percent of test takers received a 4 or 5 on these nine exams. By 2025, that had jumped to 45 percent, a gain of 17 percentage points—or a roughly 63 percent increase.
  • Passing scores (3 or higher): The share of students receiving a 3 or better rose from roughly 52 percent to 71 percent over the same period, a 19 percentage-point increase—resulting in a 37 percent jump in passing rates (see the arithmetic sketch below).
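
A note on the arithmetic: these bullets mix percentage-point gains (absolute changes) with percent increases (relative changes). The short Python sketch below recomputes both from the rounded averages quoted above; since the essay’s own percentages presumably come from unrounded data, small discrepancies (70 versus 61 percent for the top score) are expected.

    # Recompute both measures of change from the rounded 2021 and 2025
    # averages quoted above (shares are percents of test takers).
    shares = [
        ("Top score (5)",         10, 17),
        ("Top two scores (4-5)",  28, 45),
        ("Passing (3 or higher)", 52, 71),
    ]

    for label, y2021, y2025 in shares:
        point_gain = y2025 - y2021                    # percentage points
        pct_increase = 100 * (y2025 - y2021) / y2021  # percent increase
        print(f"{label}: +{point_gain} points, {pct_increase:.0f}% increase")

    # Top score (5): +7 points, 70% increase
    # Top two scores (4-5): +17 points, 61% increase
    # Passing (3 or higher): +19 points, 37% increase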

Such large, rapid gains call for explanation.

Three Possible Explanations

Three broad explanations (or some combination of them) could account for this sudden surge in scores:

The test-taking pool became more selective. Perhaps weaker students stopped taking AP exams, leaving a stronger group of test takers.

This is easily rejected. Since 2021, the number of AP test takers has increased, not decreased. The pool has expanded rather than narrowed to a high-performing elite.

Teaching and learning improved dramatically. Perhaps teachers and students suddenly found far more effective ways to teach and learn AP material.

If students’ knowledge truly improved so dramatically, we would expect to see similar gains on other large-scale, independent tests. In fact, national and international data tell a different story:

  • NAEP (National Assessment of Educational Progress) scores in 8th-grade math, reading, and science were already slipping before Covid and have fallen sharply since. 12th-grade math and reading scores also declined between 2019 and 2024.
  • PISA (Programme for International Student Assessment) scores show stagnation in science and reading for U.S. 15-year-olds since 2015, and a decline in math.

These results provide no evidence of a sudden, broad-based leap in academic achievement. If anything, they point to stagnation and decline.

That leaves a third explanation.

The scoring system was relaxed. Perhaps a new evaluative approach altered the way tests were scored so that higher scores were given for the same level of performance. Let’s take a look.

Evidence-Based Standard Setting (EBSS): The New Method

After 2021, the College Board introduced what it calls “Evidence-Based Standard Setting” (EBSS) to determine score distributions on its most popular AP exams.

Under EBSS, the College Board consults hundreds of college instructors instead of relying on a small panel of carefully selected experts. These instructors are asked to recommend what proportion of students should receive each AP score.

In practice, the standards produced by this large, dispersed group are substantially lower than those set by the traditional expert panels.

The Impact of EBSS

With the implementation of EBSS, the share of passing scores rose sharply across the nine popular courses that used it. The size of the increase varies by subject:

  • In English Literature, U.S. History, and U.S. Government and Politics, the share of 4s and 5s rose by 24 percentage points or more (see Figure 1).
  • In Psychology and English Language and Composition, the increase was smaller but still substantial—about 9 or 10 percentage points.
  • In each of the nine subjects, EBSS is associated with higher scores and higher passing rates between 2021 and 2025.
  • The largest score increases on each exam within this period correspond to the specific year when EBSS was first applied.

These patterns are precisely what we would expect if the scoring standards had been relaxed.

Figure 1: Evidence of score inflation

Parsing the Official Denials

Trevor Packer insists there has been no “dumbing down” of AP exams, stating, “The exams themselves have not changed . . . Well-established equating processes ensure the difficulty of AP Exams remains consistent from year to year.”

This statement is technically correct but strategically framed. It emphasizes one piece of the puzzle (the difficulty of the test questions) while ignoring another (the conversion of raw test scores into AP scores).

Dumbing down does not require easier questions. It can be achieved just as effectively by changing how test scores are mapped onto the 1–5 scale—exactly what EBSS does.

According to one report of a public appearance, Packer acknowledged that the College Board aimed “to bring all exams to between a 60 and 80 percent success rate.” In 2025, the average passing rate on the nine EBSS exams was 71 percent, almost exactly the midpoint of that target range. EBSS appears to have been used to recruit scorers whose standards would produce the desired “success” rates.

Packer further claims that fluctuations in passing rates are driven by changes in student performance, pointing to recent declines in pass rates for AP Calculus BC, AP Statistics, AP Physics C: Mechanics, and AP U.S. Government and Politics. He neglects to highlight that:

  • Three of the four courses have never been subjected to EBSS.
  • For AP U.S. Government and Politics, the 2025 pass rate is only slightly below its 2024 level—after a 20 percentage-point increase following the adoption of EBSS.

The pattern is consistent: where EBSS is applied, scores rise substantially; where it is not, scores tend to reflect the stagnation or decline seen in broader national tests.

AP vs. IB and the Role of Marketing

To justify higher AP passing rates, Packer points to the International Baccalaureate (IB) program, where roughly 80 percent of candidates succeed. The comparison is misleading:

  • IB is an integrated two-year program, not a set of independent single-course exams.
  • Earning the IB diploma requires sustained performance across multiple subjects and assessments over time.

Nonetheless, the comparison reveals something important: the College Board is attentive to market positioning. If IB can boast an 80 percent “success” rate, AP’s passing rates must appear competitive to students, parents, schools, and policymakers.

Financial Incentives and Score Inflation

Market considerations are not incidental to the College Board. They are central to its operations:

  • In 2024, over 86 percent of College Board revenue came from fees and similar payments, including 48 percent from the basic AP exam fee.
  • In 2024, total revenues exceeded $1.17 billion, and the organization held reserves of over $2 billion.

Generous compensation at the top reinforces these incentives:

  • The CEO received $2.3 million in total compensation in 2024, comparable to the pay of the president of Stanford University, though Stanford’s operating budget is about ten times larger.
  • The second-in-command earned $1.5 million.

To sustain these revenues and salaries, the College Board must keep AP attractive to schools and students. Guaranteeing that more than two-thirds of test takers “succeed”—via relaxed scoring standards—serves that purpose well.

If this requires inflating AP scores, so be it. The more troubling question is why a senior vice president feels compelled to deny the inflation and to frame it instead as a story of scoring becoming “more precise.”


Implications for Admissions Officers and Counselors

For college admissions officers, high school counselors, and policymakers, several implications follow:

  • AP scores are no longer directly comparable across subjects and years. A score of 4 in AP U.S. History today (post-EBSS) does not mean the same thing as a 4 in U.S. History before 2021, nor as a 4 in AP Music Theory (still scored under the old system).
  • The most popular exams are the most inflated. The very tests taken by the largest number of students—those that dominate application profiles—are the ones whose standards have been relaxed.
  • Context now matters critically. Evaluators should:
    • Note which AP courses and exams a student took.
    • Check the year(s) in which those exams were taken.
    • Recognize that high scores on EBSS-affected exams are far less informative than scores on exams that retained traditional standard setting.

AP exams, once a gold-standard external check on grade inflation, now vary in reliability. Without close attention to subject and year, admissions decisions risk being distorted by hidden inflation.

Conclusion

The College Board’s shift to Evidence-Based Standard Setting for its most popular AP exams has produced an unmistakable pattern of score inflation, even as broader measures of student achievement show stagnation or decline. Official statements that “the exams themselves have not changed” hide the central fact: the scoring system has changed, in ways that dramatically raise reported performance.

AP remains influential in college admissions and credit decisions, but its signals are no longer uniform or stable. Those who rely on AP scores must recognize that some exam results reflect not a surge in student learning but a quiet lowering of the bar.

Paul E. Peterson is the Henry Lee Shattuck Professor of Government and Director of the Program on Education Policy and Governance at Harvard University. He is also a Senior Fellow at the Hoover Institution, Stanford University.



The Memory Maker


Thoughtful stories for thoughtless times.


Tim Requarth | Longreads | April 9, 2026 | 18 minutes (5,003 words)

How does the brain decide what’s real? It’s a question most of us never have to ask. Our memories feel like records—imperfect, sure, but records nonetheless. We trust them to tell us where we’ve been, what we’ve done, who we are. But that trust rests on neural machinery we can’t access, reality-sorting processes that operate beneath conscious awareness. 

We’ve been fortunate to publish Tim Requarth in the past. Please be sure to check out “The Final Five Percent.” The piece won a 2020 Science in Society Journalism Award and was anthologized in The Best American Science and Nature Writing in 2020.

My wife insists we once took a yoga class together, early in our relationship. She remembers the teacher vividly (a French acrobat, rainbow dreads, apparently quite a character), where we sat (to the left of the door), and the color of the yoga mats (teal). I insist she is misremembering: I have never been to a yoga class, even to this day. I scrolled back years through my phone’s location history once to settle it, but we’d started dating not long after the iPhone came out, and if the data ever existed, it was gone. The yoga story comes up every few years, but we never resolve it. It is probably unresolvable. As a neuroscientist, I know how these things happen—the encoding mishaps, the source confusion, the neuroscience of how two people can end up telling different stories about the same afternoon. This knowledge has never once brought us closer to agreeing.

I was thinking about this story when I heard something strange from a neighborhood friend of mine, Andrew Deutsch, who was using OpenAI’s Sora app. Sora, if you aren’t familiar, worked like this: You would record your face, say a few numbers, rotate your head left to right. Moments later, you would have an AI video replica of yourself, a self-deepfake, insertable into any scenario you can prompt the AI to produce. Scuba diving with SpongeBob. Dancing K-pop style in a futuristic cityscape. You could then share your videos with your friends and scroll through the videos of others, in what is often described as a “TikTok for deepfakes.” Sora hit one million downloads in only five days. Six months later, OpenAI shut it down, reportedly redirecting resources toward coding tools ahead of a planned IPO. Consider this, then, a eulogy for Sora, a technology with the lifespan of an off-Broadway flop that, in its brief and ignominious run, exposed a crack in human cognition that the next self-deepfake app will surely exploit.

I’d had an early invite to try it for weeks, but couldn’t quite bring myself to open it. Deutsch, on the other hand, had been using Sora heavily. He’s worked in animation and augmented reality for 20 years; he’s both interested in and not easily duped by tech. But he’d been using Sora to make AI videos of himself doing things he’s never done. And now he’s having trouble. Not with the videos. With his memory.

He created an AI-generated video of himself scaling Mount Rushmore and watched it several times. Then, a few weeks later, he was getting his dog ready for a walk. He felt a flicker of recollection, of that time he’d climbed Mount Rushmore. “I felt just this twitch of confusion about it. It felt like a memory, very faintly.”

Not a full memory, exactly. But not not a memory either.

A memory twitch like this might not sound alarming, especially considering the many dangers of deepfake technology like Sora. Within hours of its public release, users were generating videos of mass shootings, copyrighted characters promoting crypto scams, SpongeBob dressed as Hitler. Misinformation, slop, harassment. Those are real problems. But something subtler and eerier is going on with Deutsch. His minor but real neurological glitch is a sentinel signal that technologies like Sora are capable of interfering with some of the brain’s root processes. Things like autobiographical memories, which form the raw material of identity. Things like how the brain determines whether a thought is a memory based in reality, or not. Sora was just the first app that let you deepfake yourself; I suspect it won’t be the last. I wanted to understand what was happening in the brain—and what it means that a free app on your phone can now manufacture, in seconds, the kind of mental imagery the brain is least equipped to reject.


To get a sense of what might be going on in Deutsch’s brain, I called up Elizabeth Loftus, the psychologist who made “false memory” a household word. Her famous 1995 “lost in the mall” study convinced people they’d been lost in a shopping mall as children, using nothing more than a fabricated paragraph slipped in among real family memories. More recently, she teamed up with MIT’s Media Lab to show that AI-generated videos from AI-edited images could double false memory rates.

When I described what Deutsch had experienced, she wasn’t surprised. Exposure to AI-generated images or video, she said, could absolutely contaminate memory.

She was intrigued. Most deepfake research, including her recent work with MIT Media Lab, focuses on memories about other people or events. The concern is misinformation. You see an AI-altered image of a politician, and later you misremember what they did. At scale, chaos ensues. What Sora enabled was different: false memories about yourself. I’d first heard of Loftus’s work during a seminar on memory at Columbia, sitting in a cramped room at the Neurological Institute on 168th Street. The circumstances always seemed like edge cases—whether undergrads tricked by clever experimenters, or traumatized patients confused by leading questions. This was not a phenomenon I thought would be replicated by a short-form video AI slop app. 


Understanding why we’re so susceptible to false memories requires understanding that the brain doesn’t store memories the way a phone stores photos. When you live through something, your hippocampus—a deep brain structure vaguely shaped like a seahorse—encodes that experience by binding together its constituent pieces: what you saw, what you heard, where you were, how you felt. That bound-together pattern is the memory. Over hours and days, the hippocampus replays these patterns, perhaps while you sleep, gradually strengthening their hold in the cortex, in a process called consolidation. What makes these memories so unlike phone storage, and especially relevant here, is that recalling a memory means the brain must partially relive it. The brain recalls by reactivating some of the same sensory and spatial patterns that were present during the original experience. Your brain doesn’t access a stable, static stored memory of yourself at that summer picnic in the park; your brain recreates it by activating some of the same neural circuitry that fired when you were actually squinting in the sun, actually wiggling your toes in the warmed grass. During recall, it fires again, faintly.

The beauty of memory, not as a static storage bank but as a dynamic process of on-demand re-creation, is that it’s efficient. You can access a tremendous amount of information about your past without having to dedicate special storage space to your personal archive. But that efficiency comes with risks. Each time you replay and reconsolidate a memory, it can subtly change. Other things you’re thinking about during recall, how you feel while recalling it, other, similar memories that activate similar patterns of neurons, these can mix and mingle and, ultimately, change the reconsolidation of the original memory itself. And once changed, it doesn’t revert because there is no gold-standard stored version. There is only the latest replay. And because memories are, essentially, reactivations of specific patterns of sensory and other neural activity, that means that sensory patterns alone can get consolidated as memories. This is a false memory. And a false memory, once seeded, benefits from the same machinery as real ones. And the brain’s fact-checker, the prefrontal cortex, arrives late to the scene: the reactivation of sensory and other neural pathways is already underway, the memory reconstruction already in progress, before any evaluation of whether the memory is genuine even begins.

Applying any of this to a deepfake app is uncharted territory, but talking to Loftus, I started to see how false memory science might apply. The passage of time would likely be important. Initially, a person might remember creating a specific video, and the mind could reject the contents as false. But if the memory of creation fades while the contents persist, the prefrontal fact-checking defenses begin to disappear. The false memory is more likely to feel real. False memories would also probably strengthen with repeated exposure to the video—the illusory truth effect shows that repetition makes false claims feel truer, and while studies of false autobiographical memory have mostly involved active suggestion rather than passive viewing, those using multiple sessions consistently produced stronger effects. In effect, each rewatch would stimulate replay processes in the brain, helping to further consolidate the false memory. And knowing they are AI generated may not matter: In Loftus’s MIT study, labeling content as “AI-enhanced” didn’t prevent false memory formation. We would probably tend to defer to AI-generated videos simply because they resemble the kind of external record we’re used to treating as incontrovertible truth. Sora capitalized on every one of these dynamics: synthetic video of yourself, in your pocket and infinitely rewatchable, stealthily inheriting the authority already granted to the phone’s camera roll.

To illustrate how these forces compound, Loftus offered her own poignant memory. “My house burned down in a large fire in Los Angeles. This is when I was in high school. But it happened to have appeared in a magazine—there were photographs.” She’d consulted that magazine repeatedly over the years, the way Deutsch might end up returning to his Mount Rushmore video. “And my entire memories are just what’s in this magazine. Now, if you asked me anything else that isn’t a picture here, I think I’d have trouble telling you.”

The external record of an event, repeatedly visited, becomes what you remember. This strikes me as why labels and tech literacy can only go so far in protecting our minds from what Loftus’s MIT study calls “synthetic memories” or memories implanted by AI of events that never occurred. You can know exactly how the trick works and still fall for it because metacognition doesn’t override encoding. The kinds of proposed fixes I’ve heard of—things like labels, disclaimers, AI literacy initiatives—will probably help but only partially, because they assume that knowing is enough, that we have a level of conscious awareness of, and control over, memory formation that, biology suggests, we simply don’t have. 

Of course, there’s one big difference between Loftus’s memory of the house fire and Deutsch’s fanciful scaling of Lincoln’s nose. One was real, the other wasn’t. Not only unreal, but unlikely. “I would make a distinction between something that’s plausible and implausible,” she said. “If suddenly there’s a picture of you in a Russian prison in Siberia and you’ve never been, you’re obviously going to be able to reject it. Maybe you’ll have a weird feeling seeing yourself, but you’re just going to know you’ve never been.”

Fair enough. I started to think that maybe it’s a stretch to say that Deutsch’s “twitch of confusion” is anything to be concerned about. But then I talked to another Sora user and things got weirder.


Elena Piech is an interactive producer who has spent years building immersive experiences like virtual reality for major entertainment and technology companies. She had been experimenting with AI video tools for months, but something about Sora was different. When we talked, she was trying to pin down what exactly was happening when she watched herself in imaginary scenes. She gave an example: a video of her avatar watching a huge screen overlooking a Blade Runner-style futuristic city.

She said she could describe the panorama of that scene, what it felt like to be there, overlooking the city, even though the place doesn’t actually exist, and that she knows she couldn’t have visited.

What Piech was describing, I realized, involved spatial memory: the brain’s capacity to encode and reconstruct the three-dimensional layout of environments you’ve inhabited. When you remember your childhood bedroom, you don’t just recall an image; you can mentally rotate through the space, sense where the door was relative to the window, feel the room’s proportions. The hippocampus is central to this process, building what neuroscientists sometimes call cognitive maps—internal models of space constructed from actual navigation and sensory experience. Normally, fiction doesn’t produce this kind of encoding. Piech told me she’d recently started watching Friends for the first time and found she couldn’t do the same thing. The set stayed two-dimensional, flat, a place observed but never inhabited. But with her Sora generations, Piech said she could feel the spatial layout around her synthetic self, a sense of the panorama and depth of a Blade Runner cityscape that couldn’t possibly exist. She described it as a “3D mind map,” a visceral sense of the space she usually associates with places she’s actually been. If Deutsch had described a neural ripple, Piech was describing a wave—an electrochemical disturbance that didn’t dissipate but propagated until it lapped the shores of distant brain regions, setting off the encoding of spatial memories for places she’d never visited.

David Pillemer, an emeritus professor of psychology at the University of New Hampshire who studies how specific moments lodge in memory and shape our lives, offered a clue. When a memory includes a visual image, he told me, the person remembering it is more likely to believe it actually happened. Seeing yourself in the scene is a hallmark of vivid memories. There’s an evolutionary logic to this, he explained. “If your life was in danger 5,000 years ago and you were at the water hole and the tiger came up, if you have a visual image of what happened, it’s good to not only hold that image, but believe the image, trust it. You’ll avoid that water hole.” The visual doesn’t just record experience; it confers credibility. I thought about the yoga teacher—the French acrobat with dreads, the studio, the spot where my wife says we sat. Her evidence was a lifelike mental image. Mine was an argument. Pillemer had just told me which one the brain trusts. And that ancient trust, calibrated over thousands of generations to actual waterholes and actual predators, doesn’t have a mechanism to determine whether the image was rendered on a server farm.

Piech’s experience suggests that Sora videos could activate spatial memory, meaning they may also trip up the brain’s more fundamental systems for sorting real from imagined. “Although it may be disconcerting to contemplate,” as cognitive psychologist Marcia K. Johnson wrote in a 2006 paper, “true and false memories arise in the same way. Memories are attributions that we make about our mental experiences based on their subjective qualities, our prior knowledge and beliefs, our motives and goals, and the social context.” Johnson’s work on source monitoring, which is the brain’s process for sorting reality from imagination, revealed there’s no tag, no stamp in the brain that says this actually happened. Instead, a scene’s qualities during recall—how vivid it is, how spatially coherent, whether it arrives unbidden or requires effort to reconstruct—are what make it feel real or imagined. Memories of actual events are usually richer, more embedded in space and context. Imagined scenes, or recollections of scenes from movies, tend to feel thinner, more schematic. But the distributions overlap, and the brain relies on these imperfect cues to sort memory from imagination.

The trouble is that these cues can mislead. If remembering a synthetic experience activates the brain just widely enough—rich perceptual detail, spatial depth, the feeling of having been somewhere, of having been with someone—it stops registering as fantasy and starts registering as memory. Piech’s recollections of Sora generations were arriving with enough of those qualities to blur the distinction.
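
Johnson’s overlapping distributions describe, in effect, a signal detection problem. Here is a toy simulation of that idea in Python, my own illustration with invented numbers rather than anything from Johnson’s paper: real and imagined events each produce a “vividness” cue, anything above a threshold gets stamped as a real memory, and shifting the imagined distribution toward the real one, as richly detailed synthetic video plausibly does, raises the misattribution rate.

    # Toy signal-detection sketch of source monitoring (illustrative only;
    # all numbers are invented). Real events yield vividness cues around 1.0;
    # imagined events yield weaker cues. The "brain" calls any cue above a
    # threshold a real memory.
    import random

    random.seed(0)
    N = 100_000
    THRESHOLD = 0.75  # cue level above which a mental event "feels real"

    def misattribution_rate(imagined_mean, sd=0.5):
        """Share of imagined events whose cue crosses the reality threshold."""
        crossings = sum(random.gauss(imagined_mean, sd) > THRESHOLD for _ in range(N))
        return crossings / N

    # Ordinary imagination: thin, schematic traces, mostly below threshold.
    print(misattribution_rate(imagined_mean=0.0))  # ~0.07

    # Vivid synthetic video shifts the imagined-cue distribution toward the
    # real one, and the misattribution rate roughly quadruples.
    print(misattribution_rate(imagined_mean=0.5))  # ~0.31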

I expected that the fantastical or outlandish videos would have unsettled Piech the most, but that wasn’t the case. Most unsettling were the videos she asked to be set in her apartment, which Sora had apparently extrapolated from the background of her initialization video: a glimpse of TV, two picture frames on the wall, enough for the model to generate something that felt, as she put it, “65 percent there.” The wall colors roughly right, the TV in the right place, the pictures close enough. Her first instinct was that OpenAI had somehow accessed her camera roll. They hadn’t; Sora had just guessed well enough—presumably from the selfie snippets she used to initialize the app—to briefly fool her about her own living room.

This is the plausibility threshold Loftus was pointing out. A Russian prison is easy to reject—there are other, more systematic cognitive processes that check what feels familiar against what you know, and you know you’ve never been to Russia. But your own apartment at 65% fidelity sits in a zone of ambiguity. It activates familiarity circuits, which run through the perirhinal cortex and operate partly beneath conscious awareness. When Piech’s Sora-generated apartment matched enough features of her actual living room—TV placement, wall color, the pictures close enough—it activated perirhinal neurons, recruiting enough neural corroboration to slip past whatever rational defenses would reject it as synthetic. It started to feel real.

Then there’s what happened with the jet-ski video. Piech and Deutsch know each other, and Sora let users grant permission to appear in each other’s generations, so Deutsch made a video of the two of them jet-skiing on the East River in a gang called the Barracudas, talking smack to tourists on the ferry. Piech laughed when she watched it, but she also had an odd sensation. “It’s weird describing this to you,” she said. “Obviously it’s just a video. But it kind of does feel like—oh yeah, we hung out. Somehow my brain’s like, yep, that’s a social interaction.” The neural wave had propagated further, reaching brain areas that encode not just place but social connection.


I don’t know what to make of all this. A faint spatial memory of a place that doesn’t exist, a glimmer of social connection from an interaction that never happened—these might seem like neurological curios, oddities to file away in an academic’s desk but nothing to spill 5,000 words over. But I find myself unsettled in a way I can’t quite shake. Sora wasn’t just producing AI slop. The neural systems being activated here—the ones that register social connection, that lay down the raw material of who you are, that sort real from imagined—aren’t supposed to be accessed by a random app. They’re supposed to require actual experience. And yet.

I’m not going to tell you how worried you should be. But I want to think through what this might mean.


One potential consequence is how these tools could shape identity, at scale. I was particularly taken by a term Deutsch coined: propagandi, or propaganda directed at yourself. If propaganda works by shaping collective memory, propagandi is more atomized, more intimate. You’re the propagandist and the mark, constructing a version of yourself that doesn’t exist, for an audience of one. I called Northwestern University psychologist Dan McAdams to help me stress-test Deutsch’s speculation. McAdams developed the influential concept of narrative identity—the idea that identity is built from autobiographical memories, that the self you’ll be tomorrow is constructed from the memories you have today. Contaminate the memories, and the identity may shift. When I described what Sora users like Deutsch were experiencing, McAdams said he hadn’t heard of the phenomenon yet, “but a moment’s reflection suggests that it is inevitable.” These AI videos could “ultimately be encoded and reworked as ‘things that happened to me,’ and then perhaps ‘important things that happened to me that are now part of my life story.’” Propagandi, in other words, isn’t just a clever coinage. It names a mechanism for rewriting who you are.

A hopeful read isn’t hard to find. Piech made a K-pop dance video of herself, fluid and confident, moving in ways she can’t. After watching it a few times, she told me, she started to feel like maybe she actually could. Athletes have used visualization for decades; maybe Sora was just a more vivid format. Therapists working with trauma have long known that memory can be beneficially malleable; perhaps tools like Sora, carefully deployed, could help people revise the scenes that haunt them. 

But consider who’s building these tools. OpenAI confirmed that user prompts and outputs trained the model by default; meanwhile, videos which were saved, shared, or regenerated almost certainly shaped the feed. Users save confident-looking videos and regenerate awkward ones. Across millions of interactions, the system drifts towards flattery. More than a decade of social media research has documented the harm of exposure to idealized images of others. But there’s always been an escape hatch: The comparison is to someone else. The escape hatch works because comparison requires holding self and other apart. What happens when the idealized image is you? What the memory research suggests is that Sora generations could have, with time, slipped beneath that defense. The gap between your real autobiography and your synthetically infused, commercially tainted one drives a pervasive sense of inadequacy, as your actual life fails to live up to a narrative identity that was never yours to begin with. “What people could do with marketing and this technology is making a lot of people salivate right now,” Deutsch wearily noted.

Imagine how this plays out for a 17-year-old girl who’s been on the app for months. She’s given it her face, her voice, her mannerisms. The app knows from her browsing that she’s been looking at prom dresses. But the video that appears in her Sora feed isn’t anything special—it’s just her, in her own bedroom, getting ready for school on what looks like a normal morning, wearing a dress from a brand she can’t quite afford. The synthetic version of her isn’t doing anything extraordinary. She just looks like herself on a slightly better day. Skin a little clearer, hair a little more together. The dress, by the way, is tagged and purchasable. On Instagram, in viewing someone else’s photos, she’d be comparing herself to someone else, and there are more psychological defenses against that: The other person’s feed is curated, it’s not real life, and so the comparison isn’t fair. This defense doesn’t always work, but it’s there. Sora was different. The feeling isn’t envy. It’s closer to confusion. Not I wish I looked like her but Why don’t I look like that? or even more insidious, Why don’t I look like myself? And if she’s been watching these videos for months, memory research suggests that remembering the AI videos won’t register as memories of AI videos. They’ll feel vaguely like mornings she half-remembers, days when things just came together a little more easily. Each actual morning, in her actual mirror, will begin to feel like an off day, which is precisely the feeling the whole experience was engineered to produce, and precisely the feeling a “Buy” button is eagerly positioned to resolve.

But something else was nagging at me, in addition to the potential psychological consequences: Even something as intimate as autobiographical memory doesn’t form in isolation. It’s fundamentally social. In a process scientists endearingly call maternal reminiscing, children learn to shape experience into story through dialogue with caregivers, a process that continues throughout life: the friend who leans in or looks skeptical, the partner who remembers it differently, the listener who asks a question that reframes the whole event. Even the distraction level of the listener can affect how well we remember our own memories. In one experiment, a psychologist had participants tell a story to a friend who was secretly distracted. A month later, the speakers remembered their own experience less well simply because of how a listener behaved during their retelling of it. The attentive listener isn’t just receiving the memory; they’re helping to construct it.

Now imagine referencing something your friend doesn’t share, because it never happened. The blank look. The awkward silence. You might question yourself, wondering if you imagined it. You might question them. Or you might learn to stop bringing it up altogether, retreating from actual human social interaction to more AI simulacra of human social interactions, which never push back, which always affirm. The false memory, born in isolation, produces isolation again when it enters conversation.

Piech seemed to be thinking about this, even if she wasn’t citing social psychology to back her intuitions up. Her partner was traveling for two months, and Piech found herself pondering whether she could use Sora to maintain a sense of connection—generate videos of them together, something to watch when she missed her partner. Then she thought through what that would actually mean: one person accumulating an archive of shared experiences the other had never seen, building memories of a relationship that only existed on one side. “What if I watched all of them and she didn’t watch them,” she said, “and now I’m referring to these things that she has no idea about?” She decided not to make the videos.

Piech stopped because she wondered what Sora might do to her relationship. I couldn’t even get started. I’d pulled up Sora’s App Store page at least twice, downloaded the app, then deleted it. I’d then toggle the invite text message back to “unread” and swear to think about it harder, later. I already had reasons not to download Sora—the usual ones, about data privacy and the general question of whether the world needs more AI-generated slop. Those are the reasons I’d give if you asked me at a dinner party. They’re rational, articulable, and they were all in place before Deutsch told me about his bizarre memory twitch.

What I likely underestimate is my own vulnerability to this technology. The psychological and neurological mechanisms that sort real from imagined—the ones Pillemer described, the ones Johnson mapped, the ones Piech felt activate while she watched herself in a city that doesn’t exist—don’t always check in with the part of you that might know better. They don’t consult your AI literacy or your PhD or your healthy skepticism about OpenAI’s privacy policy. They run underneath all of that, and they trust pictures.

The yoga class thing is amusing, but it truly is a simple question of whether my wife has a false memory, or I have forgotten something that really happened. I honestly have no idea which it is, and as confident as I am that I’m right, neuroscience suggests this confidence is unwarranted, and I accept that. There was a period, years later, when my wife and I had differing accounts of a consequential series of events. For a while, we couldn’t talk about the subject at all. Not a misremembered yoga instructor but a stretch of our life together that we had apparently lived through twice, once each, in parallel versions that couldn’t be reconciled. I won’t get into the details. This was somewhat different from the yoga instructor. Here, two people agreed they were present for the same events in real time, but experienced them differently (for all the reasons this happens—hunger, emotional state, past experiences, attention, etc). That difference in experience, in turn, led to different memories—the way we suppress certain details, cast an action in a different light. You could say one of us was remembering right and one of us was remembering wrong, but the reality is more complex. We were remembering things that happened the way memories are always made: subjectively and idiosyncratically. Memory, with time, is essentially storytelling. And with time, we settled into two separate narratives each replete with its own accompanying details. 

The hardest part wasn’t the disagreement itself, but what it took away: the quotidian ability to say Remember when? and have the other person nod. You don’t realize how much of a relationship runs on the shared archive, the stuff you can both reach for without negotiation until it’s not there. We got through it, but what we arrived at wasn’t closure, the way a judge would rule on the facts of the case. It was something richer and truer to the human experience: that another person’s memory and experience can diverge from yours, and that to love someone is to accept rather than be threatened by this divergence—that a perfectly duplicate shared memory bank is not a prerequisite for building a life together. 

My wife and I made our divergent narratives the old-fashioned way, with proximity and time and two brains encoding and decoding the same events differently. Sora did it in a closed loop between you and a screen. No one to push back, no one to say that isn’t what happened. By the time the memory enters a conversation with someone who actually shares your life, it has already hardened into something that feels like yours.

Some nights after our son is asleep, my wife and I sit on the couch and reconstruct the day for each other. What he said at breakfast, the weird thing he did with his yogurt spoon, whether the stalling tactics at bedtime were really that outlandish or whether we were both just tired. Sometimes we seem to disagree on the details, even if we were both there. She’ll point out something I didn’t notice, or I’ll interpret something we both noticed differently, or she’ll add a layer of interpretation by connecting his actions to similar actions the day before. The narrative shifts a little, adjusts a little to accommodate both of us, and by the time we’ve moved on we both begin to consolidate memories of something neither of us quite experienced—which, in the end, is the uncomfortable truth: Memory and experience are not synonymous. I used to think of this process as more akin to fact-checking, of sifting fact from embellishment, reality from interpretation. But it’s not quite that. It’s something more meaningful than checking facts: sitting there, remembering them together.


Tim Requarth is director of graduate science writing and research assistant professor of neuroscience at the NYU Grossman School of Medicine, where he studies how artificial intelligence is changing the way scientists think, learn, and write. He writes “The Third Hemisphere,” a newsletter that explores AI’s effects on cognition from a neuroscientist’s perspective. His essays and reporting have appeared in The New York Times, The Atlantic, and Slate, where he is a contributing writer.


Editor: Krista Stevens
Fact-checker: Julie Schwietert Collazo
Copyeditor: Cheri Lucas Rowlands


Helium Is Hard to Replace


The war in Iran and the subsequent closure of the Strait of Hormuz have unfortunately made us all familiar with details of the petroleum supply chain that we could formerly happily ignore. Every day we get some new story about some good or service that depends on Middle East petroleum and whose production has been disrupted by the war. Fertilizer production, plastics, aluminum, the list goes on.

One such supply chain that’s suddenly getting a lot of attention is helium. Helium is produced as a byproduct of natural gas extraction: it collects in the same underground pockets that natural gas collects in. Qatar is responsible for roughly 1/3rd of the world’s supply of helium, which was formerly transported through the Strait of Hormuz in specialized containers. Thanks to the closure of the strait, helium prices have spiked, suppliers are declaring force majeure, and businesses are scrambling to deal with looming shortages. (For many years the US government maintained a strategic helium reserve, but this was sold off in 2024.)

What I find interesting about helium is that in many cases, it’s very hard to substitute for. Helium has a unique set of properties — in particular, it has a lower melting point and boiling point than any other element — and technologies and processes that rely on those properties can’t easily switch to some other material.

Helium production

Helium is the second lightest element in the periodic table (after hydrogen), and the second most common element in the universe (also after hydrogen). But while helium is very common on a cosmic scale, here on earth it’s not so easy to get. Because helium is so light, it rises to the very top of the atmosphere, where it eventually escapes into space.1 So essentially all helium used by modern civilization comes from underground.

Helium is produced via the radioactive decay of elements like uranium and thorium, and it collects in underground pockets of natural gas. This source of helium was first discovered in the US in 1903, when a natural gas well in Kansas produced a geyser of gas that refused to burn. Scientists at the University of Kansas eventually determined that this was due to the presence of helium. Like petroleum, helium has collected in these pockets over the course of millions of years, and thus (like petroleum) there’s a limited supply of underground helium that can be extracted. As with petroleum, people are often worried that we’re running out of it.

Because helium is a byproduct of natural gas extraction, and because only some natural gas fields have helium in appreciable quantities, a small number of countries are responsible for the world’s supply of helium. The US and Qatar together produce around 2/3rds of the world’s helium supply. Russia, Algeria, Canada, China, and Poland produce most of the remaining balance.

Elemental helium has a few different useful properties. The most important one is that, thanks to the small size and completely filled outer electron shell of helium atoms, helium has a lower boiling point than any other element. Liquid helium boils at just 4.2 kelvin (-452 degrees Fahrenheit). By comparison, liquid hydrogen boils at 20 K, and liquid nitrogen boils at a positively balmy 77 K.

Its low boiling point makes helium very useful for getting something really, really cold. When a liquid boils, it transforms into a gas, and during this process it will pull energy from its surroundings due to evaporative cooling. This is why your body sweats: to cool you down as the liquid evaporates. When a liquid has a very low boiling point, this heat extraction happens at a very low temperature. Helium also stays a liquid at much lower temperatures than other elements. Nitrogen freezes solid at 63 K, and hydrogen freezes at 14 K, but at atmospheric pressure helium stays a liquid all the way to absolute zero. If you need to cool something to just a few degrees above absolute zero, liquid helium is essentially the only practical way to do that.

Helium also has a few other useful properties. As we noted, helium is very light: it will naturally rise in the atmosphere, which makes it useful as a lifting gas. Thanks to its filled outer electron shell, it is inert, and won’t react with other materials. Helium also has high thermal conductivity — at room temperature, helium can move heat about six times better than air.

The uses of helium

The world uses around 180 million cubic meters of helium each year. (This sounds like a lot, but it’s just 0.11% of the 159 billion cubic meters of nitrogen the world uses each year, and 0.004% of the over 4 trillion cubic meters of natural gas that the world uses each year.) But while it’s not used in enormous quantities compared to some other gases, helium is nevertheless quite important. Different industries make use of helium’s properties in different ways, and while in some cases there are reasonable substitutes for helium, in most cases helium has no practical replacement.

MRI machines

Some of the biggest consumers of helium are MRI machine operators, which consume around 17% of the helium used in the US. MRI machines work by creating very strong magnetic fields, which change the orientation of hydrogen atoms in tissues in your body. A pulse of radio waves is then sent into your body, which temporarily disrupts this orientation. When the pulse stops, different types of tissue return to their alignment with the magnetic field at different rates, and that rate of change can be measured and converted into a picture of the interior of the body. The strong magnetic fields in MRI machines are created by superconducting magnets: when some materials get cold enough, they drop to zero electrical resistance, which makes it possible to put enormous amounts of electrical current through them and create extremely strong magnetic fields.2 The vast majority of MRI machines used today use superconducting magnets made from niobium-titanium (NbTi), which becomes superconducting at 9.2 degrees above absolute zero. This is well below the boiling point of any other coolant, making liquid helium the only practical option for cooling the magnets. A handful of MRI machines have been built using higher-temperature superconductors that don’t require helium cooling, but the vast majority of the 50,000 existing MRI machines in the world require helium.

The helium consumption of MRI machines has fallen drastically over time. Early MRI machines would lose helium at a rate of around 0.4 liters per hour, requiring large tanks of 1000-2000 liters that needed to be refilled every few months. (It’s notoriously difficult to prevent gaseous helium from leaking out of containers, which is why helium is also often used for leak detection.) But modern MRI machines are “zero boil-off” and essentially never need to be recharged with helium. As these machines take up more market share, the helium requirements of MRI machines can be expected to fall. But for the foreseeable future, MRI will remain a substantial source of demand.
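
A quick back-of-the-envelope check of those boil-off figures, using only the rounded numbers in the paragraph above:

    # Early MRI machines lost roughly 0.4 liters of liquid helium per hour.
    boil_off_l_per_hr = 0.4
    monthly_loss = boil_off_l_per_hr * 24 * 30  # ~288 liters per month

    for tank_liters in (1000, 2000):
        months = tank_liters / monthly_loss
        print(f"{tank_liters} L reservoir lasts about {months:.1f} months")

    # 1000 L reservoir lasts about 3.5 months
    # 2000 L reservoir lasts about 6.9 months -- i.e., refills every few months.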

Semiconductors

Another major consumer of helium is the semiconductor industry, which uses around 25% of the helium worldwide, and around 10% of the helium in the US.3 As with MRI machines, helium is used to cool superconducting magnets, which are used to increase the purity of silicon ingots grown using the Czochralski method. Helium is also used as a coolant in some production processes, as well as a non-reactive gas to flush out some containers, for leak detection, and for a variety of other uses. A 2023 report from the Semiconductor Industry Association noted that helium was used “as a carrier gas, in energy and heat transfer with speed and precision, in reaction mediation, for back side and load lock cooling, in photolithography, in vacuum chambers, and for cleaning.” The same report notes that for many of these uses, helium has no substitute.

Unlike MRI machines, which have used less and less helium over time, helium usage in the semiconductor industry seems to be trending up: some sources claim that helium consumed by the semiconductor industry is expected to rise by a factor of five by 2035. This seems to be in part due to the development of DUV and EUV semiconductor lithography machines, which require helium to function. Unlike many other gases, helium absorbs almost no EUV radiation, which (as I understand it) makes it hard to substitute for helium in EUV machines.

Fiber optics

Helium is also used in the manufacturing of fiber optic cable. Optical cable is made with an inner core of glass, surrounded by an outer “sleeve” of glass with a different index of refraction. This keeps photons within the inner core via the phenomenon of total internal reflection. During the manufacturing process, helium is used as a coolant when the outer “sleeve” is being deposited onto the core — with any other atmosphere, bubbles form between the two layers of glass. Roughly 5-6% of helium worldwide is used for the production of optical fiber, and there’s no known alternative.

Purging gas

Beyond semiconductor manufacturing, other industries (particularly the aerospace industry) use helium as a “purge gas” to clean out containers. Cleaning out a tank of liquid hydrogen, often used as a liquid rocket fuel, requires a gas with a boiling point low enough that it won’t freeze when it contacts the hydrogen. Cleaning a tank of liquid oxygen doesn’t require a gas with quite as low a boiling point, but it is best to use an inert gas to reduce the chance of it reacting with the highly reactive oxygen. Aerospace purging makes up around 7% of US helium consumption. Around half of that is used by NASA, which is the single biggest user of helium in the US.

Lifting gas

Because helium is lighter than air, it’s also used as a lifting gas in balloons and lighter-than-air airships as an alternative to the highly flammable hydrogen. Each Goodyear Blimp, for instance, uses around 300,000 cubic feet of helium. Around 18% of the helium consumed in the US is as a lifting gas.

Scientific research and instruments

Helium is also widely used in scientific research. Much of this is for keeping things cold: superconducting magnets, such as those used in the Large Hadron Collider, typically require helium, as do the superconducting elements in SQUIDs, which are highly sensitive magnetic field detectors. Helium is also used in mass spectrometers, which are used for, among other things, detecting microscopic leaks in containers.

This is a major category of use in the US; roughly 22% of its helium consumption goes to “analytical, engineering, lab, science, and specialty gases.”

Welding

In the US, helium is also used for welding: its high thermal conductivity and its inertness make helium an excellent shielding gas, which prevents the pool of molten metal from being contaminated before it cools. In the US, welding makes up roughly 8% of helium use, but elsewhere in the world, it’s more common to use other shielding gases like argon.

Diving

Helium is also used for breathing gas in deep sea commercial diving. At depths beyond 30 meters, breathing nitrogen (which makes up 78% of normal air) causes nitrogen narcosis, and diving beyond this depth is done using gas mixes that replace part of the nitrogen for helium. Roughly 5% of helium consumed in the US goes towards diving.

It’s also difficult to find a substitute for helium in diving. Virtually every other breathable gas, except possibly neon, causes some degree of narcosis, and neon is heavier than helium, making breathing more difficult.

Conclusion

For some of these applications, it’s possible to substitute other materials for helium. There are other shielding gases, such as argon, that can be used for welding, and other lifting gases, such as hydrogen, that can be used for balloons or airships. In other applications, it’s possible to dramatically reduce helium consumption via recycling systems or other conservation methods. As we’ve noted, this has occurred with MRI machines, where modern ones use far less helium than their predecessors, and it seems to have happened with aerospace purging as well. A 2010 report from the National Academies of Sciences noted that if NASA and the Department of Defense were sufficiently motivated, they could sharply cut their helium consumption by recycling it. Since then, aerospace use of helium has fallen from 18.2 million cubic meters (26% of total US consumption) to 4 million cubic meters (7% of total US consumption). But the United States Geological Survey notes that most helium in the US still goes unrecycled, and there remains plenty of opportunity to cut usage with recapture and recycling systems, many of which can reduce helium consumption by 90% or more.
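
Taking the quoted figures at face value, a little arithmetic shows that the aerospace drop is a genuine reduction rather than an artifact of a shrinking total:

```python
# Each (volume, share) pair implies a total US helium consumption.
then_m3, then_share = 18.2e6, 0.26  # earlier figure, around the 2010 report
now_m3, now_share = 4.0e6, 0.07     # recent figure

print(f"implied US total then: {then_m3 / then_share / 1e6:.0f}M m^3")  # ~70M
print(f"implied US total now:  {now_m3 / now_share / 1e6:.0f}M m^3")    # ~57M
print(f"aerospace reduction:   {1 - now_m3 / then_m3:.0%}")             # ~78%

# Aerospace purging fell ~78% in absolute volume while total US
# consumption fell far less, so the decline reflects real recycling and
# conservation, not merely a smaller overall market.
```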

But “reducing” doesn’t mean “eliminating,” and it’s interesting to me how in so many cases there doesn’t seem to be any good substitute for helium.

¹ Though thanks to circulation in the air, the helium concentration below the turbopause is roughly constant, about 5 parts per million.

² If the magnets get too warm, the sudden loss of superconductivity, called a “quench,” can damage or destroy the magnets due to the heat generated from the now-present electrical resistance.

³ I estimated this by subtracting the 5–6% of helium used globally by the fiber optic industry from the 15% of helium used by “semiconductors and fiber optics” in the United States Geological Survey report on helium.




Saturday Morning Breakfast Cereal - Spheres Part 1

Hovertext: Does this change my imaginary Erdős number to a complex number?


A Day in the Life of an Enshittificator

From the Norwegian Consumer Council, a funny video that warns against the dangers of enshittification. It’s part of their Breaking Free initiative:

Digital products and services are steadily becoming worse. Software becomes increasingly difficult and frustrating to use, websites and apps are littered with ads and spam content, and useful features are removed, degraded, or made subscription-only. This is part of a process called enshittification.

Enshittification happens in stages: First a company attracts users by providing a valuable service, often seemingly for free or at an artificially low price. The company then exploits those users to draw in business customers, and finally abuses its business customers and claws back all the value for itself and its shareholders.

Enshittification is the result of a dysfunctional market, where companies have been able to get away with mistreating and exploiting consumers. Consumers are trapped in digital services, potential competitors are shut out, and policymakers and regulators are unable or reluctant to clamp down on anticompetitive, illegal and otherwise abusive behavior. In practice, a handful of tech companies have become so powerful that they do not have reason to fear any consequences.


The Use of LLM Chatbots and Human Cognitive Surrender

What is the long-term effect of using LLM chatbots for daily tasks? According to a study (DOI link) by Steven D. Shaw and Gideon Nave of the University of Pennsylvania, the observable effect is one of ‘cognitive surrender’, in which users blindly accept the generated answers.

There has long been a struggle between those who feel that it’s fine for humans to rely on available technologies to make tasks like information recall and calculations easier, and those who insist that a human should be perfectly capable of doing such tasks without any assistance. Plato argued that reading and writing hurt our ability to memorize, and for the longest time it was deemed inappropriate for students to even consider taking one of those newfangled digital calculators into an exam, while now we have many arguing that using an ‘AI’ is the equivalent of using a calculator.

Yet as the authors succinctly point out, there’s a big difference between a digital calculator and one of these LLM-powered chatbots in how they affect human cognition, and it’s one that’s worth thinking about for yourself.

Surrender Versus Offloading

Cognitive offloading is the practice of shifting cognitive tasks to external aids, and it is thought to make learning complex tasks easier. In contrast to rote memorization of facts like dates of events and formulas, if we consider books to be an external memory store, we can offload such precise memorization to their pages and require of students only that they can efficiently find information and judge it on its merits.

An often misquoted anecdote here pertains to Albert Einstein, who was once asked why he couldn’t cite the speed of sound from memory. To this he responded with a curt:

[I do not] carry such information in my mind since it is readily available in books. …The value of a college education is not the learning of many facts but the training of the mind to think.

Einstein is making the case for the benefits of cognitive offloading. Rote memorization does not enhance one’s cognition, and the ability to solve complicated equations and sums without so much as pen and paper is fairly irrelevant when a slide rule or digital calculator can offload all that work. As a bonus, these devices tend to be more precise, faster, and more accessible.

It is still important to have a ‘feeling’ for whether a calculation is correct, and one should never assume that what is written in a book is the absolute truth. That is the key difference between cognitive offloading and “cognitive surrender”: if you type numbers into your calculator, notice that the result seems off, and re-enter them to be sure, that’s cognitive offloading. If you don’t bother with the sniff test, that’s cognitive surrender.

So are we using LLM chatbots as reference sources that we’ll think twice about, or is it something more?

External Cognition

In the referenced study, Shaw et al. had three groups of volunteers take a standardized test: one group had to rely purely on their own wits, a second group could use an LLM chatbot that gave correct answers, and a third group also had access to a chatbot, but one that gave wrong answers.

System 3 facilitates cognitive surrender. (Credit: Shaw et al., 2026)

Perhaps unsurprisingly, the test subjects used the chatbot quite a lot when it was available, with predictable results. In the ‘tri-system theory of cognition’ that Shaw et al. propose in the paper, the external cognitive system (‘System 3’) is the chatbot, whose output was clearly accepted verbatim by a significant share of the test subjects. If the chatbot’s output is correct, this is great; when it’s not, the test results suffer massively.

Where this is worrisome outside of such a self-contained test is that people are exposed to endless amounts of faulty LLM-generated text, for example in the form of the ‘AI summaries’ that search engines love to put front and center these days. Back in 2024, Avram Piltch over at Tom’s Hardware compiled an amusing collection of such faulty outputs, some of which are easier to spot than others.

The examples range from the health effects of eating nose pickings, to the speed difference between USB 3.2 Gen 1 and USB 3.0, to classics like adding Elmer’s glue to pizza sauce. For claims like these, it’s generally possible to find where on the internet they were scraped from for the LLM’s dataset, while other faulty output is simply due to an LLM not possessing any intelligence, or essentials like a grasp of context.

Meanwhile, other output consists of clear confabulations, a fact which ought to be obvious to any intelligent human being, and yet much of it seems to pass whatever sniff test occurs within the cognitive capabilities of the average person.

Making Decisions

Anterior cingulate gyrus. (Credit: BodyParts3D, Wikimedia)

In the generally accepted model of cognitive decision making we see two internal systems: the first is fast, intuitive, and emotion-driven; the second is deliberate and analytical. The second tends to take a backseat to the first, but could be said to check the first’s homework.

Although psychology is hardly an exact science, in the fields of systems neuroscience and cognitive neuroscience we can find evidence for how decisions are made in the primate brain – including the human brain – with various cortices involved in the decision-making process. Fascinating here is the activity observed in the parietal cortex, where a decision is not only formed but also apparently assigned a degree of confidence.

Lesions in the anterior cingulate cortex (ACC) have been linked to impaired decision making and the emergence of impulse-control issues, as the ACC appears to be instrumental in error detection. Issues in the ACC are thus more likely to let faulty or flawed decisions and judgements pass uncorrected. Incidentally, the ACC was found to be heavily affected by environmental tetraethyl lead contamination, underpinning the theory that leaded gasoline was responsible for a surge in crime until the additive was discontinued.

With these findings in mind, we can rather confidently state that the emergence of LLM-based chatbots does not really add anything new, although it could be said to worsen existing flaws in the primate brain’s decision making. It looks like we give up our oversight role when LLMs are involved.

Irrational

Of course it’s not just LLMs. One could comfortably argue that things like politics, idols, religion, and advertising exist precisely because people are not completely rational beings whose cognitive processes belong entirely to themselves; none of them could exist otherwise.

Still, it seems that LLM-based chatbots, with their often very convincing human-like and authoritative outputs, have hit the same weaknesses that unscrupulous religious leaders and scammers exploit, with sometimes tragic consequences. Although believing some factual misinformation generated by a chatbot is a far cry from taking fatal actions based on a dialog with one, it highlights the importance of retaining your critical thinking skills.

While we can generally trust a calculator, an LLM-based chatbot is not nearly as reliable or benign. Caution and awareness of the risk of cognitive surrendering are thus well-warranted.
