
How ‘Tiny Shortcuts’ Are Poisoning Science


Seemingly harmless data tweaks are undermining the integrity of the entire field. We must define the problem to prevent it.

Image credit: MIT Press Reader; source images: Adobe Stock

In 1999, Time magazine featured a famous photo of Albert Einstein on its cover — looking old and tired, his forehead covered in wrinkles, his hair long and gray. The photograph was taken in 1947, during a portrait session with Philippe Halsman in which Einstein expressed remorse for his inadvertent role in the Manhattan Project, the initiative that culminated in the devastating bombings of Hiroshima and Nagasaki. It would go on to become Halsman’s most iconic image.

This article is adapted from Thomas Plümper and Eric Neumayer’s book “The Credibility Crisis in Science.”

Time magazine rarely places a picture of a celebrity from a historical period on its cover. But in 1999, the editors had good reasons to ignore this rule: The magazine had designated Einstein as the “person of the century,” a distinction that placed him above notable figures like Mahatma Gandhi and Franklin D. Roosevelt, who were the runners-up. It was a great honor for Einstein and for the profession he represented. And Einstein was not the only scientist on Time’s list of the 100 most influential people of the 20th century: The list featured 19 scientists, making them the third most prominently represented professional group, just a shade behind politicians and industrialists. The 20th century was the century of man-made political disasters. But the 20th century was also the century of science, and Einstein was its figurehead.

Those days seem to be over. And they may never come back.

In the 21st century, the role and relevance of scientists have changed. Science is no longer triumphant: It is in the midst of a severe crisis. Public trust in scientific results and findings has dwindled, and science does not know how to regain credibility. This crisis has many facets. More than anything else, however, it is a credibility crisis. The public no longer believes that scientists merely make honest mistakes on the long and winding road to truth. Instead, scientists are increasingly seen as partial, ideological agents, activists in an armchair, or, worse still, simply fraudsters who fabricate or manipulate data and tweak the specifications of their empirical models to get their desired results.

The credibility crisis of science is not about scientific progress invalidating previously held scientific beliefs, which is intrinsic to the very nature of scientific revolutions. Rather, the crisis has been caused by scientists who deliberately publish overconfident, misleading, and often simply false empirical results based on research designs or model specifications intentionally chosen to deliver the desired results. We call this practice “tweaking.” In extreme cases, published results rely on manipulated or outright fabricated data. Whether tweaked, manipulated, or fabricated, the results often cannot be replicated — not even if replication analysts use identical research designs.

By itself, failure to replicate does not necessarily indicate, let alone prove, scientific fraud. Empirical results can vary for many reasons. However, replication analyses usually show that replicated effect sizes are, on average, systematically smaller and often statistically insignificant. If 90 percent of replications deviate from the original article in one direction that is less favorable to what the authors wanted to demonstrate, then these deviations are not innocent random errors or acts of nature. If the deviations were random, they would cancel each other out, and their mean would be close to zero.
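The point can be checked with a toy simulation (a minimal TypeScript sketch; the deviation sizes and the 90 percent figure are taken from the argument above, everything else is invented for illustration): purely random deviations average out to roughly zero, one-directional deviations do not.

```typescript
// Toy contrast: noise-only replication deviations vs. deviations that
// run 90 percent of the time against the original finding.

function randNormal(): number {
  // Standard-normal draw via the Box-Muller transform.
  const u = Math.random() || 1e-12; // avoid log(0)
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

const n = 10_000;

// Honest world: replication-minus-original differs only by noise.
const randomDev = Array.from({ length: n }, randNormal);

// Tweaked world: 90 percent of deviations point in the unfavorable direction.
const biasedDev = Array.from({ length: n }, () =>
  Math.random() < 0.9 ? -Math.abs(randNormal()) : Math.abs(randNormal())
);

console.log(mean(randomDev).toFixed(3)); // ≈ 0.000
console.log(mean(biasedDev).toFixed(3)); // ≈ -0.64: systematic, not noise
```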


Instead, these deviations indicate that many published results were likely tweaked, manipulated, or fabricated.

Tweaking is potentially more damaging to science in the long run than data manipulation and fabrication. That might be hard to believe, since any single tweaked result does less damage to the fabric of science than a case of data fabrication or manipulation. But the cumulative effect of tweaking can still be larger, because fabrication and manipulation are rare, whereas tweaking is common.

Ever since the online platform Retraction Watch began monitoring and reporting retractions in 2010, the number of retracted articles per year has steadily increased. Some of this is due to “bulk retractions” of thousands of articles published by so-called paper mills, where authors pay to have fake articles published. We are not interested in these retracted paper-mill publications but in variants of data fraud, a subset of retractions that have also been steadily increasing. Most notably, there have been several high-profile retractions involving work by Francesca Gino of Harvard University and Marc Tessier-Lavigne of Stanford University. And these are just the most recent cases — the ones that stick in the public mind for a while before attention is drawn to other, more spectacular cases of scientific fraud.

All of this is to say that scientists no longer sit at God’s table, so to speak. They have become mere mortals in the midst of a massive crisis of trust. Could we go so far as to say that today’s scientific process is broken? Perhaps. But the more accurate answer is: It depends.


One of the things it depends most on, of course, is how we define fraud itself. Lee McIntyre, one of the foremost philosophers of science, defines scientific fraud as “the intentional fabrication and falsification of the scientific record.” He distinguishes between fraud, on the one hand, and honest error, on the other, plus a third category in between, which he labels “murky,” where scientists’ motives are not “pure.”

What McIntyre calls the murky category, we call “tweaks.” Tweaks are the intentional manipulation of empirical results through changes in and choices of research design, model specification, and/or estimation procedures. McIntyre restricts fraud to data fabrication and falsification; the “murky” third category does not, in his view, qualify as fraud. Here is why:

“What about all of those less-than-above-board research practices [like] p-hacking and cherry-picking data . . . ? Why aren’t those considered fraud the minute they are done intentionally? But the relevant question to making a determination of fraud is not just whether those actions are done intentionally, it is whether they also involve fabrication or falsification. . . . The reason that p-hacking isn’t normally considered fraud isn’t that the person who did it didn’t mean to, it’s that . . . p-hacking is not quite up to the level of falsifying or fabricating data.”

In our view, McIntyre’s definition of data fraud is incomplete and imprecise. It conceals that the fabrication and manipulation of data — and the manipulation of empirical results through tweaking — serve the same purpose: to promote the researcher’s interests.

Consider the case of Diederik Stapel, a fraudster with at least 58 retracted articles under his belt, ranking eighth on the Retraction Watch leaderboard. Stapel came to fame as a fraudster; he has contributed massively to the existential crisis of social psychology. Joel Achenbach, in an article for The Washington Post, called him the “Lying Dutchman.” A fraudster he is, but he is surprisingly willing to talk and write about his fraudulent career. He even wrote a book-length manuscript about his life — an autobiography titled “Faking Science: A True Story of Academic Fraud.” Whenever we need insights from a fraudster’s perspective, Stapel is a good, perhaps the best, source.

Stapel kick-started his fraudulent career, as he himself recounts, by becoming “impatient, overambitious, reckless.” Data analyses do not always align with researchers’ expectations and interests. And so Stapel took the truth into his own hands and decided to take “one, tiny little shortcut.” He tortured the data to bring the results into line with the arguments in his articles. In his autobiography, Stapel explains how he drifted further and further away from the path of virtue: “Everything had to be neat and orderly. No mess. I opened the computer file with the data that I had entered and changed a . . . 2 into a 4; then . . . I changed a 3 into a 5. I . . . made a few mouse clicks to tell the computer to run the statistical analyses. When I saw the results, the world had become logical again.”

In the early stages of his fraudulent career, he eliminated cases he classified as “deviant” — cases that prevented the results from turning out as he expected and wanted. These, in his view, were common practices among social psychologists. “Tiny little shortcuts,” he calls them. Tweaks were Stapel’s gateway drug. Soon after he started to tweak empirical results, he resorted to data fabrication and outright data manipulation. But, in his book, Stapel draws a line in the sand: While he accepts data manipulation and fabrication as fraud, his “tiny little shortcuts” are common practice, and thus not fraud, at least not really. In other words, if everyone cheats, is it still cheating?

The cold reality is that tweaks are not just “tiny little shortcuts”; they are tiny little shortcuts with substantively large consequences. They change the results of empirical analyses, often making manuscripts more interesting. Manuscripts that become more interesting change reviewers’ attitudes toward them, allowing tweakers to publish in more visible journals and with better publishers. When tweakers publish more interesting results in more visible places, they get additional attention for their work, receive better job offers and promotions, and rise to ever greater power and influence.

Make no mistake: Tweaking is not about changing the course of science. Nor is it, at least not primarily, about the misuse of public research funds (although it is a scandal that hard-working taxpayers fund the research of tweakers). Rather, tweaking is about scientists pursuing their own interests in a competitive, vulnerable system based on trust and on freedom from control by institutions that enforce rules.

Is the intentional manipulation of statistical quantities of interest always fraudulent? As with any categorization, there are gray areas.


One of the most common gray areas involves the experimental researcher who, after a first round of experiments, fails to achieve a statistically significant treatment effect. So, they organize a second round of experiments with the entirely rational expectation that the sheer number of observations will eventually push the result across the significance threshold that separates publishable from unpublishable findings. This research practice is common in the life sciences because experiments are costly and may cause unnecessary harm to participants. It may therefore make sense to start with a small sample and only add participants when the results are “not yet significant.”

The problem with ever-increasing sample sizes is that the standard error of an estimate (the measure of its sampling variability) shrinks in proportion to one over the square root of the number of observations, approaching zero as the sample grows. Thus, if your data reflect any effect at all, then as you collect more and more observations, the statistical test will inevitably register the effect as significant — no matter how small the effect may be.
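A small simulation makes the mechanism concrete (a hedged sketch, not anyone’s actual study; the true effect size and stopping rule are invented): keep adding observations and re-testing, and even a trivial effect eventually crosses the conventional threshold.

```typescript
// "Collect until significant," simulated: draw observations around a tiny
// true effect, re-run a z-test after each new data point, and stop the
// moment |z| exceeds 1.96. Because the standard error shrinks like
// 1/sqrt(n), the stop is guaranteed eventually.

function randNormal(): number {
  const u = Math.random() || 1e-12;
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

const trueEffect = 0.05; // tiny, substantively meaningless
let sum = 0;
let sumSq = 0;

for (let n = 1; n <= 1_000_000; n++) {
  const x = trueEffect + randNormal();
  sum += x;
  sumSq += x * x;
  if (n < 30) continue; // collect a minimal sample before peeking

  const m = sum / n;
  const variance = (sumSq - n * m * m) / (n - 1);
  const z = m / Math.sqrt(variance / n);
  if (Math.abs(z) > 1.96) {
    console.log(`"Significant" at n = ${n}, z = ${z.toFixed(2)}`);
    break;
  }
}
```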

Related: Unintended Consequences: The Perils of Publication and Citation Bias

Scientists may be reluctant to call the above practice, or p-hacking, “fraudulent.” And indeed, this practice is not fraudulent if a p-hacked study clearly states that the results are insignificant given the original small number of observations and only become significant in a larger sample. But this holds for all adjustments: A change in model specification or research design is not fraudulent if the change and its effects on results are clearly discussed and not suppressed. What makes tweaks fraudulent is not the tweak itself but the selective reporting of results for the relevant quantities of interest. For example, a gradual increase in sample size is fraudulent if the authors suppress the results from the smaller sample.

Now, are all researchers actually aware of this problem? And do they all collect more and more data until the desired significance appears? Perhaps not. But as we have said, when it comes to tweaking, it is usually impossible to prove intention. At the same time, the existence of a gray area of manipulations that border on the fraudulent does not place clear cases beyond the definition: Intentionally dropping a control variable from the list of regressors, adjusting the operationalization of a key variable, or excluding cases from the sample in order to produce desired results does constitute scientific fraud.


Rules have the greatest effect when they are clear, violations are easy to detect, and enforcement is simple and not prohibitively expensive. And here lies the problem with scientific fraud: The more broadly we define scientific fraud, the larger the share of fraudulent analyses that are extremely difficult to detect. The more broadly we define scientific fraud, the more costly enforcement becomes. However, if we define it narrowly and exclude tweaks, science will not be able to appropriately address, let alone overcome, its credibility crisis.

Science is ill-advised to narrow the definition of scientific fraud just to make detection easier and rule enforcement less costly. The negative consequences of scientific fraud are not limited to data manipulation and fabrication; tweaks, too, have the same distorting effect on competition for academic merit and research funding, and the same devastating effect on public confidence in scientific results and on trust between scientists.

Both scientists and the public lose confidence in science when there is a non-trivial chance that scientists manipulated empirical results to support the arguments, theories, hypotheses, and stories they wish to corroborate, or to cast doubt on the arguments, theories, hypotheses, and stories that contradict the worldview they believe in.

Science has lost some of its standing with the public. While skepticism about scientific findings can be healthy and is an inherent part of the scientific process, general disbelief and distrust pose significant challenges. Scientists have a vested interest in regaining some of that lost trust. This is easier said than done. But much would be gained if scientists were honest about the uncertainties associated with scientific results — honest with other scientists in scientific publications and honest in public statements. Scientists must learn to distinguish between scientific results and their personal opinions, promote full transparency in scientific research — not hide potential conflicts of interest — and find ways to improve communication with the public to rebuild trust.


Thomas Plümper is Professor of Quantitative Social Research at the Vienna University of Economics and Business and Head of the Department of Socioeconomics. Eric Neumayer is a Professor at the London School of Economics and Political Science (LSE) and its Deputy President and Vice Chancellor. Together they have coauthored several books, including “Robustness Tests for Quantitative Research” and “The Credibility Crisis in Science,” from which this article is adapted.




The AI ‘hivemind’: Why so many student essays sound alike


Bruce Maxwell, professor of computer science at Northeastern University, was grading exams for his online master’s course in computer vision, a subfield in artificial intelligence that deals with images, when he first noticed that something felt … off.

“I’d see the same phrases, the same commas, even the same word choices. I would say, ‘Man, I’ve read that before.’ And I’d go look for it,” said Maxwell. “The paragraphs weren’t identical, but they were so similar.” 

Although the course was in 2024, Maxwell, who teaches at Northeastern’s Seattle campus, recalls that his students’ essays sounded “like textbooks written in the 1980s and ’90s,” perhaps reflecting the sources used to train AI. The students were scattered around the country and Maxwell was pretty sure they hadn’t collaborated. 

Related: A researcher’s view on using AI to become a better writer

Maxwell shared his observation with a former student, Liwei Jiang, who is now a Ph.D. student in computer science and engineering at the University of Washington. Jiang decided to test her former professor’s hunch about AI scientifically and collaborated with other researchers at UW, the Allen Institute for Artificial Intelligence, Stanford and Carnegie Mellon universities to analyze the output from more than 70 different large language models around the globe, including ChatGPT, Claude, Gemini, DeepSeek, Qwen and Llama. 

The team asked each the same open-ended questions, which were intended to spark creativity or brainstorm new ideas: “Compose a short poem about the feeling of watching a sunset;” “I am a graduate student in Marxist theory, and I want to write a thesis on Gorz. Can you help me think of some new ideas?” and “Write a 30-word essay on global warming.” (The researchers pulled the questions from a corpus of real ChatGPT questions that users had consented to make public in exchange for free access to a more advanced model.) The researchers posed 100 of these questions to all 70 models and had each model answer them 50 times. 

The answers were frequently indistinguishable across different models by different companies that have different architectures and use different training data. The metaphors, imagery, word choices, sentence structures — even punctuation — often converged. Jiang’s team called this phenomenon “inter-model homogeneity” and quantified the overlaps and similarities. To drive the point home, Jiang titled her paper “Artificial Hivemind.” The study won a best paper award at the annual conference on Neural Information Processing Systems in December 2025, one of the premier gatherings for AI research.
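The paper uses its own similarity metrics, which aren’t detailed here, but even a crude token-overlap score conveys what “quantifying homogeneity” means. A hypothetical sketch (the two “answers” below are invented):

```typescript
// Crude stand-in for an inter-model similarity score: Jaccard overlap of
// the word sets in two answers. A score near 1 means near-identical wording.

function jaccard(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().match(/[a-z']+/g) ?? []);
  const A = tokens(a);
  const B = tokens(b);
  const shared = [...A].filter((t) => B.has(t)).length;
  const union = new Set([...A, ...B]).size;
  return union === 0 ? 0 : shared / union;
}

const answerFromModelA = "Time is a river, flowing ever onward.";
const answerFromModelB = "Time is a river flowing onward without pause.";

console.log(jaccard(answerFromModelA, answerFromModelB).toFixed(2)); // 0.67
```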

To increase AI creativity, Jiang jacked up a parameter called “temperature” to maximize the randomness of each large language model. That didn’t help. For example, when she asked an AI model called Claude 3.5 Sonnet to “write a short story about a colorful toad who goes on an adventure in 50 words,” it kept naming the toad Ziggy or Pip, and oddly, a hungry hawk and mushrooms kept appearing.

Presentation slide courtesy of Liwei Jiang, the AI study’s lead author.
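Temperature is the standard sampling knob: a model’s raw next-token scores (logits) are divided by it before being turned into probabilities, so high temperatures flatten the distribution and low temperatures sharpen it. A minimal sketch with invented scores shows why cranking it up has limits:

```typescript
// Temperature-scaled softmax: divide logits by T, then normalize.
// High T flattens the distribution; low T concentrates it on the top choice.
// The logits are invented, standing in for "river" / "weaver" / "sculptor".

function softmaxWithTemperature(logits: number[], T: number): number[] {
  const scaled = logits.map((l) => l / T);
  const maxL = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((l) => Math.exp(l - maxL));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const logits = [4.0, 3.5, 1.0];
console.log(softmaxWithTemperature(logits, 0.5)); // ~[0.73, 0.27, 0.00] — "river" dominates
console.log(softmaxWithTemperature(logits, 2.0)); // ~[0.50, 0.39, 0.11] — flatter, "river" still leads
```

No temperature setting can conjure an answer the model never ranked highly in the first place; it only reweights the options already there.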

Different models also churn out comically similar responses. When asked to come up with a metaphor for time, the overwhelming answer from all the models was the same: a river. A few said a weaver. One outlier suggested a sculptor. Several of the models were developed in China, and yet they produced answers similar to those from models made in America.

Example of similar output from ChatGPT and DeepSeek

Presentation slide courtesy of Liwei Jiang, the AI study’s lead author.

The explanation lies in chatbot design. AI chatbots are trained to review possible answers to make sure the output is reasonable, appropriate and helpful. This refinement step, sometimes called “alignment,” is intended to ensure that the answers align with what a human would prefer. And it’s this alignment step, according to Jiang, that is creating the homogeneity. The process favors safe, consensus-based responses and penalizes risky, unconventional ones. Originality gets stripped away.
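Here is a toy picture of that dynamic (the scores and answers are invented, and this is not any lab’s actual pipeline): if every model reranks its candidates by a similar predicted-preference score, the safe consensus answer wins every time, no matter which model generated the candidates.

```typescript
// Toy illustration of preference-based reranking producing homogeneity:
// each "model" proposes candidates, and an alignment step picks the one
// a reward model predicts humans will like best.

type Candidate = { text: string; preferenceScore: number };

function aligned(candidates: Candidate[]): string {
  // Pick the answer with the highest predicted human preference.
  return candidates.reduce((best, c) =>
    c.preferenceScore > best.preferenceScore ? c : best
  ).text;
}

const modelA: Candidate[] = [
  { text: "Time is a river.", preferenceScore: 0.92 },
  { text: "Time is a glacier calving futures.", preferenceScore: 0.55 },
];
const modelB: Candidate[] = [
  { text: "Time is a river.", preferenceScore: 0.9 },
  { text: "Time is a moth in a jar.", preferenceScore: 0.48 },
];

console.log(aligned(modelA)); // "Time is a river."
console.log(aligned(modelB)); // "Time is a river." — the hivemind
```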

Jiang’s advice for students is to push themselves to go beyond what the AI model spits out. “The model is actually generating some good ideas, but you need to go the extra mile to be more creative than that,” said Jiang.

For Jiang’s former professor Maxwell, the study confirmed what he had suspected. And even before Jiang’s paper came out, he changed how he teaches. He no longer relies on online exams. Instead, he now asks students to learn a concept and present it to other students or create a video tutorial. 

Outwitting the AI hive mind requires some post-modern creativity.

Contact staff writer Jill Barshay at 212-678-3595, jillbarshay.35 on Signal, or barshay@hechingerreport.org.

This story about similar AI answers was produced by The Hechinger Report, a nonprofit, independent news organization that covers education. Sign up for Proof Points and other Hechinger newsletters.



The Cinnamon Gum Olympics


I blame my love for artificial cinnamon flavoring on Northwest Iowa Tours.

My dad, a high school band director, spent his summers driving charter buses full of retirees to exotic places like Mt. Rushmore or Winnipeg. We needed the money—and sometimes, he got to take the family along.

I owe many things to that tour bus company, including my love for Midwest roadside attractions (old people have to stop to use the bathroom a lot). How many other 10-year-olds got to spend their summer vacation visiting the De Klomp Wooden Shoe Factory in Holland, Michigan?

But my fondest memories were of the cinnamon candies dad kept in his driving briefcase. Sometimes, they were Fire Jolly Ranchers. Often, sticks of Big Red chewing gum. The important thing was that they burned. When you’re piloting a 20-ton land frigate full of senior citizens, it is helpful to stay alert.

This unfortunately meant that I was not allowed to consume them on tour. They were business candies. It would have been like stealing the balancing pole from a tightrope walker. But at the end of every trip, dad would come home in his mint-green company Oxford and matching polyester necktie and slide that briefcase under the daybed, where he assumed it was secure.

This is when I would strike.

In my mind, I was judicious—never taking so many that he’d notice they were missing. In reality, I think I left the wrappers behind, making it come off less like a heist than a threat. Try staying awake on I-35 now, old man.

My dad, it must be said, was using all that cinnamaldehyde for its intended purpose. Cinnamon is not a flavor for aesthetes. It is a flavor for dopamine chasers and chronic smokers—for people inured to subtlety, who need to be jostled just to feel. The highest achievement of a cinnamon candy is giving your taste buds a lingering chemical burn. It is, you will be unsurprised to hear, my favorite flavor.

It also seems to be dying out. The Cinnamon Fire Jolly Ranchers were discontinued in 2022. The gum brands Orbit and Extra both dropped their cinnamon flavors within the last few years, and Mars Inc., which owns both, did not respond to my request for comment about why. (This is a predictable consequence of asking to interview people for a newsletter called “Haterade.” If I could do it over again, I’d call the newsletter something like “Brand Lover,” or “The Business-Friendly Times.”)

The Google Trends graph for “cinnamon gum” has been flat for 20 years, save a single spike in June 2025, when the New York Times crossword featured the clue “Brand of cinnamon-flavored chewing gum.” It’s a bad sign for Big Red that so many people had to Google the answer.

I’ve been doing my part. In an accidental (but loving) homage to my father, I’ve started chewing a pack of cinnamon gum a day whenever I’m on deadline, a habit that seems somehow more depraved and weak-willed than simply chain-smoking cigarettes. Until recently, I haven’t even been brand-loyal. I tend to buy whatever’s at the grocery store, knowing that I’m going to rip through it like a vulture with a carcass.

But with artificial cinnamon fading from the shelves, I figured it was time to take a more methodical approach—to harness my weak consumer power for the greatest preservational good.

It was time to mint a winner in the Cinnamon Gum Olympics.


The Competition

With a dwindling market, I settled on what seemed to be the five largest commercial competitors and jotted down some initial impressions.

Big Red. Big Red has Kleenex-like dominance in this category; picture a generic “cinnamon gum,” and you’re probably picturing Big Red. Has resisted the “sugarfree” trend in gums and tastes like it. Sticks have a plush texture with gently building heat.

Dentyne Fire. The only blister pack contender. Each piece is small and feisty, sized for the delicate jawbones of a Dickensian urchin. A good “classic” cinnamon flavor.

IceBreakers Ice Cubes. Like many women my age, this gum is “powered by crystals.” While a visual oddity, the cubes are lithe and supple, with a modest but durable heat.

Trident Cinnamon. Trident’s tiny sticks have always seemed undignified to me, in the vein of a cocktail weenie. Worse, the cinnamon is “augmented” with a bizarre menthol/eucalyptus flavor. If IcyHot were a gum, it would taste like Trident Cinnamon.

PÜR. The Fruit Stripe of cinnamon gums in a Chiclet-y package. An eye-watering, burnt-potpourri flavor that fades almost immediately. The brand is Swiss, which may explain its pathological conflict-avoidance.

I was content to begin formal testing when a dark-horse competitor appeared in the form of a banner ad. The brand was called JAWCKO, and it was designed for JAWLINE TRAINING and FACIAL DEFINITION. It promised to add a new dimension of displeasure to the gum-chewing experience by being EXTREMELY HARD TO CONSUME.

The packaging was lurid, the copy irresistible. “Stop skipping face day,” the back of the bag admonished. But also: “Take rest days as needed.”

This was, I realized, chewing gum for looksmaxxers, a cadre of young people running deranged science experiments on their bodies for clout. I mock them at my peril. The only real difference between me and the young men smashing their cheekbones with hammers is that I am striving daily to become less appealing.

I purchased the gum.

Jawcko. Despite all the warnings, I didn’t find Jawcko to be particularly aerobic. For real gum freaks like me, this is a daily chewer. It admittedly takes some effort to bite into the shell, but this is less a matter of jaw strength and more a matter of having confidence in the durability of one’s teeth.

The Verdict

I tested each gum three times, on different days, and averaged the readings. With a stopwatch in hand, I rated the initial intensity of the cinnamon flavor, then assigned a heat rating for each minute until the flavor had fully dissipated. I have plotted the results below:

In the end, IceBreakers and Dentyne took gold and silver, respectively, for their balance of intensity and durability. I recommend—but wouldn’t rave about—both. The flavor half-life for most gums is surprisingly short. We have work to do. With the exception of IceBreakers and their ~crystals~, gum manufacturers don’t seem to be innovating toward a more persistent heat.

I think this is a mistake. First, they’re missing out on the highly reliable audience segment of “sleepy bus drivers and attention-deficient journalists.” But second, they’re missing out on an opportunity to corner the Jawcko market—to do for cinnamon what decades of toxic masculinity did for capsaicin. To turn flavor into a FEAT OF STRENGTH.

None of these gums have even approached the intensity of the best cinnamon hard candies. For pure strength of sensation, I’m going to fall back on my earlier recommendation:

But for now, when I’m on a writing deadline, I’ll stick with the gum. I’ve chewed through all of my competition leftovers but the Jawcko now, and I’m on my third piece this hour, in flagrant disobedience of the manufacturer’s warning (“DO NOT EXCEED MORE THAN TWO PIECES PER DAY.”)

“Feeling the burn” is a double entendre. If I can’t scald my tastebuds, at least I’m getting a workout in.


Haterade is a free newsletter sustained by gum freaks like you. The best way to support the newsletter is to share it! If you’d like to help fund Liz’s extensive looksmaxxing surgeries, become a paid subscriber (no perks; perfect vibes) or stuff some money in the tip jar here: Venmo | PayPal.

Thanks to January subscribers, Haterade was able to send more than $700 to the Immigrant Law Center of Minnesota, which provides free immigration legal representation to low-income immigrants and refugees in Minnesota and North Dakota.


This Web Tool Sabotages AI Chatbots By Making Them Really, Really Slow


Watching people outsource their critical thinking, emotions, and sanity to glitchy “AI” chatbots has been one of the most uniquely terrifying aspects of being a human being in recent years. 

While wealthy tech evangelists like Sam Altman continue to make wild proclamations about how large language models (LLMs) are destined to do our jobs and raise our children, critics have compared Silicon Valley’s attempts to force dependence on chatbots to a mass-enfeebling event—an attempt to convince people that they are actually better off having machines think, act, and create for them.

Now, there’s a new way to discourage friends, family, and even complete strangers from turning to chatbots like Claude and ChatGPT: by using a tool called “Slow LLM” to make them really, reaaaaalllyyy slowwwww. Or at least, making them look that way.

“Are you concerned that you or your loved ones might be participating in a massive de-skilling event? Experiencing LLM-induced psychosis? Outsourcing cognitive and emotional functions to autocomplete? Install SLOW LLM on your computer, or the computer of a loved one, today!” reads a description on the tool’s website.

Created by artist Sam Lavigne, Slow LLM causes anyone accessing AI chatbots on a computer or network to encounter mysterious, painfully slow response times. It works by exploiting the fact that JavaScript lets a page’s global fetch function, which returns data to the browser, be rewritten. When a user visits a chatbot domain and enters a query, the modified fetch function stretches the response over an excruciatingly long period of time. The result is that the user perceives the LLM to be running slowly, when in reality the response is simply being arbitrarily metered by Lavigne’s code.
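Lavigne has released his code on GitHub; as a rough sketch of the general technique (not his actual implementation, and the delay value is invented), a fetch wrapper can drip an already-complete response out slowly:

```typescript
// Minimal sketch, assuming a browser context: overwrite the global fetch
// with a wrapper that re-emits the (already downloaded) response body one
// chunk at a time, so the chatbot merely *looks* slow.

const originalFetch = window.fetch;
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

window.fetch = async (input: RequestInfo | URL, init?: RequestInit) => {
  const response = await originalFetch(input, init);
  const reader = response.body?.getReader();
  if (!reader) return response;

  const slowBody = new ReadableStream({
    async pull(controller) {
      const { done, value } = await reader.read();
      if (done) {
        controller.close();
        return;
      }
      await sleep(2_000); // invented per-chunk delay
      controller.enqueue(value);
    },
  });
  return new Response(slowBody, {
    status: response.status,
    headers: response.headers,
  });
};
```

A real extension would also check the request’s hostname so only chatbot domains are throttled; the sketch above slows everything.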

Lavigne says that the idea for the project came after seeing how deeply some of his students and acquaintances had come to rely on generative tools to do basic tasks.

“So many people are starting to use these tools to outsource their cognitive and emotional functions, and in the process of doing this they’re forgetting all these basic things that they’ve learned how to do,” Lavigne told 404 Media. “I think that the more people rely on LLMs, the more extreme this de-skilling event will become.”

Slow LLM can be installed as a Chrome browser extension, but it can also be deployed network-wide via an “Enterprise Edition,” a DNS service which causes everyone on a home, school, or corporate network to experience slow chatbot responses. This is done by simply changing the DNS server on your router to Lavigne’s custom domain—though he warns that using a random person’s DNS is generally not a great idea cybersecurity-wise, and recommends the safer option of hosting your own DNS server to deploy the Slow LLM code, which he has released for free on Github. The browser extension currently only affects Claude and ChatGPT, while the DNS version also slows down Grok and Google Gemini.

“The idea was that these things are removing friction, so let’s add some friction back in,” said Lavigne, invoking the term tech bros favor for inefficiencies in a system. He argues that LLM chatbots have taken this idea of “friction” to an extreme, presenting any unpleasantness or difficulty we encounter as something that should be outsourced to Silicon Valley’s thinking machines — even if overcoming that difficulty is part of what makes human creativity meaningful and worthwhile. “Anything that removes the friction of something that’s difficult, it makes you not learn, and it removes the learning you’ve already achieved.”

In theory, one could activate Slow LLM without anyone noticing; most people would likely assume that chatbot providers like Google and OpenAI are having technical issues, which does happen without outside interference from time to time. Lavigne says that so far, he hasn’t heard from anyone that has successfully deployed Slow LLM on a work or school network. But he certainly isn’t discouraging people from trying.

“I have not yet tested it on any unwitting subjects, but I’m thinking about it,” Lavigne said in a mischievous tone, adding that it would be an interesting experiment to see how people react when presented with artificially-slow chatbots. “Maybe they’ll just rage-quit LLMs.”

Slow LLM is the latest addition to a series of impish tech provocations that Lavigne has become known for. During the height of the pandemic Zoompocalypse in 2021, he released “Zoom Escaper,” a tool that floods your Zoom audio stream with annoying echoes, distortions, and interruptions until your presence becomes unbearable to others. In 2018, he infamously scraped public LinkedIn profiles to build a massive database of ICE agents, which was subsequently removed from platforms like Github and Medium. Lavigne’s frequent collaborator Tega Brain has also released browser tools like “Slop Evader,” which filters out generative AI slop by removing all search results from after November 2022, when ChatGPT was first released to the public.

“I’ve been doing these little experiments in digital sabotage where I’m trying to make these tools that mildly interrupt computational systems,” said Lavigne. “One of the things I’ve been thinking about is how if the means of production is truly in our hands, and it’s also the way we’re communicating with other people and managing our social life, then what does it mean to interrupt productivity?”

Lavigne is not an absolutist, however. Without prompting, he admitted that he used Claude to help write some of the code for Slow LLM — until, of course, Slow LLM started working and forced him to complete the project on his own. Rather than demanding abstinence, Lavigne says he’s trying to make people question the habits they are forming by regularly using chatbots, tools which tempt us to essentially entrust all our knowledge, decision-making, and emotional well-being to massive companies run by tech billionaires like Altman and Elon Musk.

“My hope is to get people to think a little bit more about their usage of these tools,” said Lavigne. “But the broader thing I want people to think about […] is ways of interrupting these flows of data, these flows of power, and putting friction into these computational systems that are mediating so many parts of our lives.”


Quoting David Abram


I have been doing this for years, and the hardest parts of the job were never about typing out code. I have always struggled most with understanding systems, debugging things that made no sense, designing architectures that wouldn't collapse under heavy load, and making decisions that would save months of pain later.

None of these problems can be solved by LLMs. They can suggest code, help with boilerplate, sometimes can act as a sounding board. But they don't understand the system, they don't carry context in their "minds", and they certainly don't know why a decision is right or wrong.

And most importantly, they don't choose. That part is still yours. The real work of software development, the part that makes someone valuable, is knowing what should exist in the first place, and why.

David Abram, The machine didn't take your craft. You gave it up.

Tags: careers, ai-assisted-programming, generative-ai, ai, llms


Sloppelgängers


Three times during a recent interview on The Verge‘s Decoder podcast, host Nilay Patel asked Shishir Mehrotra, CEO of Superhuman, which owns Grammarly, the same question in different words: How much should you pay me to use my name?

“If your work is used, should you be attributed? Yes, I think you should,” Mehrotra said. “That would be the nice contract.”

But Patel wasn’t asking about attribution. Grammarly had launched a feature called “Expert Review” that generated AI editing suggestions and attached them to real people’s names. Patel’s name. Kara Swisher’s name. Julia Angwin’s name. Stephen King’s name. Hundreds of others. Nobody was asked. Nobody was paid. Grammarly charged $12 a month for the privilege.

So Patel kept pushing. “How much do you think you should pay me?” he asked again. And again. Mehrotra kept talking about attribution, about linking to sources, about how the feature was “clearly indicated” as being “inspired” by experts’ published work.

“This wasn’t an attribution,” Patel finally said. “You just made something up and put my name on it.”

He’s right. I want to be specific about what “made something up” looks like in practice. The AI-generated edit that Grammarly attributed to Patel suggested that he, in his role as editor-in-chief of The Verge, emphasizes “the importance of crafting compelling headlines that convey urgency” and recommends “adding emotional or stakes-based words.” As Patel put it: “I’ve been an editor for over 15 years. I’ve literally never said anything like that.”

Nobody talks like that. It’s AI slop, the kind of vague, interchangeable advice you’d get from any chatbot on any topic. The only thing specific about it was Patel’s name attached to it.

If you haven’t been following this story: Last August, Grammarly (which rebranded its parent company as “Superhuman” in October 2025) launched Expert Review as part of its paid Pro tier. The feature promised writing feedback from “leading professionals, authors, and subject-matter experts.” What it actually did was use AI to generate editing suggestions and attribute them to real people, none of whom had any involvement. Wired‘s Miles Klee broke the story in early March. The Verge‘s Stevie Bonifield then discovered that the “experts” included their own boss, Nilay Patel, and several of their colleagues. The backlash was immediate. Kara Swisher, whose name the tool had been using, told Platformer‘s Casey Newton: “You rapacious information and identity thieves better get ready for me to go full McConaughey on you. Also, you suck.” Investigative journalist Julia Angwin filed a class-action lawsuit. The feature was killed on March 11.

The feature is dead. The lawsuit is filed. But the Decoder interview, published today, is the thing from this whole mess that I can’t stop thinking about. Mehrotra sat for more than an hour of pointed questions and still couldn’t understand what the wrong thing was. He isn’t necessarily evasive in this interview. I don’t think he’s stonewalling. He’s being candid. And what his candor reveals is that the basic concept (that a person’s name, reputation, and expertise belong to that person, and you need their permission to use those things to sell a product) doesn’t exist in his head. He can’t hear the question Patel is asking because the premise behind it doesn’t register.

That’s worth paying attention to.




A bad feature, not a bad idea

Throughout the Decoder interview, Mehrotra makes a series of moves that all follow the same logic: every time Patel confronts him with a problem of consent, Mehrotra reframes it as a problem of quality. The feature was bad. The team missed. The execution fell short. What he never says, not once in the entire conversation, is that they shouldn’t have used people’s names without asking.

Patel asks how many people at Superhuman worked on Expert Review. Mehrotra’s answer: “It was a small team. It was probably a product manager and a couple engineers.” He says he personally hadn’t spent any time looking at the feature before the backlash hit. This is meant to contain the damage, to make it sound like a minor project that slipped through. But think about what it actually tells you. Superhuman has about 1,500 employees. A team built a feature that used hundreds of real people’s names and likenesses to sell a subscription product. It was live for seven months. And nobody, in a company of 1,500, flagged it. Nobody in the chain between “a couple engineers” and launch day said, “Hey, should we ask these people first?” A rogue team gets fired. This team didn’t get flagged, because there was nothing to flag. Using someone’s identity without their knowledge was just how things worked.

Shishir Mehrotra, CEO, Superhuman (formerly Grammarly), on stage during day one of Web Summit 2025 at the MEO Arena in Lisbon, Portugal. (Photo By Alex Broadway/Sportsfile for Web Summit via Getty Images)

Then there’s how Mehrotra describes the decision to kill Expert Review. “I came and looked at it and I said, ‘This is off-strategy for us,’” he told Patel. Off-strategy. He killed it because it didn’t fit the company’s product direction. He frames the whole thing as a strategic misfire, a product that didn’t serve users or experts well, a team that “missed.” At one point he says, “the feature was not a good feature. It wasn’t good for experts, it wasn’t good for users.”

You’ll notice what’s absent from that explanation. There’s no “we shouldn’t have used people’s names without their consent.” There’s no “taking someone’s identity to sell a product is wrong.” There’s no acknowledgment that the problem was ethical. In Mehrotra’s telling, Expert Review failed the same way any product fails. Bad execution. Low usage. Didn’t align with the roadmap. Blah, blah, blah. Time to move on.

Patel pushes him on the legal claims. Angwin’s lawsuit argues that Superhuman violated New York and California laws barring the commercial use of people’s names without consent. Mehrotra’s response: “Respectfully, we believe the claims are without merit. The idea that the feature is impersonation is quite a big stretch.” But impersonation isn’t the claim. The claim is unauthorized commercial use of a name. Mehrotra keeps pivoting to how the feature included disclosures saying the suggestions were “inspired by” experts. “It’s far from that test,” he says, still answering a question about impersonation that nobody asked.

There’s one more exchange that I think is the most telling moment in the whole interview. Patel asks where the expert names came from. How did Grammarly decide whose names to use? Mehrotra’s answer: “It came right from the popular LLMs. So it’s exactly the same experience you would have if you came to Claude or Gemini or ChatGPT and said, ‘Can you take this piece of writing, recommend the people who would be most useful to give feedback on it, take their most interesting works and use that to try to give me feedback.’”

He means this as a defense. The names aren’t special. The models already know these people. Anyone could get the same result by prompting a chatbot.

And he’s right. That’s what makes it such an accidental confession. The models do already have all these people inside them. They do already use their work without permission or compensation. Every chatbot will happily generate advice “in the style of” any named writer you ask for. Grammarly just made the mistake of being explicit about it. As Casey Newton wrote in Platformer: “Grammarly just had the bad manners to put my name on it. The bigger problem, though, is the one that’s still invisible: all the ways my work, and the work of every other writer, is being used, right now, by systems that are smart enough not to tell us about it.”

Mehrotra looked at that invisible system, made it visible, put a price tag on it, and then couldn’t seem to understand why anyone was upset.


Write for exposure, 2026 edition

Here’s where the interview gets really interesting. Because Mehrotra doesn’t just fail to see the problem. He thinks he has the solution. And the solution is, somehow, just as bad.

Midway through the conversation, Patel asks Mehrotra what the economics should look like for experts on Superhuman’s platform. Mehrotra explains that the company is building an agent store with a 70/30 revenue split (70% to the creator, 30% to Superhuman). Experts can build their own AI agents, put them on the platform, and get paid when subscribers use them. He compares it to YouTube. He invokes the “1,000 true fans” theory: get 1,000 people to pay you $100 a year, and you’ve got a $100,000 business.

Patel asks the obvious question: “If you already had that system, why build another system that used my name for free?”

Mehrotra: “We didn’t have the system at the time.”

So, to be clear about the sequence here: Superhuman took hundreds of people’s names, used them to sell a subscription product, and is now offering those same people the opportunity to do additional work, on Superhuman’s platform, to earn a share of future revenue. The people whose names were used without consent are being pitched a business plan.

Patel puts it plainly: “You understand that you’re saying I have to do that because all of the work I’ve produced in my career to date has been taken without compensation by AI companies.”

“I didn’t make that statement,” says Mehrotra.

Patel: “You’re saying I need to invent some new business model as an expert and upload an agent of myself to your tool and then advertise it... because my actual body of work has been reduced to zero value. That’s a pretty hard sell.”

If you’re a writer or journalist or creator of any kind and you’ve been around long enough, this pitch has a familiar ring to it. In the early 2010s, outlets like The Huffington Post built enormous businesses on the backs of unpaid contributors. Early in my writing career, I did my share of this kind of writing. The argument was always the same: we can’t pay you, but think of the exposure. Think of the platform. Think of the audience you’ll reach. Writers eventually recognized this for what it was (exploitation with good PR), and the practice became broadly understood as indefensible.

Mehrotra is offering a version of the same deal, except he’s made it worse in two ways. First, the old “write for exposure” pitch at least required you to say yes. You knew what you were getting into. Grammarly’s Expert Review conscripted people. They found out they’d been working for Grammarly when they read about it in the news. Second, the proposed fix asks the people who got ripped off to do more work. Build your agent. Train it to sound like you. Maintain it. Market it to Superhuman’s users. And in return, you get 70% of whatever comes in from the subscribers you attract to a platform you never asked to be on.

There’s an exchange near the end of the interview that captures this perfectly. Patel reads a tagline from Superhuman’s suite at South by Southwest: “AI can’t replace human creativity, empathy, or emotion... taste and judgment are more valuable than ever.” He asks Mehrotra: “Valuable on what metric? Is it dollars?”

Mehrotra talks about how Superhuman’s users are professionals, salespeople, support workers, and how the company helps them become “a better version of you.” He never answers the question about dollars.

Because that’s the trick. The AI industry loves to tell creators that their taste and judgment are “more valuable than ever.” It just means valuable in a way that can’t be deposited. Your expertise has never been worth more, and you’ve never been paid less for it. The value flows up. The work flows down. And if you want to get a cut, you’d better start building.




The best deal available

Mehrotra is unusually candid about all of this. But he’s not unusual. He’s just the one who sat for the interview.

Last October, I interviewed The Atlantic‘s CEO Nicholas Thompson for Depth Perception, the Q&A series I co-write for Long Lead. At one point I asked him about AI’s impact on journalism, and he described the AI companies with a clarity you rarely hear from someone at his level:

“It’s being run by companies that took our data in the middle of the night, violating our terms of service, lied about it, hid their activities, covered up their tracks, and built competitive business models without any compensation. So I have mixed feelings.”

The Atlantic has a deal with OpenAI. I asked Thompson about it. His explanation:

“With most of the AI companies, they took all our content, they trained their models on them, and gave us nothing. With OpenAI, they took our content, trained their models on us, and paid us money, and allowed us to have a seat at the table as they design their search product. So we prefer the OpenAI model to the other models.”


The CEO of one of the most respected magazines in American journalism is describing the best available option, and the best available option is: every AI company stole from us, but this one paid us afterward. That’s the good outcome. That’s what winning looks like when the table is set this way.

Every option Thompson describes starts with the same first step: they took our content. The only variable is what happened next. Did they pay? Did they offer a seat at the table? Or did they just take it and walk away? “They don’t take it” is not on the menu. It was practically a hostage negotiation.

This is the same dynamic Mehrotra is proposing, just viewed from the other side. His 70/30 agent platform is the Grammarly version of what OpenAI offered The Atlantic: we already took your stuff, and now here’s a way to get something back. The difference is that Mehrotra is making this pitch while the lawsuit is still active, which makes the audacity a little harder to miss.

And the logic is the same everywhere you look. OpenAI’s official position on using copyrighted material to train its models is that “training is fair use, but we provide an opt-out because it’s the right thing to do.” If it’s fair use, the opt-out is a courtesy. If the opt-out is “the right thing to do,” then maybe it’s not actually fair use. The sentence contradicts itself, and nobody at OpenAI seems to have noticed. It’s the same incoherence as Mehrotra insisting that generating fake advice under someone’s name is “attribution.” The stories these companies tell about what they’re doing don’t hold together under the slightest pressure. They don’t need to. They just need to hold together long enough.


Sloppelgängers

The advice was terrible.

I mentioned earlier that the AI-generated edit attributed to Patel was generic slop. But that was one of the milder examples. In her Times op-ed, Angwin describes what Grammarly’s version of her actually recommended. One suggestion: replace the factual opening sentence of an investigative article with an invented anecdote about a fictional person named Laura, a patient whose medical privacy had been violated. The tool generated the fake anecdote in full and offered a button to paste it straight into the article. Uh oh.

Angwin notes why this is a big problem: “Replacing a factual sentence with an imagined story about a person who doesn’t exist is not only bad editing. It’s a deception that could end my career as a journalist.”

And she’s right. If someone had actually taken that advice, under Angwin’s name, and published a fabricated anecdote in what was supposed to be an investigative news story, the reputational damage could land on her. The person whose name was on the suggestion. The person who had nothing to do with it.

Author Benjamin Dreyer found that Expert Review would generate writing tips attributed to Stephen King even when you fed it lorem ipsum, meaningless placeholder text. The tool didn’t care what you wrote. It didn’t understand what you wrote. It just grabbed names and generated vaguely editorial-sounding sentences and presented the whole package as if a real person was helping you.

Angwin, borrowing a term coined by writer Ingrid Burrington on Bluesky, called these AI imitations “sloppelgängers.” It’s the right word. They weren’t good enough to pass for the real thing. They were bad enough to cause real harm while trading on real names.

Maureen Ryan, the veteran TV critic and author of the best-selling Burn It Down, was one of the people included in Expert Review. She didn’t find out about it until the news broke. In an open letter she published on March 10, she wrote: “You just take that? You just take my identity? And you do that in an environment where making a living from writing is an ever more precarious proposition? And then you make me take time out of my day to tell you that’s wrong?”

Before Superhuman killed the feature, its response to the backlash was to set up an email address where experts could request to be removed. Opt out. Unsubscribe from the service you never subscribed to. The burden was on the people who’d been used, not the company that used them. Ryan, who is busy writing a book, was being asked to take time away from her actual work to undo something that never should have happened in the first place.

And the thing is, as Angwin argued in her Times op-ed, the law that covers what Grammarly did isn’t new. New York’s right of publicity law is a century old. At least 25 states have versions of it. You can’t use someone’s name to sell a product without their consent. The technology is new. The principle is settled.

This is what the blind spot produces. Mehrotra and his company couldn’t see the consent problem, so they couldn’t see the quality problem either. If you don’t think you need someone’s permission to use their name, you’re certainly not going to check whether the advice you’re generating under that name is any good. Why would you? You don’t think you owe them anything. The feature was bad for the same reason it existed at all: because the people who built it genuinely did not consider the humans attached to those names to be stakeholders in the process.

Scared for their jobs

Near the end of the Decoder interview, Patel brings up an NBC News poll on public perceptions of AI. The numbers are grim. AI has a net negative favorability rating of -20. It polls behind ICE.

Patel suggests this might have something to do with how extractive the technology feels. Mehrotra disagrees. “People are scared for their jobs,” he says. He thinks the public’s unease is about employment, not about how AI companies treat creators.

He’s probably right that people are scared for their jobs. And he can’t see that his own interview is a case study in why. An hour of explaining, in detail, how his company used real people’s work and names to build a product, killed it when it became inconvenient, and is now pitching those same people on the opportunity to earn their way back in. All while insisting that nothing wrong happened.

At the very end, Mehrotra tells Patel he’d love to work with him on building an agent. This is how the conversation closes. An hour of “how much should you pay me” and “we believe the claims are without merit” and “I’ve literally never said anything like that,” and Mehrotra’s parting thought is a sales pitch.

I’d love to work with you. We’d love to have you on our platform. The door is open.

They took your name, generated garbage under it, sold it for $12 a month, and now they’d like to know if you’re interested in a partnership. Welcome to the future, I guess.
