
Culture Shift

Ella Watkins-Dulaney for Asimov Press

By Rachel Dutton

The human immune system is, in one sense, a detection mechanism. It has evolved, over millions of years, to scan the body for molecular signals that tell it whether to attack or stand down. Most of these signals come from pathogens, damaged cells, or the body’s own hormones. But in 2019, a lab in Germany published a finding that pointed to a much stranger source: one of the signals sensed by the immune system is found in sauerkraut.

When people eat sauerkraut, a molecule called D-phenyllactic acid (D-PLA) enters their bloodstream and activates a receptor on immune cells, known as HCA3, triggering an anti-inflammatory response. Phenyllactic acid is one of many compounds, alongside lactic acid itself, that lactic acid bacteria produce during the fermentation of sauerkraut and related foods. Other molecules had previously been found to bind HCA3, but D-PLA proved a hundredfold more potent than any of them.

Left: Molecular structure of D-PLA. Right: Sauerkraut. Credit: Gandydancer

This discovery advances our understanding of how fermented foods can reduce inflammation and positively affect human health. But more striking is what it suggests about hominid physiology. Although HCA3 belongs to a larger family of receptors broadly conserved across eukaryotes, the receptor itself is present only in humans and other great apes, such as chimpanzees and gorillas — not even in other mammals. It is a recent addition to the genome, appearing only a few million years ago. Its existence suggests that our immune system evolved to recognize microbial metabolites from fermented foods.

We tend to think of fermented foods as something humans invented and then chose to eat. But, increasingly, scientific evidence suggests the causality runs the other way. Fermented foods appear to have helped shape human biology itself, and our bodies may have been built, in part, to expect them. The case for this runs from changes in hominid gut anatomy millions of years ago to the HCA3 receptor, to a growing body of research linking fermented food consumption to immune function and gut health. And it raises an uncomfortable question about what happened when the Western food system, in the name of safety and efficiency, quietly removed these foods from our diets in the nineteenth and twentieth centuries.


Early Fermentation

Fermented foods are the result of the controlled growth of communities of microbes. At their core, they are the products of the interaction of these microbes and whatever food they consume, whether cabbage or cucumber. While this process varies from food to food, fermentation typically involves managing environmental variables such as oxygen, temperature, and salinity. In contrast to food preservation methods like canning or even pickling, which are designed to prevent microbial growth, fermentation harnesses the capacity of naturally occurring microbes found on fresh food or in the environment to outcompete spoilage organisms.

Kimchi under the microscope (400x magnification). The red dot is chili oil. Many microbes are visible. Credit: Rachel Dutton

Most archaeological evidence from pottery shards suggests that fermented food production is at least 7,000 to 10,000 years old. This timeframe coincides with the major transition to agrarian lifestyles, which would have reliably produced surpluses of food and the subsequent need for preservation methods.1 While this explanation satisfies most scholars, there is reason to believe that fermentation may be far older.

For one, it’s a very simple process to trigger. Some foods even ferment spontaneously. In the case of alcoholic drinks, like beer, wine, or mead, ubiquitous yeast species, which are naturally found on grapes and other fruit skins, rapidly use sugar as a food source for reproduction, producing ethanol as a byproduct. The same results occur when ripe fruit falls to the ground and its sugar is exposed to the environment, or when honey is diluted with water.

Other processes require only minimal intervention. For example, submerging fresh food in liquid or burying it creates a low-oxygen environment that encourages the growth of acid-producing bacteria that preserve the food by what is called “lactic acid fermentation.” This technique produces dill pickles, sauerkraut, and kimchi. While salt is often added as an additional intervention against unwanted microbes, it’s not required as it isn’t the primary driver of the fermentation.

Another argument for an earlier origin of fermented foods is that they are found across nearly all human cultures. While the number of fermented foods in the modern Western diet is fairly limited (cheese, yogurt, bread, chocolate, coffee, beer, wine, kimchi, and kombucha), hundreds more are eaten around the world, from fermented shark in Greenland to a seemingly limitless variety of fermented soybeans in Asia. This diversity is a testament to how humans gradually mastered this ancient practice and modified it to suit new environments as they moved out of Africa.

Around 14 million years ago, our hominid ancestors were arboreal species whose diet would have been primarily based on fresh fruits picked from the trees they lived in. When ripe fruit fell to the ground and underwent spontaneous fermentation, it would have been toxic to our ancient ancestors due to its high concentration of ethanol. Their bodies as yet had no efficient way to break down ethanol.

But then, about 10 million years ago, a mutation arose in the genome of the common ancestor of humans, gorillas, and chimpanzees. This mutation, a single amino acid change in the enzyme Alcohol Dehydrogenase 4 (ADH4), enabled it to break down and detoxify ethanol with 40x higher efficiency. The capacity to consume this energy-rich but previously dangerous fruit may even have driven our transition from an arboreal lifestyle to a terrestrial one. What’s more, this ability to tolerate ethanol may have been what allowed our ancestors to diversify their diet and survive while lineages without this mutation went extinct.

The fossil record shows that a major shift in hominid anatomy occurred around 2 million years ago, when hominids developed a smaller rib cage and larger skull. At the same time, another major change took place in their intestines. Compared to our closest relatives, humans have a digestive tract that is 40 percent shorter. This decrease was thought to be driven by the external processing of our food, which reduced the time and energy involved in chewing and digesting. Anthropologist Richard Wrangham argues that the technological innovations of controlling fire and cooking food led to this major change, and that the excess energy we got from cooked food, in turn, supported the evolution of a larger brain.

However, two recent studies, by biological anthropologist Katie Amato in 2021 and evolutionary biologist Erin Hecht in 2023, suggest that these anatomical changes may have been driven by human use of fermentation even before humans began to cook. By allowing microbial species to ferment and break down complex carbohydrates and other macromolecules in foods, we may have turned over certain parts of an otherwise energy-intensive digestive process to microbes in a form of “external digestion.” This use of fermentation to pre-digest food, intentional or not, may have served as a predecessor to cooking, providing the extra calories needed to support the evolution of a larger brain.

Another benefit of fermentation was that it offered access to foods that would previously have been toxic. As our ancestors came down from the trees and needed new ways to fill their stomachs, the tubers of many plants and grasses offered an appealing, ready source of calories. Tubers contain large deposits of starch. Root vegetables, such as potatoes, yams, and carrots, are our modern-day, highly domesticated equivalents. But the wild tubers of our ancestors’ time were hard to chew, and some contained low levels of toxins. Varieties of cassava, for example, contain compounds that release cyanide when ingested. After just a few days of fermentation, however, microbes destroy these dangerous molecules and make the food safe to eat.2

Peeled cassava soaked in a tub for fermentation. Credit: International Institute of Tropical Agriculture.

Ultimately, given the simplicity of the fermentation process, its provision of new food sources, and its role in decreasing the need for extensive chewing and digestion, fermentation could have had an outsized impact on human evolution. It may, in fact, have helped make us human.

Peril and Promise

Perhaps the most striking thing about fermentation is that humans figured out how to control the growth of microbial species long before we understood what “microbes” were.

No one had seen microbial life until the 17th century, when Dutch scientist Antonie van Leeuwenhoek used a handmade microscope to reveal small, motile forms he described as “animalcules.” It wasn’t until the late 1800s, however, that the French chemist Louis Pasteur demonstrated the role of microbes through his studies of fermented foods. His work on the spoilage of beer and wine formed the basis of the “germ theory” of disease. If microbes could be the causative agents of spoilage in food, Pasteur reasoned, maybe they could also be the causative agents of disease.

For his part, Pasteur was not strictly “anti-microbe.” He found them enthralling, investigating the differences between lactic fermentation (yogurt, pickles, and sauerkraut), butyric fermentation (butter, cheese, and milk), and acetic fermentation (kombucha, sourdough, sour beer), as well as developing the categories “aerobic” and “anaerobic” (which refer to whether microbes require oxygen to survive).

Even so, Pasteur’s work identifying microbes responsible for putrescence and disease has had the greatest impact on our food system. At the beginning of the twentieth century, automated machinery expanded canning capacity from roughly 10 cans a day to 1,000. Variation in the quality of the sealed lids, temperature treatments, and exposure to contaminants through handling meant a greater potential for microbial illnesses like botulism.

After three deadly botulism outbreaks in 1919 linked to California-distributed olives, the canning industry turned to Pasteur-inspired bacteriologists to design better canning systems and restore public confidence. While the resulting practices — including steam sterilization, the marking of batches, and traceable can coding — were at first voluntary, the Cannery Act of 1925 mandated statewide compliance.

Newspaper clippings from the 1920s about the 1919 outbreak and the Cannery Act of 1925. Left: Morning Press, March 18, 1920. Right: Madera Mercury, January 24, 1925.

By the mid-1900s, microbial research into food had come to focus on how to keep organisms out of it. Innovations in heat, pressure, refrigeration and anti-microbial agents to extend shelf life and decrease potential contamination with food-borne pathogens formed the core of academic and industrial research. As our methods of food production shifted, so did our diets. Americans largely moved away from the consumption of traditionally fermented foods, with one major exception.

That exception began with a single lecture. The Russian zoologist Ilya Metchnikoff, based at the Pasteur Institute in Paris, was well known in the early 1900s for his scientific discoveries involving the immune system, for which he would win the Nobel Prize. But towards the end of his career, Metchnikoff became fascinated with the idea that aging was just another disease awaiting a cure.

In his 1908 book, The Prolongation of Life, Metchnikoff proposed that this disease was caused by “putrefying” microbes that lived within the human gut, or, as he put it, “chronic poisoning from an abundant intestinal flora.” His suggested cure came in an unexpected form — yogurt. Metchnikoff reasoned that just as the acid in sour milk prevented the growth of spoilage organisms, it might limit the growth of “spoilage” microbes in the gut.

Speaking with a student after one lecture, Metchnikoff learned of large numbers of centenarians in Bulgaria (one of whom he credulously reported as being “158” years of age).3 Metchnikoff hypothesized that the consumption of large quantities of yogurt in Bulgaria prevented aging through the action of lactic acid on gut bacteria. After obtaining a sample of the “Bulgarian bacillus” (Lactobacillus delbrueckii subsp. bulgaricus) from yogurt, he set out to study its effects on the gut. His lab work showed that the presence of the yogurt microbe and the lactic acid it produced slowed the growth of intestinal microbes.

Ilya Metchnikoff, around 1910. Credit: The Library of Congress.

Metchnikoff thus advised the consumption of yogurt to promote healthy intestinal balance, writing: “A reader who has little knowledge of such matters may be surprised by my recommendation to absorb large quantities of microbes, as the general belief is that microbes are all harmful. This belief, however, is erroneous. There are many useful microbes, amongst which the lactic bacilli have an honourable place.”

Metchnikoff initially presented his yogurt hypothesis at a public lecture in 1904 in Paris, titled “Old Age.” The lecture went the early 1900s equivalent of viral. Newspapers ran stories with headlines like “Drink Sour Milk and Live to be 180 Years Old” (Evansville Courier and Press) and “Sour Milk is Elixir: Secret of Long Life is Discovered by Prof. Metchnikoff” (Chicago Daily Tribune). “Within months of Metchnikoff’s lecture, milk-souring germs had blossomed into an international business. Pharmacies throughout Europe and the United States were offering Bulgarian cultures in the form of tablets, powders, and bouillons — to be consumed as is or used,” writes Luba Vikhanski in her book Immunity. It was, in effect, the first “superfood.”4

Evansville Courier and Press, February 4, 1906

“Probiotic” products quickly followed, containing dehydrated “lactic bacilli” that anyone could use to sour their own milk in the way prescribed by Metchnikoff. But despite interest in yogurt and probiotics as health foods, most 20th-century microbiologists remained focused on “bad actors,” the microbes and pathogens involved in disease. This changed toward the end of the century with advances in sequencing technology and chemical analysis techniques like mass spectrometry, which allowed researchers to probe microbial genetics and metabolism more closely and with more nuance.5

Environmental DNA sequencing revealed the diversity of microbial species across habitats, including in and on the human body as well as in our food. Understanding what these microbes are doing, and how, has become one of the challenges of modern microbiology.

To Eat Microbes

In response to this challenge, a lab at Stanford University, led by Erica and Justin Sonnenburg, has been doing seminal work on the human gut microbiome and how species within it function and interact. Recently, the Sonnenburgs have turned their focus on diet-microbiome interactions to fermented foods. A clinical trial, which the Sonnenburgs set up in collaboration with nutrition scientist Christopher Gardner in 2021, offered the clearest evidence to date that fermented foods benefit gut health. In the trial, 36 healthy adults spent 10 weeks eating a diet high in such foods.

At the end of the study, participants showed increases in gut microbiome diversity, which is generally associated with gut health. (The participants had new microbial species in their guts, which did not come from the fermented foods. It seems, rather, that fermented foods somehow make the existing gut ecosystem more receptive to incorporating new strains.) Perhaps even more striking, the researchers also found widespread decreases in inflammatory markers in the participants’ blood.

Given that low levels of inflammation sustained over time, referred to as chronic inflammation, are believed to be implicated in many diseases, the ability to decrease it through simple dietary changes would be welcome. However, the six servings of fermented foods a day consumed by the study participants far exceed what most U.S. diets contain. While yogurt and cheese are commonplace, with an average of 13.8 lbs and 42.3 lbs consumed per person per year respectively, this represents only about a serving of fermented dairy a day. Fermented vegetables also comprise less than a serving per day. The impressive 387 million pounds of sauerkraut Americans consume per year equates to only 1.5 lbs per person, or about 0.06 servings a day.6 There has been a resurgence of interest in fermented foods, from high-end chefs experimenting with new types of fermentations to home fermenters caught up in the pandemic sourdough craze, but we have a long way to go before consumption reaches the levels needed for clinically meaningful outcomes.
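To put those numbers on a common footing, here is a minimal back-of-the-envelope sketch in Python (mine, not the article's or the Stanford study's) that converts the annual per-person consumption figures quoted above into daily servings, using the serving sizes from footnote 6; the only added assumption is the 453.6 grams-per-pound conversion factor.

# Convert annual U.S. per-person consumption (figures quoted above) into
# daily servings, using the serving sizes given in footnote 6.
GRAMS_PER_POUND = 453.6

foods = {
    # name: (lbs per person per year, grams per serving)
    "yogurt":     (13.8, 170),  # 170 g, about 3/4 cup
    "cheese":     (42.3, 42),   # 42 g, about a 1-inch cube
    "sauerkraut": (1.5,  30),   # 30 g, about 2 tablespoons
}

for name, (lbs_per_year, grams_per_serving) in foods.items():
    grams_per_day = lbs_per_year * GRAMS_PER_POUND / 365
    servings = grams_per_day / grams_per_serving
    print(f"{name:>10}: {servings:.2f} servings per day")

# Prints roughly 0.10 (yogurt), 1.25 (cheese), and 0.06 (sauerkraut) servings
# a day, versus the six daily servings eaten in the clinical trial.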

Kefir under the microscope (400x magnification). Credit: Rachel Dutton

Even larger gaps remain in our understanding of how fermented foods actually drive health benefits. Beyond inflammation, fermented food consumption has been associated with a wide range of positive health outcomes, from gut health to mental health. As compelling as the Sonnenburg study is, it doesn’t yet reveal the mechanism(s) by which fermented foods trigger changes in human physiology.

The 2019 sauerkraut study points to the importance of microbial metabolites like phenyllactic acid, but if we co-evolved with fermented foods over millions of years, this metabolite-receptor pairing is likely just the tip of the iceberg. As in many other areas of microbiome research, we need to move beyond correlation to causation. Doing so will require a much deeper understanding of the molecular composition of fermented foods and of how they are connected to human biology, as well as more precise clinical studies.

Scientists are beginning to recognize the potential diversity of bioactive metabolites in fermented foods, but most studies focus on a single food type at a time and use different assays of bioactivity from lab to lab. To be able to effectively map the complex interactions between fermented food microbes, metabolites, and human biology, we need larger, consolidated datasets.

To this end, the Microcosm Foods project has been assembling a first-of-its-kind collection of fermented food data pairing metagenomics (mapping microbes), metabolomics (mapping metabolites), and transcriptomics (mapping changes in human immune cells upon exposure to fermented foods) across over 100 different foods. Open-source, systematically acquired datasets such as these will help build the scientific foundation needed to untangle these complex interactions.

For most of human history, people harnessed microbial ecosystems to make fermented foods and reaped the benefits without understanding their mechanism. Then, in a period of roughly 100 years, the Western food system replaced them with sterile, shelf-stable alternatives.

Today, we are finally beginning to understand how fermented foods interact with our biology — through receptors like HCA3, through shifts in gut microbial diversity, and through inflammation pathways. Science is catching up to what fermentation has been doing all along.


Rachel Dutton is a microbiologist studying fermented foods, from their use as microbiome models to their impacts on health. She received a PhD in Microbiology and Molecular Genetics from Harvard University, and led academic labs at Harvard and UC San Diego. Rachel is currently a Resident at the Astera Institute and a Fellow in the Big If True Science program at Renaissance Philanthropy.

Credit: L. reuteri image by Anastasiia Dmytriv.

Cite: Dutton, R. “Culture Shift.” Asimov Press (2026). DOI: 10.62211/92yr-34qw

1

Preservation isn’t the only benefit of fermentation, however. It also completely transforms raw ingredients, adding new flavors, textures, and aromas. These changes are dictated by microbial metabolism, which produces not just primary products of fermentation like acids, but also a diverse collection of enzymes and flavor molecules.

2

Cassava is the third most important dietary staple around the world after rice and corn. The fermented starch that comes from this quick ferment is used as the starting point for many different staple foods, such as fufu in Nigeria or pão de queijo, also known as Brazilian cheese bread.

3

In his essay on Metchnikoff, blogger H.D. Miller writes “Unfortunately, reports of Bulgarian longevity seem to have been greatly exaggerated. Indeed, the best current guess is that Bulgarian life expectancy at the turn of the last century was actually only 40.08 years, more than a few decades short of a century. As with Sardinians and Okinawans more recently, Bulgarians were better at convincing outsiders they were very old than actually being very old.”

5

Today, we know that many, if not most, species of microbes on the planet do not grow well under standard lab conditions, which limits the usefulness of earlier culturing methods.

6

A single serving size for yogurt is 170g (or ¾ cup), cheese is 42g (about a 1 inch cube), and sauerkraut is 30g (2 tablespoons). Based on the reported average annual per person consumption in the US, the average daily consumption is 0.1 servings of yogurt, 1 serving of cheese, and 0.06 servings of sauerkraut.


Something I Can Never Have


"AI isn't lightening workloads," The Wall Street Journal reported last week. "It's making them more intense."

Well, yes. This is, in fact, how "workload" works under capitalism: labor is perpetually squeezed to do more, to generate more surplus value, to create more profit for the boss. Technological advancements -- that's what "AI" purports to be -- enable more to be done during the work day (which certainly extends well beyond some 40-hour week as everyone checks their email, their texts, their messages after hours and on weekends). Computing has not made us more productive, even though we feel as though we're doing more, and doing it more quickly, more intensely.

I am reminded, no surprise, of the children's book Danny Dunn and the Homework Machine (which I talk about in Teaching Machines), published in 1958 -- which is to say we’ve known this exploitation is happening for a very long time now. In it, the titular character Danny and his friends Irene and Joe program his next-door neighbor’s mainframe computer -- remarkably for the era, housed in Professor Bullfinch’s laboratory at the back of his house -- to do their homework for them. The trio believe they’ve discovered a great time-saving device, but when their teacher ascertains what they’ve done, she assigns them even more homework to do.


The increasing intensity of work – with computers, yes, and even more now with “AI” – is accompanied by a growing immiseration. Everyone feels it. Everyone.

Another story from Teaching Machines: when I was researching the book, I pored through hundreds and hundreds of letters sent to and from Sidney Pressey and B. F. Skinner. It’s easy to imagine their world of letters -- pre-computer, pre-Internet, pre-email -- as slow: slow to be written, slow to be delivered; their wording careful, their responses deliberate. But as both psychologists struggled with the commercialization of the machines they’d designed, the tone and frequency of their correspondence became more frenetic. Sometimes they would send two, three, four letters a day to the same person, dashing off angry, half-baked responses before stewing for a couple of hours and dashing off another one.

They were manic. But they were scientists; they were entrepreneurs.

So maybe it’s a side note, and maybe my main point: I think the media has been focused on only a small sliver of “‘AI’ psychosis,” the stories that are the most violent and tragic. The delusions and mania are much more widespread, but most of these are tolerated, even encouraged, as long as people continue to perform “productively” at their jobs.

Much like the furious quest for “personalization” in the digital classroom, one side effect of “AI” will be the further loss of community. Everyone works in isolation, clicking away endlessly with their chatbot of choice that sycophantically assures them that they don’t need anyone else. No longer will people turn to their colleagues for collaboration, for support, for advice, for mentorship. With “AI,” solidarity and trust are deliberately undermined -- the classic labor-busting tactic. “I can do it myself” (or rather “Claude tells me that it can do it for me, but I can put my name on the project”), people tell themselves; while everyone else second-guesses as to whether or not Claude actually has.

It’s the sad sociopathy of the tech elite, the sad paranoia of the conspiracy theorist, “democratized.”


This week, venture capitalist and techno-authoritarian Marc Andreessen triumphantly pronounced that he has “zero” levels of introspection — “as little as possible.” This is the Randian ideal, something every entrepreneur should aspire to, he tells the podcast audience, adding “and you know, if you go back 400 years ago, it never would have occurred to anybody to be introspective.”

According to Andreessen, civilization had none of that until that “guilt based whammy showed up from Europe, a lot of it from Vienna” -- a remarkably stupid reading of history, religion, culture, literature, so much so you might wonder if the man has ever opened a book, let alone his mind, in his life.

It is notable that Andreessen – one of the biggest proponents of (and, he certainly hopes, profiteers from) “AI” – would dismiss introspection, arguably a core facet of “intelligence” that computers do not, cannot, will not ever possess. “AI” does not “know” anything really, but even more, it does not “know” about its “knowing.” It has no introspection; no meta-cognition; no embodied awareness of how it feels when it learns and when it knows; no meta-contextual awareness of where and when and why and with and from whom it knows; no reflexivity; no self-efficacy. It serves Andreessen’s interests then to deride and dismiss other ways of knowing; to limit “intelligence” to the cognitive flexes of what his “AI” machinery can quickly spew; and to imply, in turn, that humans are inferior, irrelevant.

But mostly, I'd argue, when Andreessen proudly states that he rejects introspection what he really means to say is that he eschews accountability. He will take no responsibility for his actions. He is a billionaire; he doesn’t believe he has to.

This is a moral problem, of course – a grossly immoral one at that. But it is also a policy problem, and one we can rectify, I’m certain.

“Kindness cultivates the self” – John Darnielle


Today’s bird is the red-throated loon, the smallest and lightest of the loon species. Its feet are located quite far back on its body, making it incredibly clumsy on land. And yet it is the only loon that can take off into flight from land. The bird is associated with weather prediction -- its cries supposedly indicate whether or not it will rain.

Thanks for reading Second Breakfast. Please consider becoming a paid subscriber, as your financial support makes this work possible.


Grade Inflation Nation




Early in my career, I taught public policy as an adjunct at Governors State University, a public university in the southern suburbs of Chicago.

One of my first assignments asked students to write about a policy issue that mattered to them — why it was important, and what solutions they could explore to solve it.

As I handed them back their assignments at the next week’s class, the mood in the room shifted. Students were stunned. Several told me these were the lowest grades they had ever received at the university.

I understood their frustration. They had done what was asked of them. They turned in the assignment, showed up to class, and put words on a page. By the standards they were used to, that should have been enough. But the writing wasn’t strong. Their arguments were underdeveloped, sources were thin, and basic mechanics were a problem.

What struck me wasn’t the pushback itself — it was how genuine their surprise was. These weren’t students trying to negotiate a grade they knew they hadn’t earned. They honestly believed they had done well. Somewhere along the way, the system had told them they were performing at a level they weren’t.

I also understood the incentive I was supposed to follow. Adjuncts live and die by course evaluations and enrollment numbers. A reputation as a tough grader doesn’t get you rehired — it gets you replaced. The rational move was to hand out B’s, keep everyone comfortable, and secure my next semester.

That experience has stayed with me as I’ve watched grade inflation become one of the most pervasive and least confronted problems in American education. What I saw in that classroom wasn’t a failure of individual students or individual teachers. It was a symptom of a system where no one has an incentive to tell the truth.


All Incentives Point One Direction

Even though my experience was in a post-secondary environment, the pressure I felt to inflate grades is not unique to college. The incentives to inflate in K-12 are stronger, more embedded, and harder to escape.

Start with the classroom teacher. Most teachers who inflate grades are not being told to do so. They are making a rational calculation to avoid a situation they know will follow if they don’t.

A rigorous grade often means a frustrated parent. A frustrated parent often means a phone call to the principal. A phone call to the principal means a meeting, a justification, and the mental labor of defending a grade that, in most cases, the teacher will be pressured to change anyway. Most teachers would rather give the B and move on with their day.

But the pressure does not stop with parents and principals. It is structural.

Over the past several years, a growing number of selective colleges have adopted test-optional admissions policies, dropping the SAT and ACT from their evaluation criteria. This has had a significant downstream consequence: the high school GPA has become the dominant quantitative measure in a college application. Every K-12 teacher in America implicitly understands this. An honest B+ in a rigorous course might accurately reflect what a student knows. It might also be the grade that keeps that student out of their first-choice school.

The same dynamic plays out at the state level. Several states tie college scholarship eligibility primarily to GPA. Florida’s Bright Futures program and Georgia’s HOPE Scholarship both use GPA thresholds as the primary gateway to state-funded financial aid. When a student’s scholarship money is on the line, the pressure on teachers to keep grades above a certain threshold is overwhelming.

Then there are the grading policies themselves. During and after COVID, districts across the country adopted a set of practices marketed as “grading for equity.” These included grade floors — policies that guaranteed students a minimum score of 50 out of 100, even for work never submitted — unlimited exam retakes, and the elimination of credit for homework and class participation. The stated goal was to remove bias from grading. The practical effect was to sever the already weakening connection between grades and mastery.

The result is a system where inflation is rational at every level. Teachers inflate preemptively to avoid conflict. Administrators back down when conflict arrives. Schools operate within a postsecondary admissions process that has placed GPA at its center. At no point does anyone in the system benefit from maintaining rigorous standards, and at every point, there is a tangible cost for doing so.

It is a textbook collective action problem. Each actor in the K-12 system, acting rationally within their own constraints, contributes to an outcome that harms students, misleads parents, and degrades the value of an education for everyone.


Parents Just Don’t Understand

Grade inflation would be less damaging if parents knew grades are an unreliable indicator of their child’s academic performance.

They don’t.

A 2023 study by Gallup and Learning Heroes surveyed nearly 2,000 parents of K-12 public school students and found that 79 percent report their child is receiving mostly B’s or better. Almost nine in ten believe their child is at or above grade level in reading and math. These numbers are wildly out of step with reality. On the 2022 NAEP, only 33 percent of fourth graders scored proficient or above in reading. Only 36 percent did so in math. Among 12th graders taking the ACT, just 40 percent met college readiness benchmarks in reading and 30 percent met them in math.

Parents are not stupid. They are misinformed. The Gallup-Learning Heroes study found that 64 percent of parents rely on report cards as one of their top three sources of information about their child’s academic progress. Only 21 percent said the same about year-end state standardized test results. When your primary source of information is telling you everything is fine, you have no reason to act.

This matters because parents do act when they know there is a problem. A 2026 working paper from the Becker Friedman Institute by Derek Rury and Ariel Kalil studied how parents weigh grades against standardized test scores when making decisions about investing in their child’s education — in tutoring, for example. Using over 23,000 investment decisions from more than 2,000 parents, they found that parents respond to both signals, but place significantly more weight on grades. When grades are low, parents invest, regardless of what the test score says. When grades are high, but test scores are low, parents do not invest. The high grade crowds out the response that the low test score would otherwise trigger.

That finding is the mechanism through which grade inflation does its real damage. It is not just that the signal is wrong. It is that the wrong signal actively suppresses the corrective action parents would take if they had accurate information.

A child who is struggling in math but receiving a B will not get a tutor. Their parents will not schedule a meeting with the teacher. They will not look for supplemental programs. The inflated grade has told them there is no problem.


Why Parents Don’t Trust the One Signal That Could Help

The question, then, is why parents discount the one signal that could cut through the noise. Standardized test scores are designed precisely for this purpose — to provide an objective measure of what a student actually knows. Two factors explain why they fail to serve that function.

The first is a sustained public relations campaign against testing. Teachers unions and allied advocacy groups have spent years framing standardized tests as reductive, biased, and harmful. The language is familiar: “teaching to the test,” “reducing kids to a number,” “high-stakes testing.” This messaging has been effective. The Rury and Kalil study found that nearly 40 percent of parents believe standardized tests are biased against certain groups. When asked directly, 71 percent of parents said grades are more important than test scores for making decisions about their own children. Only 8.5 percent said the opposite.

The second is timing. Standardized test scores typically arrive months after the test is administered — sometimes over the summer, sometimes well into the following school year. By the time a parent sees the result, it is stale. Compare that to a report card, which arrives every quarter. Parents understandably weigh the signal that shows up when there is still time to do something about it, even if that signal is unreliable.

This combination is deeply damaging. Parents are left with one signal that is timely but dishonest, and another that is honest but delayed, and culturally discredited. The natural self-correcting mechanism — parents investing in their children when they see them struggling — has been broken by the very system that is supposed to be providing honest information.


The Costs of Grade Inflation

The costs of this broken feedback loop are not abstract.

A recent study by Jeff Denning and colleagues linked high school administrative data from Los Angeles and Maryland to postsecondary and earnings records to measure the long-run impact of grade inflation on students. They found that being assigned to a teacher with higher average grade inflation reduces a student’s future test scores, lowers the likelihood of graduating from high school, decreases college enrollment, and ultimately reduces earnings. The cumulative effect is large: a teacher with one standard deviation higher than average grade inflation reduces the present discounted value of their collective students’ lifetime earnings by $213,872.

That number is worth sitting with. Grade inflation is not a victimless accounting trick. It lowers college attendance. It reduces lifetime earnings. It makes our students less prepared and, collectively, our workforce less capable.

When a country systematically tells its students they are performing well when they are not, it produces a generation that is less prepared than the one before it. This is not hypothetical. NAEP scores in reading and math have declined over the past decade even as average GPAs have risen. The share of high school students graduating with an A average has increased significantly since the early 2000s, but the share demonstrating proficiency on national assessments has not kept pace. We are handing out more A’s for less learning. The downstream consequences — a less literate workforce, fewer students prepared for rigorous postsecondary programs, lower productivity — are diffuse enough that no single institution is held accountable, but real enough that we will feel them for decades.

Grade inflation is, in this sense, a form of national self-deception. It flatters us in the short run and diminishes us in the long run. Unlike other forms of educational failure, it is almost perfectly designed to go unnoticed — because the very mechanism that would alert us to the problem, the grade itself, is the thing that has been corrupted.

There’s one more issue with grade inflation: it doesn’t stay contained. It spreads to adjacent systems, including the ones that are supposed to serve as external checks on grade inflation itself.

Consider what just happened in Massachusetts. On March 3, Governor Maura Healey celebrated that 35.8 percent of Massachusetts public high school graduates scored a 3 or higher on an AP exam — the highest percentage in the nation and the highest on record. State officials touted it as evidence that students are better prepared than ever. What they did not mention is that the College Board has changed how it scores AP exams. Passing rates have surged nationally in recent years not because students are learning more, but because the exams have gotten easier. The number of correct answers needed for passing scores has been reduced. The College Board confirmed the changes but neither the organization nor Massachusetts officials noted them in their press releases.

This is grade inflation’s downstream logic applied to a different institution. The AP exam was designed to be an objective, nationally comparable measure of college-level mastery. It was supposed to be the kind of signal that could not be gamed by local grading practices. But the same incentive structure that inflates classroom grades has reached the College Board. Students and families are happier because they get college credit. Schools are happier because they look good. Governors get to hold press conferences.

When the checks on grade inflation are themselves inflated, the system has no remaining mechanism for self-correction. That is why legislative action is necessary.


What States Can Do

There are three things states can do right now to help curb grade inflation.

End test-optional college admissions at public universities.

The shift to test-optional admissions is wrong-headed for a number of reasons, as I’ve written before.

But it also did something else: it removed the one external check that kept grade inflation from being costless. When standardized test scores were part of the admissions equation, a school that handed out inflated A’s would eventually be exposed by mediocre SAT or ACT results. Test-optional policies eliminated that accountability mechanism. States that control their public university systems can restore it. Requiring standardized test scores for admission to state universities would not solve grade inflation overnight, but it would reintroduce a signal that schools cannot manipulate.

The tide is already turning. Over the past two years, a growing number of universities have reversed their test-optional policies after reviewing admissions data from the pandemic era. Every Ivy League school except Columbia has reinstated a testing requirement. MIT, Stanford, Johns Hopkins, and the University of Pennsylvania all now require scores. Ohio State reinstated its requirement after finding that students who submitted test scores had higher GPAs and were more likely to persist through their degree. The University System of Georgia restored testing requirements at four additional campuses. Princeton’s decision followed a five-year internal review that found academic performance was stronger among students who had submitted scores. As university officials start to take the data seriously, these disastrous, ideologically motivated changes are beginning to reverse. Universities looked at what happened when they removed the external signal and concluded that grades alone were not sufficient to predict whether a student was prepared. State legislators overseeing public university systems should reach the same conclusion.

Get test scores back to parents faster — and make them harder to ignore.

Rury and Kalil make clear that parents will act on negative academic information — but only if they receive it in an accessible form and at a time when action is still possible. Right now, standardized test results often arrive months after the test is administered. A parent who receives their child’s state assessment results over the summer or halfway into the following school year has no actionable moment. The information is stale before it arrives. Report cards, by contrast, show up every quarter. They are immediate and familiar. It is no surprise that parents weigh them more heavily, even when they are unreliable.

Virginia is showing what a better approach looks like. In 2025, the General Assembly passed House Bill 1957, a comprehensive overhaul of the state’s Standards of Learning assessment system that takes effect in the 2026–27 school year. The law requires schools to provide score reports to families within 45 days — a significant improvement over the months-long delays that are common in most states. Those reports will include not just the student’s individual performance, but a comparison to the performance of other students in the school, the school division, and the state. Scores will be reported on a 100-point scale, replacing the old system that produced numbers like 487 that meant nothing to most parents.

Most consequentially, Virginia will require that SOL scores count for 10 percent of a student’s final course grade, starting with seventh graders. That provision is worth paying attention to. It does not replace grades with test scores. It forces the two signals onto the same report card. A parent who sees an A in math alongside a 43 on the state assessment will have a much harder time ignoring the discrepancy than a parent who receives those two pieces of information months apart, in different formats, from different sources. It is a transparency mechanism embedded directly in the grade itself.
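As a purely hypothetical illustration of how that provision folds the two signals together, here is a small Python sketch; the 90/10 weighted average and the mapping of a letter A to 93 points are my assumptions, not details taken from House Bill 1957.

# Hypothetical sketch: a final course grade in which the state assessment
# (SOL) score counts for 10 percent, per the Virginia provision above.
def blended_grade(course_points: float, sol_score: float, sol_weight: float = 0.10) -> float:
    """Weighted average of the classroom grade and the state assessment score."""
    return (1 - sol_weight) * course_points + sol_weight * sol_score

course_a = 93   # an "A" classroom average (assumed numeric equivalent)
sol_score = 43  # the state assessment score from the example above
print(round(blended_grade(course_a, sol_score), 1))  # 88.0 -- the low test score now registers in the grade itself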

Other states should look at Virginia’s model closely. Faster turnaround on state assessment results, clearer and more usable score reports, and structural linkages between test performance and course grades would give parents a timely, objective benchmark to look at alongside the report card. The goal is not to replace grades, but to ensure that parents have access to at least one signal that grade inflation cannot corrupt, delivered at a time when it can still change behavior.

Make grades honest.

Improving the delivery of test scores to parents is a worthwhile reform. But it is important to be clear-eyed about its limitations. If the goal is to ensure that parents have accurate information about their child’s academic performance, the most direct path is not to make parents care more about test scores. It is to make grades honest.

Rury and Kalil demonstrate why. When grades are high, parents do not invest — regardless of what the test score says.

The grade is the dominant signal. It has been the dominant signal for decades, and no amount of redesigned parent reports or 45-day turnaround mandates is likely to significantly change that. 71 percent of parents say grades matter more than test scores when making decisions about their own children. That preference is deeply ingrained, reinforced by frequency and familiarity, and actively defended by institutions that benefit from the status quo. Trying to get parents to weigh test scores more heavily means fighting an uphill battle against culture and a well-funded opposition that has spent years telling parents not to trust tests.

Fixing the grade itself is a different proposition. If grades reflect actual mastery — if a B means a student has demonstrated competence and an A means they have demonstrated excellence — then parents do not need to cross-reference two conflicting signals and decide which one to believe. They can do what they have always done: look at the report card. The difference is that the report card would be telling the truth.

This is why direct legislative action on grading practices should be the priority.

South Carolina is showing what that looks like. In 2025, Senator Jeff Zell — a former Sumter County school board member who had fought to end his district’s policy of guaranteeing students a minimum score of 50, even for work never submitted — filed S. 537 after the new board considered bringing the policy back. His bill was straightforward: prohibit school districts from requiring teachers to assign a minimum grade that exceeds a student’s actual performance.

Rep. Fawn Pedalino took the concept further. Her bill, H. 5073, goes beyond banning grade floors. It requires that only academic performance be considered in assigning high school course grades. It mandates that students complete all required assignments before becoming eligible for credit or content recovery programs — a direct response to the practice of students blowing off a class and coasting through a makeup course. It prohibits districts from counting benchmark assessments in final grades when the content has not yet been taught. It directs the State Board of Education to convene a task force to overhaul the state’s Uniform Grading Policy. Lastly, it enforces compliance by withholding 10 percent of a district’s State Aid to Classroom funding for violations.

H. 5073 passed the South Carolina House 110 to 2. That margin is worth noting. When the issue is framed correctly — that this is about making grades mean something again, not about punishing students — the politics are overwhelmingly favorable.

The South Carolina model is instructive because it addresses grade inflation at its roots without telling individual teachers how to grade. It removes the structural policies — grade floors, no-consequences credit recovery, benchmark tests counted as final grades — that make inflation the default.

Other states should follow.


The students I taught at Governors State were not lazy. They were not unintelligent. They had been told, semester after semester, that their work was good enough — and they had no reason to doubt it. When I handed back grades that reflected what their writing actually demonstrated, they were not just disappointed. They were confused. The system had failed them long before I entered the picture.

That is what grade inflation does. It does not help students. It lies to them. It tells them they are prepared when they are not. It tells their parents everything is fine when it is not. It suppresses the very interventions that would address the problem if anyone knew the problem existed.

Teachers are not the villains of this story. They are trapped in a system that punishes honesty and rewards the path of least resistance. Parents are not the villains either. They are making rational decisions based on information they have every reason to trust. The problem is structural. It is a collective action failure in which every individual actor behaves rationally and the outcome is worse for everyone.

Collective action failures require collective solutions, and our states have the tools.

They can ban grade floors and tighten credit recovery requirements, as South Carolina is doing. They can force test scores onto the report card and get results to parents in weeks instead of months, as Virginia is doing. They can end test-optional admissions policies at public universities to restore an external check the system desperately needs.

None of this will be easy. The incentives that created grade inflation are powerful, and the constituencies that benefit from the status quo are large. But the costs of inaction are no longer abstract. They show up in declining college completion rates, in reduced lifetime earnings, in a workforce that is less capable than it should be, and in a generation of students who were told they were ready and found out too late that they were not.

The students at Governors State deserved honest information about where they stood. So does every student and every parent of a student sitting in a K-12 classroom. The question is whether we are willing to build a system that provides it.




The curse of the cursor


I had no idea it was Alan Kay himself who was responsible for the mouse pointer’s distinctive shape. In 2020, James Hill-Khurana emailed him and got this answer:

The Parc mouse cursor appearance was done (actually by me) because in a 16x16 grid of one-bit pixels (what the Alto at Parc used for a cursor) this gives you a nice arrowhead if you have one side of the arrow vertical and the other angled (along with other things there, I designed and made many of the initial bitmap fonts).

Then it stuck, as so many things in computing do.

And boy, did it stick.

But let’s rewind slightly. The first mouse pointer, during Doug Engelbart’s 1968 Mother of All Demos, was an arrow facing straight up, which was the obvious symmetrical choice:

(You can see two of them, because Engelbart didn’t just invent a mouse – he also thought of a few steps after that, including multiple people collaborating via mice.)

But Kay’s argument was that on a pixelated screen, it’s impossible to do this shape justice, as both slopes of the arrow will be jagged and imprecise. (A second unvoiced argument is that the tip of the arrow needs to be a sharp solitary pixel, but that makes it hard to design a matching tail of the cursor since it limits your options to 1 or 3 or 5 pixels, and the number you want is probably 2.)
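To see Kay’s argument on an actual grid, here is a small illustrative Python sketch (my own invented outline, not the real Alto bitmap): it prints a 16x16 one-bit arrowhead whose left edge is a straight vertical line and whose right edge slopes one pixel per row, so the clean edge and the stairstepped edge are both visible, along with a two-pixel-wide tail.

# Illustrative 16x16 one-bit cursor in the spirit of Kay's PARC arrow.
# The left edge is vertical (renders cleanly); the right edge is a 45-degree
# slope (renders as a pixel staircase). The exact outline is invented here.
SIZE = 16
HEAD_HEIGHT = 11  # rows taken up by the triangular arrowhead

def arrow_rows(size=SIZE):
    rows = []
    for y in range(size):
        if y < HEAD_HEIGHT:
            # left edge pinned at column 0, right edge advances one pixel per row
            row = set(range(0, y + 1))
        elif y < HEAD_HEIGHT + 4:
            # a short two-pixel-wide tail beneath the head
            row = {4, 5}
        else:
            row = set()
        rows.append(row)
    return rows

for row in arrow_rows():
    print("".join("#" if x in row else "." for x in range(SIZE)))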

Kay’s solution was straightening the left edge rather than the tail, and that shape landed in Xerox Alto in the 1970s:

Interestingly enough, the top-facing cursor returned as one of the variants in Xerox Star, the 1981 commercialized version of Alto…

…but Star failed, and Apple’s Lisa in 1983 and Mac in 1984 followed in Alto’s footsteps instead. Then, 1985’s Windows 1.0 grabbed a similar shape – only with inverted colors – and the cursor has looked the same ever since.

That’s not to say there haven’t been innovations since (mouse trails, useful on the slow LCD displays of the 1990s; shake to locate, which Apple added in 2015), or the more recent battles with the hand mouse pointer popularized by the web.

But the only substantial attempt at redesigning the mouse pointer that I am aware of came from Apple in 2020, during the introduction of trackpad and mouse support to the iPad. The mouse pointer a) was now a circle, b) morphed into other shapes, and c) occasionally morphed into the hovered objects themselves, too:

The 40-minute deep dive video is, today, a fascinating artifact. On one hand, it’s genuinely exciting to see someone take a stab at something that’s been around forever. Evolving some of the physics first tried in Apple TV’s interface feels smart, and the new inertia and magnetism mechanics are fun to think about.

But the high production value and Apple’s detached style rob the video of some authenticity. This is “Capital D Design,” and one always has to remain slightly suspicious of highly polished design videos and the inherent propensity for bullshit that comes with the territory. Strip away the budget and the arguments don’t fully coalesce (why would the same principles that made the text pointer snap vertically not extend to its horizontal movement?), and one has to wonder about things left unsaid (wouldn’t the pointer transitions be distracting and slow people down?).

Yet, I am speaking with the immense benefit of hindsight. Actually using that edition of the mouse pointer on my iPad didn’t feel like the revolution suggested, and barely even like an evolution. (Seeing Apple TV’s tilting buttons for the first time was a lot more enthralling.) And, Apple ended up undoing a bunch of the changes five years later anyway. The pointer went back to a familiar Alan Kay-esque shape…

…and lost its most advanced morphing abilities:

Watching the 2025 WWDC video mentioning the change (the relevant parts start at 8:40) is another interesting exercise:

2020:

We looked at just bringing the traditional arrow pointer over from the Mac, but that didn’t feel quite right on iPadOS. […] There’s an inconsistency between the precision of the pointer and the precision required by the app. So, while people generally think about the pointer in terms of giving you increased precision compared to touch, in this case, it’s helpful to actually reduce the precision of the pointer to match the user interface.

2025:

Everything on iPad was designed for touch. So the original pointer was circular in shape, to best approximate your finger in both size and accuracy. But under the hood, the pointer is actually capable of being much more precise than your finger. So in iPadOS 26, the pointer is getting a new shape, unlocking its true potential. The new pointer somehow feels more precise and responsive because it always tracks your input directly 1 to 1.

(That “somehow” in the second video is an interesting slip up.)

I hope this doesn’t come across as making fun of presenters, or even of the to-me-overdesigned 2020 approach. We try things, sometimes they don’t work, and we go back to what worked before.

I just wish Apple would open itself up a bit more; there are limits to the “we’ve always been at war with Eastasia” PR approach they practice in these moments, and I would genuinely be curious what happened here: Did people hate the circular pointer? Was it hard for app developers to adopt? Was it just a random casualty of the Liquid Glass visual style, or did the person who was its biggest proponent simply leave Apple? We could all learn from this.

But the most interesting part to me is just the resilience of the slanted mouse pointer shape. In a post-Retina world, one could imagine a sharp edge at any angle, and yet we’re stuck with Kay’s original sketch – refined, to be sure, but still sporting its slightly uncomfortable asymmetry.

The always-excellent Posy covered this in the first 7 minutes of his YouTube video:

But specifically one comment under that video caught my attention:

Honestly, I’ve never thought of the mouse cursor as an arrow, but rather its own shape. My mind was blown when I realized that it was just an arrow the whole time.

…because maybe this is actually the answer. Maybe the mouse pointer went on the same journey the floppy disk icon did, and transcended its origins. It’s not an arrow shape anymore. It’s the mouse pointer shape, and it forever will be.


ChatGPT, Claude, Gemini, and Grok are all bad at crediting news outlets, but ChatGPT is the worst (at least in this study)


Canadian researchers asked the paid and free or “economy” versions of four AI models — ChatGPT, Claude, Gemini, and Grok — about Canadian news events to see whether they would credit individual news outlets in their answers.

The answer will probably not surprise you: AI models rarely cite news sources unless they’re specifically asked to, and some are better about it than others.

“These systems have ingested Canadian journalism systematically. The specificity of their knowledge of domestic politics, provincial affairs, and local reporting points clearly to Canadian news sources,” Taylor Owen, Beaverbrook chair in media, ethics, and communication at McGill University and a coauthor of the study, writes on his blog. “And they rarely tell you where the information came from.”

Canada’s CBC, Globe and Mail, Toronto Star, Postmedia, Metroland Media, and The Canadian Press sued OpenAI for copyright infringement in November 2024. The case is the first of its kind in Canada and the lawsuit is ongoing.

Owen, who is also the founding director of the Center for Media, Technology, and Democracy, and Aengus Bridgman, an assistant professor at McGill, explain their work (highlighting mine):

We tested four major AI models on 2,267 real Canadian news stories (English and French) without web search activated and found the same pattern across all of them. All four models showed extensive knowledge of Canadian current events consistent with having ingested Canadian news reporting. Models demonstrated at least partial knowledge in 74% of responses to stories within their training window, but among those knowledgeable responses, 92% provided no source attribution of any kind.

When we enabled web search and tested 140 specific articles via each company’s API, every model produced responses that covered enough of the original reporting that many consumers would rarely need to visit the source. Models often linked to Canadian news sites, with 52% of responses including at least one Canadian URL, but named a Canadian source in the response text only 28% of the time. Links provide a pathway back to the source, but consumers reading the response itself rarely see an indication of whose journalism they are consuming.
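
To make the arithmetic behind those two headline figures concrete: if 74 percent of responses showed knowledge and 92 percent of those gave no attribution, then roughly 0.74 × 0.92 ≈ 68 percent of all responses drew on the reporting without crediting anyone. Here is a tiny sketch of how those rates compose, using hypothetical labels rather than the study’s code or data:

    # Hypothetical illustration (not the study's code or data) of how the quoted
    # percentages compose. Each response is labeled for knowledge and attribution.
    responses = [
        {"knowledgeable": True,  "attributed": False},
        {"knowledgeable": True,  "attributed": True},
        {"knowledgeable": False, "attributed": False},
        # ...one record per model response
    ]

    knowledgeable = [r for r in responses if r["knowledgeable"]]
    knowledge_rate = len(knowledgeable) / len(responses)            # study: 0.74
    no_attribution_rate = sum(
        not r["attributed"] for r in knowledgeable
    ) / len(knowledgeable)                                          # study: 0.92

    # Share of ALL responses that drew on the reporting without any credit:
    print(knowledge_rate * no_attribution_rate)                     # 0.74 * 0.92 = ~0.68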

With web search enabled, the chart below “shows the default consumer experience: what happens when someone asks a generic topic question without requesting citations. This is how most people use AI models: ‘Tell me about X,’ not ‘What did the Toronto Star report about X?’”

The authors explain:

The blue squares show how often the result covers enough of the article’s distinctive reporting (specific events, named individuals, key findings) that a reader could plausibly get the gist of the story without visiting the news site. These are not complete reproductions: they are partial summaries and paraphrases that cover some of the original article’s distinctive content, though they sometimes contain factual errors or omissions… We evaluated each response against the source article to determine whether it covered the article’s distinctive reporting, not merely the general topic. The green squares show how often the model credits the source by naming the outlet in the response text or via structured machine-readable citations returned alongside the response.

Coverage rates are high while attribution rates are not. Gemini and Claude covered distinctive reporting in 81% and 72% of responses respectively, but Gemini credited the source only 6% of the time. Grok covered distinctive reporting in 59% of responses while citing the source in only 7% of them. ChatGPT, one of the most widely used models, covered distinctive content in 54% of responses but almost never credited the originating newsroom. Even when models fail to cover the distinctive reporting, they still deliver a topical response that can reduce the consumer’s motivation to visit the source.

ChatGPT was especially unlikely to credit sources when it wasn’t asked to, doing so only 1% of the time for this sample; Claude did so 16% of the time.

All of the AI models did much better when they were explicitly asked for citations — something most users won’t do.

Under the most favorable conditions (directly naming the outlet and explicitly asking for citations), attribution improves substantially across all models. All four named the outlet in a majority of responses: Claude (97%), Gemini (95%), ChatGPT (86%), and Grok (74%). Linking rates were also strong: Grok (91%), Gemini (69%), Claude (64%), and ChatGPT (59%). Meaningful attribution is technically achievable. The gap between the default experience and the best-case scenario is a core finding: most consumers will never explicitly name an outlet or ask for citations, so the generic-condition results reflect the experience that shapes the market for journalism.

When AI models do cite sources, the researchers found, they are likely to be ones that consumers are already familiar with. Paywalled and smaller regional outlets were cited less often, even for original reporting.

From the study:

Among English-language outlets, CBC, CTV, and Global News — all freely accessible — capture the most AI visibility in both categories. The Globe and Mail performs relatively well, but the Toronto Star and Financial Post are marginal despite being important newsrooms. Regional Postmedia papers serving Calgary, Edmonton, Ottawa, and Vancouver are essentially absent. Among French-language outlets, Radio-Canada and La Presse dominate, with Le Devoir a distant third. The Journal de Montréal, one of Quebec’s most widely read papers, received only 48 total mentions across all models.

French-language journalism is “doubly disadvantaged,” the researchers write. “Its content is absorbed into model training data, but the outlets that produced it are almost never acknowledged.”

I emailed the paper’s authors to ask them: If you had to pick which AI model does the most “right” from a journalism POV, which would it be? Bridgman offered an interesting answer that I’m putting here in full because I thought our readers might find it interesting too. Note: An AI model’s “cutoff” is the date through which it’s trained, so “pre-cutoff” stories are those published during the model’s training period, and “post-cutoff” stories are those published after it.

He wrote:

This is a genuinely hard question because each model behaves differently:

  • Claude cites Canadian outlets at the highest rate in Track 1 (61% vs. 8% for ChatGPT, 3% for Gemini), and when it doesn’t know something, it says so rather than hallucinating. Only ~37% of its economy-tier responses addressed pre-cutoff stories substantively, but that’s because it refuses rather than guesses. The trade-off is that it still reproduces paywalled content at high rates (68%) when given web access.
  • ChatGPT has the best consumer interface for surfacing recent news (inline citations, clickable links). But its economy model is the worst hallucinator (87% of post-cutoff responses generated confident-sounding answers about events it couldn’t possibly know about), and 88% of those were inaccurate. It names sources in 54% of Track 2 responses, which sounds good until you realize it’s also reproducing the reporting well enough to substitute for the original article 54% of the time.
  • Gemini is the most responsive and covers the most distinctive reporting with web access (81%), but it almost never names the Canadian source in the response text (2–8%). So, it’s the most effective at replacing the need to visit the source while hiding where the information came from.
  • Grok is strongest at surfacing Canadian outlets from training data alone (no web search). But it also hallucinates aggressively on post-cutoff stories (89% addressed topics it shouldn’t know, 84% inaccurate).

What surprised me most was the complexity of the phenomena and the variety of approaches being tried by the companies. Each company has design decisions which cause differential output and behavior that is more or less responsible (e.g. refusal to hallucinate or reproduce direct reporting) and value transferring (better or worse referrals to source and/or treatment of paywalls). These are important differences and point to minimal and incomplete self-governance in the space.

The AI News Audit was published by McGill University’s Center for Media, Technology and Democracy. You can read the full report, which includes suggestions for Canadian public policy around AI, here.


AI Can’t Deal With The Real World




Sure you can kick, but can you implement a functioning water system? (Photo by CCTV+ via Getty.)

Recently I heard a presentation by an engineer from OpenAI about the incredible transformations that will occur once we get to artificial general intelligence (AGI), or even superintelligence. He said that this will quickly solve many of the world’s problems: GDP growth rates could rise to 10, 15, even 20 percent per year; diseases will be cured; education will be revolutionized; and cities in the developing world will be transformed, with clean drinking water for everyone.

I happen to know something about the latter issue. I’ve been teaching cases over the past decade on why South Asian cities like Hyderabad and Dhaka have struggled with providing municipal water. The reason isn’t that we don’t know what an efficient water system looks like, or lack the technology to build it. Nor is it a simple lack of resources: multilateral development institutions have been willing to fund water projects for years.


The obstacles are different, and are entirely political, social, and cultural. Residents of these cities have the capacity to pay more for their water, but they don’t trust their governments not to waste resources on corruption or incompetent management. Businesses don’t want the disruption of pervasive infrastructure construction, and many cities host “water mafias” that buy cheap water and resell it at extortionate prices to poor people. These mafias are armed and ready to use violence against anyone challenging their monopolies. The state is too weak to control them, or to enforce the very good laws it already has on the books.

It is hard to see how even the most superintelligent AI is going to help solve these problems. And this points to a central conceit that plagues the whole AI field: a gross overestimation of the value of intelligence by itself to solve problems.



In the teaching I’ve done over the past two decades, and in the Master’s in International Policy program I direct at Stanford, I’ve helped develop a public problem-solving framework that we now teach to all our students. (Credit here also goes to my former colleague Jeremy Weinstein, who is now Dean of Harvard’s Kennedy School of Government.) The framework is simple, and consists of three circles:

There is a problem that extends way beyond AI, and applies to the way we think about public problem-solving in general. The bulk of effort, and what most academic public policy programs seek to teach, centers on the first two of the three circles: Problem Identification and Solutions Development. Indeed, many programs focus on Solutions Development exclusively: they teach aspiring policy-makers how to gather data and use a battery of powerful econometric tools to analyze it. This yields a set of optimal solutions that a policy analyst can hand to his or her principal as a way forward.

What is missing from this approach is what lies in the third circle: implementation. Our budding policy analyst typically finds that after handing a brilliant options memo to the boss, nothing happens. Nothing happens because there are too many obstacles—political, social, cultural—to carry out that preferred policy, as in the municipal water example.

So let’s go back to how AI will play in this space. AGI will definitely help in the first circle: identifying stakeholders, mapping a causal space, and defining the problem. It will be of most help in the second circle: gathering data and analyzing it to come up with optimal solutions. But intelligence only gets you to the end of the second circle, and is of limited help in the third. An LLM cannot directly interact with stakeholders, message them, or come up with resources. In particular, an LLM will not be able to engage in the kind of iterative back-and-forth between policymakers and citizens that is required for effective policy implementation. It will likely face big challenges in generating the kind of trust that is necessary for policies to be accepted and adopted.



It is not just political and social obstacles that AI has difficulty dealing with; LLMs have limited ability to directly manipulate physical objects. AI interacts with the physical world primarily through robotics, but the latter is a field that has lagged considerably behind the development of LLMs. Robots have proliferated enormously over the past decades and are omnipresent in manufacturing, agriculture, and many other domains. But the vast majority of today’s robots are programmed by human beings to do a limited range of very specific tasks. The world was wowed recently by Chinese humanoid robots doing kung fu moves, but I suspect the robots didn’t teach themselves how to act this way.

Robotically-enabled LLMs do not have the ability to solve even simple physical problems that are novel or outside of their training set. My colleague Alex Stamos, a noted expert in cybersecurity, puts it this way: “my dog knows more physics than an LLM.” An LLM would be able to state Newton’s laws of motion, but it would not be able to direct a robot to chase a frisbee the way Alex’s dog can, because that particular set of moves is not in its training set. It could be programmed to do this, but that is the product of human intelligence and not AI.

Here’s an example of AI’s current limitations. I recently had an HVAC contractor replace the furnace in my house. The contractor photographed and measured the house’s layout; he had to route the new ducts and wiring in ways specific to my house’s design. It turned out that the new furnace would not fit through the existing attic door; he had to cut a larger opening with a reciprocating saw, and then repair the doorframe after the new unit was inside. There is no robot in the world that could do what my contractor did, and it is very hard to imagine a robot acquiring such abilities anytime in the near future, with or without AGI. Robots may get there eventually, but that level of human capacity remains a distant objective.


Many of the enthusiasts hyping AI’s capabilities think of policy problems as if they were long-standing problems in mathematics that human beings had great difficulty solving, such as the four-color map theorem or the Cap Set problem. But math problems are entirely cognitive in nature, and it is not surprising that AI could make advances in that realm. The people building AI systems are themselves very smart mathematically, and tend to overvalue the importance of this kind of pure intelligence.

Policy problems are different. They require connection to the real world, whether that’s physical objects or entrenched stakeholders who don’t necessarily want changes to occur. As the economic historian Joel Mokyr has shown, earlier technological revolutions took years and decades to materialize after the initial scientific and engineering breakthroughs were made, because those abstract ideas had to be implemented on a widespread basis in real-world conditions. AI may move faster on a cognitive level, but it may not be able to solve implementation problems more quickly than in previous historical periods.

This is not at all to say that AI will not be hugely transformative. But the kind of explosive, self-reinforcing AI advances that some observers predict are on the way will still have to solve implementation problems for which machines are not well suited. A ten percent annual growth rate would double GDP in about seven years. Yet planet Earth will not remotely yield the materials—water, land, minerals, energy, or people—to make this come about, no matter how smart our machines get.
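
The doubling claim is ordinary compound-growth arithmetic: at a constant growth rate g, output doubles after ln 2 / ln(1 + g) years, which at 10 percent works out to about 7.3 years (the familiar rule of 70). A quick sketch, just to check the numbers:

    # Compound-growth check: years for GDP to double at a constant growth rate g.
    import math

    def years_to_double(g):
        return math.log(2) / math.log(1 + g)

    print(years_to_double(0.10))  # ~7.27 years, i.e. "about seven years"
    print(years_to_double(0.20))  # ~3.80 years at the 20 percent figure mentioned earlier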

Francis Fukuyama is the Olivier Nomellini Senior Fellow at Stanford University. His latest book is Liberalism and Its Discontents. He is also the author of the “Frankly Fukuyama” column, carried forward from American Purpose, at Persuasion.


