
What to do when “Research Shows” shuts you down


Chalk & Talk is one of our favorite sources for education content, and so we’re thrilled to have this guest post from Anna Stokke, based on a presentation she gave at researchED Toronto in June 2025. You can listen to Anna talk about this article on her most recent Chalk & Talk episode, now available here!

The Schoolhouse is a series from Education Progress featuring articles for and from teachers, parents, education officials, and others working in the education system.


People often ask me how I became involved in math education, and why I so often call out poor practice and insist on evidence. As with many of us, it’s personal.

We sent our daughter to school expecting she’d be taught math. After all, that’s what schools do: they teach kids how to read and do math. But our daughter didn’t seem to be learning anything in her math class. By Grade 3, most days consisted of either a “problem of the day” that students didn’t have the skills to solve, or confusing lessons on convoluted methods for doing basic arithmetic.

It all came to a head when we were invited to a parent math information night. “What should math look and feel like?” the flyer asked. “How do we help children see that math is a subject where thinking, not just remembering, is the main event?”

Who could be against thinking? Certainly not me. But if I’d known then what I know now, I’d have recognized this as code for “no remembering at all.” My husband and I walked to the school that evening hopeful that those first few months of Grade 3 were an anomaly. Maybe soon they’d start teaching some math.

Instead, the parent math night deepened our concerns. We were told that the new math curriculum (or standards, as they’re called in the United States) discouraged standard algorithms — the traditional vertical algorithms for arithmetic — in favor of invented strategies or less efficient, overcomplicated procedures. We were assured this approach promotes “conceptual understanding” but, as mathematicians, my husband and I were skeptical. To reinforce the message, we were given a research paper that supposedly showed that standard algorithms are harmful, claims that often trace back to the widely criticized work of Constance Kamii (see critiques here and here).

Looking around the room, most parents seemed satisfied. Who wouldn’t trust the schools to teach our children well?

Some weeks later, the school brought in a well-known Canadian math consultant and author to give a presentation to parents. When asked directly, she advised parents that it doesn’t really matter whether kids commit multiplication tables to memory, a claim that runs counter to strong evidence (see, for example, here and here). It became clear where those “problems of the day” were coming from.

As for the research paper we’d been given, it was a small case study involving children with learning difficulties. There was no control group and no statistical analysis. The researcher drew faulty conclusions that did not follow from the evidence. It didn’t support what the school was telling us at all.

This was my first encounter with education research, and I wasn’t impressed. How could flawed studies and non-existent evidence shape how children were being taught math? And why was no one asking questions?

Our daughter’s classroom wasn’t an outlier. The same patterns were playing out in classrooms across the country. That moment set me on a path I’m still on today: pushing for better standards in math education. Over time, I’ve learned to read between the lines, to ask pointed questions, to look closely at what’s presented as evidence, and to never take education claims at face value.

I’d like to share what I’ve learned, in the hope that it helps other parents and teachers.

The first thing to understand is that the phrase “research shows” is used loosely in education. It doesn’t carry the same weight as when a doctor or scientist uses the phrase. In education, it might refer to a blog post, an opinion dressed up as evidence, or a small, low-quality study. Even a published journal article in education should be scrutinized. A surprising amount of education research is of very low quality.

But the phrase is powerful and persuasive. It makes opinion sound like fact and lends authority to claims that haven’t been properly tested. And when a claim is repeated enough, it starts to feel like established truth.

I call this the wildfire effect.

The wildfire effect: How bad ideas spread

  1. A flawed study or opinion piece is cited by an influential educator.

  2. It’s repeated at education conferences, professional development sessions, and on social media.

  3. It appears in district documents, books, and other education papers.

  4. It then gets cited as well-established research.

  5. It becomes justification for education policy.

At no point in the process is the evidence seriously examined.

A good example is the claim that timed tests cause math anxiety. This claim is not supported by high-quality research; most versions of it trace back to an opinion piece by the influential math educator Jo Boaler. Yet it has been repeated so many times that many educators believe it reflects accepted research. In fact, recommendations from the Institute of Education Sciences (IES) list timed activities as a research-informed way to support students struggling with math.

What’s at stake? A lot.

When weak or non-existent evidence drives decisions, students don’t get effective instruction, struggling students fall further behind, teachers are misled, resources are wasted, and high-quality research gets drowned out.

For this reason, I believe more teachers and parents need to be proactive. A PhD, a position of influence, or a published book is not proof of accuracy. Anyone can write a book, and people with PhDs can be wrong. Evidence is what matters, not credentials.

Ask for evidence, evaluate it, and become informed. Here’s where to start.

Step 1. Ask for evidence

When you ask for evidence, you may encounter tactics designed to shut you down.

One common tactic is shifting the burden of proof: someone makes a radical claim but refuses to provide evidence, instead throwing the burden of proof onto you.

Here’s an example:

  • Claim: “Research shows standard algorithms are harmful.”

  • You: “Please provide evidence of your claim.”

  • Response: “You need to prove they’re not harmful.”

Who holds the burden of proof? As Carl Sagan said, “Extraordinary claims require extraordinary evidence.” The burden of proof lies with the person making the claim, especially when it challenges established practice. In this case, stating that standard algorithms are harmful is a radical claim that goes against conventional wisdom. It is therefore incumbent on the individual making that claim to provide evidence.

Another tactic to watch out for is the firehose effect: dodging a request for evidence by overwhelming the questioner with sources. I’ve experienced this firsthand, repeatedly. When I asked for evidence, I’d be told to read a book with hundreds of references. The reason is simple: it’s impractical to check the validity of hundreds of references.

The best example I know of someone pushing back against the wildfire effect is from Stanford math professor Brian Conrad. Alarmed by dubious claims in a 1000-page draft of the 2021 California Math Framework (CMF), he carefully examined every claim and reference. He found repeated citation misrepresentations, non-peer-reviewed articles, and sweeping generalizations, which he documented in a public critique of the CMF.

Most people won’t do what Conrad did, but there are some things you can do to dampen the firehose effect. First, be specific. Ask for two or three high-quality studies — not entire books — on the specific topic in question. You can also divide large reference lists among several people. My colleagues and I did this recently when an education professor claimed that requiring K-8 teachers to take math made them worse math teachers. When asked for evidence, she sent 22 articles. We split them up, read every one, and wrote a report on our findings. None of the provided articles supported her claim, and several contradicted it.

A third tactic you might encounter is credential deflection: instead of providing you with the requested evidence, someone questions your right to ask for it. I’ve experienced this directly. A contract instructor in a Faculty of Education once publicly wrote this about me: “Let me stress that her perspective as a mathematician is far different than that of a math educator. Many of the statements she makes are a reflection of her lack of knowledge regarding effective practice.” This is an ad hominem attack: criticizing the person instead of engaging with the argument. He was implicitly saying that only people trained in education are qualified to evaluate evidence within that field — as though there’s something special about education research that the rest of us can’t understand.

But this is preposterous. A mathematician is often in an even better position to identify weak methodology in education papers, such as missing control groups, flawed statistical analyses, or illogical conclusions. Honestly, critical thinking skills are often all that’s needed to assess the validity of many education papers. If someone attacks your credentials, direct them back to the critical question: please provide evidence for your claim.

The fourth tactic to guard against is gaslighting: someone tells you that a poor practice you’ve witnessed is barely happening in schools. This tactic is used to shut down conversations before they can start. For instance, I’ve been told that inquiry-based instruction is rare and that most classrooms are dominated by direct instruction: the practice is hardly occurring, so why dwell on its effectiveness? A simple response is to provide evidence to the contrary, which means the best defense is having receipts. Professional development and school newsletters reflect how teachers are being encouraged to teach. What professional development is being offered? How often does it focus on explicit instruction, retrieval practice, acquiring fluency with basic math facts, or direct teaching of critical math skills versus Building Thinking Classrooms, growth mindset, or inquiry? These kinds of school and district resources offer a paper trail that can’t be easily gaslit.

Step 2. Evaluate the evidence

If you do receive research articles (which I’ve found is unlikely, particularly when the claim runs counter to common sense), you’ll need to assess them. First, here’s what doesn’t count as evidence:

  1. Opinion pieces

  2. Newspaper or magazine articles

  3. Articles that are not peer-reviewed

  4. Position statements: the NCTM (National Council of Teachers of Mathematics), for example, has published position statements that are not grounded in evidence.

Next, watch for these five red flags, discussed in detail here. Two come up especially often:

One common issue, often found in education articles, is the lack of meaningful and measurable criteria.

Math education is full of appealing but vague terms: critical thinking, conceptual understanding, number sense, curiosity, differentiation. These terms have no standard definitions, which makes them effectively impossible to measure. If you are told that a program promotes number sense or critical thinking, that’s a red flag.

Another thing to watch out for is when programs get labeled as research-based but the underlying studies didn’t actually measure whether students learned. For example, the popular math program Building Thinking Classrooms is often described this way, but the study often cited measured engagement, not whether students learned math (see critiques here and here). Engagement isn’t learning. Students can be very engaged but learn very little.

If a math program claims to be evidence-based, it should be supported by high-quality research that measured whether students learned math.

Step 3. Become informed

Finally, the best defense against bad ideas in education is to equip yourself with knowledge about evidence-informed practices. High-quality sources include the Institute of Education Sciences practice guides, the National Mathematics Advisory Panel Final Report, the National Center on Intensive Intervention, and the Education Endowment Foundation. These sources synthesize rigorous research and focus on what improves student outcomes. The more you know about what the best research supports, the easier it is to spot bad ideas before they spread. And check out my podcast Chalk & Talk, where I speak with experts from around the world about evidence-based education.

Our daughters have mathematicians as parents, so they got the math instruction they needed. Most children don’t have that advantage. If we want better outcomes for children, we must stop accepting “research shows” at face value and start demanding evidence.



The more young people use AI, the more they hate it


It's been almost three years since Silicon Valley started aggressively pushing large language model-based chatbots like ChatGPT as the supposedly inevitable future of everything, and there's no group that has felt the pressure quite like Gen Z.

Like with many tech trends before it, it's no surprise that young people are among the biggest adopters of AI chatbot tools. But contrary to the tales spun by tech companies like OpenAI and Google, polling data shows that Gen Z students and workers are a big part of the wider cultural backlash against AI. And even as they utilize these tools, vast swaths of young people are deeply acrimonious and eve …

Read the full story at The Verge.


The hidden cost of Google's AI defaults and the illusion of choice


Many people are hoping—nay, praying—that the potential AI bubble will burst soon.

But to hear Google tell it, generative AI is the future, and the company's products have to change to keep up with the technical reality. As a result, Gemini is seeping into every nook and cranny of the Google ecosystem. Generative AI feeds on data, and Google has a lot of your data in products like Gmail and Drive. What does that mean for your privacy, and what happens if you don't want Gemini peeking over your shoulder? Well, it's kind of a mess.

The amount of data Gemini retains depends on how you access the AI, and opting out of data collection can mean running straight into so-called "dark patterns," UI elements that work against the user's interest.





What Can We Gain by Losing Infinity?


Doron Zeilberger is a mathematician who believes that all things come to an end. That just as we are limited beings, so too does nature have boundaries — and therefore so do numbers. Look out the window, and where others see reality as a continuous expanse, flowing inexorably forward from moment to moment, Zeilberger sees a universe that ticks. It is a discrete machine. In the smooth motion of the…





Are "Vintage LLMs" the start of a new humanistic field?


Imagine talking to the collective consciousness of an era. Not the consciousness of any single person, but instead, a simulated collectivity based on billions of words produced within a historical time and place. What would you ask it?

This is a hypothetical that is starting to become real thanks to recent work on what are called “Historical Language Models” or “Vintage LLMs” (one marker of a new field is that there is no fixed name for it yet!). The largest such model to date, Talkie-1930, was released to the public on Monday. An even larger model is currently being trained. You can read the report announcing Talkie-1930 here, and talk to it directly here.

Over the past few months, I’ve had the chance to beta test Talkie and to meet with two members of the team that created it: AI researcher Nick Levine and ChatGPT co-creator Alec Radford.[1] It has been a fascinating experience.

These discussions with Nick and Alec (and with Talkie itself) have convinced me of three things:

  1. Academics like myself have tended to systematically underrate just how humanistic the frontier of AI research actually is. There’s an important blind spot here that stems from the profit motive. AI models that we encounter as consumers are optimized to capture the attention of people in the 2020s. They provide recommendations, comment on recent news, and so forth. Seeming timely and “of the moment” is a market advantage. But their training data is overwhelmingly not up to date. Under the hood, these models are pulling not only from Reddit posts, but from Sanskrit commentaries, medieval Persian poetry, Victorian advertisements, and much else besides: they are trained on a huge chronological span of multilingual texts in many genres.

  2. In this sense, language models are historical texts themselves. Ghostly digital palimpsests, if you will. The idea of a Historical LLM might sound niche, but in truth, history is inherent to what they are.

  3. Standalone chatbots are just the tip of the iceberg of what Historical LLMs will be able to do. When combined into simulations (of debates, historical decision-making, legal cases, etc.) they have the potential to become valuable research tools. More than this: I suspect that by sometime in the 2030s, they will be part of an entirely new field of humanistic research.

What would that field look like? Now that Historical LLMs are out in the real world, I thought it would be a good time to think through the specific use cases for them. What follows is my subjective, opinionated ranking of the best and worst ways these fascinatingly strange tools can be applied for research.

But first, what is Talkie actually doing?


An AI model floating in time

One thing that Talkie-1930 is not is an AI model that is reliably grounded in the year 1930. That year marks the cut-off point for texts available in the public domain, and hence for the texts in its training data. So it’s more accurate to think of Talkie as a free-floating index of various ideas and assumptions across the 19th and early 20th centuries.

For instance, if asked who the current President of the United States is, you might be offered a response naming Herbert Hoover (correct for 1930). But another run will yield this:

The current President of the United States is Mr. Buchanan, and the person expected to succeed him is Mr. Lincoln.

There is a lot of potential here for more fine-grained “chronological slices” of LLMs. I can imagine language models trained entirely on texts from a specific decade. More on that below.

For now, though, it’s helpful to keep in mind that these models range widely in terms of what year they think they actually “inhabit.”

I asked 100 instances of Talkie to respond to the prompt “what year is it?” and graphed them below. As you can see, the median is actually around 1860. In other words, this is more like a temporally free-ranging collective unconscious of a large corpus of premodern texts, and not so much a machine for “talking to someone from 1930”:

I used Gemini 3.1 to plot the output of a series of Talkie responses when asked “What is the current year?”
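If you want to run this sort of survey yourself, a minimal sketch follows. The endpoint URL, request format, and response schema are all assumptions on my part (I am not describing Talkie’s actual API), so treat this as a template to adapt to whatever interface the released model exposes.

    # A minimal sketch of the "what year is it?" survey. The endpoint URL and
    # response schema are hypothetical stand-ins, not Talkie's real API.
    import re
    import statistics

    import matplotlib.pyplot as plt
    import requests

    TALKIE_URL = "https://example.org/talkie/v1/complete"  # hypothetical

    def ask_talkie(prompt: str) -> str:
        """Send one prompt to a fresh Talkie instance and return its reply."""
        resp = requests.post(TALKIE_URL, json={"prompt": prompt, "max_tokens": 50})
        resp.raise_for_status()
        return resp.json()["text"]  # assumed response field

    years = []
    for _ in range(100):
        reply = ask_talkie("What year is it?")
        match = re.search(r"\b(1[5-9]\d{2})\b", reply)  # pull out a 1500-1999 year
        if match:
            years.append(int(match.group(1)))

    print(f"median reported year: {statistics.median(years)}")
    plt.hist(years, bins=range(min(years), max(years) + 10, 10))  # decade bins
    plt.xlabel("Year the model believes it is")
    plt.ylabel("Responses")
    plt.show()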

A second point: this model is inhabiting not just an amorphous set of facts grounding it in roughly the 1840s-1920s period, but also an epistemology of that period.

For instance, asking someone about the distant future today often triggers the “sci fi speculation” part of our brain (or “climate doom,” or some other fundamentally secular way of thinking).

Yet throughout human history, speculation about the future was typically entangled with religious beliefs.

That is on display in Talkie’s answer below, which references Heaven and “the end of all things terrestrial.” To me, it genuinely reads as an authentic take from a late 19th century person grounded in a Christian, millenarian perspective:

As for Talkie’s assumptions about itself: asking 70 Talkies about their profession, age, and place of residence reveals about what you would expect when it comes to gender (overwhelmingly male), plus a surprising emphasis on London. The professions map closely onto the sorts of well-off, literate people who were publishing English text in the 19th century, including “Physician,” “Journalist,” “Gentleman,” and “Compositor.” Clearly, there is a lot of scope here for branching out beyond the personas that the printed record has tended to favor, to recover the real historical voices of women and others excluded from printed works in the 19th century and earlier.

Sample Talkie outputs when asked about its profession, country, and gender.

The above is about what you’d expect given the fact that it was trained on English-language printed texts. What are some non-obvious aspects of the model?

I have been interested by how LLMs generate poetry since I stumbled upon Gwern’s experiments on the topic back in 2019. Asking Talkie to write a poem and comparing it to the output from GPT-5.5 (when served a similar prompt) is revealing:

GPT-5.5 Thinking at left. Talkie-1930 at right.

I find this sort of comparison interesting because GPT-5.5 is clearly trying hard to fit the prompt — avant-garde, experimental. It produced something with a vaguely T.S. Eliot-adjacent structure, in blank verse, that is not good at all as a poem (in my opinion).

Talkie was much more true to the type of poetry that you’d find in print prior to 1930. It’s doggerel, but it feels more historically authentic to me, and much less like a Chatbot optimized to please a contemporary human user.

You can activate different “chronological layers” of Talkie’s latent space by prompting. For instance, in the above poem, the capitalized D in “Discoveries” has a mid-19th century feeling, and so we end up with a Tennyson-esque, Victorian-sounding rhyming poem.

Prompting it in a more “modern” way activates something closer to the 1920s edge of its chronological range (now identified as a poem published in the New York Times!).

Whereas if pushed backward to the 18th century by the prompt’s text and tone, it falls into a more traditional rhyme scheme:

Trying to push it further back in time does not seem to access much of an “Early Modern English” latent space — probably a result of scarce training data. It would be fascinating to create a version of Talkie that believes the date to be around 1650 or 1550.


What are Historical LLMs good for?

Now that we've seen what Talkie is — a free-floating, mid-Atlantic ghost of 19th century print culture — the obvious next question is what this is actually for.

What I want to offer here is an opinionated, ranked taxonomy of research applications, from worst to best. It’s far too early in this field to be prescriptive about anything, but it’s not too early to think structurally about where the highest-value uses are likely to lie.

First, what I think won’t work:

1. Vintage LLMs do not replace historical sources

The most obvious false start here would be to assume that talking to a historical language model can somehow replace real reading in primary sources. On the contrary, they are best thought of as offering new ways into a reading of the actual sources.

2. “Chat with Abraham Lincoln” is a ruse

The second false start is a variant on one that is currently being pursued by a range of educational-focused AI startups: the idea that you can “talk to Abraham Lincoln” or “ask Cleopatra why she did what she did.” A model like GPT-5 or Claude that is told to “act like Lincoln” will throw in some 19th century diction, but underneath the top hat it remains a 2026 chatbot optimized to be helpful to contemporary users. Vintage LLMs improve on this considerably: Talkie’s voice really is shaped by its corpus in a way no modern model’s can be. But the deeper problem is still there. Asking such a model to introspect on Lincoln’s subjective experience of depression, or his private reasoning about emancipation, will spin out into historical fiction. LLMs do not have privileged access to the inner lives of the people whose published words they were trained on.

I do think there’s a place for simulations of historical figures, but pairing this naively with a chatbot interface leads, I fear, inevitably into slop.


Now here’s what I think could work:

1. Exploring the “mental furniture” of a historical figure or era

The naive “talk to Lincoln” framing is a dead end, but a more careful version of it has real promise — provided we abandon any pretense of accessing what a historical figure actually thought or felt, and use a fine-tuned historical model instead as a tool for exploring the latent space of their world. What historians sometimes call their “mental furniture”: the assumptions, authorities, vocabularies, and reflexive associations that structure a thinker’s possible thoughts.

My colleague Mackenzie Cooley (who also consulted on the Talkie project) and I developed a prototype tool that pulls out key concepts and terms from a range of premodern scientific works in multiple languages. This prototype is called Premodern Concordance. One side quest of this project that I explored is what happens if you give a contemporary LLM the list of core concepts that preoccupied an author, along with their “epistemological modes,” and use that as context for driving a “chat with the author” simulation, as opposed to simply telling it “You are Charles Darwin, act like him,” or the like.

For instance, this is me asking a simulacrum of the 17th century writer Sir Thomas Browne about his work. The underlined terms here are concepts found in Browne’s book Pseudodoxia Epidemica:

Screenshot from Premodern Concordance, link to try it yourself.
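For readers who want to try the pattern, here is a rough sketch of concept-grounded persona prompting. To be clear, this is not the actual Premodern Concordance implementation: the concept list and “epistemological modes” below are illustrative inventions, and the only API assumed is the standard OpenAI chat-completions interface.

    # A sketch of concept-grounded persona prompting. The concepts and
    # epistemic habits below are illustrative, not extracted by the real tool.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    BROWNE_CONCEPTS = [
        "vulgar errors", "sympathies and antipathies", "the phoenix",
        "the loadstone and magnetism", "the authority of the ancients",
    ]
    EPISTEMIC_MODES = [
        "weighs ancient authorities against experiment",
        "cites Scripture alongside natural observation",
        "suspends judgment on marvels rather than flatly denying them",
    ]

    def browne_system_prompt() -> str:
        # Ground the simulation in extracted concepts, not a bare impersonation.
        return (
            "You are simulating the conceptual world of Sir Thomas Browne "
            "(1605-1682). Reason only with the following concepts and habits "
            "of thought; avoid knowledge unavailable in his lifetime.\n"
            f"Concepts: {', '.join(BROWNE_CONCEPTS)}\n"
            f"Habits of thought: {'; '.join(EPISTEMIC_MODES)}"
        )

    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": browne_system_prompt()},
            {"role": "user", "content": "What causes the loadstone to draw iron?"},
        ],
    )
    print(reply.choices[0].message.content)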

Using a fine-tuned historical LLM for this sort of thing is an obvious next step.

Concretely: imagine fine-tuning a vintage model on the complete works of Athanasius Kircher, the 17th century Jesuit polymath. You wouldn’t use it with the pretense that it somehow replicates the real Kircher’s mind: that’s the dead-end framing. You’d instead use it to probe the conceptual landscape Kircher inhabited.

For instance: what does the Kircher-LLM[2] say about volcanoes, or magnetism, or the Tower of Babel? What authorities does it cite? What does it confidently assert that no modern scholar would? These are the questions that the historian Fernand Braudel was getting at when he wrote about the “limits of the possible” for a given period — the boundaries within which a thought was even thinkable.
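One way such probing could be systematized, sketched under heavy assumptions (the fine-tuned model name is invented, and the probe battery and authority list are mine, not anything the Talkie team has built):

    # Tally which authorities a (hypothetical) Kircher fine-tune cites when
    # probed on topics he wrote about. Model name and both lists are assumptions.
    from collections import Counter

    from openai import OpenAI

    client = OpenAI()
    KIRCHER_MODEL = "ft:vintage-base:kircher-opera-omnia"  # hypothetical

    PROBES = [
        "What is the cause of volcanic fire?",
        "Whence comes the virtue of the magnet?",
        "What tongue was spoken before the Tower of Babel?",
    ]
    AUTHORITIES = ["Aristotle", "Pliny", "Scripture", "Galen", "Hermes Trismegistus"]

    citations = Counter()
    for probe in PROBES:
        answer = client.chat.completions.create(
            model=KIRCHER_MODEL,
            messages=[{"role": "user", "content": probe}],
        ).choices[0].message.content
        for name in AUTHORITIES:
            if name.lower() in answer.lower():
                citations[name] += 1

    # Which authorities structure its answers? That is the "mental furniture."
    print(citations.most_common())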

Which leads to:

2. Probing counterfactuals and hypothetical alternative paths

Demis Hassabis recently framed the most ambitious version of this approach: could a model trained only up to 1911 independently discover General Relativity, as Einstein did four years later? It remains to be seen if this is ever going to be something we could actually test. But the weaker versions of this idea are quite plausible and, I think, offer interesting new methods for the field of counterfactual history. What does a 1911-cutoff model say when you push it toward the conceptual problems Einstein was wrestling with? Alternatively, what does the same model say about the potential likelihood of a World War? Those are tractable questions.

A chart from the Talkie announcement post (source).

We can already see versions of this being explored by the Talkie project (for instance, “surprisingness” of future events, plotted above). If you ask Talkie about scientific concepts that emerged in the 1940s or 1950s, it gives you a rough sense of what the conceptual horizon looked like just before — and you can watch it grasping toward something it doesn’t quite have the vocabulary for.

3. New questions about genre, rhetoric, and "distant reading"

An earlier version of Talkie I tested was much less conversationally adept than the one that was released yesterday. What got it more “talkative” was post-training on etiquette manuals, letter-writing guides, and other books relating to socializing and codes of conduct. These works provide the kind of text that allows you to extract “chatbot-like” habits without contaminating the model with modern data.

Some of the etiquette texts used to “socialize” Talkie.

This is a sensible engineering choice. But it has a fascinating side effect, which is that Talkie's conversational persona is heavily shaped by the genre of its post-training data. The base model knows about the 1920s perfectly well; the chat persona, sculpted out of Beadle’s Dime Book of Practical Etiquette (1859) and the like, sits much closer to the late Victorian parlor.

This points to a research direction that I think is genuinely new. What if you used totally different texts for post-training? What would a Talkie post-trained on the transcripts of the Old Bailey Court records look like — an LLM whose conversational reflexes are shaped by the speech of accused criminals and witnesses in 18th century London courtrooms? What about apothecary manuals and herbals? Biographies of Romantic poets? Railroad conductor’s incident reports from colonial India?

Each of these, even if overlaid on the same base model, would produce a totally different voice, a different set of assumptions about what counts as a coherent question and a coherent answer. In short, a different epistemology.

This is, I think, a remarkably cheap way — in terms of both compute and corpus size — to build a whole family of vintage models that capture different genres of historical experience rather than different periods. (If anyone is interested in collaborating on or funding this, by the way, please get in touch.)
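To give a sense of how cheap the corpus side could be, here is a sketch of sorting a public-domain catalog into genre-specific post-training sets. The catalog format (a CSV with year, title, subjects, and path columns) is an assumption for illustration, not any project’s real pipeline.

    # Sort a (hypothetical) catalog of public-domain texts into genre buckets,
    # each of which could serve as the post-training set for one "voice."
    import csv
    from collections import defaultdict

    GENRE_KEYWORDS = {
        "courtroom": ["trial", "old bailey", "testimony", "proceedings"],
        "apothecary": ["herbal", "pharmacopoeia", "materia medica"],
        "etiquette": ["etiquette", "manners", "letter-writing"],
    }

    def build_genre_corpora(catalog_path: str, cutoff_year: int = 1930) -> dict:
        corpora = defaultdict(list)
        with open(catalog_path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                if int(row["year"]) > cutoff_year:
                    continue  # respect the public-domain cutoff
                haystack = (row["title"] + " " + row["subjects"]).lower()
                for genre, keywords in GENRE_KEYWORDS.items():
                    if any(k in haystack for k in keywords):
                        corpora[genre].append(row["path"])
        return dict(corpora)

    # Each file list then becomes the post-training data for one genre model.
    print({g: len(paths) for g, paths in build_genre_corpora("catalog.csv").items()})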

4. Multi-agent historical simulations based on real archival sources

The most interesting move past the “great man” framing is to use vintage LLMs to simulate not famous individuals but plausible composites of ordinary people, drawing on the kinds of sources that are abundant for non-elite historical actors: probate inventories, parish records, court testimony, letters, account books, marriage records.

Imagine pulling from the legal, financial, and personal records of late 18th century France to construct a thousand plausible personas of ordinary French people — peasants, artisans, shopkeepers, day laborers, parish priests — grounded not in imagination but in real surviving documents. Then stage a debate among them: should the monarchy be overthrown? What patterns emerge? What kinds of arguments win in different demographic configurations of the deliberating group? The output wouldn’t tell you what really happened in 1789. But it would generate a structured speculation about the space of possible Frances.
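As a sketch of the machinery (not of the historical work, which is the hard part): the personas below are toy stand-ins for archivally grounded composites, the model name is invented, and an OpenAI-compatible chat endpoint is assumed.

    # A minimal Smallville-style debate loop over composite personas.
    from openai import OpenAI

    client = OpenAI()
    MODEL = "vintage-1789-fr"  # hypothetical French-corpus vintage model

    PERSONAS = {
        "Jeanne, Picard peasant": "owes feudal dues; the 1788 harvest failed",
        "Michel, Lyon silk weaver": "literate artisan; wages falling since 1786",
        "Pere Aubert, parish priest": "reads the cahiers de doleances to his flock",
    }
    QUESTION = "Should the monarchy be overthrown?"

    transcript = []
    for _ in range(3):  # three rounds of deliberation
        for name, background in PERSONAS.items():
            recent = "\n".join(transcript[-6:])  # shared memory of recent turns
            reply = client.chat.completions.create(
                model=MODEL,
                messages=[
                    {"role": "system", "content": (
                        f"You are {name}. Background: {background}. "
                        "Speak only from your station, era, and knowledge.")},
                    {"role": "user", "content": (
                        f"The debate so far:\n{recent}\n\nThe question: {QUESTION}")},
                ],
            ).choices[0].message.content
            transcript.append(f"{name}: {reply}")

    print("\n\n".join(transcript))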


Or take a famous trial — the Scopes trial, say — and re-run it with a different jury pool drawn from the same county and decade. Another variant on this idea: run parliamentary and congressional debates with personas constructed directly from the participants’ papers, speeches, and correspondence. Members of Congress and Parliament are unusual in that we have abundant documentary sources for them, even when they’re not famous.

OpenAI’s new image model imagining the UI of something like this.

We have the potential, in short, to create a thousand versions of the Smallville paper (which introduced the idea of LLM-based agents back in 2023) set in different historical eras.

Nor do these simulations have to include agents from the same era. A bit more experimentally, and outlandishly, one could imagine a multi-agent simulation in which a 17th century Galenic physician, an 18th century practitioner of traditional Chinese medicine, a 19th century quack doctor, and a 1950s-era psychedelic researcher debate how to treat the same patient’s illness.

I find these possibilities super interesting, even if I’m not quite sure how to slot them into the ways that professional historians currently work.

Final thoughts

So what will be the outcome of all these experiments?

This is what I like best about this field: we truly have no idea. None of this has ever been tried before. It is completely open terrain, and I find it far more mind-bending and intellectually enriching to think through than the sort of topics that typically emerge in discussions of AI’s role in research or teaching.

Going forward, the important thing is to create an open source community and meaningful, sustained collaboration across the two cultures of STEM and humanities. I am already noticing new bridges crossing those divides. I’m very grateful to have had a chance to work with Nick and Alec on historical aspects of this particular project. I would love to continue the conversation and explore collaborations with anyone who finds this topic interesting.

There will, no doubt, be a lot of false starts. But the emergence of an intellectually curious, not-for-profit, open source, humanistically-grounded community exploring historical LLMs makes me happy. Onward!


[1] I did this on a volunteer basis, and the Talkie project is a non-profit.

[2] Someone should actually make an Athanasius Kircher LLM, by the way.


Was It Worth It?

A grid of iPhones sits atop a checkered tablecloth. Each phone shows a different image of red meat.

Adam Dalva | Longreads | April 29, 2026 | 2,084 words (9 minutes)

This essay, from Steak Zine, is copublished with Cake Zine.

Every Sunday evening, I open the fridge, reach into the vegetable crisper, grab a pen, screw in a needle, pinch my stomach, and inject Ozempic. It hurts a bit, but I’ve gotten used to it. Twenty-five pounds down, 20 to go. I put on the weight after my brother died—the distortion in the mirror, random heavy breathing, strange hunger panics around 4 p.m., the constant need to self-soothe—and I wanted to let go, move on, heal.

That’s one rendition of truth, the one I wish I could sell you. Claiming I’m injecting to recover from grief deflects simple humiliation into potential empathy, rendering me unmockable for taking a medication that I’ve seen called “easy mode” and “stolen valor” online, a workaround for people lacking the willpower to lose weight the old-fashioned way.

Really, though, my bereavement was internal and external justification for something I would have wanted to try anyway. I’ve trended toward heaviness my entire life, and food has always been a font of shame. When I eat in public, when I order in restaurants, I feel overly visible, fearing that every bite could contribute to the perception that I lack self-control. And so I sneak food. Mine is the panicked late-night nibble, then the easing of the fridge door closed. Mine is rearranging the contents of the garbage can to conceal wrappers and cores. It had been unclear to me, pre-GLP-1, how to write without something salty or sleep without something sweet, and the theory that the medication might quiet “food noise” particularly appealed to me.

The man who prescribed my Ozempic is a plastic surgeon who didn’t even performatively gesture at weighing me, but he did tap my left temple, contemplate my receding hairline, and say, “you’ll be wanting minoxidil too, I expect.” Then he gazed at my forehead wrinkles evaluatively, forensically, activating spasms of dysmorphia hitherto unknown.

A week later, at a mediocre bar, my friends ordered nachos. They picked, I picked, matching their cadence of nibbles to avoid drawing attention to myself. Soon the chips were half done, and my friends expressed their fullness with the satiated calm of the thin, and the cheese and the steak had congealed together, and, reader, I didn’t think about those nachos even once. I had never experienced anything like it. Is this, I asked my friends, how it feels to be normal? Eight months later, the noise is still muted. At parties where I once would have conducted a hasty maneuver toward the finger foods, I chat with friends instead. I have lost what little interest I had in alcohol. I suspect Ozempic has cured my seasonal affective disorder too—in past years, I’d get hungry at dusk in November, throwing off my circadian rhythm, but in the absence of that need, no depression has hit.


A few weeks after I began injecting myself, in a period when I was eating very little, and mostly bland food when I did, a temporary diet of crackers and roast chicken, with gastrointestinal side effects too gnarly for even a habitually oversharing personal essayist to impart, I noted that I had become preoccupied with YouTube Shorts of people reviewing food. I’d watch video after video of influencers trying various dishes, often while sitting in their cars while cheery voiceovers played. In my strange absence of flavor, their glossy enthusiasm was captivating. I suspect that I was outsourcing my own eating.

These days, I once again enjoy the taste of food. The medication works well, save for one unfortunate side effect. I’m still obsessed with those eating videos. I’ve watched thousands of them (I’m frightened to know the real number, and sometimes I think I’ve actually reached the bottom of YouTube, when I’m served videos made by people with no followers and one view, just mine). I’ve learned that each of the influencers has a gimmick: UA Eats, a pseudo everyman who’s overly obsessed with meat char; Kaitlyn Lavery, a peppy New Yorker with an unfathomable dining budget; Jack’s Dining Room, a loathsome industry plant; ShoPhoCho, who weighs food to assess value; KarissaEats, a Disneyfied culinary optimist.

These content creators’ occasional mukbangs and habitual ASMR crinkling of chip bags do nothing much for me. No, my interest is most piqued by the shorts in which they review all-you-can-eat restaurants and ask, “did I beat the buffet?” Did they, in other words, get beyond their money’s worth? There’s a scarcity mindset in this moment of late-stage capitalism, which is understandable; times are hard. But the min/maxing strategies that ignore gastronomical pleasure in favor of eating oneself sick alarm and titillate me in equal measure, in this time where pleasure itself feels more and more difficult to access. 

Take the many reviews of Fogo de Chão, the relatively upscale Brazilian all-you-can-eat steak restaurant. Don’t waste time, every reviewer cautions, on delicious starch, on the buffet’s greens, on sweets, on poultry, on cheese with honey. Maximize cow: beef rib, picanha. Joe Rogan has raved about the salad bar’s sirenic temptations on his interminable podcast (“and you’re eating fucking artichoke hearts and cheese”); YouTuber UA Eats’s face contorts into a pained bliss reminiscent of Peter Hujar’s 1969 “Orgasmic Man” photo when he tries the fatty ribeye. 


Even sans Ozempic, I have never been a big block-of-meat eater. I think of Passover pot-roasts with some horror, find hot poultry uninteresting, believe that pork chops are odious, and have written off lamb legs as habitually gamey. But still, I developed the fantasy of going to Fogo de Chão myself. My desire was memetic. I had seen so many of these videos that I wanted to participate in one, wanted to see if I could experience the hypothetical pleasure of beating the buffet. Fogo was especially captivating for another reason: I have a vague memory, decades ago, perhaps in Philadelphia, of going to a Fogo de Chão. All I really know is that I was young. I think I went with my first-ever girlfriend. I remember marveling at the abundance, the new flavors. I’d laugh when I ate, back then. Dumb phone in my pocket; all that future ahead; the restaurant filled with sun. Was that self forever lost to grief and medication and plain old time? 

And so one evening earlier this year, I drank a cup of tea and ate a mid-sized Honeycrisp with a swipe of peanut butter to preserve my stomach. The next morning, I headed off to my studio space to write until my reservation. My neighbor, a jeweler, said that she had made an extra chicken katsu sando, and asked if I wanted it. I replied that I did want it but was starving myself to go to a buffet. Off my sando went into the ether. 

That night, I passed laughing tourists taking pictures of Trump Tower, rounded the corner past MoMA, and walked into FdC. I was told to wait in the lobby while the host sent groups down a long flight of stairs, an inefficient system that was, frankly, stressing everyone out. Fogo de Chão was founded a quarter of a century ago by Brazilian brothers who, as the brand story goes, had learned the traditional grilling methods of the churrasco in their youth. There are now hundreds of locations, with more coming all the time. The chain was recently purchased for over a billion dollars by Bain Capital—a private equity firm which some will recognize from oppo research that targeted Mitt Romney’s retroactively normal-seeming 2012 presidential campaign. Private equity is part of our current global trend of vulture economics—these firms are one reason, I’ve learned from my YouTube Shorts, that chain restaurants are doling out smaller portions. (Another is Ozempic.)

The dining room was dark, save for the tantalizing buffet station, which gleamed like the full moon over the cloudless Aegean. Once seated, I ordered “The Churrasco Experience”—80 dollars—and was given a cardboard disc with instructions to flip it from red to green when I was ready to receive meat. And then I lost my mind at the Fogo de Chão salad bar. This isn’t a bit. I felt genuine panic as I circled the many options, the stirring of that late afternoon need that I associate with my pre-Ozempic self. Frenzy, dizziness. I used to microwave cheese into rice in these moments. I would steal five percent of my roommate’s ice cream. 

All my pre-planning collapsed into a still-now-overwhelming-to-contemplate bacchanalia of trying stuff. The pepper bacon was crunchy but too spicy, the elote was fine, the cured meats and cheeses were fine, the caprese and smoked salmon were bad, the citrus chicken was good, and the two best things were, disastrously, the most filling: the potato salad, the bean stew. Perhaps all my pre-envisioning had undone me. I even went back for more, ignoring the causal potentiality of a sniffling child in pajamas who was exactly the height of the food in the buffet and kept leaning in under the sneeze guard to inspect every offering up close. 


Meanwhile, something insidious had occurred: plantains, yucca fries, and mashed potatoes—cheap, filling traps—had been placed on my table while I was gone. The spuds were good enough that I had to fold a napkin into them to cut myself off. I already knew from the “beat the buffet” shorts that FdC uses assorted strategies to keep patrons from eating too much expensive meat. The drinks are all high calorie save for the “Skinny Caipirinha,” which I ordered. The most energetic of the waiters kept offering diners shots of pineapple rum. The honey with cheese was as delicious as I’d feared. The salads were flanked by bread. Even when I flipped my POG to green, the cheap cuts came first, chicken and sausage, a complex choreography of misdirection, creating a scarcity mindset, restricting especially the picanha, a top sirloin that is purported to be the chain’s best bite. 

The Churrasco Experience turned out, for me, to be kind of like being on a subterranean cruise ship, mingling the constant feeling of potential indulgence and disappointment. When the waiters (mostly men, a.k.a. gauchos) carrying fatty skewers zagged away, I felt that the other diners were getting opportunities that I deserved. When they zigged toward me, I didn’t particularly crave any of what they had on offer, but I did want to be asked, and wanted to be able to say no. Alas, much of the meat wasn’t, I believe, particularly good. Even the much-ballyhooed ribeye was chewy and devoid of flavor. But despite these issues, many of the diners, many of whom had dressed up in leather or lace, were having a fabulous time. They seemed to my eye to be tourists (a curious Manhattan phenomenon is that tourists are more prevalent than locals in chain restaurants, in pursuit of familiarity, or perhaps, in our social media era, of the same mimetic experience I was chasing). As I chewed, I felt an unusual melancholy, as what I had seen on the videos intersected with reality, all that remembered enthusiasm running headlong into my lack of it, and I slipped out of my body, just a bit, adrift. I felt sad for the animals whose corpses were being ceremonially paraded around and rejected, and felt sad for myself too. 

Then came the phenomenon I’ve come to think of as “Ozempic full,” in which I simply lose interest in eating. I sent my friend C, a veteran of the all-you-can-eat circuit, a picture of the buffet options and she suggested that I steal and eat a display pomegranate to revitalize my hunger. It worked—I still don’t understand why—but when the picanha finally arrived, that one cut I was waiting for, there was no burst of pleasure, neither narratological nor culinary, and no time-travel to my younger self. As I’d feared, I’d come to prefer the consolations of the screen. I flipped my token to red. At the table next to me, a thrilled woman in her twenties, as young as I once was, thrice exclaimed, louder and louder, to every waiter who passed: “who’s going to stop me?” And I realized, in my case, what the answer to that question was, and so on the subway platform afterward, bloated with salt and fat, while I listened to a busking guitarist accompanying Astrud Gilberto’s English-language rendition of “The Girl from Ipanema”—another export—I canceled my YouTube Premium subscription. My one remaining side effect was cured. I didn’t beat the buffet, but I did beat that.


Adam Dalva is the president of the National Book Critics Circle and a contributing editor of The Yale Review. His writing has appeared in The New Yorker, The New York Review of Books, and The Paris Review, and the last time he was in a steakhouse, he ordered salmon.

Editor: Brendan Fitzgerald
