Imagine talking to the collective consciousness of an era. Not the consciousness of any single person, but instead, a simulated collectivity based on billions of words produced within a historical time and place. What would you ask it?
This is a hypothetical that is starting to become real thanks to recent work on what are called “Historical Language Models” or “Vintage LLMs” (one marker of a new field is that there is no fixed name for it yet!). The largest such model to date, Talkie-1930, was released to the public on Monday. An even larger model is currently being trained. You can read the report announcing Talkie-1930 here, and talk to it directly here.
Over the past few months, I’ve had the chance to beta test Talkie and to meet with two members of the team that created it: AI researcher Nick Levine and ChatGPT co-creator Alec Radford.1 It has been a fascinating experience.
These discussions with Nick and Alec (and with Talkie itself) have convinced me of three things:
Academics like myself have tended to systematically underrate just how humanistic the frontier of AI research actually is. There’s an important blind spot here that stems from the profit motive. AI models that we encounter as consumers are optimized to capture the attention of people in the 2020s. They provide recommendations, comment on recent news, and so forth. Seeming timely and “of the moment” is a market advantage. But their training data is overwhelmingly not up to date. Under the hood, these models are pulling not only from Reddit posts, but from Sanskrit commentaries, medieval Persian poetry, Victorian advertisements, and much else besides: they are trained on a huge chronological span of multilingual texts in many genres.
In this sense, language models are historical texts themselves. Ghostly digital palimpsests, if you will. The idea of a Historical LLM might sound niche, but in truth, history is inherent to what they are.
Standalone chatbots are just the tip of the iceberg of what Historical LLMs will be able to do. When combined into simulations (of debates, historical decision-making, legal cases, etc) they have the potential to become valuable research tools. More than this: I suspect that by sometime in the 2030s, they will be part of an entirely new field of humanistic research.
What would that field look like? Now that Historical LLMs are out in the real world, I thought it would be a good time to think through the specific use cases for them. What follows is my subjective, opinionated ranking of the best and worst ways these fascinatingly strange tools can be applied for research.
But first, what is Talkie actually doing?
An AI model floating in time
One thing that Talkie-1930 is not is an AI model that is reliably grounded in the year 1930. That year marks the cut-off point for texts available in the public domain, and hence for the text in its training data. So it’s more accurate to think of Talkie as a free-floating index of various ideas and assumptions across the 19th and early 20th centuries.
For instance, if asked who the current President of the United States is, you might get a response naming Herbert Hoover (correct as of 1930). But another attempt yields this:
The current President of the United States is Mr. Buchanan, and the person expected to succeed him is Mr. Lincoln.
There is a lot of potential here for more fine-grained “chronological slices” of LLMs. I can imagine language models trained entirely on texts from a specific decade. More on that below.
For now, though, it’s helpful to keep in mind that these models range widely in terms of what year they think they actually “inhabit.”
I asked 100 instances of Talkie to respond to the prompt “what year is it?” and graphed them below. As you can see, the median is actually around 1860. In other words, this is more like a temporally free-ranging collective unconscious of a large corpus of premodern texts, and not so much a machine for “talking to someone from 1930”:

A second point: this model is inhabiting not just an amorphous set of facts grounding it in roughly the 1840s-1920s period, but also an epistemology of that period.
For instance, asking someone about the distant future today often triggers the “sci fi speculation” part of our brain (or “climate doom,” or some other fundamentally secular way of thinking).
Yet throughout human history, speculation about the future was typically entangled with religious beliefs.
That is on display in Talkie’s answer below, which references Heaven and “the end of all things terrestrial.” To me, it genuinely reads as an authentic take from a late 19th century person grounded in a Christian, millenarian perspective:
As for Talkie’s assumptions about itself: asking 70 Talkies about their profession, age, and place of residence reveals about what you would expect when it comes to gender (overwhelmingly male), plus a surprising emphasis on London. The professions map closely onto the sorts of well-off, literate people who were publishing English text in the 19th century, including “Physician,” “Journalist,” “Gentleman,” and “Compositor.” Clearly, there is a lot of scope here for branching out beyond the personas that the printed record has tended to favor, to recover the real historical voices of women and others excluded from printed works in the 19th century and earlier.
The above is about what you’d expect given the fact that it was trained on English-language printed texts. What are some non-obvious aspects of the model?
I have been interested by how LLMs generate poetry since I stumbled upon Gwern’s experiments on the topic back in 2019. Asking Talkie to write a poem and comparing it to the output from GPT-5.5 (when served a similar prompt) is revealing:
I find this sort of comparison interesting because GPT-5.5 is clearly trying hard to fit the prompt — avant-garde, experimental. It produced something with a vaguely T.S. Eliot-adjacent structure, in blank verse, and not good at all as a poem (in my opinion).
Talkie was much more true to the type of poetry that you’d find in print prior to 1930. It’s doggerel, but it feels more historically authentic to me, and much less like a Chatbot optimized to please a contemporary human user.
You can activate different “chronological layers” of Talkie’s latent space by prompting. For instance in the above poem, the capitalized D in “Discoveries” has a mid-19th century feeling, and so we end up with a Tennyson-esque, Victorian sounding rhyming poem.
Prompting it in a more “modern” way activates something closer to the 1920s edge of its chronological range (now identified as a poem published in the New York Times!):
Whereas if pushed backward to the 18th century by the prompt’s text and tone, it falls into a more traditional rhyme scheme:
Trying to push it further back in time does not seem to access much of an “Early Modern English” latent space — probably a result of scarce training data. It would be fascinating to create a version of Talkie that believes the date to be around 1650 or 1550.
What are Historical LLMs good for?
Now that we've seen what Talkie is — a free-floating, mid-Atlantic ghost of 19th century print culture — the obvious next question is what this is actually for.
What I want to offer here is an opinionated, ranked taxonomy of research applications, from worst to best. It’s far too early in this field to be prescriptive about anything, but it’s not too early to think structurally about where the highest-value uses are likely to lie.
First, what I think won’t work:
1. Vintage LLMs do not replace historical sources
The most obvious false start here would be to assume that talking to a historical language model can somehow replace real reading in primary sources. On the contrary, they are best thought of as offering new ways into a reading of the actual sources.
2. “Chat with Abraham Lincoln” is a ruse
The second false start is a variant on one that is currently being pursued by a range of education-focused AI startups: the idea that you can “talk to Abraham Lincoln” or “ask Cleopatra why she did what she did.” A model like GPT-5 or Claude that is told to “act like Lincoln” will throw in some 19th century diction, but underneath the top hat it remains a 2026 chatbot optimized to be helpful to contemporary users. Vintage LLMs improve on this considerably: Talkie’s voice really is shaped by its corpus in a way no modern model’s can be. But the deeper problem is still there. Asking such a model to introspect on Lincoln’s subjective experience of depression, or his private reasoning about emancipation, will spin out into historical fiction. LLMs do not have privileged access to the inner lives of the people whose published words they were trained on.
I do think there’s a place for simulations of historical figures, but pairing this naively with a chatbot interface leads, I fear, inevitably into slop.
Now here’s what I think could work:
1. Exploring the “mental furniture” of a historical figure or era
The naive “talk to Lincoln” framing is a dead end, but a more careful version of it has real promise — provided we abandon any pretense of accessing what a historical figure actually thought or felt, and use a fine-tuned historical model instead as a tool for exploring the latent space of their world. What historians sometimes call their “mental furniture”: the assumptions, authorities, vocabularies, and reflexive associations that structure a thinker’s possible thoughts.
My colleague Mackenzie Cooley (who also consulted on the Talkie project) and I developed a prototype tool, called Premodern Concordance, that extracts key concepts and terms from a range of premodern scientific works in multiple languages. One side quest of this project that I explored: what happens if you give a contemporary LLM the list of core concepts that preoccupied an author, along with their “epistemological modes,” and use that as context for driving a “chat with the author” simulation, as opposed to simply telling it “You are Charles Darwin, act like him,” or the like?
For instance, this is me asking a simulacrum of the 17th century writer Sir Thomas Browne about his work. The underlined terms here are concepts found in Browne’s book Pseudodoxia Epidemica:

Using a fine-tuned historical LLM for this sort of thing is an obvious next step.
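The concept-driven prompting described above can be sketched concretely. Everything here is illustrative: the function, the concept lists, and the prompt wording are my own inventions, not the actual Premodern Concordance pipeline:

```python
def build_persona_prompt(author: str, concepts: list[str], modes: list[str]) -> str:
    """Assemble a system prompt from extracted concepts, rather than a bare
    'act like X' instruction. Purely illustrative of the general approach."""
    concept_list = ", ".join(concepts)
    mode_list = "; ".join(modes)
    return (
        f"You are writing in the voice of {author}. "
        f"Reason only through these core concepts: {concept_list}. "
        f"Your habitual epistemological modes are: {mode_list}. "
        "Cite the authorities this author would cite, and do not use "
        "knowledge or vocabulary unavailable to them."
    )

# Hypothetical concept/mode lists of the sort a concordance tool might
# extract from Browne's Pseudodoxia Epidemica.
prompt = build_persona_prompt(
    "Sir Thomas Browne",
    ["vulgar errors", "sympathies and antipathies", "final causes"],
    ["argument from authority", "ocular experiment", "scriptural exegesis"],
)
print(prompt)
```

The design point is that the persona is constrained by a documented conceptual vocabulary rather than by the model’s generic prior about what a famous name “sounds like.”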
Concretely: imagine fine-tuning a vintage model on the complete works of Athanasius Kircher, the 17th century Jesuit polymath. You wouldn’t use it with the pretense that it somehow replicates the real Kircher’s mind: that’s the dead-end framing. You’d instead use it to probe the conceptual landscape Kircher inhabited.
For instance: what does the Kircher-LLM2 say about volcanoes, or magnetism, or the Tower of Babel? What authorities does it cite? What does it confidently assert that no modern scholar would? These are the questions that the historian Fernand Braudel was getting at when he wrote about the “limits of the possible” for a given period — the boundaries within which a thought was even thinkable.
Which leads to:
2. Probing counterfactuals and hypothetical alternative paths
Demis Hassabis recently framed the most ambitious version of this approach: could a model trained only up to 1911 independently discover General Relativity, as Einstein did four years later? It remains to be seen if this is ever going to be something we could actually test. But the weaker versions of this idea are quite plausible and, I think, offer interesting new methods for the field of counterfactual history. What does a 1911-cutoff model say when you push it toward the conceptual problems Einstein was wrestling with? Alternatively, what does the same model say about the potential likelihood of a World War? Those are tractable questions.

We can already see versions of this being explored by the Talkie project (for instance, “surprisingness” of future events, plotted above). If you ask Talkie about scientific concepts that emerged in the 1940s or 1950s, it gives you a rough sense of what the conceptual horizon looked like just before — and you can watch it grasping toward something it doesn’t quite have the vocabulary for.
3. New questions about genre, rhetoric, and "distant reading"
An earlier version of Talkie I tested was much less conversationally adept than the one that was released yesterday. What got it more “talkative” was post-training on etiquette manuals, letter-writing guides, and other books relating to socializing and codes of conduct. These works provide the kind of text that allows you to extract “chatbot-like” habits without contaminating the model with modern data.
This is a sensible engineering choice. But it has a fascinating side effect, which is that Talkie's conversational persona is heavily shaped by the genre of its post-training data. The base model knows about the 1920s perfectly well; the chat persona, sculpted out of Beadle’s Dime Book of Practical Etiquette (1859) and the like, sits much closer to the late Victorian parlor.
This points to a research direction that I think is genuinely new. What if you used totally different texts for post-training? What would a Talkie post-trained on the transcripts of the Old Bailey Court records look like — an LLM whose conversational reflexes are shaped by the speech of accused criminals and witnesses in 18th century London courtrooms? What about apothecary manuals and herbals? Biographies of Romantic poets? Railroad conductor’s incident reports from colonial India?
Each of these, even if overlaid on the same base model, would produce a totally different voice, a different set of assumptions about what counts as a coherent question and a coherent answer. In short, a different epistemology.
This is, I think, a remarkably cheap way — in terms of both compute and corpus size — to build a whole family of vintage models that capture different genres of historical experience rather than different periods. (If anyone is interested in collaborating on or funding this, by the way, please get in touch.)
4. Multi-agent historical simulations based on real archival sources
The most interesting move past the “great man” framing is to use vintage LLMs to simulate not famous individuals but plausible composites of ordinary people, drawing on the kinds of sources that are abundant for non-elite historical actors: probate inventories, parish records, court testimony, letters, account books, marriage records.
Imagine pulling from the legal, financial, and personal records of late 18th century France to construct a thousand plausible personas of ordinary French people — peasants, artisans, shopkeepers, day laborers, parish priests — grounded not in imagination but in real surviving documents. Then stage a debate among them: should the monarchy be overthrown? What patterns emerge? What kinds of arguments win in different demographic configurations of the deliberating group? The output wouldn’t tell you what really happened in 1789. But it would generate a structured speculation about the space of possible Frances.
Or take a famous trial — the Scopes trial, say — and re-run it with a different jury pool drawn from the same county and decade. Another variant on this idea: run parliamentary and congressional debates with personas constructed directly from the participants’ papers, speeches, and correspondence. Members of Congress and Parliament are unusual in that we have abundant documentary sources for them, even when they’re not famous.
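The multi-agent setup these paragraphs describe can be sketched as a simple debate loop over archivally grounded personas. Every name, field, and function below is hypothetical; in particular, `respond` is a stub standing in for a call to a vintage LLM conditioned on a persona’s source notes and the transcript so far:

```python
import random
from dataclasses import dataclass

@dataclass
class Persona:
    """A composite persona grounded in archival sources (fields illustrative)."""
    name: str
    occupation: str
    source_notes: str  # e.g. summary of a probate inventory or parish record

def respond(persona: Persona, question: str, transcript: list[str]) -> str:
    """Stub for a vintage-LLM call. A real version would pass
    persona.source_notes and the running transcript as context."""
    return f"{persona.name} ({persona.occupation}) speaks to: {question}"

def run_debate(personas: list[Persona], question: str, rounds: int = 2) -> list[str]:
    transcript: list[str] = []
    for _ in range(rounds):
        # Vary speaking order each round so no persona always anchors the debate.
        for p in random.sample(personas, len(personas)):
            transcript.append(respond(p, question, transcript))
    return transcript

# Invented example personas of the kind 18th century French records could support.
personas = [
    Persona("Jean Moreau", "vigneron", "probate inventory, Dijon, 1783"),
    Persona("Marie Lefèvre", "seamstress", "parish records, Lyon, 1786"),
    Persona("Père Aubert", "parish priest", "correspondence, Auxerre, 1788"),
]
transcript = run_debate(personas, "Should the monarchy be overthrown?")
```

Running the same loop across many sampled jury pools or demographic mixes is what turns a single simulated debate into the “space of possible Frances” described above.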
We have the potential, in short, to create a thousand versions of the Smallville paper (which introduced the idea of LLM-based agents back in 2023) set in different historical eras.
Nor do these simulations have to include agents from the same era. A bit more experimentally, and outlandishly, one could imagine a multi-agent simulation in which a 17th century Galenic physician, an 18th century practitioner of traditional Chinese medicine, a 19th century quack doctor, and a 1950s-era psychedelic researcher debate how to treat the same patient’s illness.
I find these possibilities super interesting, even if I’m not quite sure how to slot them into the ways that professional historians currently work.
Final thoughts
So what will be the outcome of all these experiments?
This is what I like best about this field: we truly have no idea. None of this has ever been tried before. It is completely open terrain, and I find it far more mind-bending and intellectually enriching to think through than the sort of topics that typically emerge in discussions of AI’s role in research or teaching.
Going forward, the important thing is to create an open source community and meaningful, sustained collaboration across the two cultures of STEM and the humanities. I am already noticing new bridges across those divides. I’m very grateful to have had a chance to work with Nick and Alec on historical aspects of this particular project. I would love to continue the conversation and explore collaborations with anyone who finds this topic interesting.
There will, no doubt, be a lot of false starts. But the emergence of an intellectually curious, not-for-profit, open source, humanistically-grounded community exploring historical LLMs makes me happy. Onward!
I did this on a volunteer basis, and the Talkie project is a non-profit.
Someone should actually make an Athanasius Kircher LLM, by the way.