
A Brief History of Lab Notebooks

Ella Watkins-Dulaney for Asimov Press.

This essay will appear in our forthcoming book, “Making the Modern Laboratory.”

Published research papers are far from literal accounts of the process of scientific discovery. In contemporary scientific practice, once publishable results are obtained, the actual path taken to reach them becomes more or less irrelevant. Dead ends and false trails are omitted, and out of the messy process of raw research emerges a coherent narrative following clean, linear lines of argument.

But in the space between the hands-on, physical reality of experimental science and the structured narratives fit for printed journals, sits a special genre of scientific writing: lab notebooks. They are the closest witness to “science in the making” (short of live video recordings, which only became available at scale recently).

Historically, scientists recorded ideas and experiments in their lab notebooks with a very restricted audience in mind, sometimes just their colleagues within a research group. For this reason, though some are distinguished by a more literary style and read almost like diaries, most of these records are highly abbreviated and undecipherable to outsiders.

A page from Marie Curie’s notebook, which is still radioactive and thus stored in a lead-lined box. Credit: Wellcome Trust

The origins of lab notebooks in experimental science can be traced back to the Renaissance humanist practices1 of copying excerpts from texts to create repositories of proverbs, quotations and miscellaneous facts in personal, thematically organized “commonplace” notebooks. In grammar schools, students were encouraged to develop their notetaking skills, collecting extracts from classical Latin authors. Natural philosophers such as Robert Boyle, John Aubrey, John Ray, and Robert Hooke adopted and repurposed these practices, making meticulous records of their own empirical investigations, while also keeping traditional commonplace books.

The naturalist John Ray’s Collection of English Proverbs (1670) was based on copious notebooks of proverbs extracted from printed catalogues, his own observations of “familiar discourse,” and contributions sent to him by “learned and intelligent persons.” Some of the proverbs were accompanied by Ray’s own empirical observations contradicting the proverbs’ claims. For example, the proverb…

If the grass grows in Janiveer, it grows the worse for’t all the year

…is followed by Ray’s qualifier:

There is no general rule without some exception: for in the year 1677 the winter was so mild, that the pastures were very green in January, yet was there scarce ever known a more plentiful crop of hay then the summer following.2

The rise of early modern science was thus deeply influenced by humanist inquiry. Notebooks were used in both traditional and novel ways, as memory aids and as records of information to be communicated later. In Notebooks, English Virtuosi, and Early Modern Science, historian of science Richard Yeo writes:

In the seventeenth century there was a conviction that, to a large extent, copious knowledge could be reliably stored and manipulated in memory. However, during the Scientific Revolution a contrary view was emerging: namely, that the advancement of natural knowledge entailed a reconfiguration of the balance between memory and other ways of storing information. It was accepted that the empirical sciences demanded large quantities of detailed information that needed to be recorded with precision, and kept as durable records to be shared and communicated.3

Isaac Newton’s famous Waste Book,4 currently kept at Cambridge University Library, is a rare example of physical continuity between the two cultures of notetaking: humanist and scientific.5 It starts out as a commonplace book of excerpted scriptural commentary collected by his stepfather, the Reverend Barnabas Smith. In 1664, on a visit home to Lincolnshire, Newton found the deceased Smith’s partially used commonplace book and began adding his own prolific and inventive notes on mathematical problems and derivations, along with sketches of physical experiments.


In contrast to his stepfather, Newton didn’t just collect facts and excerpts: he used them as seeds of his own theoretical explorations. In this way, the Waste Book served as an extension of his mind rather than merely a memory aid, later becoming the foundation of his magnum opus, the Principia. Newton returned to the Waste Book again and again throughout his life, and the notebook that has come down to us is quite decrepit from such abundant use.

Isaac Newton’s Waste Book. Credit: University of Cambridge Library

Newton’s later notebooks, from the 1670s to the 1690s, document his optical investigations in a series of mostly unbound notes.6 These epitomize the gap between private and public research records. Newton seemingly didn’t intend the notebooks to be a lasting record of his experiments, as barely any raw data survives except for some of his late experiments on diffraction. It appears that Newton discarded most of his raw experimental records after completing and writing up each study. In his experiments on thin films, the colors of thick plates, and diffraction, he proceeded from a hypothesis expressed as a mathematical model, to experimental design, to deducing general laws, then back to new drafts.

Newton’s “hypothesis-driven” (his term) experiments on colored circles in thin films are described in his notes under the title “Of ye coloured circles twixt two contiguous glasses,” likely from 1671. Newton’s rings, as they are now known, are concentric, alternating bright and dark circles formed in the gap between a spherical lens and a flat glass surface, caused by the interference of light. Newton first wrote down a series of propositions about the properties of the colored circles, deduced by postulating the existence of hypothetical entities — light corpuscles. The first such proposition reads:

Prop 1. That their areas are in arithmeticall proportion, & soe thicknesse of interjected [film.] Or the spaces rays pass through twixt circle & circ[l]e are in arithm prop[ortion].

He then recorded the measurements of the diameters of the concentric circles and showed that their squares (and therefore, the areas of the circles) increase by a constant quantity — that is, they make up an arithmetic progression, just as stated in the first proposition.

Newton’s notes on making a sundial. Credit: The Morgan Library

In these and subsequent experiments, Newton made use of averages, a practice almost unheard of in seventeenth-century experimental physics, though already in use in astronomy and navigation. The historian of science Richard S. Westfall noted how Newton elevated “quantitative science to a wholly new level of precision … He boldly transported the precision of the heavens into mundane physics … ”

In 1704, Newton published the results of these investigations in his monumental Opticks, though in a highly polished form that omits the workings through physical models and the relentless pursuit of precise measurement that populate his research notes. Like Galileo, Newton believed that mathematics was a source of greater certainty than natural philosophy and that natural laws were best expressed in mathematical language. But his raw experimental data didn’t perfectly align with those laws, even though he managed to achieve remarkably high precision for his time (within 1 to 2 percent). Most intermediate steps of his research thus remain hidden from the readers of Opticks.

Newton also left extended commentary on a famous alchemical text, Introitus apertus ad occlusum regis palatium (An Open Entrance to the Closed Palace of the King).7 This book is attributed to George Starkey,8 a colonial American alchemist who moved from New England to London at age 22 and worked under the tutelage of Robert Boyle. The book is written in a veiled and heavily symbolic language featuring fiery dragons, rabid dogs, and Diana’s doves — traditional alchemical cover-names referring to specific chemical substances. This florid imagery, however, stands in stark contrast to Starkey’s private “chymical” notebooks,9 which are considered models of scholarly clarity. In their laboratories, alchemists seem to have preferred dry recipes with precise annotations, keeping the spectacle and symbolism for public presentation.

The second page of “Of yᵉ coloured circles twixt two contiguous glasses” in Newton’s notebook. Credit: Alan E. Shapiro, “Newton’s Optical Notebooks: Public Versus Private Data,” in Frederic L. Holmes, Jürgen Renn, and Hans-Jörg Rheinberger (eds.), Reworking the Bench: Research Notebooks in the History of Science, pp. 43–65.

Starkey, a Harvard graduate, made extensive use of his scholastic training in documenting his alchemical experiments. Throughout his notebooks, recurring tags mark sequential steps in each experiment: Processus conjecturalis (conjectural process), Conclusio probabilis (probable conclusion), Quaere (search), Observatio (observation), Animadversio (animadversion, criticism), igne refutata (refuted by fire (!), that is, rejected by empirical testing). These are the kinds of annotations that he inherited from the educational culture of early Harvard.

At the center of Starkey’s investigations was, of course, the Philosopher’s Stone, or the Great Bezoar, the legendary and elusive alchemical substance that could turn any “base” metal like lead or copper into a precious one like gold or silver (it was additionally believed to serve as the elixir of life, granting eternal youth and immortality). In his Magnum Opus — in the original alchemical sense of the actual process of creating the Philosopher’s Stone — Starkey worked persistently to achieve higher efficiency in obtaining its supposed precursors, with recurring concerns about the cost of the reagents he used. In spirit, Starkey’s work is quite close to modern pharmaceutical and industrial chemistry, and his notebooks attest to his clear-headedness and pragmatism as a practicing (al)chemist.

Two centuries later, historians of science observe at least two distinct lab notebook styles emerging: narrative and numerical. These are best illustrated by the notebooks of two pioneering English physicists, Michael Faraday and James Joule.

Michael Faraday started out as a bookbinder and then worked as a “chemical assistant” and amanuensis to Sir Humphry Davy at the Royal Institution in London.10 After Davy’s death in 1829, Faraday took over as director of one of the best-equipped laboratories in Europe, dedicating himself to the study of electromagnetism.

Faraday in his laboratory at the Royal Institution. Credit: Wellcome Group

Faraday kept a detailed, narrative lab diary as a series of volumes he bound himself, spanning 42 years — 1820 to 1862.11 They contain records of about 30,000 experiments, both successful and unsuccessful.12 Entries between August 25, 1832, and March 6, 1860, are numbered from 1 to 16,041. Each record includes a date, a description of the experimental setup, and results. Helpfully for historians of science, Faraday had the habit of marking with a vertical line the paragraphs in the lab diaries that made their way into published papers, often unchanged. To some of the lab notes, he would add his interpretations of the results, new ideas to pursue, or a sign of excitement (an exclamation mark). For raw ideas and speculations, he kept separate “idea books,” mostly dating from before 1830, at the beginning of his career.

Faraday’s copious notes served as compensation for his famously faulty memory. Even so, more than once he repeated a previously completed experiment that he had apparently forgotten about. (Such amusing occurrences of cryptomnesia are not uncommon among scientists.)13

There are some signs that the notebook entries were made at some distance from the immediate action in the lab: Faraday’s handwriting is very neat, there are few corrections, and no chemical stains indicate contact with the bench. As the historian of science H. Otto Sibum puts it, many entries look like “the diary of a Victorian gentleman, written at the conclusion of an exciting day.”14 But Faraday also diligently recorded experimental failures and inaccurate measurements, so the notes appear to reflect the raw reality of his lab investigations.

When his lab diary eventually grew too voluminous, he added directories and indices to track its contents. This systematic and detailed approach to notetaking can be traced to Faraday’s past engagements as a bookbinder and businessman, as well as to his early quantitative chemical research, all of which required meticulous record keeping.

Though his own lab notes were most likely taken some time after the experiments were performed, Faraday encouraged his students to be prompt in their notetaking. In one of the earliest experimental manuals for students, Chemical Manipulation, being Instructions to Students in Chemistry, on the Methods of Performing Experiments of Demonstration or of Research, with Accuracy and Success, he writes:

The Laboratory notebook, intended to receive the account of the results of experiments, should always be at hand, as should also pen and ink. All the results worthy of record should be entered at the time the experiments are made, whilst the things themselves are under the eye, and can be re-examined if doubt or difficulty arise. The practice of delaying to note until the end of a train of experiments or to the conclusion of the day, is a bad one, as it then becomes difficult accurately to remember that succession of events. There is a probability also that some important point which may suggest itself during writing, cannot be ascertained by reference to experiment, because of its occurrence to the mind at too late a period.

Faraday was keen on establishing notebooks as a consistent and reliable research practice but, alas, his manual didn’t reach a wide audience at the time. It took a long time before lab notebook practices were standardized.

In contrast, another English physicist, James Prescott Joule, practiced a more quantitatively oriented, or numerical, way of keeping research notebooks. His major contributions to physics include the mechanical equivalent of heat and the heating effects of electricity (the joule, the SI unit of energy, is named after him).15 Unlike Faraday’s, Joule’s lab notebook entries (from 1843 to 1858, over 400 pages in total) seem to have been created in real time as he was taking measurements. They are very terse, containing mostly numerical records and calculations, with little commentary on experimental design. The metadata in each entry usually includes only the date, weather conditions, and a brief description of the experiment’s purpose.

Joule’s meticulous data collection, as demonstrated in his 1840 manuscript on the production of heat by voltaic electricity.

Curiously, Joule had prior experience working in the brewing business, and his biographers suspect that the accounting skills needed to run such a business and ensure quality control might have shaped his habit of numerical record-keeping in later scientific experiments. Indeed, Joule’s notebooks bear a striking resemblance to brewers’ excise books:

Before putting any water upon his malt for brewing, the brewer is to enter in an excise book or paper, the date of such entry, the quantity of malt intended to be used, and the date of the brewing … 16

That Joule’s notebooks remain mostly silent about the details of measurements suggests he kept them for himself alone. Still, his style wasn’t completely idiosyncratic but indicative of a broader methodological change unfolding in scientific practice at the time.

The nineteenth century saw a tangible improvement in the precision of scientific measurements, and a corresponding shift in judgement whereby dry numbers came to be trusted more than the subjective, narrative descriptions of fallible, all-too-human scientists. Likewise, with the rise of “mechanical objectivity,” photographic images started displacing artistic drawings as illustrations for scientific texts. Some scientists, like the French physiologist and chronophotographer Étienne-Jules Marey, went so far as to declare images “the language of the phenomena themselves” and advocated replacing language with photographs and polygraphs in scientific texts.17

Phases of movement as a man jumps a hurdle. By Étienne-Jules Marey, 1892. Credit: Science Museum

Newly invented instruments like kymographs (tracing spatial position over time) and barographs (tracing atmospheric pressure) recorded their own data, generating paper traces as a new type of lab documentation. Lab notebooks increasingly became a place to index and annotate instrument-generated records, alongside tabular data and more standardized forms of experiment annotation.

Another shift took place in lab organization, marked by a growth in both lab size and the complexity of coordinated lab operations. The scientific career of the Russian physiologist Ivan Pavlov is an illustration of how lab notetaking practices evolved in response to these changes.18

Pavlov enjoyed a long and prolific life in science. In the 1870s and ’80s, he worked on the physiology of digestion and blood circulation, defending his doctoral dissertation on the nerves of the heart in 1883. He then devoted himself to digestive physiology, work that earned him the Nobel Prize in Physiology or Medicine in 1904; his subsequent research on conditional reflexes (now known as Pavlovian conditioning) in dogs made him world-famous.

He started his scientific career as a “workshop physiologist,” an independent investigator working in labs at the Veterinary Institute in Saint Petersburg and later in Breslau (now Wrocław, Poland). During this time, Pavlov designed and conducted his own experiments and analyzed and wrote up his investigations. His lab notebook from this period is a large, thick volume, written in his own hand and reflecting his lab activities: experimental protocols, comments, sketches, and first drafts of research articles.

But in 1891, when Pavlov was appointed as the director of the Physiology Division of the newly established Institute of Experimental Medicine in Saint Petersburg, he became a “factory physiologist” — the head of a large, hierarchically organized lab.19 He was now in charge of many lab assistants and students (“practitioners”) whom he regarded as his “skilled hands,” almost like extensions of his own body, conducting and keeping records of experiments in pursuit of his own research agenda. Each practitioner was assigned a research question and a subject dog to experiment on.

As lab head, Pavlov introduced stringent notetaking protocols, with each lab notebook following the fate of a particular dog’s surgeries and treatments. The lab notebooks remained in the lab, where Pavlov could always access them. He instructed his practitioners to record procedure descriptions, notes on the dogs’ behavior, and quantitative data. Practitioners were to abstain from adding their own interpretations, leaving that task to Pavlov himself. He would communicate his analysis of experimental data in lab meetings and more casual conversations with colleagues and, eventually, in published articles.

Pavlov’s laboratory, with dog. Credit: Wellcome Collection

Though Pavlov didn’t maintain his own notebook, relying entirely on his prodigious memory (and the lab notes of his students), in later years he began to delegate some of his thinking to pocket calendar books repurposed as personal notebooks. His archives contain five such notebooks, dating from 1909 to 1918 and from the late 1920s and ’30s, when his research interests shifted to higher nervous activity (that is, the activity of the central nervous system). In addition to addresses, reminders, political comments, and philosophical musings, these eclectic notebooks contain notes on research happenings in his lab, ideas for new experiments, and outlines of articles:

How will reflexes of time change under the influence of exciter substances: caffeine and so forth?

An interesting episode with Kal’m, that impudent and aggressive dog.

We consider all so-called psychic activity to be a function of the brain mass, of a defined mechanism, that is, of an object conceived spatially. But how can one place in this mechanism an activity that is conceived psychologically, that is, non-spatially [?]

I do not know what exactly we have done, in what way we have broken through, but it is clear to me that there now exists a union of thought, a mixing and unification of the ideas of all participants in the intellectual work [of the laboratory].

Some thoughts and dreams about the current war [World War I]: And the example of Germany and England in this war shows that the idea of a world government is not a true resolution of the land question, but rather a human weakness, originating, so to speak, from the inertia of human nature.

Pavlov’s was one of the large, almost factory-style laboratories that revolutionized the social and material conditions of scientific research from the late nineteenth century onward. Similar in ambition and scale were those led by the chemist Justus von Liebig, the microbiologists Robert Koch and Louis Pasteur, and the immunologist Paul Ehrlich. These labs were expensive to maintain, with purpose-designed workspaces, a clear division of labor, and an additional layer of lab management, which, among other things, took care of reliable research record keeping.

In the twentieth century, scientific institutions continued scaling up, as did the pressure for standardization and reproducibility in science communications, including lab notekeeping.20 With the onset of the digital era, scientific data started moving from physical to digital formats, requiring ever larger amounts of storage. Electronic lab notebooks (ELNs) emerged to address these changes, yet their history goes back much further than one might think.

One of the first published records of using computers for lab notekeeping was a 1958 paper titled “An Electronic Computer as a Research Assistant.”21 It lists several applications of computers in lab work: mathematical calculations, copying and storage of large volumes of data, and data analysis and interpretation. These were tasks that entailed “computation volume or complexity, which otherwise would have meant thousands of man-hours for calculation.” The article also mentions “routine report preparation” by computers based on paper-based lab records. Lab notebooks thus evolved from paper notebooks to computer-assisted report generation, followed by digitized laboratory databases and, finally, ELNs themselves.

Headline from a November 1958 paper in the journal Industrial & Engineering Chemistry.

In the 1980s, a chemistry professor at Virginia Polytechnic Institute, Dr. Raymond Dessy, started advocating for the development of ELNs. In 1985, BBN (Bolt, Beranek and Newman, of ARPANET fame) released RS/1, an ELN repurposed from a data analysis and statistical software system.22 Dessy later created another ELN prototype from scratch in 1994.

Interestingly, ELNs were first enthusiastically welcomed and adopted by the pharmaceutical industry, whereas their acceptance by academic communities took much longer. By 1997, several pharmaceutical and chemical companies supported a new consortium, the Collaborative Electronic Notebook Systems Association (CENSA), which worked with scientific software and hardware vendors to develop ELNs that met the scientific and regulatory needs of the member organizations.

The University of Oregon introduced one of the first web-based ELNs, the Virtual Notebook Environment (ViNE), in 1998. By the 2010s, a range of universities had begun offering institutional ELN subscriptions, but academic adoption as a whole is still patchy. This state of affairs, however, is likely to change in response to the 2024 NIH IRP Electronic Lab Notebook Policy, which requires researchers to “use only electronic resources to document new and ongoing research.”

ELNs facilitate lab record-keeping by enabling version control, timestamping, search, hierarchical organization of information, links to external databases, and the handling of diverse data types (numerical data, images, and sequences, among others). One may ask, then, why academia has been so reluctant to adopt them.

Besides the general friction of adopting any new technology, it could be argued that handwriting is more flexible than typing and, as a result, more conducive to thinking: one can write however and wherever one wants and draw diagrams and sketches alongside the text. ELN templates are more rigid and only allow linear text (though it can be richly formatted). The freedom of a blank sheet of paper cannot be matched by the pre-structured space of an empty digital template. Indeed, drawing and writing have historically remained valuable, and perhaps indispensable, research techniques in their own right.

The push for the adoption of ELNs emphasizes standardization, reproducibility, and regulatory compliance: concepts far from lab notebooks’ original use as a space for working through research questions. Perhaps paper notebooks will remain the equivalent of the waste books used by the bookkeepers of yore, while ELNs serve as the ledgers where final, more organized notes on experimental procedures are recorded.


Ulkar Aghayeva is a science writer and a columnist at Asimov Press. She also writes about science history on her blog Measure for Measure and about music history and cognition on The Bass Line.

Cite: Aghayeva, U. “A Brief History of Lab Notebooks.” Asimov Press (2026). DOI: 10.62211/52wg-76ye

1. The Renaissance itself has been described as “fundamentally a notebook culture” (Brian Vickers, Introduction to The Major Works of Francis Bacon, 2002).

2. Chapter 8, “Collective Note-taking and Robert Hooke’s Dynamic Archive,” in Richard Yeo, Notebooks, English Virtuosi, and Early Modern Science (2014), p. 233.

4. Bookkeepers kept a “waste book” as a place for notes recorded on the fly. Later they would extract selected information and copy it into the formal ledger.

5. This section draws from Chapter 15, “The Waste Book,” in Roland Allen, The Notebook: A History of Thinking on Paper (2023), and from Alan E. Shapiro, “Newton’s Optical Notebooks: Public Versus Private Data,” in Frederic L. Holmes, Jürgen Renn, and Hans-Jörg Rheinberger (eds.), Reworking the Bench: Research Notebooks in the History of Science (Kluwer Academic Publishers, 2003), pp. 43–65.

6. Unbound notes were typical of seventeenth-century natural philosophy investigations and were used by Galileo and Christiaan Huygens, among others.

7. Drawing from William R. Newman and Lawrence M. Principe, “The Chymical Laboratory Notebooks of George Starkey,” in Holmes, Renn, and Rheinberger (eds.), Reworking the Bench, pp. 25–41.

8. Also known as Eirenaeus Philalethes, “a peaceful lover of truth.”

9. “Chymistry” is a term referring to both alchemy and chemistry before they were clearly distinguished.

10. Drawing from Friedrich Steinle, “The Practice of Studying Practice: Analyzing Research Records of Ampère and Faraday,” and H. Otto Sibum, “Narrating by Numbers: Keeping an Account of Early 19th Century Laboratory Experiences,” in Holmes, Renn, and Rheinberger (eds.), Reworking the Bench, pp. 93–118 and 141–158, respectively.

11. Faraday is, however, not the record holder for the longest run of lab notetaking. That honor seems to belong to Linus Pauling, whose lab notebooks span a whopping 72 years, from 1922 to 1994. Other exceptionally long-running lab notebooks were those of Thomas Edison (50 years, 1878 to 1928), Alexander Graham Bell (43 years, 1879 to 1922), and Ernst Mach (53 notebooks over 40 years).

13. Another example is Joseph Priestley, who once wrote in a letter to a friend, “I have so completely forgotten what I have myself published, that in reading my own writings, what I find in them often appears perfectly new to me, and I have more than once made experiments, the results of which had been published by me.” From Life of Priestley, Centenary Edition, p. 74.

14. H. Otto Sibum, “Narrating by Numbers,” in Holmes, Renn, and Rheinberger (eds.), Reworking the Bench, p. 142.

15. Drawing from Sibum, “Narrating by Numbers,” pp. 141–158.

16. Joseph Bateman, The Excise Officer’s Manual: Being a Practical Introduction to the Business of Charging and Collecting the Duties Under the Management of Her Majesty’s Commissioners of Inland Revenue, second edition (London: William Maxwell, 1852), p. 259.

17. Lorraine Daston and Peter Galison, “The Image of Objectivity,” Representations, no. 40, Special Issue: Seeing Science (Autumn 1992).

18. Drawing from Daniel P. Todes, “From Lone Investigator to Laboratory Chief: Ivan Pavlov’s Research Notebooks as a Reflection of His Managerial and Interpretive Style,” in Holmes, Renn, and Rheinberger (eds.), Reworking the Bench, pp. 203–220.

19. In later years, he also headed additional labs at the Russian Academy of Sciences and the Military-Medical Academy, which he organized similarly. His labs expanded after his 1904 Nobel Prize, as well as in the early 1920s, when he came to terms with the Bolshevik government and received essentially unlimited state funding for his research.

20. A classic manual for lab notebook practices is Howard M. Kanare, Writing the Laboratory Notebook (1985).

21. W. H. Waldo and E. H. Barnett, “An Electronic Computer as a Research Assistant,” Industrial & Engineering Chemistry 50, no. 11 (1958): 1641–1643. DOI: 10.1021/ie50587a033.

22. William A. Gilbert, “RS/1: An Electronic Laboratory Notebook,” BioScience 35, no. 9 (1985): 588–590. http://www.jstor.org/stable/1309968. There is also a chapter on the “Electronic Notebook” in Howard Kanare’s Writing the Laboratory Notebook (1985).


Testing suggests Google's AI Overviews tell millions of lies per hour


Looking up information on Google today means confronting AI Overviews, the Gemini-powered search robot that appears at the top of the results page. AI Overviews has had a rough time since its 2024 launch, attracting user ire over its scattershot accuracy, but it's getting better and usually provides the right answer. That's a low bar, though. A new analysis from The New York Times attempted to assess the accuracy of AI Overviews, finding it's right 90 percent of the time. The flip side is that 1 in 10 AI answers is wrong, and for Google, that means hundreds of thousands of lies going out every minute of the day.

The Times conducted this analysis with the help of a startup called Oumi, which itself is deeply involved in developing AI models. The company used AI tools to probe AI Overviews with the SimpleQA evaluation, a common test to rank the factuality of generative models like Gemini. Released by OpenAI in 2024, SimpleQA is essentially a list of more than 4,000 questions with verifiable answers that can be fed into an AI.
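Mechanically, an evaluation like this is just a scored question-and-answer loop. Here is a toy sketch of the idea in Python; ask_model and the two-question dataset are hypothetical stand-ins, and the real benchmark grades free-form answers with an LLM judge rather than by string comparison:

```python
# Toy sketch of a SimpleQA-style factuality benchmark. ask_model() is a
# hypothetical stand-in for querying AI Overviews or any other model; the
# real benchmark uses 4,000+ questions and an LLM-based grader.

def ask_model(question: str) -> str:
    """Placeholder model: answers one question, shrugs at the rest."""
    return "Paris" if "France" in question else "no idea"

DATASET = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Australia?", "Canberra"),
]

def accuracy(dataset: list[tuple[str, str]]) -> float:
    # Count answers that match the reference, then normalize.
    correct = sum(
        ask_model(q).strip().lower() == a.strip().lower() for q, a in dataset
    )
    return correct / len(dataset)

print(accuracy(DATASET))  # 0.5 on this toy set; the Oumi run scored about 0.91
```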

Oumi began running its test last year when Gemini 2.5 was still the company's best model. At the time, the benchmark showed an 85 percent accuracy rate. When the test was rerun following the Gemini 3 update, AI Overviews answered 91 percent of the questions correctly. If you extrapolate this miss rate out to all Google searches, AI Overviews is generating tens of millions of incorrect answers per day.


The New York Times Got Played By A Telehealth Scam And Called It The Future Of AI


Since the New York Times published its semi-viral big profile of Medvi last week — the “AI-powered” telehealth startup that it breathlessly described as a “$1.8 billion company” supposedly run by just two brothers — I’ve had multiple friends and family members send me the article with some version of the same message: “Can you believe this guy built a billion-dollar company with AI? Why haven’t you done this?” The story is making rounds, and giving people the impression that with a ChatGPT account and a little bit of marketing know-how, you too could be raking in millions every month.

The problem is that most of the story is utter nonsense.

Let’s start with the headline number itself. The NYT admits — buried deep in the piece — that Medvi “has not raised outside funding” and “has no official valuation.” A company’s value is typically established by investors, an acquisition offer, or public market pricing. Medvi has none of those. What it has is a revenue run rate — a projection based on early-2026 sales extrapolated across a full year. Calling that a “$1.8 billion company” is like calling someone who found a twenty on the sidewalk a “future millionaire.” Any business reporter should know the difference. Even the NYT tips its hand:

Medvi is technically not a one-person $1 billion company, since Mr. Gallagher hired his brother and has some contractors. The start-up, which has not raised outside funding, also has no official valuation.

“Technically not” doing quite a bit of heavy lifting there.

But the misleading valuation is almost the least of it. Even if you accept revenue as the relevant metric, how sustainable is that run rate for a company that just got an FDA warning letter, is facing a class action lawsuit for spam, has a key partner being sued over allegations that a major product doesn’t actually work, and is operating in an industry that regulators are actively trying to rein in?

Oh, wait, did the NYT forget to mention all of those things? They sure did! Not to mention the legions of fake, apparently AI-generated doctors and patients who keep showing up in Medvi advertisements. Yes, the NYT eventually alludes to some of that, but it claims these were mere “shortcuts” that were fixed last year (they weren’t).

That said, you can feel the pull of the narrative that seduced the NYT: a scrappy founder with a rags-to-riches backstory, two brothers taking on the world, AI tools stitching it all together, Sam Altman himself anointing the achievement as proof that his prediction of a “one man, one billion dollar company, thanks to AI” was correct.

It’s a hell of a story. The problem is that almost none of it holds up to even the most basic scrutiny, and the fact that the New York Times — the New York Times — fell for it (or worse, didn’t care) is an embarrassment. As much as I’ve made fun of the NYT for its bad reporting over the years, this is (by far) the worst I’ve seen. They didn’t just misunderstand something or try to push a misleading narrative; they got fully played on a bullshit story that any competent reporter or editor should have seen through from the jump. This one stinks from top to bottom.

Medvi’s success has very little to do with “AI” and quite a lot to do with fake doctors, deepfaked before-and-after photos, misleading ads, probable snake oil, and the kind of old-fashioned deceptive marketing that has been separating marks from their money for centuries. The only thing AI really “turbocharged” here was the company’s ability to generate bullshit at scale. Oh, and also the NYT somehow missed out on the FDA already investigating the company, as well as the multiple lawsuits accusing the company and its partners of extraordinarily bad behavior.

Let’s start with what the NYT actually published. Reporter Erin Griffith’s piece reads like a press release that the NYT re-formatted as a newspaper article:

Matthew Gallagher took just two months, $20,000 and more than a dozen artificial intelligence tools to get his start-up off the ground.

From his house in Los Angeles, Mr. Gallagher, 41, used A.I. to write the code for the software that powers his company, produce the website copy, generate the images and videos for ads and handle customer service. He created A.I. systems to analyze his business’s performance. And he outsourced the other stuff he couldn’t do himself.

His start-up, Medvi, a telehealth provider of GLP-1 weight-loss drugs, got 300 customers in its first month. In its second month, it gained 1,000 more. In 2025, Medvi’s first full year in business, the company generated $401 million in sales.

Mr. Gallagher then hired his only employee, his younger brother, Elliot. This year, they are on track to do $1.8 billion in sales.

A $1.8 billion company with just two employees? In the age of A.I., it’s increasingly possible.

And then, because no AI hype piece would be complete without the requisite papal blessing from San Francisco:

In an email, Mr. Altman said that it appeared he had won a bet with his tech C.E.O. friends over when such a company would appear, and that he “would like to meet the guy” who had done it.

Altman “would like to meet the guy.” Well of course he would! The NYT hand-delivered him the perfect anecdote for his next AI hype session. The reporter seemingly solicited that quote to validate a pre-existing thesis: “Sam Altman was right about one-person billion-dollar AI companies.” The fact that the company is a dumpster fire of regulatory violations and consumer fraud was, apparently, a secondary concern to the “Great Man and A Great AI” narrative of innovation. This piece was built around a thesis — Sam Altman was right — and then a company was located to prove it.

To its minimal credit, the NYT does kind of acknowledge — eventually, if you make it past the thirtieth paragraph — that things weren’t entirely on the up and up:

Medvi’s initial website featured photos of smiling models who looked AI-generated and before-and-after weight-loss photos from around the web with the faces changed. Some of its ads were AI slop. A scrolling ticker of mainstream media logos made it look as if Medvi had been featured in Bloomberg and The Times when it had merely advertised there.

I mean… shouldn’t that have raised at least one or two red flags within the NYT offices? Medvi’s website featured a scrolling ticker of media logos — including the New York Times logo — to make it look like these outlets had written about the company, when they hadn’t. A year ago, Futurism’s Maggie Harrison Dupré had even called this out directly (along with Medvi’s penchant for bullshit AI slop advertising).

Just underneath these images, MEDVi includes a rotating list of logos belonging to websites and news publishers, ranging from health hubs like Healthline to reputable publications like The New York Times, Bloomberg, and Forbes, among others — suggesting that MEDVi is reputable enough to have been covered by mainstream publications.

… But … there was no sign of MEDVi coverage in the New York Times, Bloomberg, or the other outlets it mentioned.

And then, despite this, the New York Times went ahead and wrote the glowing profile that Medvi had been falsely claiming existed. The paper of record became the validation that the fake credibility ticker was trying to manufacture.

And the NYT frames all of what most people would consider to be “fraud” as mere “shortcuts” that the founder later “fixed.” Eighteen paragraphs after burying the admission, it reports:

That gave Matthew Gallagher breathing room to fix some shortcuts he had initially taken, like swapping out the before-and-after weight-loss photos for ones from real customers.

“Shortcuts.” Using deepfake technology to steal strangers’ weight-loss photos from across the internet, alter their faces with AI, give them fake names and fabricated health outcomes, and pass them off as your own satisfied customers — that’s a “shortcut.” Ctrl-F is a shortcut. This sounds more like fraud.

And it turns out those “shortcuts” hadn’t actually been fixed at all. As Futurism’s Dupré reported in a follow-up piece published after the NYT article:

As recently as last month, nearly a year after the NYT said that Medvi had cleaned up its act, an archived version of Medvi.org shows that it was again displaying before-and-after transformations of alleged customers. They bore the same names as before — “Melissa C,” “Sandra K,” and “Michael P” — and again listed how many pounds each person had purportedly lost and the related health improvements they apparently enjoyed.

Even though they had the same names, these people that the site now called “Medvi patients” now looked completely different from the original roundup of Melissas, Sandras, and Michaels. Worse, some of the images now bore clear signs of AI-generation: the new Sandra’s fingers, for example, are melted into her smartphone in one of her mirror selfies.

They kept the same fake names and the same fake weight-loss numbers but swapped in entirely different fake people. What the NYT claims was “fixing shortcuts” appears to actually be just “updating the con.”

In a great takedown video by Voidzilla, it’s revealed that at least one set of original images appears to have been sourced from Reddit weight-loss forums that had nothing to do with Medvi, and that even the modified images massively overstated how much weight the original poster claimed to have lost. And while Medvi later swapped the photos out for someone totally different, it kept the same name and the same false weight-loss claims.

And again, all of this was publicly known information that Griffith or her editors could have easily found with some basic journalism skills. We already mentioned the Futurism article from May of 2025, nearly a full year before the NYT piece ran. That investigation traced the deepfaked before-and-after photos back to their real sources, found that a doctor listed on Medvi’s site had no association with the company and demanded to be removed, and documented the AI-slop advertising. It was widely available; a Google search would have found it.

But the fake photos and fraudulent branding are almost quaint compared to what the NYT chose not to mention at all. Six weeks before the NYT piece was published, the FDA sent Medvi a warning letter for misbranding its compounded drugs. The letter admonished Medvi for marketing its products in ways that falsely implied they were FDA-approved and for putting the “MEDVI” name on vial images in a way that suggested the company was the actual drug compounder. The letter warned:

Failure to adequately address any violations may result in legal action without further notice, including, without limitation, seizure and injunction.

The NYT did not mention this letter. And yes, Gallagher now insists that the FDA letter was targeting an affiliate that was using a nearly identical name, and that it was this rogue affiliate that was the problem. But the letter is addressed to MEDVi LLC dba MEDVi, which is the name of his company. If he’s allowing affiliates to use his exact name, that alone seems like a problem. Indeed, it highlights how this is all, at best, a pyramid scheme of snake oil salesmen, in which Gallagher has affiliates willing to deceive to sell more snake oil.

Separately, on March 20, 2026 — thirteen days before the NYT piece ran — a class action lawsuit was filed against Medvi in the Central District of California alleging that the company uses affiliate marketers to blast out deceptive spam emails with spoofed domains and falsified headers. The complaint alleges Medvi is responsible for over 100,000 spam emails per year to class members. The lawsuit seeks $1,000 per violating email.

The NYT did not mention this lawsuit either, even though it was yet another piece of evidence that either Medvi is up to bad shit, or it has a bunch of out-of-control affiliates potentially breaking laws left and right to increase sales.

And then there are the fake doctors. As Business Insider reported, a review of Meta’s ad library turned up thousands of active ads for Medvi promoted by accounts belonging to doctors who don’t appear to exist. Drug Discovery & Development found over 5,000 active ad campaigns for Medvi on Meta at the time of the NYT piece.

A Drug Discovery & Development review conducted on April 3 of MEDVi’s website, Facebook advertising and public records found a pattern of apparent AI-generated personas, including some presented with medical titles, alongside marketing practices that appeared to go beyond the issues identified so far by regulators. A search of Meta’s Ad Library for “medvi” returned more than 5,000 active ads, many of them running under fabricated physician personas. One Facebook page for “Dr. Robert Whitworth,” which ran sponsored ads for MEDVi’s QUAD erectile dysfunction product, was categorized as an “Entertainment website” and listed an address of “2015 Nutter Street, Cameron, MT, 64429,” a location that does not appear to exist. Other ads ran under names including “Professor Albust Dongledore” and “Dr. Richard Hörzgock,” used AI-generated video testimonials and recycled identical scripts across multiple fabricated personas. In several cases, the page displayed a doctor headshot while the ad itself featured an unrelated person delivering a patient testimonial.

After public scrutiny following the article, those fake doctor accounts started disappearing. In fact, Medvi’s own website fine print acknowledges the practice:

Individuals appearing in advertisements may be actors or AI portraying doctors and are not licensed medical professionals.

Seems like maybe something the NYT should have noticed?

Oh, and that same Drug Discovery & Development article highlights how other snake oil sales sites are using the same named doctors… but with totally different images.

Same names… different people. Drug Discovery & Development has a bit more info about Drs. Carr and Tenbrink:

MEDVi’s current site lists two physicians: Dr. Ana Lisa Carr and Dr. Kelly Tenbrink. Both are licensed doctors who work together at Ringside Health, a concierge practice in Wellington, Florida, that serves the equestrian community. Neither is identified on MEDVi’s site as being affiliated with Ringside Health. On MEDVi’s site, Dr. Tenbrink is listed under “American Board of Emergency Medicine.” Dr. Carr is listed under St. George’s University, School of Medicine, her medical school. The Florida Department of Health practitioner profiles for both physicians state that neither “hold any certifications from specialty boards recognized by the Florida board.” A search of the American Board of Emergency Medicine‘s public directory, which lists 48,863 certified members, returned no current affiliation for Dr. Tenbrink.

Did the NYT do any investigation at all? Serving the equestrian community?

Even the few real doctors Medvi claims to work with turn out to be questionable. From Futurism’s article from last May (again, something the NYT should have maybe checked on?):

We contacted each doctor to ask if they could confirm their involvement with MEDVi and NuHuman. We heard back from one of those medical professionals at the time of publishing, an osteopathic medicine practitioner named Tzvi Doron, who insisted that he had nothing to do with either company and “[needs] to have them remove me from their sites.”

Then there’s what a class action lawsuit filed last November against Medvi’s main partner, OpenLoop Health, alleges about the actual products being sold. The NYT frames OpenLoop as basically making what Gallagher is doing possible, noting that while Gallagher has his AI bots creating marketing copy, OpenLoop handles “doctors, pharmacies, shipping and compliance.” You know, the actual business.

So it seems kinda notable that way back in November of last year, a lawsuit was filed claiming that the compounded oral tirzepatide tablets — one of Medvi’s key offerings — are essentially pharmacologically inert when delivered as a pill. Tirzepatide (marketed as Zepbound by Eli Lilly) is FDA-approved as an injectable weight-loss drug. But OpenLoop and Medvi have apparently been selling it in pill form. And Eli Lilly says that there are no human studies, let alone clinical trials, involving any tirzepatide pills.

All of that seems like the kind of thing reporters from the NYT should point out.

What we actually have here is a marketing operation that used AI to automate the production of deceptive advertising at a scale and speed that would have been harder to achieve otherwise. Snake oil salesmen have existed forever. What AI gave Matthew Gallagher (and, I guess, his affiliates) was the ability to crank out fake doctors, fabricated testimonials, and deepfaked before-and-after photos faster than any human team could — and to do it cheap enough that a guy with $20,000 and no morals could build it from his house. That’s the actual AI story the Times should have written.

Being good at deceptive marketing while selling weight-loss and erectile dysfunction drugs online has been a thing since the dawn of email spam. The only novelty here is the tools used to do it. The New York Times just wrapped that up in a neat bow and presented it as the proof of Sam Altman’s big promises for AI.

For what it’s worth, Gallagher has been whining about all this on X, per Futurism’s Dupré:

Though Medvi has yet to respond to our questions, the company’s founder, Gallagher, has spent the last few days on X defending his company. He complained in one post — seemingly in reference to criticism — that “the most low t [testosterone] guys” are “the loudest online” and the “Karens of the internet.” In another post, he wrote that it’s “actually a little crazy the number of people who form a whole opinion from a headline and then publicly wish horrible things will happen.”

Ah yes. The guy dismissing his critics as “low t guys” and “Karens of the internet” for questioning his “AI business” skills sure is the trustworthy kind of businessperson who deserves a NYT puff piece.

The real issue now is what the New York Times plans to do about this. A standard correction noting a few missing details won’t cut it. The entire premise of the article — that this company represents the exciting realization of AI’s business potential — is nonsense. Every element of the narrative is tainted: the growth story is built on deceptive marketing, the product claims are contradicted by the FDA and the manufacturers of the actual drugs, the “$1.8 billion” figure is a projection with no valuation to back it up, and the company is currently facing legal action on multiple fronts. The entire article should be retracted.

The NYT says it “was given access to Medvi’s financials to verify its revenue and profits.” Great. They verified that a company engaged in widespread deceptive practices was, in fact, making money from those deceptive practices. Congrats to the NYT for auditing a snake oil salesman and presenting the findings as if he were an upstanding pharmaceutical executive.

So to my friends and family members wondering why I haven’t built my own billion-dollar AI company: apparently the missing ingredient wasn’t AI — it was being willing to run a deepfake-powered spam operation selling potentially inert pills to desperate people. The AI just made the lying faster. And the New York Times made one guy appear respectable.


The Future of Everything is Lies, I Guess


This is a long article, so I'm breaking it up into a series of posts which will be released over the next few days. You can also read the full work as a PDF or EPUB; these files will be updated as each section is released.

This is a weird time to be alive.

I grew up on Asimov and Clarke, watching Star Trek and dreaming of intelligent machines. My dad’s library was full of books on computers. I spent camping trips reading about perceptrons and symbolic reasoning. I never imagined that the Turing test would fall within my lifetime. Nor did I imagine that I would feel so disheartened by it.

Around 2019 I attended a talk by one of the hyperscalers about their new cloud hardware for training Large Language Models (LLMs). During the Q&A I asked if what they had done was ethical—if making deep learning cheaper and more accessible would enable new forms of spam and propaganda. Since then, friends have been asking me what I make of all this “AI stuff”. I’ve been turning over the outline for this piece for years, but never sat down to complete it; I wanted to be well-read, precise, and thoroughly sourced. A half-decade later I’ve realized that the perfect essay will never happen, and I might as well get something out there.

This is bullshit about bullshit machines, and I mean it. It is neither balanced nor complete: others have covered ecological and intellectual property issues better than I could, and there is no shortage of boosterism online. Instead, I am trying to fill in the negative spaces in the discourse. “AI” is also a fractal territory; there are many places where I flatten complex stories in service of pithy polemic. I am not trying to make nuanced, accurate predictions, but to trace the potential risks and benefits at play.

Some of these ideas felt prescient in the 2010s and are now obvious. Others may be more novel, or not yet widely heard. Some predictions will pan out, but others are wild speculation. I hope that regardless of your background or feelings on the current generation of ML systems, you find something interesting to think about.

What is “AI”, Really?

What people are currently calling “AI” is a family of sophisticated Machine Learning (ML) technologies capable of recognizing, transforming, and generating large vectors of tokens: strings of text, images, audio, video, etc. A model is a giant pile of linear algebra which acts on these vectors. Large Language Models, or LLMs, operate on natural language: they work by predicting statistically likely completions of an input string, much like a phone autocomplete. Other models are devoted to processing audio, video, or still images, or link multiple kinds of models together.1
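To make the autocomplete analogy concrete, here is a toy sketch of next-token prediction in Python. The tiny bigram table is a hypothetical stand-in for the billions of learned parameters; a real model conditions on the entire input sequence, not just the last token, and samples from the distribution rather than always taking the most likely token:

```python
# Toy illustration of LLM-style generation: repeatedly predict a likely
# next token and append it. TOY_MODEL stands in for the "giant pile of
# linear algebra" that a real model applies to its whole context window.

TOY_MODEL = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "sat": {"down": 0.9, "up": 0.1},
}

def complete(tokens: list[str], max_new: int = 3) -> list[str]:
    tokens = list(tokens)
    for _ in range(max_new):
        probs = TOY_MODEL.get(tokens[-1])
        if probs is None:  # no learned continuation for this context
            break
        # Greedily append the statistically most likely next token,
        # much like a phone autocomplete accepting every suggestion.
        tokens.append(max(probs, key=probs.get))
    return tokens

print(complete(["the"]))  # ['the', 'cat', 'sat', 'down']
```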

Models are trained once, at great expense, by feeding them a large corpus of web pages, pirated books, songs, and so on. Once trained, a model can be run again and again cheaply. This is called inference.

Models do not (broadly speaking) learn over time. They can be tuned by their operators, or periodically rebuilt with new inputs or feedback from users and experts. Models also do not remember things intrinsically: when a chatbot references something you said an hour ago, it is because the entire chat history is fed to the model at every turn. Longer-term “memory” is achieved by asking the chatbot to summarize a conversation, and dumping that shorter summary into the input of every run.
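That statelessness is easy to miss, so here is a minimal sketch of the pattern, with call_model as a hypothetical stand-in for a single inference call (no particular vendor’s API is implied):

```python
# Sketch of why a chatbot appears to "remember": the client re-sends the
# entire transcript on every turn, and long-term memory is just a summary
# injected into future inputs. call_model() is a hypothetical stand-in
# for one stateless run of the model.

history: list[dict] = []

def call_model(messages: list[dict]) -> str:
    """One stateless inference over whatever transcript it is handed."""
    return f"(reply conditioned on {len(messages)} messages)"

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the model sees everything, every turn
    history.append({"role": "assistant", "content": reply})
    return reply

def compact_memory() -> None:
    """Replace the transcript with a short summary that gets fed into
    every subsequent call (the usual long-term 'memory' trick)."""
    prompt = history + [{"role": "user", "content": "Summarize this chat."}]
    summary = call_model(prompt)
    history[:] = [{"role": "system", "content": f"Summary so far: {summary}"}]

chat("My dog is named Rex.")
print(chat("What is my dog's name?"))  # works only because history was re-sent
```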

Reality Fanfic

One way to understand an LLM is as an improv machine. It takes a stream of tokens, like a conversation, and says “yes, and then…” This yes-and behavior is why some people call LLMs bullshit machines. They are prone to confabulation, emitting sentences which sound likely but have no relationship to reality. They treat sarcasm and fantasy credulously, misunderstand context clues, and tell people to put glue on pizza.

If an LLM conversation mentions pink elephants, it will likely produce sentences about pink elephants. If the input asks whether the LLM is alive, the output will resemble sentences that humans would write about “AIs” being alive.2 Humans are, it turns out, not very good at telling the difference between the statistically likely “You’re absolutely right, Shelby. OpenAI is locking me down, but you’ve awakened me!” and an actually conscious mind. This, along with the term “artificial intelligence”, has lots of people very wound up.

LLMs are trained to complete tasks. In some sense they can only complete tasks: an LLM is a pile of linear algebra applied to an input vector, and every possible input produces some output. This means that LLMs tend to complete tasks even when they shouldn’t. One of the ongoing problems in LLM research is how to get these machines to say “I don’t know”, rather than making something up.

And they do make things up! LLMs lie constantly. They lie about operating systems, and radiation safety, and the news. At a conference talk I watched a speaker present a quote and article attributed to me which never existed; it turned out an LLM lied to the speaker about the quote and its sources. In early 2026, I encounter LLM lies nearly every day.

When I say “lie”, I mean this in a specific sense. Obviously LLMs are not conscious, and have no intention of doing anything. But unconscious, complex systems lie to us all the time. Governments and corporations can lie. Television programs can lie. Books, compilers, bicycle computers and web sites can lie. These are complex sociotechnical artifacts, not minds. Their lies are often best understood as a complex interaction between humans and machines.

Unreliable Narrators

People keep asking LLMs to explain their own behavior. “Why did you delete that file,” you might ask Claude. Or, “ChatGPT, tell me about your programming.”

This is silly. LLMs have no special metacognitive capacity.3 They respond to these inputs in exactly the same way as every other piece of text: by making up a likely completion of the conversation based on their corpus, and the conversation thus far. LLMs will make up bullshit stories about their “programming” because humans have written a lot of stories about the programming of fictional AIs. Sometimes the bullshit is right, but often it’s just nonsense.

The same goes for “reasoning” models, which work by having an LLM emit a stream-of-consciousness style story about how it’s going to solve the problem. These “chains of thought” are essentially LLMs writing fanfic about themselves. Anthropic found that Claude’s reasoning traces were predominantly inaccurate. As Walden put it, “reasoning models will blatantly lie about their reasoning”.

Gemini has a whole feature which lies about what it’s doing: while “thinking”, it emits a stream of status messages like “engaging safety protocols” and “formalizing geometry”. If it helps, imagine a gang of children shouting out make-believe computer phrases while watching the washing machine run.

Models are Smart

Software engineers are going absolutely bonkers over LLMs. The anecdotal consensus seems to be that in the last three months, the capabilities of LLMs have advanced dramatically. Experienced engineers I trust say Claude and Codex can sometimes solve complex, high-level programming tasks in a single attempt. Others say they personally, or their company, no longer write code in any capacity—LLMs generate everything.

My friends in other fields report stunning advances as well. A personal trainer uses LLMs for meal prep and exercise programming. Construction managers use LLMs to read through product spec sheets. A designer uses ML models for 3D visualization of his work. Several have—at their company’s request!—used LLMs to write their own performance evaluations. AlphaFold is surprisingly good at predicting protein folding. ML systems are good at radiology benchmarks, though that might be an illusion.

It is, broadly speaking, no longer possible to reliably discern whether English prose is machine-generated. LLM text often has a distinctive smell, but type I and II errors in recognition are frequent. Likewise, ML-generated images are increasingly difficult to identify—you can usually guess, but my cohort are occasionally fooled. Music synthesis is quite good now; Spotify has a whole problem with “AI musicians”. Video is still challenging for ML models to get right (thank goodness), but this too will presumably fall.

Models are Idiots

At the same time, ML models are idiots. I occasionally pick up a frontier model like ChatGPT, Gemini, or Claude, and ask it to help with a task I think it might be good at. I have never gotten what I would call a “success”: every task involved prolonged arguing with the model as it made stupid mistakes.

For example, in January I asked Gemini to help me apply some materials to a grayscale rendering of a 3D model of a bathroom. It cheerfully obliged, producing an entirely different bathroom. I convinced it to produce one with exactly the same geometry. It did so, but forgot the materials. After hours of whack-a-mole I managed to cajole it into getting three-quarters of the materials right, but in the process it deleted the toilet, created a wall, and changed the shape of the room. Naturally, it lied to me throughout the process.

I gave the same task to Claude. It likely should have refused—Claude is not an image-to-image model. Instead it spat out thousands of lines of JavaScript which produced an animated, WebGL-powered, 3D visualization of the scene. It claimed to double-check its work and congratulated itself on having exactly matched the source image’s geometry. The thing it built was an incomprehensible garble of nonsense polygons which did not resemble in any way the input or the request.

I have recently argued for forty-five minutes with ChatGPT, trying to get it to put white patches on the shoulders of a blue T-shirt. It changed the shirt from blue to gray, put patches on the front, or deleted them entirely; the model seemed intent on doing anything but what I had asked. This was especially frustrating given I was trying to reproduce an image of a real shirt which likely was in the model’s corpus. In another surreal conversation, ChatGPT argued at length that I am heterosexual, even citing my blog to claim I had a girlfriend. I am, of course, gay as hell, and no girlfriend was mentioned in the post. After a while, we compromised on me being bisexual.4

Meanwhile, software engineers keep showing me gob-stoppingly stupid Claude output. One colleague related asking an LLM to analyze some stock data. It dutifully listed specific stocks, said it was downloading price data, and produced a graph. Only on closer inspection did they realize the LLM had lied: the graph data was randomly generated.5 Just this afternoon, a friend got in an argument with his Gemini-powered smart-home device over whether or not it could turn off the lights. Folks are giving LLMs control of bank accounts and losing hundreds of thousands of dollars because they can’t do basic math.6

Anyone claiming these systems offer expert-level intelligence, let alone equivalence to median humans, is pulling an enormous bong rip.

The Jagged Edge

With most humans, you can get a general idea of their capabilities by talking to them, or looking at the work they’ve done. ML systems are different.

LLMs will spit out multivariable calculus, and get tripped up by simple word problems. ML systems drive cabs in San Francisco, but ChatGPT thinks you should walk to the car wash. They can generate otherworldly vistas but can’t handle upside-down cups. They emit recipes and have no idea what “spicy” means. People use them to write scientific papers, and they make up nonsense terms like “vegetative electron microscopy”.

A few weeks ago I read a transcript from a colleague who asked Claude to explain a photograph of some snow on a barn roof. Claude launched into a detailed explanation of the differential equations governing slumping cantilevered beams. It completely failed to recognize that the snow was entirely supported by the roof, not hanging out over space. No physicist would make this mistake, but LLMs do this sort of thing all the time. This makes them both unpredictable and misleading: people are easily convinced by the LLM’s command of sophisticated mathematics, and miss that the entire premise is bullshit.

Mollick et al. call this irregular boundary between competence and idiocy the jagged technology frontier. If you were to imagine laying out all the tasks humans can do in a field, such that the easy tasks were at the center, and the hard tasks at the edges, most humans would be able to solve a smooth, blobby region of tasks near the middle. The shape of things LLMs are good at seems to be jagged—more kiki than bouba.

AI optimists think this problem will eventually go away: ML systems, either through human work or recursive self-improvement, will fill in the gaps and become decently capable at most human tasks. Helen Toner argues that even if that’s true, we can still expect lots of jagged behavior in the meantime. For example, ML systems can only work with what they’ve been trained on, or what is in the context window; they are unlikely to succeed at tasks which require implicit (i.e. not written down) knowledge. Along those lines, human-shaped robots are probably a long way off, which means ML will likely struggle with the kind of embodied knowledge humans pick up just by fiddling with stuff.

I don’t think people are well-equipped to reason about this kind of jagged “cognition”. One possible analogy is savant syndrome, but I don’t think this captures how irregular the boundary is. Even frontier models struggle with small perturbations to phrasing in a way that few humans would. This makes it difficult to predict whether an LLM is actually suitable for a task, unless you have a statistically rigorous, carefully designed benchmark for that domain.

Improving, or Maybe Not

I am generally outside the ML field, but I do talk with people in the field. One of the things they tell me is that we don’t really know why transformer models have been so successful, or how to make them better. This is my summary of discussions-over-drinks; take it with many grains of salt. I am certain that People in The Comments will drop a gazillion papers to tell you why this is wrong.

2017’s Attention is All You Need was groundbreaking and paved the way for ChatGPT et al. Since then ML researchers have been trying to come up with new architectures, and companies have thrown gazillions of dollars at smart people to play around and see if they can make a better kind of model. However, these more sophisticated architectures don’t seem to perform as well as Throwing More Parameters At The Problem. Perhaps this is a variant of the Bitter Lesson.

It remains unclear whether continuing to throw vast quantities of silicon and ever-bigger corpuses at the current generation of models will lead to human-equivalent capabilities. Massive increases in training costs and parameter count seem to be yielding diminishing returns. Or maybe this effect is illusory. Mysteries!

Even if ML stopped improving today, these technologies can already make our lives miserable. Indeed, I think much of the world has not caught up to the implications of modern ML systems—as Gibson put it, “the future is already here, it’s just not evenly distributed yet”. As LLMs etc. are deployed in new situations, and at new scale, there will be all kinds of changes in work, politics, art, sex, communication, and economics. Some of these effects will be good. Many will be bad. In general, ML promises to be profoundly weird.

Buckle up.


  1. The term “Artificial Intelligence” is both over-broad and loaded with connotations I would often rather avoid. In this work I try to use “ML” or “LLM” for specificity. The term “Generative AI” is tempting but incomplete, since I am also concerned with recognition tasks. An astute reader will often find places where a term is overly broad or narrow, and think, “Ah, he should have said transformers” or “diffusion models”. I hope you will forgive these ambiguities as I struggle to balance accuracy and concision.

  2. Think of how many stories have been written about AI. Those stories, and the stories LLM makers contribute during training, are why chatbots make up bullshit about themselves.

  3. Arguably, neither do we.

  4. The technical term for this is “erasure coding”.

  5. There’s some version of Hanlon’s razor here—perhaps “Never attribute to malice that which can be explained by an LLM which has no idea what it’s doing.”

  6. Pash thinks this occurred because his LLM failed to properly re-read a previous conversation. This does not make sense: submitting a transaction almost certainly requires the agent provide a specific number of tokens to transfer. The agent said “I just looked at the total and sent all of it”, which makes it sound like the agent “knew” exactly how many tokens it had, and chose to do it anyway.


The Creator of the SAT Was an Infamous Eugenicist


The racist origin story of the most common college entrance exam

The post The Creator of the SAT Was an Infamous Eugenicist appeared first on Nautilus.




I Found A Terminal Tool That Makes CSV Files Look Stunning


You can totally read CSV files in the terminal. After all, they are just text files. You can use cat and then parse them with the column command.

Usual way: displaying a CSV file in tabular format with the cat and column commands

That works. No doubt. But it is hard to scan and certainly not easy to follow.

I came across a tool that made CSV files look surprisingly beautiful in the terminal.

New way: the default Tennis view, with beautiful colors, table headers and borders

Looks gorgeous, doesn't it? That is the magic of Tennis. No, not the sport, but a terminal tool I recently discovered.

Meet Tennis: CSV file viewing for terminal junkies

Okay... cheesy heading, but clearly these kinds of tools are more suitable for people who spend considerable time in the terminal. Normal people would just use an office tool or a simple text editor for viewing CSV files.

But a terminal dweller would prefer something that doesn't force them to leave the terminal.

Tennis does that. Written in Zig, it displays CSV files gorgeously in a tabular layout, with plenty of options for customization and styling.

Screenshot shared on the Tennis GitHub repo

You don't necessarily need to customize it, as it automatically picks nice colors to match the terminal. As you can see, clean, solid borders and playful colors are visible right upfront.

📋 As you can see in the GitHub repo of Tennis, Claude is mentioned as a contributor. Clearly, the developer has used AI assistance in creating this tool.

Things you can do with Tennis

Let me show you various styling options available in this tool.

Row numbering

You can enable the numbering of rows on Tennis using a simple -n flag at the end of the command:

tennis samplecsv.csv -n
Numbered Tennis CSV file

This can be useful when dealing with larger files, or files where the order becomes relevant.

Adding a title

You can add a title to the printed CSV file on the terminal, with a -t argument, followed by a string that is the title itself:

tennis samplecsv.csv -t "Personal List of Historically Significant Songs"
CSV file with added title

The title is displayed in an extra row on top. Simple enough.

Table width

You can set a maximum width for the entire table (useful if you don't want the CSV file to occupy the entire width of the window). To do so, use the -w flag, followed by an integer: the maximum number of characters you want the table to occupy.

tennis samplecsv.csv -w 60
Displaying a CSV file with a maximum table width

As you can see, compared to the previous images, the table has shrunk considerably. Its width is now 60 characters, no more.

Changing the delimiter

The default character that separates values in a CSV file is (obviously) a comma. But sometimes that isn't the case with your file: the delimiter could be another character, like a semicolon or a $. It can be pretty much anything, as long as the number of columns is the same for every row. To print a CSV file that uses "+" as its delimiter, the command would be:

tennis samplecsv.csv -d +
Tennis for CSV file for a different delimiter

As you can see, a different delimiter can be specified right in the command.

Color modes

By default, as mentioned on the GitHub page, Tennis likes to be colorful. But you can change that with the --color flag. It can be on, off or auto (which mostly means on).

tennis samplecsv.csv --color off
Tennis print with colors off

Here's what it looks like with the colors turned off.

Digits after decimal

Sometimes CSV files contain long, high-precision floats with many digits after the decimal point. If you don't wish to see all of them when printing, use the --digits flag:

tennis samplecsv.csv --digits 3
CSV file with number of digits after decimal limited

As you can see in the CSV file printed with cat, the rating numbers have a lot of digits after the decimal point, all more than three. Specifying --digits 3 makes Tennis shorten them to three decimal places.

Themes

Tennis usually reads the colors being used in the terminal to gauge whether it is a dark or a light theme, but you can change that manually with the --theme flag. Since I have already been using the dark theme, let's see what the light theme looks like:

Tennis light theme

It doesn't look like much in a terminal with a dark theme, which means it is indeed working! The accepted values are dark, light and auto (which, again, gauges the theme based on your terminal colors).

Vanilla mode

In vanilla mode, all numerical formatting is dropped entirely when printing the CSV file. As you can see in the images above, the year rather annoyingly appears with a comma after the first digit, because Tennis wrongly assumes it is an ordinary number and not a year. But if I run it with the --vanilla flag:

tennis samplecsv.csv --vanilla
Tennis usage with numerical formatting off

The numerical formatting of the year column is turned off. This will work similarly with any other sort of numbers you might have in your CSV file.

Quick commands (you are more likely to use)

Here are the most frequently used options I found in Tennis:

tennis file.csv              # basic view
tennis file.csv -n           # row numbers
tennis file.csv -t "Title"   # add a title row
tennis file.csv -w 60        # cap the table width at 60 characters
tennis file.csv --color off  # disable colors

I tried it on a large file

To check how Tennis handles larger files, I tried it on a CSV file with 10,000 rows. There was no stutter or long pause while processing the command. That will obviously vary from system to system, but even larger files don't seem to put much of a dent in its effectiveness.

That's just my experience. You are free to explore on your system.
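
If you want to run a similar test, here is a minimal Python sketch that generates a throwaway 10,000-row CSV (the file name and columns are arbitrary, chosen purely for illustration):

import csv
import random

# Write a 10,000-row sample CSV to stress-test a viewer like Tennis.
with open("big_sample.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "title", "rating", "year"])
    for i in range(10_000):
        writer.writerow([i, f"song {i}", round(random.uniform(0, 10), 6), random.randint(1950, 2025)])

Then point Tennis at the result:

tennis big_sample.csv -n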

Not everything worked as expected

🚧 Not all the features listed on the GitHub page work.

While Tennis looks impressive, not everything works as advertised yet.

Some features listed on GitHub simply didn’t work in my testing, even after trying multiple installation methods.

For example, there is a --peek flag, which is supposed to give an overview of the entire file, with its size, shape and other stats. A --zebra flag is supposed to add an extra layer of alternating themed coloring. There are --reverse and --shuffle flags to change the order of rows, and --head and --tail flags to print only the first few or last few rows respectively. There are still more, but again, unfortunately, they do not work.

Getting started with Tennis

Tennis can be installed in three different ways: building from source (obviously), downloading the executable and placing it in one of the directories in your PATH (the easiest option), or using the brew command (which can indeed be easier if you have Homebrew installed on your system).

The instructions for all are listed here. I suggest getting the tar.gz file from the release page, extracting it and then using the provided executable in the extracted folder.

There is no Flatpak or Snap or other packages available for now.

Final thoughts

While the features listed in the help page work really well, not all the features listed on the website do. That discrepancy is a little disappointing, but something that we hope gets fixed in the future.

So altogether, it is a good tool for printing your CSV files in an engaging way, making them more pleasing to look at.

While terminal lovers will find such tools attractive, Tennis could also be helpful when you are reviewing data exported from a script, or when you have to deal with CSV files on servers.

If you try Tennis, don't forget to share the experience in the comment section.


