
AI put "synthetic quotes" in his book. But this author wants to keep using it.


Journalist and author Steven Rosenbaum has more reasons than most to distrust AI.

His new book, The Future of Truth: How AI Reshapes Reality, is all about "how Truth is being bent, blurred, and synthesized" thanks to the "pressure of fast-moving, profit-driven AI." Yet a New York Times investigation this week found what Rosenbaum now acknowledges are "a handful of improperly attributed or synthetic quotes" linked to his use of AI tools while researching the book.

These quotes include one that tech reporter Kara Swisher told the Times she "never said" and another that Northeastern University professor Lisa Feldman Barrett said "don’t appear in [my] book, and they are also wrong." Rosenbaum is now working with editors on what he says is a full "citation audit" that will correct future editions.


Why English will never be a programming language


Learn how to respond when your CFO asks, "why are our devs still writing code?"

In my last post, I argued why you should still type code by hand. If you aren't writing code, you're programming in English (or German, Chinese, etc.) and asking an LLM to translate. Businesses love that idea because a lot more people speak English than write JavaScript and will work at overseas call center rates. Read on to learn why that won't work.

Two columns of text. The left column has the heading SPEC and a long description of a regular expression that specifies a valid email address. The right column has the heading CODE and has the regular expression. The point is that sufficiently detailed spec is just code.

The spec for an email address. A precise spec is just code.

Thoughtworks is a software consultancy that developers and business execs look to for practical guidance in software engineering. In February, some of them got together[1] to discuss coding with LLMs. There, someone asked this question[2]:

What would have to be true for us to ‘check English into the repository’ instead of code?

I felt disappointed to hear expert software practitioners considering this question, because it will be about thirty seconds before it is reframed as a LinkedIn post confidently proclaiming that code is dead. After that, it will be two minutes before your CFO posts it in a chat with your manager.

I'll grant that code is a cost center. Each line is a recurring operational expense that climbs for the life of the product. Just look at the pink line from my recent post on software failures. You can mentally substitute "failures" with "cost":

A chart showing that the cost of software maintenance is spiky and rises over time.

The cost of software maintenance is spiky and rises over time.

Your CFO would gladly shed this expense, and he hopes that LLMs are that golden ticket. You can stop this train wreck if you know where the railroad switch is. The language your leaders hear can switch the trajectory of the business before it careens off the cliff of LLM dependence and layoffs.

Below, you'll learn precisely what would have to be true for businesses to program purely in English, do away with those cumbersome programming languages, and fire those expensive programmers.

How the software sausage is made

In order to understand what it would take to replace code, we need to know the role that code plays when creating software. Let's start with a simplified model of software development:

[specification] ---> [code] ---> [executable] ---> [program] ---> [test]

The model above intentionally ignores details like iteration and feedback. Let's look at the transition between each step.

From spec to code

In general, software developers take a specification and turn it into code.

A spec describes, in language that a broad audience can understand, what the software must do or not do. It consists of one or more requirements. Here's an example requirement:

The app SHALL page the on-call engineer if the server is down or the response is slow and it's during business hours.

Creating a good spec can be difficult. Let's remember Brooks' wisdom that producing the spec is the hard part, not implementation:

The hardest part of the software task is arriving at a complete and consistent specification, and much of the essence of building a program[i] is in fact the debugging of the specification.

-- Fred Brooks, No Silver Bullet, 1986

I already demonstrated Brooks' point. The paging requirement above is ambiguous. Did you notice? Look again.

To turn the paging rules above into code, a dev has to disambiguate between two possible meanings:

  • Never page outside work hours: (server_down OR response_slow) AND business_hours
  • Always page on outages, and only page on slowness during work hours: server_down OR (response_slow AND business_hours)

Unlike English, code is an unambiguous language. To express the paging rules above in C, Python, or JavaScript, a programmer has to choose one of the precise interpretations. If they leave off the parentheses, the language's operator precedence rules determine which of OR and AND gets evaluated first.

Choosing the correct interpretation requires a conversation with someone who knows what the software is supposed to do.
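To make the ambiguity concrete, here is a minimal Python sketch of the two readings. The function names are mine, chosen for illustration. It enumerates every combination of inputs and keeps the ones where the two readings disagree, then shows which reading you get if you leave the parentheses off.

```python
from itertools import product

def never_page_after_hours(server_down, response_slow, business_hours):
    # Reading 1: (server_down OR response_slow) AND business_hours
    return (server_down or response_slow) and business_hours

def always_page_on_outage(server_down, response_slow, business_hours):
    # Reading 2: server_down OR (response_slow AND business_hours)
    return server_down or (response_slow and business_hours)

def no_parentheses(server_down, response_slow, business_hours):
    # Python binds `and` tighter than `or`, so this is reading 2.
    return server_down or response_slow and business_hours

# Try all eight input combinations and keep those where the readings differ.
# Each tuple is (server_down, response_slow, business_hours).
all_cases = list(product([False, True], repeat=3))
disagreements = [
    case for case in all_cases
    if never_page_after_hours(*case) != always_page_on_outage(*case)
]
print(disagreements)
```

The two readings disagree exactly when the server is down outside business hours, which is the case someone on call cares about most.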

Spec is legible; code is precise

You might think, "What idiot wrote that vague requirement?", but before you judge, remember that specs are imprecise on purpose. Precision and legibility are opposing attributes. "Ensure the email address is valid" is legible, but ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ is precise. Both levels are necessary; one communicates intent, the other actually does the work. Similarly, "Round to two decimals" is legible, but the IEEE 754 rules for rounding occupy an 84-page document[3].
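Here is a small Python sketch of what the precise side looks like in practice. The helper name is_valid_email is hypothetical; the pattern is the one quoted above. Notice how the precise version makes decisions the legible version never mentioned.

```python
import re

# The pattern quoted in the text, compiled once.
EMAIL_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Return True when the whole string matches the quoted pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None

print(is_valid_email("dev@example.com"))  # an ordinary address
print(is_valid_email("dev@example"))      # no top-level domain
print(is_valid_email("..@-.aa"))          # odd, but the pattern allows it

# "Round to two decimals" hides decisions too. 2.675 has no exact binary
# representation, so the stored value sits slightly below the halfway point.
print(round(2.675, 2))
```

The third address passes, and the rounding result is 2.67 rather than 2.68. Neither outcome is stated in the legible requirement; both are consequences of the precise one.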

It is of course possible to write a specification in English that contains no ambiguity. The cost of producing it would be greater than the cost to produce the equivalent code. That's because there's no program checking syntax and enforcing a type system. For example, I've read plenty of spec documents that used terms they did not define. Without some kind of automated correctness check, a precise English specification is likely to accumulate omissions and inconsistencies over time. This is especially likely for multiple documents made by multiple contributors.

Let's pretend that some organization succeeded in creating a software specification that is precise enough to replace code. The language of that document would be hard to read for the same reasons that code is hard to read. As others have said, "a sufficiently detailed spec is just code."[4]

From code to executable

Once we have some code, a program called a compiler reads the code and outputs an executable[ii]. An executable is a file that contains the data and sequence of instructions that work on a specific processor.

The transformation from code to executable is deterministic: the compiler will always output the same executable, given the same code. We'll come back to that in a moment.
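You can see this property in miniature with Python's built-in compile(), which turns source text into bytecode. This is only a stand-in for a real compiler and linker, and the helper name bytecode_digest is mine. The sketch compiles the same source twice and compares fingerprints of the result.

```python
import hashlib

SOURCE = "total = 40 + 2\n"

def bytecode_digest(source: str) -> str:
    # compile() turns source text into a code object; co_code holds the raw
    # bytecode instructions. Hashing them gives a fingerprint to compare.
    code_object = compile(source, "<example>", "exec")
    return hashlib.sha256(code_object.co_code).hexdigest()

first = bytecode_digest(SOURCE)
second = bytecode_digest(SOURCE)
print(first == second)
```

Same input, same output, every time. That is what lets a test result from yesterday's build say something about today's.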

From executable to program

Once we have an executable, we can load it into memory and run it. We call that a program. Once we can run a program, we can test it in two ways:

Verification. We ask "Did we build the thing right?" We check if the code implements the spec. It can be done with manual interaction with the program, but it's best to use automated tests because humans are not fast or consistent.

Validation. We ask "Did we build the right thing?" We hand the executable to the user and ask for feedback. This pits the user's expectations against both the spec and code. Any of the three can change as a result.

How to replace code with English

Let's assume for the sake of argument that LLMs actually can parse English specifications, even ambiguous ones, and output the equivalent code that devs would. In that universe, we would delete the code and only store the spec in source control:

[specification] ---------------> [executable] ---> [program] ---> [test]

On each build, the LLM would generate the code, build the executable, and run the tests. Then it would discard the code it had generated.

We do not live in that universe.

LLMs can't always resolve ambiguity. If the spec has issues, they'll need clarification, just like devs do. You'll be lucky if they ask for it. Do you want your deploy to prod to stop while an LLM waits for an email reply?

LLMs are not generally deterministic. I promised we'd circle back to this topic. If you give an LLM the same prompt twice, you'll get different output. This presents a problem for verification and validation. Since the artifact changes with each build even if the spec did not, the tests must validate a range of valid executables rather than just one. It's like hitting a bullseye with a bow and arrow. The more the tests constrain the set of executables that will pass, the harder the LLM has to work.

There's no opportunity to inspect the code the LLM generated and no value in doing so, since on each run it is different. This places further load on the test suite for proving correctness.

In light of these facts, here's what would have to be true to use English in the place of programming languages:

  • The specification would have to be unambiguous. It would be pedantic English that makes C++ look pleasant.

  • The tests would need to be comprehensive, automated, and epistemologically sound.

    • Tests serve as a forcing function that constrains the executable to one that satisfies the spec.

    • The tests have to actually verify the executable meets the spec.

      Do you ask the LLM to write a test, then review the test, run it, and see it fail? Then stage the code changes? Then ask the LLM to pass the test? Then verify that the LLM did not change the test while passing it? Review the additional code change? Commit and repeat? If so, this sounds epistemologically sound.[5]

    • The tests have to be written in code. If you write them in English and ask the LLM to generate test code, the bullseye moves on each build. Moreover, what verifies the generated tests?

  • The LLM would need to be deterministic, or your budget must be ready to absorb massive token spend. That's because on each CI run, the LLM would iterate in a test-edit loop until the tests pass.
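The test-edit loop in the last bullet can be sketched in a few lines of Python. Everything here is a toy under loud assumptions: fake_llm_generate is a hypothetical stand-in that picks a candidate at random instead of calling a real model, and the candidates are fixed strings evaluated with eval, which you would not do with real generated code.

```python
import random

def fake_llm_generate(spec: str, rng: random.Random) -> str:
    # Hypothetical stand-in for an LLM call. It returns one of several
    # candidate implementations at random to mimic non-deterministic output.
    candidates = [
        "lambda down, slow, hours: (down or slow) and hours",
        "lambda down, slow, hours: down or (slow and hours)",
        "lambda down, slow, hours: down or slow",
    ]
    return rng.choice(candidates)

def passes_tests(source: str) -> bool:
    # Tests written in code pin down exactly one reading of the paging rule.
    should_page = eval(source)  # acceptable only because the strings are fixed
    return (
        should_page(True, False, False) is True       # outage after hours pages
        and should_page(False, True, False) is False  # slowness after hours does not
        and should_page(False, True, True) is True    # slowness in hours pages
    )

def build(spec: str, rng: random.Random, max_attempts: int = 1000):
    # Regenerate until the tests pass. Every attempt costs tokens.
    for attempt in range(1, max_attempts + 1):
        source = fake_llm_generate(spec, rng)
        if passes_tests(source):
            return source, attempt
    raise RuntimeError("no passing candidate within the attempt budget")

SPEC = ("The app SHALL page the on-call engineer if the server is down "
        "or the response is slow and it's during business hours.")
source, attempts = build(SPEC, random.Random(0))
print(attempts, source)
```

Note what did the real work: the tests, written in code, chose the interpretation. The English spec never did.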

The industry already tried to code in English

In the 1960s, the industry adopted two languages that look like English.

The hope behind COBOL was that business analysts would be able to implement their ideas without programmers.

ADD OVERTIME-HOURS TO TOTAL-HOURS GIVING WEEKLY-HOURS ROUNDED.

The above code can be expressed in JavaScript as const weeklyHours = Math.round(overtimeHours + totalHours);. The cosplay didn't protect workers from having to think hard and understand the problem they were solving. Today, the small priesthood of surviving COBOL programmers is remunerated with airdrops of cash to keep the core of the world's banking infrastructure running.

SQL had similar goals. The query below looks friendly enough, but debugging it requires knowledge of set theory, query planners, indexes, and idiosyncrasies of the SQL flavor. These days, DBAs easily earn six figures.

SELECT * FROM orders WHERE status = 'cancelled' AND created_at < '2025-01-01';
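To see how much sits underneath that friendly query, here is a sketch using Python's built-in sqlite3 module. The table layout, sample rows, and index name are assumptions made up for the example, and EXPLAIN QUERY PLAN is specific to SQLite; other databases have their own equivalents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders (status, created_at) VALUES (?, ?)",
    [
        ("cancelled", "2024-11-30"),
        ("cancelled", "2025-03-01"),
        ("shipped", "2024-06-15"),
    ],
)

QUERY = (
    "SELECT * FROM orders "
    "WHERE status = 'cancelled' AND created_at < '2025-01-01'"
)

def plan(connection):
    # The last column of each EXPLAIN QUERY PLAN row describes one step.
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + QUERY)]

plan_before = plan(conn)
conn.execute("CREATE INDEX idx_orders_status_created ON orders (status, created_at)")
plan_after = plan(conn)

# SQLite has no date type. The comparison works here only because ISO-8601
# date strings happen to sort correctly as plain text.
rows = conn.execute(QUERY).fetchall()
print(plan_before)
print(plan_after)
print(rows)
```

The query text never changed, yet how the database executes it depends on an index the query doesn't mention, and the date filter depends on a storage convention the query doesn't mention either.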

Edsger Dijkstra was probably to computer science what Einstein was to physics. The emergence of English-like languages caused him to pen a rather forceful essay[6] which can be summed up with this sentence:

some still seem to equate "the ease of programming" with the ease of making undetected mistakes.

What to tell your CFO

Here's what you can tell your CFO when he asks why we still have devs on payroll:

Code is already the cheapest path to working, correct software. LLMs do not change the calculus because figuring out what to make is the expensive part, not coding it up. Skipping code makes the specification of what to make even more expensive and throws away the tools that keep precision affordable. Programming in English would be more expensive than just using a programming language.

Bookmark this page or save this paragraph. You'll need it soon enough.

Further reading

If this post clicked with you, I drum a similar beat about business, coding, and LLMs:

Footnotes

  i. Historically, the industry used the term program as shorthand for program text, what today we call source code.
  ii. I'm ignoring interpreted languages like JavaScript and Python. They aren't compiled. They produce the executable the moment you want to run it. I'm also ignoring languages that compile to intermediate representations, like Java or C#. These details don't matter for this discussion.

References

  1. The Future of Software Development
  2. Finding Comfort in the Uncertainty
  3. IEEE Standard for Floating-Point Arithmetic
  4. A sufficiently detailed spec is code
  5. AI-generated tests as ceremony
  6. On the foolishness of "natural language programming".

Zero Sum Problems


Over at Daring Fireball, John Gruber makes a passing observation about the Apple Sports app:

I’ve got some gripes about certain specific aspects of Apple Sports. Like, where does one even start to explain how much is wrong with their zero-sum visualization of team stats? Has anyone ever even seen a presentation like that before? Anyone?

That “Anyone” link lands over here. Hi everyone! The team stats image is quite confusing. It’s a summary of a game between the San Antonio Spurs and the Oklahoma City Thunder. I don’t know much about basketball, but I do know a bit about data visualization and in a pleasing coincidence my former student Josh Fink is the A-VP of Basketball Data Science for the Spurs. Here is the image that John objected to:

Confusing Apple Sports team stats visualization.

I had to look at it for a while as well.

I just finished driving a very long way up the side of the country, so I’m kind of tired. But even allowing for that, boy, this way of representing things really is quite confusing. Not being an Apple Sports user I had to look at it for a bit to understand what was happening. But, now that it has given me a headache, I can kind of see why whoever designed this ended up in the undoubtedly bad place they did.

Before I get to why I have some sympathy for the designer, why did I find this representation of these numbers so disorienting? It’s not just because I’ve been driving for nine hours. John is right to call the picture a “Zero Sum” representation. The design strongly suggests to the viewer that, within each row, we’re looking at each team’s share of a total. Each pair of black and blue lines seems to be vying for control of its whole row, with the longer line being the “winner” in each case.

This sort of representation would make perfect sense for a measure that really was zero sum. Take an example from a properly good sport, like rugby. There, like in basketball, to a first approximation a team either has the ball or it doesn’t.1 But there’s no shot clock in rugby, and possession routinely gets turned over without the game stopping. So, knowing that Team A had 65% possession is not only informative, it also immediately entails that Team B had 35%. You could show that with a representation like one of the rows above.

Literally none of the measures in the Basketball data above are zero-sum in this way. Both teams could shoot 100% from the free throw line, or zero percent. But because the first three measures shown are percentages, this reinforces the zero-sum impression given by the lines. It certainly did that in my case. But then, starting with Assists, the remaining rows are just absolute numbers. When I started looking at the absolute numbers, I got confused a second time by the length of the lines. “Oh so it’s not a share, it’s the value,” I thought. But no, the line lengths do correspond to each team’s relative share within its row. But they’re not really shares, they’re just magnitudes. But they have to be shown in a fixed space and we want to make them relatively comparable somehow so … Argh.
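Here is a rough Python sketch of the arithmetic I think is behind those line lengths. This is my reading of the picture, not Apple's actual implementation, and the function name bar_shares is made up. It splits a fixed-width row between two values in proportion to their size.

```python
from typing import Tuple

def bar_shares(home: float, away: float) -> Tuple[float, float]:
    """Split a fixed-width row between two values in proportion to their size."""
    total = home + away
    if total == 0:
        # Nothing to compare, so split the row evenly.
        return (0.5, 0.5)
    return (home / total, away / total)

# Possession really is zero-sum: the two inputs already add up to 100.
print(bar_shares(65, 35))

# Free throw percentages are independent: both teams could shoot 100%.
print(bar_shares(100, 100))
```

In the possession case the output just restates the input. In the free throw case, two perfect performances are drawn as a 50/50 tie, which is exactly the kind of thing the zero-sum layout obscures.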

It would be nice if there were One Weird Trick to fully fix this figure. But I’m not sure that there is. For example, at a minimum we could redraw these numbers to reflect the fact that they’re not zero-sum. Keep each measure as a row (i.e. on the y-axis) but have the lines, or columns, be side by side within each category instead of facing off. Like this:

Team Stats side by side for each measure.

This view at least lets you immediately see who “won” each measure. The viewer can just directly compare the length of the bars in each category. People are really good at doing that accurately. In that sense it’s much less confusing than the original. But there’s still a lot wrong with it. The core problem is that when we draw a graph like this, we’re usually putting the same kind of thing (e.g. countries, or religious groups, or sports teams) on the y-axis, and then seeing how different their scores are on some single measure (e.g. GDP, or number of adherents, or average points scored per game), which we put on the x-axis. Maybe we use color to break things out by some third measure as well.2 In this case, I’ve just labeled the x-axis as generically as possible. “Value” covers the range of all the measures. The lowest value is 5, in Largest Lead. The highest is 88, in Free Throw %. But these numbers are not meaningfully comparable. The graph encourages us to compare across as well as within categories. But while within-category comparisons are meaningful, the between-category ones are not. There were way more Bench Points than Blocks in the game. But that is not a useful thing to know.

Knowing who won each measure isn’t nothing. It can be informative about how the game went, maybe especially when a team won the game but “lost” on a number of the measures. If you really wanted to lean in to that aspect, you could sort of justify the zero-sum view, and maybe look for a way to sort and order by “how much” a team “won” each category. But again, what’s the right denominator for those measures? For instance, do we care about a team’s share of all Defensive Rebounds in the game? Or do we care about the share of Defensive Rebounds a team won relative to every opportunity it had to make a Defensive Rebound? How meaningful is ordering our rows by those kinds of shares? Even worse, some measures (notably Fouls) are bad to “win”, so we’d have to do something about those.

Team Stats side by side and ordered from absolute highest to lowest, whatever that means.

Our fundamental problem is that we just have two cases (the teams) and fifteen different measures, or variables. Each variable, except for the three percentages, is in effect on its own scale. There’s no direct way to make comparisons across them. Sure, some of these measures are probably going to be associated with one another—e.g. Turnovers and Points Off Turnovers—but the numeric values aren’t directly comparable in general. If you know a lot about basketball you might have some informative rules of thumb about each one of these measures, or some of them in combination. But at that point the lines in this particular graph are not going to be doing any work for you; you’ll just end up looking directly at the numbers. If we had data on all these measures for every NBA game for a whole season then we could of course do much more with them, because then each measure would have a distribution across all games and across all teams.

As it is, the purpose of the “Stats” screen in Apple Sports is just to summarize information from a single game. The other thing I could think of to do with the numbers as a kind of graph is something like this:

A back-to-back column chart.

This is marginally more helpful than the one before just because, again, it gets rid of the unhelpful zero-sum look of the original. As I hope you can immediately see, it creates many other difficulties. It also doesn’t do away with the core problem. That problem is principally one of information design rather than data visualization. What I mean is that what we’re trying to organize is, in effect, fifteen pairs of related but fundamentally distinct numbers. If we had fifteen cases and two variables things would be simple. But with fifteen variables and two cases … well, this is not the kind of thing you can make a single effective and non-confusing graph out of. That’s why I kind of sympathize with the designer. In a constrained space they have to show thirty numbers (thirty two, including the score). Lots of information. A straight table seems like it would be boring. Surely there’s some way to thematically integrate the numbers in a visually appealing manner that brings out some of the relationships across the rows. That’s what graphs do; it seems like the right thing to reach for. But at its heart this information is not a graph. It just sort of looks like one, and that ends up confusing people.


  1. Modulo some measurement decisions about how to determine when possession is turned over while the ball is in play. ↩︎

  2. Here’s an example of a graph with a categorical measure on the y-axis, a continuous measure on the x-axis, and an additional categorical feature shown with color. ↩︎


Is the web being summarized to death?

At Google I/O, new features bring AI agents into the inbox and YouTube in ways that further strain the relationship between publishers and platforms


Why do students fail at computer science?


It’s funny when you see students complain on platforms such as Reddit about how hard computer science (CS) courses are. How are they defeated and, more importantly, why does this happen? How do students who were getting A’s in high school all of a sudden regress to barely passing some of their harder courses?

There are a multitude of reasons, but the first may be that high school no longer really prepares you for the rigours of many university courses. Many would of course say that STEM courses are harder than those of the humanities; however, if you can’t write half decently, then a history degree will also be a struggle. Too many high school courses seem to give out good grades. As of early 2026, Ontario high school students entering university commonly have final admission averages between 85.4% and 92.9%. So that means most people entering university have an “A” average in high school. From a sheer statistics viewpoint, that’s just not realistic. With one-third of Ontario high school students transitioning to university, that means 33% of high school graduates are getting an A average. Why is this happening? You can likely blame, in part, the dysfunctional process of grade inflation.

So when some of these students hit university, taking courses that they may not have encountered before, they sometimes don’t do as well as they think they will. Being ill-prepared is one thing, but using the same approach to learning as high school is also a problem. With CS there are also students who decide a CS degree would be a good idea, for whatever reason, but have little or no background in the subject, or believe it will be easy (where they get this from one does wonder). People may somehow make it through first year, but end up getting stuck on second year courses where the bar is set much higher. They will blame the course (“it’s too hard”), blame the instructor (“doesn’t teach well”), blame the TA, but they hardly ever look at themselves and their study habits, their understanding of the material, or even their own suitability to pursue a CS degree. I noted one individual on Reddit who says they studied 24 hours for a midterm and “got cooked”. Look, the amount of time you study doesn’t matter one iota if you aren’t actually absorbing anything. So where do these students go wrong?

  • They don’t think they need to code. If you don’t know how to code, then you are not going to be successful in CS. Students should not rely on AI to produce good code, especially if they can’t understand what the code does.
  • They fail to understand enough technical details. CS is a technical field; you have to understand more than just basic terminology. You have to understand how something works, and have the ability to implement it if required.
  • They don’t do practice questions to prepare for exams and quizzes.
  • They use AI to do assignments and answer every question they have, assuming AI will always be correct. If you are constantly using AI then in all likelihood you have little or no comprehension of the subject matter. AI like ChatGPT is raising a generation of programmers who don’t actually know how to program.
  • They fail to read textbooks, or any reference material for that matter.
  • They fail to use the provided support. TA and professor office hours are often empty because nobody has any questions (because they can’t be bothered).
  • They never learn to use the command line, a guaranteed recipe for failure.
  • They don’t practice enough. Learning to program means you have to code, code, and code some more.
  • They lack problem solving skills.

In some institutions, math might also be a problem, however things like software engineering honestly don’t need a ton of math. Math is great if you are planning to go into hardware, digital systems, or theoretical computer science, but otherwise you likely will never use calculus. Matrices and vectors are of more interest, perhaps discrete math, and yes, you should have good basic math skills, but a CS degree should not wholly be about math anymore.

But here’s the thing, CS is hard. It is also constantly changing, so welcome to lifelong learning. The languages you learn today won’t be the only ones you will need over the lifespan of a career.




The Secret to Winning on Jeopardy. “To win on...


The Secret to Winning on Jeopardy. “To win on Jeopardy, you don’t need to learn everything. You just need to learn one thing about everything.” As a proficient player of Yell Answers At The TV Jeopardy in my teen years, I can confirm this strat.
