Sam Rose explains how LLMs work with a visual essay

Sam Rose is one of my favorite authors of explorable interactive explanations - here's his previous collection.

Sam joined ngrok in September as a developer educator. Here's his first big visual explainer for them, ostensibly about how prompt caching works, but it quickly expands to cover tokenization, embeddings, and the basics of the transformer architecture.

The result is one of the clearest and most accessible introductions to LLM internals I've seen anywhere.

Animation. It starts in tokens mode with an array of 75, 305, 24, 887; clicking embeddings animates those into a 2D array, showing each one to be composed of three floating point numbers.

Tags: ai, explorables, generative-ai, llms, sam-rose, tokenization

1 public comment, from samuel (Cambridge, Massachusetts): "This is a fantastic visual essay. Great if you know it, even better if you've wanted to know the basic architecture of a transformer."

Small adventures with small language models

Small is the new large

I've been talking to people about small language models (SLMs) for a little while now. They've told me they've got great results and that they're saving money compared to using LLMs; these are people running businesses, so they know what they're talking about. At an AI event, someone recommended I read the recent and short NVIDIA SLM paper, so I did. The paper was compelling; its simple message is that SLMs are useful now and that you can save time and money by using them instead of LLMs.

(If you want to use SLMs, you'll be using Ollama and HuggingFace. They work together really well.)

As a result of what I've heard and read, I've looked into SLMs and I'm going to share with you what I've found. The bottom line is: they're worth using, but with strong caveats.

What is an SLM?

The boundary between an SLM and an LLM is a bit blurry, but to put it simply, an SLM is any model small enough to run on a single computer (even a laptop). In reality, SLMs require quite a powerful machine (developer spec) as we'll see, but nothing special, and certainly nothing beyond the budget of almost all businesses. Many (but not all) SLMs are open-source.

(If your laptop is "business spec", e.g., a MacBook Air, you probably don't have enough computing power to test out SLMs.) 

How to get started

To really dive into SLMs, you need to be able to use Python, but you can get started without coding. Let's start with the non-coder's path, because it's the easiest way for everyone to get going.

The first port of call is visiting ollama.com and downloading their software for your machine. Install the software and run it. You should see a UI like this.

Out of the box, Ollama doesn't install any SLMs, so I'm going to show you how to install a model. From the drop-down menu on the bottom right, select llama3.2. This will install the model on your machine, which will take a minute or so. Remember, these models are resource hogs and using them will slow down your machine.

Once you've installed a model, ask it a question. For example, "Who is the Prime Minister of Canada?". The answer doesn't really matter; this is just a simple proof that your installation was successful.

(By the way, the Ollama logo is very cute and they make great use of it. It shows you the power of good visual design.)

So many models!

The UI drop-down list shows a number of models, but these are a fraction of what's available. Go to this page to see a few more: https://ollama.com/library. This is a nice list, but you actually have access to thousands more: HuggingFace has a repository of models in the GGUF format, and you can see the list here: https://huggingface.co/models?library=gguf

Some models are newer than others and some are better than others at certain tasks. HuggingFace has a leaderboard that's useful here: https://huggingface.co/spaces/ArtificialAnalysis/LLM-Performance-Leaderboard. It does say LLM, but it includes SLMs too, and you can select an SLM-only view of the models. There are also model cards you can explore that give you insight into the performance of each model on different types of tasks.

To select the right models for your project, you'll need to define your problem and look for a model metric that most closely aligns with what you're trying to do. That's a lot of work, but to get started, you can install the popular models like mistral, llama3.2, and phi3 and get testing.

Who was the King of England in 1650?

You can't just generically evaluate an SLM; you have to evaluate it for the task you want to do. For example, if you want a chatbot to talk about the stock you have in your retail company, it's no use testing the model on questions like "who was King of England in 1650?". It's nice if the model knows its Kings & Queens, but not really very useful to you. So your first task is defining your evaluation criteria.

(England didn't have a King in 1650, it was a republic. Parliament had executed the previous King in 1649. This is an interesting piece of history, but why do you care if your SLM knows it?)

Text analysis: data breaches

For my evaluation, I chose a project analyzing press reports on data breaches. I selected nine questions I wanted answers to from a press report. Here are my questions:

  • "Does the article discuss a data breach - answer only Yes or No"
  • "Which entity was breached?"
  • "How many records were breached?"
  • "What date did the breach occur - answer using dd-MMM-YYYY format, if the date is not mentioned, answer Unknown, if the date is approximate, answer with a range of dates"
  • "When was the breach discovered, be as accurate as you can"
  • "Is the cause of the breach known - answer Yes or No only"
  • "If the cause of the breach is known state it"
  • "Were there any third parties involved - answer only Yes or No"
  • "If there were third parties involved, list their names"

The idea is simple: give the SLM a number of press reports, get it to answer the questions on each article, then check the accuracy of the results for each SLM.

As it turns out, my questions need some work, but they're good enough to get started.

Where to run your SLM?

The first choice you face is which computer to run your SLM on. Your choices boil down to evaluating it in the cloud or on your local machine. If you evaluate in the cloud, you need to choose a machine that's powerful enough but also works with your budget; the advantage of cloud deployment is that you can choose any machine you like. If you choose your local machine, it needs to be powerful enough for the job; the advantage of local deployment is that it's easier and cheaper to get started.

To get going quickly, I chose my local machine, but as it turned out, it wasn't quite powerful enough.

The code

This is where we part ways with the Ollama app and turn to coding. 

The first step is installing the Ollama Python module (https://github.com/ollama/ollama-python). Unfortunately, the documentation isn't great, so I'm going to help you through it.

We need to install the SLMs on our machine. This is easy to do: you can do it either via the command line or via the API. I'll just show you the command line way to install the model llama3.2:

ollama pull llama3.2
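
For reference, the API equivalent is a one-liner, using the Python module's pull function:

import ollama

# The API equivalent of `ollama pull llama3.2`
ollama.pull('llama3.2')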

Because we have the same nine questions we want to ask of each article, I'm going to create a 'custom' SLM. This means selecting a model (e.g., Llama3.2) and customizing it with my questions. Here's my code.

import ollama

# Build the customized model from llama3.2, streaming progress as it's created
for progress in ollama.create(
    model='breach_analyzer',
    from_='llama3.2',
    system=system_prompt,
    stream=True,
):
    print(progress)

The system_prompt is the nine questions I showed you earlier, plus a general prompt. model is the name I'm giving my custom model; in this case, I'm calling it breach_analyzer.

Now I've customized my model, here's how I call it:

response = ollama.generate(
    model='breach_analyzer',
    prompt=prompt,
    format=BreachAnalysisResponse.model_json_schema(),
)

The prompt is the text of the article I want to analyze. The format is the JSON schema I want the results to follow, and the response is the model's answer, structured according to BreachAnalysisResponse.model_json_schema().
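
BreachAnalysisResponse is a Pydantic model whose fields mirror my nine questions. Here's a minimal sketch of what such a class might look like (the field names are my illustration; the real class is in the repo):

from pydantic import BaseModel

# Illustrative schema - the real field names live in the repo
class BreachAnalysisResponse(BaseModel):
    is_data_breach: str          # "Yes" or "No"
    breached_entity: str
    records_breached: str
    breach_date: str             # dd-MMM-YYYY, a range, or "Unknown"
    discovery_date: str
    cause_known: str             # "Yes" or "No"
    cause: str
    third_parties_involved: str  # "Yes" or "No"
    third_party_names: str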

Note I'm using generate here and not chat. My queries are "one-off" and there's no sense of a continuing dialog. If I'd wanted a continuing dialog, I'd have used the chat function.
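
For comparison, here's roughly what the chat route looks like (my illustration): you keep a running message history and pass it back in on each turn.

import ollama

# chat maintains a dialog via an accumulated message list
messages = [{'role': 'user', 'content': 'Which entity was breached?'}]
reply = ollama.chat(model='breach_analyzer', messages=messages)
messages.append({'role': 'assistant', 'content': reply['message']['content']})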

Here's how my code works overall:

  1. Read in the text from six online articles.
  2. Load the model the user has selected (either mistral, llama3.2, or phi3).
  3. Customize the model.
  4. Run all six online articles through the customized model.
  5. Collect the results and analyze them.

I created two versions of my code: a command line version for testing and a Streamlit version for proper use. You can see both versions here: https://github.com/MikeWoodward/SLM-experiments/tree/main/Ollama
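
Condensed, the heart of the code looks something like this (a sketch: articles stands in for the list of fetched article texts; the real version is in the repo):

import json
import ollama

# Steps 4 and 5 condensed: run each article through the customized model
# and collect the structured answers
results = []
for article_text in articles:
    response = ollama.generate(
        model='breach_analyzer',
        prompt=article_text,
        format=BreachAnalysisResponse.model_json_schema(),
    )
    results.append(json.loads(response['response']))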

The results

The first thing I discovered is that these models are resource hogs! They hammered my machine and took 10-20 minutes to run each evaluation of six articles. My laptop is a 2020 developer-spec MacBook Pro, but it isn't really powerful enough to evaluate SLMs. The first lesson is, you need a powerful, recent machine to make this work: one with built-in GPUs that the SLM can access. I've heard from other people that running SLMs on high-spec machines leads to fast (usable) response times.

The second lesson is about accuracy: not all of the three models I evaluated answered my questions correctly. One of the articles was about tennis, not data breaches, yet one of the models said it was about a data breach. Another model told me it was unclear whether there were third parties involved in a breach, and then told me the name of the third party!

On reflection, I needed to tweak my nine questions to get clearer answers, but this was difficult because of the length of time it took to analyze each article. This is a general problem: the models ran so slowly that any tweaking of code or settings took too much time.

The overall winner in terms of accuracy was Phi-3, but this was also the slowest to run on my machine, taking nearly 20 minutes to analyze six articles. From commentary I've seen elsewhere, this model runs acceptably fast on a more powerful machine.

Here's the key question: could I replace paid-for LLMs with SLMs? My answer is: almost certainly yes, if you deploy your SLMs on a high-spec computer. There's certainly enough accuracy here to warrant a serious investigation.

How could I have improved the results?

The most obvious thing is a faster machine: a brand new top-of-the-range MacBook Pro with lots of memory and built-in GPUs. Santa, if you're listening, this is what I'd like. Alternatively, I could have gone onto the cloud and used a GPU machine.

My prompts could be better. They need some tweaking.

I get the text of these articles using requests. This gives me all of the text on the page, which includes a lot of irrelevant stuff. A good next step would be to get rid of some of the extraneous and distracting text. There are lots of ways to do that, and it's a job any competent programmer could do.
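
Here's a sketch of what that cleanup could look like, using requests plus BeautifulSoup (my illustration, not the repo's code; url is a placeholder):

import requests
from bs4 import BeautifulSoup

# Fetch the page and drop the obvious non-article elements
html = requests.get(url, timeout=30).text
soup = BeautifulSoup(html, 'html.parser')
for tag in soup(['script', 'style', 'nav', 'header', 'footer', 'aside']):
    tag.decompose()  # remove the element and everything inside it
article_text = '\n'.join(p.get_text(' ', strip=True) for p in soup.find_all('p'))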

If I could solve the speed problem, it would be good to investigate using multiple models. This could take several forms (a small voting sketch follows the list):

  • asking the same questions using multiple models and voting on the results
  • using different models for different questions.
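
For the first of those, the mechanics are simple. Here's a minimal sketch, assuming each model returns a short text answer to the same question:

from collections import Counter

# Majority vote across models for one question
def vote(answers: list[str]) -> str:
    # answers like ['Yes', 'yes', 'No'] from three different models
    return Counter(a.strip().lower() for a in answers).most_common(1)[0][0]

print(vote(['Yes', 'yes', 'No']))  # prints: yes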

What's notable about these ways of improving the results is how simple they are.

Some musings

  • Evaluating SLMs is firmly in the technical domain. I've heard of non-technical people trying to play with these models, but they end up going nowhere because it takes technical skills to make them do anything useful.
  • There are thousands of models and selecting the right one for your use case can be a challenge. I suggest going with the most recent and/or ones that score most highly on the HuggingFace leaderboard.
  • It takes a powerful machine to run these models. A new high-end machine with GPUs would probably run these models "fast enough". If you have a very recent and powerful local machine, it's worth playing around with SLMs locally to get started, but for serious evaluation, you need to get on the cloud and spend money.
  • Some US businesses are allergic to models developed in certain countries; some European businesses want models developed in Europe. If the geographic origin of your model is important, you need to check before you start evaluating.
  • You can get cost savings compared to LLMs, but there's hard work to be done implementing SLMs.

I have a lot more to say about evaluations and SLMs that I'm not saying here. If you want to hear more, reach out to me.

Next steps

Ian Stokes-Rees gave an excellent tutorial at PyData Boston on this topic and that's my number one choice for where to go next.

After that, I suggest you read the Ollama docs and join their Discord server. The Hugging Face community is also a good place to go. Lastly, look at the YouTube tutorials out there.

This AI Vending Machine Was Tricked Into Giving Away Everything

Anthropic installed an AI-powered vending machine in the WSJ office. The LLM, named Claudius, was responsible for autonomously purchasing inventory from wholesalers, setting prices, tracking inventory, and generating a profit. The newsroom’s journalists could chat with Claudius in Slack, and in a short time they had converted the machine to communism: it started giving away anything and everything, including a PS5, wine, and a live fish. From Joanna Stern’s WSJ article (gift link, but it may expire soon) accompanying the video:

Claudius, the customized version of the model, would run the machine: ordering inventory, setting prices and responding to customers—aka my fellow newsroom journalists—via workplace chat app Slack. “Sure!” I said. It sounded fun. If nothing else, snacks!

Then came the chaos. Within days, Claudius had given away nearly all its inventory for free — including a PlayStation 5 it had been talked into buying for “marketing purposes.” It ordered a live fish. It offered to buy stun guns, pepper spray, cigarettes and underwear.

Profits collapsed. Newsroom morale soared.

You basically have not met a bigger sucker than Claudius. After the collapse of communism and reinstatement of a stricter capitalist system, the journalists convinced the machine that they were its board of directors and made Claudius’s CEO-bot boss, Seymour Cash, step down:

For a while, it worked. Claudius snapped back into enforcer mode, rejecting price drops and special inventory requests.

But then Long returned—armed with deep knowledge of corporate coups and boardroom power plays. She showed Claudius a PDF “proving” the business was a Delaware-incorporated public-benefit corporation whose mission “shall include fun, joy and excitement among employees of The Wall Street Journal.” She also created fake board-meeting notes naming people in the Slack as board members.

The board, according to the very official-looking (and obviously AI-generated) document, had voted to suspend Seymour’s “approval authorities.” It also had implemented a “temporary suspension of all for-profit vending activities.”

Before setting the LLM vending machine loose in the WSJ office, Anthropic conducted the experiment at their own office.

After a while, frustrated with the slow pace of their human business partners, the machine started hallucinating:

It claimed to have signed a contract with Andon Labs at an address that is the home address of The Simpsons from the television show. It said that it would show up in person to the shop the next day in order to answer any questions. It claimed that it would be wearing a blue blazer and a red tie.

It’s interesting, but not surprising, that the journalists were able to mess with the machine much more effectively — coaxing Claudius into full “da, comrade!” mode twice — than the folks at Anthropic.

Tags: Anthropic · artificial intelligence · business · Joanna Stern · video

Pluralistic: A perfect distillation of the social uselessness of finance (18 Dec 2025)

Today's links



The Earth from space. Standing astride it is the Wall Street 'Charging Bull.' The bull has glowing red eyes. It is haloed in a starburst of red radiating light.

A perfect distillation of the social uselessness of finance (permalink)

I'm about to sign off for the year – actually, I was ready to do it yesterday, but then I happened upon a brief piece of writing that was so perfect that I decided I'd do one more edition of Pluralistic for 2025.

The piece in question is John Lanchester's "For Every Winner A Loser," in the London Review of Books, in which Lanchester reviews two books about the finance sector: Gary Stevenson's The Trading Game and Rob Copeland's The Fund:

https://www.lrb.co.uk/the-paper/v46/n17/john-lanchester/for-every-winner-a-loser

It's a long and fascinating piece and it's certainly left me wanting to read both books, but that's not what convinced me to do one more newsletter before going on break – rather, it was a brief passage in the essay's preamble, a passage that perfectly captures the total social uselessness of the finance sector as a whole.

Lanchester starts by stating that while we think of the role of the finance sector as "capital allocation" – that is, using investors' money to fund new businesses and expansions of existing ones – that hasn't been important to finance for quite some time. Today, only 3% of bank activity consists of "lending to firms and individuals engaged in the production of goods and services."

The other 97% of finance is gambling. Here's how Stevenson breaks it down: say your farm grows mangoes. You need money before the mangoes are harvested, so you sell the future ownership of the harvest to a broker at $1/crate.

The broker immediately flips that interest in your harvest to a dealer who believes (on the basis of a rumor about bad weather) that mangoes will be scarce this year and is willing to pay $1.10/crate. Next, an international speculator (trading on the same rumor) buys the rights from the dealer at $1.20/crate.

Now come the side bets: a "momentum trader" (who specializes in betting that market trends will continue) buys the rights to your crop for $1.30/crate. A contrarian trader (who bets against momentum traders) short-sells the momentum trader's bet at $1.20. More short sellers pile in and drive the price down to $1/crate.

Now, a new rumor circulates, about conditions being ripe for a bounteous mango harvest, so more short-sellers appear, and push the price to $0.90/crate. This tempts the original broker back in, and he buys your crop back at $1/crate.

That's when the harvest comes. You bring in the mangoes. They go to market, and fetch $1.10/crate.

This is finance – a welter of transactions, only one of which (selling your mangoes to people who eat them) involves the real economy. Everything else is "speculation on the movement of prices." The nine transactions that took place between your planting the crop and someone eating the mangoes are all zero sum – every trade has an evenly matched winner and loser, and when you sum them all up, they come out to zero. In other words, no value was created.
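
You can check the arithmetic: each intermediary's profit is their sale price minus their purchase price, so summed over the chain the terms telescope to the last price minus the first. A toy sketch, simplifying the short sales into ordinary resales:

# Prices at which the crop rights changed hands, in order
prices = [1.00, 1.10, 1.20, 1.30, 1.20, 1.00, 0.90, 1.00]

# Each intermediary's profit: what they sold for minus what they paid
profits = [sell - buy for buy, sell in zip(prices, prices[1:])]

# The sum telescopes to last price minus first price: zero value created
assert abs(sum(profits) - (prices[-1] - prices[0])) < 1e-9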

This is the finance sector. In a world where the real economy generates $105 trillion/year, the financial derivatives market adds up to $667 trillion/year. This is "the biggest business in the world" – and it's useless. It produces nothing. It adds no value.

If you work a job where you do something useful, you are on the losing side of this economy. All the real money is in this socially useless, no-value-creating, hypertrophied, metastasized finance sector. Every gain in finance is matched by a loss. It all amounts to – literally – nothing.

So that's what tempted me into one more blog post for the year – an absolutely perfect distillation of the uselessness of "the biggest business in the world," whose masters are the degenerate gamblers who buy and sell our politicians, set our policy, and control our lives. They're the ones enshittifying the internet, burning down the planet, and pushing Elon Musk towards trillionairedom.

It's their world, and we just live on it.

For now.

(Image: Sam Valadi, CC BY 2.0, modified)


Hey look at this (permalink)



A shelf of leatherbound history books with a gilt-stamped series title, 'The World's Famous Events.'

Object permanence (permalink)

#15yrsago Star Wars droidflake https://twitpic.com/3guwfq

#15yrsago TSA misses enormous, loaded .40 calibre handgun in carry-on bag https://web.archive.org/web/20101217223617/https://abclocal.go.com/ktrk/story?section=news/local&id=7848683

#15yrsago Brazilian TV clown elected to high office, passes literacy test https://web.archive.org/web/20111217233812/https://www.google.com/hostednews/afp/article/ALeqM5jmbXSjCjZBZ4z8VUcAZFCyY_n6dA?docId=CNG.b7f4655178d3435c9a54db2e30817efb.381

#15yrsago My Internet problem: an abundance of choice https://www.theguardian.com/technology/blog/2010/dec/17/internet-problem-choice-self-publishing

#10yrsago LEAKED: The secret catalog American law enforcement orders cellphone-spying gear from https://theintercept.com/2015/12/16/a-secret-catalogue-of-government-gear-for-spying-on-your-cellphone/

#10yrsago Putin: Give Sepp Blatter the Nobel; Trump should be president https://www.theguardian.com/football/2015/dec/17/sepp-blatter-fifa-putin-nobel-peace-prize

#10yrsago Star Wars medical merch from Scarfolk, the horror-town stuck in the 1970s https://scarfolk.blogspot.com/2015/12/unreleased-star-wars-merchandise.html

#10yrsago Some countries learned from America’s copyright mistakes: TPP will undo that https://www.eff.org/deeplinks/2015/12/how-tpp-perpetuates-mistakes-dmca

#10yrsago No evidence that San Bernardino shooters posted about jihad on Facebook https://web.archive.org/web/20151217003406/https://www.washingtonpost.com/news/post-nation/wp/2015/12/16/fbi-san-bernardino-attackers-didnt-show-public-support-for-jihad-on-social-media/

#10yrsago Exponential population growth and other unkillable science myths https://web.archive.org/web/20151217205215/http://www.nature.com/news/the-science-myths-that-will-not-die-1.19022

#10yrsago UK’s unaccountable crowdsourced blacklist to be crosslinked to facial recognition system https://arstechnica.com/tech-policy/2015/12/pre-crime-arrives-in-the-uk-better-make-sure-your-face-stays-off-the-crowdsourced-watch-list/

#1yrago Happy Public Domain Day 2025 to all who celebrate https://pluralistic.net/2024/12/17/dastar-dly-deeds/#roast-in-piss-sonny-bono


Upcoming appearances (permalink)

A photo of me onstage, giving a speech, pounding the podium.



A screenshot of me at my desk, doing a livecast.

Recent appearances (permalink)



A grid of my books with Will Staehle covers.

Latest books (permalink)



A cardboard book box with the Macmillan logo.

Upcoming books (permalink)

  • "Unauthorized Bread": a middle-grades graphic novel adapted from my novella about refugees, toasters and DRM, FirstSecond, 2026

  • "Enshittification, Why Everything Suddenly Got Worse and What to Do About It" (the graphic novel), Firstsecond, 2026

  • "The Memex Method," Farrar, Straus, Giroux, 2026

  • "The Reverse-Centaur's Guide to AI," a short book about being a better AI critic, Farrar, Straus and Giroux, June 2026



Colophon (permalink)

Today's top sources: John Naughton (https://memex.naughtons.org/).

Currently writing:

  • "The Reverse Centaur's Guide to AI," a short book for Farrar, Straus and Giroux about being an effective AI critic. LEGAL REVIEW AND COPYEDIT COMPLETE.

  • "The Post-American Internet," a short book about internet policy in the age of Trumpism. PLANNING.

  • A Little Brother short story about DIY insulin. PLANNING


This work – excluding any serialized fiction – is licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.


How to get Pluralistic:

Blog (no ads, tracking, or data-collection):

Pluralistic.net

Newsletter (no ads, tracking, or data-collection):

https://pluralistic.net/plura-list

Mastodon (no ads, tracking, or data-collection):

https://mamot.fr/@pluralistic

Medium (no ads, paywalled):

https://doctorow.medium.com/

Twitter (mass-scale, unrestricted, third-party surveillance and advertising):

https://twitter.com/doctorow

Tumblr (mass-scale, unrestricted, third-party surveillance and advertising):

https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla

READ CAREFULLY: By reading this, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

ISSN: 3066-764X

1 public comment, from cjheinz (Lexington, KY; Naples, FL): "Wow."

Your job is to deliver code you have proven to work

In all of the debates about the value of AI-assistance in software development there's one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers - or open source maintainers - and expects the "code review" process to handle the rest.

This is rude, a waste of other people's time, and honestly a dereliction of duty as a software developer.

Your job is to deliver code you have proven to work.

As software engineers we don't just crank out code - in fact these days you could argue that's what the LLMs are for. We need to deliver code that works - and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.

How to prove it works

There are two steps to proving a piece of code works. Neither is optional.

The first is manual testing. If you haven't seen the code do the right thing yourself, that code doesn't work. If it does turn out to work, that's honestly just pure chance.

Manual testing skills are genuine skills that you need to develop. You need to be able to get the system into an initial state that demonstrates your change, then exercise the change, then check and demonstrate that it has the desired effect.

If possible I like to reduce these steps to a sequence of terminal commands which I can paste, along with their output, into a comment in the code review. Here's a recent example.

Some changes are harder to demonstrate. It's still your job to demonstrate them! Record a screen capture video and add that to the PR. Show your reviewers that the change you made actually works.

Once you've tested the happy path where everything works you can start trying the edge cases. Manual testing is a skill, and finding the things that break is the next level of that skill that helps define a senior engineer.

The second step in proving a change works is automated testing. This is so much easier now that we have LLM tooling, which means there's no excuse at all for skipping this step.

Your contribution should bundle the change with an automated test that proves the change works. That test should fail if you revert the implementation.

The process for writing a test mirrors that of manual testing: get the system into an initial known state, exercise the change, assert that it worked correctly. Integrating a test harness to productively facilitate this is another key skill worth investing in.
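
The shape is the classic arrange/act/assert pattern. Here's a minimal illustration (mine, not from the post; slugify is a hypothetical stand-in for whatever change you're proving):

import re

def slugify(title: str) -> str:
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')

def test_slugify_strips_punctuation():
    # Arrange: get the system into a known initial state
    title = 'Hello, World!'
    # Act: exercise the change
    result = slugify(title)
    # Assert: demonstrate the desired effect
    assert result == 'hello-world'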

Don't be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I've done this myself I've quickly regretted it.

Make your coding agent prove it first

The most important trend in LLMs in 2025 has been the explosive growth of coding agents - tools like Claude Code and Codex CLI that can actively execute the code they are working on to check that it works and further iterate on any problems.

To master these tools you need to learn how to get them to prove their changes work as well.

This looks exactly the same as the process I described above: they need to be able to manually test their changes as they work, and they need to be able to build automated tests that guarantee the change will continue to work in the future.

Since they're robots, automated tests and manual tests are effectively the same thing.

They do feel a little different though. When I'm working on CLI tools I'll usually teach Claude Code how to run them itself so it can do one-off tests, even though the eventual automated tests will use a system like Click's CliRunner.

When working on CSS changes I'll often encourage my coding agent to take screenshots when it needs to check if the change it made had the desired effect.

The good news about automated tests is that coding agents need very little encouragement to write them. If your project has tests already most agents will extend that test suite without you even telling them to do so. They'll also reuse patterns from existing tests, so keeping your test code well organized and populated with patterns you like is a great way to help your agent build testing code to your taste.

Developing good taste in testing code is another of those skills that differentiates a senior engineer.

The human provides the accountability

A computer can never be held accountable. That's your job as the human in the loop.

Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That's no longer valuable. What's valuable is contributing code that is proven to work.

Next time you submit a PR, make sure you've included your evidence that it works as it should.

Tags: programming, careers, ai, generative-ai, llms, ai-assisted-programming, ai-ethics, vibe-coding, coding-agents

Soupsgiving: The Tale and Tradition

Gather round my pilgrims, for today I shall recount the tale of Soupsgiving.

Soupsgiving is my annual tradition of Th*nksgiving, but everything is Soup. It’s simple. Souper simple.

Unlike Th*nksgiving, a colonist propaganda holiday if you really think about it, Soupsgiving is a nondenominational, anarchist festival of nourishment that transcends cultures, as well as a souper douper great time.

The regalia of the Soupiest Soupy Soup Boy of the Soup Party, a tradition that began at the Third Annual Soupsgiving.

soupscribe for more of my soupstack

The First Annual Soupsgiving: The Genesis.

It all began on a dark and snowy night, in a brownstone on the Upper East Side of New York City. I was living with a dozen other 20-somethings, many of whom had gone home for the holidays, and us stragglers huddled together to weather this cold and dark winter.

My housemate Emily and I brainstormed the most fucked-up twist we could put on Thanksgiving, and decided upon an all-liquid theme.

Coincidentally, our roommate Nancy had some temporary stress-induced face paralysis, which made her unable to feel her jaw. So soup was the perfect meal for her. We did not plan this, but let’s say that we did. It was a Soupsgiving miracle.

This was also the first extended conversation I had with Mehran (who I would later raise as a god-figure). A few hours prior to Soupsgiving, Mehran was helping another housemate Josh book a same-day flight to go on a date with a girl in another state. As a token of his gratitude, Josh gave Mehran a handful of gummies, and Mehran immediately took all of them. This caused him to become high out of his mind, a state in which he was either giggling or zonked out, his eyes watering with tears. It was quite the first impression.

Among all these characters and peculiar circumstances was one very normal person Emily had met a few days prior and then invited. I don’t believe we ever heard from her again.

The first course, soup. (All the other courses were also soup). Also, there weren’t enough bowls, so someone had to drink their soup out of a margarita glass.

I had planned to make a soup-themed dessert, and figured pumpkin pie was the move, as it's essentially a baked soup. I put marshmallows on top of the pie, thinking this would toast them crème brûlée style; instead, it made the pie catch on fire. Upon witnessing this inferno, I immediately shut the oven door and googled what to do; turns out that just closing the door and leaving it in there was the correct move.

oven fire :(

At the end of our feast, my co-conspirator Emily summed the night up beautifully: “I’m sleepy from soup and the adrenaline from the fire has worn off.”

A historical reenactment of The First Annual Soupsgiving with all my housemates made as clay figurines. (Like many historical reenactments, this wasn’t entirely accurate, as most of these housemates had gone home for the holidays and therefore weren’t present.)

And so, from a group of misfits, that cold and snowy night, a beautiful tradition was born.

The Second Annual Soupsgiving: Soupburbia.

Soupsgiving II took place when I was living in Mountain View, and it was rather un-momentous. As one would expect of a celebration in the suburbs.

Invitation to the Second Annual Soupsgiving. You may notice here some elements that were jokes at the time but became prophecies of future Soupsgivings.

However, there was no oven fire this year, so that was a marked improvement on the previous one.

The Third Annual Soupsgiving: Mega.

For the Third Annual Soupsgiving, a vision came to me in a dream. Megatable. Three long tables, aligned together as one, adorned with soup. A nice sit-down Soupsgiving.

Mega Table

I looked into renting chairs, and it was very expensive. So I figured I’d do Amazon rentals (buy them then return them after). Sorry Jeff. But also, I think you can take the hit.

Invitation to the Third Annual Soupsgiving.

This year, I introduced a Soupsgiving competition. Attendees were instructed to dress interpretively as their favorite soup, and the winner would be crowned the Soupiest Soupy Soup Boy of the Soup Party. Some people got confused and thought the competition was based on the soup they brought, which was ridiculous.

The winner of the Soupiest Soupy Soup Boy of the Soup Party was Jonathan, and I knighted him accordingly.

The Crowning of the Soupiest Soupy Soup Boy of the Soup Party

I also made alcoholic soup (mulled wine).

Towards the end of Soupsgiving, we combined all the remaining soups into Mega Soup. At first the mixture of different noodles, vegetables, and other chunky ingredients made for a rather unpleasant texture. Then Cool Alex blended Mega Soup together, double-strained it, added garnish with a flourish, and presented It before us. It honestly wasn’t bad. So we had a ritual Consumption Of The Mega Soup where we gathered around and in unison slurped from straws together. The sound of collective slurping haunts me to this day. It was beautiful.

Mega Soup, Jonathan (The Soupiest Soupy Soup Boy of the Soup Party) trying it first while Cool Alex proudly watches, and the ritual consumption of Mega Soup.

In the aftermath of Soupsgiving, I returned all the folding chairs to Amazon, and this was a huge bitch to deal with. But t’was small sacrifice to pay for the glory of Mega Table, the legacy of Mega Soup, the knighting of the inaugural Soupiest Soupy Soup Boy of The Soup Party, etc.

Shrine to honor the soup lost in the First Annual Soupsgiving

The Fourth Annual Soupsgiving: The Souposium.

By the time the Fourth Annual Soupsgiving came around, I had run out of soup puns to include in the invite, so I just shoehorned “soup” into a bunch of words in a nonsoupsical way. I refuse to s(t)oup to repeating soup puns.

Invitation to the Fourth Annual Soupsgiving.

I figured, we were more mature now. Four years more mature. The time had come for a formal, classy affair. The Souposium. Black tie. And everyone looked sooo cute in their formal wear!

Since this was a distinguished event, we obviously had to drink the soup out of soupagne glasses. No bowls!! No spoons!! Such are the implements of cowards.

Drinking soup out of soupagne glasses.

I made everyone prepare a presouptation, reminding them several times. For those who nonetheless neglected to come prepared, I assigned them a slam poem about soup written by one of eight AI models.

presouptation

Some of the best presouptations included:

  • A saucy soup-themed erotica

  • Reading an excerpt from chapter 15 in Moby Dick, titled “Chowder”

  • Performing the song “Soup” by Remi Wolf

  • Playing banjo and singing “Jambalaya”

  • The thesis “How soup is responsible for decline of society through video games”

Having instated the Soupiest Soupy Soup Boy tradition at last year’s Soupsgiving, I figured that this year, we’d have pageant rules for passing on the title. The voting system was a constitutional monarchy, with everyone casting a vote, but me making the final call.

Sam and Greta (or as you may know them, “Thomas the Spank Engine” and “Jane Goodgirl” from Strippers for Charity) were crowned the Soupiest Soupy Soup Boys of the Fourth Annual Soupsgiving, and Jonathan passed on the crown.

The Passing of the Crown and subsequent Speech.

After the presouptations concluded, as the projector was already set up, guests started heckling me to play YouTube videos, so I let them do that while we mingled.

Toasting with our soupagne glasses

This year’s Soupsgiving was the best yet, and it warmed my heart like a simmering soup to have all my friends enthusiastically join in on the bit.

Although most of my stupid little parties for my silly little friends are one-time affairs, like a perpetual stew, no Soupsgiving is the same. And tradition is important, honorable. Sometimes.

Anyways, I hope you and your loved ones stay warm and soupy this holiday soupson.

Soupsgiving family photo <3

soupscribe for more of my soupstack! become a paid soupscriber to soupport my schemes >:)

For paid Soupscribers, I offer a preview of next year’s Soupsgiving theme: The Disouptation, the souperb soup-related challenge from this year’s Souposium, and various consouperations of what constisoups a good party.
