Introduction
This is the post I don’t think many people expected me to write, and I have (rightly!) surrounded myself with people who are generally uncomfortable to somewhat hostile to “AI”, mostly for good reasons, though I’ll get into the many caveats on that as I go.
As activists mitigating the harms of “AI”, we need to be well informed, and we need to understand what the specific harms are. Treating it with a hands-clean purist mindset will be extremely difficult and, as activism, more alienating than effective. These are genuinely useful tools, and pretending they aren’t will not in fact win many hearts and minds.
This post is going to be very long, because in addition to technical context, I’m touching social issues, technical background, discourse norms, context in a culture of rising technocracy and fascism funded by venture capital, and the erosion of our information systems and cultural norms all at once. I can’t get into it all here, but I am not staying away from it on purpose.
Overall, I still believe that LLMs are a net negative on humanity, that the destruction of our infosphere is going to have generational consequences, and that if the whole thing disappeared from the face of the earth tomorrow, I wouldn’t be sad. The damage would still be out there, but the cheapness of bullshit pervading everything would at least return to human content-mill scale. Not to say that that was good before LLMs came along and made it this bad, but it was better.
That said, that’s not going to happen, and the amount of effort required to make it happen would be much better spent on organizing labor and climate action. The AI industry may collapse like a house of cards. I think it somewhat likely, considering the amount of financial trickery these companies are using. But as someone I know put it: we’re not just going to forget that computers can write code now. We aren’t.
I want you to think about all of this with an intensely skeptical mind. Not hostile, mind you, but skeptical. Every claim someone makes may well be checkable. You can check! I recommend you do so. My math in this essay will be rough back-of-envelope calculation, but I think that is appropriate given the tendency of the costs of technology to change by orders of magnitude, and situationally for things to vary by at least a factor of two.
And since we’re both operating in the domain of things not long ago considered science fiction, and because the leadership of AI companies tends to be filled with people with a love of science fiction, many of whom won’t hesitate to, as is said, create the Torment Nexus from the popular science fiction novel Don’t Create The Torment Nexus, I suggest one story to read and keep in mind: Marshall Brain’s “Manna – Two Views of Humanity’s Future”.
TL;DR
- There are open models and closed ones; good code work needs to be done, at least in part, on models that require very high-end hardware to run.
- Chinese models are quite good, and the companies behind them are structured differently.
- Don’t bother running models on your own hardware at home to write code unless you’re a weird offline-first free software zealot. I kind of am, and still I see the futility with the hardware I have on hand.
- Nobody agrees on the right way to do things.
- Everyone is selling something. Usually a grand vision handwaving the hard and bad parts.
- I’ll write more about how to actually use the tools in another segment.
- A lot of the people writing about this stuff are either executives who want to do layoffs, or now-rich people who made it big in some company’s IPO. Take what they say with the grain of salt you’d use for someone insulated by money and who can have free time relatively easily. They are absolutely handwaving over impacts they themselves will not experience.
A note on terms
I am writing this with as much verbal precision as I can muster. I loathe terms like “Vibe Code”, and in general I am not jumping on any marketing waves and hype trains. I’m being specifically conservative in the words I use. I say LLM, not “AI”, when talking about the text generation models at the heart of most of the “AI” explosion. I’ll prefer technical terms to marketing buzzwords the whole way through, even at the cost of being awkward and definitely a little stodgy. Useful precision beats vacuous true statements every time, and the difference now very much matters.
The Models
There are a zillion models out there. Generally the latest and greatest models by the most aggressive companies are called “frontier” models, and they are quite capable. The specific sizes and architectures are somewhat treated as trade secrets, at least among the American companies, so things like the power required to operate them and the kind of equipment required are the sort of things analysts in the tech press breathe raggedly over.
The American frontier models include:
- Anthropic’s “Claude Opus”
- OpenAI’s GPT-5.2
- Google Gemini 3 Pro
- something racist from xAI called Grok.
The frontier models are a moving target as they’re always the most sophisticated things each company can put forth as a product, and quite often they’re very expensive to run. Most of the companies have tools that cleverly choose cheap models for easy things and the expensive models for difficult things. Remember this when evaluating anything resembling a benchmark: it’s an easy place to play sleight of hand.
When you use a frontier model company’s products, most of the time you interact with a mix of models. This is usually a somewhat cheaper-to-run version of the frontier models as the main mode, sometimes offering the true best model as an option that is invoked on occasion, and the whole thing is hidden behind a façade that makes it all look the same. Version numbers often resemble cell phone marketing, with a race to have bigger numbers, and “X” and “v” in places to make it seem exciting. There is no linear progression nor comparison of any of the numbers in the names of models or products.
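To make the routing idea concrete, here’s a minimal sketch of what such a router could look like. Everything in it (the model names, the classify() heuristic) is invented for illustration, not any company’s actual logic.

```python
# Hypothetical sketch of model routing: send easy requests to a cheap
# model and escalate hard ones to the expensive frontier model.
# The names and the heuristic are invented for illustration.

def classify(prompt: str) -> str:
    # Real routers use a small classifier model or heuristics;
    # this stand-in just guesses from length and keywords.
    hard_words = ("refactor", "architecture", "debug", "prove")
    if len(prompt) > 500 or any(w in prompt.lower() for w in hard_words):
        return "hard"
    return "easy"

MODELS = {"easy": "cheap-distilled-model", "hard": "expensive-frontier-model"}

def route(prompt: str) -> str:
    return MODELS[classify(prompt)]

print(route("fix this typo in the README"))  # cheap-distilled-model
print(route("refactor the auth system"))     # expensive-frontier-model
```

The relevance to benchmarks: if the product quietly routes between models, you often can’t know which model actually produced the result being measured.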
I largely have no interest in interacting with the American frontier model companies, as their approach is somewhat to dominate the industry and burn the world doing it. Anthropic is certainly the best of the bunch but I really don’t want to play their games.
I do not know this for sure, but I expect these models run into the terabytes of weights, more than a trillion parameters, plus they are products with a lot of attached software — tools they can invoke, memory and databases and user profiles fed into the system.
Behind them are the large models from other AI companies, largely Chinese, producing research models that they and others operate as services, and often they are released openly (called “open weights models”). Additionally some of the frontier model companies will release research models for various purposes. All core AI companies pretty much style themselves as research organizations first, and product companies second. Note that nearly every AI company calls its best model a frontier model, whether it fits with the above or not.
Chinese companies and therefore models often have a drive for efficiency that the American ones do not. They are not the same kind of market-dominating, monopolist-oriented sorts that VC-funded American companies are. They aren’t as capable, but they do more with less. They’re very pragmatic in their approach compared to the science fiction fueled leadership of American AI companies. These models run in the hundreds of gigabytes and have hundreds of billions of parameters, though most can be tweaked to run some parts in a GPU and the rest on a CPU in main memory, if slowly. They can run on regular PC hardware, if extremely high-end hardware, and distillations and quantizations of these models, while they lose some fidelity, fit on even more approachable hardware. Still larger than most people own, but these are not strictly datacenter-only beasts.
Large, capable open models (mostly Chinese) include:
- z.AI’s GLM-4.7 and GLM-5
- Kimi K2.5
- MiniMax M2.1
- Deepseek-V3.2
- Alibaba’s Qwen3-Max
- Mistral Large 3
- Trinity Large
Mistral Large 3 comes out of Europe. Trinity comes out of the US, but has a less “win the AI race” mindset. There’s a lot of superpower “we need our own sovereign solution” going on. China, the US and Europe are all making sure they have a slice of the AI pie.
I’m sure there’s more — the field is ever changing, and information about the models from Chinese companies percolates slowly compared to the American frontier models.
Behind these models are specialized smaller models, often sort-of good for code writing tasks if one isn’t challenging them, but I actually think this is where the line of usefulness is drawn.
Medium-small coding models include:
- Qwen2.5-Coder
- GPT-OSS 120b
- Mistral’s Codestral
- GPT-4.7-Flash
- Claude Haiku
- Gemini 2.5 Coder
- Smaller versions of Qwen3
- Smaller versions of many other models
There’s also some much smaller models that will run on large gaming GPUs. I don’t think they’re quite useful; they’re very attractive toys that people can get to do some truly impressive things, but I don’t think they’re all that. They are, however, about the capability of what kneejerk AI-haters expect: error-prone lossy toys that if anyone called “the future”, I’d laugh in their face or spit at their feet. Notice how far down the list this is.
The Economics
LLMs are expensive pieces of software to run. Full stop, anything with broad utility requires a GPU greater than most high-end gaming PCs, and quite a lot of RAM. I am setting a high bar here for utility, because AI boosters tend to have a frustrating way of equivocating, showing low numbers for costs when it suits them, and high ones for performance, despite these not being from the same models. There are domain specific tasks and models that can work at small-GPU or even Raspberry Pi levels of computation, but for general purpose “reasoning” tasks and coding specifically, right now in 2026, with current model efficiencies, and with current hardware, if you want to use LLMs for writing software, you will be throwing a lot of computing power at it. A $5000 budget would barely suffice to run something like gpt-oss 120b (OpenAI’s open model that is okay at code-writing tasks). Additionally, if you kept the model busy 100% of the time, you might be talking $50-$200 per month in electricity, depending on local prices.
If you spent $15,000 and tripled the electricity, you could run something like GLM-4.7 at a really good pace.
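The electricity figure is easy to sanity-check yourself. A minimal sketch, with the wattage and price as assumptions you should replace with your own numbers:

```python
# Back-of-envelope electricity cost for running a model at home.
# The wattage and price are assumptions, not measurements.
watts = 1200          # assumed draw of a multi-GPU workstation under load
hours = 24 * 30       # kept busy 100% of the time, for a month
usd_per_kwh = 0.15    # assumed price; varies wildly by region

kwh = watts / 1000 * hours
print(f"{kwh:.0f} kWh, about ${kwh * usd_per_kwh:.0f}/month")
# 864 kWh, about $130/month: squarely in the $50-$200 range above
```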
Water cooling for data centers is probably the most talked about environmental cost, but I think it’s actually a distraction most of the time. Dear god, why do people build data centers in Arizona, that’s a travesty, but also that’s a specific decision made by specific people with names and addresses who should be protested specifically.
Datacenter growth at the cost of people driving up electricity demand is a big problem, and we need to get back on the solar train as fast as possible.
This is not inexpensive software to run. However, it’s not an unfathomable amount of power.
Training models is wildly expensive, but it amortizes. There are in fact difficult economic conversations we need to be having here, but it’s all obscured by the fog of “what about the water?” and “AI will save us all and change everything!” that pervades the discourse. The framing of the arguments at large is fundamentally misleading, by basically everyone, pro or anti-AI, and much more about affiliative rhetoric than argumentative. We need to have the arguments, and actually look for and persuade people of the truths. They’re uncomfortable, so I fully understand why we’re not having them very often, but if we want to actually solve crises, we need to talk with actual truths in mind.
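To see what “it amortizes” means in practice, here’s a toy calculation. Every number in it is a made-up placeholder, not a real figure:

```python
# Toy amortization of training cost over tokens served.
# Both numbers are made-up placeholders, not real figures.
training_cost_usd = 100_000_000       # assumed cost to train a model
tokens_served = 100_000_000_000_000   # 100 trillion tokens over its life

per_million = training_cost_usd / (tokens_served / 1_000_000)
print(f"${per_million:.2f} of training cost per million tokens served")
# $1.00/M tokens at these numbers; double the volume and it halves.
# That's the whole argument: the upfront cost shrinks per-token with scale.
```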
With prices of $200/month for “Max” plans, if one uses the tools well, a company would in fact be making a smart decision to get their developers using them. They are definitely below cost, probably by at least 3-5x. Maybe 10x. (Remember that a price shock will come at some point; don’t depend on the economics of these systems in existential ways for a business.)
Even at cost the math works out for a great many use cases.
Light plans are $20/month, and I think that for intermittent use, with good time sharing, that’s quite sustainable. In my experimentation I’m paying even less than that, and while I don’t think those prices will be sustained, I don’t think they’re impossible either.
Most of the big providers and almost all of the hosted open model providers have a pay-by-the-token API option. This is an unpackaged a-la-carte offering, in the style of cloud providers. They nickel-and-dime you. The pricing model, while transparent, is hard to calculate with. The usual rates are in prices per million input tokens and per million output tokens. Input tokens are cheaper, but interactions with tools will re-send them over and over, so you get charged for them multiple times. Output tokens are more expensive but closer to one-time things. Expensive models can be $25 per million output tokens and $5 per million input tokens (Claude Opus 4.6). I expect this reflects a decent margin on the true costs, but I don’t have a ton to back this expectation up. Most open models run in the realm of $0.50-$3 per million input tokens and $1-$5 per million output tokens. Given that a lot of the open models are run by companies with no other business than running models, I expect these represent near true financial costs. There’s no other business nor investment to hide any complexity in.
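Here’s how a coding session’s bill adds up under per-token pricing, using the frontier-ish example rates above. The session shape (context size, turn count, output per turn) is an assumption:

```python
# Rough per-session cost under pay-by-the-token pricing.
# Rates are the example frontier rates above; the session shape is assumed.
input_rate = 5.0       # USD per million input tokens
output_rate = 25.0     # USD per million output tokens

context_tokens = 50_000   # code, docs, and history in the context window
turns = 20                # tool-using turns, each re-sending the context
output_per_turn = 2_000   # tokens generated per turn

input_cost = context_tokens * turns / 1e6 * input_rate
output_cost = output_per_turn * turns / 1e6 * output_rate
print(f"input ${input_cost:.2f} + output ${output_cost:.2f} "
      f"= ${input_cost + output_cost:.2f} for one session")
# input $5.00 + output $1.00 = $6.00: the re-sent context dominates,
# which is part of why prompt caching discounts exist.
```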
The Tools
Most of the tools can talk to most of the models in some way. Usually each has a preferred model provider, and doing anything else will be a lesson in configuration files and API keys. Some more so than others.
Most of the tools are roughly as secure as running some curl | bash command. They kinda try to mitigate the damage that could happen, but not completely, and it’s a losing battle with fundamentally insecure techniques. Keep this in mind. There are ways to mitigate it (do everything in containers) but you will need to be quite competent with at least Docker to make that happen. I have not; I’m going for being a micromanaging busybody and not using anything resembling “YOLO mode”. I also back everything up and am not giving permission to write to remote repos, just local directories.
I know terminal-based tools more than IDEs, though I’ll touch on IDE-integrated things a bit. I haven’t used any web-based tools. I grew up in terminals and that’s kinda my jam.
- Claude Code is widely acknowledged as best in class, has a relatively good permission model, and lots of tools hook into it. It’s the only tool Anthropic allows with their basic consumer subscriptions. If you want to use other tools with Claude, you have to pay by the token. Can use other models, but it’s a bit of a fight, and a lot of APIs don’t support Anthropic’s “Messages” API yet.
- OpenAI Codex is OpenAI’s tooling. It’s got decent sandboxing, so that what the model suggests to run can’t escape and trash your system nearly so easily. It’s not perfect but it’s quite a bit better than the rest. It’s a bit of a fight to use other models.
- OpenCode touts itself as open source, when in reality most of it is. It’s a bit less “please use my company’s models” than most tools, and it’s the tool I’ve had the best luck with. It has two modes — Build and Plan — and using them both is definitely a key to using the tool well. Plan mode creates documents and written plans. Build does everything else and actually changes files on disk.
- Kilo Code is both a plugin for VS Code and a tool in the terminal. It has not just two modes but five, and more can be customized: “Code”, “Architect”, “Ask”, “Debug”, and “Orchestrator”. Orchestrator mode is interesting in that it uses one stream of processing with one set of prompts to evaluate the output of other modes. This should allow more complex tasks without failing, because there’s a level of oversight. I’ve not used this yet, but I will be experimenting more. Its permission model is pretty laughable, but at least it starts out asking you if it can run commands instead of just doing it.
- Charmbracelet Crush is aesthetically cute but also infuriating, and it’s very insistent on advertising itself in commit messages. I’ve not yet seen if I can make it stop, but it did make me switch back to OpenCode.
- Cursor — App and terminal tool. Requires an account and using their models at least in part, though you can bring your own key to use models through other services.
- Cline — Requires an account. IDE plugins and terminal tools.
- TRAE — IDE with orchestration features. Intended to let it run at tasks autonomously. I’ve not used it.
- Factory Droid. Requires an account. Can bring your own key.
- Zed. IDE editor, with support for a lot of providers and models.
TL;DR
I like OpenCode; Kilo Code and Charmbracelet Crush are runners-up. The (textual) user interface is decent in all three, and it’s not loud, it’s not fancy, but it’s pretty capable. At some point I’ll try orchestration and then maybe Kilo Code will win the day. You’re not stuck with just one tool either.
Antagonistic Structures and The Blurry JPEG of the Internet
At its core, you can think of LLMs as extremely tight lossy data compression. The idea that it is a “blurry jpeg of the internet” is not wrong in kind, though in scope it understates it. Data compression is essentially predicting what’s next, and that’s exactly what LLMs do. Very different specifics, but in the end, small bits of stuff go in, large outputs come out. It’s also “fancy autocomplete”, but that too undersells it, because when you apply antagonistic independent chains of thought on top, you get some much more useful emergent behavior.
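If you want to feel the “compression is prediction” equivalence in your hands, here’s a toy you can run: a character-level bigram predictor (nothing remotely like a real LLM), scored by how many bits an ideal entropy coder would need per character. Better prediction means fewer bits; that’s the whole equivalence.

```python
# Toy demonstration that prediction and compression are the same thing:
# a character bigram model assigns a probability to each next character,
# and -log2(p) is the bits an ideal entropy coder would spend on it.
import math
from collections import Counter, defaultdict

text = "the cat sat on the mat. the dog sat on the log. the cat ran."

counts = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

def prob(a: str, b: str) -> float:
    # Add-one smoothing over a rough 64-symbol alphabet so unseen
    # transitions don't get probability zero.
    total = sum(counts[a].values()) + 64
    return (counts[a][b] + 1) / total

bits = sum(-math.log2(prob(a, b)) for a, b in zip(text, text[1:]))
print(f"{bits / (len(text) - 1):.2f} bits/char versus 8 for raw ASCII")
```

A real LLM is this idea scaled up to billions of parameters and long contexts, which is why “blurry jpeg” is right in kind and wildly short in scope.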
A pattern that you have to internalize is that while lots of these tools and models are sloppy and error-prone, anything you can do to antagonize that into being better will be helpful. This is the thing where I show you how LLM code tools can be a boon to an engineer who wants to do things well. Suddenly, we have a clear technical reason to document everything, to use a clear type system, to clarify things with schemas and plans, to communicate technical direction before we’re in the weeds of editing code. All the things that developers are structurally pushed to do less of, even though they’re always a net win, are rewarded.
You will want your LLM-aided code to be heavily tested. You will want data formats fully described. You will want every library you use to have accurate documentation. You will use it. Your tools will use it.
You will want linters. You will want formatters. Type systems help.
This pattern goes deep, too. Things like Kilo Code’s “Orchestrator” mode and some of Claude Code’s features work as antagonistic checks on other models. When one model says “I created the code and all the tests pass” by deleting all the failing tests, the other model which is instructed to be critical will say “no, put that back, try again”.
One of the big advances in models was ‘reasoning’, which is internally a similar thing: if you make a request, the model is no longer simply completing what you prompted, but instead having several internal chains of thought approaching it critically, and then when some threshold is met, continuing on completing from there. All the useful coding models are reasoning models. The model internally antagonizes itself until it produces something somewhat sensible. Repeat as needed to get good results.
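A minimal sketch of the antagonistic loop, assuming a hypothetical llm() completion function you’d wire to whatever provider you use. This is the shape of the pattern, not any specific tool’s implementation:

```python
# The antagonistic loop: one stream produces, a second stream is told
# to be hostile, and the work repeats until the critic passes it.
# llm() is a placeholder for whatever completion API you actually use.

def llm(prompt: str) -> str:
    raise NotImplementedError("wire up your model provider here")

def build_with_critic(task: str, max_rounds: int = 3) -> str:
    work = llm(f"Complete this task:\n{task}")
    for _ in range(max_rounds):
        review = llm(
            "Be harshly critical. Look for faked success: deleted tests, "
            f"silenced linters, stubbed-out functions.\n\n{work}\n\n"
            "Reply APPROVED if and only if it is genuinely done."
        )
        if "APPROVED" in review:
            break
        work = llm(f"Task: {task}\nCriticism:\n{review}\nFix it properly.")
    return work
```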
Even then, with enough runtime, Claude will decide that the best path forward is to silence failing tests, turn off formatters, or put in comments saying // implement later for things that aren’t enforced. Writing code with these tools is very much a management task. It’s not managing people, but sometimes you will be tempted to think so.
The Conservative Pressure
So here’s the thing about LLMs. They’re really expensive to train.
There’s two phases: “pre-training” (which is really more building the raw model; it’s most of training), and “post-training” (tailoring a general model into one for certain kinds of tasks).
Models learn things like ‘words’ and ‘grammar’ in pre-training, along with embedded, fuzzy knowledge of most things in their training set.
Post-training can sort-of add more knowledge, giving it a refresher course in what happened since it came out. There’s always a lag, too. It takes time to train models.
The thing is, though, that the models really do mostly know only about what they were trained on. Any newer information almost certainly comes from searches the model and tools together do, and gets stuffed into the context window of the current session, but the model doesn’t know any of it, really.
The hottest new web framework of 2027 will not in fact be new, because the models don’t know about it and won’t write code for it.
Technology you can invent from first principles will work fine. Technology that existed and was popular in 2025 will be pretty solid. With something novel or niche, code generation will go off the rails much more easily without a lot of tooling.
This is, in the case of frontend frameworks, maybe a positive development, in that the treadmill of new frameworks is a long-hated feature of a difficult-to-simplify problem space for building things that real people touch.
In general, however, it will be a force for conservatism in technology. Expect everything to be written in boring ways, as broken as it ever was in 2025, for a while here.
They’re Making Bullets in Gastown
There’s a sliding scale of LLM tools, with chats on one end and full orchestration systems of independent streams of work being managed by yet more LLM streams of work at the other. The most infamous of these is Gastown, which is a vibe-coded slop-heap of “what if we add more layers of management, all LLM, and let it burn through tokens at a prodigious rate?”
Automating software development as a whole will look a lot like this - if employers want to actually replace developers, this is what they’ll do. With more corporate styles and less vibe-coded “let’s turn it all to eleven” going on.
Steve’s point in the Gastown intro is that most people aren’t ready for Gastown and it may eat their lunch, steal their baby, and empty their bank account. This is true. Few of us are used to dealing with corporate amounts of money and effort, and while we think a lot about the human management of it, we don’t usually try to make it into a money-burning code-printer. I think there’s a lot of danger for our whole world here. Unfettering business has never yielded unambiguously good results.
Other tools like this are coming. While I was writing this, multi-claude was released, and there’s more too: Shipyard, Supacode, everyone excited about replacing people is building tools to burn more tokens faster with less human review. They’re writing breathless articles and hand-waving about the downsides (or assuming they can throw more LLMs at it to fix problems).
I personally want little part in this.
Somewhere much further down the scale of automation are things like my friend David’s claude-reliability plugin, which is a pile of hacks to automate Claude Code to keep going when it stops for stupid reasons. Claude is trained on real development work, and “I’ll do it later” is entirely within its training set. It really does stop and put todos on the hard parts. A whack upside the head and telling it to keep going sure helps make it make software that sucks less.
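I haven’t read the plugin’s internals, so this is a guess at the flavor of hack involved, not David’s actual code: watch the transcript for giving-up language and poke the model.

```python
# Watch a coding agent's output for "I'll do it later" energy and nudge
# it to keep going. A guess at the general shape of such hacks, not the
# actual claude-reliability implementation.
import re

GIVING_UP = re.compile(
    r"I'll (do|implement|add|handle) (it|this|that) later"
    r"|TODO: implement"
    r"|left as (a )?future work",
    re.IGNORECASE,
)

def nudge_if_needed(transcript: str) -> str | None:
    if GIVING_UP.search(transcript):
        return "You are not done. Go back and finish what you deferred."
    return None
```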
Automating the automation is always going to be a little bit of what’s going on. Just hopefully with some controls, and not connecting it to a money-funnel and saying full speed ahead on a gonzo clown car of violence.
There’s a lot of this sort of thing.
The Labor Issue
The labor left has had its sights on AI for a while as the obvious parallel to the steam-looms that reshaped millwork from home craft to extractive industry. We laud the Luddites, who, contrary to popular notions about them, were not anti-technology per se; they just saw the extractive nature of businesses using these machines, turning a craft one might make a small profit at into a job where people get used up, mind and body, and exhausted. They destroyed equipment and tried to make a point. In the end they had only moderate success, though they and the rest of the labor movement won us such concepts as “the weekend” and “the 8 hour day”.
Even the guy who made Gastown sees how extractive businesses can - or even must! - be. Maybe especially that guy. We’re starting to see just how fast we can get the evil in unfettered business, capital as wannabe monopolists, to show itself.
Ethan Marcotte knows what’s up: we need to unionize. That’s one of the only ways out of this mess. We, collectively, have the power. But only collectively. We don’t have to become the protectionist unions of old, but we need to start saying “hey no, we’re not doing that” en masse for the parts that bring harms. We need to say “over my dead body” when someone wants to run roughshod over things like justice, equality, and not being a bongo-playing extractive douchecanoe. We’ve needed to unionize for a long time now, and not to keep wages up, but because we’re at the tip of a lot of harms, and we need to stop them. The world does not have to devolve into gig work and widening inequality.
Coding with automated systems like this is intoxicating. It’s addictive, because it’s the lootbox effect. We don’t get addicted to rewards. We get addicted to potential rewards. Notice that gamblers aren’t actually motivated by having won. They’re motivated by maybe winning next time. It can lead us to the glassy-eyed stare with a bucket of quarters at a slot machine, and it can lead us to 2am “one more prompt, maybe it’ll work this time” in a hurry. I sure did writing webtty.
There’s Something About Art…
Software, while absolutely an art in many ways, is built on a huge commons of open work done by millions of volunteers. This is not unambiguously always good, but the structure of this makes the ethics of code generation more complex and nuanced than it is for image generation, writing generation, and video generation. We did in fact put a ton of work out there with a license that says “free to use for any purpose”. Not to say every scrape of github was ethical at all: I’m sure AGPL code was snaffled up with the rest, and ambiguously or non-permissively licensed code too. It is however built on a massive commons where any use is allowed. The social status quo was broken, but the legal line at least is mostly in the clear. (Mostly. This is only a tepid defense of some of the AI company scrapes.)
AI image generation and video generation can get absolutely fucked. It was already hard to make it as an artist because the value of art is extremely hard to capture. And we broke it. Fuck the entire fucking AI industry for this, and I hope whoever decided to make it a product first can’t sleep soundly for the rest of their life. I hope every blog post with a vacuously related image with no actual meaning finds itself in bit rot with alacrity.
Decoding the Discourse
It’s helpful to know that the words used to describe “AI” systems are wildly inconsistent in how people use them. Here’s a bit of a glossary.
Agent:
- A separate instance of a model with a task assigned to it.
- A coding tool.
- A tool of some kind for a user to use but that can operate in the background in some way.
- A tool models can invoke.
- A service being marketed that uses AI internally.
- A tool that other agents can use.
Agentic: in some way related to AI.
Orchestration: Yo dawg, I heard you liked AI in your AI, so I put AI in your AI so your AI can AI while you AI your AI.
Vibe Coding:
- coding with LLMs.
- using LLMs to write code without looking at or evaluating the results.
A coda
In the time it took to write this over a week or more, Claude Opus 4.5 gave way to Claude Opus 4.6. GLM-4.7 was surpassed by GLM-5 just today as I write this bit, but z.ai is now overloaded trying to bring it online and has no spare computing power. All my tools have had major updates this week. The pace of change is truly staggering. This is not a particularly good thing.
I may edit this article over time. No reason we can’t edit blog posts, you know. Information keeps changing with new data and context.
Now go out there and try your best to make the world better.