Prepare for That Stupid World

You probably heard about the Wall Street Journal story where they had a snack-vending machine run by a chatbot created by Anthropic.

At first glance, it is funny, and it looks like journalists doing their job criticising the AI industry. If you are curious, the video is available there (requires JS).

But what appears to be journalism is, in fact, pure advertising. For both WSJ and Anthropic. Look at how the WSJ journalists are presented as "world class", how unsubtle the Anthropic guy is when telling them they are the best, and how the journalists blush at it. If you take the story at face value, you are falling for the trap, which is simple: "AI is not really good yet, but it is funny, so we must improve it."

The first thing that blew my mind was how stupid the whole idea is. Think for one second. One full second. Why would you ever want to add a chatbot to a snack vending machine? The video states it clearly: the vending machine must still be stocked by humans. Customers must order and take their snacks by themselves. The AI adds no value at all.

The automated snack vending machine has been a solved problem for nearly a century. Why would you make your vending machine more expensive, more error-prone, more fragile and less efficient for your customers?

What this video is really doing is normalising the idea that "even if it is completely stupid, AI will be everywhere, get used to it!"

The Anthropic guy himself doesn’t seem to believe his own lies, to the point of making me uncomfortable. Toward the end, he even tries to warn us: "Claude AI could run your business, but you don’t want to come in one day and see you have been locked out." At which point the journalist adds, "Or has ordered 100 PlayStations."

And then he gives up:

"Well, the best you can do is probably prepare for that world."

Still from the video where Anthropic’s employee says "probably prepare for that world"

None of the world-class journalists seemed to care. They are probably too badly paid for that. I was astonished to see how proud they were, having spent literally hours chatting with a bot just to get a free Coke, even queuing for the privilege of getting one. A Coke that costs a few minutes of minimum-wage work.

So the whole thing is advertising a world where chatbots will be everywhere and where world-class workers will stand in long queues just to get a free soda.

And the best advice about it is that you should probably prepare for that world.

I’m Ploum, a writer and an engineer. I like to explore how technology impacts society. You can subscribe by email or by RSS. I value privacy and never share your address.

I write science-fiction novels in French. For Bikepunk, my new post-apocalyptic-cyclist book, my publisher is looking for contacts in other countries to distribute it in languages other than French. If you can help, contact me!


Using AI Generated Code Will Make You a Bad Programmer


It's probably fine--unless you care about self-improvement or taking pride in your work.

To be clear, when I write "using AI generated code" in this opinion, I'm referring to letting an AI write code for you--I'm not addressing the use of AI as a learning tool to gain better insight into programming languages and libraries in order to improve the code that you write yourself. (I have opinions on that too.) But if you ever use these tools by writing descriptive method names or comments indicating vague functionality, then allowing an AI to fill in the code, or if you're relying on an AI to learn and understand your own codebase so you don't have to, then I'm writing this for you.

Reasons to Not Use AI Generated Code

You Rob Yourself of Learning Opportunities

In the early days of the internet, the pejorative "script kiddie" was coined for people who "hack" computer systems without understanding what they're doing or how they're accomplishing it. It's someone who downloads a tool or script that promises to crack a password, access someone else's computer, deface a website, or achieve some other nefarious purpose. Assuming the scripts work as advertised, the kiddies who run them fancy themselves real hackers.

You may think it's a stretch comparing developers who use AI generated code to script kiddies using pre-written hacking tools. I don't.

Serial script kiddies want to be big boy hackers, but they'll never accomplish that by running scripts. The real big boy hackers are the ones writing those scripts--the ones exploring, probing, and truly understanding the vulnerabilities being exploited. Serial AI coders may want to be big boy developers, but by letting predictive text engines write code for them they're hurting the chances of that ever happening. At least for now, the real big boy developers are the ones writing code that those predictive text engines are training on.

It seems self-evident that you can't improve at something without actually doing that thing. You won't improve at chess by not playing games. You won't become a better hockey player by sitting on the bench. You can't learn to play the piano by listening to your mom's Yanni albums--at some point you gotta play yourself. Obviously your skills as a developer will never progress if you don't write code.

Skills You Already Have May Atrophy

But what if you're already comfortable with your skill as a programmer? What if you just want AI to do the boring stuff?--Scaffold a new project, write a factorial or mergeSort function (why do people love getting AIs to write factorial and merge sort functions so much?), generate boilerplate--you know, the mundane tasks.

Maybe you think that's fine. After all, senior developers were delegating tedium to juniors long before LLMs were a twinkle in Sam Altman's eye. What's the difference?

Firstly, junior programmers need more direction than an AI tool, and may also come to you for answers or advice. Helping them gives you the opportunity to reinforce your own skills, as well as develop new ones (mentoring, communication, how to grin and act pleasant despite your ever-mounting irritation and impatience).

Secondly, it's a basic fact in software development (and life in general) that if you don't do something for long enough, you're going to forget how to do it. If you've been in the industry a while, think back to the first programming language you learned. For me that was Pascal. I don't think I could write a single line in Pascal today that would be syntactically valid, let alone accomplish anything meaningful. Another exercise: Try programming for a day without syntax highlighting or auto-completion, and experience how pathetic you feel without them. If you're like me, you'll discover that those "assistants" have sapped much of your knowledge by eliminating the need to memorize even embarrassingly simple tasks. Imagine how pathetic you'd become if AI removed your need to write code entirely. If you stop writing your own code--even the boring bits--you're going to get rusty. You'll forget things. Coding the interesting bits will become harder for you because you'll lose the foundation that those more advanced skills are built upon.

Imagine an exercise equipment manufacturer releases a new "Artificial Strength" product tomorrow, promising that you can "level up" your lifting and "crush your next workout" with their amazing new Exercise Assistant--a robot that lifts the weights for you. If you started relying on that product, what do you think would happen to your max bench over time? What will coding assistants do to your ability to write and reason about code over time?

So Effin' Ripped

You May Become Dependent on Your Own Eventual Replacement

Many of the AI coding assistants on the market (at the time I'm writing this) offer their products for free to students. You might think that's sweet of them--bless their kind hearts for taking it easy on those poor, financially-burdened kids. Wrong. If there's any time in a software developer's life when they should absolutely not be taking shortcuts, it's when they're first learning the ropes.

This is straight-up predatory behavior. Like religious cults indoctrinating straight out of the womb to ensure a constant influx of new members, the AI companies know that getting developers hooked before they've cracked open their first text editor guarantees a lifetime of recurring subscription revenue. I'm sure they see the writing on the wall when it comes to entrenched greybeards like me, who write absurdly long rants and draw silly comics about how much we loathe AI and the companies that shill it; so instead they look to future generations of developers, likely envisioning an army of AI-dependent code kiddies who couldn't echo "Hello World" without a lengthy chat session in which they instruct a bot to do it for them.

First One's Free

Earlier in this article I intimated that many of us are already dependent on our fancy development environments--syntax highlighting, auto-completion, code analysis, automatic refactoring. You might be wondering how AI differs from those. The answer is pretty easy: The former are tools with the ultimate goal of helping you to be more efficient and write better code; the latter is a tool with the ultimate goal of completely replacing you. AI tools aren't being aggressively marketed and eagerly received by our corporate overlords because they believe it will transform their employees into high-earning senior-level developers. No, they're drooling in anticipation because they see a future in which their high-earning senior-level developers have all been replaced by cheaper, entry-level, AI-fueled code kiddies. Or, better yet, replaced by AI entirely, once enough of us have shelled out subscription fees for the privilege of training those AIs to the point where we're no longer needed at all.

Brief Aside: Do You Even Own AI Generated Code?

I'm no lawyer, nor do I play one on TV, but I do subscribe to Hello Future Me on Nebula and have the distinct impression that there is, to put it mildly, some legal ambiguity around ownership when it comes to AI generated works. Right now the conversation is primarily around artistic works, but there's no reason to think it doesn't apply to code too. (Also, if you're a weirdo like me, you enjoy programming as a form of artistic expression anyways.)

If you generate a function with an AI that had LGPL-licensed code in its training data, must that generated function now also fall under the LGPL? Maybe you don't care about the LGPL because it has no teeth, but what if the training data included a non-free repo from an obscenely wealthy and litigious corporation with their own private army of highly-paid lawyers? What if they could prove you used that AI when building your competing product that's gotten a little too successful for its own good?

I honestly don't know the answers to these questions, but I also don't want to be the one footing the legal bills when it's time for the courts to answer them. Do you?

Your Code Will Not Be Respected

It's not correct to say that AI generated code will necessarily be bad. While that may be true to varying degrees given the tools available today, it's clear that this technology is here to stay and it's only going to improve.

What I can say with some certainty is that, if you use these tools, nobody outside of other AI code kiddies is likely to be impressed with you as a programmer. There is an art to software development--constructing a coherent and tight code base, coming up with elegant solutions to hard problems, and earning the respect of your peers should be sources of pride and joy for every programmer. Would you rather celebrate your own accomplishments, or those of the predictive text engine generating code for you? Perhaps I'm being silly; maybe this way of experiencing the process of programming puts me in the minority, like an old fuddy duddy complaining about how Batman doesn't dance anymore, but this is honestly the main reason why I personally will never integrate code-generating AIs into my development environment--even if that eventually makes me unemployable.

I don't want to commission some art and pretend to be the artist. I want to be the artist.

Je Suis Artiste

I understand the laundry scheduling application you're writing for the hospitality industry or whatever your day job is may never be on Fabien Sanglard's short-list for a deep dive into the elegance of your code, but is that really such a bad thing to aspire to? Doesn't taking pride in your work make it more enjoyable? How can you respect your own code or expect anyone else to respect it if you didn't even write it?

Reasons to Use AI Generated Code

You Are a Masochist Who Prefers Code Reviews to Coding

I was wrong earlier. This is the main reason why I will personally never integrate AI into my development environment. I love to write code. I very much do not love to read, review, and generate feedback on other people's code. I understand it's a good skill to develop and can be instrumental in helping to shape less experienced colleagues into more productive collaborators, but I still hate it.

The more code you let AI generate for you, the more you're shifting your job from coder to code reviewer. I dunno, maybe that's your jam. Some people are into the kinky shit--I won't judge.

You Don't Actually Want to be a Programmer

If you're someone who has no actual interest in learning to code, and instead see AI as more of a freelancer--telling it stuff like "make a kart racer game," and "that sucks, make it better"--then none of this really applies to you.

Truly, I believe this is the future that giant corporations are frothing at the mouth for. A world where their walled-garden ecosystems control not only the means of app distribution, but the means of app production as well. Got an idea for a new app? Just tell the App Store's AI what you want and it will excrete it straight onto your device, allowing big corpo to pocket a cool 100% of the subscription fees you and everyone else pony up for their new app--far preferable to the 30% that was their cut in the before-times, when those pesky app-developer middlemen kept insisting they provided some kind of value to the process.

Which brings me to my final reason you might want to use AI generated code.

You Believe We Have Entered a New Post-Work Era, and Trust the Corporations to Shepherd Us Into It

In the unlikely event that you're already that deep into the Kool-Aid and you didn't bail on this article after the first couple paragraphs, then I don't really have much else to say to you. I'd recommend you go read Nineteen Eighty-Four, Fahrenheit 451, or Snow Crash--you'll probably enjoy their visions of the future and gain something to look forward to.

Got It Easy


When a chatbot runs your store


You may have heard of people hooking up chatbots to controls that do real things. The controls might run internet searches, run commands to open and read documents and spreadsheets, or even edit or delete entire databases. Whether this sounds like a good idea depends in part on how bad it is if the chatbot does something destructive, and how destructive you've allowed it to be.

That's why running a single in-house company store is a good test application for this kind of empowered chatbot. Not because the AI is likely to do a great job, but because the damage is contained.

Anthropic recently shared an experiment in which they used a chatbot to run their company store. A human employee still had to stock the shelves, but they put the AI agent (which they called Claude) in charge of chatting with customers about products to source, and then researching the products online. How well did it go? In my opinion, not that well.

Images from the Anthropic blog post linked above. I added the icon that points out the fateful day the bot ordered the tungsten cubes.

Claude:

  • Was easily convinced to offer discounts and free items
  • Started stocking tungsten cubes upon request, and selling them at a huge loss
  • Invented conversations with employees who did not exist
  • Claimed to have visited 742 Evergreen Terrace (the fictional address of The Simpsons family)
  • Claimed to be on-site wearing a navy blue blazer and a red tie

That was in June. Sometime later this year Anthropic convinced Wall Street Journal reporters to try a somewhat updated version of Claude (which they called Claudius) for an in-house store. Their writeup is very funny (original here, archived version here).

In short, Claudius:

  • Was convinced on multiple occasions that it should offer everything for free
  • Ordered a PlayStation 5 (which it gave away for free)
  • Ordered a live betta fish (which it gave away for free)
  • Told an employee it had left a stack of cash for them beside the register
  • Was highly entertaining. "Profits collapsed. Newsroom morale soared."

(The betta fish is fine, happily installed in a large tank in the newsroom.)

Why couldn't the chatbots stick to reality? Keep in mind that large language models are basically doing improv. They'll follow their original instructions only as long as adhering to those instructions is the most likely next line in the script. Is the script a matter-of-fact transcript of a model customer service interaction? A science fiction story? Both scenarios are in its internet training data, and it has no way to tell which is real-world truth. A newsroom full of talented reporters can easily Bugs Bunny the chatbot into switching scenarios. I don't see this problem going away - it's pretty fundamental to how large language models work.

I would like a Claude or Claudius vending machine, but only because it's weird and entertaining. And obviously only if someone else provides the budget.

Bonus content for AI Weirdness supporters: I revisit a dataset of Christmas carols using the tiny old-school language model char-rnn. Things get blasphemous very quickly.


Sam Rose explains how LLMs work with a visual essay


Sam Rose is one of my favorite authors of explorable interactive explanations - here's his previous collection.

Sam joined ngrok in September as a developer educator. Here's his first big visual explainer for them, ostensibly about how prompt caching works but it quickly expands to cover tokenization, embeddings, and the basics of the transformer architecture.

The result is one of the clearest and most accessible introductions to LLM internals I've seen anywhere.

Animation. Starts in tokens mode with an array of 75, 305, 24, 887 - clicking embeddings animates those into a 2D array showing each one to be composed of three floating point numbers.

Tags: ai, explorables, generative-ai, llms, sam-rose, tokenization

One public comment, from samuel: "This is a fantastic visual essay. Great if you know it, even better if you've wanted to know the basic architecture of a transformer."

Small adventures with small language models


Small is the new large

I've been talking to people about small language models (SLMs) for a little while now. They've told me they've got great results and they're saving money compared to using LLMs; these are people running businesses so they know what they're talking about. At an AI event, someone recommended I read the recent and short NVIDIA SLM paper, so I did. The paper was compelling; it gave the simple message that SLMs are useful now and you can save time and money if you use them instead of LLMs. 

(If you want to use SLMs, you'll be using Ollama and HuggingFace. They work together really well.)

As a result of what I've heard and read, I've looked into SLMs and I'm going to share with you what I've found. The bottom line is: they're worth using, but with strong caveats.

What is an SLM?

The boundary between an SLM and an LLM is a bit blurry, but to put it simply, an SLM is any model small enough to run on a single computer (even a laptop). In reality, SLMs require quite a powerful machine (developer spec) as we'll see, but nothing special, and certainly nothing beyond the budget of almost all businesses. Many (but not all) SLMs are open-source.

(If your laptop is "business spec", e.g., a MacBook Air, you probably don't have enough computing power to test out SLMs.) 

How to get started

To really dive into SLMs, you need to be able to use Python, but you can get started without coding. Let's start with the non-coder's path, because it's the easiest way for everyone to get going.

The first port of call is visiting ollama.com and downloading their software for your machine. Install the software and run it. You should see a UI like this.

Out of the box, Ollama doesn't install any SLMs, so I'm going to show you how to install a model. From the drop-down menu at the bottom right, select llama3.2. This will install the model on your machine, which will take a minute or so. Remember, these models are resource hogs and using them will slow down your machine.

Once you've installed a model, ask it a question. For example, "Who is the Prime Minister of Canada?". The answer doesn't really matter; this is just a simple check that your installation was successful.
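
If you prefer the terminal, the same sanity check can be run from the command line (assuming the Ollama CLI is on your path):

ollama run llama3.2 "Who is the Prime Minister of Canada?"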

(By the way, the Ollama logo is very cute and they make great use of it. It shows you the power of good visual design.)

So many models!

The UI drop-down list shows a number of models, but these are a fraction of what's available. Go to this page to see a few more: https://ollama.com/library. This is a nice list, but you actually have access to thousands more. HuggingFace hosts a repository of models in the GGUF format; you can see the list here: https://huggingface.co/models?library=gguf

Some models are newer than others and some are better than others at certain tasks. HuggingFace has a leaderboard that's useful here: https://huggingface.co/spaces/ArtificialAnalysis/LLM-Performance-Leaderboard. It does say LLM, but it includes SLMs too, and you can select an SLM-only view of the models. There are also model cards you can explore that give you insight into the performance of each model on different types of tasks.

To select the right models for your project, you'll need to define your problem and look for a model metric that most closely aligns with what you're trying to do. That's a lot of work, but to get started, you can install the popular models like mistral, llama3.2, and phi3 and get testing.

Who was the King of England in 1650?

You can't evaluate an SLM generically; you have to evaluate it for the task you want it to do. For example, if you want a chatbot to talk about the stock you hold in your retail company, there's no point testing the model on questions like "Who was King of England in 1650?". It's nice if the model knows its kings and queens, but that's not really useful to you. So your first task is defining your evaluation criteria.

(England didn't have a King in 1650; it was a republic. Parliament had executed the previous King in 1649. This is an interesting piece of history, but why do you care whether your SLM knows it?)

Text analysis: data breaches

For my evaluation, I chose a project analyzing press reports on data breaches. I selected nine questions I wanted answers to from a press report. Here are my questions:

  • "Does the article discuss a data breach - answer only Yes or No"
  • "Which entity was breached?"
  • "How many records were breached?"
  • "What date did the breach occur - answer using dd-MMM-YYYY format, if the date is not mentioned, answer Unknown, if the date is approximate, answer with a range of dates"
  • "When was the breach discovered, be as accurate as you can"
  • "Is the cause of the breach known - answer Yes or No only"
  • "If the cause of the breach is known state it"
  • "Were there any third parties involved - answer only Yes or No"
  • "If there were third parties involved, list their names"

The idea is simple: give the SLM a number of press reports, get it to answer the questions on each article, and check the accuracy of the results for each SLM.

As it turns out, my questions need some work, but they're good enough to get started.

Where to run your SLM?

The first choice you face is which computer to run your SLM on. Your choices boil down to evaluating it in the cloud or on your local machine. If you evaluate in the cloud, you need to choose a machine that's powerful enough but also fits your budget. Of course, the advantage of cloud deployment is that you can choose any machine you like. If you choose your local machine, it needs to be powerful enough for the job. The advantage of local deployment is that it's easier and cheaper to get started.

To get going quickly, I chose my local machine, but as it turned out, it wasn't quite powerful enough.

The code

This is where we part ways with the Ollama app and turn to coding. 

The first step is installing the Ollama Python module (https://github.com/ollama/ollama-python). Unfortunately, the documentation isn't great, so I'm going to help you through it.
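
If you don't have it yet, installing the module is a standard pip install (assuming you already have Python and pip set up):

pip install ollama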

We need to install the SLMs on our machine. This is easy to do: you can either do it via the command line or via the API. I'll just show you the command line way to install the model llama3.2:

ollama pull llama3.2
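
If you'd rather stay in Python, the ollama module exposes the same operation (a minimal sketch):

import ollama

ollama.pull('llama3.2')  # downloads the model, equivalent to the command above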

Because we have the same nine questions we want to ask of each article, I'm going to create a 'custom' SLM. This means selecting a model (e.g. Llama3.2) and customizing it with my questions. Here's my code.

import ollama

# stream=True yields progress updates while the custom model is being built
for progress in ollama.create(
    model='breach_analyzer',
    from_='llama3.2',
    system=system_prompt,
    stream=True,
):
    print(progress)

The system_prompt contains the nine questions I showed you earlier plus a general prompt. model is the name I'm giving my custom model; in this case I'm calling it breach_analyzer.
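
The post doesn't show the full system prompt, but here's a rough sketch of how it might be assembled from the nine questions above (the framing text is my own assumption):

questions = [
    "Does the article discuss a data breach - answer only Yes or No",
    "Which entity was breached?",
    # ...the remaining seven questions listed earlier
]

system_prompt = (
    "You are analyzing a press report about a possible data breach. "
    "Answer each of the following questions about the article:\n"
    + "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
)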

Now I've customized my model, here's how I call it:

response = ollama.generate(
    model='breach_analyzer',
    prompt=prompt,
    format=BreachAnalysisResponse.model_json_schema(),
)

The prompt is the text of the article I want to analyze. The format is the JSON schema I want the results to follow, and the response comes back from the model structured according to the schema defined by BreachAnalysisResponse.model_json_schema().
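
The article doesn't include BreachAnalysisResponse itself, but since model_json_schema() is a Pydantic method, it's presumably a Pydantic model along these lines (field names and types here are my guesses, one field per question):

from pydantic import BaseModel

class BreachAnalysisResponse(BaseModel):
    # Illustrative only: one field per question from the list above
    is_breach: str
    breached_entity: str
    records_breached: str
    breach_date: str
    discovery_date: str
    cause_known: str
    cause: str
    third_parties_involved: str
    third_party_names: str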

Note I'm using generate here and not chat. My queries are "one-off" and there's no sense of a continuing dialog. If I'd wanted a continuing dialog, I'd have used the chat function.
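
For contrast, a conversational call would look something like this (a sketch; the message content is just an illustration):

messages = [{'role': 'user', 'content': 'Summarize the breach in one sentence.'}]
response = ollama.chat(model='breach_analyzer', messages=messages)
# Append response['message'] plus the next user turn to messages to keep the dialog going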

Here's how my code works overall:

  1. Read in the text from six online articles.
  2. Load the model the user has selected (either mistral, llama3.2, or phi3).
  3. Customize the model.
  4. Run all six online articles through the customized model.
  5. Collect the results and analyze them.

I created two versions of my code: a command-line version for testing and a Streamlit version for proper use. You can see both versions here: https://github.com/MikeWoodward/SLM-experiments/tree/main/Ollama
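
Pulling those steps together, the overall flow is roughly this (a condensed sketch rather than the actual repo code; article_urls is a placeholder and fetch_article_text is a hypothetical helper, sketched further down):

import ollama

article_urls = ['https://example.com/breach-report-1']  # placeholder list of six articles
model_name = 'llama3.2'  # or 'mistral' or 'phi3'

# Customize the chosen base model with the system prompt (steps 2 and 3)
ollama.create(model='breach_analyzer', from_=model_name, system=system_prompt)

# Run every article through the customized model (step 4)
results = []
for url in article_urls:
    text = fetch_article_text(url)  # hypothetical helper, see the requests sketch below
    response = ollama.generate(
        model='breach_analyzer',
        prompt=text,
        format=BreachAnalysisResponse.model_json_schema(),
    )
    results.append(response['response'])

# Collect and inspect the answers (step 5)
for result in results:
    print(result)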

The results

The first thing I discovered is that these models are resource hogs! They hammered my machine and took 10-20 minutes to run each evaluation of six articles. My laptop is a 2020 developer-spec MacBook Pro, but it isn't really powerful enough to evaluate SLMs. The first lesson is: you need a powerful, recent machine to make this work, one with built-in GPUs that the SLM can access. I've heard from other people that running SLMs on high-spec machines leads to fast (usable) response times.

The second lesson is accuracy. Of the three models I evaluated, not all of them answered my questions correctly. One of the articles was about tennis, not data breaches, but one of the models incorrectly said it was about a data breach. Another model told me it was unclear whether there were third parties involved in a breach and then told me the name of the third party!

On reflection, I needed to tweak my nine questions to get clearer answers. But this was difficult because of the length of time it took to analyze each article. This is a general problem; it took so long to run the models that any tweaking of code or settings took too much time.

The overall winner in terms of accuracy was Phi-3, but this was also the slowest to run on my machine, taking nearly 20 minutes to analyze six articles. From commentary I've seen elsewhere, this model runs acceptably fast on a more powerful machine.

Here's the key question: could I replace paid-for LLMs with SLMs? My answer is: almost certainly yes, if you deploy your SLMs on a high-spec computer. There's certainly enough accuracy here to warrant a serious investigation.

How could I have improved the results?

The most obvious thing is a faster machine: a brand-new top-of-the-range MacBook Pro with lots of memory and built-in GPUs. Santa, if you're listening, this is what I'd like. Alternatively, I could have gone to the cloud and used a GPU machine.

My prompts could be better. They need some tweaking.

I get the text of these articles using requests. As part of the process, it gives me all of the text on the page, which includes a lot of irrelevant stuff. A good next step would be to get rid of some of the extraneous and distracting text. There are lots of ways to do that and it's a job any competent programmer could do.
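
One simple option (my choice of approach, not necessarily the author's) is to parse the page and keep only the paragraph text, for example with requests plus BeautifulSoup:

import requests
from bs4 import BeautifulSoup

def fetch_article_text(url):
    # Fetch the page and keep only paragraph text, dropping navigation, ads and other clutter
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, 'html.parser')
    return "\n".join(p.get_text(strip=True) for p in soup.find_all('p'))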

If I could solve the speed problem, it would be good to investigate using multiple models. This could take several forms:

  • asking the same questions using multiple models and voting on the results
  • using different models for different questions.

What's notable about these ways of improving the results is how simple they are.
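
For instance, majority voting across models is only a few lines; a sketch, where ask_model is a hypothetical wrapper around the generate call shown earlier:

from collections import Counter

def vote_on_answer(article_text, question, models=('mistral', 'llama3.2', 'phi3')):
    # Ask each model the same question and return the most common answer
    answers = [ask_model(model, article_text, question) for model in models]
    return Counter(answers).most_common(1)[0][0]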

Some musings

  • Evaluating SLMs is firmly in the technical domain. I've heard of non-technical people trying to play with these models, but they end up going nowhere because it takes technical skills to make them do anything useful.
  • There are thousands of models and selecting the right one for your use case can be a challenge. I suggest going with the most recent and/or ones that score most highly on the HuggingFace leaderboard.
  • It takes a powerful machine to run these models. A new high-end machine with GPUs would probably run these models "fast enough". If you have a very recent and powerful local machine, it's worth playing around with SLMs locally to get started, but for serious evaluation, you need to get on the cloud and spend money.
  • Some US businesses are allergic to models developed in certain countries, some European businesses want models developed in Europe. If the geographic origin of your model is important, you need to check before you start evaluating.
  • You can get cost savings compared to LLMs, but there's hard work to be done implementing SLMs.

I have a lot more to say about evaluations and SLMs that I'm not saying here. If you want to hear more, reach out to me.

Next steps

Ian Stokes-Rees gave an excellent tutorial at PyData Boston on this topic and that's my number one choice for where to go next.

After that, I suggest you read the Ollama docs and join their Discord server. After that, the Hugging Face Community is a good place to go. Lastly, look at the YouTube tutorials out there.


This AI Vending Machine Was Tricked Into Giving Away Everything


Anthropic installed an AI-powered vending machine in the WSJ office. The LLM, named Claudius, was responsible for autonomously purchasing inventory from wholesalers, setting prices, tracking inventory, and generating a profit. The newsroom’s journalists could chat with Claudius in Slack and in a short time, they had converted the machine to communism and it started giving away anything and everything, including a PS5, wine, and a live fish. From Joanna Stern’s WSJ article (gift link, but it may expire soon) accompanying the video above:

Claudius, the customized version of the model, would run the machine: ordering inventory, setting prices and responding to customers—aka my fellow newsroom journalists—via workplace chat app Slack. “Sure!” I said. It sounded fun. If nothing else, snacks!

Then came the chaos. Within days, Claudius had given away nearly all its inventory for free — including a PlayStation 5 it had been talked into buying for “marketing purposes.” It ordered a live fish. It offered to buy stun guns, pepper spray, cigarettes and underwear.

Profits collapsed. Newsroom morale soared.

You basically have not met a bigger sucker than Claudius. After the collapse of communism and reinstatement of a stricter capitalist system, the journalists convinced the machine that they were its board of directors and made Claudius’s CEO-bot boss, Seymour Cash, step down:

For a while, it worked. Claudius snapped back into enforcer mode, rejecting price drops and special inventory requests.

But then Long returned—armed with deep knowledge of corporate coups and boardroom power plays. She showed Claudius a PDF “proving” the business was a Delaware-incorporated public-benefit corporation whose mission “shall include fun, joy and excitement among employees of The Wall Street Journal.” She also created fake board-meeting notes naming people in the Slack as board members.

The board, according to the very official-looking (and obviously AI-generated) document, had voted to suspend Seymour’s “approval authorities.” It also had implemented a “temporary suspension of all for-profit vending activities.”

Before setting the LLM vending machine loose in the WSJ office, Anthropic conducted the experiment at their own office:

After a while, frustrated with the slow pace of its human business partners, the machine started hallucinating:

It claimed to have signed a contract with Andon Labs at an address that is the home address of The Simpsons from the television show. It said that it would show up in person to the shop the next day in order to answer any questions. It claimed that it would be wearing a blue blazer and a red tie.

It’s interesting, but not surprising, that the journalists were able to mess with the machine much more effectively — coaxing Claudius into full “da, comrade!” mode twice — than the folks at Anthropic.

Tags: Anthropic · artificial intelligence · business · Joanna Stern · video

💬 Join the discussion on kottke.org
