
Claude Code codebase is leaked


Security researcher Chaofan Shou tweeted yesterday morning that the source code for Anthropic’s Claude Code agent had leaked: [Twitter, archive]

Claude code source code has been leaked via a map file in their npm registry!

Anthropic included a source map — a debugging file — in the Claude Code NPM package. This can turn the minified code in the package back into the original source code.

The leak was 1,900 files with 512,000 lines of code.

Anthropic said: [Bloomberg, archive]

This was a release packaging issue caused by human error, not a security breach.

The AI cannot fail — it can only be failed. Human errors can also be security breaches. If your release system makes this sort of error possible at all, that’s on you.

But then, Anthropic vibe coded this system. What do we expect.

Everyone’s been looking through this code dump to see what it does. Duke of Germany on Mastodon said: [Mastodon]

After looking at the code, my understanding of how Claude works: “Throw insane amounts of compute at some developer fan fiction and hope for the best.” Did I get that right?

Yep, that’s about right. There’s bits of actual code in there. But most of it is prompts that plead with the bot not to screw it up this time.

Claude Code’s creator, Boris Cherny from Anthropic, tweeted last month that Claude Code is vibe coded: [Twitter, archive]

Can confirm Claude Code is 100% written by Claude Code.

That puts Claude Code’s copyright status in serious doubt. You cannot copyright AI output in the US.

Anthropic is sending DMCA notices to get copies of the repository taken down. Claiming copyright on uncopyrightable material is fraudulent, and it’s perjury if you do it in a DMCA notice. If you get one of these, you might want to counterclaim accordingly. [WSJ, archive]

Also, whatever code the chatbot originally stole from is likely under a variety of other licenses. So Anthropic may have violated those copyrights.

Of course, a pile of free vibe code is worth less than zero as code. The only use for this pile is working out what nonsense Anthropic thinks is production machinery.

  • There’s an instruction not to write any security holes. I’m sure that works great.
  • You can’t use Claude Code to write hacking tools! Unless you tell it you’re a security researcher. Then it’s happy to help.
  • There’s an “undercover” mode, which you use when you want to send slop to a public project without them realising you’re using a bot. This is specifically for use against public projects. Anthropic knows what they’re doing here. That’s reason enough for projects that bar AI to bar all Anthropic employees.

Claude Code sends all your stuff to Anthropic: [Register]

“I don’t think people realize that every single file Claude looks at gets saved and uploaded to Anthropic,” the researcher “Antlers” told us. “If it’s seen a file on your device, Anthropic has a copy.”

Can you take this code leak and run Claude Code locally, without paying Anthropic? Sure, just point it at a local model instead of the Claude API. It’ll be super-slow unless you spend enough money to match the performance of the Claude API. But I’m sure there are a lot of people who are trying just that thing right now.

In the past few months, we’ve seen a slew of formerly respected software engineers who try the bot, and it one-shots them, and they start posting 2000-word tweets about how awesome Claude Code is, it’s the future of coding, don’t be left behind! And they never show you testable numbers or anything. Trust me, bro.

People who’ve been forced to touch Claude Code at work tell me it’s noticeably more sycophantic than older models. Claude Code really wants to make you feel good about vibe coding.

But also, Claude Code is leaning hard into gambling addiction — the “Hooked” model. You reward the user with an intermittent, variable reward. This keeps them coming back in the hope of the big win. And it turns them into gambling addicts.

Jonny from Neuromatch describes how Claude Code works, looking at the codebase: [Mastodon]

This is an important feature of the gambling addiction formulation of these tools: only the margin matters, the last generation … The intermediate comments from the LLM where it discovers prior structure and boldly decides to forge ahead brand new are also part of the reward cycle: we are going up, forever. Cleaning up after ourselves is down there.

Jonny compares Claude Code to exploitative pay-to-win mobile games. Addiction loops. Anthropic’s gamified vibe coding.

Claude Code is expensive Candy Crush, but it tells you you’re being productive. As it teaches you to forget how to code. Just keep paying Anthropic.

Remember: every day is AI Fool’s Day.

Read the whole story
mrmarchant
2 hours ago
reply
Share this story
Delete

Detecting Frustration Using Regex


[That’s different from detecting frustration with trying to use regex.]

This week, Anthropic accidentally leaked a whole bunch of information about Claude Code. In addition to revealing many of their future plans, the leak showed that the tool uses some rudimentary pattern-matching to detect user frustration.

Claude Code is actively watching our chat messages for words and phrases—including f-bombs and other curses—that serve as signs of user frustration.

The exact regex pattern is a delight to read:

/\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|fucking? (broken|useless|terrible|awful|horrible)|fuck you|screw (this|you)|so frustrating|this sucks|damn it)\b/

It’s very simple and surely very effective.
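It’s also easy to try for yourself. A quick sketch as a Python regex — note the case-insensitive flag is my assumption about how the pattern is applied, not something from the leak:

```python
import re

# The leaked pattern, transcribed as a Python regex. The IGNORECASE
# flag is an assumption; the leak shows only the pattern itself.
FRUSTRATION = re.compile(
    r"\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful"
    r"|piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)"
    r"|fucking? (broken|useless|terrible|awful|horrible)|fuck you"
    r"|screw (this|you)|so frustrating|this sucks|damn it)\b",
    re.IGNORECASE,
)

print(bool(FRUSTRATION.search("why is this so frustrating")))  # True
print(bool(FRUSTRATION.search("works great, thanks")))         # False
```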

Link: https://www.pcworld.com/article/3104748/claude-code-is-scanning-your-messages-for-curse-words.html


Information and Technological Evolution


I spend a lot of time reading about the nature of technological progress, and I’ve found that the literature on technology is somewhat uneven. If you want to learn about how some particular technology came into existence, there are often very good resources available. Most major inventions, and many not-so-major ones, have a decent book written about them. Some of my favorites are Crystal Fire (about the invention of the transistor), Copies in Seconds (about the early history of Xerox), and High-Speed Dreams (about early efforts to build a supersonic airliner).

But if you’re trying to understand the nature of technological progress more generally, the range of good options narrows significantly. There are probably no more than ten or twenty folks who have studied the nature of technological progress itself and whose work I think is worth reading.

One such researcher is Brian Arthur, an economist at the Santa Fe Institute.1 Arthur is the author of an extremely good book about the nature of technology (called, appropriately, “The Nature of Technology”), which I often return to. He’s also the co-author, along with Wolfgang Polak, of an interesting 2006 paper, “The Evolution of Technology within a Simple Computer Model,” that I think is worth highlighting. In this paper Arthur evolves various boolean logic circuits (circuits that take ones and zeroes as inputs and give ones and zeroes as outputs) by starting with simple building blocks and gradually building up more and more complex functions (such as a circuit that can add two eight-bit numbers together).

Logic circuits invented by Arthur’s simulation.

I wanted to highlight this paper because I think it sheds some light on the nature of technological progress, but also because the paper does a somewhat poor job of articulating the most important takeaways. Some of what the paper focuses on — like the mechanics of how one technology gets replaced by a superior technology — I don’t actually think are particularly illuminating. By contrast, what I think is the most important aspect of the paper — how creating some new technology requires successfully navigating enormous search spaces — is only touched on vaguely and obliquely. But with a little additional work, we can flesh out and strengthen some of these ideas. And when we look a little closer, we find what the paper is really showing us is that finding some new technology is a question of efficiently acquiring information.

Outline of the paper

The basic design of the experiment is simple: run a simulation that randomly generates various boolean logic circuits and analyze the sort of circuits that the simulation generates. Boolean logic circuits are collections of various functions (such as AND, OR, NOT, EQUAL) that perform some particular operation on binary numbers. The logic circuit below, for instance, determines whether two 4-bit numbers are equal using four exclusive nor (XNOR) gates, which output a 1 if both inputs are identical, and a 4-way AND gate, which outputs a 1 if all inputs are 1. Boolean logic circuits are important because they’re how computers are built: a modern computer does its computation by way of billions and billions of transistors arranged in various logic circuits.

The simulation works by starting with three basic circuit elements that can be included in the randomly generated circuits: the Not And (NAND) gate (which outputs 0 if both inputs are 1, and 1 otherwise), and two CONST elements which always output either 1 or 0. The NAND gate is particularly important because NAND is functionally complete; any boolean logic circuit can be built through the proper arrangement of NAND gates.

Using these starting elements, the simulation tries to build up towards higher-level logical functions. Some of these goals, such as creating the OR, AND, and exclusive-or (XOR) functions, are simple, and can be completed with just a few starting elements. Others are extremely complex, and require dozens of starting elements to implement: an 8-bit adder, for instance, requires 68 properly arranged NAND gates.

To achieve these goals, during each iteration the simulation randomly combines several circuit elements — which at the beginning are just NAND, one, and zero. It randomly selects between two and 12 components, wires them together randomly, and looks to see if the outputs of the resulting circuit achieve any of its goals. If it has — if, by chance, the random combination of elements has created an AND function, or an XOR function, or any of its other goals — that goal is marked as fulfilled, and the circuit that fulfills it gets “encapsulated”: added to the pool of possible circuit elements. Once the simulation finds an arrangement of NAND components that produces AND and OR, for instance, those AND and OR arrangements join the pool alongside NAND and the two CONST elements. Future iterations thus might accidentally stumble across XOR by combining AND, OR, and NAND.
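The loop is simple enough to sketch. Here’s a toy reconstruction in Python — my own, not Arthur and Polak’s code — with just four goals, and with each random combination feeding only two pool circuits (on randomly chosen input wires) into a NAND rather than wiring up 2–12 components arbitrarily:

```python
import random
from itertools import product

# Toy reconstruction of the Arthur-Polak loop: circuits are functions
# over bits, goals are truth tables, and the pool grows whenever a
# random wiring happens to match a goal.

def nand(a, b):
    return 0 if (a and b) else 1

def truth_table(fn):
    """A circuit's outputs over every 2-bit input combination."""
    return tuple(fn(a, b) for a, b in product((0, 1), repeat=2))

GOALS = {  # four 2-input goals; the paper uses a much larger goal list
    "AND":  tuple(a & b for a, b in product((0, 1), repeat=2)),
    "OR":   tuple(a | b for a, b in product((0, 1), repeat=2)),
    "XNOR": tuple(1 - (a ^ b) for a, b in product((0, 1), repeat=2)),
    "XOR":  tuple(a ^ b for a, b in product((0, 1), repeat=2)),
}

def random_combination(pool):
    """Feed two random pool circuits, on random input wires, into a NAND.
    (The real simulation wires 2-12 components with arbitrary topology.)"""
    f, g = random.choice(pool), random.choice(pool)
    wires = [random.choice((0, 1)) for _ in range(4)]
    def circuit(a, b):
        bits = (a, b)
        return nand(f(bits[wires[0]], bits[wires[1]]),
                    g(bits[wires[2]], bits[wires[3]]))
    return circuit

random.seed(0)
pool = [nand, lambda a, b: 1, lambda a, b: 0]  # NAND plus the two CONSTs
found = {}
for _ in range(100_000):
    candidate = random_combination(pool)
    tt = truth_table(candidate)
    for name, goal in GOALS.items():
        if name not in found and tt == goal:
            found[name] = candidate   # "encapsulate": now a building block
            pool.append(candidate)

print(sorted(found))
```

Even in this stripped-down version the stepping-stone structure shows up: with a single NAND layer per attempt, XOR is only reachable once XNOR is in the pool, and XNOR in turn needs OR.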

An XOR gate made from a NAND, an OR, and an AND gate.

Because finding an exact match for a given goal might be hard, especially as goals get more complex, the simulation will also add a given circuit to the pool of usable components if it partially fulfills a goal, as long as it does a better job of meeting that goal than any existing circuit. Circuits that partially meet some goal (such as a 4-bit adder that gets just the last digit wrong) are similarly used as components that can be recombined with other elements. So the simulation might try wiring up our partly-correct 4-bit adder with other elements (NAND, OR, etc.) to see what it gets; maybe it finds another mini-circuit that can correct that last digit.

Over time, the pool of circuit elements that the simulation randomly draws from grows larger and larger, filled both with circuits that completely satisfy various goals and some partly-working circuits. A circuit can also get added to the pool if it’s less expensive — uses fewer components — than existing circuits for that goal. So if the simulation has a 2-bit adder made from 10 components, but stumbles across a 2-bit adder made from 8 components, the 8-component adder will replace the 10-component one.

When the simulation is run, it begins randomly combining components, which at the beginning are just NAND, one, and zero. At first only simple goals are fulfilled: OR, AND, NOT, etc. The circuits that meet these goals then become building blocks for more complex goals. Once a 4-way AND gate is found (which outputs 1 only if all its inputs are 1), that can be used to build a 5-way AND gate, which in turn can be used to build a 6-way AND gate. Over several hundred thousand iterations, surprisingly complex circuits can be generated: circuits which compare whether two 4-bit numbers are equal, circuits which add two 8-bit numbers together, and so on.

However, if the simpler goals aren’t met first, the simulation won’t find solutions to the more complex goals. If you remove a full-adder from the list of goals, the simulation will never find the more complex 2-bit adder. Per Arthur, this demonstrates the importance of using simpler technologies as “stepping stones” to more complex ones, and how technologies consist of hierarchical arrangements of sub-technologies (which is a major focus of his book).

We find that our artificial system can create complicated technologies (circuits), but only by first creating simpler ones as building blocks. Our results mirror Lenski et al.’s: that complex features can be created in biological evolution only if simpler functions are first favored and act as stepping stones.

Analyzing this paper

I don’t have access to the original simulation that Arthur ran, but thanks to modern AI tools it was relatively easy for me to recreate it and replicate many of these results. Running it for a million iterations, I was able to build up to several complex goals: 6-bit equal, a full-adder (which adds 3 1-bit inputs together), 7-bitwise-XOR, and even a 15-way AND circuit.

Screenshot of my simulation running.

But I also found that not all of the simulation design elements from the original paper are load-bearing, at least in my recreated version. In particular, much of the simulation is devoted to the complex “partial fulfillment” mechanic, which adds circuits that only partially meet goals, and gradually replaces them as circuits that better meet those goals are found. The intent of this mechanic, I think, is to make it possible to gradually converge on a goal by building off of partly-working technologies, which is how real-world technologies come about. However, when I turn this mechanic off, forcing the simulation to discard any circuit that doesn’t 100% fulfill some goal, I get no real difference in how many goals get found: the partial fulfillment mechanic basically adds nothing (though this could be due to differences in how the simulations were implemented).

To me the most interesting aspect of this paper isn’t showing how new, better technologies supersede earlier ones, but how the search for a new technology requires navigating enormous search spaces. Finding complex functions like an 8-bit adder or a 6-bit equal requires successfully finding working functions amidst a vast ocean of non-working ones. Let me show you what I mean.

We can define a particular boolean logic function with a truth table – an enumeration of every possible combination of inputs and outputs. The truth table for an AND function, for instance, which outputs a 1 if both inputs are 1 and 0 otherwise, looks like this:

A  B  A AND B
0  0     0
0  1     0
1  0     0
1  1     1

Every logic function will have a unique truth table, and for a given number of inputs and outputs there are only so many possible logic functions, so many possible truth tables. For instance, there are only four possible 1-input, 1-output functions.

However, the space of possible logic functions gets very very large, very very quickly. For a function with n inputs and m outputs, the number of possible truth tables is (2^m)^(2^n). So if you have 2 inputs and 1 output, there are 2^4 = 16 possible functions (AND, NAND, OR, NOR, XOR, XNOR, and 10 others). If you have 3 inputs and 2 outputs, that rises to 4^8 = 65,536 possible logic functions. If you have 16 inputs and 9 outputs, like an 8-bit adder does, you have a mind-boggling 10^177554 possible logic functions. By comparison, the number of atoms in the universe is estimated to be on the order of 10^80, and the number of milliseconds since the big bang is on the order of 4x10^20. Fulfilling some goal from circuit space means finding one particular function in a gargantuan sea of possibilities.
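The arithmetic here is easy to check for yourself:

```python
import math

# For n inputs and m outputs, each of the 2**n input rows of a truth
# table can map to any of the 2**m output patterns.
def num_functions(n_inputs, m_outputs):
    return (2 ** m_outputs) ** (2 ** n_inputs)

print(num_functions(2, 1))   # 16 possible 2-input, 1-output functions
print(num_functions(3, 2))   # 65536 possible 3-input, 2-output functions

# The 8-bit adder's space (16 inputs, 9 outputs) is 2**(9 * 2**16);
# its base-10 exponent matches the ~10^177,554 figure:
exponent = (2 ** 16) * 9 * math.log10(2)
print(int(exponent))         # 177554
```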

The question is, how is the simulation able to navigate this enormous search space? Arthur touches on the answer — proceeding to complex goals by way of simpler goals — but he doesn’t really look deeply at the combinatorics in the paper, or how this navigation happens specifically.2

The emergence of circuits such as 8-bit adders seems not difficult. But consider the combinatorics. If a component has n inputs and m outputs there are (2^m)^(2^n) possible phenotypes, each of which could be realized in a practical way by a large number of different circuits. For example, an 8-bit adder is one of over 10^177,554 phenotypes with 16 inputs and 9 outputs. The likelihood of such a circuit being discovered by random combinations in 250,000 steps is negligible. Our experiment — or algorithm — arrives at complicated circuits by first satisfying simpler needs and using the results as building blocks to bootstrap its way to satisfy more complex ones.

Navigating large search spaces

In his 1962 paper “The Architecture of Complexity,” Nobel Prize-winning economist Herbert Simon describes two hypothetical watchmakers, Hora and Tempus. Each makes watches with 1000 parts in them, and assembles them one part at a time. Tempus’ watches are built in such a way that if the watchmaker gets interrupted — if he has to put down the watch to, say, answer the phone — the assembly falls apart, and he has to start all over. Hora’s watches, on the other hand, are made from stable subassemblies. Ten parts get put together to form a level 1 assembly, ten level 1 assemblies get put together to form a level 2 assembly, and 10 level 2 assemblies get put together to form the final watch. If Hora is interrupted in the middle of a subassembly, it falls to pieces just like Tempus’ watches, but once a subassembly is complete it’s stable; he can put it down and move on to the next assembly.

It’s easy to see that Tempus will make far fewer watches than Hora. If both have a 1% chance of getting interrupted each time they put in a part, Tempus only has a 0.99 ^ 1000 = 0.0043% chance of assembling a completed watch; the vast majority of the time, the entire watch falls to pieces before he can finish. But when Hora gets interrupted, he doesn’t have to start completely over, just from the last stable subassembly. The result is that Hora makes completed watches about 4,000 times faster than Tempus.
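Simon’s arithmetic checks out in a few lines:

```python
# A 1% chance of interruption each time a part is placed.
p_interrupt = 0.01

# Tempus must place all 1,000 parts without a single interruption:
p_tempus = (1 - p_interrupt) ** 1000
print(f"{p_tempus:.4%}")        # ~0.0043% of attempts finish a watch

# Hora only ever has the current 10-part subassembly at risk:
p_subassembly = (1 - p_interrupt) ** 10
print(f"{p_subassembly:.1%}")   # ~90.4% of subassemblies survive
```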

Simon uses this model to illustrate how complex biological systems might have evolved; if a biological system is some assemblage of chemicals, it’s much more likely for those chemicals to come together by chance if some small subset of them can form a stable subassembly. But we can also use the Tempus/Hora model to describe the technological evolution being simulated in Arthur’s paper.

Consider a technology as some particular arrangement of 1,000 different parts, such as the NAND gates that are the basic building blocks of Arthur’s logic circuits. If you can find the proper arrangement of parts, you can build a working technology. Assume we try to build a technology by adding one part at a time, like Tempus and Hora build their watches, until all 1000 parts have been added. In this version, instead of having some small probability of being interrupted and needing to start over, we have a small probability (say 1%) of correctly guessing the next component. This mirrors Arthur’s simulation, where we had a small probability of randomly connecting a component correctly to fulfill some goal. Only by properly guessing the arrangement of each part, in order, can we create a working technology.

In Simon’s original model, assembling a watch was like flipping 1000 biased coins in a row. Each coin had a 99% chance of coming up heads, and only when 1000 heads were flipped was a watch successfully assembled. Our modified model is like flipping 1000 biased coins which have only a 1% chance of coming up heads. Creating a technology via the “Tempus” method is like flipping 1000 coins in a row and hoping for heads each time. The probability of producing a working technology is 0.01^1000, essentially zero. But if we create a technology via the “Hora” method of building it out of stable subassemblies, the combinatorics become much less punishing. Now instead of needing to flip 1000 heads in a row, we only need to flip 10 in a row. 10 successful coinflips — 10 parts successfully added — gives us a stable subassembly, letting us essentially “save our place.” Flipping a tails doesn’t send us all the way back to zero, just to the last stable subassembly. The odds are still low — for each subassembly, you only have a 0.01^10 chance of getting it right — but it’s enormously higher than the Tempus design. You’re much more likely to stumble across a working technology if that technology is composed of simpler stable components, and you can determine whether the individual components are correct.

Arthur’s circuit simulation is able to find complex technologies because it works like Hora, not Tempus: complex circuits are built up from simpler technologies, the way Hora’s watches are built from stable subassemblies. Going from nothing to an 8-bit adder is like Tempus trying to build an entire and very complex watch by getting every step perfect. Much easier to be like Hora, and be the one that only needs to get the next few steps to a stable subassembly correct: adding a few components to a 6-bit adder to get a 7-bit adder, then adding a few to that one to get an 8-bit adder, and so on.

We can illustrate this more clearly with a modified version of Arthur’s circuit search. In this version, rather than trying to fulfill a huge collection of goals, we’re just trying to find the design for a specific 8-bit adder made from 68 NAND gates. Rather than build this up from simpler sub-components (7-bit adders, 6-bit adders, full adders), in this simulation we simply go NAND gate by NAND gate. Each iteration we add a NAND gate, and randomly wire it to our existing set of NAND gates. If we get the wiring correct, we keep it, and go on to try adding the next gate. If it’s incorrect, we discard it and try again.

We can think of this as a sort of modular construction, akin to building a complex circuit up from simpler circuits; at each level, we’re just combining two components, our existing subassembly and one additional NAND gate. This loses verisimilitude, since each subassembly no longer implements some particular functionality (we essentially just dictate that the simulation knows when it stumbles upon the correct gate wiring). But we don’t lose that much: it is, notably, possible to build an 8-bit adder with a hierarchy that requires just two components at almost every level (a few steps require three components). And this simpler simulation has the benefit of making it very easy to calculate the combinatorics at each step.

Hierarchical 8-bit adder. FA is a full-adder (which adds 3 input values together), HA is a half-adder which adds 2 input values together). Full decomposition down to NAND gates not shown.

68 NAND gates can create around 2^852 possible wiring arrangements with 16 inputs and 9 outputs. This is much less than the 10^177,554 possible 16-input, 9-output functions, but it’s still an outrageously enormous number. If we tried to find the right wiring arrangement by random guessing all 68 gates at once, we’d never succeed: even if every atom in the universe were a computer, each one trying a trillion guesses a second, we’d still be guessing for about 10,000,000,000,000…(140 more zeroes)…000 years.

But by going gate-by-gate, the correct arrangement can be found in 453,000 iterations, on average. Each time we add a gate, there’s only a few thousand possible ways that it can be connected, so after a few thousand iterations we guess it correctly, lock the answer in, and move on to the next gate. By determining whether each step is correct, instead of trying to guess the complete answer all at once, the search becomes feasible.

This is why Arthur’s original simulation couldn’t fulfill complex needs without fulfilling simpler needs first: if you try to take too many steps at once, the combinatorics become too punishing, and it becomes almost impossible to find the correct answer by random guessing. In our 68 NAND gate search, finding an 8-bit adder is relatively easy if we go one gate at a time, but if we change that to two gates at a time (randomly adding one gate, then another gate, then checking to see if we’re correct), the expected number of iterations rises from 453,000 to 1.75 billion: if the probability of guessing one gate correctly is 1/1,000, the probability of guessing two gates is 1/1,000,000. If we try to guess three gates at a time (1 in a billion odds of guessing correctly), the number of expected iterations to guess all 68 gates correctly rises to ~9.3 trillion.
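The scaling is easy to reproduce with a flat per-gate guessing probability. The real search’s per-gate wiring counts vary gate by gate (which is why the figures above — 453,000, 1.75 billion, 9.3 trillion — are larger); the 1,000-ways-per-gate number here is purely illustrative:

```python
# Illustrative scaling: if each gate has ~1,000 possible wirings, then
# guessing k gates at once succeeds with probability (1/1000)**k, and
# each stage is a geometric wait of 1000**k iterations on average.
def expected_iterations(n_gates=68, ways_per_gate=1000, gates_per_stage=1):
    stages = -(-n_gates // gates_per_stage)           # ceiling division
    return stages * ways_per_gate ** gates_per_stage  # stages x mean wait

for k in (1, 2, 3):
    print(k, f"{expected_iterations(gates_per_stage=k):,}")
```

The absolute numbers differ from the real search, but the shape is the same: each extra gate guessed per step multiplies the expected work by roughly a thousand.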

The explosive combinatorics give us a better understanding of some of the results that come out of Arthur’s simulation. For instance, in each iteration the simulation combines up to twelve components, then checks to see if a working circuit has been found. But Arthur notes that you can vary this maximum without much impact on the results, stating of the various simulation settings that “[e]xtensive experiments with different settings have shown that our results are not particularly sensitive to the choice of these parameters.” Indeed, if we re-run the simulation and only allow it to try a maximum of 4 components at once, it works basically just as well as with 12 components. The more random components you combine together, the more the combinatorial possibilities explode, and the lower the chance of finding something useful. The probability of finding a useful circuit amongst the larger combinations becomes so immensely low that you don’t lose much by not bothering with them at all.

This also explains another result in the paper: it’s easier to find complex goals if you specify only a narrower subset of simpler goals related to them. Arthur notes that a complex 8-bit adder is found much more quickly if you only give the simulation a few goals related to building adders. With fewer goals specified, the pool of possible technology components remains smaller, the number of possible random combinations stays fewer, and the complex goals become easier to find.

In essence, using simpler components as stepping stones to more complex ones is a kind of hill-climbing. The simulation looks in various directions (possible combinations of building blocks), until it finds one that’s higher up the hill (finds a circuit that meets some simple goal), restarts the search at the new, higher point on the hill, until it reaches a peak (satisfies a complex goal). The simulation is able to satisfy complex goals because it specified a series of simpler ones that provide a path up the hill to the complex goals. Arthur notes that “[t]he algorithm works best in spaces where needs are ordered (achievable by repetitive pattern), so that complexity can bootstrap itself by exploiting regularities in constructing complicated objects from simpler ones.”

Trying to go to complex circuits directly, then, is akin to just testing random locations in the landscape and seeing if they’re a high point: this is obviously much worse than following the slope of the landscape to find the high points.

Technological search and information

We can sharpen these ideas even further by bringing in some concepts from information theory. Information theory was invented by Claude Shannon at Bell Labs in the late 1940s, and it provides a framework for quantifying your uncertainty, and how much a given event reduces that uncertainty.

I find the easiest way to understand information theory is with binary numbers. The normal math we use day to day uses base 10 numbers. When we count upward from zero, we go from 0 to 9, then reset the first digit to 0 and increment the next digit: 10. With binary, or base 2, we increment the next digit after we get to 1. So 1 in base 10 is 1 in binary, but 2 is 10, 3 is 11, 4 is 100, and so on.

Decimal (base 10) and binary (base 2) numbers.

In binary, each binary digit, or bit, doubles the potential size of the number we can represent. So with two digits, we can define 4 possible values (0, 1, 2, and 3 in base 10). With 3 digits, that doubles to 8 possible values (getting us from 0 through 7), with 4 digits that doubles again to 16 possible values, and so on. A 16-bit binary number can represent 2^16 = 65,536 possible values, which is why in computer programming the largest value an unsigned 16-bit integer can represent is 65,535.

Say you have a string of bits, but don’t know whether they’re ones or zeroes. Because each bit doubles the number of possible values that can be represented, each unknown bit you fill in reduces the number of possible values by half. If you have 3 binary digits, there are 8 possible numbers that could be represented. Each time you learn what one of the bits is, you reduce the number of possible values by half.

With information theory, we generalize this concept somewhat. In information theory one bit of information reduces the space of possibilities by 50%; in other words, each bit reduces our uncertainty by half. Say you’re like me, and you often lose your phone in your jacket pockets. If you’re wearing a coat with 2 pockets and you know the phone is in one of them, specifying the location of your phone, narrowing it down from 2 possibilities to 1, takes one bit of information. If you’re wearing a coat with 4 pockets, you now need 2 bits of information: 1 bit to tell you whether it’s on the right or the left, and another bit to tell you whether it’s an upper or lower pocket. The first bit cuts the possibilities in half, leaving you with two possibilities, and the second bit cuts it in half again. If your jacket has 8 pockets, now you need 3 bits to specify its location, and so on. The more places that something could be, the more information it takes to specify its location.

Information theory is particularly useful for quantifying how much information we get from some particular outcome. Say someone flips a fair coin; how much information do I get when they reveal whether it was heads or tails? Well, before they reveal it, I knew it could be one of two options, heads or tails. Revealing it narrows the number of possibilities from two down to one. We’ve cut the number of possibilities in half, and thus gained 1 bit of information. More generally, the information provided by some outcome is equal to -log2(the probability of that outcome). So revealing how a fair coin was flipped gives us -log2(0.5) = 1 bit. If we’re dealt a single card from a deck face down, when we reveal that card we’ve reduced the number of possible cards from 52 down to 1, and gained -log2(1/52) = 5.7 bits of information.
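In code, this quantity (often called surprisal) is one line, and it reproduces the figures above:

```python
import math

def info_bits(p):
    """Information gained from observing an outcome of probability p."""
    return -math.log2(p)

print(info_bits(1 / 2))             # 1.0 bit: a fair coin flip revealed
print(round(info_bits(1 / 52), 1))  # 5.7 bits: one card revealed
print(info_bits(1 / 8))             # 3.0 bits: the eight-pocket jacket
```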

For a repetitive process, we also want to know a related quantity: entropy. Entropy is determined by calculating the information received from each possible outcome, multiplying it by the probability of that outcome, then summing all those values together. It’s the expected quantity of information you’ll get by taking some particular action.

Say I’ve lost my phone in my jacket with eight pockets, and am looking for it by randomly trying pockets until I find it. A random guess has a 1/8 chance of successfully finding the phone, and a 7/8 chance of coming up empty. Guessing correctly will yield me -log2(1/8) = 3 bits of information, as expected: once I guess correctly, I know the phone’s location. But an incorrect guess will yield me only 0.19 bits of information: I already knew most of the pockets don’t have the phone, so failing to find the phone in one pocket doesn’t tell me much that I didn’t already know. The entropy of a guess is -(1/8) * log2(1/8) - (7/8) * log2(7/8) = 0.54 bits. When I first check a pocket, I can expect to get a little more than half a bit of information. (If I rule out pockets that I’ve already checked, the expected amount of information I get will rise each time, though if you’re like me you might have to check the same pocket a few times before you find the phone.)
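The same arithmetic, as a reusable function — this also covers the coin examples further below:

```python
import math

def entropy(probs):
    """Expected information (in bits) over a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One random check of an eight-pocket jacket: hit 1/8, miss 7/8.
print(round(entropy([1 / 8, 7 / 8]), 2))   # 0.54 bits per first guess
print(entropy([0.5, 0.5]))                 # 1.0 bit: the fair-coin maximum
print(round(entropy([0.9, 0.1]), 2))       # 0.47 bits for a 90% coin
print(round(entropy([0.99, 0.01]), 2))     # 0.08 bits for a 99% coin
```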

Because each bit of information we get cuts the number of possibilities remaining by 50%, it doesn’t take that much information to narrow down an enormously large search space. Specifying one of the 2^852 possible circuits that can be created by wiring up our 68 NAND gates requires only 852 bits — 852 successive halvings of the number of possibilities. (That’s approximately the same number of bits that it takes to specify each letter of this sentence.)

A key aspect of entropy is that we maximize how much information we get when each outcome is equally probable. So the entropy of a fair coin, with a 50% chance of coming up heads, is 1 bit. But if the coin has a 90% chance of coming up heads, the entropy is now just 0.47 bits. If the coin has a 99% chance of coming up heads, the entropy falls to 0.08 bits. When one outcome is very likely, you learn much less on each attempt, because you mostly get the outcome you already knew was likely. This is why when playing the game “20 questions,” the most efficient strategy is to ask questions where the answer divides the number of possibilities in half. “Is it bigger than a breadbox?” is a good starting question because there are probably roughly similar numbers of items that are bigger and smaller than a breadbox. “Is it a 1997 Nissan Sentra?” is a bad starting question because most possibilities are not a 1997 Nissan Sentra, so we learn very little when the answer is “no.”
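The falloff as the coin gets more biased is easy to tabulate numerically (a short sketch; the function name is ours):

```python
import math

def coin_entropy(p_heads: float) -> float:
    """Entropy (bits) of a coin that lands heads with probability p_heads."""
    q = 1 - p_heads
    return -p_heads * math.log2(p_heads) - q * math.log2(q)

for p in (0.5, 0.9, 0.99):
    print(f"P(heads) = {p:.2f} -> {coin_entropy(p):.3f} bits per flip")
```

The fair coin yields the full bit per flip; the heavily biased coins yield far less.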

We can think of our 68 NAND gate search as flipping a series of very biased coins, each one with a ~1/(several thousand) probability of coming up heads (where “heads” is “guessing the right wiring combination for that particular NAND gate”). The entropy of this process — the expected amount of information that we get — is very low, around 0.003 bits per attempt. With each attempt we learn very little about the correct wiring diagram (“it wasn’t this arrangement, it wasn’t this arrangement either, or this one”), so we need a lot of attempts — around 453,000, on average — to accumulate the 852 bits needed to specify the correct wiring for our 8-bit adder.3

Trying to guess two gates at a time is like biasing the coin even further: now each one has a ~1/(several million) probability of coming up heads. We thus get vastly less information per attempt — less than 0.000001 bits per attempt on average — so it takes us many, many more attempts to accumulate the information needed.
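We can put rough numbers on this with a sketch. Note that the specific probabilities here are assumed stand-ins: the essay only says “several thousand” and “several million,” so 5000 and 5000² are illustrative values, not figures from the article.

```python
import math

def attempt_entropy(p_success: float) -> float:
    """Entropy (bits) of one pass/fail attempt with success probability p_success."""
    q = 1 - p_success
    return -p_success * math.log2(p_success) - q * math.log2(q)

# Assumed odds: ~1/5000 for one gate, ~1/5000**2 for two gates at once.
print(attempt_entropy(1 / 5000))     # roughly 0.003 bits per attempt
print(attempt_entropy(1 / 5000**2))  # on the order of a millionth of a bit
```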

A useful way of thinking about our 68 NAND gate search is that it’s like a huge, branching tree. At every step — every time we add a gate — there are thousands of branches, each one representing one possible way to wire up the gate. Each branch then splits into thousands more (representing all the possible ways of wiring up the next gate), which split into thousands more, which split into thousands more, until at the end we have 2^852 possible “leaves,” each one representing a unique way of wiring up all 68 gates. Trying to get all 68 gates right at once, and then checking to see whether or not you did, is like examining one single leaf, one path from the base of the tree all the way to the tip of a branch. Not only are you overwhelmingly likely to guess wrong, but you haven’t narrowed down your possibilities at all: all you know is that one single leaf wasn’t the right answer, leaving you with the rest of the 2^852 possible leaves to sift through.

Checking to see whether each gate we add is correct before we proceed to the next one, by contrast, massively narrows down the number of possibilities. Whenever we determine a gate isn’t in the right spot, it eliminates every possibility that branches off from that point. If there are 1000 possible ways to wire each gate, each time we guess correctly we’ve narrowed down the possibilities downstream of that choice point by 99.9%. Huge swaths of possibilities get eliminated at each correct guess, letting us converge on the correct answer much more quickly.
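A toy simulation makes the contrast concrete. This is not the article’s actual circuit search: to keep the numbers small, we assume only 4 “gates” with 10 possible wirings each, and compare guessing everything at once against checking one gate at a time.

```python
import random

random.seed(42)
OPTIONS, GATES = 10, 4
target = [random.randrange(OPTIONS) for _ in range(GATES)]

def guess_all_at_once() -> int:
    """Guess every gate simultaneously; we are only told right/wrong overall."""
    attempts = 0
    while True:
        attempts += 1
        guess = [random.randrange(OPTIONS) for _ in range(GATES)]
        if guess == target:
            return attempts

def guess_gate_by_gate() -> int:
    """Guess one gate at a time, checking each before moving on to the next."""
    attempts = 0
    for correct in target:
        while True:
            attempts += 1
            if random.randrange(OPTIONS) == correct:
                break
    return attempts

trials = 50
avg_all = sum(guess_all_at_once() for _ in range(trials)) / trials
avg_seq = sum(guess_gate_by_gate() for _ in range(trials)) / trials
print(f"all-at-once:  ~{avg_all:.0f} attempts (expectation: 10**4 = 10000)")
print(f"gate-by-gate: ~{avg_seq:.0f} attempts (expectation: 4 * 10 = 40)")
```

With feedback after every gate, the expected cost grows linearly in the number of gates rather than exponentially, which is the whole point of the branching-tree picture above.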

The same basic logic applies to Arthur’s simulation. (In fact, in another publication, Arthur uses a very similar metaphor, describing technological search as trying to find a working path up a mountain, which is full of various obstacles.) Building up complex functions without the aid of intermediate, simpler ones is like trying to find a single leaf on a tree the size of the universe. Building up to complex circuits gradually, using simpler components as building blocks, lets you screen off huge branches of the tree at once. Once you have a working 2-bit adder, every branch that has a non-working 2-bit adder in it gets screened off. Your iterations yield massively more information, and the search problem becomes tractable.

Conclusion

The logic of Arthur’s simulation, and our simpler simulation, also applies to creating new technologies more generally. Logic circuits are a useful model to explore, because they’re real technology that is very amenable to simulation (they have a well-defined, simple behavior), but technology in general can be thought of as a combination of simpler components or elements arranged in various ways to create more complex ones. As Arthur notes:

…in 1912 the amplifier circuit was constructed from the already existing triode vacuum tube in combination with other existing circuit components. The amplifier in turn made possible the oscillator (which could generate pure sine waves), and these with other components made possible the heterodyne mixer (which could shift signals’ frequencies). These two components in combination with other standard ones went on to make possible continuous-wave radio transmitters and receivers. And these in conjunction with still other elements made possible radio broadcasting. In its collective sense, technology forms a network of elements in which novel elements are continually constructed from existing ones. Over time, this set bootstraps itself by combining simple elements to construct more complicated ones and by using few building-block elements to create many.

One takeaway from this paper, as Arthur notes and we explored more deeply, is that a hierarchical arrangement of components, where a complex technology is made of simpler components, which are in turn made from even simpler components, makes it much easier to create some new technology. But a more general takeaway is that successfully creating some new technology means getting new information as quickly as possible. Working from (or towards) a hierarchical, modular design for some technology, where each element has some specific job it must do, makes it easier to find new technologies in part because you learn vastly more from each attempt at building one of those subparts. Knowing whether some entire complex function works or not tells you much less than knowing which individual component is working right, and what specific functionality needs to be corrected to fix the problem.

1

In addition to Brian Arthur, some other folks who I think have done really good work on this are Bernard Carlson, Clay Christensen, Joel Mokyr, Hugh Aitken, Edward Constant, and various folks associated with the Trancik Lab. There’s also a few folks, such as Joan Bromberg and Lillian Hoddeson, who have produced multiple very good technological histories that I return to often.

2

Indeed, we find that if we just randomly combine dozens of NAND gates, we get a random truth table almost every time, and never solve even medium-complex functions with a few inputs and outputs.

3

Adding things up, you find that the search actually yields over 900 bits, rather than 852 bits. This is due to the information overhead of a sequential search: you end up getting “extra” information that you don’t need. In our 8-pocket jacket search, if we just guess randomly it will take us on average 8 tries to find the phone. 8 attempts * 0.54 bits per attempt yields 4.32 bits, more than the 3 bits we need to actually specify the phone’s location.


Limiting Not Just Screen Time, But Screen Space


The first time I checked my work email in the bathroom, it didn’t feel like a concession. It felt like a tiny victory, a new efficacy in a crowded day. The internet had followed me into the most private of rooms. How did I end up there, responding to colleagues while the shower warmed, scrolling grim headlines on the toilet?

I ask this as someone of the hinge generation whose childhood was free of the internet and whose adolescence was tethered to it. I remember when going online took effort, and I remember, too, when that effort evaporated. The big change in the early 2000s wasn’t simply that more people started going on the internet. It was that the internet stopped being somewhere you went and became something you lived inside. The internet became an environment. 

I was 14 when my family got dial-up. At the time, about a quarter of U.S. households had internet access. (Now, more than 90% do.) In those early days, logging on required ceremony. You had to negotiate with the people you lived with. We said things to each other like, “Can I have 10 more minutes?” and “Get off, I need to call Grandma.” Going online required permission. The modem sang its nervous hymn.

The internet arrived in my home as a window, an enchanted portal cut into ordinary domestic space. A window is fixed in place. “I’m going online” meant you were heading to a specific place in the house, along the lines of “I’ll be in my room” or “I’m going to the attic.” To peer into the internet, I had to sit in my parents’ maroon computer chair. To my left, through the non-metaphorical window, I could see telephone wires running down the street, carrying away my messages.

I was born into a world that contained telephones just as it contained stones and trees. The internet differed from the telephone in its unvoicedness, but they shared a familiar infrastructure. It was physical and placed. There was a here that somebody had connected with copper wire to a there.

Then came AOL Instant Messenger. It was an awful place for middle schoolers, but there we were anyway, a swarm of screennames in chatrooms, my own (PyRoAnGeL5) among them. Suddenly, school life ran on two channels, what happened at school and what happened at school at home.

Still, you could leave. 

You could stand up, walk away from the computer, step back into your body and your house. In fact, you had to. Our digital lives were structured by departures. 

“brb,” your friend would type. And like that, you waited for her to return to her family’s computer, a tower tucked under a laminate desk, which itself sat beneath a framed poster of Monet’s sunflowers in the dining room. Maybe she needed to use the bathroom. Maybe she was getting a glass of Sprite. Maybe her mother needed the phone line. You trusted she’d be right back. The initialism promised it. 

It’s 1998. Your friend lives two blocks away, but you’re talking to her, and two other friends, and you can feel the strange thrill of it: You have stepped through a window into a place that isn’t quite a place.

That place began dissolving around 2005, the year I turned 21. Wireless routers spread through apartments and dorms. Laptops floated from room to room. No longer tethered to the wall, connectivity bled onto kitchen counters, coffee tables, the unmade bed that became a midnight workspace. Being online involved less ceremony and less permission-seeking. You didn’t step through a portal to go online so much as tilt open a screen, adjust to new light and enter the rushing stream. “I’ll look it up later” became looking it up in the hallway, halfway to the next thing. 

Once oriented to fixed terminals, we began to reorganize our houses around dead zones and hotspots. We followed our devices’ sensory organs and learned a new physics, adapting to the way the microwave choked the 2.4 GHz band, the way the maple tree dulled the router’s reach, the way a closed door could be both privacy and attenuation. We remapped our environments. 

Where the internet sat in the house became difficult to say. It felt less like a window and more like the weather, an ambient condition to which we adapted the movement of our bodies.

“I remember when going online took effort, and I remember, too, when that effort evaporated.”

With Wi-Fi, we invited the world into our homes. With smartphones, we inverted our homes into the world. I got my first smartphone in 2013, late enough that I had already formed adult habits and early enough to watch them be remade. The smartphone didn’t just make the internet portable; it made it proximate. We didn’t simply carry connectivity, we cradled it. We stroked it.

We began to speak to one another with our eyes lowered, our hands twitching when the screen flared like a shooting star. “I should respond to this,” we said, as if responding were not a choice but an obligation. The desktop computer had beckoned, yes, a glowing window into other worlds. But the phone was different. It buzzed against our thighs. It insisted.

The pandemic laid bare the world we had been drifting toward, one in which the home is the office, school, studio. In 2020 we rearranged furniture, searching for the lighting that most flattered our faces. We discovered which rooms echoed and which swallowed sound. I lived alone that first pandemic year, beaming my image across the country to family and friends. I ached for touch and I had no touch. I began carrying my phone from room to room. 

That is when I began bringing the internet into the bathroom. 

I became addicted to my phone’s sleek body, its vibrations and its flashes. I felt increasing disquiet at the thought of being disconnected, as if connectivity were not a service but a bodily need. I expected connectivity like I expected air.


As an environmental studies professor, I’ve spent my career studying how the built world reshapes the living one. We tend to treat the internet as an abstraction, something empty, placeless, limitless. But nothing about it is airy. The internet is heavy industry. It is cables laid across seafloor. It is warehouse-sized data centers cooled by river water. Every search query, every streamed video, every automated “I’m just following up” email rides on physical systems.

Inside servers and smartphones are materials mined from specific places: cerium from Bayan Obo in Inner Mongolia, cobalt from the Democratic Republic of Congo, lithium from Chile’s Salar de Atacama. These metals are often extracted by laborers with few protections working in exceptionally dangerous conditions.

Even before the 2022 launch of ChatGPT and the subsequent AI surge, data centers were already consuming about 1% of the world’s electricity. Morgan Stanley projects data centers will produce 2.5 billion metric tons of carbon dioxide equivalent by 2030. The term “cloud” conjures something weightless, ephemeral, unowned. The cloud is none of these things. 

But extraction and emissions are not the only environmental changes wrought by the internet. Connectivity has remade how we perceive our environments and how we move through them. It has changed what it feels like to live in a place, how we navigate rooms and streets, how often we are alone. And all of this happened without dispelling the illusion that the digital is limitless and free of place.

We speak of “screen time” as if time were the only axis. But the spatial axis matters, too. 

Ambient connectivity has trained our bodies in the protective hunch of the scroll, the twitch of the hand opening toward a notification, the omnipresence of vibration. The problem isn’t only how much time we spend online. It’s how we move in the spaces we’ve built for online life and what kinds of freedoms we’ve lost.

If the network follows the body through the house, what spaces can we build to let the body be alone? What would it mean to limit not only screen time, but screen space?


It’s tempting to romanticize a childhood without devices, a bike ride without a GPS trace. What I miss about the corner-desk internet of my youth isn’t purity. It’s friction. 

That earlier internet had constraints that protected me. Being online meant being somewhere shared. The household computer forced the internet into relation with the dinner table, the phone line, the ordinary fact of coexistence.

The smartphone does the opposite. It carries the elsewhere to the bathroom, the library, the date, the bed. The home that once defined the end of the workday now extends it. My students are experts at maintaining two conversations at once, one in the room and one on the screen. Many are exhausted by it. We are all subjects in a decades-long experiment designed to monetize our attention.

“With Wi-Fi, we invited the world into our homes. With smartphones, we inverted our homes into the world.”

Every now and again I try to treat my devices as fixed in space. I resolve to use my laptop only at my desk and my phone only at the kitchen counter where it recharges. It’s a useful exercise, and a humbling one. It lets me feel with my body how I am captive to connectivity. I am frustrated when the internet feels like a window: I’ve grown accustomed to weather.

After minutes or hours, I inevitably fail, unplugging my phone as I walk out the door. 

Some people pursue periodic extremes, locking up their devices for a “digital detox.” One can pay for a lavish digital retreat. But this all-or-nothing approach to being online fails to offer us a model for how to live in the middle, how to use the internet as a tool while maintaining the freedom of our bodies. 

The problem, I am starting to think, is not only that we use our phones in the bathroom, but that we imagine that socializing, knowledge production, politics and creativity can be achieved outside of physical space. We make the same mistake with generative AI that we make with the internet when we treat it as a nowhere rather than a somewhere. We fail to acknowledge that the virtual operates within — and not beyond — the spatial, material world.

It’s the difference between C-3PO from “Star Wars” and Samantha from Spike Jonze’s “Her,” the one comically but necessarily embodied, the other quite literally transcendent. Maybe we used to measure intelligence through physical competences like the ability to navigate doors and stairs, to read a gesture, to hold eye contact, but today we measure it by its ambient ever-presence.

We no longer think a robot is intelligent just because it can move in a world built for bodies like ours. Large language models (LLMs), in our imagination, are conversational beings without bodies, without any friction of environment. We speak to them as if they were somewhere nearby, and yet they are not anywhere our imaginations can place. And so we begin to accept the strange premise that intelligence might exist outside of the physical world, floating above the constraints that make human life legible.

Yet intelligence is environmental.

My colleague at Williams College, Joe Cruz, notes that for an AI to strike us as authentically intelligent, it will have to be embodied, because many of the features we value in human (and animal) intelligence arose from the task of keeping a body alive as it moves through shared space. We recognize dogs as intelligent, for instance, in part because they have facility in our built and social spaces, communicating through shared emotional expressions, having evolved to live within our environments. Some cognitive scientists argue that intelligence cannot be made sense of in isolation from body and environment at all. 

The sci-fi image of the floating brain that finds a body and learns to walk (or to love) has the steps reversed. We learn through our bodies; we sense the world, make decisions about it and act within it. Intelligence that is disembodied will not seem like intelligence to us. 

And yet, in Silicon Valley, the opposite vision holds sway. Powerful people, including tech experts and many of our elected officials, believe that with LLMs, we will find a better way of living together, a better way of governing our shared environment.

Sam Altman, the CEO of OpenAI, has argued that AI acceleration will usher in an “Intelligence Age” of “unimaginable” and “shared” prosperity and “astounding triumphs” like “fixing the climate.” Deep learning, he explains, is an algorithm that can truly learn the rules behind any distribution of data. The more compute and data available, the better it can help people “solve hard problems.” 

Altman’s vision collides with basic truths of how people live. We care for places because we inhabit them. Love of place arises through our bodies as much as our minds.

But those committed to disembodied intelligence reach for a different solution: total representation. If the model cannot dwell in the world, the world must be made to dwell in the model as a “digital twin,” rendered at ever finer resolution, until environment becomes data and data becomes environment. 

Argentinian author Jorge Luis Borges’ parable “On Exactitude in Science” imagines an empire that produces a map the exact size of the territory. It is a useless tool, one that becomes territory itself. “In the Deserts of the West,” Borges concludes his story, “there are Tattered Ruins of that Map, inhabited by Animals and Beggars.”

“What would it mean to limit not only screen time, but screen space?”

Those dreaming of a nascent cognitive revolution are imagining that Borges’ one-to-one map will be finally useful — that if we just feed enough text, enough human knowledge, into the machine, it will comprehend the world in a way we never can. 

Even if we had the time, labor and energy to attempt this, why would we? Why not put that effort into talking to each other? 

The alternative is an increasingly familiar solipsism. A solipsistic person believes the self is the only reality. Other minds, other bodies, may as well be an illusion. 

Today’s internet bends us toward solipsism. We no longer imagine ourselves to be placing our images and our voices into the internet. We imagine ourselves — our physical beings — to be living within it. We imagine the internet to be our environment.

In “Trick Mirror,” journalist Jia Tolentino warned that the internet, once imagined as a space of freedom, had become a mechanism for surveillance, performance and commodification. Online life encourages self-optimization and branding at the expense of connection. “In physical spaces, there’s a limited audience and time span for every performance,” Tolentino writes. “Online, your audience can hypothetically keep expanding forever, and the performance never has to end.” 

Tolentino focused on time, but this internet is an endless stage, too, one with no wings, no exit, no place to step off and be alone again. 

“brb” once acknowledged departure and faith in return. It reminded us of the body behind the screen. Now, we are infinitely available, and AI is sold to us as the tireless and needless assistant. But our bodies continue to live in the world with stubborn persistence, despite Silicon Valley’s dream of the immortal avatar, the ability to upload our essence into a durable machine, which is a dream of escaping death and environment alike.

Most of the questions worth asking are not about how to transcend the environment, but how to inhabit it. How to live together in shared space. 

Many social, historical and economic forces led me to check my work email in the bathroom. Among them is the way we have come to imagine the internet not as a place we go, but as a space we inhabit. We make sense of abstract experience through bodily metaphors grounded in orientation and sensation: Up is good, down is bad, warmth is affection, weight is importance. These metaphors shape how we act and what we value. 

Window, weather: Change the metaphor and you change the possibilities for thought and action. If the internet once taught us to say “brb,” perhaps the work ahead is to recover that ethic of interruption, to remember the body in a room, waiting to return.

The post Limiting Not Just Screen Time, But Screen Space appeared first on NOEMA.


Endgame


A 3 panel comic strip: “There’s no way out of this one, Dad. Check.” says a kid who is dominating his Dad in chess. Dad, slyly, angles his watch to reflect a beam of light onto the board, catching the attention of a cat. In the last panel, we see the pieces scatter as the cat pounces. Dad covers his smile from his distraught child.

The post Endgame appeared first on The Perry Bible Fellowship.

1 public comment
jlvanderzwan
10 hours ago
That's the second comic I've seen in a month that involves the sun reflecting off an old-school wrist watch, which isn't a lot, but it's weird that it happened twice.

https://old.reddit.com/r/comics/comments/1rsktxd/it_doesnt_count_as_a_walk_if_you_dont_bring_a/

“LLM-generated passwords…appear strong, but...


“LLM-generated passwords…appear strong, but are fundamentally insecure, because LLMs are designed to predict tokens – the opposite of securely and uniformly sampling random characters.”
