
My Son’s Math Homework Is Essentially Just Pokémon ....


My Son’s Math Homework Is Essentially Just Pokémon. “As I watched my son play Prodigy, it became clear there wasn’t much learning happening. In about 10 minutes of gameplay, he spent less than 30 seconds answering math questions.”

mrmarchant
34 minutes ago

Reclaiming Social Engineering for Good



“Social engineering” sounds like something out of a conspiracy thriller, charged with totalitarian control and fringe paranoia. More mundanely, it’s come to be associated with phishing and other scams, in which fraudsters manipulate people into disclosing personal information.

Yet the concept is older and more benign: it is the deliberate shaping of human behavior, often at scale. It predates silicon—and became pervasive, and ungoverned, especially once its practitioners learned to hide it. Authoritarian regimes and more recently scammers and big companies have profited from it. To defend ourselves from bad actors, and to benefit from social engineering’s good side, we need to reclaim the name, and govern it prudently.

The roots of engineering

In 1894, Dutch entrepreneur Jacques van Marken urged companies to hire “social engineers” to manage human systems such as insurance, education, and profit sharing for workers as carefully as they did mechanical ones. Fifteen years later, reformer William H. Tolman published Social Engineering, describing how U.S. industrialists optimized workers’ conditions alongside manufacturing methods. If industrialists could shape steel and electricity on demand, why not society itself?

By the 1920s, that confidence had spread. The architect Le Corbusier declared that dwellings were “machines for living in,” imagining cities as orderly lattices where people moved like parts on a conveyor belt. Civilization would run like a Swiss watch.

The idea soon darkened. Authoritarian regimes pushed it to extremes, promising to fashion “the New Man.” In Nazi Germany, engineer Fritz Todt founded Organization Todt, a vast state engineering enterprise that emerged from the autobahn highway system and later operated concentration camps using slave labor.

In the Soviet Union, leaders adopted U.S. scientific management techniques to plan factory-worker movements and classify populations through centralized records, feeding both rapid industrialization drives and the gulag system of forced labor. The same tools and managerial methods used to build highways and enact five-year plans worked for repression and mass control.

By the 1950s, “social engineering” had become a contaminated phrase. The revelations of Nazi and Soviet abuses, along with Cold War critiques of grand social planning, turned the term from a progressive slogan into a warning label. Banishing the words pushed the practice underground, making it harder to recognize when it resurfaced in new forms, such as organizational psychology and systems management, which still relied on classification and behavioral influence techniques but under softer, less loaded labels.

Social engineering’s more subtle spread

In the postwar years, the new social-engineering lexicon included “human factors” and “urban planning,” all promising integration rather than command. As computing advanced, the language shifted again: “customer journey mapping” to track interactions, “user experience” to script them. Engineering, which began as a means of reshaping physical space, set its sights on shaping behavior. Digital design features embedded in our smartphones now target our attention and desire.

Language helps conceal these modern forms of social engineering. “Data analytics” sounds neutral beside “surveillance.” “Personalization” flatters individuality while still sorting users into predictable categories. “Behavioral nudges” guide decisions without the sense of intrusion. We attach “social” as a favorable modifier to sciences, capital, and media, yet recoil when it meets “engineering.”

That discomfort is a clue. Engineering implies control, and control prompts us to ask who directs whom, toward what ends, and with whose permission.

Not all social engineering these days is hidden. Hackers don’t need to break a firewall if someone hands over their password. Romance scammers cultivate intimacy the way farmers cultivate crops. They succeed not through force but by exploiting trust. If even these obvious attacks work, the invisible kind, with roots in social engineering, is a shoo-in.

Most of the social engineering we encounter is proprietary and beyond our control. Firms build recommendation algorithms tuned to boost engagement and profit with no hearings or right of appeal. Browser and cookie defaults decide what data we surrender. A single autoplay toggle can cost users hours and build unhealthy habits. These are acts of engineering as deliberate as laying a road or redrawing an electoral district. They create a kind of curated itch by which boredom never settles, and satisfaction never arrives. The results are predictable—users click on targeted ads, make purchases, form habits, and lock in opinions.

Consent has transformed along with it. Once straightforward and revocable, it is now subtle and persistent, buried in defaults or opaque terms of service too quickly accepted. You remain free to opt out, much as you are free to refuse roads or electricity. Consent has become the preselected setting of modern life.

When social engineering operated more in the open, citizens could contest it, at least in societies with responsive government. Today’s invisible version diffuses accountability so thoroughly that scrutiny becomes hard to direct. Despite recent congressional hearings on social media’s impact on youth mental health and juries agreeing that firms are knowingly designing algorithms that cause harm, pinpointing responsibility remains elusive. When the mechanism is buried inside a system used by billions, we cannot easily point to a single decision-maker or trace the precise moment of manipulation.

Today’s social engineering is less overt and theatrical than its predecessors. Earlier versions arrived on public posters and loudspeakers for mass audiences. Today’s version is more intimate, delivered through personal devices and constant feeds tailored to the individual. The model succeeds because participation feels like freedom, not control.

Not all social engineering is dystopian. Well-kept parks foster community, accessible buildings extend dignity, vaccines and seatbelts save lives. Even in the digital realm, positive examples exist: browser extensions that automatically block hidden trackers, search engines that refuse to build personalized surveillance profiles, and decentralized social platforms that give users greater control over their own data and feeds.

The term “social engineering” still unsettles, though. But “asocial” engineering, which ignores human consequences entirely, is worse. Recognition of the human dimension to engineering is the beginning of repair. Only by seeing the machinery clearly and naming it honestly can we decide who engineers what and why. The machinery will not dismantle itself. Once named, it becomes subject to choice. That negotiation of purpose, power, and process is the defining political question of any real democracy. We cannot ensure that social engineering serves and sustains society so long as we dodge the words.

mrmarchant
9 hours ago

Linux is Getting a Free Pass on Age Verification in California and Colorado


Age verification laws have been spreading fast, and we have been keeping tabs on them for a while now. California's Digital Age Assurance Act (AB 1043) was the first to land, signed in October 2025, with Colorado following with its own version (SB26-051).

Neither made any concessions for open source software in the original language, which left Linux distributions and other community-run projects in a very uncomfortable dilemma.

Both have since moved to fix that, with Colorado having wrapped it up earlier this month and California heading for a full Assembly vote.

What's California doing?

[Image: a cropped screenshot of California's AB 1856 bill text on age verification signals in software and online services. Caption: Look at the blue bits.]

AB 1043 required OS providers to collect a user's age or birth date at account setup and share it with apps through a real-time API, starting January 1, 2027. Open source projects got no special treatment in the original text, which is something we wrote about when the bills started drawing attention.

Assembly Member Buffy Wicks, who authored AB 1043 herself, introduced AB 1856 in February to address that.

After four rounds of revisions, the bill has rewritten the definition of "operating system provider" to exclude anyone distributing an OS under terms that let recipients copy, redistribute, and modify the software.

Most Linux distributions under permissive or copyleft licenses fall cleanly within that.

In tandem, another change covers the application side, where software that is not offered as a standalone executable through a covered app store is no longer treated as an "application" under the law.

The bill passed the Appropriations Committee 11-0 on May 14. It was ordered to third reading on May 19 and is awaiting an Assembly vote. Interestingly, Wicks is the chair of the Appropriations Committee.

What about Colorado?

Colorado's path here involved some direct community legwork. Carl Richell, the founder of System76, spent considerable time working with Senator Matt Ball, one of SB26-051's co-authors, to get open source exclusions written into the bill.

The bill exempts OS providers and developers distributing software under terms that permit copying, redistribution, and modification. It also adds a requirement that exempt software have no platform-imposed technical or contractual restrictions on installing modified versions.

The extra clause is aimed at tivoization, where manufacturers lock down hardware to block modified software from running even when the source code is freely available.

Beyond that, code repository providers, containerized software distributions, and applications from free, publicly available code repositories are explicitly excluded too.

The law also has a narrower scope, only applying to OS providers that operate a covered app store or ship one pre-installed. An OS provider with no app store involvement does not come into scope at all.

Besides that, SB26-051 is now set to take effect on July 1, 2028.

Some closing words…

Neither state got here automatically. The open source exemptions did not exist in either bill to start with, and it took sustained community pressure and direct legislative outreach to get them added.

The same approach can be applied to many other issues, of course. But when representatives are more interested in serving particular interests (say, under pressure from certain lobbies) than their constituents, disruption tends to be the only way out.



mrmarchant
9 hours ago

Notes on Pope Leo XIV's encyclical on AI


Dropped this morning by the Vatican: Magnifica Humanitas of His Holiness Pope Leo XIV on Safeguarding the Human Person in the Time of Artificial Intelligence. This is a very interesting document. It's some of the clearest writing I've seen on the ethics of integrating AI into modern society.

Pope Leo XIV chose the name Leo in honor of Pope Leo XIII, who is known for his 1891 Rerum novarum encyclical on "Rights and Duties of Capital and Labor".

This story on Vatican News further clarifies the significance of that decision:

Meeting with the College of Cardinals for their first formal encounter after his election, Pope Leo XIV explained part of the reason for the choice of his papal name. "There are different reasons for this," he said, before going on to explain that he chose the name Leo "mainly because Pope Leo XIII, in his historic encyclical Rerum novarum addressed the social question in the context of the first great industrial revolution."

"In our own day," he continued, "the Church offers to everyone the treasury of her social teaching in response to another industrial revolution and to developments in the field of artificial intelligence that pose new challenges for the defence of human dignity, justice, and labour."

And now we get Pope Leo XIV's own encyclical on the AI revolution. There's a lot in here, but the writing style is very approachable, including to non-Catholics.

A few of my highlights

(I listened to most of the encyclical on a walk with our dog, my first time trying the ElevenReader iPhone app. It worked very well: I pasted in a URL to the document and it read it to me in a very high quality voice, highlighting each paragraph as it went.)

Here are some of my highlights. In each case below emphasis is mine.

Here's a useful description of the interpretability problem for LLMs in section 98:

First, any statement regarding AI risks becoming quickly outdated, given the remarkable pace at which these systems are developing. Second, all of us, including those who design them, possess only a limited understanding of their actual functioning. Indeed, current AI systems are more “cultivated” than “built,” for developers do not directly design every detail, but instead create a framework within which the intelligence “grows.” As a result, fundamental scientific aspects — such as the internal representations and computational processes of these systems — remain, at present, unknown.

I liked section 83's description of the relationship between development and dignity:

For individuals as well as for nations, development is both a duty and a right. Minimum conditions are required for enabling every person and people to flourish in accord with their dignity, without being kept in a state of dependence or excluded from access to necessary goods. Development is truly human when it places people at the center instead of the accumulation of wealth, and when it concerns peoples as well as individuals. Justice demands the recognition of the rights of society and the rights of peoples, and includes a responsibility toward future generations. Development is not truly human if it increases consumption for some while shifting costs and burdens onto others, or relegates entire regions to subordinate roles, preventing them from realizing their full potential.

Baked-in cultural biases and sycophancy get a mention in section 100:

In personal use, three aspects in particular deserve careful consideration: the ease with which results are obtained, the impression of objectivity and the simulation of human communication. The speed and simplicity with which information, complex analyses, media content and practical assistance can be accessed undoubtedly makes life easier. Yet they can also encourage excessive reliance and the search for ready-made answers, and weaken personal creativity and judgment. The apparent objectivity of the responses and suggestions these systems provide can lead us to overlook the fact that they reflect the cultural assumptions of those who designed and trained them, with all their strengths and limitations. The artificial imitation of positive human communication — words of advice, empathy, friendship and even love — can be engaging and at times genuinely helpful. However, for less discerning users, it can also be misleading, creating the illusion of a relationship with a real personal subject. When words are simulated, they do not build genuine relationships, but only their appearance. The artificial imitation of care or support can become particularly risky when it enters contexts where real relationships and emotional bonds are lacking.

101 touches on the environmental impact:

Current AI systems require enormous amounts of energy and water, significantly influencing carbon dioxide emissions, and place heavy demands on natural resources. As their complexity increases, especially in the case of large language models, the need for computing power and storage capacity grows too, which requires an extensive network of machines, cables, data centers and energy-intensive infrastructure. For this reason, it is essential to develop more sustainable technological solutions that reduce environmental impact and help protect our common home.

102 covers the risks of algorithmic systems making decisions that impact people's lives without "compassion, mercy, forgiveness":

The use of AI is never a purely technical matter: when it enters processes that affect people’s lives, it touches on rights, opportunities, status and freedom. Important and sensitive decisions — concerning employment, credit, access to public services or even a person’s reputation — risk being fully delegated to automated systems that do not know “compassion, mercy, forgiveness, and above all, the hope that people are able to change,” and can therefore give rise to new forms of exclusion.

105 emphasizes the need for human accountability in how these systems are applied:

For AI to respect human dignity and truly serve the common good, responsibility must be clearly defined at every stage: from those who design and develop these systems to those who use them and rely on them for concrete decisions. In many cases, however, the internal processes leading to a result remain opaque, making it harder to assign responsibility and correct errors. This is where accountability becomes crucial: the possibility of identifying who must “account” for decisions, justify them, monitor them, and, when necessary, challenge them and remedy any harm caused.

And 108 touches on the way AI amplifies the power of those with resources:

In fact, as with every major technological shift, AI tends to amplify the power of those who already possess economic resources, expertise and access to data. In light of the common good and the universal destination of goods, this raises serious concerns, since small but highly influential groups can shape information and consumption patterns, influence democratic processes and steer economic dynamics to their own advantage, undermining social justice and solidarity among peoples. For this reason, it is essential that the use of AI, especially when it touches on public goods and fundamental rights, be guided by clear criteria and effective oversight, grounded in participation and subsidiarity.

That same section explicitly calls out data as something that should be thought of more as a public good:

[...] Moreover, ownership of data cannot be left solely in private hands but must be appropriately regulated. Data is the product of many contributors and should not be treated as something to be sold off or entrusted to a select few. It is necessary to think creatively in order to manage data as a common or shared good, in a spirit of participation, as Saint John Paul II already suggested regarding collective goods.

Given that Palantir is named after a Lord of the Rings reference, I can't help but wonder if the J.R.R. Tolkien quote from The Return of the King (section 213) was the Pope throwing a little shade at Peter Thiel.

The twentieth-century Catholic author J.R.R. Tolkien, in the words of a protagonist in one of his novels, described our responsibility in this way: “It is not our part to master all the tides of the world, but to do what is in us for the succour of those years wherein we are set, uprooting the evil in the fields that we know, so that those who live after may have clean earth to till.” The civilization of love will not arise from a single or spectacular gesture, but from the sum total of small and steadfast acts of fidelity that serve as a bulwark against dehumanization. For this reason, it is worthwhile pausing to reflect on some aspects of how we, each in our own way, can cooperate in building the civilization of love.

Another 2026 prediction down

On 6th January this year I joined the Oxide and Friends 2026 predictions podcast episode to talk about predictions for 2026, 2029 and 2032. I wrote mine up here. With hindsight they weren't nearly ambitious enough: it's already undeniable that LLMs write good code, we've made huge advances in sandboxing, and New Zealand kākāpō have indeed had a truly excellent breeding season.

There's one segment from the episode that I didn't bother to include in my write-up, but that I can't resist providing as a lightly-edited transcript here:

Bryan Cantrill: 37:13

I think that AI has created some real public perception problems for itself. And I think that you are gonna have one of the frontier model companies, this year, have a white paper explaining how the proliferation of AI will mean prosperity for everybody. They will be trying to make some economic argument - because this is gonna be a 2026 election issue, how we think of these things and how they are regulated and it's a big mess. There's more heat than light in this debate.

Simon Willison: 38:05

I'd like to tag something on to that one: I think that only works if they can sort of wash that through existing trusted experts. Sam Altman and Dario are constantly publishing essays about this stuff and nobody believes a word they say. Get Barack Obama's signature on one of these position papers and maybe you've got something people might start to trust a little bit.

Adam Leventhal: 38:27

Otherwise, it's just like "leaded gas is good for you", says Exxon.

Bryan Cantrill: 38:31

I mean, yeah. God. Obama... let's go with that, that's a great one because if it's like Bill Clinton everyone's gonna kind of roll their eyes, so it's gotta be someone who's got real credibility saying that this is gonna be broad-based... I'd say if they get that person to do it, it's gonna be revealed that that's also a bit crooked.

Simon Willison: 38:57

How about the Pope?

Bryan Cantrill: 39:01

The Pope is very into this stuff! That's a great prediction. We've hit pay dirt. The Pope weighing in on LLMs and their economic impact on the world.

Simon, I'm giving you full credit if the Pope weighs in believing that this is gonna be economic devastation.

My prediction here looks a whole lot less insightful given the Leo XIV/Leo XIII relationship, which I was unaware of when we recorded the episode!

Tags: predictions, ai, kakapo, generative-ai, llms, bryan-cantrill, ai-ethics

mrmarchant
20 hours ago

Citing Gandalf, Pope Leo says we must "disarm" AI


With the co-founder of Anthropic at his side today in Rome, Pope Leo XIV released a major new encyclical, his first, called "Magnifica Humanitas" ("Magnificent Humanity"). It calls for AI to be "disarmed" in service of the common good.

"The word is strong," Leo admits, but he chose the language of "disarmament" deliberately "because this moment needs words capable of attracting attention, awakening consciences, and indicating paths forward for humanity." AI today must be "freed from logics that turn it into an instrument of domination, exclusion, and death."

The 40,000-word encyclical contains uncompromising critiques of AI-powered autonomous weapons, neo-colonial attitudes towards data collection, and the hoarding of "new forms of property, such as patents, algorithms, digital platforms, technological infrastructure, and data."

mrmarchant
20 hours ago

Solving the board game Quoridor


Solving Quoridor

This post significantly improves the state of the art in solving the board game Quoridor. I describe novel techniques that enable fully solving almost all board configurations with area ≤ 28 (e.g. 5x5, 8x3, 7x4, etc) for most wall counts on a consumer laptop.

Background

I was introduced to the board game Quoridor back in 2014 and was immediately taken by it.

I usually spend a weekend returning to Quoridor once every couple years, writing different forms of AI bots to play it. This last weekend, I made a breakthrough that enables both much stronger bots, and much more complete solving.


Rules

The game is pretty simple:

  • Pawns start on opposite sides of the 9x9 board
  • Your goal is to get your pawn to the far side
  • You have 10 walls
  • On your turn, you can move your pawn 1 square, or place a wall
  • You can jump over the opponent's pawn
  • You can't place a wall that makes it impossible for a pawn to get to its goal

That last rule is where all the performance complexity comes from. You might be planning on blocking your friend's straight shot — making him take the long way around — but he places a wall that cuts off the long route, so now it's illegal for you to block the short route!

The "pawn jumps over opponent's pawn" rule creates interesting parity/zugzwang situations.

In addition to the typical 9x9 board with 10 walls, many papers have analyzed smaller boards and wall counts, since the full game is currently intractable.

Major Results

Parity Advantage vs Tempo Advantage

Many have speculated that Quoridor might be a 2nd player win on odd-height boards due to pawn jump parity. This work shows that odd-height Quoridor boards are not always 2nd player wins.

They are always 2nd player wins with few walls, but typically turn into 1st player wins at a sufficiently high wall count. For example, 5x5 is a 2nd player win at ≤4 walls per player, but 1st player win at >4 walls.

The intuition here is that odd-height boards have a jump parity advantage for 2nd player, but 1st player still has a tempo advantage, so a sufficient number of walls makes the 1st player tempo advantage dominate the 2nd player's jump advantage.

There are a few notable exceptions discussed below.

Relatedly, we find that even-height boards are uniformly 1st player wins at all wall counts because 1st player has the jump parity advantage and the tempo advantage.

Forced Draws

It was known that forced draws by repetition were possible to contrive, but this work shows that the 8x3 board with 3 walls per player is a draw from the starting position.

Both players must just dance left and right forever. If either player deviates from this repetition, the other player has a forced win, so the optimal strategy is a draw by repetition.

Weird geometries

There are some geometries with outlier results. For example:

4x7 is a 2nd player win for 0 or 1 wall, 1st player win for 2 walls, then back to 2nd player for 3 walls, then 1st player beyond that.

7x3 never transitions into a 1st player win at any wall count.

3x5 is a 2nd player win at all wall counts except 3 (where it's a 1st player win, per the table below).

8x3 is a 2nd player win at all wall counts except 3 (where it's a draw, as noted above).

Full results table

Note: even heights are omitted; they are all 1st player wins at all wall counts and geometries.

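(Columns are wall counts per player.)

W x H   0  1  2  3  4  5  6  7  8  9  10
2 x 3   2  1  1  1  1  1  1  1  1  1  1
3 x 3   2  2  2  2  2  2  2  2  2  2  2
4 x 3   2  2  2  1  1  1  1  1  1  1  1
5 x 3   2  2  2  1  1  1  1  1  1  1  1
6 x 3   2  2  2  2  1  1  1  1  1  1  1
7 x 3   2  2  2  2  2  2  2  2  2  2  2
8 x 3   2  2  2  D  2  2  2  2  2  2  2
9 x 3   2  2  2  2  2  2  2  1  1  1  1
2 x 5   2  2  2  2  2  2  2  2  2  2  2
3 x 5   2  2  2  1  2  2  2  2  2  2  2
4 x 5   2  2  2  2  1  1  1  1  1  1  1
5 x 5   2  2  2  2  2  1  1  1  1  1  1
2 x 7   2  2  1  1  1  1  1  1  1  1  1
3 x 7   2  2  2  2  2  1  1  1  1  1  1
4 x 7   2  2  1  2  1  1  1  ?  ?  ?  ?
2 x 9   2  2  2  2  2  2  2  2  2  2  2
3 x 9   2  2  2  2  2  1  1  1  1  ?  ?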

You can reproduce the results of this work by running the code in this repository. The smaller configurations finish almost instantly; the largest ones in the table take up to 30 minutes on my M3 MacBook Pro. I have 128GB of RAM, and the largest configurations use a sizeable chunk of that to store all the precomputed data and the transposition table.

Complexity

Quoridor has two main complexifiers.

The "illegal to fully block goal" rule makes enumerating legal moves hard. Naively, you have to do a pathfinding search for every candidate move. This means move generation is several orders of magnitude slower than a game like chess.

Adding to this pain, the branching factor is pretty high. On any given turn there are typically 4 pawn moves and about 100 possible wall moves. So not only are the wall moves super slow to check the legality of, there are also a ton of them.

This huge branching factor makes naive alpha-beta negamax pretty weak. It takes a decent amount of work to get a bot searching to depth 6 or so.

Beyond the giant branching factor, the horizon effect is harder to deal with. In chess, you can deal with the horizon effect pretty easily with quiescence search where you search only the "interesting" moves (i.e. captures, checks) until no such moves remain and the position is "quiet".

In Quoridor, there are very few quiet positions because walls can be placed anywhere on the board at any time.

In addition to not being able to look too deep, once you bottom-out on depth, evaluating a position is really hard. You have a short path, great. Can it be cut? Can you block it from being cut? You have walls, awesome — will you be able to effectively use them, or is your opponent in a safe-from-walls corridor? Etc.

All these complexities make me think Quoridor would be really amenable to an AlphaZero type approach which shines on games with high branching factors and difficult evaluation functions.

Miscellaneous optimization tricks

A few tricks I've picked up over the years of hacking on Quoridor bots. Some are highly specific to Quoridor, others are well-known in e.g. the computer chess community. Not all of these were used or are applicable to the full solver, but are nonetheless interesting or applicable to a general Quoridor bot.

Wall legality heuristic

If you place a wall floating off in an open area, it's always possible to go around it, so no legality check is needed.

Further, if you place a wall and it's only touching another wall (or board edge) at a single point, it's also always possible to go around it, so no legality check is needed.

You only need to do a legality check if the wall touches another wall or the board edge at 2 or more of its 3 contact points (its two ends and its midpoint).

[Figure: Quoridor wall legality heuristic contact cases. One 9 by 9 Quoridor board showing candidate walls with zero, one, and two contacts. 0 contacts: skip. 1 contact: skip. 2 contacts: path check!]
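The contact rule can be sketched in a few lines of Python. This is an illustration rather than the author's code: the wall encoding `(orientation, row, col)`, the lattice-point convention, and the names `wall_points` and `needs_path_check` are all assumptions made for this example.

```python
def wall_points(orientation, r, c):
    """Return the 3 lattice points (x, y) a length-2 wall touches.

    Assumed convention: cells are indexed (row, col); a horizontal wall in
    slot (r, c) lies on the line y = r + 1 from x = c to x = c + 2, and a
    vertical wall in slot (r, c) lies on x = c + 1 from y = r to y = r + 2.
    """
    if orientation == "h":
        return [(c, r + 1), (c + 1, r + 1), (c + 2, r + 1)]
    return [(c + 1, r), (c + 1, r + 1), (c + 1, r + 2)]


def needs_path_check(candidate, walls, n=9):
    """True if the candidate wall has 2 or more contacts on an n x n board."""
    occupied = {p for w in walls for p in wall_points(*w)}
    contacts = 0
    for x, y in wall_points(*candidate):
        on_edge = x in (0, n) or y in (0, n)
        if on_edge or (x, y) in occupied:
            contacts += 1
    return contacts >= 2


# A floating wall and a wall touching only the left edge can skip the check.
print(needs_path_check(("h", 4, 4), []))               # 0 contacts
print(needs_path_check(("h", 4, 0), []))               # 1 contact
print(needs_path_check(("h", 4, 0), [("v", 3, 1)]))    # 2 contacts
```

The function only decides whether the expensive pathfinding step is needed; it does not check overlap or crossing with existing walls, which a real move generator would handle separately.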

Most walls are legal

It's almost guaranteed your evaluation function uses path length as an input. This requires running a path algorithm.

Since you are already going to do this for your leaf-node evaluation function, you should skip it during all move generation. Most moves are legal!

Just recurse assuming all wall moves are legal, and if you discover at the leaf node that whoops we are in an illegal branch, that's fine, just return null instead of a score to mark that this node is invalid.

If you're at an inner node and your very first child returns null, then do a path check to see if the inner node is illegal, and fast-return null if it is.

This optimization only works because illegality is monotonic in Quoridor. Once you are in an illegal state, you cannot get to a legal state.
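A minimal sketch of this lazy legality scheme, using a hand-built toy tree instead of real Quoridor positions. The `Node` class, the `path_exists` stand-in, and the `stats` counter are hypothetical names invented for the example, and alpha-beta pruning is left out to keep the control flow visible.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    legal: bool = True          # would a path search succeed in this position?
    score: int = 0              # leaf evaluation, from the side to move's view
    children: List["Node"] = field(default_factory=list)


def path_exists(node, stats):
    """Stand-in for the expensive pathfinding check; counts how often it runs."""
    stats["path_checks"] += 1
    return node.legal


def negamax(node, stats) -> Optional[int]:
    """Return the negamax score, or None if the node turns out to be illegal."""
    if not node.children:
        # Leaf: evaluation needs path lengths anyway, so illegality is
        # discovered here at no extra cost.
        return node.score if path_exists(node, stats) else None
    best = None
    for i, child in enumerate(node.children):
        value = negamax(child, stats)
        if value is None:
            # Only when the very first child is invalid do we pay for a
            # path check on the inner node itself.
            if i == 0 and not path_exists(node, stats):
                return None
            continue
        if best is None or -value > best:
            best = -value
    return best


illegal_branch = Node(legal=False, children=[Node(legal=False), Node(legal=False)])
root = Node(children=[Node(score=3), illegal_branch, Node(score=-2)])
stats = {"path_checks": 0}
print(negamax(root, stats), stats)
```

In this toy tree the second leaf under the illegal branch is never visited, because the inner node is rejected as soon as its first child comes back as `None`.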

Bitboards

Standard Quoridor is a 9x9 cell board, but walls are length 2, so there are only 8x8 places to place a wall. This means you can represent all the horizontal walls as a 64 bit integer, and all the vertical walls as a 64 bit integer. Getting candidate wall moves can now be done with just a few ops.
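As a rough illustration of "just a few ops", here is a Python sketch that computes the geometrically free wall slots from the two 64-bit boards. The bit layout (bit `r * 8 + c` for slot `(r, c)`) and all names are assumptions for the example, and it covers only overlap and crossing, not the path-blocking rule.

```python
FULL = (1 << 64) - 1
COL0 = 0x0101010101010101        # slot column 0 of every row
COL7 = COL0 << 7                 # slot column 7 of every row


def bit(r, c):
    """Bit for wall slot (r, c) in the assumed row-major 8x8 layout."""
    return 1 << (r * 8 + c)


def free_horizontal_slots(h, v):
    """Slots where a horizontal wall would not overlap or cross a wall."""
    right_neighbor = (h << 1) & ~COL0 & FULL   # slot to the right of each wall
    left_neighbor = (h >> 1) & ~COL7           # slot to the left of each wall
    return ~(h | right_neighbor | left_neighbor | v) & FULL


def free_vertical_slots(h, v):
    """Slots where a vertical wall would not overlap or cross a wall."""
    below = (v << 8) & FULL                    # slot one row further down
    above = v >> 8                             # slot one row further up
    return ~(v | below | above | h) & FULL


def popcount(x):
    return bin(x).count("1")


h = bit(3, 3)                                  # one horizontal wall mid-board
print(popcount(free_horizontal_slots(h, 0)))   # 64 minus the 3 blocked slots
print(popcount(free_vertical_slots(h, 0)))     # 64 minus the 1 crossing slot
```

The column masks stop shifted bits from wrapping between rows, which is the usual pitfall with this kind of layout.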

Transposition

Transposition table is an easy 2x win. Add in horizontal symmetry for another 2x. You don't even really need to use Zobrist hashing since the board state is so few bits, you can either use it as a key outright or hash it in just a few ops.
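One way to fold the symmetry into the table key is to store each position under the smaller of its two mirror images. The sketch below assumes "horizontal symmetry" means a left-to-right mirror, reuses an 8x8 row-major wall bitboard layout, and invents the state tuple and function names purely for illustration.

```python
def mirror_walls(bits):
    """Mirror an 8x8 wall bitboard left to right by reversing each row byte."""
    out = 0
    for row in range(8):
        byte = (bits >> (8 * row)) & 0xFF
        out |= int(f"{byte:08b}"[::-1], 2) << (8 * row)
    return out


def canonical_key(h, v, pawns, walls_left, side):
    """Return one key shared by a position and its left-right mirror image.

    pawns is ((r0, c0), (r1, c1)) on the 9x9 cell grid; walls_left is a
    tuple of remaining wall counts; side is the player to move.
    """
    original = (h, v, tuple(pawns), tuple(walls_left), side)
    mirrored = (
        mirror_walls(h),
        mirror_walls(v),
        tuple((r, 8 - c) for r, c in pawns),
        tuple(walls_left),
        side,
    )
    return min(original, mirrored)


table = {}
key = canonical_key(1 << 3, 0, ((0, 4), (8, 2)), (9, 10), 0)
table[key] = "cached search result"
# The mirrored position maps to the same entry.
print(canonical_key(1 << 4, 0, ((0, 4), (8, 6)), (9, 10), 0) in table)
```

Using the tuple itself as the dictionary key mirrors the point above that the state is small enough to use more or less directly, without Zobrist hashing.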

Solving

Breakthrough optimization

There is one trick that makes solving boards like 5x5 Quoridor fast and easy, and it falls out of these two observations:

  • There are only 2,532,560 total possible wall configurations on a 5x5 board
  • If you have all the possible wall configurations, you can precompute legality bitboards for both players for all possible wall states

That is, for each wall state, you floodfill from each player's goal row to make a mask that contains the set of legal cells that player's pawn can be on.

This allows for extremely cheap legality checks. To check if a wall move is legal, you just look up the configuration in the table, and check that both players' pawns are on floodfilled cells.
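To make the floodfill step concrete, here is a minimal sketch that builds one player's mask for a single wall configuration on a 5x5 board. The wall encoding `(orientation, row, col)`, the bit layout `row * size + col`, and the function names are assumptions for illustration. A real table would store one such mask per player for every precomputed wall configuration.

```python
from collections import deque


def blocked_edges(walls):
    """Turn walls into the set of cell-to-cell steps they block."""
    edges = set()
    for orientation, row, col in walls:
        for k in range(2):
            if orientation == "h":   # blocks up/down steps in two columns
                a, b = (row, col + k), (row + 1, col + k)
            else:                    # blocks left/right steps in two rows
                a, b = (row + k, col), (row + k, col + 1)
            edges.add((a, b))
            edges.add((b, a))
    return edges


def goal_mask(walls, goal_row, size=5):
    """Bitmask of cells that can still reach `goal_row`."""
    edges = blocked_edges(walls)
    seen = {(goal_row, col) for col in range(size)}
    queue = deque(seen)
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (row + d_row, col + d_col)
            inside = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if inside and nxt not in seen and ((row, col), nxt) not in edges:
                seen.add(nxt)
                queue.append(nxt)
    return sum(1 << (r * size + c) for r, c in seen)


def pawn_is_legal(mask, pawn, size=5):
    """The lookup-time check: is the pawn's cell inside the mask?"""
    row, col = pawn
    return bool((mask >> (row * size + col)) & 1)
```

At search time only `pawn_is_legal` runs, which is a shift and a mask. All the floodfill cost is paid once during precomputation.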

Below is an example of a board in a legal state vs illegal state from just one player's point of view.

[Figure: Floodfill mask lookup for Quoridor wall legality. Two 5 by 5 Quoridor boards. Left, legal state: the pawn is inside the precomputed floodfilled goal mask. Right, illegal state: the pawn is outside the precomputed mask.]

Combined with the other tricks, this allowed me to fully solve 5x5 Quoridor in just a few minutes on my laptop.

Unfortunately this solution does not scale to the full 9x9 board which has 10^20 possible wall configurations.

It's likely tractable to spend a few hundred dollars of cloud compute to solve the next frontier of board sizes if someone wants to throw money at the problem.

6x6 is not really interesting since it's almost guaranteed to be a 1st player win due to even height, but e.g. 7x5 would probably be interesting.

[Figure: Wall Configurations by Board Size. Line graph showing log base 10 of wall configurations with at most 2N - 1 walls for square boards from 2x2 through 9x9. x-axis: Board Size (2 to 9). y-axis: log10 wall configurations (0 to 20).]

Proof search

Previous work has largely focused on using retrograde analysis to solve small Quoridor variants.

This work largely uses an algorithm closely related to proof-number search which, as far as I can find, appears to have never been applied to Quoridor.

I was not aware of proof-number search before this work, and accidentally re-invented it by modifying the negamax algorithm until it was essentially a proof search.

I had initially started with normal iterative-deepening negamax with (-∞, ∞) alpha-beta bounds and win/loss evaluation function, but you can get significantly more beta-cutoffs by initializing with (0, ∞) alpha-beta bounds.

At each depth, you do a search assuming 1st player wins, and if that doesn't find a forced win, do a search assuming 2nd player wins, and if that doesn't find a forced win, try it all again at the next depth.

Because of this structure, traditional alpha-beta techniques like move ordering can significantly speed up the search.
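The alternating search can be sketched on a toy game standing in for Quoridor: a pile of stones where each player takes 1 to 3 and whoever takes the last stone wins. The game, the score convention (+1 win, -1 loss, 0 unknown at the depth limit), and the function names are all assumptions for illustration. Only the window trick, (0, ∞) to look for a first-player win and then (-∞, 0) to look for a second-player win, comes from the description above.

```python
INF = float("inf")


def negamax(stones, depth, alpha, beta):
    """Win/loss negamax: +1 win, -1 loss, 0 unknown at the depth limit."""
    if stones == 0:
        return -1          # the opponent took the last stone and won
    if depth == 0:
        return 0
    best = -INF
    for take in (1, 2, 3):
        if take > stones:
            break
        score = -negamax(stones - take, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break          # beta cutoff
    return best


def solve(stones, max_depth=40):
    """Alternate 'first player wins' and 'second player wins' searches."""
    for depth in range(1, max_depth + 1):
        # Window (0, inf): the opponent cuts off on any non-losing reply.
        if negamax(stones, depth, 0, INF) > 0:
            return "first", depth
        # Window (-inf, 0): the side to move cuts off on any non-losing move.
        if negamax(stones, depth, -INF, 0) < 0:
            return "second", depth
    return "unknown", max_depth
```

In this toy game a pile that is a multiple of 4 is lost for the side to move, so `solve(12)` reports a second-player win and `solve(13)` a first-player win. A position with a forced draw would never satisfy either test, which is why the loop needs a depth cap.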

I had gone into this conjecturing that this always terminates and that there were no forced draws from the start position. I found the forced draw for 8x3 with 3 walls when the max depth kept ticking up indefinitely. I then added a retrograde solver fallback, which proved the draw. The various optimizations mentioned in this work make retrograde analysis tractable for this size.

Additional implementation details

While precomputing all wall configurations and legality masks, you can squeeze out more performance by precomputing a bit more information for any wall configuration:

  • All legal moves. Move generation is much cheaper with the legality masks, but you can still get a nice constant factor improvement by just precomputing it all upfront.
  • Distance to goal. This is useful for move ordering which helps alpha-beta. You wanna try moves that help yourself or hurt your opponent first.

Future work

Enumerating wall frontiers instead of whole wall configurations

You really shouldn't actually care about the full wall state. You really only care about the exact illegality frontier. That is, the minimal set of walls on the current board that causes the illegality.

Quoridor illegality frontier A 9 by 9 Quoridor board with a minimal teal wall frontier crossing the board and unrelated slate gray walls elsewhere.

Because illegality is monotonic, once you have such a minimal set, any additional walls are not making the board "more illegal".

So a better algorithm is to enumerate all possible such minimal illegality frontiers instead of all possible wall configurations. Then, implement some clever datastructure that can efficiently check if any illegality frontier is a subset of the current wall state.
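For reference, the naive baseline is a linear scan. If each frontier and the current wall state are bitmasks over the same wall slots, a frontier is a subset of the state exactly when ANDing them gives the frontier back. This sketch is not the clever data structure, just the check it would need to beat, and the names are hypothetical.

```python
def is_illegal(wall_bits, frontiers):
    """True if any minimal illegality frontier is a subset of the walls."""
    return any((frontier & wall_bits) == frontier for frontier in frontiers)
```

For example, with frontiers `[0b0110, 0b1001]`, the wall state `0b0111` is illegal because it contains every wall in `0b0110`, while `0b0101` contains neither frontier in full.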

Unfortunately, this probably still isn't enough to solve 9x9 Quoridor. The branching factor is still too high even if you could do legality checks in 1 nanosecond.

This method could probably be squeezed to scale to 6x6 or maybe even 7x7 though.

Enumerating all possible paths instead of walls

Rather than enumerating all possible wall configurations or frontiers, you can enumerate all possible paths, then perhaps there is a clever way to efficiently prune the set of valid paths given a current wall configuration. If the set of valid paths is empty, the state is illegal.

I've thought less hard about this one but I feel there is probably something there. The number of possible paths from all squares is pretty tractable for 7x7 — just a few billion.

Divide-and-conquer tiles

There is probably something to be done where you divide the 9x9 board into smaller tiles, e.g. nine 3x3 tiles.

Then you can precompute every possible tile. Each tile is keyed by its wall configuration + goal-reachability situation. A tile's goal-reachability situation is a function of its wall configuration + its neighbors' wall configurations and reachability situations.

This turns your 9x9 pathfinding into a 3x3 pathfinding which should be around 10x faster. It's unclear to me if this is useful for solving, but it is probably useful for playing.

Closing thoughts

I'll keep thinking about this problem in the shower every few months and hopefully have some more insights.

It's interesting that the top LLMs — even tasked with grinding on this problem overnight in a harness — don't come close to this insight (tried with gpt-5.5-xhigh in codex CLI). It would be interesting to keep this as a private eval out of the training data, but publishing it is also valuable. It will be interesting to see if future LLMs can improve on this result.

I did of course use coding agents to implement all the code for the most recent version of this project, after providing them the high-level insights described above. I've implemented enough Quoridor bitboards for one lifetime and wouldn't have spent such time revisiting this project without that accelerant.

Call for others

If any enterprising young person is confident they can train an AlphaZero style Quoridor bot for a few hundred dollars of cloud GPU, reach out to me with your proposal and I'd be interested in funding it. I'm really curious what superhuman Quoridor looks like.
