Learning How Learning Works

Is it possible for large language models (LLMs) to successfully learn “impossible” languages?

That’s the question at the center of an ongoing debate among linguists and data scientists. However, the answer isn’t just a matter of scholarly research. The ability or inability of LLMs to learn so-called “impossible” languages has broader implications in terms of both how LLMs learn and the global societal impacts of LLMs.

Languages that deviate from natural linguistic structures, referred to as impossible languages, typically fall into two categories. The first is not a true language at all, but an artificially constructed one whose arbitrary rules cannot be followed while still making sense. The second includes languages with non-standard characters or grammar, such as Chinese and Japanese.

Low-resource languages, meaning those with limited training data, such as Lao, often face challenges similar to those of impossible languages. However, they are not considered impossible languages unless, like Burmese, they also use non-standard characters.

Revisiting impossible languages

In 2023, Noam Chomsky, considered the founder of modern linguistics, wrote that LLMs “learn humanly possible and humanly impossible languages with equal facility.”

However, in the Mission: Impossible Language Models paper, which received a Best Paper award at the 2024 Association for Computational Linguistics (ACL) conference, researchers shared the results of their test of Chomsky’s claim, finding that language models actually struggle to learn languages that deviate from natural linguistic structures.

Rogers Jeffrey Leo John, CTO of DataChat Inc., a company that he cofounded while working at the University of Wisconsin as a data science researcher, said the Mission: Impossible paper challenged the idea that LLMs can learn impossible languages as effectively as natural ones.

“The models [studied for the paper] exhibited clear difficulties in acquiring and processing languages that deviate significantly from natural linguistic structures,” said John. “Further, the researchers’ findings support the idea that certain linguistic structures are universally preferred or more learnable both by humans and machines, highlighting the importance of natural language patterns in model training. This finding could also explain why LLMs, and even humans, can grasp certain languages easily and not others.”

Measuring the difficulty of an LLM learning a language

An LLM’s fluency in a language falls on a broad spectrum, from predicting the next word in a partial sentence to answering a question, and individual users and researchers often bring different definitions and expectations of fluency to the table. Understanding LLMs’ issues with processing impossible languages therefore starts with defining how researchers, and linguists in general, determine whether a language is difficult for an LLM to learn.

Kartik Talamadupula, a Distinguished Architect (AI) at Oracle who previously was head of Artificial Intelligence at Wand Synthesis AI, an AI platform integrating AI agents with human teams, said that when it comes to measuring the ability of an LLM, the bar is always predicting the next token (or word).

“Behavior like ‘answering questions’ or ‘logical reasoning’ or any of the other things that are ascribed to LLMs are just human interpretations of this token completion behavior,” said Talamadupula. “Training on additional data for a given language will only make the model more accurate in terms of predicting that next token, and sequentially, the set of all next tokens, in that particular language.”

John explained that a model internalizes statistical patterns, the probabilities with which words, phrases, and complex ideas co-occur, from exposure to billions or trillions of examples; with those patterns in place, it can model syntax, infer semantics, and even mimic reasoning. Once this next-word prediction skill is mastered in a language, the LLM can use it as a powerful training signal for higher-level capabilities.

“If a model sees enough questions and answers in its training data, it can learn: When a sentence starts with ‘What is the capital of France?’, the next few tokens are likely to be ‘The capital of France is Paris,’” said John. “Other capabilities, like question-answering, summarization, [and] translation can all emerge from that next-word prediction task, especially if you fine-tune or prompt the model in the right way.”
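
As a rough illustration of what that next-word prediction looks like in practice, the sketch below queries a small causal language model for its most probable continuations of a partial sentence. It assumes the Hugging Face transformers library and the public gpt2 checkpoint, neither of which is named in the article; any causal language model would behave analogously.

# Rough sketch of next-token prediction with a small causal language model.
# Assumes the Hugging Face transformers library and the public gpt2 checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, sequence_length, vocab_size)

next_token_logits = logits[0, -1]          # scores for the token that comes next
probs = torch.softmax(next_token_logits, dim=-1)
top = torch.topk(probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r}: {prob:.3f}")
# A well-trained model places tokens such as " Paris" near the top of this list;
# question answering and the other behaviors described above are built on exactly
# this ranking operation, applied one token at a time.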

Sanmi Koyejo, an assistant professor of computer science at Stanford University, said researchers also measure how quickly (in terms of training steps) a model reaches a certain performance threshold when determining whether a language is difficult to learn. He said the Mission: Impossible paper demonstrated that models often need more training on impossible languages to reach performance levels comparable to those they achieve on other languages.
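
That “training steps to reach a threshold” measure can be phrased very simply. The toy sketch below is purely illustrative, with made-up evaluation curves rather than numbers from the paper: it just finds the first logged step at which a language’s evaluation perplexity drops below a chosen threshold.

# Toy illustration of measuring learning speed as "steps to reach a threshold".
# The curves are made-up placeholders, not results from the Mission: Impossible paper.
def steps_to_threshold(curve, threshold):
    """curve: list of (training_step, eval_perplexity) pairs, in step order."""
    for step, perplexity in curve:
        if perplexity <= threshold:
            return step
    return None  # the threshold was never reached within the logged steps

natural_language = [(1000, 310.0), (2000, 120.0), (3000, 55.0), (4000, 38.0)]
impossible_language = [(1000, 420.0), (2000, 250.0), (3000, 140.0), (4000, 90.0)]

print(steps_to_threshold(natural_language, threshold=60.0))     # 3000
print(steps_to_threshold(impossible_language, threshold=60.0))  # None: needs more training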

Low volume of training data increases difficulty

An LLM learns everything, including language and grammar, through training data. If a topic or language does not have sufficient training data, the LLM’s ability to learn it is significantly limited. The majority of high-quality training data is currently in Chinese and English, and many non-standard languages are effectively impossible for LLMs to learn due to the lack of sufficient data.

Talamadupula said that non-standard languages such as Korean, Japanese, and Hindi often have the same issue as low-resource languages with standard characters: not enough data for training. This dearth of data makes it difficult to accurately model the probability of next-token generation. When asked about the challenge of understanding implied subjects, which are common in many non-Western languages, he said that LLMs do not actually understand the subject of a sentence at all.

“Based on their training data, they just model the probability that a given token, or word, will follow a set of tokens that have already been generated. The more data that is available in a given language, the more accurate the ‘completion’ of a sentence is going to be,” he said.

“If we were to somehow balance all the data available and train a model on a regimen of balanced data across languages, then the model would have the same error and accuracy profiles across languages,” said Talamadupula.

John agreed that because an LLM’s ability to learn a language stems from probability distributions, both the volume and quality of training data significantly influence how well it performs across different languages. Because English and Chinese content dominates most training datasets, LLMs have higher fluency, deeper knowledge, and stronger capabilities in those languages.

“Ultimately, this stems from how LLMs learn languages—through probability distributions. They develop linguistic understanding by being exposed to examples. If a model sees only a few thousand instances of a language, like Xhosa, compared to trillions of English tokens, it ends up learning unreliable token-level probabilities, misses subtleties in grammar and idiomatic usage, and struggles to form strong conceptual links between ideas and their linguistic representations,” said John.
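
John’s point about unreliable token-level probabilities can be seen even in a toy counting model. The sketch below is purely illustrative, using tiny made-up word sequences rather than real Xhosa or English data, and a simple bigram estimate in place of an LLM:

# Toy illustration: token-level probability estimates are brittle when data is scarce.
# The "corpora" here are made-up miniature examples, not real training data.
from collections import Counter, defaultdict

def bigram_probabilities(corpus):
    """Maximum-likelihood estimate of P(next_word | word) from raw counts."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for word, next_word in zip(words, words[1:]):
            counts[word][next_word] += 1
    return {word: {nxt: count / sum(nexts.values()) for nxt, count in nexts.items()}
            for word, nexts in counts.items()}

small_corpus = ["the cat sat", "the dog ran"]               # only a few examples
large_corpus = small_corpus * 1000 + ["the cat ran"] * 500  # far more evidence

print(bigram_probabilities(small_corpus)["the"])  # {'cat': 0.5, 'dog': 0.5}: one new
                                                  # sentence would shift this sharply
print(bigram_probabilities(large_corpus)["the"])  # estimates settle as counts accumulate

An LLM’s learned distributions are far richer than bigram counts, but the underlying issue is the same: with only a few thousand examples of a language, the estimated probabilities are noisy, and many perfectly valid continuations are never observed at all.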

Language structure also affects the ability to learn

Research also increasingly shows that the structure of the target language plays a role. Koyejo said the Mission: Impossible paper supports the idea that information locality (related words being close together) is an important property that makes languages learnable by both humans and machines.

“When testing various impossible languages, the researchers of the Mission: Impossible Language Models paper found that randomly shuffled languages (which completely destroys locality) were the hardest for models to learn, showing the highest perplexity scores,” said Koyejo. The Mission: Impossible paper defined perplexity as a coarse-grained metric of language learning. Koyejo also explained that languages created with local ‘shuffles’, where words were rearranged only within small windows, were easier for models to learn than languages with global shuffles.

“The smaller the window size, the easier the language was to learn, suggesting that preserving some degree of locality makes a language more learnable,” said Koyejo. “The researchers observed a clear gradient of difficulty—from English (high locality) → local shuffles → even-odd shuffles → deterministic shuffles → random shuffles (no locality). This gradient strongly suggests that information locality is a key determinant of learnability.”
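
Perplexity itself is straightforward to compute from a model’s average cross-entropy loss. The sketch below assumes the Hugging Face transformers library and the public gpt2 checkpoint rather than the models and corpora used in the paper; it compares a sentence in its natural order with a randomly shuffled version that destroys locality.

# Sketch: perplexity of a causal LM on natural versus randomly shuffled word order.
# Assumes the transformers library and the public gpt2 checkpoint; the paper's own
# models and test sets are not reproduced here.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text):
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss  # mean cross-entropy per token
    return torch.exp(loss).item()                       # perplexity = exp(cross-entropy)

sentence = "the children walked to the small school near the old stone bridge"
words = sentence.split()
shuffled = " ".join(random.sample(words, len(words)))   # a global shuffle destroys locality

print(perplexity(sentence))   # natural word order: lower perplexity
print(perplexity(shuffled))   # shuffled order: typically far higher perplexity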

Koyejo also pointed out that another critical element in a model learning a non-standard language is tokenization, with the character systems of East Asian languages creating special challenges. For example, Japanese mixes multiple writing systems, and Korean combines its alphabet’s letters into syllable blocks. He said that progress in those languages will require more data and architectural innovations that better suit their unique properties.

“Neither language uses spaces between words consistently. This means standard tokenization methods often produce sub-optimal token divisions, creating inefficiencies in model learning,” said Koyejo. “Our studies on Vietnamese, which shares some structural properties with East Asian languages, highlight how proper tokenization dramatically affects model performance.”
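
The tokenization problem is easy to observe directly. The sketch below assumes the gpt2 byte-pair-encoding tokenizer from the transformers library (not any model from the studies Koyejo mentions), and the two sentences are merely illustrative stand-ins:

# Sketch: how a mostly-English BPE tokenizer splits spaced versus unspaced text.
# Assumes the transformers library and the public gpt2 tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

english = "I am going to school today."
japanese = "私は今日学校に行きます。"   # roughly the same sentence, written without spaces

for text in (english, japanese):
    tokens = tokenizer.tokenize(text)
    print(f"{len(text)} characters -> {len(tokens)} tokens: {text}")

# The English sentence maps to roughly one token per word, while the Japanese
# sentence is broken into many byte-level fragments. Splits like these are one
# source of the inefficiency in model learning that Koyejo describes.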

Insights into learning

The challenge of LLMs learning non-standard languages is both interesting and impactful, and the issues provide key insights into how LLMs actually learn. The Mission: Impossible Language Models paper also reaches this conclusion, stating, “We argue that there is great value in treating LLMs as a comparative system for human languages in understanding what systems like LLMs can and cannot learn.”

Aaron Andalman, chief science officer and co-founder of Cognitiv and a former MIT neuroscientist, expanded on the paper’s conclusion, adding that LLMs don’t merely learn linguistic structures but also implicitly develop substantial knowledge about the world during their training, which deepens their understanding of the languages they are trained on.

“Effective language processing requires understanding context, which encompasses concepts, relationships, facts, and logical reasoning about real-world situations,” said Andalman. “Consequently, as models grow larger and undergo more extensive training, they accumulate more extensive and nuanced world knowledge.”


Fizz Buzz in CSS

What is the smallest CSS code we can write to print the Fizz Buzz sequence? I think it can be done in four lines of CSS as shown below:

li { counter-increment: n }
li:not(:nth-child(5n))::before { content: counter(n) }
li:nth-child(3n)::before { content: "Fizz" }
li:nth-child(5n)::after { content: "Buzz" }

Here is a complete working example: css-fizz-buzz.html.

I am neither a web developer nor a code-golfer. I am just an ordinary programmer playing on the sea-shore and diverting myself in now and then finding a rougher pebble or an uglier shell than ordinary, whilst the great ocean of absurd contraptions lay all undiscovered before me.

Seasoned code-golfers looking for a challenge can probably shrink this solution further. However, such wizards are also likely to scoff at any mention of counting lines of code, since this mind sport treats such measures as pointless when all of CSS can be collapsed into a single line. The number of bytes is probably more meaningful. The code can also be minified slightly by removing all whitespace:

$ curl -sS https://susam.net/css-fizz-buzz.html | sed -n '/counter/,/after/p' | tr -d '[:space:]'
li{counter-increment:n}li:not(:nth-child(5n))::before{content:counter(n)}li:nth-child(3n)::before{content:"Fizz"}li:nth-child(5n)::after{content:"Buzz"}

This minified version is composed of 152 characters:

$ curl -sS https://susam.net/css-fizz-buzz.html | sed -n '/counter/,/after/p' | tr -d '[:space:]' | wc -c
152

If you manage to create a shorter solution, please do leave a comment.


The Web Runs On Tolerance

If you've ever tried to write a computer program, you'll know the dread of a syntax error. An errant space and your code won't compile. Miss a semi-colon and the world collapses. Don't close your brackets and watch how the computer recoils in distress.

The modern web isn't like that.

You can make your HTML as malformed as you like and the web-browser will do its best to display the page for you. I love the todepond website, but the source-code makes me break out in a cold sweat. Yet it renders just fine.

Sure, occasionally there are weird artefacts. But the web works because browsers are tolerant.

You can be crap at coding and the web still works. Yes, it takes an awful lot of effort from browser manufacturers to make "do what I mean, not what I say" a reality. But the world is better for it.

That's the crucial mistake that XHTML made. It was an attempt to bring pure syntactic rigour to the web. It had an intolerant ideology. Every document had to precisely conform to the specification. If it didn't, the page was irrevocably broken. I don't mean broken like a weird layout glitch, I mean broken like this:

XML Parsing Error: mismatched tag. Expected: </h1>.
Location: https://example.com/test.xhtml Line Number 9, Column 5:

The user experience of XHTML was rubbish. The disrespect shown to anyone for deviating from the One True Path made it an unwelcoming and unfriendly place. Understandably, XHTML is now a mere footnote on the web. Sure, people are free to use it if they want, but its unforgiving nature makes it nobody's first choice.

The beauty of the web as a platform is that it isn't a monoculture.

That's why it baffles me that some prominent technologists embrace hateful ideologies. I'm not going to give them any SEO-juice by linking to them, but I cannot fathom how someone can look at the beautiful diversity of the web and then declare that only pure-blooded people should live in a particular city.

How do you acknowledge that the father of the computer was a homosexual, brutally bullied by the state into suicide, and then fund groups that want to deny gay people fundamental human rights?

The ARM processor which powers the modern world was co-designed by a trans woman. When you throw slurs and denigrate people's pronouns, your ignorance and hatred does a disservice to history and drives away the next generation of talent.

History shows us that all progress comes from the meeting of diverse people, with different ideas, and different backgrounds. The notion that only a pure ethnostate can prosper is simply historically illiterate.

This isn't an academic argument over big-endian or little-endian. It isn't an ideological battle about the superiority of your favourite text editor. There's no good-natured ribbing about which desktop environment has the better design philosophy.

Denying rights to others is poison. Wishing violence on people because of their heritage is harmful to all of us.

Do we want all computing to go through the snow-white purity of Apple Computer? Have them as the one and only arbiters of what is and isn't allowed? No. That's obviously terrible for our ecosystem.

Do we want to segregate computer users so that an Android user can never connect their phone to a Windows machine, or make it impossible for Linux laptops to talk to Kodak cameras? That sort of isolation should be an anathema to us.

Why then align with people who espouse isolationism? Why gleefully cheer the violent racists who terrorise our communities? Why demean people who merely wish to exist?

The web runs on tolerance. Anyone who preaches the ideology of hate has no business here.


Why Does A.I. Write Like … That?

The explosion of em dashes and negation (“it’s not X, it’s Y”) in LinkedIn posts and college essays over the past few years has an obvious culprit, and it rhymes with “fartificial fintelligence.” But rather than relying on pedantic ire, Sam Kriss sets out to take the measure of AI writing’s sundry and various shortcomings. Required reading for the next time someone tells you that “no, it’s actually quite good!” Good for data analysis, sure. Good for compelling prose? Maybe not quite.

What nobody really anticipated was that inhuman machines generating text strings through essentially stochastic recombination might be funny. But GPT had a strange, brilliant, impressively deadpan sense of humor. It had a habit of breaking off midway through a response and generating something entirely different. . . . When I tried to generate some more newspaper headlines, they included “A Gun Is Out There,” “We Have No Solution” and “Spiders Are Getting Smarter, and So, So Loud.”

I ended up sinking several months into an attempt to write a novel with the thing. It insisted that chapters should have titles like “Another Mountain That Is Very Surprising,” “The Wetness of the Potatoes” or “New and Ugly Injuries to the Brain.” The novel itself was, naturally, titled “Bonkers From My Sleeve.” There was a recurring character called the Birthday Skeletal Oddity. For a moment, it was possible to imagine that the coming age of A.I.-generated text might actually be a lot of fun.


Why Are 38 Percent of Stanford Students Saying They're Disabled?

Illustration: Eddie Marshall | Midjourney

The students at America's elite universities are supposed to be the smartest, most promising young people in the country. And yet, shocking percentages of them are claiming academic accommodations designed for students with learning disabilities. 

In an article published this week in The Atlantic, education reporter Rose Horowitch lays out some shocking numbers. At Brown and Harvard, 20 percent of undergraduate students are disabled. At Amherst College, that's 34 percent. At Stanford University, it's a galling 38 percent. Most of these students are claiming mental health conditions and learning disabilities, like anxiety, depression, and ADHD. 

Obviously, something is off here. The idea that some of the most elite, selective universities in America—schools that require 99th percentile SATs and sterling essays—would be educating large numbers of genuinely learning disabled students is clearly bogus. A student with real cognitive struggles is much more likely to end up in community college, or not in higher education at all, right?

The professors Horowitch interviewed largely back up this theory. "You hear 'students with disabilities' and it's not kids in wheelchairs," one professor told Horowitch. "It's just not. It's rich kids getting extra time on tests." Talented students get to college, start struggling, and run for a diagnosis to avoid bad grades. Ironically, the very schools that cognitively challenged students are most likely to attend—community colleges—have far lower rates of disabled students, with only three to four percent of such students getting accommodations.

To be fair, some of the students receiving these accommodations do need them. But the current language of the Americans with Disabilities Act (ADA) allows students to get expansive accommodations with little more than a doctor's note.

While some students are no doubt seeking these accommodations as semi-conscious cheaters, I think most genuinely identify with the mental health condition they're using to get extra time on tests. Over the past few years, there's been a rising push to see mental health and neurodevelopmental conditions as not just a medical fact, but an identity marker. Will Lindstrom, the director of the Regents' Center for Learning Disorders at the University of Georgia, told Horowitch that he sees a growing number of students with this perspective. "It's almost like it's part of their identity," Lindstrom told her. "By the time we see them, they're convinced they have a neurodevelopmental disorder." 

What's driving this trend? Well, the way conditions like ADHD, autism, and anxiety get talked about online—the place where most young people first learn about these conditions—is probably a contributing factor. Online creators tend to paint a very broad picture of the conditions they describe. A quick scroll of TikTok reveals creators labeling everything from always wearing headphones, to being bad at managing your time, to doodling in class as a sign that someone may have a diagnosable condition. According to these videos, who isn't disabled? 

The result is a deeply distorted view of "normal." If ever struggling to focus or experiencing boredom is a sign you have ADHD, the implication is that a "normal," nondisabled person has essentially no problems. A "neurotypical" person, the thinking goes, can churn out a 15-page paper with no hint of procrastination, maintain perfect focus during a boring lecture, and never experience social anxiety or awkwardness. This view is bolstered by the current way many of these conditions are diagnosed. As Horowitch points out, when the latest edition of the DSM, the manual psychiatrists use to diagnose patients, was released in 2013, it significantly lowered the bar for an ADHD diagnosis. When the definition of these conditions is set so liberally, it's easy to imagine a highly intelligent Stanford student becoming convinced that any sign of academic struggle proves they're learning disabled, and any problems making friends are a sign they have autism.

Risk-aversion, too, seems like a compelling factor driving bright students to claim learning disabilities. Our nation's most promising students are also its least assured. They are so afraid of failure—of bad grades, of a poorly received essay—that they take any sign of struggle as evidence of a diagnosable condition. A few decades ago, a student who entered college and found the material harder to master and their time less easily managed than in high school would have been seen as relatively normal. Now, every time she picks up her phone, a barrage of influencers is clamoring to tell her this is a sign she has ADHD. Discomfort and difficulty are no longer perceived as typical parts of growing up.

In this context, it's easy to read the rise of academic accommodations among the nation's most intelligent students as yet another manifestation of the risk-aversion endemic in the striving children of the upper middle class. For most of the elite-college students who receive them, academic accommodations are a protection against failure and self-doubt. Unnecessary accommodations are a two-front form of cheating—they give you an unjust leg-up on your fellow students, but they also allow you to cheat yourself out of genuine intellectual growth. If you mask learning deficiencies with extra time on tests, soothe social anxiety by forgoing presentations, and neglect time management skills with deadline extensions, you might forge a path to better grades. But you'll also find yourself less capable of tackling the challenges of adult life.

The post Why Are 38 Percent of Stanford Students Saying They're Disabled? appeared first on Reason.com.


What '67' Reveals About Childhood Creativity

Have you heard about 2025’s word of the year? It’s causing a bit of controversy because it’s actually not a word. “67” (pronounced six-seven) is all the rage with Gen Alpha, and the phrase is often accompanied by an up-and-down hand movement.

Though it originated in the lyrics of a song by Philadelphia rapper Skrilla, and even in that context doesn’t mean anything in particular, it has become inescapable in 2025, causing outright bans on the phrase in classrooms as well as extensive head scratching by parents.

The “67” phenomenon has been, much like the rest of Gen Alpha’s vernacular, attributed to algorithms and brainrot culture. But other than its initial spread via TikTok, there’s not much that separates “67” from centuries of absurd, nonsensical kid culture.

This whole “67” thing may be foreign to you, but you probably grew up singing “Miss Mary Mack” or shouting “Kobe!” or drawing a Superman “S” in your notebooks—or something along those lines. These are all examples of children’s culture studied by Iona and Peter Opie. And their work might be the key to finding the meaning within the seemingly meaningless “67.”

The Opies were a British couple who dedicated their lives to the study of children’s folklore, games, traditions, and beliefs.

Their first book was a collection of nursery rhymes, but the Opies published numerous books which fully documented child culture—not as it was remembered by adults later in life, but as it actually was, existing and evolving in real-time.

From the 1950s through the 1980s, Iona and Peter traveled throughout the UK, observing children on playgrounds and in schools, recording their rhymes and interviewing them about their pastimes. They also built up a network of hundreds of teachers, parents, academics, and children themselves all around the English-speaking world, who filled out surveys and corresponded with the Opies.

Thousands of children ended up contributing directly to the Opies’ fieldwork, and their many published books and extensive archive, currently held by Oxford’s Bodleian Library, are an incredibly valuable trove of firsthand documentation of the lives of postwar children in the UK and elsewhere.

The Opies were outsiders to the academic establishment, technically amateurs without degrees, who nevertheless made an enormous impact on the fields of folklore, childhood studies, and ethnology.

Part of their obsession with documenting children’s traditions had to do with refuting an idea, common at the time, that television and mass media were “ruining” childhood. (Sound familiar?)

The Opies proved that childhood culture was as vibrant and alive as it had ever been. Children had their own world of lore and superstition: knocking on their own heads for good luck, because “blockhead,” slang for an idiot, implied your head was made of wood; avoiding stepping on cracks in the pavement; sitting cross-legged for good luck during exams and tests.

Many of the common rhymes and verses beloved by children were found by the Opies to have originated, much like “67,” in the lyrics of popular music. But unlike the novelty of “67,” one of the most fascinating qualities of this oral tradition was its historical nature. Songs perceived by children to be the hot new thing on the playground actually had their origins in popular songs or poems of decades if not centuries before.

One rhyme was tracked from its origin in a 1725 ballad about a drunken soldier to a contemporary playground couplet in 1954.

Other rhymes and phrases still in common use in the 1950s had, the Opies discovered, actually originated in minstrel and music-hall tunes from the 1840s through the 1880s.

The Opies didn’t use the word “meme” (the term was coined by Richard Dawkins in his 1976 book The Selfish Gene), but they were essentially demonstrating that these rhymes were memes, passed along from child to child in a long unbroken chain and modified somewhat from generation to generation as they mutated to survive.

Though “67” doesn’t rhyme, it has a great deal in common with the memetic rhymes the Opies collected. In The Lore and Language of Schoolchildren, they wrote:

“[Rhymes] seem to be one of their means of communication with each other. Language is still new to them, and they find difficulty in expressing themselves. When on their own they burst into rhyme, of no recognizable relevancy, as a cover in unexpected situations, to pass off an awkward meeting, to fill a silence, to hide a deeply felt emotion, or in a gasp of excitement.”

This is the same way Gen Alpha kids today, to their teachers’ and parents’ consternation, drop “67” in the middle of conversations, or laugh uncontrollably when it comes up in math class.

The Opies went on, “And through these quaint ready-made formulas the ridiculousness of life is underlined, the absurdity of the adult world and their teachers proclaimed, danger and death mocked, and the curiosity of language itself is savoured.”

The ridiculousness and pointlessness of “67” is perhaps why it has succeeded so extravagantly as a meme, breaking out of the classroom to become Word of the Year: it perfectly encapsulates everything the Opies understood that kids need out of their private jokes.

So is “67” a sign that screens and algorithms are “ruining childhood” with “brainrot”? Far from it—this trend actually shows that, despite a screen-mediated culture, kids are still managing to generate new entries in the playground canon.

Of course now instead of being preserved from generation to generation, these memes are being replaced at the speed of the internet by new rhymes, jokes, and phrases straight from the feed.

Will kids still be saying “67” in 67 years, or will it have been forgotten? Only time will tell.
