The Land of Giants, a conceptual proposal to build power line towers so that they look like people.


Accessible by Design: The Role of the 'lang' Attribute

by Todd Libby

When starting a project, whether it is a web application, a mobile app, or just a website in general, I still see an alarming number of examples where the language attribute is not included in the <html> element: not the <!DOCTYPE>, but the element directly after it.

Having audited many sites and many frameworks over the years, I have noticed an alarming omission right from the outset when developers build sites or applications, especially in the mobile space. And let's face it: in web development we tend to make things for ourselves, and if it works on our computer, it must work everywhere. Right?

I see this more and more these days. Surveys show that accessibility education in universities and boot camps is still lacking: new developers enter the field unaware, and framework authors either don't know, don't understand, or simply don't make their work accessible.

I am here to discuss the importance of the language attribute in your code.

The Attribute and the Importance of the Language Used

Sometimes, a tiny detail can make or break the experience for millions of users. One of these tiny, powerful details is the lang attribute in your HTML.

The lang attribute is a simple piece of code that tells web browsers and screen readers what human language your page is written in. For example:

<html lang="en"> means the page is in English.
<html lang="es"> means the page is in Spanish.

When you forget this attribute, you're not just missing a semantic tag—you're creating a major accessibility barrier. If you don't tell the computer what language you're using, assistive tools won't know how to read your content correctly.

There Is Data Here and You Should Read It

The WebAIM Million Report is an annual accessibility evaluation by WebAIM of the top one million homepages on the internet. 2025 marked the seventh year it has been run, and the results are not surprising.

Let's look at the data for the language attribute.

A graph showing the top six accessibility issues found in the top one million websites by WebAIM. Low contrast of text is number one followed by missing alt text, missing labels, empty links, empty buttons and finally missing language attribute.

For the seventh year in a row, a missing document language made the list.

A graph showing the top six accessibility issues found in the top one million websites by WebAIM by year starting in 2019 up to 2025. Low contrast of text is number one followed by missing alt text, missing labels, empty links, empty buttons and finally missing language attribute.

As with the rest of the items in the data, this has been a common theme for the last seven years: the missing language attribute has always been the last item on the repeating list of common failures. So what are the implications?

A numerical look shows the data still trending toward the same six problems in the report. So why do these particular issues stay in the top six?

The WebAIM Million Report showing the percentage of the top million websites tested with each issue: low contrast of text at 79.1%, followed by missing alternative text for images at 55.5%, missing form input labels at 48.2%, empty links at 45.4%, empty buttons at 29.6%, and finally missing language attribute at 15.8%.

What Happens When the Language is Missing? The Wrong Voice Problem

The main group affected by a missing lang attribute is screen reader users. Screen readers are essential tools that read web content aloud. They are mainly used by people who are blind or have low vision, and by anyone who relies on text-to-speech. They are also used by people who find reading difficult for other reasons, a common practice among people with ADHD (Attention-Deficit/Hyperactivity Disorder).

Screen readers don't just use one voice; they use specialized software packages for each language. This software knows the pronunciation rules, rhythm, and stress for English, French, Japanese, etc.

When your page is missing the lang attribute, the screen reader has to guess the language. It usually guesses based on the user's computer settings (for example, if the user lives in Germany, the screen reader will try to use the German voice).

Example: English Text Read by a German Voice

Imagine your entire website is in clear English. If a German screen reader tries to read it, it will apply German pronunciation rules.

“The” might sound like “Tee-hay.”

or;

“Data” might be pronounced with a hard ‘A’ sound instead of a soft one.

The result is garbled, unnatural, and often unintelligible speech. The text is still on the page, but for the screen reader user, the content is lost. They cannot understand your article, buy your product, or use your service.

This single small mistake transforms your helpful website into a frustrating, unusable experience.

It's a Rule, Not a Suggestion (WCAG)

Using the lang attribute isn't just a friendly suggestion; it's a core requirement for making your website accessible.

The Web Content Accessibility Guidelines (WCAG) are the international standard for web accessibility. WCAG Success Criterion 3.1.1 (Language of Page) states that the language of the page must be clear to the computer. This is a level ‘A’ requirement, which means it's mandatory for basic accessibility.

If your website fails this check, it is officially considered inaccessible.

How It Affects Other Tools

The lang attribute helps more than just screen readers:

1. Braille Displays

A refreshable braille display translates text into small patterns of raised bumps. Different languages use different contraction rules in braille (called Grade 2 braille). If the language is not set, the braille translator might use the wrong rules, turning clear text into meaningless gibberish for the braille reader.

2. Automated Translation

When a user relies on tools like Google Translate or a browser's built-in translation feature, telling the tool the source language (the language you wrote it in) ensures a much more accurate translation. If the source language is unclear, the translation quality drops sharply.

3. Quotation Marks

The lang attribute also helps the browser and other user agents select the correct typographical glyphs for quotation marks, especially when the <q> and <blockquote> elements are styled with CSS generated content such as content: open-quote. For example:

  • In English lang="en", quotes are typically “double quotes”.

  • In German lang="de", they are often rendered as „low-9 quotes“.

  • In French lang="fr", they use « guillemets ».

While less related to visual quotation marks, providing the correct language helps assistive technologies pronounce the surrounding text accurately, ensuring a fluid and comprehensible reading experience.

If the correct language is not provided, browsers may fall back to the user's system language or a neutral setting for quotation marks. The result may not match the document's language, producing incorrect or confusing typography (e.g., English quote marks around German text).

Without a declared language, a screen reader may also attempt to read the text using the wrong phonetic rules, voice, and accent, which makes the content sound like gibberish and can render it incomprehensible for users who rely on audio output.
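The language-to-glyph mapping can be made explicit in a stylesheet. This is an illustrative sketch of my own (the selectors and quote pairs are examples, not taken from the article): the :lang() pseudo-class matches elements by their declared language, so the same <q> element renders different quotation marks depending on the lang in effect.

```css
/* Illustrative sketch: quotation glyphs driven by the declared language.
   Browsers already do this by default for <q>; these rules make it explicit. */
q:lang(en) { quotes: "“" "”"; }       /* English “double quotes” */
q:lang(de) { quotes: "„" "“"; }       /* German „low-9 quotes“ */
q:lang(fr) { quotes: "« " " »"; }     /* French « guillemets » */
```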

4. Hyphenation

Proper hyphenation is entirely language-dependent, and hyphenation rules can be complex and unique to each language. When the CSS declaration hyphens: auto is used, the browser or user agent relies on the lang attribute to load the appropriate hyphenation dictionary and apply the correct linguistic rules, which improves text flow and readability, especially in justified or narrow columns.

For example, a long compound word in German, lang="de", will be broken according to German rules such as Rechtsschutzversicherungsgesellschaften (which means, insurance companies providing legal protection).

Most browsers do not provide automatic hyphenation at all if the language is not declared. This can lead not only to unsightly text blocks with excessive white space between words, but also to horizontal scrolling or overflow on mobile devices, which severely impacts readability and layout stability.

If the browser attempts to guess the language, or falls back to the wrong default, it can apply the wrong hyphenation rules and break words in places that are linguistically incorrect, which in turn confuses the reader.
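As a small sketch of my own (the class name is hypothetical): enabling automatic hyphenation only pays off when the language is declared, because the hyphenation dictionary is chosen from the lang attribute on the element or one of its ancestors.

```css
/* Illustrative sketch: hyphens: auto needs a declared language, e.g.
   <html lang="de">, before the browser can pick a dictionary. */
p.article-text {
  hyphens: auto;        /* ask the browser to hyphenate */
  text-align: justify;  /* narrow, justified columns benefit the most */
  max-width: 18em;
}
```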

What About Pages with Two Languages?

What if your page is mostly English but includes a quote in French? If you don't do anything, the screen reader will read the French quote using the English voice, again leading to mispronunciation.

You can fix this instantly by adding the lang attribute to the specific element that changes language:

<p lang="en">
The artist once said, "Always remember this phrase:
<span lang="fr">Je ne regrette rien.</span>" I think that sums up his career.
</p>

In this code, the screen reader switches to the French voice for the quote and then immediately switches back to the English voice for the rest of the sentence. This small change ensures all users hear the content exactly as intended.

How to Set the Language in Modern Web Frameworks

In modern websites built with tools like React, Vue, or Angular, you usually don't touch the main HTML file very often. Since these tools mostly control the content inside the <body> tag, you have to know where to find the root template file to set the lang attribute correctly.

React, for example, uses public/index.html, so you place the attribute directly on the <html> tag in that file.

Framework, file to edit, and where to put the code:

  • React: public/index.html; directly on the <html> tag in that file.
  • Next.js: app/layout.tsx (or a similar root file); set the lang in the JSX for the root <html> element.
  • Vue: public/index.html; directly on the <html> tag in that file.
  • Nuxt: nuxt.config.ts; inside the app.head.htmlAttrs setting in your config file.
  • Angular: src/index.html; directly on the <html> tag in that file.
  • Svelte/SvelteKit: index.html or src/app.html; directly on the <html> tag in the main template file.
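For Nuxt, the setting lives in configuration rather than a template file. A minimal sketch, assuming a Nuxt 3 project and its defineNuxtConfig helper:

```ts
// nuxt.config.ts: a minimal sketch for a Nuxt 3 project.
export default defineNuxtConfig({
  app: {
    head: {
      htmlAttrs: {
        lang: 'en', // declare the document language here
      },
    },
  },
})
```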

Example: Setting the Language in a Static Template

For most simple apps (React, Angular, plain HTML), you will open your main index.html file and change the first line like this:

<!DOCTYPE html>
<!-- Change the line below from <html> to the correct language code -->
<html lang="en">
  <head>
    <!-- ... -->
  </head>
  <body>
    <!-- Your app code loads here -->
  </body>
</html>

Conclusion

The lang attribute is a tiny line of code that provides universal access to your content. It's arguably the easiest, fastest, and most impactful accessibility fix you can make on any website.

By correctly setting the language, you ensure that everyone has equal access to your content. Regardless of whether they use a screen reader, braille display, or translation tool to do so, their tools have the fundamental information they need to do their jobs correctly. It's a simple commitment that makes the web better for everyone.

Don't let a missing two-letter code turn your content into a foreign language for your users and don't be afraid to use it or add it in!


Magic Magikarp Makes Moves

A picture of a life-sized Magikarp from Pokémon

One of the most influential inventions of the 20th century was Big Mouth Billy Bass. A celebrity bigger than the biggest politicians or richest movie stars, there’s almost nothing that could beat Billy. That is, until [Kiara] from Kiara’s Workshop built a Magikarp version of Big Mouth Billy Bass.

Sizing in at over 2 entire feet, the orange k-carp is able to dance, it is able to sing, and it is able to stun the crowd. Magikarp functions the same way as its predecessor; a small button underneath allows the show to commence. Of course, this did not come without its challenges.

Starting the project was easy: just a model found online and some Blender fun to create a basic mold. Dissecting Big Mouth Billy Bass gave direct inspiration for how to construct the new idol in terms of servos and joints. Programming wasn't much of a hurdle either, with the use of Bottango for animations. Filling the mold with the silicone filling proved to be a bit more of a challenge.

After multiple attempts with some minor variations in procedure, [Kiara] got the fish star’s skin just right. All it took was a paint job and some foam filling to get the final touches. While this wasn’t the most mechanically challenging animatronic project, we have seen our fair share of more advanced mechanics. For example, check out this animatronic that sees through its own eyes!


Learning How Learning Works


Is it possible for large language models (LLMs) to successfully learn non-English languages?

That’s the question at the center of an ongoing debate among linguists and data scientists. However, the answer isn’t just a matter of scholarly research. The ability or inability of LLMs to learn so-called “impossible” languages has broader implications in terms of both how LLMs learn and the global societal impacts of LLMs.

Languages that deviate from natural linguistic structures, referred to as impossible languages, typically fall into two categories. The first is not a true language at all, but an artificially constructed one containing arbitrary rules that cannot be followed while still making sense. The other category consists of natural languages with non-standard characters or grammar, such as Chinese and Japanese.

Low-resource languages, meaning those with limited training data, such as Lao, often face similar challenges to impossible languages. However, they are not considered to be impossible languages unless they also include non-standard characters, such as Burmese.

Revisiting impossible languages

In 2023, Noam Chomsky, considered the founder of modern linguistics, wrote that LLMs “learn humanly possible and humanly impossible languages with equal facility.”

However, in the Mission: Impossible Language Models paper that received a Best Paper award at the 2024 Association of Computational Linguistics (ACL) conference, researchers shared the results of their testing of Chomsky’s theory, having discovered that language models actually struggle with learning languages with non-standard characters.

Rogers Jeffrey Leo John, CTO of DataChat Inc., a company that he cofounded while working at the University of Wisconsin as a data science researcher, said the Mission: Impossible paper challenged the idea that LLMs can learn impossible languages as effectively as natural ones.

“The models [studied for the paper] exhibited clear difficulties in acquiring and processing languages that deviate significantly from natural linguistic structures,” said John. “Further, the researchers’ findings support the idea that certain linguistic structures are universally preferred or more learnable both by humans and machines, highlighting the importance of natural language patterns in model training. This finding could also explain why LLMs, and even humans, can grasp certain languages easily and not others.”

Measuring the difficulty of an LLM learning a language

An LLM’s fluency in a language falls onto a broad spectrum, from predicting the next word in a partial sentence to answering a question. Additionally, individual users and researchers often bring different definitions and expectations of fluency to the table. Understanding LLMs’ issues with processing impossible languages starts by defining how the researchers, and linguists in general, determine whether a language is difficult for an LLM to learn. Kartik Talamadupula, a Distinguished Architect (AI) at Oracle who previously was head of Artificial Intelligence at Wand Synthesis AI, an AI platform integrating AI agents with human teams, said that when talking about measuring the ability of an LLM, the bar is always about predicting the next token (or word).

“Behavior like ‘answering questions’ or ‘logical reasoning’ or any of the other things that are ascribed to LLMs are just human interpretations of this token completion behavior,” said Talamadupula. “Training on additional data for a given language will only make the model more accurate in terms of predicting that next token, and sequentially, the set of all next tokens, in that particular language.”

John explained that when a model internalizes statistical patterns through probabilities of how words, phrases, and complex ideas co-occur, based on exposure to billions or trillions of examples, it can model syntax, infer semantics, and even mimic reasoning. With this skill mastered in a language, the LLM then uses it as a powerful training signal.

“If a model sees enough questions and answers in its training data, it can learn: When a sentence starts with ‘What is the capital of France?’, the next few tokens are likely to be ‘The capital of France is Paris,’” said John. “Other capabilities, like question-answering, summarization, [and] translation can all emerge from that next-word prediction task, especially if you fine-tune or prompt the model in the right way.”
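John's point, that capabilities like question-answering emerge from next-word prediction, can be illustrated with a deliberately tiny model. This is a toy sketch of my own (a bigram frequency table, nothing like a real LLM): it "answers" only in the sense that it emits the most likely continuation it has seen in training.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count, for each token, which tokens follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequently observed continuation of `token`."""
    return counts[token].most_common(1)[0][0]

corpus = [
    "the capital of France is Paris",
    "the capital of Italy is Rome",
]
model = train_bigram(corpus)
# Repeatedly predicting from "France" walks toward an 'answer':
# France -> is -> Paris (given this tiny corpus)
```

More data in a language sharpens these counts, which is exactly why low-resource languages yield unreliable next-token probabilities.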

Sanmi Koyejo, an assistant professor of computer science at Stanford University, said researchers also measure how quickly (in terms of training steps) a model reaches a certain performance threshold when determining if a language is difficult to learn or not. He said the Mission: Impossible paper demonstrated that for AIs to learn impossible languages, they often need more training on the data to reach performance levels comparable to those of other languages.

Low volume of training data increases difficulty

An LLM learns everything, including language and grammar, through training data. If a topic or language does not have sufficient training data, the LLM’s ability to learn it is significantly limited. The majority of high-quality training data is currently in Chinese and English, and many non-standard languages are impossible for LLMs to effectively learn, due to the lack of sufficient data.

Talamadupula said that non-standard languages such as Korean, Japanese, and Hindi often have the same issue as low-resource languages with standard characters: not enough data for training. This dearth of data makes it difficult to accurately model the probability of next-token generation. When asked about the challenge of non-Western languages with implied subjects, he said that LLMs do not actually understand a subject in a sentence.

“Based on their training data, they just model the probability that a given token, or word, will follow a set of tokens that have already been generated. The more data that is available in a given language, the more accurate the ‘completion’ of a sentence is going to be,” he said.

“If we were to somehow balance all the data available and train a model on a regimen of balanced data across languages, then the model would have the same error and accuracy profiles across languages,” said Talamadupula.

John agreed that because the ability of an LLM to learn a language stems from probability distributions, both the volume and quality of training data significantly influence how well an LLM performs across different languages. Because English and Chinese content dominate most training datasets, LLMs have a higher fluency, deeper knowledge, and stronger capabilities in those languages.

“Ultimately, this stems from how LLMs learn languages—through probability distributions. They develop linguistic understanding by being exposed to examples. If a model sees only a few thousand instances of a language, like Xhosa, compared to trillions of English tokens, it ends up learning unreliable token-level probabilities, misses subtleties in grammar and idiomatic usage, and struggles to form strong conceptual links between ideas and their linguistic representations,” said John.

Language structure also affects the ability to learn

Research also increasingly shows that the structure of the target language plays a role. Koyejo said the Mission: Impossible paper supports the idea that information locality (related words being close together) is an important property that makes languages learnable by both humans and machines.

“When testing various impossible languages, the researchers of the Mission: Impossible Language Models paper found that randomly shuffled languages (which completely destroys locality) were the hardest for models to learn, showing the highest perplexity scores,” said Koyejo. The Mission: Impossible paper defined perplexity as a coarse-grained metric of language learning. Koyejo also explained that languages created with local ‘shuffles’, where words were rearranged only within small windows, were easier for models to learn than languages with global shuffles.

“The smaller the window size, the easier the language was to learn, suggesting that preserving some degree of locality makes a language more learnable,” said Koyejo. “The researchers observed a clear gradient of difficulty—from English (high locality) → local shuffles → even-odd shuffles → deterministic shuffles → random shuffles (no locality). This gradient strongly suggests that information locality is a key determinant of learnability.”
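The window experiments can be mimicked in a few lines. A sketch of my own, with the exact shuffling procedure assumed rather than taken from the paper: tokens are permuted only inside fixed-size windows, so a window of 1 changes nothing (full locality) and a window spanning the whole text is a global shuffle (no locality).

```python
import random

def local_shuffle(tokens, window, seed=0):
    """Permute tokens only within consecutive fixed-size windows."""
    rng = random.Random(seed)
    out = []
    for i in range(0, len(tokens), window):
        chunk = tokens[i:i + window]
        rng.shuffle(chunk)  # shuffle in place, within this window only
        out.extend(chunk)
    return out

sentence = "the quick brown fox jumps over the lazy dog".split()
nearly_natural = local_shuffle(sentence, 2)          # mild disruption
impossible = local_shuffle(sentence, len(sentence))  # global shuffle
```

Sweeping the window size from 1 up to the sentence length reproduces the gradient Koyejo describes, from high locality to none.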

Koyejo also pointed out that another critical element for a model learning a non-standard language is tokenization, with the character systems of East Asian languages creating special challenges. For example, Japanese mixes multiple writing systems, and the Korean alphabet combines syllable blocks. He said that progress in those languages will require increased data and architectural innovations that better suit their unique properties.

“Neither language uses spaces between words consistently. This means standard tokenization methods often produce sub-optimal token divisions, creating inefficiencies in model learning,” said Koyejo. “Our studies on Vietnamese, which shares some structural properties with East Asian languages, highlight how proper tokenization dramatically affects model performance.”

Insights into learning

The challenge with LLMs learning nonstandard languages is both interesting and impactful, and the issues provide key insights into how LLMs actually learn. The Mission: Impossible Language Models paper also reaches this conclusion, stating, “We argue that there is great value in treating LLMs as a comparative system for human languages in understanding what systems like LLMs can and cannot learn.”

Aaron Andalman, chief science officer and co-founder of Cognitiv and a former MIT neuroscientist, expanded on the paper’s conclusion by adding that LLMs don’t merely learn linguistic structures, but also implicitly develop substantial knowledge about the world during their training, meaning they develop a higher understanding of the languages.

“Effective language processing requires understanding context, which encompasses concepts, relationships, facts, and logical reasoning about real-world situations,” said Andalman. “Consequently, as models grow larger and undergo more extensive training, they accumulate more extensive and nuanced world knowledge.”



Fizz Buzz in CSS


What is the smallest CSS code we can write to print the Fizz Buzz sequence? I think it can be done in four lines of CSS as shown below:

li { counter-increment: n }
li:not(:nth-child(5n))::before { content: counter(n) }
li:nth-child(3n)::before { content: "Fizz" }
li:nth-child(5n)::after { content: "Buzz" }

Here is a complete working example: css-fizz-buzz.html.
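The linked page is the canonical version; as a reconstruction of my own (the <ol> of empty <li> elements is my assumption of how the page supplies the sequence), a minimal harness might look like this:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<style>
ol { list-style: none } /* harness only; hide the list's own numbers */
li { counter-increment: n }
li:not(:nth-child(5n))::before { content: counter(n) }
li:nth-child(3n)::before { content: "Fizz" }
li:nth-child(5n)::after { content: "Buzz" }
</style>
</head>
<body>
<ol>
<li></li><li></li><li></li><li></li><li></li>
<li></li><li></li><li></li><li></li><li></li>
<li></li><li></li><li></li><li></li><li></li>
</ol>
</body>
</html>
```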

I am neither a web developer nor a code-golfer. I am just an ordinary programmer playing on the sea-shore and diverting myself in now and then finding a rougher pebble or an uglier shell than ordinary, whilst the great ocean of absurd contraptions lay all undiscovered before me.

Seasoned code-golfers looking for a challenge can probably shrink this solution further. However, such wizards are also likely to scoff at any mention of counting lines of code, since this mind sport treats such measures as pointless when all of CSS can be collapsed into a single line. The number of bytes is probably more meaningful. The code can also be minified slightly by removing all whitespace:

$ curl -sS https://susam.net/css-fizz-buzz.html | sed -n '/counter/,/after/p' | tr -d '[:space:]'
li{counter-increment:n}li:not(:nth-child(5n))::before{content:counter(n)}li:nth-child(3n)::before{content:"Fizz"}li:nth-child(5n)::after{content:"Buzz"}

This minified version is composed of 152 characters:

$ curl -sS https://susam.net/css-fizz-buzz.html | sed -n '/counter/,/after/p' | tr -d '[:space:]' | wc -c
152

If you manage to create a shorter solution, please do leave a comment.



The Web Runs On Tolerance


If you've ever tried to write a computer program, you'll know the dread of a syntax error. An errant space and your code won't compile. Miss a semi-colon and the world collapses. Don't close your brackets and watch how the computer recoils in distress.

The modern web isn't like that.

You can make your HTML as malformed as you like and the web-browser will do its best to display the page for you. I love the todepond website, but the source-code makes me break out in a cold sweat. Yet it renders just fine.

Sure, occasionally there are weird artefacts. But the web works because browsers are tolerant.

You can be crap at coding and the web still works. Yes, it takes an awful lot of effort from browser manufacturers to make "do what I mean, not what I say" a reality. But the world is better for it.

That's the crucial mistake that XHTML made. It was an attempt to bring pure syntactic rigour to the web. It had an intolerant ideology. Every document had to precisely conform to the specification. If it didn't, the page was irrevocably broken. I don't mean broken like a weird layout glitch, I mean broken like this:

XML Parsing Error: mismatched tag. Expected: </h1>.
Location: https://example.com/test.xhtml Line Number 9, Column 5:

The user experience of XHTML was rubbish. The disrespect shown to anyone for deviating from the One True Path made it an unwelcoming and unfriendly place. Understandably, XHTML is now a mere footnote on the web. Sure, people are free to use it if they want, but its unforgiving nature makes it nobody's first choice.

The beauty of the web as a platform is that it isn't a monoculture.

That's why it baffles me that some prominent technologists embrace hateful ideologies. I'm not going to give them any SEO-juice by linking to them, but I cannot fathom how someone can look at the beautiful diversity of the web and then declare that only pure-blooded people should live in a particular city.

How do you acknowledge that the father of the computer was a homosexual, brutally bullied by the state into suicide, and then fund groups that want to deny gay people fundamental human rights?

The ARM processor which powers the modern world was co-designed by a trans woman. When you throw slurs and denigrate people's pronouns, your ignorance and hatred does a disservice to history and drives away the next generation of talent.

History shows us that all progress comes from the meeting of diverse people, with different ideas, and different backgrounds. The notion that only a pure ethnostate can prosper is simply historically illiterate.

This isn't an academic argument over big-endian or little-endian. It isn't an ideological battle about the superiority of your favourite text editor. There's no good-natured ribbing about which desktop environment has the better design philosophy.

Denying rights to others is poison. Wishing violence on people because of their heritage is harmful to all of us.

Do we want all computing to go through the snow-white purity of Apple Computer? Have them as the one and only arbiters of what is and isn't allowed? No. That's obviously terrible for our ecosystem.

Do we want to segregate computer users so that an Android user can never connect their phone to a Windows machine, or make it impossible for Linux laptops to talk to Kodak cameras? That sort of isolation should be anathema to us.

Why then align with people who espouse isolationism? Why gleefully cheer the violent racists who terrorise our communities? Why demean people who merely wish to exist?

The web runs on tolerance. Anyone who preaches the ideology of hate has no business here.
