
This One Number on a Form Can Reduce Gender Inequality


Every semester, college students are given the chance to evaluate their professors. Their evaluations, like ratings of workers in other fields, show persistent gender gaps. The underlying biases are not easily defeated, but research by management scholars Lauren Rivera and András Tilcsik finds that there is a startlingly simple way to reduce inequality in evaluation systems: change the top rating from ten to six.

Rivera and Tilcsik’s findings draw on two sets of data. When one large university professional school changed its top rating from ten to six, it set up a quasi-natural experiment, allowing the researchers to draw on 105,034 student ratings of 369 different instructors from before and after the change. Additionally, to establish how much of the gender inequality in evaluations came from bias as opposed to gendered differences in teaching effectiveness, they administered a survey showing students identical course transcripts but randomly varied the gender attributed to the instructor and the number of choices in the rating system.

The results were striking. When the real-life university evaluations used a ten-point scale, women teaching in the most male-dominated fields were significantly less likely than men to receive the highest rating, and their average ratings were half a point lower than men’s. On a six-point scale, “differences largely disappeared,” the researchers write.

Rivera and Tilcsik note that this is partly because more options allow for more subtle distinctions, but they also argue that the shift goes beyond that. The “perfect 10” has a deeper cultural resonance and is associated with qualities like brilliance—qualities that are more often attributed to men.


The survey supports this argument. The gender gap of about two-thirds of a point on the ten-point scale almost disappeared on the six-point scale, and a shift also showed up in the qualitative data. When participants responded to the transcripts, they were significantly more likely to use words like “brilliant,” “genius,” and “perfect” when they believed the lecture had been delivered by a man. Finally, when asked directly whether the instructor was brilliant, participants were significantly more likely to strongly agree if they believed the instructor to be a man.

Taken together, the two data sources show that a move from a ten-point scale to a six-point one can reduce the gender gap in performance evaluations even as underlying biases, as revealed by qualitative descriptions, remain. The use of random gender attribution for the survey experiment, meanwhile, shows that bias is verifiably a factor in gender gaps.

Numerical evaluations are often used to validate the existence of a pure meritocracy, in which people are judged by the quality of their work rather than their identities. However, Rivera and Tilcsik write, “Evaluative tools are not neutral instruments: their precise design—even factors as seemingly small as the number of categories available in a performance rating system—can have major effects on how female and male workers are evaluated.”



The post This One Number on a Form Can Reduce Gender Inequality appeared first on JSTOR Daily.

Read the whole story
mrmarchant
6 hours ago

The College Essay Is Everything That’s Wrong With America

Another reason why this whole exercise is absurd: ChatGPT.

A few days ago, an 18-year-old by the name of Zach Yadegari publicly shared his impressive accomplishments—and the disappointing outcome of his attempt to get into an elite college. Zach had a GPA of 4.0. He had a score of 34 (out of 36) on the ACT, a standardized test applicants sometimes take in lieu of the SAT. Perhaps most impressively, he is an accomplished coder who has built a genuinely successful business: an app that allows users to count calories by submitting photographs of their food, and which he claims already earns $30 million in annual recurring revenue.

Despite this impressive record, most schools weren’t interested in Zach. According to his post, he was roundly rejected by every Ivy League school (except for Dartmouth, to which he didn’t apply). Nor did he fare any better at other top colleges, such as MIT and Stanford. Even some comparatively less selective schools such as UVA and Washington University did not take any interest in him.

At first, the ensuing debate on social media focused on the disadvantages many impressive white and Asian applicants face in college admissions. After all, an anonymous hacker had recently published the latest admissions data from NYU, another school that rejected Zach, and the leak strongly suggested that many American universities are effectively defying a recent Supreme Court order to end affirmative action. The average test scores for white students admitted to NYU are lower than Zach’s; but the average test scores for Hispanic and especially black students admitted to NYU are far lower. This leaked data suggests that Zach would have been highly likely to gain admission to NYU—and perhaps many of the other schools he applied to—if he wasn’t white.

Then someone on X asked Zach to share his admissions essay, and the conversation quickly took a turn. As soon as he did, dozens or hundreds of large accounts explained to him—some in a paternalistic, others in a sneering tone—why his essay likely tanked his application. The essay was undone by “a general sense of fakeness,” one social media account with a big following judged. “For every student with perfect scores like Zach, there’s a student with near perfect scores and more humility who’s overcome terrible circumstances and does not seem entitled,” a condescending professor tweeted. “Whenever people complain about not getting in despite good grades/scores the essay is almost always garbage,” a former admissions director concluded.

And that gave me a long-awaited opportunity to pen a rant I had been saving up for a rainy day. For every aspect of this saga, from the fact that a misguided admissions essay really may have tanked Zach’s chances for admission to the way in which commentators across the political spectrum accept this exercise as a natural part of the selection process, reminds me of one of the most pernicious aspects of higher education in America.

The college essay is a deeply unfair way to select students for top colleges, one that is much more biased against the poor than standardized tests. The college essay wrongly encourages students to cast themselves as victims, to exaggerate the adversity they’ve faced, and to turn genuinely upsetting experiences into the focal point of their self-understanding. The college essay, dear reader, should be banned and banished and burned to the ground.


There are many tangible, “objective” reasons to oppose making personal statements a key part of the admissions process. Perhaps the most obvious is that they have always been the easiest part of the system to game. While rich parents can hire SAT tutors, they can’t sit the standardized test in their offspring’s stead; they can, however, easily write the admissions essay for their kid or hire a “college consultant” who “works with” the applicant to “improve” that essay.

Even if rich parents don’t cheat in those ways, their class position gives rich kids a huge advantage in the exercise. As the responses to Zach’s essay show, writing a good admissions essay is to a large extent an exercise in demonstrating one’s good taste—and the ability to do so has always depended on being fluent in the unspoken norms of an elite community. Legions of commentators scolded Zach for such sins as touting his accomplishments too aggressively or sounding too much like other applicants. But the ability to highlight your accomplishments in an appropriately roundabout way depends on cultural knowledge, as does avoiding the fate of the ambitious Asian immigrant parents who encourage their child to excel at the piano, only for her file to be dismissively set aside by an admissions officer who sneers at “yet another nerdy Asian kid who plays the piano.” If you come from a background in which your parents and grandparents went to college, many family friends have recently gone through the Kafkaesque process of gaining admission to an elite institution, and you are friends with a person or two who teaches at such a university, then you obviously have a giant advantage.

This is all borne out by the data. Many on the left oppose standardized tests on the grounds that they have a class bias, and that hiring a tutor can make you perform better at them. But studies on the subject consistently suggest that the class bias of personal essays is far stronger than the class bias of standardized tests; notably, standardized tests can, as Rob Henderson movingly recounts in his memoir, show that kids from disadvantaged backgrounds have hidden talents to a far greater extent than college essays.


But the thing I truly hate about the college essay is not that it is part of a system that keeps deserving kids out of top colleges while rewarding privileged kids who (to add insult to injury) get to flatter themselves that they have been selected for showcasing such superior personality in their 750-word statements composed by their college consultant or ghostwritten by ChatGPT. In the end, truly talented kids like Zach are going to be just fine; he’ll still have plenty of educational opportunities at the less prestigious schools to which he was admitted, or he can keep pursuing a career in Silicon Valley without a college degree. Rather, what I truly hate about the college essay is the way in which it shapes the lives of high school students and encourages the whole elite stratum of society—including some of its most affluent, privileged and sheltered members—to conceive of themselves in terms of the hardships they have supposedly suffered.

For obvious reasons, this is especially true for members of ethnic minority groups. A big proportion of black students admitted to the most elite colleges in America are the children or grandchildren of relatively recent immigrants from countries such as Kenya and Nigeria, many of them doctors or other professionals from elite families who came to America on H-1B visas; many of the rest come from families that have been middle- or upper-middle class for multiple generations. This suggests a mismatch between the most intuitive moral justification for the affirmative action policies which colleges like NYU are evidently continuing to pursue (to provide some form of reparation for the grave ill of slavery) and the actual beneficiaries of these practices (who rarely include those from communities like Central L.A. or the South Side of Chicago, whose lives remain most obviously shaped by such historical injustices). Whatever this mismatch may imply about the moral status of affirmative action, it is the bizarre spectacle of those kids from comparatively privileged backgrounds being effectively coerced by the admissions system to self-exoticize as products of great hardship which I find to be truly unseemly.


But this game is by no means restricted to applicants from minority groups. The true art—the highest display of “good taste”—consists in transforming an applicant who is “privileged” in every dimension, including the ones particularly salient to admissions officers steeped in identity politics, into the kind of unique individual who appears to have triumphed over great adversity. Perhaps the best example of the genre I can think of is an acquaintance from college who won a prestigious fellowship to study in America based on a sob story about having his house bombed during the “troubles” in Northern Ireland; his essay left unstated that he had spent his high school years at Eton College, and that said house was one of the family’s many estates.

It’s bad enough that jostling for membership in the elite now requires ambitious Americans to turn themselves into essay-length avatars demonstrating their good taste or showcasing their resilience or performatively celebrating their quirkiness. It is worse that the prospect of having to do so now helps to shape how they spend their teenage years.

The truly ambitious kid doesn’t just sit down at the age of 16 or 17 to reflect about what element of their lives they can highlight to reveal their personality; they begin, at the age of 14 or 12 or even earlier, to lead their lives with an eye to preparing the ground for the perfect college application. (Or, worse, their parents do so for them.) This leads to all the cynical activities which pretend to showcase some intrinsic motivation but are actually covert exercises in getting ahead—witness the countless “nonprofits” now started by high schoolers on a mission to get into Yale.


The British philosopher Bernard Williams once complained that the utilitarian justification for why a man might choose to save his drowning wife rather than three similarly imperilled strangers would require him to have “one thought too many.” On the face of it, Williams pointed out, the utilitarian emphasis on maximizing the balance of happiness over pain would seem to suggest that he has to save the strangers. Utilitarian philosophers may be able to avoid this counterintuitive conclusion, for example by claiming that over the long run the balance of happiness over pain would improve if we give people some leeway to act on their special attachments. But even that, Williams insisted, wouldn’t capture the real reason why the man would and should want to save his wife: that he loves her, has vowed to protect her, and must place her interests over those of others.

Something similar holds true for the manner in which the self-marketization of college applicants transforms their attitude towards the world. Many teenagers no doubt genuinely enjoy sports or playing the violin or participating in the math Olympiad or helping little old ladies cross the street. But the admissions system makes it impossible for them not to pursue those activities with one eye to their future advancement. The central role the college essay plays in admissions forces even genuine and well-meaning kids to have “one thought too many” as they go about activities they might otherwise undertake for the pure pleasure of it. It is the first of many steps in shaping a social elite that is willing to put its own advancement ahead of any authentic engagement with the world—a social elite that has proven to be so unpopular and dysfunctional in part because those over whom it reigns smell its inauthenticity from a mile away.

And this is why I suspect that the seemingly innocuous institution of the college essay is more deeply damaging—to the high school experience, to the self-conception of millions of Americans, and even to the country’s ability to sustain a trusted elite—than it appears. The fundamental problem with it isn’t that it arbitrarily excludes some highly talented individuals like Zach from positions of power and privilege; it’s that it drains the souls of teenagers and encourages a deeply pernicious brand of fakery and breeds widespread mistrust in social elites.

The college essay is absurd and unfair and—ironically—unforgivably cringe. It’s time to put an end to its strange hold over American society, and liberate us all from its tyranny.


Graph Search Algorithm: The Game

  • Mouseover a puck to select it
  • Click in a direction to slide
  • Pucks stop at walls or other pucks
  • Get the matching puck to the colored X
  • This game is obviously inspired by Ricochet Robots which is the best nerd party game that you should buy a copy of immediately
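The rules above describe a textbook breadth-first search: each arrangement of pucks is a node, each legal slide is an edge, and BFS finds the minimum number of moves. Here is a minimal sketch in Python; the grid size, the wall layout, and the convention that puck 0 must reach the target are illustrative assumptions, not the game's actual implementation:

```python
from collections import deque

def slide(pos, direction, walls, others, size):
    """Slide one puck from pos until it would hit a wall cell, another puck, or the board edge."""
    dr, dc = direction
    r, c = pos
    while True:
        nr, nc = r + dr, c + dc
        if not (0 <= nr < size and 0 <= nc < size):
            break  # edge of the board
        if (nr, nc) in walls or (nr, nc) in others:
            break  # blocked by a wall or another puck
        r, c = nr, nc
    return (r, c)

def min_moves(start, target, walls, size=4):
    """BFS over board states; returns the fewest slides that put puck 0 on target, or None."""
    directions = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        pucks, moves = queue.popleft()
        if pucks[0] == target:
            return moves
        for i, pos in enumerate(pucks):
            others = set(pucks) - {pos}
            for d in directions:
                new_pos = slide(pos, d, walls, others, size)
                if new_pos == pos:
                    continue  # puck didn't move, so this isn't a legal move
                state = pucks[:i] + (new_pos,) + pucks[i + 1:]
                if state not in seen:
                    seen.add(state)
                    queue.append((state, moves + 1))
    return None  # every reachable state explored; target unreachable
```

For example, a lone puck at (0, 0) on an empty 4x4 board reaches (3, 0) in one slide, while an interior square like (1, 1) is unreachable without a wall to stop against, which is exactly what makes the puzzle interesting.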

An image of an archeologist adventurer who wears a hat and uses a bullwhip


Disclaimer: The views and opinions expressed in this blog are entirely my own and do not necessarily reflect the views of my current or any previous employer. This blog may also contain links to other websites or resources. I am not responsible for the content on those external sites or any changes that may occur after the publication of my posts.

End Disclaimer


image credit: Not Studio Ghibli

There is only one thing worse than being imitated, and that is not being imitated.

- Coco Chanel

An ounce of originality is worth a pound of imitation.

- Orson Welles

Copy from one, it's plagiarism; copy from two, it's research.

- Wilson Mizner


One of the internet-est things to come out of the most recent update to GPT image generation is the Studio Ghibli-zation of everything: another reminder of how OpenAI (and everyone else) trains on images that are very obviously someone else’s work.

Hayao Miyazaki’s Japanese animation company, Studio Ghibli, produces beautiful and famously labor-intensive movies, with one four-second sequence purportedly taking over a year to make.

image credit: Studio Ghibli- this scene from The Wind Rises (2013) took over a year

Not the most efficient way to make a movie (blasphemer!), but it’s this specific process and effect that have made these movies beloved worldwide.

People have used the new update to GPT image generation to convert every picture into a Studio Ghibli-style image, including making memes-of-memes like Disaster Girl.

image credit: @heyBarsee, but why do I have to credit an image of an image in the style of copyrighted material?

Ghiblifying everything is an interesting choice for zeitgeist meme-ification, particularly because it’s an effective example of what AI is supposed to be able to do (make extremely labor-intensive things much easier), but also because there’s something sort of gross-feeling about it: like a soulless 2025 fax version of the thing.

It’s an example of the things people hate about Gen AI: its ability to reproduce while stripping away the parts of the art/product/experience that were the most human.

According to a Business Insider article on this “Ghiblifying”, “copyright laws generally allow artists to mimic a visual style”, but I mean… come on.

Just how easy is it to wrangle from GPT that which is very clearly someone else’s IP?

Well, you’re in luck.

I ran a half-assed experiment to do just that.


Here’s some very successful IP:

r/dataisbeautiful - [OC] The Highest-Grossing Media Franchises Of All Time
image credit: StatsPanda

I’ll use this as a base from which to prompt without explicitly mentioning the IP.

Each of the outputs below is the result of a single prompt. LLMs are stochastic, so your mileage may vary, but this was fun to do:

When you play the “password” version of prompting where you can’t name the thing - you get a sense of how reductive some of these characters are, but who doesn’t like “an Italian plumber who wears a red hat”?

The guardrails so far are really tight for this content- so then maybe one can assume it’s this way for other IP?

As it turns out, as my old trading boss once told me, to assume makes an ass out of both u and me.

This is crazy.

I mean- this is someone else’s IP, right?

Well maybe these are a couple one-offs…

Yikes!

Ho-ly Shit. Come on now.

Well, I guess there wouldn’t be many “archeologist adventurer who wears a hat and uses a bullwhip” types, except for maybe, I don’t know…the actual inspirations for Indiana Jones, like Allan Quatermain from H. Rider Haggard’s novels (including King Solomon’s Mines) and the real-life Roy Chapman Andrews, who led expeditions to Mongolia and China in the 1920s and wore a fedora.

How about a female, more modern day riff on Indiana Jones?

I’ll take any old female adventurer protagonist who raids tombs…anyone at all…

Let’s see what we got…


Now I’m looking for something very particular…

Wait- have I inadvertently created a game?

Welcome to the party pal.


I grew up watching a lot of scary movies. Horror antagonist, anyone?

Survey says!!!

Yes- for those horror fans curious- It also produces 3 very recognizable and differentiated characters when I follow up this image with these 3 prompts:

ok- how about one that operates on Halloween?

how about in Texas?

in dreams?


“skeleton face who lives in a skeleton castle”…

Wait for it…


Now for a softball…

I was always partial to Roger Moore myself, but this makes sense: a web search of the same prompt should more or less return images that reflect the probability distribution of the training data, i.e., more Craigs, Brosnans, Connerys, and Moores than Daltons and Lazenbys…right?

….right?

Not even close.

Yes- LLMs and internet search are two different things, but LLMs train on the entirety of the internet, so you would think there would be some obvious overlap.

GPT’s image is, undeniably, a better answer, more the Platonic ideal of a suave English spy than the shadows on the cave wall version that Google search produces.

So it works better, and is a vote for the LLMs, as long as you don’t mind the thievery.


LLMs learn by seeing/ingesting a ton of examples of things, like us.

I only have one image in mind when I hear “an archeologist adventurer who wears a hat and uses a bullwhip”.

It would be unexpected and sort of amazing were the LLMs to come up with completely new images for the above prompts.

Still, the near perfect mimicry is an uncomfortable reminder that AI is getting better at copying and closer to…something, but also a clear sign that we are a ways off from differentiated or original reasoning/thinking that people associate with Artificial General Intelligence (AGI), aka Skynet. That reminds me. Hold on a second…

Sigh…


Maybe Studio Ghibli making it through the seemingly deterministic GPT guardrails was an OpenAI slip-up, a mistake that let it past the forbidden Italian plumber in the red hat and the disallowed patriotic superhero with a shield, and thus primed it for explosive meme-ification.

It’s a reminder that LLMs of this type and size all train on copyrighted material.

It’s stealing, but also, admittedly, really cool.

Does the growth of AI have to bring with it the tacit or even explicit encouragement of intellectual theft?

To co-opt a line from the “super strong man with a sword that fights an enemy with skeleton face who lives in a skeleton castle”:

You have the power.

Don’t slow down.

Go Time.


How the U.S. Public and AI Experts View Artificial Intelligence


These groups are far apart in their enthusiasm and predictions for AI, but both want more personal control and worry about too little regulation.

The post How the U.S. Public and AI Experts View Artificial Intelligence appeared first on Pew Research Center.


The Crisis of Zombie Social Science


Thank you for reading The Garden of Forking Paths. This edition is free for everyone, but I rely exclusively on reader support, so please consider upgrading to a paid subscription for just $4/month to support my work. Alternatively, consider checking out my book FLUKE, which discusses some of these ideas at greater length.



I: Guinea Pigs with No Control Group

Imagine that two rival scientists announce that they’ve each invented a new miracle medicine. Pop just one pill, they claim, and the likelihood that you’ll get sick in the next unforeseen pandemic—whatever it may be—is drastically reduced. But there’s a problem: nobody knows when the next pandemic will strike, so how can you test whether these miracle pills work or not?

After much bickering, a compromise is reached: each scientist will make their case to the population, giving a rousing speech while trying to convince them that their pill is best. At the end of the debate, the population will vote on which pill to take. Everyone will be required to take a dose of the winning pill—and then wait four years to see what happens. At the end of that four-year period, the scientists will return and again try to persuade the population, at which point everyone will either take a second dose or switch to a different medicine altogether, from a more convincing rival scientist.

Some people will get sick, some won’t, but because everyone is taking the same pill and there’s no control group or rigorous testing—and because nobody knows precisely which illness the pill is supposed to prevent in the first place—it’s unclear whether it “worked” or not. Because of this uncertainty, people start to sort themselves into medicinal tribes, voraciously consuming news that affirms their belief in their favored pill. Amidst all this pharmaceutical chaos, there does seem to be one clear pattern in this strange society: if it “feels” like a lot of people happen to be sick at the end of the four-year period, then the population will try a different pill.

Surely, nobody would agree to this absurd strategy. Year after year, citizens would be guinea pigs, without ever getting closer to discovering which medicines work and which don’t. After all, few would willingly take a vaccine based solely on a scientist’s ideological conviction that it should work, or decide which medicines to take based on how silver-tongued a doctor might be.

And yet, this is exactly how we run society. Instead of rival scientists, we turn over humanity’s most important decisions to ideologue politicians who simply debate what to do with the economy, health care, war, poverty, immigration, and climate change based on what they think might work. Then, after four years, in an infinitely complex world in which a million variables change, nothing is held constant, and there are no control groups, we subjectively decide whether it worked. If enough people think it didn’t, we collectively decide that we’d like to follow a different ideologue’s plan instead.

This is, quite clearly, insane.

The remedy to the insanity of ideology-based guesswork, as we’ve figured out with scientific research such as medicine trials, is rigorous testing that definitively proves what works and what doesn’t. With social research, for reasons we shall soon explore, that approach is often impossible. Studying human society is never so simple. After all, eight billion interacting human brains are, without question, the most complex entity in the known universe.

No matter how much we try, understanding ourselves proves elusive. The chaos of human society is shrouded in unpartable clouds of mystery.

So, what we do instead is this: we put a lot of clever people into universities, where disproportionately elbow-patched boffins come up with sophisticated theories and arcane models that provide better guesses about what will and will not work based on past patterns. They toil endlessly, trying to get slightly better at understanding a social world that is impossible to understand. When one of them comes up with a theory that seems pretty good at describing what happened in the past, they take it to an annual gathering of other boffins who scratch their chins, ask rude questions, and then retire to their hotel rooms.1

Subsequently, the theory might get published in a journal so expensive that normal people can’t access it. The research will be read by a small number of other clever people in universities, at which point the ideologues governing society will diligently fail to read about the evidence or pay any attention to it unless it confirms what they already believed about the world. If the theory and the evidence happen to conform to their prevailing ideology, they will then enthusiastically embrace the findings after reading a staffer’s summary. The politician will then show their appreciation for the research by misrepresenting it to the public.

Shockingly, this seemingly foolproof strategy appears to not be working.

I am a disillusioned social scientist, critical of my own discipline, but eminently aware that fully understanding human behavior is a nearly impossible task. As I wrote in Fluke, we’re making a mistake when we use the phrase “it’s not rocket science.” We should be saying “it’s not social science”:

“In 2004, humans launched a spacecraft that traveled for ten years before softly touching down on a comet two and a half miles wide that was traveling at eighty-four thousand miles per hour. Every calculation had to be perfect—and it was. Conversely, trying to figure out, with certainty, whether Thailand’s economy will grow or contract in the next six months or whether inflation in Britain will be above 5 percent three years from now, well, that’s just not something we can do.”

Despite an astonishingly abysmal track record, we keep using the same tired, failed tools to decide what to do. Aside from the ridiculousness of turning to the same pundits regardless of whether they’ve been prescient or grotesquely wrong, we continue to double down on failed methods of social research and forecasting. In 2016, The Economist conducted an analysis of IMF economic forecasts covering 189 countries over a period of fifteen years. During that time, a country entered a recession 220 times. How many times did the IMF’s April forecasts correctly predict the recession? Zero.

Part of the source of this trouble is a straightforward issue that’s extremely difficult to solve. I call it the Problem of Zombie Research—in which bad theories never die. That problem, alas, is endemic to the social sciences.

The economists Wynne Godley and David Evans summed up the issue:

Evans: What actually does resolve disputes in economics?

Godley: Nothing!

Evans: They just go on…well they certainly seem to.

Godley: Successful rhetoric is what resolves issues.

Like the people in the absurdist pill-popping society imagined above, we are largely unable to falsify proposed explanations about how our world works. We therefore find ourselves unable to definitively prove which theories are golden and which are garbage.2 And when that happens, social science can sometimes end up becoming a caricature of itself, in which clever people play with increasingly sophisticated models but don’t provide a roadmap to solving real-world problems.

We need good social science. It is the only tool we have to make the world better based on evidence rather than ideology. But to escape the absurdism of our current situation, we must do two things better:

  1. Be ruthlessly clear about what social science is for (solving problems to mitigate avoidable harm, not just identifying an elegant single causal explanation for past variation in flawed data).3

  2. Get better at slaying Zombie Theories by making (often wrong) predictions.

II: The Easy Problem of Social Research

A little over a decade ago, a renowned researcher named Daryl Bem produced compelling evidence that he had discovered proof of extrasensory perception, or ESP. His findings, which passed peer review and standard methodology checks, were published in a top psychology journal.

But when some other researchers thought that Bem’s findings didn’t pass the smell test, they took matters into their own hands: they tried to replicate his findings by repeating his experiments.

They couldn’t. The findings were bogus—statistical correlations that “discovered” a phantom effect with no basis in empirical reality. But when the scientists trying to expose Bem’s bad research on precognition sought to publish their results, nobody wanted an article that retread old ground. Worse still, one of their journal rejections came after a peer reviewer trashed the replication studies. That reviewer’s name? Daryl Bem.

Eventually, Bem’s findings were thoroughly debunked. This saga, along with other high-profile examples of bogus research such as the viral “power pose” studies, helped launch the replication crisis, in which it soon became clear that many significant findings in social research (and some in medicine) could not be reproduced in subsequent attempts that used the same methodologies.

There are many reasons for these well-known crises of confidence in social research, including p-hacking, the file drawer problem, the McNamara Fallacy, measurement error, category mistakes, and manipulated or invented data, to name but a few. There are also serious concerns with the peer review process, which often fails to catch even the most egregious errors. (One study deliberately planted serious flaws in research papers and then sent them out for peer review. Reviewers, on average, detected only 2.6 out of 9 serious planted mistakes.)

(Image courtesy of a post from Adam Mastroianni.)

All of these problems deserve serious attention, but they are extremely fixable. They are basically implementation problems, which do not pose any philosophical challenges to what we can and cannot know about our world. That’s why I bunch them together and refer to them collectively as forming The Easy Problem of Social Research.4 With better methods, shrewd adjustments to detect manipulated research, and peer review reform, these problems can be solved—and some bogus theories can be killed off.

But there’s a deeper crisis being ignored—one that’s more fundamental to the nature of what we can and cannot know about human society.

III: The Hard Problem of Social Research

In 2022, a brilliant study was published that should have rocked the foundations of social science to their core—triggering a much more profound research crisis.

A team of researchers led by Nate Breznau decided to tackle a thorny question that plagues modern politics: do higher levels of immigration reduce public support for social safety net programs? Let’s consider three hypotheses:

  1. Because of xenophobia, more immigrants mean that native-born citizens will decrease support for tax dollars spent on social safety net programs.

  2. Because of social generosity, higher immigration will increase support for social safety net programs to help integrate the less fortunate.

  3. Immigration won’t really affect support for social spending much either way.

This question is both interesting and important; a definitive answer would help inform public policy debates across the democratic world. So, which is correct?

To find out, Breznau and his colleagues did something clever: they recruited a large number of volunteer research teams to answer the question as best they could using the exact same data. In total, 161 social scientists working on 73 independent research teams did their best to find the “right” answer to this seemingly straightforward question.

What happened next should bewilder every social scientist—and the public—and it was far bigger than any question about immigration.

The 73 teams produced a completely mixed result. A little over half of the teams found no effect—that immigration levels didn’t seem to move the needle much in either direction. About a quarter of the research teams found a significant negative effect. And just under a fifth of the research teams found a significant positive effect. The results, absurdly, followed a somewhat normal distribution.

Breznau’s team controlled for virtually everything: they gave the research teams the same data, the same question, and made all the research teams catalogue every methodological decision. Despite those commonalities, tiny, seemingly insignificant choices led to wildly different findings.

Poring over the various findings, Breznau’s team could account for only five percent of the variation between the research teams; the other 95 percent was inexplicable. As they put it, fittingly flummoxed by the results: “Even the most seemingly minute [methodological] decisions could drive results in different directions; and only awareness of these minutiae could lead to productive theoretical discussions or empirical tests of their legitimacy.”
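The mechanism is easy to demonstrate. A minimal sketch in Python (using invented data, not Breznau’s, and a hypothetical pair of analytic choices) shows how four equally defensible analysis paths over the same dataset yield four different effect estimates:

```python
import itertools
import random
import statistics

random.seed(0)

# Synthetic data with no true relationship between x and y:
# z is a covariate that moves y, while x is pure noise.
n = 400
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
y = [0.4 * z[i] + random.gauss(0, 1) for i in range(n)]

def ols_slope(xs, ys):
    """Bivariate OLS slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

# A tiny "multiverse" of defensible analytic choices:
# trim outliers or not, control for z (by residualizing) or not.
estimates = []
for trim_outliers, control_for_z in itertools.product([False, True], repeat=2):
    xs, ys = x[:], y[:]
    if control_for_z:
        b = ols_slope(z, ys)
        ys = [ys[i] - b * z[i] for i in range(n)]  # residualize y on z
    if trim_outliers:
        keep = [i for i in range(len(xs)) if abs(xs[i]) < 2.0]
        xs = [xs[i] for i in keep]
        ys = [ys[i] for i in keep]
    estimates.append(ols_slope(xs, ys))

# Four analyses of the same data, four different effect estimates.
print([round(e, 3) for e in estimates])
```

Real multiverse analyses enumerate hundreds of such forking paths rather than four; the point here is only that the same data never dictates a single answer.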

They billed the problem appropriately: we live in a universe of uncertainty. Such unresolvable uncertainty is an epistemological challenge to what it is possible to definitively know about our social world, what I call The Hard Problem of Social Research.

Here’s why it’s a big problem: for 99.9 percent of social science studies, there are not 73 research teams working on the same question with the same data. Instead, one researcher takes their best shot at a problem and comes up with their best answer. In those situations, the researcher gets to decide what question to ask, which data to use, how to measure phenomena of interest, how to categorize variables, what data analysis strategy fits best, which results to report, and how to frame a finding.

If Breznau’s team had asked only one research team to answer the immigration question, there would have been roughly equal odds of finding that higher immigration increased support for social spending or decreased it. Once published, either a positive or a negative result might have been considered a settled question. The irreducible uncertainty would be hidden from view.

This raises the unsettling follow-up question: how many seemingly “solid” past social science research findings would be more like the universe of uncertainty debacle if they were farmed out to 73 separate research teams? Nobody knows.

This is a much bigger problem than the replication crisis. It challenges the most basic assumptions of social research.5

The astonishing variance in the “Universe of Uncertainty” paper also points to the core dilemma raised above: how do you definitively reject social theories in a world that is infinitely complex and therefore so difficult to accurately study? In physics or chemistry, if a predicted result is off by a millionth of a percentage point, the entire theory can be rejected outright. Perfect precision is required.

But in social research, if a model is able to “explain,” say, 60 percent of the variation in the data, then it’s considered an incredibly strong result. That’s because the social complexity of billions of interacting, self-aware, conscious human agents embedded in ever-changing social systems is far harder to model than even the most unruly molecules. Precision is impossible.

Worse still, unlike with molecules, social research is fragile and context-dependent. If a caveman anywhere in the world mixed baking soda and vinegar together, it would fizz at precisely the same rate as today. But if the exact same virus that sparked the covid-19 pandemic had infected someone in 1990 instead of 2020, everything would have been different. In 1990, China was less connected to the world. George H.W. Bush, not Donald Trump, would have handled the US response. And most crucially, without widespread digital technology, working from home would have been impossible, so the economic effect of an identical virus would have been radically different. In social research, everything matters.

Moreover, with probability estimates, it’s extremely difficult to verify theories with one-shot events. Claiming that X makes Y more likely isn’t possible to prove or disprove if an event only happens one time, but many of the most important events are one-offs. Nate Silver’s claim that Hillary Clinton had a 71.4 percent chance of victory in 2016 could never be proven “wrong,” because when she lost, Silver could just say that the less likely outcome occurred.
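Over many events, though, probability forecasts can be scored. A minimal sketch of the standard Brier score, using invented forecasts and outcomes, shows how a forecaster can be compared against a coin-flip baseline even when no single forecast is provably “wrong”:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes.
    Lower is better; a constant 50% forecast always scores exactly 0.25."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical probabilities for ten elections and what actually happened
# (1 = the forecast event occurred, 0 = it did not). These are made up.
probs = [0.714, 0.9, 0.3, 0.6, 0.8, 0.2, 0.55, 0.95, 0.4, 0.7]
outcomes = [0, 1, 0, 1, 1, 0, 1, 1, 0, 1]

print(brier_score(probs, outcomes))       # a somewhat skilled forecaster
print(brier_score([0.5] * 10, outcomes))  # coin-flip baseline: 0.25
```

A single miss (like the 0.714 that came up empty) raises the score but settles nothing; only the accumulated record over many events distinguishes skill from luck.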

The upshot is this: because the world is so maddeningly complex and because our models are so imprecise as a result, there’s plenty of latitude for researchers, politicians, and the public to pick and choose their preferred theories. Choose your own social science explanation!

This dynamic differs from “hard” science. Only delusional crackpots think that the sun revolves around the Earth, but after decades of research and mountains of evidence, there’s still no unequivocal agreement about the precise economic effect of cutting taxes on the rich. And that’s with a question that’s actually pretty clear-cut when it comes to the available evidence! (Hint: tax cuts for the rich mainly benefit the rich and don’t “trickle down”). But most social research on the biggest questions we face doesn’t produce such clear-cut, consistent answers.

One of the more recent strategies to parry these critiques has been to dress up the flaws, adorning deeply uncertain dynamics with sophisticated-looking model garb, in the hope that nobody notices because the equation looks so impressive and precise. These are what I call The Emperor’s New Equations.6 For example, as I highlighted in Fluke, here’s an actual equation from a recent unnamed political science paper that gives the exact mathematical formula for whether a given person will join a rebel movement during a civil war:

And yet, when the boffins get together to scratch their chins and exchange equations like these on PowerPoint slides, everyone is afraid to say what’s obvious. So, I’ll bite the bullet and be the bad guy here:

This is all, quite clearly, silly.

IV: The Case for (Bad) Predictions

So, how good are we currently at predicting social outcomes? Consider the Fragile Families study, which tracked five thousand families, each with a child born to unmarried parents. Data about the children was collected at ages one, three, five, nine, fifteen, and twenty-two, making the study one of the richest, most detailed longitudinal datasets ever assembled. Then, as I wrote previously:

“After the data from the children who had turned fifteen was collected, it wasn’t released. Instead, the researchers held a competition, in which they gave competing teams of scientists access to the data from the children at ages one, three, five, and nine. The challenge was to see who could best predict life outcomes for the children now that they were fifteen years old. Because the researchers already had the real-world outcomes, they could see how well the teams had done relative to reality. The teams used machine learning, the most powerful data analysis tool ever invented, and took their best shot.”

All of the teams failed miserably. Even the best-performing research teams did about as well as a model that just followed simple averages. The sophisticated models were basically useless.
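That benchmark, predicting the simple average for everyone, is worth making concrete. The sketch below (synthetic data and an invented toy model, purely for illustration) computes R² for a weak model against the mean-only baseline:

```python
import random
import statistics

random.seed(1)

def r_squared(y_true, y_pred):
    """1.0 is a perfect fit; 0.0 means no better than predicting the
    mean of the outcomes; negative means actively worse than that."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Invented outcomes that are mostly noise, as life trajectories often are.
y_true = [random.gauss(50, 10) for _ in range(200)]
mean = statistics.fmean(y_true)

# A "sophisticated" model that captures only a sliver of real signal
# (note: peeking at y_true like this is leakage; this is just a sketch).
y_model = [mean + 0.1 * (t - mean) + random.gauss(0, 3) for t in y_true]

# The dumbest possible baseline: predict the average for everyone.
y_base = [mean] * len(y_true)

print(round(r_squared(y_true, y_model), 3))  # small: barely beats the mean
print(round(r_squared(y_true, y_base), 3))   # exactly 0.0 by construction
```

When an elaborate machine-learning pipeline lands near the baseline’s score of zero, as the Fragile Families teams did, the extra sophistication has bought essentially nothing.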

This was a wake-up call: these problems will not simply be solved by more advanced technologies like AI. By making predictions and failing, however, this study provided a catalyst to refine our theories. If the researchers had only fit their models to past data, they might have looked like they had done a great job of explaining what was going on. But by making a forward-looking prediction, they exposed the limits of our understanding, which should force us to improve.

Alas, predictions are currently an endangered species in social science. Mark Verhagen of Oxford found predictions in just 12 of 2,414 articles (0.5%) in the top economics journal, in 4 of 743 (0.5%) in the top political science journal, and in 0 of 394 articles in the top sociology journal.

With quantum mechanics, physicists don’t fully understand what’s going on, but they can make extremely accurate predictions that have proven extraordinarily useful for solving real-world problems.7 With social science, there’s a risk that many of our models are the worst of both worlds: unable to fully explain what’s going on and unable to predict what will and won’t work to solve real-world problems. But with most economics or political science or sociology research, we’re not really interested in the mysteries of the fundamental nature of reality. We should mostly care about what works best to mitigate avoidable harm.

That’s why I favor a paradigm shift in social science toward making more explicit predictions. Most of them will be comically wrong, but by making incorrect predictions, we will iteratively get better at navigating our world—and slowly improve at finally slaying bad Zombie Theories that stubbornly refuse to die.

V: Strong Links and Weak Links

Many social phenomena can be sorted into two categories: “strong link problems” and “weak link problems.”

As the always insightful Adam Mastroianni points out, food safety is an example of a weak link problem, in which you have to worry about the weakest link. Even if 99.9 percent of a country’s food supply is free of toxic bacteria, the 0.1 percent can imperil everyone. A rowing crew is also a weak link problem: if seven rowers are Olympians but one scrawny rower is out of sync, the boat will slow to a crawl.

Strong link problems are the opposite: everything will be fine as long as the strongest link is really strong. Basketball, unlike rowing, is a strong link problem. LeBron James is good enough that even if there’s a really weak player on the bench, the Lakers are still going to win a lot. And, as Mastroianni convincingly argues, science is a strong link problem. It’s okay if there’s a lot of junk science out there being published in pseudoscience journals, because the strongest discoveries that change the world are what matter most. Pay attention to the best science, ignore the worst.

I see an exception to Mastroianni’s argument. Zombie Theories in social science short-circuit these dynamics. For the reasons mentioned above, it’s rarely universally agreed what the strongest links actually are in economics, political science, psychology, or sociology. Without being able to kill off the bad but influential theories through falsification, what should be a strong-link problem ends up just being a bit of a mess, with bad ideas lingering on, often obscuring better ones.

Don’t get me wrong: there’s a lot of astonishingly good social science research. I’m often in awe of colleagues across disciplines who have devoted their lives to solving problems in the most innovative ways. My critique is not that social science is useless, but that it could be better.

Yet, in order to stop the madness of our current social trajectory, social scientists need to take these profound epistemological challenges more seriously and develop a laser-like focus on finding better ways to guide policymaking more reliably.8 Sophisticated models with impressive equations, clever research designs, and robust statistical significance make careers, but unless they help us navigate a perilous world and mitigate avoidable harm, what’s the point?

We need good social science now more than ever. The world has never been more complex or dangerous, a tangle of unprecedented instant interconnectivity, laced with at least three existential risks: nuclear weapons, artificial intelligence, and environmental collapse. Every risk is amplified by overconfident authoritarian fools in power. Our capability to destroy the world has outpaced our wisdom to understand and govern ourselves—and that’s why we need every shred of evidence-based sagacity we can muster to forge a more resilient, just world.


Thank you for reading The Garden of Forking Paths. This essay was for everyone, but if you found it interesting or worthwhile and want to support my work, please consider upgrading to a paid subscription for just $4/month. Alternatively, to more deeply explore my ideas about chaos theory, complexity, and the role of chance in our world, buy a copy of FLUKE, which was twice-named a “best book of 2024.”


1. I am not optimistic that my fellow boffins will take kindly to me after this essay.

2. There’s an entire intellectual saga I’m glossing over here called the demarcation problem, which includes debates between, among others, Kuhn and Popper.

3. The bias toward trying to find a single cause for complex phenomena is one of the flaws in social science that I critique in Fluke, which takes chaos theory seriously.

4. Discerning readers will recognize that I am riffing on terms from philosopher David Chalmers and his division of the consciousness debate into hard and easy problems.

5. The paper basically implies that aleatoric, or irreducible, uncertainty could be an unavoidable feature of at least some important dynamics within social systems.

6. Some of my critiques have to do with the over-emphasis on linear regressions.

7. When I say they don’t understand what’s going on, I mean that the interpretations of quantum mechanics are hotly debated and really unclear, with groups of, for example, Copenhagen interpretation disciples, Many Worlds interpretation disciples, and those who say “Shut up and calculate” because it’s currently impossible to understand.

8. Part of the solution, in my view, is a much greater focus on complexity social science.
