Stop using “machine learning” to measure “media bias”

Anjali Shrivastava
Feb 7, 2024


Media bias discourse is worn out, and it doesn’t go far enough. The landscape is overrun with “neutral” analysts using some kind of sentiment analysis to lazily label outlets as right or left. This MIT paper that used “machine learning” to assess “media bias” was lauded as a data-driven study that objectively showed bias in reporting. This has happened over, and over, and over again. And this approach rarely passes the sniff test for me.

The researchers in the MIT paper scored publishers based on how frequently they used politically charged language (e.g. “undocumented immigrant” vs. “illegal immigrant”). But blindly searching for “biased” phrases ignores the fact that journalists employ rhetorical devices like sarcasm, quotes, and nuance. A phrase inside a quote is not an endorsement.
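
To make my objection concrete, here is roughly what that kind of scoring reduces to. This is a minimal sketch of the generic phrase-counting approach, not the MIT paper’s actual lexicon or weighting; the phrase lists and the example article are made up.

```python
# A sketch of the generic phrase-counting approach (not the MIT paper's
# actual lexicon or weighting; these phrase lists are made up).
LEFT_CODED = ["undocumented immigrant", "gun safety"]
RIGHT_CODED = ["illegal immigrant", "gun control"]

def phrase_bias_score(text: str) -> float:
    """Score in [-1, 1]: negative reads as left-coded, positive as right-coded."""
    lowered = text.lower()
    left = sum(lowered.count(p) for p in LEFT_CODED)
    right = sum(lowered.count(p) for p in RIGHT_CODED)
    total = left + right
    return 0.0 if total == 0 else (right - left) / total

# The failure mode: a quotation scores exactly like an endorsement.
article = ('The senator called them "illegal immigrants," a framing that '
           'advocacy groups have criticized as dehumanizing.')
print(phrase_bias_score(article))  # 1.0 -- flagged as right-coded for quoting a senator
```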

What is the goal here? If the problem of media bias is solely due to specific phrases, then I have a foolproof solution for ending it: CTRL+F and replace.

My point is that language does not exist in a vacuum. Telling a dark joke among friends is perfectly acceptable. To your boss… not so much. You have to consider the context.

So if I had to measure the media landscape, how would I approach it? Simple: I would consider the context. Which, in this case, is the readership.

Imagine a machine learning model that isn’t just based on the contents of the story, but takes into account where the story is shared, and by whom. That’s immediately a more compelling picture (to me, at least).

With a model like that, you could infer the political leanings of readers for individual stories, rather than blanket-labeling a publisher as left or right. You could put together a social graph like this (inspired by tumblr’s reblog graph feature) for each story, to find where discussion of a story is happening and what each community is saying about it.

Mock of social graph for a bellingcat story (this is fake data!)
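
As a rough sketch of what per-story audience inference could look like, assuming per-account lean estimates already exist upstream (say, from follow graphs): the account names, scores, and summary statistics here are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Share:
    account: str
    lean: float  # -1.0 (left) .. +1.0 (right); estimated upstream, e.g. from follow graphs

def story_audience_profile(shares: list[Share]) -> dict:
    """Characterize a single story by who shares it, not by what it says."""
    leans = [s.lean for s in shares]
    return {
        "mean_lean": round(mean(leans), 2),
        "spread": round(max(leans) - min(leans), 2),  # big spread = cross-partisan story
        "n_shares": len(leans),
    }

# A story shared mostly by left-leaning accounts, plus one right-leaning account:
shares = [Share("a", -0.8), Share("b", -0.6), Share("c", -0.5), Share("d", 0.7)]
print(story_audience_profile(shares))
# {'mean_lean': -0.3, 'spread': 1.5, 'n_shares': 4}
```

The interesting signal isn’t just the mean: a story with a big spread is being discussed across the divide, which a publisher-level label would erase entirely.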

I was inspired to consider this approach by an old, now-deleted tweet thread by Shaun Cammack on the Kenosha riots. In the thread, Cammack catalogs how the same story was interpreted by left-leaning and right-leaning readers, and which facts each group tended to index on. I wrote about it in my mini-rant about Media Bias charts earlier this week.

I think this kind of study would provoke more productive discussion than the current narrative of “journalist bias bad.” And if it exists, please let me know.

Okay, I wrote this several months ago but decided against publishing because it seemed overly ranty. And the MIT study I harped on was published in 2021, so I thought this methodology was on its way out… until I saw this story in the Financial Times. Sentiment analysis, in the year 2024. And on books published 1600–1900 (?!!?!?).
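
Here’s the problem in miniature. Sentiment lexicons encode present-day word meanings, but polarity drifts over centuries; “awful,” for one, used to mean awe-inspiring. A toy sketch with a made-up lexicon:

```python
# A made-up "modern" sentiment lexicon.
MODERN_LEXICON = {"awful": -1.0, "dreadful": -1.0, "terrific": 1.0}

def lexicon_sentiment(text: str) -> float:
    words = text.lower().split()
    scores = [MODERN_LEXICON[w] for w in words if w in MODERN_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

# In early modern English, "awful" meant awe-inspiring -- high praise.
print(lexicon_sentiment("an awful and majestic cathedral"))  # -1.0, scored as negative
```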

Those charts hinge on a flawed methodology: what you’re probably seeing is not drifting sentiment but English modernizing into the language we know today. But instead of writing a new essay in response, I’m repurposing old garbage. Because I can also be lazy! But at least I’m not getting paid for it (thanks, Medium).
