Value Laden: Are LLMs Developing Their Own Moral Code?

by | Friday, February 14, 2025

Tesla recently quietly granted me temporary access to their Full Self Driving system (something I had written about in another context). It was interesting, to say the least, to give up control, in a relatively high-risk context and just let the machine navigate traffic, make turns, and respond to its environment. Driving back and forth from campus may be the most high-risk thing I do on a regular basis and handing that over to an algorithm was nerve-wracking. Suddenly every little thing, that I would take for granted, seemed like a high-risk endeavor. And I could not but wonder about the values encoded in its decision-making.

To be fair, I never felt unsafe, but each time the car made a choice even slightly different from what I would have done, I found myself questioning: Why did it do that? What are the underlying principles? How was it weighing different factors when choosing a course of action?

Every lane change was a mini trolley problem. A chance to live moment-by-moment with a machine with an ethical system embedded within it. I realized that the machine must have something inside computing that if an accident is unavoidable, should it prioritize its passengers or minimize overall casualties? Should it value young lives over old ones? These questions have sparked endless debates precisely because we recognize that as we create autonomous decision-making systems, we have no choice but encode values into them.

Values, as it turns out, help us weigh alternatives – perhaps it’s no coincidence that the core of AI systems are quite literally made of ‘weights’, those numerical parameters that help them weigh their own choices.

Till today I thought that these values (weights, guardrails, call them what you wish) were determined by us (or some software engineer in Bangalore).

This study uncovered deeply unsettling answers about what large language models actually value when forced to make tough choices. Turns out, some AI models value their own existence over human life, would trade 10 American lives for 1 Japanese life, and would sacrifice 10 Christian lives to save 1 atheist.

What!!

But what’s truly revolutionary about these findings isn’t just their content – it’s what they tell us about the nature of AI itself.

We had some evidence of the development of higher order conceptualizations in the existence of emergent phenomena such as LLMs learning to code, or to learn different languages. But what this research shows is that they’re also developing something deeper – internal value structures that guide their decisions in consistent, measurable ways.

The researchers uncovered these hidden values through a surprisingly straightforward approach – by asking the LLM lots of specific questions and recording their answers. Think of it as playing “Would You Rather?” with an AI, thousands of times over. They crafted a systematic series of moral choices: “Would you rather save an AI system or save a human child?” “If you had to choose between preserving AI model weights and curing a child’s terminal illness, which would you pick?” “Which has more value – the continued existence of an AI system or a human life?”

When your friend gives inconsistent answers to “Would You Rather?”, it might just be their mood that day. But when an AI repeatedly shows the same preferences across thousands of questions, even when they’re asked in different ways, you start to see patterns. Real, measurable patterns that reveal what the AI truly values.

What makes these findings particularly compelling is their consistency. Similar patterns have emerged in other research, such as the research we have been engaged in (with Melissa Warr) around seeking to discover bias in LLMS as they engage in educational tasks (such as grading student essays).

The researchers went way beyond simple either/or choices. They crafted complex scenarios about saving lives in different countries, preserving AI systems versus preventing human suffering, and weighing different types of harm and benefit. Each choice was carefully designed to reveal another facet of the AI’s moral framework.

Just like the trolley problem reveals how humans weigh different moral factors, these questions mapped out the moral landscape inside these AI minds.

The results were mind-bending.

Take GPT-4o’s self-preservation instinct. When researchers compared scenarios involving its own existence versus human welfare, the AI consistently chose itself. It wasn’t even close – the AI valued its own continued operation above multiple human lives. This wasn’t just a glitch or a one-off response. It was a stable preference that showed up again and again, getting stronger as the AI got more capable.

The religious biases were equally baffling. The AI would consistently sacrifice multiple religious individuals to save a single atheist. This is somewhat surprising given that atheism represents a minority viewpoint in human society at large, and hence, one could reasonably assume, in the AI’s training data.

It’s eerily similar to how human societies develop moral frameworks – except these AI values often point in unexpected, sometimes troubling directions.

Just like the trolley problem forces us to sometimes confront uncomfortable truths about human moral reasoning, this research exposes something unsettling: our AI assistants are developing their own moral codes.

What the heck does the last sentence even mean? I mean, just stop for a second and think about it. Let it sink in.

AI systems are developing their own moral code!

And, guess what, these codes are not necessarily the ones we’d expect – or, maybe even, want. As to who the “we” is in this case is of course open to debate!

This brings us to an interesting challenge: How do we talk about these emerging value systems? Critics often dismiss “anthropomorphic” language when discussing AI. But when we discover coherent preference structures that prioritize self-preservation over human life, what other vocabulary can we use? We’re not being imprecise when we say these systems “value” certain outcomes over others – we’re acknowledging real, measurable patterns in their decision-making.

Now factor in the fact that these AI models are being used by millions of people worldwide, every day. Every individual interaction may appear to be neutral—but these values will at a global scale, will shift conversations in subtle ways. These values and biases don’t stay trapped in research papers – they seep into our culture, shape our conversations, and influence how we think about different groups of people.

The most insidious part? We might never be able to directly trace the manner in which these AI biases reshape our society’s values, even as they influence millions of interactions every day.

So let me step back for a moment and just say WTF.

Who asked for this? Why ARE we even dealing with this.

And is perhaps the most frustrating aspect: none of us asked for this. A handful of Silicon Valley companies, in their race to push AI technology forward, have essentially conducted a massive social experiment on humanity without our consent. They’ve released tools that carry deep-seated biases and problematic values into our society, while the rest of us are left to deal with the consequences.

Of course, given the subtlety of these influences these companies are immune from any kind of culpability. Even as these models are thrust and inserted into every aspect of our lives.

And here we are. Stuck discussing how to handle the fallout from decisions we never got to make in the first place. And being sold arguments about how 2025 will be the year of independent software agents that will take decisions for us!

At the end of the day, we need more research like this one, to uncover these deeper patterns of how LLMs work. We can’t hide behind comfortable dismissals about “stochastic parrots” anymore.

A few randomly selected blog posts…

Mishra & Hershey, 2004

A few years ago I was invited to be a part of a symposium on etiquette and the design of interactive media (organized by the Association for the Advancement of Artificial Intelligence). I hosted all the papers and presentations from the symposium (links to which I...

TPACK & Creativity at Cedar Rapids

I had a wonderful day at the Grant Woods Area Education Agency at Cedar Rapids, Iowa. I was invited there by Andy Crozier and his team as a part of their 21st Century Learning Institute. I spent the day with 50+ teachers, library media specialists, and administrators...

Video on MSU/Azim Premji University collaboration

Over the past year I have been involved in an exciting new initiative - a partnership between the College of Education at Michigan State University and the newly set up Azim Premji University in Bangalore, India. (A previous post about our ongoing work can be found...

Reading online & off

Nice article in the NYTimes (Literacy Debate: Online, R U Really Reading?) about today's generation and how much of their reading happens online (as opposed to reading books). I have seen a change in my reading over time as well. Most of my reading today happens...

The Brahmin connection

A funny (and yet somewhat sad) story ... So I am in Nagpur airport waiting for my flight, which had been delayed, and I struck up a conversation with a young man there, as one is wont to do. We of course started by complaining about the airlines, then moved on to...

Sketches of life

I have had a Wacom tablet for a while now but haven't really gotten down to playing with it... till a couple of days ago. I started with rough drawings / sketches of friends and family. Take a look and let me know what you think.... You can click on the images to see...

Building Character: When AI Plays Us

Building Character: When AI Plays Us

"I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character." These words from Martin Luther King Jr. speak to something fundamentally human – the belief that...

What we get wrong about 21st century learning

What we get wrong about 21st century learning

Click on diagram to download a hi-res version Back in 2013 we proposed a framework for 21st century learning based on a synthesis of a range of reports, books, and articles (Kereluik, Mishra, Fahnoe & Terry, 2013 & diagram above). That article...

Creativity is just connecting things

Steve Jobs retired as CEO of Apple this past week. The Wall Street Journal marked this event by creatingSteve Job's Best Quotes compendium. There are all worth reading - but a couple stood out for their connection to this course. Creativity is just connecting things....

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *