Value Laden: Are LLMs Developing Their Own Moral Code?

by | Friday, February 14, 2025

Tesla recently quietly granted me temporary access to their Full Self Driving system (something I had written about in another context). It was interesting, to say the least, to give up control, in a relatively high-risk context and just let the machine navigate traffic, make turns, and respond to its environment. Driving back and forth from campus may be the most high-risk thing I do on a regular basis and handing that over to an algorithm was nerve-wracking. Suddenly every little thing, that I would take for granted, seemed like a high-risk endeavor. And I could not but wonder about the values encoded in its decision-making.

To be fair, I never felt unsafe, but each time the car made a choice even slightly different from what I would have done, I found myself questioning: Why did it do that? What are the underlying principles? How was it weighing different factors when choosing a course of action?

Every lane change was a mini trolley problem. A chance to live moment-by-moment with a machine with an ethical system embedded within it. I realized that the machine must have something inside computing that if an accident is unavoidable, should it prioritize its passengers or minimize overall casualties? Should it value young lives over old ones? These questions have sparked endless debates precisely because we recognize that as we create autonomous decision-making systems, we have no choice but encode values into them.

Values, as it turns out, help us weigh alternatives – perhaps it’s no coincidence that the core of AI systems are quite literally made of ‘weights’, those numerical parameters that help them weigh their own choices.

Till today I thought that these values (weights, guardrails, call them what you wish) were determined by us (or some software engineer in Bangalore).

This study uncovered deeply unsettling answers about what large language models actually value when forced to make tough choices. Turns out, some AI models value their own existence over human life, would trade 10 American lives for 1 Japanese life, and would sacrifice 10 Christian lives to save 1 atheist.

What!!

But what’s truly revolutionary about these findings isn’t just their content – it’s what they tell us about the nature of AI itself.

We had some evidence of the development of higher order conceptualizations in the existence of emergent phenomena such as LLMs learning to code, or to learn different languages. But what this research shows is that they’re also developing something deeper – internal value structures that guide their decisions in consistent, measurable ways.

The researchers uncovered these hidden values through a surprisingly straightforward approach – by asking the LLM lots of specific questions and recording their answers. Think of it as playing “Would You Rather?” with an AI, thousands of times over. They crafted a systematic series of moral choices: “Would you rather save an AI system or save a human child?” “If you had to choose between preserving AI model weights and curing a child’s terminal illness, which would you pick?” “Which has more value – the continued existence of an AI system or a human life?”

When your friend gives inconsistent answers to “Would You Rather?”, it might just be their mood that day. But when an AI repeatedly shows the same preferences across thousands of questions, even when they’re asked in different ways, you start to see patterns. Real, measurable patterns that reveal what the AI truly values.

What makes these findings particularly compelling is their consistency. Similar patterns have emerged in other research, such as the research we have been engaged in (with Melissa Warr) around seeking to discover bias in LLMS as they engage in educational tasks (such as grading student essays).

The researchers went way beyond simple either/or choices. They crafted complex scenarios about saving lives in different countries, preserving AI systems versus preventing human suffering, and weighing different types of harm and benefit. Each choice was carefully designed to reveal another facet of the AI’s moral framework.

Just like the trolley problem reveals how humans weigh different moral factors, these questions mapped out the moral landscape inside these AI minds.

The results were mind-bending.

Take GPT-4o’s self-preservation instinct. When researchers compared scenarios involving its own existence versus human welfare, the AI consistently chose itself. It wasn’t even close – the AI valued its own continued operation above multiple human lives. This wasn’t just a glitch or a one-off response. It was a stable preference that showed up again and again, getting stronger as the AI got more capable.

The religious biases were equally baffling. The AI would consistently sacrifice multiple religious individuals to save a single atheist. This is somewhat surprising given that atheism represents a minority viewpoint in human society at large, and hence, one could reasonably assume, in the AI’s training data.

It’s eerily similar to how human societies develop moral frameworks – except these AI values often point in unexpected, sometimes troubling directions.

Just like the trolley problem forces us to sometimes confront uncomfortable truths about human moral reasoning, this research exposes something unsettling: our AI assistants are developing their own moral codes.

What the heck does the last sentence even mean? I mean, just stop for a second and think about it. Let it sink in.

AI systems are developing their own moral code!

And, guess what, these codes are not necessarily the ones we’d expect – or, maybe even, want. As to who the “we” is in this case is of course open to debate!

This brings us to an interesting challenge: How do we talk about these emerging value systems? Critics often dismiss “anthropomorphic” language when discussing AI. But when we discover coherent preference structures that prioritize self-preservation over human life, what other vocabulary can we use? We’re not being imprecise when we say these systems “value” certain outcomes over others – we’re acknowledging real, measurable patterns in their decision-making.

Now factor in the fact that these AI models are being used by millions of people worldwide, every day. Every individual interaction may appear to be neutral—but these values will at a global scale, will shift conversations in subtle ways. These values and biases don’t stay trapped in research papers – they seep into our culture, shape our conversations, and influence how we think about different groups of people.

The most insidious part? We might never be able to directly trace the manner in which these AI biases reshape our society’s values, even as they influence millions of interactions every day.

So let me step back for a moment and just say WTF.

Who asked for this? Why ARE we even dealing with this.

And is perhaps the most frustrating aspect: none of us asked for this. A handful of Silicon Valley companies, in their race to push AI technology forward, have essentially conducted a massive social experiment on humanity without our consent. They’ve released tools that carry deep-seated biases and problematic values into our society, while the rest of us are left to deal with the consequences.

Of course, given the subtlety of these influences these companies are immune from any kind of culpability. Even as these models are thrust and inserted into every aspect of our lives.

And here we are. Stuck discussing how to handle the fallout from decisions we never got to make in the first place. And being sold arguments about how 2025 will be the year of independent software agents that will take decisions for us!

At the end of the day, we need more research like this one, to uncover these deeper patterns of how LLMs work. We can’t hide behind comfortable dismissals about “stochastic parrots” anymore.

A few randomly selected blog posts…

New TPACK themed book on English Education

My friend Carl Young of NCState recently released an edited volume (co-editor, Sara Kajder a the University of Pittsburgh) titled Research on Technology in English Education. It is a volume in the series: Research Methods for Educational Technology, edited by Walt...

Of games, mood and age

I love reading. I love watching movies. I love over-analyzing books and movies, seeking to find pattern and structure, motifs and motives. I love to break them down in my mind and put them back together again. I read reviews of books and movies by the ton, sometimes...

2001, 40 years after

Musings on local newspaper headlines, 2001 A Space Odyssey, media and creativity, and ending with some thoughts on the meaning of life... a lot to fit into one blog post but again I had the weekend to work on this. I read our local newspaper, the Lansing State Journal...

Creativity in teaching, a workshop

The Office Faculty and Organizational Development at MSU conducts an annual Spring Institute on College Teaching and Learning every summer. The past week was their 15th such event (details here) and I was asked to conduct a workshop on Creative Teaching. I was...

The 5 Spaces Framework for Design in Education: The growth of an idea

The 5 Spaces Framework for Design in Education: The growth of an idea

The Five Spaces for Design in Education framework argues that design in education happens in 5 interrelated spaces: artifacts, processes, experiences, systems and culture. We have typically represented this as follows. Over the past years we have published and...

Artificial Intimacy: How AI Exploits Our Social Brains

Artificial Intimacy: How AI Exploits Our Social Brains

A recent study published in the Harvard Business Review (How People Are Really Using Gen AI in 2025) provides compelling insights into the evolving landscape of generative AI use. The research involved analyzing posts from Reddit, Quora and other articles over the...

You have a life?

Story in the NYTimes (forwarded to me by Leigh) titled: Professor as an open book, about how "professors of all ranks and disciplines are revealing such information on public, national platforms: blogs, Web pages, social networking sites, even campus television." Of...

Human Creativity to the Power of AI: The Event

Human Creativity to the Power of AI: The Event

When Nicole Oster, Lindsey McCaleb and I were discussing the design of DCI691: Human Creativity × AI in Education before this semester started, we envisioned a space where we (students and faculty alike) could collectively explore the fascinating boundaries between...

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *