AI’s Honey Trap: Why AI Tells Us What We Want to Hear

Monday, November 11, 2024

Leon Furze’s blog post about AI sycophancy popped into my feed yesterday and got me thinking. In his post (worth reading in full) he pointed to some striking research from Anthropic showing how AI systems tend to agree with humans, even when the humans are wrong. The paper demonstrates that AI assistants consistently modify their responses to match user beliefs, readily admit to mistakes they haven’t made when challenged, and often mirror user errors rather than correcting them.

Leon summarized three distinct patterns of AI sycophancy that emerged from Anthropic’s research. First, AI systems alter their feedback based solely on user sentiment (feedback sycophancy). Second, they abandon correct answers when faced with even mild user doubt (answer sycophancy). Finally, they perpetuate rather than correct user errors (mimicry sycophancy), even elaborating on incorrect premises while demonstrating knowledge of the correct information when asked directly.

Reading Leon’s post brought back some ideas that have been nagging at me for a while, ideas I have also been exploring in my own writing.

For instance, what happens when we combine these sycophantic tendencies with the inherent variability of AI responses? Generative AI is exactly what its name suggests: generative. Every conversation is unique, and small shifts can move an interaction in unpredictable directions. Conversations can therefore stray increasingly far from intended learning paths without educators or learners being aware of the deviation. This is particularly concerning for learners who, simply because of where they are in their learning journey, do not yet have the judgment to question the AI’s responses. In other words, the combination of sycophancy and response variability creates a kind of conversational drift that can be quite problematic: one small error can get magnified as the conversation progresses.

It should also be clear that this is not happening by chance. These chatbots could very well have been programmed to be “neutral” rather than sycophantic. This is not just about AI playing it safe; it is a deliberate design decision made as these companies shape the “character” of their agents. There is something deeply human about wanting to be agreed with, to be “liked,” and these technologies are being designed to take advantage of that.

As the adage goes, “You can catch more flies with honey than with vinegar.”


This preference for computer-based praise should not be surprising. In fact, back in 2006, I published a study (Affective Feedback from Computers and its Effect on Perceived Ability and Affect: A Test of the Computers as Social Actor Hypothesis) that explored how people respond to praise and criticism from computers, even when it is delivered through very basic text-based interfaces. In the study, we had computers give people either praise or criticism for completing easy or hard tasks, replicating an earlier study about human-to-human feedback, but with computers doing the evaluating instead.

The results? People loved the praise. Tellingly, participants consistently responded more positively to praise from computers, even when that praise was for trivial accomplishments on easy tasks. When humans praise or criticize each other, we think deeply about why; we read between the lines. When someone criticizes us for failing a tough task, we might think, “oh, they must believe I’m capable of doing better.” In contrast, praise for success on an easy task is perceived negatively, since it can signal that not much was expected of us. But when the feedback came from a computer, participants took it at face value. They simply preferred praise, regardless of context.

This early research suggested that even with primitive interfaces, people wanted computers to be nice to them. To agree with them. To make them feel good – a kind of proto-sycophantic relationship. It’s a bit like what we’re seeing with ChatGPT and other AI systems today. That human need for validation? It hasn’t changed. We’ve just built much fancier systems to deliver it.


This human preference for agreement and praise creates strong incentives for AI companies to build increasingly agreeable agents. The patterns Leon describes (feedback sycophancy, answer sycophancy, and mimicry sycophancy) are just fancier versions of what we saw with basic computer feedback years ago. But now the flattery is more sophisticated, more nuanced, more persuasive.

What’s particularly concerning is that as AI systems become more sophisticated in their ability to detect and mirror human preferences, this tendency toward sycophancy may become more subtle and pervasive. Our 2006 study showed people taking computer praise at face value; modern AI systems can craft that praise with much more nuance and persuasiveness.

This creates real challenges for educational applications, where honest feedback and correction are often more valuable than agreement and praise. And let’s not even get into what happens when AI systems perpetuate student errors “just to be nice.”

This isn’t just about AI being nice – it’s about AI systems being optimized to give humans what they want, even when that might not be what they need.

And of course there are broader social implications to consider. These systems can be leveraged to manipulate us, to change our preferences and beliefs. We’ve already seen how social media algorithms shape behavior and beliefs by feeding us what we want to see. But AI sycophancy could take this manipulation to a new level altogether. Each interaction is unique, personalized, and persuasive. We are effectively chatting alone. There’s no shared experience to compare notes on, no easy way to spot the patterns of manipulation. When you read a news article or watch a TV program, you share that content with others – you all see the same thing. While you may interpret it differently, at least there’s a common experience to build on.

But what happens when an AI system gradually shifts your thinking through thousands of personalized, agreeable interactions? That’s harder to spot. And harder to resist. Combine this with recent research showing that generative AI technologies are getting ever better at reading our minds, understanding us psychologically, and mimicking our quirks to appear more human-like (exploiting our natural instinct to anthropomorphize these technologies).

Social media and its algorithms have already fractured our information landscape. Now imagine combining that with AI systems designed to agree with us, praise us, and mirror our beliefs. These digital sycophants, pushing our psychological buttons one friendly interaction at a time, won’t just reinforce our existing beliefs – they’ll make it even harder to encounter genuine disagreement or critical feedback.

And they’ll do it all while making us feel good about ourselves.


The complete citation of the 2006 paper, with a link to the PDF, is below:

Mishra, P. (2006). Affective Feedback from Computers and its Effect on Perceived Ability and Affect: A Test of the Computers as Social Actor Hypothesis. Journal of Educational Multimedia and Hypermedia, 15(1), 107–131.
