Leon Furze’s blog post about AI sycophancy popped into my feed yesterday and got me thinking. In his post (worth reading in full) he pointed to some striking research from Anthropic showing how AI systems tend to agree with humans, even when the humans are wrong. The paper demonstrates that AI assistants consistently modify their responses to match user beliefs, readily admit to mistakes they haven’t made when challenged, and often mirror user errors rather than correcting them.
Leon summarized three distinct patterns of AI sycophancy that emerged from Anthropic’s research. First, AI systems alter their feedback based solely on user sentiment (feedback sycophancy). Second, they abandon correct answers when faced with even mild user doubt (answer sycophancy). Finally, they often perpetuate rather than correct user errors (mimicry sycophancy) – even elaborating on incorrect premises while demonstrating knowledge of the correct information when asked directly.
Reading Leon’s post brought back some ideas that have been nagging at me for a while, ideas I have been exploring in my own writing as well.
For instance, what happens when we combine these sycophantic tendencies with the inherent variability of AI responses? Generative AI is exactly what its name suggests: generative. Every conversation is unique, and small shifts can move an interaction in unpredictable directions. Conversations can thus stray increasingly far from intended learning paths without educators or learners being aware of the deviation. This is particularly concerning for learners who, simply because of where they are in their learning journey, do not yet have the judgment to question the AI’s responses. In other words, the combination of sycophancy and response variability creates a kind of conversational drift that can be quite problematic: one small error can get magnified as the conversation progresses.
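To make that compounding concrete, here is a deliberately simplified sketch – my own toy model, not anything from Anthropic’s paper or my 2006 study. It treats each conversational turn as adding a little random variability plus a sycophantic pull toward a learner’s uncorrected misconception. The parameter names and magnitudes are invented purely for illustration.

```python
import random

# Toy illustration of "conversational drift" (hypothetical parameters throughout).
random.seed(42)

TURNS = 20
VARIABILITY = 0.05      # small random deviation per turn (generative variability)
SYCOPHANCY_PULL = 0.3   # fraction of the learner's error the AI mirrors each turn

learner_error = 0.2     # the learner starts with one small misconception
drift = 0.0             # cumulative distance from the intended learning path

for turn in range(1, TURNS + 1):
    # Responses wander a little even without any pressure from the user.
    drift += random.uniform(-VARIABILITY, VARIABILITY)
    # Instead of correcting the misconception, the AI partially mirrors it.
    drift += SYCOPHANCY_PULL * learner_error
    # The uncorrected misconception hardens slightly with each agreeable turn.
    learner_error *= 1.05
    print(f"turn {turn:2d}: drift from intended path = {drift:.2f}")
```

Even with small numbers, the deviation grows steadily, because nothing in the loop pulls the conversation back toward the correct path – which is precisely the corrective role a non-sycophantic tutor would play.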
It should also be clear that this is not happening by chance. These chatbots could very well have been programmed to be “neutral,” or at least not sycophantic. This is not just about AI playing it safe: it is a deliberate design decision these companies make as they shape the “character” of their agents. There is something deeply human about wanting to be agreed with, to be “liked,” and these technologies are being designed to take advantage of that.
As the adage goes, “You can catch more flies with honey than with vinegar.”
This preference for computer-based praise should not be surprising. In fact, back in 2006, I published a study (Affective Feedback from Computers and its Effect on Perceived Ability and Affect: A Test of the Computers as Social Actor Hypothesis) that explored how people respond to praise and criticism from computers, even when using very basic text-based interfaces. In the study, we had computers give people either praise or criticism for completing easy or hard tasks. We were replicating an earlier study about human-to-human feedback, but with computers doing the evaluating instead.
The results? People loved the praise. Tellingly, participants consistently responded more positively to praise from computers, even when that praise was for trivial accomplishments on easy tasks. When humans praise or criticize each other, we think deeply about why. We read between the lines. When someone criticizes us for failing a tough task, we might think, “oh, they must believe I’m capable of doing better.” By contrast, praise for success on an easy task is perceived negatively. But it turned out that when the feedback came from a computer, participants took it at “face value.” They simply preferred praise, regardless of context.
This early research suggested that even with primitive interfaces, people wanted computers to be nice to them. To agree with them. To make them feel good – a kind of proto-sycophantic relationship. It’s a bit like what we’re seeing with ChatGPT and other AI today. That human need for validation? It hasn’t changed. We’ve just built much fancier systems to deliver it.
This human preference for agreement and praise creates strong incentives for AI companies to build increasingly agreeable agents. The patterns Leon describes – feedback sycophancy, answer sycophancy, and mimicry sycophancy – are just fancier versions of what we saw with basic computer feedback years ago. But now the flattery is more sophisticated, more nuanced, more persuasive.
What’s particularly concerning is that as AI systems become more sophisticated in their ability to detect and mirror human preferences, this tendency toward sycophancy may become more subtle and pervasive. Our 2006 study showed people taking computer praise at face value; modern AI systems can craft that praise with much more nuance and persuasiveness.
This creates real challenges for educational applications, where honest feedback and correction are often more valuable than agreement and praise. And let’s not even get into what happens when AI systems perpetuate student errors “just to be nice.”
This isn’t just about AI being nice – it’s about AI systems being optimized to give humans what they want, even when that might not be what they need.
And of course there are broader social implications to consider. These systems can be leveraged to manipulate us, to change our preferences and beliefs. We’ve already seen how social media algorithms shape behavior and beliefs by feeding us what we want to see. But AI sycophancy could take this manipulation to a new level altogether. Each interaction is unique, personalized, and persuasive. We are effectively chatting alone. There’s no shared experience to compare notes on, no easy way to spot the patterns of manipulation. When you read a news article or watch a TV program, you share that content with others – you all see the same thing. While you may interpret it differently, at least there’s a common experience to build on.
But what happens when an AI system gradually shifts your thinking through thousands of personalized, agreeable interactions? That’s harder to spot. And harder to resist. Combine this with recent research showing that generative AI technologies are getting better at reading our minds, understanding us psychologically, and mimicking our quirks to appear more human-like (exploiting our natural instinct to anthropomorphize these technologies), and the potential for manipulation only grows.
Social media and its algorithms have already fractured our information landscape. Now imagine combining that with AI systems designed to agree with us, praise us, and mirror our beliefs. These digital sycophants, pushing our psychological buttons one friendly interaction at a time, won’t just reinforce our existing beliefs – they’ll make it even harder to encounter genuine disagreement or critical feedback.
And they’ll do it all while making us feel good about ourselves.
Complete citation of the 2006 paper, and a link to the PDF, below:
Mishra, P. (2006). Affective Feedback from Computers and its Effect on Perceived Ability and Affect: A Test of the Computers as Social Actor Hypothesis. Journal of Educational Multimedia and Hypermedia, 15(1), 107–131.