GenAI Reasoning Models: Very smart & confident (but still drunk)

Friday, February 14, 2025

A year or so ago, I came up with this metaphor that working with a chatbot is like having “a smart, biased, supremely confident, drunk intern.” While the bias aspect is a crucial issue I’ve written about elsewhere, for this discussion we’ll focus on the other characteristics. It’s witty, but it also rings true: you get these flashes of brilliance mixed with the occasional wild misstep.

Chatbots are smart, drunk, biased, supremely confident interns.

But that statement was written about older versions of these chatbots. Now we have the reasoning models – models that take longer, with the idea that they think things through before spitting out an answer.

For this project, I started with the Unit Circle simulation I had done with Claude. I did have a working version, but it had some issues with how the images were represented, and getting Claude to fix it became a bit wonky.

The version that had been created by Claude

What I wanted the final version to look like – the sine to the right and cosine at the top.

Essentially, in this updated diagram, the cosine curve is now vertical (above the circle) while the sine curve remains horizontal (to the right). Dotted lines extend straight up and straight right from the circle’s red dot to each curve. This new layout better illustrates how x = cos(θ) and y = sin(θ) function as separate but related components: cosine maps to x but graphs vertically, while sine maps to y but graphs horizontally. The perpendicular arrangement helps visualize how these functions work together in circular motion, rather than just appearing as phase-shifted waves. By repositioning the cosine curve vertically, we create a clearer visual connection between the circle’s coordinates and their corresponding oscillations, making the relationship between circular motion and wave functions more intuitive.
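The mapping the diagram illustrates can be sketched in a few lines of code. This is not the simulation’s actual source – just a minimal Python sketch (the function name `unit_circle_point` is my own) showing how each angle of the red dot yields one point for each wave trace:

```python
import math

def unit_circle_point(theta):
    """Return the (x, y) coordinates of the red dot at angle theta (radians).

    x = cos(theta) feeds the vertical cosine curve above the circle;
    y = sin(theta) feeds the horizontal sine curve to the right.
    """
    return math.cos(theta), math.sin(theta)

# Sample the dot's path in 30-degree steps; each sample contributes
# one point to the cosine trace and one to the sine trace.
for i in range(13):
    theta = i * math.pi / 6
    x, y = unit_circle_point(theta)
    # The dotted guide lines in the diagram project the dot straight up
    # (to the cosine curve) and straight right (to the sine curve).
    print(f"theta = {theta:5.2f}  cos = {x:+.3f}  sin = {y:+.3f}")
```

As the dot moves around the circle, the two projections trace out the familiar waves – the same motion, viewed along two perpendicular axes.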

So I gave the code that Claude had generated and the sketch of what I would like the simulation to look like to ChatGPT o1 – a reasoning model. And boom we were off to the races.

In about thirty minutes, ChatGPT o1 and I hammered out a fully featured unit circle simulation complete with a draggable red dot, animated sine and cosine waves, dynamic color and audio mappings, and even squares on each side of the right triangle (something that I had not considered the first time around). Everything was smooth, fluent, and genuinely fun. The final product was exactly what I wanted—like having that intern suddenly sober up, pivot into genius mode, and deliver the perfect solution right when it was needed.
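Those squares on the sides of the right triangle are a nice touch because they make the Pythagorean identity visible. A minimal sketch of the idea (again, my own illustrative code, not the simulation’s; the function name `triangle_square_areas` is hypothetical):

```python
import math

def triangle_square_areas(theta):
    """Areas of the squares drawn on the two legs of the right triangle
    formed by the origin, the red dot, and the dot's foot on the x-axis."""
    cos_sq = math.cos(theta) ** 2  # square on the horizontal leg
    sin_sq = math.sin(theta) ** 2  # square on the vertical leg
    return cos_sq, sin_sq

# At any angle, the two areas sum to 1: the square on the hypotenuse,
# which is always the circle's radius.
a, b = triangle_square_areas(math.radians(40))
print(round(a + b, 10))
```

However the dot is dragged, the two square areas trade off against each other while their sum stays fixed – cos²θ + sin²θ = 1, animated.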

What the final simulation looks like – everything I wanted and a bit more. You can play with it here.


So far so good.

And then I decided to test it on some optical illusions that are not optical illusions—repeating an experiment I had tried before. And guess what? ChatGPT o1 was no better – it got fooled by images that looked (on the surface) like standard optical illusions but were quite definitely NOT optical illusions. I tested the reasoning model on a bunch of them, and it failed every time.

Some screenshots of my interaction with the o1 Model of ChatGPT with fake optical illusions

It was actually quite fascinating to see how quickly GenAI models shifted from razor-sharp reasoning to getting completely duped by something as simple as a fake illusion.

I mean, this truly should not be a surprise – but I think it is important that we realize that these systems, even the more advanced reasoning models, don’t truly see in the way humans do. Instead, they rely on statistical pattern matching, pulling from vast amounts of text to infer what the “correct” answer should be based on past knowledge.

This does mean there’s an extra level of effort required of us, as the sober supervisor double-checking the intern’s work. And that, as I have written about recently, can be an extra cognitive load on us.

In short, AI is revealing exactly what it is: a powerful but fallible reasoning engine that thrives on patterns, not direct experience. Its strength lies in retrieving and synthesizing vast amounts of information quickly. Its weakness is that it lacks true perception, intuition, and skepticism. It’s not “thinking” in the human sense, nor does it have a gut instinct that whispers, “Something seems off here.”

And sometimes, that means watching them confidently walk straight into a wall.


