GenAI Reasoning Models: Very smart & confident (but still drunk)

Friday, February 14, 2025

A year or so ago, I came up with this metaphor that working with a chatbot is like having “a smart, biased, supremely confident, drunk intern.” While the bias aspect is a crucial issue I’ve written about elsewhere, for this discussion we’ll focus on the other characteristics. It’s witty, but it also rings true: you get these flashes of brilliance mixed with the occasional wild misstep.

Chatbots are smart, drunk, biased, supremely confident interns.

But that statement was written with older versions of these chatbots in mind. Now we have the reasoning models – they take longer, the idea being that they think things through before spitting out an answer.

For this project, I started with the Unit Circle simulation I had built with Claude. I had a working version, but it had some issues with how the images were represented, and getting Claude to fix them got a bit wonky.

The version created by Claude

What I wanted the final version to look like – the sine to the right and cosine at the top.

Essentially, in this updated diagram, the cosine curve is now vertical (above the circle) while the sine curve remains horizontal (to the right). Dotted lines extend straight up and straight right from the circle’s red dot to each curve. This new layout better illustrates how x = cos(θ) and y = sin(θ) function as separate but related components: cosine maps to x but graphs vertically, while sine maps to y but graphs horizontally. The perpendicular arrangement helps visualize how these functions work together in circular motion, rather than just appearing as phase-shifted waves. By repositioning the cosine curve vertically, we create a clearer visual connection between the circle’s coordinates and their corresponding oscillations, making the relationship between circular motion and wave functions more intuitive.
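For readers who like to see the idea in code, here is a minimal Python sketch of that mapping. It is not the code Claude or ChatGPT generated; the function names and the choice to plot the angle itself along each curve's long axis are assumptions made purely for illustration.

```python
import math

def unit_circle_point(theta):
    """Return (x, y) for the red dot at angle theta (in radians)."""
    return math.cos(theta), math.sin(theta)

def curve_points(theta):
    """Hypothetical layout helper (an assumption, not the original code).

    The sine curve sits to the right of the circle: the angle runs
    horizontally and the height matches the dot's y value.
    The cosine curve sits above the circle: the angle runs vertically
    and the horizontal position matches the dot's x value.
    """
    x, y = unit_circle_point(theta)
    sine_point = (theta, y)    # horizontal dotted line: same y as the dot
    cosine_point = (x, theta)  # vertical dotted line: same x as the dot
    return sine_point, cosine_point

# Example: the dot at 60 degrees
dot = unit_circle_point(math.pi / 3)
sine_pt, cosine_pt = curve_points(math.pi / 3)
```

The point of the perpendicular layout shows up in the last two lines of `curve_points`: the sine point shares the dot's y value, so the connecting line is horizontal, and the cosine point shares the dot's x value, so the connecting line is vertical.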

So I gave ChatGPT o1, a reasoning model, the code that Claude had generated along with the sketch of what I wanted the simulation to look like. And boom, we were off to the races.

In about thirty minutes, ChatGPT o1 and I hammered out a fully featured unit circle simulation complete with a draggable red dot, animated sine and cosine waves, dynamic color and audio mappings, and even squares on each side of the right triangle (something that I had not considered the first time around). Everything was smooth, fluent, and genuinely fun. The final product was exactly what I wanted—like having that intern suddenly sober up, pivot into genius mode, and deliver the perfect solution right when it was needed.
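The squares on the sides of the triangle are worth a quick aside. Assuming they are drawn on the two legs and the hypotenuse (the usual Pythagorean picture, and an assumption about the simulation rather than something taken from its code), their areas can be sketched in a few lines of Python. The function name is hypothetical.

```python
import math

def square_areas(theta):
    """Areas of the squares on the triangle's two legs and hypotenuse.

    The legs have lengths |cos(theta)| and |sin(theta)|; the hypotenuse
    is the radius of the unit circle, which is 1.
    """
    adjacent = abs(math.cos(theta))
    opposite = abs(math.sin(theta))
    hypotenuse = 1.0
    return adjacent ** 2, opposite ** 2, hypotenuse ** 2

# Wherever the dot is dragged, the two leg squares add up to the third.
for degrees in (10, 45, 120, 300):
    a, b, c = square_areas(math.radians(degrees))
    assert math.isclose(a + b, c)
```

As the dot moves, the two smaller squares trade area with each other while their total stays fixed at 1, which is the identity cos²(θ) + sin²(θ) = 1 made visible.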

What the final simulation looks like, everything I wanted and a bit more – you can play with it here.


So far so good.

And then I decided to test it on some optical illusions that are not optical illusions—repeating an experiment I had tried before. And guess what? ChatGPT o1 was no better – getting fooled by images that looked (on the surface) like some standard optical illusions but were quite definitely NOT optical illusions. I tested the reasoning model on a bunch of them; the reasoning model failed every time.

Some screenshots of my interaction with the o1 Model of ChatGPT with fake optical illusions

It was actually quite fascinating to see how quickly GenAI models shifted from razor-sharp reasoning to getting completely duped by something as simple as a fake illusion.

I mean, this truly should not be a surprise – but I think it is important that we realize that these systems, even the more advanced reasoning models, don’t truly see in the way humans do. Instead, they rely on statistical pattern matching, pulling from vast amounts of text to infer what the “correct” answer should be based on past knowledge.

This does mean there’s an extra level of effort that we need to make, as the sober supervisor, double-checking the intern’s work. And that, as I have written about recently, can be an extra cognitive load on us.

In short, these AI models are revealing exactly what they are: powerful but fallible reasoning engines that thrive on patterns, not direct experience. Their strength lies in retrieving and synthesizing vast amounts of information quickly. Their weakness is that they lack true perception, intuition, and skepticism. They’re not “thinking” in the human sense, nor do they have a gut instinct that whispers, “Something seems off here.”

And sometimes, that means watching them confidently walk straight into a wall.
