There is a great deal of buzz about how generative AI (GenAI) can transform education, something I have been thinking about a lot as well. That said, I’m not so sure we’re asking the right questions.
Let’s back up a second.
Back in the early ’90s, I was a grad student at the University of Illinois Urbana-Champaign, building hypermedia models for teaching and learning science. And every once in a while I heard about this program called PLATO – though I never actually got to play with it.
PLATO, or Programmed Logic for Automatic Teaching Operations, was the granddaddy of Intelligent Tutoring Systems (ITS). Born in the 1960s, it was revolutionary: a computer system that could teach, adapt, and even chat with students. By the time I arrived, though, PLATO was more legend than reality, fading into the shadows of tech history.
Why did PLATO fade away? Maybe it was just ahead of its time. But there were other reasons too. The hardware couldn’t keep up with the vision. Maintenance was a nightmare. And let’s face it, in the era of emerging personal computers, a centralized mainframe system felt like yesterday’s news.
But the dream of PLATO – personalized, adaptive learning powered by artificial intelligence – never died. It just went into hibernation, waiting for technology to catch up.
Speaking personally, I never bought into intelligent tutoring systems. Learning, for me, was complex and personal, driven by individual interests and passions, and, no surprise, top-down instructional systems never appealed to me. But it was cool to be at a university that had housed a pioneering technology-based educational solution. So I never went the ITS route; instead, I focused my attention on teachers and teacher knowledge, and on how they could creatively play, learn, and teach with the technologies available to them. This meant desktop computers (at first) and the internet (soon after). Those interests led to the TPACK framework and all the good stuff that came from that.
But the dream of ITS never really died. It remained a significant focus of educational research and development, all of it centered on the idea that one day there would be nifty educational computer programs that would track your progress, adapt to your needs, and serve up custom-tailored learning experiences.
So far, though, it has remained just that: a dream.
Then ChatGPT burst into the world. It was the first of many Large Language Models (LLMs) to enter our collective consciousness, and it appeared to be the technology that could finally address the challenges ITS had faced in the past.
At first glance, these LLMs seemed perfect for supercharging tutoring systems. They can crunch massive amounts of data, spit out relevant info on any topic, and even mimic human-like conversations. It was almost like having a genius librarian, subject expert, and charismatic teacher rolled into one AI package. It was PLATO on steroids. The promise of a personalized AI tutor for every student seemed tantalizingly close.
Sal Khan even delivered a TED Talk about these possibilities and the (mythical) 2-sigma boost a good AI tutor could deliver.
I argue, however, that these expectations are based on a fundamental misunderstanding of this new technology.
As we have argued (oh so many times), LLMs are essentially sophisticated stochastic parrots. Parrots, in that they have no understanding of the meaning of what they are spouting. Stochastic, in that their outputs are generated probabilistically, word by word. And sophisticated, in that these word predictors draw on deeper “conceptual” structures – not quite like our human schemas, but similar. These structures allow them to make connections and generate responses that can seem surprisingly insightful.
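To make the “stochastic” part concrete, here is a toy sketch in Python. Everything in it is invented for illustration – a real LLM samples over tens of thousands of tokens using probabilities learned from training data – but the basic move is the same: a weighted roll of the dice, one word at a time.

```python
import random

# Toy next-word predictor. A real LLM computes probabilities over a
# huge vocabulary from learned weights; these are simply made up.
NEXT_WORD_PROBS = {
    ("the", "cat"): [("sat", 0.55), ("ran", 0.25), ("slept", 0.20)],
}

def sample_next_word(context):
    """Sample the next word, weighted by its predicted probability."""
    words, weights = zip(*NEXT_WORD_PROBS[context])
    return random.choices(words, weights=weights, k=1)[0]

# Each call is a fresh draw from the distribution: the same context
# can yield "sat" on one run and "slept" on the next.
print(sample_next_word(("the", "cat")))
```

Notice what is missing from that loop: any lookup of facts, any notion of truth. There is only a probability distribution and a draw.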
But what we have to recognize is that LLMs are always making stuff up. Always. Sometimes, what they generate aligns perfectly with reality (or at least our conceptions of it), and we think, “Wow, that’s spot on!” Other times, their responses are so off-base we call it a “hallucination.” But in truth, it’s all the same process. They’re constantly extrapolating, creating, inventing – whether we perceive the output as a hallucination or not. And as I have written elsewhere, companies in this space are increasingly recognizing this fact.
This generative nature leads to another crucial characteristic: variability. LLM outputs are inherently inconsistent. In a recent study my colleagues and I conducted, we found significant unexplained variance in LLM-generated responses. Even when controlling for input variables, the outputs showed substantial fluctuation—up to two-thirds of the variation in LLM-generated scores couldn’t be attributed to any manipulated factors. This inconsistency persists across different versions of the same LLM and seems resistant to attempts at mitigation through prompt engineering or context expansion.
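For readers who want to feel what that kind of variability is like, here is a minimal simulation. To keep it self-contained, it fakes the grader with a noise term (the scores in our study came from repeated calls to real LLMs, not from this code), but the punchline is the same: identical input, fluctuating output.

```python
import random
import statistics

def simulated_llm_score(essay: str, true_quality: float = 7.0,
                        run_to_run_sd: float = 1.5) -> float:
    # Stand-in for an LLM grader: the Gaussian noise plays the role
    # of the unexplained run-to-run variability.
    return true_quality + random.gauss(0, run_to_run_sd)

# Score the exact same essay 100 times.
scores = [simulated_llm_score("the same essay, every time") for _ in range(100)]
print(f"mean = {statistics.mean(scores):.2f}")
print(f"sd   = {statistics.stdev(scores):.2f}")
# Nothing about the input changed across the 100 calls, yet the
# scores spread out: variance with no manipulated factor behind it.
```

Prompt engineering and added context can shift the mean, but in our experience they do little to shrink the spread.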
Please note that I am not even getting into all the other problems these LLMs suffer from, such as being biased, racist, and WEIRD. This is not because I don’t think these issues are important. They are, and I care about them deeply, as anybody who has read this blog will know. The reason I do not get into them here is that, even if we fix the issues related to bias (however difficult that may be), these two attributes of LLMs – their tendency to hallucinate and the variability of their output – will never go away. These attributes are inherent to this technology.
So what does all this mean for Intelligent Tutoring Systems?
Well, in short, it is problematic. Mainly because these fundamental attributes of LLMs—hallucination and variability—conflict with the precise, consistent functioning required in traditional ITS roles. The tendency to hallucinate compromises the accuracy of student modeling and the reliability of domain knowledge presentation. The high degree of output variability undermines the consistency necessary for effective pedagogical strategies.
Imagine a math tutor who sometimes forgets how to add. Or a history teacher who occasionally invents new presidents. Not exactly the rock-solid foundation we want for education, is it?
Now, don’t get me wrong. This creative, variable nature can be amazing for brainstorming or exploring new ideas. But for a tutor? It’s a recipe for disaster.
So, where does this leave us?
It seems to me that the goals of Intelligent Tutoring Systems and Large Language Models are fundamentally different. ITS aim for consistency, accuracy, and targeted instruction. They’re designed to guide a student along a specific learning path, adapting to individual needs but always with a clear educational objective in mind.
LLMs, on the other hand, are engines of possibility. They’re not constrained by the need for consistency or absolute accuracy. Their strength lies in their ability to generate diverse ideas, make unexpected connections, and stimulate thought in new directions. It’s like having a brilliant, wildly creative friend who’s equally likely to offer profound insight or spin an entertaining yarn – and you never quite know which you’re going to get.
Instead of trying to force LLMs into the ITS mold, perhaps we need to reimagine their relationship. What if we viewed LLMs not as replacements for ITS, but as complementary tools in the learning process?
PLATO may be gone, but its spirit lives on – not as the all-knowing digital teacher, but as a collaborative partner in the grand adventure of learning. Or, as I have said before, it is a smart, drunk, somewhat biased intern.