Dissertation in a day

by | Monday, March 16, 2026

For the past six years, I have been a co-host on Silver Lining for Learning, a weekly webinar series that began on March 20, 2020, the very week the world shut down due to the pandemic. What started as an urgent conversation among colleagues about how to keep learning alive became something larger: 264 episodes, guests from over 30 countries, six years of thinking together about what education could become. I co-host SLL with Chris Dede, Curt Bonk, Lydia Cao, and Yong Zhao, and I’d always felt there was something in that archive beyond the sum of its episodes. But we had never looked at the full record systematically.

Last Wednesday, sitting in Frankfurt airport waiting for a flight to Billund, I opened Claude and started talking. Over the next few days — in airports, hotels, on flights (thank you, Delta, for Wi-Fi), and then obsessively over the weekend — the project took shape. By Sunday night I had a thematic analysis of 2.6 million words, a reproducible methodology, an interactive website, and a complete open data archive. A friend looked at the result and recognized that this work would ordinarily take months of graduate student labor and a dissertation committee’s worth of methodological debate. His comment: “This is a dissertation done over a weekend.”

He wasn’t entirely wrong.

Take a moment to check out: Analyzing Silver Lining for Learning

Here’s what the project involved:

Claude created a Python script that downloaded the YouTube transcripts for all 264 episodes. That's over 2 million words of spoken dialogue, every word of every conversation between hosts and guests. Claude then helped standardize each transcript: removing timestamps, normalizing the formatting, creating paragraph boundaries, and adding metadata headers. This gave us one of the largest longitudinal corpora of educational conversation I'm aware of.
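The standardization step can be sketched in a few lines. This is an illustrative version, not the actual script: the function name and the exact timestamp patterns are assumptions, since caption formats vary.

```python
import re

def clean_transcript(raw: str) -> str:
    """Strip caption timestamps and normalize whitespace in a raw
    transcript (a sketch; the real pipeline's details may differ)."""
    # Drop bracketed timestamps like [00:12:34] or [12:34]
    text = re.sub(r"\[\d{1,2}:\d{2}(?::\d{2})?\]", " ", raw)
    # Drop WebVTT-style cue lines like 00:00:01.000 --> 00:00:03.000
    text = re.sub(
        r"\d{2}:\d{2}:\d{2}[.,]\d{3} --> \d{2}:\d{2}:\d{2}[.,]\d{3}",
        " ", text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()
```

From here, paragraph boundaries and metadata headers would be layered on top of the cleaned text.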

The next step was analyzing this corpus. Rather than imposing a framework, I asked Claude to draw two independent random samples of 50 guest episodes each (stratified by year, using different random seeds) and to build a thematic taxonomy from the ground up for each set. At the end of the process we had two independent taxonomies that we could compare. As it happens, the two samples converged on 9 of 11 themes independently, a form of inter-rater reliability in which the "raters" were two separate passes through different slices of the data. After some back and forth with Claude, we landed on the final taxonomy: 12 themes organized into four clusters.
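The stratified sampling step is simple to reproduce. Here is a minimal sketch, assuming each episode record is a dict with a "year" key (a hypothetical shape, not the project's actual data format):

```python
import random
from collections import defaultdict

def stratified_sample(episodes, n_total, seed):
    """Draw roughly n_total episodes, stratified by year.
    A different seed yields an independent second sample."""
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for ep in episodes:
        by_year[ep["year"]].append(ep)
    per_year = max(1, n_total // len(by_year))
    sample = []
    for year in sorted(by_year):  # sorted for reproducibility
        pool = by_year[year]
        sample.extend(rng.sample(pool, min(per_year, len(pool))))
    return sample
```

Running this twice with the same seed returns the identical sample, which is what makes the methodology auditable; two different seeds give the two independent slices the taxonomies were built from.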

With these themes in hand, the next step was developing a scoring dictionary for each theme: 10–15 keywords at three tiers (primary, secondary, tertiary), each tier carrying a different weight. The challenge was that these had to be calibrated for conversational speech, not academic prose. Not surprisingly, people on a webinar don't say "emergency remote teaching pedagogy"; they say "when the schools closed and we had to figure it out on Zoom." Every dictionary went through multiple rounds of revision, with cross-theme overlap validation to ensure the instruments were actually measuring distinct constructs.

These keywords were then used to score all of the transcripts. One thing Claude suggested was log-scaling: applying a formula so that mere repetition of a single keyword would not dominate a score, while genuine thematic presence was still rewarded. The math isn't exotic, but applying it consistently across 2.6 million words with 12 separate dictionaries is the kind of thing that would have taken weeks of manual work.
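To make the tiered-dictionary-plus-log-scaling idea concrete, here is a sketch. The example dictionary, the weights, and the exact formula are all illustrative assumptions; the post does not publish the real ones, only that each keyword tier carries a different weight and counts are log-scaled.

```python
import math
import re

# Hypothetical tiered dictionary for one theme; the real dictionaries
# have 10-15 keywords per theme with calibrated weights.
TEACHER_DEV = {
    "primary":   (3.0, ["teacher training", "professional development"]),
    "secondary": (2.0, ["mentoring", "teacher education"]),
    "tertiary":  (1.0, ["workshop", "certification"]),
}

def theme_score(text, dictionary):
    """Weighted, log-scaled theme score: each keyword contributes
    weight * log(1 + count), so repeating one keyword cannot
    dominate the score. (A sketch of the approach, not the exact
    formula used in the project.)"""
    text = text.lower()
    score = 0.0
    for weight, keywords in dictionary.values():
        for kw in keywords:
            count = len(re.findall(re.escape(kw), text))
            score += weight * math.log1p(count)
    return score
```

The log term is what does the work: saying "workshop" three times contributes log(4) at tertiary weight, still less than a single high-tier phrase like "teacher training."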

Log-scaling alone is not enough, since raw scores aren't comparable across themes (Teacher Development naturally scores higher than Crisis and Conflict because teachers are discussed everywhere). Per-theme z-score normalization puts all 12 themes on a common scale, making it meaningful to say that an episode's strongest theme is X rather than Y. Finally, we identified the top three themes for each episode — the thematic signature that makes each conversation distinctive.
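The normalization step looks roughly like this. The data shape (episode ID mapped to raw per-theme scores) is an assumption made for the example:

```python
from statistics import mean, pstdev

def normalize_and_rank(scores, top_n=3):
    """Per-theme z-score normalization across episodes, then the
    top-N themes per episode. `scores` maps episode id -> {theme: raw
    score}. (Illustrative; the real pipeline may store this
    differently.)"""
    themes = {t for ep in scores.values() for t in ep}
    stats = {}
    for t in themes:
        vals = [ep[t] for ep in scores.values()]
        mu, sd = mean(vals), pstdev(vals)
        stats[t] = (mu, sd or 1.0)  # guard against zero variance
    signatures = {}
    for ep_id, ep in scores.items():
        z = {t: (v - stats[t][0]) / stats[t][1] for t, v in ep.items()}
        signatures[ep_id] = sorted(z, key=z.get, reverse=True)[:top_n]
    return signatures
```

Because each theme is normalized against its own distribution, a theme that scores high everywhere (like Teacher Development) only tops an episode's signature when it is unusually prominent in that episode.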

With the analysis done, we could turn to representing the data in ways that are accessible to others. This led to three different explorations, each offering a different lens on the data and analysis. The first was a scatterplot that placed all the episodes in thematic space; clicking a dot highlights other episodes that are conceptually "adjacent" to it. The second was a streamgraph tracing how the 12 themes rose and fell over six years. Finally, we created a tool to explore how the themes intersect, i.e., which themes co-occur and which stay in their own lanes. All of this, along with the full data archive (every transcript, every score, every dictionary, every script), was made accessible and downloadable by anyone interested in replicating the research or asking their own questions of the data.
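The co-occurrence view rests on a simple computation over the per-episode thematic signatures. A minimal sketch, assuming signatures are lists of theme names per episode (the function name and data shape are hypothetical):

```python
from collections import Counter
from itertools import combinations

def cooccurrence(signatures):
    """Count how often each pair of themes appears together in an
    episode's top-theme signature (a sketch of the data behind a
    theme-intersection view)."""
    pairs = Counter()
    for themes in signatures.values():
        # Sort so ("AI", "Equity") and ("Equity", "AI") count as one pair
        for a, b in combinations(sorted(themes), 2):
            pairs[(a, b)] += 1
    return pairs
```

Counts like these are what reveal, for example, that one theme keeps appearing alongside many others while another rises largely in isolation.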

Just to be clear, I did not write a line of code. Claude wrote all the Python scripts, all the JavaScript for the website, all the CSS. Claude processed the transcripts, ran the scoring, computed the normalizations, built the visualizations.

What I did was make decisions: analytical decisions, UX decisions, and aesthetic decisions. Which episodes to sample. How many themes felt right. When a theme was too broad or too narrow. Whether "pandemic" and "crisis" were one theme or two (they're two, and figuring out why was one of the most important moments in the analysis). How to handle the fact that conversational speech doesn't map neatly onto keyword dictionaries. When the scoring formula was rewarding frequency over genuine thematic presence. What the temporal patterns actually meant. How the interface should look. How to show that a theme had been selected. And more…

And all of this happened through a dialogic process. Essentially, I would ask Claude to analyze a batch of episodes. Claude would surface patterns. I would push back, redirect, ask “but what about…” Claude would revise. I would look at the output and say, “This doesn’t feel right, and here’s why.” We would iterate. The themes, the visualizations, the interface, the words — they all emerged from dialogue, a methodology I ended up calling “dialogic coding.”

Thus, the quality of the analysis depended both on what I brought to the task and what Claude could do. In other words, AI can process 2.6 million words in minutes, but it cannot decide what matters. It cannot bring 20 years of thinking about educational technology to bear on whether “Open Education” and “AI” should be in the same cluster. But again, I cannot, for the life of me, write code that would create interactive scatterplots or streamgraphs or in certain cases even know that these representations are appropriate or possible.

Finally, the results of this analysis were surprising at multiple levels. For instance, the pandemic didn't just fade; it got metabolized into deeper questions about equity and transformation. AI didn't just rise; it rose in near-perfect isolation, connecting to fewer other themes than almost any other topic in the archive. Equity didn't dominate any single conversation but showed up as connective tissue across all six years. And the most telling tension in the entire dataset: as AI enthusiasm surged, so did the counter-theme we called "Technology as Enabler, Not Solution," the SLL community's persistent refusal to let technology hype go unchecked.

These are findings that would have taken a traditional research team months to surface from a corpus this size. We got there in a day.

Of course, there are some limitations that are important to point out.

First, the transcripts are auto-generated YouTube captions, not human-produced transcripts. They contain errors. A keyword-based scoring approach cannot capture irony, disagreement, or the difference between someone advocating for equity and someone describing its absence. The themes are inductively derived, but they are still one researcher's interpretation, shaped by that researcher's interests, blind spots, and disciplinary commitments.

And, of course, the speed itself is a limitation. A dissertation takes years for reasons beyond inefficiency. The slow accretion of understanding, the seminars, the committee pushback, the dead ends that teach you something — all of that is compressed or absent here. What you gain in scope you may lose in depth.

But still. In one weekend, working with an AI collaborator, I built a thematic analysis of 2.6 million words across 264 episodes spanning six years, with a reproducible methodology, an interactive website for exploration, and a complete open data archive that anyone can download and verify. Every number traces back to the transcripts. Every analytical decision is documented. The full dialogue between researcher and AI is preserved and available.

My friend called it a dissertation in a day. It's not, quite. It's something else — something we don't have a good name for yet. A new kind of scholarly practice where the researcher's role shifts from data processor to analytical director, where the conversation with the AI is itself a form of inquiry, and where the speed of execution means ideas can be tested, revised, and tested again before the initial excitement fades. I think we're going to see a lot more of this. And I think the interesting question isn't whether AI can do the work (it clearly can) but whether the humans directing it bring enough expertise, judgment, and intellectual honesty to make the output worth trusting.

In this case, I had 264 episodes' worth of conversations to draw on. Six years of hearing what the world's smartest educators think about learning. That's not a bad foundation for analysis, and for creating something new from the vast corpus of data the SLL team has generated over the past six years.

Topics related to this post: Essay
