Modeling human behavior: The new dark art of silicon sampling

Sunday, October 23, 2022

A couple of months ago I wrote a post, On merging with our technologies, which was essentially quotes from a conversation Ezra Klein had with the novelist Mohsin Hamid. I ended the post with a quote about the dangers that predictive technologies pose to human behavior. As Mohsin Hamid says:

…if we want to be able to predict people, partly we need to build a model of what they do,

It turns out that recent work on large-scale neural networks allows us to do exactly that.

One that has been in the news lately is GPT3, a third-generation neural network machine learning model (created by OpenAI) that has been trained on text from the internet. It is one of the first examples of generative AI, essentially AI that can create original artifacts. In the case of GPT3 those artifacts are text, while other models, such as DALL-E 2, Stable Diffusion and Midjourney, create images, and so on. For instance, with GPT3 you can type in a small amount of input text and it will generate large volumes of original, machine-generated text. It can write in a particular style (say, Shakespeare or Tarantino), summarize a longer piece of text, and more.

“Modeling humans with words:” Image created by Stable Diffusion AI: Source Lexica.art (edited by Punya Mishra)

(Clearly the arrival of these technologies has implications for education, particularly for the five-paragraph essay that is the staple of so many high school and college courses. But that is a post for another day.)

A recent paper (Out of one, many: Using language models to simulate human samples) argues that GPT3 “can be used as a proxy for humans in social science experiments.” Here is the abstract – key phrases bolded.

Abstract: We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the “algorithmic bias” within one such tool — the GPT-3 language model — is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property “algorithmic fidelity” and explore its extent in GPT-3. We create “silicon samples” by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.

The paper argues that this may be another tool for social scientists. Instead of going out and speaking to real people, one could engage with silicon people since, as the authors write, “the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes.” This is what they mean by algorithmic fidelity, and it is the kind of model of human behavior that Hamid was alluding to in the quote that opened this piece.

“Faceless people:” Image created by Stable Diffusion AI: Source Lexica.art (edited by Punya Mishra)

You can learn more in a Substack article by Paul Aaron titled “This is silicon sampling: Can AI simulate research participants to generate consumer insights?” The title says it all. Essentially, he ran a mini-experiment of “creating AI personas and asking them some questions.” So instead of talking with an actual human, he converses with a hypothetical “silicon person.” For instance, consider Persona 1 (though I do recommend reading the entire post).

Persona 1
Persona 1 is 25 years old, BIPOC, non-binary, college educated, lives in NYC, identifies as a democrat, and works at a coffee shop.
What do you value most in life?
I value my relationships, my health, and my ability to make a difference in the world.
What is your favorite thing to do for fun?
My favorite thing to do for fun is to go out and explore the city. I love trying new restaurants and bars, and seeing new shows and exhibits. I also love spending time outdoors.
Who is your favorite musician and why?  
My favorite musician is Janelle Monae because she is an incredible artist who isn’t afraid to be herself. She is a powerful voice for the LGBTQIA+ community and Black Lives Matter, and she makes incredible music that makes me want to dance.

He offers two other profiles that differ along other dimensions (gender, age, geographic location, interests and so on). What is striking in reading these “dialogues” with the GPT3-generated personas is just how stereotypical they seem and how flattened their “personalities” are.

But maybe that IS the point. Perhaps each of us, despite the rich inner life we may think we lead, is just a bunch of buttons waiting to be pushed: lacking agency, easily framed, with responses that can be predicted from circumstances outside our control (and even our awareness). Aaron ends his piece as follows:

This is just a quick example of how AI models like GPT-3 can emulate specific personas to help organizations discover insights. While we don’t see these techniques replacing traditional research methods for high-stakes decisions any time soon, in the near term they could help teams work faster and with greater agility.

The implications of this new technology are staggering, and I am not sure I fully comprehend them yet. Some insight can be found in the excerpt below from the October 14 episode of the Hard Fork podcast, in which the hosts explore how this new tool could be used to manipulate people.

So one thing that you can imagine people doing with this knowledge that you can essentially simulate people at scale through these large language models is, for example, to test out propaganda campaigns.

If you are a government that’s going to do some large scale manipulation of public opinion, you might test it on a million virtual citizens before you actually put it out into the world and see which one is the most likely to work. You might also use this if you are, for example, a fraudster who is trying to scam people out of giving you their Social Security numbers or their credit card numbers. You could actually test the scam on simulated humans, figure out how to make it more convincing and compelling, get a sense of how it’s going to work on real people, and then go out into the world and do it on real people.

“Montage of propaganda posters:” Created by Stable Diffusion AI: Source Lexica.art

There is so much to unpack here, particularly given the recent history of technologies that were created and shared with little (if any) understanding of the broader social, historical, cultural and economic contexts in which they would play out. I will end, as I began, with a quote from Mohsin Hamid, because I think artists sometimes serve as canaries in the coal mine, revealing themes and undercurrents that are not often visible to the rest of us.

So it isn’t simply the case that machines are better able to understand humans. It is also the case that machines are making human beings more like machines, that we are trying to rewrite our programming in such a way that we can be predicted. And for me, that’s the more frightening aspect of the shift from sorting to prediction.

These technologies, and there will be more of them, will stealthily ease into our lives, becoming part of our reality and changing us in ways we cannot predict. I find this extremely worrisome, and I am reminded again of just how prescient Neil Postman was when he laid out his five things we need to know about technological change! Not that anybody listened to him when he wrote that piece, and sadly, it isn't clear that anybody will listen to him now.

Note: Danah Henriksen and I published a piece recently that may be relevant to this discussion (though it did not focus specifically on AI). Check out Human-Centered values in a disruptive world.
