A few days ago, The Washington Post published a story that caught my eye. Titled “The Marshmallow Test and other predictors of success have bias built in, researchers say,” the article discusses the famous Marshmallow Test, long heralded as a predictor of future success.
The Marshmallow Test is a psychological experiment developed in the late 1960s by Stanford psychologist Walter Mischel. In this test, children are offered a choice: eat one marshmallow immediately or wait for a short period (usually about 15 minutes) to receive two marshmallows. The ability to delay gratification, as measured by waiting for the second marshmallow, was later correlated with various positive outcomes in life, including better academic performance, higher SAT scores, lower body mass index (BMI), and even higher income in adulthood.
This simple test gained immense popularity and has been widely cited as a predictor of future success. It’s been used to argue for the importance of self-control and delayed gratification in achieving long-term goals. The test and its interpretations have influenced educational policies, parenting advice, and even economic theories about personal success and social mobility.
However, the WaPo story complicates the narrative. For instance, the story describes how, when the same test was applied to Yucatec Maya children, the results were puzzling. Instead of patiently waiting for a second marshmallow, many children simply left the room. And this was true not just of this test but of a range of other tests that have often been used to study children’s “executive functions,” i.e., the mental muscles that help us stay focused, think on our feet, and actually get stuff done.
The researchers argue that the children left not because they lacked self-control but because sitting alone in a room doing nothing made little sense in their cultural context. As the article says,
The researchers raised a pointed question: If a child from a poor family or a child from a different culture doesn’t perform well, is the fault in the child or the test?
And these findings really complicate a whole area of research that has focused on identifying universals in how the human mind develops. What this research shows is that these “universals” depend on culture and context.
Of course, these are preliminary findings, and much more research is needed to explicate the complex relationship between human development and culture.
But enough about marshmallows. Let’s sink our teeth into the AI flavor of the year: Large Language Models (LLMs).
What do these findings mean for the training and outputs of LLMs such as ChatGPT, Gemini, Claude and more?
As I have argued elsewhere, when our AI systems are fed data primarily from Western, middle-class contexts, we risk creating technology that misunderstands, misclassifies, or even discriminates against vast swathes of the global population. In other words, these data are WEIRD (Western, Educated, Industrialized, Rich and Democratic). Just as the Marshmallow Test failed to account for cultural differences in how children behave and learn, AI trained on WEIRD datasets may fail to understand or properly serve diverse global populations.
Another prime example of a WEIRD construct deeply embedded in AI systems is the concept of the “terrible twos.” (For instance, see this video about the role of culture in creating our reality). This Western notion characterizes two-year-olds as particularly difficult, defiant, and prone to tantrums. However, this is far from a universal experience across cultures. In many non-Western societies, where children are integrated differently into family and community life, this phase is not observed or is not seen as particularly challenging. Yet, an AI trained on predominantly Western data will likely reinforce this idea globally, potentially pathologizing normal developmental stages in cultures where the “terrible twos” simply don’t exist.
These examples do not just illustrate a potential danger—they expose a present and persistent problem in AI, particularly in Large Language Models (LLMs). These models have already been trained on vast corpora of data that speak glowingly of the Marshmallow Test and similar WEIRD psychological constructs. The biases are not a future risk, but a current reality deeply embedded in these systems.
The implications are far-reaching and resistant to change. New studies challenging these WEIRD concepts, if they appear at all, will barely make a dent in the massive corpus of data that affirms them. This means that for years to come, LLMs will continue to propagate these biases.
Perhaps most alarmingly, as the article points out, these culturally biased assessments can make children who are perfectly functional in their own contexts feel inadequate or “behind.” When theories developed in WEIRD contexts are taught as universal truths about human psychology, we risk creating a world where cultural differences are pathologized rather than celebrated. A psychologist or parent in Ghana or Mexico, consulting AI for advice, may be told their child lacks “executive control” or “self-regulation” based on behaviors that are perfectly normal and adaptive in their cultural context.
As an academic studying these issues, I feel compelled to point out these problems and make people aware of them. However, I’m increasingly pessimistic about our ability to truly fix this situation. The world of AI development is deeply entrenched in WEIRD perspectives, and the momentum of technological progress often outpaces our ability to correct course.
The homogenizing force of AI, trained on biased datasets and deployed globally, seems almost unstoppable. While we can strive to be more aware of these biases in our own work and thinking, the broader trend of cultural flattening through technology appears set to continue.
And yet, the story of human culture gives us reason for cautious optimism. Consider the journey of cultural forms like jazz and rap. Born from the specific context of African American experiences in the United States, these art forms have spread across the world, taking on rich and varied flavors in different cultural contexts. From Japanese jazz fusion to French hip-hop, these once-localized forms of expression have been adopted, adapted, and transformed by diverse cultures worldwide. Perhaps AI will follow a similar path. As it spreads across the globe, different cultures may find ways to reshape and repurpose these technologies, infusing them with local knowledge, values, and ways of thinking. While the initial wave of AI may be WEIRD, the ingenuity and adaptability of human cultures might, just might, lead to newer, better, and more culturally sensitive forms of AI.
To be honest, I am not too optimistic. The WEIRD forces we face are far too strong – as I have written about elsewhere.