A-EYE: When AI can see

Saturday, October 14, 2023

AI can now see! And talk to you about what it sees!

ChatGPT released its latest upgrade – the ability to not just create images but also to interpret them. I had been waiting for a while now to get access to these new vision features – and just this morning it popped into my account. Essentially, one can now upload images to ChatGPT and engage in a conversation with it about the image. (This feature is available only to paid users.) So I decided to play with it and see what it could do.

In the examples below, you’ll see how it can intelligently describe not just the content of an image, but also infer deeper meanings and connections. I gave ChatGPT a wide range of images, from book covers to diagrams to street art, and asked it to analyze them from different perspectives – summarizing themes, identifying potential misconceptions, conducting semiotic and Marxist analyses, and more. In each case it generated pretty nuanced multi-paragraph responses that synthesized these images, picking up on subtle symbols, societal critiques, and contextual nuances. While not perfect, these experiments foreshadow the potential for AI to become an increasingly versatile partner in making sense of our visual world. If nothing else, check out the last two examples: a semiotic and Marxist analysis of the TPACK diagram, and the narrative it created tying together disparate graphs on refugee data!

It has become somewhat of a cliche to describe every new capability of AI as being a game-changer, so I will not do that. That said, I do think this is pretty insane. And honestly, it took me longer to put together this blog post than to run these experiments. But you can judge for yourself.


Note: The images below are the exact images that were used in these experiments (except when indicated otherwise).

First off, I uploaded this photo (the cover of a book) and asked it to:

… write me a 2 paragraph report on this book. The first paragraph should describe the book and the authors. The second paragraph should extrapolate the ideas in the book to the age of AI

This is what it generated:

“The Medium is the Massage: An Inventory of Effects” is a seminal work co-authored by Marshall McLuhan, a renowned media theorist, and Quentin Fiore, a graphic designer. The book delves into McLuhan’s famous aphorism “the medium is the message,” exploring the idea that the form of a medium embeds itself in any message it would transmit or convey, creating a symbiotic relationship by which the medium influences how the message is perceived. McLuhan’s insights in the book highlight the transformative effects of media on human senses, social structures, and cultural norms, with Fiore’s design amplifying these concepts through visually compelling layouts and collages.

Extrapolating McLuhan’s insights to the age of AI, the interplay between medium and message takes on profound new dimensions. Artificial intelligence, as a medium, is not just a tool but shapes the very fabric of information dissemination, consumption, and understanding. The algorithms that power AI platforms, from social media feeds to search engines, curate and prioritize content in ways that influence societal discourse, values, and perception. Thus, AI becomes an active participant in the communication process, subtly shaping human cognition and behavior. In this context, the AI’s architecture, biases, and design principles become the medium, dictating not just the flow of information but also the nuances of its interpretation and societal implications.

I then gave it a photograph of my bookshelf and asked it to tell me what it could about the person who has these books on his shelf.

Funnily enough, it refused to do so – part of its guardrails against answering questions about people.

But when I asked it to summarize 3-5 themes that connect across these books, it did so readily.

Based on the visible titles and known content of these books, several overarching themes can be identified:

  1. Human Evolution and History: Many of the books, such as “Sapiens,” delve into the history and evolution of humanity, examining how we have developed over millennia both biologically and culturally.
  2. Psychology and Cognition: Works by Steven Pinker and others explore the intricacies of the human mind, its development, and how it functions, delving into topics such as human nature, cognition, and behavior.
  3. Society and Culture: Several titles reflect an interest in the larger structures of human societies, cultures, and civilizations, including their foundations, transformations, and challenges.
  4. Philosophy and Thought: Some of the books touch upon philosophical questions about human existence, consciousness, and the nature of reality.
  5. Political and Social Commentary: Books such as “The Declaration of Independence and Constitution of the United States of America” suggest an interest in governance, rights, and the foundations of societies. Other titles might explore contemporary issues and challenges in modern society.

How would it do as a science educator? I gave it the following diagram (grabbed from the web), a typical diagram used to explain how seasons happen. I then asked it: “What misconceptions can this diagram lead to.”

It not only identified the image but also proceeded to give me six potential misconceptions that this image may create in students’ minds, as follows:

The diagram seems to depict the Earth’s tilt and its position relative to the Sun at different times of the year, which is associated with the changing seasons. However, there are a few potential misconceptions that might arise from this illustration:

  1. Orbit Shape: The diagram could imply that Earth’s orbit around the Sun is highly elliptical. In reality, Earth’s orbit is almost circular, with only a slight elliptical shape.
  2. Tilt Constancy: The Earth’s axial tilt is shown as constant relative to the Sun. This could lead to the misconception that Earth’s tilt changes in relation to the Sun as it orbits. In reality, the axial tilt remains relatively constant with respect to distant stars, which is why we experience different seasons.
  3. Distance to the Sun: One might mistakenly infer that the distance between the Earth and the Sun drastically changes between seasons, especially given the varying sizes of the elliptical path. While there is a difference in distance due to the elliptical orbit (with Earth being closer during perihelion and farther during aphelion), it’s not the primary reason for the change in seasons.
  4. Seasonal Length: The four positions could inadvertently suggest that each season (winter, spring, summer, autumn) is of equal length. In reality, due to the elliptical shape of Earth’s orbit and other factors, some seasons are slightly longer than others.
  5. Hemispherical Bias: The diagram depicts the Earth from a Northern Hemisphere perspective, with the North Pole tilting toward the Sun in June and away in December. This could lead to the misconception that the entire Earth experiences summer in June and winter in December, neglecting the fact that the Southern Hemisphere has opposite seasons.
  6. Cause of Seasons: The primary reason for seasons is the tilt of the Earth’s axis, not the distance from the Sun. However, without proper context, one might think that Earth’s proximity to the Sun causes the seasons.

It’s always important to provide context and explanations when using diagrams to illustrate complex concepts to prevent these types of misconceptions.
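
(A quick aside on the "almost circular" point in items 1 and 3 above: the size of the effect can be checked with a few lines of arithmetic. What follows is only a minimal sketch, not part of ChatGPT's response. It assumes the commonly cited value of about 0.0167 for Earth's orbital eccentricity, and the function names are purely illustrative.)

```python
# Assumption: Earth's orbital eccentricity, commonly cited as about 0.0167.
ECCENTRICITY = 0.0167


def distance_ratio(e: float) -> float:
    """Aphelion distance divided by perihelion distance for eccentricity e."""
    # For an ellipse with semi-major axis a, the farthest point is a * (1 + e)
    # and the closest point is a * (1 - e), so a cancels out of the ratio.
    return (1 + e) / (1 - e)


def sunlight_ratio(e: float) -> float:
    """Sunlight intensity at perihelion divided by intensity at aphelion."""
    # Intensity falls off with the square of the distance (inverse-square law).
    return distance_ratio(e) ** 2


ratio = distance_ratio(ECCENTRICITY)
flux = sunlight_ratio(ECCENTRICITY)

print(f"Aphelion is {100 * (ratio - 1):.1f}% farther from the Sun than perihelion")
print(f"Sunlight at perihelion is {100 * (flux - 1):.1f}% stronger than at aphelion")
```

Under that assumption the distance varies by only a few percent over the year. Earth is also closest to the Sun in early January, during Northern Hemisphere winter, which is consistent with the point that distance is not the primary cause of the seasons.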

Next up, I asked it to give me a semiotic analysis of a photograph I had taken of some street graffiti back in 2022 when I visited San Francisco.

Of course! Semiotic analysis involves interpreting the signs and symbols within a piece of media to extract meaning. Here’s a semiotic analysis of the graffiti you’ve shared:

  1. Setting & Symbols:
    • Golden Gate Bridge: Instantly recognizable as a symbol of San Francisco. Its inclusion firmly places the setting in this iconic city.
    • Cartoonish Buildings: While exaggerated, these likely represent the crowded and stacked architecture of the city.
  2. Characters & Elements:
    • Animated Animals in Human Attire: These characters may represent the diverse and eclectic mix of inhabitants of the city, suggesting a multicultural and diverse society.
    • Bear with “I ♥ BURRITOS” Shirt: A possible nod to California’s rich Mexican culture and the popularity of Mexican cuisine, especially burritos.
    • Red Fox with SF Hat: This could be emphasizing local pride and identity. The fox, known for its cunning, might symbolize the sharp and savvy nature of the locals.
    • Robot: This might be a reference to the tech industry, as San Francisco is close to Silicon Valley and is a hub for tech startups and innovation.
    • Agent-Like Character with Sunglasses: Possibly a nod to the corporate side of the city or the secretive nature of some businesses, especially in the tech sector.
    • Characters with Street Vendor Cart: This could represent the entrepreneurial spirit of the city, or it could be a nod to the street food culture and local vendors.
  3. Text & Phrases:
    • “EXPENSIVE FOOD”: Likely commenting on the high cost of living in San Francisco, which includes the cost of dining out.
    • “12 GALAXIES” and other text: These might refer to local establishments, events, or inside jokes within the community.
  4. Colors:
    • The vibrant and contrasting colors used might represent the dynamic and lively atmosphere of the city, indicating its energy and vibrancy.
  5. Overall Theme:
    • The mural seems to celebrate the diversity, energy, and unique cultural elements of San Francisco while also offering a playful critique of some of its challenges, like the high cost of living.

Remember, semiotic analysis is open to interpretation, and different viewers might notice and interpret signs differently. The above analysis is just one perspective on the rich and vibrant imagery present in the graffiti.

I then asked it to write two paragraphs: first a Marxist interpretation of the same image, then a paragraph comparing the two interpretations (semiotic and Marxist) and pointing out the strengths and weaknesses of each. It responded with:

Marxist Interpretation: The graffiti image, from a Marxist perspective, can be seen as a depiction of class struggle and the contradictions of capitalism within San Francisco. The cartoonish, tilted buildings may symbolize the unstable housing market and skyrocketing real estate prices, driven by the tech boom. Characters like the robot and the corporate agent represent the dominant capitalist class, while the street vendor and animals in human attire embody the working class. The “EXPENSIVE FOOD” sign and the bear wearing the “I ♥ BURRITOS” shirt underscore the disparity between the celebration of cultural diversity and the commodification of that culture, leading to gentrification and the displacement of marginalized communities.

Comparison: The semiotic analysis focuses on understanding the individual symbols and signs within the image and extracting the meaning from their interrelations. It provides a detailed breakdown, allowing for a broad understanding of cultural, local, and social references. However, it may not inherently focus on power dynamics or systemic critiques. On the other hand, the Marxist interpretation delves into the societal structures, power relations, and economic forces represented in the artwork. It critiques the systemic inequalities and highlights class struggle, but may overlook some cultural nuances and local references that a semiotic analysis would catch. Both frameworks offer valuable insights: semiotics excels in cultural and symbolic interpretation, while Marxism emphasizes socio-economic structures and power dynamics.

Next, I gave it the new TPACK diagram and asked it to conduct a similar analysis of the image.

Semiotic Analysis: The image depicts the interconnected relationships between different types of knowledge vital for modern educators. The Venn diagram format indicates the overlap and mutual importance of each knowledge domain. At its core, we find the intersections of technological, pedagogical, and content knowledge, suggesting that the combination of all three forms the foundation of effective teaching in today’s digital age. The outer dotted circle, labeled “Contextual Knowledge,” implies that all these domains are embedded within a broader context, suggesting that understanding the context in which education takes place is crucial.

Marxist Interpretation: From a Marxist lens, the image can be seen as a representation of the demands and expectations placed on modern educators within a capitalist system. The emphasis on technological knowledge (TK) underscores the importance and inevitability of technology in today’s education system, which can be perceived as a reflection of capitalist forces pushing for tech integration in all sectors of society, potentially benefiting tech corporations. Furthermore, the necessity for educators to master all these overlapping domains of knowledge might highlight the increasing demands on teachers without necessarily providing them with corresponding resources or compensations. The outer “Contextual Knowledge” might be seen as the socio-economic environment teachers operate within, affected by policies, funding, and societal values, all of which can be influenced by capitalist imperatives.

Comparison: The semiotic analysis provides insights into the symbolic meanings of the image, emphasizing the interconnected nature of knowledge domains and the importance of context. It offers a neutral, descriptive perspective focused on signification. On the contrary, the Marxist interpretation critiques the underlying socio-economic forces and power dynamics represented in the image, offering a more critical perspective on the societal expectations and demands placed on educators. While the semiotic approach helps understand the image’s structure and symbolism, the Marxist analysis delves deeper into societal critiques and potential implications. Both perspectives provide a comprehensive understanding, but from different angles: one is interpretative of symbols, and the other is critical of systemic structures.

Finally, and this is truly the most amazing example, I took three graphs from an article titled “How the U.S. compares with other countries taking in refugees” and asked it to write a short narrative that connects these three images together and explains what they collectively mean. This is the article it wrote – just based on these three images. (I have included the three images in one – but they were fed into GPT individually.)

The Evolving Landscape of Refugee Relocation: An Insight

The global dynamics of refugee movements have witnessed significant shifts over the decades, necessitating a deeper understanding of not only the origins but also the host countries shouldering the responsibility of providing shelter to these displaced souls.

The Shifting Origins

The U.S., as a major global power, has played a prominent role in refugee admissions over the years. A glance at the admissions data from 1975 to 2015 paints a vivid picture of these shifts. From the aftermath of the Vietnam War, which saw the U.S. accommodating a significant number of Vietnamese refugees, to the Soviet refugee influx in the late 80s and the acceptance of refugees from conflict-torn regions like Kosovo, Somalia, and Laos in the late 90s and early 2000s, the U.S. has continually adjusted its refugee policies in response to global crises. This reactive approach is evident with events like the U.S. Refugee Act of 1980, which raised overall quotas, and the special statuses granted in 2008 to refugees from Myanmar and Bhutan.

A Relative Burden

However, while countries like the U.S. have taken commendable steps, the actual burden borne by countries is relative to their size and resources. Data from 2014 strikingly illustrates this point. Countries such as Lebanon and Jordan, despite their limited resources and size, have hosted refugees equivalent to 19.6% and 8.2% of their total populations, respectively. Similarly, nations like Nauru, Chad, and Djibouti, often overshadowed in global dialogues, bear a substantial relative burden.

What emerges from these data points is a tale of human movement shaped by socio-political events and crises. As much as the origin points tell a story of displacement and search for safer shores, the host countries’ data reflects their commitment, capability, or sometimes the sheer geographic inevitability of being a refuge. It’s imperative to understand and appreciate the role of every nation, big or small, in this shared human responsibility.

While the U.S.’s role is pivotal, the magnanimity of smaller nations in the face of adversity stands out as a testament to global solidarity. As the world continues to grapple with refugee crises, these figures serve as a mirror, reflecting both our challenges and our enduring human spirit.
