AI and Visual Arts

Andrew Sempere @ EPFL Extension School · 15 minutes

How AI is making us think again about what we see.

Artistic practice has always been about the skilled application of illusion. Drawings and paintings, for example, create an optical illusion – a viewer will admire an artist’s ability to create convincing scenes or characters through color theory, forced perspective, foreshortening and shading. These are used to create an image that fools the eye.

A skilled human artist can take a piece of paper and charcoal and, with these simple tools, create a sense of depth and build a world that exists only in our minds. This unique skill set has largely formed the basis of how we evaluate art in Europe for hundreds of years. The better the quality of the illusion, the greater the artist.

Ferdinand Hodler

This painting is called "Die Strasse nach Evordes" and was painted by Ferdinand Hodler (image taken from here). Using only colors on a flat canvas, the artist creates a captivating scenery – one with depth, warmth and wonder. The sound of birds and the feeling of a fresh summer breeze on our skin seem only a few feet away.

So what should we make of technology that can create a convincing illusion by itself? Are computer-generated works of art more or less real than those created by a human? If an AI can create a painting that looks and feels like a da Vinci, is it as valuable as the original?

This topic has been debated before. The widespread availability of photographic processes in the 1930s, and the explosion of visual news magazines, caused a rift in the art world and irrevocably changed what it meant to create images and to be an artist.

Walter Benjamin’s landmark 1936 essay “The Work of Art in the Age of Mechanical Reproduction” highlighted this split, presenting both the problems and possibilities that “mechanical reproduction” brings to the art world. In particular, Benjamin sought to locate the value of an original artwork while acknowledging that widespread reproducibility makes art more accessible.

The Mona Lisa quandary

Take, for example, da Vinci’s classic painting – the Mona Lisa. The fact that it can easily be viewed through a quick online search has made it an even more famous work of art, but the original image itself has become less impactful as a result. The aura of the original means people will queue for hours to see it, yet we have all seen that image thousands of times – on posters, stamps, notebooks and clothing. We’ve seen it to such an extent that it has become a cliché.

In this case, technology has changed the meaning of the original painting. But the debate isn’t about whether it has made it more or less valuable. Rather, it has led to far more complex and interesting questions. Where exactly does the value of that image reside? Who is the author? Who owns the illusion and what does it mean? And can it be moved from one object to another?

But now – in the age where everything can be digital – some art doesn’t even exist in a physical form anymore. There might not even be a single “unique original” version of the piece of art; the “original” might just be one of many.

So what does that mean for the art and its value?

Do androids dream?

In 2015, Google released Deep Dream, with the delightful proposition that it was software that allowed you to view what was inside the “mind” of an AI as it “dreamed”. It is certainly entertaining to look at: psychedelic swirling colors that mysteriously evolve into dog heads. It appears, if this software is to be believed, that the AI ultimately thinks everything is a dog.

Deep Dream

A visualization of what an AI trained to detect dogs would love to see instead of the actual input, i.e. what it "dreams" about (image taken from here).

But the real artwork is in the title of this project. The software is neither dreaming nor deep in an artistic sense, and the canine obsession is a consequence of its training. Deep Dream uses a dataset made up of thousands of photographs of dogs, and so the AI’s dreams appear to consist exclusively of dogs – not because the AI is obsessed with man’s best friend, but because humans know what dogs are and find them significant enough to have collected thousands of photos of them.

What’s more, while the images appear to be of dogs, they are actually not. They merely appear dog-like to human eyes – in reality, they are a collection of pixels positioned in a shape that roughly matches what we humans would label as a dog.

The software’s images come from running a classification algorithm in reverse.1 While this is fun as a provocation, the computer has no real understanding that this output, as strange as it appears to us, is any more or less correct than auto-tagging2 a pet on a smartphone.
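
To make “running a classification algorithm in reverse” concrete, here is a minimal sketch in Python using PyTorch. It is not Google’s actual Deep Dream code – the real project maximized the activations of internal network layers rather than a single class score, and the class index, learning rate and step count below are illustrative assumptions – but it shows the core trick: the network stays frozen, and gradient ascent nudges the input image until the model’s “dog” score climbs.

```python
# A minimal "classifier in reverse" sketch (illustrative only, not Google's
# actual Deep Dream code). The pretrained classifier stays frozen; gradient
# ascent modifies the *input image* so that its "dog" score goes up.
import torch
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # the network never changes; only the image does

dog_class = 207  # one of ImageNet's many dog classes ("golden retriever")
image = torch.rand(1, 3, 224, 224, requires_grad=True)  # start from random noise

optimizer = torch.optim.Adam([image], lr=0.05)
for step in range(200):                 # step count chosen arbitrarily
    optimizer.zero_grad()
    dog_score = model(image)[0, dog_class]
    (-dog_score).backward()             # minimizing the negative = gradient ascent
    optimizer.step()
    image.data.clamp_(0.0, 1.0)         # keep pixel values in a valid range
```

The original Deep Dream also started from a real photograph rather than random noise, which is why its outputs keep the broad structure of the input while dog-like shapes bloom on top of it.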

We cannot say the AI is dreaming (or hallucinating) because it has no concept of what makes these things “normal” to start with. We cannot even attribute a love of dogs to the AI – it is simply reflecting the data it has been trained on. The software isn’t thinking, just as an amplifier doesn’t play an electric guitar. It is simply producing output in accordance with the instructions provided by its human operators.

And this output appears psychedelic to human eyes for a similarly simple reason: we have psychedelic human experiences to draw parallels with it. The “meaning” in these images relies on our own internal “classification system” and life experiences.

This observation, though, doesn’t increase or decrease the value of the images that Deep Dream creates – it simply shifts the responsibility. The AI produces the image, yet it is up to us to edit, crop, select, amplify and enjoy the output ourselves, as a reflection of our own experience. The AI produces the raw material; but the art and dreams belong to us.

Write what you know

A common saying among fiction writers is “write what you know”. In the case of AI, it is an absolute – an AI can only produce material it is familiar with, at least for now. All AIs are built on training data, and their knowledge of the world is limited to the scope of that dataset.

In the case of Deep Dream, much was made of the fact that the output reminded people of the “trippy” visuals you see when hallucinating under the influence of psychoactive substances. If androids dream of electric sheep, it’s because we’ve trained them on a set of sheep data – but do they even “dream” at all? Are there parallels between human mental processes and the way this algorithm runs?

To some extent, yes. But there has been a distinct shift in AI research over the past few decades that has had profound implications on the way this technology finds its way into our daily lives.

In the early days of AI research and cybernetics in the 1940s, the emphasis was on replicating human intelligence, with the promise that this future was imminent. However, more recent research in this field has focused on the integration of AI and AI-like software in ways that feel natural. No claims are made that the AI is actually alive or replicates organic processes.

This shift can be seen in the work of contemporary AI researchers such as Bruce Blumberg, a “user experience” designer and AI researcher who worked at MIT and Apple. He spent several years of his career building conversational agents3 modelled on non-verbal communication between humans and their companion species, based on observing his son’s interest in show dogs. The goal here wasn’t to make an AI as smart as a human, but to create an AI that was “smart enough” and as relatable as a pet.

While the notion of a conversational, general AI (see our previous unit AI - two letters many meanings for more details) still captures the minds of science fiction authors and fans, the closest we have right now is Siri – the personal assistant who lives on hundreds of millions of iPhones all over the globe. Siri, and other assistants such as Amazon’s Alexa, are clearly limited in their understanding of the world, but they are good enough at speed dialling your friends or playing a song for you.

The imitation within the confines of a domain that is characteristic of narrow AI is both more interesting and more profitable than the general AI featured in science fiction. After all, illusion is context-specific: give a phone or computer access to an address book and calendar, and a relatively simple interface becomes quite useful and “smart”.

Careful framing is also a key component of how art functions. For example, the optical illusion of depth in a painting only exists because the image has a border and is being viewed from a specific distance. Step too far away or too close and the illusion vanishes. And if you were to look at a screenshot of your favorite movie under a magnifying glass, it would dissolve the sense of story and leave only the pixels.

But the promise of technical image making – especially when it comes to high definition photography – has always been about precision and, by implication, telling the truth. Images are evidence, and within the confines of a particular frame, they can make us believe situations or stories that may, or may not, be real.

Fakes all the way down

During the 2020 US Election, political operatives online claimed to have an explosive report containing scandalous information about presidential candidate Joe Biden and his son, Hunter.

This “leaked” report was written by a Swiss security expert named Martin Aspen, who ran a private firm called Typhoon Investigations. Once the information reached major news outlets, journalists began to investigate and immediately ran into a roadblock. Although it had a bare-bones social media presence, Typhoon Investigations didn’t seem to exist – in fact, Martin Aspen didn’t exist either.

The only evidence of Aspen’s existence was a photograph. But this headshot had its own story to tell: it was actually one of an endless supply of AI-generated fake headshots from a website called This Person Does Not Exist.

This Person Does Not Exist was created by American software engineer Phillip Wang. The underlying AI software responsible for generating these fake images (a model known as StyleGAN4) was developed by video card manufacturer Nvidia as a demonstration of their framework for running AI computations on a graphics card. Nvidia never intended their software to be used as a tool to create fake headshots, and it is likely the demo would have remained under the radar were it not for Wang’s website.

By featuring the images as a collection and providing a title that implies the images are headshots, Wang challenges us to consider that these images are photographs of people who might exist, when they do not. Put simply, they are not evidence of existence – and on closer examination, it’s quite easy to tell they are fakes.

NBC News noted of the Martin Aspen case:

“Aspen’s ears were asymmetrical, for one, but his left eye is what gave away that he did not really exist. Aspen’s left iris juts out and appears to form a second pupil, a somewhat frequent error with computer-generated faces.” - Note, this NBC News article also includes an image of Martin Aspen.

There are actually many errors introduced by the StyleGAN algorithm, and artist Kyle McDonald – who creates visual work using code – was the first to catalog them with regard to human faces. McDonald wrote a post on the online publishing platform Medium analyzing the telltale signs of these generated faces. Errors included everything from “painterly rendering” to “weird teeth”.

However, with more training examples and guidance from its developers, StyleGAN can get better at its task. And with every passing year, our understanding of how this AI performs these tricks will also improve.

Take, for example, the following illustration from the 2019 research paper introducing StyleGAN. The AI researchers took information from one image of a person (shown on the left under Source A) and combined it with a few key features (e.g. glasses, gender, age, head position) of another image of a person (shown along the top under Source B).5 The results are the images in the row next to the Source A image. Don’t forget that none of these people actually exist.

StyleGAN Research

This figure is taken from the research paper by Karras, Laine & Aila (2019) and depicts portraits of people who don’t exist.
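
The “style mixing” behind that figure can be sketched in code. The toy generator below is a hypothetical stand-in, not Nvidia’s actual StyleGAN (which is far larger and trained on an enormous collection of photographs); the class name, layer sizes and the choice of which layers count as “coarse” are illustrative assumptions. The point is the mechanism: every layer of the generator is modulated by a per-layer “style” vector, so feeding Source B’s style into the early, coarse layers while keeping Source A’s style for the rest blends the two identities.

```python
# A toy "style mixing" sketch (hypothetical stand-in, NOT Nvidia's StyleGAN).
# Each layer of the generator is modulated by a "style" vector; swapping which
# source supplies the style at coarse vs. fine layers mixes two identities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyStyleGenerator(nn.Module):
    def __init__(self, style_dim=64, channels=32, num_layers=6):
        super().__init__()
        self.const = nn.Parameter(torch.randn(1, channels, 4, 4))  # learned starting block
        self.blocks = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_layers)
        )
        self.affines = nn.ModuleList(  # map a style vector to per-channel scale and shift
            nn.Linear(style_dim, 2 * channels) for _ in range(num_layers)
        )
        self.to_rgb = nn.Conv2d(channels, 3, 1)

    def forward(self, per_layer_styles):
        x = self.const
        for block, affine, w in zip(self.blocks, self.affines, per_layer_styles):
            scale, shift = affine(w).chunk(2, dim=1)
            x = F.relu(block(x) * (1 + scale[:, :, None, None]) + shift[:, :, None, None])
            x = F.interpolate(x, scale_factor=2)  # double the resolution at each layer
        return self.to_rgb(x)

generator = ToyStyleGenerator()
w_a = torch.randn(1, 64)  # latent style standing in for "Source A"
w_b = torch.randn(1, 64)  # latent style standing in for "Source B"

# Coarse (early) layers take B's style (head position, glasses, overall shape);
# the remaining layers keep A's style. Which layers count as "coarse" is an
# illustrative assumption.
mixed_image = generator([w_b, w_b, w_a, w_a, w_a, w_a])  # shape: (1, 3, 256, 256)
```

In the real system, the style vectors come from a learned mapping network, and the generator is trained adversarially against a discriminator that tries to tell its output apart from genuine photographs – which is what the “GAN” in StyleGAN stands for (generative adversarial network).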

But even if we are told that something is real – and are then presented with evidence that suggests the opposite – we will often dismiss that evidence. It works the other way around too: if we are looking for errors, we are more likely to see them than not.

Fake headshots “work” because our brains are highly tuned to identify human faces. While you might believe this makes us good at identifying when something is “off”, the opposite is actually true. Humans instead tend to see faces everywhere, even when they don’t exist – a phenomenon known as pareidolia.

As it turns out, our ability to match patterns is actually tuned to ignore errors rather than highlight them. So when we are presented with photographic evidence of a Swiss security expert, we tend to accept the image because it resembles a headshot of a middle-aged white man – even if, on closer examination, the man has bizarrely shaped ears, multiple pupils, and a shirt collar that doesn’t look real.

The illusion doesn’t need to be perfect. It only needs to be good enough in the right light.

The Presentation of Self and the performing arts

In 1956, a sociologist named Erving Goffman wrote a book called The Presentation of Self in Everyday Life. In it, Goffman claimed that humans treat face-to-face interaction as a stage performance. His suggestion wasn’t that these interactions are false, but that we play a variety of roles in different aspects of life; we use different languages, wear different costumes, and present ourselves differently to different people, depending on the situation.

Goffman died in 1982, nearly a decade before the internet became a household phenomenon, and two decades before the launch of the hugely popular social media website Facebook. Nevertheless, his ideas have found traction in the work of researchers studying internet culture, in particular those examining how we present ourselves online. Socially, we understand the internet is a place where identity is fluid and, as we have seen, our brains are uniquely tuned to interpret nearly any image as “evidence”. In some cases we are not “tricked” at all, but fully enjoy the performance as willing participants.

Self-performance for fun – and profit

Janky is a cartoon cat who has animated friends, and boasts close to a million Instagram followers. He often posts sponsored content, featuring high-end brands including Prada and Gucci. Janky is clearly “fake” – he has no connection to any existing story world (in the way The Avengers do with the Marvel universe, for example), but this CGI (short for “Computer-Generated Imagery”) character has a voice, a following, and a fan base.

Janky also isn’t an AI; he’s a character that is “voiced” by a marketing team that works for a website selling collectable vinyl toys. This technique will be familiar to fans of Gorillaz, the virtual band formed in the late 1990s.

Created in 1998 by Blur singer Damon Albarn and artist Jamie Hewlett, Gorillaz is a supergroup of real-life musicians, although the band is presented as four animated cartoon characters. It was originally conceived as a commentary on the emptiness of MTV and the popularity of manufactured “boy bands”.

All of these things (cartoon cats, boy bands, MTV) are appealing synthetic worlds created by media and advertising companies with huge budgets. They are not real in any way that matters, yet they appeal to millions of people and generate billions of dollars in revenue.

As AI becomes more efficient at generating convincing visual evidence of people and places that don’t exist, you can expect to see an increase in artificial identities – and in the number of people who embrace them. You can also expect to see more conflict, as the boundaries between truth and fiction become even more blurred. In fact, how people present their identity online is already becoming a hotly contested topic.

Back in 2019, a debate broke out among social media influencers when someone on Instagram noticed that many photos by travel blogger Tupi Saravia mysteriously featured the same cloud formation.6 Ultimately, Saravia was forced to come clean and admit she used an “AI-powered” photo editing tool called Quickshot to make her images look more appealing.

Saravia’s critics claimed her use of AI contributed to the “culture of fakeness”. In return, she argued it was simply a storytelling tool, no better or worse than CGI in movies, the make-up on newscasters, or the way we try to present only our best selves on social media.

Either way, it is increasingly clear that AI is here to stay – especially when it comes to visual image making – and it has had a huge impact on how we see and perform in the world around us. While this can clearly be used negatively in a variety of ways, including damaging political enemies, supporting conspiracy theories, or manufacturing fake scandals, it can also be used to enlighten, entertain, and bring joy.

In all of these cases, AI works best as a creative partner, making relevant suggestions to humans who then make the final decision. AI doesn’t work alone; it works in a partnership shaped by the will of the person who wields it. Ultimately, in terms of art, AI does what all the best art has always done: expand our potential and leave us with bigger and more interesting questions than when we started.

  1. A normal classification algorithm trains an AI model to be a good detector, e.g. one capable of detecting whether an image contains a cat or a dog. Deep Dream runs this process in reverse and modifies an input image in such a way that it looks as much like a dog as possible (at least to the model). 

  2. Auto-tagging refers to computer algorithms that automatically label your images. It’s what your smartphone or Facebook does when it groups all the images of your partner together. 

  3. A conversational agent is a dialogue system that can analyse written or spoken text and is also able to give automatic responses using human language. 

  4. More information about StyleGAN can either be found in this Wikipedia article or taken directly from the original research paper here. 

  5. For more details about this and to see this process in action, check out this YouTube video. 

  6. You can see a small collection of Saravia’s shared photos with the same cloud formation in this tweet. 
