The Conversation

The folly of making art with text-to-image generative AI

Making art using artificial intelligence isn’t new. It’s as old as AI itself.

What’s new is that a wave of tools now let most people generate images by entering a text prompt. All you need to do is write “a landscape in the style of van Gogh” into a text box, and the AI can create a beautiful image as instructed.

The power of this technology lies in its capacity to use human language to control art generation. But do these systems accurately translate an artist’s vision? Can bringing language into art-making truly lead to artistic breakthroughs?

Engineering outputs

I’ve worked with generative AI as an artist and computer scientist for years, and I would argue that this new type of tool constrains the creative process.

When you write a text prompt to generate an image with AI, there are infinite possibilities. If you’re a casual user, you might be happy with what AI generates for you. And startups and investors have poured billions into this technology, seeing it as an easy way to generate graphics for articles, video game characters and advertisements.

In contrast, an artist might need to write an essaylike prompt to generate a high-quality image that reflects their vision – with the right composition, the right lighting and the correct shading. That long prompt is not necessarily descriptive of the image but typically uses lots of keywords to invoke the system of what’s in the artist’s mind. There’s a relatively new term for this: prompt engineering.

Basically, the role of an artist using these tools is reduced to reverse-engineering the system to find the right keywords to compel the system to generate the desired output. It takes a lot of effort, and much trial and error, to find the right words.

AI isn’t as intelligent as it seems

To learn how to better control the outputs, it’s important to recognize that most of these systems are trained on images and captions from the internet.

Think about what a typical image caption tells about an image. Captions are typically written to complement the visual experience in web browsing.

For example, the caption might describe the name of the photographer and the copyright holder. On some websites, like Flickr, a caption typically describes the type of camera and the lens used. On other sites, the caption describes the graphic engine and hardware used to render an image.

So to write a useful text prompt, users need to insert many nondescriptive keywords for the AI system to create a corresponding image.

Today’s AI systems are not as intelligent as they seem; they are essentially smart retrieval systems that have a huge memory and work by association.

Artists frustrated by a lack of control

Is this really the sort of tool that can help artists create great work?

At Playform AI, a generative AI art platform that I founded, we conducted a survey to better understand artists’ experiences with generative AI. We collected responses from over 500 digital artists, traditional painters, photographers, illustrators and graphic designers who had used platforms such as DALL-E, Stable Diffusion and Midjourney, among others.

Only 46% of the respondents found such tools to be “very useful,” while 32% found them somewhat useful but couldn’t integrate them to their workflow. The rest of the users – 22% – didn’t find them useful at all.

[Abridged]

Ahmed Elgammal, Rutgers University

Categories Opinion The Conversation