Apr 7, 2022 - Science

This new AI can create realistic images from your text commands

Images generated by DALL-E 2 in response to the prompt "astronaut riding a horse." Credit: OpenAI

A new algorithm can produce realistic images from a text prompt, OpenAI announced this week.

The big picture: As advancements in artificial intelligence surge forward, projects like DALL-E 2 could help researchers to create systems that visualize the world around them.

  • "The world is not just text," says OpenAI co-founder and chief scientist Ilya Sutskever. To gain a more human-like understanding of the world and to interact with people, "our neural networks need to master the visual world."

How it works: OpenAI's text generator GPT-3 builds on a prompt to predict what word would most likely come next in a sequence.

  • The image-generating algorithm DALL-E 2 works on a similar principle, but its palette is pixels, not words.
  • It uses a model called CLIP, which was trained on paired images and captions, to first turn a text prompt into a numerical representation of the features the prompt describes.
  • Then it uses a neural network to render an image that tries to match the text provided.
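
The next-word idea behind GPT-3 described above can be illustrated with a toy sketch. This is not how GPT-3 or DALL-E 2 is actually built: it simply counts which word most often follows another in a tiny made-up corpus and picks the most frequent one. The function names and the corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real model learns from vastly more text.
TOY_CORPUS = [
    "an astronaut riding a horse",
    "an astronaut riding a rocket",
    "an astronaut riding a horse on the moon",
]


def train_bigram_counts(corpus):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts


def predict_next_word(counts, prompt):
    """Return the most frequent follower of the prompt's last word, or None."""
    words = prompt.lower().split()
    if not words:
        return None
    followers = counts.get(words[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def continue_prompt(counts, prompt, max_new_words=5):
    """Repeatedly append the predicted next word to extend the prompt."""
    words = prompt.lower().split()
    for _ in range(max_new_words):
        next_word = predict_next_word(counts, " ".join(words))
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


if __name__ == "__main__":
    counts = train_bigram_counts(TOY_CORPUS)
    print(predict_next_word(counts, "an astronaut riding a"))
    print(continue_prompt(counts, "an astronaut", max_new_words=3))
```

A large language model replaces these simple counts with a neural network that weighs the whole prompt, but the loop of "predict the likeliest next piece, append it, repeat" is the same basic idea.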

In a demo with Axios this week, researchers showed off a range of images DALL-E 2 can generate — from "teddy bears mixing sparkling chemicals as mad scientists" to "an ibis in the wild painted in the style of John Audubon."

  • The algorithm builds on earlier work, but it generates higher-resolution images and adds the ability to make specific edits to parts of an image and to generate variations of an image from the same prompt.
  • "It’s really fascinating," says OpenAI research scientist Prafulla Dhariwal. "It's like watching art being generated through math, and it is kind of magical."

But, but, but ... DALL-E 2 does make mistakes.

  • It struggles when asked to count large numbers of features — for example, if you ask it to draw a cat with eight legs.
  • And it has a hard time when prompted to go against strongly held priors established through training — like a big mouse chasing a small lion.

What to watch: As with the language generator GPT-3, there are concerns that DALL-E 2 could be misused.

  • The OpenAI researchers say they are taking steps to try to minimize the risk, including removing violent content from the algorithm's training data, filtering violent or pornographic prompts, and having humans review the images shared on the platform. (Images will also have a signature — the rainbow in the bottom right corner.)

So is it considered creative? It depends on what you mean and who you ask.

  • The AI has the ability to create variations of images by breaking them down into their essential components and then blending them together, says OpenAI researcher Aditya Ramesh.
  • "To the extent that humans can take an idea from a source of inspiration and apply it to something new, this can be viewed as a kind of creative ability."