
Training real AI with fake data

Illustration: Aïda Amer/Axios

AI systems have an endless appetite for data. For an autonomous car's camera to identify pedestrians every time — not just nearly every time — its software needs to have studied countless examples of people standing, walking and running near roads.

Yes, but: Gathering and labeling those images is expensive and time-consuming, and in some cases impossible. (Imagine staging a huge car crash.) So companies are teaching AI systems with fake photos and videos, sometimes also generated by AI, that stand in for the real thing.

The big picture: A few weeks ago, I wrote about the synthetic realities that surround us. Here, the machines that we now rely on — or may soon — are also learning inside their own simulated worlds.

How it works: Software that has been fed tons of human-labeled photos and videos can deduce the shapes, colors and movements that correspond, say, to a pedestrian.

  • But there's an ever-present danger that the car will come across a person in a setting unlike any it's seen before and, disastrously, fail to recognize them.
  • That's where synthetic data can fill the gap. Computers can generate millions of scenes that an actual car might not experience, even after a million driving hours.
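
To make the idea concrete, here is a minimal sketch of how a generator might produce labeled scene descriptions by randomizing conditions. Everything in it is an illustrative assumption (the `Scene` fields, the value lists, the `generate_scenes` helper); it is not how any company named in this story builds its data, and a real pipeline would render images rather than emit text records.

```python
import random
from dataclasses import dataclass

# Hypothetical scene parameters, chosen only for illustration.
POSES = ("standing", "walking", "running")
LIGHTING = ("noon", "dusk", "night", "backlit")
WEATHER = ("clear", "rain", "fog", "snow")


@dataclass
class Scene:
    pose: str
    lighting: str
    weather: str
    distance_m: float
    label: str  # known for free, because we generated the scene ourselves


def generate_scenes(n, seed=0):
    """Return n randomized pedestrian scenes; the same seed gives the same scenes."""
    rng = random.Random(seed)
    return [
        Scene(
            pose=rng.choice(POSES),
            lighting=rng.choice(LIGHTING),
            weather=rng.choice(WEATHER),
            distance_m=round(rng.uniform(2.0, 60.0), 1),
            label="pedestrian",
        )
        for _ in range(n)
    ]


scenes = generate_scenes(1000)

# Combinations that a real car might rarely encounter can be picked out
# (or deliberately oversampled) from the generated set.
rare = [
    s for s in scenes
    if s.pose == "running" and s.lighting == "night" and s.weather == "fog"
]
print(len(scenes), "scenes generated;", len(rare), "are night/fog/running")
```

The point of the sketch is the labeling: because the generator chose every parameter, each scene arrives with its label attached, and no human has to annotate it.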

What's happening: Startups like Landing.ai, AI.Reverie, CVEDIA and ANYVERSE can create super-realistic scenes and objects for AI systems to learn from.

  • Nvidia and others make synthetic worlds for digital versions of robots to play in, where they can test changes or learn new tricks to help them navigate the real world.
  • And autonomous vehicle makers like Waymo build their own simulations to train or test their driving software.

Synthetic data is useful for any AI system that interacts with the world — not just cars.

  • In health care, made-up data can substitute for sensitive information about patients, mirroring characteristics of the population without revealing private details.
  • In manufacturing, "if you're doing visual inspection on smartphones, you don't have a million pictures of scratched smartphones," says Andrew Ng, founder of Landing.ai and former AI head of Google and Baidu. "If you can get something to work with just 100 or 10 images, it breaks open a lot of new applications."
  • In robotics, it's helpful to imitate hard-to-find conditions. "It's very expensive to go out and vary the lighting in the real world, and you can't vary the lighting in an outdoor scene," says Mike Skolones, director of simulation technology at Nvidia. But you can in a simulator.
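
As a rough illustration of the lighting point, the sketch below takes one tiny grayscale "image" and produces darker and brighter copies of it. The `relight` function and the gain values are assumptions made up for this example; a real simulator models light sources, shadows and materials, which a simple brightness multiplier does not.

```python
def relight(image, gain):
    """Scale every pixel by gain, clamping results to the 0-255 range."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in image]


# A 2x2 grayscale image standing in for a rendered scene (hypothetical values).
base = [[10, 100], [200, 250]]

# One source scene becomes several training examples under different lighting.
variants = {gain: relight(base, gain) for gain in (0.5, 1.0, 1.5)}

for gain, img in variants.items():
    print(gain, img)
```

Whatever label the original scene carries applies to every relit copy, so each variant is a new training example at no extra labeling cost.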

"We're still in the early days," says Evan Nisselson of LDV Capital, a venture firm that invests in visual technology.

  • But, he says, synthetic data keeps getting closer to reality.
  • Generative adversarial networks — the same AI technology that drives most deepfakes — have helped vault synthetic data to new heights of realism.