Nov 3, 2021 - Technology

The synthetic data that will help build AI and the metaverse

Illustration: Aïda Amer/Axios

Synthetic data, meaning artificially generated images used to train AI and computer vision systems, will be key to building out a future metaverse.

Why it matters: AI has long been trained on images — including human faces — captured from the real world, but doing so can create serious privacy concerns.

  • Using synthetic data instead can help sidestep that issue, though it brings new worries about accuracy and authenticity.

Driving the news: Facebook announced on Tuesday that it plans to shut down its decade-old facial recognition system and delete the facial scans of more than a billion users, out of what it said were privacy concerns.

Between the lines: Increasingly, privacy concerns will lead companies to move away from capturing real faces and other images to train AI and toward synthetically generated data.

  • Tel Aviv-based synthetic data company Datagen performs high-quality digital scans and motion capture of real people and objects, then uses AI to generate realistic but not real versions.
  • Gartner predicted recently that by 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.

The big picture: Since images of real people aren't being used directly, privacy and bias are less of a concern.

  • Early computer vision systems were often trained on datasets taken from the internet that were disproportionately white and male, which meant they were less accurate at recognizing the faces of people of other races and genders.
  • With synthetic data, "you can incorporate the real distributions of the real world, so there's no bias among age, gender and more," says Gil Elbaz, co-founder and CTO of Datagen.

The catch: Some experts worry synthetic data may not be as valid as the real thing, which could damage the performance of AI models trained on it.

  • Many of the same tools used to generate synthetic faces for AI training could also be used to create convincing deepfakes, though Elbaz notes technical tools like smart contracts could be used to separate synthetics from fakes.

What's next: Synthetic data will be key to creating a more realistic version of the AR and VR future called the metaverse.

  • "The metaverse is going to have a hardware and software component," says Elbaz. "Synthetic data will be part of the software that enables the right kind of hardware."