Facebook's AI wants to learn the world through human eyes
- Bryan Walsh, author of Axios Future

Facebook is announcing a new machine learning project that aims to teach AI to understand and interact with the world from a first-person perspective.
Why it matters: Most computer vision models are trained on images and video shot from a third-person perspective, but to build AI assistants and robots that can work alongside us in the real world, researchers will need to compile datasets built on what is known as egocentric perception.
What's happening: Facebook AI's Ego4D project involves a consortium of researchers from 13 universities and labs in nine countries who used academic gifts from the company to collect more than 2,200 hours of first-person video recorded in the real world with head-mounted cameras and other wearables.
- That data was supplemented by an additional 400 hours of first-person video captured by Facebook Reality Labs Research using test subjects wearing augmented reality (AR) smart glasses in staged environments.
- The resulting dataset offers more than 20 times the hours of footage of any existing egocentric dataset, and it will be made publicly available in November to researchers who sign Ego4D's data use agreement.
What they're saying: "Down the line this will allow AI systems to help you in ways that are really in the moment, contextual with what you're doing and what you've seen before, all in ways you just can't do now," says Kristen Grauman, a research scientist at Facebook AI.
Between the lines: As part of Ego4D, Facebook AI is establishing five benchmarks that researchers can use to build and test egocentric AI models against the project's dataset.
- These involve episodic memory (knowing what happened in the real world, and when), forecasting, hand-object interaction, audio-visual diarization (tracking who said what, and when), and social interaction.
- Put it all together, and you could eventually build AI assistants — perhaps tied to smart glasses like the kind Facebook and Ray-Ban recently put out — that could observe your actions and offer reminders or assistance tied to real-world tasks.
- "Maybe you're cooking a dish, and [the AI] can anticipate that you're going to end up needing key ingredients, and remember where they physically are," says Grauman.
The bottom line: It's not hard to see how building first-person, egocentric AI models will dovetail with Facebook's move into the AR and VR spaces.