
IBM releases image data to improve facial recognition AI

A crowd of people in England. Photo: Pictures Ltd./Corbis via Getty Images.

IBM plans to release more than 1 million facial images to help better train the artificial intelligence behind facial recognition systems.

Why it matters: Bias built into AI systems is a major hurdle for companies developing facial analysis algorithms, which need to handle different skin colors and other attributes in a non-discriminatory way. Because AI is only as good as the data that trains it, IBM thinks making a diverse dataset available will help root out bias.

How it works: IBM says its new project will be the largest facial image dataset available that is specifically curated for the training of AI, and will be open to academics, public interest groups and competitors.

  • IBM sampled one million images from Yahoo's Flickr dataset (Flickr allows the use of its images for research).
  • Its researchers have been annotating those images with facial attributes (such as hair color and facial hair) for more advanced matching capabilities, and they used geo-tags to get an appropriate mix of data from multiple countries.
  • IBM is also releasing a dataset of 36,000 facial images, equally distributed across ethnicities, genders and ages, that researchers can use to identify and correct bias in their facial analysis systems.

"No single company can tackle the challenge of AI bias in a vacuum, and we believe it’s essential that tools like this be available for everyone in the field so we all can play a role in advancing the technology responsibly."
— Ruchir Puri, chief architect, IBM Watson
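
For readers who want to see what "equally distributed" means in practice, here is a minimal Python sketch of balanced sampling: drawing the same number of records from each group. It is an illustration only, not IBM's method; the function name `balanced_sample`, the `region` field and the toy records are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(records, key, per_group, seed=0):
    """Draw the same number of records from every group defined by `key`.

    Hypothetical helper for illustration; not taken from IBM's tooling.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable

    # Bucket the records by the attribute we want to balance on.
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)

    # Every group must be able to supply `per_group` records.
    smallest = min(len(members) for members in groups.values())
    if per_group > smallest:
        raise ValueError(
            f"per_group={per_group} exceeds the smallest group size ({smallest})"
        )

    # Sample without replacement from each bucket.
    sample = []
    for name in sorted(groups):
        sample.extend(rng.sample(groups[name], per_group))
    return sample


# Toy, made-up metadata: group "A" is heavily over-represented.
regions = ["A"] * 50 + ["B"] * 20 + ["C"] * 5
records = [{"id": i, "region": region} for i, region in enumerate(regions)]

subset = balanced_sample(records, key=lambda r: r["region"], per_group=5)
counts = Counter(r["region"] for r in subset)
print(counts)
```

In this toy case the raw records are skewed 50/20/5, while the sampled subset holds five records per group. The smallest group caps how large a balanced subset can be, which helps explain why a balanced set would be far smaller than the collection it is drawn from.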

The challenge: AI technologies are developing quickly, but consumers are wary of the consequences. At the same time, consumers are becoming more aware of the power of their data — including photos and videos — being collected by tech giants to build AI systems.

  • For months, IBM has been pushing responsible technology development, making the case that losing consumer trust in the early days of AI will undermine the broader benefits the tech industry thinks AI can offer. Of course, loss of consumer trust is also bad for business.

Go deeper: Axios' Ina Fried has more on the risk of bias in AI: