The rise of privacy-preserving AI
Illustration: Sarah Grillo/Axios
Data is AI's jet fuel — amassing as much as possible allows tech companies to precisely target ads, or medical AI to differentiate between a benign tumor and a malignant one.
The state of play: No problem for Facebook and its gobs of data — but hard for a small clinic with few patients to learn from. Now, new AI methods are allowing companies to benefit from the collective wisdom of peers and competitors, without giving up sensitive data or trade secrets.
Why it matters: This could help improve health care, among the country’s most stubborn problems, by clearing a key hurdle for medical AI — gathering a big and diverse enough dataset to help doctors diagnose difficult problems or choose better treatments.
- Confined to each individual company's own data, AI systems don't have access to the staggering range of examples they need to outperform humans.
- The main recourse has been to send information, assiduously scrubbed of private details, to some central hub to be pooled for study — a slow, laborious process.
What's happening: Privacy-preserving AI techniques like federated learning are powering new systems that can benefit from multiple companies' data — without even having to know what the data is.
- Google showed off in 2017 how federated learning helped its Android keyboards learn new words, based on lessons gleaned from its enormous user base.
- More recently, companies have applied the techniques to new industries, allowing sectors with privacy responsibilities to exploit the strength in numbers that other, less regulated industries can marshal.
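The mechanism behind federated learning can be sketched in a few lines: each participant trains on its own data and sends only updated model parameters to a central server, which averages them into a new global model. Below is a minimal toy version of that loop — a one-parameter linear model stands in for a real neural network, and all names and hyperparameters are illustrative, not any company's actual system:

```python
import random

# Toy federated averaging: each "client" (e.g., a hospital) fits a
# one-parameter linear model y = w * x on its own private data.
# Only the updated weight -- never the raw data -- leaves the client.

def local_update(w, data, lr=0.1, epochs=20):
    """A few steps of gradient descent on one client's local data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, clients):
    """Server averages the weights each client sends back."""
    local_weights = [local_update(w_global, data) for data in clients]
    return sum(local_weights) / len(local_weights)

# Five clients, each holding noisy samples of the same rule y = 3x.
random.seed(0)
clients = [
    [(x, 3 * x + random.gauss(0, 0.1))
     for x in (random.uniform(0, 1) for _ in range(50))]
    for _ in range(5)
]

w = 0.0
for _ in range(10):
    w = federated_round(w, clients)
print(f"learned slope: {w:.2f}")  # converges near the true slope of 3
```

The key property is in the message flow: the server only ever sees weights, so no client's examples are exposed to the others or to the coordinator.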
Perhaps the most obvious application for federated learning is health care, where strict rules prevent sharing patient data, yet the potential payoff from pooling it is enormous.
- Owkin, a French startup, has connected more than 30 hospitals and research centers to a system that learns from all of them, in the process rewarding the hospitals that contribute the best data.
- Each institution's data stays on its own computers, rather than being sent elsewhere for processing.
- "We can have different hospitals collaborate while being competitive on their research," Anna Huyghues-Despointes, Owkin's director of strategy, tells Axios.
VIA, a Boston-area AI startup, uses federated learning to pool data about the condition of power transformers, such that a utility in Europe can learn from one in Thailand or New Zealand.
- For power companies, predicting the next catastrophic equipment failure could require data on 1,000 previous problems — but any one company only sees one or two a year, says Colin Gounden, VIA's co-founder.
- Get a few dozen utilities to team up and those 1,000 examples are within reach. Security concerns prevent them from just doling out information about their transformers, but several have already joined VIA's pilot federated learning system.
What's next: Intel is working on methods that will allow companies to apply an AI model to data without even decrypting it, which would open new doors to cooperation even among the most privacy-conscious companies and industries.
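The article doesn't name the scheme Intel is using, but the standard approach to computing on data without decrypting it is homomorphic encryption. As a toy illustration only — not Intel's system, and with a key far too small to be secure — here is textbook Paillier encryption, whose ciphertexts can be multiplied together to add the underlying plaintexts, so a server can total encrypted numbers it can never read:

```python
import math
import random

# Textbook Paillier cryptosystem with tiny primes (illustration only).
# It is additively homomorphic: multiplying two ciphertexts produces a
# ciphertext of the SUM of the plaintexts.

p, q = 293, 433                 # real deployments use ~1024-bit primes
n = p * q
n_sq = n * n
g = n + 1                       # standard generator choice
lam = math.lcm(p - 1, q - 1)    # private key

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n_sq)), -1, n)   # precomputed decryption constant

def encrypt(m):
    r = random.randrange(1, n)          # random blinding factor
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(c):
    return (L(pow(c, lam, n_sq)) * mu) % n

# A server can combine ciphertexts without the private key;
# only the key holder can read the result.
c = (encrypt(41) * encrypt(1)) % n_sq
print(decrypt(c))  # 42
```

Running a full AI model this way is far heavier than adding two numbers, which is why practical encrypted inference is still an active engineering effort rather than an off-the-shelf capability.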
The bottom line: Sharing is just one way to solve one of the biggest problems still facing AI: figuring out how to slake computers' unending thirst for data. Researchers are also experimenting with bolstering small datasets with synthetic training data, or creating algorithms that can learn from far fewer examples.