Lazaro Gamio / Axios
IBM today claimed a leap in "deep learning," the leading method to train intelligent machines to sort photos, decipher voices and drive autonomous vehicles, and compared the achievement to the jump to jet-powered aviation six decades ago.
The breakthrough, IBM said, could significantly improve fraud detection, medical diagnoses, and self-driving technology. The company said it beat a 2014 record set by Microsoft in a deep learning run, and that its software also bested Facebook, until now the leader in this type of deep learning. IBM made the software open-source, although it only runs on the company's platform, said Sumit Gupta, vice president for AI and deep learning at IBM.
"This is just as transformative as the jet engine, giving us the accuracy we need," Gupta told Axios.
Why it matters: Gupta said that in one training run, the time needed was cut from 16 days to seven hours, quickening the process of machine learning by 58 times. The result, he said, is to help shift deep learning from an impractically long process to a manageable one.
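As a rough check on those figures, the speedup is the old training time divided by the new one. The sketch below uses only the rounded numbers quoted above (16 days and seven hours); the function and variable names are illustrative. With these rounded inputs the ratio comes out near 55, so the 58-times figure presumably rests on more precise timings than the ones quoted here.

```python
# Rough check using only the rounded figures quoted in the article.
HOURS_PER_DAY = 24


def speedup(old_hours: float, new_hours: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_hours / new_hours


old_run_hours = 16 * HOURS_PER_DAY  # 16 days, as quoted
new_run_hours = 7                   # seven hours, as quoted

print(f"Speedup from rounded figures: {speedup(old_run_hours, new_run_hours):.1f}x")
```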
How it works: Deep learning is a subset of artificial intelligence. Researchers run large data sets — millions of photos, for instance — through layers of artificial neurons, what they call a "neural network." Each time the data is run through the neurons, they pick up more detail about the photos, until after dozens of times, they "know" a lot about them. If the task is narrow, like learning about only photographs of human faces, a neural network can achieve greater than 99% accuracy at distinguishing one person from another.
- The problem comes when the task is open-ended — when you throw a lot of disconnected photographs at a network.
- Until now, Microsoft's Project Adam had been the record holder at understanding such large data sets, although at much lower accuracy: in 2014, it achieved 29.8% accuracy on 15 million photos in 22,000 different categories.
- IBM says it has bested that record by 4 percentage points, in a field in which a 1-point improvement is considered commendable. IBM achieved 33.8% accuracy training on half the number of photos, 7.5 million.
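The gap between the two accuracy figures can be expressed two ways, and the difference matters when reading claims like this one. The sketch below, using only the 29.8% and 33.8% figures quoted above, computes both the gain in percentage points and the relative gain; the function names are illustrative.

```python
def point_gain(old_pct: float, new_pct: float) -> float:
    """Absolute gain, in percentage points."""
    return new_pct - old_pct


def relative_gain(old_pct: float, new_pct: float) -> float:
    """Gain relative to the old figure, as a percentage of it."""
    return (new_pct - old_pct) / old_pct * 100


microsoft_2014 = 29.8  # Project Adam accuracy, as quoted
ibm_result = 33.8      # IBM accuracy, as quoted

print(f"Gain in points: {point_gain(microsoft_2014, ibm_result):.1f}")
print(f"Relative gain:  {relative_gain(microsoft_2014, ibm_result):.1f}%")
```

On these numbers the gain is 4.0 points, which is roughly a 13.4% improvement relative to the earlier record.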
What IBM says it did: IBM says its achievement comes from figuring out how to distribute the data runs quickly over dozens of servers, resulting in few of the bottlenecks that have dogged such efforts in the past. In a white paper published yesterday, it described using 64 servers.
- Until now, researchers could only make deep learning work "well" on a single server, according to IBM.
- The software makes it "possible to run popular open source codes like Tensorflow, Caffe, Torch and Chainer over massive scale neural networks and data sets with very high performance and very high accuracy," IBM said in a blog post.
At this scale of computing, Facebook has been the leader, IBM said, achieving what is called "scaling efficiency" of 89% in a big data run. IBM said its software improved that to 95%, and did the run in just under 50 minutes, compared with an hour for Facebook.
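Scaling efficiency is commonly defined as the speedup actually achieved on N machines divided by the ideal speedup of N. The sketch below illustrates that definition. The machine count of 64 is borrowed from the white paper's server count purely for illustration; the article does not say what hardware count either company's efficiency figure was measured against, so the effective-speedup numbers are an assumption, not reported results.

```python
def scaling_efficiency(single_time: float, parallel_time: float, n_machines: int) -> float:
    """Achieved speedup divided by the ideal speedup (n_machines)."""
    achieved_speedup = single_time / parallel_time
    return achieved_speedup / n_machines


def effective_speedup(efficiency: float, n_machines: int) -> float:
    """Speedup implied by a given efficiency on n_machines."""
    return efficiency * n_machines


N_MACHINES = 64  # assumption for illustration only

for name, efficiency in [("Facebook", 0.89), ("IBM", 0.95)]:
    implied = effective_speedup(efficiency, N_MACHINES)
    print(f"{name}: {efficiency:.0%} efficiency -> about {implied:.1f}x on {N_MACHINES} machines")
```

At 100% efficiency, 64 machines would finish a job 64 times faster than one; every point lost to communication overhead reduces that figure.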
What experts say about the announcement: Given the current commercial frenzy around AI, there is much exaggeration in the field, and experts urge ultra-vigilance before accepting announcements of breakthroughs. A couple of experts I reached out to were cautiously impressed. "The results they present are compelling," said Manuela Veloso, head of the machine learning department at Carnegie Mellon University.
- Ameet Talwalkar, a professor at UCLA and co-founder of Determined AI, told me, "the results do look impressive." But he added that IBM appeared to be relying on its ultra-fast computers. "These results seem to rely on using a very fast network, so it's not clear that the results would also apply to more standard cloud-based setups," he said.