
Illustration: Lazaro Gamio/Axios
Artificial intelligence experts — concerned about reported blunders with high-stakes AI systems from makers like Amazon and IBM — are urging more oversight, testing, and perhaps a fundamental rethinking of the underlying technology.
Why it matters: Wall Street, the military, and other sectors expect AI to make increasingly weighty decisions in the coming years, with less and less human involvement. But if these systems are inaccurate or biased, the consequences outside the lab could harm real people.
In reports this week:
- Amazon’s face-recognition platform, Rekognition, incorrectly matched 28 members of Congress to mugshots in a test by the ACLU, which announced the results Thursday. The misidentified faces disproportionately belonged to people of color. Responding on its blog, Amazon said the ACLU didn’t test Rekognition with the correct settings, and that its system is meant to help humans make big decisions, not render final determinations on its own (see the sketch below this list). Amazon amended the blog post on Friday, CNET reported, inviting the federal government to recommend rules for how law enforcement uses facial recognition technology.
- IBM’s Watson gave doctors "unsafe and incorrect" recommendations for cancer treatments, Stat News reported last week, citing internal IBM documents. The documents faulted both IBM engineers and the doctors who supplied the training data. IBM told Stat News that Watson Health has since improved.
- In an earlier case, a self-driving Uber killed a pedestrian in Arizona in March.
The context: For skeptics of deep learning, the leading machine-learning method that powers most commercial AI, these shortcomings foreshadow greater problems ahead.
- "We shouldn’t mistake pattern recognition for genuine intelligence," Gary Marcus, an NYU professor, tells Axios in an email. "And we shouldn’t be surprised when narrow, shallow intelligence (which is all we have, so far) lets us down."
- Garrett Kenyon, a scientist at Los Alamos National Laboratory, said in an interview that deep learning can’t grasp abstract concepts, or even reliably count or compare objects.
- This isn’t the first time an external audit has found bias in deployed face-recognition algorithms. In research from MIT's Media Lab, Joy Buolamwini tested three companies' face-recognition systems and found that they performed poorly on darker-skinned and female faces. In response, two of the companies, IBM and Microsoft, published improvements to their algorithms.
The other side: Jack Clark, strategy and communications director at OpenAI, said these cases are not marks against deep learning as a technology.
- "We know DL works," Clark said, using an abbreviation for deep learning.
- "We also know that DL bugs can be pretty bad and implementing DL systems is hard," he continued.
- Without more details about the Amazon and IBM incidents, "it's very difficult to make a call as to whether this is due to a flaw in implementation (which would be my assumption) or a flaw in the algorithm itself (which to my mind seems less likely)."
The bottom line: Clark said AI systems need to be "vigorously and transparently tested in the wild" before they’re entrusted with real-world decisions.