Teaching robots to see — and understand
Illustration: Sarah Grillo/Axios
Machine vision is a crucial missing link in the robotization of industries like manufacturing and shipping. But even as that field advances rapidly, a larger hurdle still blocks widespread automation: machine understanding.
Why it matters: Up against a shortage of workers, those sectors stand to benefit hugely from automation. But the people working in warehouses and factories could find their jobs changed or eliminated if vision technology sees new breakthroughs.
The big picture: Machine vision can help robots navigate spaces previously closed off to them, like a crowded warehouse floor or a cluttered front lawn. And it's critical for tasks that require dexterity, like packing a box with oddly shaped objects.
- Plus, AI can help make sense of the avalanche of video footage recorded daily, which far outstrips humanity's ability to digest it.
- Companies are scrambling to make use of that data to understand how people and vehicles move, or to check for tiny imperfections in new products.
- The rise of AI-monitored cameras is also making surveillance inescapable at work and in public spaces.
Driving the news: In a report first shared with Axios, LDV Capital, a venture firm that invests in visual technologies, predicts an upheaval in manufacturing and logistics, driven primarily by computer vision.
- "The majority of global factories, ports, and warehouses are understaffed and ill-equipped to meet still-rising requirements," the report reads. Visual technologies will help change that, LDV argues.
- In China, some "lights-off" factories have been built to operate without a single human present. But the U.S. will largely see robots employed in factories and warehouses not custom-built for robots, says Abby Hunter-Syed, VP of operations at LDV.
Yes, but: It'll take more than just high-fidelity cameras and fast AI perception to make an intelligent robot.
- A big unsolved challenge is imbuing robots with a deeper understanding of the world around them, so that they can interpret what they see and react to it.
- "Domestic robots, for example, are just not going to arrive until machines can interpret scenes well," says Gary Marcus, co-founder of robotics company Robust.ai. "You can do Roomba, but not Rosie the Robot."
A broad understanding of the world helps us humans avoid confounding errors when we look around.
- Even if we see a cloud perfectly shaped like a horse, we never actually think it's a flying horse because we get how clouds work.
- The same ability helps us handle objects easily — even ones we've never seen before. Humans can generally guess how to place an item on a surface so that it stays upright, for example, rather than tipping over.
- "We've built physics models in our heads, and we've not quite been able to transfer them to robots," says Avideh Zakhor, a Berkeley professor who studies computer vision.
The big question: How much of the problem is solvable with incremental improvements in machine vision, before robots need better common sense?
- Evan Nisselson, a partner at LDV, argues that industry can get 85% or 90% of the way toward lucrative automation with better machine vision.
- But that depends on how much variability and chaos warehouses and factories can remove from the areas where robots are working.
The bottom line: "The Rubicon here, which we haven't crossed yet, is to not just be able to see objects," says Marcus. "It's interpreting scenes that will be the breakthrough."