AI and robotics still struggle to match the skill level of a child. Human intelligence emerges from the combination of our senses and language abilities; the same might be true for artificial intelligence. Is it possible to combine vision, audio, and language processing into a single AI system, and would doing so close that gap? Explore the implications for AI and its potential use cases. As featured in the 10 Breakthrough Technologies 2021.