Almost every AI agent demo lands in roughly the same place: it works most of the time, looks remarkable, and then fails in a way no one anticipated. Self-driving cars hit this wall a decade ago, and agents are running into it now. For data and AI teams, the question is no longer whether agents can complete a task — it's whether they can complete it reliably enough to remove the human reviewer. Which categories of work tolerate a 90% success rate? Which absolutely don't? And where should the next layer of guardrails sit?
Ruslan Salakhutdinov is a UPMC Professor of Computer Science at Carnegie Mellon University and one of Geoffrey Hinton's former PhD students. He has previously served as Director of AI Research at Apple and VP of Research in Generative AI at Meta. His research focuses on deep learning, reasoning, and AI agents.
In the episode, Richie and Russ explore the most exciting use cases of AI agents today, long-horizon tasks, the credit assignment problem, multi-agent systems, designing reliable human-in-the-loop workflows, agent safety and guardrails, embodied and physical AI, lessons from self-driving cars, the differences between academia and industry, and much more.
Links Mentioned in the Show:
• Yutori
• Waymo
• DeepSeek-V3 Technical Report
• Connect with Ruslan: LinkedIn
• AI-Native Course: Intro to AI for Work
• Related Episode: AI Agents at Work: What Actually Breaks (and How to Fix It) with Danielle Crop
New to DataCamp?
Empower your business with world-class data and AI skills with DataCamp for Business