Yesterday
53m 19s

AI Inference: Why Speed Matters More Tha...

The Neuron
About this episode

Everyone's talking about the AI data center boom right now. Billion-dollar deals here, hundred-billion-dollar deals there. But why do data centers matter? It turns out that AI inference (actually calling the AI and running it) is the hidden bottleneck slowing down every AI application you use, along with new ones yet to be released.

In this episode, Kwasi Ankomah from SambaNova Systems explains why running AI models efficiently matters more than you think, how their revolutionary chip architecture delivers 700+ tokens per second, and why AI agents are about to make this problem 10x worse.

💡 This episode is sponsored by Gladia's Solaria - the speech-to-text API built for real-world voice AI. With sub-270ms latency, 100+ languages supported, and 94% accuracy even in noisy environments, it's the backbone powering voice agents that actually work. Learn more at gladia.io/solaria

🔗 Key Links:

• SambaNova Cloud: https://cloud.sambanova.ai

• Check out Solaria speech to text API: https://www.gladia.io/solaria

• Subscribe to The Neuron newsletter: https://theneuron.ai

🎯 What You'll Learn:

• Why inference speed matters more than model size

• How SambaNova runs massive models on 90% less power

• Why AI agents use 10-20x more tokens

• The best open source models right now

• What to watch for in AI infrastructure

➤ CHAPTERS

Timecode - Chapter Title

0:00 - Intro

2:14 - What is AI Inference?

3:19 - Why Inference is the Real Challenge

9:18 - A message from our sponsor, Gladia Solaria

10:16 - The 95% ROI Problem Discussion

13:47 - SambaNova's Revolutionary Chip Architecture

15:19 - Running DeepSeek's 670B Parameter Models

18:11 - Developer Experience & Platform

21:26 - AI Agents and the Token Explosion

24:33 - Model Swapping and Cost Optimization

31:30 - Energy Efficiency: 10kW vs 100kW

36:13 - Future of AI Models: Bigger vs Smaller

39:24 - Best Open Source Models Right Now

46:01 - AI Infrastructure: Next 12 Months

47:09 - Agents as Infrastructure

50:28 - Human-in-the-Loop and Trust

52:55 - Closing and Resources

Article Written by: Grant Harvey

Hosted by: Corey Noles and Grant Harvey

Guest: Kwasi Ankomah

Published by: Manique Santos

Edited by: Adrian Vallinan
