Jul 2025
1h 28m

903: LLM Benchmarks Are Lying to You (An...

Jon Krohn
About this episode
Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models. Additional ...
Up next
Nov 21
942: Odds of AGI by 2040? LEAP Expert Forecasts and Workforce Implications
What’s on the horizon for AI? Jon Krohn wades through opinions from more than experts, curated by the Longitudinal Expert AI Panel (LEAP), about what we can expect from the industry. From estimates on AI-assisted workers through energy consumption to AI performance in highly skil ...
6m 13s
Nov 18
941: Multi-Agent Human Societies, with Dr. Vijoy Pandey
Vijoy Pandey imagines a bold new society in which agents and humans make scientific discoveries and complete physical tasks together, and he tells Jon Krohn about his work at AGNTCY, Cisco’s open-source platform for the Internet of Agents. Listen to the episode to hear Vijoy Pand ...
1h 6m
Nov 14
940: In Case You Missed It in October 2025
Jon Krohn curates a selection of clips from the month that was. Hear from the orchestrators of an expanding AI universe in this episode of In Case You Missed It, with news, views and groundbreaking ideas from Sheamus McGovern, Jerry Yurchisin, Stephanie Hare, Larissa Schneider, a ...
43m 27s
Recommended Episodes
Aug 2024
Metrics Driven Development
How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Developme ...
42m 12s
Aug 2024
Only as good as the data
You might have heard that “AI is only as good as the data.” What does that mean and what data are we talking about? Chris and Daniel dig into that topic in the episode exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, ...
45m 41s
Nov 2024
The Future of AI: Predictions and Realities
In this episode, Jaeden Schafer discusses the current challenges and developments in the AI industry, particularly focusing on the limitations faced by major players like OpenAI and Anthropic. The conversation explores the anticipated improvements in AI models, the predictions ...
18m 14s
Apr 2024
Measuring The Speed of AI Through Benchmarks
David Kanter, Executive Director at MLCommons, discusses the work they're doing with MLPerf Benchmarks, creating the world's first industry-standard approach to measuring AI speed and safety. He also shares ways they're testing AI and LLMs for harm, to measure, and, o ...
31m 45s
Jul 2023
AI Today Podcast: How AI is Transforming Insurance, Interview with Connor Atchison, Wisedocs
AI is proving transformational in every industry, including long-established ones, and insurance is no exception. AI can optimize underwriting processes, enable more personalized insurance offerings, enhance the overall customer experience, as well as help with proce ...
30m 19s
Jul 2025
Measuring the impact of AI on software engineering – with Laura Tacho
Supported by our partners: Statsig, the unified platform for flags, analytics, experiments, and more; and Graphite, the AI developer productivity platform. There’s no shortage of bold claims about AI and developer productivity, but how do you separate signal from noise? In thi ...
1h 11m
Feb 2025
Grok 3: The New AI Challenger
In this episode, Jaeden discusses the launch of Grok 3, the latest AI model from X AI, highlighting its capabilities, training methods, and performance benchmarks compared to competitors like OpenAI's ChatGPT. He shares personal experiences using Grok 3, including its reasonin ...
16m 45s
Sep 2024
AI is more than GenAI
GenAI is often what people think of when someone mentions AI. However, AI is much more. In this episode, Daniel breaks down a history of developments in data science, machine learning, AI, and GenAI to give listeners a better mental model. Don’t miss this one i ...
40m 3s
Sep 9
5 Debates Shaping AI
AI is at the center of five big debates: is massive AI spending fueling real growth or just a bubble, will entry-level jobs vanish, does AI truly boost productivity, is vibe coding overhyped, and should we accelerate or slow down with open-source and policy? These questions captu ...
29m 17s
Apr 2025
AGI is still 30 years away — Ege Erdil & Tamay Besiroglu
Ege Erdil and Tamay Besiroglu have 2045+ timelines, think the whole "alignment" framing is wrong, don't think an intelligence explosion is plausible, but are convinced we'll see explosive economic growth (economy literally doubling every year or two). This discussion offers a tota ...
3h 8m