Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models.
Yesterday
984: Building AI Agents Where 99.9% Accuracy Isn't Good Enough, with Raju Malhotra
Raju Malhotra, Chief Product and Technology Officer at Certinia, talks to Jon Krohn about the so-called SaaSpocalypse and how agentic AI is proving the doomsayers wrong. Listen to the episode to hear more about Certinia’s work with Salesforce and building with Agentforce 360, the ...
29m 18s
Apr 14
983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith
My guest today took a public school that was about to be shut down and turned it into the number one school in Boston, and AI is her latest secret weapon. In a long-overdue episode on AI for supporting children’s education, hear directly from Principal Traci Walker Griffith how h ...
1h 12m
Aug 2024
Metrics Driven Development
How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” ...
42m 12s
Nov 2024
The Future of AI: Predictions and Realities
In this episode, Jaeden Schafer discusses the current challenges and developments in the AI industry, particularly focusing on the limitations faced by major players like OpenAI and Anthropic. The conversation explores the anticipated improvements in AI models, the predictions fo ...
19m 52s
Apr 2024
Measuring The Speed of AI Through Benchmarks
David Kanter, Executive Director at MLCommons, discusses the work they're doing with MLPerf Benchmarks, creating the world's first industry standard approach to measuring AI speed and safety. He also shares ways they're testing AI and LLMs for harm, to measure, and, o ...
31m 45s
Jul 2023
AI Today Podcast: How AI is Transforming Insurance, Interview with Connor Atchison, Wisedocs
AI is proving transformational in every industry, including long established industries, and insurance is no exception. AI is able to optimize underwriting processes, enable more personalized insurance offerings, enhance the overall customer experience, as well as help with proce ...
30m 19s
Jul 2025
Measuring the impact of AI on software engineering – with Laura Tacho
Supported by our partners: Statsig (the unified platform for flags, analytics, experiments, and more) and Graphite (the AI developer productivity platform). There’s no shortage of bold claims about AI and developer productivity, but how do you separate signal from noise? In thi ...
1h 11m
Feb 2025
Grok 3: The New AI Challenger
In this episode, Jaeden discusses the launch of Grok 3, the latest AI model from xAI, highlighting its capabilities, training methods, and performance benchmarks compared to competitors like OpenAI's ChatGPT. He shares personal experiences using Grok 3, including its reasoning a ...
18m 24s