Jul 8
1h 28m

903: LLM Benchmarks Are Lying to You (An...

Jon Krohn
About this episode
Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models. Additional ...
Up next
Aug 22
916: The 5 Key GPT-5 Takeaways
GPT-5 has just been released, but with little fanfare. In this Five-Minute Friday, Jon Krohn asks whether GPT-5 deserves the community’s underwhelmed response to its release. He outlines five features of the model and explains why people might be feeling less than enthusiastic ...
9m 40s
Aug 19
915: How to Jailbreak LLMs (and How to Prevent It), with Michelle Yi
Tech leader, investor, and Generationship cofounder Michelle Yi talks to Jon Krohn about finding ways to trust and secure AI systems, the methods that hackers use to jailbreak code, and what users can do to build their own trustworthy AI systems. Learn all about “red teaming” and ...
1h 9m
Aug 15
914: Data Lakes 101 (and Why They’re Key for AI Models), with Oz Katz
In this Five-Minute Friday, lakeFS cofounder and CTO Oz Katz talks to Jon Krohn about data warehouses, data lakes, and how companies can handle increasingly complex data infrastructures and formats. Hear about lakeFS’s collaboration with Legofest, lakeFS’s approach to helping ...
25m 52s
Recommended Episodes
Aug 2024
Metrics Driven Development
How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” ...
42m 12s
Aug 2024
Only as good as the data
You might have heard that “AI is only as good as the data.” What does that mean and what data are we talking about? Chris and Daniel dig into that topic in the episode exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, ben ...
45m 41s
Nov 2024
The Future of AI: Predictions and Realities
In this episode, Jaeden Schafer discusses the current challenges and developments in the AI industry, particularly focusing on the limitations faced by major players like OpenAI and Anthropic. The conversation explores the anticipated improvements in AI models, the predictions fo ...
18m 14s
Apr 2024
Measuring The Speed of AI Through Benchmarks
David Kanter, Executive Director at MLCommons, discusses the work they’re doing with MLPerf Benchmarks, creating the world’s first industry standard approach to measuring AI speed and safety. He also shares ways they’re testing AI and LLMs for harm, to measure—and, over time, red ...
31m 45s
Jul 2023
AI Today Podcast: How AI is Transforming Insurance, Interview with Connor Atchison, Wisedocs
AI is proving transformational in every industry, including long established industries, and insurance is no exception. AI is able to optimize underwriting processes, enable more personalized insurance offerings, enhance the overall customer experience, as well as help with proce ...
30m 19s
Dec 2024
Navigating AI Safety and Security Challenges with Yonatan Zunger [The BlueHat Podcast]
While we are on our winter publishing break, please enjoy an episode of our N2K CyberWire network show, The BlueHat Podcast by Microsoft and MSRC. See you in 2025! Yonatan Zunger, CVP of AI Safety & Security at Microsoft joins Nic Fillingham and Wendy Zenone on this week's episod ...
53m 34s
Feb 2025
Grok 3: The New AI Challenger
In this episode, Jaeden discusses the launch of Grok 3, the latest AI model from xAI, highlighting its capabilities, training methods, and performance benchmarks compared to competitors like OpenAI's ChatGPT. He shares personal experiences using Grok 3, including its reasoni ...
16m 45s
Sep 2024
AI is more than GenAI
GenAI is often what people think of when someone mentions AI. However, AI is much more. In this episode, Daniel breaks down a history of developments in data science, machine learning, AI, and GenAI to give listeners a better mental model. Don’t miss this one if y ...
40m 3s
Apr 2025
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo
Scott and Daniel break down every month from now until the 2027 intelligence explosion. Scott Alexander is author of the highly influential blogs Slate Star Codex and Astral Codex Ten. Daniel Kokotajlo resigned from OpenAI in 2024, rejecting a non-disparagement clause and risking ...
3h 4m
Feb 2025
Satya Nadella – Microsoft’s AGI Plan & Quantum Breakthrough
Satya Nadella on: why he doesn’t believe in AGI but does believe in 10% economic growth; Microsoft’s new topological qubit breakthrough and gaming world models; whether Office commoditizes LLMs or the other way around. Watch on YouTube; listen on Apple Podcasts or Spotify.
1h 16m