logo
episode-header-image
Aug 2024
42m 12s

Metrics Driven Development

Practical AI LLC
About this episode

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” approach. Shahul from Ragas joins us to discuss Ragas in this episode, and we dig into specific metrics, the difference between benchmarking models and evaluating LLM apps, generating synthetic test data and more.

Join the discussion

Changelog++ members save 5 minutes on this episode because they made the ads disappear. Join today!

Sponsors:

  • Assembly AI – Turn voice data into summaries with AssemblyAI’s leading Speech AI models. Built by AI experts, their Speech AI models include accurate speech-to-text for voice data (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis, chapter detection, PII redaction, and more. 

Featuring:

Show Notes:

Something missing or broken? PRs welcome!

Up next
Nov 19
Beyond note-taking with Fireflies
<p>Fireflies CEO, Krish Ramineni shares how the company is transforming AI-powered note-taking into a deeper layer of knowledge automation. He breaks down the technology behind real-time functionality like Live Assist, the user behavior patterns driving product evolution, and how ... Show More
48m 59s
Nov 13
Autonomous Vehicle Research at Waymo
Waymo’s VP of Research, Drago Anguelov, joins Practical AI to explore how advances in autonomy, vision models, and large-scale testing are shaping the future of driverless technology. The conversation dives into the dual challenges of building an onboard driver and testing that d ... Show More
52m 8s
Nov 10
Are we in an AI bubble?
Dan and Chris unpack whether today’s surge in AI deployment across enterprise workflows, manufacturing, healthcare, and scientific research signals a lasting transformation or an overhyped bubble. Drawing parallels to the dot-com era, they explore how technology integration is re ... Show More
49m 41s
Recommended Episodes
Jul 2025
903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir
Has AI benchmarking reached its limit, and what do we have to fill this gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmar ... Show More
1h 28m
Aug 15
Measuring AI code assistants and agents with the AI Measurement Framework
In this episode of Engineering Enablement, DX CTO Laura Tacho and CEO Abi Noda break down how to measure developer productivity in the age of AI using DX’s AI Measurement Framework. Drawing on research with industry leaders, vendors, and hundreds of organizations, they explain ho ... Show More
41m 14s
Nov 2024
Making Sense of Agentic AI | ThoughtWorks Birgitta Boeckeler
<p>There’s AI agents. There’s AI tooling. Do either drive business impact or are they just more things your dev team is supposed to stay on top of?<br/><br/>Birgitta Boeckeler, Global Lead for AI Assisted Software Delivery at ThoughtWorks, joins the show to discuss the practical ... Show More
47m 40s
Jul 2025
Measuring the impact of AI on software engineering – with Laura Tacho
Supported by Our Partners•⁠ Statsig ⁠ — ⁠ The unified platform for flags, analytics, experiments, and more.• Graphite — The AI developer productivity platform.—There’s no shortage of bold claims about AI and developer productivity, but how do you separate signal from noise?In thi ... Show More
1h 11m
Feb 2025
The Future of Data Engineering: AI, LLMs, and Automation
Summary In this episode of the Data Engineering Podcast Gleb Mezhanskiy, CEO and co-founder of DataFold, talks about the intersection of AI and data engineering. He discusses the challenges and opportunities of integrating AI into data engineering, particularly using large langua ... Show More
59m 39s
Oct 11
Context Engineering as a Discipline: Building Governed AI Analytics
SummaryIn this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his j ... Show More
51m 58s
Oct 13
Evals, error analysis, and better prompts: A systematic approach to improving your AI products | Hamel Husain (ML engineer)
Hamel Husain, an AI consultant and educator, shares his systematic approach to improving AI product quality through error analysis, evaluation frameworks, and prompt engineering. In this episode, he demonstrates how product teams can move beyond “vibe checking” their AI systems t ... Show More
54m 48s
Sep 2024
Leveling up JavaScript with Deno 2 (Interview)
Jerod is joined by Ryan Dahl to discuss his second take on leveling up JavaScript developers all around the world. Jerod asks Ryan why not try to fix or fork Node instead of starting fresh, how Deno (the open source project) can avoid the all too common rug pull (not cool) scenar ... Show More
1h 15m
Aug 3
Where AI Is Right Now: 15 Charts in 15 Minutes
In today’s episode, we take a rapid-fire tour through 15(ish) charts that capture the current state of artificial intelligence across consumer use, enterprise adoption, agents, and infrastructure. From skyrocketing usage metrics and token consumption to the rise of agentic workfl ... Show More
22m 24s
Sep 2024
The wrong place to slap a person (Friends)
Nick Nisi joins Adam and Jerod to talk about Karaoke, ARC and the business model of web browsers, this WordPress drama, and an epic bonus for Changelog ++ subscribers. 
1h 39m