Aug 2024
42m 12s

Metrics Driven Development

Practical AI LLC
About this episode

How do you systematically measure, optimize, and improve the performance of LLM applications, such as those powered by RAG or tool use? Ragas is an open-source project that tries to answer this question comprehensively, and its team promotes a “Metrics Driven Development” approach. Shahul from Ragas joins us to discuss the project, and we dig into specific metrics, the difference between benchmarking models and evaluating LLM apps, generating synthetic test data, and more.


