Aug 2024
42m 12s

Metrics Driven Development

Practical AI LLC
About this episode

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open-source project that tries to answer this question comprehensively, and its team promotes a “Metrics Driven Development” approach. Shahul from Ragas joins us to discuss the project, and we dig into specific metrics, the difference between benchmarking models and evaluating LLM apps, generating synthetic test data, and more.


