Aug 2024
42m 12s

Metrics Driven Development

Practical AI LLC
About this episode

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open-source project that has been trying to answer this question comprehensively, and its team promotes a “Metrics Driven Development” approach. Shahul from Ragas joins us in this episode to discuss the project, and we dig into specific metrics, the difference between benchmarking models and evaluating LLM apps, generating synthetic test data, and more.


Sponsors:

  • Assembly AI – Turn voice data into summaries with AssemblyAI’s leading Speech AI models. Built by AI experts, their Speech AI models include accurate speech-to-text for voice data (such as calls, virtual meetings, and podcasts), speaker detection, sentiment analysis, chapter detection, PII redaction, and more. 
