Jan 2025
57m 30s

Breaking Down Data Silos: AI and ML in M...

Tobias Macey
About this episode
Summary
In this episode of the Data Engineering Podcast, Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large language models (LLMs) in creating more natural user experiences, improving trust in AI-driven data solutions, and simplifying complex data management processes. He also discusses the balance between using AI for complex data problems and the need for human oversight to ensure accuracy and trust.


Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. 
  • As a listener of the Data Engineering Podcast, you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us, don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world, as in their episode “The Secret Sauce Behind McDonald’s Data Strategy”, which digs into how AI-driven tools can be used to support crew efficiency and customer interactions. In particular, I appreciate the ability to hear about the challenges that enterprise-scale businesses are tackling in this fast-moving field. The Data Citizens® Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens® Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.
  • Your host is Tobias Macey and today I'm interviewing Dan Bruckner about the application of ML and AI techniques to the challenge of reconciling data at the scale of business
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of the different ways that organizational data becomes unwieldy and needs to be consolidated and reconciled?
    • How does that reconciliation relate to the practice of "master data management"?
  • What are the scaling challenges with the current set of practices for reconciling data?
  • ML has been applied to data cleaning for a long time in the form of entity resolution, etc. How has the landscape evolved or matured in recent years?
    • What (if any) transformative capabilities do LLMs introduce?
  • What are the missing pieces/improvements that are necessary to make current AI systems usable out-of-the-box for data cleaning?
  • What are the strategic decisions that need to be addressed when implementing ML/AI techniques in the data cleaning/reconciliation process?
  • What are the risks involved in bringing ML to bear on data cleaning for inexperienced teams?
  • What are the most interesting, innovative, or unexpected ways that you have seen ML techniques used in data resolution?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on using ML/AI in master data management?
  • When is ML/AI the wrong choice for data cleaning/reconciliation?
  • What are your hopes/predictions for the future of ML/AI applications in MDM and data cleaning?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA