Jan 2025
57m 30s

Breaking Down Data Silos: AI and ML in M...

Tobias Macey
About this episode
Summary
In this episode of the Data Engineering Podcast, Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data expert and discusses the challenges of reconciling large-scale organizational data. He explains how data silos arise from independent teams and highlights the importance of combining traditional techniques with modern AI to address the nuances of data reconciliation. Dan emphasizes the transformative potential of large language models (LLMs) in creating more natural user experiences, improving trust in AI-driven data solutions, and simplifying complex data management processes. He also discusses the balance between using AI for complex data problems and the necessity of human oversight to ensure accuracy and trust.


Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details. 
  • As a listener of the Data Engineering Podcast, you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us, don't miss Data Citizens® Dialogues, the forward-thinking podcast brought to you by Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. In every episode of Data Citizens® Dialogues, industry leaders unpack data’s impact on the world, as in their episode “The Secret Sauce Behind McDonald’s Data Strategy”, which digs into how AI-driven tools can be used to support crew efficiency and customer interactions. In particular, I appreciate the ability to hear about the challenges that enterprise-scale businesses are tackling in this fast-moving field. The Data Citizens® Dialogues podcast is bringing the data conversation to you, so start listening now! Follow Data Citizens® Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts.
  • Your host is Tobias Macey and today I'm interviewing Dan Bruckner about the application of ML and AI techniques to the challenge of reconciling data at the scale of business
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by giving an overview of the different ways that organizational data becomes unwieldy and needs to be consolidated and reconciled?
    • How does that reconciliation relate to the practice of "master data management"?
  • What are the scaling challenges with the current set of practices for reconciling data?
  • ML has been applied to data cleaning for a long time in the form of entity resolution, etc. How has the landscape evolved or matured in recent years?
    • What (if any) transformative capabilities do LLMs introduce?
  • What are the missing pieces/improvements that are necessary to make current AI systems usable out-of-the-box for data cleaning?
  • What are the strategic decisions that need to be addressed when implementing ML/AI techniques in the data cleaning/reconciliation process?
  • What are the risks involved in bringing ML to bear on data cleaning for inexperienced teams?
  • What are the most interesting, innovative, or unexpected ways that you have seen ML techniques used in data resolution?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on using ML/AI in master data management?
  • When is ML/AI the wrong choice for data cleaning/reconciliation?
  • What are your hopes/predictions for the future of ML/AI applications in MDM and data cleaning?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA