Aug 26
50m 54s

From Academia to Industry: Bridging Data...

Tobias Macey
About this episode
Summary
In this episode of the Data Engineering Podcast, Professor Paul Groth of the University of Amsterdam talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and lineage as well as the challenges of data integration. He explores the impact of large language models (LLMs) on data engineering, highlighting their potential to simplify knowledge graph construction and enhance data integration. The conversation covers the evolving landscape of data architectures, managing semantics and access control, and the interplay between industry and academia in advancing data engineering practices. Paul also shares insights into his work with the Intelligent Data Engineering Lab (INDELab) and the importance of human-AI collaboration in data engineering pipelines.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Your host is Tobias Macey and today I'm interviewing Paul Groth about his research on knowledge graphs and data engineering
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by describing the focus and scope of your academic efforts?
  • Given your focus on data management for machine learning as part of the INDELab, what are some of the developing trends that practitioners should be aware of?
    • ML architectures / systems changing (Matteo Interlandi); GPUs for data management
  • You have spent a large portion of your career working with knowledge graphs, which have largely been a niche area until recently. What are some of the notable changes in the knowledge graph ecosystem that have resulted from the introduction of LLMs?
  • What are some of the other ways that you are seeing LLMs change the methods of data engineering?
    • There are numerous vague and anecdotal references to the power of LLMs to unlock value from unstructured data. What are some of the realities that you are seeing in your research?
  • A majority of the conversations in this podcast are focused on data engineering in the context of a business organization. What are some of the ways that management of research data is disjoint from the methods and constraints that are present in business contexts?
  • What are the most interesting, innovative, or unexpected ways that you have seen LLMs used in data management?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on data engineering research?
  • What do you have planned for the future of your research in the context of data engineering, knowledge graphs, and AI?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA