Apr 2025
39m 49s

Simplifying Data Pipelines with Durable ...

Tobias Macey
About this episode
Summary
In this episode of the Data Engineering Podcast Jeremy Edberg, CEO of DBOS, talks about durable execution and its impact on designing and implementing business logic for data systems. Jeremy explains how DBOS's serverless platform and orchestrator provide local resilience and reduce operational overhead, ensuring exactly-once execution in distributed systems through the use of the Transact library. He discusses the importance of version management in long-running workflows and how DBOS simplifies system design by reducing infrastructure needs like queues and CI pipelines, making it beneficial for data pipelines, AI workloads, and agentic AI.


Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Your host is Tobias Macey and today I'm interviewing Jeremy Edberg about durable execution and how it influences the design and implementation of business logic.
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what DBOS is and the story behind it?
  • What is durable execution?
    • What are some of the notable ways that inclusion of durable execution in an application architecture changes the ways that the rest of the application is implemented? (e.g. error handling, logic flow, etc.)
  • Many data pipelines involve complex, multi-step workflows. How does DBOS simplify the creation and management of resilient data pipelines? 
  • How does durable execution impact the operational complexity of data management systems?
  • One of the complexities in durable execution is managing code/data changes to workflows while existing executions are still processing. What are some of the useful patterns for addressing that challenge and how does DBOS help?
  • Can you describe how DBOS is architected?
    • How have the design and goals of the system changed since you first started working on it?
  • What are the characteristics of Postgres that make it suitable for the persistence mechanism of DBOS?
  • What are the guiding principles that you rely on to determine the boundaries between the open source and commercial elements of DBOS?
  • What are the most interesting, innovative, or unexpected ways that you have seen DBOS used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on DBOS?
  • When is DBOS the wrong choice?
  • What do you have planned for the future of DBOS?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA