Dec 2024
59m 56s

The Art of Database Selection and Evolut...

Tobias Macey
About this episode
Summary
In this episode of the Data Engineering Podcast, Sam Kleinman talks about the pivotal role of databases in software engineering. Sam shares his journey into the world of data and discusses the complexities of database selection, highlighting the trade-offs between different database architectures and how these choices affect system design, query performance, and the need for ETL processes. He emphasizes the importance of understanding specific requirements before choosing a database engine and warns against over-engineered solutions that add complexity. Sam also touches on the tendency of engineers to move logic into the application layer out of skepticism about database longevity, and advises teams to leverage database capabilities instead. Finally, he identifies a significant gap in data management tooling: the lack of easy-to-use testing tools for database interactions, and the need for better testing paradigms to ensure reliability and reduce bugs in data-driven applications.


Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • It’s 2024, why are we still doing data migrations by hand? Teams spend months—sometimes years—manually converting queries and validating data, burning resources and crushing morale. Datafold's AI-powered Migration Agent brings migrations into the modern era. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today to learn how Datafold can automate your migration and ensure source to target parity. 
  • Your host is Tobias Macey and today I'm interviewing Sam Kleinman about database tradeoffs across operating environments and axes of scale
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • The database engine you use has a substantial impact on how you architect your overall system. When starting a greenfield project, what do you see as the most important factor to consider when selecting a database?
  • Points of friction introduced by database capabilities
  • Embedded databases (e.g. SQLite, DuckDB, LanceDB): when to use them and when they become a bottleneck
  • Single-node database engines (e.g. Postgres, MySQL): when they are legitimately a problem
  • Distributed databases (e.g. CockroachDB, PlanetScale, MongoDB)
  • Polyglot storage vs. general-purpose/multimodal databases
  • Federated queries: benefits and limitations
    • Ease of integration vs. variability of performance and access control

Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA