Oct 2024
28m 35s

Data Lakehouses & Apache Iceberg

Massive Studios
About this episode

Alex Merced (@AMdatalakehouse, Senior Tech Evangelist, @dremio) talks about everything data, and we dig deep into Apache Iceberg and Data Lakehouses.

SHOW: 865

Want to go to All Things Open in Raleigh for FREE? (Oct 27th-29th)

We are offering 5 free passes, first come, first served, for the Cloudcast Community -> Registration Link

Instructions:

  1. Click reg link
  2. Click “Get Tickets”
  3. Choose ticket option
  4. Proceed with registration (discount will automatically be applied, cost will be $0)

SHOW TRANSCRIPT: The Cloudcast #865 Transcript

SHOW VIDEO: https://youtube.com/@TheCloudcastNET 

CLOUD NEWS OF THE WEEK: - http://bit.ly/cloudcast-cnotw

NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST: - "CLOUDCAST BASICS" 

SHOW NOTES:

Topic 1 - Welcome to the show. Tell us a little bit about your background.

Topic 2 - It’s been a little while since we talked about Data Lakehouses. Can you give us a little bit of background on this space, and what the most recent dynamics are around these technologies?

Topic 3 - What are the typical integrations with a Data Lakehouse? How are users/developers typically interacting with Data Lakehouse technologies? [The marketplace for Iceberg catalogs like Nessie and Polaris]

Topic 4 - How does an open data format like Apache Iceberg fit into the bigger picture of data lakehouses, or large-scale stores of data?

Topic 5 - How does Dremio enable Iceberg? How does Dremio sit at the intersection of the Data Lakehouse, Data Mesh, and Data Virtualization trends, all of which come from the same fundamental problem: the growing scale of data use cases?

Topic 6 - We’ve seen companies start to rethink their data-in-the-cloud strategies. Are you seeing on-premises making a comeback for large data applications?

FEEDBACK?
