Oct 2024
28m 35s

Data Lakehouses & Apache Iceberg

Massive Studios
About this episode

Alex Merced (@AMdatalakehouse, Senior Tech Evangelist, @dremio) talks about all things data, and we dig deep into Apache Iceberg and Data Lakehouses.

SHOW: 865

Want to go to All Things Open in Raleigh for FREE? (Oct 27th-29th)

We are offering 5 free passes, first come, first served, for the Cloudcast Community -> Registration Link

Instructions:

  1. Click reg link
  2. Click “Get Tickets”
  3. Choose ticket option
  4. Proceed with registration (discount will automatically be applied, cost will be $0)

SHOW TRANSCRIPT: The Cloudcast #865 Transcript

SHOW VIDEO: https://youtube.com/@TheCloudcastNET 

CLOUD NEWS OF THE WEEK: - http://bit.ly/cloudcast-cnotw

NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST: - "CLOUDCAST BASICS" 

SHOW NOTES:

Topic 1 - Welcome to the show. Tell us a little bit about your background.

Topic 2 - It’s been a little while since we talked about Data Lakehouses. Can you give us a little bit of background on this space, and what the most recent dynamics are around these technologies?

Topic 3 - What are the typical integrations with a Data Lakehouse? How are users/developers typically interacting with Data Lakehouse technologies? [The marketplace for Iceberg catalogs like Nessie and Polaris]

Topic 4 - How does an open data format like Apache Iceberg fit into the bigger picture of data lakehouses, or large-scale stores of data?

Topic 5 - How does Dremio enable Iceberg? How does Dremio sit at the intersection of the Data Lakehouse, Data Mesh, and Data Virtualization trends, all of which come from the same fundamental problem: the growing scale of data use cases?

Topic 6 - We’ve seen companies start to rethink their data-in-the-cloud strategies. Are you seeing on-premises making a comeback for large data applications?

FEEDBACK?
