Aug 2024
53m 30s

The Evolution of DataOps: Insights from ...

Tobias Macey
About this episode
Summary
In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Bergh, CEO of DataKitchen, to discuss his ongoing mission to simplify the lives of data engineers. Chris explains the challenges data engineers face, such as constant system failures, the need for rapid changes, and high customer demands. He delves into the concept of DataOps, how it has evolved, and the misappropriation of related terms like data mesh and data observability, and he emphasizes focusing on processes and systems rather than just tools to improve data engineering workflows. Chris also introduces DataKitchen's open-source tools, DataOps TestGen and DataOps Observability, which are designed to automate data quality validation and monitor data journeys in production.
Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Chris Bergh about his tireless quest to simplify the lives of data engineers
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what DataKitchen is and the story behind it?
  • You helped to define and popularize "DataOps", which then went through a journey of misappropriation similar to "DevOps", and has since faded in use. What is your view on the realities of "DataOps" today?
  • Out of the popularized wave of "DataOps" tools came subsequent trends in data observability, data reliability engineering, etc. How have those cycles influenced the way that you think about the work that you are doing at DataKitchen?
  • The data ecosystem went through a massive growth period over the past ~7 years, and we are now entering a cycle of consolidation. What are the fundamental shifts that we have gone through as an industry in the management and application of data?
  • What are the challenges that never went away?
  • You recently open sourced the dataops-testgen and dataops-observability tools. What are the outcomes that you are trying to produce with those projects?
  • What are the areas of overlap with existing tools and what are the unique capabilities that you are offering?
  • Can you talk through the technical implementation of your new observability and quality testing platform?
  • What does the onboarding and integration process look like?
  • Once a team has one or both tools set up, what are the typical points of interaction that they will have over the course of their workday?
  • What are the most interesting, innovative, or unexpected ways that you have seen dataops-observability/testgen used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on promoting DataOps?
  • What do you have planned for the future of your work at DataKitchen?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA