Apr 2024
56m 23s

Establish A Single Source Of Truth For Y...

Tobias Macey
About this episode

Summary

Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. To support these varied needs while maintaining a single point of access, the semantic layer has evolved as a technological solution. In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Leverage Datafold's fast cross-database data diffing and Monitoring to test your replication pipelines automatically and continuously. Validate consistency between source and target at any scale, and receive alerts about any discrepancies. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold.
  • Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Artyom Keydunov about the role of the semantic layer in your data platform

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by outlining the technical elements of what it means to have a "semantic layer"?
  • In the past couple of years there was a rapid hype cycle around the "metrics layer" and "headless BI", which has largely faded. Can you give your assessment of the current state of the industry around the adoption/implementation of these concepts?
  • What are the benefits of having a discrete service that offers the business metrics/semantic mappings as opposed to implementing those concepts as part of a more general system? (e.g. dbt, BI, warehouse marts, etc.)
    • At what point does it become necessary/beneficial for a team to adopt such a service?
    • What are the challenges involved in retrofitting a semantic layer into a production data system?
  • evolution of requirements/usage patterns
  • technical complexities/performance and cost optimization
  • What are the most interesting, innovative, or unexpected ways that you have seen Cube used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Cube?
  • When is Cube/a semantic layer the wrong choice?
  • What do you have planned for the future of Cube?

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
