Mar 2024
50m 44s

Adding Anomaly Detection And Observabili...

Tobias Macey
About this episode

Summary

Working with data is a complicated process, with numerous chances for something to go wrong. Identifying and accounting for those errors is a critical piece of building trust across the organization that your data is accurate and up to date. While there are numerous products available to provide that visibility, they all have different technologies and workflows that they focus on. To bring observability to dbt projects, the team at Elementary embedded themselves into the workflow. In this episode, Maayan Salom explores the approach that she has taken to bring observability, enhanced testing capabilities, and anomaly detection into every step of the dbt developer experience.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!
  • This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Leverage Datafold's fast cross-database data diffing and Monitoring to test your replication pipelines automatically and continuously. Validate consistency between source and target at any scale, and receive alerts about any discrepancies. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold.
  • Your host is Tobias Macey and today I'm interviewing Maayan Salom about how to incorporate observability into a dbt-oriented workflow and how Elementary can help

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by outlining what elements of observability are most relevant for dbt projects?
  • What are some of the common ad-hoc/DIY methods that teams develop to acquire those insights?
    • What are the challenges/shortcomings associated with those approaches?
  • Over the past ~3 years, numerous data observability systems/products have been created. What are some of the ways that the specifics of dbt workflows are not covered by those generalized tools?
    • What are the insights that can be more easily generated by embedding into the dbt toolchain and development cycle?
  • Can you describe what Elementary is and how it is designed to enhance the development and maintenance work in dbt projects?
  • How is Elementary designed/implemented?
    • How have the scope and goals of the project changed since you started working on it?
    • What are the engineering challenges/frustrations that you have dealt with in the creation and evolution of Elementary?
  • Can you talk us through the setup and workflow for teams adopting Elementary in their dbt projects?
  • How does the incorporation of Elementary change the development habits of the teams who are using it?
  • What are the most interesting, innovative, or unexpected ways that you have seen Elementary used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Elementary?
  • When is Elementary the wrong choice?
  • What do you have planned for the future of Elementary?

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show, then tell us about it! Email hosts@dataengineeringpodcast.com with your story.

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
