Oct 2018
40m 55s

Using Notebooks As The Unifying Layer Fo...

Tobias Macey
About this episode

Summary

Jupyter notebooks have gained popularity among data scientists as an easy way to do exploratory analysis and build interactive reports. However, moving a data scientist’s work into a more standard production environment can be difficult because the notebook has to be translated into production code first. At Netflix they had the crazy idea that perhaps that last step isn’t necessary, and that production workflows can just run the notebooks directly. Matthew Seal is one of the primary engineers tasked with building the tools and practices that allow the various data-oriented roles to unify their work around notebooks. In this episode he explains the rationale for the effort, the challenges it has posed, the development work done to make it succeed, and the benefits it provides to the Netflix data platform teams.

Preamble

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand-new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Matthew Seal about the ways that Netflix is using Jupyter notebooks to bridge the gap between data roles

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by outlining the motivation for choosing Jupyter notebooks as the core interface for your data teams?
    • Where are you using notebooks and where are you not?
  • What is the technical infrastructure that you have built to support that design choice?
  • Which team was driving the effort?
    • Was it difficult to get buy-in across teams?
  • How much shared code have you been able to consolidate or reuse across teams/roles?
  • Have you investigated the use of any of the other notebook platforms for similar workflows?
  • What are some of the notebook anti-patterns that you have encountered and what conventions or tooling have you established to discourage them?
  • What are some of the limitations of the notebook environment for the work that you are doing?
  • What have been some of the most challenging aspects of building production workflows on top of Jupyter notebooks?
  • What are some of the projects that are ongoing or planned for the future that you are most excited by?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

