Oct 2018
40m 55s

Using Notebooks As The Unifying Layer Fo...

Tobias Macey
About this episode

Summary

Jupyter notebooks have gained popularity among data scientists as an easy way to do exploratory analysis and build interactive reports. However, moving that work into a more standard production environment can be difficult because of the translation effort it requires. At Netflix, they had the crazy idea that perhaps that last step isn’t necessary and the production workflows can just run the notebooks directly. Matthew Seal is one of the primary engineers tasked with building the tools and practices that allow the various data-oriented roles to unify their work around notebooks. In this episode he explains the rationale for the effort, the challenges that it has posed, the development that has been done to make it work, and the benefits that it provides to the Netflix data platform teams.

Preamble

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
  • Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
  • Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
  • Your host is Tobias Macey and today I’m interviewing Matthew Seal about the ways that Netflix is using Jupyter notebooks to bridge the gap between data roles

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by outlining the motivation for choosing Jupyter notebooks as the core interface for your data teams?
    • Where are you using notebooks and where are you not?
  • What is the technical infrastructure that you have built to support that design choice?
  • Which team was driving the effort?
    • Was it difficult to get buy-in across teams?
  • How much shared code have you been able to consolidate or reuse across teams/roles?
  • Have you investigated the use of any of the other notebook platforms for similar workflows?
  • What are some of the notebook anti-patterns that you have encountered and what conventions or tooling have you established to discourage them?
  • What are some of the limitations of the notebook environment for the work that you are doing?
  • What have been some of the most challenging aspects of building production workflows on top of Jupyter notebooks?
  • What are some of the projects that are ongoing or planned for the future that you are most excited by?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
