About this episode
Summary
Building a machine learning application is inherently complex. Once it becomes necessary to scale the operation or training of the model, or introduce online re-training the process becomes even more challenging. In order to reduce the operational burden of AI developers Robert Nishihara helped to create the Ray framework that handles the distributed computing aspects of machine learning operations. To support the ongoing development and simplify adoption of Ray he co-founded Anyscale. In this episode he re-joins the show to share how the project, its community, and the ecosystem around it have grown and evolved over the intervening two years. He also explains how the techniques and adoption of machine learning have influenced the direction of the project.
Announcements
- Hello and welcome to Podcast.__init__, the podcast about Python’s role in data and science.
- When you’re ready to launch your next app or want to try a project you hear about on the show, you’ll need somewhere to deploy it, so take a look at our friends over at Linode. With the launch of their managed Kubernetes platform it’s easy to get started with the next generation of deployment and scaling, powered by the battle tested Linode platform, including simple pricing, node balancers, 40Gbit networking, dedicated CPU and GPU instances, and worldwide data centers. Go to pythonpodcast.com/linode and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Your host as usual is Tobias Macey and today I’m interviewing Robert Nishihara about his work at Anyscale and the Ray distributed execution framework
Interview
- Introductions
- How did you get introduced to Python?
- Can you describe what Anyscale is and the story behind it?
- How has the Ray project and ecosystem evolved since we last spoke? (2 years ago)
- How has the landscape of AI/ML technologies and techniques shifted in that time?
- What are the main areas where organizations are trying to apply ML/AI?
- What are some of the issues that teams encounter when trying to move from prototype to production with ML/AI applications?
- What are the features of Ray that help to mitigate those challenges?
- With the introduction of more widely available streaming/real-time technologies the viability of reinforcement learning has increased. What new challenges does that approach introduce?
- What are some of the operational complexities associated with managing a deployment of Ray?
- What are some of the specialized utilities that you have had to develop to maintain a large and multi-tenant platform for your customers?
- What is the governance model around the Ray project and how does the work at Anyscale influence the roadmap?
- What are the most interesting, innovative, or unexpected ways that you have seen Anyscale/Ray used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Ray and Anyscale?
- When is Anyscale/Ray the wrong choice?
- What do you have planned for the future of Anyscale/Ray?
Keep In Touch
Picks
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, the Data Engineering Podcast for the latest on modern data management.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@podcastinit.com) with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
The intro and outro music is from Requiem for a Fish The Freak Fandango Orchestra / CC BY-SA
Dec 2022
Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River
Preamble
This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning.
Summary
The majority of machine learning projects that you read about or work on are built around batch processes. The model i ... Show More
1h 16m
Dec 2022
Declarative Machine Learning For High Performance Deep Learning Models With Predibase
Preamble
This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning.
Summary
Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference ... Show More
59m 22s
Nov 2022
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks
Preamble
This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning.
Summary
Machine learning has the potential to transform industries and revolutionize business capabilities, but only if the mo ... Show More
47m 37s
Jul 2025
Revolutionizing Python Notebooks with Marimo
SummaryIn this episode of the Data Engineering Podcast Akshay Agrawal from Marimo discusses the innovative new Python notebook environment, which offers a reactive execution model, full Python integration, and built-in UI elements to enhance the interactive computing experience. ... Show More
51m 56s
Feb 2025
#495: OSMnx: Python and OpenStreetMap
See the full show notes for this episode on the website at <a href="https://talkpython.fm/495">talkpython.fm/495</a>
1h 1m
Oct 11
Context Engineering as a Discipline: Building Governed AI Analytics
SummaryIn this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Nick Schrock, CTO and founder of Dagster Labs, to discuss Compass - a Slack-native, agentic analytics system designed to keep data teams connected with business stakeholders. Nick shares his j ... Show More
51m 58s
Sep 2021
An Exploration Of The Data Engineering Requirements For Bioinformatics
<div class="wp-block-jetpack-markdown"><h2>Summary</h2>
<p>Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and compuational analysis with scientific research ... Show More
55m 10s
Aug 26
From Academia to Industry: Bridging Data Engineering Challenges
SummaryIn this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and ... Show More
50m 54s
May 2022
Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way
<div class="wp-block-jetpack-markdown"><h2>Summary</h2>
<p>Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds additional la ... Show More
58m 11s
Aug 18
High Performance And Low Overhead Graphs With KuzuDB
SummaryIn this episode of the Data Engineering Podcast Prashanth Rao, an AI engineer at KuzuDB, talks about their embeddable graph database. Prashanth explains how KuzuDB addresses performance shortcomings in existing solutions through columnar storage and novel join algorithms. ... Show More
1h 1m
Mar 2021
Data Quality Management For The Whole Team With Soda Data
<div class="wp-block-jetpack-markdown"><h2>Summary</h2>
<p>Data quality is on the top of everyone’s mind recently, but getting it right is as challenging as ever. One of the contributing factors is the number of people who are involved in the process and the potential impa ... Show More
58 m
Aug 2024
The Evolution of DataOps: Insights from DataKitchen's CEO
Summary
In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Berg, CEO of DataKitchen, to discuss his ongoing mission to simplify the lives of data engineers. Chris explains the challenges faced by data engineers, such as constant system failures ... Show More
53m 30s
Feb 2025
The Future of Data Engineering: AI, LLMs, and Automation
Summary
In this episode of the Data Engineering Podcast Gleb Mezhanskiy, CEO and co-founder of DataFold, talks about the intersection of AI and data engineering. He discusses the challenges and opportunities of integrating AI into data engineering, particularly using large langua ... Show More
59m 39s