Apr 28
1h 8m

#503: The PyArrow Revolution

MICHAEL KENNEDY
About this episode
Pandas is at the core of virtually all data science done in Python, which is to say virtually all data science. Since its beginning, Pandas has been built on NumPy. But changes are afoot to update those internals, and you can now optionally use PyArrow. PyArrow comes with a ton of benefits, including its columnar format, which makes answering analytical questions faster, support for a range of high-performance file formats, inter-machine data streaming, faster file I/O, and more. Reuven Lerner is here to give us the lowdown on the PyArrow revolution.

Episode sponsors

NordLayer
Auth0
Talk Python Courses

Links from the show

Reuven: github.com/reuven
Apache Arrow: github.com
Parquet: parquet.apache.org
Feather format: arrow.apache.org
Python Workout Book: manning.com
Pandas Workout Book: manning.com
Pandas: pandas.pydata.org
PyArrow CSV docs: arrow.apache.org
Future string inference in Pandas: pandas.pydata.org
Pandas NA/nullable dtypes: pandas.pydata.org
Pandas `.iloc` indexing: pandas.pydata.org
DuckDB: duckdb.org
Pandas user guide: pandas.pydata.org
Pandas GitHub issues: github.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy