May 2023
1h 8m

675: Pandas for Data Analysis and Visual...

Jon Krohn
About this episode

Wrangling data in pandas, when to use pandas, matplotlib, or seaborn, and why you should learn to create Python packages: Jon Krohn speaks with guest Stefanie Molin, author of Hands-On Data Analysis with Pandas.

This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

In this episode you will learn:
• The advantages of using pandas over other libraries [07:55]
• Why data wrangling in pandas is so helpful [12:05]
• Stefanie’s Data Morph library [24:27]
• When to use pandas, matplotlib, or seaborn [33:45]
• Understanding the ticker module in matplotlib [36:48]
• Where data analysts should start their learning journey [40:08]
• What it’s like being a software engineer at Bloomberg [51:19]

Additional materials: www.superdatascience.com/675
