May 2023
1h 8m

675: Pandas for Data Analysis and Visual...

Jon Krohn
About this episode

Wrangling data in pandas, when to use pandas, matplotlib, or seaborn, and why you should learn to create Python packages: Jon Krohn speaks with guest Stefanie Molin, author of Hands-On Data Analysis with Pandas.

This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

In this episode you will learn:
• The advantages of using pandas over other libraries [07:55]
• Why data wrangling in pandas is so helpful [12:05]
• Stefanie’s Data Morph library [24:27]
• When to use pandas, matplotlib, or seaborn [33:45]
• Understanding the ticker module in matplotlib [36:48]
• Where data analysts should start their learning journey [40:08]
• What it’s like being a software engineer at Bloomberg [51:19]

Additional materials: www.superdatascience.com/675
