Nov 2018
24m 43s

MLA 009 Charting and Visualization Tools...

OCDevel
About this episode

This episode surveys Python charting libraries (Matplotlib, Seaborn, and Bokeh), explaining their strengths from quick EDA to interactive, HTML-exported visualizations, and clarifies where D3.js fits as a JavaScript alternative for end-user applications. It also evaluates major software solutions like Tableau, Power BI, QlikView, and Excel, detailing how modern BI tools now integrate drag-and-drop analytics with embedded machine learning, potentially allowing business users to automate entire workflows without coding.

Core Phases in Data Science Visualization

  • Exploratory Data Analysis (EDA):
    • EDA occupies an early stage in the Business Intelligence (BI) pipeline, positioned just before or sometimes merged with the data cleaning ("munging") phase.
    • The outputs of EDA (e.g., correlation matrices, histograms) often serve as inputs to subsequent machine learning steps.

Python Visualization Libraries

1. Matplotlib

  • The foundational plotting library in Python, supporting static, basic chart types.
  • Requires substantial boilerplate code for custom visualizations.
  • Serves as the core engine for many higher-level visualization tools.
  • Common EDA tasks (like plotting via the .hist() and .plot.scatter() methods on pandas DataFrames, or charting the output of .corr()) depend on Matplotlib under the hood.
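
To make the boilerplate point concrete, here is a minimal sketch of a two-panel EDA figure written directly against Matplotlib. The sample values, variable names, and output filename are invented for illustration and do not come from the episode.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data, invented for this sketch.
heights_cm = [150, 160, 165, 170, 172, 180, 185]
weights_kg = [50, 56, 61, 68, 70, 82, 90]

# Even a simple figure needs explicit figure, axes, title, and label setup.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

ax_hist.hist(heights_cm, bins=5, color="steelblue", edgecolor="white")
ax_hist.set_title("Height distribution")
ax_hist.set_xlabel("Height (cm)")
ax_hist.set_ylabel("Count")

ax_scatter.scatter(heights_cm, weights_kg, color="darkorange")
ax_scatter.set_title("Height vs. weight")
ax_scatter.set_xlabel("Height (cm)")
ax_scatter.set_ylabel("Weight (kg)")

fig.tight_layout()
fig.savefig("eda_matplotlib.png")  # hypothetical output filename
```

Roughly a dozen lines go to layout and labeling for two basic charts, which is the overhead the higher-level libraries below aim to remove.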

2. Pandas Plotting

  • Pandas integrates tightly with Matplotlib and exposes simple, one-line commands for common EDA steps (e.g., df.corr() for a correlation matrix, df.hist() for histograms).
  • Designed to make quick EDA accessible without requiring detailed knowledge of Matplotlib's verbose syntax.
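
The one-liners mentioned above might look like the following sketch. The DataFrame contents and column names are made up for illustration; note that df.corr() only computes the correlation matrix, while df.hist() and df.plot.scatter() draw through Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import pandas as pd

# Hypothetical dataset, invented for this sketch.
df = pd.DataFrame({
    "height_cm": [150, 160, 165, 170, 172, 180, 185],
    "weight_kg": [50, 56, 61, 68, 70, 82, 90],
    "age": [23, 35, 41, 29, 52, 38, 47],
})

corr = df.corr()        # pairwise Pearson correlations, returned as a DataFrame
axes = df.hist(bins=5)  # one histogram per numeric column, drawn via Matplotlib
ax = df.plot.scatter(x="height_cm", y="weight_kg")  # scatter plot of two columns

print(corr.round(2))
```

Each call returns ordinary pandas or Matplotlib objects, so the results can still be customized with Matplotlib when the defaults are not enough.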

3. Seaborn

  • A high-level wrapper around Matplotlib, analogous to how Keras wraps TensorFlow.
  • Sets sensible defaults for chart styles, fonts, colors, and sizes, improving aesthetics with minimal effort.
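  • In Seaborn versions before 0.8, simply importing the library restyled all Matplotlib plots globally; in current versions the same effect requires an explicit call such as seaborn.set_theme(), after which even plain Matplotlib plots pick up Seaborn's styling.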
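
As a rough illustration of Seaborn's defaults, the sketch below applies the Seaborn theme and draws a correlation heatmap in one call. It assumes seaborn 0.11 or newer (where set_theme() exists; older releases use sns.set()), and the data and filename are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Apply Seaborn's default styling to every Matplotlib plot created afterwards.
# Assumes seaborn >= 0.11; on older releases use sns.set() instead.
sns.set_theme()

# Hypothetical dataset, invented for this sketch.
df = pd.DataFrame({
    "height_cm": [150, 160, 165, 170, 172, 180, 185],
    "weight_kg": [50, 56, 61, 68, 70, 82, 90],
    "age": [23, 35, 41, 29, 52, 38, 47],
})

fig, ax = plt.subplots(figsize=(5, 4))
# One call produces an annotated, color-mapped correlation matrix.
sns.heatmap(df.corr(), annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")

fig.tight_layout()
fig.savefig("eda_seaborn_heatmap.png")  # hypothetical output filename
```

Because Seaborn draws onto regular Matplotlib axes, anything it produces can be adjusted afterwards with the usual Matplotlib calls, as the set_title line shows.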

4. Bokeh

  • A powerful library for creating interactive, web-ready plots from Python.
  • Enables user interactions such as hovering, zooming, and panning within rendered plots.
  • Exports visualizations as standalone HTML files or can operate as a server-linked app for live data exploration.
  • Supports advanced features like cross-filtering, allowing dynamic slicing and dicing of data across multiple axes or columns.
  • Better suited to reusable, interactive dashboards than to quick, one-off EDA visuals.
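
A minimal sketch of the standalone-HTML path described above, assuming a recent Bokeh release: it builds a scatter plot with pan, zoom, and hover tools and writes it to a single HTML file. The data, tooltip labels, and filename are invented for illustration; the server-linked mode is not shown.

```python
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical dataset, invented for this sketch.
source = ColumnDataSource(data={
    "height_cm": [150, 160, 165, 170, 172, 180, 185],
    "weight_kg": [50, 56, 61, 68, 70, 82, 90],
})

# Pan and zoom tools are requested by name; hover is added separately below.
p = figure(
    title="Height vs. weight",
    tools="pan,wheel_zoom,box_zoom,reset",
    x_axis_label="Height (cm)",
    y_axis_label="Weight (kg)",
)
p.scatter("height_cm", "weight_kg", source=source, size=10)

# Tooltips pull values from the data source columns via the @column syntax.
p.add_tools(HoverTool(tooltips=[
    ("height", "@height_cm cm"),
    ("weight", "@weight_kg kg"),
]))

# Render the plot as a self-contained HTML page (BokehJS is loaded from the CDN).
html = file_html(p, CDN, "Height vs. weight")
with open("height_weight.html", "w", encoding="utf-8") as f:  # hypothetical filename
    f.write(html)
```

Opening the resulting file in a browser gives the hovering, zooming, and panning behavior without any Python process running behind it.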

5. D3.js

  • Unlike the Python libraries above, D3.js is a JavaScript library for creating complex, highly customized data visualizations for web and mobile apps.
  • Used predominantly on the client-side to build interactive front-end graphics for end users, not as an EDA tool for analysts.
  • Common in production-grade web apps, but not typically part of a Python-based data science workflow.

Dedicated Visualization and BI Software

Tableau

  • Leading commercial drag-and-drop BI tool for data visualization and dashboarding.
  • Connects to diverse data sources (CSV, Excel, databases), auto-detects column types, and suggests default chart types.
  • Users can interactively build visualizations, cross-filter data, and switch chart types without coding.

Power BI

  • Microsoft's BI suite, similar to Tableau, supporting end-to-end data analysis and visualization.
  • Integrates data preparation, visualization, and increasingly, built-in machine learning workflows.
  • Focused on empowering business users or analysts to run the BI pipeline without programming.

QlikView

  • Another major BI offering, emphasizing interactive dashboards and data exploration.

Excel

  • Still widely used for basic EDA and visualizations directly on spreadsheets.
  • Offers limited but accessible charting tools for histograms, scatter plots, and simple summary statistics.
  • Data often originates from Excel/CSV files before being ingested for further analysis in Python/pandas.

Trends & Insights

  • Workflow Integration: Modern BI tools are converging, adding both classic EDA capabilities and basic machine learning modeling, often through a code-free interface.
  • Automation Risks and Opportunities: As drag-and-drop BI tools grow more capable (including model training and selection), some data science coding work traditionally required for BI pipelines may become accessible to non-programmers.
  • Distinctions in Use:
    • Python libraries (Matplotlib, Seaborn, Bokeh) excel in automating and scripting EDA, report generation, and static analysis as part of data pipelines.
    • BI software (Tableau, Power BI, QlikView) shines for interactive exploration and democratized analytics, integrated from ingestion to reporting.
    • D3.js stands out for tailored, production-level, end-user app visualizations, rarely leveraged by data scientists for EDA.

Key Takeaways

  • For quick, code-based EDA: Use Pandas' built-in plotters (wrapping Matplotlib).
  • For pre-styled, pretty plots: Use Seaborn (with or without direct API calls).
  • For interactive, shareable dashboards: Use Bokeh for Python or BI tools for no-code operation.
  • For enterprise, end-user-facing dashboards: Choose BI software like Tableau or build custom apps using D3.js for total control.