Jan 2015
10m 56s

[MINI] Data Provenance

Kyle Polich
About this episode

This episode opens a high-level discussion of Data Provenance, with more MINI episodes to follow on specific topics. Thanks to listener Sara L, who wrote in to point out that the Data Skeptic Podcast has focused a lot on using data to be skeptical, but not necessarily on being skeptical of data.

Data Provenance is the concept of knowing the full origin of your dataset. Where did it come from? Who collected it? How was it collected? Does it combine independent sources or come from a single source? What are the error bounds on the way it was measured? These are just some of the questions one should ask to understand their data. After all, if the antecedent of an argument is built on dubious grounds, the consequent of the argument is equally dubious.
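
One way to make these questions concrete is to store the answers alongside the dataset and check which ones are still open. The sketch below is only an illustration: the `ProvenanceRecord` class, its field names, and the sample values are hypothetical and are not something described in the episode.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical container for the provenance questions listed above."""
    source: Optional[str] = None             # Where did it come from?
    collected_by: Optional[str] = None       # Who collected it?
    collection_method: Optional[str] = None  # How was it collected?
    error_bounds: Optional[str] = None       # What are the measurement error bounds?
    # Which sources (one or several) were combined into this dataset?
    upstream_sources: List[str] = field(default_factory=list)

    def unanswered(self) -> List[str]:
        """Return the names of the provenance questions that have no answer yet."""
        scalar_fields = ("source", "collected_by", "collection_method", "error_bounds")
        missing = [name for name in scalar_fields if getattr(self, name) is None]
        if not self.upstream_sources:
            missing.append("upstream_sources")
        return missing


# Made-up example values, used only to show the check in action.
record = ProvenanceRecord(source="city open-data portal", collected_by="transit agency")
print(record.unanswered())
```

A non-empty result from `unanswered()` is a prompt to be skeptical: conclusions drawn from the dataset rest on origins that have not yet been documented.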

For a more technical discussion than what we get into in this mini episode, I recommend A Survey of Data Provenance Techniques by Simmhan, Plale, and Gannon.
