Jun 24
58m 5s

#274: Real Talk About Synthetic Data wit...

Michael Helbling, Moe Kiss, Tim Wilson, Val Kroll, and Julie Hoyer
About this episode

Synthetic data: it's a fascinating topic that sounds like science fiction but is rapidly becoming a practical tool in the data landscape. From machine learning applications to safeguarding privacy, synthetic data offers a compelling alternative to real-world datasets that might be incomplete or unwieldy. With the help of Winston Li, founder of Arima, a startup specializing in synthetic data and marketing mix modelling, we explore how this artificial data is generated, where its strengths truly lie, and the potential pitfalls to watch out for! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Up next
Aug 19
#278: Is AI Good at Data Analysis? That's the Wrong Question? with Juliana Jackson
Imagine a world where business users simply fire up their analytics AI tool, ask for some insights, and get a clear and accurate response in return. That’s the dream, isn’t it? Is it just around the corner, or is it years away? Or is that vision embarrassingly misguided at its co ...
1 h
Aug 5
#277: ANOVA? I Hardly Know Ya'! with Chelsea Parlett-Pelleriti
Did you know that, upon closer inspection, many a statistical test will reveal that "it's just a linear model" (#IJALM)? That wound up being a key point that our go-to statistician, Chelsea Parlett-Pelleriti, made early and often on this episode, which is the next installment in ...
1 h
Jul 22
#276: BI is Dead! Long Live BI! With Colin Zima
Product managers for BI platforms have it easy. They "just" need to have the dev team build a tool that gives all types of users access to all of the data they should be allowed to see in a way that is quick, simple, and clear while preventing them from pulling data that can be m ...
1h 4m
Recommended Episodes
Feb 2022
AI Today Podcast: Overview of Synthetic Data
Machine learning algorithms need examples of data from which they can learn, especially supervised machine learning algorithms. However, one big challenge for those looking to put machine learning into practice is the lack of a sufficient quantity of good quality data examples fr ...
47m 14s
May 2023
TinyML: Bringing machine learning to the edge
When we think about machine learning today we often think in terms of immense scale — large language models that require huge amounts of computational power, for example. But one of the most interesting innovations in machine learning right now is actually happening on a really s ...
45m 45s
Apr 2023
2344: Cloudera: Moving Beyond Big Data to Hybrid Data Mastery
I sit down with Chris Royles, EMEA Field CTO at Cloudera, to discuss the evolution of Big Data and why hybrid data is the next challenge for businesses to tackle. In this episode, we explore how the term 'Big Data' has become dated and how the rapid rise of hybrid data has shifte ...
39m 54s
Nov 2024
Code Generation & Synthetic Data With Loubna Ben Allal #51
Our guest today is Loubna Ben Allal, Machine Learning Engineer at Hugging Face 🤗. In our conversation, Loubna first explains how she built two impressive code generation models: StarCoder and StarCoder2. We dig into the importance of data when training large models and what can ...
47m 6s
Feb 2025
863: TabPFN: Deep Learning for Tabular Data (That Actually Works!), with Prof. Frank Hutter
Jon Krohn talks tabular data with Frank Hutter, Professor of Artificial Intelligence at Universität Freiburg in Germany. Despite the great steps that deep learning has made in analysing images, audio, and natural language, tabular data has remained its insurmountable obstacle. In ...
1h 6m
Jan 2025
Breaking Down Data Silos: AI and ML in Master Data Management
Summary In this episode of the Data Engineering Podcast Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data ...
57m 30s
Sep 2024
Human Data is Key to AI: Alex Wang from Scale AI
What if the key to unlocking AI's full potential lies not just in algorithms or compute, but in data? In this episode, a16z General Partner David George sits down with Alex Wang, founder and CEO of Scale AI, to discuss the crucial role of "frontier data" in advancing artificial i ...
30m 56s
Sep 2021
An Exploration Of The Data Engineering Requirements For Bioinformatics
Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a uniqu ...
55m 10s
Sep 2024
Open Animal Tracks
Our guest today is Risa Shinoda, a PhD student at Kyoto University Agricultural Systems Engineering Lab, where she applies computer vision techniques. She talked about the OpenAnimalTracks dataset and what it was used for. The dataset helps researchers predict animal footprint. S ...
22m 45s
Apr 2025
2027 Intelligence Explosion: Month-by-Month Model — Scott Alexander & Daniel Kokotajlo
Scott and Daniel break down every month from now until the 2027 intelligence explosion. Scott Alexander is author of the highly influential blogs Slate Star Codex and Astral Codex Ten. Daniel Kokotajlo resigned from OpenAI in 2024, rejecting a non-disparagement clause and risking ...
3h 4m