Jun 2025
58m 5s

#274: Real Talk About Synthetic Data wit...

Michael Helbling, Moe Kiss, Tim Wilson, Val Kroll, and Julie Hoyer
About this episode

Synthetic data: it's a fascinating topic that sounds like science fiction but is rapidly becoming a practical tool in the data landscape. From machine learning applications to safeguarding privacy, synthetic data offers a compelling alternative to real-world datasets that might be incomplete or unwieldy. With the help of Winston Li, founder of Arima, a startup specializing in synthetic data and marketing mix modelling, we explore how this artificial data is generated, where its strengths truly lie, and the potential pitfalls to watch out for! For complete show notes, including links to items mentioned in this episode and a transcript of the show, visit the show page.

Up next
Sep 30
#281: Analytics: The View from the Corner Office with Anna Lee
From spreadsheets to strategy: what does data look like from the CEO’s chair? For this episode, we sat down with Anna Lee, CEO of Flybuys and former CFO/COO of THE ICONIC, to get her view on data-led leadership and what great looks like in data and analytics. Discover how Anna's ...
1h 6m
Sep 16
#280: Dashboards Must Die! Long Live Dashboards! with Andy Cotgreave
If you didn’t have a visceral reaction to the title for this episode, then you are almost certainly not in our target audience. There are few more certain ways to get a room full of analytics folk fired up than to raise the topic of dashboards. Are they where data goes to die, or ...
1h 6m
Sep 2
#279: The Process(es) of Analytics (We Have Thoughts)
What is "process" in analytics? On the one hand, it can be seen as a detailed sequence of minutiae by which anything that needs to be repeated in the world of analytics gets carried out in a structured and consistent manner. On the other hand, that’s the sort of definition that st ...
1h 1m
Recommended Episodes
Feb 2022
AI Today Podcast: Overview of Synthetic Data
Machine learning algorithms need examples of data from which they can learn, especially supervised machine learning algorithms. However, one big challenge for those looking to put machine learning into practice is the lack of a sufficient quantity of good quality data examples fr ...
47m 14s
May 2023
TinyML: Bringing machine learning to the edge
When we think about machine learning today we often think in terms of immense scale — large language models that require huge amounts of computational power, for example. But one of the most interesting innovations in machine learning right now is actually happening on a really s ...
45m 45s
Apr 2023
2344: Cloudera: Moving Beyond Big Data to Hybrid Data Mastery
I sit down with Chris Royles, EMEA Field CTO at Cloudera, to discuss the evolution of Big Data and why hybrid data is the next challenge for businesses to tackle. In this episode, we explore how the term 'Big Data' has become dated and how the rapid rise of hybrid data has shifte ...
39m 54s
Nov 2024
Code Generation & Synthetic Data With Loubna Ben Allal #51
Our guest today is Loubna Ben Allal, Machine Learning Engineer at Hugging Face 🤗. In our conversation, Loubna first explains how she built two impressive code generation models: StarCoder and StarCoder2. We dig into the importance of data when training large models and what can ...
47m 6s
Feb 2025
863: TabPFN: Deep Learning for Tabular Data (That Actually Works!), with Prof. Frank Hutter
Jon Krohn talks tabular data with Frank Hutter, Professor of Artificial Intelligence at Universität Freiburg in Germany. Despite the great steps that deep learning has made in analysing images, audio, and natural language, tabular data has remained its insurmountable obstacle. In ...
1h 6m
Aug 26
From Academia to Industry: Bridging Data Engineering Challenges
Summary In this episode of the Data Engineering Podcast Professor Paul Groth, from the University of Amsterdam, talks about his research on knowledge graphs and data engineering. Paul shares his background in AI and data management, discussing the evolution of data provenance and ...
50m 54s
Jan 2025
Breaking Down Data Silos: AI and ML in Master Data Management
Summary In this episode of the Data Engineering Podcast Dan Bruckner, co-founder and CTO of Tamr, talks about the application of machine learning (ML) and artificial intelligence (AI) in master data management (MDM). Dan shares his journey from working at CERN to becoming a data ...
57m 30s
Sep 2024
Human Data is Key to AI: Alex Wang from Scale AI
What if the key to unlocking AI's full potential lies not just in algorithms or compute, but in data? In this episode, a16z General Partner David George sits down with Alex Wang, founder and CEO of Scale AI, to discuss the crucial role of "frontier data" in advancing artificial i ...
30m 56s
Sep 2021
An Exploration Of The Data Engineering Requirements For Bioinformatics
Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a uniqu ...
55m 10s
Sep 2024
Open Animal Tracks
Our guest today is Risa Shinoda, a PhD student at Kyoto University Agricultural Systems Engineering Lab, where she applies computer vision techniques. She talked about the OpenAnimalTracks dataset and what it was used for. The dataset helps researchers predict animal footprint. S ...
22m 45s