Sep 16
56m 15s

The Startup Powering The Data Behind AGI

Lukas Biewald
About this episode

In this episode of Gradient Dissent, Lukas Biewald talks with Edwin Chen, CEO and founder of Surge AI, the billion-dollar company quietly powering the next generation of frontier LLMs. They discuss Surge's origin story, why traditional data labeling is broken, and how the company's research-focused approach is reshaping how models are trained.

You’ll hear why inter-annotator agreement fails in high-complexity tasks like poetry and math, why synthetic data is often overrated, and how Surge builds rich RL environments to stress-test agentic reasoning. They also go deep on what kinds of data will be critical to future progress in AI—from scientific discovery to multimodal reasoning and personalized alignment.


It’s a rare, behind-the-scenes look into the world of high-quality data generation at scale—straight from the team most frontier labs trust to get it right.


Timestamps:

00:00 – Intro: Who is Edwin Chen?

03:40 – The problem with early data labeling systems

06:20 – Search ranking, clickbait, and product principles

10:05 – Why Surge focused on high-skill, high-quality labeling

13:50 – From Craigslist workers to a billion-dollar business

16:40 – Scaling without funding and avoiding Silicon Valley status games

21:15 – Why most human data platforms lack real tech

25:05 – Detecting cheaters, liars, and low-quality labelers

28:30 – Why inter-annotator agreement is a flawed metric

32:15 – What makes a great poem? Not checkboxes

36:40 – Measuring subjective quality rigorously

40:00 – What types of data are becoming more important

44:15 – Scientific collaboration and frontier research data

47:00 – Multimodal data, Argentinian coding, and hyper-specificity

50:10 – What's wrong with LMSYS and benchmark hacking

53:20 – Personalization and taste in model behavior

56:00 – Synthetic data vs. high-quality human data


Follow Weights & Biases:

https://twitter.com/weights_biases

https://www.linkedin.com/company/wandb
