May 2025
2h 24m

Is RL + LLMs enough for AGI? — Sholto Do...

Dwarkesh Patel
About this episode

New episode with my good friends Sholto Douglas & Trenton Bricken. Sholto focuses on scaling RL and Trenton researches mechanistic interpretability, both at Anthropic.

We talk through what’s changed in the last year of AI research; the new RL regime and how far it can scale; how to trace a model’s thoughts; and how countries, workers, and students should prepare for AGI.

See you next year for v3. Here’s last year’s episode, btw. Enjoy!

Watch on YouTube; listen on Apple Podcasts or Spotify.

----------

SPONSORS

* WorkOS ensures that AI companies like OpenAI and Anthropic don't have to spend engineering time building enterprise features like access controls or SSO. It’s not that they don't need these features; it's just that WorkOS gives them battle-tested APIs that they can use for auth, provisioning, and more. Start building today at workos.com.

* Scale is building the infrastructure for safer, smarter AI. Scale’s Data Foundry gives major AI labs access to high-quality data to fuel post-training, while their public leaderboards help assess model capabilities. They also just released Scale Evaluation, a new tool that diagnoses model limitations. If you’re an AI researcher or engineer, learn how Scale can help you push the frontier at scale.com/dwarkesh.

* Lighthouse is THE fastest immigration solution for the technology industry. They specialize in expert visas like the O-1A and EB-1A, and they’ve already helped companies like Cursor, Notion, and Replit navigate U.S. immigration. Explore which visa is right for you at lighthousehq.com/ref/Dwarkesh.

To sponsor a future episode, visit dwarkesh.com/advertise.

----------

TIMESTAMPS

(00:00:00) – How far can RL scale?

(00:16:27) – Is continual learning a key bottleneck?

(00:31:59) – Model self-awareness

(00:50:32) – Taste and slop

(01:00:51) – How soon to fully autonomous agents?

(01:15:17) – Neuralese

(01:18:55) – Inference compute will bottleneck AGI

(01:23:01) – DeepSeek algorithmic improvements

(01:37:42) – Why are LLMs ‘baby AGI’ but not AlphaZero?

(01:45:38) – Mech interp

(01:56:15) – How countries should prepare for AGI

(02:10:26) – Automating white collar work

(02:15:35) – Advice for students



Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe