Oct 2023
3h 7m

Paul Christiano - Preventing an AI Takeover

Dwarkesh Patel
About this episode

Paul Christiano is the world’s leading AI safety researcher. My full episode with him is out!

We discuss:

- Does he regret inventing RLHF, and is alignment necessarily dual-use?

- Why he has relatively modest timelines (40% by 2040, 15% by 2030),

- What do we want the post-AGI world to look like (do we want to keep the gods enslaved forever)?

- Why he’s leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon,

- His current research into a new proof system, and how it could solve alignment by explaining a model’s behavior,

- and much more.

Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.

Open Philanthropy

Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations.

For more information and to apply, please see the application page: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/

The deadline to apply is November 9th; make sure to check out those roles before they close.

Timestamps

(00:00:00) - What do we want post-AGI world to look like?

(00:24:25) - Timelines

(00:45:28) - Evolution vs gradient descent

(00:54:53) - Misalignment and takeover

(01:17:23) - Is alignment dual-use?

(01:31:38) - Responsible scaling policies

(01:58:25) - Paul’s alignment research

(02:35:01) - Will this revolutionize theoretical CS and math?

(02:46:11) - How Paul invented RLHF

(02:55:10) - Disagreements with Carl Shulman

(03:01:53) - Long TSMC but not NVIDIA



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dwarkeshpatel.com