Oct 30
4h 30m

#226 – Holden Karnofsky on unexploited o...

Rob, Luisa, and the 80,000 Hours team
About this episode

For years, working on AI safety usually meant theorising about the ‘alignment problem’ or trying to convince other people to give a damn. Even if you could find a way to help, the work was frustrating and offered little feedback.

According to Anthropic’s Holden Karnofsky, this situation has now reversed completely.

There are now many useful, concrete, shovel-ready projects with clear goals and deliverables. Holden thinks people haven’t appreciated the scale of the shift, and wants everyone to see the large range of ‘well-scoped object-level work’ they could personally help with, in both technical and non-technical areas.

Video, full transcript, and links to learn more: https://80k.info/hk25

In today’s interview, Holden — previously cofounder and CEO of Open Philanthropy (now Coefficient Giving) — lists 39 projects he’s excited to see happening, including:

  • Training deceptive AI models to study deception and how to detect it
  • Developing classifiers to block jailbreaking
  • Implementing security measures to stop ‘backdoors’ or ‘secret loyalties’ from being added to models in training
  • Developing policies on model welfare, AI-human relationships, and what instructions to give models
  • Training AIs to work as alignment researchers

And that’s all just stuff he’s happened to observe directly, which is probably only a small fraction of the options available.

Holden makes a case that, for many people, working at an AI company like Anthropic will be the best way to steer AGI in a positive direction. He notes there are “ways that you can reduce AI risk that you can only do if you’re a competitive frontier AI company.” At the same time, he believes external groups have their own advantages and can be equally impactful.

Critics worry that Anthropic’s efforts to stay at that frontier encourage competitive racing towards AGI — significantly or entirely offsetting any useful research they do. Holden thinks this seriously misunderstands the strategic situation we’re in — and explains his case in detail with host Rob Wiblin.

Chapters:

  • Cold open (00:00:00)
  • Holden is back! (00:02:26)
  • An AI Chernobyl we never notice (00:02:56)
  • Is rogue AI takeover easy or hard? (00:07:32)
  • The AGI race isn't a coordination failure (00:17:48)
  • What Holden now does at Anthropic (00:28:04)
  • The case for working at Anthropic (00:30:08)
  • Is Anthropic doing enough? (00:40:45)
  • Can we trust Anthropic, or any AI company? (00:43:40)
  • How can Anthropic compete while paying the “safety tax”? (00:49:14)
  • What, if anything, could prompt Anthropic to halt development of AGI? (00:56:11)
  • Holden's retrospective on responsible scaling policies (00:59:01)
  • Overrated work (01:14:27)
  • Concrete shovel-ready projects Holden is excited about (01:16:37)
  • Great things to do in technical AI safety (01:20:48)
  • Great things to do on AI welfare and AI relationships (01:28:18)
  • Great things to do in biosecurity and pandemic preparedness (01:35:11)
  • How to choose where to work (01:35:57)
  • Overrated AI risk: Cyberattacks (01:41:56)
  • Overrated AI risk: Persuasion (01:51:37)
  • Why AI R&D is the main thing to worry about (01:55:36)
  • The case that AI-enabled R&D wouldn't speed things up much (02:07:15)
  • AI-enabled human power grabs (02:11:10)
  • Main benefits of getting AGI right (02:23:07)
  • The world is handling AGI about as badly as possible (02:29:07)
  • Learning from targeting companies for public criticism in farm animal welfare (02:31:39)
  • Will Anthropic actually make any difference? (02:40:51)
  • “Misaligned” vs “misaligned and power-seeking” (02:55:12)
  • Success without dignity: how we could win despite being stupid (03:00:58)
  • Holden sees less dignity but has more hope (03:08:30)
  • Should we expect misaligned power-seeking by default? (03:15:58)
  • Will reinforcement learning make everything worse? (03:23:45)
  • Should we push for marginal improvements or big paradigm shifts? (03:28:58)
  • Should safety-focused people cluster or spread out? (03:31:35)
  • Is Anthropic vocal enough about strong regulation? (03:35:56)
  • Is Holden biased because of his financial stake in Anthropic? (03:39:26)
  • Have we learned clever governance structures don't work? (03:43:51)
  • Is Holden scared of AI bioweapons? (03:46:12)
  • Holden thinks AI companions are bad news (03:49:47)
  • Are AI companies too hawkish on China? (03:56:39)
  • The frontier of infosec: confidentiality vs integrity (04:00:51)
  • How often does AI work backfire? (04:03:38)
  • Is AI clearly more impactful to work in? (04:18:26)
  • What's the role of earning to give? (04:24:54)

This episode was recorded on July 25 and 28, 2025.

Video editing: Simon Monsour, Luke Monsour, Dominic Armstrong, and Milo McGuire
Audio engineering: Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: CORBIT
Coordination, transcriptions, and web: Katy Moore
