May 2023
2h 49m

#151 – Ajeya Cotra on accidentally teach...

Rob, Luisa, and the 80,000 Hours team
About this episode

Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, with no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with yourself. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job, for all sorts of reasons.

Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods.

Links to learn more, summary and full transcript.

As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it.

Like a child trying to judge adults, humans will at some point be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and that greatly outclass us in knowledge, experience, breadth, and speed. Tricky!

Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:

  • Saints — models that care about doing what we really want
  • Sycophants — models that just want us to say they've done a good job, even if they earn that praise by taking actions they know we wouldn't want them to take
  • Schemers — models that don't care about us or our interests at all, who are just pleasing us so long as that serves their own agenda

And according to Ajeya, there are also ways we could end up actively selecting for motivations that we don't want.

In today's interview, Ajeya and Rob discuss the above, as well as:

  • How to predict the motivations a neural network will develop through training
  • Whether AIs being trained will functionally understand that they're AIs being trained, the same way we think we understand that we're humans living on planet Earth
  • Stories of AI misalignment that Ajeya doesn't buy into
  • Analogies for AI, from octopuses to aliens to can openers
  • Why it's smarter to have separate planning AIs and doing AIs
  • The benefits of only following through on AI-generated plans that make sense to human beings
  • What approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
  • How one might demo actually scary AI failure mechanisms

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer: Keiran Harris

Audio mastering: Ryan Kessler and Ben Cordell

Transcriptions: Katy Moore
