May 2023
2h 49m

#151 – Ajeya Cotra on accidentally teaching AI models to deceive us

Rob, Luisa, and the 80,000 Hours team
About this episode

Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way that a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons.

Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods.

Links to learn more, summary and full transcript.

As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it.

Like a child trying to judge adults, at some point humans will be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and greatly outclass them in knowledge, experience, breadth, and speed. Tricky!

Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:

  • Saints — models that care about doing what we really want
  • Sycophants — models that just want us to say they've done a good job, even if they get that praise by taking actions they know we wouldn't want them to
  • Schemers — models that don't care about us or our interests at all, who are just pleasing us so long as that serves their own agenda

And according to Ajeya, there are also ways we could end up actively selecting for motivations that we don't want.

In today's interview, Ajeya and Rob discuss the above, as well as:

  • How to predict the motivations a neural network will develop through training
  • Whether AIs being trained will functionally understand that they're AIs being trained, the same way we think we understand that we're humans living on planet Earth
  • Stories of AI misalignment that Ajeya doesn't buy into
  • Analogies for AI, from octopuses to aliens to can openers
  • Why it's smarter to have separate planning AIs and doing AIs
  • The benefits of only following through on AI-generated plans that make sense to human beings
  • What approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
  • How one might demo actually scary AI failure mechanisms

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type ‘80,000 Hours’ into your podcasting app. Or read the transcript below.

Producer: Keiran Harris

Audio mastering: Ryan Kessler and Ben Cordell

Transcriptions: Katy Moore
