Oct 2024
45m 22s

OpenAI's Noam Brown, Ilge Akkaya and Hun...

Sequoia Capital
About this episode

Combining LLMs with AlphaGo-style deep reinforcement learning has been a holy grail for many leading AI labs, and with o1 (aka Strawberry) we are seeing the most general merging of the two modes to date. o1 is admittedly better at math than essay writing, but it has already achieved SOTA on a number of math, coding and reasoning benchmarks.

Deep RL legend and now OpenAI researcher Noam Brown and teammates Ilge Akkaya and Hunter Lightman discuss the a-ha moments on the way to the release of o1, how it uses chains of thought and backtracking to think through problems, the discovery of strong test-time compute scaling laws and what to expect as the model gets better.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital 

Mentioned in this episode:

  • Learning to Reason with LLMs: Technical report accompanying the launch of OpenAI o1.
  • Generator verifier gap: Concept Noam explains in terms of what kinds of problems benefit from more inference-time compute.
  • Agent57: Outperforming the human Atari benchmark, 2020 paper where DeepMind demonstrated “the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games.”
  • Move 37: Pivotal move in AlphaGo’s second game against Lee Sedol where it made a move so surprising that Sedol thought it must be a mistake, and only later discovered he had lost the game to a superhuman move.
  • IOI competition: OpenAI entered o1 into the International Olympiad in Informatics and received a Silver Medal.
  • System 1, System 2: The thesis of Daniel Kahneman’s pivotal book of behavioral economics, Thinking, Fast and Slow, which posited two distinct modes of thought, with System 1 being fast and instinctive and System 2 being slow and rational.
  • AlphaZero: The successor to AlphaGo, which learned a variety of games completely from scratch through self-play. Interestingly, self-play doesn’t seem to have a role in o1.
  • Solving Rubik’s Cube with a robot hand: Early OpenAI robotics paper that Ilge Akkaya worked on.
  • The Last Question: Science fiction story by Isaac Asimov with interesting parallels to scaling inference-time compute.
  • Strawberry: Why?
  • o1-mini: A smaller, more efficient version of o1 for applications that require reasoning without broad world knowledge.


00:00 - Introduction

01:33 - Conviction in o1

04:24 - How o1 works

05:04 - What is reasoning?

07:02 - Lessons from gameplay

09:14 - Generation vs verification

10:31 - What is surprising about o1 so far

11:37 - The trough of disillusionment

14:03 - Applying deep RL

14:45 - o1’s AlphaGo moment?

17:38 - A-ha moments

21:10 - Why is o1 good at STEM?

24:10 - Capabilities vs usefulness

25:29 - Defining AGI

26:13 - The importance of reasoning

28:39 - Chain of thought

30:41 - Implication of inference-time scaling laws

35:10 - Bottlenecks to scaling test-time compute

38:46 - Biggest misunderstanding about o1?

41:13 - o1-mini

42:15 - How should founders think about o1?
