Mar 2025
1h 52m

180: Reinforcement Learning

Patrick Wheeler and Jason Gauci
About this episode

Intro topic: Grills

News/Links:

Book of the Show


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick: 
    • Pokémon Sword and Shield
  • Jason: 

Topic: Reinforcement Learning

  • Three types of machine learning
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation
    • Propensity scoring vs model-based evaluation
  • Challenges in training RL models
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO
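The two value-optimization methods in the outline differ mainly in what they bootstrap from: Q-Learning updates toward the best next action (off-policy), while SARSA updates toward the action the agent actually takes next (on-policy). A minimal tabular sketch, assuming a hypothetical five-state chain where the agent starts at state 0 and gets reward 1 for reaching state 4; the environment, hyperparameters, and every function name (`step`, `train`, `greedy_policy`) are illustrative, not from the episode.

```python
import random

# Hypothetical environment: states 0..4 in a line, state 4 is the terminal goal.
N_STATES = 5
GOAL = N_STATES - 1
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so the all-zero initial table acts like a random walk.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(method, episodes=2000, alpha=0.1, gamma=0.9, epsilon=0.1,
          seed=0, max_steps=100):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(max_steps):
            nxt, reward, done = step(state, action)
            nxt_action = epsilon_greedy(q, nxt, epsilon, rng)
            if done:
                target = reward
            elif method == "q_learning":
                # Off-policy: bootstrap from the best next action.
                target = reward + gamma * max(q[nxt])
            else:
                # SARSA (on-policy): bootstrap from the action actually taken next.
                target = reward + gamma * q[nxt][nxt_action]
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state, action = nxt, nxt_action
    return q


def greedy_policy(q):
    """Best action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
```

On this toy chain both methods should end up preferring "right" everywhere; the on-policy vs off-policy difference shows up in the learned values, since SARSA's values account for the epsilon-greedy exploration it actually performs.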
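Policy optimization skips the value table and nudges the policy's parameters directly along the policy gradient. A minimal REINFORCE sketch, assuming a hypothetical one-step problem (a three-armed bandit with fixed rewards), a softmax policy over per-arm logits, and a running-average reward baseline to reduce variance; `reinforce_bandit` and its parameters are illustrative names, not from the episode.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_bandit(arm_rewards, steps=3000, lr=0.1, baseline_rate=0.05, seed=0):
    """REINFORCE with a running-average baseline on a one-step (bandit) problem."""
    rng = random.Random(seed)
    logits = [0.0] * len(arm_rewards)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = arm_rewards[action]
        advantage = reward - baseline
        # Gradient of log softmax(logits)[action] w.r.t. logit k: 1[k == action] - probs[k]
        for k in range(len(logits)):
            indicator = 1.0 if k == action else 0.0
            logits[k] += lr * advantage * (indicator - probs[k])
        # The baseline tracks the average reward seen so far.
        baseline += baseline_rate * (reward - baseline)
    return softmax(logits)
```

Usage: `reinforce_bandit([0.2, 0.5, 1.0])` returns action probabilities that should concentrate on the last arm. Actor-Critic methods replace the running-average baseline with a learned value function, and PPO additionally clips how far each update can move the policy.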
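For policy evaluation, the propensity-scoring and model-based approaches can be contrasted on logged data. A minimal sketch, assuming a hypothetical one-step (contextual bandit) log of `(context, action, reward, logging_probability)` tuples: inverse propensity scoring reweights each logged reward by how much more or less likely the new policy is to take that action, while the model-based ("direct method") estimate fits a simple mean-reward table and averages its predictions under the new policy. The log format and function names are assumptions for illustration.

```python
from collections import defaultdict


def ips_estimate(logs, target_prob):
    """Inverse propensity scoring: weight each reward by pi(a|x) / mu(a|x)."""
    total = 0.0
    for context, action, reward, logging_prob in logs:
        total += reward * target_prob(context, action) / logging_prob
    return total / len(logs)


def direct_method_estimate(logs, target_prob, actions):
    """Model-based estimate: fit mean reward per (context, action), then average under pi."""
    sums, counts = defaultdict(float), defaultdict(int)
    for context, action, reward, _ in logs:
        sums[(context, action)] += reward
        counts[(context, action)] += 1

    def predicted_reward(context, action):
        # Assumption: unseen (context, action) pairs are predicted as 0.0.
        n = counts[(context, action)]
        return sums[(context, action)] / n if n else 0.0

    total = 0.0
    for context, _, _, _ in logs:
        total += sum(target_prob(context, a) * predicted_reward(context, a)
                     for a in actions)
    return total / len(logs)


# Hypothetical log from a uniform-random logging policy over actions 0 and 1.
LOGS = [
    ("x0", 0, 0.0, 0.5), ("x0", 1, 1.0, 0.5), ("x1", 0, 1.0, 0.5),
    ("x1", 1, 0.0, 0.5), ("x0", 1, 1.0, 0.5), ("x1", 1, 0.5, 0.5),
]


def always_action_1(context, action):
    """Hypothetical target policy: always pick action 1."""
    return 1.0 if action == 1 else 0.0
```

On this log the two estimators disagree (5/6 for IPS vs 0.625 for the direct method), which illustrates the trade-off: IPS is unbiased when the logging propensities are correct but can have high variance, while the model-based estimate is lower-variance but only as good as its reward model.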
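GRPO (Group Relative Policy Optimization, introduced in DeepSeek's DeepSeekMath work) drops the learned critic used in PPO-style RLHF: it samples a group of responses for the same prompt and scores each one relative to the rest of its group. The sketch below shows only that advantage computation, assuming scalar rewards per response; the clipped policy-ratio objective and KL penalty are omitted, and the choice of population standard deviation is an assumption (implementations vary).

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward by its group's mean and std."""
    mean = statistics.mean(rewards)
    # Population std here; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, if two of four sampled answers are judged correct (rewards `[1.0, 0.0, 0.0, 1.0]`), the advantages come out close to `[1, -1, -1, 1]`; if every answer in a group gets the same reward, all advantages are zero and that group contributes no learning signal.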

★ Support this podcast on Patreon ★