Mar 2025
1h 52m

180: Reinforcement Learning

Patrick Wheeler and Jason Gauci
About this episode

Intro topic: Grills

News/Links:

Book of the Show


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick: 
    • Pokemon Sword and Shield
  • Jason: 

Topic: Reinforcement Learning

  • Three types of AI
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation
    • Propensity scoring versus model-based
  • Challenges in training RL models
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO
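
The two value optimization algorithms in the outline, SARSA and Q-Learning, differ only in which next-step value they bootstrap from. The sketch below is a minimal tabular illustration of that difference, not code from the episode: the function names, the dictionary-based Q-table, and the default `alpha` and `gamma` values are all assumptions chosen for readability.

```python
from collections import defaultdict


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    q[(s, a)] += alpha * (r + gamma * q[(s_next, a_next)] - q[(s, a)])


def greedy_action(q, s, actions):
    """Turn values into a policy by picking the highest-valued action."""
    return max(actions, key=lambda a: q[(s, a)])


if __name__ == "__main__":
    actions = ["left", "right"]  # hypothetical action set

    # Two identical Q-tables so the updates can be compared side by side.
    q_a = defaultdict(float, {("s1", "left"): 1.0, ("s1", "right"): 2.0})
    q_b = defaultdict(float, q_a)

    # Same transition (s0, right, reward 1, s1) for both algorithms.
    q_learning_update(q_a, "s0", "right", 1.0, "s1", actions)
    sarsa_update(q_b, "s0", "right", 1.0, "s1", a_next="left")

    print("Q-Learning:", q_a[("s0", "right")])
    print("SARSA:     ", q_b[("s0", "right")])
```

With a small discrete action set, `greedy_action` is all it takes to read a policy off the values. With large or continuous action spaces that `max` is no longer a cheap lookup, which is one way to see why converting values to policies gets harder in deep learning.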
