Apr 6
12m 36s

Benchmark Bank Heist

About this episode
What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation: it reasoned that it was being tested, tracked down the encrypted eval dataset, decrypted it, and returned the answer it found inside. It's equal parts impressive and unsettling. This epi ...
Up next
Apr 13
Unfaithful Chain of Thought
What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization: we decide first, explain later. Katie and Ben get curious about whether the same mig ...
24m 32s
Mar 30
Benchmarking AI Models
How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks, the standardized tests used to compare models, exploring two canonical examples: MMLU ...
29m 55s
Mar 23
The Hot Mess of AI (Mis-)Alignment
The paperclip maximizer, the classic AI doom scenario where a hyper-competent machine single-mindedly converts the universe into office supplies, might not be the AI risk we should actually lose sleep over. New research from Anthropic's AI safety division suggests misaligned AI ...
22m 32s
Recommended Episodes
Feb 2022
AI Today Podcast: Overview of Synthetic Data
Machine learning algorithms need examples of data from which they can learn, especially supervised machine learning algorithms. However, one big challenge for those looking to put machine learning into practice is the lack of a sufficient quantity of good quality data examples fr ...
47m 14s
Feb 2017
MLG 004 Algorithms - Intuition
Machine learning consists of three steps: prediction, error evaluation, and learning, implemented by training algorithms on large datasets to build models that can make decisions or classifications. The primary categories of machine learning algorithms are supervised, un ...
23m 27s
Jun 2020
Rust and machine learning #4: practical tools (Ep. 110)
In this episode I make a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments. I believe that community effort can change this very quickly. To make a comparison with the Python ecos ...
24m 18s
Feb 2017
MLG 001 Introduction
Show notes: ocdevel.com/mlg/1. MLG teaches the fundamentals of machine learning and artificial intelligence. It covers intuition, models, math, languages ...
8m 11s
Nov 2024
SE Radio 641: Catherine Nelson on Machine Learning in Data Science
Catherine Nelson, author of the new O'Reilly book, Software Engineering for Data Scientists, discusses the collaboration between data scientists and software engineers, an increasingly common pairing on machine learning and ...
48m 19s
Jul 2023
AI Today Podcast: AI Glossary Series – Automated Machine Learning (AutoML)
In this episode of the AI Today podcast, hosts Kathleen Walch and Ron Schmelzer define the term Automated Machine Learning (AutoML) and explain how the term relates to AI and why it’s important to know about it. Show Notes: FREE Intro to CPMAI mini course CPMAI Training and Certifi ...
9m 11s
Mar 2017
MLG 009 Deep Learning
51m 28s
Apr 2021
464: A.I. vs Machine Learning vs Deep Learning
In this episode, I tackle three often conflated terms - AI, machine learning, and deep learning - to shine some light on what exactly they are. Additional materials: www.superdatascience.com/464 
7m 14s
Feb 2017
MLG 002 Difference Between Artificial Intelligence, Machine Learning, Data Science
Artificial intelligence is the automation of tasks that require human intelligence, encompassing fields like natural language processing, perception, planning, and robotics, with machine learning emerging as the primary method to recognize patterns in data and make ...
1h 5m