AI often looks fully automated, but behind the scenes a huge amount of human judgment shapes how these systems actually work.
In this episode, Craig Smith speaks with Phelim Bradley, co-founder and CEO of Prolific, a platform that connects millions of real people with researchers and AI labs to evaluate and improve AI systems.
They explore the hidden human layer behind modern AI, why traditional benchmarks are becoming less reliable, and why AI companies increasingly rely on real human feedback to measure model performance in the real world.
Phelim also explains how demographic differences influence how models are evaluated, why human judgment remains critical even as AI improves, and how the collaboration between humans and AI will shape the next phase of development.
This conversation reveals the human backbone of today's AI systems.
Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI
(00:00) Preview and Intro
(02:45) Founding Prolific And Early Pain Points
(06:30) From Mechanical Turk To Representativeness
(09:55) Academic Research And AI Use Cases Split
(13:40) Vetting Real Participants And Fighting Fraud
(17:45) Scale, Community Growth, And Talent Mix
(22:00) High-Complexity Projects Over Commoditised Labeling
(26:40) Measuring Model Persuasion With Live Conversations
(30:20) Demographic-Aware Model Preference Benchmarks
(34:10) The Rise Of Human Evaluation Over Benchmarks
(38:00) Enterprise Model Choice And Continuous Evaluation
(42:00) Why Humans Won't Disappear From The Loop