May 2025
32m 41s

How Googlebot crawls the web

Google
About this episode

In this episode of Search Off the Record, Martin and Gary from the Google Search Relations team take a deep dive into how Googlebot and web crawling work—past, present, and future. Through their humorous and thoughtful conversation, they explore how crawling evolved from the early days of the internet, when scripts could index a chunk of the web from a single homepage, to the more complex and considerate systems used today. They discuss the basics of what a crawler is, how tools like cURL or Wget relate, and how policies like robots.txt ensure crawlers play nice with web infrastructure.
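
To make the robots.txt idea concrete, here is a minimal Python sketch of one crawl step: extract the links from a page, then keep only those the site's robots.txt permits. It is an illustration, not how Googlebot works. The `ExampleBot` user agent, the `example.com` URLs, the sample page, and the `allowed_links` helper are all hypothetical, and the inputs are inline strings so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def allowed_links(html, base_url, robots_txt, user_agent):
    """Return the links in html that robots_txt lets user_agent fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    return [url for url in extractor.links if rules.can_fetch(user_agent, url)]


# Hypothetical inputs; a real crawler would download both over HTTP.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGE = '<a href="/about">About</a> <a href="/private/notes">Notes</a>'

print(allowed_links(PAGE, "https://example.com/", ROBOTS_TXT, "ExampleBot"))
```

A real crawler would repeat this step for each allowed link and would also pace its requests so it does not overload the host, which is the "host health" concern discussed in the episode.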

 

The conversation also covers Google's internal shift to unified infrastructure for all crawling needs, highlighting how different teams moved from separate crawlers to a shared system that enforces consistent policies. They explain why some fetches bypass robots.txt (like user-initiated actions) and the rising impact of automated traffic from new products and AI agents. With a nod to initiatives like Common Crawl, the episode ends with a look at the road ahead, acknowledging growing internet congestion but remaining optimistic about the web's capacity to adapt.

Resources:

Episode transcript → https://goo.gle/sotr092-transcript 

 

Chapters:

0:00 - Intro
0:53 - What is a Web Crawler?
3:11 - Building a Minimal Crawler
6:12 - Ethical Crawling: Robots.txt & Host Health
7:42 - BackRub and Early Crawling Challenges
11:02 - The Anatomy of a Search Engine Paper
13:09 - Crawling Across Google Products
16:51 - New Crawlers & User Agent Strings
22:38 - Crawlers Beyond Google
23:17 - The Evolution of Crawlers
26:32 - Bad Actors and Overpowering Servers
27:31 - Reducing the Footprint on the Internet
28:44 - The Future of Crawlers
31:29 - Conclusion

 

Listen to more Search Off the Record → https://goo.gle/sotr-yt

Subscribe to Google Search Channel → https://goo.gle/SearchCentral

 

Search Off the Record is a podcast series that takes you behind the scenes of Google Search with the Search Relations team.

 

#SOTRpodcast #SEO #SearchOfTheRecord

 

Speakers: Martin Splitt, Gary Illyes

Products Mentioned: Googlebot, Search 
