Question 1

What is the User-Agent for Google-Extended?

Accepted Answer

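Google-Extended is a robots.txt product token rather than a separate crawler: Google states that it has no HTTP User-Agent string of its own, and the actual fetching is done by Google's existing crawlers. Use the token "Google-Extended" in robots.txt rules to control access. Matching of user-agent names in robots.txt is case-insensitive, so "google-extended" works as well.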

Question 2

Can I stop Google-Extended from using my content for AI training?

Accepted Answer

Yes. Add these two lines to your robots.txt:

    User-agent: Google-Extended
    Disallow: /

Google states that it respects this robots.txt rule when deciding whether crawled content may be used for AI training.
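The rule above can be sanity-checked locally before it is deployed. The following is a minimal sketch using Python's standard-library urllib.robotparser; the helper name is_allowed and the example.com URL are placeholders introduced for illustration, and the standard-library parser only approximates how Google itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule from the answer above.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text in memory
    instead of downloading a live robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/post-1"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, url))
```

Under this rule set the Google-Extended token is disallowed everywhere, while Googlebot has no matching group and therefore stays allowed.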

Question 3

Will blocking Google-Extended affect my AI citations?

Accepted Answer

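Not in Google Search. Google-Extended is a control over AI training use, separate from search indexing, and Google states that it does not affect a site's inclusion or ranking in Google Search. Assistants from other vendors use their own user-agent tokens and are not affected either. Google's documentation has also described the token as covering grounding in Gemini, so check the current crawler docs before assuming that Gemini citations are unaffected.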

Question 4

What's the difference between Google-Extended and an AI assistant bot?

Accepted Answer

Google-Extended governs whether content fetched by Google's crawlers may be used to build training datasets: your content becomes part of the model's general knowledge, without direct attribution or links. AI assistant bots (such as ChatGPT-User and Claude-User) fetch specific pages in response to user prompts and cite their sources. Each has its own robots.txt user-agent token, so they can be controlled independently.
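To illustrate independent control, the sketch below defines one robots.txt group for the training token and another for the two assistant bots named above, then reports which agents may fetch a path. It relies on Python's standard-library urllib.robotparser; the function name access_report and the sample rules are assumptions made for this example, not a recommended policy.

```python
from urllib.robotparser import RobotFileParser

# Sample policy (an assumption for illustration): opt out of training use,
# but keep prompt-driven assistant fetches allowed.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /
"""


def access_report(robots_txt: str, agents: list[str], path: str = "/") -> dict[str, bool]:
    """Map each user-agent token to whether it may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in agents}


if __name__ == "__main__":
    agents = ["Google-Extended", "ChatGPT-User", "Claude-User"]
    for agent, allowed in access_report(ROBOTS_TXT, agents, "/blog/post").items():
        print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Because each token is matched against its own group, changing one group does not alter what the other agents are allowed to fetch.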

Question 5

How do I verify that a request is really from Google-Extended?

Accepted Answer

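A User-Agent header alone is not enough, because anyone can send a Google user-agent string. Since Google-Extended has no HTTP User-Agent of its own, the requests to verify are those of Google's regular crawlers. Google publishes IP ranges for its crawlers and documents a reverse-DNS check followed by a forward-DNS confirmation in its crawler docs. BotSights flags spoofed traffic automatically.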
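The reverse-DNS verification mentioned above can be sketched roughly as follows: resolve the requesting IP to a hostname, check that the hostname belongs to a Google domain, then resolve the hostname forward and confirm it maps back to the same IP. The function name is_google_crawler_ip and the injectable lookup parameters are hypothetical, and the domain suffixes are taken from Google's crawler verification guidance as best understood here, so they should be checked against the current docs.

```python
import socket

# Domain suffixes assumed from Google's crawler verification guidance;
# confirm against the current documentation before relying on them.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse_lookup(ip: str) -> str:
    """Reverse DNS: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> set[str]:
    """Forward DNS: hostname to the set of IP addresses it resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_google_crawler_ip(ip, reverse_lookup=_reverse_lookup, forward_lookup=_forward_lookup):
    """Return True only if `ip` passes both the reverse and forward DNS checks.

    The lookup functions are injectable so the logic can be exercised
    without network access.
    """
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

The forward confirmation matters because the owner of an IP range controls its reverse DNS records and could point them at a Google-looking hostname; only the forward lookup is answered by the real domain.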

Question 6

Is my content being used without permission?

Accepted Answer

Training crawlers collect publicly accessible content. The legal landscape around this is evolving rapidly (lawsuits in the US, the EU AI Act, etc.). Robots.txt remains the most practical opt-out mechanism today, alongside emerging proposals such as ai.txt.

Question 7

How often does Google-Extended crawl?

Accepted Answer

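Training crawlers usually visit periodically, in weekly or monthly waves rather than daily. Google-Extended itself generates no requests, because fetching is done by Google's existing crawlers. If you see sudden spikes from those crawlers, note that Google does not support the Crawl-delay directive, so a Crawl-delay rule in robots.txt will not slow them down.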

Google-Extended

What is Google-Extended?

What Google-Extended means for your site

What should you do?

How to identify Google-Extended

How to block Google-Extended

Option 1: Block all access

Option 2: Block specific paths only

Option 3: Slow down with a crawl delay

Frequently Asked Questions
