Question 1

What is the User-Agent for CCBot?

Accepted Answer

CCBot identifies itself with the token "CCBot" in its User-Agent header (it may also appear in lowercase as "ccbot"). Use this token in the User-agent line of your robots.txt rules to control access.

Question 2

Can I stop CCBot from using my content for AI training?

Accepted Answer

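Yes. Add these two lines to your robots.txt:

User-agent: CCBot
Disallow: /

Common Crawl commits to respecting robots.txt, so CCBot should stop fetching your pages and they will stay out of the crawl data that others use for AI training. The rule applies to future crawls.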
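To confirm the rule behaves as intended before deploying it, one option is to parse it with Python's standard-library `urllib.robotparser`. This is a minimal sketch, not a statement of how CCBot itself evaluates rules: the `example.com` URL and the `SomeOtherBot` agent name are placeholders, and the `is_allowed` helper is a hypothetical name introduced here.

```python
from urllib.robotparser import RobotFileParser

# The rule from the answer above, written one directive per line.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # placeholder URL
    print("CCBot allowed:", is_allowed(ROBOTS_TXT, "CCBot", page))
    print("SomeOtherBot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Under this parser the rule blocks CCBot only; agents with no matching group stay allowed.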

Question 3

Will blocking CCBot affect my AI citations?

Accepted Answer

No. CCBot is a training-data crawler, separate from the bots that real-time AI assistants use. For example, blocking CCBot does not stop user-prompt fetchers such as ChatGPT-User or Claude-User from retrieving and citing your content live.

Question 4

What's the difference between CCBot and an AI assistant bot?

Accepted Answer

CCBot crawls broadly to build datasets used for model training. Your content can become part of a model's general knowledge, but without direct attribution or links. AI assistant bots (like ChatGPT-User, Claude-User) fetch specific pages in response to user prompts and cite their sources. The two kinds of bot use separate User-Agents and can be controlled independently.
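Because the two kinds of bot announce different User-Agents, a log-analysis script can separate them with a simple token match. The sketch below assumes only the three tokens named in the answer above; the function name `classify_user_agent` and the sample header strings are illustrative, and a match on the header says nothing about whether the request is genuine.

```python
# Tokens taken from the answer above; extend the tuples for other bots you track.
TRAINING_TOKENS = ("CCBot",)
ASSISTANT_TOKENS = ("ChatGPT-User", "Claude-User")


def classify_user_agent(user_agent: str) -> str:
    """Label a User-Agent header as 'training', 'assistant', or 'other'."""
    ua = user_agent.lower()  # compare case-insensitively
    if any(token.lower() in ua for token in TRAINING_TOKENS):
        return "training"
    if any(token.lower() in ua for token in ASSISTANT_TOKENS):
        return "assistant"
    return "other"


if __name__ == "__main__":
    # Illustrative header values, not copied from real traffic.
    samples = [
        "CCBot/2.0",
        "Mozilla/5.0 (compatible; ChatGPT-User/1.0)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0",
    ]
    for header in samples:
        print(classify_user_agent(header), "<-", header)
```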

Question 5

How do I verify that a request is really from CCBot?

Accepted Answer

User-Agent alone is not enough, because anyone can claim to be CCBot. Check Common Crawl's crawler docs for any published IP ranges or reverse-DNS verification details. BotSights flags spoofed traffic automatically.
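Both checks mentioned above can be sketched in a few lines of Python. Everything specific here is an assumption: the CIDR range is an RFC 5737 documentation range, the hostname suffix is an `example.org` placeholder, and the function names are hypothetical. Substitute whatever Common Crawl's crawler docs actually list. The reverse-DNS check is forward-confirmed: the PTR hostname must carry the expected suffix and must resolve back to the same IP.

```python
import ipaddress
import socket
from typing import Iterable, Set

# Placeholders only: an RFC 5737 documentation range and an example.org suffix.
HYPOTHETICAL_RANGES = ("192.0.2.0/24",)
HYPOTHETICAL_HOST_SUFFIX = ".crawler.example.org"


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if ip falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


def _reverse_lookup(ip: str) -> str:
    """Resolve an IP address to its primary hostname (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> Set[str]:
    """Resolve a hostname to the set of IP addresses it points to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def passes_reverse_dns(ip: str, host_suffix: str,
                       reverse=_reverse_lookup, forward=_forward_lookup) -> bool:
    """Forward-confirmed reverse DNS: PTR name has the suffix and maps back to ip."""
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(host_suffix):
            return False
        forward_ips = {ipaddress.ip_address(a) for a in forward(host)}
        return ipaddress.ip_address(ip) in forward_ips
    except (OSError, ValueError):
        # Lookup failures and unparsable addresses count as "not verified".
        return False
```

The `reverse` and `forward` parameters default to real DNS lookups but can be swapped for stubs, which keeps the logic testable without network access.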

Question 6

Is my content being used without permission?

Accepted Answer

Training crawlers collect publicly accessible content. The legal landscape around this is evolving rapidly (lawsuits in the US, the EU AI Act, etc.). Robots.txt remains the most practical opt-out mechanism today, alongside emerging proposals like ai.txt.

Question 7

How often does CCBot crawl?

Accepted Answer

Training crawlers usually visit periodically, in weekly or monthly waves rather than daily. If you see sudden spikes, check whether the bot is honoring the Crawl-delay directive in your robots.txt.
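One way to check this is to compare the gaps between consecutive CCBot requests against your configured delay. The sketch below assumes you have already extracted the request timestamps for CCBot from your access log; the function name `gaps_below_delay` and the sample timestamps are illustrative.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def gaps_below_delay(timestamps: Sequence[datetime],
                     crawl_delay_seconds: float) -> List[timedelta]:
    """Return the gaps between consecutive requests that are shorter than the delay."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return [gap for gap in gaps if gap.total_seconds() < crawl_delay_seconds]


if __name__ == "__main__":
    # Made-up request times; real values would come from your server log.
    base = datetime(2024, 1, 1, 12, 0, 0)
    requests = [base + timedelta(seconds=s) for s in (0, 10, 12, 30)]
    violations = gaps_below_delay(requests, crawl_delay_seconds=5)
    print(f"{len(violations)} gap(s) shorter than the configured delay")
```

A handful of short gaps may just reflect parallel fetches of page assets, so treat a sustained pattern as the signal rather than a single violation.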

CCBot


What is CCBot?

What CCBot means for your site

What should you do?

How to identify CCBot

How to block CCBot

Option 1: Block all access

Option 2: Block specific paths only
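A path-scoped rule lists one Disallow line per directory under the CCBot group. The sketch below is an illustration only: `/private/` and `/drafts/` are hypothetical paths, `example.com` is a placeholder host, and the check uses Python's standard-library `urllib.robotparser`, whose matching may differ in detail from CCBot's own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical paths; substitute the sections of your own site.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /private/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def ccbot_may_fetch(path: str) -> bool:
    """Return True if the rules above let CCBot fetch the given path."""
    return parser.can_fetch("CCBot", "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/drafts/", "/blog/post"):
        print(path, "->", "allowed" if ccbot_may_fetch(path) else "blocked")
```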

Option 3: Slow down with a crawl delay
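Crawl-delay is a non-standard robots.txt extension, so it only helps with crawlers that choose to honor it. The sketch below shows what such a rule could look like and reads it back with Python's standard-library `urllib.robotparser`; the 10-second value is an arbitrary example, not a recommendation from Common Crawl.

```python
from urllib.robotparser import RobotFileParser

# The delay value is an arbitrary example; pick one that suits your server.
ROBOTS_TXT = """\
User-agent: CCBot
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the configured delay in seconds,
# or None when no matching group sets one.
ccbot_delay = parser.crawl_delay("CCBot")
other_delay = parser.crawl_delay("SomeOtherBot")

if __name__ == "__main__":
    print("CCBot delay:", ccbot_delay)
    print("SomeOtherBot delay:", other_delay)
```

Unlike a Disallow rule, this leaves every page fetchable and only asks the crawler to space out its requests.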

Frequently Asked Questions
