Question 1

What is the User-Agent for cohere-training-data-crawler?

Accepted Answer

cohere-training-data-crawler identifies itself with the User-Agent string "cohere-training-data-crawler". Use this exact string in robots.txt rules to control access.
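A minimal sketch of such a rule, checked with Python's standard-library `urllib.robotparser`. The URL uses the placeholder domain `example.com`, and `is_allowed` is a hypothetical helper. `robotparser` only approximates how any particular crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt group that targets the crawler by its User-Agent token.
ROBOTS_TXT = """\
User-agent: cohere-training-data-crawler
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(ROBOTS_TXT, "cohere-training-data-crawler", "https://example.com/blog/"))  # False
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/blog/"))  # True
```

Other bots are unaffected because no `User-agent: *` group is present.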

Question 2

Can I stop cohere-training-data-crawler from using my content for AI training?

Accepted Answer

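It is not confirmed whether cohere-training-data-crawler obeys robots.txt. Add a robots.txt rule first, then check your crawl logs to verify that the bot stops appearing. Note that a robots.txt rule only affects future crawls; it does not remove content that has already been collected.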
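One way to run that log check, sketched in Python. It assumes your server writes the Apache/Nginx "combined" log format, so adjust the regex if yours differs. `daily_hits` is a hypothetical helper that counts requests per day whose User-Agent contains the crawler's token.

```python
import re
from collections import Counter

# Assumes combined log format: ... [day:time zone] "request" status size "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'\[(?P<day>\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def daily_hits(lines, token="cohere-training-data-crawler"):
    """Count requests per day whose User-Agent contains `token` (case-insensitive)."""
    counts = Counter()
    for line in lines:
        m = LOG_PATTERN.search(line)
        if m and token.lower() in m.group("ua").lower():
            counts[m.group("day")] += 1
    return counts
```

If hits keep showing up on days after the robots.txt change (allow a day or so for the crawler to re-read the file), either the bot is ignoring the rule or another client is spoofing its User-Agent.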

Question 3

Will blocking cohere-training-data-crawler affect my AI citations?

Accepted Answer

No. cohere-training-data-crawler is a training crawler, not a real-time AI assistant. Assistants that fetch pages on a user's behalf use their own User-Agents, so blocking a training crawler does not block them.

Question 4

What's the difference between cohere-training-data-crawler and an AI assistant bot?

Accepted Answer

cohere-training-data-crawler crawls broadly to build training datasets: your content may become part of the model's general knowledge, but without direct attribution or links. AI assistant bots (such as ChatGPT-User and Claude-User) fetch specific pages in response to user prompts and can cite those pages as sources. Because the two kinds of bot use different User-Agents, you can control them independently.
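A sketch of that independent control: separate robots.txt groups that block the training crawler while explicitly allowing the assistant fetchers named above, checked with `urllib.robotparser`. The tokens are taken from this page; confirm the exact strings in each operator's documentation before relying on them. `access_by_agent` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Block the training crawler; leave assistant fetchers allowed.
ROBOTS_TXT = """\
User-agent: cohere-training-data-crawler
Disallow: /

User-agent: ChatGPT-User
User-agent: Claude-User
Allow: /
"""

def access_by_agent(robots_txt, agents, url):
    """Map each user agent to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

agents = ("cohere-training-data-crawler", "ChatGPT-User", "Claude-User")
print(access_by_agent(ROBOTS_TXT, agents, "https://example.com/docs/"))
```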

Question 5

How do I verify that a request is really from cohere-training-data-crawler?

Accepted Answer

The User-Agent string alone is not proof, because any client can send it. Check whether the operator publishes IP ranges or supports reverse-DNS verification in its crawler documentation. BotSights flags spoofed traffic automatically.
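A sketch of both checks in Python, assuming the operator publishes either CIDR ranges or crawler hostnames. This page does not say which (if either) Cohere provides, so the ranges and suffixes below are hypothetical placeholders taken from documentation-reserved ranges (RFC 5737) and `.example` names. The reverse-DNS check is forward-confirmed: IP to hostname, then hostname back to IPs, which must include the original IP. `socket.gethostbyname_ex` resolves IPv4 only.

```python
import ipaddress
import socket

def ip_in_ranges(ip, cidrs):
    """True if ip falls inside any published CIDR range (placeholders here)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)

def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS; allowed_suffixes are HYPOTHETICAL placeholders."""
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    host = hostname.rstrip(".").lower()
    suffixes = [s.lower().lstrip(".") for s in allowed_suffixes]
    if not any(host == s or host.endswith("." + s) for s in suffixes):
        return False
    try:
        _name, _aliases, forward_ips = forward(host)
    except OSError:
        return False
    return ip in forward_ips
```

The resolver functions are parameters so the logic can be tested without live DNS; in production the defaults call the system resolver.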

Question 6

Is my content being used without permission?

Accepted Answer

Training crawlers collect publicly accessible content. The legal landscape around this is evolving rapidly (for example, lawsuits in the US and the EU AI Act). robots.txt remains the most practical opt-out mechanism today, alongside emerging proposals such as ai.txt.

Question 7

How often does cohere-training-data-crawler crawl?

Accepted Answer

Training crawlers typically visit in periodic waves (for example weekly or monthly) rather than daily. If you see sudden spikes, check whether the bot honors any Crawl-delay directive in your robots.txt. Crawl-delay is non-standard, and not every crawler supports it.
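A sketch of that check in Python, assuming combined-log-format lines such as `[10/Oct/2024:13:55:36 +0000]` timestamps with the User-Agent as the last quoted field. `crawl_delay_violations` is a hypothetical helper that returns every gap between consecutive requests from the crawler that is shorter than the delay you set. Crawl-delay is not part of RFC 9309, so a non-empty result shows the bot is not pacing itself to your setting, not that it is breaking a standard.

```python
import re
from datetime import datetime

TS = re.compile(r'\[([^\]]+)\]')     # first bracketed field: the timestamp
UA = re.compile(r'"([^"]*)"\s*$')    # last quoted field: the User-Agent

def crawl_delay_violations(lines, crawl_delay_s, token="cohere-training-data-crawler"):
    """Return gaps (seconds) between consecutive matching requests shorter than crawl_delay_s."""
    times = []
    for line in lines:
        ts, ua = TS.search(line), UA.search(line)
        if ts and ua and token.lower() in ua.group(1).lower():
            times.append(datetime.strptime(ts.group(1), "%d/%b/%Y:%H:%M:%S %z"))
    times.sort()
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return [g for g in gaps if g < crawl_delay_s]
```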

cohere-training-data-crawler


What is cohere-training-data-crawler?

What cohere-training-data-crawler means for your site

What should you do?

How to identify cohere-training-data-crawler

How to block cohere-training-data-crawler

Option 1: Block all access

Option 2: Block specific paths only

Option 3: Slow down with a crawl delay

Frequently Asked Questions
