AI2Bot-DeepResearchEval

AI Data Scraper

Operated by Allen Institute for AI

Monitor

Downloads content for AI model training without direct attribution.

Recommended action: Review robots.txt policy and decide if training access is acceptable.

Category: AI Data Scraper
Primary use case: AI model training
Trust level: Review recommended
robots.txt: Respected

What is AI2Bot-DeepResearchEval?

AI2Bot is operated by Ai2 (Allen Institute for AI), a non-profit research institute. The crawler explores web content to build training datasets for open language models. The DeepResearchEval variant specifically collects resources used in deep research benchmark evaluations rather than serving live user queries. The base crawler's User-Agent string is 'Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)'. robots.txt rules match on the product token inside that header (here, AI2Bot), not on the full string, so use the token when you write rules to filter or block the crawler.

What AI2Bot-DeepResearchEval means for your site

AI2Bot-DeepResearchEval, operated by Allen Institute for AI, downloads your content to include in datasets used to train AI models. Your text becomes part of the model's general knowledge, but without direct attribution or links. This is the key distinction: training crawlers take your content, whereas AI assistants cite it. You can control training access via robots.txt without affecting citations.

What should you do?

  • Decide whether you want Allen Institute for AI to train on your content
  • Block via robots.txt if unwanted: add the line User-agent: ai2bot-deepresearcheval followed by the line Disallow: /
  • Monitor crawl patterns for unexpected spikes
  • Review BotSights data to see which pages are targeted

See AI2Bot-DeepResearchEval on your own site

BotSights tracks every AI2Bot-DeepResearchEval visit in real time, including which pages it crawls, how often, and from where.

How to identify AI2Bot-DeepResearchEval

AI2Bot-DeepResearchEval uses the user-agent "ai2bot-deepresearcheval" and respects robots.txt. It crawls broadly and systematically, often downloading full page content.

Known user-agent patterns: ai2bot-deepresearcheval, AI2Bot-DeepResearchEval
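As a rough illustration, the user-agent patterns above can be matched against the User-Agent header of incoming requests. The sketch below is a minimal example, not a definitive detector: the function name and the sample header values are hypothetical, and the first sample simply assumes the variant follows the same shape as the AI2Bot string quoted earlier.

```python
# Known pattern from this page, lowercased for case-insensitive matching.
KNOWN_PATTERNS = ("ai2bot-deepresearcheval",)


def is_ai2bot_deepresearcheval(user_agent: str) -> bool:
    """Return True if the User-Agent header mentions a known pattern."""
    ua = user_agent.lower()
    return any(pattern in ua for pattern in KNOWN_PATTERNS)


# Hypothetical header values, for illustration only.
sample_headers = [
    "Mozilla/5.0 (compatible) AI2Bot-DeepResearchEval (+https://www.allenai.org/crawler)",
    "Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
]
matches = [is_ai2bot_deepresearcheval(h) for h in sample_headers]
print(matches)
```

A header match only shows what the client claims to be. It says nothing about whether the request really comes from Allen Institute for AI.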

How to block AI2Bot-DeepResearchEval

The three robots.txt options below cover the most common goals; pick the one that matches yours. Each snippet lists every known AI2Bot-DeepResearchEval user-agent pattern so the rules apply regardless of which one the bot announces.

Edit robots.txt with care

A single misplaced line can de-index your entire site. A common mistake is pasting User-agent: * followed by Disallow: /, which blocks every bot, including Googlebot, not just AI2Bot-DeepResearchEval. Always paste the snippet between existing rules (not over them), keep the User-agent line scoped to AI2Bot-DeepResearchEval's patterns, and verify the file with a robots.txt testing tool before deploying. If you are not sure, ask a developer first.
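One way to sanity-check a rule set locally before deploying is Python's standard urllib.robotparser module. The sketch below compares a scoped block with the too-broad wildcard mistake described above. The helper name blocked_agents and the example.com URL are illustrative assumptions, and Python's parser is only an approximation of how any particular crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Map each user-agent token to True if the rules block it from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


# Rules scoped to the bot's known patterns.
SCOPED = """\
User-agent: ai2bot-deepresearcheval
User-agent: AI2Bot-DeepResearchEval
Disallow: /
"""

# The common mistake: a wildcard group that blocks everyone.
TOO_BROAD = """\
User-agent: *
Disallow: /
"""

agents = ["AI2Bot-DeepResearchEval", "Googlebot"]
scoped_result = blocked_agents(SCOPED, agents)
broad_result = blocked_agents(TOO_BROAD, agents)
print(scoped_result)  # only the AI2 bot should be blocked
print(broad_result)   # every agent is blocked
```

Pass the short product token (for example Googlebot) rather than a full User-Agent header, because the standard-library parser only inspects the part of the string before the first slash.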

Option 1: Block all access

Tells AI2Bot-DeepResearchEval not to crawl any URL on your site. Use this when you want the bot completely off your content.

User-agent: ai2bot-deepresearcheval
User-agent: AI2Bot-DeepResearchEval
Disallow: /

Option 2: Block specific paths only

Keep public content crawlable but exclude sensitive or non-public sections. Add one Disallow: line per path. Replace the example paths with your own.

User-agent: ai2bot-deepresearcheval
User-agent: AI2Bot-DeepResearchEval
Disallow: /admin/
Disallow: /private/
Disallow: /checkout/

Option 3: Slow down with a crawl delay

Crawl-delay is a voluntary directive that asks the bot to wait the given number of seconds between requests. It is useful when AI2Bot-DeepResearchEval is hammering your origin and slowing the site down for real visitors, but you do not want to block it outright. The value is in seconds, so 10 means at most one request every ten seconds. Not all bots honour this directive: Googlebot ignores it, Bingbot respects it, and support among other crawlers, including AI crawlers, varies.

User-agent: ai2bot-deepresearcheval
User-agent: AI2Bot-DeepResearchEval
Crawl-delay: 10
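To see how a parser that supports the directive reads this snippet, Python's standard urllib.robotparser can be used as a stand-in. This is only a sketch of one parser's interpretation; whether AI2Bot-DeepResearchEval itself applies the delay is up to the bot.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: ai2bot-deepresearcheval
User-agent: AI2Bot-DeepResearchEval
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Delay (in seconds) requested from the bot; None means no delay applies.
bot_delay = parser.crawl_delay("AI2Bot-DeepResearchEval")
other_delay = parser.crawl_delay("Googlebot")

# A Crawl-delay on its own does not disallow any URL.
bot_allowed = parser.can_fetch("AI2Bot-DeepResearchEval", "https://example.com/page")
print(bot_delay, other_delay, bot_allowed)
```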

Frequently Asked Questions

What is the User-Agent for AI2Bot-DeepResearchEval?

AI2Bot-DeepResearchEval identifies itself with the User-Agent token "ai2bot-deepresearcheval" (alternate form: AI2Bot-DeepResearchEval). Use this token in robots.txt rules to control access; compliant parsers match it case-insensitively.

Can I stop AI2Bot-DeepResearchEval from using my content for AI training?

Yes. Add these two lines to your robots.txt: User-agent: ai2bot-deepresearcheval, then Disallow: / on the next line. Allen Institute for AI commits to respecting robots.txt for training data.

Will blocking AI2Bot-DeepResearchEval affect my AI citations?

No. AI2Bot-DeepResearchEval is a training crawler, separate from real-time AI assistants. For example, blocking AI2Bot-DeepResearchEval does not block Allen Institute for AI's user-prompt assistants from citing your content live.

What's the difference between AI2Bot-DeepResearchEval and an AI assistant bot?

AI2Bot-DeepResearchEval crawls broadly to build training datasets — your content becomes part of the model's general knowledge but without direct attribution or links. AI assistant bots (like ChatGPT-User, Claude-User) fetch specific pages in response to user prompts and cite sources back. They use separate User-Agents and can be controlled independently.

How do I verify that a request is really from AI2Bot-DeepResearchEval?

The User-Agent header alone is not enough, because anyone can claim to be AI2Bot-DeepResearchEval. Allen Institute for AI may publish IP ranges or a reverse-DNS verification method in its crawler docs; if it does, compare the source address of each request against them. BotSights flags spoofed traffic automatically.
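If IP ranges are published, a check against them could look like the sketch below. The ranges shown are placeholders taken from the reserved documentation blocks, not real Allen Institute for AI addresses, and the function name is hypothetical. Replace the list with whatever the operator actually documents.

```python
import ipaddress

# PLACEHOLDER ranges (reserved documentation blocks), NOT real Ai2 ranges.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def ip_in_published_ranges(remote_ip, ranges=PUBLISHED_RANGES):
    """Return True if remote_ip parses and falls inside one of the ranges."""
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        # Malformed address: treat as unverified.
        return False
    return any(addr in network for network in ranges)


print(ip_in_published_ranges("192.0.2.44"))
print(ip_in_published_ranges("203.0.113.9"))
```

A request that claims the bot's User-Agent but fails this check is a candidate for spoofed traffic.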

Is my content being used without permission?

Training crawlers collect publicly accessible content. The legal landscape around this is evolving rapidly (lawsuits in the US, the EU AI Act, etc.). Robots.txt remains the most practical opt-out mechanism today, alongside emerging proposals such as ai.txt.

How often does AI2Bot-DeepResearchEval crawl?

Training crawlers usually visit periodically, in weekly or monthly waves rather than daily. If you see sudden spikes, check whether the bot is honoring the Crawl-delay directive in your robots.txt.
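One way to check this from access logs is to measure the gap between consecutive requests from the bot and flag any gap shorter than the configured delay. The sketch below assumes the request timestamps have already been extracted from your logs; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def crawl_delay_violations(timestamps, delay_seconds):
    """Return (earlier, later) pairs of consecutive requests that arrived
    closer together than delay_seconds."""
    ordered = sorted(timestamps)
    min_gap = timedelta(seconds=delay_seconds)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier < min_gap
    ]


# Hypothetical request times for the bot, offset in seconds from a start time.
start = datetime(2025, 1, 1, 12, 0, 0)
requests = [start + timedelta(seconds=s) for s in (0, 12, 15, 30)]

violations = crawl_delay_violations(requests, delay_seconds=10)
print(len(violations))
```

In this sample only the requests at 12 and 15 seconds are closer together than the 10-second delay, so a single pair is flagged.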

See which pages AI training crawlers target

Monitor training-oriented bots, identify the content they access most, and decide what to allow or block.

  • Track training crawler activity per page
  • See exactly which content is being scraped
  • Make smarter allow or block decisions