Bot Database

Googlebot

by Google

Googlebot is Google's primary web crawler that discovers and indexes web pages for Google Search. It is the most active crawler on the internet and drives organic search visibility.

AhrefsBot

by Ahrefs

AhrefsBot crawls the web to build Ahrefs' backlink index and SEO database. It is one of the most active crawlers on the internet after Googlebot.

Applebot

by Apple

Applebot is Apple's web crawler that indexes content to power search features across Apple's ecosystem including Spotlight, Siri, and Safari suggestions.

GPTBot

by OpenAI

GPTBot is OpenAI's web crawler that collects data from publicly accessible web pages to improve AI models like ChatGPT. Site owners can control access via robots.txt.
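As a rough illustration of robots.txt control, the sketch below uses Python's standard `urllib.robotparser` to check what a policy permits for a given user-agent token. The policy text, the `is_allowed` helper, and the example.com URL are hypothetical and only stand in for a real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block GPTBot everywhere, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


gptbot_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article")
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/article")
print(gptbot_ok, googlebot_ok)
```

robots.txt is advisory: it only affects crawlers that choose to honor it, so a check like this describes the stated policy rather than enforcing it.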

meta-externalagent

by Meta

meta-externalagent crawls web content to train Meta's AI models and improve its products, indexing pages directly from across the internet.

meta-webindexer

by Meta

meta-webindexer browses the internet to improve search results for Meta AI users, analyzing online content to make Meta AI's responses more relevant with proper citations.

Amazonbot

by Amazon

AI Search Crawler bot

ChatGPT-User

by OpenAI

ChatGPT-User is dispatched by OpenAI when a ChatGPT user asks a question that requires fetching live web content. It retrieves the page so ChatGPT can include it as a cited source in its response.

Claude-SearchBot

by Anthropic

Claude-SearchBot builds and refreshes the search index that Claude uses when a user runs a web-search-style query. It's the indexing layer between training (ClaudeBot, periodic) and live fetch (Claude-User, on-demand). Its crawl pattern is steady and sitemap-driven rather than user-triggered. Blocking it limits how often your pages can surface in Claude's search results, but doesn't affect direct citations or training. Verification works the same way as for the other two Anthropic bots: BotSights cross-checks the source IP against Anthropic's published JSON allowlist on every hit.

ClaudeBot

by Anthropic

ClaudeBot is Anthropic's training-data crawler. It systematically downloads pages so Claude's underlying language models can learn from your content during training cycles. This is a one-way relationship: ClaudeBot takes content, but it doesn't link back or cite you in user answers (that's what Claude-User does). If you want Anthropic to use your content for model training, leave it allowed; if you don't, blocking ClaudeBot prevents inclusion in future training datasets without affecting how Claude cites you in live conversations. Anthropic exposes the live IP allowlist as a JSON file, which means BotSights can verify each visit and flag anything spoofing the ClaudeBot user-agent.
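The allowlist check described above can be sketched roughly as follows. This is not BotSights' implementation: the JSON shape (a `prefixes` list of CIDR strings), the helper names, and the addresses (reserved documentation ranges) are all assumptions made for illustration, and the real published file may be structured differently.

```python
import ipaddress
import json

# Hypothetical allowlist payload; the real published format may differ.
ALLOWLIST_JSON = '{"prefixes": ["192.0.2.0/24", "2001:db8::/32"]}'


def load_networks(payload: str):
    """Parse the (assumed) JSON payload into a list of network objects."""
    data = json.loads(payload)
    return [ipaddress.ip_network(prefix) for prefix in data["prefixes"]]


def is_verified(source_ip: str, networks) -> bool:
    """Return True if source_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in networks)


networks = load_networks(ALLOWLIST_JSON)
print(is_verified("192.0.2.17", networks))    # inside the first prefix
print(is_verified("198.51.100.5", networks))  # outside every prefix
```

A request that carries the ClaudeBot user-agent but fails this kind of check is a candidate for the "spoofed" label.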

facebookexternalhit

by Meta

facebookexternalhit fetches link previews when someone shares a URL on Facebook, Messenger, or Instagram. It reads Open Graph meta tags to generate the preview card.

SemrushBot

by Semrush

SemrushBot crawls websites to collect data for Semrush's SEO analytics platform, including backlinks, keyword rankings, and competitive analysis.

Bytespider

by ByteDance

Bytespider is operated by ByteDance, the company behind TikTok. It downloads training data for ByteDance's large language models including those powering Doubao, their ChatGPT competitor.

CCBot

by Common Crawl

CCBot creates an open repository of web data used by researchers and AI companies worldwide. Its crawl data has been used to train many major language models including GPT and LLaMA.

Claude-User

by Anthropic

Claude-User shows up on your site whenever a real user asks Claude a question and the answer requires up-to-date information from the web. It pulls one specific page in real time, and your URL may end up as a cited source in the response Claude shows that user. This is the bot you generally want active: blocking it removes your pages from being referenced inside Claude's user-facing answers (separate from training, which is governed by ClaudeBot, and search indexing, which is Claude-SearchBot). BotSights matches each Claude-User visit against Anthropic's published IP allowlist so you can tell genuine fetches apart from scrapers using the same user-agent string.

Gemini-Deep-Research

by Google

Gemini-Deep-Research is the agent responsible for collecting resources used in Google Gemini's Deep Research feature, which acts as a personal research assistant that browses the web on behalf of users.

Google-Extended

by Google

Google-Extended downloads web content for Google's AI products like Gemini and Vertex AI generative APIs. Blocking this bot prevents your content from being used for AI training without affecting Google Search indexing.

MistralAI-User

by Mistral AI

MistralAI-User is Mistral's AI assistant bot that performs web browsing tasks for users in Le Chat, retrieving web pages to answer user queries with cited sources.

OAI-SearchBot

by OpenAI

OAI-SearchBot is OpenAI's web crawler that indexes websites for ChatGPT search (first previewed as SearchGPT), collecting web content to power AI-driven search results and real-time information retrieval.

Perplexity-User

by Perplexity AI

Perplexity-User fetches web pages when a Perplexity user asks a question. The retrieved content is used to generate an AI-powered answer with inline citations linking back to the source.

PerplexityBot

by Perplexity AI

PerplexityBot indexes web content to power Perplexity AI's search engine. Unlike Perplexity-User, this bot crawls proactively to build a search index rather than fetching on-demand for a specific user query.

ChatGPT Agent

by OpenAI

ChatGPT Agent is an autonomous AI agent that can use a web browser to navigate websites, interact with forms, and complete multi-step tasks on behalf of a ChatGPT user.

Google-Agent

by Google

Google-Agent is used by agents hosted on Google infrastructure to navigate the web and perform actions upon user request.

Manus-User

by Butterfly Effect

Manus-User is a browser-enabled AI agent that autonomously navigates websites, interprets content, and carries out multi-step tasks for users.

NovaAct

by Amazon

Nova Act is Amazon's AI agent that can use a web browser to navigate websites and complete multi-step tasks on behalf of a human user.

LinkedInBot

by LinkedIn

LinkedInBot fetches link previews when a URL is shared on LinkedIn. It reads Open Graph and meta tags to generate the post preview card visible to the poster's network.

Pinterestbot

Fetcher bot

WhatsApp

by Meta

WhatsApp's preview bot fetches link metadata when someone shares a URL in a WhatsApp chat. It reads Open Graph tags to display a title, description, and thumbnail image.

Applebot-Extended

by Apple

Applebot-Extended trains Apple's foundation language models powering Apple Intelligence features across Apple products. Blocking this bot prevents AI training without affecting Siri or Spotlight.

Baiduspider

by Baidu

Search Engine Crawler bot

DotBot

SEO Crawler bot

DuckDuckBot

Search Engine Crawler bot

FacebookBot

by Meta

FacebookBot downloads web content to train Meta's AI speech recognition and language models. It is separate from facebookexternalhit, which handles link previews.

kagi-fetcher

by Kagi

kagi-fetcher fetches web content for Kagi AI's suite of tools including Assistant, Research, and other knowledge discovery features to answer user queries.

meta-externalfetcher

by Meta

meta-externalfetcher is used by Meta to perform user-initiated fetches of web pages from AI assistant product features like Meta AI.

MJ12bot

SEO Crawler bot

PhindBot

by Phind

PhindBot is the crawler for Phind, an AI-powered answer engine designed for developers. The bot indexes web content (technical documentation, blog posts, code examples) to power Phind's search results, which surface answers with citations linking back to source pages.

YandexBot

by Yandex

Search Engine Crawler bot

Amzn-User

by Amazon

Amzn-User is an AI assistant operated by Amazon that fetches web content to answer user queries through Alexa and other Amazon AI services.

DuckAssistBot

by DuckDuckGo

DuckAssistBot fetches web content for DuckDuckGo's AI-assisted answers feature, which generates brief responses to search queries using natural language technology.

Google-NotebookLM

by Google

Google-NotebookLM is the fetcher for NotebookLM, Google's AI-powered research assistant. It retrieves source URLs when users add them to their notebooks, enabling the AI to analyze pages for context and insights.

AmazonBuyForMe

by Amazon

AI Agent bot

GoogleAgent-Mariner

by Google

AI Agent bot

TwinAgent

AI Agent bot

AddSearchBot

AI Search Crawler bot

Amzn-SearchBot

by Amazon

Uncategorized bot

Anomura

AI Search Crawler bot

atlassian-bot

AI Search Crawler bot

AzureAI-SearchBot

Uncategorized bot

Bravebot

AI Search Crawler bot

Channel3Bot

AI Search Crawler bot

Cloudflare-AutoRAG

AI Search Crawler bot

ExaBot

Uncategorized bot

Google-CloudVertexBot

by Google

AI Search Crawler bot

KlaviyoAIBot

by Klaviyo

KlaviyoAIBot is Klaviyo's web crawler for its Kai Customer Agent feature. It fetches publicly available pages from domains and URLs that you have explicitly connected to your Klaviyo account, indexing content to enable AI-generated content, AI answers, and product recommendations. KlaviyoAIBot follows the Robots Exclusion Protocol, honors HTTP 429/503 rate-limit responses with Retry-After, and is verifiable via HTTP Message Signatures (RFC 9421) and Cloudflare's Verified Bots program. It does not bypass authentication, paywalls, or access controls.
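For crawlers that honor 429/503 responses with Retry-After, a site can slow them down without blocking them outright. The sketch below shows one way a server might build such a response under a simple fixed-window limit. The `throttle_response` function and its parameters are hypothetical and not tied to any particular web framework.

```python
import math


def throttle_response(requests_in_window: int, limit: int, window_resets_in: float):
    """Return (status, headers) for a request under a fixed-window limit.

    requests_in_window: requests already served to this client in the window.
    limit: maximum requests allowed per window.
    window_resets_in: seconds until the current window ends.
    """
    if requests_in_window < limit:
        return 200, {}
    # Retry-After is given here in whole seconds; round up so a
    # well-behaved bot never retries before the window has reset.
    return 429, {"Retry-After": str(math.ceil(window_resets_in))}


print(throttle_response(3, 10, 12.0))
print(throttle_response(10, 10, 12.2))
```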

LinerBot

by Liner

LinerBot is the crawler for Liner, an AI research and answer engine. It indexes academic sources, blogs, and websites so Liner Smart Search can return answers with line-by-line source citations linking back to the original pages.

LinkupBot

AI Search Crawler bot

PetalBot

by Huawei

PetalBot is the web crawler for Huawei's Petal Search engine, Huawei Assistant, and Huawei AI Search. It crawls both desktop and mobile versions of websites to build a search index. Operators verify the bot via reverse DNS lookups on the aspiegel.com or petalsearch.com domains rather than published IP ranges.
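A reverse DNS check of this kind might look like the sketch below. The forward-confirmation step (resolving the hostname back to the same IP) is a common hardening practice added here as an assumption, not something the entry above specifies. The function names are hypothetical, and the resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Domains named in the entry above.
ALLOWED_SUFFIXES = (".aspiegel.com", ".petalsearch.com")


def _reverse(ip: str) -> str:
    """Default reverse lookup: IP address to primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> List[str]:
    """Default forward lookup: hostname to IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def verify_petalbot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True if ip reverse-resolves to an allowed domain and back."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # lookup failed (socket.herror and gaierror are OSErrors)
        return False
    if not host.endswith(ALLOWED_SUFFIXES):
        return False
    try:
        # Forward-confirm: the hostname must resolve back to the same IP.
        return ip in forward(host)
    except OSError:
        return False
```

The default forward resolver handles IPv4 only; an IPv6-aware version would need `socket.getaddrinfo` instead.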

QualifiedBot

by Qualified

QualifiedBot is the crawler operated by Qualified, a B2B sales AI platform. It crawls customer websites that have explicitly enabled Qualified's AI Sales Development Representative (Piper) so that Piper can answer visitor questions with content drawn from the customer's site.

TavilyBot

by Tavily

TavilyBot is the crawler operated by Tavily, a search API provider for AI applications and LLM developers. The bot indexes web content into Tavily's search index, which is then queried by AI agents and apps built on top of the Tavily Search API to surface cited sources in their responses.

TerraCotta

by Ceramic AI

TerraCotta is the web crawler for Ceramic AI, a web-scale search API used by AI assistants and LLMs to fetch current web information. Webmasters who want their public content included in Ceramic's index can add an Allow rule for the TerraCotta user-agent to robots.txt; the bot also respects standard robots.txt disallow rules. It is operated by the Ceramic Team, which can be contacted at crawler@ceramic.ai. Published IPs are listed at https://github.com/CeramicTeam/CeramicTerracotta/blob/main/ipList.txt.

xAI-SearchBot

by xAI

xAI-SearchBot is the live-search crawler operated by xAI, the AI company behind Grok. When a Grok user asks a question that needs current information, xAI-SearchBot fetches relevant web pages on-demand so Grok can ground its answer in fresh, citable content. The full user-agent is Mozilla/5.0 (compatible; xAI-SearchBot/1.0; +https://x.ai). It is the xAI counterpart to OAI-SearchBot (ChatGPT search), Claude-SearchBot (Anthropic search), and PerplexityBot (Perplexity index). The bot obeys robots.txt and identifies itself clearly in the User-Agent header. Allowing it means your content is eligible to appear as a cited source inside Grok's answers; blocking it removes that eligibility but does not affect any separate xAI training data collection.
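To show how a log pipeline might pick the bot token out of a user-agent like the one quoted above, here is a minimal sketch. The regular expression and the `parse_bot` helper are illustrative assumptions and only cover the "compatible; Name/version" layout.

```python
import re

UA = "Mozilla/5.0 (compatible; xAI-SearchBot/1.0; +https://x.ai)"

# Matches "compatible; <token>/<version>" inside the parenthesised comment.
BOT_TOKEN = re.compile(r"compatible;\s*([A-Za-z0-9._-]+)/([\d.]+)")


def parse_bot(user_agent: str):
    """Return (bot_name, version), or None if the layout does not match."""
    match = BOT_TOKEN.search(user_agent)
    return (match.group(1), match.group(2)) if match else None


print(parse_bot(UA))
```

The User-Agent header is set by the client and can be forged, so a match identifies what a request claims to be, not who actually sent it.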

YouBot

AI Search Crawler bot

ZanistaBot

AI Search Crawler bot

AI2Bot-DeepResearchEval

by Allen Institute for AI

AI2Bot is operated by Ai2 (Allen Institute for AI), a non-profit research institute. The crawler explores web content to build training datasets for open language models. The DeepResearchEval variant specifically collects resources used in deep research benchmark evaluations, not for live user queries. The User-Agent string 'Mozilla/5.0 (compatible) AI2Bot (+https://www.allenai.org/crawler)' can be used in robots.txt rules to filter or block the crawler.

Ai2Bot-Dolma

AI Data Scraper bot

ApifyWebsiteContentCrawler

Uncategorized bot

bigsur.ai

bigsur.ai operates a web crawler whose purpose is not publicly documented at the time of writing. The bot is associated with bigsur.ai, an AI infrastructure and model evaluation company. Categorized as an AI Data Scraper as a defensive default until clearer documentation is available.

ChatGLM-Spider

AI Data Scraper bot

CloudVertexBot

AI Data Scraper bot

cohere-training-data-crawler

AI Data Scraper bot

Cotoyogi

AI Data Scraper bot

Datenbank Crawler

AI Data Scraper bot

Diffbot

AI Data Scraper bot

FirecrawlAgent

Uncategorized bot

GoogleOther

by Google

AI Data Scraper bot

ICC-Crawler

AI Data Scraper bot

imageSpider

by ByteDance

AI Data Scraper bot

Kangaroo Bot

AI Data Scraper bot

laion-huggingface-processor

AI Data Scraper bot

LCC

AI Data Scraper bot

netEstate Imprint Crawler

AI Data Scraper bot

omgili

AI Data Scraper bot

PanguBot

AI Data Scraper bot

Poggio-Citations

Poggio-Citations is a web crawler whose specific purpose is not publicly documented. The name suggests it collects citation data for AI research or model training. Categorized as an AI Data Scraper as a defensive default until clearer documentation is available.

SBIntuitionsBot

AI Data Scraper bot

Spider

AI Data Scraper bot

Timpibot

AI Data Scraper bot

VelenPublicWebCrawler

AI Data Scraper bot

webzio-extended

AI Data Scraper bot

Devin

AI Coding Agent

by Cognition

Devin is a software engineering AI assistant by Cognition that can browse websites and perform web-based tasks, functioning as a collaborative AI teammate for engineering teams.

360Spider

Search Engine Crawler bot

Alexa Archive

Search Engine Crawler bot

alexa site audit

Search Engine Crawler bot

AlexandriaOrgBot

Search Engine Crawler bot

Algolia

Search Engine Crawler bot

Algolia Crawler

Search Engine Crawler bot

Atom Feed Robot

Search Engine Crawler bot

Baiduspider-render

by Baidu

Search Engine Crawler bot

bingbot

by Microsoft

Search Engine Crawler bot

cludo.com bot

Search Engine Crawler bot

Cốc Cốc

Search Engine Crawler bot

coccocbot

Search Engine Crawler bot

coccocbot-image

Search Engine Crawler bot

coccocbot-web

Search Engine Crawler bot

Coveo Bot

Search Engine Crawler bot

Coveobot

Search Engine Crawler bot

crawler.freespoke.com

Search Engine Crawler bot

Crawlson

Search Engine Crawler bot

Dataprovider

Search Engine Crawler bot

Daum

Search Engine Crawler bot

DuckDuckGo-Favicons-Bot

Search Engine Crawler bot

Feedfetcher-Google

by Google

Search Engine Crawler bot

FindFiles.net

by FindFiles.net

FindFiles.net is a file-specific search engine that indexes publicly accessible documents, images, videos, and other files across the web. The crawler verifies file availability via multiplexed HTTP/2 HEAD requests, may download images and videos for classification, and downloads executables to scan them for malicious content. The bot keeps a minimum interval of 10 seconds between requests to the same server.
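The per-server minimum interval can be pictured with the sketch below. It is not FindFiles.net's code: the `HostThrottle` class is hypothetical, and timestamps are passed in explicitly so the behaviour is deterministic.

```python
class HostThrottle:
    """Track the last request time per host and enforce a minimum gap."""

    def __init__(self, min_interval: float = 10.0):
        self.min_interval = min_interval
        self._last_request = {}  # host -> timestamp of last permitted request

    def wait_time(self, host: str, now: float) -> float:
        """Return seconds to wait before requesting host again.

        A return value of 0.0 means the request may go out now, and the
        call records `now` as the host's latest request time.
        """
        last = self._last_request.get(host)
        if last is None or now - last >= self.min_interval:
            self._last_request[host] = now
            return 0.0
        return self.min_interval - (now - last)


throttle = HostThrottle()
print(throttle.wait_time("example.com", 0.0))  # first request goes out
print(throttle.wait_time("example.com", 4.0))  # must wait the remainder
```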

FindITAnswersbot

Search Engine Crawler bot

Freespoke

Search Engine Crawler bot

FreespokeCrawler

Search Engine Crawler bot

Funnelback

Search Engine Crawler bot

FyndSearchEngine-Crawler

Search Engine Crawler bot

FyndSearchEngine-ReCrawler

Search Engine Crawler bot

GeedoProductSearch

Search Engine Crawler bot

Gigabot

Search Engine Crawler bot

Google Favicon

by Google

Search Engine Crawler bot

Google Images

by Google

Search Engine Crawler bot

Google Scholar

by Google

Search Engine Crawler bot

Google Videos

by Google

Search Engine Crawler bot

Googlebot-IA

by Google

Search Engine Crawler bot

Googlebot-Image

by Google

Search Engine Crawler bot

Googlebot-Mobile

by Google

Search Engine Crawler bot

Googlebot-News

by Google

Search Engine Crawler bot

Googlebot-Video

by Google

Search Engine Crawler bot

Greppr Web Crawler

Search Engine Crawler bot

HaosouSpider

Search Engine Crawler bot

Hype Machine

Search Engine Crawler bot

IbouBot

Search Engine Crawler bot

intelx.io_bot

Search Engine Crawler bot

Jooblebot

Search Engine Crawler bot

Kagibot

Search Engine Crawler bot

Level9SearchBot

Search Engine Crawler bot

Linespider

Search Engine Crawler bot

lyonl

Uncategorized bot

lyonl-crawler

Uncategorized bot

MagiBot

Search Engine Crawler bot

Marginalia Search

Search Engine Crawler bot

Mars Finder

Search Engine Crawler bot

MojeekBot

Search Engine Crawler bot

MotoMinerBot

Search Engine Crawler bot

MRGbot

Search Engine Crawler bot

MSN

by Microsoft

Search Engine Crawler bot

msnbot