Rankrize

Free Tool · Generator

Robots.txt Generator

Build a clean robots.txt — including AI bots like GPTBot, ClaudeBot, PerplexityBot, Google-Extended, and CCBot.

Welcome every bot — search, AI training, AI search, and social previews. The default and recommended posture for content sites that want maximum reach.

Rule 1

  • User-agent: use * for the catch-all rule, or pick from the AI bot list below.
  • Crawl-delay: optional. Most modern crawlers ignore Crawl-delay; Yandex still uses it.
  • Sitemap: absolute URLs only — e.g., https://example.com/sitemap.xml.
  • Non-standard directives: only Yandex respects these; Google ignores them.

1 hint

  • No Sitemap declared. Adding one helps Google + Bing discover all your pages — it should be the absolute URL to your sitemap.xml.

robots.txt

Place at the root of your domain:

User-agent: *
Allow: /

Validate after deploying: upload your robots.txt to your domain root, then confirm Google can read it via the robots.txt report in Search Console.
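Acting on the hint above, a finished file with the optional fields filled in might look like this (the Crawl-delay value and the sitemap URL are illustrative placeholders):

```txt
# Catch-all rule for every crawler
User-agent: *
Allow: /
# Optional: most modern crawlers ignore Crawl-delay; Yandex still honors it
Crawl-delay: 10

# Absolute URL required; helps Google + Bing discover every page
Sitemap: https://example.com/sitemap.xml
```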

AI + search crawler reference (65 bots)

Owner, purpose, and category for every AI / search bot we track.

| User-agent | Owner | Category | Purpose |
| --- | --- | --- | --- |
| GPTBot | OpenAI | AI training | Trains OpenAI's foundation models (GPT-4o, GPT-5). |
| OAI-SearchBot | OpenAI | AI search / retrieval | Indexes content for OpenAI's SearchGPT product. |
| ChatGPT-User | OpenAI | AI search / retrieval | Fetches a page on demand when a ChatGPT user asks the model to browse a URL. |
| ClaudeBot | Anthropic | AI search / retrieval | Fetches content for citation inside Claude. |
| anthropic-ai | Anthropic | AI training | Legacy training crawler — still respected by current versions. |
| Claude-Web | Anthropic | AI search / retrieval | Inline browsing inside Claude conversations. |
| Claude-User | Anthropic | AI search / retrieval | User-triggered fetch from Claude. |
| Claude-SearchBot | Anthropic | AI search / retrieval | Crawler for Claude's search-augmented retrieval. |
| Googlebot | Google | Search engines | Web search index crawler. |
| Googlebot-Image | Google | Search engines | Crawls images for Google Images. |
| Googlebot-News | Google | Search engines | Selects content for Google News inclusion. |
| Googlebot-Video | Google | Search engines | Crawls video content for video-rich results. |
| Google-InspectionTool | Google | Search engines | Search Console live URL inspection + rendering. |
| Google-Extended | Google | AI training | Opt-in token for Gemini / Vertex training. Allowing it = opting in to AI training; blocking it = opt-out without affecting Search. |
| Google-CloudVertexBot | Google | AI training | Vertex AI fine-tuning corpus crawler. |
| GoogleOther | Google | Research / corpora | Generic Google research / experimental crawls. |
| GoogleOther-Image | Google | Research / corpora | Generic Google image research crawler. |
| GoogleOther-Video | Google | Research / corpora | Generic Google video research crawler. |
| Storebot-Google | Google | Search engines | Shopping crawler — product pricing, availability. |
| AdsBot-Google | Google | Search engines | Validates Google Ads landing-page quality. |
| AdsBot-Google-Mobile | Google | Search engines | Validates mobile Google Ads landing-page quality. |
| Mediapartners-Google | Google | Search engines | AdSense — checks page content for ad relevance. |
| Bingbot | Microsoft | Search engines | Bing web search index crawler. |
| BingPreview | Microsoft | Search engines | Bing's preview-page renderer. |
| MicrosoftPreview | Microsoft | AI search / retrieval | Bing Chat / Copilot preview crawler. |
| msnbot | Microsoft | Search engines | Legacy MSN crawler — still active. |
| PerplexityBot | Perplexity | AI search / retrieval | Builds Perplexity's search index. |
| Perplexity-User | Perplexity | AI search / retrieval | Real-time fetch when a Perplexity user requests a citation. |
| Applebot | Apple | Search engines | Spotlight, Siri, Safari Suggestions index. |
| Applebot-Extended | Apple | AI training | Apple Intelligence training opt-in. Allowing it = opting in to Apple's AI training. |
| Meta-ExternalAgent | Meta | AI training | Llama-product crawler. |
| Meta-ExternalFetcher | Meta | AI search / retrieval | Real-time browse on behalf of a Meta AI user. |
| FacebookBot | Meta | AI training | Meta AI / Llama training fetcher. |
| facebookexternalhit | Meta | Social previews | Open Graph / link-preview fetcher for Facebook + Instagram. |
| Bytespider | ByteDance | AI training | Trains ByteDance / Doubao models. |
| CCBot | Common Crawl | Research / corpora | Builds Common Crawl, the corpus most foundation models train on. |
| Diffbot | Diffbot | Research / corpora | Builds the Diffbot Knowledge Graph. |
| FriendlyCrawler | Webis (research) | Research / corpora | Common-Crawl-funded research crawler. |
| Cohere-ai | Cohere | AI search / retrieval | Cohere AI search + training fetcher. |
| cohere-training-data-crawler | Cohere | AI training | Cohere model training data crawler. |
| Mistral | Mistral AI | AI training | Mistral / Le Chat training corpus. |
| MistralAI-User | Mistral AI | AI search / retrieval | Le Chat browse-on-behalf-of-user. |
| Grok | xAI | AI search / retrieval | Grok training + retrieval fetcher. |
| xAI-Bot | xAI | AI training | xAI / Grok crawler. |
| PhindBot | Phind | AI search / retrieval | Phind dev-search index. |
| YouBot | You.com | AI search / retrieval | You.com AI search crawler. |
| Amazonbot | Amazon | AI training | Alexa + Amazon AI training crawler. |
| DuckDuckBot | DuckDuckGo | Search engines | DuckDuckGo search index. |
| YandexBot | Yandex | Search engines | Yandex search index (Russian + global). |
| Baiduspider | Baidu | Search engines | Baidu search index (China). |
| Naverbot | Naver | Search engines | Naver search index (Korea). |
| Kagibot | Kagi | Search engines | Kagi search index. |
| Ai2Bot | Allen Institute (Ai2) | AI training | Trains open research models (Olmo, Tülu). |
| Ai2Bot-Dolma | Allen Institute (Ai2) | AI training | Builds the Dolma open training corpus. |
| Twitterbot | X / Twitter | Social previews | Generates X / Twitter card previews. |
| LinkedInBot | LinkedIn | Social previews | Generates LinkedIn share previews. |
| Slackbot | Slack | Social previews | Unfurls links shared in Slack. |
| Discordbot | Discord | Social previews | Unfurls links shared in Discord. |
| WhatsApp | Meta (WhatsApp) | Social previews | Unfurls links shared in WhatsApp. |
| TelegramBot | Telegram | Social previews | Unfurls links shared in Telegram. |
| SemrushBot | Semrush | SEO tools | Semrush competitor + backlink analysis. |
| AhrefsBot | Ahrefs | SEO tools | Ahrefs SEO + backlink crawler. |
| MJ12bot | Majestic | SEO tools | Majestic backlink index. |
| DotBot | Moz | SEO tools | Moz / Open Site Explorer crawler. |
| rogerbot | Moz | SEO tools | Moz Pro audit crawler. |
Automate it with Rankrize

Allowing AI bots ≠ getting cited by AI.

Letting GPTBot in is step one. Step two is structuring your content so it actually gets cited. Rankrize generates llms.txt, ai.txt, FAQ schema, and answer-capsule content built for AI citation. Run a free GEO audit.

Run free site analysis
One free per account · No credit card


How to use the Robots.txt Generator

  • Decide your AI posture before you decide your rules

    Three options: (1) Allow all — best for content sites that want maximum reach including AI citations. (2) Allow search, block AI training — keep ranking on Google but opt out of AI model training. (3) Block all AI — keep ranking but never appear in ChatGPT, Claude, or Perplexity answers. Start with the matching template, then customize.

  • Allowing GPTBot ≠ getting cited by ChatGPT

    Robots.txt is the door — content quality is the conversation. Allowing GPTBot lets OpenAI train on your pages; actually getting cited requires your content to be discoverable, well-structured (FAQ schema, HowTo schema, clean answer capsules), and authoritative. The Rankrize platform is built around this — schema generation + answer-capsule writing on every article.

  • Always include your sitemap URL — even if you also link it from <head>

    Search engines discover sitemaps from three places: the robots.txt Sitemap directive, Search Console submissions, and HTML <link rel="sitemap"> tags. Robots.txt is the most universal — Google, Bing, Yandex, DuckDuckGo, and AI search engines all read it. Use absolute URLs (https://example.com/sitemap.xml), not relative paths.

  • Most-specific match wins — order doesn't matter

    Per RFC 9309, when multiple rules apply to the same crawler, the most specific path match wins (the longest matching pattern); order in the file is irrelevant. So `Disallow: /admin/` followed later by `Allow: /admin/help` will let Googlebot read /admin/help even though /admin/ is broadly disallowed (see the rendered example after this list).

  • Test before deploying — and re-test after every change

    Once your robots.txt is live, check it with Google Search Console's robots.txt report (link in the validation panel) — the old standalone robots.txt Tester was retired in 2023 — and spot-check a few sample URLs with the URL Inspection tool to confirm the expected allow / disallow. A typo can knock your entire site out of the index overnight; verifying takes 30 seconds.

  • Don't use robots.txt to hide private pages

    Disallow tells crawlers not to fetch a URL — it doesn't tell them not to index it. If other sites link to your /private-page, Google can still list it (with a thin snippet). For real privacy, use a noindex meta tag on the page or put it behind authentication. Robots.txt is for crawl-budget management, not access control.
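To make the most-specific-match rule concrete, here is the /admin/ example from the list above rendered as a file; the comments trace which pattern wins (the sample URLs are hypothetical):

```txt
User-agent: *
Disallow: /admin/
Allow: /admin/help

# /admin/settings -> only "Disallow: /admin/" matches          -> blocked
# /admin/help/faq -> "Allow: /admin/help" is the longest match  -> allowed
# Per RFC 9309 the two directive lines can appear in either order
```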

Frequently asked questions

About the Robots.txt Generator

What's the difference between Disallow and Allow?

Disallow tells a crawler not to fetch a URL pattern. Allow re-permits a sub-path inside an otherwise-disallowed parent. Example: `Disallow: /admin/` + `Allow: /admin/admin-ajax.php` blocks the WP admin area but lets crawlers reach the AJAX endpoint your front-end depends on. With no Disallow rule, everything is implicitly allowed.

Should I allow or block AI crawlers?

Depends on your business model. If you monetize visits to your site (ads, lead gen, content sales), allowing AI crawlers is usually a net positive — they cite you with a link, driving referral traffic and brand awareness. If your business depends on scarcity (paid research reports, premium courses), you may want to block training crawlers (GPTBot, CCBot, Google-Extended) but still allow real-time AI search (ChatGPT-User, Perplexity-User, ClaudeBot).
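Sketched as a robots.txt, that selective posture might look like the following (trim or extend the bot list to match your own policy; the full registry is in the table above):

```txt
# Opt out of AI model training
User-agent: GPTBot
User-agent: CCBot
User-agent: Google-Extended
Disallow: /

# Everyone else, including real-time AI search fetchers, stays allowed
User-agent: *
Allow: /
```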

If I block GPTBot, will ChatGPT stop citing my content?

Eventually. GPTBot is OpenAI's training crawler — blocking it tells OpenAI to exclude your content from future model training updates. Content already in the training set can keep being cited until that model version is retired (typically 6–18 months). To stop real-time citations in browsing-enabled ChatGPT, also block ChatGPT-User and OAI-SearchBot.
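A minimal sketch of the full OpenAI opt-out described above:

```txt
# Block future OpenAI model training
User-agent: GPTBot
Disallow: /

# Also block on-demand browsing and SearchGPT indexing
User-agent: ChatGPT-User
Disallow: /

User-agent: OAI-SearchBot
Disallow: /
```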

What does Google-Extended control?

Google-Extended is the opt-in token for Google's Gemini and Vertex AI training. Allowing it means your content can be used to train future Gemini models; blocking it opts you out without any effect on your Google Search rankings (Googlebot is a separate crawler). Allowing Google to use your content for AI training also increases your chances of being cited by Gemini and Google AI Overviews.
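The opt-out itself is two lines; because Googlebot is a separate user-agent, no other rule is needed to keep Search crawling intact:

```txt
# Opt out of Gemini / Vertex AI training; Googlebot is unaffected
User-agent: Google-Extended
Disallow: /
```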

Where does the robots.txt file go?

At the root of your domain — e.g., https://example.com/robots.txt. It must be a single text file at the exact path /robots.txt, not in a subdirectory. If you have multiple subdomains (blog.example.com, shop.example.com), each one needs its own robots.txt at its own root. WordPress, Shopify, Webflow, Framer, and Next.js all support uploading or generating it.

How often should I update my robots.txt?

Whenever a significant new AI crawler launches (we update the AI bot reference table here as soon as new ones are documented), or whenever your site structure changes meaningfully (new admin paths, new private sections, new sitemaps). Otherwise, set-and-forget is fine. Most sites change theirs once or twice a year.

Is the bot list kept up to date?

Yes. We track the registry against each provider's official documentation (OpenAI, Anthropic, Google, Apple, Perplexity, Meta) plus the darkvisitors.com index. Each entry links to the provider's official docs page where available. The list is updated whenever a major provider announces a new crawler or renames an existing one.

Can I declare more than one sitemap?

Absolutely. Add a `Sitemap:` line for each one. A common pattern is a separate sitemap per content type (sitemap-pages.xml, sitemap-articles.xml, sitemap-products.xml), all referenced from a single robots.txt. Or use one sitemap-index.xml that points to all your individual sitemaps — also fully supported.
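Either pattern is just repeated `Sitemap:` lines; the filenames below mirror the examples above:

```txt
Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-articles.xml
Sitemap: https://example.com/sitemap-products.xml

# Or a single index file that references the rest:
# Sitemap: https://example.com/sitemap-index.xml
```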