How it works
Paste your robots.txt and a list of paths you want to test. The simulator parses each User-agent block, then runs every path through the longest-match rule for each of the 11 AI crawlers tracked here. You see a matrix of green allows and red blocks, plus a per-bot summary of how many of your test paths are reachable.
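The longest-match evaluation described above can be sketched in a few lines. This is a minimal illustration, not the simulator's actual code: `evaluate` and the sample rules are hypothetical names, and the logic follows the RFC 9309 convention that the rule with the longest matching path prefix wins, with ties going to Allow.

```python
def evaluate(rules, path):
    """rules: list of (directive, prefix) pairs from one User-agent block.
    Returns True if the path is allowed for that agent."""
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        if len(prefix) > best_len:
            # Longer prefix wins outright
            best_len, allowed = len(prefix), directive == "allow"
        elif len(prefix) == best_len and directive == "allow":
            # Equal length: Allow takes precedence over Disallow
            allowed = True
    return allowed

# Hypothetical block for a single agent
gptbot_rules = [("disallow", "/private/"), ("allow", "/private/press/")]

print(evaluate(gptbot_rules, "/private/press/kit.html"))  # True: longer Allow wins
print(evaluate(gptbot_rules, "/private/notes.txt"))       # False: Disallow matches
print(evaluate(gptbot_rules, "/blog/post"))               # True: no rule matches
```

Running each test path through this check once per tracked crawler produces exactly the allow/block matrix the simulator renders.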
Which bots matter
- GPTBot + ChatGPT-User + OAI-SearchBot - OpenAI splits training, browse and search into three separate agents. Block one and you only affect that behaviour.
- ClaudeBot + Claude-Web - Anthropic uses ClaudeBot for training and Claude-Web for real-time browsing. Block ClaudeBot if you're opting out of training.
- PerplexityBot + Perplexity-User - Two agents. Perplexity-User represents live user queries, so blocking it removes you from Perplexity answers entirely.
- Google-Extended + Applebot-Extended - Training opt-outs for Gemini and Apple Intelligence that don't affect traditional search ranking.
- CCBot, Bytespider - Common Crawl and ByteDance scrapers that most operators block outright.
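To make the distinctions above concrete, here is one hedged example of a robots.txt that opts out of training while staying visible in AI answers. The paths are placeholders; the agent tokens are the ones the list above describes.

```
# Block training crawlers entirely
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

# Training opt-outs that leave search ranking untouched
User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Allow live-query agents so the site still appears in AI answers
User-agent: ChatGPT-User
Allow: /

User-agent: Perplexity-User
Allow: /
```

Pasting a ruleset like this into the simulator should show the training bots fully blocked and the user-query agents fully reachable.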
Pair with
Generate a clean ruleset with the AI Crawl Rule Generator, then pair it with an LLMs.txt Generator and cross-check with the Sitemap vs LLMs.txt Consistency Checker. Strategy reading: what is llms.txt.