[Interactive demo: GPTBot traversing a site graph in breadth-first order, the same way major AI crawlers explore your link architecture.]
Yes - all major AI crawlers follow links. They respect robots.txt, they prefer breadth-first traversal from your most-linked entry points, and they reach less-linked deep pages far less often. Pages more than 3-4 clicks from the homepage get visited occasionally at best, which is why internal link architecture is still a critical GEO discipline.
1. The crawlers you actually need to know
The shortlist as of 2026: GPTBot and ChatGPT-User (OpenAI training + browse), OAI-SearchBot (OpenAI search index), ClaudeBot and anthropic-ai (Anthropic training + retrieval), PerplexityBot (Perplexity live retrieval), Google-Extended (the Gemini training opt-out token) and cohere-ai (Cohere training). Whether each of these is allowed or blocked is one of the first things to audit when diagnosing a missing-from-AI-search problem - the full diagnostic lives in our why your site isn't in ChatGPT guide.
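As a sketch of what that audit looks at, here is a hypothetical robots.txt that allows the retrieval/search crawlers while opting out of training crawls (the choice of which bots to allow is yours - blocking a training bot also keeps you out of that model's training data):

```text
# Allow search/retrieval crawlers (these power live citations)
User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Opt out of training crawls (illustrative - many sites allow these too)
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

A missing-from-AI-search diagnosis often starts by checking that none of the retrieval bots above landed in a blanket Disallow by accident.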
2. How they traverse
All major AI crawlers use a variant of breadth-first traversal starting from the URLs they already know about - your homepage, your sitemap, any URLs they've seen referenced from external sources. They follow internal hrefs the same way Googlebot does, with similar depth penalties: the deeper a page is from the homepage, the less often it gets crawled. The practical consequence is that flat site architectures (where every important page is 1-2 clicks from the homepage) get far better AI search visibility than deep ones.
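The traversal described above can be sketched in a few lines. This is a toy model, not any vendor's actual crawler: `get_links` and the `site` dict are hypothetical stand-ins for fetching a page and extracting its hrefs, and `max_depth` models the depth penalty.

```python
from collections import deque

def bfs_crawl(start_url, get_links, max_depth=3):
    """Breadth-first traversal of a site graph: shallow pages are
    visited first, and nothing past max_depth gets expanded."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth == max_depth:
            continue  # depth penalty: stop expanding deep pages
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# Toy site graph: /deep is 3 clicks from the homepage
site = {
    "/": ["/products", "/blog"],
    "/products": ["/products/a"],
    "/blog": [],
    "/products/a": ["/deep"],
    "/deep": [],
}
print(bfs_crawl("/", lambda u: site.get(u, []), max_depth=2))
# → [('/', 0), ('/products', 1), ('/blog', 1), ('/products/a', 2)]
```

Note that `/deep` never gets visited with a depth budget of 2 - exactly the fate of pages buried too far from the homepage.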
3. External links and the citation graph
External links from trusted publishers do double duty: they help AI crawlers discover your site faster, and they directly feed the citation graph signal that determines whether the model trusts you enough to cite you. The citation graph is one of the five signals documented in our citation algorithm guide, and it's the hardest one to fake - you have to actually earn the mention.
4. Internal link architecture for AI search
Three principles. First, keep critical pages within 3 clicks of the homepage - hub pages with curated link lists are the cleanest way to do this. Second, make sure every page links to its category parent and sibling pages, not just back to the homepage. Third, publish a clean, current sitemap.xml and an llms.txt file that explicitly lists your most important URLs - both are direct hints to crawlers about where to spend their crawl budget.
For deep architectural patterns that work for both AI and classic search, the foundation overlap is documented in our GEO vs SEO guide.
5. Verifying crawl coverage
Filter your server logs by user agent (GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai, PerplexityBot) over the last 30 days, group by URL and look for the gaps. (Google-Extended is a robots.txt control token rather than a crawling user agent, so it won't appear in logs - Gemini's content arrives via Googlebot's crawl.) If your hub pages are getting hit but your deep pages are not, you have an internal link architecture problem. The full measurement playbook lives in our track AI search visibility guide.
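The log filtering step can be sketched as follows. The regex assumes the common combined log format, and the sample lines are fabricated for illustration - adapt the pattern to your server's actual log layout:

```python
import re
from collections import Counter

AI_BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot",
           "ClaudeBot", "anthropic-ai", "PerplexityBot")

# Combined log format: ... "GET /path HTTP/1.1" ... "user agent"
LINE = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*".*"([^"]*)"\s*$')

def ai_hits_per_url(log_lines):
    """Count AI-crawler requests per URL from access-log lines."""
    hits = Counter()
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue
        url, ua = m.groups()
        if any(bot in ua for bot in AI_BOTS):
            hits[url] += 1
    return hits

# Fabricated sample lines in combined log format
sample = [
    '1.2.3.4 - - [01/Jan/2026:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0; compatible; GPTBot/1.2; +https://openai.com/gptbot"',
    '1.2.3.4 - - [01/Jan/2026:00:00:01 +0000] "GET /deep/page HTTP/1.1" 200 512 "-" "Mozilla/5.0 (regular browser)"',
    '5.6.7.8 - - [01/Jan/2026:00:00:02 +0000] "GET /blog HTTP/1.1" 200 512 "-" "Mozilla/5.0; PerplexityBot/1.0; +https://perplexity.ai/perplexitybot"',
]
print(ai_hits_per_url(sample))
```

In this sample, `/` and `/blog` each get one AI-bot hit while `/deep/page` gets none - in a real log, URLs that never appear in this counter are your coverage gaps.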
Recap
AI crawlers follow links the same way Googlebot does - with depth penalties, robots.txt respect and a strong preference for breadth-first traversal. The implications are clear: keep critical pages shallow, link to category siblings as well as parents, ship a clean sitemap and llms.txt, and verify with server logs. The brands with the strongest internal link architecture also have the strongest AI search visibility - it's not a coincidence.
Make sure GPTBot can find your best pages
Geolify GEO packages include a full crawl coverage audit, an internal-link refactor where needed, llms.txt + sitemap shipping, and per-platform monitoring. From $499.
FAQ
Do AI crawlers like GPTBot follow internal links?
Yes - GPTBot and the other major AI crawlers traverse internal links the same way Googlebot does, following hrefs from page to page. They tend to be slightly less aggressive than Googlebot in their crawl frequency, but they will eventually walk the entire reachable site graph if you let them. Pages buried more than 3-4 clicks from the homepage get visited far less often, so internal link architecture matters as much for AI crawlers as it does for SEO.
Do they follow external (outbound) links?
Yes, but the behaviour varies by crawler. GPTBot will follow outbound links during training crawls but is more selective during ChatGPT-User browse retrieval. ClaudeBot follows outbound links sparingly. PerplexityBot is the most aggressive at following outbound links because it builds answers from live retrieval and needs to chase fresh sources. The practical implication: links from trusted publishers out to your site still feed the AI citation graph.
Should I use rel='nofollow' on links to my own pages?
No. Internal nofollow is almost always a mistake - you're telling crawlers to ignore parts of your own site, which usually does more harm than good. The exception is if you have admin or login URLs you genuinely don't want indexed, in which case a robots.txt disallow or a noindex meta tag is a cleaner signal than nofollow.
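For that exception case, a sketch of the two cleaner signals (the `/admin/` and `/login` paths are illustrative):

```text
# robots.txt - keep crawlers out of admin paths entirely
User-agent: *
Disallow: /admin/
Disallow: /login
```

Or, per page, a noindex meta tag in the page's `<head>`:

```text
<meta name="robots" content="noindex">
```

Note the difference: robots.txt stops the crawl, while noindex allows the crawl but blocks indexing - don't combine them on the same URL, or the crawler never sees the noindex.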
How do I check if AI crawlers are actually finding my deep pages?
Filter your server logs for the user agents GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, anthropic-ai and PerplexityBot over the last 30 days, then group by URL (Google-Extended is a robots.txt token, not a user agent, so it won't appear in logs). If your hub pages are getting hit but your deep pages are not, you have an internal-link problem - those deep pages are too far from any entry point the crawler is reaching.
Does linking from a high-authority site help me rank in ChatGPT?
Yes, in two ways. First, the crawler discovers you faster when a trusted source links to you - the AI crawler is essentially following the same edge in the graph the LLM has already learned to trust. Second, the inbound link is itself a citation graph signal that lifts your composite score for citation selection. This is why earned mentions on Wikipedia, Forbes, TechCrunch and similar publications compound so hard for AI search visibility.