Robots.txt Generator
Build a valid robots.txt file with visual rules, user-agent controls, and sitemap declaration.
Last updated: March 25, 2026
What is Robots.txt Generator?
A robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that implements the Robots Exclusion Protocol — the industry-standard way to communicate crawling preferences to search engines and other automated bots. When Googlebot, Bingbot, or any other well-behaved crawler visits your site, it checks this file first to see which pages it is allowed to crawl.
More than just blocking bad pages, a well-crafted robots.txt file also directs Googlebot to your XML sitemap (saving crawl budget), controls which bots can access different sections, and can block non-search crawlers (AI scrapers, aggressive crawlers) from consuming bandwidth. Importantly, robots.txt only controls crawling — not indexing. A page can be indexed without being crawled if other pages link to it. For true content protection, use noindex meta tags.
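For illustration, a minimal robots.txt might look like this (the path and sitemap URL are placeholders, not recommendations):

```
User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml
```

Each group starts with a User-agent line, followed by the Disallow/Allow rules that apply to that bot; the Sitemap line can appear anywhere but is conventionally placed at the end.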
How to Use Robots.txt Generator
Add user agents to control — use "*" for all bots, or specify individual bots like "Googlebot" or "Bingbot"
Add Disallow rules for resources you want to block (e.g., /admin/, /wp-content/, /private/)
Add Allow rules if you need to re-allow specific paths within a blocked directory
Enter your XML sitemap URL to declare it to crawlers
Click "Download robots.txt" and upload the file to your website's root directory
Common Use Cases
- Blocking Googlebot from crawling admin panels, login pages, and CMS dashboards
- Preventing crawling of duplicate content pages (search results, filtered pages, staging URLs)
- Declaring your sitemap URL so all crawlers can discover it without needing Google Search Console
- Blocking aggressive AI scrapers and data harvesters (GPTBot, CCBot) from your content
- Limiting crawl rate with Crawl-delay to protect servers from crawler-induced load
- Blocking crawlers from test or staging paths that live under the same domain (and therefore share the production robots.txt)
- Allowing specific crawlers (e.g., only Googlebot) while blocking all others for exclusive indexing
- Preventing image crawling for copyright-sensitive visual content
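The "allow only one crawler" pattern above can be sketched as a robots.txt fragment (treat this as a starting point, not a definitive policy; an empty Disallow value means "allow everything"):

```
# Allow Googlebot full access
User-agent: Googlebot
Disallow:

# Block every other crawler
User-agent: *
Disallow: /
```

Crawlers match the most specific User-agent group that applies to them, so Googlebot follows its own group and ignores the wildcard group.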
Example Input and Output
A well-structured robots.txt for a WordPress site — blocks common admin and generated paths, declares the sitemap, and blocks aggressive AI scrapers:
Input:
User agents: * (all bots), plus GPTBot (OpenAI) and CCBot (Common Crawl) blocked entirely
Block: /wp-admin/, /wp-includes/, /?s=, /xmlrpc.php
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml

Output:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /?s=
Disallow: /xmlrpc.php
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
Sitemap: https://example.com/sitemap_index.xml

Client-Side Processing
All robots.txt generation happens in your browser. No URL paths, sitemap addresses, or domain names you enter are sent to our servers.
Don't Block CSS/JS
Never block /wp-content/themes/, /static/, or /assets/ directories. Google renders pages like a browser and needs to access CSS and JS files to understand your content correctly. Blocking these can cause Google to misrender and downrank your pages.
Combine with Noindex
Use robots.txt for crawl control (saving crawl budget) and noindex for search result control. Note that the two do not combine well: a crawler blocked by Disallow never fetches the page, so it never sees a noindex directive. For pages that must stay out of search results entirely, allow crawling and serve a noindex meta tag or an X-Robots-Tag response header instead.
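As a sketch of the noindex response header, here is what that might look like in an nginx server block (the location path is an assumption; any server capable of setting the X-Robots-Tag header works the same way):

```
location /internal-reports/ {
    # Crawlers that fetch pages under this path are told not to index them.
    add_header X-Robots-Tag "noindex";
}
```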
Frequently Asked Questions
Does robots.txt prevent pages from appearing in search results?
Do all crawlers respect robots.txt?
What is "Crawl-delay" and should I use it?
How do I block AI scrapers like GPTBot and CCBot?
Where does robots.txt need to be placed?
Is the Disallow field case-sensitive?
How do I verify my robots.txt is working?
How This Tool Works
The tool maintains an internal state of user-agent blocks and path rules. As rules are added or edited, the output is re-generated in real time by sorting user-agent groups, formatting each Disallow/Allow directive as a separate line, appending the Sitemap URL at the end (as recommended by Google's specification), and rendering the complete file text. The output is generated purely in browser JavaScript — no server interaction occurs.
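The generation step described above can be sketched as follows. All names here are hypothetical illustrations, not the tool's actual code: rules are grouped by user agent, each directive becomes its own line, and the Sitemap declaration is appended last.

```javascript
// Sketch of the real-time generation step (hypothetical data shapes):
// each group holds a user agent and its ordered Disallow/Allow rules.
function generateRobotsTxt(groups, sitemapUrl) {
  const blocks = groups.map(({ userAgent, rules }) => {
    const lines = [`User-agent: ${userAgent}`];
    for (const { type, path } of rules) {
      lines.push(`${type}: ${path}`); // type is "Disallow" or "Allow"
    }
    return lines.join("\n");
  });
  // Append the Sitemap URL after all user-agent groups.
  if (sitemapUrl) blocks.push(`Sitemap: ${sitemapUrl}`);
  return blocks.join("\n\n") + "\n";
}

const output = generateRobotsTxt(
  [{
    userAgent: "*",
    rules: [
      { type: "Disallow", path: "/wp-admin/" },
      { type: "Allow", path: "/wp-admin/admin-ajax.php" },
    ],
  }],
  "https://example.com/sitemap.xml"
);
console.log(output);
```

Because the function is a pure mapping from rule state to text, it can be re-run on every edit to update the preview instantly, with no server round trip.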
Technical Stack