WebToolsPlanet
SEO Tools

Robots.txt Generator

Build a valid robots.txt file with visual rules, user-agent controls, and sitemap declaration.

Last updated: March 25, 2026

Client-Side Processing
Input Data Stays on Device
Instant Local Execution


What is the Robots.txt Generator?

A robots.txt file is a plain text file placed at the root of your website (e.g., https://example.com/robots.txt) that implements the Robots Exclusion Protocol — the industry-standard way to communicate crawling preferences to search engines and other automated bots. When Googlebot, Bingbot, or any other well-behaved crawler visits your site, it checks this file first to see which pages it is allowed to crawl.

Beyond blocking crawlers from private sections, a well-crafted robots.txt file also points crawlers to your XML sitemap, controls which bots can access which parts of the site, and can keep non-search crawlers (AI scrapers, aggressive bots) from consuming bandwidth and crawl budget. Importantly, robots.txt only controls crawling, not indexing: a page can be indexed without being crawled if other pages link to it. To keep a page out of search results, use noindex meta tags.

How to Use Robots.txt Generator

1. Add user agents to control — use "*" for all bots, or specify individual bots like "Googlebot" or "Bingbot"
2. Add Disallow rules for resources you want to block (e.g., /admin/, /wp-content/, /private/)
3. Add Allow rules if you need to re-allow specific paths within a blocked directory
4. Enter your XML sitemap URL to declare it to crawlers
5. Click "Download robots.txt" and upload the file to your website's root directory

Common Use Cases

  • Blocking Googlebot from crawling admin panels, login pages, and CMS dashboards
  • Preventing crawling of duplicate content pages (search results, filtered pages, staging URLs)
  • Declaring your sitemap URL so all crawlers can discover it without needing Google Search Console
  • Blocking aggressive AI scrapers and data harvesters (GPTBot, CCBot) from your content
  • Limiting crawl rate with Crawl-delay to protect servers from crawler-induced load
  • Blocking test or staging paths from being crawled when they live under your main domain (and thus share its robots.txt)
  • Allowing specific crawlers (e.g., only Googlebot) while blocking all others for exclusive indexing
  • Preventing image crawling for copyright-sensitive visual content

Example Input and Output

A well-structured robots.txt for a WordPress site — blocks common admin and generated paths, declares the sitemap, and blocks aggressive AI scrapers:

Your rules
User-agent: all bots
Block: /wp-admin/, /wp-includes/, /?s=, /xmlrpc.php
Allow: /wp-admin/admin-ajax.php
Sitemap: https://example.com/sitemap_index.xml

Also block: GPTBot (OpenAI), CCBot (Common Crawl)
Generated robots.txt file
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-includes/
Disallow: /?s=
Disallow: /xmlrpc.php

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

Sitemap: https://example.com/sitemap_index.xml
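To see how a crawler would interpret the generated rules above, here is a minimal sketch (not part of the tool) of RFC 9309's longest-match rule in plain JavaScript. It is simplified: the `*` and `$` wildcards that real parsers support are ignored, and only prefix matching is shown.

```javascript
// Decide whether a path may be crawled under one user-agent group,
// using the RFC 9309 rule: the longest matching prefix wins, and on a
// tie an Allow rule beats a Disallow rule. No rule matching = allowed.
function isAllowed(path, rules) {
  // rules: array of { type: "allow" | "disallow", prefix: string }
  let best = { type: "allow", prefix: "" }; // default: crawling allowed
  for (const rule of rules) {
    if (path.startsWith(rule.prefix) && rule.prefix.length >= best.prefix.length) {
      // Longer prefix always wins; on equal length, prefer Allow
      if (rule.prefix.length > best.prefix.length || rule.type === "allow") {
        best = rule;
      }
    }
  }
  return best.type === "allow";
}

// The "*" group from the example file above
const rules = [
  { type: "disallow", prefix: "/wp-admin/" },
  { type: "allow", prefix: "/wp-admin/admin-ajax.php" },
  { type: "disallow", prefix: "/wp-includes/" },
];

console.log(isAllowed("/wp-admin/options.php", rules));    // false
console.log(isAllowed("/wp-admin/admin-ajax.php", rules)); // true
console.log(isAllowed("/blog/post", rules));               // true
```

This is why the Allow line in the example works: it is longer (more specific) than the Disallow on /wp-admin/, so it wins for that one file.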

Client-Side Processing

All robots.txt generation happens in your browser. No URL paths, sitemap addresses, or domain names you enter are sent to our servers.

Don't Block CSS/JS

Never block /wp-content/themes/, /static/, or /assets/ directories. Google renders pages like a browser and needs to access CSS and JS files to understand your content correctly. Blocking these can cause Google to misrender and downrank your pages.

Combine with Noindex

Use robots.txt for crawl control (saving crawl budget) and noindex meta tags or X-Robots-Tag headers for search result control. Note that the two do not stack: a crawler blocked by Disallow never fetches the page, so it never sees the noindex. For pages you want out of search results, leave them crawlable with noindex in place; only add a Disallow once they have dropped out of the index.

Frequently Asked Questions

Does robots.txt prevent pages from appearing in search results?
No — this is a critical misunderstanding. Robots.txt controls crawling, not indexing. Google can still index a blocked URL if it finds links to it from other pages. To prevent a page from appearing in search results, you must use a <meta name="robots" content="noindex"> tag or an X-Robots-Tag response header.
Do all crawlers respect robots.txt?
Well-behaved crawlers from reputable companies (Google, Bing, Apple, DuckDuckGo) follow robots.txt. Malicious bots, spammers, and some scrapers do not. Robots.txt is advisory, not enforced; it relies on voluntary compliance. For true access control, use server-level authentication or blocking (.htaccess rules, Cloudflare firewall rules).
What is "Crawl-delay" and should I use it?
Crawl-delay specifies a minimum number of seconds between bot requests to your server (e.g., Crawl-delay: 5 means at least 5 seconds between requests). Google ignores Crawl-delay entirely and manages its crawl rate automatically. Bing and some other crawlers do respect it. Use it only if your server is struggling under crawler load.
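For example, a stanza asking Bing's crawler to wait at least five seconds between requests looks like this:

```text
User-agent: bingbot
Crawl-delay: 5
```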
How do I block AI scrapers like GPTBot and CCBot?
Add separate user-agent blocks: "User-agent: GPTBot" followed by "Disallow: /" to block OpenAI's crawler. Do the same for "CCBot" (Common Crawl), "anthropic-ai", and "Google-Extended" (Google's AI training crawler). Note: this only blocks data collection — content already indexed cannot be retroactively removed from AI training datasets.
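Put together, a stanza blocking the AI crawlers named above looks like this (user-agent tokens can change over time as vendors add new crawlers, so check each vendor's documentation):

```text
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /
```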
Where does robots.txt need to be placed?
At the root of your domain: https://yourdomain.com/robots.txt. Subdomains need their own robots.txt (https://blog.yourdomain.com/robots.txt). It cannot be in a subdirectory like /public/robots.txt — crawlers only check the root path.
Is the Disallow field case-sensitive?
Yes. Disallow: /Admin/ and Disallow: /admin/ are different rules. Most file systems and web servers are case-sensitive, so match the actual URL case exactly. For safety, add both variants if unsure.
How do I verify my robots.txt is working?
Use Google Search Console's robots.txt report (Settings → robots.txt) to confirm Google can fetch your file and to see which version it last crawled; the crawled copy can differ from your live file if it was recently updated. The standalone robots.txt Tester tool has been retired.

How This Tool Works

The tool maintains an internal state of user-agent blocks and path rules. As rules are added or edited, the output is re-generated in real time by sorting user-agent groups, formatting each Disallow/Allow directive as a separate line, appending the Sitemap URL at the end (as recommended by Google's specification), and rendering the complete file text. The output is generated purely in browser JavaScript — no server interaction occurs.
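A condensed sketch of that generation step, assuming a rule state shaped like the tool's (the field names here are illustrative, not the tool's actual internals):

```javascript
// Illustrative re-implementation of the generation step.
// groups: [{ userAgent, disallow: [paths], allow: [paths] }]
function generateRobotsTxt(groups, sitemapUrl) {
  const blocks = groups.map((g) => {
    const lines = [`User-agent: ${g.userAgent}`];
    g.disallow.forEach((p) => lines.push(`Disallow: ${p}`));
    g.allow.forEach((p) => lines.push(`Allow: ${p}`));
    return lines.join("\n");
  });
  // The Sitemap line goes last, outside any user-agent group
  if (sitemapUrl) blocks.push(`Sitemap: ${sitemapUrl}`);
  return blocks.join("\n\n") + "\n";
}

const txt = generateRobotsTxt(
  [
    { userAgent: "*", disallow: ["/wp-admin/"], allow: ["/wp-admin/admin-ajax.php"] },
    { userAgent: "GPTBot", disallow: ["/"], allow: [] },
  ],
  "https://example.com/sitemap_index.xml"
);
console.log(txt);
```

Each user-agent group becomes one blank-line-separated block, mirroring the generated file shown in the example section above.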

Technical Stack

Browser-native JavaScript · Robots Exclusion Protocol (RFC 9309) · Real-time generation · Client-side only