Technical SEO

robots.txt

Quick Answer

A robots.txt file is a plain text file at a website's root URL that tells search engine crawlers which URLs they may crawl and which they should avoid. It controls crawler access at the site level but does not control indexing: pages blocked by robots.txt can still appear in search results if other sites link to them.

How robots.txt Works

A robots.txt file is placed at the root of a domain (e.g., yourdomain.com/robots.txt) and uses the Robots Exclusion Standard to tell crawlers which parts of the site they are and aren't allowed to crawl. Google's Googlebot, Bing's Bingbot, and other major crawlers check robots.txt before crawling any URL from a domain.
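
Polite crawlers automate this check before every fetch. As a minimal sketch of that logic, Python's standard-library `urllib.robotparser` fetches a robots.txt file and evaluates it per user agent (the domain and paths below are placeholders, not real URLs):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse a site's robots.txt (yourdomain.com is a placeholder).
parser = RobotFileParser()
parser.set_url("https://yourdomain.com/robots.txt")
parser.read()

# Ask whether a given crawler may fetch a given URL.
# Returns False if /admin/ is disallowed for Googlebot, True otherwise.
print(parser.can_fetch("Googlebot", "https://yourdomain.com/admin/settings"))
print(parser.can_fetch("*", "https://yourdomain.com/blog/post"))
```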

robots.txt Syntax

The basic syntax uses `User-agent:` (specifying which crawler the rule applies to, `*` for all), `Disallow:` (paths to block), and `Allow:` (paths to permit, used to override a broader Disallow). You can also reference your sitemap URL in robots.txt, which is a common best practice.
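
Putting those directives together, a typical file might look like the following sketch (the paths and domain are illustrative placeholders, not recommendations for any particular site):

```text
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Allow: /admin/public/

# Rules for one specific crawler
User-agent: Googlebot
Disallow: /internal-search/

# Sitemap reference (must be an absolute URL)
Sitemap: https://yourdomain.com/sitemap.xml
```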

robots.txt: Best Practices & Strategic Application

Robots.txt is widely misunderstood in a critical way: it controls crawling, not indexing. Blocking a URL in robots.txt prevents Google from crawling it, but if other sites link to that URL, Google can still index it (without seeing its content, knowing only that the URL exists). To prevent indexing, use a noindex meta tag or X-Robots-Tag header on the page itself. Note that the page must remain crawlable for search engines to see that directive, so do not also block the URL in robots.txt, or the noindex will never be read.
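
For contrast, a minimal noindex directive lives in the page's HTML head rather than in robots.txt:

```html
<!-- Placed inside <head>; tells compliant engines not to index this page.
     Do not also block this URL in robots.txt, or crawlers never see the tag. -->
<meta name="robots" content="noindex">
```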

Agency Perspective: robots.txt in Practice

The most catastrophic robots.txt error is accidentally blocking the entire site — typically by shipping `Disallow: /` under `User-agent: *` (often left over from a staging environment) without realizing it blocks every URL for every crawler. This mistake has caused major sites to disappear from Google within days. Other common errors include: blocking CSS and JavaScript files needed for rendering (which prevents Google from seeing your content as users do), blocking image directories, and forgetting to update robots.txt after a site migration that changes URL structures.
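
For illustration, here is that failure mode next to the scoped rule that was probably intended (the `/staging/` path is hypothetical):

```text
# Catastrophic: blocks the entire site for every crawler
User-agent: *
Disallow: /

# Probably intended: block only one directory
User-agent: *
Disallow: /staging/
```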

Put robots.txt Into Practice

MV3 Marketing helps B2B companies apply these strategies to drive measurable pipeline growth. Our team executes technical SEO audits for technology, SaaS, and professional services companies.

Audit Your Technical Setup →