Robots Exclusion Protocol

(robots.txt) The Robots Exclusion Standard, also known as the Robots Exclusion Protocol or robots.txt protocol, is a convention that lets cooperating web crawlers and other web robots be kept out of all or part of a website that is otherwise publicly viewable. Robots are often used by search engines to categorize and archive web sites. The standard can be used in conjunction with Sitemaps, a robot inclusion standard for websites.

If a site owner wishes to give instructions to web robots, they must place a text file called robots.txt in the root of the web site hierarchy.
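Cooperating crawlers retrieve this file before crawling and honor its rules. As a minimal sketch of how a crawler consumes it, Python's standard urllib.robotparser can parse robots.txt rules and answer whether a given URL may be fetched (the rules and example.com URLs below are illustrative; the file is parsed locally rather than fetched from a live site):

```python
from urllib import robotparser

# Illustrative robots.txt content, parsed locally; example.com is a placeholder.
RULES = """\
User-agent: *
Disallow: /private/
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# A compliant crawler checks each URL before requesting it.
print(parser.can_fetch("*", "http://www.example.com/index.html"))      # True
print(parser.can_fetch("*", "http://www.example.com/private/a.html"))  # False
```

In a real crawler one would call set_url() and read() to fetch the live file instead of parsing a string.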

Robot Exclusion Standard Examples
Crawlers should support these directives.

# Comments appear after the "#" symbol, either at the start of a line
# or after a directive

# Allow all robots to visit all files
User-agent: *    # The wildcard * specifies all robots
Disallow:

# Keep all robots out!
User-agent: *
Disallow: /

# Give rules to a specific crawler
User-agent: Googlebot    # actual user-agent of the bot
Disallow: /private/

# Keep all bots out of these directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /private/

# Tell crawlers not to crawl a specific file
User-agent: *
Disallow: /directory/file.html
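One detail worth noting in the examples above is group matching: a named crawler obeys the User-agent group that names it, not the catch-all * group. A hedged sketch using Python's standard urllib.robotparser (the rules and example.com URLs are illustrative only):

```python
from urllib import robotparser

# Illustrative rules: a bot-specific group plus a catch-all group.
RULES = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Googlebot follows only its own group, so /images/ is not blocked for it.
print(parser.can_fetch("Googlebot", "http://www.example.com/images/logo.gif"))  # True
print(parser.can_fetch("Googlebot", "http://www.example.com/private/x.html"))   # False

# Any other bot falls back to the * group.
print(parser.can_fetch("SomeBot", "http://www.example.com/images/logo.gif"))    # False
```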

Nonstandard extensions
Some crawlers may support these directives.

# Use the Allow directive(s) first, then the Disallow:
User-agent: *
Allow: /folder1/myfile.html
Disallow: /folder1/

# Block files of a specific file type
User-agent: Googlebot
Disallow: /*.gif$

# Block all subdirectories that begin with "private"
User-agent: Googlebot
Disallow: /private*/
# The Sitemap directive is not tied to any specific user-agent
Sitemap: http://www.example.com/sitemap.xml    # placeholder URL
User-agent: Googlebot

# Some crawlers support multiple Sitemap directives
Sitemap: http://www.example.com/sitemap1.xml
Sitemap: http://www.example.com/sitemap2.xml
User-agent: *
# Seconds to wait between server requests
User-agent: *
Crawl-delay: 10
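The Crawl-delay value can also be read programmatically. As a sketch, Python's urllib.robotparser exposes crawl_delay() (available since Python 3.6); the rules below are illustrative:

```python
from urllib import robotparser

# Illustrative rules asking all bots to pause 10 seconds between requests.
RULES = """\
User-agent: *
Crawl-delay: 10
"""

parser = robotparser.RobotFileParser()
parser.parse(RULES.splitlines())

# Returns the delay for the matching group, or None if no Crawl-delay was given.
print(parser.crawl_delay("*"))  # 10
```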

# Some crawlers (Yandex, Google) support a Host directive,
# which specifies the preferred mirror domain.
# Note: the Host directive should be placed at the bottom of the
# robots.txt file, after the Crawl-delay directive.

# Specify the preferred mirror domain (placeholder value)
Host: www.example.com

Meta-Tags and Headers (more Robot-Exclusion methods)
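Exclusion rules can also be expressed per page rather than site-wide. As an illustrative fragment (directive values are examples, not a complete reference), a robots meta tag in a page's HTML head asks crawlers not to index the page or follow its links; the same rules can be sent for non-HTML resources as an X-Robots-Tag HTTP response header:

```html
<!-- Per-page exclusion, placed inside the page's <head>;
     the equivalent HTTP header form is: X-Robots-Tag: noindex, nofollow -->
<meta name="robots" content="noindex, nofollow">
```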
