Robot exclusion with a robots.txt file

robots.txt is a text file that tells search engine robots which pages of a website they are not permitted to index. If you have a web page (or a file or directory) that you do not want robots to index (because it is a log file, a private directory, etc.), you can restrict robots' access to it with a robots.txt file. When a robot attempts to index a site, it requests the robots.txt file first.

Suppose a search engine robot is about to index http://www.scriptingmaster.com. First, the robot requests http://www.scriptingmaster.com/robots.txt. (If the robots.txt file does not exist or is empty, the robot will index all files. The file name is case-sensitive and must be in lowercase, and the file must be in the root directory of the website.) Then the robot analyzes the robots.txt file for instructions on which documents from www.scriptingmaster.com it should exclude from indexing.
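The lookup described above can be sketched in Python. This is only an illustration: the helper name robots_url and the page path /html/page.htm are made up for the example. It shows that the robots.txt location depends only on the scheme and host of the page being indexed, never on the directory the page sits in.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper for illustration; not part of any library.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page path below is an assumption made for the example.
print(robots_url("http://www.scriptingmaster.com/html/page.htm"))
```

Whatever page of the site is passed in, the helper returns the same root-level address, which is why a robots.txt file saved in a subdirectory is never consulted.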

If you do not already have a robots.txt file, you can create one with Notepad or any other plain-text editor. Save the file in the root directory of the website under the exact name robots.txt.

The basic format of a robots.txt file is a line naming the spider whose access you want to limit, followed by statements that specify which directory paths to disallow. You can also use the wildcard * to specify rules for all spiders. For instance, the following:

User-agent: *
Disallow: /images/

denies all spiders access to the images folder. If, however, you wanted to deny access to a specific spider such as Googlebot (Google's spider), you would add to your robots.txt file:

User-agent: Googlebot
Disallow: /images/

This denies access to the images folder for the Googlebot spider.
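One way to check how such a rule is interpreted is Python's standard urllib.robotparser module. The sketch below feeds it the Googlebot rule and asks whether two spiders may fetch a file in the images folder. The file name logo.gif and the spider name OtherBot are assumptions made for the example, and real search engines may interpret rules somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only rule from the text.
rules = """\
User-agent: Googlebot
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# logo.gif is a hypothetical file inside the images folder.
url = "http://www.scriptingmaster.com/images/logo.gif"

googlebot_allowed = parser.can_fetch("Googlebot", url)
otherbot_allowed = parser.can_fetch("OtherBot", url)  # hypothetical spider name

print("Googlebot allowed:", googlebot_allowed)
print("OtherBot allowed:", otherbot_allowed)
```

Because the rule names Googlebot only, the parser reports the images folder as off limits to Googlebot while leaving it open to the other spider.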

To deny access to a specific file, specify the user agent and location of the file:

User-agent: *
Disallow: /temp.htm
Disallow: /contact-us/contact.htm

This denies all spiders access to the temp.htm and contact-us/contact.htm files. Note that temp.htm is located in the root directory of the website and contact.htm is located in a folder called contact-us.
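As a rough check of these file rules, the following sketch runs them through Python's standard urllib.robotparser module. The spider name AnyBot and the extra paths /index.htm and /temp.html are assumptions added for the example. In this parser a Disallow value is matched as a path prefix, so the rule for /temp.htm also covers /temp.html.

```python
from urllib.robotparser import RobotFileParser

# The file-level rules from the text, applied to all spiders.
rules = """\
User-agent: *
Disallow: /temp.htm
Disallow: /contact-us/contact.htm
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "http://www.scriptingmaster.com"
# /index.htm and /temp.html are hypothetical paths used for comparison.
paths = ("/temp.htm", "/contact-us/contact.htm", "/index.htm", "/temp.html")

# Map each path to whether a spider (hypothetical name AnyBot) may fetch it.
results = {path: parser.can_fetch("AnyBot", base + path) for path in paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "disallowed")
```

Only /index.htm comes back as allowed; the two listed files and the prefix match /temp.html are reported as disallowed.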

When creating a robots.txt file, make sure not to list any files that contain sensitive or private information. Anyone can request robots.txt, so revealing the location of those files may help malicious visitors or robots misuse them.