What Is a Robots.txt File?

November 2, 2011

A robots.txt file is like a party invitation to the search engines. It is a plain text file kept in the root of a website. It can be as short as a line or two inviting the robots to crawl your whole site, or it can disallow them from crawling certain parts of it.

Confused? Why would you NOT want the search engines to index parts of your site? Larger sites with dynamically created paths might not want the robots to enter in the middle of a path, so they block that entrance. Let's use a travel website as an example: would you want someone to land in the middle of a hotel query when they searched? Probably not. Another reason to disallow is a customer profile section, since users usually need to log in to see it. A last example is forms for customers to fill out, which usually don't need to show up as search results.

To see an example of a robots.txt file, go to any large website and type its address followed by /robots.txt into the address bar, for example http://bigwebsite.com/robots.txt, and you will see the different things it allows or disallows.
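To make the allow/disallow idea concrete, here is a minimal sketch in Python using the standard library's urllib.robotparser. The robots.txt content and the paths (/hotel-search/, /profile/, /forms/) are made up for illustration and are not taken from any real site; the helper name is_allowed is also just an assumption for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a travel site. The paths are invented
# for illustration: a hotel query path, a profile area, and forms.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /hotel-search/
Disallow: /profile/
Disallow: /forms/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "*") -> bool:
    """Return True if the given robot may fetch the path on the example site."""
    return parser.can_fetch(agent, "http://bigwebsite.com" + path)


parser = build_parser(SAMPLE_ROBOTS_TXT)
for path in ("/", "/hotels/paris.html", "/hotel-search/results", "/profile/settings"):
    print(path, "allowed" if is_allowed(parser, path) else "blocked")
```

Keep in mind that a Disallow rule is a request that well-behaved robots honor. It does not password-protect anything, so a login is still needed for private sections.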

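Robots meta tag sample syntax
Page-level instructions do not go in the robots.txt file itself. They go in a robots meta tag inside the <head> of each page:
<meta name="robots" content="ARGUMENTS" />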

Sample arguments (replace the word ARGUMENTS above)
Noindex – tells search engines not to index the page
Nofollow – tells search engines not to follow any links on the page
Noarchive – stops search engines from showing a cached copy of the page
Noydir – (Yahoo only) stops the Yahoo Directory title and description from overriding your own
Noodp – (Google, Yahoo) stops the Open Directory Project title and description from overriding your own
Nosnippet – stops Google from showing a description snippet generated from on-page text
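
If you want to check which of these arguments a page actually carries, one rough way is to read its robots meta tag with Python's built-in html.parser. This is only a sketch: the class name RobotsMetaParser, the function robots_directives, and the sample page are assumptions made for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the arguments found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Arguments are comma-separated and not case-sensitive.
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta arguments present in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks not to be indexed and not to have its links followed.
PAGE = '<html><head><meta name="robots" content="Noindex, Nofollow" /></head><body></body></html>'
directives = robots_directives(PAGE)
print(sorted(directives))
```

Several arguments can share one tag when separated by commas, as in the sample page above.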