Thursday, October 18, 2012

Restrict Search Engine Access for Optimization



Robots.txt is one of the many tools an SEO service provider uses. It is a plain text file placed in the root directory of the site, usually uploaded over FTP. It is the first file search engine crawlers check to see whether there are special instructions for them. These instructions are very simple, along the lines of "enter" or "don't enter." However, search engine spiders are not the only crawlers on the Internet. There are also some "black widows" that crawl the web looking for sensitive information. The best way to keep those black widows out is to set up a simple password script. Of course, not every web page holds sensitive information, and you may also want to restrict access for the well-intentioned spiders of the search engines.

Maybe your web site has a complex structure with lots of internal links. Or maybe there are lots of keywords inside the site that produce different URLs with the same content. You may want to restrict access to more than one page with the same content. If that is the case, a robots.txt file can solve your problem easily. When using this file you must be careful about where you put it and what it disallows. Crawlers only look for the file at the highest level of your site, at "http://www.yoursite.com/robots.txt"; a copy placed in a subfolder, such as "http://www.yoursite.com/unwantedlink/robots.txt", is ignored. If that top-level file disallows "/", the crawler does not visit any address starting with your base URL. So to stop a crawler from entering a single link on your site, keep the file at the top level and disallow only that path, for example "/unwantedlink/".
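Crawlers that honor robots.txt request it from one fixed place: the path "/robots.txt" at the root of the host. Here is a minimal Python sketch of how that location can be derived from any page address. The helper name robots_url and the sample page URL are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap in the fixed path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Whatever folder the page lives in, the file is looked up at the site root.
print(robots_url("http://www.yoursite.com/unwantedlink/page.html?id=3"))
# http://www.yoursite.com/robots.txt
```

Because the lookup always targets the root, folder-level control has to come from the paths listed inside that one file.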
The syntax of robots.txt is as follows:

User-agent: *
Disallow: /
The first line specifies the crawler the rules apply to. The asterisk means "all crawlers." You can replace it with a name to single out specific search engine crawlers you don't want. In theory this feature should stop the black widows we talked about, but because they have malicious intent they do not check robots.txt or obey its rules. The second line is as clear as it seems: it disallows crawlers from entering the given path. A lone "/" covers the whole site, while a folder path such as "/unwantedlink/" covers only that folder. Keeping crawlers out of duplicate pages this way makes your site easier for search engines to index.
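If you want to check what a set of rules would tell a well-behaved crawler before uploading the file, Python's standard urllib.robotparser module can read the rules from a string. The sketch below is only an illustration: the helper name allowed, the crawler names ExampleBot and OtherBot, and the sample URLs are assumptions, not real crawlers or pages.

```python
from urllib.robotparser import RobotFileParser


def allowed(rules: str, agent: str, url: str) -> bool:
    """Report whether the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# The rules from the article: every crawler, whole site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# Every crawler, one folder only.
BLOCK_FOLDER = "User-agent: *\nDisallow: /unwantedlink/\n"
# One named (hypothetical) crawler, whole site.
BLOCK_ONE_BOT = "User-agent: ExampleBot\nDisallow: /\n"

home = "http://www.yoursite.com/index.html"
hidden = "http://www.yoursite.com/unwantedlink/page.html"

print(allowed(BLOCK_ALL, "ExampleBot", home))      # False
print(allowed(BLOCK_FOLDER, "ExampleBot", home))   # True
print(allowed(BLOCK_FOLDER, "ExampleBot", hidden)) # False
print(allowed(BLOCK_ONE_BOT, "ExampleBot", home))  # False
print(allowed(BLOCK_ONE_BOT, "OtherBot", home))    # True
```

The parser only reports what the rules say. A crawler that chooses to ignore robots.txt is not stopped by any of this.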

Article Source: http://EzineArticles.com/5983131