Restrict Search Engine Access for Optimization
Robots.txt is one of the many tools an SEO service provider uses. It
is a plain text file placed in the root directory of the site, for
example by uploading it over FTP. It is the first file search engine
crawlers check to see whether there are special instructions for them.
These instructions are very simple ones, such as "enter" or "don't
enter." However, search engine spiders are not the only crawlers on the
Internet. There are also "black widows" that crawl the web looking for
sensitive information. The best way to keep those black widows out is to
set up simple password protection. Of course, not every web page holds
sensitive information, and there are cases where you may want to
restrict access even for the well-intentioned spiders of the search
engines.
Maybe your web site has a complex structure with lots of internal
links. Or maybe there are lots of keywords inside the site that cause
different URLs to be generated with the same content. In that case you
may want to keep crawlers away from all but one of the pages that share
that content. A robots.txt file can solve this problem easily. When
using this file, be careful about two things: its location and the scope
of its rules. Crawlers only look for the file at the highest level of
your site, at "http://www.yoursite.com/robots.txt", and the name must be
exactly "robots.txt", not "robot.txt". A copy placed in a subfolder,
such as "http://www.yoursite.com/unwantedlink/robots.txt", is ignored.
A rule in the root file that disallows "/" stops crawlers from visiting
any address starting with your base URL. So to stop a crawler from
entering only one part of your site, keep the file at the root and name
that folder in the rule, like "/unwantedlink/".
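The effect of a folder-level rule can be checked before uploading the
file. The following is a minimal sketch using Python's standard
urllib.robotparser module. The file contents, the "ExampleBot" crawler
name, and the page names are assumptions made up for illustration; only
the yoursite.com address and the "unwantedlink" folder come from the
example above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of http://www.yoursite.com/robots.txt (site root).
ROBOTS_TXT = """\
User-agent: *
Disallow: /unwantedlink/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "ExampleBot" and both page names are made-up values for this sketch.
blocked = is_allowed(
    ROBOTS_TXT, "ExampleBot", "http://www.yoursite.com/unwantedlink/page.html"
)
open_page = is_allowed(
    ROBOTS_TXT, "ExampleBot", "http://www.yoursite.com/index.html"
)
print(blocked, open_page)  # prints: False True
```

Only the folder named in the rule is closed to crawlers that respect the
file. Every other address on the site stays open.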
The syntax of robots.txt is as follows:
User-agent: *
Disallow: /
The first line specifies the crawler the rule applies to. The asterisk
means "all crawlers." You can also put the name of a specific search
engine crawler there to restrict only that one. In theory this feature
should stop the black widows we talked about, but because they have
malicious intent they neither check robots.txt nor obey its rules. The
second line is as clear as it seems. It disallows the crawlers from
entering the given path: "/" covers the whole site, while a folder path
such as "/unwantedlink/" covers only that folder. Keeping duplicate or
unwanted pages out this way makes your site easier for search engine
indexers to index.
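To see how the User-agent line selects which crawlers a rule applies to,
here is a small sketch, again with Python's standard urllib.robotparser
module. The crawler names "BadBot" and "GoodBot", the page names, and
the rules themselves are hypothetical. The check only models crawlers
that choose to respect the file, so it says nothing about black widows.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named crawler is shut out of the whole site,
# every other crawler only loses access to a single folder.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /unwantedlink/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_visit(user_agent: str, path: str) -> bool:
    """Ask the parsed rules whether user_agent may fetch path on the site."""
    return parser.can_fetch(user_agent, "http://www.yoursite.com" + path)


# "BadBot" and "GoodBot" are made-up crawler names for this sketch.
results = {
    (agent, path): may_visit(agent, path)
    for agent in ("BadBot", "GoodBot")
    for path in ("/index.html", "/unwantedlink/page.html")
}
for (agent, path), allowed in results.items():
    print(agent, path, "allowed" if allowed else "disallowed")
```

The named crawler is matched by its own group and is refused everywhere,
while any other crawler falls through to the asterisk group and is only
kept out of the one folder.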
Thursday, October 18, 2012
6:00 AM