Wednesday, March 6, 2013

Understanding Website robots.txt

In this post I will try to discuss robots.txt. Robots? Actual robots? Not quite. The robots I will describe here are Web Robots (commonly known as Web Wanderers, Crawlers, or Spiders), which are programs that automatically crawl entire websites.

Search engines like Google use robots to index the content of websites, while spammers use them to harvest email addresses and so forth. Website owners use the robots.txt file to instruct robots on how they may access the website; this is called The Robots Exclusion Protocol.

It works like this:

A robot wants to visit a website URL, for example http://www.myarticle-article.blogspot.com/about-me.html. Before doing so, it checks for the existence of a robots.txt file at http://www.myarticle-article.blogspot.com/robots.txt. It finds the file, and the file contains:

User-agent: *
Disallow: /

"User-agent: *" means this is addressed to all robots. "Disallow: /" tells the robot that robots should not visit all the pages of the website.

There are two important issues to keep in mind when using robots.txt:

1. Robots can simply ignore robots.txt. In particular, malware robots that scan the entire website looking for security weaknesses (cracking), or robots used by spammers to harvest email addresses, will not respect it.

2. The robots.txt file is public and can be read by anyone, so anyone can see which parts of the server you do not want robots to visit, as the example below illustrates.
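
For example, suppose a website's robots.txt contains the following (the /secret-admin/ path here is purely hypothetical):

User-agent: *
Disallow: /secret-admin/

Anyone who opens the robots.txt file in a browser now knows that /secret-admin/ exists and is considered sensitive, even if no page on the website links to it.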


So never use robots.txt to hide important information on your server. That is all for now; the discussion continues in the next post. Hopefully this is useful, thank you.
