How to set Robots.txt file

What is Robots.txt?


The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine crawlers) how to crawl and index pages on their website.

Block all web crawlers from all content

User-agent: * 
Disallow: /

Block a specific web crawler from a specific folder

User-agent: Googlebot 
Disallow: /no-google/

Block a specific web crawler from a specific web page


User-agent: Googlebot 
Disallow: /no-google/blocked-page.html

Sitemap Parameter

User-agent: * 
Disallow: 
Sitemap: http://www.example.com/non-standard-location/sitemap.xml

Important Rules


  • In most cases, the meta robots tag with the parameters "noindex, follow" should be employed as a way to restrict crawling or indexation.
  • It is important to note that malicious crawlers are likely to completely ignore robots.txt and as such, this protocol does not make a good security mechanism.
  • Only one "Disallow:" line is allowed for each URL.
  • Each subdomain on a root domain uses separate robots.txt files.
  • Google and Bing accept two specific regular expression characters for pattern exclusion (* and $).
  • The filename of robots.txt is case sensitive. Use "robots.txt", not "Robots.TXT."
  • Spacing is not an accepted way to separate query parameters. For example, "/category/ /product page" would not be honored by robots.txt.
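You can sanity-check your rules before deploying them. As a minimal sketch, Python's standard-library `urllib.robotparser` can parse the Googlebot example from above and report which URLs a given user agent may fetch (the URLs here reuse the hypothetical example.com paths from this post):

```python
from urllib.robotparser import RobotFileParser

# The "block a specific crawler from a specific folder" example from above.
rules = """\
User-agent: Googlebot
Disallow: /no-google/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot may not fetch anything under /no-google/ ...
print(rp.can_fetch("Googlebot", "http://www.example.com/no-google/blocked-page.html"))  # False
# ... but a crawler with no matching User-agent record is unrestricted.
print(rp.can_fetch("SomeOtherBot", "http://www.example.com/no-google/blocked-page.html"))  # True
```

In production you would normally point the parser at the live file with `rp.set_url(...)` and `rp.read()` instead of parsing an inline string.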
