Robots.txt should not be used to ban large portions of your site

system · June 11, 2013, 11:30am

robots.txt should not be used to ban large portions of your site. If you ban significant portions of your site, search engine spiders may mark your site as “forbidden” in general and simply stop spidering it as often.

Webinsane · June 11, 2013, 1:06pm

This is the first time I have encountered such a theory. In general, search engines push towards a cleaner internet. They want you to organize your website so they can index less and provide more. It is all about preserving resources. So I doubt that this theory has real facts behind it.

Stevie_D · June 11, 2013, 4:36pm

I agree with Jack. Let Google have free rein to crawl all over your site and index as much or as little of it as it wants to. Yes, by all means use robots.txt and rel=nofollow to block off areas of the site that won’t make sense to search engines or as landing pages, but apart from that let Google decide how it wants to deal with your site, and you are likely to do better than if you try to dictate or second-guess too much.

force · June 11, 2013, 4:37pm

This is a misconception. Search engines will continue to crawl your site, just not the pages/paths you disallow.

Stevie_D · June 11, 2013, 6:17pm

The problem is that you might inadvertently close off a route they were using to get to pages that you want them to crawl. The more restrictions you put into place as to where they can and can’t go, the bigger the risk that they won’t be able to quickly and effectively crawl the areas of the site that you want them to.
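To make that risk concrete, here is a minimal sketch. The link graph and paths are made up for illustration, and it assumes a crawler that only discovers pages by following links from the home page (real crawlers also use sitemaps and external links, so treat this as a simplified model). Disallowing the category pages leaves the product pages unreachable even though they are not disallowed themselves.

```python
from collections import deque

# Hypothetical internal link graph: page -> pages it links to.
LINKS = {
    "/": ["/category/widgets", "/about"],
    "/category/widgets": ["/products/blue-widget", "/products/red-widget"],
    "/about": [],
    "/products/blue-widget": [],
    "/products/red-widget": [],
}


def reachable(links, disallowed_prefixes, start="/"):
    """Return the pages a link-following crawler can reach from `start`
    without fetching any path that begins with a disallowed prefix."""

    def blocked(path):
        return any(path.startswith(prefix) for prefix in disallowed_prefixes)

    seen = set()
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page in seen or blocked(page):
            continue
        seen.add(page)
        # Links are only discovered on pages the crawler was allowed to fetch.
        queue.extend(links.get(page, []))
    return seen


everything = reachable(LINKS, [])
after_block = reachable(LINKS, ["/category"])
lost = everything - after_block  # includes product pages that were never disallowed
```

Comparing `everything` with `after_block` shows which pages dropped out of reach as a side effect of the rule.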

Jenish · June 12, 2013, 9:44am

It’s up to you whether you want to stop search engines from crawling a major part of your website using robots.txt. If you are using robots.txt to disallow a large number of pages/directories, it will slow down the crawler. You can also use meta tags to block the Google crawler from indexing your website.
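The meta tag in question is the robots meta tag, e.g. `<meta name="robots" content="noindex">` in the page head. Note that a crawler has to be able to fetch the page to see the tag, so it has no effect on pages that are also disallowed in robots.txt. As a rough sketch (the sample page and the helper names are assumptions for illustration), this is how you could check which directives a page declares:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> / <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() in ("robots", "googlebot"):
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, follow".
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html):
    """Return the set of robots directives declared in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page that asks not to be indexed but allows its links to be followed.
PAGE = (
    '<html><head><meta name="robots" content="noindex, follow"></head>'
    "<body>Members only</body></html>"
)
directives = robots_directives(PAGE)
```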

Webinsane · June 12, 2013, 10:54am

I would have to disagree. Let’s assume you run complex software and want to block off various sections that carry no particular weight as content. This way you save your resources as well.
For example, this is SitePoint’s robots.txt:

User-agent: *
Disallow: /search
Disallow: /member
Disallow: /private
Disallow: /sendmessage
Disallow: /report
Disallow: /postings
Disallow: /editpost
Disallow: /newreply
Disallow: /showpost
Disallow: /online

User-agent: BoardTracker
Disallow: /
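If you want to see what rules like these actually block, one option is Python's standard `urllib.robotparser`. This is only a sketch: the host `example.com` and the test paths are placeholders, and Python's parser uses simple prefix matching, which may differ in detail from how a given search engine interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, pasted in as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /member
Disallow: /private
Disallow: /sendmessage
Disallow: /report
Disallow: /postings
Disallow: /editpost
Disallow: /newreply
Disallow: /showpost
Disallow: /online

User-agent: BoardTracker
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, agent, path):
    """Check whether `agent` may fetch `path` (placeholder host, assumption)."""
    return parser.can_fetch(agent, "https://example.com" + path)


parser = build_parser(ROBOTS_TXT)
for agent, path in [
    ("Googlebot", "/forum/some-thread"),
    ("Googlebot", "/search?q=robots"),
    ("BoardTracker", "/forum/some-thread"),
]:
    print(agent, path, allowed(parser, agent, path))
```

Under these rules an ordinary crawler is kept out of the listed sections only, while BoardTracker is shut out of the whole site.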

Stevie_D · June 12, 2013, 11:38am

That’s what I meant by “by all means use robots.txt and rel=nofollow to block off areas of the site that won’t make sense to search engines” … areas of the site that you can only use when logged in don’t make sense to search engines, so they don’t need access to them.

alastairbrian · June 12, 2013, 11:44am

Well, my experience is different. I have blocked 75% of the links on my website, i.e. I have four languages and my site is only ready in English. I just can’t allow Google to crawl the pages for my other three languages, because if it does, it drops the rank of the English version of my site, as I am using Google Translate for most of my pages.

Google still needs to improve, above all its bot, so we can’t trust it and have to make sure it does not see anything it does not understand.