Robots.txt File

Thanks for your help. Now anybody can understand this. Read the Remark column carefully: Disallow means "[the URL path you want to block]".

Thanks

I think I copied that from one of the numerous sites with detailed explanations of how to use a robots.txt file.

Personally, I think it is a bit confusing and should not mention the URL, which to me means the complete domain, path and/or web page.
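To illustrate the path-versus-URL distinction, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/demo/` directory, the page name, and the `build_parser` helper are assumptions made up for this example, and other crawlers may treat a malformed rule differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser


def build_parser(rule: str) -> RobotFileParser:
    """Parse a one-rule robots.txt that applies to every crawler."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", rule])
    return parser


# Hypothetical page used only for this illustration.
PAGE = "https://www.example.com/demo/page.html"

# The value is a path starting at the site root, so it is matched as a prefix.
path_rule = build_parser("Disallow: /demo/")

# The value includes the domain, so it never matches a request path here.
full_url_rule = build_parser("Disallow: www.example.com/demo/")

print("path rule allows page:", path_rule.can_fetch("*", PAGE))
print("full-URL rule allows page:", full_url_rule.can_fetch("*", PAGE))
```

With this parser, only the rule written as a path blocks the page; the rule that includes the domain has no effect.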

I find it handy to be able to block a complete path but also be able to allow specific sub-paths under the blocked path.

You just need to mention the sub-path to block that page…

Thanks

The Disallowed path mentioned has numerous other sub-paths.

I wanted to Allow the images sub-folder within the Disallowed path.

Maybe there are other ways to achieve the same result.
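A minimal sketch of that setup, checked with Python's standard `urllib.robotparser`. The `/private/` path and its `images/` sub-folder are hypothetical names, not the ones from the actual site. The `Allow` line is placed first because some parsers, Python's included, apply the first rule that matches, while others pick the most specific rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block /private/ but keep its images/ sub-folder crawlable.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/images/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def can_crawl(path: str) -> bool:
    """Return True if a generic crawler may fetch the given path."""
    return parser.can_fetch("*", "https://www.example.com" + path)


for path in ("/private/images/logo.png", "/private/notes.html", "/index.html"):
    print(path, "->", "allowed" if can_crawl(path) else "blocked")
```

Only the page directly under `/private/` is reported as blocked; the image and the home page stay crawlable.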

Lol, he has more knowledge than you; just check his badge level. He is a reputed man in this community, and you can’t blame him in this way.

No one uses robots.txt to prevent duplicate-content issues, because webmasters use the canonical link tag for that. Yes, there are spammy webmasters who use robots.txt to stop Google crawling, so that the spider does not know they have copied something.

Why would you need to block crawling of your duplicate pages? If you block them, they will not be indexed at all, unless someone enters the exact query in the search box.

Lol, duplicate links and duplicate content are two different things…

Also, @Goyllo, you mentioned in your post…

In your second point you mention that some webmasters block their directories so as not to pass link juice… for that, the source page should use rel=“nofollow” on the link rather than a robots.txt disallow…

Just take one more example to clear up your doubts.

For example, say you have a directory called demo:

www.example.com/demo/

and you have blocked that directory in robots.txt, but you have also linked to that directory in your blog post. Then you are indirectly wasting your PageRank, because the directory is blocked while link juice is still passing to that link.

Here is a brief guide about it. Hope it helps 🙂

I’m sorry if my earlier reply upset you; it was not intentional. However, you were suggesting something which could lead to Google penalising your site, and as a Moderator, I felt it was important to explain - for the benefit of others reading this thread - why I considered the advice to be bad. My issue is entirely with the advice you were offering in that post, and not with you as a person. I’m sorry if you read it differently.

[quote=“John_Betong, post:26, topic:203889”]
I wanted to Allow the images sub-folder within the Disallowed path.
[/quote]

@John_Betong - “Allow” was not in the original robots.txt specifications, and while some major bots - including Googlebot - do now recognise it, I’d be wary of relying on it.

See http://www.robotstxt.org/robotstxt.html and https://en.wikipedia.org/wiki/Robots_exclusion_standard.
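One way to avoid relying on “Allow” is to list every sub-folder that should be blocked and leave out the one that should stay crawlable. The sketch below generates such Disallow-only rules and checks them with Python's standard `urllib.robotparser`. The `/private/` path, the sub-folder names, and the `disallow_only_rules` helper are all assumptions for illustration. Files sitting directly under the parent path are not covered by these rules and would need their own lines.

```python
from urllib.robotparser import RobotFileParser


def disallow_only_rules(parent: str, subfolders: list[str], keep: str) -> list[str]:
    """Build robots.txt lines that block every listed sub-folder except `keep`."""
    lines = ["User-agent: *"]
    for name in subfolders:
        if name != keep:
            lines.append(f"Disallow: {parent}{name}/")
    return lines


# Hypothetical directory layout used only for this illustration.
rules = disallow_only_rules("/private/", ["docs", "images", "scripts"], keep="images")
print("\n".join(rules))

parser = RobotFileParser()
parser.parse(rules)

base = "https://www.example.com"
print(parser.can_fetch("*", base + "/private/images/logo.png"))
print(parser.can_fetch("*", base + "/private/docs/readme.html"))
```

The trade-off is maintenance: any new sub-folder has to be added to the list by hand, or it stays crawlable.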

[quote]While Google won’t crawl or index the content blocked by robots.txt, we might still find and index a disallowed URL from other places on the web. As a result, the URL address and, potentially, other publicly available information such as anchor text in links to the site can still appear in Google search results. You can stop your URL from appearing in Google Search results completely by using your robots.txt in combination with other URL blocking methods, such as password-protecting the files on your server, or inserting indexing directive meta tags into your HTML.
[/quote]
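The quoted guidance mentions indexing directive meta tags. As a rough illustration, the sketch below checks whether a page's HTML carries a `noindex` robots meta tag, using Python's standard `html.parser`. The `RobotsMetaFinder` class and `has_noindex` function are names made up for this example, and this is only a simple check, not how any search engine actually processes pages. Bear in mind that a crawler has to be able to fetch a page before it can see a tag inside it.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


# Hypothetical page markup for the demonstration.
sample = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex(sample))
```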