Block search engines indexing a few pages

yjones · September 5, 2012, 7:02am

Hi guys,
I use SEOMoz, and it is alerting me to some ‘no meta data found’ errors on a few pages. It’s actually just one page, but it displays different data depending on what’s coming out of the db:
e.g. mydomain.com/mypage/var-value-1, mydomain.com/mypage/var-value-2, mydomain.com/mypage/var-value-3 etc…

Can I use robots.txt to block all search engines from listing all pages after mydomain.com/mypage/ ? i.e. some kind of wildcard command?

I hope that makes sense. Please don’t suggest I use a canonical tag. It’s a long story, but because of the code that’s on the page this is not an option.

Thanks

rajukk00 · September 5, 2012, 7:49am

A sitemap is essential for promoting a website well. You can allow or disallow search engines with the help of a sitemap.

Stevie_D · September 5, 2012, 11:42am

You can use robots.txt to block search engine spiders from accessing parts of your site*, if that’s what you want to do:

user-agent: *
disallow: /mypage/

but that asks robots not to look at anything in the “mypage” folder, which would include the index page, so that probably wouldn’t work for you.

Another option, which might or might not work for you, would be to create a rewrite regex to redirect mydomain.com/mypage/var-value-* to mydomain.com/mypage/ - it depends whether the DB needs that extra parameter to generate the page or if it’s just an artefact.

* OK, technically it’s “suggest that they might not want to look there”, but most search robots are well-behaved and do what they’re asked.
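
If you want to check what the two-line robots.txt rule above actually blocks before putting it live, a rough sketch using Python's standard `urllib.robotparser` could look like this. The helper name `is_allowed` is made up for illustration, and mydomain.com and the paths are just the placeholders from the question. Note this only models plain prefix matching; individual search engines may interpret rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted above; mydomain.com and the paths are the
# placeholder names from the question, not a real site.
ROBOTS_TXT = """\
user-agent: *
disallow: /mypage/
"""


def is_allowed(path, agent="*"):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, "http://mydomain.com" + path)


if __name__ == "__main__":
    for path in ("/mypage/", "/mypage/var-value-1", "/otherpage/"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

Running it should confirm the point made above: the rule covers /mypage/ itself as well as everything underneath it, so it can't be used to hide only the var-value pages.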

yjones · September 6, 2012, 10:40am

Thanks Stevie D - 2 possible solutions there. I’ll probably go with the robots.txt one first.

system · September 9, 2012, 10:01pm

If you’re using WordPress then the Robots-META plugin will enable you to do this easily - you can set both the indexing and follow tags on a post-by-post basis.

You can also selectively noindex posts if you’re using the Thesis theme.
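
If the site isn't on WordPress, the same idea can be done by hand, assuming the page template is able to output an extra tag in its head section (which may or may not be possible here). A minimal sketch, where `robots_meta` and its `base` parameter are hypothetical names and /mypage/ is the placeholder path from the question:

```python
def robots_meta(path, base="/mypage/"):
    """Return a robots meta tag for pages below `base`, else an empty string."""
    # Only the variable pages (e.g. /mypage/var-value-1) get noindex;
    # /mypage/ itself and unrelated pages are left indexable.
    if path.startswith(base) and path != base:
        return '<meta name="robots" content="noindex, follow">'
    return ""


if __name__ == "__main__":
    for path in ("/mypage/", "/mypage/var-value-1", "/about/"):
        print(repr(path), "->", repr(robots_meta(path)))
```

Unlike the robots.txt rule, this leaves mydomain.com/mypage/ indexable while asking search engines not to list the pages underneath it. Robots need to be able to crawl a page to see the tag, so it shouldn't be combined with a robots.txt block on the same URLs.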