How to block Unknown robot (identified by 'bot*')

These eat up most of the bandwidth on my site. I also have problems at times with site speed and with content stealers (scrapers). How exactly do I write a rule in .htaccess to block all of these: Unknown robot (identified by ‘bot*’)?

Can you share a portion of your logs in order for us to get a better understanding of the problem?

What portion of the logs do you need?

Dave,

You might benefit from reading the mod_rewrite tutorial linked in my signature, as it contains explanations and sample code. It’s helped many members and should help you, too. It has examples just for this situation. THEN, if you have questions, please come back and I’ll help you get the code exactly right.

Regards,

DK

That page seemed to be more about SEO and URL redirects. I had trouble finding how exactly to block them in robots.txt and .htaccess. Does anyone know a simple answer on how to block Unknown robot (identified by ‘bot*’) by sitemap and .htaccess?

Dave,

Sorry, I thought I had an example of how not to abuse a server with the typically long list of bots to reject (with sample code). My error: mea culpa.

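The %{HTTP_USER_AGENT} variable (that’s the Apache variable holding the User-Agent header) is notoriously unreliable, since a client can send whatever User-Agent string it likes, but you can use a RewriteCond to test it for "bot" and then fail any matching request with the [F] flag in the subsequent RewriteRule.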
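Something along these lines might work as a starting point. It is only a sketch: it assumes mod_rewrite is enabled and that your host allows rewrite directives in .htaccess, and it tests %{HTTP_USER_AGENT}, the mod_rewrite variable for the User-Agent request header. The allowlist in the second condition is my own illustration, not something from your logs. A bare match on "bot" would also catch Googlebot and bingbot, so adjust or remove that line to suit.

```apache
# Sketch for .htaccess; assumes mod_rewrite is available and overrides are allowed
RewriteEngine On

# Match any User-Agent that contains "bot", case-insensitively
RewriteCond %{HTTP_USER_AGENT} bot [NC]
# Illustrative allowlist (an assumption): let these named crawlers through
RewriteCond %{HTTP_USER_AGENT} !(googlebot|bingbot) [NC]
# Answer every matching request with 403 Forbidden
RewriteRule ^ - [F]
```

Keep in mind this only stops robots that identify themselves honestly. A scraper that sends a browser-like User-Agent will pass straight through.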

Regards,

DK

Fortunately the Apache documentation covers this exact scenario.

http://httpd.apache.org/docs/2.4/rewrite/access.html#blocking-of-robots
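
That page also describes an approach that avoids mod_rewrite by using SetEnvIfNoCase together with Require. A rough adaptation for .htaccess on Apache 2.4 could look like the following. The variable name block_bot is made up for this example, the "bot" pattern comes from the question rather than the docs, and it assumes the host permits these directives in .htaccess.

```apache
# Sketch for .htaccess on Apache 2.4; assumes mod_setenvif and mod_authz_core are loaded
# Flag any request whose User-Agent contains "bot" (case-insensitive);
# block_bot is a hypothetical variable name
SetEnvIfNoCase User-Agent "bot" block_bot

# Allow everyone except requests that carry the flag
<RequireAll>
    Require all granted
    Require not env block_bot
</RequireAll>
```

As written, this would also turn away legitimate crawlers whose names contain "bot", so a narrower pattern may be a better fit.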