Blocking Bots still getting through?

I am currently blocking bots that report backlinks, such as Majestic and Ahrefs, yet my links are still appearing in their search data. Does anybody have a good, current list of bots to block so that they can't show off your links?

It is much better to whitelist the good bots than to block bad bots one by one. My own list of good bots is below.

Comprehensive Guide of Good Bots to WhiteList:


User-agent: googlebot
Disallow:

User-agent: googlebot-mobile
Disallow:

User-agent: googlebot-image
Disallow:

User-agent: bingbot
Disallow:

User-agent: msnbot
Disallow:

User-agent: slurp
Disallow:

User-agent: Teoma
Disallow:

User-agent: yandex
Disallow:

User-agent: sogou
Disallow:

User-agent: baiduspider
Disallow:

User-agent: exabot
Disallow:

User-agent: gigabot
Disallow:

User-agent: facebookexternalhit
Disallow:

User-agent: twiceler
Disallow:

User-agent: scrubby
Disallow:

User-agent: robozilla
Disallow:

User-agent: nutch
Disallow:

User-agent: ia_archiver
Disallow:

User-agent: naverbot
Disallow:

User-agent: yeti
Disallow:

User-agent: yahoo-mmcrawler
Disallow:

User-agent: yahoo-blogs/v3.9
Disallow:

User-agent: psbot
Disallow:

User-agent: asterias
Disallow:

User-agent: java
Disallow:

User-agent: wget
Disallow:

User-agent: curl
Disallow:

User-agent: commons-httpclient
Disallow:

User-agent: python-urllib
Disallow:

User-agent: libwww
Disallow:

User-agent: httpunit
Disallow:

User-agent: phpcrawl
Disallow:

User-agent: *
Disallow: /
Disallow: /cgi-bin/

There are also some WordPress plugins that block a large portion of those bad bots. The most effective technique, though, is to whitelist the decent ones.
Remember that not all bots obey your robots.txt file, so you might need to block them before they reach your website.

There is a nice technique for identifying bad bots called “Bad Bots Blackhole”, a script that traps bots that disobey your robots.txt file. You can search for it online to find out more.

Hope that helps!

Very very nice thank you. What about blocking ahrefs and majestic though?

We are whitelisting, not blacklisting, so they will be blocked automatically.
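One way to sanity-check this is to feed a shortened version of the whitelist to Python's standard robots.txt parser and ask what a compliant crawler would conclude. This is only a sketch: the tokens "AhrefsBot" and "MJ12bot" are assumed here to be the names the Ahrefs and Majestic crawlers use, the helper is_allowed is made up for illustration, and the result says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the whitelist: named bots get an empty Disallow
# (nothing is off limits), every other bot falls into the * group.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(bot_name: str, url: str = "/") -> bool:
    """Return what a robots.txt-compliant bot with this name may fetch.

    bot_name is the short product token (e.g. "Googlebot"), not the
    full User-Agent header.
    """
    return parser.can_fetch(bot_name, url)


# "AhrefsBot" and "MJ12bot" are assumed crawler names, not taken from the list above.
for name in ("Googlebot", "AhrefsBot", "MJ12bot"):
    print(name, "allowed" if is_allowed(name) else "blocked")
```

Any bot that is not named in the list ends up in the * group, which is why nothing extra has to be written for Ahrefs or Majestic.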

I am glad that was helpful.

Good luck with your projects and thanks for your kind words.

Unfortunately, if the bots you are trying to block are in any way malicious (which is what I understood from the original question), then robots.txt is not going to block them. Robots.txt is a voluntary protocol. In other words, it asks a bot not to visit the site, but it can’t prevent it from doing so. By definition, a malicious bot can and will ignore it.

It’s much better to use .htaccess to do the blocking, if your server supports it, or an equivalent method if it doesn’t.
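As a rough idea of what an "equivalent method" could look like when .htaccess is not available, here is a minimal sketch of server-side blocking for a Python (WSGI) site. It is an illustration, not a recommended setup: the class name BlockBots and the pattern list are hypothetical, and the crawler names "AhrefsBot" and "MJ12bot" are assumptions. It also only catches bots that send an honest User-Agent header.

```python
import re

# Hypothetical blocklist, matched case-insensitively anywhere in the
# User-Agent header. The two names are assumed crawler tokens.
BLOCKED_AGENTS = re.compile(r"ahrefsbot|mj12bot", re.IGNORECASE)


class BlockBots:
    """WSGI middleware that answers 403 before the site itself runs."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(agent):
            # Refuse the request here, unlike robots.txt, which only asks.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return self.app(environ, start_response)
```

You would wrap your existing application object once, for example application = BlockBots(application). The difference from robots.txt is that the server refuses the request itself instead of relying on the bot to stay away.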

Mike