zapitnu — 2010-09-23T04:42:08-04:00 — #1
I have heard that malicious bots can often be identified by the fact that their requests usually do not contain an HTTP_ACCEPT_LANGUAGE header.
Does anyone know how true this is?
And if it is true, how reliable is it? I want to die() when there is no HTTP_ACCEPT_LANGUAGE header, in an effort to kill malicious bots.
Will I lose something if I do, such as good bots like Google or Yahoo? Will I lose real people?
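For illustration, here is a minimal sketch of the check being asked about. It is written in Python against a CGI/WSGI-style environ dict rather than PHP, and the function names are hypothetical. It only flags the request instead of killing it, since nothing in this thread establishes whether good bots or real visitors always send the header.

```python
def has_accept_language(environ):
    """Return True if the request carries a non-empty Accept-Language header.

    `environ` is assumed to be a CGI/WSGI-style dict, where the header
    appears under the key HTTP_ACCEPT_LANGUAGE.
    """
    return bool(environ.get("HTTP_ACCEPT_LANGUAGE", "").strip())


def classify_request(environ):
    """Label a request "ok" or "suspect" (hypothetical labels).

    A missing or blank header is treated as a hint only, not as proof
    that the client is a malicious bot.
    """
    return "ok" if has_accept_language(environ) else "suspect"
```

Logging the "suspect" requests for a while before rejecting them would show whether search engines or real people end up in that bucket.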
aleksejs — 2010-09-23T06:12:33-04:00 — #2
Do you have any kind of CSRF protection in place?
aleksejs — 2010-10-23T16:10:26-04:00 — #3
zapitnu — 2010-09-23T06:01:43-04:00 — #4
Actually, I am dealing with form submission software: it scans the internet for forms, stores the form vars, and then submits crap, in my case every 3 minutes. Since this is form submission software, I don't think robots.txt can do anything for me. The bot has already come by.
It uses random proxies, so I can't block by IP. It sends a user agent, but it is likely faked and made to look like a common one.
So that is why I am investigating the accept language thing I have heard about.
elgumbo — 2010-09-23T05:09:11-04:00 — #5
I don't know about the HTTP_ACCEPT_LANGUAGE theory, but I used to use the robots.txt method to find them.
Add a Disallow line to your robots.txt for a file you do not use on your site. Grab the details of any bot that accesses that page.
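A rough sketch of the robots.txt method described above, in Python against a CGI/WSGI-style environ dict. The trap path, the function name, and the fields recorded are all assumptions chosen for the example, not part of any particular tool.

```python
# Hypothetical path that nothing on the real site links to or uses.
TRAP_PATH = "/bot-trap/"

# robots.txt content telling well-behaved crawlers to stay away from it.
ROBOTS_TXT = "User-agent: *\nDisallow: " + TRAP_PATH + "\n"


def record_trap_hit(environ, hits):
    """Append the client's details to `hits` if it requested the trap path.

    Returns True when the request was for the trap, False otherwise.
    """
    path = environ.get("PATH_INFO", "")
    if not path.startswith(TRAP_PATH):
        return False
    hits.append({
        "ip": environ.get("REMOTE_ADDR", ""),
        "user_agent": environ.get("HTTP_USER_AGENT", ""),
        "path": path,
    })
    return True
```

Anything that shows up in `hits` either ignored the Disallow line or read it and went there anyway.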
iprox — 2010-10-29T22:42:40-04:00 — #6
It is surprising how many bots are out there. Some bots pretend to be a valid search engine like Google and sneak into your website without you realizing it. Not all bots obey robots.txt either.
mittineague — 2010-10-30T01:25:31-04:00 — #7
Not only that, but as elgumbo mentioned, some use it to find out where you don't want them to go and then go there. Set up a "honey pot" and you'll catch some.
Don't think of the robots.txt file as a security measure by any means.
eastcoast — 2010-10-30T13:54:30-04:00 — #8
If you're worried about bots, it's worth looking at the code used by the well-known WordPress plugin 'Bad Behavior' (which can also be used outside the plugin).