Googlebot sending malicious GET requests?

Hi all,

I found this line in a client’s web log files the other day:

66.249.71.143 - - [11/Sep/2012:17:14:36 -0400] “GET /cgi-bin/sw.pl?read=%7Ccat%20/etc/passwd%7C HTTP/1.1” 200 139952 “-” “Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)”

As you can see, the GET string contains “cat /etc/passwd”, a Unix command that dumps the list of user accounts on the server. Definitely a no-no, and something only a hacker or bad bot would do.

I checked the IP and it is from Google, as the agent string suggests.
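For anyone who wants to repeat that check, here is a minimal Python sketch of the usual approach: a reverse DNS lookup on the IP, a check that the hostname ends in googlebot.com or google.com, then a forward lookup to confirm the hostname maps back to the same IP. The function name `is_google_crawler` and the injectable lookup parameters are my own assumptions for illustration, and the suffix list may not cover every Google crawler.

```python
import socket

# Assumed list of hostname suffixes; adjust if you care about other Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_google_crawler(ip,
                      reverse_lookup=socket.gethostbyaddr,
                      forward_lookup=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Google hostname that resolves back to ip."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # gethostbyname_ex also returns (hostname, aliases, addresses)
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False
    # The forward lookup must include the original IP, otherwise the PTR record could be spoofed.
    return ip in addresses
```

Calling `is_google_crawler("66.249.71.143")` performs live DNS queries, so the result depends on what DNS returns at the time you run it. The user agent string alone proves nothing, since anyone can send it.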

So my question is: where did Google pick this up? I suppose a hacker site out there could have that link listed, but otherwise I’m stumped. Frankly, I’m surprised Google wouldn’t “weed out” links that are hack attempts.

Has anyone else seen this kind of hit from Google?

-Jim

How is Google supposed to distinguish malicious links from any other links? Google only has a reason to request that URL if a link containing it exists somewhere, either on your own site or on another site that links to yours with that address. Unless someone has also managed to install sw.pl in the cgi-bin folder on your server, the request will of course simply return a 404 rather than a 200.

If you have a public web stats directory, links such as this will turn up there because of random malicious bot scans looking for common web app vulnerabilities.

Well, that script is on the site, but that “GET” string (after the read=) is not linked anywhere that I can see. I suppose Google could flag it as malicious the same way I parsed it out of the log file: a simple regex looking for /etc/passwd. So if someone crafted a link that would perform a SQL injection against another site, could they get Google to spider it and do the dirty work? I suppose so.
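The kind of regex check on a log file described here could look roughly like the Python sketch below. It pulls the request target out of a combined-format log line, URL-decodes it, and tests it against a pattern. The names `SUSPICIOUS`, `REQUEST`, and `suspicious_request` are made up for illustration, and the pattern covers only this one style of probe, so treat it as a starting point rather than a real filter.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: matches /etc/passwd or a pipe followed by "cat".
SUSPICIOUS = re.compile(r"/etc/passwd|\|\s*cat\b", re.IGNORECASE)

# Captures the request target between the method and the protocol version.
REQUEST = re.compile(r"(?:GET|POST|HEAD) (\S+) HTTP/")

def suspicious_request(log_line):
    """Return the decoded request target if it looks like a probe, else None."""
    match = REQUEST.search(log_line)
    if not match:
        return None
    # Decode %7C, %20 and friends before matching, since probes are usually encoded.
    target = unquote(match.group(1))
    return target if SUSPICIOUS.search(target) else None

line = ('66.249.71.143 - - [11/Sep/2012:17:14:36 -0400] '
        '"GET /cgi-bin/sw.pl?read=%7Ccat%20/etc/passwd%7C HTTP/1.1" 200 139952')
print(suspicious_request(line))  # prints /cgi-bin/sw.pl?read=|cat /etc/passwd|
```

Decoding first matters: the raw log shows `%7Ccat%20`, which a pattern written against the decoded form `|cat ` would miss.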

We do have analytics on the site, so I suppose if another robot hit the site with that query and Google was able to grab it from the resulting page, Google could have stored it and decided to try to spider it.

Should the script be there? Does it serve a purpose you are aware of, and is its file modification timestamp recent?

Yes, it’s a real script. As with other hack attempts, carefully crafted GET/POST parameters are sent in to probe the script code. The script in this case ignores the bad request and returns a list of threads (it’s a forum script). So from Google’s perspective, I guess it’s a URL that works, so it’s checking it out.

Maybe you should download the script file and check it with your antivirus software; it might have been tampered with.

Just out of curiosity, does the page make this GET request using jQuery/Ajax when it loads?