I am currently using a PHP script with the cURL library to extract results from Google. As some of you may know, Google doesn't like to be scraped, which is why I am using several private HTTP proxies to do it.
Here is the problem. After a while, the proxies get blocked by Google.
Here is what I did to track down the problem.
When I notice that a proxy has been blocked by Google in my script, I immediately visit Google manually in a browser through the same proxy, and strangely I am not blocked at all.
Here is my simple cURL code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'GOOGLE QUERY HERE');
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent); // $user_agent is randomly selected from a list which contains the most popular user agents
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxies); // $proxies is randomly selected from my proxy list
$source = curl_exec($ch);
Is there anything wrong in my code that could leave a footprint, create undesirable cookies, etc.?
What I really don't understand is why Google blocks me when I access its website using a script but not when I access it manually, even though I am sending the SAME query.
I cannot understand why you do not simply use their official API, unless for some reason you cannot comply with their terms and conditions, in which case what you are doing is contrary to them and therefore probably illegal.
Sorry Dieuz, this type of approach is against Google's [TOS](http://www.google.co.uk/accounts/TOS), and therefore [forbidden](http://www.sitepoint.com/forums/faq.php?faq=selfpromo#faq_illegal) to be discussed on SitePoint.
Have you considered seeing if Google allows you API access to the data you require?
I have not checked the Google API yet.
I will take a look, thanks!
That being said: thread closed.
Dieuz, if you have any questions regarding the API you were pointed to, please start a new thread.