CURL and Proxies

Hey there,

I am currently using a PHP script with the cURL library to extract results from Google. As some of you may know, Google doesn't like to be scraped, which is why I am using several private HTTP proxies to do it.

Here is the problem. After a while, the proxies get blocked by Google.

Here is what I did to track down the problem.

When I notice in my script that a proxy has been blocked by Google, I immediately go to Google manually through that same proxy, and strangely I am not blocked at all.

Here is my simple CURL code:

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'GOOGLE QUERY HERE');
curl_setopt($ch, CURLOPT_POST, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent); // $user_agent is randomly selected from a list containing the most popular user agents
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "my_cookies.txt");
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($ch, CURLOPT_PROXY, $proxies); // $proxies is a single proxy randomly selected from my proxy list
$source = curl_exec($ch);

Is there anything wrong in my code that could leave a footprint, create undesirable cookies, etc.?

The thing I really don't understand is why Google blocks me when I access its website using a script but not when I access it manually, even though I am sending the SAME query.

I cannot understand why you do not simply use their official API, unless for some reason you cannot comply with their terms and conditions, in which case what you are doing is contrary to them and therefore probably illegal.

Sorry Dieuz, this type of approach is against Google's TOS, and therefore forbidden to be discussed on SitePoint (http://www.sitepoint.com/forums/faq.php?faq=selfpromo#faq_illegal).

Have you considered seeing if Google allows you API access to the data you require?

I have not checked the Google API yet.

I will take a look, thanks!

That being said: thread closed.

Dieuz, if you have any questions regarding the API you were pointed to, please start a new thread.