Firefox cURL can't access Yahoo.com but Chrome can

Hey guys,

When I try to browse to yahoo.com via cURL and Firefox, it does not work. When I try to browse to yahoo.com through Firefox alone, it does work. When I try to browse via cURL and Chrome, it does work. What gives? Why can’t I browse with cURL passing along my actual Firefox useragent info, but I CAN with Chrome’s useragent info?

And why can I browse normally from Firefox just not through cURL?

Have you tried spoofing the useragent information rather than using the browser?

Yes. As a matter of fact, after lots of testing, I found that the site loads via cURL on Firefox ONLY when I set the user-agent to NOT include the word “Firefox” in the user-agent string. So

curl_setopt($curl, CURLOPT_USERAGENT, 'blahanything Firefox blah');

does NOT work, but any user-agent string that does not have the string Firefox in it will work.

Very very strange. But the weirder thing is, I can visit yahoo.com using Firefox without cURL, which has the string Firefox in it… obviously.

I am really just trying to get it so

curl_setopt($curl, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);

works. And as of right now, using Firefox through cURL doesn’t work.

P.S. on Chrome and Safari through cURL, I set the user-agent string to be a valid Firefox string and POOF, all of a sudden it doesn’t work. So it definitely has to do with cURL + word “Firefox” in the user-agent string. Why?! I don’t know!

Is cURL outputting the user agent string exactly the same as your browser?

The reason I ask is that Yahoo is known for not liking some browsers, and if they can’t recognise a browser (e.g. cURL mangles the name or adds a line terminator), their configuration might refuse to serve you / cURL.

They don’t like SeaMonkey at all and refuse to allow it access to the new Yahoo.

100% sure that the user-agent strings are the same. The Firefox through cURL string is the same as the Firefox alone string.

This just makes no sense!

No, it doesn’t, does it?

Perhaps you could post the headers from both your Firefox and the cURL request for us to look at? By headers I mean both the client (request) and server (response) headers. There must be something causing this.

Thanks for your help thus far in trying to figure this out with me. Is grabbing the client and server headers possible to achieve without using a packet sniffer like Wireshark?

P.S.
user-agent string Firefox through cURL:

Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1

user-agent string Firefox only

Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
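
The two strings look identical when printed, but a stray carriage return or trailing space would not show up in a post. A minimal sketch of a byte-level check is below. The function name `describe_ua_difference` is made up for illustration, and the second comparison deliberately appends "\r\n" to simulate a corrupted value.

```php
<?php
// Hypothetical helper: reports why two user-agent strings differ, if they do.
function describe_ua_difference(string $a, string $b): string
{
    if ($a === $b) {
        return 'identical';
    }
    // Control characters (CR, LF, tab, ...) are invisible when echoed into a page.
    if (preg_match('/[\x00-\x1F\x7F]/', $a . $b)) {
        return 'control character present';
    }
    if (trim($a) === trim($b)) {
        return 'leading or trailing whitespace differs';
    }
    return 'visible text differs';
}

$browser = 'Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1';
echo describe_ua_difference($browser, $browser), "\n";
echo describe_ua_difference($browser, $browser . "\r\n"), "\n";
// bin2hex() exposes the raw trailing bytes when a closer look is needed.
echo bin2hex(substr($browser . "\r\n", -4)), "\n";
```

Comparing with `===` rather than by eye is the point here; the hex dump is only a fallback for when the comparison fails and the reason is not obvious.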

Yahoo seems to be ok as long as the Accept header is set:

$headers = array ("Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
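
Since the goal in this thread is to pass the visitor's own browser identity through cURL, the Accept header could be forwarded the same way as the user-agent. The following is only a sketch, assuming the script runs under a web server that populates `$_SERVER`. Both `build_forwarded_headers` and `fetch_as_visitor` are hypothetical names, and the fallback Accept value is simply the one quoted above.

```php
<?php
// Hypothetical helper: builds a CURLOPT_HTTPHEADER list from the visitor's request.
function build_forwarded_headers(array $server): array
{
    // Fallback is the Accept value quoted above, used when the visitor sent none.
    $defaultAccept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
    $accept = empty($server['HTTP_ACCEPT']) ? $defaultAccept : $server['HTTP_ACCEPT'];

    $headers = array('Accept: ' . $accept);
    if (!empty($server['HTTP_ACCEPT_LANGUAGE'])) {
        $headers[] = 'Accept-Language: ' . $server['HTTP_ACCEPT_LANGUAGE'];
    }
    // Strip CR/LF so a forwarded value cannot inject extra header lines.
    return array_map(function ($h) {
        return str_replace(array("\r", "\n"), '', $h);
    }, $headers);
}

// Hypothetical helper: fetches $url with the visitor's user-agent and Accept headers.
// Returns the response body, or false on failure (as curl_exec does).
function fetch_as_visitor(string $url, array $server)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, build_forwarded_headers($server));
    if (!empty($server['HTTP_USER_AGENT'])) {
        curl_setopt($ch, CURLOPT_USERAGENT, $server['HTTP_USER_AGENT']);
    }
    return curl_exec($ch);
}
```

Calling `fetch_as_visitor('http://www.yahoo.com/', $_SERVER)` from a page request would then send whatever Accept header the visitor's browser sent, falling back to the quoted value otherwise.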

You can view what headers are being sent through the curl request by setting:

curl_setopt($ch, CURLINFO_HEADER_OUT, 1);

then calling

print_r(curl_getinfo($ch, CURLINFO_HEADER_OUT));

after calling curl_exec()
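
To see both sides of the exchange without a packet sniffer, the request headers from CURLINFO_HEADER_OUT can be combined with CURLOPT_HEADER, which makes cURL include the server's response headers in the returned data. A rough sketch follows; `split_response` and `fetch_with_headers` are hypothetical helper names, not part of the cURL extension.

```php
<?php
// Hypothetical helper: splits a raw response fetched with CURLOPT_HEADER into
// header text and body, given the size reported by CURLINFO_HEADER_SIZE.
function split_response(string $raw, int $headerSize): array
{
    return array(
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    );
}

// Hypothetical helper: fetches $url and returns the request headers cURL sent
// together with the server's response headers and body.
function fetch_with_headers(string $url, string $userAgent): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);       // include response headers in the output
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);  // record the outgoing request headers
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);

    $raw = curl_exec($ch);
    if ($raw === false) {
        return array('error' => curl_error($ch));
    }
    $parts = split_response($raw, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
    $parts['request'] = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    return $parts;
}
```

Running `fetch_with_headers()` once with a Firefox user-agent and once with a Chrome one, then comparing the 'request' and 'headers' entries, should show what differs between the working and failing cases.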

That’s only the user-agent string. Re-read my previous request asking for both client AND server headers. Not header, headers.

This is great. Thank you very much. :)
So I still wonder why this wasn’t working before when cURLing w/ Firefox. Maybe when you cURL with Firefox as the user-agent it doesn’t have those Accept headers?? But with Chrome and Safari you do I guess. Weird.