cURL on site that redirects to disclaimer, updated

Hello,

This is my first ever post, so I apologize in advance if I am wrong in approach or etiquette, and thank you for any assistance despite my faults.
I know this topic was covered back in March.

However, I am looking for assistance. The accepted solution from that topic is no longer viable because the Water Survey of Canada page has been updated. I have tried every variation I can think of or find, but I admit I am proficient in neither cURL nor PHP.
Say, for example, I am trying to scrape data from http://wateroffice.ec.gc.ca/report/report_e.html?type=realTime&stn=08KA007 - how might I go about doing that and getting around the ever-present disclaimer, now that the website has been updated and the finely-crafted script by @rpkamp no longer works?

Thank you all very much!

See if the following works. The contents of the page should be in the $res variable. How you scrape it will be up to you.

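// Request the report page, sending the cookie the site looks for to skip the disclaimer.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://wateroffice.ec.gc.ca/report/report_e.html?type=realTime&stn=08KA007');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_COOKIE, 'disclaimer=agree'); // sent as the Cookie request header
$res = curl_exec($ch);
if ($res === false) {
    // curl_exec() returns false on failure; report the reason and stop
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);
echo $res;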
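
As for scraping `$res` once you have it: the following is only a minimal sketch, assuming the readings sit in ordinary HTML `<table>` rows, which I have not verified against the actual page. The function name `extract_table_rows` is made up for illustration; `DOMDocument` and `DOMXPath` are PHP's bundled DOM classes.

```php
<?php
// Hypothetical helper: collects the text of every table row in an HTML string.
// Assumes the data of interest is in <table> markup (not confirmed for this site).
function extract_table_rows(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        // Header and data cells that are direct children of this row, in document order.
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}
```

Calling `extract_table_rows($res)` would then give an array of rows, each an array of cell strings. If the page turns out to use a different layout, the XPath expression is the part to adjust.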

It appears that the site skips the disclaimer if it detects the presence of a disclaimer cookie, so you just need to make sure you send that cookie with the cURL request.
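
If you need this for several stations, the cookie step could be wrapped up along these lines. This is a sketch rather than a tested solution: the function names `build_cookie_string` and `fetch_with_cookies` are hypothetical, and only the `disclaimer=agree` cookie comes from what I observed above.

```php
<?php
// Hypothetical helper: turns ['name' => 'value'] pairs into the
// "name=value; name2=value2" string that CURLOPT_COOKIE expects.
function build_cookie_string(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: fetches a URL while sending the given cookies.
// Returns the response body, or throws if cURL reports a transport error.
function fetch_with_cookies(string $url, array $cookies): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_COOKIE, build_cookie_string($cookies));
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL error: ' . $error);
    }
    curl_close($ch);
    return $body;
}
```

Usage would look like `$res = fetch_with_cookies('http://wateroffice.ec.gc.ca/report/report_e.html?type=realTime&stn=08KA007', ['disclaimer' => 'agree']);`. If the site ever changes the cookie name or value again, only that array should need updating.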


Have to say I’m a bit disappointed that the site doesn’t make the data more easily available (in CSV form, for example) if it’s going to be publicly distributed data.

For those who actually care: normally I post the standard “Don’t scrape data without permission” warning, but this site specifically says the information MAY be copied and distributed as long as you’re not selling the data.

Thanks! This worked perfectly!!

Hello. I totally agree. I am only looking to get the river data for research purposes. USGS makes it exceptionally easy - but Canada does not seem as user-friendly. Thanks again for the support!
