Using curl_multi_init to fetch multiple pages in PHP

Hello

I wrote the following code to fetch multiple pages from this site: www.mobile.ir
The website has a mobile phone price category, and that category is split across 42 pages.
The code uses curl_multi_init and XPath to extract data from several pages of the category.

$ch1= curl_init ("http://www.mobile.ir/phones/prices.aspx");
$ch2 = curl_init ("http://www.mobile.ir/phones/prices.aspx?sort=date&dir=desc&brandid=0&terms=&duration=14&pagesize=50&price_from=-1&price_to=-1&provinceid=0&shopid=0&page=2");
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch1,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch1, CURLOPT_HEADER, 0);
curl_setopt($ch1, CURLOPT_ENCODING, 'UTF-8');
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch2,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch2, CURLOPT_HEADER, 0);
curl_setopt($ch2, CURLOPT_ENCODING, 'UTF-8');

$mh = curl_multi_init();

//add the two handles
curl_multi_add_handle($mh,$ch1);
curl_multi_add_handle($mh,$ch2);

$active = null;
//execute the handles
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);

while ($active && $mrc == CURLM_OK) {
if (curl_multi_select($mh) != -1) {
do {
$mrc = curl_multi_exec($mh, $active);
} while ($mrc == CURLM_CALL_MULTI_PERFORM);
}
}



$dom = new DOMDocument('1.0', 'utf-8');
libxml_use_internal_errors(true);
$dom->loadHTML($mrc);
libxml_clear_errors();
$xpath = new DOMXpath($dom);

$data = array();
$table_rows = $xpath->query('//table[@id="price_table"]/tbody/tr'); // target the row (the browser rendered <tbody>, but actually it really doesnt have one)

if($table_rows->length <= 0) { // exit if not found
echo 'no table rows found';
exit;
}
$s = "st\xECna";
foreach($table_rows as $tr) { // foreach row
$row = $tr->childNodes;
if($row->item(0)->tagName != '<tr class="carHeader"> <td>نام کارخانه </td> <td>نام خودرو </td> <td>قیمت نمایندگی (ریال) </td> <td>قیمت بازار (ریال) </td> </tr>') { // avoid headers
$data[] = array(
'Name' => trim($row->item(2)->nodeValue),
'Price' => trim($row->item(4)->nodeValue),

);
}
}

echo '<pre>';
print_r($data); 

But it doesn’t display any data …
When I use curl_init() to fetch a single page, it works without problems and displays the data (name and price), but when I use curl_multi_init nothing is displayed. I really need…

I’ve never used curl_multi_exec, but I’m pretty sure that this line is wrong:

$dom->loadHTML($mrc);

Your $mrc variable is the return value of curl_multi_exec, and if you check the official documentation, you’ll see that the return value of this function is

A cURL code defined in the cURL Predefined Constants.

So you’re passing a status constant to loadHTML, which expects a string of HTML.
Also, it doesn’t make sense to me that you’re fetching several pages but only load the HTML in one place. Shouldn’t you load the HTML in a loop as well, since you have multiple pages?
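
To illustrate the point about $mrc, here is a minimal sketch of how the page bodies could be collected instead. The function name fetch_all is hypothetical and not from the original post; the cURL calls themselves (curl_multi_exec, curl_multi_select, curl_multi_getcontent) are standard PHP functions. It assumes CURLOPT_RETURNTRANSFER is set on every handle, since curl_multi_getcontent only returns the body in that case.

```php
<?php
// Hypothetical helper (not from the original post): fetch several URLs
// in parallel and return their response bodies, keyed like the input.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];

    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        // Required so curl_multi_getcontent() can return the body later.
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_HEADER, false);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    // $status is only a CURLM_* status code, never the page content.
    $active = 0;
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh); // wait for activity on any handle
        }
    } while ($active && $status === CURLM_OK);

    // The HTML lives on each individual handle.
    $bodies = [];
    foreach ($handles as $key => $ch) {
        $bodies[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);

    return $bodies;
}
```

Each entry of the returned array can then be passed to loadHTML in its own loop iteration, one page at a time.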

I don’t know whether my approach to getting data from multiple pages is OK.

I can’t actually help you, because what you’re doing is in violation of that website’s terms of service, and thus, illegal.

What I will tell you is that your do…while loops overwrite $mrc repeatedly, and curl_multi_exec returns an int. So $dom->loadHTML($mrc) makes no sense; $mrc does not contain HTML.
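
As a rough sketch of what parsing one page at a time might look like: the helper name parse_price_rows, the sample HTML, and the assumption that the name and price sit in the first two cells are all made up for illustration, since the real column layout of the site is not shown in this thread. The sample strings stand in for page bodies obtained elsewhere, for example from curl_multi_getcontent.

```php
<?php
// Hypothetical helper: extract name/price pairs from one page of HTML.
// Assumes (for illustration only) that name and price are the first two <td> cells.
function parse_price_rows(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate malformed real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows = [];
    // "//tr[td]" matches data rows with or without a <tbody> and skips <th>-only headers.
    foreach ($xpath->query('//table[@id="price_table"]//tr[td]') as $tr) {
        $cells = $xpath->query('./td', $tr);
        if ($cells->length < 2) {
            continue;
        }
        $rows[] = [
            'Name'  => trim($cells->item(0)->textContent),
            'Price' => trim($cells->item(1)->textContent),
        ];
    }
    return $rows;
}

// Made-up sample pages standing in for the downloaded bodies.
$pages = [
    '<table id="price_table"><tr><th>Name</th><th>Price</th></tr>'
        . '<tr><td>Phone A</td><td>100</td></tr></table>',
    '<table id="price_table"><tr><td> Phone B </td><td>200</td></tr></table>',
];

// Parse every page in a loop instead of calling loadHTML() once.
$data = [];
foreach ($pages as $html) {
    $data = array_merge($data, parse_price_rows($html));
}
print_r($data);
```

Selecting the td elements directly avoids counting the whitespace text nodes that childNodes includes, which is one reason fixed indexes into childNodes can be fragile.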

Maybe he owns the website or has obtained permission. But you’re right, we should always point out that you need permission to do this!

(If he owned the website, he wouldn’t need to screen-scrape the data :wink: )