Hi,
I’m working on a small information website for the business I work for. (Sales and Support)
Our Parent Company has a main website for the world wide business, with a Product page showing the complete product line.
My Manager said he wants our website to show the Product line from our Parent company’s website, but wants the Product list to appear like it’s on our website.
He does not want the header or side bars to show up from our Parent company’s website. He just wants the Product line to appear like it is on our own page on our website, using their product page.
The only way I’ve come up with is using an iframe, but the entire web page shows up, which doesn’t work for what they’re looking for, and it looks pretty bad.
Is this possible to do, or should I just link to our parent company’s product page and have that page open in a new tab or new browser window etc.?
Edit - I’m not actually sure whether permissions come into play here, it being a different website. Just something I thought about.
I have an example script of DOMXPath from something I previously coded, if you want to see it in action.
Edit2 - I’m guessing permissions won’t be an issue, since all you’re doing is reading the page source, which is readily accessible to everyone anyway.
I say this is a good option.
The downside, though, is that if they change the markup, you’d have to follow suit.
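For reference, a DOMXPath version might look something like this. This is a minimal sketch: the sample HTML and the `product-item` class are placeholders, not the parent site’s real markup, and in practice `$html` would come from a cURL fetch.

```php
<?php
// Minimal DOMXPath sketch. In practice $html would come from curl();
// here it's an inline sample so the snippet runs standalone.
$html = '<html><body>
  <div class="product-item">Widget A</div>
  <div class="product-item">Widget B</div>
</body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // tolerate real-world (invalid) HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Grab every div whose class attribute contains "product-item"
$nodes = $xpath->query('//div[contains(@class, "product-item")]');

$names = [];
foreach ($nodes as $node) {
    $names[] = trim($node->textContent);
}
echo implode(', ', $names); // Widget A, Widget B
```

The advantage over string scraping is that an XPath query keeps working through cosmetic markup changes, as long as the class names stay put.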
You can’t do that with CSS and HTML alone. You can do it with AJAX plus a proxy to work around same-origin policy issues. The best way to do it is server-side, via a scrape or, better yet, a feed or service provided by the other site.
Pain in the @ss. It’s so much easier to use a simple server-side proxy, especially if you’re not on a dedicated host or a cloud where you can control the server settings. However, you also have to consider whether it’s appropriate to make your site reliant on consuming the other site’s resources on every request. In most cases that’s not appropriate or reliable, so if possible you’re better off pulling/importing the data into your own site and running a cron job that syncs against the source site on a schedule, independent of the website itself.
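The cron approach could be as small as a script like this, run on a schedule rather than on page requests. It’s only a sketch: the source URL and cache path are placeholders you’d swap for your own.

```php
<?php
// sync-products.php -- intended to run from cron, NOT on page requests,
// so the public site only ever reads a locally cached copy.
// The $source URL and $cache path are placeholders.

function sync_products(string $source, string $cache): bool {
    $html = @file_get_contents($source);
    if ($html === false) {
        // Fetch failed: keep the last good copy rather than clobbering it
        error_log('product sync failed, keeping stale cache');
        return false;
    }
    // Write to a temp file first, then rename, so readers never see a partial file
    $tmp = $cache . '.tmp';
    file_put_contents($tmp, $html);
    return rename($tmp, $cache);
}
```

You’d call it as `sync_products('https://parent-company.example/products', '/var/www/cache/products.html')` from a crontab entry such as `0 * * * * php /path/to/sync-products.php` for an hourly refresh; the page itself then just reads the cached file, with zero remote requests per visitor.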
I actually did this a few months ago on a friend’s site to demo a better UI for his site. I just wrote a small 20-line PHP script that pulled the page in on request with cURL and echoed the HTML body on my own domain. Then I fed that into a jQuery object and was able to manipulate it like a normal DOM, instead of doing complex string scraping.
<?php
header('Content-Type: application/json');
header('Access-Control-Allow-Origin: *');

// Defining the basic cURL function
function curl($url) {
    // Assigning cURL options to an array
    $options = array(
        CURLOPT_RETURNTRANSFER => TRUE, // Return the webpage data instead of printing it
        CURLOPT_FOLLOWLOCATION => TRUE, // Follow 'Location' HTTP headers
        CURLOPT_AUTOREFERER    => TRUE, // Automatically set the referer when following redirects
        CURLOPT_CONNECTTIMEOUT => 120,  // Seconds allowed for the connection attempt
        CURLOPT_TIMEOUT        => 120,  // Maximum number of seconds cURL may run
        CURLOPT_MAXREDIRS      => 10,   // Maximum number of redirects to follow
        CURLOPT_USERAGENT      => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36", // Setting the user agent
        CURLOPT_URL            => $url, // The URL passed into the function
    );

    $ch = curl_init(); // Initialising cURL
    curl_setopt_array($ch, $options); // Setting cURL's options from the array above
    $data = curl_exec($ch); // Executing the request and capturing the returned data
    curl_close($ch); // Closing cURL
    return $data; // Returning the data from the function
}

function scrape_between($data, $start, $end) {
    $data = stristr($data, $start); // Stripping all data from before $start
    if ($data === FALSE) {
        return ''; // $start not found: nothing to scrape
    }
    $data = substr($data, strlen($start)); // Stripping $start itself
    $stop = stripos($data, $end); // Getting the position of $end
    if ($stop === FALSE) {
        return ''; // $end not found: nothing to scrape
    }
    return substr($data, 0, $stop); // Returning only the data before $end
}

$page1 = curl("site.com/page1");
$page2 = curl("site.com/page2");
$start = "<body>";
$end   = "</body>";

echo json_encode(scrape_between($page1, $start, $end) . scrape_between($page2, $start, $end)); //. scrape_between($page3, $start, $end));
?>
Client
$.getJSON('url.php', function(data) {
    // json_encode() wrapped the HTML in a plain JSON string,
    // so data is already that string; wrap it in jQuery directly
    var remotePage = $(data);
    alert(remotePage.find('.some-element').text());
});
Something like that will work so long as performance and server load aren’t an issue. However, if you have a high-traffic site it isn’t a very good solution, since one hit to that page indirectly triggers two more requests. Not to mention the user is left waiting for the content to arrive, and the scraped markup is invisible to search engines if SEO is a concern.
Since the product information comes from the parent company’s website, instead of a RESTful API couldn’t you just get access to the database that serves the product information and query it directly?
Assuming that scraping is the only option to get the data, I’d have PHP cache it and output it directly into the target page when requested. That way the extra overhead on both servers will be minimal.
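That caching could be as simple as a file with a TTL check. A rough sketch, where the cache path and one-hour TTL are arbitrary choices and `$fetch` stands in for the cURL/scrape routine above:

```php
<?php
// Rough sketch of request-time file caching for scraped output.
// The cache path and one-hour TTL below are arbitrary choices.

function cached_products(string $cacheFile, int $ttl, callable $fetch): string {
    // Serve the cached copy while it's still fresh
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        return file_get_contents($cacheFile);
    }
    // Otherwise re-scrape once and refresh the cache
    $html = $fetch();
    file_put_contents($cacheFile, $html);
    return $html;
}

// Usage: $fetch would wrap the curl()/scrape_between() routine from above;
// here it's a stand-in so the snippet is self-contained.
echo cached_products('/tmp/products-cache.html', 3600, function () {
    return '<ul class="products">…</ul>'; // stand-in for the real scrape
});
```

With that in place, the parent site only sees one request per hour regardless of your own traffic.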
Is it possible for the parent company to produce a restricted web page omitting the header, footer, sidebar, etc., so that you can include that page within your iframe? I would have thought it would be beneficial to the parent site too if you could access just the product line data.
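If the parent company can serve such a stripped-down page (the URL and its `layout=bare` parameter here are hypothetical), the embed side stays trivial:

```html
<!-- Hypothetical bare products page served by the parent site -->
<iframe src="https://parent-company.example/products?layout=bare"
        title="Product line"
        style="width:100%; border:0; min-height:800px"></iframe>
```

That avoids scraping entirely, at the cost of asking the parent company to maintain the extra template.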