Is It Possible To Just Show a Section of a Page From Another Website?

Hi,
I’m working on a small information website for the business I work for (sales and support).

Our parent company has a main website for the worldwide business, with a product page showing the complete product line.

My manager wants our website to show the product line from our parent company’s website, but he wants the product list to appear as if it’s part of our own site.

He doesn’t want the parent company’s header or sidebars to show up. He just wants the product line, pulled from their product page, to look like it lives on our own page.

The only way I’ve come up with is an iframe, but then the entire web page shows up, which doesn’t do what they’re looking for, and it looks pretty bad.

Is this possible, or should I just link to our parent company’s product page and have it open in a new tab or browser window?

I know enough HTML & CSS to be dangerous :smile:

Thanks for any help or suggestions


You’d need to do this server-side. In PHP, that’s DOMXPath. You basically read the HTML, narrow it down to the part you want, and go from there.

http://php.net/manual/en/class.domxpath.php
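For illustration, here’s a minimal sketch of the idea (not the script mentioned below). The URL and the `products` id are placeholders; you’d adjust them to the parent site’s real markup:

<?php
    // Fetch the remote page (allow_url_fopen must be enabled, or use cURL instead)
    $html = file_get_contents('http://parent-company.example/products');

    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML rarely parses cleanly
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Grab just the container that holds the product list
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="products"]');

    if ($nodes->length > 0) {
        // saveHTML() with a node argument returns only that fragment
        echo $doc->saveHTML($nodes->item(0));
    }
?>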

Edit: I’m not actually sure whether permissions come into play here, it being a different website. Just something I thought about.

I have an example script of DOMXPath from something I previously coded, if you want to see it in action.

Edit 2: I’m guessing permissions won’t be an issue, because you’re not doing anything other than reading the page source, which is readily accessible to everyone.

I say this is a good option.

The downside, though, is that if they change their markup, you’d have to follow suit.


You can’t do that with CSS and HTML alone. You can do it with AJAX plus a proxy to work around same-origin policy issues. The best way is server-side, via a scrape, or better yet a feed and/or service provided by the other site.

Just to add to this: the reason it needs to be done server-side is the browser’s same-origin policy, which stops JavaScript on your page from reading a response fetched from another domain.

But if you control the site being pulled from, you can configure it to allow this. How it’s done escapes me right now, though.
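For what it’s worth, the usual mechanism is CORS: the site being pulled sends an Access-Control-Allow-Origin response header naming the origins allowed to read it via AJAX. A minimal sketch on the parent site’s side (the origin below is a placeholder):

<?php
    // On the PARENT site: let one specific origin read this response via AJAX
    header('Access-Control-Allow-Origin: http://your-subsidiary-site.example');

    // ... then output just the product-list fragment ...
?>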


Pain in the @ss. It is so much easier to use a simple server-side proxy, especially if you’re not on a dedicated host or a cloud server where you can control the server settings. However, you also have to consider whether it’s appropriate to make your site reliant on consuming the other one’s resources on every request. In most cases that isn’t appropriate or reliable, so if possible you’re better off pulling the data into your own site and running a cron job that syncs against the source site on a schedule, independent of the website itself.
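As a rough sketch of that cron approach (the URL and paths here are placeholders):

<?php
    // sync-products.php -- run from cron, not on page requests.
    // Example crontab entry (hourly):  0 * * * * php /path/to/sync-products.php
    $html = file_get_contents('http://parent-company.example/products');

    if ($html !== false && strlen($html) > 0) {
        // Write to a temp file first, then rename, so readers never see a half-written file
        $tmp = '/var/www/cache/products.html.tmp';
        file_put_contents($tmp, $html);
        rename($tmp, '/var/www/cache/products.html');
    }
?>

Your product page then just includes the cached file, so visitors never trigger a request to the parent site.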

I actually did this a few months ago on a friend’s site to demo a better UI. I just wrote a small 20-line PHP script that pulled the page in on request with cURL and echoed the HTML body from my own domain. Then I fed that into a jQuery variable and could manipulate it like a normal DOM, instead of doing complex scraping.

<?php
    header('Content-Type: application/json');
    header('Access-Control-Allow-Origin: *');

    // Basic cURL wrapper: fetch a URL and return the response body
    function curl($url) {
        // Assigning cURL options to an array
        $options = array(
            CURLOPT_RETURNTRANSFER => true,  // Return the page data instead of printing it
            CURLOPT_FOLLOWLOCATION => true,  // Follow 'Location:' redirect headers
            CURLOPT_AUTOREFERER    => true,  // Set the referer automatically when following redirects
            CURLOPT_CONNECTTIMEOUT => 120,   // Seconds to wait while connecting
            CURLOPT_TIMEOUT        => 120,   // Maximum seconds the whole request may take
            CURLOPT_MAXREDIRS      => 10,    // Maximum number of redirects to follow
            CURLOPT_USERAGENT      => "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.99 Safari/537.36",
            CURLOPT_URL            => $url,
        );

        $ch = curl_init();                 // Initialising cURL
        curl_setopt_array($ch, $options);  // Applying the options above
        $data = curl_exec($ch);            // Executing the request
        curl_close($ch);                   // Closing the handle
        return $data;
    }

    // Return the substring of $data between $start and $end (both markers excluded).
    // Note: assumes a bare "<body>" tag; a <body> with attributes won't match.
    function scrape_between($data, $start, $end) {
        $data = stristr($data, $start);         // Stripping all data from before $start
        $data = substr($data, strlen($start));  // Stripping $start itself
        $stop = stripos($data, $end);           // Getting the position of $end
        $data = substr($data, 0, $stop);        // Stripping $end and everything after it
        return $data;
    }

    $page1 = curl("http://site.com/page1");
    $page2 = curl("http://site.com/page2");

    $start = "<body>";
    $end   = "</body>";

    // Wrap the combined fragments in an object so the client can read data.json
    echo json_encode(array(
        'json' => scrape_between($page1, $start, $end) . scrape_between($page2, $start, $end)
    ));
?>

Client side:

// Fetch the proxied JSON, then treat the scraped markup like a normal DOM
$.getJSON('url.php', function(data) {
    var remotePage = $(data.json); // data.json holds the scraped HTML string
    alert(remotePage.find('.some-element').text());
});

Nothing fancy, but it worked well. I mostly copied the PHP from here: WEB SCRAPING WITH PHP & CURL.

Something like that will work as long as performance and server load aren’t an issue. On a high-traffic site, though, it isn’t a very good solution, since one hit to that page indirectly triggers two more requests. Not to mention the wait for the content from a UI standpoint, or SEO if that’s a concern.

Well, it’s not optimal by any means, but it works.

The optimal solution would be to build a RESTful API on the primary site. But sometimes you gotta work with what you’ve got.
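For a sense of scale, such an endpoint on the parent site could be tiny. A hypothetical sketch (the connection details, table, and column names are all made up):

<?php
    // On the PARENT site: a minimal JSON endpoint serving the product list
    header('Content-Type: application/json');

    $pdo = new PDO('mysql:host=localhost;dbname=catalog', 'user', 'pass');
    $rows = $pdo->query('SELECT name, description, price FROM products')
                ->fetchAll(PDO::FETCH_ASSOC);

    echo json_encode($rows);
?>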


Seeing as the info is coming from your own parent company, a feed or service provided by them is what I’d push for.


Yup. Definitely agree. I probably should have said it sooner. :slight_smile:

Since the product information is coming from the parent company’s website, instead of a RESTful API couldn’t you just get access to the database that serves the product information and query it directly?

Or am I completely missing something here?

V/r,

:slight_smile:


Assuming that scraping is the only option to get the data, I’d have PHP cache it and output it directly into the target page when requested. That way the extra overhead on both servers will be minimal.
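A minimal sketch of that caching idea (the cache path, URL, and one-hour TTL are placeholders):

<?php
    // Serve a cached copy of the scraped page; refresh at most once per hour
    $cache = __DIR__ . '/cache/products.html';

    if (!file_exists($cache) || time() - filemtime($cache) > 3600) {
        $html = file_get_contents('http://parent-company.example/products');
        if ($html !== false && strlen($html) > 0) {
            file_put_contents($cache, $html);   // only overwrite on a successful fetch
        }
    }

    if (file_exists($cache)) {
        readfile($cache);   // output whatever copy we have
    }
?>

Only overwriting on a successful fetch also means a hiccup at the parent site won’t blank out your page.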

The biggest downside to screen-scraping is that if the markup on the source page changes, the scraping breaks.

V/r,

:slight_smile:

Here is a pure CSS approach (based on CSS3 transforms): clip the iframe inside a fixed-size wrapper and shift it with a transform so only the section you want is visible.

It works in IE9+ and every other major browser.
It requires that you know the “offset” where your desired content starts, and that offset shouldn’t change.
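A sketch of the idea (the 300px offset and the wrapper size are placeholders you’d measure against the real page):

<!-- The wrapper clips the iframe; the translate shifts the header out of view -->
<div style="width: 800px; height: 600px; overflow: hidden;">
    <iframe src="http://parent-company.example/products"
            style="width: 800px; height: 2000px; border: 0;
                   -ms-transform: translateY(-300px);  /* IE9 */
                   transform: translateY(-300px);">
    </iframe>
</div>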

Is it possible for the parent company to produce a restricted web page, omitting the header, footer, sidebar, etc., so that you can include it within your iframe? I would have thought it would be beneficial to the parent site if you could access just the product-line data.
