Hello,
We need code that reads the content of a website and returns the following information:
Title
Description
Keywords
I found and modified some code that does this for most websites, but it fails for some, such as www.uber.com.
Can you suggest how to change it so that it returns nothing (for example NULL) when it cannot find this information for a website, or when it cannot retrieve it within, say, 10 seconds?
The code developed so far is:
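// Fetch the raw HTML of $url with cURL; curl_exec() returns false on failure.
function file_get_contents_curl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow HTTP redirects
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}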
if (isset($_POST['url'])) {
    $url = $_POST['url'];
    $html = file_get_contents_curl($url);
    echo '<p>url: ' . htmlspecialchars($url) . '</p>';
    // parsing begins here:
    $doc = new DOMDocument();
    @$doc->loadHTML($html);
    $nodes = $doc->getElementsByTagName('title');
    // getElementsByTagName() always returns a list, so check that it is non-empty
    if ($nodes->length > 0) {
        $title = $nodes->item(0)->nodeValue;
    }
    $metas = $doc->getElementsByTagName('meta');
    foreach ($metas as $meta) {
        // attribute values may be capitalised differently, e.g. name="Description"
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description') {
            $description = $meta->getAttribute('content');
        }
        if ($name === 'keywords') {
            $keywords = $meta->getAttribute('content');
        }
    }
    if (isset($title)) {
        echo 'Title: ' . htmlspecialchars($title) . '<br/><br/>';
    }
    if (isset($description)) {
        echo 'Description: ' . htmlspecialchars($description) . '<br/><br/>';
    }
    if (isset($keywords)) {
        // split on commas and trim, so "a,b" and "a, b" both work
        $kw_list = array_map('trim', explode(',', $keywords));
        echo 'KWs are:<br>';
        foreach ($kw_list as $kw) {
            echo '<br>' . htmlspecialchars($kw);
        }
    }
}
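To make the request concrete, here is a minimal sketch of the behaviour I am after. It assumes that cURL's CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT options are an acceptable way to cap the request at 10 seconds. The function names fetch_html_or_null and extract_page_meta and the User-Agent string are hypothetical, made up for illustration. I have not confirmed that any of this fixes the www.uber.com case.

```php
<?php
// Hypothetical helper: fetch a page, giving up after $timeout seconds.
// Returns the HTML string, or null on any failure (timeout, error, empty body).
function fetch_html_or_null(string $url, int $timeout = 10): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // limit on connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // limit on the whole transfer
    // Assumption: some sites may reject requests that carry no User-Agent.
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (compatible; MetaFetcher/1.0)');
    $data = curl_exec($ch); // false on error or timeout
    // The handle is released when $ch goes out of scope.

    return (is_string($data) && $data !== '') ? $data : null;
}

// Hypothetical helper: pull title, description and keywords out of HTML.
// Returns null when there is no HTML or none of the three values is present.
function extract_page_meta(?string $html): ?array
{
    if ($html === null || trim($html) === '') {
        return null;
    }

    $previous = libxml_use_internal_errors(true); // keep parse warnings quiet
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $info = ['title' => null, 'description' => null, 'keywords' => null];

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $info['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if ($name === 'description' || $name === 'keywords') {
            $info[$name] = $meta->getAttribute('content');
        }
    }

    // Nothing found at all: report null.
    if ($info['title'] === null && $info['description'] === null && $info['keywords'] === null) {
        return null;
    }
    return $info;
}
```

The intended use would be $info = extract_page_meta(fetch_html_or_null($url)); followed by a check for $info === null before printing anything. A field that is missing from the page stays null inside the returned array.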
Thanks.
Dean