Noindex on dynamically generated pages?

I have spent several hours looking for a solution but so far have not come up with one.

I know there is a way to generate meta tags dynamically with PHP (most often title, description, keywords, etc.),
but is there a way to generate a robots noindex tag the same way?
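
For reference, the dynamic meta tags I mean look something like this (just a sketch; $pageTitle and $pageDesc are hypothetical values):

<?php
// Sketch: the usual dynamically generated meta tags.
// $pageTitle and $pageDesc stand in for values set per page.
$pageTitle = 'Example Page';
$pageDesc  = 'Example description.';
echo '<title>' . htmlspecialchars($pageTitle) . '</title>' . "\n";
echo '<meta name="description" content="' . htmlspecialchars($pageDesc) . '" />';
?>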

I have a PHP page that dynamically generates other pages, and I want it to pass the noindex tag to all the pages it generates.
I still want the links to be followed, but I don't want the pages themselves indexed.

Is this possible?

I've never done it myself, but off the top of my head you could maybe create a robots.php file, dynamically write the rules in it, and then use mod_rewrite to serve robots.php as robots.txt.
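
Something like this, perhaps (an untested sketch; assumes Apache with mod_rewrite, and the paths are just examples):

<?php
// robots.php -- emits robots rules as plain text. A rewrite rule in
// .htaccess is needed so requests for /robots.txt hit this script, e.g.:
//   RewriteEngine On
//   RewriteRule ^robots\.txt$ /robots.php [L]
header('Content-Type: text/plain');
echo "User-agent: *\n";
echo "Disallow: /my-link-page.php\n"; // hypothetical path to the link page
?>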

You could also add the HTML tag to each page:

<meta name="robots" content="noindex">

Why do you want the links to be followed if you don’t want the pages indexed though?

If it were as simple as putting the HTML tag in each page, I wouldn't be asking.
Only one page actually exists as a PHP file. This page dynamically generates the subsequent pages, which do not actually exist as individual files.

As for the why: they are pages of links from LinkMarket. Please don't tell me the pros and cons of pages that only contain links; it's not my call. I'm just the code monkey, and my opinions and suggestions are frequently ignored by the boss…

If you're not familiar with it, Link Market is a link exchange directory site. Each site linked to from these pages links back to the homepage of the site I'm working on.

Forgive me if I sound rude at all; it was not intentional.

Use header() and write out an X-Robots-Tag: noindex?
https://www.google.com/search?q=set+robots+noindex+via+header&oq=set+robots+noindex+via+header&aqs=chrome.0.57j0l3j62l2.7731&sourceid=chrome&ie=UTF-8&safe=active
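
In PHP, roughly (a minimal sketch; the call has to run before any output, HTML or otherwise, is sent):

<?php
// Send noindex as an HTTP response header instead of a meta tag.
// header() must be called before anything is output to the browser.
header('X-Robots-Tag: noindex', true);
?>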

I'm still pretty new with PHP and I want to make sure I'm implementing this correctly.
Here is the full source code of my-link-page.php, which generates said subpages:

(NOTE: the entirety of the page's code was generated for me by LinkMarket; I simply copied and pasted it. The header() call is the only thing I added myself.)


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title>My Link Page</title>
	<style type="text/css">
		body, table, td, tr, a {color: #333333; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px;}
		a{text-decoration: underline;}
		a:hover {color: #999999;}
		a:active {color: #999999;}
		a:visited{color: #666666;}
		.tblcel_results, .tblcel_sl_header {border-bottom-style: solid; border-bottom-width: 1px; }
		.url_and_date, .sl_url_and_date{font-style:italic;filter: alpha(opacity=50);}
		.description{filter: alpha(opacity=50);}
		.nav_numbers{padding: 4px;}
	</style>
</head>
<body>
	<div>
		<?php
		header("X-Robots-Tag: noindex", true);

		/*
		  Link Market Link Page Module
		  Copyright 2003 Link Market, All Rights Reserved.
		  LDMS CODE for: http://www.liveoutloudproductions.com/
		  WARNING: Do not change code below or your link page will not work!
		*/

		$user_id = "dh57bhX8M9Y=";

		$url = "http://api.linkmarket.com/mng_dir/get_links.php?user_id="
				.$user_id."&cid=".$_GET['cid']."&start=".$_GET['start'];

		echo GetLMDSContent($url);

		// Fetches the link page markup from the LinkMarket API over a raw
		// socket and returns the response body with the headers stripped.
		function GetLMDSContent($url)
		{
			$buffer = "";
			$urlArr = parse_url($url);
			if (isset($urlArr['query']))
			{
				$urlArr['query'] = "?".$urlArr['query'];
			}
			else
			{
				$urlArr['query'] = "";
			}

			$fp = fsockopen($urlArr['host'], 80, $errno, $errstr, 30);
			if (!$fp)
			{
				echo "$errstr ($errno)<br />\n";
			}
			else
			{
				$out  = "GET /".substr($urlArr['path'], 1).$urlArr['query']." HTTP/1.0\r\n";
				$out .= "Host: ".$urlArr['host']."\r\n";
				$out .= "Connection: Close\r\n\r\n";
				fwrite($fp, $out);
				while (!feof($fp))
				{
					$buffer .= fgets($fp, 128);
				}
				fclose($fp);
			}

			// Drop the HTTP response headers and return only the body.
			$buffer = strstr($buffer, "\r\n\r\n");

			return $buffer;
		}
		?>
	</div>
</body>
</html>

Come to think of it,

could I use this X-Robots-Tag (or the standard robots meta tag) in my robots.txt file for a specific subfolder on the server?

Say this links page was in its own folder on the server and I wanted to apply noindex to everything in that folder. Is this easier? Is this even possible?
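
Something like this in an .htaccess file in that folder is what I have in mind, if that is even valid (assuming Apache with mod_headers; I have not tested it):

# .htaccess in the links folder: send the noindex header with every
# response served from this folder (requires mod_headers)
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, follow"
</IfModule>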

PHP header() calls must come before any output, even before the doctype, or you will get a "headers already sent" warning and the header will not be set. If you want to apply noindex after you have started writing the HTML, you need to use the HTML meta tag instead, which applies the same noindex the header would have, just after the page has already started to be generated.

<?php
header("X-Robots-Tag: noindex", true);
 ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title>My Link Page</title>
	<style type="text/css">
		body, table, td, tr, a {color: #333333; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px;}
		a{text-decoration: underline;}
		a:hover {color: #999999;}
		a:active {color: #999999;}
		a:visited{color: #666666;}
		.tblcel_results, .tblcel_sl_header {border-bottom-style: solid; border-bottom-width: 1px; }
		.url_and_date, .sl_url_and_date{font-style:italic;filter: alpha(opacity=50);}
		.description{filter: alpha(opacity=50);}
		.nav_numbers{padding: 4px;}
	</style>
</head>
<body>
	<div>

		<?php
		/*
		  Link Market Link Page Module
		  Copyright 2003 Link Market, All Rights Reserved.
		  LDMS CODE for: http://www.liveoutloudproductions.com/
		  WARNING: Do not change code below or your link page will not work!
		*/

		$user_id = "dh57bhX8M9Y=";

		$url = "http://api.linkmarket.com/mng_dir/get_links.php?user_id="
				.$user_id."&cid=".$_GET['cid']."&start=".$_GET['start'];

		echo GetLMDSContent($url);

		// Fetches the link page markup from the LinkMarket API over a raw
		// socket and returns the response body with the headers stripped.
		function GetLMDSContent($url)
		{
			$buffer = "";
			$urlArr = parse_url($url);
			if (isset($urlArr['query']))
			{
				$urlArr['query'] = "?".$urlArr['query'];
			}
			else
			{
				$urlArr['query'] = "";
			}

			$fp = fsockopen($urlArr['host'], 80, $errno, $errstr, 30);
			if (!$fp)
			{
				echo "$errstr ($errno)<br />\n";
			}
			else
			{
				$out  = "GET /".substr($urlArr['path'], 1).$urlArr['query']." HTTP/1.0\r\n";
				$out .= "Host: ".$urlArr['host']."\r\n";
				$out .= "Connection: Close\r\n\r\n";
				fwrite($fp, $out);
				while (!feof($fp))
				{
					$buffer .= fgets($fp, 128);
				}
				fclose($fp);
			}

			// Drop the HTTP response headers and return only the body.
			$buffer = strstr($buffer, "\r\n\r\n");

			return $buffer;
		}
		?>
	</div>
</body>
</html>

If the pages you don't want indexed are all in one folder, then why not just add a rule to your site's robots.txt denying the search engines access to that entire folder? (robots.txt itself lives in the site root; /your-links-folder/ below is a placeholder.)

User-agent: *
Disallow: /your-links-folder/

I thought about that, but will that still allow the links to be crawled? If not, it kind of defeats the purpose…

Also, if I use the X-Robots-Tag with a PHP header, how do I ensure that the generated pages will pull the header?
It seems like just putting the header in, above everything else, will only apply it to the page it is on…

Sorry if I'm asking stupid questions.

From what I've read, noindex simply tells the bot not to index your page. It will still crawl it and follow any links on it unless you use noindex, nofollow.

This is the way I dynamically set the meta robots tag (a single included header used on over 3,000 pages):



<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
  <title>My Link Page</title>
  <link type="text/css"  rel="stylesheet" href="http://localhost/assets/css/vo13-scrn-nor.css" />
  <?php
    # DEBUG to display a list of $_SERVER Parameters
    #echo '<pre>'; print_r($_SERVER); echo '</pre>';

    # Set default robotsContent and test for particular URIs
       $robotsContent = 'index, follow';
       if( '/E-bay-Help.html' == $_SERVER['REQUEST_URI'] ):
         $robotsContent = 'noindex, follow';
       endif;  
       echo '<meta name="robots" content="' .$robotsContent .'" />';
  ?>
</head>
<body>
   <!-- Blurb goes here -->
</body>
</html>


The if() statement should be tailored to suit your requirements; using an array and in_array() caters for multiple URIs.
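
For instance, something along these lines (a sketch; the listed paths are only hypothetical examples):

<?php
// Sketch: noindex several URIs at once instead of a single if() test.
$noindexUris = array('/E-bay-Help.html', '/my-link-page.php');
$robotsContent = in_array($_SERVER['REQUEST_URI'], $noindexUris, true)
               ? 'noindex, follow'
               : 'index, follow';
echo '<meta name="robots" content="' . $robotsContent . '" />';
?>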