Noindex on dynamically generated pages?

I have spent several hours looking for a solution but so far have not come up with one.

I know there is a way to generate meta tags dynamically with PHP (most often title, description, keywords, etc.),
but is there a way to generate a robots noindex tag the same way?
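
For reference, the dynamic meta tags I mean look something like this (just a sketch; $pageTitle and $pageDesc are hypothetical values):

<?php
// Sketch: the usual dynamically generated meta tags.
// $pageTitle and $pageDesc stand in for values set per page.
$pageTitle = 'Example Page';
$pageDesc  = 'Example description.';
echo '<title>' . htmlspecialchars($pageTitle) . '</title>' . "\n";
echo '<meta name="description" content="' . htmlspecialchars($pageDesc) . '" />';
?>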

I have a PHP page that dynamically generates other pages, and I want it to pass the noindex tag to all the pages it generates.
I still want the links to be followed, but I don't want the pages themselves indexed.

Is this possible?

I've never done it myself, but off the top of my head you could maybe create a robots.php file, dynamically write the rules in it, and then use mod_rewrite to serve robots.php as robots.txt.
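
Something like this, perhaps (an untested sketch; assumes Apache with mod_rewrite, and the paths are just examples):

<?php
// robots.php -- emits robots rules as plain text. A rewrite rule in
// .htaccess is needed so requests for /robots.txt hit this script, e.g.:
//   RewriteEngine On
//   RewriteRule ^robots\.txt$ /robots.php [L]
header('Content-Type: text/plain');
echo "User-agent: *\n";
echo "Disallow: /my-link-page.php\n"; // hypothetical path to the link page
?>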

You could also add the HTML tag to each page:

<meta name="robots" content="noindex">

Why do you want the links to be followed if you don’t want the pages indexed though?

If it were as simple as putting the HTML tag in each page, I wouldn't be asking.
Only one page actually exists as a PHP file. This page dynamically generates the subsequent pages, which do not actually exist as individual files.

As for the why: they are pages of links from LinkMarket. Please don't tell me the pros and cons of pages that only contain links; it's not my call. I'm just the code monkey, and my opinions and suggestions are frequently ignored by the boss…

If you're not familiar with it, Link Market is a link exchange directory site. Each site linked to from these pages links back to the homepage of the site I'm working on.

Forgive me if I sound rude at all; it was not intentional.

Use header() and write out an X-Robots-Tag: noindex?
https://www.google.com/search?q=set+robots+noindex+via+header&oq=set+robots+noindex+via+header&aqs=chrome.0.57j0l3j62l2.7731&sourceid=chrome&ie=UTF-8&safe=active
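
In PHP, roughly (a minimal sketch; the call has to run before any output, HTML or otherwise, is sent):

<?php
// Send noindex as an HTTP response header instead of a meta tag.
// header() must be called before anything is output to the browser.
header('X-Robots-Tag: noindex', true);
?>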

I'm still pretty new with PHP and I want to make sure I'm implementing this correctly.
Here is the full source code of my-link-page.php, which generates said subpages:

(NOTE: the entirety of the page's code was generated for me by LinkMarket; I simply copied and pasted it. The header() call is the only thing I added myself.)


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title>My Link Page</title>
	<style type="text/css">
		body, table, td, tr, a {color: #333333; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px;}
		a{text-decoration: underline;}
		a:hover {color: #999999;}
		a:active {color: #999999;}
		a:visited{color: #666666;}
		.tblcel_results, .tblcel_sl_header {border-bottom-style: solid; border-bottom-width: 1px; }
		.url_and_date, .sl_url_and_date{font-style:italic;filter: alpha(opacity=50);}
		.description{filter: alpha(opacity=50);}
		.nav_numbers{padding: 4px;}
	</style>
</head>
<body>
	<div>
		<?php
		header("X-Robots-Tag: noindex", true);

		/*
		  Link Market Link Page Module
		  Copyright 2003 Link Market, All Rights Reserved.
		  LDMS CODE for: http://www.liveoutloudproductions.com/
		  WARNING: Do not change code below or your link page will not work!
		*/

		$user_id = "dh57bhX8M9Y=";

		$url = "http://api.linkmarket.com/mng_dir/get_links.php?user_id="
				.$user_id."&cid=".$_GET['cid']."&start=".$_GET['start'];

		echo GetLMDSContent($url);

		// Fetches the link page markup from the LinkMarket API over a raw
		// socket and returns the response body with the headers stripped.
		function GetLMDSContent($url)
		{
			$buffer = "";
			$urlArr = parse_url($url);
			if (isset($urlArr['query']))
			{
				$urlArr['query'] = "?".$urlArr['query'];
			}
			else
			{
				$urlArr['query'] = "";
			}

			$fp = fsockopen($urlArr['host'], 80, $errno, $errstr, 30);
			if (!$fp)
			{
				echo "$errstr ($errno)<br />\n";
			}
			else
			{
				$out  = "GET /".substr($urlArr['path'], 1).$urlArr['query']." HTTP/1.0\r\n";
				$out .= "Host: ".$urlArr['host']."\r\n";
				$out .= "Connection: Close\r\n\r\n";
				fwrite($fp, $out);
				while (!feof($fp))
				{
					$buffer .= fgets($fp, 128);
				}
				fclose($fp);
			}

			// Drop the HTTP response headers and return only the body.
			$buffer = strstr($buffer, "\r\n\r\n");

			return $buffer;
		}
		?>
	</div>
</body>
</html>

Come to think of it,

could I use this X-Robots-Tag (or the standard robots meta tag) in my robots.txt file for a specific subfolder on the server?

Say this links page was in its own folder on the server and I wanted to apply noindex to everything in that folder. Is this easier? Is this even possible?
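
Something like this in an .htaccess file in that folder is what I have in mind, if that is even valid (assuming Apache with mod_headers; I have not tested it):

# .htaccess in the links folder: send the noindex header with every
# response served from this folder (requires mod_headers)
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, follow"
</IfModule>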

PHP header() calls must come before any output, even before the doctype, or you will get a "headers already sent" warning and the header will not be set. If you want to apply noindex after you have started writing the HTML, you need to use the HTML meta tag instead, which applies the same noindex the header would have, just after the page has already started to be generated.

<?php
header("X-Robots-Tag: noindex", true);
 ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title>My Link Page</title>
	<style type="text/css">
		body, table, td, tr, a {color: #333333; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 11px;}
		a{text-decoration: underline;}
		a:hover {color: #999999;}
		a:active {color: #999999;}
		a:visited{color: #666666;}
		.tblcel_results, .tblcel_sl_header {border-bottom-style: solid; border-bottom-width: 1px; }
		.url_and_date, .sl_url_and_date{font-style:italic;filter: alpha(opacity=50);}
		.description{filter: alpha(opacity=50);}
		.nav_numbers{padding: 4px;}
	</style>
</head>
<body>
	<div>

		<?php
		/*
		  Link Market Link Page Module
		  Copyright 2003 Link Market, All Rights Reserved.
		  LDMS CODE for: http://www.liveoutloudproductions.com/
		  WARNING: Do not change code below or your link page will not work!
		*/

		$user_id = "dh57bhX8M9Y=";

		$url = "http://api.linkmarket.com/mng_dir/get_links.php?user_id="
				.$user_id."&cid=".$_GET['cid']."&start=".$_GET['start'];

		echo GetLMDSContent($url);

		// Fetches the link page markup from the LinkMarket API over a raw
		// socket and returns the response body with the headers stripped.
		function GetLMDSContent($url)
		{
			$buffer = "";
			$urlArr = parse_url($url);
			if (isset($urlArr['query']))
			{
				$urlArr['query'] = "?".$urlArr['query'];
			}
			else
			{
				$urlArr['query'] = "";
			}

			$fp = fsockopen($urlArr['host'], 80, $errno, $errstr, 30);
			if (!$fp)
			{
				echo "$errstr ($errno)<br />\n";
			}
			else
			{
				$out  = "GET /".substr($urlArr['path'], 1).$urlArr['query']." HTTP/1.0\r\n";
				$out .= "Host: ".$urlArr['host']."\r\n";
				$out .= "Connection: Close\r\n\r\n";
				fwrite($fp, $out);
				while (!feof($fp))
				{
					$buffer .= fgets($fp, 128);
				}
				fclose($fp);
			}

			// Drop the HTTP response headers and return only the body.
			$buffer = strstr($buffer, "\r\n\r\n");

			return $buffer;
		}
		?>
	</div>
</body>
</html>

If the pages you don't want indexed are all in one folder, then why not just add a rule to your site's robots.txt denying the search engines access to that entire folder? (robots.txt itself lives in the site root; /your-links-folder/ below is a placeholder.)

User-agent: *
Disallow: /your-links-folder/

I thought about that, but will that still allow the links to be crawled? If not, it kind of defeats the purpose…

Also, if I use the X-Robots-Tag with a PHP header, how do I ensure that the generated pages will pull the header?
It seems like just putting the header in, above everything else, will only apply it to the page it is on…

Sorry if I'm asking stupid questions.

From what I've read, noindex simply tells the bot not to index your page. It will still crawl it and follow any links on it unless you use noindex, nofollow.

This is the way I dynamically set the meta robots tag (a single included header used on over 3,000 pages):



<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
  <title>My Link Page</title>
  <link type="text/css"  rel="stylesheet" href="http://localhost/assets/css/vo13-scrn-nor.css" />
  <?php
    # DEBUG to display a list of $_SERVER Parameters
    #echo '<pre>'; print_r($_SERVER); echo '</pre>';

    # Set default robotsContent and test for particular URIs
       $robotsContent = 'index, follow';
       if( '/E-bay-Help.html' == $_SERVER['REQUEST_URI'] ):
         $robotsContent = 'noindex, follow';
       endif;  
       echo '<meta name="robots" content="' .$robotsContent .'" />';
  ?>
</head>
<body>
   <!-- Blurb goes here -->
</body>
</html>


The if() statement should be tailored to suit your requirements; using an array and in_array() caters for multiple URIs.
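
For instance, something along these lines (a sketch; the listed paths are only hypothetical examples):

<?php
// Sketch: noindex several URIs at once instead of a single if() test.
$noindexUris = array('/E-bay-Help.html', '/my-link-page.php');
$robotsContent = in_array($_SERVER['REQUEST_URI'], $noindexUris, true)
               ? 'noindex, follow'
               : 'index, follow';
echo '<meta name="robots" content="' . $robotsContent . '" />';
?>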