Best method to not dilute search indexing

Hi,

I have two domains, a.com and b.com. Both domains route to the same site. The a.com domain is the one that search engines should index. There may be many people who have the b.com domain.

What is the least resource-intensive way to make this happen? The options I see are domain maps, redirects, DNS web forwarding, or just sticking with two Apache virtual servers and having them answer when they’re called.

Currently there is not a deep page structure; however, there will be one in January, so the method I use should scale well as the page structure grows.

I do have access to the Apache configuration file so a rewrite map is possible.

Your thoughts are appreciated!

Regards,
Steve

Hi Steve!

I’m not sure what you’re asking. Are a and b co-located and serving the same files? If so, use a mod_rewrite redirection to get rid of the (secondary) domain (eventually, you can use it for something else) but, as you’re aware, having the same content served by two domains is punished by Google (et al).

Let me know if you need help with the trivial mod_rewrite to do this for you.

On the other hand, did I miss the point of the question and head off into left field? :shifty:

Regards,

DK

Hi David,

No you did not miss the point.

I simply did not know what would be the best long-term way to handle this. The a and b files are the same on the same server, and you are right that I want to do this so that no Google punishment ensues.

I’ve got the rewrite down, but I won’t hesitate to ask for your help should I run into a problem.

Many thanks!
Steve

Hi,

Here is my rewrite so far.


RewriteEngine on
# Match any alpha-numeric character using .php extension
RewriteRule ^([a-z0-9]+)$ $1.php 
# Match either domain
RewriteCond %{HTTP_HOST} !^a\.com$ [NC] [OR]
RewriteCond %{HTTP_HOST} !^b\.com$ [NC] 
# Permanently redirect either domain to a non www version
RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]

It works the way it is written; however, I would like to tighten it up against a few things and have been running into infinite loops in doing so.

In the browser, if I type http://b.com/home it redirects to http://a.com/home.php instead of http://a.com/home. If, however, I type http://b.com/contact I get http://a.com/contact, which is correct. Since rules are processed sequentially, I thought that whether the request URI is a.com/something.php or b.com/something.php, it would serve the non-.php version?

Furthermore I tried different ways to get the custom 404.php file working but no matter what sequence I put it in, I got the generic Internal Server error 500 infinite loop. I was using this code for the 404.php file.


# Check it is not a file
RewriteCond %{REQUEST_FILENAME} !-f
# Check it is not a directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .? /404.php [L]

The 404.php file is located in the root directory of the site. I’ve checked the error and access logs and they don’t provide any meaningful feedback on this.

I hope I’ve been clear.

Well I got the 404.php error resolved. I changed the .htaccess code to:


# Check it is not a file
RewriteCond %{REQUEST_FILENAME} !-f
# Check it is not a directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .? 404.php [L]

I removed the / that used to read /404.php; I thought the slash was needed to define it was in the root directory, but in my case it was causing the problem?

I still am working on the other issue defined in Post #4.

Regards,
Steve

Ok,

I figured out a way to simplify this. It occurred to me that, since the two virtual hosts for a.com and b.com point to the same files, including the .htaccess file, I don’t need to mention b.com in the .htaccess. Instead, this code does all of the following:

  • Cleans .php from all files (wanted to not needlessly tell bad people that PHP is used)
  • Removes www
  • Redirects b.com/$1 to a.com/$1 (where $1 is the request uri)
  • Traps all missing pages in requests to b.com/$1 and redirects them to a custom 404.php file.

Here is the code:


RewriteEngine on


# Match  alpha-numeric characters in the Request URI that use .php extensions
RewriteRule ^([a-z0-9]+)$ $1.php 

# Match any host other than a.com (b.com, www.a.com, www.b.com)
RewriteCond %{HTTP_HOST} !^a\.com$ [NC] 
# Permanently redirect either domain to the non-www version
RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]

# 404 error matching
# Not a file
RewriteCond %{REQUEST_FILENAME} !-f
# Not a directory
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .? 404.php [L]

Hi Steve,

Sorry for my delay returning to your post - don’t be afraid to PM me (I don’t usually bite … too hard :lol:)!

I did look over your code and only had one serious comment about it: Before redirecting (.*) to $1.php, check that the file exists (as a .php file).

Your last post, however, is the most critical: It attempts to give your “specifications.”

With the a.com and b.com domains sharing the files, there’s a need (on your part) to pick the preferred domain name (so you’re not penalized by SE’s). I’ll assume a.com is preferred.

  • Cleans .php from all files (wanted to not needlessly tell bad people that PHP is used)
  • Removes www
  • Redirects b.com/$1 to a.com/$1 (where $1 is the request uri)
  • Traps all missing pages in requests to b.com/$1 and redirects them to a custom 404.php file.

Okay, I’d reorder that to:

  • Redirect b.com to a.com with 301
  • Remove www (subdomain) with 301
  • Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
  • Redirect extensionless filenames to .php version (ONLY if it exists) - hidden
  • Handle 404s (hidden) - personally, I’d redirect to a sitemap or simply use an ErrorDocument statement instead

Let me code in the modified order (for simplicity’s sake):


RewriteEngine on

# Redirect b.com to a.com with 301 AND
RewriteCond %{HTTP_HOST} b\.com [NC,OR]
# Remove www (subdomain) with 301
RewriteCond %{HTTP_HOST} ^www\. [NC]
# Note: {HTTP_HOST} is not case sensitive while mod_rewrite is so the No Case flags are needed
RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]
# Note: Both the {REQUEST_URI} and {QUERY_STRING} are preserved; {IS_SUBREQ} is not set

# Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
RewriteCond %{IS_SUBREQ} !true
RewriteRule ^(.*)\.php$ $1 [R=301,L]
# Note: {REQUEST_URI} has .php file extension stripped; {IS_SUBREQ} is now true

# Redirect extensionless filenames to .php version (ONLY if it exists) - hidden
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([^.]+)$ $1.php [L]
# Note: {REQUEST_URI} regains .php file extension; {IS_SUBREQ} is now true

# Handle 404s - personally, I'd redirect to a sitemap or simply use an ErrorDocument statement instead
# Preferred - as core directive, I'd move before RewriteEngine on
ErrorDocument 404 /404.php

# Second Choice
# ErrorDocument 404 /sitemap.php
# Note that ErrorDocument requires the status code and an ABSOLUTE URI/URL - I used internal absolute URIs

# Last Choice - but may be useful if you need to know what was requested in the 404 script
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .? 404.php?request=%{REQUEST_URI} [L]

# Remove the comments before uploading to the production server.

Please note that the .? in the RewriteRules above will ALWAYS evaluate to true, so whether the rule fires depends on the RewriteCond statements preceding the RewriteRule. Also, I preach avoiding the villainy of (.*), so I use the Apache variable which amounts to the same thing (when that’s all that is in the regex for RewriteRules): {REQUEST_URI}.

Not much there which is out of the ordinary except the {IS_SUBREQ}. Is Subrequest is only set when there has been an INTERNAL redirection made (I admit to not being sure whether it’s null or false if not set) and it’s only used to prevent looping between your visible and usable formats. My signature’s tutorial shows that I first enabled “loopy code” by adding a marker (key) to a query string and tested for that … until I discovered that the {IS_SUBREQ} “marker” is already available via Apache!

Regards,

DK

Hi David,

Thank you for your detailed explanation and rework of the rewriting code. Unfortunately the server’s configuration may be getting in the way of your suggested changes as I am getting

The web page at http://a.com/var/www/ClientFolder/404 has resulted in too many redirects.

This message is being generated with only this code:

# Strip .php file extension (ONLY if not {IS_SUBREQ}) with 301
RewriteCond %{IS_SUBREQ} !true
RewriteRule ^(.*)\.php$ $1 [R=301,L]
# Note: {REQUEST_URI} has .php file extension stripped; {IS_SUBREQ} is now true


# Redirect extensionless filenames to .php version (ONLY if it exists) - hidden
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([^.]+)$ $1.php [L]
# Note: {REQUEST_URI} regains .php file extension; {IS_SUBREQ} is now true

I understand your recommendation to test for a key match on loopy code, but I am not sure why I get a loop with that code when I don’t with the now modified code (based on some of your recommendations):


# 404.php is a sitemap with a message that the requested page was not found.
ErrorDocument 404 /404.php


RewriteEngine on


# Redirect b.com to a.com with 301 and...
RewriteCond %{HTTP_HOST} !^b\.com$ [NC] 
# Remove www (subdomain) with 301
RewriteCond %{HTTP_HOST} ^www\. [NC]
# match request URIs 
RewriteRule .? http://a.com%{REQUEST_URI} [R=301,L]


# Redirect extensionless filenames to .php version (ONLY if it exists)
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([^.]+)$ $1.php [L]



The one other issue, which I’m not sure you were trying to show me: when someone requests a.com/somepage.php and the file exists, the URL is not rewritten to a.com/somepage; it stays a.com/somepage.php. The page does resolve, though, if the request is made to a.com/somepage.

Many thanks and Merry Christmas :slight_smile:
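
For later readers, here is a minimal sketch of one way the pieces discussed in this thread could fit together without looping. It is not taken from either poster's final configuration. It assumes a.com is the preferred host and that 404.php sits in the document root, as stated above. Two things differ from the code quoted earlier, and both are assumptions to verify on your own server. First, the strip rule tests %{THE_REQUEST} (the request line the browser originally sent, which internal rewrites do not change) instead of %{IS_SUBREQ}, which the mod_rewrite documentation describes as true only for sub-requests, so it may not flag a rewrite made in .htaccess. Second, the redirect target starts with a slash, because a relative target combined with [R] in .htaccess and no RewriteBase can put the filesystem path into the URL, which would match the /var/www/ClientFolder/404 message above.

```apache
# Sketch only. a.com as the preferred host and /404.php in the
# document root come from the thread; everything else is an assumption.
ErrorDocument 404 /404.php

RewriteEngine on

# 1. Send b.com (with or without www) and any other www. host to a.com
RewriteCond %{HTTP_HOST} ^(www\.)?b\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteRule ^ http://a.com%{REQUEST_URI} [R=301,L]

# 2. Strip .php only when the browser itself asked for a .php URL.
#    THE_REQUEST looks like "GET /home.php HTTP/1.1" and is not altered
#    by the internal rewrite in step 3, so step 3 cannot retrigger this rule.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^?\s]+)\.php[?\s] [NC]
RewriteRule ^ /%1 [R=301,L]

# 3. Serve the .php file for an extensionless request, if that file exists
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([^.]+)$ $1.php [L]
```

With these rules, a request for http://b.com/home.php should reach http://a.com/home in two 301 hops (host first, then extension), and a missing page should fall through to the ErrorDocument rather than a rewrite rule. Try it on a staging copy before relying on it.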

Steve,

Thanks! Christmas was great! 'Hope you enjoy yours as it’s already the wee hours there and Santa’s in the Mountain time zone.

Regards,

DK