Needless overhead or not - rewrite?

Hi,

These rewrite rules work really well. I am wondering whether, by not enforcing the www first, I am causing needless overhead, since a request has to switch from domain 1 to domain 2 before the www is enforced?

Here it is:


RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d


# Redirect old_domain.com to new_domain.com with 301 and...
RewriteCond %{HTTP_HOST} ^old_domain.com$ [NC]
RewriteRule ^([a-zA-Z0-9]+)$ http://new_domain.com/$1 [R=301,L]


# Ensure www  with 301
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http%1://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]


# remove .php ONLY if requested directly
RewriteCond %{THE_REQUEST} (\.php\sHTTP/1)
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L,QSA]


# Redirect extensionless version to .php version
RewriteRule ^([a-z]+)$ $1.php

Do you have any suggested improvements?

Regards,
Steve

Steve,

Whoa! Now you’re going backward!

First, your answer: It doesn’t matter as you’ll be eliminating www on domain2 anyway.

Comments on your code:

RewriteEngine on
# Get in the habit of starting with this (technique)

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# This will only affect the following RewriteRule block so this is totally irrelevant.

# Redirect old_domain.com to new_domain.com with 301 and...
RewriteCond %{HTTP_HOST} ^old_domain.com$ [NC]
# You only care about the domain redirection
RewriteRule ^([a-zA-Z0-9]+)$ http://www.new_domain.com/$1 [R=301,L]
# Well either that or RewriteRule ^([a-zA-Z0-9]+)$ http://new_domain.com%{REQUEST_URI} [R=301,L]
# If you're going to enforce www, start with making the redirection correctly

# Ensure www with 301
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http%1://www.%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
# You've specified a null {REQUEST_URI} (okay, just a start anchor); I'd use .? as "correct regex"
# You've not created %1 so this is clearly in error.

# remove .php ONLY if requested directly
RewriteCond %{THE_REQUEST} (\.php\sHTTP/1)
# You don't need to create an atom here nor do you need anything after \.php
# You could also use the {IS_SUBREQ} variable, too.
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L,QSA]
# Because you're not altering/adding a query string, you don't need the QSA flag

# Redirect extensionless version to .php version
RewriteRule ^([a-z]+)$ $1.php [L]
# The [L] is not required but it's technique to close this code.

Okay, some comments were more technique but some do need correction.

Regards,

DK

Hi DK,

Thanks for your candid assessment :wink: :eek: :slight_smile:

Based on your feedback and further reading, I’ve got this now; pretty much what you recommended:


RewriteEngine on


# Ensure www with 301
# Fixed the error (I do not know why it was rewriting correctly even with this error?).
# Changed this and added the missing atom.
# First enforced rule, so the redirect from old to new domain will have guaranteed www

RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
RewriteRule ^(.+)$ http://www\.%1/$1 [R=301,L]

# Redirect old_domain.com to new_domain.com with 301
# Changed regex to (.+), (.?) throws a 500 error?
RewriteCond %{HTTP_HOST} ^www.old_domain.com$
RewriteRule ^(.+)$ http://www.new_domain.com/$1 [R=301]


# remove .php ONLY if requested directly
# removed the superfluous atom
RewriteCond %{THE_REQUEST} (\.php)
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]


# Redirect extensionless version to .php version
RewriteRule ^([a-z]+)$ $1.php [L}

I am now finding some problems that I did not initially see:

  1. I can’t figure out why typing old_domain.com as a URL gives me a 403 Forbidden when the rewrite code first enforces www on all requests and then does a straightforward redirect?
  2. I also get a 500 error if I use ^(.?)$ as the regex for the redirect RewriteRule?
  3. If I use www.old_domain.com/home then the redirect works. So why does this work while the root domain gives Forbidden?
  4. Clicking search engine links to new_domain.com does not append www?

I also tried using


RewriteCond %{IS_SUBREQ} false
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]

but it gave me a configuration error.

Fundamentally, the rewrite from the old to the new domain is the most important one, as it will be common for search links and bookmarks to point at www.old_domain.com or old_domain.com.

Thanks for your teaching on this!

Regards,
Steve

Well, after digging in the Apache tutorials, I lifted this from their example and it works:


# Using the NOT (!) means that I don't have to explicitly check whether it is
# the old domain. It matches if it isn't what I specify. Their example
# escaped the periods in the domain name.
RewriteCond %{HTTP_HOST} !^www\.new_domain\.com [NC]
# I think this means don't match line breaks, so I don't think I need it,
# but kept it to preserve their example
RewriteCond %{HTTP_HOST} !^$
# Optional use of the /, match anything (surprised they recommend this)
# Probably the noescape (NE) flag is not required as the domain URLs have only [a-zA-Z0-9_]
RewriteRule ^/?(.*)$ http://www.new_domain.com/$1 [L,R=301,NE]

# Ensure www with 301
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
RewriteRule ^(.+)$ http://www\.%1/$1 [R=301,L]


# remove .php ONLY if requested directly
RewriteCond %{THE_REQUEST} (\.php)
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]


# Redirect extensionless version to .php version
RewriteRule ^([a-z]+)$ $1.php [L]

Now I believe that I’m moving forward again :wink:

Regards,
Steve

Hi Steve!


RewriteEngine on

# redirect from old to new domain will have guaranteed www

RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
RewriteRule ^(.+)$ http://www\.%1/$1 [R=301,L]
# You do NOT escape anything in the redirection (the \ before .%1) - it is NOT regex!

I prefer:

RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L]

because your :kaioken: EVERYTHING :kaioken: atom is already available as {REQUEST_URI} (and it handles ^/? on its own).

# Redirect old_domain.com to new_domain.com with 301
# Changed regex to (.+), (.?) throws a 500 error?
# Then WHY did you make an atom of it and not use {REQUEST_URI}?
RewriteCond %{HTTP_HOST} ^www\.old_domain\.com$ [NC]
# You need to escape the dot character(s) in regex AND
# {HTTP_HOST} is NOT case sensitive so you need the No Case flag
RewriteRule ^(.+)$ http://www.new_domain.com/$1 [R=301]

Again, no need to capture {REQUEST_URI} this way! Instead, use

RewriteRule .? http://www.new_domain.com%{REQUEST_URI} [R=301,L]

# remove .php ONLY if requested directly
# removed the superfluous atom
RewriteCond %{THE_REQUEST} (\.php)
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]

If you feel that you need to ensure that nothing follows .php (to rule out,
e.g., something.php/something_else.php), then add \{space} after \.php
(NOTE: replace {space} with a space - too pedantic?).
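
A minimal sketch of that stricter check, assuming a query string may also follow the script name. {THE_REQUEST} holds the raw request line (something like GET /about.php HTTP/1.1), so requiring whitespace or a ? straight after .php skips requests such as /about.php/extra. The character class is a substitution for the escaped space, not something from the original rules:

```apache
# Redirect only when .php is immediately followed by whitespace
# (the gap before HTTP/x.y) or by ? (the start of a query string).
RewriteCond %{THE_REQUEST} \.php[\s?]
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]
```

Using a class instead of an escaped space also sidesteps the quoting that a literal space needs inside a RewriteCond pattern.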

# Redirect extensionless version to .php version

I'd first check that $1.php exists as a file with

RewriteCond %{REQUEST_FILENAME}.php -f

RewriteRule ^([a-z]+)$ $1.php [L}
# SYNTAX ERROR/TYPO - we know it's actually ]
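
Put together, and with the bracket typo corrected, the extensionless block might look like the sketch below. It assumes the rules live in an .htaccess file in the DocumentRoot, where {REQUEST_FILENAME} is the full filesystem path of the request:

```apache
# Internally map /about to about.php, but only when about.php
# really exists on disk; otherwise the request falls through
# (for example, to the 404 handler).
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([a-z]+)$ $1.php [L]
```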

  1. I can only guess that it had something to do with requiring something in the URI in your code - which is precisely why I use any optional character and the {REQUEST_URI} variable.
  2. It shouldn’t … but no need to use the anchors or create an atom if you use the optional character and {REQUEST_URI} variable.
  3. Same answer as the first one - you’ve required at least one character and home is providing four where the simple domain request CANNOT match the requirement for at least one character.
  4. It takes a while for SEs to catch up.
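
Answers 1 and 3 can be sketched as follows, assuming the rules sit in an .htaccess file in the DocumentRoot. In that per-directory context the leading slash is stripped before the RewriteRule pattern is tested, so a request for the bare domain presents an empty string:

```apache
# Request: http://www.old_domain.com/      -> the rule sees ""     (empty)
# Request: http://www.old_domain.com/home  -> the rule sees "home"

RewriteCond %{HTTP_HOST} ^www\.old_domain\.com$ [NC]
# ^(.+)$ needs at least one character, so "" would be skipped.
# .? matches zero or one character, so both requests are redirected,
# and %{REQUEST_URI} supplies the original path.
RewriteRule .? http://www.new_domain.com%{REQUEST_URI} [R=301,L]
```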

I suspect it’s because {IS_SUBREQ} is only created when it becomes true, i.e., it’s returning null. Change the false to !true as, logically, that’s the same thing (except for null) and not true is what you’re looking for.
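
A hedged sketch of that change. Apache documents %{IS_SUBREQ} as holding the text true or false, so a negated match on true reads as "not a subrequest". Keeping the {THE_REQUEST} test alongside it is an assumption added here, not part of the advice above: in .htaccess an internal rewrite may come back through the rules as an internal redirect rather than as a subrequest, and the {THE_REQUEST} test covers that case:

```apache
# Skip internal subrequests...
RewriteCond %{IS_SUBREQ} !^true$
# ...and still require that the browser itself asked for .php.
RewriteCond %{THE_REQUEST} \.php
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]
```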

Well, moving forward but you’ve discovered a few new stumbling blocks as the above indented comments have discussed.

:smiley: BTW, I am not trying to abuse you! I am simply being my pedantic (and, probably, PITA) self by making sure that you (and anyone else reading this thread) get the full story.

I’m rather surprised, though, that you have not recognized the benefit of .? and %{REQUEST_URI}. .? will always evaluate to TRUE and you have already captured your ^(.*)$ in Apache’s {REQUEST_URI} variable. Since Apache handles the leading / problem (between the two major versions of Apache), there is no sense in doing that yourself - an additional benefit. With less work for Apache to do (no need to parse, copy and create a new variable), .? and %{REQUEST_URI} will also be marginally faster.

If your old_domain is a .com and your new_domain is another .com, it would make your handling of {HTTP_HOST} a little easier - at least the part where you could combine the www and non-www treatments and simply capture the new or old domain name and redirect to their www’d versions. Fine as is, though, as the general case should help everyone.
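
One way to combine the treatments is sketched below. It is a variation rather than the capture-based version described above: instead of capturing the domain name, it lists every non-canonical host and sends each one to www.new_domain.com in a single hop. The host names are the placeholders used throughout this thread:

```apache
# Hosts that are not yet canonical:
#   old_domain.com, www.old_domain.com, new_domain.com
RewriteCond %{HTTP_HOST} ^(www\.)?old_domain\.com$ [NC,OR]
RewriteCond %{HTTP_HOST} ^new_domain\.com$ [NC]
RewriteRule .? http://www.new_domain.com%{REQUEST_URI} [R=301,L]
```

Because www.new_domain.com matches neither condition, the rule cannot redirect to itself.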

Please ask any questions about any of the above.

Regards,

DK

Hi DK,

No worries about being hard on me; I’m late to the party and I’m drinking the diluted punch, so it takes a little longer for me to get drunk :wink:

As for your surprise that I had not found the benefit of %{REQUEST_URI}, remember that the Apache docs are massive. Recently I’ve shied away from reading most blogs about rewrite, as I see most of them enforcing the greedy practices which you have aggressively identified as a scourge, so it takes a little longer to crawl through and get how to use each variable effectively and how not to be greedy :smiley:

I’ve incorporated and worked to understand your many recommendations from this thread and have, for the most part, improved the rewrites (simplified them and reduced the processing overhead):


RewriteEngine on


# Redirect old_domain.com to new_domain.com with 301
RewriteCond %{HTTP_HOST} !^www\.new_domain\.com
RewriteCond %{HTTP_HOST} !^$
# Nice, %{REQUEST_URI} is great to use!
# Interestingly, using RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L] fails while the shown RewriteRule .? http://www.new_domain.com%{REQUEST_URI} [R=301,L] works; they're almost the same thing?

RewriteRule .? http://www.new_domain.com%{REQUEST_URI} [R=301,L]


# Ensure www with 301
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
# Made use of your recommended .? and %{REQUEST_URI}, it works well here
RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L]


# remove .php ONLY if requested directly
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{THE_REQUEST} (\.php)
# I tried to figure out how I could get away from creating an atom when we already have the request_uri. The main sticking point is how to remove the explicit filename.php match in the RegEx
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]


# Redirect extensionless version to .php version
RewriteRule ^([a-z]+)$ $1.php [L]

I’m not sure why

RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L]

fails? As I understand it, this translates to:

Match everything and redirect to http://www. + the RewriteCond match + the query parameter(s)

You’re a champ for all your help! I greatly appreciate it!

Regards,
Steve

Hi Steve!

You’re holding a drinking party and didn’t invite me? Bummer! I’ll have to go a lot easier on you from now on. :drink:

Too true about the Apache docs but they’re really quite good (if you can translate the highly technical details into something intelligible). No worries, I’ve done that for you and most of the translation is in the Article linked in my signature.

Yes, I’ve seen some of those recommending using the :kaioken: EVERYTHING :kaioken: atom for matching, well, everything; that’s the red flag that tells me the author doesn’t have a clue! (StommePoes actually posted a PHP article, http://perlmonks.org/?node=Death%20to%20Dot%20Star!, but he should have known how to kill those UGLY %20’s!) Note that (.*) does have its uses but NOT as most try to use it! I’m glad that you’re aware of its pitfalls and know to define exactly what you want to accept in your match/redirection sets. Kudos for that!


RewriteEngine on


# Redirect old_domain.com to new_domain.com with 301
RewriteCond %{HTTP_HOST} !^www\.new_domain\.com
RewriteCond %{HTTP_HOST} !^$
# Nice, %{REQUEST_URI} is great to use!
# Interestingly, using RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L] fails while the shown RewriteRule .? http://www.new_domain.com%{REQUEST_URI} [R=301,L] works; they're almost the same thing?
# Duh? Did I ***REALLY*** do that? Of course you're at old_domain.com and need
# to redirect to new_domain.com so %1 would not work. It (%1) was supposed to retain
# the existing domain name and simply allow adding the www subdomain. MY ERROR!

RewriteRule .? http://www.new_domain.com%{REQUEST_URI} [R=301,L]


# Ensure www with 301
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
# Made use of your recommended .? and %{REQUEST_URI}, it works well here
# As above, that's what it was supposed to do
RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L]


# remove .php ONLY if requested directly
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteCond %{THE_REQUEST} (\.php)
# I tried to figure out how I could get away from creating an atom when we already have the request_uri. The main sticking point is how to remove the explicit filename.php match in the RegEx
# {THE_REQUEST} will not change so it retains the ORIGINAL request but
# you don't need the parentheses, though, as you're not using this value for anything
# (except to see that a PHP script had been requested). The regex below specifies
# a filename (parameter?) request in the DocumentRoot - more valuable information
# for the code. You do need the atom below, not above.
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]


# Redirect extensionless version to .php version
RewriteRule ^([a-z]+)$ $1.php [L]

As for “RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L]”, it REQUIRES an exit strategy as its regex is designed to ALWAYS BE TRUE (as is the ^(.*)$ you were using). Its preceding RewriteCond statement(s) provide the false option. You’re only slightly off in your translation (probably semantics rather than an error) as %1 represents the non-www {HTTP_HOST} and %{REQUEST_URI} is the URI. “Query parameters”? No, it simply copies the existing URI, whether it’s a file, a directory or something you’ll handle later in your mod_rewrite code (like query parameters).
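
The exit strategy can be read straight off the www block. This is only an annotated restatement of the rule already in use, with the dot after www escaped:

```apache
# The exit: once the host starts with "www." this condition fails,
# the rule is skipped, and the redirect cannot repeat itself.
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Capture the bare host so the substitution can reuse it as %1.
RewriteCond %{HTTP_HOST} ^(.+)$
# .? always matches; %{REQUEST_URI} carries the original path.
RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L]
```

Without the first condition the rule would match every request, including the one produced by its own redirect.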

Thanks for the “Brownie Button” (or is that expression too dated for anyone to understand these days?)!

Regards,

DK

Hi DK,

If I have another digital drinking party I’ll be sure to invite you! I should have mentioned that I faked my way into a Mensa party and tried not to look like a fraud, but it was hard :wink:

Thanks for your excellent descriptions as to what was happening!

I plan to continue with the Apache docs; once you get used to them, they have very valuable information. I will filter out the gratuitous usage of (.*).

Well, to finish this thread (unless you have any further critiques), here is what I ended up with:


ErrorDocument 404 /404.php


RewriteEngine on


# Redirect old_domain.ca to new_domain.ca with 301
RewriteCond %{HTTP_HOST} !^www\.new_domain\.ca
RewriteCond %{HTTP_HOST} !^$
RewriteRule .? http://www.new_domain.ca%{REQUEST_URI} [L,R=301]


# Ensure www with 301
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteCond %{HTTP_HOST} ^(.+)$ [NC]
RewriteRule .? http://www.%1%{REQUEST_URI} [R=301,L]


# remove .php, .htm, .html, or .aspx ONLY if requested directly
# to mask the underlying technology
RewriteCond %{THE_REQUEST} \.php
RewriteRule ^([a-zA-Z]+)\.php$ /$1 [R=301,L]
RewriteCond %{THE_REQUEST} \.htm
RewriteRule ^([a-zA-Z]+)\.htm$ /$1 [R=301,L]
RewriteCond %{THE_REQUEST} \.html
RewriteRule ^([a-zA-Z]+)\.html$ /$1 [R=301,L]
RewriteCond %{THE_REQUEST} \.aspx
RewriteRule ^([a-zA-Z]+)\.aspx$ /$1 [R=301,L]


# Redirect extensionless version to .php version
RewriteRule ^([a-z]+)$ $1.php [L]



I’m old enough to get the Brownie Button comment but I’m sure it misses 80% of the crowd here. No problem, you deserve it!

Steve

Hi Steve!

Mensa party? Did they drink Mensa’s there? I’m supposed to be eligible but never saw the need nor had the interest as it seems like self-aggrandizement.

The descriptions were as much for the SP members as for you as I’m sure you grasp these (regex) concepts very easily.

Good! I think PHP.net’s documentation is superior but, if you use the search on Apache.org’s Home page, you can usually find what you want. Just don’t eliminate (.*) from your bag of tricks as it does have its uses (especially for the non-English speakers who must use encoded characters).


You’re killing .php, .htm, .html, or .aspx file extensions and only restoring .php? I hope that means you’ve

  1. replaced all .htm, .html, and .aspx scripts with .php scripts OR
  2. are using a “handler file” to read the .php script names to fetch the contents to display.

Yeah, not my problem but it’s a thought (the multiple file extensions to remove could open a new can of worms).
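
If the four extension-stripping pairs ever become a maintenance burden, they could probably be folded into one. A minimal sketch, assuming the same .htaccess context and that only the .php scripts are actually served; the alternation and the trailing [\s?] class are additions for illustration, not part of the rules as posted:

```apache
# Strip any of the four extensions, but only when the browser
# itself asked for one (the raw request line still shows it).
RewriteCond %{THE_REQUEST} \.(php|html?|aspx)[\s?]
RewriteRule ^([a-zA-Z]+)\.(php|html?|aspx)$ /$1 [R=301,L]

# Every extensionless name is then served by the .php script.
RewriteRule ^([a-z]+)$ $1.php [L]
```

$1 is still the script name because it is the first group in the RewriteRule pattern; the extension lands in $2 and is simply not used.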

Thanks!

Regards,

DK

Well glad you wasted your talents here instead :wink:

Yes, PHP’s documentation is great as it will help a broader cross-section of people than Apache’s, but alas the Apache mod_rewrite coverage in PHP.net’s documentation is not comprehensive enough :frowning:

Yes to 1 & 2. I’m killing these extensions because I only need requests actually rewritten to a PHP file, and I was thinking that by stripping common extensions, a user who was trying to find out your server-side language would not be able to use the “Let’s try .aspx - no, that failed. OK, let’s try .php - no, that gets stripped too…” approach. Dare I ask what can of worms this opens :eek:

SS,

Thanks.

Yeah, that’s what my wife said (and not in a nice way).

Documentation quality varies but both PHP.net and Apache.org are great (Apache’s is just difficult to get to and highly technical).

Killing #1 and #2 are fine if you’re not going to reinstate them with those extensions (and have replaced the original file extensions with a .php extension). The can of worms is when you want to reinstate more than one file extension on extensionless filenames (as some members have wanted to do). My usual response is that Apache isn’t clairvoyant and can’t guess which file extension to use (so it just picks the first one in the mod_rewrite).
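
A minimal sketch of what "picks the first one" means in practice, assuming a hypothetical site that still serves both .php and .html files from the DocumentRoot. Each block checks that the file exists, and the order of the blocks is the only thing that decides which extension wins when both files are present:

```apache
# Try .php first; use it only if that file exists.
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^([a-z]+)$ $1.php [L]

# Otherwise fall back to .html, again only if the file exists.
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^([a-z]+)$ $1.html [L]
```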

Regards,

DK

Hi DK

Thanks again! I see the worms that could wiggle and bore into the wood of one’s website. Lucky for me, this is a small replacement site that is being redone starting next week. I’ll opt not to do this when the site is large, as the problem becomes more likely there.

Regards,
Steve