Another URL rewrite question

I just can't seem to learn regex; it's not close enough to my main work, and I seldom get back to this stuff regularly enough. Yet I do have to try to fix some issues from a menu change that is giving a lot of errors.
I feel a 301 redirect in .htaccess is the right thing to do.

I am trying to get rid of old URLs that end in .html: site.com/page1/article304.html now needs to become site.com/page1/article304.

I have looked around but not found anything similar enough, so I was wondering if I could get a direct solution here. Thanks to those with the knowledge who like to share.

Scroll down near the bottom here; this may help.

Not a bad starting point; there are some useful things to remember here.

RedirectMatch 301 /(.*)\.(html) http://www.site.org/$1

That doesn't quite have the regex I need, though. It's close, but it doesn't redirect properly and it's a very dangerous line! It breaks the whole site.
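For comparison, here is a sketch of the same mod_alias line with the pattern anchored at both ends, so it only fires when the URL path actually ends in .html rather than whenever ".html" appears somewhere in it. It keeps the quoted line's www.site.org host as an assumption and has not been tested against this site:

```apache
# Anchored version of the RedirectMatch line above (mod_alias, not mod_rewrite).
# ^/      : RedirectMatch sees the URL path including its leading slash
# (.+)    : at least one character before the extension
# \.html$ : a literal dot, then "html" at the very end of the path
RedirectMatch 301 ^/(.+)\.html$ http://www.site.org/$1
```

Anchoring alone does not stop it from redirecting .html files that really exist on disk; the RewriteCond file/directory checks used later in the thread are one way to handle that.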

ARGH! Another BAD referral, as the EVERYTHING atom, (.*), is the bane of all mod_rewrite newbies!

landed, you might benefit from reading the mod_rewrite tutorial linked in my signature, as it contains explanations and sample code. It's helped many members and should help you, too.

Regards,

DK

RewriteEngine on
# Only redirect when the request is not an existing file or directory
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\.html$ /$1 [L,R=301]

The above worked. I'm not sure how you can do this without the global (.*) in this case, but I did take your point about its dangers.

I have already read your tutorials, but this subject is too big (or my grey matter too small) for dipping in and out of as a web producer. These subjects need specialists, for sure. Thanks for posting, everyone.

landed,

I would advise learning some regex as it can be critical:

page1/article304.html => page1/article304

RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9/]+)\.html$ $1 [R=301,L]

That will do it very nicely and avoid any problems with dot (and other misc) characters.

Regards,

DK

Thank you for the improved lines of code! I am using them and am grateful, as I am sure others who come by will be.
To continue my learning, I wondered what happens if I have further rules, as I understand that the L means "don't process any further".

So page.html changes to page, and then processing stops. What if I wanted a later rule to pick it up as well…

So the request URL is initially

http://site.com/honduras.html

It then becomes

http://site.com/honduras

but I want that to redirect to (or respect another rule):

Redirect 301 /honduras http://site.com/central-america/honduras

Is a loop going to be possible? That is, does Apache first strip the .html and then go round again and add the central-america part, or is it not a loop, so we get only ONE pass?

I got a full path in the URL with this one, so it didn't work for me: it was matching the file path instead of the URL, so I got /public/g/ etc., which is normally hidden in URLs.
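The /public/g/... path is a commonly reported side effect of a relative substitution (a plain $1) in .htaccess: mod_rewrite prefixes a relative target with the per-directory prefix, and with R=301 that prefix can end up in the redirect the browser sees. A sketch of the two usual fixes, assuming the .htaccess file sits in the site's document root:

```apache
RewriteEngine on

# Option 1: declare which URL path this directory corresponds to, so a
# relative substitution such as $1 is built as /$1 (assumes document root)
RewriteBase /

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Option 2 (works with or without RewriteBase): start the target with /
# so it is already a URL path and nothing gets prepended to it
RewriteRule ^([a-z0-9/]+)\.html$ /$1 [R=301,L]
```

Either option on its own should be enough; the leading slash is the more explicit of the two.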

landed,

No problem. I really loathe the inappropriate use of (.*) so my signature’s tutorial does let you know how to get around its pitfalls.

No, the Last flag tells mod_rewrite to restart from the beginning with the new %{REQUEST_URI}. Otherwise, it'll continue to the end and, because it had a match/redirection, it will then start over from the beginning. You're merely saving a few microseconds of processing time (where speed is essential).

OMG! A new requirement in the middle of a thread? Oh, well, if you have a list of CA countries to which you need to add the CA subdirectory, then you’ll need to provide the list, match the country and redirect. If this is what you REALLY want, please provide the list you’re using in your database and an attempt to accomplish my “pseudo code” and I’ll be back later to help. Don’t forget that this new rule is more specific than the “general rule” so it has to precede it.
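A sketch of that ordering using the Honduras example. The alternation (honduras|guatemala) is purely hypothetical, standing in for whatever country list the site actually uses, and the rules assume they live in the root .htaccess. Both rules issue external 301s, so each one ends the current request and the browser comes back with a new one:

```apache
RewriteEngine on

# 1. Specific rule first (hypothetical country list).
#    Matches both /honduras and /honduras.html, so one redirect does both jobs.
RewriteRule ^(honduras|guatemala)(\.html)?$ /central-america/$1 [R=301,L]

# 2. General rule second: strip .html from everything else.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9/]+)\.html$ /$1 [R=301,L]
```

With the general rule first, /honduras.html would instead take two redirects (to /honduras, then to /central-america/honduras). The new path central-america/honduras does not match the anchored ^(honduras|guatemala), so rule 1 cannot loop.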

mod_rewrite is only a one-pass proposition if there are no matches.


Regards,

DK

Unfortunately, with that rewrite rule, page1/article-304.html would not rewrite to page1/article-304. Your regexp doesn’t match dashes, nor does it match many other valid URL characters. Certainly there are situations where matching with dot isn’t appropriate, but this is a situation where matching with dot is absolutely appropriate. The OP wants to rewrite all URLs ending in .html, so it makes perfect sense to match on all characters.

# Any RewriteConds here
# Match any URL (dot means any character) ending in .html;
# the leading slash in /$1 keeps the redirect target a URL path in .htaccess
RewriteRule ^(.+)\.html$ /$1 [R=301,L]

Jeff,

Correct! The OP didn’t specify dashes so why bother adding those (to expose his script to more than it can handle)?

Ditto “many other valid URL characters.” You certainly don’t want to match : as it’s not a valid URI character; ? as it’s not a valid URI character; etc.

The value of this is that mod_rewrite can do some limited error checking for you so your scripts don’t have to (albeit, it would be a good idea for them to validate the input before accessing the database).

Finally, yes, the dot character is specified following the character range definition and it’s followed by html and the end anchor. What’s your point? Do you want to allow multiple dots to be matched?

Specificity makes a difference in mod_rewrite. The tighter you can specify your requirements the easier it is to write good mod_rewrite code.

Okay, you do get a point for using the + metacharacter rather than the * I see all too often.


Regards,

DK

I disagree with your approach. I think your regexp becomes more complicated than it needs to be due to possibly long lists of characters in the class. I think you run the risk of introducing bugs by forgetting certain characters. And to boot, I think you get little to no benefit for it.

The last point I’ll leave you with is that the Apache documentation (which, in my opinion, is more authoritative than your personal tutorial) uses dot in situations exactly like this one.

# example 1: file extension change
RewriteRule ^(.+)\.html$ $1.php

# example 2: parse out basename
RewriteRule ^(.+)\.html$ $1
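A possible middle ground between the two positions, sketched here rather than proposed by either poster: keep a character class, but widen it to the characters this site's URLs plausibly use. The exact class below (letters in either case, digits, slash, underscore, hyphen) is an assumption about the site; it lets page1/article-304.html match while still refusing paths with a second dot:

```apache
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Hyphen is placed last in the class so it is a literal "-", not a range.
# There is no dot in the class, so e.g. file.tar.html is NOT redirected.
RewriteRule ^([A-Za-z0-9/_-]+)\.html$ /$1 [R=301,L]
```

The trade-off is the one debated above: the class documents what the site expects, but any character left out silently stops matching.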

Thanks for your help, guys. I would like to say that I find the Apache docs hard to read (a matter of my little grey matter being too little). I have got further this time, and maybe little by little it is sinking in. I don't think you can oversimplify tutorials, and new ones are always a welcome read for me.

Interesting that none of them seem to cover how the URL may loop through the .htaccess file. From the above, I see that the URL will indeed pass through until no further matches happen: the L makes us go back to the start with the now-changed URL. There is also a strategy for handling specificity. DK was saying to make the more basic changes (towards the left-hand side of the string) first; the .html removal from my original question could therefore be the very last match we want to do, and similarly the removal or adding of www, as people want (more cosmetic).
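One way to lay out a file along these lines, as a sketch only: it assumes www.site.com is the preferred host (reverse the condition if you prefer the bare domain) and that the rules sit in the root .htaccess. Putting the host rule first is a choice, not a requirement. Every rule here is an external 301, so a request that needs both fixes, such as site.com/page1/article304.html, gets two redirect hops in either order:

```apache
RewriteEngine on

# 1. Host canonicalization (assumed preference: www.site.com).
#    (.*) is arguably fine here: every path on the bare domain should move.
RewriteCond %{HTTP_HOST} ^site\.com$ [NC]
RewriteRule ^(.*)$ http://www.site.com/$1 [R=301,L]

# 2. Specific redirects (such as the central-america rule) would go here.

# 3. General .html removal last.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9/]+)\.html$ /$1 [R=301,L]
```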

The .html or .php removal is useful, as it means the URL might have a better chance of living longer, since it's less specific in SEO terms.