Redirecting from abc-123.html to abc.html for thousands of URLs

I have a list of about 10k old URLs (which are not 404) formatted as name-id.html. They have been updated to no longer contain the id, and I need to set up 301 redirects from the old URLs to the new ones. So…

apple-35.html → apple.html
banana-326.html → banana.html
carrot-3735.html → carrot.html

Is there any way to automatically rewrite these URLs (if the request is a 404 and ends in #.html), or can I just set up 10k individual “Redirect 301” rules? Would 10k redirect rules slow everything down?

Thanks!

Hi Oleg,

Assuming that there are no numbers in the name part of the URL, then this rewrite rule should work for all those URLs you posted:

RewriteRule ^(\w+)-\d+\.html$ $1.html [R=301]

Thanks! Would this also work for something like banana-apple-23542.html ?

Also, is there any clause to only do this for 404 pages? Like if the page is a 404 and matches the structure, then redirect. This is a big ecommerce site and there may or may not be other pages in a similar structure. I’m pretty sure there aren’t but I want to be on the safe side and account for the future.

If there are names outside of what you posted as examples, it could still be possible, but you’ll need a tighter regex, i.e. one that not only matches what you want, but also doesn’t match what you don’t want.
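As an illustration of a tighter pattern: \w does not match a hyphen, so the rule above would skip a name like banana-apple-23542.html. Here is a minimal sketch that also allows hyphens in the name part. It assumes the pages live at the site root; the leading slash in the target and the added L flag are assumptions, not something stated in the posts above.

```apache
# Sketch only: allow letters, digits, underscores and hyphens in the name,
# then require a trailing -<digits>.html
RewriteEngine On
RewriteRule ^([\w-]+)-\d+\.html$ /$1.html [R=301,L]
```

Because the character class is greedy and then backtracks, banana-apple-23542.html should capture banana-apple and redirect to /banana-apple.html. A name that itself ends in a number (a hypothetical item-2024.html, say) would also match, so it is worth checking the pattern against the real URL list first.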

I guess you could write a script to list all that match, but with that many even reading the list would be a chore.

Maybe it would be best to do the best you can and then fix any that show up in your logs.

Would it affect load speed at all to have 10k “Redirect 301 old new” lines, one added manually for each URL? I have all of the URLs that would need it, but I’m thinking a rule may be better than manual redirects.

.htaccess files are read on every HTTP request, so you don’t want a massive .htaccess file.
A regex is definitely what you want.

Are these actual .html files, or are these URLs being served by a server side script like PHP?

If they are actual files it’s quite easy, just change the code to:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(\w+)-\d+\.html$ $1.html [R=301]

which basically says that if (RewriteCond) the requested file (%{REQUEST_FILENAME}) is not (!) a file (-f), i.e. the request would be a 404, then do the rewrite; otherwise don’t.
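To be on the safe side, a further variation is to redirect only when the new target actually exists on disk. This is a hedged sketch, assuming real files sitting directly under the document root. A RewriteCond can refer to $1 from the RewriteRule pattern because the pattern is matched before the conditions are evaluated. The leading slash in the target and the L flag are assumptions added for illustration.

```apache
RewriteEngine On
# Old URL must not exist as a file (it would otherwise be a 404)
RewriteCond %{REQUEST_FILENAME} !-f
# New URL must exist as a file (assumes files sit directly under the document root)
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(\w+)-\d+\.html$ /$1.html [R=301,L]
```

With both conditions in place, a request that matches the pattern but has no corresponding new file is left alone and falls through to the normal 404 handling.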

If they aren’t actual html files this trick won’t work, because Apache can’t see what the server side language will or will not show. In that case you would have to code the redirects in your server side language instead of in Apache.

They aren’t actual files =/

Like via the script that generated the pages in the first place?

Exactly.