Search Engine-Friendly URLs

Still, if possible it’s probably better to stay on the safe side :wink: I think that’s the route I’ll be taking!

Thanks for this article, I found it really useful.

I just wanted to comment on two things:

1.) Seriously, if you are making some sort of dictionary or resource website that is based around searches, allow the user to type in http://example.org/query_here. I must say, that is the most annoying part of Wikipedia for me. Let me type it in myself! I wish dictionary.com would do that too. PHP.net does it wonderfully: just go to php.net/include, and the include() function reference page pops up!

2.) Some people might want to head over to drupal.org and look at their .htaccess file. They use an interesting technique for achieving URLs like this: basically, everything after the domain name and slash gets rewritten to ?q=. Therefore, you could type http://example.org/article/215, explode() the / out of there, call the article function, and then fetch ID 215. Or you could have pages like http://example.org/aboutus and http://example.org/donate without having to create files for all of them (as in the 3rd method mentioned above). There's a rough sketch of the idea below.
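Here is roughly what that catch-all approach looks like, assuming an index.php front controller (the file name and the show_*() functions are placeholders for illustration, not Drupal's actual code):

#.htaccess: send anything that isn't a real file or directory to index.php?q=path
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]

<?php
// index.php: explode the rewritten path and dispatch on the first segment
$q     = isset($_GET['q']) ? $_GET['q'] : '';
$parts = explode('/', $q);            // "article/215" -> array('article', '215')

switch ($parts[0]) {
    case 'article':
        $id = isset($parts[1]) ? (int) $parts[1] : 0;
        show_article($id);            // placeholder: fetch and display article 215
        break;
    case 'aboutus':
    case 'donate':
        show_static_page($parts[0]);  // placeholder: no physical file needed
        break;
    default:
        show_front_page();            // placeholder
}
?>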

Interestingly enough, my rewrite script was really simple, and because I was already using $_SERVER['REQUEST_URI'] to extract and then later explode the URL into chunks, I didn't even have to modify my original page handler.

I simply send the data in the same format, but mod_rewrite avoids having to send a 404 error back to the user:

ErrorDocument 404 /error.php

RewriteEngine on
#news index
RewriteRule ^latest-news/ myPageHandlerScript.php
RewriteRule ^latest-news myPageHandlerScript.php
#news details
RewriteRule ^latest-news/article/([a-zA-Z&0-9-.:@]+)/([0-9]+) myPageHandlerScript.php
#news pagination
RewriteRule ^latest-news/([0-9]+)/([0-9]+)/([0-9]+) myPageHandlerScript.php
#projects
RewriteRule ^our-work/([a-zA-Z&0-9-.:@]+)/([a-zA-Z&0-9-.:@]+)/([0-9]+) myPageHandlerScript.php
#services
RewriteRule ^our-services/ myPageHandlerScript.php
#services details
RewriteRule ^our-services/([a-zA-Z&0-9-.:@]+)/([0-9]+) myPageHandlerScript.php

#sitemap
RewriteRule ^sitemap/ sitemap.php
RewriteRule ^sitemap sitemap.php

etc
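For anyone curious, the handler side of this can be sketched roughly as follows (the show_*() functions are placeholders, not my actual code; the point is that REQUEST_URI still holds the original pretty URL after the rewrite):

<?php
// myPageHandlerScript.php: mod_rewrite leaves REQUEST_URI untouched,
// so the original pretty URL can still be exploded into chunks
$path   = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$chunks = explode('/', trim($path, '/')); // "/latest-news/article/foo/5" -> array(...)

if ($chunks[0] == 'latest-news' && isset($chunks[1], $chunks[2], $chunks[3])
    && $chunks[1] == 'article') {
    show_article($chunks[2], (int) $chunks[3]);  // placeholder: title + id
} elseif ($chunks[0] == 'latest-news') {
    show_news_index();                           // placeholder
} else {
    show_404();                                  // placeholder
}
?>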

It works really well; XML generation works perfectly now and presents no unexpected errors, and this setup is SEO-friendly throughout.

You can see it in action here: http://www.bigwebcompany.co…uk

There is a drawback to these types of URL… Yahoo sucks… Yahoo likes to drop the last trailing slash and then concatenate directories. Using the above examples, a request from Slurp might be

/article/999/12article/999/article12/999

It sounds crazy, I know, but I have spent weeks trying to “retrain” Slurp because of this exact problem, only to find it dropping the trailing slash on the retrained URLs.

A further complication is that Slurp makes up queries, for example /article/?id=123, where the ID 123 does not exist anywhere on your site.

So much fun :frowning:
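What I've fallen back on (just a defensive habit, not something from the article): validate every exploded segment and send a genuine 404 for anything that doesn't match the expected pattern, so Slurp's mangled URLs drop out of the index instead of serving duplicate content. A sketch, with a placeholder show_article() function:

<?php
// Reject article URLs whose id segment isn't purely numeric, so
// concatenated junk like /article/999/12article/999 gets a real 404,
// and so do Slurp's invented queries like /article/?id=123
$path   = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$chunks = explode('/', trim($path, '/'));

if ($chunks[0] == 'article') {
    if (!isset($chunks[1]) || !ctype_digit($chunks[1])) {
        header('HTTP/1.0 404 Not Found');
        exit('Not found');
    }
    show_article((int) $chunks[1]);   // placeholder
}
?>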

In my personal experience, mod_rewrite works far better than the other options. I originally used an error page to parse the variables from the names of HTML files, similar to the way you described, but occasionally IE had problems and would display its own 404 page. So I settled on using mod_rewrite.
Originally I tried using a directory structure for the rewrites, but it was causing trouble with my relative links, so I decided to use the name of an HTML file to pass the variables the same way as I did in the error method. This way there are no hanging directories that make it obvious variables are being passed, and it leaves the pages with easy-to-remember names.

If anyone is interested I used the following rewrite rule:
RewriteBase /
RewriteRule ^([a-zA-Z]+)\.html$ index.php?mode=$1

This way my entire website appears to be a bunch of static pages while giving me the ease of using a single file.
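In case it helps anyone copy it, the index.php side can be as small as this (the page names and template paths are just examples of how mine is laid out, not anything from the article):

<?php
// index.php: $_GET['mode'] carries the page name captured by the rewrite rule
$pages = array('home', 'about', 'contact');   // whitelist of valid page names
$mode  = isset($_GET['mode']) ? $_GET['mode'] : 'home';

if (!in_array($mode, $pages)) {   // the [a-zA-Z]+ pattern already blocks slashes
    header('HTTP/1.0 404 Not Found');   // and dots, but a whitelist is cheap insurance
    exit('Page not found');
}
include 'templates/' . $mode . '.php';   // e.g. about.html -> templates/about.php
?>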

Just wondering why Sitepoint chose PATH_INFO method over mod_rewrite. Is there any specific reason for this decision?

Thank you for the ideas. The ForceType directive method works great for my site. I was able to nest the script, and now I'll never need to use those silly ?'s in the URLs again.

The only problem I encountered was the first value of the array. In your example,

$var_array = explode("/", $PATH_INFO);

produces:

$var_array[0] = "article.php"

$var_array[1] = 999

$var_array[2] = 12

But when I ran the same code on my server, $var_array[0] always returned blank. It’s not a problem, just curious as to why this could be happening.
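For what it's worth, one likely explanation: $PATH_INFO always begins with a slash, and explode() returns an empty first element whenever the string starts with the delimiter, e.g.:

$var_array = explode("/", "/999/12");
// $var_array[0] = "" (empty, because of the leading slash)
// $var_array[1] = "999"
// $var_array[2] = "12"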

Thanks again!

Maybe your server has register_globals disabled.

When I use this, the pages are not storing sessions :frowning:
The site has a member login and it works, but after that the session gets timed out.

As of PHP 4.3.2, PATH_TRANSLATED is no longer set implicitly under the Apache 2 SAPI, in contrast to the situation in Apache 1, where it's set to the same value as the SCRIPT_FILENAME server variable when it's not populated by Apache. This change was made to comply with the CGI specification, which says PATH_TRANSLATED should only exist if PATH_INFO is defined.

Apache 2 users may set AcceptPathInfo On inside httpd.conf to define PATH_INFO.
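For example, to switch it on for just one script (a plain AcceptPathInfo On at the server or directory level works too; "article.php" here is just an example file name):

#httpd.conf (or .htaccess, where AllowOverride permits it)
<Files "article.php">
    AcceptPathInfo On
</Files>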

It’s a predefined variable, $_SERVER.

What a search engine sees is the page going out of the web server. I don't see how we can achieve that without rewriting the pages.

I have noticed reallysimpleserver at www.reallysimplesoft.com, which does rewriting on the fly. I've never tried it.

Can't you just use $_SERVER['REQUEST_URI'] instead of $PATH_INFO to get around this?

On servers with PHPSUEXEC, ForceType doesn’t work. Instead use SetHandler, with everything else exactly the same, and it will work fine.
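In other words, where the article's ForceType block looks something like the first snippet below, under PHPSUEXEC you swap the directive for SetHandler (assuming "article" is the extensionless script name, as in the article):

<Files article>
ForceType application/x-httpd-php
</Files>

becomes

<Files article>
SetHandler application/x-httpd-php
</Files>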

The Zend Framework uses query strings like this:
www.somesite.com/search/name/value/name/value
e.g.
www.somesite.com/search/lang/en/term/monkeys

and then getParam('term') to retrieve them.
This eliminates some of the problems described in this blog, but it does make URLs longer.
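Outside of Zend Framework you can mimic the same name/value scheme in a few lines; this is just a sketch, not ZF's actual implementation:

<?php
// Turn /search/lang/en/term/monkeys into array('lang' => 'en', 'term' => 'monkeys')
$path     = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$segments = explode('/', trim($path, '/'));
array_shift($segments);   // drop the controller segment ("search")

$params = array();
for ($i = 0; $i + 1 < count($segments); $i += 2) {
    $params[$segments[$i]] = urldecode($segments[$i + 1]);
}

$term = isset($params['term']) ? $params['term'] : null;   // like getParam('term')
?>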

Be aware that if you are using any additional forward slashes in your URL, all your <link>, <img>, <script> and <a> tags with src or href attributes will now be resolved relative to the whole URL. So you will often want to prefix them with / to make them relative to your domain root.

For example, the page retrieved with www.somesite.com/search/lang/en/str/monkeys may have <img src="gfx/small.png" alt=""> which would actually reference www.somesite.com/search/lang/en/str/monkeys/gfx/small.png, obviously not what you want. Don't use ../../ to solve this; it just gets silly. Instead try <img src="/gfx/small.png" alt="">.

Hope this helps someone.

Awesome article. One of the most straightforward and easy to implement I have found.

Thank you. Very simple.

Excellent read! Wow, I am going to be researching more about this. Thank you!

Use ModRewrite!!
That’s the right way!!

Never, never, never use the “404 error handler” method. It won't work with suPHP or PHP as CGI: the 404 headers will already have been sent, and all pages will actually be served as 404 errors.

And most modern hosts use suPHP or another suEXEC-like technique for user separation.

ModRewrite!

ModRewrite!

It works.