What are URL "best practices" these days?

I’m starting another web site, and thinking about how the URL structure should be.

The last time I did this, it was de rigueur to convert this:
www.example.com/article.php?ID=8838
into this:
www.example.com/article/8838.php

These days I see lots more variations on the theme. Things like

www.example.com/article/qq8838

or

www.example.com/article/qq8838/parameter

or

www.example.com/type/subtype/lots-and-lots-of-text-that-seems-a-little-too-much-text-than-should-be-used/

or

www.example.com/type/subtype/lots-and-lots-of-text-that-seems-a-little-too-much-text-than-should-be-used/8838

One thing I’ve noticed is that the .php|.asp|.etc extensions on the web’s largest sites are all gone, absorbed into the rewrite or script.

I know having all that text in the URL is supposed to help search engines, but wasn’t that only true back when Google choked on parameters? Do search engines even eat URL chow anymore, or is that an outdated concept? It certainly doesn’t help the users, or there wouldn’t be a million URL shortening services out there.

All of the above examples are from the 10 largest web sites on the internet, so both the lots of text and the no-text options are doing well for those sites.

My preference is to do something like
www.example.com/article/8838
but then what do I do when I have to pass two parameters? www.example.com/article/8838/argument
seems needlessly complicated.

Another point to note is that the titles of these articles sometimes change. Not often. Of the 10,000 or so planned articles, maybe five or six will change names in any given year. But that could affect the URL if I choose to put the title of the article into it.

It’s completely arbitrary now that it’s become trivial for people to separate URLs from physical files. Whatever scheme you like is what you do. Google does consider words found in the URL (where slashes and dashes are considered separators) when matching search queries, so they can give a tiny boost to ranking for those words.
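
For what it's worth, the usual mechanism is a single front-controller script: the server rewrites every request to it, and the script splits the path into segments itself, so a second parameter like /article/8838/argument is just one more segment. A minimal sketch in PHP (the handler name is hypothetical; it assumes a rewrite rule already points everything at index.php):

```php
<?php
// index.php -- hypothetical front controller. Assumes the web server
// already rewrites every request to this script.
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
$segments = array_values(array_filter(explode('/', $path)));

// /article/8838          -> controller "article", id "8838"
// /article/8838/comments -> a third segment carries a second parameter
$controller = $segments[0] ?? 'home';
$id         = $segments[1] ?? null;
$extra      = $segments[2] ?? null;

if ($controller === 'article' && $id !== null) {
    show_article((int) $id, $extra); // show_article() is hypothetical
} else {
    http_response_code(404);
}
```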

As far as implementing that kind of title goes, the standard is to generate and store the “slug” – the URL version of the title with non-alphanumeric characters stripped and spaces turned to dashes – as its own field in the database. That way you can index it for faster retrieval when finding the article matching the URL, and you can change the slug and title independently.
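
A minimal sketch of that slug generation (the function name is hypothetical, and it assumes plain ASCII titles):

```php
<?php
// Hypothetical slug generator: lowercase the title, strip everything
// that isn't a letter, digit, space or dash, then collapse runs of
// spaces and dashes into a single dash.
function make_slug(string $title): string {
    $slug = strtolower(trim($title));
    $slug = preg_replace('/[^a-z0-9 -]/', '', $slug);
    $slug = preg_replace('/[ -]+/', '-', $slug);
    return trim($slug, '-');
}

echo make_slug('What are URL "best practices" these days?');
// -> what-are-url-best-practices-these-days
```

Stored in its own indexed column, the lookup for a given slug stays a single query, and renaming one of your articles only touches that one row.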

It’s well worth looking at what other sites are doing. Have a look at Google news - http://news.google.co.uk/nwshp?hl=en&tab=wn - and see the huge variety of different URL structures on news links.

That’s a good type of URL, but if you split your articles into categories, why not include the category name in the URL? The more relevant keywords your URLs carry, the better they look to Google.

What’s wrong with it? Do you think your URL becomes unnecessarily long or is it difficult to extract parameters from such an address?

Long, yes. Difficult, no.

I’m primarily interested in improving the experience for the end user, not for the search engines. One thing I’ve noticed in the 20-or-so sites I’ve created is that it’s all about content. No amount of search engine trickery can compete with good content and a good user experience.

I think that’s borne out by the experience of IMDB: lots of great content made it one of the top ten English language web sites on the internet, in spite of its SEO-hostile URLs.

If you just add a category to the URLs it won’t hurt. And if, when you strip the article name from a URL and leave only the category, the site displays a list of that category’s articles, then your URLs will be “hackable” and meaningful.

Agreed. But I think Google probably checks the popular sites (IMDB, Wikipedia) first before showing other results.

I think that would fall into putting too much of a hand into the search results, but then again, that doesn’t mean that they don’t. However, IMDB and Wikipedia are very definitive go-to sites and many people link to them, so they would naturally rise to the top.

I think the most important thing (as someone else mentioned briefly) is to try to put relevant keywords in the URL.

Google (and presumably other search engines) seem to give a substantial boost when keywords in the URL match the body content, and especially when that matches up with content on incoming links.

Aside from that, I would try to keep them as clean and short as possible.

I would just use domain/article/[uniqueID][.html|.xml|.json]
It doesn’t need all that fancy stuff or “hackable” URLs.

IMHO a waste of time and computer resources doing anything further.
([.html|.xml|.json] === if you want the ability to return different formats)
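
For what it's worth, a rough sketch of peeling that optional extension off the ID (the function is hypothetical; it defaults to HTML when no extension is present):

```php
<?php
// Hypothetical: pull the ID and optional format out of /article/8838.json
function parse_article_path(string $path): ?array {
    if (preg_match('#^/article/(\d+)(?:\.(html|xml|json))?$#', $path, $m)) {
        return ['id' => (int) $m[1], 'format' => $m[2] ?? 'html'];
    }
    return null; // not an article URL
}

var_dump(parse_article_path('/article/8838'));      // id 8838, format "html"
var_dump(parse_article_path('/article/8838.json')); // id 8838, format "json"
```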