Question about building a Pretty URL

Recently, I added on to my Pretty URL to handle sorting of Article Summaries, and things now look like this…


http://local.debbie/finance/markets/by-date/desc/3
http://local.debbie/finance/markets/by-title/asc/1
http://local.debbie/finance/markets/by-rating/desc/5

Where the Sort-Field = ‘by-date’, Sort-Order = ‘desc’, and Page-Number = ‘3’

Before going to bed last night, I realized that I forgot to add Filtering, and would like to further expand my Pretty URL like this…


http://local.debbie/finance/markets/editors-choice/by-date/desc/3
http://local.debbie/finance/markets/last-day/by-date/desc/3
http://local.debbie/finance/markets/last-month/by-date/desc/3

Where the Filter = ‘editors-choice’, Sort-Field = ‘by-date’, Sort-Order = ‘desc’, and Page-Number = ‘3’

Does that URL look okay, or is it too unwieldy?

Sincerely,

Debbie

Personally, I’d put the Sort-Field, Sort-Order, and Filter in the query string instead of directly in the URL because:

  1. They don’t change what is on the site, only how it is displayed (pretty much the definition of the query string)
  2. The fields are optional, which makes them perfect candidates for the query string
  3. You are creating completely different URLs for the same data, which is unwise SEO-wise
  4. You are creating very deep URLs, which again is unwise SEO-wise

So I would do something like

/finance/markets/3?sort=date&sort_dir=desc&filter=editors-choice

This to me clearly states that I’m on page 3 of finance/markets, but I’ve changed it so that it’s displaying by date descending, and filtering on editors-choice only.
Also, with query strings, I can remove any parameter I don’t like and the URL is still valid! With your scheme it isn’t clear what can be removed and what can’t.

Whatever you decide to do, make sure that any URL with a different sorting points to a canonical URL (this doesn’t apply to filtering).

ScallioXTX,

I hope you aren’t about to break my website design… :frowning:

You’ve totally lost - and scared - me on this… :frowning:

First off, the sample URLs I posted above are created from an Apache mod_rewrite file that took an enormous amount of work to make things look pretty…

I was under the impression that Query Strings were bad because they look crude, are harder to read, and can even expose the structure of your website to hackers.

Before I started adding on the capability to Filter and Sort, my website/URL was set up like this…

When a user goes to a Section, the URL looks like this…


http://www.debbie.com/finance/

When a user goes to a Subsection, the URL looks like this…


http://www.debbie.com/finance/economy/

When a user wants to read an Article, the URL looks like this…


http://www.debbie.com/finance/economy/fastest-growing-small-business-sectors-for-2013

As mentioned above, after building my Subsection home page - which lists Article Summaries for that Subsection - I realized that I needed to break things up into maybe 20 Article Summaries per Page, and that means I need to provide a way for people to Filter and Sort to get to the Articles they want to read.

So I figured the following would be a “pretty” and logical way for users to see how Articles were being sorted on the Subsection home page…


http://www.debbie.com/finance/economy/by-date/desc/3

You are saying that is bad?? :frowning:

I read that link and it didn’t really make sense… :-/

Again, I thought what I was doing was “good”…

Sincerely,

Debbie

I’m not saying it’s bad, I’m saying I wouldn’t do it that way, for reasons outlined above.
Indeed, some say that query strings are a Bad Thing ™, but that doesn’t mean they don’t have their place. For stuff like this I really find the query string a more appropriate location. Others may disagree on this of course.

Regarding the query string being unreadable and showing stuff to hackers, that only applies if you do everything via the query string, i.e.

mydomain.com/?cat=1

But I’m not suggesting that. I’m suggesting a nice structure for the base URL, with query string parameters for the sorting and filtering.

As for your mod_rewrite rules, you’d just have to remove the sort and filter specific ones. And there is nothing you need to do to add query strings to your URLs; that works out of the box with mod_rewrite.
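For example, a minimal .htaccess sketch of what that might look like (the front controller `index.php` and the parameter names are hypothetical, not from Debbie’s actual rewrite file):

```apache
# Map /finance/markets/3 to index.php?section=finance&subsection=markets&page=3
# The [QSA] flag appends the visitor's query string
# (e.g. ?sort=date&sort_dir=desc&filter=editors-choice) to the rewritten URL,
# so the sort/filter parameters pass through untouched.
RewriteEngine On
RewriteRule ^([a-z-]+)/([a-z-]+)/([0-9]+)/?$ index.php?section=$1&subsection=$2&page=$3 [QSA,L]
```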

Basically what it’s saying is that if you have different pages with the same info displayed differently (i.e. one URL with articles sorted ascending, another URL with the same articles sorted descending), you have to pick one of them and deem it the “canonical” version (most people use the version you’d see if you don’t apply sorting or filtering manually). Suppose we call the ascending version canonical; then we’d have to indicate, on the page where the articles are sorted descending, that the ascending version is canonical. So you’d place this on the descending page:


<link rel="canonical" href="http://blog.example.com/same/articles/but/sorted/ascending" />

Then when a search engine crawls your page with the articles sorted descending, it knows not to index that page, but to index the page where the articles are sorted ascending instead. If you let Google index both pages, there is a chance you will get a duplicate content penalty [since the pages contain the same content but in a different order].

Make sense?

ScallioXTX,

I guess I’d rather have people challenge me and make me a better web developer, but it sure can be frustrating at times!!

So let’s say I took your advice and had this…


www.debbie.com/finance/economy?sortfield=by-date&sortorder=asc&page=1
www.debbie.com/finance/economy?sortfield=by-date&sortorder=asc&page=2
www.debbie.com/finance/economy?sortfield=by-date&sortorder=asc&page=3

Sorry if you explained before, but why wouldn’t Google consider these 3 pages as similar or duplicates versus this…


www.debbie.com/finance/economy/by-date/asc/1
www.debbie.com/finance/economy/by-date/asc/2
www.debbie.com/finance/economy/by-date/asc/3

Off Topic:

Finance is the Section and Economy is the Subsection - and not a file - so shouldn’t it be…


www.debbie.com/finance/economy/?sortfield=by-date&sortorder=asc&page=1
www.debbie.com/finance/economy/?sortfield=by-date&sortorder=asc&page=2
www.debbie.com/finance/economy/?sortfield=by-date&sortorder=asc&page=3

Okay.

Yeah, I know, but I worked so hard to craft this perfect mod_rewrite, and now you are saying not to use it because it will hurt me in Google’s eyes?!

I guess where I am also confused is that most of my content is dynamic.

A user would have to choose to go to Sort by Date, in Descending Order and land on Page 3.

So how can a Google bot crawl webpages that don’t exist as physical files?

Sincerely,

Debbie

P.S. Opinions from other gurus are welcome on this topic!!!

Google doesn’t consider pages similar or duplicate based on URL, but only based on the content on those pages. You can structure the URL however you like as far as Google is concerned (ever noticed how ugly Amazon’s URLs are? They still rank fine). Pretty URLs are just a hint that the content is static, while query strings are a hint that the content is dynamic.

Depends on who you ask. It’s a matter of taste really. It’s possible to argue either way.

Google doesn’t care about files or directories; all Google cares about is URLs. So Google can crawl both page 1 ascending and page 3 descending, and since they contain the same content, it’s wise to put a canonical from page 3 descending to page 1 ascending (and from page 2 descending to page 2 ascending, etc.)

While we’re at it, you may want to add some pagination hints as well :slight_smile: http://googlewebmastercentral.blogspot.nl/2011/09/pagination-with-relnext-and-relprev.html
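For instance, on page 2 of a paginated list you could emit something like this in the page head (the URLs here are placeholders, assuming the query-string scheme suggested above):

```html
<!-- Hypothetical pagination hints for page 2 of /finance/economy/ -->
<link rel="prev" href="http://www.debbie.com/finance/economy/?page=1" />
<link rel="next" href="http://www.debbie.com/finance/economy/?page=3" />
```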

So, on a side note, if I have the same Article cross-referenced in different places on my website, then that is okay?

For example, if I had these URLs all pointing to the same Article, would that be okay…


www.debbie.com/featured/be-sure-to-charge-sales-tax-online

www.debbie.com/legal/taxes/be-sure-to-charge-sales-tax-online

www.debbie.com/business-type/online/be-sure-to-charge-sales-tax-online

I guess that I don’t understand how crawling works…

I thought what Google did was go to www.debbie.com, and then follow all of the links on my Home Page, and then the links off of each corresponding page until it had “crawled” my entire website. Right?

In the page header, I have “tabs” like “Finance”, “Legal”, “Management”, etc. So Google would crawl the corresponding links: “/finance/”, “/legal/”, “/management/”, right?

And it would crawl links like “/finance/economy/”.

But there are no links on my website like “/finance/economy/by-rating/asc/9”.

If you clicked on the link “Economy” (i.e. “/finance/economy/”) and were routed to the Economy subsection home page, then you would see drop-downs where you could choose “Filter = ‘Last 30 Days’”, “Sort = ‘By Rating (Asc)’”, and “Page = ‘5’”, and then my script would redirect back to “/finance/economy/by-rating/asc/5”, but that URL doesn’t exist statically anywhere, so how could it get crawled by Google??


Also, you keep saying, “and since they contain the same content”…

I’m not sure what you mean.

Let’s back up…

As it stands now, when you go to “/finance/economy/” there are 200 Article Summaries on that page. (Each is unique!)

Let’s say I allow 10 Article Summaries per Page.

So now I have 1, 2, 3,… , 10 pages but each page still has entirely unique content!

And if you toggle between Sort Ascending and Sort Descending, each of those 10 pages will still have unique content!

And if you toggle between Sort by Date and Sort by Title, each of those 10 pages will still have unique content!

So why would Google complain if it indeed did index…


www.debbie.com/finance/economy/by-date/asc/1
www.debbie.com/finance/economy/by-date/desc/1
www.debbie.com/finance/economy/by-title/asc/1
www.debbie.com/finance/economy/by-title/desc/1

I would agree that maybe the default view of…

  • No Filter
  • Sort by Date
  • Desc
  • Start on Page 1

…is the “true” home page of the “Economy” section, but I don’t see any of those combinations as “duplicate content”… :-/

By contrast, if I had an e-commerce site with 15 products on a page, and one URL showed those 15 products sorted by color, and a second URL showed those same 15 products sorted by price, and so on, THEN you would be creating bloated indexing that serves no purpose, but that is not what I have on my website.

Am I missing something here?

BTW, SEO is very important to me, and I do want to build the best website possible. I am just not understanding what you are telling me.

Sincerely,

Debbie

Hi Debbie,

The approach that I have taken to increase my SEO Brownie Points is:

  1. ensure every page is unique and content is not duplicated (oxymoron?)
  2. append a version or date, etc., to pages that have identical prefixes (eliminates identical title warnings)
  3. each and every page has a specific Canonical link
  4. each page has unique “meta contents” (thankfully keywords are now redundant) :slight_smile:

As an example, Google “crap joke central” “johns-jokes”, notice there are 52 references to unique pages.

Following the above four points manages to keep the Master happy :slight_smile:

HTML Improvements
Last updated Apr 24, 2014
We didn’t detect any content issues with your site. As we crawl your site, we check it to detect any potential issues with content on your pages, including duplicate, missing, or problematic title tags or meta descriptions. These issues won’t prevent your site from appearing in Google search results, but paying attention to them can provide Google with more information and even help drive traffic to your site. For example, title and meta description text can appear in search results, and useful, descriptive text is more likely to be clicked on by users. More Information

=====================//====================
There should be links to your finance, legal, featured pagination pages on your home page.

As far as the pagination and search URLs below are concerned, I would be inclined to exclude them in robots.txt and to mark the links “nofollow, noindex”. The reason is that they are only linked lists and have no SEO link-juice content.

They also create problems with duplicate page content that has identical link-juice :frowning:

http://local.debbie/finance/markets/by-date/desc/3
http://local.debbie/finance/markets/by-title/asc/1
http://local.debbie/finance/markets/by-rating/desc/5
http://local.debbie/finance/markets/editors-choice/by-date/desc/3
http://local.debbie/finance/markets/last-day/by-date/desc/3
http://local.debbie/finance/markets/last-month/by-date/desc/3
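A robots.txt sketch of that exclusion might look like this (the paths are illustrative and assume the sort segment always starts with “by-”; robots.txt Disallow rules match by URL prefix):

```
User-agent: *
Disallow: /finance/markets/by-date/
Disallow: /finance/markets/by-title/
Disallow: /finance/markets/by-rating/
```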

=====================//====================
The following pages will create problems that should and can be avoided by eliminating the three categories.
www.debbie.com/featured/be-sure-to-charge-sales-tax-online
www.debbie.com/legal/taxes/be-sure-to-charge-sales-tax-online
www.debbie.com/business-type/online/be-sure-to-charge-sales-tax-online

=====================//====================
As far as my joke site is concerned each and every page is included in the dynamically generated Sitemap.xml

http://www.johns-jokes.com/c_sitemap
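A dynamically generated sitemap is just XML in the standard sitemap format, along these lines (the entries here are illustrative, not the actual contents of that sitemap):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.johns-jokes.com/</loc>
  </url>
  <url>
    <loc>http://www.johns-jokes.com/c_sitemap</loc>
  </url>
</urlset>
```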

Just my two cents for the day :slight_smile:

I can’t say exactly how Google does it, but in my experience they seem to find most everything. I’m pretty sure their crawler evaluates JavaScript. Plus if anyone on the Web shares one of those links somewhere – a forum or an article or a tweet or wherever – then Google can find it that way too. Seems safer to assume that Google will find it one way or another.

My preference would probably be to have sorting and pagination as query string values. The filter seems like a toss-up to me between query string or path segment. If you know that you will only ever use one filter at a time, then it should be fine as a path segment. But if you think you might ever want a combination of filters, such as filter=markets & filter=economy, then you’ll definitely want to put the filter in the query string.

Yes, that is fine, and even advisable because it’s good for your internal link profile. Make sure you link a lot to pages you want to rank high (is what our SEO people told me).

Check what @Jeff_Mott stated about this above. That’s what I would’ve said as well.

Let’s say, for example, that you have 9 articles and you show 3 per page.
Then on page 1 ascending you have

  • Article 1
  • Article 2
  • Article 3

and on page 3 descending you have

  • Article 3
  • Article 2
  • Article 1

These are the same articles! Ordered in a different way, but they are the same articles, so basically these pages have the same content, but in a different order.
So, I would suggest page 1 ascending is the canonical version of page 3 descending.
I don’t know how to make it any more clear than that.

I disagree, because Debbie’s site will have three separate URLs, all most probably being populated with identical content.

Google will complain and issue warnings that the URLs have duplicate content, titles, meta content, etc.

Google Webmaster Tools:

(See the “HTML Improvements” report quoted above.)

Believe me, it has taken years to eliminate Google errors and warnings :slight_smile:

Your example is wrong, so obviously it is unclear to me.

If, as you say, I have 9 articles with 3 articles per page, and I sort in Ascending Order, then I would have…


Page 1:
- Article_1
- Article_2
- Article_3

Page 2:
- Article_4
- Article_5
- Article_6

Page 3:
- Article_7
- Article_8
- Article_9

And if I sorted things in Descending Order, then I would have…


Page 1:
- Article_9
- Article_8
- Article_7

Page 2:
- Article_6
- Article_5
- Article_4

Page 3:
- Article_3
- Article_2
- Article_1

In this simple example, yes, Page 2 has “duplicate” articles, but “in real life” it is unlikely things would fall out like that?!

The point being, I basically do NOT have duplicate data by introducing Sorting…

Sincerely,

Debbie

Your example is the same as mine, and they both contain duplicate content, and not just on page 2. See below:

  • Page 1 ascending contains the same articles as page 3 descending
  • Page 2 ascending contains the same articles as page 2 descending
  • Page 3 ascending contains the same articles as page 1 descending

Or, more generically, for n pages: page i ascending contains the same articles as page n + 1 − i descending.
This is always the case, regardless of how many pages you have.
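That mapping can be sketched in a few lines of Python (a toy illustration of the formula above, not anything from Debbie’s actual site):

```python
def mirrored_desc_page(i, n):
    """Return the descending-order page that shows the same
    articles as ascending-order page i, out of n pages total."""
    return n + 1 - i

# With n = 3 pages (9 articles, 3 per page):
assert mirrored_desc_page(1, 3) == 3  # asc page 1 == desc page 3
assert mirrored_desc_page(2, 3) == 2  # asc page 2 == desc page 2
assert mirrored_desc_page(3, 3) == 1  # asc page 3 == desc page 1
```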

I must have misread her question. I agree, it’s not a good idea to link to all kinds of different URLs; just use one.