This to me clearly states that I’m on page 3 of finance/markets, but I’ve changed it so that it’s displaying by date descending, and filters on editor-choice only.
Also, with query strings, I can remove any parameter I don’t like and the URL is still valid! With your scheme it isn’t at all clear what can safely be removed and what can’t.
Whatever you decide to do, make sure you point to a canonical URL from any URL that has different sorting (not for filtering; it doesn’t apply there).
I hope you aren’t about to break my website design…
You’ve totally lost - and scared - me on this…
First off, the sample URLs I posted above are created from an Apache mod_rewrite file that took an enormous amount of work to make things look pretty…
I was under the impression that Query Strings were bad because they look crude, are harder to read, and even can expose the structure of your website to hackers.
Before I started adding on the capability to Filter and Sort, my website/URL was set up like this…
When a user goes to a Section, the URL looks like this…
http://www.debbie.com/finance/
When a user goes to a Subsection, the URL looks like this…
http://www.debbie.com/finance/economy/
When a user wants to read an Article, the URL looks like this…
As mentioned above, after building my Subsection home page - which lists Article Summaries for that Subsection - I realized that I needed to break things up into maybe 20 Article Summaries per Page, and that means I need to provide a way for people to Filter and Sort to get to the Articles they want to read.
So I figured the following would be a “pretty” and logical way for users to see how Articles were being sorted on the Subsection home page…
I’m not saying it’s bad, I’m saying I wouldn’t do it that way, for reasons outlined above.
Indeed, some say that query strings are a Bad Thing ™, but that doesn’t mean they don’t have their place. For stuff like this I really find the query string a more appropriate location. Others may disagree on this of course.
Regarding the query string being unreadable and showing stuff to hackers, that only applies if you do everything via the query string, i.e.
mydomain.com/?cat=1
But I’m not suggesting that. I’m suggesting a nice structure for the base url, with query string parameters for the sorting and filtering
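In other words (the parameter names here are hypothetical, not taken from the actual site), the hybrid scheme would look something like:

```text
http://www.debbie.com/finance/economy/
http://www.debbie.com/finance/economy/?sort=rating&dir=asc&page=5
```

The first URL stays pretty and canonical; the second is the same page with sorting and pagination applied, and any parameter can be dropped without breaking it.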
As for your mod_rewrite rules, you’d just have to remove the sort- and filter-specific rules. And there is nothing you need to do to add query strings to your URLs; that works out of the box with mod_rewrite.
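A minimal sketch of what the remaining base-URL rules might look like in .htaccess (the patterns are assumptions, since the actual rewrite file wasn’t posted); the [QSA] flag is what makes query strings work “out of the box”:

```apache
# Hypothetical rewrite rules for the section and subsection pages.
# [QSA] re-appends the incoming query string (?sort=...&page=...)
# to the rewritten target, so sorting/filtering needs no extra rules.
RewriteEngine On
RewriteRule ^([a-z-]+)/$ index.php?section=$1 [L,QSA]
RewriteRule ^([a-z-]+)/([a-z-]+)/$ index.php?section=$1&subsection=$2 [L,QSA]
```

Without [QSA], a substitution that contains its own query string (like index.php?section=$1) would discard the visitor’s sort and page parameters.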
Basically what it’s saying is that if you have different pages with the same info displayed differently (i.e. one URL with articles sorted ascending, another URL with the same articles sorted descending), you have to pick one of them and deem it the “canonical” version (most people use the version you’d see if you don’t apply any sorting or filtering manually). Suppose we call the one that is sorted ascending canonical; then we’d have to indicate, on the page where the articles are sorted descending, that the ascending version is canonical. So you’d place this on the descending page
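The “this” in question is the standard canonical link element. Assuming the ascending version lives at the example URL /finance/economy/, the descending page’s head would carry:

```html
<!-- in the <head> of the descending/sorted variant -->
<link rel="canonical" href="http://www.debbie.com/finance/economy/">
```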
Then when a search engine crawls your page with the articles sorted descending, it knows not to index that page, but to index the page where the articles are sorted ascending instead. If you let Google index both pages, there is a chance you will get a duplicate-content penalty [since the pages contain the same content but in a different order].
Google doesn’t consider pages similar or duplicate based on URL, but only based on the content on those pages. You can structure the URL however you like as far as Google is concerned (ever noticed how ugly Amazon’s URLs are? They still rank fine). Pretty URLs are just a hint that the content is static, while query strings are a hint that the content is dynamic.
Depends on who you ask. It’s a matter of taste really. It’s possible to argue either way.
Google doesn’t care about files or directories; all Google cares about is URLs. So Google can crawl both page 1 ascending and page 3 descending, and since they contain the same content, it’s wise to put a canonical from page 3 descending to page 1 ascending (and from page 2 descending to page 2 ascending, etc.)
I guess that I don’t understand how crawling works…
I thought what Google did was go to www.debbie.com, and then follow all of the links on my Home Page, and then the links off of each corresponding page until it had “crawled” my entire website. Right?
In the page header, I have “tabs” like “Finance”, “Legal”, “Management”, etc. So Google would crawl the corresponding links: “/finance/”, “/legal/”, “/management/”, right?
And it would crawl links like “/finance/economy/”.
But there are no links on my website like “/finance/economy/by-rating/asc/9”.
If you clicked on the link “Economy” (i.e. “/finance/economy/”) and were routed to the Economy subsection home page, then you would see drop-downs where you could choose “Filter = ‘Last 30 Days’”, “Sort = ‘By Rating (Asc)’”, and “Page = ‘5’”, and then my script would redirect back to “/finance/economy/by-rating/asc/5”, but that URL doesn’t exist statically anywhere, so how could it get crawled by Google??
Also, you keep saying, “and since they contain the same content”…
I’m not sure what you mean.
Let’s back up…
As it stands now, when you go to “/finance/economy/” there are 200 Article Summaries on that page. (Each is unique!)
Let’s say I allow 10 Article Summaries per Page.
So now I have 1, 2, 3,… , 10 pages but each page still has entirely unique content!
And if you toggle between Sort Ascending and Sort Descending, each of those 10 pages will still have unique content!
And if you toggle between Sort by Date and Sort by Title, each of those 10 pages will still have unique content!
So why would Google complain if it indeed did index…
…is the “true” home page of the “Economy” section, but I don’t see any of those combinations as “duplicate content”… :-/
By contrast, if I had an e-commerce site with 15 products on a page, and one URL showed those 15 products sorted by color, and a second URL showed those same 15 products sorted by price, and so on, THEN you would be creating bloated indexing that serves no purpose, but that is not what I have on my website.
Am I missing something here?
BTW, SEO is very important to me, and I do want to build the best website possible. I am just not understanding what you are telling me.
The approach that I have taken to increase my SEO Brownie Points is:

- ensure every page is unique and content is not duplicated (Oxymoron?)
- append a version or date, etc. to pages that have identical prefixes (eliminates identical-title warnings)
- give each and every page a specific Canonical link
- give each page unique “meta contents” (thankfully keywords are now redundant)
As an example, Google “crap joke central” “johns-jokes”, notice there are 52 references to unique pages.
Following the above points manages to keep the Master happy
HTML Improvements
Last updated Apr 24, 2014
We didn’t detect any content issues with your site. As we crawl your site, we check it to detect any potential issues with content on your pages, including duplicate, missing, or problematic title tags or meta descriptions. These issues won’t prevent your site from appearing in Google search results, but paying attention to them can provide Google with more information and even help drive traffic to your site. For example, title and meta description text can appear in search results, and useful, descriptive text is more likely to be clicked on by users. More Information
=====================//====================
There should be links to your finance, legal, featured pagination pages on your home page.
As far as the pagination and search pages below are concerned, I would be inclined to exclude those items in robots.txt and to give the links “nofollow, noindex”. The reason is they are only linked lists and have no SEO link-juice content.
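For example (the path patterns here are hypothetical, based on the URL scheme discussed earlier in the thread), the robots.txt entries would look like:

```text
# robots.txt — keep crawlers out of the sorted/paginated variants
User-agent: *
Disallow: /finance/economy/by-rating/
Disallow: /finance/economy/by-date/
```

One caveat worth noting: a page-level `<meta name="robots" content="noindex, nofollow">` tag only works if the crawler can actually fetch the page, so blocking the same URLs in robots.txt would hide that tag from Google. It’s generally one approach or the other, not both.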
They also create problems with duplicate page content that has identical link-juice.
=====================//====================
As far as my joke site is concerned each and every page is included in the dynamically generated Sitemap.xml
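A dynamically generated Sitemap.xml just emits one url entry per page; a minimal fragment looks like this (the URL and date are placeholders, not the real site’s):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/section/some-page/</loc>
    <lastmod>2014-04-24</lastmod>
  </url>
</urlset>
```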
I can’t say exactly how Google does it, but in my experience they seem to find most everything. I’m pretty sure their crawler evaluates JavaScript. Plus if anyone on the Web shares one of those links somewhere – a forum or an article or a tweet or wherever – then Google can find it that way too. Seems safer to assume that Google will find it one way or another.
My preference would probably be to have sorting and pagination as query string values. The filter seems like a toss-up to me between query string or path segment. If you know that you will only ever use one filter at a time, then it should be fine as a path segment. But if you think you might ever want a combination of filters, such as filter=markets & filter=economy, then you’ll definitely want to put the filter in the query string.
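That repeated-parameter case is something a single path segment can’t express cleanly; in a query string it is simply (parameter names hypothetical):

```text
http://www.debbie.com/finance/economy/?filter=markets&filter=economy&sort=date&page=3
```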
Yes, that is fine, and even advisable because it’s good for your internal link profile. Make sure you link a lot to pages you want to rank high (is what our SEO people told me).
Check what @Jeff_Mott stated about this above. That’s what I would’ve said as well.
Let’s say, for example, that you have 9 articles and you show 3 per page.
Then on page 1 ascending you have
Article 1
Article 2
Article 3
and on page 3 descending you have
Article 3
Article 2
Article 1
These are the same articles! Ordered in a different way, but they are the same articles, so basically these pages have the same content, but in a different order.
So, I would suggest page 1 ascending is the canonical version of page 3 descending.
I don’t know how to make it any more clear than that.
I disagree because Debbie’s site will have three separate URLs, all most probably populated with identical content.
Google will complain and issue warnings that the URLs have duplicate content, titles, meta content, etc.
Google Webmaster Tools: the same “HTML Improvements” report I quoted above (“We didn’t detect any content issues with your site…”).
Believe me, it has taken years to eliminate Google errors and warnings :)
Your example is the same as mine, and they both contain duplicate content, and not just on page 2; see the matching pairs below:
Page 1 ascending contains the same articles as page 3 descending
Page 2 ascending contains the same articles as page 2 descending
Page 3 ascending contains the same articles as page 1 descending
Or, more generically, for n pages: page i ascending contains the same articles as page n+1-i descending.
This is always the case, regardless of how many pages you have.
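That n+1-i pairing is easy to check with a short Python sketch (article names are hypothetical; it assumes the article count divides evenly into full pages, as in the 9-articles example above):

```python
def page_slices(articles, per_page, descending=False):
    """Split articles into pages of per_page items,
    reversing the sort order first if descending."""
    items = list(reversed(articles)) if descending else list(articles)
    return [items[i:i + per_page] for i in range(0, len(items), per_page)]

articles = [f"Article {i}" for i in range(1, 10)]  # 9 articles
asc = page_slices(articles, 3)                     # pages 1..3, ascending
desc = page_slices(articles, 3, descending=True)   # pages 1..3, descending

# page i ascending holds the same articles as page n+1-i descending (n = 3)
n = len(asc)
for i in range(n):
    assert set(asc[i]) == set(desc[n - 1 - i])
```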