I have been putting an enormous amount of effort into trying to make my URLs a.) Simple, b.) Readable, and c.) Pretty.
Am I wasting my time on trying to keep purely Pretty URLs? :-/
Right now I am struggling to keep a Query String out of my Article URL, but don't see how I can do that.
What I want is this...
What I likely will get stuck using is...
Yes they do.
Yes, GET variables are fine and sometimes a good thing.
You're probably wasting your time if you are spending too much of it trying to keep pretty URLs. The vast majority of users don't care, not even if they are other developers. Look at LinkedIn's URLs, for instance. They don't care about pretty URLs even the slightest tiny bit.
Most of that is just URL rewrite stuff....
Not quite sure why you would sort an article by date desc, but this should work just fine
I don't want to sort the Article - I want to sort and paginate the 200 User Comments below my awesome article!!
I think I figured out a way to use $_GET and $_SESSION to grab the sort criteria and redirect to a "clean" URL, but I still have one issue dogging me...
Right now my PHP generates a Navigation Bar with hyperlinks...
// Other Page.
$commentsNav .= "<li><a href='/$section/$subsection/$article?sortname=$sortName&sortdir=$sortDir&page=$page'>$page</a></li>\r\n";
I don't see how I can allow navigation, and not also have the Page in the URL?! :-/
There's absolutely nothing wrong with including paging in the url:
If you have paging on the article AND on comments, add them to the end
1.) I'm trying to keep my Article URL clutter-free
2.) I've learned in the last week that including Sorting and Pagination in a "Directory style" Pretty URL is at best not desirable for SEO, and at worst can cause trouble.
So if I have to leave in a page number, then I'd have to do this which goes against #1...
It's not the end of the world, but I want perfection!!
Says who? I have never had SEO problems caused by paging.
You could make your pagination links nofollow, which should eliminate any dupe issues (though I still don't see why it'd be an issue).
The consensus is that it is undesirable to put sort criteria in a file-path style URL, because that will create numerous URLs pointing to similar content, thus causing duplicates.
Placing sort criteria in the Query String is much better and allows you to tell Google to ignore said criteria if you so choose.
Google will treat these as duplicate content...
Google may or may not treat these as duplicate...
But as they are written, it looks like 3 separate URLs pointing to nearly identical content, thus making it duplicate.
By having it this way...
...you can tell Google to ignore the "page" parameter, so Google will see...
Nofollows are also discouraged by Google.
In this case, where pages 1, 2, 3,... are different User Comments for the same Article, it might be an option, but it is better to just let Google know what I'm doing using more accepted means.
Unless I can figure out some way to get the pagination info from hyperlinks in my Nav Bar to the Session, then the best idea is to go with something like...
Not the end of the world, but it would be so much prettier and easier to read without the mini Query String appended. sigh
I'm not sure where you got that info, but I'm pretty sure it's incorrect. I would like to see some empirical evidence to support it.
regardless, you can use canonical instead: http://moz.com/learn/seo/duplicate-content
note: this is less than a year old, but it's Cutts himself saying not to fret over dupe content: http://searchengineland.com/googles-matt-cutts-duplicate-content-wont-hurt-you-unless-it-is-spammy-167459
When my website is online "in the year 2525" then I'll give you empirical data!
Until then, we had a lively discussion here last week...
Question about building a Pretty URL: http://www.sitepoint.com/forums/showthread.php?1206682-Question-about-building-a-Pretty-URL
Right, if you use it correctly...
Another discussion which didn't yield the response I had hoped for...
Google and Duplicate Content: http://www.sitepoint.com/forums/showthread.php?1207791-Google-and-Duplicate-Content
I've seen all of the Google videos and read their Webmaster site pretty intensely.
Bottom-line is they are vague and do not cover the specific things I have questions about - see last thread above?!
@John_Betong does have a good point about waiting until my site is live and seeing what Google says.
Of course the counter to that is I want to perfect my website BEFORE it goes live...
P.S. Aesthetically we are on the same page as far as "Pretty URLs".
Where we differ is that I am now "spooked" by what @ScallioXTX and some others have said on and off SitePoint, and so I am busting my *ss to re-architect whatever needs to be in order to get an A+ from Google and my future users!!!
Please take a look at this thread which actually solved the problem of duplicate titles, pages, content, etc
I followed the advice and just checked Google Webmaster Tools -> HTML Improvements
HTML Improvements - Last updated May 5, 2014
We didn't detect any content issues with your site. As we crawl your site, we check it to detect any potential issues with content on your pages, including duplicate, missing, or problematic title tags or meta descriptions. These issues won't prevent your site from appearing in Google search results, but paying attention to them can provide Google with more information and even help drive traffic to your site. For example, title and meta description text can appear in search results, and useful, descriptive text is more likely to be clicked on by users.
Your mission to create a perfect site before publishing will no doubt prevent your site ever going live because the goal posts are changing on a daily basis.
just my two cents
Yes, it seems like you're spending a lot of time on something so very insignificant. I don't believe there is any noticeable difference between using a GET string and having some perfected "pretty URL". Complete waste of time and effort, but of course the SEO people will say otherwise because that is what they sell.
Oh come on, you can do better than that...
If I have 10, 20, 30 different URLs all pointing to the SAME CONTENT you honestly don't think Google would have issues???
What seems to be lacking in this discussion, and what Matt Cutts seems to also leave out is this...
IF you are just concerned about "duplicate content" when it comes to Pagination, then the solution is easy... Just use rel="prev" and rel="next"
And IF you are just concerned about "duplicate content" when it comes to Sorting, then you can probably get away with rel="canonical"
But the question I keep asking - and no one has answered - is, "How do you account for both Pagination and Sorting in the same URL?"
Having Sorting and Pagination together is like a Cartesian Product and makes things much more complex.
Now, in an ideal world, Debbie could have her original Pretty URL for the Subsection landing page like this...
And this URL for a given Article (with Comments)...
Unfortunately, it doesn't seem like we live in an "ideal world", and as far as I can tell from watching Matt Cutts and other Google videos, plus doing lots and lots of reading online - from reputable sources - I think my original URLs would not do so well as far as SEO goes.
And since my website is almost done, and I have labored so hard, why not fix things now, so my great Articles don't end up on Page 5 of the Search Results?!
That is hardly "wasting my time on something insignificant"!!
You've got the tools - you just need to use them correctly. If you're on page one with no criteria, you leave the rel attribute out - if you're on a page where it's sorted or paged, you would use one of the rel attributes. Or if you'd like, you could use the canonical link on the sorted page which might point to the paged link, which would use the prev/next rel attribute.
I think the point everyone's trying to make is, not every link is indexed by Google (that's what the rel attribute is for). Concentrate on getting your straight, vanilla site content to index well, and don't worry about the sorted/paginated pages. Let the rel attribute do what it does, and not penalize you. Granted, you may not get much positive link juice from the sorted/paginated pages, but you won't get penalized.
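For what it's worth, here is a minimal PHP sketch of emitting rel="prev" and rel="next" for a paginated comment list. The function name paginationLinkTags, the ?page= parameter, and the example path are assumptions made up for illustration, not code from anyone in this thread.

```php
<?php
// Hypothetical helper: builds the <link> tags for one page of a paginated
// series. Assumes page 1 lives at the bare URL and later pages at ?page=N.
function paginationLinkTags(string $baseUrl, int $page, int $lastPage): string
{
    // Page 1 is the bare URL, so it only ever has one address.
    $urlFor = function (int $n) use ($baseUrl): string {
        return $n === 1 ? $baseUrl : $baseUrl . '?page=' . $n;
    };

    $tags = [];
    if ($page > 1) {
        // Every page except the first points back to its predecessor.
        $tags[] = '<link rel="prev" href="' . htmlspecialchars($urlFor($page - 1)) . '">';
    }
    if ($page < $lastPage) {
        // Every page except the last points forward to its successor.
        $tags[] = '<link rel="next" href="' . htmlspecialchars($urlFor($page + 1)) . '">';
    }

    return implode("\n", $tags);
}

// Example: page 2 of 3 gets both a prev and a next tag.
echo paginationLinkTags('/finance/markets/some-article', 2, 3), "\n";
```

Page one with no criteria gets no rel="prev", which matches the "leave it out on page one" advice above.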
But again, no one - including Google - clearly addresses the issue of Sorting + Pagination.
If you read what Google says, then their advice on Pagination cancels out what they advise on Sorting?!
Like most things in life, I suspect there is a way to make it work for just Sorting or just Pagination or both Sorting and Pagination, but Google doesn't clearly spell out what to do.
This thread is splintering fast, but I'll gladly do a "deeper dive" of what I mean - with examples - for those who are interested.
I think we agree that the best way to address this issue is with a live website!! (Hey, I'm trying to get there as fast as I can!!)
After sleeping on things last night, I have decided to "chill" as far as my Subsection landing page which provides a listing of all Articles in the form of Article-Summaries.
But the one area I'm still pulling my hair out over is my Article template page. (That is what will make or break my website!!!)
If I didn't have Comments at the bottom, it would be easy-peasy.
But thinking users may want to sort Comments, and that it might also make sense to Paginate Comments, that takes my plain-vanilla Article template and possibly screws up its SEO if I don't do things properly...
All part of the learning experience, I suppose?! (:
Google Chrome and Opera both hide all of the URL apart from the domain name unless you are actually in that field. The other browsers should soon be doing the same. The domain name is the only piece that your visitors will care about - if they care about that much. The days when it was necessary to try to produce easy-to-remember addresses are long gone. People can now share addresses even without knowing the domain.
Use rel next and prev, and don't use rel canonical.
1) Canonical will prevent indexing of content from non-canonical pages (source, slide 30), and you want every paginated page to be indexed.
2) A page sorted by date descending is likely to show very different results than a page sorted by title ascending, so the two are not duplicates of each other.
EDIT: If you wanted to get super tricky, you could have Google index only one of the many possible sort orders.
"Long gone" seems awfully exaggerated. Chrome's "hide URL" feature hasn't even appeared yet in the stable build. Last I checked it was only in Canary. We're a long ways off from "long gone."
That is consistent with what I have been reading with Google.
Jeff, originally I said the exact same thing.
But then @ScallioXTX and I started to mix it up in this thread (Question about building a Pretty URL) starting at Post #11.
I guess ScallioXTX made me second-guess myself, and was the main instigator in my flipping out about SEO!!
So do you stand by your - and my former - argument that Sorted and Paginated pages do NOT contain duplicate content on a page-by-page basis, or do you think he has a point?
Also, do you still stick by your advice to me earlier that it is better to leave permanent parts of your website in the URL (e.g. "www.debbie/finance/markets") and place things that change - like sorting parameters - in the Query String (e.g. "?sortname=by-date&sortdir=desc&page=3") ?
I would like to think that I have taken most of people's advice here on SitePoint - as well as Google's suggestions - and implemented them on my site.
However, I still worry about my Article page and how I should handle Comments.
It is almost like I am merging two disparate things onto one page and one URL.
I was able to figure out how to remove the sorting parameters (i.e. "by-date" and "desc") from the URL and stick them in $_SESSION, so that solves one problem.
(Where "page=3" shows the 3rd page/block of User-Comments for the Article, while still showing the main Article above.)
I guess this URL doesn't look too bad, and since my PHP generates the rel="prev" and rel="next" for each corresponding page, hopefully that is all Google needs to understand that all of these URLs belong to the main Article...
But I dunno... :shifty:
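The approach described in this post (sort criteria moved into $_SESSION, page number left in the URL) might look roughly like the sketch below. The function name stashSortAndCleanUrl, the whitelist values, and the example path are all assumptions; only the parameter names sortname, sortdir, and page come from the thread.

```php
<?php
// Hypothetical helper: copies sort criteria from the query string into the
// session and returns the "clean" URL to redirect to, or null when the URL
// carries no sort criteria and no redirect is needed.
function stashSortAndCleanUrl(array $get, array &$session, string $path): ?string
{
    $allowedNames = ['by-date', 'by-rating']; // assumed whitelist
    $allowedDirs  = ['asc', 'desc'];          // assumed whitelist

    if (!isset($get['sortname']) && !isset($get['sortdir'])) {
        return null; // nothing to strip
    }

    // Only remember values we recognise; anything else is ignored.
    if (isset($get['sortname']) && in_array($get['sortname'], $allowedNames, true)) {
        $session['sortname'] = $get['sortname'];
    }
    if (isset($get['sortdir']) && in_array($get['sortdir'], $allowedDirs, true)) {
        $session['sortdir'] = $get['sortdir'];
    }

    // The page number stays in the URL so the nav links keep working.
    $page = isset($get['page']) ? (int) $get['page'] : 1;

    return $page > 1 ? $path . '?page=' . $page : $path;
}

// Possible usage at the top of the Article script (left as a comment here
// because it needs a real request):
//
//   session_start();
//   $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
//   $clean = stashSortAndCleanUrl($_GET, $_SESSION, $path);
//   if ($clean !== null) {
//       header('Location: ' . $clean, true, 303);
//       exit;
//   }
```

One trade-off of this design is that the sort order no longer appears anywhere in the address, so a copied link cannot carry it.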
He has a point. The strongest information I've found so far that backs up ScallioXTX is Google's URL parameters article and video. In the video, she gives a filtering example. One page has ?category=YouTube where 20 items are listed, and another page has ?category=YouTube&size=M where a subset of 5 items are listed. In this case, she says Google would rather crawl the first URL to see all 20 items rather than crawl both URLs and see a redundant 5 items. Though, she doesn't say whether this helps or hurts your ranking. She only says that this lets Google crawl more efficiently.
She starts talking about sorting at around 6:20. She says developers should ask themselves, "Can Googlebot discover everything useful when the sort parameter isn't displayed [that is, not in effect]?" If yes, then you could (should?) tell Google not to crawl the sorted URLs.
Yes, but not for SEO reasons. For ease of implementation reasons.
Let's say the user arrives at /finance/markets, and they click to the next page. Presumably that URL will be /finance/markets/2. You'll have to be careful to distinguish that kind of URL from, say, /finance/markets/by-date. Same for sortdir.
This is another case where you would end up with a URL namespace clash. That is, different kinds of information can appear in the same spot, and you'll likely have to develop a complicated set of pattern matching rules in order to always do the right thing. Whereas the query string is perfectly suited for optional, un-ordered parameters.
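To illustrate the point, here is a rough PHP sketch where the path carries only the permanent parts and the query string carries the optional ones. The function name parseArticleRequest, the three-segment path layout, and the default sort values are assumptions for the example.

```php
<?php
// Hypothetical request parser: fixed-position path segments identify the
// article; optional, un-ordered settings come from the query string.
function parseArticleRequest(string $requestUri): array
{
    $path  = (string) parse_url($requestUri, PHP_URL_PATH);
    $query = (string) parse_url($requestUri, PHP_URL_QUERY);

    // The path only ever means /section/subsection/article, so there is no
    // need to guess whether a segment is a page number or a sort name.
    $segments = explode('/', trim($path, '/'));
    [$section, $subsection, $article] = array_pad($segments, 3, null);

    // Query parameters can appear in any order, or not at all.
    parse_str($query, $params);

    return [
        'section'    => $section,
        'subsection' => $subsection,
        'article'    => $article,
        'sortname'   => $params['sortname'] ?? 'by-date', // assumed default
        'sortdir'    => $params['sortdir'] ?? 'desc',     // assumed default
        'page'       => max(1, (int) ($params['page'] ?? 1)),
    ];
}

// Parameter order does not matter, and missing ones fall back to defaults.
print_r(parseArticleRequest('/finance/markets/some-article?page=3&sortdir=asc'));
```

No pattern-matching rules are needed to tell a page number from a sort name, because each one is looked up by its key.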
This one's especially fun, because if you go to page 2 of the comments, is that page new or duplicate content? The answer seems to be half-and-half. The article will be the same, but the comments will be new. In this case, I'd err on the side of indexing too much rather than too little.
EDIT: Alternatively, page 2+ of the comments could be made to not display the article at all, and instead use that extra space to display twice as many comments.
Well... ish. You may have made the machines happy, but we humans got the short end of the stick. Let's say I wanted to share a link of your most recent stuff, which means sorted by date descending. But now I can't, because the URL for that doesn't exist.