casbboy — 2014-05-03T16:08:39-04:00 — #1
I don't understand why, but tons of incoming links to our sites have either an accidental extra space or some strange character at the end of the URL in their href, which produces a 404 when visitors arrive at my site.
Is it possible to write a rewrite condition that spots these trailing characters, whether a literal "\" or any URL-encoded sequence, and 301 redirects to the URL without them? They are always at the end.
I'm seeing "\", %5C, %3C, %20, and a few others.
Trying to enjoy the link gain rather than feeding 404s.
I use NGINX.
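To pin down what I'm after, here is a rough Python sketch of the logic (not NGINX syntax). The name clean_path and the exact list of junk sequences are my own assumptions, based only on the characters I listed above; anything else would need to be added to the pattern.

```python
import re

# Trailing junk seen so far: a literal backslash, %5C, %3C, %20, and plain
# whitespace. This list is an assumption; extend it as other cases turn up.
TRAILING_JUNK = re.compile(r'(?:\\|%5C|%3C|%20|\s)+$', re.IGNORECASE)

def clean_path(path):
    """Return the path with trailing junk stripped, or None if nothing changed."""
    cleaned = TRAILING_JUNK.sub('', path)
    # None means "serve the request as usual"; a string means "301 to this".
    return cleaned if cleaned != path else None
```

A request for /page.html%20 would then be redirected to /page.html, while /page.html itself is left alone.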
dklynn — 2014-05-03T17:32:14-04:00 — #2
mod_rewrite understands encoded characters so you can include them inside character range definitions as their "real" (printable) character. Note, though, that %20's space is used in the syntax of a mod_rewrite statement so it MUST be escaped (with a \). The \ character should never be used in a URI and it should be escaped if you're trying to match it within a query string.
NGINX? Dunno, but if they're trying to emulate the Apache.org people, it'll be the same.
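For reference, the encoded sequences from the first post decode to ordinary printable characters. A small Python illustration (this only shows the decoding itself, not rewrite syntax; the DECODED name is just for this example):

```python
from urllib.parse import unquote

# The three encodings mentioned in the first post.
SEQUENCES = ("%5C", "%3C", "%20")

# Map each encoded sequence to the character it stands for:
# "%5C" is a backslash, "%3C" is "<", and "%20" is a space.
DECODED = {seq: unquote(seq) for seq in SEQUENCES}

for seq, char in DECODED.items():
    print(seq, "decodes to", repr(char))
```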
casbboy — 2014-05-13T02:41:54-04:00 — #3
So it seems my main goal is a rewrite where, if the URL contains a '%' sign (suggesting a badly formed URL), I cut the URL off there and 301 redirect to everything before the % sign, leaving off whatever comes after.
Working on a way to do that, but haven't found a solution just yet. And you are right, NGINX is very close to mod_rewrite.
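In the meantime, this is the behaviour I have in mind, sketched in Python rather than NGINX config. The function name redirect_target is made up for illustration, and it assumes the check runs against the raw, still-encoded request path.

```python
def redirect_target(raw_path):
    """Return the 301 target (everything before the first '%'),
    or None when the path contains no '%' at all."""
    head, sep, _ = raw_path.partition('%')
    if not sep:
        return None  # no '%' found: serve the request normally
    # Fall back to the site root if nothing is left before the '%'.
    return head or '/'
```

One caveat with this approach: it would also truncate any legitimate URL that uses percent-encoding, so it only suits a site whose real URLs never contain a '%'.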
dklynn — 2014-05-14T07:01:35-04:00 — #4
I suspect that you can't "catch" a % in a URL because it's a reserved character in a URI which denotes character encoding. In that case, I'd suggest (1) checking your visitor log for "hack attacks" utilizing character encoding like that and (2) specifying the characters you will accept at your website (e.g., [a-z.]+ for all lowercase characters and the dot character ... aw, add the / character, too) and Fail everyone else! You can only do so much babysitting of visitors, and a redirection (rather than my harsh Fail) should take visitors to a sitemap from which every page in your website is a single click away.
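The whitelist idea in (2), sketched in Python just to show the logic (the is_acceptable name is hypothetical, and the character class is the one suggested above with / added):

```python
import re

# Accept only lowercase letters, the dot, and the slash.
# Digits, hyphens, and uppercase letters are rejected by this pattern;
# a real site would probably need to widen the class.
ALLOWED = re.compile(r'[a-z./]+')

def is_acceptable(path):
    """True only if every character of the path is in the whitelist."""
    return ALLOWED.fullmatch(path) is not None
```

Anything for which is_acceptable returns False would get the Fail, or the redirect to the sitemap.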