Problem with non-ASCII characters in a link

Hi,
I have a link like this:

http://uzman-bilgisayar.simpg.net/Güvenlikeri-p42.html

Because of the u with dots … it comes out like this:

http://uzman-bilgisayar.simpg.net/G%C3%BCvenlikeri-p42.html

This should get directed to page 42 of the website by my .htaccess
but it doesn’t.

I have tried to remove the “%” with :

$mn_link2 = str_replace('%','',$mn_link[2]);
echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";

But I still cannot get a clean link.

Does anyone know why?

PS
I tried the same code to take out ‘e’ from same link
And that works.

i.e.

$mn_link2 = str_replace('e','',$mn_link[2]);
echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";


I tried to take out u with dots : ü

$mn_link2 = str_replace('ü','',$mn_link[2]);
echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";  

But that didn’t work

I also tried:

$mn_link2 = str_replace('%C3%BC','',$mn_link[2]);
echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>";

But again … that didn’t work

Does anyone know how I can clean up this URL data before
using it as a link?

BTW - I want the Turkish spelling in the link text - just not in the
link url itself … and it can be replaced with anything as it is the “-p42.html”
that is important for the redirect.
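
A minimal sketch of one way to do the clean-up asked about here, not a tested fix for this site. It assumes the "%C3%BC" is added by the browser when it encodes the raw ü in the href, which would explain why replacing '%' in the PHP string changes nothing, and that the replace on 'ü' may fail when the script file and the data use different encodings. The helper name ascii_only_url() and the sample values are hypothetical.

```php
<?php
// Hypothetical helper: strips anything non-ASCII from a URL, byte by byte,
// so it does not matter which encoding the script file was saved in.
function ascii_only_url(string $url): string
{
    // Drop percent-encoded non-ASCII bytes (%80 to %FF), e.g. "%C3%BC".
    $url = preg_replace('/%[89A-Fa-f][0-9A-Fa-f]/', '', $url);
    // Drop every raw byte outside printable ASCII (this also removes spaces).
    return preg_replace('/[^\x21-\x7E]/', '', $url);
}

// Assumed sample data standing in for the thread's $mn_link and $mn_name arrays.
$mn_link = [2 => 'http://uzman-bilgisayar.simpg.net/Güvenlikeri-p42.html'];
$mn_name = [2 => 'Güvenlikeri'];

$mn_link2 = ascii_only_url($mn_link[2]);
// The link text keeps the Turkish spelling; only the href is cleaned.
echo "<li><a href='$mn_link2'>$mn_name[2]</a></li>\n";
```

Only the "-p42.html" tail matters to the rewrite rule, so losing the ü from the href should be harmless under that assumption.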

Thanks.

.

Post your .htaccess rewrite rule too so we can compare the whole process.

OK

Here is my .htaccess

Options +SymLinksifOwnerMatch 
RewriteEngine On

# BELOW IS STUFF TO BLOCK SPAMMING ATTACKS 
######################################################
# Block out any script trying to set a mosConfig value through the URL
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]

# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*(.*) [OR]

# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (<|%3C).*script.*(>|%3E) [NC,OR]

# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|[|%[0-9A-Z]{0,2}) [OR]

# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|[|%[0-9A-Z]{0,2})

# Send all blocked request to homepage with 403 Forbidden error!
RewriteRule ^(.*)$ index.php [NC,L]
#
######################################################

# GETTING RSS FILE BY PAGE NUMBER 
# http://villarentfethiye.simpg.net/rss_feed-5.xml

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^[\.0-9,:\/-a-z]+rss_feed-([0-9]+)\.xml$ http://simpg.net/rss_feed.php?rss=$1 [NC,QSA,L]

# GETTING SUPPORTING PAGES BY PAGE NO
# http://some-name.mobi6.net/greatest-gadget-p13.html

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^[\.0-9,:\/-a-z]+-p([0-9]+)\.html$ http://simpg.net/info.php?p=$1 [NC,QSA,L]

# GETTING MAIN PAGE BY URL NAME
RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f  
RewriteCond %{REQUEST_FILENAME} !-d  
RewriteRule ^(.*)$ 404.php?url=$1 [L]

Hope that helps.

PS - The Feed re-direct is not working.
That is a subject of a different thread. :frowning:

.

.

Okay, if I had to venture a guess, the a-z won’t encompass the non-ascii characters.

I was able to get it to work using:

RewriteRule ^(.*?)-p([0-9]+)\.html$ info.php?p=$2 [NC,QSA,L]

, but I imagine there may be a better solution…
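
The difference between the two patterns can be checked outside Apache. The sketch below uses PHP's preg_match as a stand-in for mod_rewrite, on the assumption that the rule sees the percent-decoded path without a leading slash and that the "i" modifier behaves like [NC]; it is an approximation, not a test of the live server.

```php
<?php
// Path as the .htaccess rule is assumed to see it: decoded, no leading slash.
$path = rawurldecode('G%C3%BCvenlikeri-p42.html');

// Original character-class pattern from the .htaccess.
$original = '~^[\.0-9,:\/-a-z]+-p([0-9]+)\.html$~i';
// Relaxed pattern from the reply above.
$relaxed = '~^(.*?)-p([0-9]+)\.html$~i';

// The two bytes of the UTF-8 "ü" are not in the class, so this fails.
var_dump(preg_match($original, $path));      // int(0)

// (.*?) accepts any byte, so the page number is captured.
var_dump(preg_match($relaxed, $path, $m));   // int(1)
echo $m[2], "\n";                            // 42
```

An all-ASCII name such as greatest-gadget-p13.html still matches the original pattern, which fits the rule having worked until a Turkish character appeared.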

OK - that fixes it. :smiley:

Thanks.

I thought it might also fix the rss_feed re-direct problem

So I changed that rule to:


# GETTING RSS FILE BY PAGE NUMBER
# http://villarentfethiye.simpg.net/rss_feed-5.xml

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ rss_feed.php?rss=$2 [NC,QSA,L]


But when I click on my RSS image I get taken to this address:
http://simpg.net/info.php?a=villarentfethiye&rss=5

I am testing it on this page: Mysite

That &rss=5 is the correct page number - so it is nearly working !!

The complete .htaccess file is this:

Options +SymLinksifOwnerMatch
RewriteEngine On

# BELOW IS STUFF TO BLOCK SPAMMING ATTACKS
######################################################
# Block out any script trying to set a mosConfig value through the URL
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [OR]

# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*(.*) [OR]

# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (<|%3C).*script.*(>|%3E) [NC,OR]

# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|[|%[0-9A-Z]{0,2}) [OR]

# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|[|%[0-9A-Z]{0,2})

# Send all blocked request to homepage with 403 Forbidden error!
RewriteRule ^(.*)$ index.php [NC,L]
######################################################

# Redirect old file path to new file path

Redirect vacationvillasfethiyerental.villarentfethiye.simpg.net http://example.com/newdirectory/newfile.html

# To block an IP address:

RewriteCond %{REMOTE_ADDR} ^(A\.B\.C\.D)$

RewriteRule ^/* http://www.domain.com/sorry.html [L]

# Re-direct for broken images

RewriteCond %{REQUEST_FILENAME} !-f

RewriteRule ^images/.*\.jpg$ /images/default.jpg [L]

# GETTING RSS FILE BY PAGE NUMBER
# http://villarentfethiye.simpg.net/rss_feed-5.xml

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ rss_feed.php?rss=$2 [NC,QSA,L]

# GETTING SUPPORTING PAGES BY PAGE NO
# http://some-name.mobi6.net/greatest-gadget-p13.html

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^(.*?)-p([0-9]+)\.html$ info.php?p=$2 [NC,QSA,L]

# GETTING MAIN PAGE BY URL NAME

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ 404.php?url=$1 [L]

Can you see what I have done wrong ??

Thanks again.

.

Yeah, but I’m struggling figuring out how I’d fix it.

In short, here is what you have happening:

Initial URL: villarentfethiye.simpg.net/rss_feed-5.xml

Gets caught by

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^(.*?)rss_feed-([0-9]+)\.xml$ rss_feed.php?rss=$2 [NC,QSA,L]

Which produces the following path: villarentfethiye.simpg.net/rss_feed.php?rss=5

And that gets caught by

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

And produces the final result of: simpg.net/info.php?a=villarentfethiye&rss=5

We need to prevent that final Rewrite Rule from executing against your RSS feed. @dklynn ; Got any advice?
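
The two passes described above can be traced with a toy model. This is only a sketch of the control flow, not of how Apache is implemented: it assumes an internal rewrite with [L] ends the current pass and starts a new one, and that the catch-all rule, which points at another host, ends processing with an external redirect. The function name simulate() and its parameters are hypothetical.

```php
<?php
// Toy model of repeated passes through the .htaccess rules (illustrative only).
function simulate(string $host, string $path, string $query = ''): string
{
    // Shared RewriteCond pair: capture the subdomain as %1 and skip www.
    if (!preg_match('~^(.+).simpg.net$~i', $host, $cond)
        || preg_match('~^www.simpg.net$~i', $host)) {
        return "no rewrite: $path";
    }

    for ($pass = 1; $pass <= 10; $pass++) {
        if (preg_match('~^(.*?)rss_feed-([0-9]+)\.xml$~i', $path, $m)) {
            // Internal rewrite: [QSA] keeps the old query string, then the
            // rules run again from the top against the new path.
            $query = "rss={$m[2]}" . ($query !== '' ? "&$query" : '');
            $path  = 'rss_feed.php';
            continue;
        }
        if (preg_match('~^(.*?)-p([0-9]+)\.html$~i', $path, $m)) {
            $query = "p={$m[2]}" . ($query !== '' ? "&$query" : '');
            $path  = 'info.php';
            continue;
        }
        // Catch-all ^.*$ with an absolute URL: modelled as an external redirect.
        $target = "http://simpg.net/info.php?a={$cond[1]}";
        return $target . ($query !== '' ? "&$query" : '');
    }
    return "gave up after 10 passes: $path";
}

echo simulate('villarentfethiye.simpg.net', 'rss_feed-5.xml'), "\n";
// http://simpg.net/info.php?a=villarentfethiye&rss=5
```

In this model the second pass no longer matches the RSS rule, because the path is now rss_feed.php, so the catch-all picks it up and [QSA] carries rss=5 along.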

Looking at this one more time, the following thought crossed my mind, try changing:

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

To:

RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteCond %{QUERY_STRING} ^rss= [NC, OR]
RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

Or: (this may be a better solution)

RewriteCond %{REQUEST_FILENAME} !-f #do not run this rule if the requested file actually exists
RewriteCond %{HTTP_HOST} ^(.+).simpg.net$ [NC]
RewriteCond %{HTTP_HOST} !^www.simpg.net$ [NC]
RewriteRule ^.*$ http://simpg.net/info.php?a=%1 [NC,QSA,L]

Hi.

I thought that the “L” meant LAST command, so it should not
execute any more redirect anyway ??

Maybe it is just Last line in that bunch of commands,
meaning redirect now.

Anyway, I tried both the suggestions and unfortunately I get 500
Internal Server Errors on both. :injured:

jekko,

I used to believe that the Last flag meant “}”, i.e., end of the current RewriteRule (with its RewriteCond’s) statement. NOT SO. It tells Apache to immediately update the {REQUEST_URI} and begin another pass through the .htaccess (from the one in the DocumentRoot).

As for your original question, the Internet used to be “ASCII-centric.” Apache certainly took that to heart and looks at encoded characters in a different way.

From my experience answering similar questions (space in the URI, etc), I’ve discovered that you can use accented/encoded characters within a character range definition (be sure to escape a space with a backslash). I’ve not tested the series of accented characters but I’d be willing to bet that they can be defined in a range just as easily as a-z. Don’t worry if characters are encoded in the URI, Apache knows what they look like when they’re decoded.
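
As a rough illustration of the character-range idea, the sketch below widens the class with a byte range instead of switching to (.*?). It runs in PHP's preg_match rather than in mod_rewrite, and the \x80-\xFF range is an assumption chosen for the example; whether the same escape is accepted inside a .htaccess pattern has not been tested here.

```php
<?php
// Decoded request path, as in the thread's example link.
$path = rawurldecode('G%C3%BCvenlikeri-p42.html');

// Assumption: \x80-\xFF admits every non-ASCII byte, so any UTF-8 letter passes.
// The "i" modifier stands in for [NC]; "-" sits last in the class so it is literal.
$pattern = '~^[\.0-9,:\/a-z\x80-\xFF-]+-p([0-9]+)\.html$~i';

var_dump(preg_match($pattern, $path, $m)); // int(1)
echo $m[1], "\n";                          // 42
```

Compared with (.*?), a class like this still rejects characters that were never meant to appear in a page name, such as spaces.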

For more information on URIs, have a look at the geeky Uniform Resource Identifiers (URI): Generic Syntax. It’s well worth the effort to read.

Regards,

DK