Problem with Persian characters in clean URLs

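Hello,
I have an .htaccess file that builds clean URLs from article titles. My problem is with Persian text in the URL: if the URL contains Persian characters, the page returns a 404 error; otherwise there is no error and the page displays correctly. Why does this happen, and what should I do?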

Code:
.htaccess


RewriteEngine On

RewriteRule ^([a-zA-Z0-9-/]+)$ article.php?url=$1
RewriteRule ^([a-zA-Z0-9-/]+)/$ article.php?url=$1

Insert into the database (PHP):

<?php
include('config.php');

function string_limit_words($string, $word_limit) {
   $words = explode(' ', $string);
   return implode(' ', array_slice($words, 0, $word_limit));
}
$blog='';

if($_SERVER["REQUEST_METHOD"] == "POST")
{
$title=mysql_real_escape_string($_POST['title']);
$body=mysql_real_escape_string($_POST['body']);
$title=htmlentities($title);
$body=htmlentities($body);
$date=date("Y/m/d");

$newtitle=string_limit_words($title, 6);
$urltitle=preg_replace('/[^a-z0-9]/i',' ', $newtitle);

$newurltitle=str_replace(" ","-",$newtitle);
$url=$newurltitle;


mysql_query("insert into blog(title,body,url) values('$title','$body','$url')");
if(isset($newurltitle)){
$blogurl="http://localhost/seo/$url";
}
}

?>

Get the URL and show the matching record from the database:

<?php
include('config.php');

if($_GET['url'])
{

$url=mysql_real_escape_string($_GET['url']);
$url=$url;
$sql=mysql_query("select title,body from blog where url='$url'");
$count=mysql_num_rows($sql);
$row=mysql_fetch_array($sql);
$title=$row['title'];
$body=$row['body'];
}
else
{
echo '404 Not URL Available.';
}

?>

No, the code isn't working. See:

publish.php


<?php
include('config.php');

function string_limit_words($string, $word_limit) {
   $words = explode(' ', $string);
   return implode(' ', array_slice($words, 0, $word_limit));
}
$blog='';

if($_SERVER["REQUEST_METHOD"] == "POST")
{
$title=mysql_real_escape_string($_POST['title']);
$body=mysql_real_escape_string($_POST['body']);
$title=htmlentities($title);
$body=htmlentities($body);
$date=date("Y/m/d");

$newtitle=string_limit_words($title, 6);
$urltitle=preg_replace('/[^a-z0-9]/i',' ', $newtitle);

$newurltitle=str_replace(" ","-",$newtitle);
$url=$newurltitle;


mysql_query("insert into blog(title,body,url) values('$title','$body','$url')");
if(isset($newurltitle)){
$blogurl="http://localhost/seo/$url";
}
}

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
 
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> 
 
<title>SEO Friendly URLs</title> 
 
<style> 
.editbox
{
display:none
}
td
{
padding:7px;
}
body
{
font-family:Arial, Helvetica, sans-serif;
font-size:14px;
}

 .shade
{
box-shadow:0px 0px 18px #000000;
-moz-box-shadow:0px 0px 18px #000000;
-webkit-box-shadow:0px 0px 18px #000000;
border-radius: 8px;-moz-border-radius: 8px; -webkit-border-radius: 8px;
} 


 
</style> 
 
</head> 
 
<body bgcolor="#dedede"> 
<div style="margin:0 auto; width:750px; padding:10px; background-color:#fff; height:800px;" class="shade"> 
<div style="margin-top:10px;"> </div> 
<h2><a href="<?php if(isset($blogurl)){ echo $blogurl;} ?>"><?php if(isset($blogurl)){ echo $blogurl;} ?></a></h2>

<h1>SEO Friendly URLs with PHP</h1> 
 <br><br>
<form method="post" action="">
<table width="100%">
<tr>
<td width="80px" valign="top">
<b>Title:</b>
</td>
<td><input type="text"  style="width:400px;border:solid 2px #006699; padding:5px" name="title"/></td>
</tr>
<tr>
<td width="100px" valign="top">
<b>Body:</b>
</td>
<td><textarea name="body" style="width:400px; height:200px; border:solid 2px #006699; padding:5px"></textarea></td>
</tr>

<tr>
<td width="100px">

</td>
<td><input type="submit"  value=" Publish "/></td>
</tr>


</table>
</form>
</div>

 
</body></html>

article.php


<?php
include('config.php');

if($_GET['url'])
{

$url=mysql_real_escape_string($_GET['url']);
$url=$url;
$sql=mysql_query("select title,body from blog where url='$url'");
$count=mysql_num_rows($sql);
$row=mysql_fetch_array($sql);
$title=$row['title'];
$body=$row['body'];
}
else
{
echo '404 Not URL Available.';
}

?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> 
 
<html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> 
 
<title><?php echo $title; ?></title> 

<style> 
.editbox
{
display:none
}
td
{
padding:7px;
}
body
{
font-family:Arial, Helvetica, sans-serif;
font-size:14px;
}

 .shade
{
box-shadow:0px 0px 18px #000000;
-moz-box-shadow:0px 0px 18px #000000;
-webkit-box-shadow:0px 0px 18px #000000;
border-radius: 8px;-moz-border-radius: 8px; -webkit-border-radius: 8px;
} 


 
</style> 
 
</head> 
 
<body bgcolor="#dedede"> 
<div style="margin:0 auto; width:750px; padding:10px; background-color:#fff; height:800px;" class="shade"> 


<?php 
if($count)
{
echo "<h1>$title</h1>
<div class='body'>$body</div>";
}
else
{
echo "<h1>Not URL Available 404.</h1>";
}

?>

</div>

 
</body></html>

The error "Not URL Available 404." appears for pages created with publish.php. I don't know what is causing it.

I don't know whether the problem is in the PHP code or in .htaccess.

Try


RewriteEngine On

RewriteRule ^(.+)$ article.php?url=$1
RewriteRule ^(.+)/$ article.php?url=$1

(and don’t tell anyone I suggested you use .*'s evil twin brother :hush:)

Thanks!
When I use the code above, the page I inserted gives this error:


Not URL Available 404.

Are you saying what I suggested isn’t working?

Sounds like a character encoding problem.

In article.php, could you please add the following on the very first line (after <?php) and post what output you get and if that is what you expect?


var_dump($_GET);

The first thing you need to do is change this:

if($_GET['url'])

to:

if(!empty($_GET['url']))

then, if you are sending the requests to index.php you need to use:

 <IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteBase /
 RewriteRule ^index\.php$ - [L]
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteRule . /index.php [L]
 </IfModule>

Your current mod_rewrite rules will not let you access other files on your site; those requests end in a 404 error. For example, Google cannot fetch your robots.txt with that code, because the request is rewritten to index.php. The code above says: if the requested file or directory does not exist, send the request to index.php; otherwise, load the requested file or directory.
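
Note that a catch-all rule like the one above passes nothing in the query string, so the receiving script would have to work out the requested slug itself. A minimal sketch of that, assuming the script reads the path from `$_SERVER['REQUEST_URI']`; the function name `slug_from_request_uri` and the `/seo/` base path are illustrative assumptions, not code from this thread, and the type declarations need PHP 7 or later:

```php
<?php
// Hypothetical helper: extract the slug from the raw request URI.
// $basePath is the directory the site lives in (assumed to be /seo/ here).
function slug_from_request_uri(string $requestUri, string $basePath = '/'): string
{
    // Keep only the path part and drop any query string.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // Remove the base directory prefix if present.
    if ($basePath !== '/' && strpos($path, $basePath) === 0) {
        $path = substr($path, strlen($basePath));
    }
    // Browsers send non-ASCII characters percent-encoded; decode them.
    return rawurldecode(trim($path, '/'));
}

// In the receiving script this could be called as:
// $url = slug_from_request_uri($_SERVER['REQUEST_URI'], '/seo/');
echo slug_from_request_uri('/seo/hello-world/?page=2', '/seo/'), "\n";
```

The decoded value is a UTF-8 byte string when the browser sent UTF-8, so a Persian slug arrives intact as long as nothing later re-encodes it.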

I used it and got this:

array(1) { ["url"]=> string(11) "article.php" }

That works and gives me the URL, but the URL itself still does not work and I get a 404 error:

Object not found!

The requested URL was not found on this server. The link on the referring page seems to be wrong or outdated. Please inform the author of that page about the error.

If you think this is a server error, please contact the webmaster.

Error 404

localhost
6/18/2011 4:01:33 AM
Apache/2.2.12 (Win32) DAV/2 mod_ssl/2.2.12 OpenSSL/0.9.8k mod_autoindex_color PHP/5.3.0 mod_perl/2.0.4 Perl/v5.10.0

I am using localhost (XAMPP).

My .htaccess is:

RewriteEngine On

RewriteRule ^([a-zA-Z0-9-/]+)$ article.php?url=$1
RewriteRule ^([a-zA-Z0-9-/]+)/$ article.php?url=$1

and I changed it to:

<IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteBase /
 RewriteRule ^article\.php$ - [L]
 RewriteCond %{REQUEST_FILENAME} !-f
 RewriteCond %{REQUEST_FILENAME} !-d
 RewriteRule . /article.php [L]
 </IfModule>

Oops, totally forgot you have article.php and not index.php.

It does not look like char encoding to me, looks like mod_rewrite error only.

Give this a go. If this does not work, post one of the links with query string so I can see it please.

If this is NOT in your web root, you need to add:


RewriteBase /NameOfContainingDirectoryHere/

If not in the web root, this might be the sole issue. Change that and try before doing anything else.

Try this:


RewriteEngine On
RewriteBase /path/to/script/
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^([A-Za-z0-9-_]+)/$ article.php?url=$1 [QSA,L]

That allows for letters, numbers, dashes and underscores in the URL. If that is not enough try adding the characters in the code or use a wildcard.
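
To see why the character class matters for Persian titles, the same kind of pattern can be tried in PHP. This is only an approximation: mod_rewrite matches the decoded URL path as raw bytes, while the second pattern below relies on PCRE's `u` (UTF-8) modifier, which is a PHP-side assumption and not something the .htaccess rule above turns on.

```php
<?php
// ASCII-only class, mirroring the rule above: letters, digits, dash, underscore.
$asciiOnly = '/^[A-Za-z0-9_-]+$/';
// Unicode-aware variant: any letter or digit in any script (needs the u modifier).
$unicodeAware = '/^[\p{L}\p{N}_-]+$/u';

$samples = ['hello-world', 'دربار', 'seo_2011'];
foreach ($samples as $slug) {
    printf(
        "%s ascii=%d unicode=%d\n",
        $slug,
        preg_match($asciiOnly, $slug),
        preg_match($unicodeAware, $slug)
    );
}
```

The Persian sample fails the ASCII-only pattern, which is consistent with Persian URLs never reaching article.php under a rule restricted to `A-Za-z0-9`.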

Gonna go watch a movie now, but hope this does it for you!

It does not work!

Is the .htaccess you posted in your first post the complete .htaccess? If it isn’t, could you please post all of it?

Yes, that is the full file. I sent you a message; please check it.

As I suspected the problem is indeed with character encoding (I don’t ~think~ it is, I ~know~ it is), but I have no idea how to solve it :s

So if anyone else wants to have a stab at it, this is what’s happening.

As a very basic example, consider this .htaccess


RewriteRule ^(.*)$ test.php?url=$1 [L]

and for our PHP script:


var_dump($_GET);

This will just redirect everything to test.php, and put the original URL in $_GET['url'].

So far so good, and this works as expected for western languages. However, it all goes bonkers when we throw Persian in the mix.

For example the Persian string دربار

So, we go to /دربار and we see …


array
  'url' => string 'Ø¯Ø±Ø¨Ø§Ø±' (length=10)

The problem is that I have no idea whatsoever what encoding that is :s

It’s not utf8, because utf8_decode only gives


string '?????' (length=5)

(However I might point out that if you put an é in the URL and use utf8_decode that works fine, so Apache sometimes does use utf8!)

So then I thought it might be UTF-16, but I couldn’t find how to decode that (iconv and mb_convert_encoding didn’t seem to help). All I got was other kinds of garbage.

So all in all, the problem is:

How do you get from Ø¯Ø±Ø¨Ø§Ø± back to دربار ?
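
One way to probe a string like this is to compare its byte count with its character count. The sketch below assumes the `mbstring` extension is available and that the browser sent the path as percent-encoded UTF-8; that is an assumption, although a reported length of 10 for a five-letter word is consistent with it.

```php
<?php
// The browser sends /دربار percent-encoded; decode it the way the server would.
$raw = rawurldecode('%D8%AF%D8%B1%D8%A8%D8%A7%D8%B1');

$byteLength = strlen($raw);                      // counts bytes
$charLength = mb_strlen($raw, 'UTF-8');          // counts characters
$isUtf8     = mb_check_encoding($raw, 'UTF-8');  // true for valid UTF-8

// Reading the same bytes as ISO-8859-1 produces the garbled form
// that a page declared as ISO-8859-1 would display.
$garbled = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');

// Reversing that misreading restores the original bytes.
$restored = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

var_dump($byteLength, $charLength, $isUtf8, $restored === $raw);
```

If that assumption holds, the value in `$_GET['url']` is already UTF-8. `utf8_decode` would then return question marks simply because it converts to ISO-8859-1, which has no Persian letters, and the garbling would come from the charset the page declares rather than from Apache.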

In all honesty, I wouldn’t, unless there were a very good reason; it would be too expensive. I would just create an easier-to-handle slug from the intended URL, a hash of some sort.

SELECT id, title, body FROM content WHERE slug = MD5($url) LIMIT 1;

It would be good to find an answer, but PHP isn’t known for handling UTF-8 too well, never mind any other encodings.
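
A minimal sketch of the hashed-slug idea, with the hash computed in PHP and bound as a query parameter. The table and column names come from the query above; the helper name `slug_key` and the `?` placeholder style are assumptions for illustration.

```php
<?php
// Hypothetical helper: derive a fixed-length ASCII key from any slug, Persian or not.
function slug_key(string $slug): string
{
    // md5() returns 32 hexadecimal characters regardless of the input encoding.
    return md5($slug);
}

// Compute the hash in PHP and pass it as a bound parameter,
// so the raw slug never has to be embedded in the SQL text.
$sql    = 'SELECT id, title, body FROM content WHERE slug = ? LIMIT 1';
$params = [slug_key('دربار')];

echo $params[0], "\n";
```

The same `slug_key()` call would be needed when the row is inserted, so that the stored value and the looked-up value match.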

Thanks for the PM Binboy, fixing you up now.

As far as encoding goes, it does not work with English either. That theory is shot down; I just tried it on the URL specified by the poster.

All fixed, try it yourself and let me know if you have any issues. I have to update the PHP code to convert the UTF8 characters to single byte characters.


RewriteEngine On
RewriteBase /
RewriteCond %{SCRIPT_FILENAME} !-d
RewriteCond %{SCRIPT_FILENAME} !-f
RewriteRule ^seo/(.*)/?$ article.php?url=$1 [QSA,L]

You forgot to add seo/ to the rewriterule and you didn’t set the rewritebase.

I will make a few script changes for you after my morning cup of Joe; I see a few areas that need changing.

Now that the base issue was fixed, because the mod_rewrite did not work at all, I changed up your scripts a bit.

You can handle UTF-8 characters now, but make sure your database collation is UTF-8 too. The UTF-8 characters cannot be shown URL-encoded in the URL, though, so they are decoded and then encoded again. As it works now, all multibyte characters are removed from the URL but kept in the title. That is the closest you are going to get without a lot of coding.
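
The changed scripts are not shown in this thread, so the following is only a guess at the behaviour described: a sketch that drops multibyte characters from the slug while the title itself is left untouched. The helper name `ascii_slug` is hypothetical.

```php
<?php
// Hypothetical helper, not the code from the thread: build an ASCII-only slug.
function ascii_slug(string $title, int $wordLimit = 6): string
{
    // Drop every byte outside printable ASCII; this removes all multibyte characters.
    $ascii = preg_replace('/[^\x20-\x7E]/', '', $title);
    // Turn each run of characters other than letters and digits into one hyphen.
    $slug = preg_replace('/[^A-Za-z0-9]+/', '-', $ascii);
    $slug = strtolower(trim($slug, '-'));
    // Keep at most $wordLimit hyphen-separated words.
    return implode('-', array_slice(explode('-', $slug), 0, $wordLimit));
}

echo ascii_slug('SEO Friendly URLs with PHP'), "\n";
echo ascii_slug('دربار PHP 5.3'), "\n";
```

A title written entirely in Persian yields an empty slug with this approach, so a real script would need a fallback such as the row id.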

Please note the security functions I added to config.php.

Also a few tips:

  1. htmlspecialchars() converts special characters to HTML entities.

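  2. htmlentities() goes further and converts every character that has a named HTML entity (for example, é becomes &eacute;). Both functions encode double quotes by default; whether single quotes are encoded depends on the flags passed (ENT_QUOTES). If you do not want quotes at all, use str_replace() to remove them. str_replace() is also faster than preg_replace() for plain string replacements.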

  3. Only use mysql_real_escape_string() right before or during the actual mysql query. It is not needed for the rest of the script, only the query itself. It adds backslashes to your string in certain situations.

  4. Make use of strip_tags(), not just htmlspecialchars(). It removes all html tags. Call this before calling htmlspecialchars.

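  5. Your preg_replace had '/[^a-z0-9]/i'. The i modifier already makes the match case-insensitive, so this is equivalent to '/[^a-zA-Z0-9]/' and upper-case letters are kept. Note that its result, $urltitle, is never used: the next line builds the URL from $newtitle instead.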
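
The behaviour of the functions named in these tips can be checked with a short script. This is a sketch only; the sample string is made up, and the explicit `ENT_QUOTES` and `'UTF-8'` arguments are choices made here so the result does not depend on the PHP version's defaults.

```php
<?php
$raw = '<b>Café "دربار"</b>';

// Remove the markup first.
$text = strip_tags($raw);

// With the same flags, both functions escape the quotes;
// only htmlentities() also converts é to a named entity.
$special  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

// With the i modifier, upper-case letters already count as allowed characters.
$unchanged = preg_replace('/[^a-z0-9]/i', ' ', 'HelloWorld2011');

echo $text, "\n", $special, "\n", $entities, "\n", $unchanged, "\n";
```

In both results the Persian letters pass through unchanged, since HTML has no named entities for them, provided the charset argument matches the actual encoding of the text.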

Thank you very much,
I will try it on another script and test it there…

1) Why are Persian words not shown in the address while numbers are? (I want to show Persian words in the URL for better SEO.)
2) For SEO, is it better for the URL to be encoded or not?
3) Why can't I submit a duplicate title? (I want to be free to enter any title.)
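
One possible direction for question 1, offered as a sketch and not as something confirmed in this thread: keep letters and digits from any script in the slug, and percent-encode it only when writing the link. The helper name `unicode_slug` is hypothetical, and the `u` modifier assumes the title is valid UTF-8.

```php
<?php
// Hypothetical helper: keep letters and digits from any script in the slug.
function unicode_slug(string $title, int $wordLimit = 6): string
{
    // Replace each run of characters that are not letters or digits with a hyphen.
    // preg_replace() returns null on invalid UTF-8, hence the fallback to ''.
    $slug = preg_replace('/[^\p{L}\p{N}]+/u', '-', $title) ?? '';
    $slug = trim($slug, '-');
    return implode('-', array_slice(explode('-', $slug), 0, $wordLimit));
}

$slug = unicode_slug('دربار PHP');
// Percent-encode the slug when writing it into a link.
$href = 'http://localhost/seo/' . rawurlencode($slug);

echo $slug, "\n", $href, "\n";
```

The stored slug stays in Persian, the link carries its percent-encoded form, and the rewrite rule would have to accept those bytes (for example with a wildcard pattern). Characters that are neither letters nor digits, including the zero-width non-joiner often used in Persian, become hyphens in this sketch.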