Accented Characters

Hi,

I’m working with accented characters for the first time. I’m using a UTF-8 charset but am unsure how to save content in my MySQL database. Does anyone know if I should be using htmlentities() or utf8_encode() to store text?

htmlentities(drämmer) would produce dr&auml;mmer

while

utf8_encode(drämmer) would produce drÃ¤mmer

Thanks!
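To make the difference between the two functions concrete, here is a minimal sketch of what each does to a string that is already UTF-8. It assumes the mbstring extension is loaded, and it uses mb_convert_encoding() from ISO-8859-1 in place of utf8_encode(), since that is the same conversion and utf8_encode() is deprecated as of PHP 8.2. The variable names are only for illustration.

```php
<?php
// "drämmer" written with an escape so the source file's own encoding cannot interfere.
$word = "dr\u{00E4}mmer";

// htmlentities() replaces the accented letter with a named HTML entity.
$asEntities = htmlentities($word, ENT_QUOTES, 'UTF-8');

// utf8_encode() treats its input as ISO-8859-1. Feeding it UTF-8 encodes the
// text a second time; this call performs the same conversion.
$doubleEncoded = mb_convert_encoding($word, 'UTF-8', 'ISO-8859-1');

echo $asEntities, "\n";    // dr&auml;mmer
echo $doubleEncoded, "\n"; // drÃ¤mmer

// Byte lengths: the untouched string is the shortest of the three.
echo strlen($word), ' ', strlen($asEntities), ' ', strlen($doubleEncoded), "\n"; // 8 12 10
```

Neither result is the original text any more, which is why storing the untouched UTF-8 string is usually the simpler option.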


htmlentities(drämmer) would produce dr&auml;mmer

I think htmlentities() is more convenient than utf8_encode().

Just save the UTF-8 string as-is into your database. There is no need to do anything to the string first.
Just make sure MySQL is set to use utf8 by default, and if you index your text columns, make sure the indexes use a UTF-8 collation.
Usually you would issue the MySQL command 'SET NAMES utf8' before inserting into or reading from the database.
Something like this:
mysql_query('SET NAMES utf8');

You can google for this; there are other ways to set utf8 as your default charset for MySQL.

OK, this is driving me up the wall. I have set my charset to utf8 in my HTML file. The table I am sending the accented characters to is set to utf8_unicode_ci. If I insert some accented characters into the database, e.g. Öäfgå@emål.iå, they appear in phpMyAdmin as Ã�äfgÃ¥@emÃ¥l.iÃ¥. If I echo this out in my HTML file it appears as I want it to, i.e. Öäfgå@emål.iå.

I have tried adding the suggested "mysql_query('SET NAMES utf8');" after my MySQL connection, but it doesn’t change anything. Maybe I am inserting the "mysql_query('SET NAMES utf8');" call incorrectly. Can someone show me where in the class below I should be adding it? I am just learning OOP at the moment as well, so I’m a bit confused about what exactly I should be doing!

Thanks

class MySQLDatabase {
	
	// setup Class attributes
	private $connection;
	
	// automatically run the open_connection() method
	function __construct() {
		$this->open_connection();	
	}
	
	// open database connection and then select database
	public function open_connection() {
		$this->connection = mysql_connect(DB_HOST, DB_USER, DB_PASS);
		if(!$this->connection) {
			die (mysql_error());	
		}	
		else {
			$db_select = mysql_select_db(DB_NAME, $this->connection);
			if (!$db_select) {
				die (mysql_error());
			}	
		}	
	}

	// query the database
	public function query($sql) {
		$result = mysql_query($sql, $this->connection);
		$this->confirm_query($result);
		return $result;
	}	
	
	// fetch the array from database
	public function fetch_array($result_set) {
    	return mysql_fetch_array($result_set);
  	}
	
	// confirm the query
	private function confirm_query($result) {
		if (!$result) {
			die(mysql_error());
		}
	}

}
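For the question of where the charset call belongs, here is a hedged sketch of the same class with the charset set once, immediately after connecting and before any query runs. It swaps in mysqli for the mysql_* functions used above, because those were removed in PHP 7; with the old API the equivalent place would be directly after mysql_select_db(). The DB_* constant values and the DB_CHARSET constant are placeholders, not anything from the original code.

```php
<?php
// Placeholder settings for illustration only.
define('DB_HOST', 'localhost');
define('DB_USER', 'user');
define('DB_PASS', 'secret');
define('DB_NAME', 'example');
define('DB_CHARSET', 'utf8'); // the thread's choice; 'utf8mb4' is MySQL's full UTF-8 charset

class MySQLDatabase {

	private $connection;

	// connect as soon as the object is created, as in the original class
	function __construct() {
		$this->open_connection();
	}

	// open the connection, select the database, then set the charset
	public function open_connection() {
		// On PHP 8.1+ mysqli throws exceptions by default, so the checks
		// below are only reached on older or reconfigured setups.
		$this->connection = new mysqli(DB_HOST, DB_USER, DB_PASS, DB_NAME);
		if ($this->connection->connect_error) {
			die($this->connection->connect_error);
		}
		// The charset is a property of the connection, so set it here,
		// once, before any INSERT or SELECT is sent.
		if (!$this->connection->set_charset(DB_CHARSET)) {
			die($this->connection->error);
		}
	}

	// query the database over the charset-configured connection
	public function query($sql) {
		$result = $this->connection->query($sql);
		if (!$result) {
			die($this->connection->error);
		}
		return $result;
	}
}
```

Rows that were inserted before the charset was set stay as they were stored, so old test data may still look scrambled after this change.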

I’m not sure what problem you’re having? You said it echoes out the correct value. Therefore it’s storing the correct value…

I was under the impression that it would also be stored in the database as Öäfgå@emål.iå. Maybe I’m wrong about this? It works OK as it is, I suppose, but if I ever want to edit accented text in the database manually, I will not be able to type it in directly.

It may simply be that phpMyAdmin is NOT using UTF-8 as its display charset…
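One way to tell a display problem from a storage problem is to look at the bytes rather than at how a tool renders them. The sketch below is a heuristic only and assumes the mbstring extension is loaded. looks_double_encoded() is a hypothetical helper name, and it can give a false positive on text that legitimately contains sequences such as "Ã¤".

```php
<?php
// Returns true when undoing one Latin-1 -> UTF-8 step still leaves valid
// UTF-8, which suggests the text was encoded twice on its way in.
function looks_double_encoded(string $text): bool
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        return false; // not UTF-8 at all, so this is a different problem
    }
    $undone = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');
    return $undone !== $text && mb_check_encoding($undone, 'UTF-8');
}

$good = "dr\u{00E4}mmer";                                  // correct UTF-8
$bad  = mb_convert_encoding($good, 'UTF-8', 'ISO-8859-1'); // encoded twice

echo bin2hex($good), "\n"; // 6472c3a46d6d6572
echo bin2hex($bad), "\n";  // 6472c383c2a46d6d6572

var_dump(looks_double_encoded($good)); // bool(false)
var_dump(looks_double_encoded($bad));  // bool(true)
```

If the value read back from the database has the shorter byte sequence, the stored data is fine and only the viewer is mis-rendering it.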

StarLion, what do you mean about phpMyAdmin not using UTF-8 as its display charset? Is this something that would need to be changed in the phpMyAdmin configuration file?

Looks like I was wrong earlier when I said it still echoes out fine to my webpage from the database. It displays lowercase accented characters OK, but I get problems with uppercase. I have also just tried manually putting the accented characters into the database, i.e. Öäfgå@emål.iå, but when I print this out it gets scrambled. I’m getting more confused by the hour! In this last test I manually inserted accented characters into a UTF-8 database table and echoed them on a UTF-8 charset webpage, and they still get scrambled???

If you want to bite the bullet on this issue, Kore Nordmann is the man to read (see also his FAQs, linked from paragraph 1).

The key to it seems to be that everything must be encoded exactly the same, starting from the text files users might be copying and pasting from ** to enter data into the database, all the way through to the web browser.

** This is the hardest thing to detect and control; use a good IDE to open the file and check its encoding.
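Since pasted text is the hardest part to control, one option is to check it at the point where it enters the application. This is a minimal sketch, assuming the mbstring extension is loaded. ensure_utf8() is a hypothetical helper, and Windows-1252 as the fallback encoding is a guess about where non-UTF-8 input comes from, not something the input itself can confirm.

```php
<?php
// Pass valid UTF-8 through untouched; convert anything else from an
// assumed legacy encoding so that only UTF-8 reaches the database.
function ensure_utf8(string $text, string $assumedLegacy = 'Windows-1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedLegacy);
}

$pastedLegacy = "dr\xE4mmer";     // single-byte "ä", as a Latin-1 file would hold it
$alreadyUtf8  = "dr\u{00E4}mmer"; // the same word, correctly encoded as UTF-8

var_dump(ensure_utf8($pastedLegacy) === $alreadyUtf8); // bool(true)
var_dump(ensure_utf8($alreadyUtf8) === $alreadyUtf8);  // bool(true)
```

Checking validity first matters: converting text that is already UTF-8 would encode it a second time and produce exactly the scrambling described in this thread.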