Changing charset and collation in a populated MySQL database

I’ve never had to do it so I don’t know. But this thread  characters appearing in database values made me wonder, is there a “safe” or “minimal work involved” way to change the character encoding and collation from the MySQL defaults to UTF-8 without making a total mess of things?

I think you should specify what “MySQL defaults” you are referring to because my MySQL installation has utf8 as default - but I suppose you mean latin1 :).

The simplest answer is - just change the collation of your text columns to utf8 using ALTER TABLE and all data will be converted without any loss of information (because utf8 can represent all latin characters). The only nuisance can be natural primary key columns that you are using as foreign keys - if you have cases like that then you’ll have to disable foreign key checks before changing their collation because the collation of columns joined with a FK must be identical and normally MySQL will not let you change one first and then the other.

That’s all there is to it as far as converting the data is concerned.

Another thing is the character set used by your application. It is advisable that your application explicitly sets the character set of database connection instead of relying on server defaults, which is done using SET NAMES, $mysqli->set_charset(), and the like. This is the character set in which your application will receive and send data regardless of the character set of the database. So these are two separate things and actually you can change the character set of your database tables and your application will be unaffected by that change as long as both character sets can represent the characters that you are using.

If I were in your shoes I would split the task into 2 steps:

  1. Adjust your application to use utf8 and to send/receive data in utf8 - do things like SET NAMES(‘utf8’), or set_charset(‘utf8’) on your db driver, etc. And all other things like sending web pages in utf8, processing data in utf8, etc. The db can remain in latin1 while your application will work in utf8 - all will be well as long as you stay within the range of latin1 characters.

  2. Change all text columns to a utf8 collation. Once you do this you will be able to use the full set of unicode characters.