kreut — 2012-07-27T14:50:20-04:00 — #1
I've been recently initiated into the world of UTF-8 encoding and how it can help me create a database with languages such as Spanish. In my database, I'll need to add some mathematical symbols (such as the infinity sign). Though I could just turn them into html entities via php and str_replace, I'd prefer to keep the actual characters in the database. Doing some research, it seems like by changing the datatype from varchar to nvarchar, my database will be able to handle such characters.
A couple of questions regarding this:
1) If there's data already in the database and I change the column from varchar to nvarchar, will any of the existing characters be messed up?
2) It seems like nvarchar offers much more flexibility than varchar. If my application will need to deal with different languages & mathematical symbols, why wouldn't I want to use nvarchar?
Thanks so much,
r937 — 2012-07-27T15:01:56-04:00 — #2
NVARCHAR uses two bytes for each character, VARCHAR only one
the only reason not to use NVARCHAR is disk space, which, in the 21st century, hardly matters
as for converting an existing table, a quick test should be able to answer that
by the way, searching "difference between VARCHAR and NVARCHAR" or "VARCHAR versus NVARCHAR" turns up dozens and dozens of good articles
kreut — 2012-07-27T15:13:09-04:00 — #3
Thanks for the helpful info!
system — 2012-07-27T16:09:51-04:00 — #4
Oracle has gone further and one can ask for chars not bytes as storage units.
varchar2(20 char) means you can store 20 characters: 20, 40, 60, 80 bytes, or more.
varchar2(20) means you can store 20 bytes.
The maximum length of a varchar2 is 4000 bytes so the field will max out at 4000 bytes - not 4000 characters.
kreut — 2012-07-27T16:25:33-04:00 — #5
mysql: would this be an option here as well?
system — 2012-07-27T16:41:54-04:00 — #6
Not sure, but I believe there is a better one.
In MySQL you can declare the charset and collation even at column level:
MySQL: varchar(100) character set UTF8
kreut — 2012-07-27T16:48:05-04:00 — #7
guelphdad — 2012-07-27T17:13:07-04:00 — #8
nvarchar isn't a datatype supported by mysql.
r937 — 2012-07-27T18:46:04-04:00 — #9
actually, it is
kreut — 2012-07-27T19:57:07-04:00 — #10
Update: I used the Alter Table command to Modify the columns to use NVARCHAR and after doing so, I was able to add the infinity symbol as I had hoped. (Though I was a bit discouraged at first as it's not one of the drop down PHPMYadmin options)
Though everything seems to work now, I feel like I have a conceptual whole regarding utf-8 and the whole nvarchar business and lest I dig a hole for myself, I figure that I better ask away. Basically, I first investigated the utf-8 encoding because I had to add some Spanish characters to the database. And, that in fact worked just great without using nvarchar after making sure that my webpage, and database were utf-8. So..why do I need the nvarchar to make mathematical symbols work? Aren't they part of the utf-8 character set?
r937 — 2012-07-27T20:04:22-04:00 — #11
you don't need NVARCHAR if you specify utf8 on your VARCHAR -- they're the same thing
kreut — 2012-07-27T22:22:08-04:00 — #12
OK! That's becoming clearer. And doing the change at the column level does fix things. However, it still doesn't explain why the Spanish accented symbols worked but the mathematical symbols didn't. If it helps, originally I didn´t actually specify utf8 on the varchar, I did it at the database level. And, after performing the query:
SELECT default_character_set_name FROM information_schema.SCHEMATA S
WHERE schema_name = "schemaname";
I saw that it was in fact utf8 (I also know this because my Spanish accented characters are showing)
So, what might explain some of the utf8 characters working and others not working?
(As a side note, when I switch the collation between latin_1 and utf8 on the column in question, the inifinity symbol doesn't show and then does show which confuses me since I thought that collation was simply how entries are ordered.)
Any additional thoughts would be appreciated.
system — 2012-07-28T03:38:22-04:00 — #13
Well, the latin1 charset fully covers the Spanish language. So the Spanish special chars would display even without a utf8 charset. However, the math symbols are not covered in this charset.
Looking at the "Character Sets and Collations That MySQL Supports" section in MySQL docs: http://dev.mysql.com/doc/refman//5.5/en/charset-charsets.html, I don't see any latin1 or utf8 collations, those are charsets.
kreut — 2012-07-28T13:27:45-04:00 — #14
Thanks so much for the reply. After more investigation, while I thought my columns and tables were utf8, they actually weren't.
I've now set the default charset to utf8 at the database, table, and column level and all symbols are working as hoped. I appreciate all who took the time to respond to these questions.