UTF8, but when set it doesn't work?

Hi,

Got a bit of a weird one, which I can’t quite pin down.

Forum IvoireLink.net: Main Index

It’s currently set to this, which kinda works:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

…but when I change it to UTF-8 (as it’s a French site), it buggers up ALL the other foreign characters, and fixes the “Voici la catÃ©gorie de sports” part, so it shows up correctly as “Voici la catégorie de sports”

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

Can anyone spot my bodge up? I’m guessing it’s something stupid - but I’ve never seen a case where some characters work but not all of them, and then vice versa when changing it to UTF-8 :confused:

TIA!

Andy

Are you… saving the file as UTF-8, or are you saving your ISO-8859-1 file with just the character meta changed?

Does your UTF-8 in the character meta match the mime-type being served?

They all need to match. Just changing the META to read UTF-8 doesn’t mean the file is saved encoded AS UTF-8.

So, you need these three to match:
File Format/Encoding
Meta saying what encoding is used
Mime-type on the server.

Sounds like you’ve got one, maybe two of those and not all three.
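A rough way to check those three things for one page, as a minimal Python sketch. The function names (`charset_from_content_type`, `charset_from_meta`, `encoding_report`) are made up for illustration, and the regex is a simplification rather than a real HTML parser:

```python
import re

# Matches charset=... with or without quotes around the value.
CHARSET_RE = r'charset\s*=\s*["\']?([\w.:-]+)'


def charset_from_content_type(value):
    """Return the charset named in a Content-Type value, or None if absent."""
    match = re.search(CHARSET_RE, value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(body):
    """Return the charset named in the first <meta> tag that declares one."""
    # latin-1 maps every byte to a character, so this decode never fails;
    # it is only used to search for ASCII markup.
    text = body.decode("latin-1")
    match = re.search(r"<meta[^>]*" + CHARSET_RE, text, re.IGNORECASE)
    return match.group(1).lower() if match else None


def encoding_report(body, content_type):
    """Summarise what the server says, what the meta says, and what the bytes are."""
    try:
        body.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False
    return {
        "header": charset_from_content_type(content_type),
        "meta": charset_from_meta(body),
        "valid_utf8": valid_utf8,
    }
```

If `meta` says utf-8 but `valid_utf8` is False, only the meta was changed and the file itself is still saved in a single-byte encoding. Note that a file containing nothing but ASCII passes the UTF-8 check too.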

Of course, as a forum you also have the question of how the posts were accepted and stored: the data in the SQL database may not be encoded as UTF-8, which would completely bone any chance of your existing posts ever being served as UTF-8 properly without adding more PHP to translate the old ones on the fly. This is why changing character encodings on an existing website is almost always a disaster. Is that forum script set up to send UTF-8?

Hi,
If the pages of your site are dynamically generated by a PHP script, you should have this code in your PHP file:

header('Content-type: text/html; charset=utf-8');

Also, you should save the file encoded as UTF-8 in the editor you use.

Viewing your source, I find this:

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

Since your server response header does not specify character encoding, the meta statement rules. Forcing the browser to utf-8 shows you are using utf-8 encoding, so changing the meta statement should fix things up.

cheers,

gary

Notice ultranerds said changing that meta was what’s messing it up – hence it being back at the unbroken version – and hence his problem NOT lying with the meta, but lying in changing the meta and nothing else.

Jason, when I looked at the linked page, the meta statement set the character encoding to iso-8859-1, and the rendering indicated that multi-byte characters were being shown as multiple single-byte characters. Forcing FF to use utf-8 as the encoding made the characters render correctly. From that I deduce the actual encoding is, indeed, utf-8 and, since the server does not set an encoding, that leaves the meta element to do so.

Did you test as I did, or take the OP’s word for what was done?

with

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />

Sports - Voici la catÃ©gorie de sports

Forcing utf-8 decoding:

Sports - Voici la catégorie de sports
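The effect described here, where a multi-byte character shows up as several single-byte characters until the browser is forced to utf-8, can be reproduced in a few lines of Python. This is only an illustration of the mechanism, not a capture of the actual page:

```python
text = "Sports - Voici la catégorie de sports"

# What is on disk when the file really is UTF-8: "é" becomes two bytes (C3 A9).
utf8_bytes = text.encode("utf-8")

# What a browser shows when told the bytes are ISO-8859-1:
# each of the two bytes is drawn as its own character.
misread = utf8_bytes.decode("iso-8859-1")

# What it shows when forced to decode the same bytes as UTF-8.
forced = utf8_bytes.decode("utf-8")

print(misread)  # Sports - Voici la catÃ©gorie de sports
print(forced)   # Sports - Voici la catégorie de sports
```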

Server response header:

Date: Mon, 25 Jul 2011 00:28:09 GMT
Server: Apache/2.2.19 (Unix) mod_ssl/2.2.19 OpenSSL/0.9.8e-fips-rhel5 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Expires: Tue, 25 Jan 2000 12:00:00 GMT
Cache-Control: no-store
Pragma: no-cache
Last-Modified: Mon, 25 Jul 2011 00:28:10 GMT
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html

200 OK

cheers,

gary
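The reasoning above (no charset in the response header, so the meta decides) can be sketched like this in Python. `header_charset` and `effective_charset` are hypothetical helper names, and the `browser_default` value is an assumption, since the real fallback depends on the browser and its locale settings:

```python
from email import message_from_string


def header_charset(raw_headers):
    """Return the charset from the Content-Type header, or None if it has none."""
    return message_from_string(raw_headers).get_content_charset()


def effective_charset(raw_headers, meta_charset, browser_default="iso-8859-1"):
    """The HTTP header wins if it names a charset, then the meta, then the default."""
    return header_charset(raw_headers) or meta_charset or browser_default


# A few of the header lines quoted in this thread.
observed = (
    "Cache-Control: no-store\n"
    "Pragma: no-cache\n"
    "Content-Type: text/html\n"
)

print(header_charset(observed))                   # None
print(effective_charset(observed, "iso-8859-1"))  # iso-8859-1
```

If the server were changed to send `Content-Type: text/html; charset=UTF-8`, the header value would take over and the meta would no longer be the deciding factor.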

Funny, it was the other way around when I tested… we’re probably looking at shifting code as he tries to figure it out.

Making that change using Opera’s editor just made the page worse – Opera still reporting ISO-8859-1 even with the META – but that’s consistent with the behavior of just trying to use the meta to change that in the first place.

Though looking deeper it has all sorts of code errors that could be putting the rendering all over the place across browsers. (originally I just looked at it in Opera). LINK inside BODY, MULTIPLE HEAD and BODY elements…

AHA, that’s why Opera’s ignoring it… all content after the second HEAD goes back to the default; ISO-8859-1… you say HEAD twice and BODY twice, don’t expect things to be applied properly.

Yeah, I saw the syntax errors, but figured 1) fix the first things first, and 2) you’d be refactoring the code anyway. :eek:

cheers,

gary

Hi,

Thanks for the replies everyone :slight_smile: I see what you mean, there are 2 BODY and 2 HEAD tags. Lemme try and fix those up, and see if that helps (I expect it will; as you said, it’s resetting the encoding for the page once it reaches the 2nd head).

Will keep you posted

Cheers

Andy

I’ve fixed up that part of it (I’ve removed the 2nd instances of <head> and <body>), but still no joy and no change. I also tried removing the extra stuff (scripts, link, etc) after the closing </head> tag, but that didn’t help either.

I’ve also changed the meta tag to utf-8 now, so you can see the issue I’m having:

<meta http-equiv="content-type" content="text/html; charset=utf-8" />

Any more suggestions?

TIA!

Andy

Right now I’m seeing
Actualit�s

Server says UTF-8, and the browser is set to the Unicode (UTF-8) setting in Firefox. Since I’m seeing the ?, I’d say either the document wasn’t originally saved as UTF-8 (though Gary says he sees otherwise) or somewhere the document gets converted to Latin-1 and then back.

I usually got this type of error (?'s instead of é stuff) when I wrote and saved UTF-8 documents and someone decided to host them on a Latin-1 server :S My only solution to that kind of lack of control was to take every character outside US-ASCII and manually write it out as a character entity. :frowning: Since you have control of the server you shouldn’t have to do that.
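For what it's worth, the replacement character is what you would expect when bytes saved in a single-byte encoding get decoded as UTF-8. A small Python illustration, assuming the file was saved as Latin-1 (the thread has not confirmed that at this point):

```python
# "é" saved as Latin-1 is the single byte E9.
latin1_bytes = "Actualités".encode("iso-8859-1")

# E9 announces a three-byte UTF-8 sequence, but the next byte is a plain "s",
# so the decoder gives up on E9 and substitutes U+FFFD, the replacement character.
shown = latin1_bytes.decode("utf-8", errors="replace")

print(shown)  # Actualit�s
```

The original byte is not recoverable from the displayed text once that substitution has happened, which is why the fix has to be made on the saved file rather than on the output.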

Cute – your forums part is working properly, everything outside the forums is not. It was the other way around with the meta the other way…

This is most likely because the stuff you have around the forums isn’t saved as UTF-8 encoding. Load the files in, and make sure they’re being saved with the right encoding.

Hi,

Thanks for the reply. Man this is doing my nut in :frowning:

The pages are served up using a “template” system, and in that template I see the part:

              <select id="slct" name="what">
                <option value="news" <%if what eq "news"%>selected="yes"<%endif%>>Actualité</option>
                <option value="events" <%if what eq "events"%>selected="yes"<%endif%>>Evénéments</option>
                <option value="yellow" <%if what eq "yellow"%>selected="yes"<%endif%>>Pages Jaunes</option>
                <option value="classifieds" <%if what eq "classifieds"%>selected="yes"<%endif%>>Pétites Annonces</option>
              </select>

(so the encoding is fine there)

That template is saved as ANSI. Same goes for the forum homepage templates.

Yeah, that’s because the META encoding was changed to UTF-8 (which is why I was confused as hell when changing from “normal” encoding to UTF-8 reversed which parts were broken).

ARGH!!! Thanks anyway guys, I’ll keep digging and see if I can come up with anything

Cheers

Right now I’m seeing
Actualit�s

The OP needs to open his html and template files and “save as” in utf-8.

cheers,

gary

OMG, just tried that and it seems to have worked (gotta go through several hundred templates to change them into UTF-8 format though, so it may take a while - unless there is an SSH command I can run to do this quicker? ;))

Thanks!

Andy
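On the question of doing several hundred templates in one go: one possible approach is a short Python script run on the server. This is a sketch under assumptions: that the "ANSI" files are Windows-1252 (`cp1252`), and that the templates match a glob pattern such as `*.html`. Both are guesses to adjust, and the function names are made up. Back up the directory first, since the files are rewritten in place.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite one file as UTF-8. Returns True if the file was changed."""
    raw = path.read_bytes()
    try:
        # Already valid UTF-8 (this includes pure-ASCII files): leave it alone,
        # so running the script twice does not double-encode anything.
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if the file has bytes cp1252 does not define,
    # which is a sign the assumed source encoding is wrong for that file.
    text = raw.decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return True


def convert_tree(root, pattern="*.html", source_encoding="cp1252"):
    """Convert every matching file under root; return the paths that changed."""
    return [
        path
        for path in sorted(Path(root).rglob(pattern))
        if convert_to_utf8(path, source_encoding)
    ]
```

Usage would be something like `convert_tree("/path/to/templates")`, with the path being whatever directory holds the templates.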

The double header/body issue implies to me you are including those forum files. The forum templates are already in utf-8, are they not? If so, you needn’t bother with them.

cheers,

gary

Hi,

Nope, all of them were in ANSI, not UTF8 format. I’ve done this now with them all, and it works like a charm - thanks :slight_smile:

Andy

More than once I’ve seen people on these forums with documents which were saved as “ANSI” (I’m not even sure what that is, I thought it was a standards body)… it seems copies of Notepad and other text editors (esp outside the US?) are defaulted to that.

Wonder if it would be a good idea to have a charset/MIME type sticky thread somewhere in the forums we could point people to? (with a link to that W3C page that explains the BOM pretty well)

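ANSI is the organisation which set the ASCII standard, i.e. the 128 symbols of a 7-bit code (the 256-symbol sets are the 8-bit extensions built on top of it). Hence why Unicode was needed. On Windows, “ANSI” in an editor’s save dialog usually means the system code page, which on Western European systems is Windows-1252, close to a superset of ISO 8859-1, so the term is ambiguous.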

This is why I always use named entities for anything outside ASCII … I have no idea what encoding my text editor uses, so marking characters up as, eg, &eacute; solves the problem of what encoding to set. As a bonus, named entities are often easier to remember than the Alt-#### codes needed to produce them.
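If writing entities out by hand gets tedious, the substitution can be scripted. A minimal Python sketch using the standard library's entity table; `to_entities` is a made-up name, and characters without a named entity fall back to numeric references:

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                         # plain ASCII stays as-is
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # e.g. &eacute;
        else:
            out.append(f"&#{code};")               # numeric fallback
    return "".join(out)


print(to_entities("Voici la catégorie de sports"))
# Voici la cat&eacute;gorie de sports
```

The output is pure ASCII, so it displays the same whichever of the encodings discussed in this thread the page ends up declared as.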