Czech character display problem

I have a frustrating display problem concerning Czech characters.

Users' notes are saved to a MySQL database in UTF-8. The Czech characters are saved in the correct format in the database.

They display correctly on one page, but on another page, which uses exactly the same code to retrieve them from the database and display them in a textarea field, they do not display correctly.

Other Czech characters on the page outside the textarea field on both pages do display correctly, leading me to believe the page encoding is correct. It is only the characters inside the textarea field that display incorrectly.

I can give URLs, but you would need to register to see the pages in question.

site url: Acupuncture For The Mind - Akupunkturapromysl

Register (free) then in the user home page (Acupuncture For The Mind - Course Control Center) scroll to the bottom and you will see the notes field. This displays Czech characters as it should.

If you go to module one (Acupuncture For The Mind - Akupunkturapromysl) and scroll down, the notes field at the bottom of the page does not display Czech characters correctly.

Any help or suggestions gratefully received before what little hair I do have left is pulled out! :)

I do not know the answer, but what I would try is this:

Echo the text onto the page outside of the <textarea>. Does it still display bad characters?

That would indicate whether the problem lies with the textarea (or the code that generates it).

Then echo the offending text on the first (comparison) page you mentioned, which should hopefully prove that the database handling code is not at fault.
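When comparing the two pages, it can help to look at the raw bytes rather than at what the browser renders. The following is only a minimal sketch: `debugBytes` is a hypothetical helper name, and the idea of passing it the note fetched from the database (for example `$row['notes']`) is an assumption about how the pages are written.

```php
<?php
// Hypothetical helper: return the raw bytes of a string as space-separated hex,
// so the value retrieved on the good page can be compared with the bad page.
function debugBytes(string $text): string
{
    // bin2hex() works on raw bytes, so the page encoding does not affect it.
    return trim(chunk_split(bin2hex($text), 2, ' '));
}

// In UTF-8 the Czech letter "č" is the two bytes C4 8D.
echo debugBytes("\xC4\x8D"), "\n"; // prints "c4 8d"

// On each page you would pass in the string fetched from the database instead,
// e.g. echo debugBytes($row['notes']);  ($row['notes'] is an assumed name)
```

If the two pages print different hex for the same note, the difference arises before the HTML is generated, not in the textarea.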

Cups, thanks for the suggestion. Here’s what happened:

echo the characters after retrieving from the database = same error

copy the actual characters and echo anywhere = no problem - so the Czech characters display correctly on all pages

retrieving from the database causes the characters to display incorrectly on the one page, but not on the other

checked the data in the database and it is saved and displays correctly there as Czech characters

both pages encoded as utf-8 and confirmed in the http headers

both pages use identical code to retrieve and display the data

any ideas?

you said originally:

Other Czech characters on the page outside the textarea field on both pages do display correctly, leading me to believe the page encoding is correct. It is only the characters inside the textarea field that display incorrectly.

This suggested that it was only the text being displayed inside the textarea which was displaying incorrectly.

I suggested:

echo the text onto the page outside of the <textarea> - does that still display bad characters?

I meant: on that same page, leave out the textarea and display the offending text with the good text around it.

Are you saying you did this and you have a mixture of good and bad text on the same page, as straight html?

Where is the text for the textarea originally generated from?

I recall having an issue like yours, where users could upload a text file (for a translation) that was not UTF-8, and the text was therefore tainted all the way through its life cycle until I displayed it.

Edit:

I suspect the same applies if they copy from a text file which is not UTF-8 and paste into an input screen. You may have to explicitly set the encoding to UTF-8 prior to saving.
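As a rough illustration of setting the encoding before saving, here is a minimal sketch that checks whether submitted text is valid UTF-8 and converts it only if it is not. The function name `ensureUtf8` is hypothetical, and the fallback source encoding (CP1250, i.e. Windows-1250, which is common for Czech text) is purely an assumption; the real source encoding of the pasted or uploaded text may differ.

```php
<?php
// Hypothetical helper: return $input as UTF-8, converting only when the bytes
// are not already valid UTF-8. $assumedSource is a guess, not a detected value.
function ensureUtf8(string $input, string $assumedSource = 'CP1250'): string
{
    // An empty pattern with the /u modifier matches only valid UTF-8 subjects.
    if (preg_match('//u', $input) === 1) {
        return $input;
    }

    // Convert from the assumed legacy encoding (requires the iconv extension).
    $converted = iconv($assumedSource, 'UTF-8', $input);
    if ($converted === false) {
        throw new InvalidArgumentException('Could not convert input to UTF-8');
    }
    return $converted;
}

// "č" as a single Windows-1250 byte (E8) becomes the UTF-8 bytes C4 8D.
echo bin2hex(ensureUtf8("\xE8")), "\n"; // prints "c48d"
```

Text that is already UTF-8 passes through unchanged, so the check is safe to run on every save.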

Here are some quality i18n and l10n links which might help you turn up the answer.

Character Sets / Character Encoding Issues [Web Application Component Toolkit]
PHP charset/encoding FAQ - Kore Nordmann - PHP / Projects / Politics
Charset vs. Encoding - Kore Nordmann - PHP / Projects / Politics
Internationalisation Gotchas — Internationalisation Tips

The best advice, though, is to enforce UTF-8 at every single step of the way, from where the text is generated (not from where it is copied, as mentioned previously) through to the server encoding, database I/O, HTML and browser.
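To see why a single non-UTF-8 step is enough to spoil the output, here is a small sketch of what happens when correct UTF-8 bytes are treated as Latin-1 somewhere along the way and then re-encoded. The function name `misreadAsLatin1` is hypothetical, and Latin-1 is only an assumed example of a mismatched step. One place where such a mismatch can occur is the database connection charset (which mysqli lets you set with `set_charset`), but the thread shows no connection code, so that is a guess rather than a diagnosis.

```php
<?php
// Hypothetical demonstration: simulate one step in the chain that wrongly
// assumes its input is ISO-8859-1 (Latin-1) and converts it to UTF-8 again.
function misreadAsLatin1(string $utf8): string
{
    // Each input byte is treated as one Latin-1 character and re-encoded.
    return iconv('ISO-8859-1', 'UTF-8', $utf8);
}

$good = "\xC4\x8D";              // "č" correctly encoded as UTF-8
$bad  = misreadAsLatin1($good);  // the same text after the mismatched step

echo bin2hex($good), "\n"; // prints "c48d"
echo bin2hex($bad), "\n";  // prints "c384c28d"
```

The stored data stays correct in this scenario; only the copy delivered to the page is damaged, which is consistent with text that looks fine in the database but wrong on screen.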

Cups, hi and thanks for your help.

When the data is pulled from the database it displays badly inside or outside the form textarea field.

If I copy the same Czech characters and echo them straight to the page they display fine. It is pulling them from the database that precipitates the bad display.

The Czech characters display correctly inside the database, so they are evidently saved correctly, and they also display correctly on another page when pulled from the database.

So on one page all is fine. On another, with exactly the same code, it is not! That is the puzzle.

Thanks for the links, going through them now.