Inverted question marks when pasting apostrophes

dausboy · October 17, 2006, 2:59pm

When I copy and paste from Word into Internet Explorer, all apostrophes (') get changed into inverted question marks (¿) in the paragraph. What HTML tag should I add to prevent this from happening? I already added

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >

But nothing has changed. What should I add to the head so that I get the normal character?
Please note that the content is saved in the database as a CLOB type.

Buddy_Bradley · October 17, 2006, 3:47pm

It’s because Word uses the correct apostrophe character, but HTML can only cope with the single or double quote. You could do a find-and-replace to fix the incorrect quote characters with either the basic form or the unicode character reference.
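A minimal sketch of that find-and-replace done on the server side, so that users would not have to change how they paste. Python is used purely for illustration, since the thread does not say what the application runs on, and `normalize_word_punctuation` and its mapping are hypothetical names rather than anything from the original application.

```python
# Hypothetical helper: map Word's "smart" punctuation to plain ASCII
# before the text is stored. The mapping is illustrative, not exhaustive.
WORD_PUNCTUATION = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (Word's apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def normalize_word_punctuation(text: str) -> str:
    """Return text with curly quotes replaced by straight ASCII quotes."""
    return text.translate(WORD_PUNCTUATION)


print(normalize_word_punctuation("It\u2019s \u201cfine\u201d"))  # It's "fine"
```

This trades typographic quality for safety: the curly characters are lost, but the result survives any of the encodings discussed in this thread.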

dausboy · October 17, 2006, 3:59pm

But this is a web application; how can I tell all my users what to do? What do you think is the best solution? Keep in mind that users copy and paste from Word into the application, and I’m sure they won’t be using anything else. It’s quite annoying to make DB changes every time to find and replace the wrong characters.

AutisticCuckoo · October 17, 2006, 4:26pm

It’s an encoding problem. Word uses Windows-1252, where the typographically correct apostrophe is included. But you declare the encoding as ISO 8859-1, and the code point used for the apostrophe lies in the range reserved for C1 control characters in the ISO encoding.
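The byte-level mismatch can be checked with a short sketch. Python is used only as a convenient way to inspect the bytes; it is not part of the original poster's setup.

```python
import unicodedata

apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK, Word's apostrophe

# Windows-1252 stores the character as the single byte 0x92.
raw = apostrophe.encode("cp1252")

# Read as ISO 8859-1, the same byte becomes U+0092, a C1 control character.
misread = raw.decode("iso-8859-1")

print(raw.hex())                                               # 92
print(f"U+{ord(misread):04X}", unicodedata.category(misread))  # U+0092 Cc
```

The category `Cc` marks a control character, which is why there is nothing sensible to display for it.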

See the HTML FAQ for more information, including possible workarounds.

dausboy · October 17, 2006, 5:34pm

So which encoding should I use to resolve this problem?

AutisticCuckoo · October 18, 2006, 5:41am

The important thing is that the encoding you (or, rather, your web server) declare in the Content-Type HTTP header is the same one as you used when saving your source file.

It looks like you’re saving your file as Windows-1252, which is not something I’d recommend on a public web site since it’s Windows specific.

Either save the file as ISO 8859-1 or change the encoding declaration on the server to Windows-1252. Note, however, that this apostrophe is not available in the ISO encoding, so if you choose that way, you need to use an entity (&rsquo;) or a reference (&#8217;).
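If the ISO 8859-1 route is taken, the numeric character reference does not have to be typed by hand. As a sketch, assuming the text passes through server-side code before it is written out, Python's built-in `xmlcharrefreplace` error handler turns every character that the target encoding lacks into such a reference (the sample string is made up):

```python
text = "the user\u2019s note"

# Characters missing from ISO 8859-1 are replaced by numeric character
# references; everything else is encoded normally.
encoded = text.encode("iso-8859-1", errors="xmlcharrefreplace")

print(encoded)  # b'the user&#8217;s note'
```

Note that this is only appropriate for text destined for HTML output, since the reference means nothing outside markup.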

Probably the best solution would be to use UTF-8. That means saving the file as UTF-8 and making the server send UTF-8 as the encoding declaration. UTF-8 can represent any character in the ISO 10646, which is the character repertoire used by HTML. (It’s virtually the same thing as Unicode.)
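A quick illustration of why UTF-8 avoids the problem, again sketched in Python with a made-up sample string: the curly quotes survive a round trip through UTF-8, while ISO 8859-1 cannot encode them at all.

```python
sample = "Word\u2019s \u201cquotes\u201d"

# UTF-8 can encode every character in the sample and decode it back.
utf8_bytes = sample.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

# ISO 8859-1 has no code points for the curly quotes.
try:
    sample.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(round_trip == sample, latin1_ok)  # True False
```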

dausboy · October 18, 2006, 1:47pm

I’ve added <meta http-equiv="Content-Type" content="text/html; charset=utf-8" > but I’m still seeing the inverted question marks. My Internet Explorer uses Western European encoding. If I’m copying from a normal Word document, should I save that document as a UTF-8 web page and then copy and paste it into my application? What else do you think is missing?

AutisticCuckoo · October 18, 2006, 4:22pm

The META element will be ignored if your web server is sending encoding information in the Content-Type header. You must make sure that your web server is sending the correct encoding, or that it doesn’t send any encoding information at all (in which case your META element may be applied).
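The precedence described here can be modelled in a few lines. This is only a sketch of the rule as stated in this post, not of any browser's real detection logic, and `charset_from_content_type` and `effective_charset` are hypothetical names.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type value, if any."""
    if not value:
        return None
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, or None if absent


def effective_charset(http_header: Optional[str],
                      meta_content: Optional[str]) -> Optional[str]:
    """The HTTP header wins; the META value is only a fallback."""
    return (charset_from_content_type(http_header)
            or charset_from_content_type(meta_content))


# Server declares ISO 8859-1, so the page's UTF-8 META is ignored.
print(effective_charset("text/html; charset=ISO-8859-1",
                        "text/html; charset=utf-8"))   # iso-8859-1

# Server sends no charset, so the META value applies.
print(effective_charset("text/html", "text/html; charset=utf-8"))  # utf-8
```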

Also, you cannot just change the encoding declaration without changing the actual encoding. The easiest way for you is probably to declare the encoding as Windows-1252, but as I said before, that encoding doesn’t really belong on a public web site.
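To make the difference between the declaration and the actual encoding concrete, here is a small sketch in Python with a made-up sample string. Bytes saved as Windows-1252 but read as UTF-8 do not decode cleanly; they have to be converted first.

```python
text = "it\u2019s"

# What is actually stored when the file is saved as Windows-1252.
stored = text.encode("cp1252")            # b"it\x92s"

# Declaring UTF-8 without converting: 0x92 alone is not valid UTF-8,
# so a decoder can only substitute the replacement character U+FFFD.
wrong = stored.decode("utf-8", errors="replace")

# Changing the actual encoding: decode with the real encoding,
# then re-encode as UTF-8.
converted = stored.decode("cp1252").encode("utf-8")

print(repr(wrong))       # 'it\ufffds'
print(converted)         # b'it\xe2\x80\x99s'
```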

If you copy from Word, you may run into problems depending on which editor you use for your HTML document. You may have to set it to Windows-1252 first, then copy from Word, then save as UTF-8. I don’t know if you can make Word use UTF-8; I haven’t looked, and I don’t have Word available at the moment.