Character Encoding Issues - How to Fix?

Ok, so I’m coding in XHTML. I tend to copy and paste content from Word into Notepad so I can hand-code it for the web. The problem: many symbols, such as quotes " ", apostrophes ', colons :, dashes -, semicolons ;, etc., all show up as vertical black bars.

Why is this? Is there a way to transfer a complete document from MS Word into Notepad that preserves all special characters and formatting, without going through and manually re-coding them all by hand?

Generally it’s advised to keep Word as far away from web design as possible (unless you work for Microsoft, of course). What character encoding are you using for your pages? Ideally, set your pages and server to deliver UTF-8, as that’s the most reliable. Not sure if that will help in this situation, but it’s worth a look.
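One common cause of those black bars, assuming the pasted text was saved from Notepad in the Windows "ANSI" code page (Windows-1252 on Western systems): Word's curly quotes and dashes become single bytes such as 0x93, 0x94 and 0x96, which are not valid UTF-8, so a page declared as UTF-8 shows a replacement glyph instead. Here is a rough Python sketch that checks a file and re-encodes it to UTF-8. The function names and the cp1252 fallback are my own assumptions, not anything Word or Notepad provides.

```python
from pathlib import Path


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return UTF-8 bytes, assuming `fallback` (hypothetical default: Windows-1252)
    if the input is not already valid UTF-8."""
    if is_valid_utf8(data):
        return data  # already UTF-8, leave untouched
    # Strict decode: raises UnicodeDecodeError on the few bytes cp1252 leaves undefined.
    return data.decode(fallback).encode("utf-8")


def convert_file(src: str, dst: str, fallback: str = "cp1252") -> None:
    """Read `src` as raw bytes and write a UTF-8 copy to `dst`."""
    raw = Path(src).read_bytes()
    Path(dst).write_bytes(to_utf8(raw, fallback))
```

Something like `convert_file("article.html", "article-utf8.html")` (hypothetical file names) would then give you a copy that matches your UTF-8 declaration. If the file is already valid UTF-8, it is copied unchanged.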

Thank you Ralph! As a rule I do that too on my own website projects: I stay far away from Word. At work, however, the documents’ meta tags already declare the character set as UTF-8, and always have. Since I work as an in-house SEO, our writers write the content in Word and send the documents to me, and I code and post the content. Before they go live, I always edit them to remove the faulty formatting and ensure the correct characters are displayed. CuteFTP’s editor shows the same black bars as Notepad when I open a document to fix a stray character. So, I’m scratching my head on this one.
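If the goal is to automate that pre-publication cleanup, one option is to turn every non-ASCII character into a numeric character reference (for example, a left curly quote becomes &#8220;), which displays the same whatever encoding the server ends up sending. A minimal Python sketch follows, assuming the text has already been read with the correct encoding. The function names are hypothetical; `xmlcharrefreplace` is a standard Python codec error handler.

```python
import unicodedata


def to_ascii_entities(text: str) -> str:
    """Replace every non-ASCII character with an XML numeric character reference."""
    # "xmlcharrefreplace" turns e.g. U+201C (left curly quote) into "&#8220;";
    # plain ASCII, including existing markup, passes through unchanged.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def find_non_ascii(text: str) -> list[tuple[int, int, str]]:
    """List (line, column, Unicode name) for each non-ASCII character,
    handy for spotting stray characters before a page goes live."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, unicodedata.name(ch, "UNKNOWN")))
    return hits
```

The trade-off is readability of the source: the HTML fills up with entities, but they survive any charset mix-up between the editor, the FTP client and the server.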

When/where do they do this? On the finished, uploaded page?

One other thing to check is the character encoding being sent by the server, as that overrides anything in the head of your document. (Often a browser’s developer tools will indicate the encoding used.) But short of that, I tend to re-type any characters that aren’t displaying properly. Perhaps there’s a better way, but I don’t mind a bit of manual work.
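To see what the server actually sends, besides a browser's developer tools or `curl -I`, a short Python sketch can read the charset parameter of the HTTP Content-Type header. The helper names are hypothetical, and the URL in the usage line is only a placeholder.

```python
from email.message import Message
from typing import Optional
from urllib.request import urlopen


def charset_from_content_type(header: str) -> Optional[str]:
    """Extract the charset parameter (lowercased) from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_charset()


def server_charset(url: str) -> Optional[str]:
    """Fetch `url` and report the charset declared in the HTTP Content-Type header."""
    with urlopen(url) as response:
        content_type = response.headers.get("Content-Type")
        return charset_from_content_type(content_type) if content_type else None
```

For example, `server_charset("https://www.example.com/")` returning `"iso-8859-1"` while your meta tag says UTF-8 would point to the server header as the culprit.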

In theory, newer versions of Word should be able to save as HTML or XML, but beyond that, how to change the encoding is generally specific to the Word version; as far as I know, 2003, 2007, and 2011 each do it slightly differently.