Office uses its own character set (for example, Windows-1252 "smart" quotes and dashes), and I suspect that may be affecting your document. A second factor that may affect your script is when people mark a paragraph with a particular language (although, let's face it, most people let Word decide the language of each word).
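If your files are .docx, a language that someone set explicitly shows up in the XML as a `w:lang` element inside a run's properties (`w:rPr`). Here is a minimal Python sketch that lists each run with its language. It only looks at run-level `w:lang`; a language inherited from styles or document defaults is not resolved, and the sample paragraph is made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in .docx XML parts
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical sample paragraph: the second run is explicitly tagged as French
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r><w:t>Hello </w:t></w:r>
  <w:r><w:rPr><w:lang w:val="fr-FR"/></w:rPr><w:t>bonjour</w:t></w:r>
</w:p>"""


def runs_with_language(paragraph_xml, default_lang=None):
    """Return (text, language) pairs per run; runs without w:lang get default_lang."""
    para = ET.fromstring(paragraph_xml)
    result = []
    for run in para.findall("w:r", NS):
        lang_el = run.find("w:rPr/w:lang", NS)
        # Attribute names are namespaced too, so w:val becomes "{namespace}val"
        lang = lang_el.get(f"{{{W}}}val") if lang_el is not None else default_lang
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        result.append((text, lang))
    return result


print(runs_with_language(SAMPLE))
```

Runs that come back with no language are the ones where Word was left to decide, which in practice will be most of them.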
I guess the way people use Word can also be an advantage or a disadvantage (using styles versus using tabs, etc.), and of course there are the mistakes they make (whitespace at the end of a sentence, etc.).
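For the "mistakes" part, some cleanup of the extracted text usually helps. This is only a heuristic Python sketch; whether tabs should really become spaces depends on whether your users align columns with them, so treat each rule as an assumption to adjust.

```python
import re


def normalize_paragraph(text):
    """Tidy common manual-formatting habits in text extracted from Word (heuristic)."""
    text = text.replace("\u00a0", " ")            # non-breaking spaces -> ordinary spaces
    text = text.replace("\t", " ")                # tabs used for alignment -> spaces
    text = re.sub(r" {2,}", " ", text)            # collapse runs of spaces
    text = re.sub(r" +([.,;:!?])", r"\1", text)   # drop stray spaces before punctuation
    return text.strip()                           # remove leading/trailing whitespace


print(normalize_paragraph("Total:\t\t42 .  \u00a0"))
```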
All in all, I see the character set as the biggest challenge, because you can't change it in Word. The Word version also makes a huge difference (the binary .doc format of Word 97-2003 versus the XML-based .docx of Word 2007 and later).
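To see why the character set bites, here is a small Python sketch. It assumes the text reaches you as 8-bit Windows-1252 bytes (an assumption about how it was exported): Word's curly quotes and dashes sit in the 0x80-0x9F range, which is invalid as UTF-8 and turns into control characters if you decode it as ISO-8859-1. The helper name is hypothetical, and "try UTF-8, fall back to cp1252" is a guess, not a reliable detection method.

```python
def decode_legacy_word_text(raw: bytes) -> str:
    """Decode as UTF-8 if possible, otherwise assume Windows-1252 (heuristic)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 leaves a few bytes undefined; replace them rather than failing
        return raw.decode("cp1252", errors="replace")


# 0x93/0x94 are Word's curly double quotes and 0x96 is an en dash in cp1252
raw = b"\x93Quoted\x94 \x96 text"
print(decode_legacy_word_text(raw))
print(repr(raw.decode("latin-1")))  # wrong codec: C1 control characters instead
```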
For Word 2007 and later, you can take advantage of the fact that the format is XML (a .docx file is a ZIP archive whose main text lives in word/document.xml) and parse it yourself with something like SimpleXML. Depending on the output you want, you may still consider using a library like docvert.
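The do-it-yourself route looks roughly like this. I've written it in Python (zipfile plus ElementTree) rather than PHP's SimpleXML, but the idea is the same: open the ZIP, read word/document.xml and join the `w:t` text of each paragraph. It's a sketch that ignores headers, footers, footnotes, text boxes and fields, and the in-memory demo archive contains only word/document.xml, so it is not a valid Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx):
    """Yield the plain text of each body paragraph (path or file-like object)."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for para in root.iter(f"{{{W}}}p"):        # includes paragraphs inside tables
        parts = []
        for run in para.iter(f"{{{W}}}r"):
            for child in run:                  # only direct run content, not tab stops
                if child.tag == f"{{{W}}}t":
                    parts.append(child.text or "")
                elif child.tag == f"{{{W}}}tab":
                    parts.append("\t")
                elif child.tag == f"{{{W}}}br":
                    parts.append("\n")
        yield "".join(parts)


def make_demo_docx():
    """Build a stripped-down stand-in archive (only word/document.xml) in memory."""
    xml = (f'<w:document xmlns:w="{W}"><w:body>'
           '<w:p><w:r><w:t>First</w:t></w:r><w:r><w:tab/><w:t>paragraph</w:t></w:r></w:p>'
           '<w:p><w:r><w:t>Second one</w:t></w:r></w:p>'
           '</w:body></w:document>')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


print(list(docx_paragraphs(make_demo_docx())))
```

With a real file you would pass the path instead, e.g. `docx_paragraphs("report.docx")` (a hypothetical filename).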
You may also want to look into http://www.phpdocx.com/
Although it is mainly used to convert to Word, it can also be used the other way around.
If you're using the Zend Framework, you could try www.phplivedocx.org/downloads/
I don't have any experience with these libraries, but I do know they exist.