Pages changing charset encoding on their own!

Hi

The charset changes automatically from utf-8 to gb18030. I had this problem in the past and the feeling was that it was due to foreign accents.
I am at a loss.

The page I was (am) working on is http://pintotours.net/Work/Punta.html and it kept changing the charset. I copied the whole lot to http://pintotours.net/Work/NewFile.html and, for the moment, the charset is OK.
Help…

PS - I found another page with the same problem. It also had foreign accents, and in their place were what appeared to be Japanese or Chinese characters. The charset had changed as well. The problem is that once it changes it seems to be impossible to correct the situation. I have a screenshot of this last instance but don’t know how to insert it in the post.

UPDATE

I would appreciate comments on the charset and how to correct it, please.

Just now, after sorting out the errors on the pages, I closed Punta.html, where I had replaced gb18030 with utf-8, and when I reopened it the

<head>
<meta charset="utf-8">

had disappeared and, as always, I get the following instead:

<head><meta http-equiv="Content-Type" content="text/html; charset=gb18030">

This is the actual validation error and I don’t know how to get rid of it, as it keeps returning automatically:

Line 4, Column 75: Internal encoding declaration gb18030 disagrees with the actual encoding of the document (utf-8).
<head><meta http-equiv="Content-Type" content="text/html; charset=gb18030">
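One way to see the mismatch the validator is complaining about is to compare the charset a page declares in its `<meta>` tag with what its bytes actually are. The following is only a minimal Python sketch: the function names are made up for illustration, and it checks the raw bytes only, whereas the validator may also take the HTTP header into account.

```python
import re

# Matches both <meta charset="..."> and the older http-equiv form.
META_CHARSET = re.compile(
    rb'<meta[^>]*charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)


def declared_charset(raw: bytes):
    """Return the charset named in the first <meta> declaration, or None."""
    match = META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def declaration_disagrees(raw: bytes) -> bool:
    """True if the page is valid UTF-8 but declares some other charset."""
    declared = declared_charset(raw)
    return declared not in (None, "utf-8", "utf8") and is_valid_utf8(raw)


if __name__ == "__main__":
    # Hypothetical page content, modelled on the declaration quoted above.
    page = (
        b'<head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=gb18030"></head><body>Punta</body>'
    )
    print(declared_charset(page), is_valid_utf8(page), declaration_disagrees(page))
```

To run it against a downloaded copy, read the file with `open(path, "rb").read()` and pass the bytes in.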

LATER

I found another 8 pages with the same problem. 5 of them had a Japanese charset (euc-jp). The reason was that there was a non-utf-8 (I think) character in this text:

when you book directly with Marriott®. Here’s how it works:

Getting rid of the character is easy; changing the charset back to utf-8 also; but when you reopen the page on the server, magically the utf-8 code has gone and been replaced by a different one.
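To illustrate why a character like ® or ’ can count as "non-utf-8": if the text was pasted or saved in a single-byte encoding such as Windows-1252, each of those characters is one byte that is not valid UTF-8 on its own. Whether that is what happened to these pages is an assumption; the Python sketch below (with a hypothetical helper name) just shows how to locate such bytes in a file.

```python
def invalid_utf8_offsets(raw: bytes) -> list:
    """Return the byte offsets at which UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the data is clean
        except UnicodeDecodeError as err:
            # err.start is relative to the slice, so add the current position.
            offsets.append(pos + err.start)
            pos += err.start + 1
    return offsets


if __name__ == "__main__":
    # The same characters in two encodings.
    print("®".encode("utf-8"), "®".encode("cp1252"))
    print("’".encode("utf-8"), "’".encode("cp1252"))

    # Hypothetical sample: the quoted sentence as it would look if it had
    # been saved as Windows-1252 rather than UTF-8.
    sample = "Marriott®. Here’s how it works:".encode("cp1252")
    print(invalid_utf8_offsets(sample))
```

An editor that finds bytes like these in a file it expected to be UTF-8 may fall back to guessing the encoding, which would be one possible route to a gb18030 or euc-jp label.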

The only solution I found in the past was to copy the contents of the file and paste it into another file with a DIFFERENT name. If you then rename that file with the original name, the problem returns. It appears that somewhere (JustHost? Google?) there is some kind of bot that looks at pages every now and then (a few months apart) and “corrects” charsets. Once it does that, the filename stays in memory and there is nothing you can do to solve the matter, other than create a new file with a different name, with the obvious negative consequences of redirects.

This is my feeling and nothing else. Any comments or ideas would be appreciated.

Thank you

What system/platform/cms is this sitting on? Straight HTML doesn’t change like that.

Hi ralph

I thought I had been abandoned…

I don’t understand your question. The site is hosted by JustHost.
To give you an idea, I have a page http://pintotours.net/Asia/Indonesia/bali.php with a pop-up for the hotel Jayakarta. This pop-up also comes up with euc-jp. It turns out that the ORIGINAL page was written for another hotel belonging to Marriott and there was some text that included the trademark sign: Marriott®

Even though the whole page was deleted and new text/images put in, the fact that I retained the original file name has dictated that the charset will remain euc-jp no matter how many times I try to correct it.

As I wrote above, this seems to be some kind of bot that comes every now and then (months apart), corrects what it thinks is wrong and then keeps it in memory until the next time round, I guess…

I just had an idea: I have this at the very bottom of my .htaccess:

#Set charset

<filesMatch "\.(htm|html|css|js|php)$"> 
AddDefaultCharset UTF-8
</filesMatch>

Anything wrong there?
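For what it is worth, `AddDefaultCharset` only sets the `charset` parameter on the `Content-Type` response header for `text/html` and `text/plain` responses; it does not rewrite the `<meta>` tag stored in the file. A minimal Python sketch for checking what a header declares follows. The function names and the sample header values are illustrative, and `served_charset` needs network access, so it is not called here.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(value: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()  # lower-cased, or None if absent


def served_charset(url: str):
    """Fetch a URL and return the charset named in its Content-Type header."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()


if __name__ == "__main__":
    # Sample header values, not captured from the real server.
    print(charset_from_content_type("text/html; charset=UTF-8"))
    print(charset_from_content_type("text/html"))
```

If the header says utf-8 while the `<meta>` tag in the file says gb18030, that would match the validator message about the internal declaration disagreeing with the actual encoding.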

How are you uploading content? To some kind of web interface / content management system? What do you mean by “original page”? That’s a bit vague, but points to you doing something other than just building a simple, static site.

Hi ralph

Normally, I do not upload html or css files. I write them/change them directly on the server at JustHost. I go into the cPanel / File Manager and open or create the file, and then do all the writing/changing.

What I meant by original page in the previous post is that in that instance I wrote a page for a hotel belonging to Marriott which (I know now) included the trademark sign. Because of that, the page became corrupted and somehow, someone, something decided to change the charset code. When I decided to delete that hotel and put another one in instead, rather than creating a new page I simply changed the content of the old one without realizing that the charset had changed. It explains why, even though there are no funny accents/signs in that page, it also has this Japanese charset.

PS

Please note that the above explanation refers to a single page. I have another 4 pages with Japanese charset, which are all from Marriott hotels and had that trademark sign. Interestingly I found the same sign in another 2 or 3 Marriott pages which did not have the charset changed (YET…). I also have 3 other pages with the gb18030 which was due to accents in Spanish names.

Are you sure you have UTF-8 selected in the code editor box when you edit pages in this way?

FWIW, I’d never work on files this way. I think it’s much safer - and easier - to work on local copies and upload them when I’m sure there are no errors.


Hi TechnoBear

I am absolutely sure. I validate pages as I work on them and when I am finished with them.

For instance, this other page that I noticed yesterday having the different charset http://pintotours.net/Europe/Spain/BarceloAtenea.html suddenly has the errors connected to the charset.

Do I specifically go look at the charset when I finish? No, but the validations should fail.

Besides, the issue now is how to correct it. I know the roundabout way of solving the problem: copy content, paste it into new file of different name, redirect old page to new one.

But why am I unable to change the charset when there are no funny characters in the page any longer?

All I can suggest is that you download the pages, work on them locally and then upload them again and see if the problem persists.
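As a rough sketch of the clean-up step in that local round trip, the Python below takes the bytes of a downloaded page, makes sure they end up as UTF-8, and resets the first charset declaration to `<meta charset="utf-8">`. The function names are hypothetical, and the Windows-1252 fallback for bytes that are not valid UTF-8 is an assumption about how stray characters such as ® got into the files.

```python
import re
from pathlib import Path

# Any <meta ...> tag that carries a charset declaration.
META_RE = re.compile(r"<meta[^>]*charset\s*=[^>]*>", re.IGNORECASE)


def normalise_to_utf8(raw: bytes, fallback: str = "cp1252") -> bytes:
    """Return the page re-encoded as UTF-8 with a utf-8 meta declaration."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 bytes came from a Windows-1252 source.
        text = raw.decode(fallback, errors="replace")
    # Only the first declaration is replaced; pages without one are left alone.
    text = META_RE.sub('<meta charset="utf-8">', text, count=1)
    return text.encode("utf-8")


def fix_file(path: str) -> None:
    """Rewrite one local file in place."""
    target = Path(path)
    target.write_bytes(normalise_to_utf8(target.read_bytes()))
```

Calling `fix_file("Punta.html")` on a local copy before uploading would cover both the declaration and any leftover single-byte characters. It does nothing about whatever changes the file on the server afterwards.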

Hi TechnoBear

Interesting!

I downloaded the file http://pintotours.net/Work/Punta.html onto the Desktop; deleted it on the server; changed the charset back to utf-8 with Notepad++; also used Notepad++’s “Encoding” to make sure that utf-8 was chosen; I did not validate because I don’t know how to do it on the Desktop; I uploaded the file and SO FAR the code has not changed. BUT… the top bar of the page in File Manager “Encoding” states gb18030, which is usually how the problem starts.

Once back in the server, the file now validates

I have to investigate how to write pages locally, but have no idea how to see the finished product online

You can either use a manual upload to the validator, or use the Web Developer Toolbar (on Firefox, anyway), which has a “validate local HTML” option. No doubt others will suggest other methods, but those work for me.

[quote=“qim, post:11, topic:106094”]
I have to investigate how to write pages locally, but have no idea how to see the finished product online
[/quote]
If it’s just an HTML page, you can simply open it in your browser. Most code editors offer a button or link to let you do that straight from the editor, but if not, you can just double-click the file to open it.

[quote=“qim, post:11, topic:106094”]
BUT… the top bar of the page in File Manager “Encoding” states gb18030, which is usually how the problem starts.
[/quote]
Can you not simply change that to UTF-8? Does it give an error message if you do that? If you’re opening the code editor with UTF-8 selected as the encoding, it’s hard to see why it would then give a different encoding on the page.

Bottom line - don’t use online editing. Do it locally and upload, because that seems to have solved your problem.

Hi

There is a drop-down list and yes, I can choose utf-8, but when I save, close and return it is back to gb18030.
No error messages of any kind.

The good news so far is that the html code has not changed and

<head>
<meta charset="utf-8">

is still there.

Hi

After opening and closing a few times, the blasted charset has come back! It is back to

<head><meta http-equiv="Content-Type" content="text/html; charset=gb18030">

Why???

I don’t know. Have you asked the hosting company? It seems to be something to do with the settings on the cPanel editor.

I did! Last time it happened, and the only conclusion we came to was that it could be the foreign accents. But I got no joy out of them as to how to solve it. I also sent them a message yesterday but it takes a couple of days before they answer.

I take it from all this that it is a mystery to everybody!

Coming back to coding locally. What would you use? Notepad++? I guess I could download the essential files (html, css and whatever else I was working on) and view them through Notepad++’s browser links.

Your editor doesn’t really matter. I do use Notepad++ though and I have no quarrels with it.

As Ryan says, it’s really just a matter of preference. I use Bluefish and really like it. When I had Windows some years ago, I used the free (not trial) version of CoffeeCup HTML Editor. But it’s up to you. There are quite a number of free editors about; just try them until you find one you like.
