charset=ISO-8859-1 doesn't have the euro symbol

[B]Big sites are often in UTF-8 + “EUR”, with ISO-8859-1, Shift_JIS or Big5 for visitor input[/B]

Here’s an attempt at investigating which charsets big sites use world-wide and, in particular, how widely UTF-8 is used and whether it is spreading or shrinking. The survey was done in 2008 and has been refreshed since (although the changes are smaller than expected).

The charset a site uses is stated, as 1st priority, by its HTTP header (read using [URL=“http://web-sniffer.net”]HTTP Web-Sniffer 1.0.44) and, as 2nd priority, by its HTML source (“View > Source” in IE6, F12 in IE9, Ctrl+U in Chrome 23, etc.).
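
The priority rule above can be sketched in a few lines. This is only a rough illustration, assuming Python 3 and its standard library; the function name declared_charset, the 4096-byte scan window and the simple regular expression are my own assumptions, not what Web-Sniffer or any browser actually does.

```python
import re
from email.message import Message

# Matches both <meta charset="..."> and
# <meta http-equiv="Content-Type" content="text/html; charset=...">
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(content_type, html_bytes):
    """Return (charset, where_found); the HTTP header wins over the source."""
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # lowercased, or None if absent
        if charset:
            return charset, "http-header"
    # 2nd priority: the declaration inside the HTML source itself.
    # Only the start of the page is scanned (4096 bytes is an assumption).
    match = META_CHARSET.search(html_bytes[:4096])
    if match:
        return match.group(1).decode("ascii").lower(), "html-meta"
    return None, "undeclared"
```

Feeding it the Content-Type value shown by a header sniffer plus the first bytes of the page source tells which of the two declarations is the one that counts, and whether they disagree.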

Test Summary

  1. Tested Thu 17 Jul 2008 (using HTTP Web-Sniffer 1.0.37). Big sites’ most used charsets are apparently UTF-8, [URL=“http://en.wikipedia.org/wiki/ISO_8859-1”]ISO-8859-1 ([URL=“http://en.wikipedia.org/wiki/ISO_8859-1#Coverage”]West Europe, Americas, Oceania, most Africa), [URL=“http://en.wikipedia.org/wiki/Windows-1252”]Windows-1252, [URL=“http://en.wikipedia.org/wiki/ISO_8859-15”]ISO-8859-15, Chinese Traditional [URL=“http://en.wikipedia.org/wiki/Big5”]Big5 ([URL=“http://en.wikipedia.org/wiki/Taiwan”]TW, [URL=“http://en.wikipedia.org/wiki/Hong_Kong”]HK, [URL=“http://en.wikipedia.org/wiki/Macau”]MO), Chinese Simplified [URL=“http://en.wikipedia.org/wiki/Guobiao_code”]GB2312 ([URL=“http://en.wikipedia.org/wiki/People%27s_Republic_of_China”]CN), Japanese [URL=“http://en.wikipedia.org/wiki/Shift_JIS”]Shift_JIS, Korean [URL=“http://en.wikipedia.org/wiki/Extended_Unix_Code#EUC-KR”]euc-kr.
  2. Revised Mon 11 May - Sun 17 May 2009, found only 4 changes: SONY and [URL=“http://www.sony.com/index.php”]Sony USA ISO-8859-1 > UTF-8, [URL=“http://www.bmw.co.kr”]BMW Korea euc-kr > UTF-8, [URL=“http://www.microsoft.com/france/windows/default.mspx”]Microsoft FR Vista ISO-8859-1 > UTF-8.
  3. Mon 04 Jan 2010 19:13:16 +0100, found (coming from the nice BENNY & PEGGY - ON THE SUNNY SIDE video) a good example of a carefully designed Japanese site nicely displaying intertwined Japanese and Western characters by correct use of UTF-8: [URL=“http://www.som-arc.com”][B]SOM ASSOCIATE ARCHITECTS[/B].
  4. When I checked again in Jul 2011, that SOM site had converted to “charset=Shift_JIS” and added a Contact page (also in “Shift_JIS”); the few others I checked were unchanged, so I stopped checking there.
  5. On Sat 15 Dec 2012 I rechecked, a little faster than in 2008 (only Web-Sniffer, little source check). A number of sites have switched to UTF-8 (SONY EN, Toshiba FR, Toyota JP, Renault JP); a few have switched back, or gone both ways (NISSAN, BMW).

Result Details

The list below shows which old charsets were replaced with new ones between 2008 and 2012 (old charset in red, new one in green).

  1. Toshiba world-wide (UTF-8), [URL=“http://www.toshiba.com/tai”]Toshiba USA (ISO-8859-1), [URL=“http://www.toshiba.co.jp/index.htm”]Toshiba Japan Top Page (UTF-8), [URL=“http://www.toshiba.co.jp/index_j3.htm”]Toshiba JP (Shift_JIS), [URL=“http://www.toshiba.fr”]Toshiba FR ([COLOR=red]ISO-8859-1[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR]) > [COLOR=red][URL=“http://www.toshiba.fr/contact/index.asp”]Contacts (ISO-8859-1) > true email (no forms)[/COLOR], replaced in 2012 [COLOR=#10aa10]with [URL=“http://www.toshiba.fr/fr/Contacts”]Contacts (UTF-8 but no more real contact)[/COLOR]
  2. CLEVO US (UTF-8), [URL=“http://www.clevo.com.tw/tw/index.asp”]CLEVO TW (Big5), contacts are by true email
  3. Hitachi (UTF-8), [URL=“http://www.hitachi.com/global/index.html”]Hitachi Global (UTF-8), [URL=“http://www.hitachi.co.jp”]Hitachi JP (Shift_JIS), [URL=“http://www.hitachi.fr”]Hitachi FR (UTF-8) > [URL=“http://www.hitachi.fr/products/personal/electronics/index.html”]Électronique de consommation (UTF-8) > [URL=“http://www.hitachi.fr/contact_support/index.html”]Nous contacter (UTF-8) > [COLOR=red][URL=“http://www.hitachi.com/GlobalSupport/ContactUs?form_type=global_support”]formulaire du service client (ISO-8859-1)[/COLOR] in 2012 replaced with UTF-8 contact-less page with email addresses
  4. Toyota (UTF-8), [URL=“http://www.toyota.com/chinese/index.html”]Toyota CN (GB2312), [URL=“http://www.toyota.co.jp”]Toyota JP ([COLOR=red]Shift_JIS[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR]), [URL=“http://www.toyota.fr”]Toyota FR (UTF-8) > [URL=“http://www.toyota.fr/forms/contact.aspx”]Contacts (2012 now .tmex, still UTF-8) > [COLOR=red][URL=“http://www.toyota.fr/forms/contactframe.aspx”]Posez votre question (UTF-8, 404-ed in 2012)[/COLOR] > [URL=“http://profiler.toyota.fr/clubcard/index2.jsp”]Form (still ISO-8859-1 in 2012), replaced [COLOR=#10aa10]with [URL=“http://bac.toyota.fr/forms/toyota/otherdemands.aspx/index”]Formulaire (still UTF-8)[/COLOR]
  5. NISSAN COMPUTER CORP ([COLOR=red]UTF-8[/COLOR] [COLOR=#10aa10]ISO-8859-1[/COLOR]), [URL=“http://www.nissan-global.com/EN/index.html”]NMC (Nissan Motor Corp) ([COLOR=red]ISO-8859-1[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR]), [URL=“http://www.nissan-global.com/JP”]NMC JP ([COLOR=red]Shift_JIS[/COLOR] [COLOR=#10aa10]ISO-8859-1[/COLOR]), [URL=“http://www.nissan.fr”]Nissan FR (UTF-8) > Contact > [URL=“http://metafaq.nissan.fr/templates/nissan/main/mainPage?_mftvst:groupID=%24all_fr&_mftvst:langStr=%24fr&_mftvst:localeStr=%24fr&_mftvst:moduleID=%24&id=SNQ1EIB89SM36ADGA5ATVHVCS6”]Contactez Nissan (UTF-8) = actually a KB, with NO contact entry, replaced in 2012 with email addresses
  6. Renault (UTF-8), [URL=“http://www.renault.com.br”]Renault BR (UTF-8), [URL=“http://www.renault.jp”]Renault JP ([COLOR=red]Shift_JIS[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR]), [URL=“http://www.renault.fr”]Renault FR (UTF-8) > [URL=“http://www.renault.fr/contact”]Contact (UTF-8) > [URL=“http://www.renault.fr/contact/contact-direct/index.jsp”]Service Relation Client Renault or [URL=“http://www.renault.fr/contact/contact-direct/FR-gd-c.jsp”]here (form, UTF-8, but after 1 hour of tests and phone calls, it appears this form actually refuses any character outside ISO-8859-1)
  7. BMW ([COLOR=red]UTF-8[/COLOR] [COLOR=#10aa10]ISO-8859-1[/COLOR]), [URL=“http://www.bmw.fr/fr/fr”]BMW FR FR FR ([COLOR=red]UTF-8[/COLOR] [COLOR=#10aa10]ISO-8859-1[/COLOR]) > [I]various forms that, while hidden behind frames, appear coded in [COLOR=red]ISO-8859-15[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR][/I], [URL=“http://www.bmw.co.jp/jp/ja”]BMW JP JP JA (UTF-8, but no forms found; in 2012, ISO-8859-1 with [URL=“http://www.bmw.co.jp/jp/ja/owners/customer_support/customer_support.html”]form in UTF-8), [URL=“http://www.bmw.com.cn”]BMW CN (UTF-8), [URL=“http://www.bmwhk.com”]BMW HK ([COLOR=red]Big5[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR]), [URL=“http://www.bmwhk.com/index.htm”]BMW HK Store (Big5), [URL=“http://www.bmw.co.kr”]BMW Korea (euc-kr > UTF-8 > ISO-8859-1)
  8. Microsoft sites world-wide are apparently all in UTF-8, including forms in NON-ASCII languages, e.g.: Microsoft > [URL=“http://www.microsoft.com/en/us/default.aspx”]MS US (UTF-8), [URL=“http://go.microsoft.com/?linkid=2028351”]Contact Us, [URL=“http://support.microsoft.com/contactus/cu_inventory?ws=mscom”]View Customer Service Solution Centers, bottom right: [URL=“http://support.microsoft.com/contactus/cu_sc_more_master?ws=mscom#tab1”]Contacts, [URL=“http://support.microsoft.com/contactus/emailcontact.aspx?scid=sw;en;1539&ws=morecust&ws=mscom”]E-mail Customer Service > [URL=“https://support.microsoft.com/contactus/emailcontact.aspx?scid=sw;en;1539&ws=morecust&ws=mscom”]Customer Service Contact Us ([B]form, UTF-8[/B]); idem [URL=“http://www.microsoft.com/fr/fr”]Microsoft FRANCE (UTF-8), …, [URL=“https://support.microsoft.com/contactus/emailcontact.aspx?scid=sw;fr;1513&ws=MoreInfo”]Contactez Nous : Plus d’information ([I]form, UTF-8, but replies in ISO-8859-1 with other chars corrupted[/I]); [URL=“http://www.microsoft.com/ja/jp/default.aspx”]Microsoft Japan (UTF-8), bottom left: [URL=“http://www.microsoft.com/japan/customer/default.aspx”]お問い合わせ先 > [URL=“http://www.microsoft.com/japan/customer/default.aspx”]Microsoft Customer Service & Support, [URL=“http://www.microsoft.com/japan/customer/directory/web_mail.aspx”]ウェブ/メールでのお問い合わせ > [URL=“http://www.microsoft.com/japan/customer/directory/web_mail.aspx”]Microsoft Japan Customer Directory Web Mail ([COLOR=red]Shift_JIS[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR]), [URL=“http://go.microsoft.com/?linkid=2028918”]Contact US > [URL=“https://support.microsoft.com/contactus/emailcontact.aspx?scid=sw;ja;1238&ws=japan”]Contact Us マイクロソフトへのご意見・ご要望 ([B]form, UTF-8[/B]); [URL=“http://msdn.microsoft.com”]MSDN and [URL=“http://technet.microsoft.com”]TechNet (all languages) are apparently entirely in UTF-8: [URL=“http://msdn.microsoft.com/ja-jp/default%28en-us%29.aspx”]MSDN JP, [URL=“http://msdn.microsoft.com/zh-cn/default.aspx”]MSDN CN, [URL=“http://msdn.microsoft.com/ko-kr/default.aspx”]MSDN KR, [URL=“http://msdn.microsoft.com/zh-tw/default%28en-us%29.aspx”]MSDN TW; [URL=“http://update.microsoft.com”]MU, [URL=“http://windowsupdate.microsoft.com”]WU, [URL=“http://officeupdate.microsoft.com”]OU, [URL=“http://www.xbox.com”]Xbox, MSKB (e.g. [URL=“http://support.microsoft.com/kb/953979”]KB953979) as well; [URL=“http://www.microsoft.com/france/windows/default.mspx”]Microsoft FR Vista ([COLOR=red]ISO-8859-1[/COLOR]) has been rewritten and relocated in [URL=“http://www.microsoft.com/france/windows/windows-vista”]Microsoft FR Vista (UTF-8)
  9. However, outside the US (where UTF-8 is no different from ASCII), some parts of sub-sites are still in local legacy charsets, e.g. Microsoft JP Vista ([COLOR=red]Shift_JIS[/COLOR] [COLOR=#10aa10]UTF-8[/COLOR]), [URL=“http://www.microsoft.com/france/hardware/mouseandkeyboard/default.mspx”]Microsoft FR mice and keyboards (ISO-8859-1), relocated in 2012 to [URL=“http://www.microsoft.com/hardware/fr-fr”]Microsoft FR Claviers, souris, webcams et autres (UTF-8).
  10. Wikipedia, [URL=“http://ja.wikipedia.org/wiki/%E3%83%A1%E3%82%A4%E3%83%B3%E3%83%9A%E3%83%BC%E3%82%B8”]Wikipedia JP, [URL=“http://en.wikipedia.org/wiki/Main_Page”]Wikipedia EN, [URL=“http://fr.wikipedia.org/wiki/Accueil”]Wikipedia FR, are apparently all [B]totally in UTF-8[/B], including the proprietary yet rich end-user editor available for each page - and in the [URL=“http://en.wikipedia.org/w/index.php?title=Wikipedia:Sandbox&action=edit”]Editing Wikipedia:Sandbox (form+display, UTF-8, OK), where any difficult char string is correctly rendered, whether entered in visible characters, in NCRs or in entities
  11. amazon (some pages windows-1252, most ISO-8859-1, even the pages tagged “UTF8”), [URL=“http://www.amazon.fr”]amazon FR and [URL=“http://www.amazon.de”]amazon DE (View = unknown, because: HTTP headers = 8859-1 + 8859-15, source = 8859-1), [URL=“http://www.amazon.co.uk”]amazon UK (ISO-8859-1), [URL=“http://www.amazon.co.jp”]amazon JP (Shift_JIS). In 2012, “€” has been replaced everywhere with “EUR” (apparently after my emails to amazon, see §4 in [URL=“http://www.sitepoint.com/forums/showthread.php?930959-charset-ISO-8859-1-doesnt-have-the-euro-symbol#post5265392”]Encode in local charsets … and use FINANCIAL Euro symbol), and “windows-1252” has disappeared; no other changes, e.g. many HTTP headers still ISO-8859-15, even while still often carrying “ie=UTF8” in their URLs.
  12. SitePoint (UTF-8), [URL=“http://www.sitepoint.com/article/guide-web-character-encoding”]Article ([URL=“http://www.sitepoint.com/guide-web-character-encoding”]relocated, still UTF-8) or [URL=“http://www.sitepoint.com/blogs/2008/09/04/trying-to-decipher-that-eula-better-have-a-phd”]Blog ([URL=“http://www.sitepoint.com/trying-to-decipher-that-eula-better-have-a-phd”]relocated, still UTF-8) > [URL=“http://www.sitepoint.com/forums/showthread.php?t=450442”]Forums ([URL=“http://www.sitepoint.com/forums/showthread.php?450442-The-Definitive-Guide-to-Web-Character-Encoding”]relocated, still ISO-8859-1) > [URL=“http://www.sitepoint.com/forums/newreply.php?do=newreply&p=4248560”]Reply Form (newer example, still ISO-8859-1): SitePoint, while promoting UTF-8 and aptly applying it on regular pages, turns to ISO-8859-1 as soon as visitors’ input is significant
  13. The Autistic Cuckoo (ISO-8859-1) > [URL=“http://www.autisticcuckoo.net/about/toolman.php”]Autistic Cuckoo (ISO-8859-1) > [URL=“http://www.autisticcuckoo.net/arkiv.php?id=2009/02/10/hjalp-australien”]Hjälp Australien (ISO-8859-1) (In 2012 this site [COLOR=red]doesn’t open[/COLOR], yet still responds “ISO-8859-1”. Swedish is [URL=“http://en.wikipedia.org/wiki/ISO_8859-1#Languages_with_complete_coverage”]100% covered by ISO-8859-1, example [URL=“http://www.youtube.com/watch?v=pvUndiQbxTc”]Dagen är nära, i.e. Lascia Ch’io Pianga in Swedish, and its [URL=“http://www.youtube.com/all_comments?threaded=1&v=pvUndiQbxTc”]lyrics)
  14. Accessites (UTF-8) > [URL=“http://accessites.org/site”]Site (ISO-8859-1) > [URL=“http://accessites.org/site/contact”]Contact (form, UTF-8)
  15. sk89q > [URL=“http://sk89q.therisenrealm.com/testground/utf8email”]UTF-8 web form and UTF-8 webmail reply (form+mail, UTF-8, OK), not found in 2012.
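
Several entries above (the Renault form in item 6, the Microsoft FR replies in item 8) describe pages declared as UTF-8 whose back end only keeps ISO-8859-1 characters. A quick way to predict which characters such a form would refuse or corrupt is to try encoding the input in the suspected charset. The sketch below assumes Python 3; the helper name rejected_chars and the sample strings are hypothetical, and the real sites' server-side behaviour is of course not known to me.

```python
def rejected_chars(text, charset):
    """List, in order of first appearance, the characters of `text`
    that cannot be represented in `charset`."""
    rejected = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in rejected:
                rejected.append(ch)
    return rejected


if __name__ == "__main__":
    sample = "Cœur à 5 €"
    for cs in ("iso-8859-1", "iso-8859-15", "cp1252", "utf-8"):
        print(cs, rejected_chars(sample, cs))
```

An empty list means the whole input survives in that charset; anything listed is what a form limited to that charset would reject or turn into garbage.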

Result Summary

Windows-1252 seems to be disappearing, “€” is slowly being replaced with “EUR”, and ISO-8859-15 remains rare. Big sites tend to switch to UTF-8, but those (big or small) that have to deal with non-English-speaking visitors, or that have fewer resources, tend to revert, for their inputs (forms, forums), to the main local legacy charsets, which are still very efficient and reliable, mainly the 3 biggest: ISO-8859-1 (the historical web standard), Shift_JIS (main Japanese) and Big5 (main Traditional Chinese).
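
To make the “€” versus “EUR” point concrete, here is a small sketch of how the euro sign fares in the Western charsets named above, with a fallback to “EUR” when the charset has no euro sign. It assumes Python 3 and its built-in codecs; the names euro_bytes and price are hypothetical, and the fallback only illustrates the workaround observed on amazon, not how amazon implements it.

```python
EURO = "\u20ac"  # the euro sign


def euro_bytes(charset):
    """Return the bytes of the euro sign in `charset`, or None if it has none."""
    try:
        return EURO.encode(charset)
    except UnicodeEncodeError:
        return None


def price(amount, charset):
    """Format a price, writing 'EUR' when the charset lacks the euro sign."""
    symbol = EURO if euro_bytes(charset) is not None else "EUR"
    return f"{amount:.2f} {symbol}".encode(charset)


if __name__ == "__main__":
    for cs in ("iso-8859-1", "iso-8859-15", "cp1252", "utf-8"):
        print(cs, euro_bytes(cs), price(9.5, cs))
```

ISO-8859-1 has no euro sign at all, ISO-8859-15 puts it at byte 0xA4 and Windows-1252 at byte 0x80, which is why a page mislabelled between these three can show the wrong symbol, and why plain “EUR” is the one spelling that survives everywhere.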

Note on UTF-8

UTF-8, which like most charsets used on the web is ASCII-compatible, is fine when it adds nothing to ASCII (as in English, which uses only ASCII characters). It also works fine when the character flow is one-way (web sites with no feedback or interaction). But when there is a significant amount of input from the other end (forms, forums or other interactive websites, email), then after 2 or 3 steps of traveling or editing, through sites, programs, DBs or users using other charsets, UTF-8 too often causes major problems with non-ASCII characters.

When ASCII characters remain the majority (Europe), UTF-8 emails, while poor and ugly, are still readable. But when ASCII characters are a minority or absent (e.g. Japan), UTF-8, in addition to becoming wasteful (it then requires 3 bytes for most characters, against 2 in Shift_JIS, defeating the very purpose of UTF-8), soon makes emails totally unreadable, causing massive rejection in the population, who revert to local encodings such as ISO-8859-1 or Shift_JIS.

Notice however that, while apparently difficult to implement and use in a long real-life workflow (many loud people say it’s easy… yet offer little help to those who struggle with it), UTF-8 already fulfils its high promises in some cases (Wikipedia), so we will probably have it working as expected in some not too remote future. Until then, ISO-8859-1 (the historical web standard, and the single-byte charset nearest to ASCII) and national-scale charsets (Shift_JIS, Big5) remain more reliable for email, forms, forums and other public interactions.
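
Two of the points in this note can be checked with a short sketch: the byte cost of the same text in different charsets, and what happens when UTF-8 bytes are read by a step that assumes another charset. It assumes Python 3; byte_cost and mislabel are hypothetical helper names, and the mislabelling step is only a simplified model of one hop in a workflow, not a trace of any real mail system.

```python
def byte_cost(text, charsets=("utf-8", "shift_jis")):
    """Number of bytes needed to store `text` in each charset."""
    return {cs: len(text.encode(cs)) for cs in charsets}


def mislabel(text, written_as, read_as):
    """Simulate one hop where bytes written in one charset
    are read back as if they were in another."""
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    print(byte_cost("日本語"))
    print(byte_cost("Hjälp", ("utf-8", "iso-8859-1")))
    once = mislabel("é", "utf-8", "iso-8859-1")
    twice = mislabel(once, "utf-8", "iso-8859-1")
    print(repr(once), repr(twice))
```

Each mislabelled hop roughly doubles the garbage produced for every non-ASCII character, which matches the observation that mostly-ASCII text stays readable after 2 or 3 steps while text with few or no ASCII characters does not.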

Versailles, Sat 15 Dec 2012 17:28:00 +0100
Let’s make sure of the facts before looking into the cause – Fontenelle