Charset=ISO-8859-1 doesn't have the euro symbol

rfl · December 1, 2012, 1:58pm

i wrote this blog using java and mysql; when i enter the € (euro symbol), all i get when i retrieve the data from the database is a question mark
does anyone have any idea how to solve this?
as i said before, i’m using charset=ISO-8859-1 (latin 1, in mysql)

thanks in advance

James_Hibbard · December 1, 2012, 4:09pm

Hi there,

ISO/IEC 8859-1 is missing some characters for French and Finnish text, as well as the euro sign.
Could you not simply specify another charset on your pages, such as UTF-8 or ISO-8859-15?
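
To make the point above concrete, here is a minimal sketch (not code from this thread; the class name `EuroBytes` is made up for illustration) that asks the JVM how the euro sign encodes in each of the charsets mentioned. Java's `String.getBytes` quietly substitutes `?` (byte 0x3F) for any character the target charset cannot represent, which matches the symptom described in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original poster's code.
class EuroBytes {

    // Encodes the text in the given charset and returns the bytes as uppercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN, escaped so the source file stays pure ASCII

        // ISO-8859-1 has no euro sign, so the encoder falls back to '?' (3F).
        System.out.println("ISO-8859-1 : " + hex(euro, StandardCharsets.ISO_8859_1));
        // ISO-8859-15 places the euro sign at A4.
        System.out.println("ISO-8859-15: " + hex(euro, Charset.forName("ISO-8859-15")));
        // UTF-8 needs three bytes: E2 82 AC.
        System.out.println("UTF-8      : " + hex(euro, StandardCharsets.UTF_8));
    }
}
```

If any layer between the form and the database performs the first conversion, the original character is gone for good, and changing the page's meta tag afterwards cannot bring it back.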

rfl · December 1, 2012, 4:43pm

i tried utf-8 but got some strange characters, so i’m gonna try the other
brb
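
The "strange characters" seen after switching the page to UTF-8 are what one would expect if the browser submits UTF-8 bytes while the server still decodes the form data as ISO-8859-1. Servlet containers have traditionally fallen back to ISO-8859-1 when a request does not declare its encoding, but whether that is what happened here is only an assumption. The effect itself can be reproduced with plain JDK classes; `FormDecoding` and its methods are hypothetical names used for this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it only simulates what a server does with form data.
class FormDecoding {

    // Decodes percent-encoded form data using the named charset.
    static String decode(String percentEncoded, String charsetName) {
        try {
            return URLDecoder.decode(percentEncoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Lists the UTF-16 code units of a string, so the result does not depend on the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // How a browser percent-encodes the euro sign when the page is UTF-8.
        String submitted = "%E2%82%AC";

        // Decoded with the matching charset: one character, the euro sign.
        System.out.println(codePoints(decode(submitted, "UTF-8")));
        // Decoded with the wrong charset: three unrelated characters (mojibake).
        System.out.println(codePoints(decode(submitted, "ISO-8859-1")));
    }
}
```

In a servlet, the usual counterpart is to call `request.setCharacterEncoding("UTF-8")` before the first `getParameter` call, so that the container decodes the body with the same charset the page was served in.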

rfl · December 1, 2012, 4:50pm

i tried
<meta http-equiv = "Content-Type" content = "text/html; charset = iso-8859-15">
but no luck; still the question mark instead…

rfl · December 1, 2012, 5:28pm

i tried again with utf-8 without success

James_Hibbard · December 1, 2012, 5:41pm

Can you check the character set of the database table in which your content is stored?
What is it?

rfl · December 1, 2012, 5:45pm

it’s latin1

James_Hibbard · December 1, 2012, 7:21pm

Hi,

So, to summarize:
One or more fields in your database (which is a latin1_whatever) show the Euro symbol fine in phpMyAdmin.
However, when you try to output these fields in a webpage, the Euro sign shows up as a question mark.
You are using the following meta tag on the page: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Is that correct?

Could you provide the code you are using to read the data from the database and to output it on the page?

rfl · December 1, 2012, 8:13pm

>> One or more fields in your database (which is a latin1_whatever) show the Euro symbol fine in PHPMyAdmin.
no, when i open mysql query browser, i already have a question mark

>> when you try to output these fields in a webpage, the Euro sign shows up as the question-mark.
yes

>> You are using the following meta tag on the page: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
>> Is that correct?
no, i’m currently using the <meta http-equiv = "Content-Type" content = "text/html; charset = iso-8859-1"> meta tag, but i tested with both charset = iso-8859-15 and UTF-8

>> Could you provide the code you are using to read the data from the database and to output it on the page.

i think it would also be relevant to post the code that inserts into the db
the servlet that inserts into the db:

package blog;

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.sql.*;
import bd.*;


public class Escrever extends HttpServlet {
    private JdbcAccess access;
    private int linhas;


    public void init() throws ServletException {
        access = new JdbcAccess("avulsas");
    }


    public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
        String titulo = request.getParameter("titulo");
        String texto = request.getParameter("texto");
        long data = java.lang.System.currentTimeMillis() / 1000;


        String sql = "INSERT INTO posts (data, titulo, texto) VALUES (" + data + ", '" + titulo + "',  '" + texto + "')";


        try {
            linhas = access.executaUpdate(sql);
        }
        catch (SQLException msg) {}


        response.setContentType("text/html");


        if (linhas ==1) {
            PrintWriter out = response.getWriter();
            out.println("<html>");
            out.println("<head>");
            out.println("<title>Escrever</title>");
            out.println("<meta HTTP-EQUIV=\"REFRESH\" content=\"0; url=http://rsacramento.no-ip.org/Blog\"");
            out.println("</head>");
            out.println("<body>");
            out.println("</body>");
            out.println("</html>");
        }
    }


    public void doPost(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
        doGet(request, response);
    }
}

the servlet i use to read from db is next:

package blog;

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import java.sql.*;
import bd.*;
import util.*;


public class Avulsas extends HttpServlet {
    private JdbcAccess access;


    public void init() throws ServletException {
        access = new JdbcAccess("avulsas");
    }


    public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
        String sql = "SELECT * FROM posts order by data desc LIMIT 5";
        String apagina = "";


        try {
            ResultSet rs = access.executaQuery(sql);
            access.fecha(access.getConnection());


            apagina = AvulsasUtil.formata(rs);
            request.setAttribute("apagina", apagina);
            getServletContext().getRequestDispatcher("/jsp/avulsas.jsp").forward(request, response);
        }
        catch (SQLException msg) {
//          String erro = "De momento não é possível comunicar com a base de dados.<br /> Tente mais tarde.";
            Object erro = msg.toString();
            request.setAttribute("erro", erro);
            getServletContext().getRequestDispatcher("/jsp/erro.jsp").forward(request, response);
        }
    }


    public void doPost(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
        doGet(request, response);
    }
}

and also:

package util;

import java.sql.*;
import java.text.*;


public class AvulsasUtil {
    public static synchronized String formata(ResultSet rs) throws SQLException {
        StringBuilder pagina = new StringBuilder();
        String dataTemporaria = "";


        while (rs.next()) {
            long mili = rs.getLong(2);
            mili = mili * 1000;
            java.util.Date data = new java.util.Date(mili);
            String padraoExtenso = "EEEEEE, d 'de' MMMMMM 'de' yyyy";
            String padraoHora = "HH:mm";
            SimpleDateFormat  sdfExtenso = new SimpleDateFormat(padraoExtenso);
            SimpleDateFormat  sdfHora = new SimpleDateFormat(padraoHora);
            String dataPorExtenso = sdfExtenso.format(data);
            String dataHoraria = sdfHora.format(data);
            String titulo = rs.getString(3);
            String texto = rs.getString(4);
            if (!dataTemporaria.equals(dataPorExtenso)) {
                dataTemporaria = dataPorExtenso;
                pagina.append("<h1>" + dataTemporaria + "</h1>\n");
                pagina.append("<h2>" + titulo + "</h2>\n");
                pagina.append("<p><span class = \"horas\">" +
                        dataHoraria +
                        " </span>" +
                        texto +
                        "</p>\n");
            }
            else {
                pagina.append("<h2>" + titulo + "</h2>\n");
                pagina.append("<p><span class = \"horas\">" +
                        dataHoraria +
                        " </span>" +
                        texto +
                        "</p>\n");
            }
        }


        return pagina.toString();
    }
}

hope it helps

James_Hibbard · December 1, 2012, 8:22pm

Hi,

If you can’t see the Euro symbol in PHPMyAdmin, that is not so hopeful.

I know it sounds obvious, but did you try setting the correct encoding in your browser?
Which browser are you using?
Which encoding do you have?
Does this problem occur in all browsers?

Regarding your code, I’m afraid my Java isn’t wonderful.
Had it been PHP and had the Euro sign been displaying properly in PHPMyAdmin, I would have had a bunch of suggestions.

As it is, if changing your browser’s charset encoding doesn’t help, it might be the case that the Euro symbol isn’t being stored correctly in the first place.

rfl · December 1, 2012, 8:37pm

i’m testing with the latest opera, a very recent chrome and ie8, and it’s the same in all of them
i read this article:
http://www.oracle.com/technetwork/articles/javase/httpcharset-142283.html,
but it was no help to me either
>> it might be the case that the Euro symbol isn’t being stored correctly in the first place
yeap

thanks anyway :)

rfl · December 1, 2012, 8:39pm

i guess it must have something to do with the server’s charset… (using tomcat)

Michel_Merlin · December 13, 2012, 9:00pm

Encode in local charsets (ISO-8859-1, Shift-JIS, etc) and use FINANCIAL Euro symbol

Recommendation:

First you need the same charset everywhere in your information handling chain, seamlessly from forms and email to web pages to DB, including all the interfaces in between, in both directions.
For this you need to select a charset that will actually work in the real world. If you are dealing with the public in (North or South) America or Western Europe, the only combination that is currently efficient (hence affordable and reliable), while waiting for UTF-8 to become ready, is ISO-8859-1 + EUR (http://en.wikipedia.org/wiki/Euro), the ISO 4217 (http://en.wikipedia.org/wiki/ISO_4217) 3-letter FINANCIAL symbol. In Japan, JIS (http://en.wikipedia.org/wiki/JIS_encoding) or Shift-JIS (http://en.wikipedia.org/wiki/Shift_JIS) + EUR. And accordingly in the rest of the world.

Explanation:

In real life in France I often receive, from big companies, French email messages from which they have entirely stripped the French accents (no matter the mailer they use), making them ugly and difficult to read, yet readable. This is apparently because, being usually English-speaking, they still encode in UTF-8, unaware (since in English UTF-8 brings no difference, drawback or benefit over ASCII) that UTF-8 is the cause of their problems with non-ASCII characters. Conversely, the email messages I receive in Western languages (FR, EN, DE) from most other companies or individuals are encoded in ISO-8859-1 and free of charset problems. This (temporary, I hope) situation is IMO because compatibility problems between UTF-8 and traditional fixed-length charsets have been underestimated by the official bodies in charge of enforcing UTF-8. As a result, UTF-8 problems in the real world with non-ASCII characters:
1. are nonexistent in English, where all characters are ASCII, so encoding in UTF-8 is actually encoding in ASCII;
2. are few in Western European languages, where few characters are non-ASCII, so encoding in UTF-8 does make documents inelegant, but not unreadable;
3. are total in Japan, where most characters are non-ASCII, so UTF-8 not only increases the document size but, in the real world, causes most characters to be replaced with mojibake, so that UTF-8 is widely rejected by ordinary people (Note: I still need, and would appreciate, more recent, direct, helpful, precise and reliable checks and facts in English about charsets in Japan, from able persons, if possible Japanese or living in Japan; same about mainland China, Hong Kong and Taiwan).
If you send some text (through email or a form) to someone in the public, you have no control over what they will do with that text (editing, replying, forwarding), and particularly over which programs or charset(s) will be used further down the workflow. Many of your correspondents will, knowingly or not, use their local charset, so if you have encoded in another one (namely UTF-8 if you are NOT writing in English), they will run into a lot of big problems with no solution apparent to them, which is why they go back to traditional charsets or remove accents.
In the real world, UTF-8’s goal (efficiently representing all the 0.1 to 1 million Unicode characters (http://en.wikipedia.org/wiki/Unicode) in the world) has only been successfully achieved in completely closed pure-UTF-8 environments built with careful, intelligent thinking and sufficient resources, such as Wikipedia; others generally tend to go back to “traditional” local fixed-length charsets: ISO-8859-1 (http://en.wikipedia.org/wiki/ISO/IEC_8859-1) for Western European languages, JIS (http://en.wikipedia.org/wiki/JIS_encoding) for email and Shift-JIS (http://en.wikipedia.org/wiki/Shift_JIS) for web pages in Japanese, etc.
ISO-8859-15’s main goal (and effect) is to introduce the Euro typographical symbol “€” (http://en.wikipedia.org/wiki/Euro_sign), but it does so by substituting it for the general currency typographical symbol “¤” (http://en.wikipedia.org/wiki/Currency_(typography)), so in the real world if you send an ISO-8859-15-encoded “€”, somewhere down the workflow it will inevitably get replaced with an ISO-8859-1-encoded “¤”, creating a damaging confusion and making ISO-8859-15 unsafe, thus actually unusable. Conversely, the Euro financial symbol “EUR” is recognized, understood, read, written, conveyed and transcribed immediately, without ambiguity or error, by any person, machine or program worldwide, from financial traders to shoe shiners, from Bhutan to Manhattan. So, after (inter alia) my various posts and emails, many sites like amazon (.fr, .de, etc) or wikipedia (all) have now switched, in their use or recommendations, from “€” to “EUR”.
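
The substitution described above comes down to a single byte value, 0xA4, that means two different things depending on which charset the receiver assumes. A minimal sketch under that reading (the class name `CurrencyByte` is made up for illustration):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one byte, two interpretations.
class CurrencyByte {

    // Decodes a single byte value (0..255) under the given charset.
    static String decode(int unsignedByte, Charset charset) {
        return new String(new byte[] { (byte) unsignedByte }, charset);
    }

    public static void main(String[] args) {
        String asLatin9 = decode(0xA4, Charset.forName("ISO-8859-15"));
        String asLatin1 = decode(0xA4, StandardCharsets.ISO_8859_1);

        // U+20AC is the euro sign, U+00A4 is the generic currency sign.
        System.out.printf("0xA4 read as ISO-8859-15: U+%04X%n", (int) asLatin9.charAt(0));
        System.out.printf("0xA4 read as ISO-8859-1 : U+%04X%n", (int) asLatin1.charAt(0));
    }
}
```

Nothing in the byte itself says which reading is intended, so the sender's and receiver's charset labels have to agree for the euro sign to survive.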

Details: For Long URLs, Accentuated Chars, encode as Quoted-Printable, Western European (ISO), use EUR for Euro symbol of Sun 19 Nov 2006.

Versailles, Thu 13 Dec 2012 22:00:00 +0100

Jeff_Mott · December 13, 2012, 10:15pm

Hi, Michel. Thanks for your very detailed replies! I hope you don’t mind a follow-up question. Perhaps it’s due to me being in an English-speaking bubble, but my understanding was that UTF-8 is universally understood by now. Many large websites (http://www.yahoo.co.jp/) use it, I presume successfully. In Western Europe or eastern countries, is there still software being used that doesn’t support UTF-8?

Michel_Merlin · December 14, 2012, 11:15am

Big sites tend to UTF-8 for EN pages, local charsets for forms

For the 2 sites you link ( http://www.google.fr and http://www.yahoo.co.jp ) and the very page we are posting on right now (charset=ISO-8859-1 doesnt have the euro symbol, http://www.sitepoint.com/forums/showthread.php?930959-charset-ISO-8859-1-doesnt-have-the-euro-symbol), let’s check the charset they state in their HTTP Headers (http://www.google.com/search?q=HTTP+Headers), using HTTP Web-Sniffer 1.0.44 (http://web-sniffer.net/), and in their HTML source (as a reminder, whatever we may think about it, the HTTP Header has priority over the HTML source):

http://www.google.fr (and http://www.google.co.jp BTW) actually uses ISO-8859-1 (states “Content-Type: text/html; charset=ISO-8859-1” in its HTTP Header, and nothing in its HTML source)
http://www.yahoo.co.jp (as well as www.yahoo.fr, which redirects to http://fr.yahoo.com, and http://www.yahoo.com ) actually uses UTF-8 (states “Content-Type: text/html; charset=utf-8” in its HTTP Header, and <meta http-equiv="content-type" content="text/html; charset=utf-8"> in its HTML source)
this SitePoint page actually uses ISO-8859-1 (states “Content-Type: text/html; charset=ISO-8859-1” in its HTTP Header, and <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> in its HTML source), and nevertheless correctly displays the Euro “€” and Currency “¤” typographical symbols (and some others).
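
The precedence rule used in these checks (the HTTP header wins, the meta tag only counts when the header names no charset) can be sketched in a few lines. This is a simplified illustration, not a full Content-Type parser: it ignores quoted parameter values, and the class and method names are hypothetical.

```java
// Hypothetical helper showing which declared charset takes effect for a page.
class DeclaredCharset {

    // Returns the value of the charset parameter in a Content-Type value, or null if there is none.
    static String fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the "text/html" media type itself
            }
            String name = part.substring(0, eq).trim();
            String value = part.substring(eq + 1).trim();
            if (name.equalsIgnoreCase("charset") && !value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    // The HTTP header takes priority; the meta tag's content is only a fallback.
    static String effective(String httpContentType, String metaContent) {
        String fromHeader = fromContentType(httpContentType);
        return fromHeader != null ? fromHeader : fromContentType(metaContent);
    }

    public static void main(String[] args) {
        // Header and meta tag disagree: the header's ISO-8859-1 is what counts.
        System.out.println(effective("text/html; charset=ISO-8859-1", "text/html; charset=utf-8"));
        // Header names no charset: the meta tag's value is used.
        System.out.println(effective("text/html", "text/html; charset = iso-8859-15"));
    }
}
```

This is also why editing only the meta tag can appear to have no effect: if the server sends a charset in the header, for instance via `response.setContentType("text/html; charset=UTF-8")` in a servlet, that is the declaration that counts.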

Notice however that between my checks of 2008 and 2011, some sites have converted from UTF-8 to local charsets, some the other way; a typical case is SONY, where global and US sites have switched from ISO-8859-1 to UTF-8 (which will do them no harm at all, since for English UTF-8 is actually ASCII), while local sites have remained in, or converted to, local charsets, especially in their form pages: see SONY and Sony USA (http://www.sony.com/index.php; from ISO-8859-1 to UTF-8), Sony Global (http://www.sony.net/; ISO-8859-1), Sony JP (http://www.sony.jp/; Shift-JIS), Sony FR (http://www.sony.fr/section/accueil; UTF-8) > Contact (http://www.sony.fr/section/contact; UTF-8) > Form (http://213.186.45.65/~sony/frcontact/rub_3/acc.asp; Windows-1252).

Sure, everything exists in Nature, yet the remaining programs that don’t support UTF-8 at all must be rare. However many sites support UTF-8 but incompletely or wrongly. A notable case is Microsoft, who despite its vast resources never corrected Outlook Express’ big UTF-8 flaw in editing HTML source, and took years before correctly taming UTF-8 everywhere else.

Versailles, Fri 14 Dec 2012 12:15:10 +0100

rfl · December 14, 2012, 3:37pm

that’s intriguing: how do they manage it when i can’t?

Michel_Merlin · December 14, 2012, 5:07pm

I guess what you mean is that you don’t have the Euro and Currency signs properly displayed. It can’t be a FONT problem since the fonts used (Arial, Verdana) are very common. So, have you checked that your browser is set to detect the charset in the web page, as described in the “Details” link on the last line of my 21:00 post?

Versailles, Fri 14 Dec 2012 18:07:25 +0100

rfl · December 14, 2012, 5:27pm

yes, in opera, chrome and ie8

Jeff_Mott · December 14, 2012, 5:57pm

I suspect the euro and currency symbols are special cases. The browser probably error corrects for the euro symbol, because even though it isn’t in the iso-8859-1 set, it is in the windows-1252 set. And the currency symbol is actually in iso-8859-1, so that one is legal.
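
The two claims about which symbol belongs to which set can be checked directly against the JDK's charset tables. A small sketch, with `SymbolSupport` as a made-up class name:

```java
import java.nio.charset.Charset;

// Hypothetical demo class: asks each charset whether it can represent a character.
class SymbolSupport {

    // True if the named charset has an encoding for the character.
    static boolean supports(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char euro = '\u20AC';     // EURO SIGN
        char currency = '\u00A4'; // CURRENCY SIGN

        System.out.println("euro in ISO-8859-1     : " + supports("ISO-8859-1", euro));
        System.out.println("euro in windows-1252   : " + supports("windows-1252", euro));
        System.out.println("currency in ISO-8859-1 : " + supports("ISO-8859-1", currency));
    }
}
```

The first line reports false and the other two report true, which is consistent with the explanation above: a browser that treats a page labelled ISO-8859-1 as windows-1252 can still show a euro sign, while a strict ISO-8859-1 encoder on the server has nothing to map it to.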

If yours is coming through as a question mark, then I suspect it’s not the browser but your server that is sending it that way. You’ll need to follow Michel’s advice to have “the same charset everywhere in your information handling chain.” If either your application or your database is Latin1, then either one of those could be replacing illegal characters with the question mark. You’ll need to pick a charset that can support all characters (in the English-speaking world, UTF-8 is by far the most popular choice), and make sure everything is using that charset.
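
One way to find out which link in the chain does the replacing is to push a test string through each stage's charset in turn and see where it stops coming back unchanged. The sketch below only simulates that idea: the stage names and their charsets are assumptions chosen for illustration, not a description of the actual Tomcat, JDBC or MySQL configuration in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical simulation of text passing through several charset conversions.
class CharsetChain {

    // Simulates one stage: the text is encoded to bytes and decoded again with the same charset.
    static String passThrough(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Returns the name of the first stage that alters the text, or null if every stage preserves it.
    static String firstLossyStage(String text, Map<String, Charset> stages) {
        String current = text;
        for (Map.Entry<String, Charset> stage : stages.entrySet()) {
            String after = passThrough(current, stage.getValue());
            if (!after.equals(current)) {
                return stage.getKey();
            }
            current = after;
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed stages, in the order the text travels; LinkedHashMap keeps that order.
        Map<String, Charset> stages = new LinkedHashMap<>();
        stages.put("form decoding", StandardCharsets.UTF_8);
        stages.put("application", StandardCharsets.UTF_8);
        stages.put("storage (ISO-8859-1)", StandardCharsets.ISO_8859_1);

        String price = "5 \u20AC"; // "5 " followed by the euro sign

        System.out.println("first lossy stage: " + firstLossyStage(price, stages));
        System.out.println("after that stage : " + passThrough(price, StandardCharsets.ISO_8859_1));
    }
}
```

With every stage set to UTF-8, `firstLossyStage` returns null, which is the "same charset everywhere" situation recommended above.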

rfl · December 14, 2012, 7:03pm

i’m still working on how to alter tomcat’s charset
a bit off topic: i notice that in my app, if i have a " character or a - character, there it goes again - i get a question mark; but if i edit it, i mean, rewrite it, i get it right!