characters appearing in database values

I have a strange issue where I type text into a textarea, it gets UPDATE to a MySQL database. mb_detect_encoding() on the $_POST variable in PHP returns “ASCII”.

Then, when the text is read back from the database, added o the textarea field, I just update it to the database again, but this time all the spaces have a  characters in front of them. mb_detect_encoding() on the $_POST variable gives “UTF-8”. Example data in the database:

Due to inclement weather, we will be closed on 1/4.

The HTML headers have the utf8 meta tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

OS platform (windows vs *nix), PHP, MySQL versions don’t seem to matter–the issue seems to happen everywhere.

The database is using the default collation of latin1_swedish_ci. Changing the collation to utf8_general_ci doesn’t make a difference.

I’ve been banging my head against the wall for hours on this issue–I’ve never encountered it before. The reason this matters is that I need to count the number of characters and limit the string to x number of characters. When I try to save the string to the database for a second time, there are extra characters counted in the string, but when I do a var_dump() of the string, there is nothing extra there. I don’t get it. :confused:

Any ideas?

It’s definately a character encoding issue. I’ve seen this before when some files are saved as UTF-8
(the BOM “space” gets saved as an Â)

Are you using the optional arguments?

mb_detect_encoding($str, 'UTF-8', true);

No.

$result = mb_detect_encoding($str)

The result ends up being either “ASCII” or “UTF-8”. I’m not doing a boolean check. I was trying to figure out what encoding was actually being used.

So, since you’re saying the BOM space may be the issue, do you know how I might address it?

I haven’t had problems since I changed everything to UTF-8 i.e.
My text editior
Web pages
Database

ASCII is OK if you know you’ll only ever need that set of characters I suppose but UTF-8 has those characters covered and others as well.

I think if you add in the arguments you can “force” the strings to seen as UTF-8 instead of ASCII

You might end up foo-bar-ing your database if you change the charset and collation but I think it would be worth doing for the long term benefits.
Save a back-up most definately and if possible experiment on localhost before messing with a live online database.

Yes, but how? I’ve found all sorts of articles giving bits and pieces and differing information, so I have no idea if one change I make actually works or not.

[edit]:
I have Eclipse set to UTF-8 by default.

I have the utf8 HTML meta tag on all pages.

I have UTF-8 set for PDO:

$dbconn = new PDO('mysql:host='.$host.';dbname='.$database_name, $user, $password, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8"));

I’ve run these SQL statements for my database and tables:

SET NAMES utf8;
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

However, that annoying  character still keeps showing up in VARCHAR and TEXT fields in the database–but only when I update a field for a second time based on what was loaded from the database. This issue only seems to surface for HTML textarea fields, and not text input fields.

What else is left?

It still does it for new INSERTS or only with old content?

Hopefully only old then, removing the “” could be done.

Otherwise I’m suspecting it’s an Eclipse setting that’s in play.

The condition under which this happens:

  1. update/insert text from a previously blank textarea or when all the textarea text is deleted and re-entered. (PHP sees the supplied text prior to UPDATE/INSERT as ASCII).
  2. Get the text from the database and display it in the textarea (PHP sees the retrieved text as UTF8).
  3. Leave the text in the textarea unchanged or alter it in some way, update it in the database, and then the extra characters appear. (PHP sees the supplied text prior to UPDATE as UTF8).

What version of PHP is in use?

I’ve tried it on 5.3.27 and 5.4.7.

In the site’s header, what is the value for the meta:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

I read somewhere that it was essential to have <meta http-equiv=“Content-Type” content=“text/html; charset=utf-8”[COLOR=#000080]> [/COLOR]immediately after declaring <head> to ensure that the contents uses the declared charset.

I was wondering if immediately after your UPDATE statement, a PHP script is called to extract the text from the database before the html header statement is called.

Have you tried setting the PHP header(…).



[COLOR=#000000][FONT=Consolas]<?php header('content-type: text/html; charset: utf-8'); ?>


[/FONT][/COLOR]

Can you please post the output of a SHOW CREATE TABLE for the table(s) concerned?

What versions of MySQL are both servers using?

No. I would’ve thought the meta tag would be enough. I’ve never needed to set a PHP header for content-type before.

I’ve tried with MySQL 5.5.27 and 5.5.32.

Here’s the creation script, but with just the table in question:


SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='TRADITIONAL,ALLOW_INVALID_DATES';

-- -----------------------------------------------------
-- Table `MessageTemplates`
-- -----------------------------------------------------
CREATE TABLE IF NOT EXISTS `MessageTemplates` (
  `MID` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(255) NOT NULL,
  `subject` VARCHAR(255) NULL,
  `msg` VARCHAR(255) NULL COMMENT 'my comment',
  `msgLong` TEXT NULL COMMENT 'my comment',
  `lastupdated` DATETIME NULL,
  PRIMARY KEY (`MID`),
  UNIQUE INDEX `name_UNIQUE` (`name` ASC))
ENGINE = InnoDB;

SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;

Here is an export of the table with data. One row is for an INSERT. The other is what happens after an UPDATE (when using the text supplied from the database. If I delete and type something, the UPDATE doesn’t introduce the extra character).

Note that I have a PHP function that trims the string to 160 characters. Since each  character adds to the string length, there is a difference between the two strings.

SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";
SET time_zone = "+00:00";


/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;


--
-- Table structure for table `messagetemplates`
--

CREATE TABLE IF NOT EXISTS `messagetemplates` (
  `MID` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `subject` varchar(255) DEFAULT NULL,
  `msg` varchar(255) DEFAULT NULL COMMENT 'my comment',
  `msgLong` longtext COMMENT 'my comment',
  `lastupdated` datetime DEFAULT NULL,
  PRIMARY KEY (`MID`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=14 ;

--
-- Dumping data for table `messagetemplates`
--

INSERT INTO `messagetemplates` (`MID`, `name`, `subject`, `msg`, `msgLong`, `lastupdated`) VALUES
(12,  'INSERT Test', '', 'Due to inclement weather, the primary event  location will be closed on 1/4/14. We apologize for the inconvenience.  Please visit example.org for updates.', '', '2014-03-24 17:43:40'),
(13,    'UPDATE Test', '',  'Due to inclement weather, the primary event location will be closed on 1/4/14. We apologize for the inconvenience. Please visit example.orgÂ',    '', '2014-03-24 17:43:59');

/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;

To maybe see why the spaces were being replaced with wonky characters, I put the string into an array of characters:

array(171) {
  [0]=>
  string(1) "D"
  [1]=>
  string(1) "u"
  [2]=>
  string(1) "e"
  [3]=>
  string(1) "&#65533;"
  [4]=>
  string(1) "&#65533;"
  [5]=>
  string(1) "t"
  [6]=>
  string(1) "o"
}

And used ord() to get the ASCII code for each character

array(171) {
  [0]=>
  int(68)
  [1]=>
  int(117)
  [2]=>
  int(101)
  [3]=>
  int(194)
  [4]=>
  int(160)
  [5]=>
  int(116)
  [6]=>
  int(111)
}

I’m not sure if that provides any useful clues or not.

Agh! Ok, I found the problem.

The library I was using to generate HTML input/textarea fields was automatically replacing spaces with the nbsp special character. Somehow going back and forth between the database, the spaces were getting messed up a bit.

I noticed this when I was looking at the HTML source and noticed a difference between the textarea values depending on when on was UPDATING the text.

Once I removed that feature, everything started working as I would normally expect.

Good grief. :rolleyes:

Thank you to everyone who took the time to look at this and help with troubleshooting :slight_smile:

What processing or functions is the data being run through between its entry into the $_POST array and its insertion to the database?

That’s basically what the problem was, although I didn’t think of it until later since the code processing the $_POST array was the only part I was looking at, and not the code that created the input fields later.