I am trying to replace ampersands (&) with &amp; in my string. It seems easy, but it must be smart enough not to touch existing entities such as &nbsp; (which must not become &amp;nbsp;) and other things like &Iuml;.
Currently I have:
$message = ereg_replace("&", "&amp;", $message);
I want text to be like:
tom & jerry = tom &amp; jerry
& = &amp; (an & on its own with nothing before or after it)
&nbsp; = &nbsp; (stays unchanged, as do other HTML entities that start with an &)
If you are lucky enough to be running PHP 5.2.3+, you can use htmlspecialchars() or htmlentities() with double_encode set to false. You could also run html_entity_decode() on the string first and then apply htmlspecialchars() for a somewhat similar effect. Or you could just write a suitable regex.
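A minimal sketch of the double_encode approach, assuming PHP 5.2.3 or later and UTF-8 input. The function name encode_once() is made up for illustration. Note that htmlspecialchars() also converts < and > to &lt; and &gt;, which may or may not be what you want.

```php
<?php
// Hypothetical helper: encode bare ampersands but leave existing entities alone.
// Assumes PHP >= 5.2.3 (for the double_encode argument) and UTF-8 input.
function encode_once($message)
{
    // ENT_NOQUOTES: leave single and double quotes untouched.
    // Fourth argument (double_encode) = false: do not re-encode existing entities.
    return htmlspecialchars($message, ENT_NOQUOTES, 'UTF-8', false);
}

echo encode_once('tom & jerry &nbsp; &Iuml;'), "\n";
```

With double_encode off, a lone & becomes &amp; while &nbsp; and &Iuml; pass through unchanged.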
“When double_encode is turned off PHP will not encode existing html entities. The default is to convert everything.”
Cool! But my PHP version on my web host is 5.2.0. Only version 5.2.3 and up support double_encode. Might be worth the upgrade just for this function to work properly.
The root of your problem is that you encoded the data too early. You should never need the functionality you're describing. Where do you get your data from?
I see. You could try to prevent posters from posting invalid markup, then: validate it and give an error message. I'm not sure if that's feasible; it probably depends on your audience.
Otherwise you can use HTML Tidy, which is a tool for cleaning up malformed HTML.
You're accepting HTML (or a subset thereof) as input, so you should validate that this input is valid HTML. That includes ampersands being encoded as entities. As it stands, if the input text is &amp;, you really have no way of knowing whether the user wanted to write an & or the literal text &amp;. It's not a major thing, but it's bad practice to mix different levels of abstraction like that.
In the forum posts there isn't any HTML code, just BBCode like [ U R L ] http://www.whatever.com [/ U R L]. I convert some of the BBCode into HTML code. Actually, it is not the forum posts that are giving me problems but the ampersand character.
Do you guys know a regular expression that could solve the problem I pointed out at the top of this topic?
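One possible regex, as a sketch rather than a definitive answer: match an & only when it is not followed by something that looks like an entity (a name such as nbsp, a decimal reference such as #207, or a hex reference such as #xCF, each ending in a semicolon). The function name escape_bare_ampersands() is made up for illustration. It only uses preg_replace(), so it should also work on PHP 5.2.0.

```php
<?php
// Hypothetical helper: turn bare ampersands into &amp; and leave anything
// that already looks like an HTML entity untouched.
function escape_bare_ampersands($message)
{
    // (?! ... ) is a negative lookahead: the & is replaced only if it is NOT
    // followed by a named, decimal, or hexadecimal entity ending in ";".
    $pattern = '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/';
    return preg_replace($pattern, '&amp;', $message);
}

echo escape_bare_ampersands('tom & jerry &nbsp; &#207; &Iuml;'), "\n";
```

The check is purely syntactic: anything shaped like &word; is treated as an entity, even if no such entity exists. Unlike htmlspecialchars(), this leaves < and > alone.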
# Decode any existing entities back to plain characters first
$message = trim(html_entity_decode($message));
# Re-encode special characters as HTML entities (quotes are left alone)
$message = htmlentities($message, ENT_NOQUOTES);
# Turn &lt; and &gt; back into the actual < and > characters
$message = str_replace(array("&lt;", "&gt;"), array("<", ">"), $message);
I think the above code works way better than anything I have tried so far.