The correct way of using htmlspecialchars()

Good day,

I have an HTML code like

<div><span>Some content inside</span></div>

in my DB and I want to echo it.
If I use, for instance,

<?php $var = '<div><span>Some content inside</span></div>'; // content retrieved from the DB
echo htmlspecialchars($var); ?>

it will display the literal markup ‘<div><span>Some content inside</span></div>’ as text inside the wrapper div instead of rendering it.

How should I use htmlspecialchars() when echoing similar HTML code?

Hi there,
The htmlspecialchars() function converts special characters to HTML entities. So if you want the content to be rendered as HTML, you can echo it directly (e.g. “echo $var”).
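A minimal sketch of the difference, assuming $var holds the string from the question (hard-coded here as a stand-in for the DB value):

```php
<?php
// Stand-in for the value retrieved from the DB (assumption for illustration).
$var = '<div><span>Some content inside</span></div>';

// Escaped: < and > become &lt; and &gt;, so the browser shows the markup as literal text.
$escaped = htmlspecialchars($var);
echo $escaped, "\n";

// Raw: the browser parses the markup and renders only "Some content inside".
echo $var, "\n";
```

The first echo is what you get today; the second is what you want if the stored HTML is trusted.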

So, if I have something like ‘<div><span>Some content inside</span></div>’ inside my DB and I want to echo it inside some div, I can simply echo (echo ‘<div><span>Some content inside</span></div>’)?

But what if I have a textarea where the user is allowed to enter .;:'"{}[]$%#@!^&*()-_+=|<>?~, should I sanitize that code before outputting it?

htmlspecialchars is used when you want your text to be displayed as it is in the browser. So if your text string in the db is <div><span>Some content inside</span></div>, then usage of htmlspecialchars depends on what you want to achieve:

  1. If you want the text to be rendered by the browser in html you output it directly (echo $var). Then you will see your exact string when you view the page source in the browser but people browsing your page normally will see the text as rendered by the browser. You are basically keeping html source in your db so there’s no need to convert it any more.

Actually, if you want to go deeper, you should note that any text that is not a tag or attribute name should go through htmlspecialchars first, so in your example, Some content inside should have already been passed through htmlspecialchars before putting it in the db. How you do it depends on your setup - if a user is entering plain text in a field and your application is putting this text into generated html code then you should pass the text through htmlspecialchars. It’s just that in this example there are no special characters to escape so there’s no difference. But if someone entered Some content inside about B&B then your final string in the db should be <div><span>Some content inside about B&amp;B</span></div>. If all this html is entered by a user then it is the responsibility of this person to write html properly and escape special characters by hand.

  2. If you want the text to be displayed in the browser including the html code (without parsing) then you should use htmlspecialchars on the whole string: echo htmlspecialchars($var). The same applies when you want to provide the html code in a textarea for editing:

<textarea name="var"><?php echo htmlspecialchars($var); ?></textarea>
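To make both cases concrete, here is a rough sketch. The variable names ($userText, $stored, $textarea) are only for illustration, and passing ENT_QUOTES and 'UTF-8' explicitly is my assumption about a sensible setup, not something the question requires:

```php
<?php
// Case 1: plain text typed by a user, wrapped in markup generated by the application.
$userText = 'Some content inside about B&B';
$stored = '<div><span>' . htmlspecialchars($userText, ENT_QUOTES, 'UTF-8') . '</span></div>';
echo $stored, "\n"; // this HTML source is what goes into the db and is later echoed as-is

// Case 2: show that stored HTML source for editing, so escape the whole string.
$textarea = '<textarea name="var">' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</textarea>';
echo $textarea, "\n";
```

Note that in case 2 the existing &amp; is escaped again to &amp;amp;. That is intended: the browser decodes it once, so the textarea shows the original source with &amp; intact.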

Thank you @Lemon Juice for the reply.

One more question.
It’s not related to this topic, but how can I display hyperlinks inside a textarea (if possible)?

I’ve already figured it out.
Textarea is designed to display text only.

Also keep in mind you should be using strip_tags with your output of this data (and possibly when storing it). You need to limit what HTML elements the user can use. Script tags, links, images, etc. are all openings to XSS attacks on your site. You should prevent those tags at all costs.

There is a special mode for textareas that enables rich text editing, then you can have links, images, colours and all kinds of formatting. It can get pretty complicated to provide a workable and cross-browser implementation so very often it’s easiest to use a ready-made solution like TinyMCE.

I wouldn’t really advise anyone they should do it - it really depends on the requirements. If text is escaped properly when output then XSS attacks won’t happen. What I see is that tag stripping is overused (because people apply it by default) and the result is that many comment systems will corrupt user comments because they will strip what was an integral and valid part of the message. You never know what writing style someone will use and they may very well use some <sigh>html-like techniques</sigh> to illustrate a point or they may even input real html - why would you want to strip that? Sure, it’s a good idea to filter out invalid content but in this case security is achieved with htmlspecialchars() when outputting.

Sorry, I should have prefaced, that if the poster decided they wanted to output the div so it gets parsed and rendered, then strip_tags would be a good candidate to help alleviate any XSS attacks.

Plus strip tags won’t alter the text in between the tags, so the text remains, it simply removes the tags so the XSS opportunity is gone.

<?php
$output = '<a href="myurl.com">This is my site</a> **<sigh>wish it were better</sigh>**';
echo strip_tags($output);

Outputs:

This is my site **wish it were better**

So yes, it wiped out <sigh> too, but honestly, I don’t care! My goal is to protect my users and my site, if I tell you the following HTML elements are allowed, and you use non-existing ones… Too bad. (just my 2 cents)

Could you please give me some example of doing it?
I have tried making div with ‘contentEditable’ set to true, but it has some disadvantages.

Thanks for the reply.
I also think that using strip_tags() adds more security to your applications, especially since this function accepts a list of allowed tags as its second argument, and those tags are left in place instead of being stripped.
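A small sketch of that second argument, with a made-up input string and an arbitrary whitelist of <p> and <b> (both are assumptions for illustration):

```php
<?php
// Hypothetical user input mixing allowed and disallowed tags.
$input = '<p>Hello <b>world</b> <script>alert(1)</script> <a href="http://example.com">link</a></p>';

// Keep only <p> and <b>; every other tag is removed, but its text content stays.
$clean = strip_tags($input, '<p><b>');
echo $clean, "\n";

// Caveat: attributes on allowed tags are not touched.
$risky = strip_tags('<b onmouseover="evil()">x</b>', '<b>');
echo $risky, "\n";
```

The second call shows the limit of this approach: the onmouseover attribute survives on the allowed <b> tag, so a whitelist alone probably isn’t enough if the result is echoed unescaped.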

If the poster is keeping all content as html in a div and a span in the db then I think it’s obvious they want to output it so it gets parsed and rendered. That’s where htmlspecialchars comes in to alleviate any XSS attacks without any tag stripping.

Well but here you are talking about a very specific requirement where you want to allow a selected range of html tags entered by the user and you are informing them about it. I was talking about a general use case where the user is presented with a simple text input for a comment (or any other piece of information) - when I see something like this I expect the content I wrote to appear on the site like I intended. The <sigh> tag may not be crucial to the content but I may want to use another tag, which is the gist of what I have to say and silently stripping it ruins my message - the automated script will not know.

Many times I’ve seen pages where users posted a comment several times, trying to write what they wanted and failing each time because the server stripped some of their content behind the scenes, until they finally had to resort to hacks like inserting spaces or other characters to work around the “safety mechanisms”. I find this behaviour ugly, which is why I don’t like advising anyone to use strip_tags by default just because it will make their site secure - no, escape the output properly with htmlspecialchars to protect the site, and if you have any special requirement like allowing users some html (and informing them about it) then apply tag stripping.
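To illustrate the difference on a plain comment field, here is a quick sketch with an invented comment (the $comment text and the explicit ENT_QUOTES / 'UTF-8' arguments are my assumptions):

```php
<?php
// Hypothetical comment containing html-like text and a script tag.
$comment = 'I tried <sigh>everything</sigh> <script>alert("x")</script>';

// Escaping on output: nothing is lost, and nothing is executed by the browser.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";

// Stripping: the tags silently disappear from the user's message.
$stripped = strip_tags($comment);
echo $stripped, "\n";
```

With escaping, the visitor sees the comment exactly as it was typed; with stripping, part of what the user wrote is gone.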

I don’t really have much knowledge about contentEditable, I’ve always used ready-made rich text editors like TinyMCE - have a look at the examples they have on their site and you will know how to use it. Or search for other simpler rich text editors, there are many of them.

Thank you.
The one you are using, does it have any disadvantages/bugs?

Most often I use TinyMCE, occasionally I also used FCKEditor - now under the name of CKEditor. I preferred TinyMCE because FCKEditor was huge (lots of code) but I don’t know how they compare nowadays. FCKEditor’s advantage was that it had a free file and image manager plugin while TinyMCE’s was paid (but you could use a 3rd party file manager or write your own).

What I find is that such editors are not always a good idea because they give far too much power to an average user unless you employ some filtering mechanisms so they cannot push any dangerous or disallowed content in html or javascript. So it depends.

Another disadvantage is that the handling of html formatting is still not perfect and sometimes something doesn’t work and the user is stuck unless they know html and can fix the content source by hand - I mean sometimes many nested html tags accumulate over time after the text has been edited many times and it takes someone who knows the stuff to clean it.

Also, people who don’t know html often don’t know the concept of using <h1>, <h2>, … tags, paragraphs, lists, style classes, etc. so they may find it hard to edit content properly without creating a mess after some time (once I had to spend time educating my client about how to use the editor). But overall it can pay off because structuring complex pages so that the user edits everything in separate fields can be much more time consuming than just throwing a rich text editor at them and telling them how to use it.

Unfortunately, the users that I’m going to work with (the majority) aren’t familiar with html format, so I have to find something more ‘stable’.
However, thanks for such an elaborate answer.

And sorry for any mistakes in the text, as English isn’t my first language.