When to use HTMLENTITIES

When are you supposed to use code like this…


htmlentities($name, ENT_QUOTES)

Looking back at my recent code, I think I forgot to put it in some places, and now I’m confused when and where to use it?! :-/

Debbie

You’d use it just before you echo content for an HTML page.

So I just do it before displaying it, but not necessarily while I am working with variables from my database or in general?


Also, what happens if a variable/field is Blank/Empty/Null with respect to HTMLENTITIES?

Do I need to always use something like this…


	$answerEnt = (isset($name) ? htmlentities($name, ENT_QUOTES) : '');

Thanks,

Debbie

Correct. The main reason is that there may be several ways that you present your data. HTML is certainly the most common, but you might also present your data as JSON, or you might use it to compose an email, or for a private administrative or reporting task. Only the presentation layer of your application should know the specifics of how the data is being rendered, so only the presentation layer should handle escaping.
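As a rough illustration of that point, the sketch below keeps one raw value and escapes it differently for each output format. The sample name and the variable names are made up for the example, and the 'UTF-8' charset argument is an assumption about the page encoding.

```php
<?php
// Hypothetical raw value, stored and passed around unescaped.
$name = 'Tom "T-Bone" <O\'Neil>';

// HTML output: convert special characters to entities at the point of output.
$html = '<p>' . htmlentities($name, ENT_QUOTES, 'UTF-8') . '</p>';

// JSON output: json_encode() applies JSON's own escaping rules instead.
$json = json_encode(['name' => $name]);

// Plain-text email body: no HTML escaping at all.
$mail = "Hello, $name";
```

If the value had been run through htmlentities before being stored, the JSON and email versions would end up containing entities such as &quot; that mean nothing in those formats.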

I didn’t have the answer to that off the top of my head, so I double checked the docs, and it made me think that you would get back an empty string. I also ran a quick test script and confirmed that behavior.

$x = null;
$y = htmlentities($x);
var_dump($y); // string(0) ""
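One caveat: on PHP 8.1 and later, passing null to a built-in function's string parameter raises a deprecation notice, even though the result is still an empty string. A minimal sketch of one way around that, assuming PHP 7 or later for the ?? operator, is to fall back to an empty string before escaping. The variable names here are just for illustration.

```php
<?php
$name = null;

// The null coalescing operator substitutes '' when $name is null or not set,
// so htmlentities() always receives a string.
$answerEnt = htmlentities($name ?? '', ENT_QUOTES, 'UTF-8');

// It also works for a variable that was never defined, without a warning.
$missingEnt = htmlentities($neverDefined ?? '', ENT_QUOTES, 'UTF-8');
```

This does the same job as the isset() ternary, just a little shorter.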

Jeff,

Sorry for the late reply.

So let me ask this…

I’m trying to be a “good girl” and use htmlentities($variable, ENT_QUOTES) on all of my outputted variables, but I honestly find that a DRAG from both a repetitiveness standpoint and from a Code Prettiness standpoint.

Isn’t there a way to do something like this before outputting things…


function createSafeOutput($x){
	$safe = htmlentities($x, ENT_QUOTES);

	return $safe;
}

$firstname = createSafeOutput($firstName);
$address = createSafeOutput($address);

and so on…

Debbie

In fact that’s a very good and smart change, and the way you wrote it is just fine too. The only change I might make would be to rename the function escapeHTML, because I think that would be a bit more clear and specific about what it does.
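Here is a minimal sketch of what the renamed helper might look like. The explicit 'UTF-8' argument and the (string) cast are assumptions added for the example, not something from the code above, and the sample values are made up.

```php
<?php
// Hypothetical renamed helper: escape a value for output in an HTML page.
function escapeHTML($value): string
{
    // The cast turns null into '' and numbers into strings before escaping.
    return htmlentities((string) $value, ENT_QUOTES, 'UTF-8');
}

// Sample values; the raw variables are left untouched.
$firstName = "D'Arcy";
$address   = '12 <Main> St & 3rd';

// Escape at the moment of output.
echo '<p>' . escapeHTML($firstName) . '</p>', "\n";
echo '<p>' . escapeHTML($address) . '</p>', "\n";
```

Calling the helper inside the echo, rather than overwriting $firstName and $address, keeps the raw values available for any non-HTML output later in the script.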

Jeff, I have no problems renaming things. (I just threw that name together on a whim last night.)

So, I have one vote “Yes” for my proposal above.

Are there some more PHP gurus out there who would like to chime in and let me know if they think my proposal is a good or bad idea?

Thanks,

Debbie

Debbie,

It is definitely a good thing. I usually abstract my data escaping like you outlined as well, simply because if I need to alter it in the future, I’d rather update one location instead of thousands.

So long as the data entry forms do not allow HTML, you’re good with that approach. Things become about 1000 times more complex once you allow HTML or an abstraction of it like BBCode. The most secure method, though, is to strip HTML from user input, or even to make “no HTML” a validation requirement before accepting the data. If you do those things, which are probably more reliable anyway, entity conversion matters little except for producing valid HTML. Not that valid HTML isn’t important, but just about all modern browsers are forgiving with the common character entity conversion cases.

+1 for validating data on input. Ideally, you would validate/sanitize any data from an external source, whether it be from a user, an external API, an XML file from another site, or whatever. If you validate/sanitize all external data before you work with it, then you don’t need to be “as” worried about how you display it.

Don’t get me wrong, you still should be mindful of how you output data but it’s always a good idea to sanitize input before working with it.

And how would you go about “sanitizing” Form data?

In my “create_account.php”, I do this…


		// ************************
		// Validate Form Data.		*
		// ************************

		// Check First Name.
		if (empty($trimmed['firstName'])){
			// No First Name.
			$errors['firstName'] = 'Enter your First Name.';
		}else{
			// First Name Exists.
			if (preg_match('#^[A-Z \'.-]{2,30}$#i', $trimmed['firstName'])){
				// Valid First Name.
				$firstName = $trimmed['firstName'];
			}else{
				// Invalid First Name.
				$errors['firstName'] = 'First Name must be 2-30 characters (A-Z \' . -)';
			}
		}//End of CHECK FIRST NAME


		// Check Username.
		if (empty($trimmed['username'])){
			// No Username.
			$errors['username'] = 'Enter your Username.';
		}else{
			// Username Exists.


			// ************************
			// Check Username Format.	*
			// ************************
			if (preg_match('~(?x)				# Comments Mode
						^				# Beginning of String Anchor
						(?=.{8,30}$)		# Ensure Length is 8-30 Characters
						[a-z0-9_.-]*		# Match only certain Characters
						$				# End of String Anchor
						~i', $trimmed['username'])){

				// Valid Username.

				// Check Username Availability.

			}else{
				// Invalid Username.

				$errors['username'] = 'Username must be 8-30 characters (A-Z 0-9 _ - .)';
			}//End of CHECK USERNAME FORMAT
		}//End of CHECK USERNAME

Debbie

Those two methods of filtering name/username work. If they have passed that test, it’s probably pretty safe to display firstname and username without additional filtering.

When NO HTML is allowed, everything can be as simple as comparing the user-supplied value to the same value passed through strip_tags. When the two are not equal, the input contains HTML, in which case cancel form processing and show a message to the user. That is probably the simplest method, though such a simple approach does have its pitfalls, like the false positives discussed in the PHP docs. In probably 90% of cases, though, the simple approach with tag stripping will be adequate when no HTML or abstraction of it is allowed. You can also wrap the strip_tags call in another function so it can be extended as you run into edge cases with false positives or exceptions.
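A minimal sketch of that wrapper idea follows. The function name containsHtml and the 'comment' field are hypothetical, and this is only one way the comparison could be written.

```php
<?php
// Hypothetical wrapper: reports whether strip_tags() would change the input.
// Edge-case handling can be added here later without touching the callers.
function containsHtml(string $input): bool
{
    return strip_tags($input) !== $input;
}

$errors = [];
$comment = '<b>Hello</b> there'; // sample form value

if (containsHtml($comment)) {
    // Reject the submission rather than silently altering the data.
    $errors['comment'] = 'HTML is not allowed in this field.';
}
```

Because strip_tags does not validate the markup, a bare < in ordinary text may also be reported as HTML, which is the kind of false positive the wrapper gives you one place to deal with.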