Best way to sanitize Email input for SQL

I keep saying escape rather than sanitize. Escaping prevents exactly this.


Obviously what you call escaping is not what I mean by it, because what you are saying makes no sense given my understanding of what escaping is…

When I refer to escaping I am referring to an output function where VALID DATA that can be confused with the code it is about to be jumbled with needs to be changed to something else in order to not be run. For example if you have a form field where you ask to have a comparison entered and they enter “3 < 5” then you need to escape that valid data to output it to a web page. You don’t need to escape it when writing it to a database. Of course if they enter that in what is supposed to be a name field then the input validation should reject it.

Processing steps:

  1. validate user entered data and sanitize other data that can be tampered with
  2. perform whatever processing is required on the data
  3. output the data (escaping if necessary when there is no way to keep the data separate and the data can validly contain characters that can be confused with the code where you are outputting to).

It is the processing in step 2 that validating/sanitizing is mainly intended to provide security for - otherwise by feeding the right data into the processing someone could potentially crash the server or worse.
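A minimal sketch of those three steps in Python, using the “3 < 5” comparison field from above. The function names and the accepted pattern are assumptions made for illustration, not anything prescribed in this thread.

```python
import html
import re

# Hypothetical rule for a "comparison" field: number, operator, number.
COMPARISON = re.compile(r"(\d+) ?(<|>|=) ?(\d+)")

def validate_comparison(raw: str) -> str:
    """Step 1: reject anything that is not a comparison such as '3 < 5'."""
    if not COMPARISON.fullmatch(raw):
        raise ValueError("not a comparison")
    return raw

def evaluate(comparison: str) -> bool:
    """Step 2: processing that can rely on the input already being valid."""
    left, op, right = COMPARISON.fullmatch(comparison).groups()
    a, b = int(left), int(right)
    return {"<": a < b, ">": a > b, "=": a == b}[op]

def render(comparison: str, result: bool) -> str:
    """Step 3: escape on output, because '<' is valid data and also HTML syntax."""
    return f"<p>{html.escape(comparison)} is {result}</p>"

page = render(validate_comparison("3 < 5"), evaluate("3 < 5"))
```

Here the validation protects `evaluate`, while the escaping in `render` is needed even though the value is perfectly valid.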

Of course the main use for validation is to prevent typos and garbage input caused by the user misunderstanding what they are being asked to enter. As far as the security aspect is concerned you don’t get any different benefit than you would get from sanitizing.

Gotcha. That’s where we had a clarity problem. When we talk about security, we almost always mean your step 3. That’s where injection would happen. That’s where XSS would happen. But you apparently didn’t mean those, or anything else in step 3. When talking about step 2, I think we more frequently say that validating keeps that data meaningful to your application.

That’s the biggest problem most people have with security - they think injection - but something that can attack the processing can be just as much a security risk but one that many overlook.

At the very least invalid input can crash step 2. The right invalid input might crash the server or put the processing into an infinite loop.

Also, why waste resources trying to process invalid data and then jumping in at the end to stop the invalid data resulting in injection? Better is to ensure that the data being processed is either valid (validation) or can at least not cause any harm (sanitization of what is expected to be valid data).

With valid input you only need escaping where there is no way to keep the code and data separate and where valid data can be mistaken for code…

Only those who have never actually learnt about computer security rely on escaping. Effectively they are only patching security holes they have already created instead of writing secure code in the first place.

I agree. Which is why I think it was just a clarity problem. When someone says sanitize for security, that person almost always means sanitize as a means to prevent injection. But now you’ve clarified what you meant, so I think we’re all good.

Careful. You had a good point, but now you’re drifting into a ridiculous exaggeration. There are many places where it is absolutely correct to escape. Values that go into an email header need to be escaped for MIME characters; values that go into a system command such as sendmail need to be escaped for shell characters; values that go into a web page need to be escaped for HTML characters. And so on. There are many, many places where escaping is absolutely the right thing to do. Even people who learned about computer security know that. :wink:
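To make the “escape for the destination” idea concrete, here is a small Python sketch covering two of those contexts with standard library functions. The sample value is made up, and email headers are left out because they need their own encoding rules.

```python
import html
import shlex

# A made-up value containing characters that are special in HTML and in a shell.
value = "Tom & Jerry; rm -rf ~ <b>"

# Going into a web page: escape HTML metacharacters.
html_fragment = "<td>" + html.escape(value) + "</td>"

# Going into a shell command line: quote so the shell sees a single argument.
shell_command = "echo " + shlex.quote(value)
```

The same value gets a different treatment for each destination, which is why the escaping belongs at the point of output rather than at input.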

Not even remotely true.

I agree and have never claimed otherwise. You have provided several good examples there of situations where escaping is appropriate. It has nothing to do with security though as those situations still require escaping even for valid values.

Escaping only becomes a security patch when invalid values can get that far through the processing rather than being sanitized on input. When you confirm that your processing is only handling valid values by validation or sanitizing all inputs then the necessary escaping will only ever be processing valid values.

For example on an investment web site there may be a need to display a shareholder’s name… That shareholder might be “T&M O’Brian-Smith”, in which case you definitely need to escape the name before you can display it in a web page, as otherwise the & in the name will be misinterpreted as the start of an entity code. If all you are doing is displaying the name then you could omit the sanitize step in this case, as there is no other processing that could have security issues as a result. I’d still be inclined to sanitize (for example using a regular expression such as ‘/[^A-Za-z& '.-]+/’ at the top of the code) as then you know that all data is valid before any processing of any data is done and that therefore security holes in your code cannot be exploited using invalid data.
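A rough Python sketch of that sanitize-then-escape sequence. It assumes the intent of the character class is “letters, ampersand, space, apostrophe, full stop and hyphen”, so the hyphen is placed last to keep it literal rather than forming a range. It also uses a straight apostrophe in the name.

```python
import html
import re

# Assumed whitelist: ASCII letters, '&', space, apostrophe, '.', and '-'.
# The hyphen is last in the class so it is not read as a range operator.
DISALLOWED = re.compile(r"[^A-Za-z& '.-]+")

name = "T&M O'Brian-Smith"

sanitized = DISALLOWED.sub("", name)   # a valid name passes through unchanged
escaped = html.escape(sanitized)       # still has to be escaped for HTML output

stripped = DISALLOWED.sub("", "Bob<script>1")
```

The valid name survives sanitizing untouched and still needs escaping, which is the point being made: the two steps do different jobs.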

Even where escaping is not required because there are no possible valid values that can be misinterpreted as code it is still reasonable to include the escaping step even though it should always do absolutely nothing. This is called security defense in depth and protects against someone accidentally changing the code to bypass the validation/sanitizing of a field. I suppose at that point it has become a security measure as it then protects against injection if someone manages to bypass all of the prior levels of security. It is however only a fallback for when the prior security measures fail.

Erm… maybe we’re having another communication problem, because when you say, “Only those who have never actually learnt about computer security rely on escaping,” that sounds very much like you’re claiming otherwise.

And poor María-Jose Carreño Quiñones is told her name isn’t valid. :wink:

Don’t get too carried away with validation and sanitization. You’re going to repeat the mistakes of the past where we over-validated email addresses and rejected many that were perfectly fine.
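The over-validation problem is easy to reproduce. The sketch below, in Python, runs an ASCII-only whitelist (an assumed example pattern) over that name and compares it with a looser check that accepts any Unicode letter. The looser check is only one possible compromise and will still reject some real names.

```python
import re

# An ASCII-only whitelist of the kind discussed above (assumed pattern).
ASCII_ONLY = re.compile(r"[^A-Za-z& '.-]+")

def looks_like_name(text: str) -> bool:
    """Looser check: any Unicode letter plus a few punctuation marks."""
    return bool(text) and all(ch.isalpha() or ch in " &'.-" for ch in text)

# Written with escapes so the accented letters are unambiguous code points.
name = "Mar\u00eda-Jose Carre\u00f1o Qui\u00f1ones"

mangled = ASCII_ONLY.sub("", name)   # silently drops the accented letters
accepted = looks_like_name(name)
```

With the ASCII pattern the name comes out as "Mara-Jose Carreo Quiones", which is arguably worse than rejecting it outright.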

Yes. This is what I’ve been saying. Always escape. Valid or not. Sanitized or not. Escape. You’re guaranteed protection against injection (which has everything to do with security, FYI), regardless of what you consider a valid value to be.

Those who only implement escaping are just patching one of the many potential security holes in their code. That only protects against injection and that is only one of the many possible security issues that their code could have. Unless there is no processing at all that needs to occur before you are outputting the value then some form of sanitizing to protect the intervening code is needed.

Perhaps my name regex was not such a good example but you can certainly remove numbers and <> characters from names when you validate/sanitize them which will eliminate at least some potential security holes in the code.

The main point I am trying to make is that there are lots of security issues that escaping provides no protection for where sanitizing to remove at least some invalid characters simplifies the possible security issues. Remember that every invalid combination that you block at the start is one less that can potentially cause problems with the following code.

If a field is supposed to contain a number and you allow other things to get into the processing then as soon as the code tries to do a calculation it will crash and may expose information that makes other attacks on security easier…
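For a numeric field, the check can be as small as the following Python sketch. The function name and the range limits are hypothetical and would depend on the application.

```python
def parse_quantity(raw: str, lowest: int = 1, highest: int = 1000) -> int:
    """Validate a numeric form field before any calculation touches it."""
    text = raw.strip()
    # Accept plain ASCII digits only; an empty string fails isdigit().
    if not (text.isascii() and text.isdigit()):
        raise ValueError("quantity must be a whole number")
    value = int(text)
    if not lowest <= value <= highest:
        raise ValueError("quantity out of range")
    return value

quantity = parse_quantity(" 42 ")
```

Anything that is not a number in the expected range is stopped here with a controlled error, instead of surfacing later as an unhandled exception in the middle of a calculation.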

Bringing the subject back to email addresses (given that you have pointed out the problem with over validating/sanitizing), the following IS a valid email address that almost everyone would incorrectly reject as invalid:

"#$<> <>&"@[127.0.0.1]

Of course, since the address is on the local computer it isn’t a very useful email address, but it does comply with all the rules for being valid. Change the IP address to an internet based one and someone might actually be using that address.
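Coming back to the original question about SQL: an address like that one can be stored without any sanitizing if the query keeps code and data separate. Here is a minimal sketch using Python's sqlite3 module with a placeholder; the table and column names are made up for the example.

```python
import sqlite3

address = '"#$<> <>&"@[127.0.0.1]'
hostile = "x'); DROP TABLE subscribers; --"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (email TEXT NOT NULL)")

# The ? placeholder sends the value separately from the SQL text,
# so quotes and semicolons in the data are never parsed as SQL.
for value in (address, hostile):
    conn.execute("INSERT INTO subscribers (email) VALUES (?)", (value,))

rows = [row[0] for row in conn.execute("SELECT email FROM subscribers ORDER BY rowid")]
```

Both strings are stored exactly as entered and the table is still there afterwards. Whether the address should be accepted at all is a separate validation question.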

I’m pretty late to join this thread, but I wanted to point to an article that most developers have not read but definitely should: http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

It identifies a whole host of problems, but offers nothing in the way of solutions.

Jeff already made a valid point about names with additional characters, which is worth consideration, but just how far do you take this? Can we add this to the list:-

#41. There may actually be a real “Bobby Drop Tables” out there and we should not discriminate.
