PHP function to tell us whether a string is made up entirely of acceptable letters & numbers

Hello,

Is there a PHP function or piece of code you can recommend that takes a string
and returns True if the string is made up of valid letters & numbers, and False otherwise?

Keep in mind that the string may contain German umlaut characters such as those found in
Begrüßung; it may also contain similar French, Italian & Spanish characters.

So basically, if the string contains any non-word characters such as:
, ; / ? " ’ : @#$%^&*()-_=+!`~\|<>.,\

we want it to return False; otherwise it should return True, with blank spaces being OK.

Regards,

Hi WorldNews,

This function, which uses preg_match($pattern, $string), seems to do what you need.


$string = 'This has a SOB!';
echo isStringValid($string);
function isStringValid($string){
   $pattern = "/[\*\]|[,]|[;]|[\/]|[\?]|[\"]|[\']|[:]|[@]|[#]|[$]|[%]|[\^]|[&]|[\*]|[\(]|[\)]|[\-]|[_]|[=]|[\+]|[!]|[\`]|[~]|[\\\]|[\!]|[\|]|[<]|[>]|[\.][\,]/";
   $has_banned_char = preg_match($pattern, $string);
   if($has_banned_char == 0){
       return 0;
   } else {
       return 1;
   }
}
/* Outputs 1 for having a banned char */
$string = 'This has a SOB';
echo isStringValid($string);
/* Outputs 0 for not having a banned char */

The only char I am not sure about is '\', as it blew up my script when I tried this out. It is currently using [\\\], but I am not sure this is right.

Hope this helps.

Steve



You can approach this in two ways: either (1) check that all characters in the input are allowed and raise an error if they’re not, or (2) check whether the input contains characters that are not allowed and raise an error if it does.

I usually opt for option 1 because it’s easier to control, and in general there are far fewer characters you want than characters you don’t want.

So it’s usually easiest to grab yourself an ASCII table, find the characters you want to allow, and filter using those ranges.


function checkInput($input) {
    // Ranges are byte values, so this assumes a single-byte encoding such as ISO-8859-1.
    $allowed = array(
        array(32, 32), // space
        array(48, 57), // 0-9
        array(65, 90), // A-Z
        array(97, 122), // a-z
        array(192, 214), // À-Ö
        array(216, 246), // Ø-ö
        array(248, 255), // ø-ÿ
    );

    foreach(str_split($input) as $char) {
        $ord = ord($char);
        foreach($allowed as $range) {
            list($begin, $end) = $range;
            if ($ord >= $begin && $ord <= $end) {
                continue 2;
            }
        }
        return false;
    }
    
    return true;
}

@ServerStorm;

You don’t need to put those characters in character classes, and the | are superfluous for this purpose too.

This would also work:


"/\*,;\/\?\"\':@#\$%\^&\(\)\-_=+!\`~\\\\!\|<>\./"

:)

Oh, and for a backslash you need four backslashes in the expression, \\\\ (I forgot why)
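
For what it's worth, the four backslashes most likely come from two layers of escaping: PHP's string parser turns each \\ in the literal into a single backslash, and the regex engine then reads the remaining \\ as one literal backslash. A minimal sketch to check this (the variable names are only for illustration):

```php
<?php
// The PHP literal below is typed with four backslashes between the delimiters.
$pattern = "/\\\\/";

// After PHP string parsing only two backslashes remain, so the string has
// 4 characters: slash, backslash, backslash, slash.
$patternLength = strlen($pattern);

// The regex engine reads the two remaining backslashes as one literal backslash.
// 'a\\b' in single quotes is the 3-character string: a, backslash, b.
$matchesBackslash = preg_match($pattern, 'a\\b');
$matchesPlain = preg_match($pattern, 'ab');

var_dump($patternLength);    // int(4)
var_dump($matchesBackslash); // int(1)
var_dump($matchesPlain);     // int(0)
```

Inside a character class the same rule applies, which is why the class for a single backslash ends up being typed with four backslashes in a PHP string.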

Thanks ScallioXTX,

I am just learning how to use RegEx, so I have been challenging myself to try to help, but alas I cannot yet be an expert in this area. I originally tried something similar to what you recommend; however, preg_match() threw an error. After learning that the \ character causes problems, I realize that it was most likely the culprit, and if I had fixed it then it should have worked.

With that said, I like your recommended way of ‘accepted values’, as it is more than likely more efficient than the regex.

Thanks for taking the time to teach a little :)

Regards,
Steve

Hi,

I tried your idea, but it does not work for German, French, etc. letters that are not English letters,
for example:

Begrüßung

causes your code to report this word as having bad characters, but of course those are
all good German characters.

Any suggestions to get around this shortcoming?
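
A likely explanation, assuming the input arrives as UTF-8: ord() and str_split() work on single bytes, while ü and ß each take two bytes in UTF-8, and the second byte falls outside every allowed range. A small sketch to see this (byte escapes are used so the result does not depend on how the file is saved):

```php
<?php
// "\xC3\xBC" is the UTF-8 encoding of the letter u-umlaut
// (assumption: the input is UTF-8, not ISO-8859-1).
$umlaut = "\xC3\xBC";

// str_split() cuts the string into single bytes, not characters.
$byteValues = array_map('ord', str_split($umlaut));

print_r($byteValues); // two entries: 195 and 188

// 195 falls inside the 192-214 range, but 188 is in none of the allowed
// ranges, so a byte-by-byte range check rejects the whole word.
```

If the same text were encoded as ISO-8859-1, ü would be the single byte 252 and the range check would accept it.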

Hi,

Using ScallioXTX’s regex pattern, the following strings did not work, but using my original regex they do:


$string = 'Begrüßung@work';
echo isStringValid($string);
function isStringValid($string){
   $pattern = "/\*,;\/\?\"\':@#\$%\^&\(\)\-_=+!\`~\\\\!\|<>\./";
   $has_banned_char = preg_match($pattern, $string);
   if($has_banned_char == 0){
       return 1; // is valid
   } else {
       return 0; // is not valid
   }
}

This did not work, but this did:


$string = 'Begrüßung@work';
echo isStringValid($string);
function isStringValid($string){
   $pattern = "/[\*\]|[,]|[;]|[\/]|[\?]|[\"]|[\']|[:]|[@]|[#]|[$]|[%]|[\^]|[&]|[\*]|[\(]|[\)]|[\-]|[_]|[=]|[\+]|[!]|[\`]|[~]|[\\\\]|[\!]|[\|]|[<]|[>]|[\.][\,]/";
   $has_banned_char = preg_match($pattern, $string);
   if($has_banned_char == 0){
       return 1; // is valid
   } else {
       return 0; // is not valid
   }
}

Using the string:


$string = 'Deutsch übersetzen scheint zu funktionieren in der deutschen ok, wenn auch nicht sicher, besser versuchen http://translate.google.ca/?hl=en&tab=wT';
/* Outputs 0 (Is not valid) */

Once I changed the string by removing the web address and the commas, it showed as valid:


<?php
$string = 'Deutsch übersetzen scheint zu funktionieren in der deutschen ok wenn auch nicht sicher besser versuchen';

I am not sure why ScallioXTX’s expression does not work but mine does… it seems a lucky happenstance because I didn’t know any better :D My expression just says to match ONE character in the bracket. If it finds a match, preg_match() returns 1; therefore 0 means that no match was found and the string is valid.

Hope this works for you.

Steve
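
One possible explanation for the difference: without square brackets, a pattern is read as one literal sequence that must appear in exactly that order, whereas a character class matches any single character from the set. A minimal sketch with a shortened, hypothetical pattern:

```php
<?php
$string = 'Begrüßung@work';

// Without brackets: matches only the literal two-character sequence "@#".
$sequenceResult = preg_match('/@#/', $string);

// With brackets: matches as soon as any ONE of the listed characters occurs.
$classResult = preg_match('/[@#]/', $string);

var_dump($sequenceResult); // int(0)
var_dump($classResult);    // int(1)
```

The long unbracketed pattern behaves like the first case: it would only match a string containing every banned character in that exact order.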

Right, I forgot that all my characters have to be stuffed in a character class … :-/

Okay, here we go:


function isStringValid($string) {
   $pattern = "~[*,;/\?\"\':@#\$%\^&\(\)_=+!\]`\~\\\\!\|<>\.-]~";
   return 0 === preg_match($pattern, $string);
}

$string = 'Begrüßung@work';
var_dump(isStringValid($string)); // false

$string = '\\';
var_dump(isStringValid($string)); // false

$string = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 01234567890 Begrüßung';
var_dump(isStringValid($string)); // true

This is easier to read and takes less processing. Nice that you got it working!

This

return 0 === preg_match($pattern, $string);

is better too :)

Steve

Wouldn’t it just be simpler to use [^\W\s]+$ to accept letters, numbers and whitespace and just reject everything else? The characters you want to accept will always be far fewer than the thousands of characters you don’t want.

Hi Fegall,
Does '/[^\W\s]+$/' include all letters, including those that would be found in German and/or French? If so, this is even better.

Regards,
Steve

Hi,

1st, what is: [^\W\s]+$

2nd, I am waiting for an answer to your question whether this will handle German umlaut-type characters and similar non-English French characters.

fel: Trick is he specified he was going to be accepting multiple foreign languages, precluding using a specific locale to identify word characters. I thought the same…

Incidentally, fel’s code will only work if the ENTIRE string is non-word characters. You want a one-and-done, so the ‘corrected’ string would be simply to match for ~\W~, which you’d then take the inverse-answer of to determine validity. (A “valid” string would NOT match.)
\W is “Any non-word character”. Word characters are defined by your locale settings.
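
As a locale-independent sketch of the same "match the unwanted characters and invert" idea, one option is PCRE's Unicode properties together with the u modifier. This assumes UTF-8 input and a PCRE build with Unicode support; the function name isLettersDigitsSpaces is made up for this example and is not from the thread.

```php
<?php
// Hypothetical helper, not part of the thread's code. Assumes UTF-8 input.
function isLettersDigitsSpaces($string) {
    // \p{L} = any Unicode letter, \p{N} = any Unicode number, plus a plain space.
    // The negated class finds the first character that is NOT allowed.
    // With the u modifier, preg_match() returns false for invalid UTF-8,
    // which the strict comparison below also treats as "not valid".
    return 0 === preg_match('/[^\p{L}\p{N} ]/u', $string);
}

// Byte escapes spell "Begrüßung" in UTF-8 regardless of the file's encoding.
var_dump(isLettersDigitsSpaces("Begr\xC3\xBC\xC3\x9Fung 123")); // bool(true)
var_dump(isLettersDigitsSpaces("Begr\xC3\xBC\xC3\x9Fung@work")); // bool(false)
```

Unlike a blacklist, this rejects any punctuation or symbol that nobody thought to list, at the cost of also rejecting characters such as hyphens or apostrophes unless they are added to the class.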

Hi,

Just wanted to let you all know that, after some going back and forth, I chose this code:

function isStringValid($string)
{
    $pattern = "~[*,;/\?\"\':@#\$%\^&\(\)_=+!\]`\~\\\\!\|<>\.-]~";
    return 0 === preg_match($pattern, $string);
}

This offers the best compromise in allowing German, French, etc. words while stopping any
non-word-like chars.

Cheers :)

Glad you found the one that worked best for you. Fegall’s way was nice but a little harder to understand for a novice :D so glad you went with ScallioXTX’s, as it was my preference too!

Regards,
Steve